Build a Service that Pulls Tweets at a Regular Interval #78

Open
pahdo opened this issue Mar 22, 2019 · 4 comments
@pahdo
Collaborator

pahdo commented Mar 22, 2019

Key topics

  • Software Engineering
  • Infrastructure
  • Amazon Web Services

Objective
We are currently pulling tweets on an ad-hoc basis. Since we need as much training data as possible to build good models, and because Twitter only gives us 7 days of data at any given time, we want to build a service that regularly pulls and saves data from Twitter.

First steps
We will want to spin up a server; the options, in increasing order of complexity, are glitch.me, Heroku, or AWS EC2. The simplest implementation of this tweet puller would be a job that hits the Twitter API and dumps the response to a file. We can schedule this job using cron.
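A minimal sketch of such a job, using only the Python standard library. Several things here are assumptions rather than decisions made in this issue: the v1.1 standard search endpoint with app-only bearer-token auth (check the current Twitter API docs before relying on it), the `TWITTER_BEARER_TOKEN` environment variable, the placeholder `QUERY`, and the `tweets/` output directory. The helper names `fetch_tweets` and `save_response` are hypothetical.

```python
import json
import os
import sys
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Assumption: v1.1 standard search endpoint; verify against the current API docs.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
# Placeholder query; the real search terms are not specified in this issue.
QUERY = "example query"


def fetch_tweets(query: str, bearer_token: str, count: int = 100) -> dict:
    """Hit the search endpoint once and return the decoded JSON response."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    request = urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def save_response(payload: dict, out_dir: Path, now: Optional[datetime] = None) -> Path:
    """Dump the raw response to a UTC-timestamped JSON file and return its path."""
    now = now or datetime.now(timezone.utc)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"tweets_{now:%Y%m%dT%H%M%SZ}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


if __name__ == "__main__":
    token = os.environ.get("TWITTER_BEARER_TOKEN")  # assumed variable name
    if token:
        saved = save_response(fetch_tweets(QUERY, token), Path("tweets"))
        print(f"saved {saved}")
    else:
        print("TWITTER_BEARER_TOKEN is not set; nothing fetched", file=sys.stderr)
```

Saved as, say, `pull_tweets.py`, a crontab entry such as `0 */6 * * * /usr/bin/python3 /path/to/pull_tweets.py` would run it every six hours; the script name and path are placeholders. Writing one timestamped file per run keeps each pull separate, so deduplication can happen later.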

Useful tools
Crontab Man Pages - man7.org
Twitter API Docs

@rileypredum
Collaborator

Would love to learn how to set this up and the thinking behind it. Will you be working on it @pahdo?

@pahdo
Collaborator Author

pahdo commented Mar 26, 2019

@rileypredum Sure! I would love for you to take it on if you're interested, or I can also complete it if necessary. I'll probably work on the tasks that people don't think are as interesting.

@rileypredum
Collaborator

@pahdo we also want to make sure the puller, in its current form, is grabbing everything we want from each tweet. Have you checked out my updates to the tween_scrape_and_clean notebook? It grabs more fields now.

@pahdo
Collaborator Author

pahdo commented Mar 28, 2019

Yep! That's a good point. I saw that work has been done on pulling tweets and metadata. I'll go ahead and set up the basic infrastructure to run some arbitrary job on a regular interval. We can probably just use your code when it's ready.
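As a rough illustration of "run some arbitrary job on a regular interval" without relying on cron, here is a small in-process sketch. The function name `run_every` and its parameters are hypothetical, not part of any existing code in this project; cron remains the simpler option on a long-lived server.

```python
import time
from typing import Callable, Optional


def run_every(
    job: Callable[[], None],
    interval_seconds: float,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `job` repeatedly, pausing `interval_seconds` between runs.

    A failing run is logged and does not stop the loop. `max_runs=None`
    loops forever; `sleep` is injectable so the loop can be tested quickly.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # keep the schedule alive after one bad pull
            print(f"job failed: {exc!r}")
        runs += 1
        # Only pause when another run will follow.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

The job is passed in as a plain callable, so the scraping code can be swapped in later without touching the scheduling loop.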

@nathanhc self-assigned this Apr 11, 2019
3 participants