This app listens for tweets about a specified set of topics and keywords, then stores the tweets in a local CSV file or a Google BigQuery table.
Create and activate a virtual environment (using Anaconda, for example), if you like that kind of thing:
conda create -n tweets-env python=3.7 # (first time only)
conda activate tweets-env
Install package dependencies:
pip install -r requirements.txt # (first time only)
Create a ".env" file and set your environment variables there. See the ".env.example" file and instructions below for more details.
Obtain credentials which provide read and write access to the Twitter API, then set the TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN, and TWITTER_ACCESS_TOKEN_SECRET environment variables accordingly.
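For reference, the corresponding ".env" entries might look something like the following (the values shown are placeholders, not real credentials):

TWITTER_CONSUMER_KEY="abc123"
TWITTER_CONSUMER_SECRET="def456"
TWITTER_ACCESS_TOKEN="123-ghi789"
TWITTER_ACCESS_TOKEN_SECRET="jkl012"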
To store tweets in a local CSV file, skip this section. Otherwise, to store tweets in Google BigQuery, set the STORAGE_ENV environment variable to "remote" and continue...
From the Google Cloud console, enable the BigQuery API, then generate and download the corresponding service account credentials (for example, into the root directory of this repo as "credentials.json") and set the GOOGLE_APPLICATION_CREDENTIALS environment variable accordingly.
Log in to the Google BigQuery console and create three datasets named "impeachment_production", "impeachment_development", and "impeachment_test". If you choose a dataset name stem other than "impeachment", set the APP_NAME env var accordingly.
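With remote storage configured as above, the related ".env" entries might resemble the following (assuming the credentials file lives in the repo root and the default dataset name stem):

STORAGE_ENV="remote"
GOOGLE_APPLICATION_CREDENTIALS="./credentials.json"
APP_NAME="impeachment"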
Within each dataset, create a table called "tweets", using the following table schema:
status_id:STRING,
status_text:STRING,
truncated:BOOLEAN,
retweet_status_id:STRING,
reply_status_id:STRING,
reply_user_id:STRING,
is_quote:BOOLEAN,
geo:STRING,
created_at:TIMESTAMP,
user_id:STRING,
user_name:STRING,
user_screen_name:STRING,
user_description:STRING,
user_location:STRING,
user_verified:BOOLEAN,
user_created_at:TIMESTAMP
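If you'd rather create the "tweets" tables programmatically than through the console, a sketch like the one below could work. It uses the google-cloud-bigquery client library (not necessarily listed in requirements.txt), and "your-project-id" is a placeholder for your actual GCP project:

from google.cloud import bigquery

client = bigquery.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS

schema = [
    bigquery.SchemaField("status_id", "STRING"),
    bigquery.SchemaField("status_text", "STRING"),
    bigquery.SchemaField("truncated", "BOOLEAN"),
    bigquery.SchemaField("retweet_status_id", "STRING"),
    bigquery.SchemaField("reply_status_id", "STRING"),
    bigquery.SchemaField("reply_user_id", "STRING"),
    bigquery.SchemaField("is_quote", "BOOLEAN"),
    bigquery.SchemaField("geo", "STRING"),
    bigquery.SchemaField("created_at", "TIMESTAMP"),
    bigquery.SchemaField("user_id", "STRING"),
    bigquery.SchemaField("user_name", "STRING"),
    bigquery.SchemaField("user_screen_name", "STRING"),
    bigquery.SchemaField("user_description", "STRING"),
    bigquery.SchemaField("user_location", "STRING"),
    bigquery.SchemaField("user_verified", "BOOLEAN"),
    bigquery.SchemaField("user_created_at", "TIMESTAMP"),
]

for dataset in ["impeachment_development", "impeachment_production", "impeachment_test"]:
    table = bigquery.Table(f"your-project-id.{dataset}.tweets", schema=schema)
    client.create_table(table, exists_ok=True)  # no-op if the table already exists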
Also create a "topics" table, using the following schema:
[
{
"name": "topic",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "created_at",
"type": "TIMESTAMP"
}
]
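Alternatively, if you save the JSON above to a local file (for example a hypothetical "topics_schema.json"), the "topics" table could be created from it with a sketch like:

from google.cloud import bigquery

client = bigquery.Client()
schema = client.schema_from_json("topics_schema.json")  # the JSON schema shown above
table = bigquery.Table("your-project-id.impeachment_development.topics", schema=schema)
client.create_table(table, exists_ok=True)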
If you don't care about sending notification emails, skip this section. Otherwise, set the WILL_NOTIFY environment variable to "True" and continue...
Sign up for a SendGrid account and verify it as necessary. Create an API Key with "full access" permissions, and set it as the SENDGRID_API_KEY environment variable. Finally, set the FROM_EMAIL and TO_EMAILS environment variables to designate the sender and recipients of error notification emails.
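As a rough sketch of how such a notification could be sent with the sendgrid Python library (the subject and body below are illustrative, not the app's actual message):

import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

message = Mail(
    from_email=os.getenv("FROM_EMAIL"),
    to_emails=os.getenv("TO_EMAILS", "").split(","),  # TO_EMAILS as a comma-separated list
    subject="Tweet collector error",  # hypothetical subject line
    html_content="<p>Something went wrong. Check the server logs.</p>"
)
client = SendGridAPIClient(os.getenv("SENDGRID_API_KEY"))
response = client.send(message)
print(response.status_code)  # 202 indicates the message was accepted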
To specify the list of keywords and phrases to filter on, create a topics CSV file at "data/topics.csv", and insert/modify contents resembling:
topic
impeach
impeached
impeachment
#TrumpImpeachment
#ImpeachAndConvict
#ImpeachAndConvictTrump
#IGReport
#SenateHearing
#IGHearing
#FactsMatter
Trump to Pelosi
NOTE: "topic" is the column name, and is required
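As a sketch of how the topics might be read from this file (the actual loading code lives in the app; this stdlib-only snippet is illustrative):

import csv

with open("data/topics.csv") as f:
    topics = [row["topic"] for row in csv.DictReader(f)]  # expects a "topic" column, per the note above
print(topics)  # e.g. ['impeach', 'impeached', 'impeachment', ...]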
If using local storage, this CSV file will act as the topics list. Otherwise, if using remote storage, seed the development and production databases (and test the storage service):
APP_ENV="development" STORAGE_ENV="remote" python -m app.storage_service
APP_ENV="production" STORAGE_ENV="remote" python -m app.storage_service
NOTE: yes, seed the production database from your local machine rather than on the production server itself, because there will be no topics CSV file on the production server (use your own local file)
NOTE: the test database will be seeded with mock values the first time tests are run
Run the tweet collector:
python -m app.tweet_collector
# ... OR ...
BATCH_SIZE=200 STORAGE_ENV="remote" python -m app.tweet_collector
# ... OR ...
APP_ENV="development" STORAGE_ENV="remote" WILL_NOTIFY=True python -m app.tweet_collector
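Under the hood, the collector presumably streams tweets matching the topics. A minimal standalone sketch of that pattern (assuming a tweepy 3.x-style client library; the app's real logic lives in app/tweet_collector.py) might look roughly like:

import os
import tweepy

auth = tweepy.OAuthHandler(os.getenv("TWITTER_CONSUMER_KEY"), os.getenv("TWITTER_CONSUMER_SECRET"))
auth.set_access_token(os.getenv("TWITTER_ACCESS_TOKEN"), os.getenv("TWITTER_ACCESS_TOKEN_SECRET"))

class TopicListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.id_str, status.text)  # the real app stores these rows to CSV or BigQuery

    def on_error(self, status_code):
        return False  # disconnect on errors (e.g. 420 rate limiting)

stream = tweepy.Stream(auth=auth, listener=TopicListener())
stream.filter(track=["impeach", "impeachment"])  # track terms come from the topics list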
Install pytest:
pip install pytest # (first time only)
Run tests:
pytest --disable-pytest-warnings