
Jediswap Force Wielder


A script that interacts with the Twitter API at regular intervals to fetch, filter, and store data. Any tweet that mentions the JediSwap Twitter account or quotes a tweet posted by the account is fetched and passed through a filtering stage. Tweets that are not dropped during filtering are stored in monthly CSV files with columns for the attributes listed below:

Data obtained per tweet

  • tweet contents, timestamp, referenced tweets, conversation id
  • tweet views, replies, quotes, retweets, likes
  • author id, username, followers, following, tweet count, listed count
  • if reply: tagged accounts in media of parent tweet (scraped)
  • if reply: mentions of parent tweet

Each time the script is run, it searches mentions and quotes backwards through time until it encounters the most recent tweet already present in the stored data. This keeps the total number of API requests as low as possible: no tweet should ever be queried twice. How often the script has to be run to avoid gaps in the data depends solely on your API tier and the expected activity of your Twitter account and followers. For example, if your tier allows querying the last 800 mentions via the Twitter mentions timeline, you need to run the script often enough that fewer than 800 new mentions accumulate between runs. No harm is done by running the script much more often than necessary to guard against gaps in the data.
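As a rough illustration of this strategy (a minimal sketch, not the repository's actual code; fetch_new_mentions() and newest_known_id are hypothetical names), the Twitter API v2 mentions timeline can be paged with since_id so that fetching stops once everything newer than the last stored tweet has been collected:

```python
# Minimal sketch of the "fetch only what is new" strategy. Assumes a valid
# bearer token; function and variable names are illustrative only.
import requests

MENTIONS_URL = "https://api.twitter.com/2/users/{}/mentions"

def fetch_new_mentions(user_id, newest_known_id, bearer_token):
    """Return all mentions posted after the newest tweet already stored."""
    headers = {"Authorization": f"Bearer {bearer_token}"}
    params = {"max_results": 100, "since_id": newest_known_id}
    tweets = []
    while True:
        resp = requests.get(MENTIONS_URL.format(user_id), headers=headers, params=params)
        resp.raise_for_status()
        payload = resp.json()
        tweets.extend(payload.get("data", []))
        next_token = payload.get("meta", {}).get("next_token")
        if next_token is None:  # no more pages of unseen mentions
            return tweets
        params["pagination_token"] = next_token
```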

Web scraping

Some essential information cannot be queried via the Twitter API v2, for example the list of users tagged in a photo attached to a tweet. In these cases, the script scrapes the information from the Twitter frontend using Selenium. For this to work, you will have to install the version of Chromedriver that most closely matches your installed Google Chrome browser. Since this information is only visible to signed-in Twitter users, you will also have to create a user data folder as described here and run the script Selenium_Twitter_Login.py once to sign into Twitter manually and create a session cookie that the script can then reuse for automated scraping. Should the cookie expire, simply repeat this step before running the main script.
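For illustration, a minimal Selenium setup that reuses such a user data folder could look like the following (the folder path and helper name are placeholders, not taken from the repository):

```python
# Minimal sketch: start Chrome with a user data folder that already contains a
# logged-in Twitter session. Path and function name are illustrative only.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def make_scraper_driver(user_data_dir="./chrome_user_data"):
    options = Options()
    options.add_argument(f"--user-data-dir={user_data_dir}")  # reuse the stored session cookie
    return webdriver.Chrome(options=options)

driver = make_scraper_driver()
driver.get("https://twitter.com/JediSwap")  # pages now render as a signed-in user
driver.quit()
```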

Usage

Running the script requires a Twitter developer account. Once the account is registered, paste your API bearer token next to the key API_BEARER_TOKEN in the .env file, as shown in sample.env, omitting any quotes. Paste the Twitter user id you want to run the script for next to the key TWITTER_USER_ID, also without quotes. In main.py, set out_path to where you want the CSV files to be generated.
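The resulting .env file then looks roughly like this (placeholder values shown; substitute your own token and user id):

```
API_BEARER_TOKEN=AAAAAAAAAAAAAAAAAAAAAexample_token
TWITTER_USER_ID=1234567890
```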

Run main.py to start the script:

python main.py

Configuration

  • Query parameters can be customized via get_query_params() in query_and_filter.py. This affects every API request and uniformly alters the fields returned in each response.

  • If called directly, the lower-level querying functions in query_and_filter.py accept additional query parameters as a dictionary add_params, which will be appended to the parameters defined in get_query_params(). This way, an API search can be refined or restricted to a specific time interval.

  • filter_patterns in query_and_filter.py can be expanded to drop tweets programmatically. It uses regex to exclude any tweet where a search pattern matches the tweet contents.

  • For more advanced filtering, or filtering based on tweet attributes other than tweet["text"], functions can be appended to pandas_pipes.py and added to the pipeline in main.py (see the sketch after this list).
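The following sketch illustrates the extension points from the list above. The pattern strings, the time window, the column name, and drop_low_follower_authors() are purely illustrative examples, not part of the repository:

```python
# Illustrative examples only; names and values are hypothetical.
import pandas as pd

# 1) An add_params dict restricting a lower-level query to a time interval
#    (keys follow the Twitter API v2 query parameters):
add_params = {
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-31T23:59:59Z",
}

# 2) Regex patterns that could be appended to filter_patterns; any tweet whose
#    text matches one of them is dropped during filtering:
extra_filter_patterns = [
    r"(?i)\bgiveaway\b",
    r"(?i)\bairdrop\b.*\bscam\b",
]

# 3) A pandas_pipes.py-style function filtering on an attribute other than
#    tweet["text"], to be added to the pipeline in main.py
#    (the "followers" column name is assumed):
def drop_low_follower_authors(df: pd.DataFrame, min_followers: int = 10) -> pd.DataFrame:
    """Drop rows whose author has fewer than `min_followers` followers."""
    return df[df["followers"] >= min_followers]

# e.g. in main.py: df = df.pipe(drop_low_follower_authors, min_followers=25)
```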

License

This project is licensed under the MIT license. See the LICENSE file for details. Collaboration welcome!
