Twitter API in Python

Some tests and experiments with Twitter API in Python, by Tweepy. During my holidays I tried to explore some very simple Natural Language Processing and Graph Theory tools, paying attention to inductive and algorithmic biases. Unfortunately it hasn't aged well due to Twitter's new 2023 policies.

Trend Analysis

First section of the notebook will search for trend topics in a given area (done for Italy, WOEID=23424853) and show the most common words linked to them. Here's the result for September 16th, 2022.

Most important inductive biases of this analysis:

result_type attribute of api.search_tweets function is set to "mixed", so the analysis will include both "popular" and "recent" tweets.

This analysis does not consider retweets. You could be interested in a system of weights based on popularity.

(Some of the many possible) improvements:

Stop words should be extended

Should be better investigated the time dependence of results and what's the best amount of tweets to be considered.

Would be interesting to combine this simple code with some ML tool for sentiment analysis.

Networks and graphs

This section will show relationships between twitter users quantified by a given function and it's my very first attempt to do some basic network analysis. The distance between two users is here based on people who both follow. It's quantified by the following formula: $$dist(user_1, user_2) = \frac{count(F_1 \cap F_2)}{count(F_1)+count(F_2)-count(F_1 \cap F_2)}$$ where $F_i$ indicates users followed by $user_i$.

The algorithm is pretty simple: get a user id (starting id is mine), choose k random twitter friends, calculate the distances, choose a random user between the three friends, iterate the process i times. Runtimes are long because the twitter rate limitations. Save the Pandas DataFrame as a .csv file is useful to continue the analysis in different sessions and cumulate the results.

Most important inductive biases:

Of course the definition of distance between users.

The two hyperparameters that manage the for cycle: $i$ number of iterations, $k$ friends choose for iteration.

The random.sample algorithm.

(Some of the many possible) improvements:

Should be better investigate the rule of hyperparameters.

This approach is very time expensive in order to get some significant results. Might be better to use a more structured sample algorithm.

In general, I have an approximately zero experience with networks and that was only a game during my holidays.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
images		images
LICENSE		LICENSE
README.md		README.md
Twitter.ipynb		Twitter.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Twitter API in Python

Trend Analysis

Most important inductive biases of this analysis:

(Some of the many possible) improvements:

Networks and graphs

Most important inductive biases:

(Some of the many possible) improvements:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Twitter API in Python

Trend Analysis

Most important inductive biases of this analysis:

(Some of the many possible) improvements:

Networks and graphs

Most important inductive biases:

(Some of the many possible) improvements:

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages