Hans-Georg Maaßen and the Retweets


If people are retweeting @hgmaassen, who are they retweeting besides him? An analysis. Read the article (in German).

clusters of twitter accounts


We construct an embedding for Twitter accounts to visualize clusters. To do so, we apply techniques normally used to construct word embeddings. As far as we know, we are the first to use the method in this way.


  1. Iterate over all accounts that retweeted @hgmaassen and count co-occurrences: for each account, record which other accounts it retweeted (a binary choice: retweeted or not), then count every pair of co-retweeted accounts in a 2D matrix.
  2. Apply Pointwise Mutual Information (PMI) to normalize the counts and construct a vector space.
  3. Choose N accounts (the ones with the highest total count) and apply PCA to project them onto a 2D plane for visualization.

This results in an image in which points that lie closer together represent accounts that are retweeted by similar sets of users.
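
The three steps can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the code from the notebooks: the toy input `retweets_by_user`, the function names, and the choice to clip negative PMI values to zero (positive PMI) are all hypothetical and chosen to keep the example small.

```python
import numpy as np

# Hypothetical toy input: for each retweeting user, the set of accounts they
# retweeted besides @hgmaassen (binary: retweeted or not).
retweets_by_user = {
    "user_a": {"acc1", "acc2"},
    "user_b": {"acc1", "acc2", "acc3"},
    "user_c": {"acc3", "acc4"},
    "user_d": {"acc3", "acc4"},
    "user_e": {"acc1", "acc2"},
}


def cooccurrence_matrix(retweets_by_user):
    """Step 1: count how often two accounts are retweeted by the same user."""
    accounts = sorted(set().union(*retweets_by_user.values()))
    index = {name: i for i, name in enumerate(accounts)}
    counts = np.zeros((len(accounts), len(accounts)))
    for retweeted in retweets_by_user.values():
        ids = [index[name] for name in retweeted]
        for i in ids:
            for j in ids:
                if i != j:  # ignore self-pairs
                    counts[i, j] += 1
    return accounts, counts


def ppmi(counts):
    """Step 2: PMI = log(observed / expected); assumption: clip at zero."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # pairs that never co-occur
    return np.maximum(pmi, 0.0)


def pca_2d(vectors):
    """Step 3: project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]


def embed_accounts(retweets_by_user, n_accounts):
    accounts, counts = cooccurrence_matrix(retweets_by_user)
    vectors = ppmi(counts)
    # Keep the N accounts with the highest total co-occurrence count.
    top = np.argsort(-counts.sum(axis=1), kind="stable")[:n_accounts]
    return [accounts[i] for i in top], pca_2d(vectors[top])


names, coords = embed_accounts(retweets_by_user, n_accounts=4)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The resulting 2D coordinates could then be passed to any plotting library to draw the scatter plot.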

See 2_create_vis.ipynb for more details.

A reference if you want to dig deeper into the (NLP) topic: "Improving Distributional Similarity with Lessons Learned from Word Embeddings" by Omer Levy, Yoav Goldberg, and Ido Dagan, TACL 2015.

I am not sure whether I should write about or experiment with the method further. If you have an opinion on it, write me an email.


