Troll Talk

A contribution to NaNoGenMo 2018

Earlier this summer FiveThirtyEight shared a corpus of nearly three million tweets associated with accounts linked to Russia's Internet Research Agency. The evidence suggests these tweets were part of a campaign to influence the 2016 US election. What was communicated, and how do we make sense of it?

One possibility is to simulate a conversation among the trolls using a word embedding model and tf-idf transforms.

  • Build a word embedding model of the Russian troll tweets corpus with Gensim's Word2Vec module. The model's most_similar function can then generate an analogy for each word in a given text, relative to a pair of pre-selected words (such as liberal and conservative).
  • Transform the corpus of tweets into a tf-idf matrix.
  • Run the following algorithm until 50,000 words have been printed, beginning with a randomly selected tweet.
    • Print the tweet.
    • Remove the tf-idf vector for the tweet from the matrix (this avoids repetition).
    • Replace each word in the tweet with its analogue, found by analogy using the word pair and the embedding model.
    • Print the modified tweet.
    • Transform the modified tweet into a tf-idf vector with the same structure as the rows of the matrix.
    • Select the tweet whose vector in the matrix is most similar to the vector of the modified tweet (using cosine similarity).
    • Repeat.
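
The loop above can be sketched in a few dozen lines. This is a minimal, standard-library-only illustration, not the code in makeTrollTalk.py: the function names (`build_idf`, `tfidf_vector`, `cosine`, `troll_talk`), the whitespace tokenizer, and the smoothed idf formula are all assumptions made for the example. The `replace_word` argument is a hypothetical stand-in for the embedding step; with a trained Gensim model it might wrap a call such as `model.wv.most_similar(positive=["conservative", word], negative=["liberal"], topn=1)`, with a fallback for words missing from the vocabulary.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse tf-idf vector (a dict) restricted to the corpus vocabulary."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def troll_talk(tweets, replace_word, word_limit, seed=0):
    """Alternate original and modified tweets until word_limit is reached
    or the corpus is exhausted. Returns the list of printed lines."""
    rng = random.Random(seed)
    idf = build_idf(tweets)
    # The "matrix": one tf-idf row per tweet, keyed by tweet index.
    matrix = {i: tfidf_vector(t, idf) for i, t in enumerate(tweets)}
    lines, n_words = [], 0
    current = rng.choice(list(matrix))  # randomly selected starting tweet
    while n_words < word_limit:
        tweet = tweets[current]
        del matrix[current]  # remove the row so the tweet cannot repeat
        modified = " ".join(replace_word(w) for w in tokenize(tweet))
        lines += [tweet, modified]
        n_words += len(tweet.split()) + len(modified.split())
        if not matrix:
            break
        # Project the modified tweet into the same tf-idf space and
        # pick the remaining tweet with the highest cosine similarity.
        query = tfidf_vector(modified, idf)
        current = max(matrix, key=lambda i: cosine(query, matrix[i]))
    return lines


if __name__ == "__main__":
    toy_tweets = [
        "liberal media lies again",
        "conservative media lies again today",
        "vote early vote often",
    ]
    # Toy lookup table standing in for the embedding-based analogy.
    swap = {"liberal": "conservative", "conservative": "liberal"}
    for line in troll_talk(toy_tweets, lambda w: swap.get(w, w), word_limit=50000):
        print(line)
```

Keeping the matrix as a dict of sparse rows makes the "remove the vector" step a single deletion. On a corpus of nearly three million tweets, a sparse matrix library and a vectorized similarity computation would be the more practical choice.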