.ipynb_checkpoints
Scripts for prepping data
.Rhistory
ELO2018.css
FirstTimeISawMe.png
NCF_Flaubert_model
NCF_Flaubert_tsne.tsv
NCF_Flaubert_tsne_plot.png
NCF_Flaubert_tsne_plot.svg
NCF_pos_dict.pkl
NCFbibliography.csv
README.md
RussianTrolls_model
RussianTrolls_model.trainables.syn1neg.npy
RussianTrolls_model.wv.vectors.npy
RussianTrolls_pos_dict.pkl
RussianTrolls_tsne.tsv
RussianTrolls_tsne_plot.png
inventText.ipynb
tsne_code.R

Word Vector Text Inventor

Based on WVTM, a contribution to 2016 NaNoGenMo, and previously dubbed WVTG (Word Vector Topic Generator). This is a hack in progress I will present at the 2018 ELO conference.

The repository contains the code and data necessary to invent responses to an assertion from Baudelaire's Enivrez-vous. Each word in the invented text is based on an analogy with the word pair bien / mal and on word vectors derived from 30 volumes written by Gustave Flaubert.

The file inventText.ipynb documents and executes a method of algorithmic rhetorical invention.

The code can also produce comparable results in English with the nearly 3,000,000 tweets by Russian trolls collected by FiveThirtyEight.

A quick explanation of what's under the hood

Using gensim to build a word2vec model from a corpus of French texts, the code takes a pair of words (e.g. "homme" and "femme") and a text as parameters and generates a new text. Each word in the original text is replaced by the word the model ranks as "most similar" to it once the word pair is applied as an analogy. For instance, if "roi" is a word in the original text, it would be replaced by analogy:

>>> model.wv.most_similar(positive=['femme', 'roi'], negative=['homme'], topn=1)
[(u'reine', 0.8085041046142578)]

If the word vector model is unable to complete an analogy, the word from the asserted text does not change in the invented text.
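
The replacement loop described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the names invent_text and toy_most_similar are hypothetical, and the most_similar argument is assumed to behave like gensim's most_similar method (for example model.wv.most_similar), which raises a KeyError for words missing from the vocabulary. The toy lookup table stands in for a trained model so the sketch runs without gensim or the Flaubert corpus.

```python
def invent_text(words, pair, most_similar):
    """Replace each word by analogy with `pair`; keep it if no analogy is found.

    `most_similar` is assumed to accept positive, negative and topn arguments
    like gensim's KeyedVectors.most_similar.
    """
    source, target = pair
    invented = []
    for word in words:
        try:
            # Analogy: source is to target as word is to ?
            candidates = most_similar(
                positive=[target, word], negative=[source], topn=1
            )
        except KeyError:
            # Out-of-vocabulary word: the analogy cannot be completed.
            candidates = []
        # Fall back to the original word when there is no candidate.
        invented.append(candidates[0][0] if candidates else word)
    return invented


def toy_most_similar(positive, negative, topn=1):
    """Hypothetical stand-in for a trained model, for illustration only."""
    table = {("femme", "roi", "homme"): [("reine", 0.81)]}
    key = (positive[0], positive[1], negative[0])
    if key not in table:
        raise KeyError(key)
    return table[key][:topn]


print(invent_text(["le", "roi"], ("homme", "femme"), toy_most_similar))
# prints ['le', 'reine']
```

With a trained gensim model, passing model.wv.most_similar in place of toy_most_similar should give the same fallback behavior: "le" is unchanged because the stand-in cannot complete its analogy, while "roi" becomes "reine".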