semantic space and size
a proof-of-concept for exploring cultural bias in (very) small corpora using cosine similarity and word2vec
note: this repo contains references to offensive language found in alt-right and other subreddits, including references to assault, gendered insults & profanity. links to subreddits, as well as the subreddit corpora used in this project, may also include offensive material.
find the main code for this project here
data for this project, including full subreddits, clean & raw text, vocabulary, & final cosine similarity comparisons, are here
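for readers new to the measure, here is a minimal sketch of the kind of cosine similarity comparison described above. it is not the project's actual code: the function name, the corpus labels, and the 2-dimensional toy vectors are all hypothetical, and in practice the vectors would come from a word2vec model trained on each corpus rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# hypothetical toy embeddings (made up for illustration, not project data):
# the same two words, embedded separately for two different corpora
toy_embeddings = {
    "corpus_a": {"word_1": [1.0, 0.0], "word_2": [1.0, 0.0]},
    "corpus_b": {"word_1": [1.0, 0.0], "word_2": [0.0, 1.0]},
}

# similarity of the same word pair within each corpus
pair_similarity = {
    corpus: cosine_similarity(vectors["word_1"], vectors["word_2"])
    for corpus, vectors in toy_embeddings.items()
}

for corpus, score in pair_similarity.items():
    print(corpus, round(score, 3))
```

a score near 1 means the two words point in nearly the same direction in that corpus's semantic space, and a score near 0 means they are close to unrelated. comparing the score for the same word pair across corpora is one simple way to look for differences in how each community uses those words.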
citations & docs
this repository makes use of a number of outside libraries and resources. here are a few, with documentation and other information.
libraries & packages
see also: Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.