# Science in Culture
This research project analyzes how science is perceived in culture, using natural language processing (NLP) techniques such as word embeddings (word2vec) and sentiment analysis.

## Reference Code
1. https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb
2. https://github.com/RaRe-Technologies/gensim/blob/ba1ce894a5192fc493a865c535202695bb3c0424/docs/notebooks/Word2Vec_FastText_Comparison.ipynb

## Other References
1. Cámara Hurtado, M., & López Cerezo, J. A. (2012). Political dimensions of scientific culture: Highlights from the Ibero-American survey on the social perception of science and scientific culture. Public Understanding of Science, 21(3), 369–384. https://doi.org/10.1177/0963662510373871
2. Jones, M. (2014). Cultural Characters and Climate Change: How Heroes Shape Our Perception of Climate Science. Social Science Quarterly, 95(1), 1–39.
3. Ruest, Nick, 2017, "#climatemarch tweets April 19-May 3, 2017", https://doi.org/10.5683/SP/KZZVZW, Scholars Portal Dataverse, V1
4. Ruest, Nick, 2017, "#MarchForScience tweets April 12-26, 2017", https://doi.org/10.5683/SP/7BC9V1, Scholars Portal Dataverse, V1
5. Francis, W. N., & Kucera, H. Brown Corpus. NLTK data (id: brown; size: 3314357). http://www.nltk.org/nltk_data/. License: may be used for non-commercial purposes.
6. Salehan, Kim, & Lee. (2018). Are there any relationships between technology and cultural values? A country-level trend study of the association between information communication technology and cultural values. Information & Management, 55(6), 725–745.
7. Vishwanath, A., & Chen, H. (2008). Personal communication technologies as an extension of the self: A cross-cultural comparison of people's associations with technology and their symbolic proximity with others. Journal of the American Society for Information Science and Technology, 59(11), 1761–1775.



## Imports

In [10]:
import nltk
from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus
import numpy as np
import os
import smart_open

## Training the models
The data for this project is not included in this repository because of restrictions on redistributing Twitter data. Two datasets are used: the widely available Brown corpus, and tweets carrying the #ClimateMarch and #MarchForScience hashtags. Roughly two million tweets were hydrated (retrieved in full from their tweet IDs) using twarc. One model is trained on the Brown corpus and one on each hashtag's tweets.
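
The cleaning step that produced the `data/cleaned/` files is not shown in this notebook. As a rough sketch only, assuming the hydrated tweets are available as plain strings, tweet text could be turned into the whitespace-separated tokens that `Text8Corpus` reads along the following lines. The function name `clean_tweet` and the specific rules (dropping URLs, stripping punctuation from token edges) are illustrative assumptions, not the preprocessing actually used here. Tokens such as `'rise.'` and `'open,'` in the results further down suggest that punctuation was left attached to words in the original run.

```python
import re
import string

# Assumption: URLs carry no useful signal for the embeddings.
URL_RE = re.compile(r"https?://\S+")

# Punctuation stripped from token edges; '#' and '@' are kept so that
# hashtags and mentions survive as distinct tokens.
EDGE_PUNCT = string.punctuation.replace("#", "").replace("@", "") + "…“”‘’"


def clean_tweet(text):
    """Return the tokens of one tweet with URLs removed and edge punctuation stripped."""
    text = URL_RE.sub(" ", text)
    tokens = []
    for raw in text.split():
        token = raw.strip(EDGE_PUNCT)
        if token:  # drop tokens that were punctuation only
            tokens.append(token)
    return tokens


tokens = clean_tweet('Seas will rise. "less" ignorance… #MarchForScience!')
print(tokens)
# ['Seas', 'will', 'rise', 'less', 'ignorance', '#MarchForScience']

# One tweet becomes one run of space-separated tokens in the output file.
line = " ".join(tokens)
```

Case is left unchanged in this sketch, which matches the capitalized tokens (for example `'Libs'`) that appear in the model output.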

In [20]:
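# Hyperparameter names follow gensim 3.x; gensim 4.x renames
# 'size' to 'vector_size' and 'iter' to 'epochs'.
params = {
    'alpha': 0.05,     # initial learning rate
    'size': 100,       # embedding dimensionality
    'window': 5,       # max distance between target and context word
    'iter': 5,         # number of training epochs
    'min_count': 5,    # ignore words seen fewer than 5 times
    'sample': 1e-4,    # downsampling threshold for frequent words
    'sg': 1,           # 1 = skip-gram, 0 = CBOW
    'hs': 0,           # 0 = no hierarchical softmax
    'negative': 5      # number of negative samples per positive example
}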

brown_model = Word2Vec(Text8Corpus('data/brown_corp.txt'), **params)

In [30]:
climate_model = Word2Vec(Text8Corpus('data/cleaned/climate_tweets_cleaned.txt'), **params)

In [50]:
mfs_model = Word2Vec(Text8Corpus('data/cleaned/MarchForScience_tweets_cleaned.txt'), **params)

## Evaluating the models
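Let's start with some basic cosine-similarity queries: for each model, find the ten words whose vectors lie closest to the combination of "science" and "fear".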
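
For reference, the cosine similarity of two vectors u and v is u·v / (‖u‖‖v‖). When several positive words are given, gensim's `most_similar` ranks the vocabulary by cosine similarity to the mean of the unit-normalized query vectors and leaves the query words out of the results. The NumPy sketch below illustrates that computation on made-up two-dimensional vectors. The helper names and the toy vectors are assumptions for illustration, not values from the trained models.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def rank_by_similarity(query_words, vectors, topn=10):
    """Rank words by cosine similarity to the mean of the normalized query vectors.

    `vectors` is a plain dict mapping each word to a NumPy array.
    """
    unit = {w: v / np.linalg.norm(v) for w, v in vectors.items()}
    query = np.mean([unit[w] for w in query_words], axis=0)
    scores = [
        (w, cosine_similarity(query, v))
        for w, v in unit.items()
        if w not in query_words  # query words are excluded from the results
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy vectors, chosen so the geometry is easy to check by hand.
toy = {
    "science": np.array([1.0, 0.0]),
    "fear": np.array([0.0, 1.0]),
    "worry": np.array([1.0, 1.0]),   # points along the science+fear diagonal
    "rock": np.array([1.0, -1.0]),   # orthogonal to that diagonal
}

ranked = rank_by_similarity(["science", "fear"], toy, topn=2)
print(ranked)  # 'worry' ranks first (similarity close to 1), 'rock' second (close to 0)
```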

In [42]:
brown_model.wv.most_similar(positive=['science', 'fear'], topn=10)

[('Utopian', 0.910775899887085),
 ('virtue', 0.9025671482086182),
 ('humanity', 0.8978033065795898),
 ('injustice', 0.8942326307296753),
 ('superior', 0.8850595951080322),
 ('instinct', 0.8850191235542297),
 ('legends', 0.8841550946235657),
 ('profound', 0.8839781880378723),
 ('heroic', 0.8820221424102783),
 ('erotic', 0.8816669583320618)]

In [41]:
climate_model.wv.most_similar(positive=['science', 'fear'], topn=10)

[('Libs', 0.696629524230957),
 ('oceans', 0.6743819117546082),
 ('might', 0.6427975296974182),
 ('peacefully', 0.5920714735984802),
 ('Muslims', 0.5760078430175781),
 ('NY', 0.5724131464958191),
 ('Hiding', 0.5645971298217773),
 ('Hi', 0.55558842420578),
 ('rise.', 0.5526690483093262),
 ('chalkboard', 0.5496262311935425)]

In [55]:
mfs_model.wv.most_similar(positive=['science', 'fear'], topn=10)

[('MC…', 0.6706839799880981),
 ('enterprise', 0.6391789317131042),
 ('ignorance…', 0.6318367123603821),
 ('triumph', 0.5998080968856812),
 ('open,', 0.5971928834915161),
 ('stay!', 0.5955246090888977),
 ('facets', 0.5903943777084351),
 ('protesting,', 0.5866920948028564),
 ('represses', 0.5861929655075073),
 ('less"', 0.5806410312652588)]

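Next, `predict_output_word` reports each model's probability distribution over the center word given a set of context words. The gensim documentation describes this as a crude predictor: it propagates CBOW-style even when the model was trained with skip-gram.

In [49]: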
brown_model.predict_output_word(['fear', 'science'])

[('science', 0.0006998557),
 ('philosophy', 0.0006726175),
 ('fear', 0.00059489015),
 ('mind', 0.0005511252),
 ('religion', 0.0005273468),
 ('pure', 0.00047090003),
 ('importance', 0.00046973577),
 ('feelings', 0.00046481914),
 ('poems', 0.00046039498),
 ('poetic', 0.0004552021)]

In [48]:
climate_model.predict_output_word(['fear', 'science'])

[('science', 0.014418488),
 ('rise.', 0.0102302795),
 ('might', 0.006852584),
 ('oceans', 0.0046408046),
 ('Muslims', 0.0039983448),
 ('declaring', 0.0037104057),
 ("We'll", 0.0035391354),
 ('fear', 0.0023272592),
 ('But', 0.0021787095),
 ('inaugurati…', 0.0019829252)]

In [54]:
mfs_model.predict_output_word(['science', 'fear'])

[('fear', 0.022886122),
 ('ignorance…', 0.002145881),
 ('less"', 0.0015034123),
 ('Saturday', 0.0010085657),
 ('benefit', 0.0010045078),
 ('truth.', 0.00093975145),
 ('ignorance,', 0.00078472577),
 ('society', 0.0007797721),
 ('role', 0.0007566012),
 ('knowledge', 0.00072946213)]

## Analysis
Let's compare the three models to see what their differences suggest about how science is perceived in each corpus.
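
One simple starting point, offered as a sketch rather than as this project's method, is to measure how much the nearest-neighbor lists of two models overlap. Because each model is trained separately, their vector spaces are not directly comparable, but their neighbor lists are. The hypothetical helper `neighbor_overlap` below takes lists of `(word, score)` pairs in the shape returned by `most_similar` and computes the Jaccard overlap of the words. The sample lists are the first three neighbors reported above for the Brown and climate models.

```python
def neighbor_overlap(neighbors_a, neighbors_b):
    """Jaccard overlap between the words of two (word, score) neighbor lists."""
    words_a = {word for word, _ in neighbors_a}
    words_b = {word for word, _ in neighbors_b}
    union = words_a | words_b
    if not union:  # both lists empty
        return 0.0
    return len(words_a & words_b) / len(union)


# First three neighbors of 'science' + 'fear' from the output above.
brown_top = [
    ('Utopian', 0.910775899887085),
    ('virtue', 0.9025671482086182),
    ('humanity', 0.8978033065795898),
]
climate_top = [
    ('Libs', 0.696629524230957),
    ('oceans', 0.6743819117546082),
    ('might', 0.6427975296974182),
]

print(neighbor_overlap(brown_top, climate_top))    # 0.0
print(neighbor_overlap(climate_top, climate_top))  # 1.0
```

A value near 0 means the two corpora place "science" and "fear" among entirely different words, while a value near 1 means the neighborhoods largely agree.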