This project:

Collects articles about women from the New York Times and Washington Post, 1980-2014
Categorizes each article by country and region
Uses Stanford's Named Entity Recognizer to remove proper nouns from article texts
Uses STM (R package) to analyze topical trends in the corpus over time and across regions
Compares coverage across regions using word-separating algorithms and other techniques
Conducts statistical analysis, regressing the number of documents and mean topic distributions on country-level variables (note: the country-level dataset is not included in this repo)
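
As a rough illustration of the proper-noun removal step listed above, here is a minimal sketch in Python. It is not the code from this repo. It assumes the NER output has already been written in a `token/TAG` slash format with `O` marking non-entity tokens; the function name `strip_named_entities` and the example sentence are hypothetical.

```python
def strip_named_entities(tagged_text, outside_tag="O"):
    """Keep only tokens whose NER tag is the 'outside' (non-entity) tag.

    Assumes whitespace-separated items of the form token/TAG.
    Items without a slash are treated as malformed and dropped.
    """
    kept = []
    for item in tagged_text.split():
        # Split on the LAST slash so tokens that contain "/" stay intact.
        token, sep, tag = item.rpartition("/")
        if sep and tag == outside_tag:
            kept.append(token)
    return " ".join(kept)


# Hypothetical tagged sentence, for illustration only.
example = "Hillary/PERSON Clinton/PERSON visited/O women/O in/O Kenya/LOCATION ./O"
print(strip_named_entities(example))
```

Dropping every tagged entity (people, places, organizations) is one plausible reading of "remove proper nouns"; a pipeline could instead drop only selected tag types by checking `tag` against a set.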
