This project:

  1. Collects articles about women from the New York Times and Washington Post, 1980-2014
  2. Categorizes each article by country and region
  3. Uses Stanford's Named Entity Recognizer to remove proper nouns from article texts
  4. Uses STM (R package) to analyze topical trends in the corpus over time and across regions
  5. Compares coverage across regions using word-separating algorithms and other techniques
  6. Conducts statistical analysis, regressing the number of documents and mean topic distributions on country-level variables (note: the country-level dataset is not included in this repo)
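
As a rough illustration of what a word-separating comparison (step 5) can look like, the sketch below scores each word by a smoothed log-odds ratio between two groups of articles. This README does not say which algorithm the project uses, so the technique, the function name `log_odds_ratio`, the `smoothing` parameter, and the toy token lists are all assumptions made for illustration, not the project's code or data.

```python
import math
from collections import Counter


def log_odds_ratio(tokens_a, tokens_b, smoothing=0.5):
    """Return {word: smoothed log-odds ratio}; positive values favor corpus A.

    Hypothetical helper for illustration only. Assumes the combined
    vocabulary has at least two distinct words.
    """
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)

    # Add-k smoothing so words missing from one corpus do not produce log(0).
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)

    scores = {}
    for word in vocab:
        a = counts_a[word] + smoothing
        b = counts_b[word] + smoothing
        odds_a = a / (total_a - a)
        odds_b = b / (total_b - b)
        scores[word] = math.log(odds_a) - math.log(odds_b)
    return scores


# Made-up token lists standing in for articles from two regions.
region_a = "rights vote election rights law".split()
region_b = "health school health family law".split()

scores = log_odds_ratio(region_a, region_b)
ranked = sorted(scores, key=scores.get, reverse=True)
print("most distinctive of region A:", ranked[0])
print("most distinctive of region B:", ranked[-1])
```

Words shared equally by both groups score near zero, so the two ends of the ranking show the vocabulary that most separates the groups. In practice the inputs would be the tokenized article texts after proper-noun removal (step 3).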