The Good NYT

The New York Times (NYT) is the nation's newspaper of record. It is both well regarded and popular: it has won more Pulitzer Prizes than any other newspaper, and it is the 30th most visited website in the U.S. (as of October 2017).

We explore patterns in what the NYT published between 1987 and 2007 using the annotated New York Times Corpus.

Data Analysis

  1. Convert NYT Corpus to CSV, and Recode

  2. Not News
    Has the proportion of news stories about topics unrelated to politics or the economy, such as cooking, travel, fashion, and music, gone up over time?

    We classify stories by type using the news.desk and online.section fields. (See the script for other ways to measure the type of story.)

  3. Urban Vs. Rural
    We use the locations (hand indexed), online.locations (algorithmically generated), and dateline fields to estimate rural vs. urban coverage within the US.

    • Script and Figure
  4. National Vs. International
    We use the news.desk field Foreign News to estimate coverage of foreign news. We can also use the locations (hand indexed), online.locations (algorithmically generated), and dateline fields to estimate national vs. international coverage.

    • Proportion of Foreign Desk Stories Over Time: Script and Figure.

  5. Corrections
    We use correction.text to estimate the rate of corrections over time and, later, what is being corrected.

  6. Length of Articles
    We use the word.count field to estimate the average length of articles and how it has changed over time.

  7. Number of Authors per Article
    We use the normalized.byline field to estimate the number of authors per article and how it has changed over time.

    • Average Number of Authors per Article Over Time: Script and Figure.
  8. No. of Articles per Author per Year
    One common conjecture is that reporters are producing more than they used to. Is that true? We use the normalized.byline field to estimate the average number of articles per author per year and how that metric has changed over time.

    • No. of articles per author per year over time: Script and Figure.
  9. Proportion of Wire Stories
    We use the byline field to estimate the proportion of wire stories.

    • Proportion of AP and Reuters Stories Over Time: Script and Figure.

  10. Race and Gender of Reporters
    We use normalized.byline to get the names of the reporters, and then use the gender package and the ethnicolr package to impute the gender and race of reporters.

    • Proportion of Female Journalists on Staff Over Time: Script and Figure.
    • Average Number of Female Journalists per Article Over Time: Script and Figure.
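
Several of the analyses above reduce to the same two operations: a per-year share of stories meeting some condition (items 2, 4, and 9) and per-year counts derived from bylines (items 7 and 8). The sketch below illustrates that logic on toy rows; it is not the code used in the linked scripts. The column names news.desk and normalized.byline come from the descriptions above, while the "year" column, the desk values, and the ";" separator between co-authors are assumptions made for illustration.

```python
from collections import defaultdict

# Toy rows standing in for the recoded CSV. "year", the desk names, and the
# ";" separator between co-authors are assumptions, not corpus facts.
ROWS = [
    {"year": 1987, "news.desk": "Foreign Desk", "normalized.byline": "Doe, Jane"},
    {"year": 1987, "news.desk": "Metropolitan Desk",
     "normalized.byline": "Roe, Richard; Doe, Jane"},
    {"year": 1988, "news.desk": "Foreign Desk", "normalized.byline": "Doe, Jane"},
    {"year": 1988, "news.desk": "Foreign Desk", "normalized.byline": ""},
]


def split_authors(byline, sep=";"):
    """Return the non-empty author names in a byline string."""
    return [name.strip() for name in byline.split(sep) if name.strip()]


def share_by_year(rows, predicate):
    """Fraction of each year's articles for which predicate(row) is true."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        total[row["year"]] += 1
        hits[row["year"]] += bool(predicate(row))
    return {year: hits[year] / total[year] for year in sorted(total)}


def authors_per_article_by_year(rows):
    """Mean number of authors per bylined article, by year."""
    counts = defaultdict(list)
    for row in rows:
        n = len(split_authors(row["normalized.byline"]))
        if n:  # articles without a byline are skipped
            counts[row["year"]].append(n)
    return {year: sum(v) / len(v) for year, v in sorted(counts.items())}


def articles_per_author_by_year(rows):
    """Mean number of articles per distinct author, by year."""
    per_author = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for author in split_authors(row["normalized.byline"]):
            per_author[row["year"]][author] += 1
    return {
        year: sum(c.values()) / len(c) for year, c in sorted(per_author.items())
    }
```

For example, `share_by_year(ROWS, lambda r: r["news.desk"] == "Foreign Desk")` gives the yearly share of stories from the (assumed) foreign desk. Swapping in a predicate on byline or online.section would give the wire-story or not-news shares in the same way.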


Gaurav Sood


Released under CC BY 2.0.

