A text mining project for Harvard Business Review articles from 1922 to 2012
R
data
.gitignore
HBR_in90yrs.Rproj
README.md
read_prepare_abstracts.R
term_map_clean2.png

README.md

History of the United States through the lens of Harvard Business Review Articles (1922 - 2012)

HBR 90 years visualization

-> Open the high-definition picture

A text mining project covering 90 years of HBR articles.

In this project, I use a multivariate technique called Correspondence Analysis (CA). Given a term-year matrix that describes how many times a term i has been mentioned in year (or group of years) j, CA produces a set of orthogonal components (just like Principal Component Analysis, PCA) that capture the "driving forces" of variance in a dataset.
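To make the idea concrete, here is a minimal sketch of CA in base R, computed through a singular value decomposition of the standardized residuals. The term-by-period counts are made up for illustration and are not taken from the HBR data, and the variable names are my own. This is not necessarily how `read_prepare_abstracts.R` performs the analysis.

```r
# Toy term-by-period count matrix (invented numbers, for illustration only).
counts <- matrix(
  c(30, 12,  2,
    25, 10,  3,
     4, 15, 28,
     2,  9, 35),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    term   = c("railroad", "tariff", "internet", "startup"),
    period = c("1930s", "1970s", "2000s")
  )
)

P      <- counts / sum(counts)   # correspondence matrix (relative frequencies)
r      <- rowSums(P)             # row (term) masses
c_mass <- colSums(P)             # column (period) masses

# Standardized residuals: departure from independence, scaled by expected value.
S <- (P - outer(r, c_mass)) / sqrt(outer(r, c_mass))

# The SVD yields the orthogonal components.
dec <- svd(S)

# Principal coordinates on the first two components.
k <- 2
row_coords <- (dec$u[, 1:k] / sqrt(r)) %*% diag(dec$d[1:k])
col_coords <- (dec$v[, 1:k] / sqrt(c_mass)) %*% diag(dec$d[1:k])
rownames(row_coords) <- rownames(counts)
rownames(col_coords) <- colnames(counts)

# Inertia per component: the CA analogue of explained variance in PCA.
inertia   <- dec$d^2
explained <- inertia / sum(inertia)

print(round(explained, 3))
print(round(row_coords, 3))
print(round(col_coords, 3))
```

Plotting `row_coords` and `col_coords` on the same axes gives a map of the kind shown above, with terms and periods sharing one space.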

How to read the plot?

The plot shows a representative subset of words across all years. You can imagine a spring between each word and all the years. The strength of the spring is weighted by the number of times a word has been mentioned in that year. This way, words associated with the 30s will pull on those years while words associated with recent years will pull in a different direction. The plot approximates this sort of image.
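The spring picture has a precise counterpart in CA: each word's position equals the average of the year positions, weighted by how often the word appears in each year. The sketch below checks this on invented counts (not the HBR data) using base R only; all names are illustrative assumptions.

```r
# Toy term-by-period counts (invented numbers, for illustration only).
counts <- matrix(
  c(30, 12,  2,
    25, 10,  3,
     4, 15, 28,
     2,  9, 35),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    term   = c("railroad", "tariff", "internet", "startup"),
    period = c("1930s", "1970s", "2000s")
  )
)

P      <- counts / sum(counts)
r      <- rowSums(P)
c_mass <- colSums(P)
S      <- (P - outer(r, c_mass)) / sqrt(outer(r, c_mass))
dec    <- svd(S)

# Period positions in standard coordinates (the fixed ends of the springs).
period_std <- dec$v[, 1:2] / sqrt(c_mass)

# Term positions in principal coordinates, as CA computes them.
term_principal <- (dec$u[, 1:2] / sqrt(r)) %*% diag(dec$d[1:2])

# Row profiles: share of each term's mentions falling in each period.
row_profiles <- counts / rowSums(counts)

# "Spring equilibrium": the profile-weighted average of the period positions.
term_as_average <- row_profiles %*% period_std

# The two ways of placing the terms agree up to floating-point round-off.
print(max(abs(term_as_average - term_principal)))
```

A word mentioned almost only in one period therefore lands close to that period's direction, while a word spread evenly across periods sits near the origin.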

More technical descriptions can be found on Wikipedia:
