R package to query the Google Ngram Viewer
The package has been updated to deal with the change to Google's website.
Note: with the switch to using RCurl to access SSL pages, ngramr will generally no longer work behind a proxy.
The Google Books Ngram Viewer allows you to enter a list of phrases and then displays a graph showing how often the phrases have occurred in a large corpus of books (e.g., "British English", "English Fiction", "French") over time. The current corpus collected in 2012 contains almost half a trillion words for English alone.
This package requires R version 2.15 or higher. If you are using an older version of R, you will be prompted to upgrade when you try to install the package, so you may as well upgrade now!
The official release of ngramr is available on CRAN. To install from CRAN, use the following command:
install.packages("ngramr")
If you have any problems installing the package on OS X, try installing from source:
install.packages("ngramr", type = "source")
If you have devtools installed, install the latest stable version of this package directly from GitHub:
library(devtools)
install_github("seancarmody/ngramr")
library(ngramr)
and if you are feeling a little more adventurous, you can install the development version:
install_github("seancarmody/ngramr", ref = "develop")
although it may not always work.
If you are behind a proxy, install_github may not work for you. Instead of fiddling around with the RCurl proxy settings, you can download the ZIP archive and install the package from that local copy.
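As a minimal sketch of the local install route, assuming the ZIP archive has already been downloaded by hand (for example through a browser) and that devtools is installed: the helper name `install_from_zip` and the file name in the example call are hypothetical, and `devtools::install_local()` is used here as one way to install from a local archive.

```r
# Install a package from a ZIP archive that was downloaded manually.
# `zip_path` is a hypothetical local path; adjust it to wherever you saved the file.
install_from_zip <- function(zip_path) {
  # Fail early with a clear message if the archive is not where we expect it.
  if (!file.exists(zip_path)) {
    stop("ZIP archive not found: ", zip_path)
  }
  # devtools provides install_local(); check that it is available first.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("The devtools package is required for install_local().")
  }
  devtools::install_local(zip_path)
}

# Example call (not run here; the file name is an assumption):
# install_from_zip("ngramr-master.zip")
```

Because the download happens outside R, no proxy configuration is needed inside the R session.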
Here is an example of how to use the ngram function:
require(ggplot2)
ng <- ngram(c("hacker", "programmer"), year_start = 1950)
ggplot(ng, aes(x = Year, y = Frequency, colour = Phrase)) +
  geom_line()
The result is a ggplot2 line graph of the following form:
The same result can be achieved even more simply by using the ggram plotting wrapper, which supports many options, as in this example:
require(ggplot2)
ggram(c("monarchy", "democracy"),
      year_start = 1500, year_end = 2000,
      corpus = "eng_gb_2012", ignore_case = TRUE,
      geom = "area", geom_options = list(position = "stack")) +
  labs(y = NULL)
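The data behind these plots can also be inspected directly. The following is a minimal sketch, assuming the data frame returned by ngram has Year, Phrase and Frequency columns (the names used in the ggplot call earlier). To keep it runnable offline it uses a small stand-in data frame, `ng_mock`, whose values are made up for illustration, and a hypothetical helper, `peak_year`.

```r
# Hypothetical stand-in for the data frame returned by ngram();
# the values below are made up purely for illustration.
ng_mock <- data.frame(
  Year = rep(c(1950, 1975, 2000), times = 2),
  Phrase = rep(c("hacker", "programmer"), each = 3),
  Frequency = c(1e-8, 2e-7, 9e-7, 5e-8, 4e-6, 3e-6),
  stringsAsFactors = FALSE
)

# For each phrase, return the row with the highest frequency.
peak_year <- function(df) {
  rows <- lapply(split(df, df$Phrase), function(d) d[which.max(d$Frequency), ])
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

peaks <- peak_year(ng_mock)
print(peaks)
```

With real data, replacing `ng_mock` with the result of an ngram call should work in the same way, provided the column names match.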
The colors used by Google Ngram are available through the google_theme option, as in this example posted by Ben Zimmer at Language Log:
require(ggplot2)
ng <- c("((The United States is + The United States has) / The United States)",
        "((The United States are + The United States have) / The United States)")
ggram(ng, year_start = 1800, google_theme = TRUE) +
  theme(legend.direction = "vertical")
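The composite queries above combine phrase frequencies arithmetically: frequencies are added with `+` and divided with `/`. As a rough sketch of what each expression computes per year, the same arithmetic can be done by hand. The numbers and column names below are invented for illustration and are not real Ngram values.

```r
# Made-up frequencies for illustration only; these are not real Ngram values.
us <- data.frame(
  Year  = c(1800, 1900, 2000),
  is    = c(2e-7, 6e-7, 9e-7),   # "The United States is"
  has   = c(1e-7, 4e-7, 6e-7),   # "The United States has"
  are   = c(5e-7, 2e-7, 1e-7),   # "The United States are"
  have  = c(3e-7, 1e-7, 5e-8),   # "The United States have"
  total = c(2e-6, 3e-6, 4e-6)    # "The United States"
)

# Share of singular and plural verb agreement, mirroring the two queries.
us$singular_share <- (us$is + us$has) / us$total
us$plural_share   <- (us$are + us$have) / us$total

print(us[, c("Year", "singular_share", "plural_share")])
```

Dividing by the frequency of the bare phrase normalizes each series, so the plot shows the relative preference for singular or plural agreement and not raw usage.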