ngramr - R package to query the Google Ngram Viewer
This package was significantly updated in July 2020 to reflect changes to the
Google Ngram Viewer webpage format. Please let me know if anything is not
working as expected. Note that the tag parameter has been removed. If
this feature is essential for you, please use v1.6.5.
The Google Books Ngram Viewer allows you to enter a list of phrases and then displays a graph showing how often the phrases have occurred over time in a large corpus of books (e.g., "British English", "English Fiction", "French"). The current corpus, produced in 2019, contains almost two trillion words for English alone.
The underlying data is hidden in the web page, embedded in some JavaScript. This package extracts the data and provides it in the form of an R data frame. Early versions of the code were adapted from a handy Python script available from Culturomics, written by Jean-Baptiste Michel. The code has been comprehensively redeveloped since then.
Installing
This package requires R version 3.5.0 or higher. If you are using an older version of R you will be prompted to upgrade when you try to install the package, so you may as well upgrade now!
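If you are unsure which version of R you are running, a quick check along the following lines may help. This is only a minimal sketch: the variable names `required` and `r_ok` are illustrative and not part of ngramr.

```r
# Minimum R version stated above.
required <- "3.5.0"

# getRversion() returns the running R version as a comparable object.
r_ok <- getRversion() >= required

if (r_ok) {
  message("R ", getRversion(), " is recent enough for ngramr.")
} else {
  message("R ", getRversion(), " is older than ", required, "; upgrade first.")
}
```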
The official release of ngramr is available on CRAN. To install from CRAN, use the following command:
install.packages('ngramr')
If you have any problems installing the package on macOS, try installing from source:
install.packages("ngramr", type="source")
If you have the devtools package installed, install the latest stable
version of this package directly from GitHub:
library(devtools)
install_github("seancarmody/ngramr")
library(ngramr)
and if you are feeling a little more adventurous, you can install the development version:
install_github("seancarmody/ngramr", ref = "develop")
although it may not always work.
If the latest release has broken some of your old code, you can install an older version, for example:
install_github("seancarmody/ngramr", ref = "v1.6.5")
If you are behind a proxy, install_github may not work for you. Instead of
fiddling around with the RCurl proxy settings, you can download the latest
ZIP archive and use install_local instead.
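As a rough sketch of the install_local route, assuming you have already saved the ZIP archive somewhere on disk (for example through your browser): the helper name `install_ngramr_from_zip` and the file name in the usage comment are hypothetical and not part of ngramr or devtools.

```r
# Hypothetical helper (not part of ngramr or devtools): install the package
# from a ZIP archive that has already been downloaded to disk.
install_ngramr_from_zip <- function(zip_path) {
  # Fail early with a clear message if the archive is not where we expect.
  if (!file.exists(zip_path)) {
    stop("ZIP archive not found: ", zip_path)
  }
  # install_local() installs a package from a local directory or archive.
  devtools::install_local(zip_path)
}

# Usage (the file name is a placeholder for wherever you saved the archive):
# install_ngramr_from_zip("ngramr-master.zip")
```

Checking for the file first means a wrong path produces a clear error rather than a less obvious installation failure.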
Examples
Here is an example of how to use the ngram function:
library(ggplot2)
ng <- ngram(c("hacker", "programmer"), year_start = 1950)
ggplot(ng, aes(x = Year, y = Frequency, colour = Phrase)) +
  geom_line()
The result is a ggplot2 line graph.
The same result can be achieved even more simply by using the ggram
plotting wrapper that supports many options, as in this example:
ggram(c("monarchy", "democracy"), year_start = 1500, year_end = 2000,
      corpus = "eng_gb_2012", ignore_case = TRUE,
      geom = "area", geom_options = list(position = "stack")) +
  labs(y = NULL)
The colours used by Google Ngram are available through the google_theme
option, as in this example posted by Ben Zimmer at Language Log:
ng <- c("((The United States is + The United States has) / The United States)",
        "((The United States are + The United States have) / The United States)")
ggram(ng, year_start = 1800, google_theme = TRUE) +
  theme(legend.direction = "vertical")
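Because ngram returns an ordinary data frame, the result can also be summarised with base R rather than plotted. The sketch below uses a small stand-in data frame so that it runs without a network connection. The numbers are made up for illustration, and only the Year, Phrase and Frequency columns from the first example are assumed.

```r
# Toy stand-in for the data frame returned by ngram(); the values are made up
# purely for illustration and are NOT real Google Ngram frequencies.
ng <- data.frame(
  Year = rep(c(1950, 1975, 2000), times = 2),
  Phrase = rep(c("hacker", "programmer"), each = 3),
  Frequency = c(1e-08, 2e-07, 9e-07, 5e-08, 2e-06, 4e-06),
  stringsAsFactors = FALSE
)

# Split the rows by phrase and keep the row with the highest frequency in each.
peaks <- do.call(
  rbind,
  lapply(split(ng, ng$Phrase), function(d) d[which.max(d$Frequency), ])
)

print(peaks)
```

With real data, replacing the toy data frame with a call such as the ngram example above should give the peak year for each phrase.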
Getting help
If you encounter a bug, please file an issue with a reproducible example on GitHub.
Further Reading
For more information, read this Stubborn Mule post and the Google Ngram syntax documentation. Language Log has a good post written just after the launch of the 2012 corpus.
If you would rather work with R and SQL on the raw Google Ngram datasets, see this post.


