Scholar Scraper

I wrote this simple utility to scrape citation statistics from researcher profiles on Google Scholar, using it as an opportunity to learn Node.js. I began with a list of information retrieval researchers and have since added a separate list of researchers in human-computer interaction. The results are here.

Editorial note: This list contains only researchers who have a Google Scholar profile; names were identified by snowball sampling and various other ad hoc techniques. If you wish to see a name added, please email me or send a pull request. I will try to rerun the crawl periodically to gather updated statistics. Of course, citation counts measure scholarly achievement only partially and are known to be flawed in many ways. Evaluations of scholars should include a comprehensive examination of their research contributions.

Rerunning the Scraper

Assuming you have Node.js installed, rerun the scraper as follows:

$ npm install request cheerio async
$ node scrape.js ./people-ir.json > stats-ir.js
$ node scrape.js ./people-db.json > stats-db.js
$ node scrape.js ./people-nlp.json > stats-nlp.js
$ node scrape.js ./people-hci.json > stats-hci.js
$ node scrape.js ./people-stratosphere.json > stats-stratosphere.js
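
The commands above all follow one naming pattern (people-<topic>.json in, stats-<topic>.js out), so they can be driven from a small Node.js script. The following is a minimal sketch, not part of the repository: `TOPICS`, `filesFor`, and `scrapeAll` are hypothetical names, and it assumes the script is run from the repository root with scrape.js printing its results to standard output as in the commands above.

```javascript
// Topic suffixes taken from the commands above.
const TOPICS = ['ir', 'db', 'nlp', 'hci', 'stratosphere'];

// Map a topic to its input list and its output stats file.
function filesFor(topic) {
  return { input: `./people-${topic}.json`, output: `stats-${topic}.js` };
}

// Hypothetical helper (not in the repository): run scrape.js once per topic
// and write its stdout to the matching stats file, like `> stats-<topic>.js`.
function scrapeAll(topics) {
  const { execFileSync } = require('child_process');
  const fs = require('fs');
  for (const topic of topics) {
    const { input, output } = filesFor(topic);
    // Raise maxBuffer in case the scraper prints more than the default limit.
    const stdout = execFileSync('node', ['scrape.js', input], {
      maxBuffer: 64 * 1024 * 1024,
    });
    fs.writeFileSync(output, stdout);
    console.error(`wrote ${output}`);
  }
}

// Usage, from the repository root: scrapeAll(TOPICS);
```

Unlike a shell redirect, which truncates the output file before the scraper starts, this sketch writes each stats file only after its scrape finishes successfully, so a failed run leaves the previous statistics in place.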

To scrape the images:

$ node download-images.js ./stats-ir.js
$ node download-images.js ./stats-db.js
$ node download-images.js ./stats-nlp.js
$ node download-images.js ./stats-hci.js
$ node download-images.js ./stats-stratosphere.js
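
Instead of listing each stats file by hand, the image step could pick up whatever stats files are present. This is a rough sketch under assumptions: `selectStatsFiles` and `downloadAllImages` are hypothetical helpers, and the sketch assumes every file named stats-<topic>.js in the directory is valid input for download-images.js, which may include files the commands above do not mention.

```javascript
// Keep only names that follow the stats-<topic>.js pattern used above.
function selectStatsFiles(names) {
  return names.filter((name) => /^stats-[a-z0-9-]+\.js$/.test(name)).sort();
}

// Hypothetical helper (not in the repository): run download-images.js
// on every stats file found in `dir`, one file at a time.
function downloadAllImages(dir) {
  const { execFileSync } = require('child_process');
  const fs = require('fs');
  for (const name of selectStatsFiles(fs.readdirSync(dir))) {
    // stdio: 'inherit' shows the downloader's own progress and errors.
    execFileSync('node', ['download-images.js', `./${name}`], {
      cwd: dir,
      stdio: 'inherit',
    });
  }
}

// Usage, from the repository root: downloadAllImages('.');
```

Running the downloads sequentially keeps the behavior close to the manual commands; a failure in one file stops the loop, so the remaining files would need to be rerun.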

Then open index.html, which should display the new statistics.


lintool/bigcows
