lmmx/scholaRdaemon

'Bot' R code to parse Google Scholar Alerts emails and send individual papers to Twitter
🔎 🐥 📃 Google Scholar Alerts Twitter bot 📃 🐥 🔎

Google Scholar lacks an API, but unlike PubMed it links directly to papers. The stream of a PubMed-sourced bot is often filled with papers deposited without direct links. Occasionally these will have a DOI, but Medline's indexing of DOIs is inconsistent (the XML for the articles themselves can be pretty inconsistent too, as I found out on a previous excursion under PubMed's bonnet).

Even when a paper is deposited with this identifier, the DOI minting process means the link isn't guaranteed to work straight away. I have felt (and regularly see other scientists online expressing the same) frustration at having a basic line of scientific enquiry rudely interrupted by technical issues. Preprints are another consideration.


Preprints are undeniably coming into the fold of bioscience research, a practice originating in the physics/mathematical sciences that crept in through common ground at arXiv's q-bio section. There are various dedicated sites/accounts monitoring particular subfields (e.g. Haldane's sieve/@haldanessieve for population/evolutionary genetics).

Google Scholar indexes all fields, and in my own experience this leads to casual interdisciplinary reading in a way not possible from PubMed's purely biomedical library, a facet of research which the BBSRC, MRC and the Society of Biology feel is lacking amongst bioscientists.

Creating a feed of interest through Google Scholar

  • Google Scholar Alerts can provide up to 20 results in an e-mail, and posting/archiving these somewhere other than a busy inbox makes new research more accessible
  • Gmail for instance has various APIs and libraries, including an official Python 2.6-2.7 package and gmailr for R
  • Twitter likewise has python-twitter and twitteR

This script checks for Google Scholar Alerts in a Gmail account, parses the messages for paper titles and links, and sends the list of new articles through to Twitter (a sketch of this flow follows the list below)

  • this could perhaps be automated with a cron job like Lynn Root used for her IfMeetThenTweet IFTTT alternative
  • it could also perhaps be hosted on a free micro instance of Amazon Web Services EC2 (though I've not tried this yet)
  • sending the papers to Buffer doesn't make much sense, since the alerts seem to arrive at most once a day, though other queries may vary
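
To make that concrete, below is a minimal sketch of the check-parse-tweet flow, written against the pre-1.0 gmailr API (gmail_auth, as referenced under Installation) alongside twitteR and xml2. The sender address, the gse_alrt_title anchor class, and the credential strings are placeholders/assumptions rather than this repository's actual code; run_daemon is the real implementation.

library(gmailr)   # pre-1.0 API (gmail_auth), as used in this repo
library(twitteR)
library(xml2)

# authorise both services: the JSON client secret and the four Twitter
# credentials come from the app setup described under Installation below
gmail_auth("read_only")
setup_twitter_oauth("consumer_key", "consumer_secret",
                    "access_token", "access_secret")

# find unread Scholar Alert emails (sender address is an assumption)
alerts <- messages(search = "from:scholaralerts-noreply@google.com is:unread")

for (msg_id in id(alerts)) {
  msg <- message(msg_id, format = "full")
  # flatten the MIME parts and parse as HTML
  doc <- read_html(paste(unlist(body(msg)), collapse = "\n"))
  # Scholar alerts wrap each paper title in an anchor of this class
  # (the class name is an assumption about the alert markup)
  anchors <- xml_find_all(doc, "//a[@class='gse_alrt_title']")
  titles  <- xml_text(anchors)
  links   <- xml_attr(anchors, "href")
  for (i in seq_along(titles)) {
    # trim the title so that title + shortened link fit in 140 characters
    updateStatus(paste(strtrim(titles[i], 110), links[i]))
  }
}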

Installation and usage

For a walkthrough on installation see the Wiki homepage. Briefly:

  • Install gmailr and twitteR, then set up apps on the Google Developer console and likewise on Twitter's
  • Authorise gmailr (gmail_auth) with the JSON obtained by setting up an app
  • Run Rscript run_daemon with --help to show available flags and bots.
    • Bots can be passed as arguments to run_daemon, indicating which of the available account configurations to use; the default behaviour is to check and tweet for all of them sequentially if none is specified.
    • These arguments are specified in config/bot_registry.json, where they are stored alongside the corresponding sub-directories from which authentication information is retrieved. See the Wiki for more info.
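
For example (the bot name here is a placeholder; real names are whatever config/bot_registry.json registers):

Rscript run_daemon --help    # list available flags and registered bots
Rscript run_daemon mybot     # run only the named bot
Rscript run_daemon           # no arguments: check and tweet for every bot in turn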

Automation

Dave Tang seems to have beaten me to the idea of using R for a paper bot by just a couple of weeks: he has a working example of a cron script, timed for PubMed's release, as he worked with eUtils (i.e. PubMed, like all the other existing bots in Casey Bergman's list, with the exception of eQTLpapers, which has Scholar Alerts added manually by Sarah Brown).

crontab -l
#minute hour dom month dow user cmd
0 15-23 * * * cd /Users/davetang/Dropbox/transcriptomes && ./feed.R &> /dev/null

Cron automation makes sense for daily MEDLINE (PubMed) updates, but not for emails. IFTTT-like 'triggering' would be ideal, and can be achieved with custom 'events' through Amazon Lambda (free tier), reacting to changes in AWS S3 file storage, which may be modified with dat pull --live.

For now I'm using cron (hourly entry added with crontab -e) to:

  • source my .bashrc which
    • exports the location of the scholaRdaemon directory to an eponymous variable
    • sets an alias runsdaemon as Rscript "$scholaRdaemon/run_daemon"
  • record the date/time in the sd.log file there
  • run the daemon with no arguments (the default behaviour: all bots listed in config/bot_registry.json)
0 * * * * source /home/louis/.bashrc; date >> "$scholaRdaemon"sd.log; runsdaemon >> "$scholaRdaemon"sd.log
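
For reference, the .bashrc lines this relies on look something like the following sketch (the directory name is illustrative; the trailing slash matters, since the cron entry appends sd.log directly):

# in ~/.bashrc
export scholaRdaemon="/home/louis/scholaRdaemon/"
alias runsdaemon='Rscript "$scholaRdaemon"run_daemon'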
