
Text Snowball

petermr edited this page Aug 18, 2021 · 7 revisions


Goal: create a list of terms (words and phrases) that are effective for searching the literature on a chosen topic.

Why not arXiv?

Method

(We

  • query pygetpapers with a relevant Boolean query (AND, OR, NOT), possibly fairly general, built from TERMS, e.g. (cyclic voltammetry) NOT generator
  • download a small corpus (at most 100 papers)
  • rapidly inspect these papers for frequent, relevant terms, using:
      ◦ RAKE / YAKE (adjust the phrase length)
      ◦ human eyeballs
      ◦ spaCy
  • triage the list
  • create or append to the list of terms
  • repeat until we have enough terms, or give up
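The inspect and triage steps can be sketched in Python, which is the language pygetpapers itself is written in. This is not RAKE or YAKE. It is a plain document-frequency count of 1- to 3-word phrases that neither start nor end with a stopword, and it is only a crude stand-in to show the idea. The stopword list, the helper names candidate_phrases and rank_terms, and the toy corpus are all hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; RAKE/YAKE/spaCy ship much larger ones).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "the", "this", "that", "to", "was", "we",
    "were", "with",
}


def candidate_phrases(text, max_words=3):
    """Yield 1..max_words word n-grams that neither start nor end with a stopword."""
    words = re.findall(r"[a-z][a-z0-9-]*", text.lower())
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def rank_terms(corpus, known_terms, max_words=3, top=10):
    """Rank phrases by how many papers contain them, skipping terms already known."""
    counts = Counter()
    for text in corpus:
        # use a set so each phrase counts at most once per paper
        counts.update(set(candidate_phrases(text, max_words)))
    known = {t.lower() for t in known_terms}
    ranked = [(term, n) for term, n in counts.most_common() if term not in known]
    return ranked[:top]


# Toy corpus standing in for the ~100 downloaded papers (hypothetical text).
corpus = [
    "Cyclic voltammetry of the lithium ion cathode showed redox peaks.",
    "We measured redox peaks by cyclic voltammetry in a lithium ion cell.",
    "Impedance spectroscopy complements cyclic voltammetry.",
]
terms = ["cyclic voltammetry"]

for term, n in rank_terms(corpus, terms):
    print(n, term)

# After human triage, append the keepers and repeat with a new query.
terms += ["lithium ion", "redox peaks"]
```

Here max_words plays the role of the phrase-length setting in RAKE/YAKE. The ranked list still needs human eyeballs before anything is appended. Unigrams such as "cyclic" score high but add nothing on their own.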

Now we have a list of terms.

  • create a dictionary from the terms
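The following is a minimal sketch of turning the term list into a dictionary file. It assumes an ami-style XML layout: a <dictionary title=...> root containing one <entry term=... name=...> per term. The schema this project actually uses may differ, and terms_to_dictionary is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def terms_to_dictionary(title, terms):
    """Build a minimal XML dictionary with one <entry> per unique term.

    The <dictionary>/<entry term=... name=...> layout is an assumption
    modelled on ami-style dictionaries, not a checked schema.
    """
    root = ET.Element("dictionary", title=title)
    seen = set()
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not key or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        ET.SubElement(root, "entry", term=term, name=term)
    return ET.tostring(root, encoding="unicode")


xml_text = terms_to_dictionary(
    "voltammetry", ["cyclic voltammetry", "lithium ion", "Lithium ion", "redox peaks"]
)
print(xml_text)
```

Deduplicating case-insensitively keeps the first spelling seen. Whether case variants should instead become separate entries or synonyms depends on how the dictionary is used downstream.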

Demo

Set up on Google Colab.

Choose a topic: voltammetry.

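Use EPMC (Europe PMC), the default repository for pygetpapers. The -n flag only reports the number of hits; nothing is downloaded.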

pygetpapers -q "cyclic voltammetry" -n
INFO: Final query is cyclic voltammetry
INFO: Total number of hits for the query are 30078

Refine the query by ANDing a second term:

pygetpapers -q "(cyclic voltammetry) AND (lithium ion)" -n
INFO: Final query is (cyclic voltammetry) AND (lithium ion)
INFO: Total number of hits for the query are 2675
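Queries like the one above can be assembled from term lists instead of being typed by hand. What follows is a minimal sketch. build_query is a hypothetical helper, and "voltammogram" is just an illustrative OR term. Appending NOT after the AND clauses follows the "(cyclic voltammetry) NOT generator" example in the Method section, not any documented EPMC precedence rule.

```python
def _bracket(term):
    """Wrap a term or sub-query in parentheses, as in the queries on this page."""
    return f"({term})"


def build_query(any_of, all_of=(), none_of=()):
    """Combine term lists into one Boolean query string.

    any_of:  alternative terms, joined with OR (at least one required)
    all_of:  terms that must also match, each joined with AND
    none_of: terms to exclude, each appended with NOT
    """
    if not any_of:
        raise ValueError("any_of needs at least one term")
    if len(any_of) == 1:
        parts = [_bracket(any_of[0])]
    else:
        parts = [_bracket(" OR ".join(_bracket(t) for t in any_of))]
    parts += [_bracket(t) for t in all_of]
    query = " AND ".join(parts)
    for t in none_of:
        query += f" NOT {_bracket(t)}"
    return query


print(build_query(["cyclic voltammetry"], all_of=["lithium ion"]))
# → (cyclic voltammetry) AND (lithium ion)
print(build_query(["cyclic voltammetry", "voltammogram"], ["lithium ion"], ["generator"]))
```

Pass the returned string to pygetpapers with -q. New terms found while snowballing can be added to any_of (OR) or all_of (AND), and the hit count can then be re-checked with -n.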

Now snowball:

  • download 100 papers and examine them for additional precision terms to add to the query with OR
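To examine the downloaded papers programmatically, you first have to collect their full text. The sketch below assumes the papers were fetched as XML with something like pygetpapers -q "(cyclic voltammetry) AND (lithium ion)" -k 100 -x -o cv_corpus. It also assumes pygetpapers then creates one folder per paper, each containing fulltext.xml. Check that layout against your own download. load_corpus and the directory name cv_corpus are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def load_corpus(outdir):
    """Map each paper folder name to the plain text of its fulltext.xml.

    Assumes the layout <outdir>/<paper id>/fulltext.xml; adjust the glob
    pattern if your download is organised differently.
    """
    corpus = {}
    for xml_path in sorted(Path(outdir).glob("*/fulltext.xml")):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            continue  # skip files ElementTree cannot parse
        # JATS puts the main text in <body>; fall back to the whole file
        body = root.find(".//body")
        node = body if body is not None else root
        # join all text nodes and collapse runs of whitespace
        corpus[xml_path.parent.name] = " ".join(" ".join(node.itertext()).split())
    return corpus
```

Calling load_corpus("cv_corpus") returns a dict of plain texts. You can feed these to RAKE, YAKE, spaCy or a simple frequency count to look for new OR terms. Restricting the text to <body> keeps titles and references from dominating the counts.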
