Skip to content

j23414/functional_text_mining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 

Repository files navigation

Functional Text Mining

Apply functional programming to text mining. The bash scripts are included to demonstrate the PMC (PubMed Central) API.

File Structure:

bin/
  |_ pmc_ids.sh      # given a search term, return article IDs (searchterm_str) -> pmc_ids_int
  |_ pmc_xml.sh      # given IDs, return the fulltext in XML (pmc_ids_int) -> fulltext_xml
  |_ pmc_pdf.sh      # given IDs, return pdf articles (pmc_ids_int) -> fulltext_pdf
  
data/                
  |_ neo4j.ids       # example article IDs
  |_ neo4j.xml       # .. article fulltext
  |_ PMC6239324.pdf  # .. pdf
  

Example Behavior:

bash pmc_ids.sh neo4j > neo4j.ids
bash pmc_xml.sh neo4j.ids > neo4j.xml
bash pmc_pdf.sh neo4j.ids     # pdfs save in local directory

Possible exploration routes

  • reimplement using a functional language
  • xml to tsv converter (title, authors, publication date, journal)
  • compare programming languages used in academic papers (search different languages), compare their use-cases.

Feel free to explore.

About

Apply functional programming to text mining

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published