add option to search resulttype=core in epmc_search? #7

Open
cstubben opened this issue Jul 15, 2016 · 4 comments

@cstubben cstubben commented Jul 15, 2016

The core results have all the lite fields plus MeSH terms, abstracts and others. Have you considered parsing resulttype=core so users can get MeSH terms for hundreds of articles at once? I have started an XML parser to get some core fields that might help.

Member

@njahn82 njahn82 commented Jul 15, 2016

The europepmc::epmc_details() function parses the resulttype=core format. For example, to get MeSH terms for more than one record, try:

lapply(c("25730202", "25891958"), function(x) europepmc::epmc_details(x)$mesh_topic)
## [[1]]
##   majorTopic_YN                        descriptorName
## 1             N             Chlamydomonas reinhardtii
## 2             N                   Amino Acid Sequence
## 3             N                             Phenotype
## 4             Y                              Mutation
## 5             N       Polymorphism, Single Nucleotide
## 6             N                                Genome
## 7             N                                 Light
## 8             N High-Throughput Nucleotide Sequencing
## 
## [[2]]
##    majorTopic_YN                    descriptorName
## 1              N                          Vacuoles
## 2              N      Plants, Genetically Modified
## 3              N                       Arabidopsis
## 4              N                           Petunia
## 5              N                             Seeds
## 6              N                 Proanthocyanidins
## 7              N      Proton-Translocating ATPases
## 8              N              Arabidopsis Proteins
## 9              N      Genetic Complementation Test
## 10             N Gene Expression Regulation, Plant
## 11             N              Biological Transport
## 12             N                          Mutation
## 13             N         Adenosine Triphosphatases

The function uses the JSON output because I think it is easier to parse. In addition to MeSH terms, it returns:

  • basic fields
  • abstract
  • author_details including affiliation and ORCID
  • journal_info
  • ftxt (full-text information)
  • chemical
  • comments
  • grant info

Please let me know if I have missed something. I could try to support "raw" outputs in the upcoming version, so everyone could apply alternative parsers.

I don't want to include the core format in epmc_search() because it is deeply nested and therefore hard to parse. It would also require more memory.
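
The per-record tables returned by the lapply() call above can be stacked into one table that keeps track of which record each term came from. The following is a minimal sketch, not part of the package: bind_mesh() is a hypothetical helper, and the input is mock data trimmed from the output shown above so the example runs without network access.

```r
# Mock stand-ins for epmc_details(id)$mesh_topic, trimmed from the output above.
mesh_by_id <- list(
  "25730202" = data.frame(
    majorTopic_YN = c("N", "Y"),
    descriptorName = c("Phenotype", "Mutation"),
    stringsAsFactors = FALSE
  ),
  "25891958" = data.frame(
    majorTopic_YN = c("N", "N"),
    descriptorName = c("Vacuoles", "Mutation"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: add the record id as a column, then stack all tables.
bind_mesh <- function(mesh_list) {
  tagged <- lapply(names(mesh_list), function(id) {
    data.frame(id = id, mesh_list[[id]], stringsAsFactors = FALSE)
  })
  do.call(rbind, tagged)
}

mesh_all <- bind_mesh(mesh_by_id)
```

With real data, the named list would come from the lapply() call above, with names() set to the record ids. This still issues one request per record, so it does not address the speed concern for hundreds of articles.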

Author

@cstubben cstubben commented Jul 16, 2016

That will work for a few articles, but I'd like MeSH terms for hundreds of articles, and downloading them one at a time will take too long. I'd like the option to get raw output so users can create their own parsers.

Maybe replace the id_list option with a resulttype option that accepts lite (default), idlist and core, and add a new format option (parsed by default, except perhaps for core, but DC, JSON or XML could also be returned):

epmc_search("title:Waddlia")  # return data.frame
epmc_search("title:Waddlia", resulttype="core", format="xml")
@njahn82 njahn82 added this to the v.0.2 milestone Jul 16, 2016
Member

@njahn82 njahn82 commented Jul 16, 2016

Great. Will try to implement it for the upcoming version.

njahn82 added a commit that referenced this issue Dec 7, 2016
Member

@njahn82 njahn82 commented Jan 10, 2017

There is now an option that returns the core format in list form:

my_list <- epmc_search('Gabi-Kat', output = 'raw', limit = 10)
# display the structure for one list element
str(my_list[[10]])
#> List of 40
#>  $ id                   : chr "27018849"
#>  $ source               : chr "MED"
#>  $ pmid                 : chr "27018849"
#>  $ pmcid                : chr "PMC4883958"
#>  $ doi                  : chr "10.1080/15592324.2016.1161876"
#>  $ title                : chr "Interaction between vitamin B6 metabolism, nitrogen metabolism and autoimmunity."
#>  $ authorString         : chr "Colinas M, Fitzpatrick TB."
#>  $ authorList           :List of 1
#>   ..$ author:List of 2
#>   .. ..$ :List of 6
#>   .. .. ..$ fullName   : chr "Colinas M"
#>   .. .. ..$ firstName  : chr "Maite"
#>   .. .. ..$ lastName   : chr "Colinas"
#>   .. .. ..$ initials   : chr "M"
#>   .. .. ..$ authorId   :List of 2
#>   .. .. .. ..$ type : chr "ORCID"
#>   .. .. .. ..$ value: chr "0000-0001-7053-2983"
#>   .. .. ..$ affiliation: chr "a Department of Botany and Plant Biology , University of Geneva , Geneva , Switzerland."
#>   .. ..$ :List of 5
#>   .. .. ..$ fullName   : chr "Fitzpatrick TB"
#>   .. .. ..$ firstName  : chr "Teresa B"
#>   .. .. ..$ lastName   : chr "Fitzpatrick"
#>   .. .. ..$ initials   : chr "TB"
#>   .. .. ..$ affiliation: chr "a Department of Botany and Plant Biology , University of Geneva , Geneva , Switzerland."
#>  $ authorIdList         :List of 1
#>   ..$ authorId:List of 1
#>   .. ..$ :List of 2
#>   .. .. ..$ type : chr "ORCID"
#>   .. .. ..$ value: chr "0000-0001-7053-2983"
#>  $ journalInfo          :List of 8
#>   ..$ issue               : chr "4"
#>   ..$ volume              : chr "11"
#>   ..$ journalIssueId      : int 2439536
#>   ..$ dateOfPublication   : chr "2016 "
#>   ..$ monthOfPublication  : int 0
#>   ..$ yearOfPublication   : int 2016
#>   ..$ printPublicationDate: chr "2016-01-01"
#>   ..$ journal             :List of 6
#>   .. ..$ title              : chr "Plant signaling & behavior"
#>   .. ..$ medlineAbbreviation: chr "Plant Signal Behav"
#>   .. ..$ isoabbreviation    : chr "Plant Signal Behav"
#>   .. ..$ issn               : chr "1559-2316"
#>   .. ..$ nlmid              : chr "101291431"
#>   .. ..$ essn               : chr "1559-2324"
#>  $ pubYear              : chr "2016"
#>  $ pageInfo             : chr "e1161876"
#>  $ abstractText         : chr "The essential micronutrient vitamin B6 is best known in its enzymatic cofactor form, pyridoxal 5'-phosphate (PLP). However, vit"| __truncated__
#>  $ affiliation          : chr "a Department of Botany and Plant Biology , University of Geneva , Geneva , Switzerland."
#>  $ language             : chr "eng"
#>  $ pubModel             : chr "Print"
#>  $ pubTypeList          :List of 1
#>   ..$ pubType: chr [1:2] "Journal Article" "Research Support, Non-U.S. Gov't"
#>  $ meshHeadingList      :List of 1
#>   ..$ meshHeading:List of 9
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Arabidopsis"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 2
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "GE"
#>   .. .. .. .. .. ..$ qualifierName: chr "genetics"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "N"
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "IM"
#>   .. .. .. .. .. ..$ qualifierName: chr "immunology"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "N"
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Nitrogen"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 1
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "ME"
#>   .. .. .. .. .. ..$ qualifierName: chr "metabolism"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "Y"
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Vitamin B 6"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 1
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "ME"
#>   .. .. .. .. .. ..$ qualifierName: chr "metabolism"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "Y"
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Arabidopsis Proteins"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 1
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "ME"
#>   .. .. .. .. .. ..$ qualifierName: chr "metabolism"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "N"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Temperature"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "Y"
#>   .. .. ..$ descriptorName: chr "Autoimmunity"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Gene Expression Regulation, Plant"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Reproduction"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Phenotype"
#>  $ keywordList          :List of 1
#>   ..$ keyword: chr [1:8] "Arabidopsis thaliana" "Autoimmunity" "plant defense" "Vitamin B6" ...
#>  $ chemicalList         :List of 1
#>   ..$ chemical:List of 3
#>   .. ..$ :List of 2
#>   .. .. ..$ name          : chr "Arabidopsis Proteins"
#>   .. .. ..$ registryNumber: chr "0"
#>   .. ..$ :List of 2
#>   .. .. ..$ name          : chr "Vitamin B 6"
#>   .. .. ..$ registryNumber: chr "8059-24-3"
#>   .. ..$ :List of 2
#>   .. .. ..$ name          : chr "Nitrogen"
#>   .. .. ..$ registryNumber: chr "N762921K75"
#>  $ subsetList           :List of 1
#>   ..$ subset:List of 1
#>   .. ..$ :List of 2
#>   .. .. ..$ code: chr "IM"
#>   .. .. ..$ name: chr "Index Medicus"
#>  $ fullTextUrlList      :List of 1
#>   ..$ fullTextUrl:List of 3
#>   .. ..$ :List of 5
#>   .. .. ..$ availability    : chr "Free"
#>   .. .. ..$ availabilityCode: chr "F"
#>   .. .. ..$ documentStyle   : chr "pdf"
#>   .. .. ..$ site            : chr "Europe_PMC"
#>   .. .. ..$ url             : chr "http://europepmc.org/articles/PMC4883958?pdf=render"
#>   .. ..$ :List of 5
#>   .. .. ..$ availability    : chr "Free"
#>   .. .. ..$ availabilityCode: chr "F"
#>   .. .. ..$ documentStyle   : chr "html"
#>   .. .. ..$ site            : chr "Europe_PMC"
#>   .. .. ..$ url             : chr "http://europepmc.org/articles/PMC4883958"
#>   .. ..$ :List of 5
#>   .. .. ..$ availability    : chr "Subscription required"
#>   .. .. ..$ availabilityCode: chr "S"
#>   .. .. ..$ documentStyle   : chr "doi"
#>   .. .. ..$ site            : chr "DOI"
#>   .. .. ..$ url             : chr "http://dx.doi.org/10.1080/15592324.2016.1161876"
#>  $ isOpenAccess         : chr "N"
#>  $ inEPMC               : chr "Y"
#>  $ inPMC                : chr "N"
#>  $ hasPDF               : chr "Y"
#>  $ hasBook              : chr "N"
#>  $ hasSuppl             : chr "N"
#>  $ citedByCount         : int 0
#>  $ hasReferences        : chr "Y"
#>  $ hasTextMinedTerms    : chr "Y"
#>  $ hasDbCrossReferences : chr "N"
#>  $ hasLabsLinks         : chr "N"
#>  $ epmcAuthMan          : chr "N"
#>  $ hasTMAccessionNumbers: chr "N"
#>  $ dateOfCompletion     : chr "2016-12-30"
#>  $ dateOfCreation       : chr "2016-05-11"
#>  $ dateOfRevision       : chr "2016-12-31"
#>  $ firstPublicationDate : chr "2016-03-28"
#>  $ embargoDate          : chr "2016-09-28"
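
Given the nested structure above, the MeSH descriptors can be pulled out of each raw record with a small parser. This is only a sketch under stated assumptions: mesh_descriptors() is a hypothetical helper, not a package function, and the record is a hand-built mock trimmed from the str() output above rather than a live response. Qualifiers are ignored for brevity.

```r
# Mock record: a trimmed copy of the str() output above (not a live API response).
record <- list(
  pmid = "27018849",
  meshHeadingList = list(
    meshHeading = list(
      list(majorTopic_YN = "N", descriptorName = "Arabidopsis",
           meshQualifierList = list(meshQualifier = list(
             list(abbreviation = "GE", qualifierName = "genetics", majorTopic_YN = "N"),
             list(abbreviation = "IM", qualifierName = "immunology", majorTopic_YN = "N")))),
      list(majorTopic_YN = "N", descriptorName = "Temperature"),
      list(majorTopic_YN = "Y", descriptorName = "Autoimmunity")
    )
  )
)

# Hypothetical helper: one row per MeSH descriptor; qualifiers are not extracted.
mesh_descriptors <- function(rec) {
  headings <- rec$meshHeadingList$meshHeading
  if (is.null(headings)) return(NULL)  # record carries no MeSH terms
  data.frame(
    pmid = rec$pmid,
    descriptorName = vapply(headings, function(h) h$descriptorName, character(1)),
    majorTopic_YN = vapply(headings, function(h) h$majorTopic_YN, character(1)),
    stringsAsFactors = FALSE
  )
}

# Works on any list of records shaped like this one, e.g. several search hits.
mesh_tbl <- do.call(rbind, lapply(list(record), mesh_descriptors))
```

Replacing list(record) with my_list from the epmc_search() call above should give one table for all hits, assuming every element has the same shape as the one displayed.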