📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoAnalysisInstrument (inactive) #15

Open
petermr opened this issue Sep 4, 2019 · 5 comments

Comments

@petermr
Owner

petermr commented Sep 4, 2019

Created a simple dictionary by hand (it can be extended later):

<dictionary title="instrument">
<desc>Hacked from a few papers PMR 20190904</desc>
<entry term="HP6890" name="HP6890"/>
<entry term="QP-5000" name="QP-5000"/>
<entry term="QP" name="QP"/>
<entry term="QP2010" name="QP2010"/>
<entry term="QP2010S" name="QP2010S"/>
<entry term="Shimadzu" name="Shimadzu"/>
<entry term="Clevenger" name="Clevenger"/>
</dictionary>

NOTE: term is the string used for searching (possibly with stemming); name is a descriptive label.

NOTE: these entries are probably not in Wikidata. Also, Clevenger is not an instrument and should be removed.

The title attribute on the dictionary element must match the filename (here, title="instrument" for instrument.xml).
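
The title/filename rule above can be checked mechanically. The following is a minimal sketch, not part of ami itself: it parses the dictionary shown above with Python's standard library, lists the search terms, and compares the title attribute with the filename stem. The helper names (load_terms, title_matches_filename) are hypothetical, and the path mydictionaries/instrument.xml is assumed from the search command used later in this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# The hand-made dictionary from this issue, inlined so the sketch is self-contained.
DICTIONARY_XML = """<dictionary title="instrument">
<desc>Hacked from a few papers PMR 20190904</desc>
<entry term="HP6890" name="HP6890"/>
<entry term="QP-5000" name="QP-5000"/>
<entry term="QP" name="QP"/>
<entry term="QP2010" name="QP2010"/>
<entry term="QP2010S" name="QP2010S"/>
<entry term="Shimadzu" name="Shimadzu"/>
<entry term="Clevenger" name="Clevenger"/>
</dictionary>"""


def load_terms(xml_text):
    """Return (title, list of term attributes) for a dictionary document."""
    root = ET.fromstring(xml_text)
    title = root.get("title")
    terms = [entry.get("term") for entry in root.iter("entry")]
    return title, terms


def title_matches_filename(title, path):
    """True if the dictionary title equals the filename without its extension."""
    return Path(path).stem == title


title, terms = load_terms(DICTIONARY_XML)
print(title, terms)
# Assumed location of the file, taken from the ami-search example in this thread.
print(title_matches_filename(title, "mydictionaries/instrument.xml"))
```

To check a file on disk, read it with Path(path).read_text() and pass the text to load_terms.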

@petermr
Owner Author

petermr commented Sep 4, 2019

Searching with dictionaries

cd CEVOpen

Verify that oil186 is a subdirectory:

ls oil186

Then search:

ami-search -p oil186 --dictionary species country mydictionaries/instrument.xml

Here species is a builtin search, country is a builtin dictionary, and mydictionaries/instrument.xml is a path relative to the current directory.

Results for each document are in PMC*/results/search/instrument/results.xml etc.,
and are aggregated in

/some/where/.../CEVOpen/oil186/search.instrument.snippets.xml

as

<projectSnippetsTree>
 <snippetsTree>
  <snippets file="oil186/PMC4391421/results/search/instrument/results.xml">
   <result pre="Ph. Eur. 5.0 [ 3 ], by using a modified" exact="Clevenger" post="apparatus (with the EO collection area cooled to prevent"/>
   <result pre="chromatography-mass spectrometry. Samples were analyzed by gas chromatography using a" exact="HP6890" post="instrument coupled with a HP 5973 mass spectrometer. The"/>
  </snippets>
 </snippetsTree>
 <snippetsTree>
  <snippets file="oil186/PMC5080681/results/search/instrument/results.xml">
   <result pre="500 ml deionized water. Then, the flask was connected with" exact="Clevenger" post="apparatus, which was placed in the same apparatus. While"/>
   <result pre="the fresh weight. GC-MS analysis GC-MS chromatograms were recorded using" exact="Shimadzu" post="QP-5000 GC-MS. The GC was equipped with Rtx-5 ms"/>
   <result pre="fresh weight. GC-MS analysis GC-MS chromatograms were recorded using Shimadzu" exact="QP-5000" post="GC-MS. The GC was equipped with Rtx-5 ms column"/>
  </snippets>
 </snippetsTree>

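Each CTree (one PMC document) is searched and its hits are collected in a snippetsTree element. Each result element follows the W3C Annotation style of quoting a match with its context: pre (text before), exact (the matched term), and post (text after).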
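
As an illustration of how the aggregated snippets file could be turned into per-document counts, here is a hedged sketch using only the Python standard library. It is not an ami feature. The sample XML is abbreviated from the excerpt above (shortened pre/post text, with a closing root tag added so it parses), and the function name count_terms and the rule for extracting the PMC ID from the file attribute are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abbreviated copy of the excerpt above; pre/post are shortened for readability.
SNIPPETS_XML = """<projectSnippetsTree>
 <snippetsTree>
  <snippets file="oil186/PMC4391421/results/search/instrument/results.xml">
   <result pre="by using a modified" exact="Clevenger" post="apparatus"/>
   <result pre="gas chromatography using a" exact="HP6890" post="instrument coupled"/>
  </snippets>
 </snippetsTree>
 <snippetsTree>
  <snippets file="oil186/PMC5080681/results/search/instrument/results.xml">
   <result pre="the flask was connected with" exact="Clevenger" post="apparatus"/>
   <result pre="were recorded using" exact="Shimadzu" post="QP-5000 GC-MS."/>
   <result pre="recorded using Shimadzu" exact="QP-5000" post="GC-MS."/>
  </snippets>
 </snippetsTree>
</projectSnippetsTree>"""


def count_terms(xml_text):
    """Count hits per (PMC ID, matched term) in a snippets document."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for snippets in root.iter("snippets"):
        # Assumption: the PMC ID is the path component that starts with "PMC".
        parts = snippets.get("file", "").split("/")
        pmcid = next((p for p in parts if p.startswith("PMC")), "unknown")
        for result in snippets.iter("result"):
            counts[(pmcid, result.get("exact"))] += 1
    return counts


counts = count_terms(SNIPPETS_XML)
# Print one tab-separated row per (document, term) pair.
for (pmcid, term), n in sorted(counts.items()):
    print(f"{pmcid}\t{term}\t{n}")
```

For a real run, replace SNIPPETS_XML with the contents of search.instrument.snippets.xml from the project directory.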

@petermr
Owner Author

petermr commented Sep 5, 2019

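A simple grep that finds mentions of mass spectrometry:

grep -r -E -o ".{0,50}mass spectromet.{0,50}" PMC*/scholarly.html

This searches all the HTML for "mass spectromet" (matching both "spectrometry" and "spectrometer") and prints up to 50 characters of context on either side. Note the dot before the final {0,50}: without it, the quantifier applies to the letter "t" rather than to any character, so no trailing context is captured.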
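
The same keyword-in-context search can be sketched in Python, which may be convenient when the hits need further processing. This is an illustrative equivalent, assuming the intent is up to 50 characters of context on each side within a line. The function names find_contexts and search_files are hypothetical, and the sample sentence is taken from the snippets shown earlier in this thread.

```python
import re
from pathlib import Path

# Up to 50 characters of context on each side; "." does not cross line breaks,
# which mirrors grep's line-by-line matching.
PATTERN = re.compile(r".{0,50}mass spectromet.{0,50}")


def find_contexts(text):
    """Return every match of the phrase together with its surrounding context."""
    return [m.group(0) for m in PATTERN.finditer(text)]


def search_files(root="."):
    """Yield (path, context) for each hit in PMC*/scholarly.html under root."""
    for path in sorted(Path(root).glob("PMC*/scholarly.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for context in find_contexts(text):
            yield path, context


# Sample text from one of the snippets above, used instead of real files.
sample = (
    "Samples were analyzed by gas chromatography using a HP6890 instrument "
    "coupled with a HP 5973 mass spectrometer. The"
)
hits = find_contexts(sample)
for hit in hits:
    print(hit)
```

Calling search_files("oil186") from the CEVOpen directory would scan the project used in the ami-search example.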

@EmanuelFaria EmanuelFaria changed the title create and test instrument dictionary 📕 DICTIONARY: INSTRUMENT (create and test) Sep 6, 2019
@lubianat
Collaborator

lubianat commented Sep 12, 2019

Hello,

I am working on how to migrate the article/instrument matches to Wikidata.

The XML with the excerpts is fantastic, but my XML processing skills are still incipient. I remember seeing, during the sprint, a summary table with the PMC IDs in one column and counts for each term in another column.

Would you know how I can obtain this summary file?

EDIT: Even though I'm still not able to generate the full HTML table, I could draft some code to migrate to Wikidata from the full table. The code is at https://github.com/caffiendFrog/elife2019/tree/master/wikidatamigration

One of the pages edited: https://www.wikidata.org/wiki/Q44476657

@petermr
Owner Author

petermr commented Sep 12, 2019 via email

@petermr
Owner Author

petermr commented Sep 13, 2019 via email

@EmanuelFaria EmanuelFaria changed the title 📕 DICTIONARY: INSTRUMENT (create and test) 📕 DICTIONARY: eoAnalysisInstrument (inactive) Mar 25, 2020
@EmanuelFaria EmanuelFaria changed the title 📕 DICTIONARY: eoAnalysisInstrument (inactive) 📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoAnalysisInstrument (inactive) Mar 25, 2020