Applications
Atsuko Yamaguchi edited this page Feb 26, 2015 · 17 revisions
      
- Reinforcing and supplementing KNApSAcK data by adding references that may mention chemical compounds and species.
  - Members: Atsuko Yamaguchi, Toshiaki Tokimatsu
  - KNApSAcK: a database of metabolites and organisms (mainly plants and microbes).
    http://kanaya.naist.jp/knapsack_jsp/top.html
  - Problem: The KNApSAcK database contains 50,899 metabolites and 109,820 species-metabolite pairs, so each metabolite is linked to only about two species on average. We would like to know as many source organisms as possible for each metabolite (for mass production, etc.).
  - Solution: We will try to add information from annotated abstracts.
  - Annotation:
    - PubTator (http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/)
      - Sample data in pubannotation.org (http://pubannotation.org/projects/pubtator-sample)
    - tmChem (through PubTator)
  - Outline of our method:
    1. Extract papers that contain both chemical names and organism names, using the annotations.
    2. Compute the correspondence between KNApSAcK chemical IDs and the MeSH/ChEBI IDs used in PubTator.
    3. Manually check these papers (or with automatic support?).
  - TODO:
    1. Estimate the cover ratio of chemical compounds appearing in annotations to those included in KNApSAcK (a goal of this hackathon).
       - 1-1. How to link from KNApSAcK ID to MeSH/ChEBI?
         - Plan:
           - Convert KNApSAcK ID -> KEGG compound ID -> PubChem or ChEBI.
           - Convert MeSH -> PubChem.
           - Compare the two IDs.
         - Problem: Because of stereoisomers, the ID conversion is not a one-to-one mapping.
    2. Consider how to narrow down candidate papers (future work).
  - What we did:
    - We analyzed the inclusion relation between the papers manually selected to construct KNApSAcK and the papers whose annotated abstracts include both chemical names and organism names.
      1. The number of papers including chemical names: 9547412
      2. The number of papers including organism names: 12599725
      3. The intersection of 1 and 2: 6318259
      4. Of 1000 KNApSAcK reference papers, the number that have a PubMed ID: 158
      5. The intersection of 3 and 4: 47 (about 30% of the 158, close to 1/3; not so good, but not so bad either)
    - TODO:
      - Read the abstracts of the remaining 111 papers (158 - 47) to find out why the chemical names / organisms in them are not annotated by PubTator.
        - The chemical names / organisms might not be written in the abstract, or there may be another reason.
      - Analyze the inclusion relation between the organism-chemical pairs included in KNApSAcK and those obtained from annotated abstracts.
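The paper counting in "What we did" and the ID-linking plan in the TODO can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: all function names are hypothetical, the PMID lists and ID mappings are assumed to be already loaded into memory, and each mapping is assumed to be a dict from one ID to a set of IDs. Keeping a set on the right-hand side is one possible way to cope with the stereoisomer problem, since a single ID may convert to several.

```python
from collections import defaultdict


def papers_with_both(chemical_pmids, organism_pmids):
    """PMIDs whose abstracts carry both a chemical and an organism annotation."""
    return set(chemical_pmids) & set(organism_pmids)


def recall_against_references(candidate_pmids, reference_pmids):
    """Return (hits, missed, ratio) for curated reference PMIDs among candidates."""
    reference = set(reference_pmids)
    hits = reference & set(candidate_pmids)
    missed = reference - hits
    ratio = len(hits) / len(reference) if reference else 0.0
    return hits, missed, ratio


def compose(first, second):
    """Chain two one-to-many ID mappings (dict of ID -> set of IDs).

    A source ID whose intermediate IDs have no target ends up with an empty set.
    """
    chained = defaultdict(set)
    for source, middles in first.items():
        for middle in middles:
            chained[source] |= second.get(middle, set())
    return dict(chained)


def linked_pairs(knapsack_to_pubchem, mesh_to_pubchem):
    """Pairs (KNApSAcK ID, MeSH ID) that share at least one PubChem ID."""
    # Invert the MeSH mapping so that lookups go from PubChem ID to MeSH IDs.
    pubchem_to_mesh = defaultdict(set)
    for mesh_id, pubchem_ids in mesh_to_pubchem.items():
        for pubchem_id in pubchem_ids:
            pubchem_to_mesh[pubchem_id].add(mesh_id)
    pairs = set()
    for knapsack_id, pubchem_ids in knapsack_to_pubchem.items():
        for pubchem_id in pubchem_ids:
            for mesh_id in pubchem_to_mesh.get(pubchem_id, ()):
                pairs.add((knapsack_id, mesh_id))
    return pairs
```

With the figures reported above, the ratio returned by `recall_against_references` corresponds to 47 / 158, about 0.30, and the `missed` set corresponds to the 111 papers whose abstracts still need to be read.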
 
 
- Reinforcing and supplementing PRIDE data by adding references that may mention proteins.
  - Members: Shin Kawano
  - PRIDE: The PRIDE PRoteomics IDEntifications database is a centralized, standards compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence.
    http://www.ebi.ac.uk/pride/archive/
  - Problem: PRIDE provides only lists of detected proteins and peptides; the proteins have no annotations.
  - Solution: We will try to add cellular localization information from annotated abstracts.
  - Annotation:
    - LocText (https://www.tagtog.net/-corpora/loctext)
      - Sample data in pubannotation.org (http://pubannotation.dbcls.jp/projects/LocText)
  - Outline of our method:
    - Make a ProteinID-PMID correspondence table from the LocText annotations.
    - Retrieve the protein and peptide lists from the PRIDE API (http://wwwdev.ebi.ac.uk/pride/ws/archive/).
    - Show summary and detail pages for each protein, including the abstracts that provide evidence and the detected peptides.
  - TODO: *
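The first and third steps of the PRIDE outline can be illustrated with a small Python sketch. It rests on assumptions that are not part of the original plan: the LocText annotations are assumed to have been flattened beforehand into `(pmid, protein_id, localization)` tuples, the list of detected proteins is assumed to have been retrieved from PRIDE already, and both function names are hypothetical. No real LocText or PRIDE API call is shown.

```python
from collections import defaultdict


def build_protein_table(annotations):
    """Build protein ID -> {PMID -> set of localization terms}.

    `annotations` is an iterable of (pmid, protein_id, localization) tuples,
    an assumed simplification of the LocText annotation data.
    """
    table = defaultdict(lambda: defaultdict(set))
    for pmid, protein_id, localization in annotations:
        table[protein_id][pmid].add(localization)
    # Convert nested defaultdicts to plain dicts so missing keys raise errors.
    return {protein: dict(by_pmid) for protein, by_pmid in table.items()}


def annotate_detected(detected_proteins, table):
    """Attach literature evidence to each detected protein.

    Proteins without any annotated abstract get an empty dict.
    """
    return {protein: table.get(protein, {}) for protein in detected_proteins}
```

Each entry of the result gives, for one detected protein, the abstracts (by PMID) that mention it together with the localization terms found there, which is the information a protein detail page would need.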