Add a new data source: generic TSV data source #2027

Closed
julie-sullivan opened this issue Mar 6, 2019 · 4 comments

Comments

@julie-sullivan
Member

julie-sullivan commented Mar 6, 2019

  • accepts TSV or CSV
  • column mappings come from the project XML

Here is a file snippet

Gene    Gene name       Sample  Value   Unit
ENSG00000000003 TSPAN6  adipose tissue  31.5    TPM
ENSG00000000003 TSPAN6  adrenal gland   26.4    TPM
ENSG00000000003 TSPAN6  appendix        9.2     TPM

This assumes the data model is correct.

@julie-sullivan
Member Author

How do we get the Gene <--> RNASeqResult relationship?

@julie-sullivan
Member Author

julie-sullivan commented Mar 6, 2019

We could go through the model and see which relationships have a type of RNASeqResult? No, we shouldn't guess.

These should also be in the project XML somehow. How?

<property name="rel3" value="Gene.RNASeqResults"/>
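One way to read such relationship properties back out of the project XML is sketched below. This is only an illustration of the idea above, not an existing InterMine API: the `relN` naming convention, the `generic-tsv` source type and the `read_relationships` helper are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical <source> entry; the name and type are made up for illustration.
SOURCE_XML = """
<source name="rnaseq" type="generic-tsv">
  <property name="rel3" value="Gene.RNASeqResults"/>
</source>
"""


def read_relationships(xml_text):
    """Collect properties named rel* as {name: (class name, field name)}."""
    root = ET.fromstring(xml_text)
    relationships = {}
    for prop in root.iter("property"):
        name = prop.get("name", "")
        if name.startswith("rel"):
            # "Gene.RNASeqResults" -> ("Gene", "RNASeqResults")
            class_name, field = prop.get("value").split(".", 1)
            relationships[name] = (class_name, field)
    return relationships
```

Stating the relationship explicitly like this avoids guessing it from the model, at the cost of one more property per relationship.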

@sammyjava
Member

Nice. I have scads and scads of special-purpose TSV sources. Although most of those have the data model constraints written into them as well, so it's hard to envision a generic source that would handle them. But nice to have one in any case.

danielabutano pushed a commit to danielabutano/intermine that referenced this issue Feb 3, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Feb 4, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Feb 7, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Feb 9, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Feb 15, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Feb 16, 2022
@danielabutano
Member

danielabutano commented Feb 16, 2022

This implementation is pretty simple:

  1. it reads generic-separated-values files (TSV or CSV)
  2. it loads more than one class (in the example below Gene, Organism and Protein)
  3. it can correctly create one-to-many and many-to-many relations between the loaded entities
  4. it doesn't require the relations between classes to be set explicitly. Given the classes configured in the properties file, it deduces the relations from the model.
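Point 4 can be pictured roughly as follows. This is a minimal sketch, not the actual implementation: the `MODEL` dictionary is a made-up subset standing in for the real data model, and `deduce_relations` is a hypothetical helper.

```python
# Toy model: class -> {field name: (target class, is_collection)}.
# Hypothetical subset for illustration; not the real InterMine core model.
MODEL = {
    "Gene": {"organism": ("Organism", False), "proteins": ("Protein", True)},
    "Protein": {"organism": ("Organism", False), "genes": ("Gene", True)},
    "Organism": {},
}


def deduce_relations(configured_classes, model=MODEL):
    """Return (source, field, target, is_collection) for every reference or
    collection in the model that links two of the configured classes."""
    configured = set(configured_classes)
    relations = []
    for source in sorted(configured):
        fields = model.get(source, {})
        for field, (target, is_collection) in sorted(fields.items()):
            if target in configured:
                relations.append((source, field, target, is_collection))
    return relations


if __name__ == "__main__":
    for relation in deduce_relations(["Gene", "Organism", "Protein"]):
        print(relation)
```

Only fields whose target is also a configured class are kept, so classes that are not loaded from the file never produce a relation.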

Note: Unfortunately, with the current implementation and configuration, it's not possible to load two entities of the same type (e.g. a protein and its isoforms, which are also Protein), because there is no way to distinguish the protein from its isoform.

Configuration example:

<source name="delimited" type="delimited">
  <property name="delimited.dataSourceName" value="TSV Source Name"/>
  <property name="delimited.dataSetTitle" value="TSV Data Set"/>
  <property name="delimited.licence" value="http://usemydatalicence.com"/>
  <property name="delimited.hasHeader" value="true"/>
  <property name="delimited.columns" value="Gene.primaryIdentifier, Organism.taxonId, null,Protein.primaryIdentifier,Protein.primaryAccession"/>
  <property name="delimited.includes" value="test.tsv"/>
  <property name="src.data.dir" location="/home/daniela/Projects/data/malaria/tsv/"/>
</source>
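To make the `delimited.columns` value concrete, here is a rough sketch of how such a mapping could be applied to each row. It is an assumption about the general approach, not the loader's real code; `parse_columns`, `row_to_items` and `load` are hypothetical names, and any sample rows you feed it are your own.

```python
import csv
import io


def parse_columns(spec):
    """Split a delimited.columns value into (class, attribute) pairs.
    A `null` entry becomes None, meaning that column is ignored."""
    mapping = []
    for entry in spec.split(","):
        entry = entry.strip()
        if entry == "null":
            mapping.append(None)
        else:
            class_name, attribute = entry.split(".", 1)
            mapping.append((class_name, attribute))
    return mapping


def row_to_items(row, mapping):
    """Group one row's values by class name: {class: {attribute: value}}."""
    items = {}
    for value, target in zip(row, mapping):
        if target is None:
            continue  # column mapped to `null`
        class_name, attribute = target
        items.setdefault(class_name, {})[attribute] = value
    return items


def load(text, spec, delimiter="\t", has_header=True):
    """Parse delimited text and return one {class: attributes} dict per row."""
    mapping = parse_columns(spec)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header line
    return [row_to_items(row, mapping) for row in reader if row]
```

Because the values are grouped by class name alone, two columns of the same class end up in the same item in this sketch, which illustrates the limitation described in the note above.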

Mandatory properties:

  • delimited.dataSourceName
  • delimited.dataSetTitle
  • delimited.columns

Optional properties:

  • delimited.separator (tab/TAB/comma/COMMA). Default value is: tab
  • delimited.hasHeader (true/false). Default value is: true
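The mandatory/optional split and the defaults listed above could be checked along these lines. This is a sketch only: `resolve_settings` is a hypothetical helper, and raising `ValueError` on bad input is an assumption about the desired behaviour, not something the loader is known to do.

```python
MANDATORY = (
    "delimited.dataSourceName",
    "delimited.dataSetTitle",
    "delimited.columns",
)

# Accepted separator names, as listed above, mapped to the actual character.
SEPARATORS = {"tab": "\t", "TAB": "\t", "comma": ",", "COMMA": ","}


def resolve_settings(properties):
    """Validate a dict of source properties and apply the documented defaults."""
    missing = [name for name in MANDATORY if not properties.get(name)]
    if missing:
        raise ValueError("missing mandatory properties: " + ", ".join(missing))

    separator_name = properties.get("delimited.separator", "tab")  # default: tab
    if separator_name not in SEPARATORS:
        raise ValueError("unsupported separator: " + separator_name)

    has_header = properties.get("delimited.hasHeader", "true")  # default: true
    if has_header not in ("true", "false"):
        raise ValueError("hasHeader must be true or false: " + has_header)

    return {
        "separator": SEPARATORS[separator_name],
        "has_header": has_header == "true",
    }
```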

danielabutano pushed a commit to danielabutano/intermine that referenced this issue Mar 8, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Mar 8, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Mar 9, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Mar 11, 2022
danielabutano pushed a commit to danielabutano/intermine that referenced this issue Mar 11, 2022
@danielabutano danielabutano added this to the InterMine 5.0.7 milestone Apr 26, 2022
@danielabutano danielabutano added this to InterMine 5.0.7 (pombemine & bitfount) in Roadmap & release planning Apr 26, 2022