update licence docs
julie-sullivan committed Apr 11, 2019
1 parent 2456255 commit caae84d
Showing 6 changed files with 60 additions and 25 deletions.
8 changes: 7 additions & 1 deletion docs/data-model/model.rst
@@ -112,6 +112,12 @@ Many to many relationship
e.g. Gene has a collection of Pathways and Pathway has a collection of Genes, fill in either Gene.pathways or Pathway.genes but not both.
If Pathway.genes contains e.g. 20,000 items and Gene.pathways typically 100 items then it is faster to populate Gene.pathways.
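
A minimal sketch of how this relationship might be declared in the model XML. The class and collection names follow the Gene/Pathway example above; the surrounding model element is omitted and the exact class definitions are an assumption.

.. code-block:: xml

    <!-- hypothetical fragment: each end names the other as its reverse reference -->
    <class name="Gene" is-interface="true">
        <collection name="pathways" referenced-type="Pathway" reverse-reference="genes"/>
    </class>
    <class name="Pathway" is-interface="true">
        <collection name="genes" referenced-type="Gene" reverse-reference="pathways"/>
    </class>

Both collections appear in the model; the advice above only concerns which of the two the data loader should populate.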

Ontologies
-----------------------

It's possible to decorate your InterMine data model with ontology terms.

This isn't used anywhere (yet) but will be used in the future when we start generating RDF.
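
As a rough illustration, a class decorated with an ontology term might look like the sketch below. It assumes the model XML accepts a `term` attribute on class and attribute elements; the attribute name and the example.org URIs are assumptions rather than something stated on this page, so check them against your InterMine version.

.. code-block:: xml

    <!-- hypothetical fragment: the term attribute and both URIs are placeholders -->
    <class name="Protein" is-interface="true" term="http://example.org/ontology/Protein">
        <attribute name="name" type="java.lang.String" term="http://example.org/ontology/name"/>
    </class>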

A short example
-----------------------
@@ -121,7 +127,7 @@ A short example
<?xml version="1.0"?>
<model name="testing" package="org.intermine.model.bio">
<class name="Protein>" is-interface="true">
<class name="Protein" is-interface="true">
<attribute name="name" type="java.lang.String"/>
<attribute name="extraData" type="java.lang.String"/>
<collection name="features" referenced-type="NewFeature" reverse-reference="protein"/>
46 changes: 32 additions & 14 deletions docs/database/data-sources/data-licences.rst
@@ -29,6 +29,22 @@ How is this information being used?

These data can be displayed prominently on the report page and in query results. We'll also use the licences in the RDF generation.

Why does it have to be a URL to a standard data licence?
------------------------------------------------------------------------

The contents of `DataSet.licence` should be a URL that points to a standard data licence.

Why can't I put a URL to the fair use policy?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you put a URL to the data source's fair use policy for example, the URL might change. Also, sometimes the fair use policy is vague, contradictory or just hard to understand.

Why can't I put a short snippet about the fair use policy for these data?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you summarise the fair use policy, there is a danger that you get it wrong, or that the policy later changes and your summary goes out of date.

**Providing no information about the data licence is better than having bad information about the data licence.**

How to add a licence to an InterMine?
-------------------------------------
@@ -48,31 +64,32 @@ To update the data licence, add the licence information to the project XML file.
<!-- gff example -->
<source name="my-gff" type="my-gff" version="4.0.0">
<property name="gff3.taxonId" value="7227"/>
<property name="gff3.seqClsName" value="MRNA"/>
<property name="src.data.dir" location="/data/flymine"/>
<property name="gff3.dataSetTitle" value="Long oligo data set"/>
<!-- add licence here -->
<property name="gff3.licence" value="https://creativecommons.org/licenses/by-sa/3.0/" />
...
</source>
Another example:
FASTA

.. code-block:: xml
<!-- FASTA example -->
<source name="my-fasta" type="fasta">
<property name="fasta.taxonId" value="7227"/>
<property name="fasta.dataSetTitle" value="Fasta data set for Drosophila melanogaster"/>
<property name="fasta.dataSourceName" value="MyDataSource"/>
<source name="my-fasta" type="fasta">
<!-- add licence here -->
<property name="fasta.licence" value="https://creativecommons.org/licenses/by/4.0/"/>
<property name="fasta.className" value="org.intermine.model.bio.Gene"/>
<property name="fasta.classAttribute" value="primaryIdentifier"/>
<property name="fasta.includes" value="dmel-all-gene-*.fasta"/>
<property name="src.data.dir" location="/data/fasta"/>
...
</source>
OBO

.. code-block:: xml
<!-- OBO example -->
<source name="so" type="so">
<property name="src.data.file" location="so.obo"/>
<!-- add licence here -->
<property name="licence" value="https://creativecommons.org/licenses/by/4.0/"/>
</source>
**All others**
@@ -107,7 +124,8 @@ This will update the data set licence field for you.
None of my data sources have data licences
------------------------------------------------------


We discovered that only a minority of data sets have a licence: of the 26 core data set types that InterMine supports, only 9 had a data set licence, although 14 had some text about fair use.

Please see our `blog post <https://intermineorg.wordpress.com/2019/01/03/being-fair-data-licences-in-intermine/>`_ for more details.

.. index:: data licences, licence
3 changes: 3 additions & 0 deletions docs/database/data-sources/library/fasta.rst
@@ -27,6 +27,8 @@ project XML example
<property name="flybase-dmel-gene-fasta.classAttribute" value="primaryIdentifier"/>
<property name="flybase-dmel-gene-fasta.includes" value="dmel-all-gene-*.fasta"/>
<property name="src.data.dir" location="/DATA/flybase/fasta"/>
<!-- add licence here -->
<property name="flybase-dmel-gene-fasta.licence" value="https://creativecommons.org/licenses/by/4.0/"/>
</source>
@@ -42,6 +44,7 @@ src.data.dir location of the fasta data file these data
includes        name of data file                                   this data file will be loaded into the database
sequenceType    class name                                          type of sequence to be loaded
loaderClassName name of Java file that will process the fasta files only use if you have created a custom fasta loader
licence         URL pointing to standard data licence for data      updates DataSet.licence with value
=============== =================================================== =========================================================

.. index:: FASTA, sequences
21 changes: 12 additions & 9 deletions docs/database/data-sources/library/gff.rst
@@ -79,24 +79,27 @@ Here is an example GFF3 entry in the project XML file:
# NOTE: update the "type" if you are using your own custom GFF3 parser
<source name="example-gff3" type="gff">
<property name="gff3.taxonId" value="7227"/>
<property name="gff3.taxonId" value="9606"/>
<property name="gff3.seqClsName" value="Chromosome"/>
<property name="src.data.dir" location="/DATA/*.gff3"/>
<property name="gff3.dataSourceName" value="NCBI"/>
<property name="gff3.dataSetTitle" value="Release GRCh38 of the Homo sapiens genome sequence"/>
<!-- add licence here -->
<property name="gff3.licence" value="https://creativecommons.org/licenses/by-sa/3.0/" />
</source>
Here are the descriptions of the properties available:

====================== ============================= ===========================================================================================================
====================== ================================================= ===========================================================================
property example definition
====================== ============================= ===========================================================================================================
gff3.seqClsName Chromosome the ids in the first column represent Chromosome objects, e.g. MAL1
gff3.taxonId 36329 taxon id of malaria
gff3.dataSourceName PlasmoDB the data source for features and their identifiers, this is used for the DataSet (evidence) and synonyms.
gff3.seqDataSourceName PlasmoDB the source of the seqids (chromosomes) is sometimes different to the features described
gff3.dataSetTitle PlasmoDB P. falciparum genome a DataSet object is created as evidence for the features, it is linked to a DataSource (PlasmoDB)
====================== ============================= ===========================================================================================================
====================== ================================================= ===========================================================================
gff3.seqClsName Chromosome the ids in the first column represent Chromosome objects, e.g. MAL1
gff3.taxonId 36329 taxon id
gff3.dataSourceName PlasmoDB the data source for features and their identifiers, this is used for the DataSet (evidence) and synonyms.
gff3.seqDataSourceName PlasmoDB the source of the seqids (chromosomes) is sometimes different to the features described
gff3.dataSetTitle PlasmoDB P. falciparum genome a DataSet object is created as evidence for the features, it is linked to a DataSource (PlasmoDB)
gff3.licence https://creativecommons.org/licenses/by-sa/3.0/ URL to a standard data licence
====================== ================================================= ===========================================================================


Writing a custom GFF parser
6 changes: 5 additions & 1 deletion docs/database/data-sources/library/go/go-obo.rst
@@ -21,7 +21,7 @@ project XML example
.. code-block:: xml
<source name="go" type="go">
<property name="src.data.file" location="/data/go-annotation/go-basic.obo"/>
<property name="src.data.file" location="/data/go-annotation/go-basic.obo"/>
</source>
`go-basic.obo` should load in a few minutes. `go.obo` is much more complex and takes a few hours and lots of memory.
@@ -30,4 +30,8 @@ Optional parameter: <property name="ontologyPrefix" value="FBbt"/>

This parameter causes the data parser to only load ontology terms with that prefix. Some OBO files have cross references that include ontology terms from other ontologies. Unfortunately the file doesn't include which terms correspond to which ontologies so we have to set the prefix.

Optional parameter: <property name="licence" value="https://creativecommons.org/licenses/by/4.0/"/>

This parameter will update the DataSet.licence field with the value you specify.
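
Putting this together with the project XML example above, a GO source entry with the licence set might look like the following sketch. The licence URL is simply the one used in the parameter example; substitute the standard licence that actually applies to your data.

.. code-block:: xml

    <!-- sketch only: same source as above, with the optional licence property added -->
    <source name="go" type="go">
        <property name="src.data.file" location="/data/go-annotation/go-basic.obo"/>
        <property name="licence" value="https://creativecommons.org/licenses/by/4.0/"/>
    </source>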

.. index:: GO, gene ontology, OBO
1 change: 1 addition & 0 deletions docs/database/data-sources/library/so.rst
@@ -22,6 +22,7 @@ project XML example
<source name="so" type="so">
<property name="src.data.file" location="so.obo" />
<property name="licence" value="https://creativecommons.org/licenses/by/4.0/"/>
</source>
To add or remove SO terms from your model, update your `so_terms` file in `dbmodel/resources`