Er fields tutorial (#611)
* Updated ER docs

* Update Docs

* Removed notebook

* ER Docs example corrections

* Example with field value

* Update

* sphinx updates

* remove generated docs

* Ignore generated hdmf api docs

* Removed Notebook doc

* Day 1 Hackathon

* 2nd

* 2nd

* Fix Duplicate resources in add_ref with new unit test for it

* Update unit tests

* Rough Draft with bugs

* Rough Draft with bugs and notebook

* saved buggy doc

* Big Update

* Updated documentation

* Update doc check_object

* Additional test

* schema to master

* update resources

* before dev merge

* Private methods/update resources/docs/tests

* Update docs

* update docs

* Suggestions to Docs

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update src/hdmf/common/resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* Update docs/gallery/External_Resources.py

Co-authored-by: Ryan Ly <rly@lbl.gov>

* flake8

* Update docs

* Er doc field update

* 06/07 update

* ER DOCS

* Multi-level fields finished

* Bug fixes Objectkeys/added new method

* ER new method and test/fixed duplicate bug in ObjectKeys

* Update to tests by adding skips

* Updated ER Docs

* Test on ObjectKey uniqueness

* final details

* Remove notebook

* Quick Fix

* Cosmetic

* Minor text and formatting changes

* Minor formatting changes

* Minor text changes

* Update doc

* Update docs/gallery/External_Resources.py

* Fix flake8

Co-authored-by: Matthew Avaylon <mavaylon@MacBook-Pro.local>
Co-authored-by: Ryan Ly <rly@lbl.gov>
3 people committed Jun 26, 2021
1 parent ae63f69 commit 858f933
Showing 3 changed files with 254 additions and 51 deletions.
196 changes: 173 additions & 23 deletions docs/gallery/External_Resources.py
@@ -16,7 +16,7 @@
# to organize and map user terms (keys) to multiple resources and entities
# from the resources. A typical use case for external resources is to link data
# stored in datasets or attributes to ontologies. For example, you may have a
# dataset `country` storing locations. Using
# dataset ``country`` storing locations. Using
# :py:class:`~hdmf.common.resources.ExternalResources` allows us to link the
# country names stored in the dataset to an ontology of all countries, enabling
# more rigid standardization of the data and facilitating data query and
@@ -31,7 +31,6 @@
# allows us to link the two. To reduce data redundancy and improve data integrity,
# ``ExternalResources`` stores this data internally in a collection of
# interlinked tables.
#
# * :py:class:`~hdmf.common.resources.KeyTable` where each row describes a
# :py:class:`~hdmf.common.resources.Key`
# * :py:class:`~hdmf.common.resources.ResourceTable` where each row describes a
@@ -48,33 +47,107 @@
# convenience functions to simplify interaction with these tables, allowing users
# to treat ``ExternalResources`` as a single large table as much as possible.
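#
# As a rough illustration of how interlinked tables can be presented as one flat
# table, the sketch below uses plain Python lists in place of the hdmf table classes.
# The row layouts and the ``flat_view`` helper are assumptions made for this example
# and are not part of the hdmf API.

```python
# Hypothetical, simplified stand-ins for the interlinked tables (not the hdmf classes).
keys = [{"key": "Homo sapiens"}, {"key": "Mus musculus"}]
resources = [
    {"resource": "NCBI_Taxonomy", "resource_uri": "https://www.ncbi.nlm.nih.gov/taxonomy"},
]
# Each entity row points at a key row and a resource row by index.
entities = [
    {"keys_idx": 0, "resources_idx": 0, "entity_id": "NCBI:txid9606"},
    {"keys_idx": 1, "resources_idx": 0, "entity_id": "NCBI:txid10090"},
]


def flat_view(keys, resources, entities):
    """Follow the row indices to build one flat record per entity."""
    rows = []
    for entity in entities:
        rows.append({
            "key": keys[entity["keys_idx"]]["key"],
            "resource": resources[entity["resources_idx"]]["resource"],
            "entity_id": entity["entity_id"],
        })
    return rows


flat = flat_view(keys, resources, entities)
for row in flat:
    print(row)
```

# Storing the resource once and referring to it by index is what keeps the
# resource name and URI from being repeated for every entity.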

###############################################################################
# Rules for ExternalResources
# ------------------------------------------------------
# When using the :py:class:`~hdmf.common.resources.ExternalResources` class, there
# are rules for how users store information in the interlinked tables.
#
# 1. Multiple :py:class:`~hdmf.common.resources.Key` objects can have the same name.
# They are disambiguated by the :py:class:`~hdmf.common.resources.Object` associated
# with each.
# 2. In order to query specific records, the :py:class:`~hdmf.common.resources.ExternalResources` class
# uses '(object_id, field, Key)' as the unique identifier.
# 3. An :py:class:`~hdmf.common.resources.Object` can have multiple :py:class:`~hdmf.common.resources.Key`
# objects.
# 4. Multiple :py:class:`~hdmf.common.resources.Object` objects can use the same :py:class:`~hdmf.common.resources.Key`.
# Note that the :py:class:`~hdmf.common.resources.Key` may already be associated with resources
# and entities.
# 5. Do not use the private methods to add rows to the :py:class:`~hdmf.common.resources.KeyTable`,
# :py:class:`~hdmf.common.resources.ResourceTable`, :py:class:`~hdmf.common.resources.EntityTable`,
# :py:class:`~hdmf.common.resources.ObjectTable`, :py:class:`~hdmf.common.resources.ObjectKeyTable`
# individually.
# 6. URIs are optional, but highly recommended. If not known, an empty string may be used.
# 7. An entity ID should be the unique string identifying the entity in the given resource.
# This may or may not include a string representing the resource and a colon.
# Use the format provided by the resource. For example, Identifiers.org uses the ID ``ncbigene:22353``
# but the NCBI Gene uses the ID ``22353`` for the same term.
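#
# Rules 1 and 2 can be pictured with a small sketch in which records are stored
# under an (object_id, field, key) tuple. The ``records`` mapping, the
# ``add_reference`` helper, and the object IDs ``object-A`` and ``object-B`` are
# hypothetical names used only for illustration; hdmf keeps this information in
# its tables rather than in a dictionary.

```python
from collections import defaultdict

# Maps the identifying tuple (object_id, field, key) to a list of entity IDs.
records = defaultdict(list)


def add_reference(object_id, field, key, entity_id):
    """Store an entity ID under the tuple that identifies the record."""
    records[(object_id, field, key)].append(entity_id)


# The same key name used by two different objects yields two distinct records.
add_reference("object-A", "genotype_name", "Rorb", "MGI:1346434")
add_reference("object-B", "genotype_name", "Rorb", "ENSG00000198963")

# A query needs the full tuple, because the key name alone is ambiguous.
same_name = [identifier for identifier in records if identifier[2] == "Rorb"]
print(len(same_name))
print(records[("object-A", "genotype_name", "Rorb")])
```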

###############################################################################
# Creating an instance of the ExternalResources class
# ------------------------------------------------------

# sphinx_gallery_thumbnail_path = 'figures/gallery_thumbnail_externalresources.png'
from hdmf.common import ExternalResources
from hdmf.common import DynamicTable
from hdmf import Data
import numpy as np

er = ExternalResources(name='example')

###############################################################################
# Using the add_ref method
# ------------------------------------------------------
# :py:func:`~hdmf.common.resources.ExternalResources.add_ref`
# is a wrapper function provided by the ``ExternalResources`` class, that
# is a wrapper function provided by the ``ExternalResources`` class that
# simplifies adding data. Using ``add_ref`` allows us to treat new entries similar
# to adding a new row to a flat table, with ``add_ref`` taking care of populating
# the underlying data structures accordingly.

data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])
er.add_ref(container=data, field='', key='Homo sapiens', resource_name='NCBI_Taxonomy',
resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy', entity_id='NCBI:txid9606',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606')
er.add_ref(
container=data,
field='',
key='Homo sapiens',
resource_name='NCBI_Taxonomy',
resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
entity_id='NCBI:txid9606',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)

er.add_ref(container=data, field='', key='Mus musculus', resource_name='NCBI_Taxonomy',
resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy', entity_id='NCBI:txid10090',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090')
er.add_ref(
container=data,
field='',
key='Mus musculus',
resource_name='NCBI_Taxonomy',
resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
entity_id='NCBI:txid10090',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)

###############################################################################
# Using the add_ref method with get_resource
# ------------------------------------------------------
# When adding references to resources, you may want to refer to multiple entities
# within the same resource. Resource names are unique, so if you call ``add_ref``
# with the name of an existing resource, then that resource will be reused. You
# can also use the :py:func:`~hdmf.common.resources.ExternalResources.get_resource`
# method to get the ``Resource`` object and pass that in to ``add_ref`` to
# reuse an existing resource.

# Let's create a new instance of ExternalResources.
er = ExternalResources(name='example')

data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])
er.add_ref(
container=data,
field='',
key='Homo sapiens',
resource_name='NCBI_Taxonomy',
resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
entity_id='NCBI:txid9606',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)

# Using get_resource
existing_resource = er.get_resource('NCBI_Taxonomy')
er.add_ref(
container=data,
field='',
key='Mus musculus',
resources_idx=existing_resource,
entity_id='NCBI:txid10090',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)
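
###############################################################################
# The reuse of a resource by name described above follows a get-or-create pattern.
# The sketch below shows that pattern with a plain list; ``get_or_add_resource``
# and the row layout are assumptions for this example, not the hdmf implementation.

```python
# Hypothetical resource table: one dict per row.
resources = []


def get_or_add_resource(name, uri):
    """Return the row index of the resource called `name`, adding it if absent."""
    for idx, row in enumerate(resources):
        if row["resource"] == name:
            return idx  # reuse the existing row
    resources.append({"resource": name, "resource_uri": uri})
    return len(resources) - 1


first = get_or_add_resource("NCBI_Taxonomy", "https://www.ncbi.nlm.nih.gov/taxonomy")
second = get_or_add_resource("NCBI_Taxonomy", "https://www.ncbi.nlm.nih.gov/taxonomy")
print(first == second, len(resources))
```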

###############################################################################
# Using the add_ref method with get_resource
@@ -106,23 +179,30 @@
# In the above example, the ``field`` keyword argument was empty because the data
# of the :py:class:`~hdmf.container.Data` object passed in for the ``container``
# argument was being associated with a resource. However, you may want to associate
# an attribute of a :py:class:`~hdmf.container.Data` object with a resource or
# an attribute of a :py:class:`~hdmf.container.Data` object with a resource, or
# a dataset or attribute of a :py:class:`~hdmf.container.Container` object with
# a resource. To disambiguate between these different fields, you can set the
# 'field' keyword.

genotypes = DynamicTable(name='genotypes', description='My genotypes')
genotypes.add_column(name='genotype_name', description="Name of genotypes")
genotypes.add_row(id=0, genotype_name='Rorb')
er.add_ref(container=genotypes, field='genotype_name', key='Rorb', resource_name='MGI Ontology',
resource_uri='http://www.informatics.jax.org/', entity_id='MGI:1346434',
entity_uri="http://www.informatics.jax.org/probe/key/804614")
er.add_ref(
container=genotypes,
field='genotype_name',
key='Rorb',
resource_name='MGI Database',
resource_uri='http://www.informatics.jax.org/',
entity_id='MGI:1346434',
entity_uri='http://www.informatics.jax.org/marker/MGI:1343464'
)

###############################################################################
# Using the get_keys method
# ------------------------------------------------------
# This method returns a DataFrame of key_name, resource_table_idx, entity_id,
# and entity_uri. You can either have a single key object,
# The :py:func:`~hdmf.common.resources.ExternalResources.get_keys` method
# returns a :py:class:`~pandas.DataFrame` of key_name, resource_table_idx, entity_id,
# and entity_uri. You can either pass a single key object,
# a list of key objects, or leave the input parameters empty to return all.

# All Keys
@@ -137,7 +217,8 @@
###############################################################################
# Using the get_key method
# ------------------------------------------------------
# This method will return a ``Key`` object. In the current version of ``ExternalResources``,
# The :py:func:`~hdmf.common.resources.ExternalResources.get_key`
# method will return a ``Key`` object. In the current version of ``ExternalResources``,
# duplicate keys are allowed; however, each key needs a unique linking Object.
# In other words, each combination of (container, field, key) can exist only once in
# ``ExternalResources``.
@@ -148,12 +229,81 @@
###############################################################################
# Using the add_ref method with a key_object
# ------------------------------------------------------
# Sometimes you want to reference a specific key that already exists when adding
# new ontology data into ``ExternalResources``.
# Multiple :py:class:`~hdmf.common.resources.Object` objects can use the same
# :py:class:`~hdmf.common.resources.Key`. To use an existing key when adding
# new entries into ``ExternalResources``, pass the :py:class:`~hdmf.common.resources.Key`
# object instead of the 'key_name' to the ``add_ref`` method. If a 'key_name' is used,
# a new Key will be created.

er.add_ref(container=genotypes, field='genotype_name', key=key_object, resource_name='Ensembl',
resource_uri='https://uswest.ensembl.org/index.html', entity_id='ENSG00000198963',
entity_uri='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000198963')
er.add_ref(
container=genotypes,
field='genotype_name',
key=key_object,
resource_name='Ensembl',
resource_uri='https://uswest.ensembl.org/index.html',
entity_id='ENSG00000198963',
entity_uri='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000198963'
)

# Let's use get_keys to visualize
# Let's use get_keys to visualize all the keys that have been added up to now
er.get_keys()

###############################################################################
# Using get_object_resources
# ------------------------------------------------------
# This method will return information regarding keys, resources, and entities for
# an ``Object``. You can pass either the ``AbstractContainer`` object or its
# object ID for the ``container`` argument, and the name of the field
# (container attribute) for the ``field`` argument.

er.get_object_resources(container=genotypes, field='genotype_name')

###############################################################################
# Special Case: Using add_ref with multi-level fields
# ------------------------------------------------------
# In most cases, the field is the name of a dataset or attribute,
# but if the dataset or attribute is a compound data type, then associating
# external resources with a particular column/field of the compound data type requires
# a special syntax. For example, if a dataset has a compound data type with
# columns/fields 'x', 'y', and 'z', and each
# column/field is associated with different ontologies, then use the 'field'
# value to differentiate the different columns of the dataset.
# This should be done using '/' as a separator, e.g., field='data/unit/x'.

# Let's create a new instance of ExternalResources.
er = ExternalResources(name='example')

data = Data(
name='data_name',
data=np.array(
[('Mus musculus', 9, 81.0), ('Homo sapiens', 3, 27.0)],
dtype=[('species', 'U14'), ('age', 'i4'), ('weight', 'f4')]
)
)

er.add_ref(
container=data,
field='data/species',
key='Mus musculus',
resource_name='NCBI_Taxonomy',
resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
entity_id='NCBI:txid10090',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)

# Note that because the container is a ``Data`` object, and the external resource is being
# associated with the values of the dataset rather than an attribute of the dataset,
# the field must be prefixed with 'data'. Normally, to associate an external resource
# with the values of the dataset, the field can be left blank. This allows us to
# differentiate between a dataset compound data type field named 'x' and a dataset
# attribute named 'x'.

er.add_ref(
container=data,
field='data/species',
key='Homo sapiens',
resource_name='NCBI_Taxonomy',
resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
entity_id='NCBI:txid9606',
entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)
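
###############################################################################
# To see why the 'data' prefix disambiguates a compound field from an attribute
# of the same name, the sketch below resolves a '/'-separated field against a
# nested dictionary. ``split_field``, ``resolve``, and the dictionary layout are
# hypothetical and only mimic the idea; they are not how hdmf resolves fields.

```python
def split_field(field):
    """Split a '/'-separated field into its non-empty components."""
    return [part for part in field.split("/") if part]


def resolve(container, field):
    """Walk the nested mapping one component at a time."""
    value = container
    for part in split_field(field):
        value = value[part]
    return value


# Hypothetical container: dataset values live under 'data', attributes beside it.
container = {
    "data": {"species": ["Mus musculus", "Homo sapiens"], "age": [9, 3]},
    "species": "an attribute that happens to be named species",
}

print(resolve(container, "data/species"))  # the compound column
print(resolve(container, "species"))       # the attribute of the same name
```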
31 changes: 29 additions & 2 deletions src/hdmf/common/resources.py
@@ -212,7 +212,7 @@ def _add_object(self, **kwargs):

@docval({'name': 'obj', 'type': (int, Object), 'doc': 'the Object that uses the Key'},
{'name': 'key', 'type': (int, Key), 'doc': 'the Key that the Object uses'})
def _add_external_reference(self, **kwargs):
def _add_object_key(self, **kwargs):
"""
Specify that an object (i.e. container and field) uses a key to reference
an external resource
@@ -332,6 +332,7 @@ def add_ref(self, **kwargs):

if not isinstance(key, Key):
key = self._add_key(key)
self._add_object_key(object_field, key)

if kwargs['resources_idx'] is not None and kwargs['resource_name'] is None and kwargs['resource_uri'] is None:
resource_table_idx = kwargs['resources_idx']
@@ -358,10 +359,36 @@ def add_ref(self, **kwargs):

if add_entity:
entity = self._add_entity(key, resource_table_idx, entity_id, entity_uri)
self._add_external_reference(object_field, key)

return key, resource_table_idx, entity

@docval({'name': 'container', 'type': (str, AbstractContainer),
'doc': 'the Container/data object that is linked to resources/entities',
'default': None},
{'name': 'field', 'type': str,
'doc': 'the field of the Container',
'default': None})
def get_object_resources(self, **kwargs):
"""
Get all entities/resources associated with an object
"""
container = kwargs['container']
field = kwargs['field']

keys = []
entities = []
if container is not None and field is not None:
object_field = self._check_object_field(container, field)
# Find all keys associated with the object
for row_idx in self.object_keys.which(objects_idx=object_field.idx):
keys.append(self.object_keys['keys_idx', row_idx])
# Find all the entities/resources for each key.
for key_idx in keys:
entity_idx = self.entities.which(keys_idx=key_idx)
entities.append(self.entities[entity_idx[0]])
df = pd.DataFrame(entities, columns=['keys_idx', 'resource_idx', 'entity_id', 'entity_uri'])
return df
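
# The two-step lookup in get_object_resources (object -> keys -> entities) can be
# sketched with plain lists. The tables and the entities_for_object helper below
# are hypothetical stand-ins, not the hdmf classes. Unlike the method above, which
# takes the first matching entity row per key (entity_idx[0]), this sketch collects
# every matching row.

```python
# Hypothetical link table between object rows and key rows.
object_keys = [
    {"objects_idx": 0, "keys_idx": 0},
    {"objects_idx": 1, "keys_idx": 1},
]
# Hypothetical entity table; key 0 has two entities in different resources.
entities = [
    {"keys_idx": 0, "resources_idx": 0, "entity_id": "MGI:1346434"},
    {"keys_idx": 0, "resources_idx": 1, "entity_id": "ENSG00000198963"},
    {"keys_idx": 1, "resources_idx": 2, "entity_id": "NCBI:txid10090"},
]


def entities_for_object(object_idx):
    """Find the keys used by an object, then every entity row for those keys."""
    key_idxs = {row["keys_idx"] for row in object_keys if row["objects_idx"] == object_idx}
    return [entity for entity in entities if entity["keys_idx"] in key_idxs]


found = entities_for_object(0)
print([entity["entity_id"] for entity in found])
```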

@docval({'name': 'keys', 'type': (list, Key), 'default': None,
'doc': 'the Key(s) to get external resource data for'},
rtype=pd.DataFrame, returns='a DataFrame with keys and external resource data')