
33 annotation tests #103

Merged
merged 42 commits into develop from 33_annotation_tests
Jul 6, 2017

Conversation

ChristianLieven
Contributor

resolves #33

This PR includes a couple of new checks and tests.

First of all, when I refer to 'items', I mean either metabolites or reactions.

  1. I am adding annotation.py, which contains:
    REACTION_ANNOTATIONS: A dictionary mapping MIRIAM-style reaction database namespaces to the corresponding identifier patterns as compiled regex patterns.
    METABOLITE_ANNOTATIONS: A dictionary mapping MIRIAM-style metabolite database namespaces to the corresponding identifier patterns as compiled regex patterns.

I selected the included databases more or less from experience, i.e. based on what I had seen commonly referenced in metabolic models. I am open to input on which databases are or aren't 'must-haves'.

In addition to these two dictionaries, annotation.py contains functions to list items without any annotation, to list items lacking annotations for each specific database, and to list items that do contain annotations for a specific database but whose identifiers do not match the corresponding regex pattern.
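
To make the shape of these structures concrete, here is a minimal sketch; the namespaces, patterns, and the second helper's name are illustrative assumptions, not the PR's actual contents (only find_met_without_annotation is named in the commits below):

```python
import re

# Illustrative subset only; the real dictionaries in annotation.py may
# cover different namespaces and patterns.
METABOLITE_ANNOTATIONS = {
    "kegg.compound": re.compile(r"^C\d+$"),
    "chebi": re.compile(r"^CHEBI:\d+$"),
    "metanetx.chemical": re.compile(r"^MNXM\d+$"),
}


def find_met_without_annotation(model):
    """List metabolites that carry no annotation at all."""
    return [met for met in model.metabolites if len(met.annotation) == 0]


def find_met_missing_namespace(model, namespace):
    """List metabolites lacking an annotation for one specific database.

    Hypothetical name for the second kind of helper described above.
    """
    return [met for met in model.metabolites if namespace not in met.annotation]
```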

  2. test_annotation.py:
    Contains annotation-specific tests for a given model.
    We expect that no item is without annotation, that all items have annotations from all databases, and that the identifiers from all of those databases are correct.

  3. test_for_annotation.py:
    Contains tests for the code in annotation.py itself.
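
As an illustration, the first expectation could be encoded roughly like this (a sketch: the fixture and assertion message are assumptions, only the helper name comes from the commits below):

```python
import memote.support.annotation as annotation


def test_metabolites_presence_of_annotation(model):
    """Expect that no metabolite is entirely without annotation.

    `model` is assumed to be a pytest fixture supplying the cobra model
    under scrutiny.
    """
    missing = annotation.find_met_without_annotation(model)
    assert len(missing) == 0, "Metabolites without any annotation: {}".format(
        ", ".join(met.id for met in missing)
    )
```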

@codecov-io

codecov-io commented May 23, 2017

Codecov Report

Merging #103 into develop will increase coverage by 0.87%.
The diff coverage is 90.74%.


@@            Coverage Diff             @@
##           develop    #103      +/-   ##
==========================================
+ Coverage    84.43%   85.3%   +0.87%     
==========================================
  Files            8       9       +1     
  Lines          334     388      +54     
  Branches        72      83      +11     
==========================================
+ Hits           282     331      +49     
- Misses          47      51       +4     
- Partials         5       6       +1
Impacted Files                   Coverage Δ
memote/support/helpers.py        84.21% <40%> (-6.7%) ⬇️
memote/support/annotation.py     95.91% <95.91%> (ø)


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 752f2b6...e8846b8.

@ChristianLieven
Contributor Author

I'll still be updating this with a couple of relevant features, so maybe wait just a little longer before pulling this in.

@ChristianLieven
Contributor Author

This is good to go now!

The annotation module will contain helper functions for testing
whether the model uses common annotation references within the
MIRIAM annotation scheme. This is to ensure that metabolites and
reactions can be mapped across several namespaces in an automated,
machine-readable fashion.
To stay in line with the common output of all support functions,
I changed the output of both annotation_overview functions in
annotations.py to return lists of metabolites/reactions instead
of returning lists of metabolite/reaction IDs.
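In practice, callers now receive the cobra objects themselves and can still recover plain identifiers with one attribute access; hypothetical usage:

```python
# Returns cobra.Metabolite objects rather than bare ID strings.
missing = find_met_without_annotation(model)
# IDs remain easy to obtain when a plain list is needed.
missing_ids = [met.id for met in missing]
```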
Added tests for both functions that find model components without
annotations: find_met_without_annotation and
find_rxn_without_annotation.
The function find_wrong_annotation_ids takes an overview_dict
as input. From that it obtains a list of reactions or metabolites
with annotations belonging to a given database namespace and then
collects all database IDs that do not match the MIRIAM-specified
pattern. The output is another overview dict, this time listing all
rxns/mets that have annotations of a certain type but with wrong
identifiers.
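Taken literally, that flow could look like the following sketch; the exact structure of the overview dict is inferred from this description, not copied from the PR:

```python
def find_wrong_annotation_ids(overview, patterns):
    """Map each namespace to the items whose ID fails its MIRIAM pattern.

    `overview` is assumed to map a database namespace to the list of
    reactions/metabolites annotated with it; `patterns` maps the same
    namespaces to compiled regex patterns.
    """
    wrong = {}
    for namespace, items in overview.items():
        pattern = patterns[namespace]
        wrong[namespace] = [
            item
            for item in items
            if not pattern.match(str(item.annotation[namespace]))
        ]
    return wrong
```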
This was, obviously, causing problems with the native Python
function 'type'.
ChristianLieven and others added 16 commits July 3, 2017 11:14
Reaction and metabolite IDs are checked for matches against all
database patterns outlined in annotations.py.
The BioCyc ID pattern for both metabolites and reactions matches
other IDs as well. However, this seems to be the case only because
the BioCyc pattern is fairly broad; it doesn't seem to be true for
any other pattern. Hence I implemented a clean-up step in the
dataframes output by collect_{rxn|met}_id_namespace.
The functions collect_{met/rxn}_id_namespace now output
a dataframe that contains a column 'unknown' for those IDs that
did not match any of the specified database patterns.
We now check if all metabolites of a given model belong
to the same database namespace.
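Assuming the dataframe layout from the previous commit (one boolean column per namespace plus 'unknown'), that check might reduce to counting matches per column; a sketch:

```python
def check_single_namespace(df):
    """Return the dominant namespace and per-namespace match counts.

    `df` is assumed to be the boolean dataframe produced by
    collect_met_id_namespace; a model is consistent when one namespace
    column accounts for (nearly) all metabolite IDs.
    """
    counts = df.drop(columns=["unknown"]).sum()
    return counts.idxmax(), counts
```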
I ordered the dictionaries REACTION_ANNOTATIONS and
METABOLITE_ANNOTATIONS so that the patterns will be applied in
a non-overlapping fashion. I also altered the collect functions
to break as soon as a pattern matches an ID. This is to avoid
duplicate matching, since the BiGG and BioCyc patterns are very
generic and are very likely to match all the other identifiers too.
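A sketch of that first-match strategy (the patterns are illustrative, with the BioCyc one deliberately broad to mirror the problem described):

```python
import re
from collections import OrderedDict

import pandas as pd

# Specific patterns first, broad ones (e.g. BioCyc) last; illustrative only.
ORDERED_METABOLITE_PATTERNS = OrderedDict([
    ("metanetx.chemical", re.compile(r"^MNXM\d+$")),
    ("kegg.compound", re.compile(r"^C\d+$")),
    ("biocyc", re.compile(r"^[A-Za-z0-9+_.%-]+$")),  # deliberately broad
])


def collect_met_id_namespace(metabolites):
    """Tag each metabolite ID with the first namespace whose pattern matches.

    Breaking on the first hit avoids double-counting IDs that the broad
    BiGG/BioCyc patterns would also match; unmatched IDs end up in the
    'unknown' column.
    """
    rows = []
    for met in metabolites:
        row = {ns: False for ns in ORDERED_METABOLITE_PATTERNS}
        row["unknown"] = True
        for ns, pattern in ORDERED_METABOLITE_PATTERNS.items():
            if pattern.match(met.id):
                row[ns] = True
                row["unknown"] = False
                break  # first match wins
        rows.append(row)
    return pd.DataFrame(rows, index=[met.id for met in metabolites])
```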
@Midnighter Midnighter merged commit 530805d into develop Jul 6, 2017
@Midnighter Midnighter removed the ready label Jul 6, 2017
@Midnighter Midnighter deleted the 33_annotation_tests branch July 6, 2017 08:20
Development

Successfully merging this pull request may close these issues.

Test if metabolites can be mapped against MetaNetX DB