33 annotation tests #103
Conversation
Codecov Report
```diff
@@            Coverage Diff            @@
##           develop     #103     +/-  ##
==========================================
+ Coverage    84.43%    85.3%    +0.87%
==========================================
  Files            8        9        +1
  Lines          334      388       +54
  Branches        72       83       +11
==========================================
+ Hits           282      331       +49
- Misses          47       51        +4
- Partials         5        6        +1
```
Continue to review full report at Codecov.
I'll still be updating this with a couple of relevant features, so maybe wait just a little longer before pulling this in.
This is good to go now!
The annotation module will contain helper functions for testing whether a model uses commonly used annotation references within the MIRIAM annotation scheme. This is to ensure that metabolites and reactions can be mapped across several namespaces in an automated, machine-readable fashion.
To stay in line with the common output of all support functions, I changed both annotation_overview functions in annotations.py to return lists of metabolites/reactions instead of lists of metabolite/reaction IDs.
Added tests for both functions that find model components without annotations: find_met_without_annotation and find_rxn_without_annotation.
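For illustration, a minimal sketch of what such a check could look like, assuming a cobrapy model whose metabolites carry an `annotation` dict; the actual implementation in annotations.py may differ:

```python
def find_met_without_annotation(model):
    """Return all metabolites that carry no annotation at all."""
    # Returning the metabolite objects themselves (not their IDs) keeps
    # the output in line with the other support functions.
    return [met for met in model.metabolites if not met.annotation]
```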
The function find_wrong_annotation_ids takes an overview_dict as input. From it, the function obtains the list of reactions or metabolites with annotations belonging to a given database namespace, and then collects all database IDs that do not match the MIRIAM-specified pattern. The output is another overview dict, this time listing all reactions/metabolites that have annotations of a certain type but with wrong identifiers.
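A hedged sketch of that logic; the exact signature and the shape of `overview_dict` are assumptions based on the description above:

```python
def find_wrong_annotation_ids(overview_dict, patterns):
    """Map each database namespace to the items whose annotation IDs
    violate the corresponding MIRIAM pattern."""
    wrong = {}
    for db, items in overview_dict.items():
        pattern = patterns[db]
        # Keep only items whose recorded ID fails the regex check.
        wrong[db] = [
            item for item in items
            if not pattern.match(str(item.annotation[db]))
        ]
    return wrong
```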
This was, obviously, causing problems with the native Python built-in 'type'.
Reaction and metabolite IDs are checked for matches against all database patterns outlined in annotations.py.
The BioCyc ID pattern for both metabolites and reactions matches other IDs as well. However, this seems to be the case only because the BioCyc pattern is fairly broad; it doesn't seem to be true for any other pattern. Hence, I implemented a clean-up step in the DataFrames output by collect_{rxn|met}_id_namespace.
The functions collect_{rxn|met}_id_namespace now output a DataFrame that contains a column 'unknown' for those IDs that did not match any of the specified database patterns.
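To illustrate the output shape (the function name follows the description above; the body is only a sketch, not the actual implementation):

```python
import pandas as pd

def collect_met_id_namespace(model, patterns):
    """Tabulate which database pattern each metabolite ID matches."""
    rows = {}
    for met in model.metabolites:
        row = {db: bool(regex.match(met.id))
               for db, regex in patterns.items()}
        # IDs that match none of the database patterns are flagged here.
        row["unknown"] = not any(row.values())
        rows[met.id] = row
    return pd.DataFrame.from_dict(rows, orient="index")
```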
We now check if all metabolites of a given model belong to the same database namespace.
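Building on the sketch above, that consistency check could boil down to asking whether a single pattern column is true for every metabolite (again only a sketch of the idea):

```python
def matches_single_namespace(namespace_df):
    """True if one database column matches every metabolite ID."""
    matched = namespace_df.drop(columns=["unknown"])
    # A shared namespace means some column is True across all rows.
    return bool(matched.all(axis=0).any())
```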
I ordered the dictionaries REACTION_ANNOTATIONS and METABOLITE_ANNOTATIONS so that the patterns are applied in a non-overlapping fashion. I also altered the collect functions to break as soon as a pattern matches an ID. This is to avoid duplicate matching, since the BiGG and BioCyc patterns are very generic and are very likely to match all the other identifiers too.
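The first-match step could look like this, where `patterns` stands for one of the ordered annotation dictionaries:

```python
def first_matching_namespace(identifier, patterns):
    """Return the first namespace whose pattern matches the ID."""
    for db, regex in patterns.items():
        if regex.match(identifier):
            # Breaking on the first hit prevents the very generic BiGG
            # and BioCyc patterns from also claiming IDs that a more
            # specific pattern has already matched.
            return db
    return "unknown"
```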
resolves #33
This PR includes a couple of new checks and tests.
First of all, when I refer to 'items' I mean either metabolites or reactions.
`annotation.py`, which contains:

- `REACTION_ANNOTATIONS`: A dictionary mapping MIRIAM-style reaction database namespaces to the corresponding identifier patterns as compiled regexes.
- `METABOLITE_ANNOTATIONS`: A dictionary mapping MIRIAM-style metabolite database namespaces to the corresponding identifier patterns as compiled regexes.
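For a sense of the structure, here is an illustrative excerpt only; the real dictionaries cover more databases and the exact patterns may differ:

```python
import re

REACTION_ANNOTATIONS = {
    # Specific patterns first; very generic ones (e.g. BiGG, BioCyc)
    # come last, per the note on ordering above.
    "kegg.reaction": re.compile(r"^R\d+$"),
    "rhea": re.compile(r"^\d{5}$"),
}

METABOLITE_ANNOTATIONS = {
    "kegg.compound": re.compile(r"^C\d+$"),
    "chebi": re.compile(r"^CHEBI:\d+$"),
}
```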
I selected the included databases more or less from experience, i.e., those I had seen commonly referenced in metabolic models. I am open to input on which are or aren't 'must-have' databases.
In addition to these two dictionaries, `annotation.py` contains functions to:

- list items without any annotation,
- list items without annotations for each specific database,
- list items that do contain annotations for a specific database but whose identifiers do not match the corresponding regex pattern.

`test_annotation.py`: Contains annotation-specific tests for a given model.
We expect that no item is without annotation, that all items have annotations from all databases, and that the identifiers for all of those databases are correct.
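As a hedged sketch of one such expectation (the `model` fixture and the import of the helper are assumptions; the real tests may be structured differently):

```python
from annotation import find_met_without_annotation  # assumed import path

def test_mets_without_annotation(model):
    """Expect every metabolite to carry at least one annotation."""
    missing = find_met_without_annotation(model)
    assert len(missing) == 0, \
        "{} metabolites lack any annotation".format(len(missing))
```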
`test_for_annotation.py`: Contains tests for the code in `annotation.py`.
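A unit test for the helpers themselves could be sketched with a tiny toy model built via cobrapy (assumed to be available as a dependency; the import path is again an assumption):

```python
import cobra
from annotation import find_met_without_annotation  # assumed import path

def test_find_met_without_annotation():
    model = cobra.Model("toy")
    annotated = cobra.Metabolite("atp_c")
    annotated.annotation = {"kegg.compound": "C00002"}
    bare = cobra.Metabolite("h2o_c")
    model.add_metabolites([annotated, bare])
    # Only the metabolite without any annotation should be reported.
    assert find_met_without_annotation(model) == [bare]
```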