33 annotation tests #103
Conversation
Codecov Report
```diff
@@            Coverage Diff            @@
##           develop     #103     +/-  ##
==========================================
+ Coverage    84.43%    85.3%    +0.87%
==========================================
  Files            8        9        +1
  Lines          334      388       +54
  Branches        72       83       +11
==========================================
+ Hits           282      331       +49
- Misses          47       51        +4
- Partials         5        6        +1
```
Continue to review full report at Codecov.
I'll still be updating this with a couple of relevant features, so maybe wait just a little longer before pulling this in.
This is good to go now!
The annotation module will contain helper functions for testing whether a model uses commonly used annotation references within the MIRIAM annotation scheme. This is to ensure that metabolites and reactions can be mapped across several namespaces in an automated, machine-readable fashion.
To stay in line with the common output of all support functions, I changed both annotation_overview functions in annotations.py to return lists of metabolites/reactions instead of lists of metabolite/reaction IDs.
Added tests for both functions that find model components without annotations: find_met_without_annotation and find_rxn_without_annotation.
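For illustration, a minimal sketch of what such a check could look like, assuming a cobrapy model whose metabolites carry an `annotation` dict; the actual implementation in annotations.py may differ:

```python
def find_met_without_annotation(model):
    """Return all metabolites that carry no annotation at all."""
    # Returning the metabolite objects themselves (not their IDs) keeps
    # the output in line with the other support functions.
    return [met for met in model.metabolites if not met.annotation]
```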
The function find_wrong_annotation_ids takes an overview_dict as input. From it, the function obtains the list of reactions or metabolites with annotations belonging to a given database namespace, and then collects all database IDs that do not match the MIRIAM-specified pattern. The output is another overview dict, this time listing all reactions/metabolites that have annotations of a certain type but with wrong identifiers.
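A hedged sketch of that logic; the exact signature and the shape of `overview_dict` are assumptions based on the description above:

```python
def find_wrong_annotation_ids(overview_dict, patterns):
    """Map each database namespace to the items whose annotation IDs
    violate the corresponding MIRIAM pattern."""
    wrong = {}
    for db, items in overview_dict.items():
        pattern = patterns[db]
        # Keep only items whose recorded ID fails the regex check.
        wrong[db] = [
            item for item in items
            if not pattern.match(str(item.annotation[db]))
        ]
    return wrong
```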
This was, obviously, causing problems with the native Python built-in 'type'.
Reaction and metabolite IDs are checked for matches against all database patterns outlined in annotations.py.
The BioCyc ID pattern for both metabolites and reactions matches other IDs as well. However, this seems to be the case only because the BioCyc pattern is fairly broad; it doesn't seem to be true for any other pattern. Hence, I implemented a clean-up step in the DataFrames output by collect_{rxn|met}_id_namespace.
The functions collect_{rxn|met}_id_namespace now output a DataFrame that contains a column 'unknown' for those IDs that did not match any of the specified database patterns.
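To illustrate the output shape (the function name follows the description above; the body is only a sketch, not the actual implementation):

```python
import pandas as pd

def collect_met_id_namespace(model, patterns):
    """Tabulate which database pattern each metabolite ID matches."""
    rows = {}
    for met in model.metabolites:
        row = {db: bool(regex.match(met.id))
               for db, regex in patterns.items()}
        # IDs that match none of the database patterns are flagged here.
        row["unknown"] = not any(row.values())
        rows[met.id] = row
    return pd.DataFrame.from_dict(rows, orient="index")
```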
We now check if all metabolites of a given model belong to the same database namespace.
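Building on the sketch above, that consistency check could boil down to asking whether a single pattern column is true for every metabolite (again only a sketch of the idea):

```python
def matches_single_namespace(namespace_df):
    """True if one database column matches every metabolite ID."""
    matched = namespace_df.drop(columns=["unknown"])
    # A shared namespace means some column is True across all rows.
    return bool(matched.all(axis=0).any())
```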
I ordered the dictionaries REACTION_ANNOTATIONS and METABOLITE_ANNOTATIONS so that the patterns are applied in a non-overlapping fashion. I also altered the collect functions to break as soon as a pattern matches an ID. This is to avoid duplicate matching, since the BiGG and BioCyc patterns are very generic and are very likely to match all the other identifiers too.
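The first-match step could look like this, where `patterns` stands for one of the ordered annotation dictionaries:

```python
def first_matching_namespace(identifier, patterns):
    """Return the first namespace whose pattern matches the ID."""
    for db, regex in patterns.items():
        if regex.match(identifier):
            # Breaking on the first hit prevents the very generic BiGG
            # and BioCyc patterns from also claiming IDs that a more
            # specific pattern has already matched.
            return db
    return "unknown"
```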
resolves #33
This PR includes a couple of new checks and tests.
First of all, when I refer to 'items' I mean either metabolites or reactions.
`annotation.py`, which contains:

- `REACTION_ANNOTATIONS`: A dictionary mapping MIRIAM-style reaction database namespaces to the corresponding identifier patterns as compiled regexes.
- `METABOLITE_ANNOTATIONS`: A dictionary mapping MIRIAM-style metabolite database namespaces to the corresponding identifier patterns as compiled regexes.
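For a sense of the structure, here is an illustrative excerpt only; the real dictionaries cover more databases and the exact patterns may differ:

```python
import re

REACTION_ANNOTATIONS = {
    # Specific patterns first; very generic ones (e.g. BiGG, BioCyc)
    # come last, per the note on ordering above.
    "kegg.reaction": re.compile(r"^R\d+$"),
    "rhea": re.compile(r"^\d{5}$"),
}

METABOLITE_ANNOTATIONS = {
    "kegg.compound": re.compile(r"^C\d+$"),
    "chebi": re.compile(r"^CHEBI:\d+$"),
}
```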
I selected the included databases more or less from experience, i.e., those I had seen commonly referenced in metabolic models. I am open to input on which are or aren't 'must-have' databases.
In addition to these two dictionaries, `annotation.py` contains functions to:

- list items without any annotation,
- list items without annotations for each specific database,
- list items that do contain annotations for a specific database but whose identifiers do not match the corresponding regex pattern.

`test_annotation.py`: Contains annotation-specific tests for a given model.
We expect that no item is without annotation, that all items have annotations from all databases, and that the identifiers for all of those databases are correct.
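As a hedged sketch of one such expectation (the `model` fixture and the import of the helper are assumptions; the real tests may be structured differently):

```python
from annotation import find_met_without_annotation  # assumed import path

def test_mets_without_annotation(model):
    """Expect every metabolite to carry at least one annotation."""
    missing = find_met_without_annotation(model)
    assert len(missing) == 0, \
        "{} metabolites lack any annotation".format(len(missing))
```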
`test_for_annotation.py`: Contains tests for the code in `annotation.py`.
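A unit test for the helpers themselves could be sketched with a tiny toy model built via cobrapy (assumed to be available as a dependency; the import path is again an assumption):

```python
import cobra
from annotation import find_met_without_annotation  # assumed import path

def test_find_met_without_annotation():
    model = cobra.Model("toy")
    annotated = cobra.Metabolite("atp_c")
    annotated.annotation = {"kegg.compound": "C00002"}
    bare = cobra.Metabolite("h2o_c")
    model.add_metabolites([annotated, bare])
    # Only the metabolite without any annotation should be reported.
    assert find_met_without_annotation(model) == [bare]
```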