SRST2 & groot parsers #9

Merged
merged 3 commits into from
May 22, 2020

fmaguire
Member

See a5e6d39 for caveats of the groot parser and 674f14d for caveats of the srst2 parser.

groot parser:

- The CIGAR string could be used to calculate some standardised metrics
  (e.g. %ID), but this is not implemented.

- Some databases encode information for other fields in the gene name
  (e.g. groot run with ARG-ANNOT includes an accession and genomic
  location for the AMR gene), but not all databases do.

- Databases pre-prepared by groot also contain the
  `reference_database_id` in the `gene_name`, but this isn't guaranteed
  because it depends on how the database was prepared for indexing.

srst2 parser:

- The parser assumes the "fullgenes" gene typing output file from srst2.

- Occasionally the annotation field contains more information, depending on the
  database, but not in a format that is consistent across the databases used.
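
For reference, the unimplemented %ID idea could look roughly like the sketch below. It is not part of this PR: the function name `cigar_percent_identity` is hypothetical, and it uses one common definition of identity (identical columns over all aligned columns, counting every inserted and deleted base). Because a plain `M` operation does not distinguish matches from mismatches, the sketch assumes an edit distance (such as the SAM `NM` tag) is available whenever `M` appears.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_percent_identity(cigar, edit_distance=None):
    """Percent identity over aligned columns (hypothetical helper).

    Columns counted: M, =, X, I and D. Clipping (S, H), skips (N) and
    padding (P) are ignored. `edit_distance` is assumed to follow the
    SAM NM convention: mismatches + inserted bases + deleted bases.
    """
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")

    totals = {}
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)

    columns = sum(totals.get(op, 0) for op in "M=XID")
    if columns == 0:
        raise ValueError("CIGAR string has no aligned columns")

    if edit_distance is not None:
        # Every edit removes one identical column from the total.
        identical = columns - edit_distance
    elif totals.get("M", 0):
        # 'M' hides mismatches, so identity cannot be recovered from CIGAR alone.
        raise ValueError("CIGAR uses 'M'; an edit distance is required")
    else:
        # Extended CIGAR: '=' counts exact matches directly.
        identical = totals.get("=", 0)

    return 100.0 * identical / columns
```

With extended CIGAR (`=`/`X`) the string alone is enough, e.g. `cigar_percent_identity("90=5X5=")`; with `M` operations the caller would pass the edit distance, e.g. `cigar_percent_identity("100M", edit_distance=4)`.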
@dfornika
Collaborator

Looks great, @fmaguire. Are you planning on doing any more work on this before we merge? The CI setup still needs some work, so we don't expect to get green checkmarks there.

@fmaguire
Member Author

I was planning on adding a couple more but won't get around to it until next week. So merge away!

@dfornika dfornika merged commit b9fa1b9 into pha4ge:master May 22, 2020
@dfornika dfornika mentioned this pull request May 22, 2020