Report insertions stripped during alignment #449

Merged
merged 6 commits into from May 23, 2020

Conversation


jameshadfield commented Feb 27, 2020

Motivated by a comment in today's lab meeting, this PR makes augur align report the insertions that are removed when we (typically) strip the alignment to the reference.

It (a) prints them to the screen and (b) produces a CSV that you can drag and drop onto auspice to view the insertions. Numbering is GFF style: 1-based, with the insertion lying to the right of the reported reference position.

E.g. here is some of the output from a zika run:

...
1bp insertion at ref position 309
        A: USA/2016/FLUR001
182bp insertion at ref position 465
        WARNING: this insertion was caused due to 'N's or '?'s in provided sequences
...
87bp insertion at ref position 10769
        TGTGGGGAAATCCATGGGTCT: SI_BKK05, BKK03, BKK04, SI_BKK02, SI_BKK06, Brazil/PE243/2015, H/PF/2013, TS17_2016, Paraiba_01, V15261, V15098, RIO_BM1, Rio_U1, PAN/CDC_259359_V1_V3/2015, PAN/CDC_259364_V1_V2/2015, FLA, PAN/BEI_259634_V4/2016, COL/FLR/2015, Rio_S1, CTS_30_16p, CTS_50_16p, CTS_36_16p, CTS_61_16p, CTS_47_16p, CTS_56_16p, CTS_193_16p, CTS_178_16p, V16288, SL1602, CTS_183_16p, CTS_223_16p, ZKC2P6, ZKC2/2016, ZKC2P4, QTX_04, HN16, HND/R103451/2015, V103451, Aedessp/MEX/MEX_2_81/2016, Aedessp/MEX/MEX_I_7/2016, THA/2014/SV0127_14, SV0010/15, SI_BKK01
        TGTGGGGAAATCCATGGGTCTT: mosquito/Haiti/1919/2016, Haiti/1225/2014, Haiti/1/2016, COL/UF_1/2016, VEN/UF_1/2016, Natal_RGN, Zhejiang04, mosquito/Haiti/1855/2016
        TGTGGGGAAATCCATGG: PE243, 31N, V20366
...

and via the auspice visualisation:

image


If we like this direction, there is a lot more we can do here: for instance, cross-reference with the (GenBank) reference to see which genes the insertions fall in, translate the insertions if they look like codon insertions, and so on.
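
For readers unfamiliar with the numbering convention described above, here is a minimal sketch of how insertions relative to a reference could be collected from an alignment. It is not the code in this PR: the function name `find_insertions`, its arguments, and the return structure are hypothetical, and it assumes all sequences are already aligned to the same length with '-' as the gap character. Positions are 1-based reference coordinates, and each insertion lies to the right of the reported position.

```python
from collections import defaultdict


def find_insertions(ref_aln, seqs):
    """Collect insertions relative to the reference (hypothetical helper).

    ref_aln: aligned reference sequence, with '-' in columns where other
             sequences carry an insertion.
    seqs:    dict mapping strain name -> aligned sequence (same length).

    Returns {ref_pos: {inserted_bases: [strain, ...]}}, where ref_pos is the
    1-based reference position immediately to the left of the insertion.
    """
    insertions = defaultdict(lambda: defaultdict(list))
    ref_pos = 0  # 1-based position of the last reference base seen
    i, n = 0, len(ref_aln)
    while i < n:
        if ref_aln[i] != "-":
            ref_pos += 1
            i += 1
            continue
        # Extend over the whole run of gap columns in the reference.
        j = i
        while j < n and ref_aln[j] == "-":
            j += 1
        for name, seq in seqs.items():
            inserted = seq[i:j].replace("-", "")
            if inserted:  # this strain has bases where the reference has none
                insertions[ref_pos][inserted].append(name)
        i = j
    return {pos: dict(by_seq) for pos, by_seq in insertions.items()}


# Toy alignment (made-up data, not from the zika run above).
example = find_insertions(
    "AC--GT-A",
    {"s1": "ACTTGT-A", "s2": "AC-TGTGA", "s3": "AC--GT-A"},
)
for pos, by_seq in sorted(example.items()):
    longest = max(len(s) for s in by_seq)
    print(f"{longest}bp insertion at ref position {pos}")
    for inserted, strains in by_seq.items():
        print(f"\t{inserted}: {', '.join(strains)}")
```

Grouping strains by the exact inserted bases keeps the report compact when many strains share the same insertion, as in the position 10769 example.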

@huddlej huddlej added this to Ready for Review in Bioinformatics work via automation May 13, 2020
Bioinformatics work automation moved this from Ready for Review to In Review May 15, 2020

rneher left a comment

I think this is a very useful direction. I tested this a bit and figured it would be better to test it in the context of master, so I merged master into it. It works well AFAICT.

rneher commented May 15, 2020

@huddlej In this PR by @jameshadfield, stripping of insertions and removal of the reference sequence were separated. This messes with quite a few tests...

@rneher rneher merged commit 15d1c0a into master May 23, 2020
Bioinformatics work automation moved this from In Review to Done May 23, 2020
@trvrb trvrb deleted the indels branch January 11, 2021 05:15