Skip to content
Permalink
Browse files

Merge branch 'danielsoneg-egd-fix_align_includes_reference'

  • Loading branch information
huddlej committed Mar 24, 2020
2 parents 484eb5e + a6fc403 commit e3d184838bd1a84b44657cc72312908f075141c1
Showing with 26 additions and 1 deletion.
  1. +20 −1 augur/align.py
  2. +6 −0 tests/data/align/test_aligned_sequences.fasta
@@ -213,14 +213,33 @@ def strip_non_reference(alignment_fname, reference, keep_reference=False):
-------
list
list of trimmed sequences, effectively a multiple alignment
Tests
-----
>>> [s.name for s in strip_non_reference("tests/data/align/test_aligned_sequences.fasta", "with_gaps", keep_reference=False)]
Trimmed gaps in with_gaps from the alignment
['no_gaps', 'some_other_seq']
>>> [s.name for s in strip_non_reference("tests/data/align/test_aligned_sequences.fasta", "with_gaps", keep_reference=True)]
Trimmed gaps in with_gaps from the alignment
['with_gaps', 'no_gaps', 'some_other_seq']
>>> [s.name for s in strip_non_reference("tests/data/align/test_aligned_sequences.fasta", "no_gaps", keep_reference=True)]
No gaps in alignment to trim (with respect to the reference, no_gaps)
['with_gaps', 'no_gaps', 'some_other_seq']
>>> [s.name for s in strip_non_reference("tests/data/align/test_aligned_sequences.fasta", "no_gaps", keep_reference=False)]
No gaps in alignment to trim (with respect to the reference, no_gaps)
['with_gaps', 'some_other_seq']
>>> [s.name for s in strip_non_reference("tests/data/align/test_aligned_sequences.fasta", "missing", keep_reference=False)]
Traceback (most recent call last):
...
augur.align.AlignmentError: ERROR: reference missing not found in alignment
'''
aln = AlignIO.read(alignment_fname, 'fasta')
seqs = {s.name:s for s in aln}
if reference in seqs:
ref_array = np.array(seqs[reference])
if "-" not in ref_array:
print("No gaps in alignment to trim (with respect to the reference, %s)"%reference)
return aln
return [seq for seq in aln if (keep_reference or seq.name != reference)]

This comment has been minimized.

Copy link
@tolot27

tolot27 Mar 24, 2020

Contributor

I've commented on the PR before it get merged. Hence, here again:

Isn't this redundant to the original line

if keep_reference or seq.name!=reference:
, is it?

I suggest:

    if reference in seqs:
        ref_array = np.array(seqs[reference])
        if "-" not in ref_array:
            print("No gaps in alignment to trim (with respect to the reference, %s)"%reference)
        else:
            ungapped = ref_array!='-'
            ref_aln_array = np.array(aln)[:,ungapped]
    else:
        raise AlignmentError("ERROR: reference %s not found in alignment"%reference)

This comment has been minimized.

Copy link
@tolot27

tolot27 Mar 24, 2020

Contributor

That is one more iteration over all sequences of the alignment, hence it total four iterations (see #462 (comment)).

ungapped = ref_array!='-'
ref_aln_array = np.array(aln)[:,ungapped]
else:
@@ -0,0 +1,6 @@
>with_gaps
---ATATA---
>no_gaps
GGGATATAGGG
>some_other_seq
--GATCTAGGG

0 comments on commit e3d1848

Please sign in to comment.
You can’t perform that action at this time.