Add process_partis.py option for a specific indel #281

eharkins · 2019-05-30T17:36:15Z

@lauradoepker would like to be able to run ecgtheow on only the subset of sequences in a particular cluster that have a given indel. (I am opening this issue on cft because ecgtheow processes partis output using cft/bin/process_partis.py.)

This option would come with other options to specify the indel of interest, including:

indel length
insert vs deletion
sequence position
inserted sequence identity (in nucleotides). Would be None if a deletion.
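For illustration, the four parameters above could be bundled into one small record. This is only a sketch: `IndelSpec` and its field names are hypothetical and do not exist in cft or partis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndelSpec:
    """Hypothetical description of the indel of interest (not a partis type)."""
    kind: str                     # "insertion" or "deletion"
    pos: int                      # sequence position of the indel
    length: int                   # indel length in nucleotides
    seqstr: Optional[str] = None  # inserted nucleotides; None for a deletion

    def __post_init__(self):
        if self.kind not in ("insertion", "deletion"):
            raise ValueError("kind must be 'insertion' or 'deletion'")
        if self.pos < 0 or self.length <= 0:
            raise ValueError("pos must be >= 0 and length must be > 0")
        if self.kind == "deletion" and self.seqstr is not None:
            raise ValueError("a deletion has no inserted sequence")
        if self.seqstr is not None and len(self.seqstr) != self.length:
            raise ValueError("inserted sequence does not match indel length")
```

Validating at construction time means an incomplete or inconsistent indel definition fails before any cluster is processed.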

The name is up for debate: something like --only-with-particular-indel, --unique-indel, --indel-filter, etc. I'm going to call it --only-with-particular-indel for now:

If you use --only-with-particular-indel, process_partis.py checks that you have also specified the other options (see above) that define the indel you care about.
If we have everything we need, we choose our cluster annotation per the normal control flow.
Filter the cluster sequences down to those containing the indel of interest, using the information from the associated options (see above) and https://github.com/psathyrella/partis/blob/dev/python/utils.py#L634. The filtering works on input_seqs so that we can check whether the indel of interest is present. @psathyrella does this make sense?
Then find the indel_reversed_seqs sequences corresponding to the IDs that remain after filtering (we may just want to use whichever key would normally be used based on the existing --indel-reversed-seqs option, which happens to be used in the ecgtheow context).
All sequence IDs output from this cluster should have _indel_rev appended.
We want to output this subset of the cluster sequences in a file named something like cluster_seqs_indel_rev.fa, alongside the unfiltered cluster sequences in cluster_seqs.fa (using indel_reversed).
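A rough sketch of the filter-and-rename steps, under some loud assumptions: the annotation is a plain dict with parallel lists under `unique_ids`, `indel_reversed_seqs` and `indelfos`, and each indelfo holds an `indels` list of dicts with `type`, `pos`, `len` and `seqstr`. That layout is assumed for illustration and should be checked against partis. The function names are hypothetical, and for brevity the match is done on the per-sequence indel records rather than by inspecting input_seqs directly.

```python
def indel_matches(indelfo, kind, pos, length, seqstr=None):
    """Return True if any indel in indelfo matches the requested one."""
    for indel in indelfo.get("indels", []):
        if (indel["type"], indel["pos"], indel["len"]) != (kind, pos, length):
            continue
        # Only compare inserted nucleotides when the caller supplied them.
        if kind == "insertion" and seqstr is not None and indel.get("seqstr") != seqstr:
            continue
        return True
    return False


def filter_cluster(annotation, kind, pos, length, seqstr=None, suffix="_indel_rev"):
    """Keep sequences carrying the indel; return (renamed uid, indel-reversed seq) pairs."""
    rows = zip(annotation["unique_ids"],
               annotation["indel_reversed_seqs"],
               annotation["indelfos"])
    return [(uid + suffix, seq)
            for uid, seq, indelfo in rows
            if indel_matches(indelfo, kind, pos, length, seqstr)]


def to_fasta(records):
    """Format (uid, seq) pairs as FASTA text."""
    return "".join(">%s\n%s\n" % (uid, seq) for uid, seq in records)
```

The output of `to_fasta(filter_cluster(...))` would go to the filtered file, with the unfiltered cluster written as usual.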

Assuming this makes sense to everyone (cc @matsen), I will open separate issues:

cft: raise an exception if not running process_partis.py with --only-with-particular-indel and an indel is encountered in the specified seed sequence. The message would tell the user to use --only-with-particular-indel or specify something to ignore it like --ignore-seed-indel
ecg: add the ability to use this option in ecgtheow and to run revbayes on both the indel filtered cluster and the unfiltered cluster as Laura requested
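The proposed cft check might look roughly like the following. `SeedIndelError` and `check_seed_indel` are hypothetical names, --ignore-seed-indel is only the placeholder option name floated above, and the `indels` list on the seed's indel record is an assumed layout.

```python
class SeedIndelError(Exception):
    """Raised when the seed sequence has an indel and no option addresses it."""


def check_seed_indel(seed_indelfo, only_with_particular_indel=False, ignore_seed_indel=False):
    """Fail loudly if the seed has an indel that the user has not acknowledged."""
    has_indel = bool(seed_indelfo.get("indels"))
    if has_indel and not (only_with_particular_indel or ignore_seed_indel):
        raise SeedIndelError(
            "seed sequence contains an indel; rerun with "
            "--only-with-particular-indel or --ignore-seed-indel"
        )
```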


psathyrella · 2019-05-30T19:37:14Z

Yeah, except I think I've changed my mind about how to specify the indel parameters. I think maybe this is what laura was suggesting and I was just being dense, but I think it's probably better to just say "match the indels in this sequence", i.e. specify a uid, rather than having to specify the length/pos/type of the indel.
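A minimal sketch of the "match the indels in this sequence" idea, not the partis implementation. It assumes the same dict layout as a partis-style annotation (parallel `unique_ids` and `indelfos` lists, each indelfo holding an `indels` list with `type`, `pos`, `len`), and it assumes that "match" means an identical set of (type, pos, len) triples. Both assumptions would need checking, and the function names are hypothetical.

```python
def indel_signature(indelfo):
    """Summarize a sequence's indels as a set of (type, pos, len) triples."""
    return frozenset((i["type"], i["pos"], i["len"])
                     for i in indelfo.get("indels", []))


def uids_matching_indels_of(annotation, ref_uid):
    """Return the uids whose indel signature equals that of ref_uid."""
    uids = annotation["unique_ids"]
    indelfos = annotation["indelfos"]
    # list.index raises ValueError if ref_uid is not in the cluster.
    ref_sig = indel_signature(indelfos[uids.index(ref_uid)])
    return [uid for uid, fo in zip(uids, indelfos)
            if indel_signature(fo) == ref_sig]
```

Specifying a uid this way avoids asking the user to type out the length, position and type by hand.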

lauradoepker · 2019-05-30T20:48:32Z

@eharkins I'd like the filtered seqs outfile to be named a little more explicitly, something like indel_filtered_cluster_seqs.fa. Since all sequences in EC will be indel_rev, I'm okay with this fact not being reflected in the file name, but if we do add it (to both), it may prevent future forgetfulness on my part about indel reversal.

metasoarous · 2019-05-31T05:09:16Z

A few things here:

I'd suggest naming the indel pattern matches with +indel or something (maybe custom? --indel-tag?); indel_rev seems to imply that the indel has been reversed in the sequence, which may or may not be the case, but is beside the point if I understand correctly.
Would it be easier to have one flag for filtering in matches of a certain mutation pattern, and one flag for filtering out? This would solve your concern @lauradoepker over what the file is named, as you'd be able to name it whatever you want (I tend to prefer this over pre-determined naming patterns).
I'd suggest --filter-indel-pattern-in uid or --filter-indel-pattern-out uid, riffing off @psathyrella's suggestion.
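As a sketch of how the two suggested flags might be wired up with argparse: the flag names come from the suggestion above, while `build_parser`, `apply_filters` and the precomputed uid-to-pattern mapping are hypothetical, and none of this reflects what process_partis.py actually does.

```python
import argparse


def build_parser():
    """Parser exposing only the two proposed (hypothetical) filter flags."""
    parser = argparse.ArgumentParser(description="indel pattern filter sketch")
    parser.add_argument("--filter-indel-pattern-in", metavar="UID",
                        help="keep only sequences whose indel pattern matches UID's")
    parser.add_argument("--filter-indel-pattern-out", metavar="UID",
                        help="drop sequences whose indel pattern matches UID's")
    return parser


def apply_filters(uid_to_pattern, keep_uid=None, drop_uid=None):
    """Return the sorted uids that survive the requested in/out filters.

    uid_to_pattern maps each uid to any hashable summary of its indels.
    """
    kept = dict(uid_to_pattern)
    if keep_uid is not None:
        target = uid_to_pattern[keep_uid]
        kept = {u: p for u, p in kept.items() if p == target}
    if drop_uid is not None:
        target = uid_to_pattern[drop_uid]
        kept = {u: p for u, p in kept.items() if p != target}
    return sorted(kept)
```

With two separate flags, the output file name stays under the caller's control instead of being fixed by the script.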

eharkins · 2019-05-31T16:46:30Z

Thanks for the input here. It seems like we are going to spend a little bit more time on thinking about how best to handle the particular indel-ed family Laura is currently dealing with, then we can generalize a solution like this if appropriate. @matsen, @lauradoepker let me know how I can be of help in determining the best way forward with that family.

lauradoepker · 2019-06-03T15:08:37Z

@eharkins it's completely up to you to decide how generalized to make the code at this point. I want 157.Vk settled as soon as possible, but not at the cost of you having to rewrite all your code later to make it more generalizable. This issue, then, is for you and @matsen to decide.

eharkins · 2019-06-13T00:01:31Z

e66cf19

eharkins self-assigned this May 30, 2019

matsen mentioned this issue May 31, 2019

Investigate shm indel in QA255.157-Vk #279

Closed

eharkins mentioned this issue May 31, 2019

Make it very obvious when sequences have an indel #282

Closed

lauradoepker mentioned this issue Jun 4, 2019

Indel handling in QA255.157-Vk #277

Closed

eharkins mentioned this issue Jun 6, 2019

Add indelutils.restrict_to_compatible_indels psathyrella/partis#293

Merged

eharkins added a commit that referenced this issue Jun 11, 2019

add option #281

4fe68d2

eharkins closed this as completed Jun 13, 2019

eharkins mentioned this issue Jun 26, 2019

Resolve insertion in QA255.016-VL #285

Closed
