Add process_partis.py option for a specific indel #281
Comments
Yeah, except I think I've changed my mind about how to specify the indel parameters. Maybe this is what Laura was suggesting and I was just being dense, but it's probably better to just say "match the indels in this sequence", i.e. specify a uid, rather than having to specify the length/pos/type of the indel.
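A rough sketch of what "match the indels in this sequence" could look like, for discussion only. It assumes each sequence's indels are available as a list of dicts with `type`, `pos`, and `len` keys; that layout, and the helper names `indel_signature` and `uids_matching_reference`, are hypothetical and not taken from partis or `process_partis.py`.

```python
def indel_signature(indels):
    """Return an order-independent, hashable summary of a sequence's indels.

    Assumes (hypothetically) that each indel is a dict with 'type', 'pos', 'len'.
    """
    return frozenset((d["type"], d["pos"], d["len"]) for d in indels)


def uids_matching_reference(indels_by_uid, reference_uid):
    """Return the uids whose indels exactly match those of reference_uid."""
    if reference_uid not in indels_by_uid:
        raise KeyError("reference uid %r not found in cluster" % reference_uid)
    target = indel_signature(indels_by_uid[reference_uid])
    return [uid for uid, indels in indels_by_uid.items()
            if indel_signature(indels) == target]


# Toy cluster: uid -> list of indels (made-up values, for illustration only).
cluster = {
    "seq-a": [{"type": "deletion", "pos": 102, "len": 3}],
    "seq-b": [{"type": "deletion", "pos": 102, "len": 3}],
    "seq-c": [],
    "seq-d": [{"type": "insertion", "pos": 57, "len": 6}],
}
matching = uids_matching_reference(cluster, "seq-a")
```

With this approach the user only passes a uid, and the length/pos/type are read from that sequence. One consequence worth deciding on: if the reference uid has no indels, the match returns every indel-free sequence in the cluster.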
@eharkins I'd like the filtered seqs outfile to be named a little more explicitly, something like
A few things here:
Thanks for the input here. It seems we are going to spend a bit more time thinking about how best to handle the particular indel-ed family Laura is currently dealing with; then we can generalize a solution like this if appropriate. @matsen, @lauradoepker let me know how I can help determine the best way forward with that family.
@lauradoepker would like the ability to run ecgtheow* on only the subset of sequences in a particular cluster that have a given indel. (* I am opening this issue on cft because ecgtheow processes partis output using `cft/bin/process_partis.py`.)

This option would come with other options to specify the indel of interest, including:
The name is up for debate; something like `--only-with-particular-indel`, `--unique-indel`, `--indel-filter`, etc. Going to call it `--only-with-particular-indel` for now:

- [...] `--only-with-particular-indel`, `process_partis.py` looks to make sure you have specified other options (see above) to define the indel you care about
- [...] `input_seqs`, so as to be able to make sure the indel of interest is there or not) based on containing the indel of interest by using the information from the associated options (see above) and https://github.com/psathyrella/partis/blob/dev/python/utils.py#L634. @psathyrella does this make sense?
- [...] `indel_reversed_seqs` sequences corresponding to the remaining IDs after filtering (we may just want to use whichever key would normally be used based on the existing `--indel-reversed-seqs` option, which happens to be used in the ecgtheow context).
- [...] `_indel_rev` appended
- [...] `cluster_seqs_indel_rev.fa` alongside the unfiltered cluster sequences in `cluster_seqs.fa` (using `indel_reversed`)

Assuming this makes sense to everyone (cc @matsen), I will open separate issues:

- [...] `--only-with-particular-indel` and an indel is encountered in the specified seed sequence. The message would tell the user to use `--only-with-particular-indel` or specify something to ignore it like `--ignore-seed-indel`
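
To make the proposal concrete, here is a minimal sketch of the filter-and-write step and of the seed-indel check, under several assumptions: the cluster annotation is treated as a plain dict with parallel `unique_ids` and `indel_reversed_seqs` lists, the set of uids to keep has already been computed, and the check fires only when neither option is given. The helpers `indel_rev_path`, `filter_cluster`, `to_fasta`, and `check_seed_indel` are hypothetical names, not existing `process_partis.py` functions, and `--ignore-seed-indel` is only the placeholder name floated above.

```python
import os


def indel_rev_path(path):
    """Append '_indel_rev' to the file stem, keeping the extension."""
    root, ext = os.path.splitext(path)
    return root + "_indel_rev" + ext


def filter_cluster(annotation, keep_uids, seq_key="indel_reversed_seqs"):
    """Return (uid, sequence) pairs for the uids that survive filtering.

    Assumes annotation['unique_ids'] and annotation[seq_key] are parallel lists.
    """
    keep = set(keep_uids)
    return [(uid, seq)
            for uid, seq in zip(annotation["unique_ids"], annotation[seq_key])
            if uid in keep]


def to_fasta(records):
    """Format (uid, sequence) pairs as FASTA text."""
    return "".join(">%s\n%s\n" % (uid, seq) for uid, seq in records)


def check_seed_indel(seed_has_indel, only_with_particular_indel, ignore_seed_indel):
    """Raise unless one of the two options says how to handle a seed indel."""
    if seed_has_indel and not (only_with_particular_indel or ignore_seed_indel):
        raise ValueError(
            "seed sequence has an indel: pass --only-with-particular-indel "
            "or --ignore-seed-indel")


# Toy annotation with made-up sequences, for illustration only.
annotation = {
    "unique_ids": ["seed", "seq-b", "seq-c"],
    "indel_reversed_seqs": ["ACGTAC", "ACGTAA", "ACGTTT"],
}
records = filter_cluster(annotation, keep_uids=["seed", "seq-b"])
fasta_text = to_fasta(records)
out_path = indel_rev_path("cluster_seqs.fa")  # written next to cluster_seqs.fa
```

Keeping the sequence key as a parameter (`seq_key`) leaves room for using whichever key the existing `--indel-reversed-seqs` option would normally select.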