Enhancement request: trim but keep adaptor, --action=trimupto #443

Closed
peterjc opened this issue Feb 28, 2020 · 7 comments

@peterjc
Contributor

peterjc commented Feb 28, 2020

I have a use case in mind where, rather than an adapter or primer sequence that I want to match and remove, I have markers for a region of interest, and I want the (possibly inexactly matched) marker to be retained in the output.

Currently there are four action modes (correct as of cutadapt v2.8):

$ cutadapt -h | grep "\-\-action" -A 4
  --action {trim,mask,lowercase,none}
                        What to do with found adapters. mask: replace with 'N'
                        characters; lowercase: convert to lowercase; none:
                        leave unchanged (useful with --discard-untrimmed).
                        Default: trim

Would you consider a new action mode, with the suggested name trimupto (trim up to) or trimuntil, along the following lines?

  --action {trim,mask,lowercase,trimupto,none}
                        What to do with found adapters. mask: replace with 'N'
                        characters; lowercase: convert to lowercase; trimupto:
                        trim up to the adapter but retain it; none: leave
                        unchanged (useful with --discard-untrimmed).
                        Default: trim

Left adapter example:

$ for ACTION in trim mask lowercase none; do echo; echo "Using --action $ACTION:"; cutadapt -g AAAAAA -o - example.fasta --quiet --discard-untrimmed --action $ACTION; done

Using --action trim:
>example
XXXXXXXXXXXGGGGGGRRRRRRR

Using --action mask:
>example
NNNNNNNNNNNNNNXXXXXXXXXXXGGGGGGRRRRRRR

Using --action lowercase:
>example
llllllllaaaaaaXXXXXXXXXXXGGGGGGRRRRRRR

Using --action none:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR

Proposed output:

Using --action trimupto:
>example
AAAAAAXXXXXXXXXXXGGGGGGRRRRRRR
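
To make the proposed semantics concrete, here is a minimal Python sketch of the existing actions plus the proposed trimupto for a 5' (-g) adapter. It is not Cutadapt's implementation: the function name apply_front_action is made up for illustration, and it assumes exact matching via str.find, whereas Cutadapt tolerates errors in the match.

```python
# Illustrative only: exact-match stand-in for Cutadapt's 5' adapter handling.
READ = "LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR"  # example read from this thread


def apply_front_action(seq: str, adapter: str, action: str) -> str:
    """Apply one action to the first exact occurrence of a 5' adapter."""
    start = seq.find(adapter)
    if start == -1:
        return seq  # no match: leave the read unchanged
    end = start + len(adapter)
    if action == "trim":
        return seq[end:]  # drop the adapter and everything before it
    if action == "mask":
        return "N" * end + seq[end:]  # replace the removed part with N
    if action == "lowercase":
        return seq[:end].lower() + seq[end:]
    if action == "trimupto":
        return seq[start:]  # proposed: drop only what precedes the adapter
    if action == "none":
        return seq
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    for action in ("trim", "mask", "lowercase", "none", "trimupto"):
        print(action, apply_front_action(READ, "AAAAAA", action))
```

The only difference between trim and trimupto in this sketch is whether the slice starts at the end or at the start of the match.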

Using left and right adapter:

$ for ACTION in trim mask lowercase none; do echo; echo "Using --action $ACTION:"; cutadapt -g AAAAAA...GGGGGG -o - example.fasta --quiet --discard-untrimmed --action $ACTION; done

Using --action trim:
>example
XXXXXXXXXXX

Using --action mask:
>example
NNNNNNNNNNNNNNXXXXXXXXXXXNNNNNNNNNNNNN

Using --action lowercase:
>example
llllllllaaaaaaXXXXXXXXXXXggggggrrrrrrr

Using --action none:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR

Proposed output:

Using --action trimupto:
>example
AAAAAAXXXXXXXXXXXGGGGGG
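
For the linked case, the proposal amounts to keeping the span from the start of the 5' marker to the end of the 3' marker. A rough Python sketch follows, under the same assumptions as before: exact matching only, and a hypothetical helper name (keep_between_markers). It also assumes that both markers must be present, which is a simplification chosen for this example rather than a statement about Cutadapt's rules for linked adapters.

```python
from typing import Optional

# Illustrative only: exact-match stand-in for a linked adapter FRONT...BACK.
READ = "LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR"  # example read from this thread


def keep_between_markers(seq: str, front: str, back: str) -> Optional[str]:
    """Return the region from the 5' marker through the 3' marker, inclusive.

    Returns None when either marker is missing (comparable in spirit to
    discarding the read).
    """
    front_start = seq.find(front)
    if front_start == -1:
        return None
    # Search for the 3' marker only downstream of the 5' marker.
    back_start = seq.find(back, front_start + len(front))
    if back_start == -1:
        return None
    return seq[front_start:back_start + len(back)]


if __name__ == "__main__":
    print(keep_between_markers(READ, "AAAAAA", "GGGGGG"))
```

Searching for the 3' marker only after the 5' match avoids picking up a 3' marker that happens to occur upstream of the region of interest.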

Right adapter example:

$ for ACTION in trim mask lowercase none; do echo; echo "Using --action $ACTION:"; cutadapt -a GGGGGG -o - example.fasta --quiet --discard-untrimmed --action $ACTION; done

Using --action trim:
>example
LLLLLLLLAAAAAAXXXXXXXXXXX

Using --action mask:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXNNNNNNNNNNNNN

Using --action lowercase:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXggggggrrrrrrr

Using --action none:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR

Proposed output:

Using --action trimupto:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGG
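
Since the marker may be matched inexactly, it matters that the retained bases are the read's own bases rather than the adapter sequence as given on the command line. The Python sketch below illustrates this for a 3' marker. It is a simplification: it scores candidate positions by Hamming distance (substitutions only), while Cutadapt's matching also handles insertions and deletions, and the names best_match_start and trimupto_back are invented for this example.

```python
# Illustrative only: mismatch-tolerant stand-in for a 3' adapter match.

def best_match_start(seq: str, adapter: str, max_mismatches: int) -> int:
    """Return the start of the window with the fewest mismatches, or -1.

    Ties are resolved in favour of the leftmost window.
    """
    n = len(adapter)
    best_start, best_mismatches = -1, max_mismatches + 1
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches < best_mismatches:
            best_start, best_mismatches = start, mismatches
    return best_start


def trimupto_back(seq: str, adapter: str, max_mismatches: int = 0) -> str:
    """Drop everything after the 3' marker but keep the matched bases."""
    start = best_match_start(seq, adapter, max_mismatches)
    if start == -1:
        return seq  # no acceptable match: leave the read unchanged
    return seq[:start + len(adapter)]


if __name__ == "__main__":
    # The example read with one substitution inside the 3' marker.
    mutated = "LLLLLLLLAAAAAAXXXXXXXXXXXGGGTGGRRRRRRR"
    print(trimupto_back(mutated, "GGGGGG", max_mismatches=1))
```

With one mismatch allowed, the retained tail in this sketch is the read's own GGGTGG, not the GGGGGG that was searched for.
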
@marcelm
Owner

marcelm commented Mar 3, 2020

Hi, I’m not working at the moment, so let me get back to you in a while, but one comment for now:
A colleague mentioned a request for this behavior to me a while ago, so I’ve had this in the back of my head. I had the idea of some extra notation within the adapter specification string, though, that would tell Cutadapt where to cut. But perhaps it’s easier to implement as an additional action.

@peterjc
Contributor Author

peterjc commented Mar 3, 2020

I'm encouraged that someone else also asked for this kind of behaviour.

Extra notation in the adapter specification string could work, and would be even more flexible than my current use case requires.

@mariloubodde

Hi, I would also be interested in an option to discard sequence outside the adapters, but retain the adapters themselves. I was wondering if you are planning to implement this?

I'm working on a project comparing targeted amplicon data with "in silico" amplified data; for the latter I reconstruct the regions corresponding to the amplicon targets from shotgun sequencing reads. In my current pipeline I have some trouble with reads that overlap (either of) the primers by only a few bases and this would be resolved by retaining the primer sequences.

@marcelm marcelm closed this as completed in f1c30cc Dec 3, 2020
@peterjc
Contributor Author

peterjc commented Dec 3, 2020

Excellent, and using --action=retain as the name makes sense to me too. Shorter than my suggestions too 👍

Thank you!

@marcelm
Owner

marcelm commented Dec 3, 2020

I wanted to comment here, but the auto-close happened before I got around to it ...

Yes, this is now implemented as --action=retain. I hope the behavior is as you both requested. I had suggested earlier that a special marker in the adapter specification would be a good way to do this, but I realized that implementing this as a different action actually is a lot easier, and not adding extra notation makes it easier for the users.

Documentation is at https://cutadapt.readthedocs.io/en/latest/guide.html#action .
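
For anyone scripting this, the following Python sketch shows one way the new action might be invoked through subprocess. It reuses the flags from the commands earlier in this thread and swaps in --action=retain; the helper name build_retain_command and the file name example.fasta are placeholders, and the output is not shown here because it has not been checked against a real run.

```python
import os
import shutil
import subprocess
from typing import List


def build_retain_command(adapter_option: str, adapter: str, input_path: str) -> List[str]:
    """Build a cutadapt command line that uses --action=retain.

    adapter_option is e.g. "-g" (5' adapter) or "-a" (3' adapter).
    """
    return [
        "cutadapt",
        adapter_option, adapter,
        "-o", "-",                # write trimmed reads to standard output
        input_path,
        "--quiet",
        "--discard-untrimmed",
        "--action=retain",        # needs a Cutadapt release that has this action
    ]


if __name__ == "__main__":
    cmd = build_retain_command("-g", "AAAAAA", "example.fasta")
    if shutil.which("cutadapt") is None or not os.path.exists("example.fasta"):
        # Nothing to run against; just show the command.
        print(" ".join(cmd))
    else:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues with adapter specifications such as linked adapters containing dots.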

@peterjc The name retain actually comes from you, because you wrote:

I want the [...] marker to be retained in the output

(emphasis mine). It’s a word I rarely use otherwise, so that makes it easy to search for in the documentation.

I’ll release Cutadapt 3.1 with this feature included soon.

@peterjc
Contributor Author

peterjc commented Dec 3, 2020

Lovely - I'm on leave right now, but hopefully I'll get to try this out next month. Reading the documentation you've added, it should do what I was hoping for.

@mariloubodde

mariloubodde commented Dec 3, 2020 via email
