[Suggestion] New parameters for seqkit rename #360

Closed
apcamargo opened this issue Dec 28, 2022 · 6 comments

@apcamargo

Right now, seqkit rename doesn't provide users with parameters to tune how the renaming is performed. I recently had to rename duplicated headers using | as a field separator and needed to use awk for that.

Some suggestions:

  • Allow custom separators (instead of just _).
  • Allow the counter to start from an arbitrary integer (e.g. 0).
  • Allow adding a suffix to all headers, regardless of whether they have duplicates.

Those are just suggestions, so feel free to ignore them if you don't think they fit SeqKit's design.
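
For readers who need the same result without the new flags, the awk workaround mentioned above could look roughly like the sketch below. This is not the script the author used (the thread does not show it); `rename_dups` and its two positional arguments (separator, starting number) are hypothetical names chosen for illustration. It numbers only the second and later occurrences of an ID and leaves any description after the first space untouched.

```shell
# Hypothetical helper, not part of SeqKit: number repeated FASTA IDs with awk.
# Usage: rename_dups SEPARATOR START < in.fasta
rename_dups() {
    awk -v sep="$1" -v start="$2" '
        /^>/ {
            id   = substr($1, 2)                  # ID = first word, without ">"
            rest = substr($0, length($1) + 1)     # description, if any
            n    = seen[id]++                     # copies seen before this one
            if (n > 0) {                          # rename only 2nd, 3rd, ... copies
                num = start + n - 1
                id  = id sep num
            }
            print ">" id rest
            next
        }
        { print }                                 # sequence lines pass through
    '
}

printf '>s a\na\n>s b\nc\n>s2\ng\n' | rename_dups '|' 0
```

With the input above, the second `s` record becomes `>s|0 b` and the other headers are printed unchanged.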

@shenwei356
Owner

Allow custom separators (instead of just _)

Sure, it's easy.

Allow the counter to start from an arbitrary integer (e.g. 0).

OK

Allow adding a suffix to all headers, regardless of whether they have duplicates.

seqkit replace -p '$' -r '_suffix'

shenwei356 added a commit that referenced this issue Dec 29, 2022
@shenwei356
Owner

Added.

Case 1: the headers contain only IDs.

$ echo -ne ">s\na\n>s\nc\n"
>s
a
>s
c

$ echo -ne ">s\na\n>s\nc\n" \
    | seqkit rename  -s '|' -N 0  \
    | seqkit replace -p '$' -r _suffix
>s_suffix
a
>s|0_suffix
c

Case 2: the headers contain IDs and descriptions.

$ echo -ne ">s a\na\n>s b\nc\n"
>s a
a
>s b
c

$ echo -ne ">s a\na\n>s b\nc\n" \
    | seqkit rename  -s '|' -N 0 \
    | seqkit seq -i \
    | seqkit replace -p '$' -r _suffix
>s_suffix
a
>s|0_suffix
c

@apcamargo
Author

Thank you! That was really quick!

Allow adding a suffix to all headers, regardless of whether they have duplicates.

By that I meant the suffix with the counter. So, in your example, it would be something like this:

>s|0
a
>s|1
c
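
The behavior requested here, a counter on every header including the first occurrence, can be sketched in awk as follows. This is an illustration of the request only, not SeqKit's implementation; `number_all` and its arguments (separator, starting number) are hypothetical names.

```shell
# Hypothetical helper, not part of SeqKit: append a per-ID counter to every
# FASTA header, including the first occurrence of each ID.
# Usage: number_all SEPARATOR START < in.fasta
number_all() {
    awk -v sep="$1" -v start="$2" '
        /^>/ {
            id   = substr($1, 2)                  # ID = first word, without ">"
            rest = substr($0, length($1) + 1)     # description, if any
            num  = start + seen[id]++             # START for the 1st copy, then +1
            print ">" id sep num rest
            next
        }
        { print }                                 # sequence lines pass through
    '
}

printf '>s\na\n>s\nc\n' | number_all '|' 0
```

For the two-record input above this prints `>s|0`, `a`, `>s|1`, `c`, which is the output asked for in this comment.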

@shenwei356
Owner

I see. We can add another flag that appends the counter to the first record as well.

shenwei356 added a commit that referenced this issue Dec 29, 2022
@shenwei356
Owner

Three flags added:

  -1, --rename-1st-rec      rename the first record as well
  -s, --separator string    separator between original ID/name and the counter (default "_")
  -N, --start-num int       starting count number for *duplicated* IDs/names, should be greater than zero (default 2)

$ echo -ne ">s a\na\n>s b\nc\n>s2\ng\n"
>s a
a
>s b
c
>s2
g

# default
$ echo -ne ">s a\na\n>s b\nc\n>s2\ng\n"    \
    | seqkit rename  
>s a
a
>s_2 b
c
>s2
g

$ echo -ne ">s a\na\n>s b\nc\n>s2\ng\n"   \
    | seqkit rename  -s '|' -N 1 -1
>s|0 a
a
>s|1 b
c
>s2|0
g

@apcamargo
Author

Thanks!!
