Add subseq selected coordinates to FASTA? #413

Closed
4 tasks done
photocyte opened this issue Sep 28, 2023 · 2 comments

Comments

@photocyte
Contributor

photocyte commented Sep 28, 2023

Prerequisites

  • make sure you are using the latest version by running seqkit version
  • read the usage

Describe your issue

  • describe the problem
  • provide a reproducible example

Hi there,

It would be useful for subseq to have a flag where it would add the coordinates of the selected region to the FASTA record description.

E.g., if subseq were run with -r 14:14, it would add SEQID:14-14 to the end of the FASTA description. Ideally it would work with GTF/GFF- and BED-defined regions as well, although BED coordinates would need to be converted from 0-based to 1-based.
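
To make the BED caveat concrete, here is a minimal Python sketch of the conversion and the proposed label. It assumes BED intervals are 0-based and half-open, and that the label should be 1-based and inclusive to match -r. The function names are hypothetical and are not part of seqkit.

```python
def bed_to_one_based(start, end):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid BED interval: {start}-{end}")
    # Only the start shifts; the half-open end already equals the inclusive end.
    return start + 1, end


def region_label(seqid, start, end):
    """Build the proposed SEQID:start-end label from 1-based coordinates."""
    return f"{seqid}:{start}-{end}"


def append_label(description, label):
    """Append the label to the end of a FASTA description (hypothetical layout)."""
    return f"{description} {label}" if description else label


if __name__ == "__main__":
    # BED line "SEQID 13 14" covers the same single base as -r 14:14.
    start, end = bed_to_one_based(13, 14)
    print(append_label("some description", region_label("SEQID", start, end)))
```

With this convention, a BED interval of 13 to 14 and -r 14:14 both yield the label SEQID:14-14.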

I briefly looked, and the analogous command to subseq in seqtk does not have this feature, whereas bedtools getfasta implements this behavior by default.

All the best,
-Tim

@photocyte
Contributor Author

On further testing, seqkit subseq --bed already does this, but in a different format from bedtools. The existing seqkit output looks like this, where prot1 is the seqid:

>prot1_2661-2685:.

However, the coordinates don't get appended with the -r flag. seqkit subseq -r 2661:2685 gives:

>prot1
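
For anyone who needs to read the coordinates back out of the --bed style ID, a small Python sketch follows. The pattern (seqid, underscore, start-end, colon, strand) is inferred only from the single example above, so treat it as an assumption rather than a documented seqkit format. The function name is hypothetical.

```python
import re

# Assumed layout, inferred from ">prot1_2661-2685:.":
#   <seqid>_<start>-<end>:<strand>
_BED_STYLE_ID = re.compile(
    r"^(?P<seqid>.+)_(?P<start>\d+)-(?P<end>\d+):(?P<strand>[+.-])$"
)


def parse_bed_style_id(record_id):
    """Split an ID such as 'prot1_2661-2685:.' into its parts.

    Returns (seqid, start, end, strand), or None if the ID does not
    follow the assumed layout (for example a plain 'prot1').
    """
    match = _BED_STYLE_ID.match(record_id)
    if match is None:
        return None
    return (
        match["seqid"],
        int(match["start"]),
        int(match["end"]),
        match["strand"],
    )


if __name__ == "__main__":
    print(parse_bed_style_id("prot1_2661-2685:."))
    print(parse_bed_style_id("prot1"))
```

The greedy seqid group means IDs that themselves contain underscores still split at the last underscore before the coordinates.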

@shenwei356
Owner

Added one.

  -R, --region-coord      append coordinates to sequence ID for -r/--region

$ seqkit head -n 1 ../tests/hairpin.fa
>cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop
UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC
UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA

$ seqkit head -n 1 ../tests/hairpin.fa \
    | ./seqkit subseq -r 1:10 -R
>cel-let-7:1-10 MI0000001 Caenorhabditis elegans let-7 stem-loop
UACACUGUGG
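
The header change shown above can be mimicked with a short Python sketch. This is an illustration of the observed output, not seqkit's implementation: it assumes the sequence ID is the text before the first space, and it handles only positive 1-based inclusive coordinates. The function name is hypothetical.

```python
def subseq_with_coords(header, seq, start, end):
    """Return (new_header, subsequence) for a 1-based, inclusive region.

    The coordinates are appended to the sequence ID (the text before the
    first space), and the rest of the description is left unchanged.
    """
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"region {start}:{end} is outside the sequence")
    seq_id, _, description = header.partition(" ")
    new_id = f"{seq_id}:{start}-{end}"
    new_header = f"{new_id} {description}" if description else new_id
    # Convert the 1-based inclusive start to a 0-based slice index.
    return new_header, seq[start - 1:end]


if __name__ == "__main__":
    header = "cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop"
    seq = (
        "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC"
        "UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA"
    )
    new_header, sub = subseq_with_coords(header, seq, 1, 10)
    print(">" + new_header)
    print(sub)
```

Run on the hairpin record above with the region 1:10, the sketch prints the same header and sequence as the seqkit example.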
