Unclear what warning message "g/--remove-gaps to remove spaces" means #387

Closed
2 tasks done
SHuang-Broad opened this issue May 29, 2023 · 3 comments

@SHuang-Broad

SHuang-Broad commented May 29, 2023

Prerequisites

  • make sure you are using the latest version (check with seqkit version)
  • read the usage

Describe your issue

describe the problem

Warning message when running the following command

$ seqkit seq -m 10000 <path_to_fastq.gz> | gzip > <output.fastq.gz>

[WARN] you may switch on flag -g/--remove-gaps to remove spaces

The usage page documents -g as removing gaps, not spaces (and "gaps" looks more likely to be the right wording).
Assuming that it is indeed "removing gaps", does it mean that bases that are literally N will be dropped?

Second, what could be triggering this warning message? I'd hope my FASTQ doesn't have spaces in its sequence lines....

provide a reproducible example

Unfortunately, the data I work with is protected. Sorry about this.

Thank you!
Steve

@shenwei356
Owner

shenwei356 commented May 29, 2023

Hi Steve, the warning is triggered by the options -m or -M, which filter sequences by length. It is shown just in case the sequences contain spaces, which might result in incorrect output.

By default, -g removes the characters in "- \t." (dash, space, tab, and dot). The set can be changed with:

-G, --gap-letters string        gap letters (default "- \t.")

You're right, in most scenarios it's more like removing spaces than gaps. I used the word “gap” with multiple sequence alignment files in mind, where a gap is marked as “-”.
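
To make the point concrete, here is a minimal Python sketch of the idea. It is not seqkit's actual implementation; the function names `remove_gaps` and `passes_min_len` are hypothetical, and only the default gap letters "- \t." are taken from the help text above. It shows that a literal N is not in the default set, so it is kept, and that stray spaces or dashes can change the result of a length filter.

```python
# Illustrative sketch only; seqkit itself is written in Go and may differ.
# The default gap letters come from the help text: dash, space, tab, dot.
DEFAULT_GAP_LETTERS = "- \t."


def remove_gaps(seq: str, gap_letters: str = DEFAULT_GAP_LETTERS) -> str:
    """Return seq with every character listed in gap_letters deleted."""
    return seq.translate(str.maketrans("", "", gap_letters))


def passes_min_len(seq: str, min_len: int, strip_gaps: bool = False) -> bool:
    """Hypothetical stand-in for a minimum-length filter such as -m."""
    if strip_gaps:
        seq = remove_gaps(seq)
    return len(seq) >= min_len


if __name__ == "__main__":
    seq = "ACG-T N.A\tC"  # 11 characters, 4 of them gap letters
    print(remove_gaps(seq))                           # N is kept
    print(passes_min_len(seq, 10))                    # counts gap letters
    print(passes_min_len(seq, 10, strip_gaps=True))   # counts bases only
```

In this sketch the same record passes the length threshold when gap letters are counted and fails when they are removed first, which is the kind of mismatch the warning is meant to flag.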

@SHuang-Broad
Author

Thanks for the explanation!
It looks like you've labeled this a todo item, so I'll leave this open. (But please feel free to close it when appropriate).

@shenwei356
Owner

Updated

  -g, --remove-gaps               remove gaps letters set by -G/--gap-letters, e.g., spaces, tabs, and
                                  dashes (gaps "-" in aligned sequences)
  -G, --gap-letters string        gap letters to be removed with -g/--remove-gaps (default "- \t.")
