how many SARS-COV-2 sequences can nextclade handle in a MSA file? #1345

Closed
liamxg opened this issue Dec 16, 2023 · 8 comments
Labels
t:ask Type: question, request of information 1

Comments

liamxg commented Dec 16, 2023

Dear @nextclade team,

I have more than 5 million SARS-CoV-2 sequences that need to be aligned. Can Nextclade handle this?

liamxg added the "needs triage" and "t:ask" labels on Dec 16, 2023
Member

ivan-aksamentov commented Dec 16, 2023

Hi, @liamxg, it depends on what version you want to use.

Nextclade Web (the web version, at https://clades.nextstrain.org) can handle ~1000 sequences at a time, depending on your browser and computer resources (the computation is done inside your browser, on your computer). If you need to use Nextclade Web, we recommend splitting your data into smaller batches and/or subsampling it.
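
The batching suggested above could be scripted roughly as follows. This is only a minimal sketch and not part of Nextclade: the function names, the `batch_NNNNN.fasta` file naming, and the default of 1000 records per file are illustrative assumptions (1000 simply mirrors the approximate figure quoted above).

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record (header line plus its sequence lines) at a time."""
    record: List[str] = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        # Skip blank lines; make sure every kept line ends with a newline.
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_fasta(input_path: str, output_dir: str, batch_size: int = 1000) -> List[Path]:
    """Write consecutive batches of `batch_size` records to numbered FASTA files.

    The file naming scheme is hypothetical; adjust it to your own layout.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    batch: List[str] = []

    def flush() -> None:
        # Write the current batch to the next numbered file and reset it.
        path = out_dir / f"batch_{len(written) + 1:05d}.fasta"
        path.write_text("".join(batch))
        written.append(path)
        batch.clear()

    with open(input_path) as handle:
        for record in iter_fasta_records(handle):
            batch.append(record)
            if len(batch) == batch_size:
                flush()
    if batch:
        flush()
    return written
```

The input is read line by line, so only one batch is held in memory at a time, which matters for inputs with millions of sequences.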

For large-scale analysis we recommend using Nextclade CLI (command line version; see docs here: https://docs.nextstrain.org/projects/nextclade/en/stable/user/nextclade-cli.html). You can see how we use it internally in:

Feel free to join our discussion forums, where you can discuss your case with other users and with the Nextstrain team: https://discussion.nextstrain.org/

ivan-aksamentov removed the "needs triage" label on Dec 16, 2023
Author

liamxg commented Dec 17, 2023

Dear @ivan-aksamentov,

Thanks.

Could you help me out:

nextclade run \
  --input-dataset data/sars-cov-2 \
  --output-all=output/ \
  data/sars-cov-2/sequences.fasta
Error:
0: --input-dataset: path is invalid. Expected a directory path or a zip archive file path, but got: '"data/sars-cov-2"'

Location:
packages_rs/nextclade-cli/src/cli/nextclade_loop.rs:55

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
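
The error message above says that Nextclade expected a directory or a zip archive at the given path. One way to see what a path actually points to, relative to the directory the command is run from, is a small check like the one below. It is a sketch under assumptions: `describe_dataset_path` is a hypothetical helper, not a Nextclade function, and it does not verify that the directory holds a valid dataset.

```python
from pathlib import Path
import zipfile


def describe_dataset_path(path_str: str) -> str:
    """Return a short description of what `path_str` points to on disk."""
    path = Path(path_str)
    if path.is_dir():
        # List the entries so a wrong or empty directory is easy to spot.
        names = sorted(p.name for p in path.iterdir())
        return f"directory containing: {', '.join(names) or '(nothing)'}"
    if path.is_file():
        if zipfile.is_zipfile(path):
            return "zip archive"
        return "regular file that is not a zip archive"
    # Relative paths are resolved against the current working directory.
    return f"does not exist relative to {Path.cwd()}"


if __name__ == "__main__":
    print(describe_dataset_path("data/sars-cov-2"))
```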

Author

liamxg commented Dec 17, 2023

using more than 5 million sequences.

@ivan-aksamentov
Member

What's inside data/sars-cov-2?

Author

liamxg commented Dec 17, 2023

Dear @ivan-aksamentov,

Please see below:
[image]

@ivan-aksamentov
Member

@liamxg If Nextclade cannot find the dataset files, it probably means you have mixed up your directories. This is not related to Nextclade, so you will have to figure it out yourself, sorry. I'd suggest deleting everything and starting over, paying attention to which paths you are giving to Nextclade and what those paths actually contain. Make sure you read nextclade --help, nextclade dataset get --help and nextclade run --help.
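
To keep the paths consistent between the two subcommands mentioned above, the commands can be assembled from a single set of variables. The sketch below only builds and prints the command lines. The `nextclade run` flags are the ones quoted earlier in this thread; the `--name` and `--output-dir` flags for `nextclade dataset get` are an assumption here and should be confirmed with `nextclade dataset get --help`. The helper names are hypothetical.

```python
import shlex
from typing import List


def dataset_get_command(name: str, dataset_dir: str) -> List[str]:
    # Assumed flags; confirm them with `nextclade dataset get --help`.
    return ["nextclade", "dataset", "get", "--name", name, "--output-dir", dataset_dir]


def run_command(dataset_dir: str, output_dir: str, fasta_paths: List[str]) -> List[str]:
    # Same flags as the `nextclade run` command quoted in this thread.
    return [
        "nextclade", "run",
        "--input-dataset", dataset_dir,
        f"--output-all={output_dir}",
        *fasta_paths,
    ]


if __name__ == "__main__":
    dataset_dir = "data/sars-cov-2"  # one variable, reused by both commands
    commands = [
        dataset_get_command("sars-cov-2", dataset_dir),
        run_command(dataset_dir, "output/", ["data/sars-cov-2/sequences.fasta"]),
    ]
    for cmd in commands:
        # Print a shell-quoted form that can be reviewed before running it.
        print(shlex.join(cmd))
```

Passing the list form to `subprocess.run(cmd, check=True)` would avoid shell quoting issues altogether, since each argument is handed to the program as is.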

Please open a new issue if you have questions or reports related to Nextclade.

Author

liamxg commented Dec 17, 2023

Dear @ivan-aksamentov,
Solved. Thanks.

Author

liamxg commented Dec 23, 2023

Dear @ivan-aksamentov,

Is it possible to run more than 5 million sequences at once using Nextclade CLI?
