Beginners Help with Nextclade CLI #1416
Make sure you read the CLI docs (the "Usage" and "Reference" pages).
Both of your invocations are invalid because you did not provide the --name key. The correct invocation is:
nextclade dataset get --name="nextstrain/sars-cov-2/wuhan-hu-1/orfs" --output-dir="dataset/"
You can also use the shortcut name of this particular dataset:
nextclade dataset get --name="sars-cov-2" --output-dir="dataset/"
These two invocations do the same thing.
Think of arguments as key-value pairs, separated from other arguments by spaces. In --output-dir="dataset/", the key is --output-dir and the value is "dataset/".
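To see exactly what the shell hands over to a program, a tiny helper that prints each argument it receives is enough; no Nextclade installation is needed. This is a minimal bash sketch, and `show_args` is a hypothetical helper, not part of Nextclade. It shows that the failing invocation passes the dataset name as a single token that looks like a key, while the working one passes a `--name=...` pair, and that the shell strips the quotation marks before the program sees them:

```shell
#!/usr/bin/env bash
# show_args: hypothetical helper that prints every argument it receives,
# one per line and wrapped in brackets, exactly as the shell split them.
show_args() {
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Failing form: the dataset name is written as if it were a key.
# Prints: [dataset] [get] [--nextstrain/sars-cov-2/wuhan-hu-1/orfs] [--output-dir] [.]
show_args dataset get --nextstrain/sars-cov-2/wuhan-hu-1/orfs --output-dir .

# Working form: the key --name, then "=", then the value.
# Prints: [dataset] [get] [--name=nextstrain/sars-cov-2/wuhan-hu-1/orfs] [--output-dir=dataset/]
show_args dataset get --name="nextstrain/sars-cov-2/wuhan-hu-1/orfs" --output-dir="dataset/"
```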
Each argument has a specific meaning in the context of the program you are using. In most cases you need both the key and the value. The key is the pre-agreed name of the argument: by looking at the key, the program understands what kind of information you are providing. In the case of nextclade dataset get, the --name key tells it that the value is the name of the dataset to download.
It is usually better to wrap values in quotation marks, especially if they contain spaces.
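The effect of quotation marks can be checked without Nextclade. A minimal bash sketch; `count_args` is a hypothetical helper and `my results/` is just an example directory name containing a space:

```shell
#!/usr/bin/env bash
# count_args: hypothetical helper that reports how many arguments it received.
count_args() {
  echo "$#"
}

# Without quotes, the space splits the value into two separate arguments.
count_args --output-dir=my results/      # prints 2
# With quotes, the value stays a single argument.
count_args --output-dir="my results/"    # prints 1
```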
Some arguments, which turn something on or off, don't need a value, only the key (these are sometimes called "flags"). A good example is the --only-names flag:
nextclade dataset list --only-names
As you see, it has no value after it. It just turns on printing only the names of the datasets, instead of the big table which is printed by default.
There are also so-called "positional" arguments, which have no key, only the value. For example, when you pass FASTA files to nextclade run:
nextclade run --input-dataset="dataset/" --output-dir="results/" "my_input_1.fasta" "my_input_2.fasta"
In this case there are two positional arguments: "my_input_1.fasta" and "my_input_2.fasta". Positional arguments are convenient when you need to pass multiple things into the program.
You can find the available arguments and their meaning in the built-in help screen, by running the program with only the --help flag:
nextclade --help
For Nextclade, you can also read a dedicated help screen for each of the subcommands:
nextclade run --help
nextclade dataset list --help
nextclade dataset get --help
nextclade sort --help
None of this is specific to Nextclade (it is only used here as a relevant example). These are the basics of using command-line programs (also called console or terminal programs, or CLI), and there are plenty of learning materials on this topic on the internet.
Regarding the specifics of Nextclade, I would not download the dataset into the current directory (the "." in your commands), but into a separate directory, such as "dataset/" in the examples above.
There is also another, simpler way to run a Nextclade analysis:
nextclade run --dataset-name="sars-cov-2" --output-dir="results/" "my_input.fasta"
This does not need a separate nextclade dataset get step.
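The three kinds of arguments (key-value pairs, flags, positional) can be illustrated with a deliberately simplified classifier. This is a hypothetical bash sketch, not how Nextclade actually parses its command line: it recognizes only the `--key=value` form, whereas real parsers also accept `--key value` because they know in advance which keys expect a value:

```shell
#!/usr/bin/env bash
# classify_args: hypothetical, simplified sketch of how a CLI program might
# sort its arguments. Only the "--key=value" form is treated as key-value;
# any other "--something" is treated as a flag, everything else as positional.
classify_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --*=*) printf 'key-value:  %s = %s\n' "${arg%%=*}" "${arg#*=}" ;;
      --*)   printf 'flag:       %s\n' "$arg" ;;
      *)     printf 'positional: %s\n' "$arg" ;;
    esac
  done
}

# Two key-value pairs followed by two positional arguments:
classify_args --input-dataset="dataset/" --output-dir="results/" \
  "my_input_1.fasta" "my_input_2.fasta"
# A flag on its own:
classify_args --only-names
```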
I think either should work. But I usually use the All verbosity levels are listed under
This is a convention for denoting variables (placeholders). The I think the convention originally comes from
I would not recommend the JSON and NDJSON outputs, because they are unstable, meaning the format can change without notice. This is mentioned in the docs. You probably want the TSV output (--output-tsv).
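Since the TSV output is a plain tab-separated table, it can also be searched directly from bash. A minimal sketch using awk; the function names are hypothetical, and it assumes the TSV header contains `seqName` and `clade` columns. Columns are looked up by header name rather than by position, so the sketch does not depend on column order:

```shell
#!/usr/bin/env bash
# tsv_column: print one column of a tab-separated file, selected by header name.
# Usage: tsv_column <COLUMN_NAME> <FILE>
tsv_column() {
  awk -F'\t' -v col="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      next
    }
    c { print $c }
  ' "$2"
}

# rows_with_clade: print the seqName of every row whose clade equals the
# given value. Assumes columns named "seqName" and "clade" exist.
# Usage: rows_with_clade <CLADE> <FILE>
rows_with_clade() {
  awk -F'\t' -v want="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "seqName") s = i
        if ($i == "clade") c = i
      }
      next
    }
    s && c && $c == want { print $s }
  ' "$2"
}

# Typical use (the path is an assumption; use whatever file --output-tsv wrote):
#   tsv_column clade results/nextclade.tsv | sort | uniq -c
#   rows_with_clade 21K results/nextclade.tsv
```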
Thank you. I have used the Nextclade Web export files a lot, so I'm familiar with those. I'm trying to get NDJSON files because I want to be able to search them using Julia, which I (half) learned and was ready to start using before I realized there was something called bash that I really should've learned before I ever even tried Julia, because it's impossible to do anything without bash.
Is there a way to get the GISAID accession numbers from Nextclade? I'm doing a search and the only results I get are the sequence names, which I then have to paste one at a time into the GISAID text search in order to find and download the FASTA files. I'd like to paste all the EPI_ISL numbers at once so I can download them easily, but I don't see them anywhere in the TSV or NDJSON file, and I'm not sure where else they would be.
Not sure what you mean here. Nextclade does not deal with GISAID: it does not know what an accession is, or even that GISAID exists. We don't rely on any database. The only source of data is the input files users provide: input FASTA files and dataset files. Sequence names are taken from your input FASTA file and presented in the output files as-is, so if your FASTA file does not contain accessions, you will not get them from Nextclade. It's your responsibility to set the names in your input FASTA such that you get the desired names in the output TSV. Or do you mean something else?
By the way, sequence names are not guaranteed to be unique: scientists often don't pay much attention to naming the sequences they produce, and it's a bit of a chaos. So it's not always possible to identify the exact sequence from its name alone.
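Both points can be checked directly on the input FASTA file. A minimal bash sketch with hypothetical function names: the first lists sequence names that occur more than once, and the second pulls `EPI_ISL_...` identifiers out of the header lines. The second one only works if your headers already contain them, for example if the FASTA was downloaded with the accession included in the header:

```shell
#!/usr/bin/env bash
# fasta_duplicate_names: print every sequence name (header line without ">")
# that occurs more than once in the given FASTA file.
fasta_duplicate_names() {
  grep '^>' "$1" | sed 's/^>//' | sort | uniq -d
}

# fasta_epi_isl: print EPI_ISL_<digits> identifiers found in the FASTA
# headers, one per line. Prints nothing if the headers do not contain them.
fasta_epi_isl() {
  grep '^>' "$1" | grep -o 'EPI_ISL_[0-9][0-9]*'
}

# Typical use (file name is an example):
#   fasta_duplicate_names my_input.fasta
#   fasta_epi_isl my_input.fasta | paste -sd, -   # all IDs on one comma-separated line
```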
I'm trying to figure out how to use Nextclade CLI. I follow all the directions, but nothing ever seems to work. For example, after I tried to run a FASTA file several times, the error messages indicated I needed an input dataset. So I followed the directions to get one, but nothing works. Basically, it says to use:
nextclade dataset get [OPTIONS] --name <--output-dir <OUTPUT_DIR>|--output-zip <OUTPUT_ZIP>>
I've tried both
nextclade dataset get --nextstrain/sars-cov-2/wuhan-hu-1/orfs --output-dir .
nextclade dataset get --SARS-CoV-2 --output-dir .
Neither of them works. It says that both nextstrain/sars-cov-2/wuhan-hu-1/orfs and SARS-CoV-2 are "unexpected arguments", even though these are the exact names listed in the input dataset list.
I have no idea what I'm doing wrong.