Release 0.7.2
farchaab committed Apr 26, 2024
2 parents: add93c7 + bcbab70, commit da5f120
Showing 5 changed files with 55 additions and 16 deletions.
assembly_finder/assembly_finder.VERSION (2 changes: 1 addition & 1 deletion)
@@ -1 +1 @@
-0.7.1
+0.7.2
assembly_finder/workflow/envs/taxonkit.yml (2 changes: 1 addition & 1 deletion)
@@ -1,4 +1,4 @@
-name: ncbi-datasets-cli
+name: taxonkit
 channels:
 - conda-forge
 - bioconda
assembly_finder/workflow/rules/download.smk (8 changes: 5 additions & 3 deletions)
@@ -1,6 +1,6 @@
 rule download_taxdump:
     output:
-        temp(os.path.join(TAXONKIT, "taxdump.tar.gz")),
+        os.path.join(TAXONKIT, "taxdump.tar.gz"),
     log:
         os.path.join(dir.out.logs, "curl.log"),
     conda:
@@ -200,7 +200,7 @@ rule unzip_archive:
     input:
         os.path.join(dir.out.base, "archive.zip"),
     output:
-        temp(directory(os.path.join(dir.out.base, "archive"))),
+        directory(os.path.join(dir.out.base, "archive")),
     log:
         os.path.join(dir.out.logs, "unzip.log"),
     conda:
@@ -283,8 +283,9 @@ rule add_genome_paths:
         df.to_csv(output[0], sep="\t", index=None)


-rule cleanup_reports:
+rule cleanup_files:
     input:
+        os.path.join(dir.out.base, "archive"),
         os.path.join(dir.out.base, "assembly_summary.tsv"),
         os.path.join(dir.out.base, "sequence_report.tsv"),
         os.path.join(dir.out.base, "taxonomy.tsv"),
@@ -296,6 +297,7 @@ rule cleanup_reports:
         os.path.join(dir.env, "utils.yml")
     shell:
         """
+        rm -rf {input[0]}
         find {params[0]} -name "*.json*" -print0 | xargs -0 rm
         touch {output}
         """
docs/examples.md (31 changes: 27 additions & 4 deletions)
@@ -1,20 +1,43 @@
## Small datasets

### Staphylococcus aureus reference genome

```sh
assembly_finder -i staphylococcus_aureus --source refseq -nb 1
```

### Download from a list of taxons

```sh
assembly_finder -i 1290,1813735,114185 -o test -nb 1
```

## Big datasets

!!! warning

These examples are for big datasets downloads, so using an NCBI api-key is highly recommended

## Best ranking genome for each bacteria species
### Download all chlamydia genomes

```sh
assembly_finder -i chlamydia --api-key <api-key>
```

### Best ranking genome for each bacteria species

```sh
assembly_finder -i bacteria --api-key <api-key> --rank species --nrank 1
```

## Complete RefSeq bacteria viruses and archaea (excluding metagenomes and atypical genomes)
### Complete RefSeq bacteria viruses and archaea <small>(excluding MAGs and atypical)</small>

```sh
assembly_finder -i bacteria,viruses,archaea -o outdir --api-key <api-key> --source refseq --assembly-level complete --mag exclude --atypical
```

## Download sequences from a specific bioproject
### Specific bioproject

```sh
assembly_finder -i PRJNA289059 --accession
assembly_finder -i PRJNA289059 --api-key <api-key> --accession
```
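The big-dataset examples pass the key inline as `<api-key>`. If you drive assembly_finder from a Python script, a minimal sketch like the following builds the complete RefSeq example as an argument list, using only the flags shown in that example, and falls back to an environment variable for the key. The helper name and the variable name `NCBI_API_KEY` are assumptions, not part of assembly_finder.

```python
import os


def refseq_complete_command(taxa, outdir, api_key=None):
    """Hypothetical helper: argv for the 'Complete RefSeq' example."""
    if api_key is None:
        # Assumed variable name; avoids typing the key on the command line.
        api_key = os.environ["NCBI_API_KEY"]
    return [
        "assembly_finder",
        "-i", ",".join(taxa),  # taxa are comma-separated, as in the examples
        "-o", outdir,
        "--api-key", api_key,
        "--source", "refseq",
        "--assembly-level", "complete",
        "--mag", "exclude",
        "--atypical",  # copied as-is from the example command
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than one string avoids shell quoting issues.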
docs/inputs.md (28 changes: 21 additions & 7 deletions)
@@ -1,31 +1,45 @@
# Inputs

Input can be either a string or a table, and queries can be either taxa or accession as shown in [NCBI datasets docs](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install).

## Strings

=== "Taxons"

!!! note

Taxons can be either taxids or taxon names

``` sh
```sh
assembly_finder -i 1290,staphylococcus_aureus,562 -nb 1 -o taxons
```

=== "Accessions"
``` sh

```sh
assembly_finder --accession -i GCF_003812505.1,GCF_000418345.1,GCF_000157115.2 -o accessions
```
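The note states that a taxon query may be a taxid or a taxon name. A minimal sketch of how a comma-separated `-i` value could be split and labelled, assuming all-digit entries are taxids; `split_queries` is a hypothetical helper, not assembly_finder's own parser.

```python
def split_queries(value: str, accession: bool = False) -> list[tuple[str, str]]:
    """Split a comma-separated -i value and label each query (hypothetical)."""
    queries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas such as "1290,,562"
        if accession:
            kind = "accession"  # --accession: every entry is an accession
        elif item.isdigit():
            kind = "taxid"  # assumption: purely numeric means a taxid
        else:
            kind = "taxon name"
        queries.append((item, kind))
    return queries
```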

## Tables

=== "Taxons"
| taxon | nb |

!!! note
You can set the number of genomes per taxon in the table

| taxon | nb |
| :-------------------- | :-- |
| 1290 | 1 |
| staphylococcus_aureus | 1 |
| 562 | 1 |
| 1290 | 1 |
| staphylococcus_aureus | 1 |
| 562 | 1 |

=== "Accessions"

!!! note

The accession table does not have a header

| GCF_003812505.1 |
| :-------------- |
| GCF_000418345.1 |
| GCF_000157115.2 |
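The two tables differ in shape: the taxon table carries a `taxon`/`nb` header, while the accession table is a single headerless column. A minimal sketch of reading each, assuming tab-separated plain-text files (the delimiter is not stated in these docs) and using hypothetical helper names:

```python
import csv
import io


def read_taxon_table(text: str) -> dict[str, int]:
    """Map each taxon to its genome count; assumes a tab-separated 'taxon'/'nb' header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["taxon"]: int(row["nb"]) for row in reader}


def read_accession_table(text: str) -> list[str]:
    """One accession per line, no header; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Either helper can be fed a file's contents, e.g. `read_taxon_table(Path("taxa.tsv").read_text())`.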
