Commit

add a new tutorial
shenwei356 committed Nov 24, 2023
1 parent 89d1e6c commit 9e5f985
Showing 6 changed files with 98 additions and 19 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -64,7 +64,7 @@ Subcommand |F
[`list`](https://bioinf.shenwei.me/taxonkit/usage/#list) |List taxonomic subtrees (TaxIds) below given TaxIds
[`lineage`](https://bioinf.shenwei.me/taxonkit/usage/#lineage) |Query taxonomic lineage of given TaxIds
[`reformat`](https://bioinf.shenwei.me/taxonkit/usage/#reformat) |Reformat lineage in canonical ranks
[`name2taxid`](https://bioinf.shenwei.me/taxonkit/usage/#name2taxid) |Convert scientific names to TaxIds
[`name2taxid`](https://bioinf.shenwei.me/taxonkit/usage/#name2taxid) |Convert taxon names to TaxIds
[`filter`](https://bioinf.shenwei.me/taxonkit/usage/#filter) |Filter TaxIds by taxonomic rank range
[`lca`](https://bioinf.shenwei.me/taxonkit/usage/#lca) |Compute lowest common ancestor (LCA) for TaxIds
[`taxid-changelog`](https://bioinf.shenwei.me/taxonkit/usage/#taxid-changelog)|Create TaxId changelog from dump archives
26 changes: 15 additions & 11 deletions doc/docs/download.md
@@ -6,12 +6,10 @@

## Current Version

- [TaxonKit v0.15.0](https://github.com/shenwei356/taxonkit/releases/tag/v0.15.0)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.15.0/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.15.0)
- `taxonkit reformat`:
- For lineages with more than one node, if it fails to query TaxId with the parent-child pair, use the last child only. [#82](https://github.com/shenwei356/taxonkit/issues/82)
- The flag `-T/--trim` also does not add the prefix for missing ranks lower than the current rank. [#82](https://github.com/shenwei356/taxonkit/issues/82)
- New flag `-s/--miss-rank-repl-suffix` to set the suffix for estimated taxon names. [#85](https://github.com/shenwei356/taxonkit/issues/85)
- [TaxonKit v0.15.1](https://github.com/shenwei356/taxonkit/releases/tag/v0.15.1)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.15.1/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.15.1)
- `taxonkit name2taxid`:
- remove the restriction of name types. [#87](https://github.com/shenwei356/taxonkit/issues/87)

### Please cite

@@ -28,11 +26,11 @@

OS |Arch |File, 中国镜像 |Download Count
:------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Linux |**64-bit**|[**taxonkit_linux_amd64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_linux_amd64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_linux_amd64.tar.gz)
Linux |**arm64** |[**taxonkit_linux_arm64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_linux_arm64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_linux_arm64.tar.gz)
macOS |**64-bit**|[**taxonkit_darwin_amd64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_darwin_amd64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_darwin_amd64.tar.gz)
macOS |**arm64** |[**taxonkit_darwin_arm64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_darwin_arm64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_darwin_arm64.tar.gz)
Windows|**64-bit**|[**taxonkit_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_windows_amd64.exe.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.0/taxonkit_windows_amd64.exe.tar.gz)
Linux |**64-bit**|[**taxonkit_linux_amd64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_linux_amd64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_linux_amd64.tar.gz)
Linux |**arm64** |[**taxonkit_linux_arm64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_linux_arm64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_linux_arm64.tar.gz)
macOS |**64-bit**|[**taxonkit_darwin_amd64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_darwin_amd64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_darwin_amd64.tar.gz)
macOS |**arm64** |[**taxonkit_darwin_arm64.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_darwin_arm64.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_darwin_arm64.tar.gz)
Windows|**64-bit**|[**taxonkit_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_windows_amd64.exe.tar.gz),<br/> [中国镜像](http://app.shenwei.me/data/taxonkit/taxonkit_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.15.1/taxonkit_windows_amd64.exe.tar.gz)

## Installation

@@ -153,6 +151,12 @@ All-in-one command:

## Release history

- [TaxonKit v0.15.0](https://github.com/shenwei356/taxonkit/releases/tag/v0.15.0)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.15.0/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.15.0)
- `taxonkit reformat`:
- For lineages with more than one node, if it fails to query TaxId with the parent-child pair, use the last child only. [#82](https://github.com/shenwei356/taxonkit/issues/82)
- The flag `-T/--trim` also does not add the prefix for missing ranks lower than the current rank. [#82](https://github.com/shenwei356/taxonkit/issues/82)
- New flag `-s/--miss-rank-repl-suffix` to set the suffix for estimated taxon names. [#85](https://github.com/shenwei356/taxonkit/issues/85)
- [TaxonKit v0.14.2](https://github.com/shenwei356/taxonkit/releases/tag/v0.14.2)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.14.2/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.14.2)
- `taxonkit filter`:
65 changes: 65 additions & 0 deletions doc/docs/tutorial.md
@@ -182,6 +182,71 @@ where rank of the closest higher node is still lower than rank cutoff**.
species Severe acute respiratory syndrome-related coronavirus
strain Severe acute respiratory syndrome coronavirus 2

## Mapping old species names to new ones

Some species names in papers or on websites might have changed. We can try querying their TaxIds via the old names
and then retrieve the new ones.

cat example/changed_species_names.txt
Lactobacillus fermentum
Mycoplasma gallinaceum

# TaxonKit >= v0.15.1
cat example/changed_species_names.txt \
| taxonkit name2taxid \
| taxonkit lineage -i 2 -n \
| cut -f 1,4

Lactobacillus fermentum Limosilactobacillus fermentum
Mycoplasma gallinaceum

Oops, there is no information for `Mycoplasma gallinaceum`.
So we check the [taxid-changelog](https://github.com/shenwei356/taxid-changelog).

zcat taxonkit/taxid-changelog.csv.gz \
| csvtk grep -f name -P example/changed_species_names.txt \
| csvtk cut -f taxid,version,change,name,rank \
| csvtk pretty

taxid version change name rank
----- ---------- -------------- ----------------------- -------
1613 2013-02-21 NEW Lactobacillus fermentum species
1613 2016-03-01 ABSORB Lactobacillus fermentum species
1613 2016-03-01 CHANGE_LIN_LEN Lactobacillus fermentum species
29556 2013-02-21 NEW Mycoplasma gallinaceum species
29556 2016-03-01 CHANGE_LIN_LEN Mycoplasma gallinaceum species
29556 2021-01-01 CHANGE_NAME Mycoplasma gallinaceum species
29556 2021-01-01 CHANGE_LIN_LIN Mycoplasma gallinaceum species

We can see that the names have changed. The full change history can be queried with the TaxId, e.g.,

taxid version change change-value name rank
----- ---------- -------------- ------------ ------------------------- -------
29556 2013-02-21 NEW Mycoplasma gallinaceum species
29556 2016-03-01 CHANGE_LIN_LEN Mycoplasma gallinaceum species
29556 2020-09-01 CHANGE_NAME Mycoplasmopsis gallinacea species
29556 2020-09-01 CHANGE_LIN_TAX Mycoplasmopsis gallinacea species
29556 2021-01-01 CHANGE_NAME Mycoplasma gallinaceum species
29556 2021-01-01 CHANGE_LIN_LIN Mycoplasma gallinaceum species
29556 2021-09-01 CHANGE_NAME Mycoplasmopsis gallinacea species
29556 2021-09-01 CHANGE_LIN_LIN Mycoplasmopsis gallinacea species
29556 2023-03-01 CHANGE_LIN_LIN Mycoplasmopsis gallinacea species
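
A change history like this can be reduced to the most recent name per TaxId. The following is a minimal sketch, not part of TaxonKit or taxid-changelog: `latest_names` is a hypothetical helper, and it assumes a simplified CSV with only the columns `taxid,version,change,name,rank`, no commas inside fields, and ISO dates in `version` so that plain string comparison orders them correctly. The real changelog file has more columns, so the field numbers would need adjusting.

```shell
# Hypothetical helper: read a simplified changelog CSV on stdin and print
# "taxid<TAB>name" for the newest version seen for each TaxId.
# Assumes columns taxid,version,change,name,rank and no commas inside fields.
latest_names() {
    awk -F',' '
        # Skip the header; keep the name from the row with the newest version.
        NR > 1 && $2 >= ver[$1] { ver[$1] = $2; name[$1] = $4 }
        END { for (t in name) print t "\t" name[t] }
    ' | sort -n
}

# Sample rows taken from the table above (simplified to five columns).
latest_names <<'EOF'
taxid,version,change,name,rank
29556,2013-02-21,NEW,Mycoplasma gallinaceum,species
29556,2020-09-01,CHANGE_NAME,Mycoplasmopsis gallinacea,species
29556,2021-01-01,CHANGE_NAME,Mycoplasma gallinaceum,species
29556,2021-09-01,CHANGE_NAME,Mycoplasmopsis gallinacea,species
EOF
```

For files with quoted fields or embedded commas, a CSV-aware tool such as csvtk is the safer choice than plain `awk`.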


Then we just use their TaxIds to retrieve the new names. **The final commands are**:

zcat taxonkit/taxid-changelog.csv.gz \
| csvtk grep -f name -P example/changed_species_names.txt \
| csvtk uniq -f taxid \
| csvtk cut -f name,taxid \
| csvtk del-header \
| csvtk csv2tab \
| taxonkit lineage -i 2 -n \
| cut -f 1,4

Lactobacillus fermentum Limosilactobacillus fermentum
Mycoplasma gallinaceum Mycoplasmopsis gallinacea
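
To make the intermediate data explicit, here is a rough sketch of what the `csvtk` stages of the pipeline above hand to `taxonkit lineage -i 2 -n`: one line per TaxId, with the queried old name in column 1 and the TaxId in column 2. It emulates those stages with plain `awk` under stated assumptions: `names_to_taxids` is a hypothetical helper, the input is a simplified CSV with the columns `taxid,version,change,name,rank` and no commas inside fields, names are matched as exact whole-field strings, and the first record per TaxId is the one kept.

```shell
# Hypothetical helper: $1 is a file with one name per line; a simplified
# changelog CSV (taxid,version,change,name,rank) is read from stdin.
# Prints "name<TAB>taxid" once per TaxId, in input order.
names_to_taxids() {
    awk -F',' '
        # First input (the names file): remember every wanted name.
        NR == FNR { want[$0] = 1; next }
        # Second input (stdin): skip the header, keep the first hit per TaxId.
        FNR > 1 && ($4 in want) && !seen[$1]++ { print $4 "\t" $1 }
    ' "$1" -
}

names=$(mktemp)
printf 'Lactobacillus fermentum\nMycoplasma gallinaceum\n' > "$names"

names_to_taxids "$names" <<'EOF'
taxid,version,change,name,rank
1613,2013-02-21,NEW,Lactobacillus fermentum,species
1613,2016-03-01,ABSORB,Lactobacillus fermentum,species
29556,2013-02-21,NEW,Mycoplasma gallinaceum,species
29556,2020-09-01,CHANGE_NAME,Mycoplasmopsis gallinacea,species
EOF

rm -f "$names"
```

With the sample rows, this prints `Lactobacillus fermentum` with `1613` and `Mycoplasma gallinaceum` with `29556`, tab-separated. Keeping the old name in column 1 is what lets the final `cut -f 1,4` pair each old name with its current one.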

## Add taxonomy information to BLAST result

A BLAST result file `blast_result.txt`, where the second column is the accession of matched sequences.
16 changes: 12 additions & 4 deletions doc/docs/usage.md
@@ -42,7 +42,7 @@ All-in-one command:
```text
TaxonKit - A Practical and Efficient NCBI Taxonomy Toolkit
Version: 0.14.2
Version: 0.15.1
Author: Wei Shen <shenwei356@gmail.com>
@@ -75,7 +75,7 @@ Available Commands:
lca Compute lowest common ancestor (LCA) for TaxIds
lineage Query taxonomic lineage of given TaxIds
list List taxonomic subtrees of given TaxIds
name2taxid Convert scientific names to TaxIds
name2taxid Convert taxon names to TaxIds
profile2cami Convert metagenomic profile table to CAMI format
reformat Reformat lineage in canonical ranks
taxid-changelog Create TaxId changelog from dump archives
@@ -90,6 +90,8 @@ Flags:
-j, --threads int number of CPUs. 4 is enough (default 4)
--verbose print verbose information
Use "taxonkit [command] --help" for more information about a command.
```

## list
@@ -999,11 +1001,11 @@ Examples:
Usage

```text
Convert scientific names to TaxIds
Convert taxon names to TaxIds
Attention:
1. Some TaxIds share the same scientific names, e.g, Drosophila.
1. Some TaxIds share the same names, e.g, Drosophila.
These input lines are duplicated with multiple TaxIds.
$ echo Drosophila | taxonkit name2taxid | taxonkit lineage -i 2 -r -L
@@ -1069,6 +1071,12 @@ Example data
uncultured murine large bowel bacterium BAC 54B 314101 cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
Croceibacter phage P2559Y 1327037 Viruses;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y

1. Convert old names to new names.

$ echo Lactobacillus fermentum | taxonkit name2taxid | taxonkit lineage -i 2 -n | cut -f 1,2,4
Lactobacillus fermentum 1613 Limosilactobacillus fermentum


1. **Some TaxIds share the same scientific names**, e.g, Drosophila.

$ echo Drosophila \
2 changes: 2 additions & 0 deletions example/changed_species_names.txt
@@ -0,0 +1,2 @@
Lactobacillus fermentum
Mycoplasma gallinaceum
6 changes: 3 additions & 3 deletions taxonkit/cmd/name2taxid.go
@@ -33,12 +33,12 @@ import (
// name2taxidCmd represents the name2taxid command
var name2taxidCmd = &cobra.Command{
Use: "name2taxid",
Short: "Convert scientific names to TaxIds",
Long: `Convert scientific names to TaxIds
Short: "Convert taxon names to TaxIds",
Long: `Convert taxon names to TaxIds
Attention:
1. Some TaxIds share the same scientific names, e.g, Drosophila.
1. Some TaxIds share the same names, e.g, Drosophila.
These input lines are duplicated with multiple TaxIds.
$ echo Drosophila | taxonkit name2taxid | taxonkit lineage -i 2 -r -L