
Single python script to construct lineages from ncbi taxdump files names.dmp and nodes.dmp.


hbckleikamp/NCBI2Lineage


NCBI2Lineage

Converting NCBI taxids into homogeneous lineages can be tricky, because a lineage may have gaps (missing ranks) and may contain several "no rank" taxids.
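To illustrate the problem, here is a minimal sketch that walks a taxon up to the root by following parent taxids, the way they are stored in nodes.dmp. The tiny tree, its taxids and the function raw_lineage are hypothetical and only for illustration; they are not part of the script. The raw path contains several "no rank" nodes and skips the class rank entirely, so raw lineages of different taxa do not line up column by column.

```python
# Hypothetical toy taxonomy: taxid -> (parent taxid, rank).
# The root points to itself, as taxid 1 does in NCBI's nodes.dmp.
TOY_NODES = {
    1: (1, "no rank"),
    10: (1, "no rank"),        # an unranked grouping node
    20: (10, "superkingdom"),
    30: (20, "phylum"),
    40: (30, "no rank"),       # an unranked clade between phylum and order
    50: (40, "order"),         # note: there is no "class" node on this path
    60: (50, "family"),
    70: (60, "genus"),
    80: (70, "species"),
}


def raw_lineage(taxid, nodes):
    """Walk from taxid up to the root; return (taxid, rank) pairs, root first."""
    path = []
    while True:
        parent, rank = nodes[taxid]
        path.append((taxid, rank))
        if parent == taxid:  # reached the root
            break
        taxid = parent
    return path[::-1]


lineage = raw_lineage(80, TOY_NODES)
ranks = [rank for _, rank in lineage]
print(ranks)
# ['no rank', 'no rank', 'superkingdom', 'phylum', 'no rank', 'order', 'family', 'genus', 'species']
```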


This Python script constructs homogeneous lineages from the NCBI taxdump files names.dmp and nodes.dmp.
The script has been tested in Python 3.9 and run in Spyder 5.1.5.
The required inputs are the full file paths to nodes.dmp and names.dmp, which can be downloaded from the NCBI taxdump FTP site.
Additionally, the user supplies the ranks that each lineage should contain.
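Given the desired ranks, a homogeneous lineage can be built by walking up from the taxon and keeping only ancestors whose rank is in that list, leaving an empty slot for every rank that is missing. The sketch below assumes that gaps are left as empty strings and uses seven common ranks; both are assumptions, and the script itself may fill gaps or choose ranks differently. The toy tree, names and the function normalized_lineage are hypothetical.

```python
# Hypothetical toy taxonomy: taxid -> (parent taxid, rank); the root points to itself.
TOY_NODES = {
    1: (1, "no rank"),
    10: (1, "no rank"),
    20: (10, "superkingdom"),
    30: (20, "phylum"),
    40: (30, "no rank"),
    50: (40, "order"),
    60: (50, "family"),
    70: (60, "genus"),
    80: (70, "species"),
}
# Hypothetical scientific names for the toy taxids.
TOY_NAMES = {
    1: "root", 10: "Group Z", 20: "Superkingdom A", 30: "Phylum B",
    40: "Clade C", 50: "Order D", 60: "Family E", 70: "Genus F", 80: "Species G",
}
# Assumed list of requested ranks; in the script the user chooses these.
DESIRED_RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def normalized_lineage(taxid, nodes, names, ranks):
    """Return one name per requested rank, in the order of ranks; "" where the rank is absent."""
    wanted = set(ranks)
    found = {}
    while True:
        parent, rank = nodes[taxid]
        # Keep only requested ranks; "no rank" and other unrequested nodes are skipped.
        if rank in wanted and rank not in found:
            found[rank] = names.get(taxid, "")
        if parent == taxid:  # reached the root
            break
        taxid = parent
    return [found.get(rank, "") for rank in ranks]


lineage_80 = normalized_lineage(80, TOY_NODES, TOY_NAMES, DESIRED_RANKS)
lineage_50 = normalized_lineage(50, TOY_NODES, TOY_NAMES, DESIRED_RANKS)
print(lineage_80)  # ['Superkingdom A', 'Phylum B', '', 'Order D', 'Family E', 'Genus F', 'Species G']
print(lineage_50)  # ['Superkingdom A', 'Phylum B', '', 'Order D', '', '', '']
```

Because every lineage has exactly one slot per requested rank, lineages of a species and of an order line up column by column even though their raw paths differ in length.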
The script outputs two TSV files, with one lineage per row for each taxon in NCBI.
The taxid of each taxon is stored in the idx column, so the output can be used to convert NCBI taxids into rank-normalized lineages.
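Once every taxid has a normalized lineage, it can be written as a TSV table with the taxid in the idx column and read back for lookups. This sketch uses only Python's csv module; the rank column names, the single-table layout, the sample row and the helper names write_lineage_tsv and load_lineage_table are assumptions for illustration and need not match the script's actual output files.

```python
import csv
import io

# Assumed rank columns following the idx column.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def write_lineage_tsv(lineages, ranks, handle):
    """Write one row per taxid: the idx column, then one column per requested rank."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["idx"] + ranks)
    for taxid, lineage in lineages.items():
        writer.writerow([taxid] + lineage)


def load_lineage_table(handle):
    """Read such a TSV back into {taxid: {rank: name}} for fast taxid lookups."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        taxid = int(row.pop("idx"))
        table[taxid] = row
    return table


# Demo with one hypothetical lineage; an in-memory buffer stands in for a file.
buffer = io.StringIO()
write_lineage_tsv(
    {80: ["Superkingdom A", "Phylum B", "", "Order D", "Family E", "Genus F", "Species G"]},
    RANKS,
    buffer,
)
text = buffer.getvalue()
buffer.seek(0)
table = load_lineage_table(buffer)
print(table[80]["order"])        # Order D
print(repr(table[80]["class"]))  # ''
```

If pandas is available, a file in this layout could instead be loaded with pandas.read_csv(path, sep="\t", index_col="idx") and indexed by taxid with .loc.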
