# orthomap: Step 1 - get taxonomic information

This notebook will demonstrate how to get taxonomic information for your query species with `orthomap`.

Given a species name or taxonomic ID, the lineage of the query species is extracted with the help of the `ete3` Python toolkit and the NCBI taxonomy ([Huerta-Cepas et al., 2016](https://doi.org/10.1093/molbev/msw046)). This information is needed together with the taxonomic classification of every species used in the OrthoFinder comparison.

__Note:__ To download or update the NCBI taxonomy database used by the `ete3` Python package, use the `orthomap` command line function [ncbitax](https://orthomap.readthedocs.io/en/latest/tutorials/commandline.ncbitax.html) or trigger the update from Python.
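
A minimal sketch of a Python-side update follows. It assumes only the `ete3` API (`NCBITaxa` and its `update_taxonomy_database()` method), not an `orthomap` helper, and the wrapper name `update_ncbi_taxonomy` is hypothetical. Calling it downloads the NCBI taxonomy dump, so it needs network access and may take several minutes.

```python
def update_ncbi_taxonomy():
    """Download the current NCBI taxonomy dump and rebuild ete3's local database."""
    # Imported lazily so that defining this helper does not itself require ete3.
    from ete3 import NCBITaxa

    # Instantiating NCBITaxa sets up the local database on first use.
    ncbi = NCBITaxa()
    # Fetch the latest taxdump from NCBI and rebuild the local database.
    ncbi.update_taxonomy_database()
    return ncbi


# Uncomment to run the update (requires network access):
# update_ncbi_taxonomy()
```

The call is left commented out so that re-running the notebook does not trigger a fresh download every time.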

## Notebook file

The notebook file can be obtained here:

[https://raw.githubusercontent.com/kullrich/orthomap/main/docs/notebooks/query_lineage.ipynb](https://raw.githubusercontent.com/kullrich/orthomap/main/docs/notebooks/query_lineage.ipynb)

## Import libraries

In [1]:
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt
from statannot import add_stat_annotation
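# render figures inline in the notebook
%matplotlib inline
# optionally increase the figure resolution
#plt.rcParams['figure.dpi'] = 300
#plt.rcParams['savefig.dpi'] = 300
# default figure size in inches (width, height)
plt.rcParams['figure.figsize'] = [6, 4.5]
#plt.rcParams['figure.figsize'] = [4.4, 3.3]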

## Import orthomap python package submodules

In [2]:
# import submodules
from orthomap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets

## Get query species taxonomic lineage information

The `orthomap` submodule `qlin` retrieves the taxonomic information of a query species with the `qlin.get_qlin()` function, which accepts a species name as follows:

In [3]:
# get query species taxonomic lineage information
query_lineage = qlin.get_qlin(q='Caenorhabditis elegans')

query name: Caenorhabditis elegans
query taxID: 6239
query kingdom: Eukaryota
query lineage names: 
['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Bilateria(33213)', 'Protostomia(33317)', 'Ecdysozoa(1206794)', 'Nematoda(6231)', 'Chromadorea(119089)', 'Rhabditida(6236)', 'Rhabditina(2301116)', 'Rhabditomorpha(2301119)', 'Rhabditoidea(55879)', 'Rhabditidae(6243)', 'Peloderinae(55885)', 'Caenorhabditis(6237)', 'Caenorhabditis elegans(6239)']
query lineage: 
[1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231, 119089, 6236, 2301116, 2301119, 55879, 6243, 55885, 6237, 6239]
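
To make the printed fields easier to relate to the underlying taxonomy, here is a hedged sketch of how a comparable lookup could be done with `ete3` directly. It is not necessarily how `orthomap` implements `get_qlin()`. The helper names `format_lineage` and `lookup_lineage` are hypothetical, and `lookup_lineage` assumes a local NCBI taxonomy database is already installed.

```python
def format_lineage(lineage, taxid2name):
    """Return 'name(taxID)' labels, ordered from the root to the query species."""
    return [f"{taxid2name[taxid]}({taxid})" for taxid in lineage]


def lookup_lineage(species_name):
    """Resolve a species name to (taxID, lineage, labels) using ete3."""
    # Lazy import: only needed when the lookup is actually run.
    from ete3 import NCBITaxa

    ncbi = NCBITaxa()
    # get_name_translator maps each name to a list of matching taxIDs.
    taxid = ncbi.get_name_translator([species_name])[species_name][0]
    # get_lineage returns taxIDs ordered from the root down to the query.
    lineage = ncbi.get_lineage(taxid)
    # get_taxid_translator maps taxIDs back to scientific names.
    taxid2name = ncbi.get_taxid_translator(lineage)
    return taxid, lineage, format_lineage(lineage, taxid2name)


# Offline check of the formatter, using the last three ranks printed above.
tail = [55885, 6237, 6239]
names = {55885: "Peloderinae", 6237: "Caenorhabditis", 6239: "Caenorhabditis elegans"}
print(format_lineage(tail, names))
# ['Peloderinae(55885)', 'Caenorhabditis(6237)', 'Caenorhabditis elegans(6239)']
```

With the database in place, `lookup_lineage('Caenorhabditis elegans')` should resolve the same taxID as shown above, although the exact lineage depends on the taxonomy release installed locally.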


## Get query species lineage as a tree object

In [4]:
lineage_tree = qlin.get_lineage_topo(qt='6239')
print(lineage_tree)


                                                      /- /-18/6239/Caenorhabditis elegans
                                                   /-|
                                                /-|   \-17/6237/Caenorhabditis
                                               |  |
                                             /-|   \-16/55885/Peloderinae
                                            |  |
                                          /-|   \-15/6243/Rhabditidae
                                         |  |
                                       /-|   \-14/55879/Rhabditoidea
                                      |  |
                                    /-|   \-13/2301119/Rhabditomorpha
                                   |  |
                                 /-|   \-12/2301116/Rhabditina
                                |  |
                              /-|   \-11/6236/Rhabditida
                             |  |
                           /-|   \-10/119089/Chromadorea
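
The leaf labels in the printed tree appear to follow the pattern `index/taxID/name`, where the index is the zero-based position of the rank in the query lineage (the root is 0, the query species is 18). The sketch below rebuilds those labels in plain Python from the lists printed by `qlin.get_qlin()`. The helper `lineage_labels` is hypothetical and not part of `orthomap`.

```python
def lineage_labels(lineage, names):
    """Build 'index/taxID/name' labels from parallel lineage lists."""
    if len(lineage) != len(names):
        raise ValueError("lineage and names must have the same length")
    return [f"{i}/{taxid}/{name}"
            for i, (taxid, name) in enumerate(zip(lineage, names))]


# Query lineage of Caenorhabditis elegans, as printed by qlin.get_qlin().
lineage = [1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231,
           119089, 6236, 2301116, 2301119, 55879, 6243, 55885, 6237, 6239]
names = ['root', 'cellular organisms', 'Eukaryota', 'Opisthokonta', 'Metazoa',
         'Eumetazoa', 'Bilateria', 'Protostomia', 'Ecdysozoa', 'Nematoda',
         'Chromadorea', 'Rhabditida', 'Rhabditina', 'Rhabditomorpha',
         'Rhabditoidea', 'Rhabditidae', 'Peloderinae', 'Caenorhabditis',
         'Caenorhabditis elegans']

labels = lineage_labels(lineage, names)
# Print from the query species back towards Chromadorea, as in the tree.
for label in reversed(labels[10:]):
    print(label)
```

Reading the index this way makes it straightforward to match a node in the tree to its position in the `query lineage` list.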
              

To continue, please have a look at the documentation of [Step 2 - gene age class assignment](https://orthomap.readthedocs.io/en/latest/tutorials/get_orthomap.html) for further insights.