# Look up records of species, genes, proteins, and cell markers

Entities and ontologies can be complex: they may use many different identifiers and even span several species.

Here we show Bionty's Entity model for species, genes, proteins and cell markers. You'll see how to

- initialize an Entity model with different identifiers
- access the reference table via `.df`
- look up an entity record via `.lookup.{term}`

In [None]:
import bionty as bt

## Species


To examine the species ontology, we create a `Species` object and look at its reference table, a pandas DataFrame.

In [None]:
species = bt.Species()

### Reference table

In [None]:
species.df.head()

### Lookup records

Terms can be searched with auto-complete using a lookup object:

```{tip}

By default, the `name` field is used to generate the lookup. You can change the field via:

`species.lookup_field = <new field>`
```

Duplicate terms are made unique by appending `__0`, `__1`, `__2`, ...

In [None]:
species.lookup.white_tufted_ear_marmoset

In [None]:
species.lookup.white_tufted_ear_marmoset.scientific_name
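As a rough mental model of the lookup object, here is a stdlib-only sketch of how a namespace of records could be built from one field of a reference table. It is not Bionty's implementation: the hypothetical `to_identifier` and `build_lookup` helpers, the sanitizing rule, and the toy records are assumptions, and it assumes every occurrence of a duplicated term is suffixed starting at `__0`. Because the terms become real attributes, interactive shells such as IPython can tab-complete them.

```python
import keyword
import re
from collections import Counter
from types import SimpleNamespace


def to_identifier(term: str) -> str:
    """Turn a free-text term into a valid attribute name (hypothetical rule)."""
    name = re.sub(r"\W+", "_", term).strip("_")
    # Attribute names cannot be empty, start with a digit, or be a keyword.
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = f"_{name}"
    return name


def build_lookup(records: list[dict], field: str = "name") -> SimpleNamespace:
    """Expose each record under its sanitized `field` value; duplicates get __0, __1, ..."""
    keys = [to_identifier(str(record[field])) for record in records]
    totals = Counter(keys)
    seen: Counter = Counter()
    attrs = {}
    for key, record in zip(keys, records):
        if totals[key] > 1:
            # Assumption: every occurrence of a duplicated term is suffixed, starting at __0.
            attrs[f"{key}__{seen[key]}"] = SimpleNamespace(**record)
            seen[key] += 1
        else:
            attrs[key] = SimpleNamespace(**record)
    return SimpleNamespace(**attrs)


# Toy records: two real common/scientific name pairs plus two made-up duplicates.
records = [
    {"name": "white-tufted-ear marmoset", "scientific_name": "Callithrix jacchus"},
    {"name": "mouse", "scientific_name": "Mus musculus"},
    {"name": "toy term", "scientific_name": "placeholder A"},
    {"name": "toy term", "scientific_name": "placeholder B"},
]
lookup = build_lookup(records)
print(lookup.white_tufted_ear_marmoset.scientific_name)  # Callithrix jacchus
print(lookup.toy_term__1.scientific_name)  # placeholder B

# Switching the field, analogous to changing `lookup_field`.
print(build_lookup(records, field="scientific_name").Mus_musculus.name)  # mouse
```

Case is preserved on purpose in this sketch, which matches attribute names such as `TCF7` and `ABC_transporter_domain_containing_protein` used elsewhere on this page.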

To access the records of, for example, the human, mouse, and pig species, we select the corresponding rows with pandas:

In [None]:
species.df

In [None]:
# set_index returns a new DataFrame, so the reference table itself is left unchanged
species.df.set_index("name").loc[["human", "mouse", "pig"]]

### Instantiate with a different identifier

You can pass a different id field to instantiate an entity. This id field will become the index of the reference table and be used for {doc}`./curate`.

In [None]:
species = bt.Species()

In [None]:
species.df.head()
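Conceptually, choosing an id field means re-keying the reference table by that field. The sketch below uses hypothetical `index_by` and `validate_terms` helpers and toy records; it assumes the chosen field must uniquely identify each row. Under that assumption, a curation step can check incoming terms against the index by simple membership.

```python
# Sketch only: `index_by` and `validate_terms` are hypothetical helpers, not Bionty API.


def index_by(records: list[dict], id_field: str) -> dict[str, dict]:
    """Re-key records by `id_field`, like setting the index of a reference table."""
    table: dict[str, dict] = {}
    for record in records:
        key = record[id_field]
        if key in table:
            # Assumption: an id field must be unique to serve as an index.
            raise ValueError(f"{id_field!r} is not unique: {key!r} appears more than once")
        # The remaining fields become the row's columns.
        table[key] = {k: v for k, v in record.items() if k != id_field}
    return table


def validate_terms(terms: list[str], table: dict[str, dict]) -> dict[str, bool]:
    """Report which terms are present in the index."""
    return {term: term in table for term in terms}


records = [
    {"name": "human", "scientific_name": "Homo sapiens"},
    {"name": "mouse", "scientific_name": "Mus musculus"},
]
by_scientific_name = index_by(records, "scientific_name")
print(validate_terms(["Homo sapiens", "human"], by_scientific_name))
# {'Homo sapiens': True, 'human': False}
```

The same term can be valid under one id field and unknown under another, which is why the choice of id field matters for curation.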

## Gene

Next, let's look at genes, which follow the same design as `Species`.

The only difference is that `Gene` is initialized with a `species` parameter, so you only retrieve gene entries of the specified species.

In [None]:
gene = bt.Gene(species="human")

In [None]:
gene.df

In [None]:
gene.lookup.TCF7
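The species scoping amounts to narrowing one combined table down to a single species before any lookup is built. A minimal sketch, assuming a hypothetical in-memory table (`ALL_GENES`) with placeholder IDs and a hypothetical `gene_table` helper; it does not reflect how Bionty actually fetches its Ensembl references.

```python
# Toy combined table; the IDs are placeholders, not real Ensembl identifiers.
ALL_GENES = [
    {"species": "human", "symbol": "TCF7", "gene_id": "HUMAN_PLACEHOLDER_1"},
    {"species": "human", "symbol": "BRCA1", "gene_id": "HUMAN_PLACEHOLDER_2"},
    {"species": "mouse", "symbol": "Tcf7", "gene_id": "MOUSE_PLACEHOLDER_1"},
]


def gene_table(species: str) -> list[dict]:
    """Return only the rows of the requested species (hypothetical helper)."""
    rows = [row for row in ALL_GENES if row["species"] == species]
    if not rows:
        raise ValueError(f"no gene reference for species {species!r}")
    return rows


human = gene_table("human")
print([row["symbol"] for row in human])  # ['TCF7', 'BRCA1']
```

Scoping up front means a lookup built from `human` never mixes in rows from other species.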

You can convert between identifiers using plain pandas, for example by selecting rows by gene symbol:

In [None]:
gene.df.loc[gene.df["symbol"].isin(["BRCA1", "BRCA2"])]
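Selecting rows by one column and reading another is all an identifier conversion needs. Here is the same idea without pandas, as a dictionary built from two columns. The hypothetical `make_mapper` helper, the column names, and the placeholder IDs are assumptions; symbols absent from the table map to `None` rather than raising.

```python
# Toy rows; "gene_id" values are placeholders, not real Ensembl identifiers.
ROWS = [
    {"symbol": "BRCA1", "gene_id": "PLACEHOLDER_1"},
    {"symbol": "BRCA2", "gene_id": "PLACEHOLDER_2"},
    {"symbol": "TCF7", "gene_id": "PLACEHOLDER_3"},
]


def make_mapper(rows: list[dict], source: str, target: str) -> dict:
    """Build a source -> target lookup; the first row wins if a source value repeats."""
    mapping: dict = {}
    for row in rows:
        mapping.setdefault(row[source], row[target])
    return mapping


symbol_to_id = make_mapper(ROWS, "symbol", "gene_id")
converted = {s: symbol_to_id.get(s) for s in ["BRCA1", "BRCA2", "NOT_A_GENE"]}
print(converted)
# {'BRCA1': 'PLACEHOLDER_1', 'BRCA2': 'PLACEHOLDER_2', 'NOT_A_GENE': None}
```

Keeping the first match is only one choice. Since a symbol may map to more than one ID, a stricter variant could raise on repeats instead.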

The mouse reference is also available from Ensembl:

In [None]:
gene = bt.Gene(species="mouse")

In [None]:
gene.df.head()

## Protein

The protein reference uses UniProt IDs as the standardized identifier.

In [None]:
protein = bt.Protein(species="human")

In [None]:
protein.lookup.ABC_transporter_domain_containing_protein

In [None]:
protein.df.head()
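Because UniProt IDs serve as the standardized identifier here, it can help to sanity-check accessions before looking them up. The sketch below uses the accession pattern from UniProt's documentation (6- or 10-character accessions) as we understand it, so verify it against their docs. The `is_uniprot_accession` helper is hypothetical and checks format only, not whether an entry exists.

```python
import re

# Accession format as documented by UniProt (assumed here; verify against their docs).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(value: str) -> bool:
    """Return True if `value` has the shape of a UniProt accession (format check only)."""
    return UNIPROT_ACCESSION.fullmatch(value) is not None


for candidate in ["P69905", "A0A024RBG1", "CD45", "p69905"]:
    print(candidate, is_uniprot_accession(candidate))
```

`fullmatch` is used so that a valid-looking prefix inside a longer string is not accepted.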

## Cell marker

The cell marker ontology works similarly.

In [None]:
cell_marker = bt.CellMarker(species="human")

In [None]:
cell_marker.df.head()

In [None]:
cell_marker.lookup.CD45