# Track sample-level metadata

We already saw how to link data objects to entities representing features during ingestion.

For sample-level metadata, the underlying schema is often more complicated, so linking is best done in a separate step.

Here, we walk through this process.

In [1]:
import lamindb as ln
import lnschema_bionty as bt
import lnschema_lamin1 as ln1

ln.track()

ℹ️ Instance: laminlabs/lamindata
ℹ️ User: giovp
✅ Added: Transform(id='zMCvXplQ8kTk', version='0', name='13-link-samples', type=notebook, title='Track sample-level metadata', created_by_id='eut8h4zv', created_at=datetime.datetime(2023, 5, 21, 16, 15, 42, 543607))
✅ Added: Run(id='8j9qIW97Md3PnCwf41bY', transform_id='zMCvXplQ8kTk', transform_version='0', created_by_id='eut8h4zv', created_at=datetime.datetime(2023, 5, 21, 16, 15, 43, 810085))


Samples, i.e., metadata associated with observations, are linked with the same approach post-ingestion.

We'll need to lazily load relationships between objects, so we need to keep track of a session.

In [2]:
ss = ln.Session()
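To illustrate why lazily loaded relationships require an open session, here is a minimal standard-library sketch. `ToySession` and `ToyFile` are hypothetical stand-ins written for this illustration; they are not lamindb classes and do not reflect how lamindb implements sessions.

```python
import sqlite3


class ToySession:
    """Hypothetical stand-in for a database session (not the lamindb API)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE readout (file_id TEXT, name TEXT)")
        self.conn.execute(
            "INSERT INTO readout VALUES ('f1', 'single-cell RNA sequencing')"
        )
        self.is_open = True

    def close(self):
        self.conn.close()
        self.is_open = False


class ToyFile:
    """A record whose related rows are fetched only on first access."""

    def __init__(self, id, session):
        self.id = id
        self._session = session
        self._readouts = None  # nothing loaded yet

    @property
    def readouts(self):
        if self._readouts is None:
            if not self._session.is_open:
                raise RuntimeError("session closed: cannot lazily load readouts")
            rows = self._session.conn.execute(
                "SELECT name FROM readout WHERE file_id = ?", (self.id,)
            ).fetchall()
            self._readouts = [name for (name,) in rows]
        return self._readouts


toy_session = ToySession()
toy_file = ToyFile("f1", toy_session)
loaded = toy_file.readouts  # the query runs here, while the session is open
toy_session.close()

# A record that never touched the relationship cannot load it after closing.
detached = ToyFile("f1", toy_session)
try:
    detached.readouts
    lazy_load_failed = False
except RuntimeError:
    lazy_load_failed = True
```

In this sketch, the relationship is only available if it was accessed while the session was still open, which is the reason for keeping a session around in this walkthrough.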

Let's first query an scRNA-seq dataset stored as an `.h5ad` file.

In [3]:
file = ss.select(ln.File, suffix=".h5ad").first()

In [4]:
file

[session open] File(id='WGDHevIgEDPJ6CB99foT', name='tabula-muris-senis-facs-processed-official-annotations.h5ad', suffix='.h5ad', size=4795677086, key='Data-objects/tabula-muris-senis-facs-processed-official-annotations.h5ad', run_id='xqGUIfF1YLq2h70zl51J', transform_id='FqmyAmP74zEB', transform_version='0', storage_id='fw2dGZSl', created_at=datetime.datetime(2023, 4, 24, 18, 16, 32, 313696), created_by_id='FBa7SHjn')

For instance, let's annotate this scRNA-seq dataset with its readout type (scRNA-seq), tissue, and species.

## Readout

In [5]:
ro_lookup = bt.Readout.bionty.lookup()
scrnaseq = ro_lookup.single_cell_RNA_sequencing

scrnaseq

readout(index=7409, ontology_id='EFO:0008913', name='single-cell RNA sequencing')
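The lookup object above exposes ontology terms as attributes whose names are derived from the term names. The following sketch shows one way such a lookup could be built; `to_identifier` and `make_lookup` are hypothetical helpers, and the normalization rule is an assumption rather than the library's actual implementation. Both term kinds are mixed into one lookup only to keep the example short.

```python
import re
from types import SimpleNamespace


def to_identifier(name):
    """Turn a term name into an attribute name (assumed normalization rule)."""
    ident = re.sub(r"\W+", "_", name).strip("_")
    # Attribute names cannot start with a digit, so prefix those.
    return f"_{ident}" if ident[:1].isdigit() else ident


def make_lookup(records):
    """Build an object with one attribute per record, keyed by its name."""
    return SimpleNamespace(**{to_identifier(r["name"]): r for r in records})


records = [
    {"ontology_id": "EFO:0008913", "name": "single-cell RNA sequencing"},
    {"ontology_id": "UBERON:0000029", "name": "lymph node"},
]
lookup = make_lookup(records)

term = lookup.single_cell_RNA_sequencing
```

Attribute access of this kind makes terms discoverable through tab completion, which is convenient when the exact ontology name is not known in advance.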

In [6]:
readout = bt.Readout(name=scrnaseq.name)

readout

Readout(id='QAjOPfts', name='single-cell RNA sequencing', created_by='eut8h4zv')

Link the readout to the data object.

In [7]:
file.readouts.append(readout)

## Biosample

In [8]:
biosample = ln1.Biosample(name="Mouse Lymph Node")

### Species

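If mouse is already in the database, we can query it instead of creating a new record. Note that `.one()` raises `NoResultFound` when no matching record exists, which is what happens in the run below.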

In [9]:
species = ln.select(bt.Species, name="mouse").one()

species

NoResultFound: ()

In [None]:
biosample.species = species
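A common way to guard against the missing-record case is a "get or create" pattern. The sketch below illustrates it with `sqlite3` only; the `species` table and the helpers `one` and `get_or_create` are hypothetical and are not part of lamindb.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("INSERT INTO species (name) VALUES ('human')")


def one(conn, name):
    """Return exactly one matching row or raise, mimicking `.one()` semantics."""
    rows = conn.execute(
        "SELECT id, name FROM species WHERE name = ?", (name,)
    ).fetchall()
    if len(rows) != 1:
        raise LookupError(f"expected exactly one row for {name!r}, found {len(rows)}")
    return rows[0]


def get_or_create(conn, name):
    """Query the record and insert it first if it does not exist yet."""
    try:
        return one(conn, name)
    except LookupError:
        conn.execute("INSERT INTO species (name) VALUES (?)", (name,))
        return one(conn, name)


human = one(conn, "human")
mouse = get_or_create(conn, "mouse")        # inserted on first call
mouse_again = get_or_create(conn, "mouse")  # found on second call
```

The second call returns the existing row, so repeated runs do not create duplicates.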

### Tissue

In [10]:
tissue_lookup = bt.Tissue.bionty.lookup()

ℹ️ Downloading Tissue reference for the first time might take a while...



In [11]:
tissue_lookup.lymph_node

tissue(ontology_id='UBERON:0000029', name='lymph node')

In [12]:
tissue = bt.Tissue(name=tissue_lookup.lymph_node.name)

In [13]:
tissue

Tissue(id='Weo9RFmc', name='lymph node')

In [14]:
biosample.tissue = tissue

## Link against file

Link the biosample to the data object:

In [15]:
file.biosamples.append(biosample)

## Add to the DB

We can add everything to the DB in one transaction:

In [16]:
ss.add([readout, biosample])

[[session open] Readout(id='QAjOPfts', name='single-cell RNA sequencing', created_by='eut8h4zv', created_at=datetime.datetime(2023, 5, 21, 16, 15, 50, 771114)),
 [session open] Biosample(id='on5SsBJxryzXTMtzjZmI', name='Mouse Lymph Node', created_by='eut8h4zv', created_at=datetime.datetime(2023, 5, 21, 16, 15, 50, 771114), tissue_id='Weo9RFmc')]
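Adding several records in one transaction means that either all of them are written or none are. The sketch below shows this behavior with `sqlite3`; the table layout, the `add_all` helper, and the second set of records are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readout (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE biosample (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()


def add_all(conn, readouts, biosamples):
    """Insert all records atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO readout VALUES (?, ?)", readouts)
        conn.executemany("INSERT INTO biosample VALUES (?, ?)", biosamples)


add_all(conn, [("r1", "single-cell RNA sequencing")], [("b1", "Mouse Lymph Node")])

# The second batch reuses the biosample id "b1", so the whole batch fails
# and the readout "r2" inserted just before it is rolled back as well.
try:
    add_all(conn, [("r2", "hypothetical readout")], [("b1", "duplicate id")])
except sqlite3.IntegrityError:
    pass

readout_count = conn.execute("SELECT COUNT(*) FROM readout").fetchone()[0]
biosample_count = conn.execute("SELECT COUNT(*) FROM biosample").fetchone()[0]
```

After the failed batch, both tables still contain only the records from the first, successful call.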

Let us close the session.

In [17]:
ss.close()

````{Tip}

Use a context manager to close the `Session` automatically instead of closing it manually.

With a context manager, the above would look like:

```{code}
with ln.Session() as ss:
    # manipulate data
```
````
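The benefit of the context manager is that the session is closed even if an error occurs halfway through. The following sketch demonstrates this with a hypothetical `ClosingSession` class; it is not the lamindb `Session` and only mirrors the `with` protocol.

```python
class ClosingSession:
    """Hypothetical session that records whether it has been closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions raised inside the block


# Normal case: the session is closed when the block ends.
with ClosingSession() as ok_session:
    pass

# Error case: the session is still closed, and the exception propagates.
try:
    with ClosingSession() as failing_session:
        raise ValueError("something went wrong mid-session")
except ValueError:
    pass
```

With manual closing, the error case would skip the `close()` call unless it were wrapped in `try`/`finally`.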

## Query for linked metadata

In [18]:
ln.select(ln.File).where(
    ln.File.readouts,
    bt.Readout.name == scrnaseq.name,
).df()

| id | name | suffix | size | hash | key | run_id | transform_id | transform_version | storage_id | created_at | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WGDHevIgEDPJ6CB99foT | tabula-muris-senis-facs-processed-official-ann... | .h5ad | 4795677086 | | Data-objects/tabula-muris-senis-facs-processed... | xqGUIfF1YLq2h70zl51J | FqmyAmP74zEB | 0 | fw2dGZSl | 2023-04-24 18:16:32.313696 | | FBa7SHjn |


In [19]:
ln.select(ln.File).join(ln.File.biosamples).where(
    ln1.Biosample.species, bt.Species.name == "mouse"
).df()

| id | name | suffix | size | hash | key | run_id | transform_id | transform_version | storage_id | created_at | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
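The second query returns no rows. This is consistent with the biosample having no species set, since the cell assigning `biosample.species` was not executed. The sketch below reproduces the effect with `sqlite3` and a many-to-many link table; all table and column names are hypothetical and do not reflect the actual lamindb schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE species (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE file (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE biosample (
        id TEXT PRIMARY KEY,
        name TEXT,
        species_id TEXT REFERENCES species(id)
    );
    -- link table for the many-to-many relationship between files and biosamples
    CREATE TABLE file_biosample (
        file_id TEXT REFERENCES file(id),
        biosample_id TEXT REFERENCES biosample(id)
    );
    INSERT INTO species VALUES ('s1', 'mouse');
    INSERT INTO file VALUES ('f1', 'example.h5ad');
    INSERT INTO biosample VALUES ('b1', 'Mouse Lymph Node', NULL);
    INSERT INTO file_biosample VALUES ('f1', 'b1');
    """
)

QUERY = """
    SELECT file.id FROM file
    JOIN file_biosample ON file_biosample.file_id = file.id
    JOIN biosample ON biosample.id = file_biosample.biosample_id
    JOIN species ON species.id = biosample.species_id
    WHERE species.name = ?
"""

# With species_id unset, the inner join on species drops the row.
before = conn.execute(QUERY, ("mouse",)).fetchall()

# Once the biosample points at the species record, the file is found.
conn.execute("UPDATE biosample SET species_id = 's1' WHERE id = 'b1'")
after = conn.execute(QUERY, ("mouse",)).fetchall()
```

In this sketch, the file is linked to the biosample in both cases; only the missing species link makes the filtered query come back empty.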


## What's in the database?

### Biological entities

In [20]:
ln.view(schema="bionty")

******************
* module: bionty *
******************
CellType

| id | ontology_id | name |
|---|---|---|
| 0aMjyguT | CL:0000012 | obsolete cell by class |
| xWomRiU6 | CL:0000024 | oogonial cell |
| 0JAxlL0C | CL:0000073 | barrier epithelial cell |
| NZ0k2Rqf | CL:0000062 | osteoblast |
| 1Lnwga1C | CL:0000054 | bone matrix secreting cell |
| d7Oib6HL | CL:0000014 | germ line stem cell |
| Zo6JGi9b | CL:0000056 | myoblast |
| 0SaUnph0 | CL:0000182 | hepatocyte |
| wXQeNauN | | my new cell type |
| QvYE8bIq | CL:0000084 | T cell |

Gene

| id | version | ensembl_gene_id | symbol | gene_type | description | ncbi_gene_id | hgnc_id | mgi_id | omim_id | synonyms | species_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| fk6Q3O0U | | ENSG00000286699 | | | | | | | | | sSfX |
| KP2vA5Vv | | ENSG00000271409 | | | | | | | | | sSfX |
| xu1Xj3vJ | | ENSG00000244952 | | | | | | | | | sSfX |
| WPp81VuU | | ENSG00000255823 | | | | | | | | | sSfX |
| v7MegLm5 | | ENSG00000244693 | | | | | | | | | sSfX |
| fVhDZaC4 | | ENSG00000258414 | | | | | | | | | sSfX |
| 4FsMcHFE | | ENSG00000272370 | | | | | | | | | sSfX |
| B1DDYmJI | | ENSG00000261438 | | | | | | | | | sSfX |
| MQDYZc7I | | ENSG00000286601 | | | | | | | | | sSfX |
| ySAirL13 | | ENSG00000256374 | | | | | | | | | sSfX |

Readout

| id | efo_id | name | molecule | instrument | measurement | created_by | created_at | updated_at |
|---|---|---|---|---|---|---|---|---|
| QAjOPfts | | single-cell RNA sequencing | | | | eut8h4zv | 2023-05-21 16:15:50.771114 | |

Species

| id | name | taxon_id | scientific_name |
|---|---|---|---|
| sSfX | human | | |

Tissue

| id | ontology_id | name |
|---|---|---|
| Weo9RFmc | | lymph node |


### Wetlab

In [21]:
ln.view(schema="lamin1")

******************
* module: lamin1 *
******************
Biosample

| id | name | created_by | created_at | updated_at | batch | species_id | tissue_id | cell_type_id | disease_id |
|---|---|---|---|---|---|---|---|---|---|
| on5SsBJxryzXTMtzjZmI | Mouse Lymph Node | eut8h4zv | 2023-05-21 16:15:50.771114 | | | | Weo9RFmc | | |


In [22]:
# integrity checks
with ln.Session() as ss:
    mouselymph = ss.select(ln.File, name="Mouse Lymph Node scRNA-seq").one()

    mouselymph_hash = mouselymph.hash
    assert mouselymph_hash == "Qprqj0O23197Ko-VobaZiw"

    mouselymph_features_hash = mouselymph.features[0].id
    assert mouselymph_features_hash == "2Mv3JtH-ScBVYHilbLaQ"

NoResultFound: ()
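The hash asserted in the integrity check has 22 characters, which matches the length of a 16-byte digest in URL-safe base64 with the padding stripped. Assuming (the text does not confirm this) that it is an MD5 digest encoded that way, a comparable hash could be computed as in the following sketch. `b64_md5` is a hypothetical helper name, not a lamindb function.

```python
import base64
import hashlib


def b64_md5(data: bytes) -> str:
    """URL-safe base64 of the MD5 digest, without '=' padding (assumed scheme)."""
    digest = hashlib.md5(data).digest()  # 16 bytes
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


example_hash = b64_md5(b"example file content")
other_hash = b64_md5(b"different file content")
```

A check of this kind detects whether the stored file content changed, independent of the file's name or location.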