# Ingest fcs files

In [1]:
import lamindb as db
import bionty as bt
import readfcs
import pandas as pd
from nbproject import header

header()

0,1
id,XWJj7BgwAf6p
time_init,2022-07-10 10:05
time_run,2022-07-11 14:09
version,draft
dependency,bionty==0.0.6+17.g0499eed lamindb==0.0.9 nbproject==0.2.0 pandas==1.4.3 readfcs==0.1a1


## An example fcs file

We provide `readfcs`, a wrapper for reading fcs files as `AnnData` objects. `readfcs.datasets.example()` points to a small fcs file (~3 MB) for benchmarking purposes.

In [2]:
filepath = readfcs.datasets.example()
fcsfile = readfcs.FCSFile(filepath)

This is a standard `fcs` file of flow cytometry data: 65,016 cells measured across 16 channels.

In [3]:
adata = fcsfile.to_anndata()

adata

AnnData object with n_obs × n_vars = 65016 × 16
    var: 'marker'
    uns: 'meta'

## Curate the channel markers

In [4]:
# Keep only the text after the last "/" in each marker label
adata.var["marker"] = [i.split("/")[-1] for i in adata.var["marker"]]

In [5]:
adata.var.T

channel,FSC-A,FSC-H,SSC-A,B515-A,R780-A,R710-A,R660-A,V800-A,V655-A,V585-A,V450-A,G780-A,G710-A,G660-A,G610-A,G560-A
marker,,,,KI67,CD3,CD28,CD45RO,CD8,CD4,CD57,CD14,CCR5,CD19,CD27,CCR7,CD127
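
The clean-up in cell 4 keeps only the part of each label after the last `/`. The sketch below shows that behavior on plain strings. The raw labels and the helper name `last_segment` are hypothetical, since the notebook does not show the original values of `adata.var["marker"]`.

```python
# Hypothetical raw labels; the real pre-cleaning values are not shown in the notebook.
raw_markers = ["", "panel/CD3", "a/b/CD28", "CD4"]


def last_segment(label: str) -> str:
    """Return the text after the final '/', or the whole label if it has none."""
    return label.split("/")[-1]


cleaned = [last_segment(m) for m in raw_markers]
print(cleaned)
```

Labels without a `/`, including empty ones such as those of the scatter channels, pass through unchanged.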


In [6]:
curated_channels = bt.Gene().curate(adata.var, column="marker")

9 terms (56.2%) are not mappable.


In [7]:
curated_channels.T

hgnc_symbol,Unnamed: 1,Unnamed: 2,Unnamed: 3,KI67,CD3,CD28,CD45RO,CD8,CD4,CD57,CD14,CCR5,CD19,CD27,CCR7,IL7R
marker,,,,KI67,CD3,CD28,CD45RO,CD8,CD4,CD57,CD14,CCR5,CD19,CD27,CCR7,CD127
channel,FSC-A,FSC-H,SSC-A,B515-A,R780-A,R710-A,R660-A,V800-A,V655-A,V585-A,V450-A,G780-A,G710-A,G660-A,G610-A,G560-A
__curated__,False,False,False,False,False,True,False,False,True,False,False,True,True,True,True,True
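
The "not mappable" count reported in cell 6 can be reproduced from the `__curated__` row above. This is a minimal sketch in plain Python that copies the `marker` and `__curated__` values from the table; the variable names are illustrative and it does not call `bionty`.

```python
# Values copied from the `marker` and `__curated__` rows of the table above.
markers = ["", "", "", "KI67", "CD3", "CD28", "CD45RO", "CD8",
           "CD4", "CD57", "CD14", "CCR5", "CD19", "CD27", "CCR7", "CD127"]
curated = [False, False, False, False, False, True, False, False,
           True, False, False, True, True, True, True, True]

# Markers whose __curated__ flag is False were not mapped.
unmapped = [m for m, ok in zip(markers, curated) if not ok]
n_unmapped = len(unmapped)
pct = 100 * n_unmapped / len(curated)
print(f"{n_unmapped} terms ({pct:.1f}%) are not mappable.")

# Drop the empty labels to see which named markers were not mapped.
named_unmapped = [m for m in unmapped if m]
print(named_unmapped)
```

Three of the nine unmapped entries are the empty labels of the scatter channels (FSC-A, FSC-H, SSC-A), which leaves six named markers unmapped.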


In [8]:
# Move hgnc_symbol from the index into a column and restore channel as the index
adata.var = curated_channels.reset_index().set_index("channel")

## Ingest into LaminDB

In [9]:
! lndb init --storage $HOME/mydata

Using instance: /Users/sunnysun/mydata/mydata.lndb


In [10]:
! lndb login --email "xiaoji.sun515@gmail.com"

In [11]:
adata.write("example_fcs.h5ad")

  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'hgnc_symbol' as categorical
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'marker' as categorical


In [12]:
db.do.ingest("example_fcs.h5ad", i_confirm_i_saved=True, integrity=True)

Added file example_fcs.h5ad (hAHyztaWKZR8JpJVW8W3) from notebook 'Ingest fcs files' (XWJj7BgwAf6p) by user xiaoji.sun515@gmail.com (BjExb4ik).
Cell numbers increase at increments of 1: Awesome!
Bumped notebook version to 1. Wrote dependencies to dependency store.
File changed on disk! Reload and restart the notebook if you want to continue.
