# Preprocess flow data

In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the `.obs` section of the AnnData object, and compensate the data. Next, we apply different types of normalisation to the data. The FCS file is part of the following [reference](https://insight.jci.org/articles/view/124928) and was originally deposited on the [FlowRepository](http://flowrepository.org/id/FR-FCM-ZYQ9).

In [None]:
import readfcs
import pytometry as pm

In [None]:
%load_ext autoreload
%autoreload 2

Read the example data set provided by the `readfcs` package.

In [None]:
path_data = readfcs.datasets.Oetjen18_t1()

In [None]:
adata = pm.io.read_fcs(path_data)

In [None]:
adata

## Reduce features 

We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height-related features to the `.obs` part of the AnnData object. Notably, the function `split_signal` checks whether a feature name belongs to FSC/SSC, or whether it ends with `-A` (area-related features) or `-H` (height-related features).
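The naming rule described above can be sketched in plain Python. The helper `classify_channel` is hypothetical and not part of `pytometry`; it only mirrors the rule as stated here (scatter prefix first, then the `-A`/`-H` suffix).

```python
def classify_channel(name: str) -> str:
    """Hypothetical re-implementation of the naming rule described above.

    Not pytometry's actual code: it only illustrates how a feature name
    can be sorted into scatter, area, height, or other features.
    """
    # Forward and side scatter channels are recognised by their prefix
    if name.upper().startswith(("FSC", "SSC")):
        return "scatter"
    # Area and height features are recognised by their suffix
    if name.endswith("-A"):
        return "area"
    if name.endswith("-H"):
        return "height"
    return "other"


channels = ["FSC-A", "FSC-H", "SSC-A", "CD3-A", "CD4-H", "CD8"]
print({ch: classify_channel(ch) for ch in channels})
```

Note that a marker name without any suffix (here `CD8`) falls through to "other", which is why cleaned marker names alone are not enough to drive the split.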

Let us check the `var_names` of the features and the channel names. In this example, the marker names have been cleaned such that none of them has the `-A` or `-H` suffix.

In [None]:
adata.var

We use the `channel` column of the `adata.var` data frame to split the matrix.

In [None]:
pm.pp.split_signal(adata, var_key="channel")

In [None]:
adata

The data matrix was reduced by three features (`FSC-A`, `FSC-H` and `SSC-A`). 
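The split itself amounts to column selection on the event-by-channel matrix. Below is a minimal NumPy sketch, assuming the channel names follow the naming rule above; `split_matrix` is a hypothetical helper, and the random matrix stands in for real events. In `pytometry` the moved columns end up in `adata.obs` rather than in a returned dictionary.

```python
import numpy as np


def split_matrix(X, channels):
    """Hypothetical sketch: keep area/marker columns in X, move the rest out.

    Scatter (FSC*/SSC*) and height (*-H) columns are returned separately,
    keyed by channel name, as they would be stored in adata.obs.
    """
    channels = list(channels)
    is_scatter = np.array(
        [c.upper().startswith(("FSC", "SSC")) for c in channels], dtype=bool
    )
    is_height = np.array([c.endswith("-H") for c in channels], dtype=bool)
    move = is_scatter | is_height
    # Moved columns become per-event annotations (one 1-D array per channel)
    moved = {c: X[:, i] for i, c in enumerate(channels) if move[i]}
    kept_channels = [c for i, c in enumerate(channels) if not move[i]]
    return X[:, ~move], kept_channels, moved


rng = np.random.default_rng(0)
channels = ["FSC-A", "FSC-H", "SSC-A", "FITC-A", "PE-A"]
X = rng.random((4, len(channels)))  # 4 toy events, 5 channels
X_kept, kept, moved = split_matrix(X, channels)
print(X_kept.shape, kept, sorted(moved))
```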

## Compensation

Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix.

The `compensate` function matches the `var_names` of `adata` with the column names of the spillover matrix to compensate the correct channels.  
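Under the common convention that each row of the spillover matrix describes how one fluorochrome spills into every detector, the observed signal is the true signal multiplied by the spillover matrix, so compensation multiplies the matched columns by its inverse. The following NumPy sketch assumes that convention; `compensate_sketch` and the toy numbers are hypothetical and are neither `pytometry`'s implementation nor values from this data set.

```python
import numpy as np


def compensate_sketch(X, var_names, spill, spill_names):
    """Hypothetical compensation sketch, not pytometry's implementation.

    Assumes spill[i, j] is the fraction of fluorochrome i's signal that is
    detected in channel j (rows and columns both ordered as spill_names),
    so observed = true @ spill and hence true = observed @ inv(spill).
    """
    var_names = list(var_names)
    # Match spillover columns to data columns by name, not by position
    cols = [var_names.index(name) for name in spill_names]
    X_comp = X.astype(float)  # astype returns a copy by default
    X_comp[:, cols] = X[:, cols] @ np.linalg.inv(spill)
    return X_comp


# Toy example: FITC spills 20% into PE, PE spills 5% into FITC
spill_names = ["FITC-A", "PE-A"]
spill = np.array([[1.0, 0.2],
                  [0.05, 1.0]])
true = np.array([[100.0, 50.0],
                 [10.0, 200.0]])      # true signal, columns in spill order
observed = true @ spill

# The data matrix uses a different column order and has an extra channel
var_names = ["PE-A", "FITC-A", "APC-A"]
X = np.column_stack([observed[:, 1], observed[:, 0], [7.0, 8.0]])
X_comp = compensate_sketch(X, var_names, spill, spill_names)
print(np.round(X_comp, 6))
```

Matching by name is what lets the data columns appear in any order, and channels absent from the spillover matrix (here `APC-A`) pass through unchanged.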

In [None]:
pm.pp.compensate(adata)

## Normalize data

In the next step, we normalize the data. By default, normalization is an in-place operation, i.e. a new AnnData object is only created if we set the argument `inplace=False`. We demonstrate three different normalization methods that are built into `pytometry`:
* arcsinh 
* logicle 
* bi-exponential
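Of the three methods, arcsinh is the simplest to write down: each value x becomes arcsinh(x / cofactor), which is roughly linear near zero and logarithmic for large magnitudes, with the cofactor setting the width of the linear region. The sketch below assumes that this is the transform behind `normalize_arcsinh` and mimics the `inplace` switch on a plain NumPy array; `normalize_arcsinh_sketch` is a hypothetical helper, and the logicle and bi-exponential transforms are not shown.

```python
import numpy as np


def normalize_arcsinh_sketch(X, cofactor, inplace=True):
    """Hypothetical sketch of an arcsinh transform with an inplace switch.

    Computes arcsinh(x / cofactor) element-wise. With inplace=True the
    input array (which must have a float dtype) is overwritten and None
    is returned; otherwise a new array is returned and X is untouched.
    """
    if inplace:
        np.divide(X, cofactor, out=X)
        np.arcsinh(X, out=X)
        return None
    return np.arcsinh(X / cofactor)


X = np.array([[-150.0, 0.0, 150.0, 15000.0]])
X_new = normalize_arcsinh_sketch(X, cofactor=150, inplace=False)  # X unchanged
print(X_new)
result = normalize_arcsinh_sketch(X, cofactor=150)  # overwrites X, returns None
```

Returning `None` for the in-place case follows the convention familiar from scanpy-style APIs, which makes it hard to accidentally keep two diverging copies of the data.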

In [None]:
adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, inplace=False)

In [None]:
adata_logicle = pm.tl.normalize_logicle(adata, inplace=False)

In [None]:
adata_biex = pm.tl.normalize_biexp(adata, inplace=False)