# Process FCS file

Here we process the initial `data.fcs` file by:
- Moving some variables (such as `OmiqFileIndex`) from `var` to `obs`
- Creating mappings from the `OmiqFileIndex` to:
  - Cell type
  - Patient annotation
  - Cell type with patient info
- Generating 2D and 3D UMAP embeddings
- Writing the data to a `.zarr` store
- Writing the per-group 3D coordinates to CSV files, which Houdini processes into `.obj` geometry files for Spline3D

Directory Structure:
```sh
vitessce_suts/
├── data/
│   └── data.fcs
└── scripts/
    ├── process_fcs.ipynb *
    └── vitessce.ipynb
```


## Import packages

In [None]:
from pathlib import Path

import pandas as pd
import pytometry as pm
import scanpy as sc

## Read the FCS file and set cell types and categories

In [None]:
fcs_path = Path("../data/data.fcs")

In [None]:
adata = pm.io.read_fcs(path=fcs_path)

In [None]:
# Keep only the marker channels by dropping the last three variables.
# Copy the result so later edits do not act on a view of `adata`.
new_adata = adata[
    :,
    adata.var_names[:-3],
].copy()

In [None]:
# Extract the OMIQ file index and store it as a categorical column in `obs`
omiq_file_index = adata.to_df()["OmiqFileIndex"].astype("int").astype("category")
new_adata.obs["OmiqFileIndex"] = omiq_file_index

In [None]:
cell_type_mapping = {0: "Healthy", 1: "Healthy", 2: "Cancer"}
patient_annotation = {0: "Healthy", 1: "Cancer", 2: "Cancer"}
cell_type_patient_mapping = {
    0: "Healthy Cells",
    1: "Queen Bee Cells",
    2: "Cancer Cells",
}
new_adata.obs["cell_type"] = new_adata.obs["OmiqFileIndex"].map(cell_type_mapping).astype("category")
new_adata.obs["patient"] = new_adata.obs["OmiqFileIndex"].map(patient_annotation).astype("category")
new_adata.obs["cell_type_patient"] = new_adata.obs["OmiqFileIndex"].map(cell_type_patient_mapping).astype("category")
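
The mapping step above can be tried on a small stand-in table before running it on the full dataset. The sketch below uses a toy `obs` DataFrame (an assumption; the real values come from the FCS file) and adds a count of unmapped rows, since any index missing from a mapping silently becomes `NaN`.

```python
import pandas as pd

# Toy stand-in for new_adata.obs; the real values come from the FCS file.
obs = pd.DataFrame({"OmiqFileIndex": pd.Categorical([0, 1, 2, 1, 0])})

cell_type_mapping = {0: "Healthy", 1: "Healthy", 2: "Cancer"}
cell_type_patient_mapping = {
    0: "Healthy Cells",
    1: "Queen Bee Cells",
    2: "Cancer Cells",
}

# Map from plain integers, then store the labels as categorical columns.
codes = obs["OmiqFileIndex"].astype(int)
obs["cell_type"] = codes.map(cell_type_mapping).astype("category")
obs["cell_type_patient"] = codes.map(cell_type_patient_mapping).astype("category")

# An index that is missing from the mapping would show up here as NaN.
unmapped = int(obs["cell_type"].isna().sum())
print(obs)
print("unmapped rows:", unmapped)
```

If `unmapped` is not zero on the real data, the mapping dictionaries likely need an entry for an additional `OmiqFileIndex` value.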

## UMAP Embedding

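Compute the neighbor graph once, then derive both a 2D and a 3D UMAP embedding from it. Each embedding is stored under its own key in `obsm`.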

In [None]:
sc.pp.neighbors(adata=new_adata)
new_adata.obsm["X_umap_2D"] = sc.tl.umap(adata=new_adata, n_components=2, copy=True).obsm["X_umap"].copy()
new_adata.obsm["X_umap_3D"] = sc.tl.umap(adata=new_adata, n_components=3, copy=True).obsm["X_umap"].copy()

## Export 3D embedding to CSV files

In [None]:
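# Build a table of 3D UMAP coordinates, one row per cell
df = pd.DataFrame(data=new_adata.obsm["X_umap_3D"], columns=["x", "y", "z"])
# Note: this column holds the raw OmiqFileIndex values, not the text labels,
# so each output file is named after the index (e.g. `0.csv`)
df["cell_type_patient"] = new_adata.obs["OmiqFileIndex"].values
# Write one CSV file per group
for name, group in df.groupby(by="cell_type_patient", observed=True):
    group.to_csv(f"../data/{name}.csv", index=False)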
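
To see what the export loop produces without touching `../data`, here is a minimal sketch on toy coordinates. The array, the `group` column name, and the temporary output directory are assumptions made for illustration; only the group-then-write pattern mirrors the cell above.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Toy 3D coordinates standing in for new_adata.obsm["X_umap_3D"].
coords = np.arange(18, dtype=float).reshape(6, 3)
df = pd.DataFrame(coords, columns=["x", "y", "z"])
df["group"] = pd.Categorical([0, 0, 1, 1, 2, 2])

# Write into a temporary directory instead of ../data.
out_dir = Path(tempfile.mkdtemp())
written = {}
for name, group in df.groupby("group", observed=True):
    path = out_dir / f"{name}.csv"
    group.to_csv(path, index=False)
    written[path.name] = len(group)  # rows written per file

print(written)
```

Each file keeps the `x`, `y`, `z` columns plus the grouping column, and its name comes directly from the group key.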

In [None]:
markers = ["141Pr_pPLCg2", "150Sm_pSTAT5", "159Tb_pAkt", "166Er_pSyk", "176Yb_pCreb"]

In [None]:
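# Subset to the selected markers first, then copy, so that `new_adata2` is an
# independent object rather than a view (its `var_names` are edited below)
new_adata2 = new_adata[:, markers].copy()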

In [None]:
renamed_markers = ["Gamma", "STAT5", "AKT", "SYK", "CREB"]

In [None]:
new_adata2.var_names = renamed_markers

In [None]:
new_adata2.write_zarr("../data/embedded_data.zarr")