# 01_load_with_readfcs

**01 — Loading Flow Cytometry Data with `readfcs`**

In this demo we’ll use [`readfcs`](https://pypi.org/project/readfcs/) to load a flow cytometry (`.fcs`) file and do some very basic exploratory data analysis (EDA).

**Why readfcs?**
- It's super lightweight.
- It's a good starter library: install it, point it at a file, and go.

As always, the goal: **lower the barrier** so you can quickly get data into Python and make your first plots.  
This is not meant to be a full analysis pipeline — just a sandbox to get started.


## 1. Imports


In [None]:
import readfcs
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from pathlib import Path



## 2. Load an FCS file

- `readfcs.read(path)` returns an **AnnData** (“Annotated Data”) object.  
- We’ll convert it to a pandas `DataFrame` for easier inspection and plotting.  
- What is the AnnData object, and why convert it?
    - FCS files store raw flow cytometry event data (per-cell measurements: fluorescence intensities, scatter properties, etc.).
    - To work with them, tools like flowCore (R) or FlowKit (Python) parse the binary `.fcs` into a structured object.
    - The AnnData object is a higher-level abstraction:
        - It holds the raw measurement matrix (cells × channels).
        - It carries metadata annotations (channel names, markers, sample IDs, etc.).
        - It allows downstream analysis (gating, transformations, clustering) while keeping the data + annotations bound together.
    - A plain `DataFrame` keeps only the matrix and the channel names, but for quick inspection and plotting that is usually all we need.
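To make "parse the binary `.fcs`" a bit more concrete, here is a minimal sketch of the first thing any FCS reader does: read the fixed-width HEADER segment, which gives the format version and the byte offsets of the TEXT, DATA, and ANALYSIS segments. The function name `read_fcs_header` and the synthetic header bytes are made up for illustration; this is not how `readfcs` is implemented, just the general idea for FCS 3.x files.

```python
import io


def read_fcs_header(stream):
    """Parse the 58-byte HEADER segment of an FCS 3.x file from a binary stream.

    Hypothetical helper for illustration only.
    """
    header = stream.read(58)
    if len(header) < 58 or not header.startswith(b"FCS"):
        raise ValueError("stream does not look like an FCS file")

    # Bytes 0-5 hold the version string, e.g. "FCS3.1"; bytes 6-9 are spaces.
    version = header[0:6].decode("ascii")

    # Six right-justified, space-padded ASCII integers follow, 8 bytes each.
    names = [
        "text_start", "text_end",
        "data_start", "data_end",
        "analysis_start", "analysis_end",
    ]
    offsets = {}
    for i, name in enumerate(names):
        field = header[10 + 8 * i : 18 + 8 * i].decode("ascii").strip()
        offsets[name] = int(field) if field else 0  # blank means "not present"
    return version, offsets


# Synthetic header (not from a real file) so the sketch runs without any data.
fake_header = b"FCS3.1" + b"    " + b"".join(
    f"{n:>8d}".encode("ascii") for n in (58, 1024, 1025, 9000, 0, 0)
)
version, offsets = read_fcs_header(io.BytesIO(fake_header))
print(version, offsets)
```

To try it on a real file, open it with `open(path, "rb")` and pass the file object in. Everything after the header (the keyword/value TEXT segment and the binary DATA segment) is what libraries like `readfcs` handle for you.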


In [None]:
"""
Haven't decided...
- which open source fcs files we want to use for the demo
- how I want to cite them, etc. to give the authors credit
- etc.
so until I figure out the proper way to do that, I'll leave it ambiguous on the fcs files being used.
"""

data_dir = Path("../data/raw")

# Collect all .fcs files recursively
fcs_files = [str(p) for p in data_dir.rglob("*.fcs")]
fcs_files
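
One thing to watch with `rglob("*.fcs")`: on case-sensitive filesystems (e.g. most Linux setups) it will not match files exported as `.FCS`. A small sketch of a case-insensitive variant, where the helper name `find_fcs_files` and the temporary folder layout are made up for illustration:

```python
import tempfile
from pathlib import Path


def find_fcs_files(root):
    """Return sorted paths under `root` whose suffix is .fcs, ignoring case.

    Hypothetical helper for illustration only.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".fcs"
    )


# Build a throwaway directory tree so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "plate1").mkdir()
    (root / "plate1" / "a.fcs").touch()
    (root / "plate1" / "B.FCS").touch()
    (root / "notes.txt").touch()
    found_names = [p.name for p in find_fcs_files(root)]

print(found_names)
```

If the resulting list is empty, double-check `data_dir` before moving on, since indexing the first element of an empty list raises an `IndexError`.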

In [None]:
# Replace with the path to your .fcs file
fcs_path = fcs_files[0]

adata = readfcs.read(fcs_path)   # AnnData object
df_fcs = pd.DataFrame(adata.X, columns=adata.var_names)
df_log = np.arcsinh(df_fcs / 5)   # asinh transform, cofactor 5 (common for mass cytometry; fluorescence data often uses a larger cofactor, e.g. 150)
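
Why `arcsinh(x / cofactor)` rather than a plain log? A short sketch using only the standard library: the transform is defined at zero and for negative values (which compensated flow data often contains), is roughly linear near zero, and behaves like `log(2x / cofactor)` for large values. The function name `asinh_transform` and the sample values are made up for illustration.

```python
import math


def asinh_transform(value, cofactor=5.0):
    """Scale by the cofactor, then apply the inverse hyperbolic sine."""
    return math.asinh(value / cofactor)


# Made-up raw intensities, including a negative value and zero.
raw_values = [-50.0, 0.0, 5.0, 1_000.0, 100_000.0]
transformed = [asinh_transform(v) for v in raw_values]

for raw, out in zip(raw_values, transformed):
    print(f"{raw:>10.1f} -> {out:.4f}")

# For large inputs the transform approaches log(2 * x / cofactor).
large = 100_000.0
print(asinh_transform(large), math.log(2 * large / 5.0))
```

The cofactor sets how wide the near-linear region around zero is, so the right value depends on the instrument and the dynamic range of your data. Treat 5 as a starting point to experiment with, not a rule.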



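## 3. What is AnnData?
Basically, two things bundled together:
- Data: the event matrix in `adata.X` (cells × channels)
- Metadata: annotations about the channels (`adata.var`), the events (`adata.obs`), and any unstructured extras (`adata.uns`)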

In [None]:
adata

In [None]:
adata.var  # per-channel annotations (one row per channel)

In [None]:
adata.to_df()

In [None]:
df_fcs.columns

In [None]:
df_fcs.info()
df_fcs.describe().T.head()


In [None]:
# Assumes the panel has a channel named 'CD3'; check df_fcs.columns for the names in your file
df_log['CD3'].hist(bins=100, figsize=(6,4))
plt.title("CD3 intensity distribution")
plt.xlabel("asinh(intensity)")
plt.ylabel("Cell count")
plt.show()


In [None]:
plt.figure(figsize=(6,6))
plt.scatter(df_log['CD3'], df_log['CD19'], s=2, alpha=0.3)
plt.xlabel("CD3 (T cells)")
plt.ylabel("CD19 (B cells)")
plt.title("CD3 vs CD19")
plt.show()
