# Dataset analysis: Samplewise Analysis

In this vignette, we showcase the samplewise analysis methods provided by FACSPy.

First, we create the dataset, covering cell lineages of peripheral blood, spleen and bone marrow.

In [1]:
import warnings
warnings.filterwarnings(
    action='ignore',
    category=FutureWarning
)

In [2]:
import FACSPy as fp
import os

In [3]:
input_directory = "../../Tutorials/mouse_lineages"
metadata = fp.dt.Metadata(os.path.join(input_directory, "metadata.csv"))
panel = fp.dt.Panel(os.path.join(input_directory, "panel.csv"))
workspace = fp.dt.FlowJoWorkspace(os.path.join(input_directory, "lineages.wsp"))

In [None]:
dataset = fp.dt.create_dataset(input_directory = input_directory,
                               metadata = metadata,
                               panel = panel,
                               workspace = workspace)
dataset

In [None]:
cofactors = fp.dt.CofactorTable(os.path.join(input_directory, "cofactors.csv"))
fp.dt.transform(dataset, transform = "asinh", cofactor_table = cofactors, key_added = "transformed")
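The asinh transformation compresses large intensities while staying roughly linear near zero. The per-channel cofactor sets where that transition happens. Below is a minimal NumPy sketch, assuming the transform is `arcsinh(x / cofactor)` applied column-wise. The array values and cofactors are made up for illustration and are not read from `cofactors.csv`:

```python
import numpy as np

# Hypothetical compensated intensities: rows are events, columns are channels.
compensated = np.array([
    [0.0, 150.0, -20.0],
    [500.0, 3000.0, 10.0],
])
# Hypothetical per-channel cofactors, as they might come from a cofactor table.
cofactors = np.array([150.0, 500.0, 5.0])

# Dividing the (n_events, n_channels) array by the (n_channels,) array
# broadcasts the cofactors column by column. asinh keeps the sign of
# negative values and behaves logarithmically for large |x|.
transformed = np.arcsinh(compensated / cofactors)
```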

In [None]:
fp.subset_gate(dataset, "cells")

We set a default gate and a default layer so that we do not have to repeat these keyword arguments in every function call. You can still override these defaults by passing `gate` or `layer` explicitly.

In [None]:
fp.settings.default_gate = "CD45+"
fp.settings.default_layer = "transformed"
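The defaults act as fallback values. The sketch below illustrates that pattern with a hypothetical `Settings` class and a hypothetical `resolve()` helper. Neither is part of FACSPy, and FACSPy may implement the lookup differently:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Settings:
    # Hypothetical stand-in for a global settings object such as fp.settings.
    default_gate: Optional[str] = None
    default_layer: Optional[str] = None


settings = Settings()


def resolve(gate: Optional[str] = None, layer: Optional[str] = None) -> Tuple[str, str]:
    """Return explicitly passed arguments, otherwise fall back to the defaults."""
    gate = gate if gate is not None else settings.default_gate
    layer = layer if layer is not None else settings.default_layer
    if gate is None or layer is None:
        raise ValueError("gate and layer must be passed or set as defaults")
    return gate, layer


settings.default_gate = "CD45+"
settings.default_layer = "transformed"
```

With these defaults, `resolve()` returns `("CD45+", "transformed")`. `resolve(layer="compensated")` overrides only the layer.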

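Next, we calculate the gate frequencies, MFI and FOP values with their respective functions. The MFI is calculated on both the compensated and the transformed layer, and the FOP on the compensated layer.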

In [None]:
fp.tl.gate_frequencies(dataset)

fp.tl.mfi(dataset,
          layer = "compensated")
fp.tl.mfi(dataset,
          layer = "transformed")

fp.tl.fop(dataset,
          layer = "compensated")
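Conceptually, both metrics summarize the events of each sample into one value per marker. Below is a minimal pandas sketch, assuming the MFI is the per-sample median and the FOP is the fraction of events above a per-marker cutoff. The event table, column names and cutoffs are hypothetical and do not reflect how FACSPy determines positivity:

```python
import pandas as pd

# Hypothetical event table: one row per event, its sample and two markers.
events = pd.DataFrame({
    "sample_ID": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "CD11b": [100.0, 300.0, 500.0, 50.0, 60.0, 70.0],
    "Ly6G": [10.0, 20.0, 900.0, 800.0, 850.0, 5.0],
})
markers = ["CD11b", "Ly6G"]

# MFI sketch: median intensity per sample and marker.
mfi = events.groupby("sample_ID")[markers].median()

# FOP sketch: fraction of events above a (hypothetical) per-marker cutoff.
# Comparing the DataFrame with the Series aligns the cutoffs on the columns.
cutoffs = pd.Series({"CD11b": 200.0, "Ly6G": 100.0})
fop = (events[markers] > cutoffs).groupby(events["sample_ID"]).mean()
```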

## Sample correlation

The sample correlation computes a correlation coefficient (Pearson, Spearman or Kendall) between each pair of samples, based on the MFI values calculated per sample.

The result is shown as a heatmap in which each row and each column corresponds to one sample. The `metadata_annotation` parameter adds annotations for the selected metadata columns.

Note that `gate` and `layer` are taken from the defaults set in `fp.settings` above.


In [None]:
fp.pl.sample_correlation(dataset,
                         metadata_annotation = ["organ", "sex"],
                         corr_method = "spearman", # or 'pearson' or 'kendall'
                         cmap = "inferno")
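The matrix behind the heatmap can be illustrated with pandas. Below is a minimal sketch, assuming a samples-by-markers table of MFI values. The numbers and sample names are made up. The Spearman coefficient is computed as the Pearson correlation of ranks, with each sample ranked across its markers:

```python
import pandas as pd

# Hypothetical per-sample MFI table: rows are samples, columns are markers.
mfi = pd.DataFrame(
    {
        "CD11b": [1.0, 2.0, 3.0],
        "Ly6G": [2.0, 4.0, 1.0],
        "B220": [3.0, 6.0, 2.0],
        "CD3": [4.0, 8.0, 0.5],
    },
    index=["BM_M1", "BM_M2", "PB_M2"],
)

# After transposing, each column is one sample. Ranking each column and
# applying Pearson gives the Spearman coefficient between every pair of samples.
sample_corr = mfi.T.rank().corr(method="pearson")
```

`sample_corr` is a symmetric samples-by-samples matrix with ones on the diagonal. This is the shape of matrix that the heatmap displays.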

We can also calculate the correlation heatmap from the FOP values by passing `data_metric = "fop"`. We set the `layer` argument explicitly, since the FOPs were calculated from the compensated events.

In [None]:
fp.pl.sample_correlation(dataset,
                         data_metric = "fop",
                         layer = "compensated", 
                         metadata_annotation = ["organ", "sex"],
                         corr_method = "spearman",
                         cmap = "inferno")

## Sample distance

Alternatively, samples can be grouped by the pairwise distances between their MFI or FOP values.

As with the sample correlation, we call `fp.pl.sample_distance()` with the same parameter combinations.

In [None]:
fp.pl.sample_distance(dataset,
                      metadata_annotation = ["organ", "sex"], 
                      cmap = "inferno")

In [None]:
fp.pl.sample_distance(dataset,
                      data_metric = "fop",
                      layer = "compensated",
                      metadata_annotation = ["organ", "sex"], 
                      cmap = "inferno")
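Each sample is treated as a point whose coordinates are its MFI (or FOP) values. Below is a minimal NumPy sketch, assuming the Euclidean distance. The metric FACSPy uses by default is not stated here and may differ:

```python
import numpy as np

# Hypothetical per-sample MFI matrix: rows are samples, columns are markers.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [4.0, 6.0, 3.0],
])

# Broadcasting: X[:, None, :] - X[None, :, :] has shape
# (n_samples, n_samples, n_markers) and holds every pairwise difference.
diff = X[:, None, :] - X[None, :, :]
# Euclidean distance: square root of the summed squared differences.
dist = np.sqrt((diff ** 2).sum(axis=-1))
```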

## Samplewise dimensionality reductions

We can calculate samplewise dimensionality reductions based on the MFI/FOP vectors of the samples. This allows us to quickly visualize sample groupings. Here, we calculate a samplewise PCA.

In [None]:
fp.tl.pca_samplewise(dataset)

In [None]:
fp.pl.pca_samplewise(dataset,
                     color = "organ")
fp.pl.pca_samplewise(dataset,
                     color = "sex")
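A samplewise PCA treats each sample as one row of a samples-by-markers matrix. Below is a minimal sketch using singular value decomposition. It assumes the columns are only centered, although FACSPy may also scale them. The function name `pca_samplewise_sketch` is hypothetical:

```python
import numpy as np


def pca_samplewise_sketch(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project samples (rows of X) onto their first principal components."""
    # Center each marker (column) so that components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Coordinates of each sample along the first n_components axes.
    return Xc @ Vt[:n_components].T


# Hypothetical MFI matrix: 4 samples, 3 markers.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 1.0, 2.0],
    [4.0, 3.0, 5.0],
])
coords = pca_samplewise_sketch(X)
```

Each row of `coords` is one sample. The first column captures at least as much variance as the second.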

We repeat the analysis with another dimensionality reduction method, multidimensional scaling (MDS), using the same parameters.

In [None]:
fp.tl.mds_samplewise(dataset)

In [None]:
fp.pl.mds_samplewise(dataset,
                     color = "organ")
fp.pl.mds_samplewise(dataset,
                     color = "sex")
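MDS places the samples in a low-dimensional space so that their pairwise distances are preserved as closely as possible. The sketch below swaps in classical (Torgerson) MDS, which has a closed-form solution based on eigendecomposition. FACSPy's implementation may differ and could, for example, use an iterative metric MDS. The name `classical_mds` is hypothetical:

```python
import numpy as np


def classical_mds(D: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS from a symmetric matrix of pairwise distances."""
    n = D.shape[0]
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) * 11^T.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns the eigenvalues of the symmetric matrix B in ascending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by round-off or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

If the distances come from points that already lie in two dimensions, the two-component embedding reproduces those distances exactly, up to rotation and reflection.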

## Marker correlation

To identify markers whose expression correlates, we use the marker correlation analysis. Here, a Pearson, Spearman or Kendall correlation is calculated between each pair of markers based on the per-sample values.

We observe that Ly6G, Ly6C and CD11b are highly correlated. This is expected, since all three are markers of myeloid cells.

In [None]:
fp.pl.marker_correlation(dataset,
                         include_technical_channels = False)
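The marker correlation works on the same kind of samples-by-markers table as the sample correlation. The only difference is the axis: here, the columns (markers) are correlated across the rows (samples). Below is a minimal pandas sketch with made-up values, assuming a Spearman coefficient computed as the Pearson correlation of ranks:

```python
import pandas as pd

# Hypothetical per-sample MFI table: rows are samples, columns are markers.
mfi = pd.DataFrame(
    {
        "Ly6G": [5.0, 4.0, 1.0, 0.5],
        "CD11b": [6.0, 5.5, 2.0, 1.0],
        "B220": [1.0, 2.0, 5.0, 4.0],
    },
    index=["BM_M1", "BM_M2", "SPL_M1", "PB_M2"],
)

# Rank each marker across samples, then correlate the markers (columns):
# the result is a markers-by-markers Spearman correlation matrix.
marker_corr = mfi.rank().corr(method="pearson")
```

In this toy table, Ly6G and CD11b rise and fall together and have a correlation of 1. B220 moves in the opposite direction.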

## Expression heatmaps

The samplewise MFI or FOP values can also be visualized as a heatmap, with the samples annotated by their metadata.

In [None]:
fp.pl.expression_heatmap(dataset,
                         metadata_annotation = ["genotype", "organ"],
                         scaling = "MinMaxScaler", #scales from 0 to 1
                         metaclusters = 2)

In [None]:
fp.pl.expression_heatmap(dataset,
                         data_metric = "fop",
                         layer = "compensated",
                         metadata_annotation = ["genotype", "organ"],
                         scaling = "MinMaxScaler", #scales from 0 to 1
                         metaclusters = 2)
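Min-max scaling maps each value to `(x - min) / (max - min)`. Below is a minimal NumPy sketch, assuming the scaling is applied per marker (column) across samples, as scikit-learn's `MinMaxScaler` does for each feature. The helper `min_max_scale` is hypothetical:

```python
import numpy as np


def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Rescale each column of X to [0, 1]; constant columns become 0."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for markers with the same value in every sample.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range


# Hypothetical MFI matrix: 3 samples, 3 markers (the last one is constant).
X = np.array([
    [1.0, 10.0, 5.0],
    [3.0, 20.0, 5.0],
    [2.0, 30.0, 5.0],
])
scaled = min_max_scale(X)
```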

## Fold change analysis

This analysis calculates asinh fold changes of the markers between two groups of samples and gives a quick overview of markers that are differentially expressed across conditions. Here, we compare the MFI between bone marrow (BM) and peripheral blood (PB). We observe an increase in Ly6G expression, which is expected, as neutrophils mature on their way from the bone marrow to the peripheral blood.

In [None]:
fp.pl.fold_change(dataset,
                  gate = "CD45+",
                  layer = "compensated",
                  groupby = "organ",
                  group1 = "BM",
                  group2 = "PB",
                  min_pval = 10e-4,
                  figsize = (2,4)
                 )
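An asinh fold change compares asinh-transformed expression values instead of raw ratios, so the result stays defined for values at or near zero. Below is a minimal pandas sketch. It assumes the fold change is the difference between the asinh-transformed group means, using a per-marker cofactor. The formula, the sign convention (PB relative to BM), the cofactors and the values are all assumptions, and the p-value filtering controlled by `min_pval` is omitted:

```python
import numpy as np
import pandas as pd

# Hypothetical per-sample MFI values (compensated layer) and the organ of each sample.
mfi = pd.DataFrame(
    {"Ly6G": [200.0, 250.0, 800.0, 900.0], "CD3": [50.0, 60.0, 55.0, 45.0]},
    index=["BM_M1", "BM_M2", "PB_M1", "PB_M2"],
)
organ = pd.Series(["BM", "BM", "PB", "PB"], index=mfi.index)
# Hypothetical per-marker cofactors.
cofactors = pd.Series({"Ly6G": 100.0, "CD3": 100.0})

# Mean MFI per group (rows: BM, PB; columns: markers).
group_means = mfi.groupby(organ).mean()
# Assumed asinh fold change: asinh(PB / cofactor) - asinh(BM / cofactor).
asinh_fc = (np.arcsinh(group_means.loc["PB"] / cofactors)
            - np.arcsinh(group_means.loc["BM"] / cofactors))
```

A positive value means higher expression in PB than in BM under this convention.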

In [None]:
fp.save_dataset(dataset,
                output_dir = "../../Tutorials/mouse_lineages",
                file_name = "raw_dataset_samplewise",
                overwrite = True)