# ABBA cell count analysis

This notebook is the last step in the ABBA whole-brain cell counting analysis.  
It assumes you have done the following steps:
- Alignment of brain slices in ABBA, exported to a QuPath project.
- Detection of cells of interest in QuPath. The detections should be exported to ```.csv``` files (one per slice) in a folder called ```results```. 
- If there are regions to exclude, you should have drawn them and exported them to ```.txt``` files (one per slice) in a folder called ```regions_to_exclude```.

Run this notebook to load the cell counts and do analysis on them. 

## Before we start ...
The majority of the functions and classes we need are written in three files: ```brain_hierarchy.py```, ```readCSV_helpers.py``` and ```pls_helpers.py```. We now import the necessary functions and classes from these Python files into this notebook so that we can use them later:

In [None]:
from brain_hierarchy import AllenBrainHierarchy
from readCSV_helpers import *
from pls_helpers import PLS

We'll also need other Python packages to read and manipulate data easily and to make nice plots:

In [None]:
import pandas as pd
# import copy
# import json
import numpy as np
import os

# import matplotlib.pyplot as plt
import plotly.express as px
# import seaborn as sns
# import pickle
import plotly.graph_objects as go
from itertools import product

## The Allen Brain Atlas

We start by importing the mouse Allen Brain Atlas, in which we find information about all brain regions (their parent region and children regions in the brain hierarchy, for example).

In [None]:
# from https://help.brain-map.org/display/api/Downloading+an+Ontology%27s+Structure+Graph
# StructureGraph id=1
path_to_allen_json = "./data/AllenMouseBrainOntology.json"

branches_to_exclude = ['retina','VS','grv','fiber tracts']
AllenBrain = AllenBrainHierarchy(path_to_allen_json, branches_to_exclude)

#edges = AllenBrain.edges_dict
#tree = AllenBrain.tree_dict
#brain_region_dict = AllenBrain.brain_region_dict
#regions = list(brain_region_dict.keys())

We can also visualize the hierarchy of brain regions as a network (a tree). **Note that running the above cell may take a few minutes**.

In [None]:
# Plot brain region hierarchy
# If you want to plot it, install PyDot (pydot)
fig = AllenBrain.plot_plotly_graph()
fig.show()

Based on the graph above, you might want to specify the regions on which you want to do further PLS analysis:  
*Note: to see more information about the regions, hover over them with your mouse.*

- Specify a level. Analysis can only be done on one level (slice) of the brain region hierarchy.

- To exclude brain regions that belong to a certain branch, add the *abbreviated* nodes at the beginning of the branches to the list above.  
Example:  
```branches_to_exclude = ['retina', 'VS']```  
means that **all the subregions that belong to the retina and the ventricular systems** are excluded from the PLS analysis.
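
As a rough illustration of what excluding a branch means, the sketch below prunes a toy hierarchy. The `toy_tree` dictionary and the `prune_branches` function are hypothetical and are not part of ```brain_hierarchy.py```; `AllenBrainHierarchy` may implement the exclusion differently.

```python
# Toy hierarchy: each region abbreviation maps to its direct children (made-up data).
toy_tree = {
    "root": ["grey", "VS", "retina"],
    "grey": ["CH", "BS"],
    "CH": [],
    "BS": [],
    "VS": ["VL", "V3"],
    "VL": [],
    "V3": [],
    "retina": [],
}

def prune_branches(tree, root, branches_to_exclude):
    """Return the regions reachable from `root`, skipping excluded branches entirely."""
    excluded = set(branches_to_exclude)
    kept = []
    stack = [root]
    while stack:
        region = stack.pop()
        if region in excluded:
            # Skipping the node also skips all of its subregions,
            # because its children are never pushed onto the stack.
            continue
        kept.append(region)
        stack.extend(tree.get(region, []))
    return kept

remaining = prune_branches(toy_tree, "root", ["retina", "VS"])
print(sorted(remaining))
```

Only the node at the start of each branch needs to be listed: `VL` and `V3` disappear because their parent `VS` is excluded.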

## Load data

Now, we're ready to read the ```.csv``` files with the cell counts, and also the exclusion files (if there were regions to exclude).  
Below, you have to specify:
- ```animals_root```: Absolute path to the folder that contains the animal folders.
- ```group_1_dirs```: A list of names of the folders corresponding to animals in **Group 1** (e.g., Control group). The results of each animal must be stored in that animal's own folder.
- ```group_2_dirs```: A list of names of the folders corresponding to animals in **Group 2** (e.g., Stress group).
- ```group_1_name```: A meaningful string for Group 1.
- ```group_2_name```: A meaningful string for Group 2.
- ```area_key```: The name of the column in the ```.csv``` files that holds the size of a brain area.
- ```tracer_key```: The name of the column in the ```.csv``` files that holds the count of the tracer used to highlight the marker.
- ```marker_key```: The name of the marker we would like to highlight (e.g. CFos).
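
The folder layout implied by these parameters can be checked before loading anything. The sketch below is only an assumption about that layout (one folder per animal, each holding a ```results``` subfolder and, optionally, a ```regions_to_exclude``` subfolder); the `expected_paths` and `missing_results` helpers and the animal names are hypothetical and are not part of ```readCSV_helpers.py```.

```python
import os
import tempfile

def expected_paths(animals_root, animal_dirs):
    """Map each animal folder to its expected 'results' and 'regions_to_exclude' subfolders."""
    return {
        animal: {
            "results": os.path.join(animals_root, animal, "results"),
            "regions_to_exclude": os.path.join(animals_root, animal, "regions_to_exclude"),
        }
        for animal in animal_dirs
    }

def missing_results(animals_root, animal_dirs):
    """List the animals whose 'results' folder does not exist."""
    paths = expected_paths(animals_root, animal_dirs)
    return [animal for animal, p in paths.items() if not os.path.isdir(p["results"])]

# Demonstration on a throwaway directory with made-up animal names.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "Control_A", "results"))
    os.makedirs(os.path.join(demo_root, "Control_B"))  # no 'results' subfolder
    missing = missing_results(demo_root, ["Control_A", "Control_B"])

print(missing)
```

Running a check like this on `animals_root` with `group_1_dirs + group_2_dirs` would flag misspelled folder names before the loading step fails.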

Try modifying this to obtain the density per mm^2 (converted from microns).
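
The unit conversion is simple arithmetic: 1 mm = 1000 µm, so 1 mm^2 = 10^6 µm^2, and an area in µm^2 is divided by 10^6 (or multiplied by 1e-06) to get mm^2. A minimal sketch on made-up numbers, assuming each slice is a DataFrame with an `area` column and a marker column named `CFos`:

```python
import pandas as pd

UM2_PER_MM2 = 1_000_000  # 1 mm = 1000 µm, so 1 mm² = 1000² µm²

# Made-up slice: region abbreviations as index, area in µm² and a marker count.
toy_slice = pd.DataFrame(
    {"area": [2_000_000.0, 500_000.0], "CFos": [40, 5]},
    index=["RegionA", "RegionB"],
)

area_mm2 = toy_slice["area"] / UM2_PER_MM2      # 2.0 mm² and 0.5 mm²
density_per_mm2 = toy_slice["CFos"] / area_mm2  # marker count per mm²
print(density_per_mm2)
```

With these numbers, `RegionA` has 40 cells over 2 mm^2 (20 cells/mm^2) and `RegionB` has 5 cells over 0.5 mm^2 (10 cells/mm^2).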

In [None]:
# ####################################### SET PARAMETERS ####################################


animals_root = './data/QuPath_output/'
group_1_dirs = ['Control_17C', 'Control_18C', 'Control_19C']
group_1_name = 'Control'
group_2_dirs = ['Stress_5S', 'Stress_8S', 'Stress_10S', 'Stress_13S', 'Resilient_1R', 'Resilient_2R', 'Resilient_3R', 'Resilient_4R', 'Resilient_11R']
group_2_name = 'Stress'
area_key = 'Area um^2'
# area_key = 'DAPI: DAPI area um^2'
tracer_key='Num AF647'
marker_key='CFos'

data_output_path = './data/python_norm_output/'
plots_output_path = './plots/python_output/'


# ###########################################################################################


if not(os.path.exists(data_output_path)):
    os.makedirs(data_output_path, exist_ok=True)
if not(os.path.exists(plots_output_path)):
    os.makedirs(plots_output_path, exist_ok=True)

Now, we load the Control and Stress results separately into two lists of pandas dataframes, and save the results.

**Note**: regions to exclude are automatically excluded.

In [None]:
def read_group_slices(animal_root: str, animal_dirs: list[str], AllenBrain: AllenBrainHierarchy) -> list[list[pd.DataFrame]]:
    animals_slices_paths = [os.path.join(animal_root, animal, 'results') for animal in animal_dirs]
    animals_excluded_regions = [list_regions_to_exclude(os.path.join(animal_root, animal)) for animal in animal_dirs]
    # load_cell_counts() -> list[pd.DataFrame], set[str], list[dict[str, str]]; only the per-slice DataFrames are kept
    return [
        load_cell_counts(input_path, excluded_regions, AllenBrain, area_key, tracer_key, marker_key)[0]
        for input_path, excluded_regions in zip(animals_slices_paths, animals_excluded_regions)
    ]

def area_µm2_to_mm2(group) -> None:
    for slices in group:
        for slice in slices:
            slice.area = slice.area * 1e-06

# for each brain region, aggregate marker counts from all the animal's slices into one value.
# methods:
# - sum
# - avg & std of marker/area ratio (density)

def sum_cell_counts(slices: list[pd.DataFrame]) -> pd.DataFrame:# methods: Callable[[int, int], int]):
    slices_df = pd.concat(slices)
    slices_df = slices_df.groupby(slices_df.index).sum()  # axis=0 is the default; the keyword is deprecated in recent pandas
    return slices_df

def animal_cell_density(slices: list[pd.DataFrame], marker_key: str) -> pd.Series:
    slices_marker_densities = [slice[marker_key] / slice['area'] for slice in slices]
    return pd.concat(slices_marker_densities)

def avg_cell_density(slices: list[pd.DataFrame], marker_key: str) -> pd.Series:
    marker_densities = animal_cell_density(slices, marker_key)
    avg_marker_densities = marker_densities.groupby(marker_densities.index).mean()
    return avg_marker_densities

def std_cell_density(slices: list[pd.DataFrame], marker_key: str) -> pd.Series:
    marker_densities = animal_cell_density(slices, marker_key)
    std_marker_densities = marker_densities.groupby(marker_densities.index).std()
    return std_marker_densities[~std_marker_densities.isnull()] # remove NaN: when there is only one slice, the std can't be computed and instead outputs NaN

# https://en.wikipedia.org/wiki/Coefficient_of_variation
def coefficient_variation(x) -> np.float64:
    avg = x.mean()
    if len(x) > 1 and avg != 0:
        return x.std(ddof=1) / avg
    else:
        return 0

def variation_cell_density(slices: list[pd.DataFrame], marker_key: str) -> pd.Series:
    # we merge hemisphere just in case there is some Right/Left region
    merged_hem_slices = [merge_hemispheres(slice) for slice in slices]
    marker_densities = animal_cell_density(merged_hem_slices, marker_key)
    variation = marker_densities.groupby(marker_densities.index).apply(coefficient_variation)
    return variation

# a: confidence threshold
# TODO: determine k parameter (https://en.wikipedia.org/wiki/Normal_distribution#Confidence_intervals)
# NOTE: avg not used atm
def check_density_distribution(animals_slices: list[list[pd.DataFrame]], animal_names: list[str], a=0.001) -> None:
    animals_avg_density = [avg_cell_density(brain_slices, marker_key) for brain_slices in animals_slices]
    animals_std_density = [std_cell_density(brain_slices, marker_key) for brain_slices in animals_slices]
    for i in range(len(animal_names)):
        animal = animal_names[i]
        region_in_confidence_interval = animals_std_density[i] < a
        print(f"Animal {animal}: out of {len(region_in_confidence_interval)} brain regions, {(~region_in_confidence_interval).sum()} are outside the confidence interval (a={a})")
        # print(region_in_confidence_interval.index[~region_in_confidence_interval])

def write_brains(root_output_path: str, animal_names: list[str], animal_brains: list[pd.DataFrame]) -> None:
    assert len(animal_names) == len(animal_brains),\
        f"The number of animals read and analysed ({len(animal_brains)}) differs from the number of animals in the input group ({len(animal_names)})"
    for i in range(len(animal_names)):
        brain = animal_brains[i]
        name = animal_names[i]
        output_path = os.path.join(root_output_path, animal_names[i])
        os.makedirs(output_path, exist_ok=True)
        output_path = os.path.join(output_path, name+'_summed.csv')
        brain.to_csv(output_path, sep='\t', mode='w')
        print(f'Raw summed cell counts are saved to {output_path}')

def analyze(animal_names: list[str], animal_brains: list[pd.DataFrame], marker_key: str, AllenBrain: AllenBrainHierarchy) -> pd.DataFrame:
    brain = pd.concat({name: normalize_cell_counts(brain, marker_key) for name,brain in zip(animal_names, animal_brains)})
    brain = pd.concat({marker_key: brain}, axis=1)
    brain = brain.reorder_levels([1,0], axis=0)
    ordered_indices = product(AllenBrain.brain_region_dict.keys(), animal_names)
    brain = brain.reindex(ordered_indices, fill_value=np.nan)
    return brain
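
To make the per-region aggregation above concrete, here is a small worked example of the coefficient of variation (sample standard deviation divided by the mean) of a marker density across slices. The three toy slices and their numbers are made up, and `groupby(level=0)` is used here as an equivalent way of grouping by the region index.

```python
import pandas as pd

# Three made-up slices; RegionB is absent from the third one.
slice_1 = pd.DataFrame({"area": [1.0, 2.0], "CFos": [10, 8]}, index=["RegionA", "RegionB"])
slice_2 = pd.DataFrame({"area": [1.0, 2.0], "CFos": [30, 8]}, index=["RegionA", "RegionB"])
slice_3 = pd.DataFrame({"area": [1.0], "CFos": [20]}, index=["RegionA"])

# One density value per (slice, region), stacked into a single Series.
densities = pd.concat([s["CFos"] / s["area"] for s in (slice_1, slice_2, slice_3)])

grouped = densities.groupby(level=0)  # group the values of each region together
mean_density = grouped.mean()
std_density = grouped.std()           # sample standard deviation (ddof=1)
cv = std_density / mean_density
print(cv)
```

`RegionA` has densities 10, 30 and 20, so its mean is 20, its sample standard deviation is 10 and its CV is 0.5. `RegionB` has the same density (4) in both slices, so its CV is 0.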

In [None]:
# TODO: change to group_1_slices / group_2_slices
# NOTE: group_*_slices still distinguish the Right from the Left hemisphere. Use merge_hemispheres() to sum them.
group_1_slices = read_group_slices(animals_root, group_1_dirs, AllenBrain)
area_μm2_to_mm2(group_1_slices)
print(f'Imported all brain slices from {str(len(group_1_slices))} animals of {group_1_name} group.')
# check_density_distribution(group_1_slices, group_1_dirs, a=800)

group_2_slices = read_group_slices(animals_root, group_2_dirs, AllenBrain)
area_μm2_to_mm2(group_2_slices)
print(f'Imported all brain slices from {str(len(group_2_slices))} animals of {group_2_name} group.')
# check_density_distribution(group_2_slices, group_2_dirs)


In [None]:
# SLICE ANALYSIS: SUMMARY
#
# avg = avg_cell_density(group_1_slices[0], marker_key)
# std = std_cell_density(group_1_slices[0], marker_key)
# idx = variation_cell_density(group_1_slices[0], marker_key)
# df = pd.concat(group_1_slices[0])
# slices_per_area = df.groupby(df.index).count().iloc[:,0]
# 
# threshold = 1
# above_threshold_filter = idx > threshold
# print(f"""Summary for animal 0:
#     - N areas: {len(idx)}
#     - Areas \w CV > {threshold}:
#         + N: {above_threshold_filter.sum()}
#         + Mean slices/area: {slices_per_area[above_threshold_filter].mean()}
#         + S.D. slices/area: {slices_per_area[above_threshold_filter].std()}
#     - Areas \w CV <= {threshold}:
#         + N: {(~above_threshold_filter).sum()}
#         + Mean slices/area: {slices_per_area[~above_threshold_filter].mean()}
#         + S.D. slices/area: {slices_per_area[~above_threshold_filter].std()}
# """)

In [None]:
def plot_cv_above_threshold(brains_CV, brains_name, marker_key, cv_threshold=1) -> go.Figure: 
    fig = go.Figure()
    for i,cv in enumerate(brains_CV):
        above_threshold_filter = cv > cv_threshold
        # Scatterplot (animals)
        fig.add_trace(go.Scatter(
                            mode = 'markers',
                            y = cv[above_threshold_filter],
                            x = [i]*above_threshold_filter.sum(),
                            text = cv.index[above_threshold_filter],
                            opacity=0.7,
                            marker=dict(
                                size=7,
                                line=dict(
                                    color='rgb(0,0,0)',
                                    width=1
                                )
                            ),
                            showlegend=False
                    )
        )

    fig.update_layout(
        title = f"Coefficient of variation of {marker_key} across brain slices > {cv_threshold}",
        
        xaxis = dict(
            tickmode = 'array',
            tickvals = np.arange(0,len(brains_name)),
            ticktext = brains_name
        ),
        yaxis=dict(
            title = "Brain regions' CV"
        ),
        width=700, height=500
    )
    return fig

In [None]:
def filter_selected_regions(regions_df: pd.DataFrame, AllenBrain: AllenBrainHierarchy) -> pd.DataFrame:
    selected_allen_regions = AllenBrain.get_selected_regions()
    selectable_regions = set(regions_df.index).intersection(set(selected_allen_regions))
    return regions_df.loc[list(selectable_regions)]  # .loc selects by region label for both Series and DataFrame

group_1_CVs = [variation_cell_density(slices, marker_key) for slices in group_1_slices]
AllenBrain.select_from_csv("./data/AllenSummaryStructures.csv")
group_1_CVs = [filter_selected_regions(brain, AllenBrain) for brain in group_1_CVs]

group_2_CVs = [variation_cell_density(slices, marker_key) for slices in group_2_slices]
AllenBrain.select_from_csv("./data/AllenSummaryStructures.csv")
group_2_CVs = [filter_selected_regions(brain, AllenBrain) for brain in group_2_CVs]

cv_threshold = 1
print("N regions above threshold:", sum([(animal_cv > cv_threshold).sum() for animal_cv in group_1_CVs+group_2_CVs]))
print("N regions below threshold:", sum([(animal_cv <= cv_threshold).sum() for animal_cv in group_1_CVs+group_2_CVs]))
plot_cv_above_threshold(group_1_CVs+group_2_CVs, group_1_dirs+group_2_dirs, marker_key, cv_threshold=cv_threshold).show()

In [None]:
r = 'IG'
group = group_2_slices
dirs = group_2_dirs
i_animal = 1
df = pd.concat([merge_hemispheres(slice) for slice in group[i_animal]])
slices_per_area = df.groupby(df.index).count().iloc[:,0]
print(f"""Summary for brain region '{r}' of {dirs[i_animal]}:
    - N slices: {slices_per_area[r]}
    - Coefficient of Variation: {variation_cell_density(group[i_animal], marker_key)[r]}""")
#    - Mean: {avg_cell_density(group[i_animal], marker_key)[r]:.2f} {marker_key}/mm²,
#    - S.D.: {std_cell_density(group[i_animal], marker_key)[r]:.2f} {marker_key}/mm²,

In [None]:
group_1_brains = [sum_cell_counts(cell_count_slices) for cell_count_slices in group_1_slices]
# NOTE: brains are being written WITH Left/Right discrimination
write_brains(data_output_path, group_1_dirs, group_1_brains)

group_2_brains = [sum_cell_counts(cell_count_slices) for cell_count_slices in group_2_slices]
write_brains(data_output_path, group_2_dirs, group_2_brains)

In [None]:
# fgh = group_1_brains[0].loc[['Left: root', 'Right: root']].sum()
# fgh.CFos / fgh.area

In [None]:
group_1_results = analyze(group_1_dirs, [merge_hemispheres(brain) for brain in group_1_brains], marker_key, AllenBrain)
group_2_results = analyze(group_2_dirs, [merge_hemispheres(brain) for brain in group_2_brains], marker_key, AllenBrain)

In [None]:
# Save results
save_results(group_1_results, data_output_path, f'results_cell_counts_{group_1_name}.csv')
save_results(group_2_results, data_output_path, f'results_cell_counts_{group_2_name}.csv')