# ABBA cell count analysis

This notebook is the last step in the ABBA whole-brain cell counting analysis.  
It assumes you have done the following steps:
- Alignment of brain slices in ABBA, exported to a QuPath project.
- Detection of cells of interest in QuPath. The detections should be exported to ```.csv``` files (one per slice) in a folder called ```results```. 
- If there are regions to exclude, you should have drawn them and exported them to ```.txt``` files (one per slice) in a folder called ```regions_to_exclude```.
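
Before running the notebook, the folder layout described above can be checked with a small helper. This is only a sketch: `check_animal_folder` is a hypothetical name that is not part of the helper modules, and it assumes that `results` and `regions_to_exclude` sit directly inside each animal folder.

```python
from pathlib import Path

def check_animal_folder(animal_dir):
    """Hypothetical helper: count the per-slice files found in one animal folder."""
    animal_dir = Path(animal_dir)
    results_dir = animal_dir / "results"
    exclude_dir = animal_dir / "regions_to_exclude"
    return {
        # per-slice detection files exported from QuPath
        "n_results_csv": len(list(results_dir.glob("*.csv"))) if results_dir.is_dir() else 0,
        # per-slice exclusion files (the folder is optional)
        "n_exclusion_txt": len(list(exclude_dir.glob("*.txt"))) if exclude_dir.is_dir() else 0,
    }
```

A folder that reports zero `.csv` files is worth inspecting before the loading step.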

Run this notebook to load the cell counts and do analysis on them. 

## Before we start ...
Most of the functions and classes we need are written in 3 files: ```brain_hierarchy.py```, ```readCSV_helpers.py``` and ```pls_helpers.py```. We now import the necessary functions and classes from these Python files into this notebook, so that we can use them later:

In [1]:
from brain_hierarchy import AllenBrainHierarchy
from readCSV_helpers import * #collect_and_analyze_cell_counts, save_results
from pls_helpers import PLS

And we'll need other python functions to easily read and manipulate data and make nice plots:

In [2]:
import pandas as pd
# import copy
# import json
import numpy as np
import os

# import matplotlib.pyplot as plt
import plotly.express as px
# import seaborn as sns
# import pickle
import plotly.graph_objects as go
import itertools

## The Allen Brain Atlas

We start by importing the mouse Allen Brain Atlas, in which we find information about all brain regions (their parent region and children regions in the brain hierarchy, for example).

In [3]:
path_to_allen_json = "./data/AllenMouseBrainOntology.json"
AllenBrain = AllenBrainHierarchy(path_to_allen_json) 

edges = AllenBrain.edges_dict
tree = AllenBrain.tree_dict
brain_region_dict = AllenBrain.brain_region_dict
regions = list(brain_region_dict.keys())

We now have access to useful information about all brain regions. Below, we show the first three of them:

In [4]:
AllenBrain.df.head(3)

id,atlas_id,ontology_id,acronym,region_name,color_hex_triplet,graph_order,st_level,hemisphere_id,parent_structure_id,children,id,distance_from_root
997,-1.0,1,root,root,FFFFFF,0,0,3,,"[{'id': 8, 'atlas_id': 0, 'ontology_id': 1, 'a...",997,0
8,0.0,1,grey,Basic cell groups and regions,BFDAE3,1,1,3,997.0,"[{'id': 567, 'atlas_id': 70, 'ontology_id': 1,...",8,1
1009,691.0,1,fiber tracts,fiber tracts,CCCCCC,1101,1,3,997.0,"[{'id': 967, 'atlas_id': 686, 'ontology_id': 1...",1009,1


We can also visualize the hierarchy of brain regions as a network (a tree). **Note that running the cell below may take a few minutes**.

In [5]:
# Plot brain region hierarchy
# If you want to plot it, install PyDot (pydot)
# fig = AllenBrain.plot_plotly_graph()
# fig.show()

Based on the graph above, you might want to specify the regions on which you want to do further analysis:  
*Note: to see more information about the regions, hover over them with your mouse.*

- Specify a level. Analysis can only be done on one level (slice) of the brain region hierarchy.

- To exclude brain regions that belong to a certain branch, add the *abbreviated* names of the nodes at the beginning of those branches to the list below.  
Example:  
```branches_to_exclude = ['retina', 'VS']```  
means that **all the subregions that belong to the retina and the ventricular systems** are excluded from the analysis.
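
To make these two choices concrete, here is a toy sketch of how a level and a list of excluded branches could select regions. It does not reproduce `AllenBrainHierarchy.list_regions_to_analyze`: the tiny `toy_children` tree and the `regions_at_level` function are made up for illustration, and the real method may define levels differently.

```python
# Made-up hierarchy: each acronym maps to the acronyms of its children.
toy_children = {
    "root": ["grey", "VS"],
    "grey": ["CH", "BS"],
    "VS": ["VL"],
    "CH": [], "BS": [], "VL": [],
}

def regions_at_level(children, root, level, branches_to_exclude=()):
    """Return the acronyms found `level` steps below `root`, skipping excluded branches."""
    selected = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node in branches_to_exclude:
            continue  # drops the node and everything below it
        if depth == level:
            selected.append(node)
            continue
        stack.extend((child, depth + 1) for child in children[node])
    return sorted(selected)

print(regions_at_level(toy_children, "root", 2, ["VS"]))  # ['BS', 'CH']
```

Excluding `'VS'` removes `'VL'` as well, because the whole branch below an excluded node is skipped.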

In [6]:
level = 6
branches_to_exclude = ['retina','VS','grv','fiber tracts']

Now, get the selected regions as a variable:

In [7]:
selected_regions = AllenBrain.list_regions_to_analyze(level, branches_to_exclude)
print(f'You selected {len(selected_regions)} regions to analyze.')

You selected 288 regions to analyze.


## Load data

Now, we're ready to read the ```.csv``` files with the cell counts, and also the exclusion files (if there were regions to exclude).  
Below, you have to specify:
- ```animals_root```: Absolute path to the folder that contains the animal folders.
- ```group_1_dirs```: A list of names of the folders corresponding to animals in **Group 1** (e.g., Control group). The results of each animal must be stored in a separate folder.
- ```group_2_dirs```: A list of names of the folders corresponding to animals in **Group 2** (e.g., Stress group).
- ```group_1_name```: A meaningful string for Group 1.
- ```group_2_name```: A meaningful string for Group 2.
- ```area_key```: The name of the column in the ```.csv``` files that holds the size of a brain area.
- ```tracer_key```: The name of the column in the ```.csv``` files that holds the count for the tracer used to highlight the marker.
- ```marker_key```: The name of the marker we would like to highlight (e.g. CFos).

TODO: try modifying this to obtain densities in mm^2 (converted from µm^2).
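
Converting an area from µm^2 to mm^2 is a fixed factor: 1 mm = 1000 µm, so 1 mm^2 = 10^6 µm^2. A minimal worked example with made-up numbers (`count` and `area_um2` are illustrative values, not taken from the dataset):

```python
UM2_PER_MM2 = 1_000_000  # (1000 um per mm) squared

count = 50            # illustrative number of detected cells
area_um2 = 250_000.0  # illustrative region area in um^2

area_mm2 = area_um2 / UM2_PER_MM2   # 0.25 mm^2
density_per_mm2 = count / area_mm2  # 200 cells per mm^2
print(area_mm2, density_per_mm2)    # 0.25 200.0
```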

In [8]:
# ####################################### SET PARAMETERS ####################################


animals_root = './data/QuPath_output/'
group_1_dirs = ['Control_17C', 'Control_18C', 'Control_19C']
group_1_name = 'Control'
group_2_dirs = ['Stress_5S', 'Stress_8S', 'Stress_10S', 'Stress_13S', 'Resilient_1R', 'Resilient_2R', 'Resilient_3R', 'Resilient_4R', 'Resilient_11R']
group_2_name = 'Stress'
area_key='Area um^2' # 'DAPI: DAPI area um^2'
tracer_key='Num AF647'
marker_key='CFos'

data_output_path = './data/python_output/'
plots_output_path = './plots/python_output/'


# ###########################################################################################


# create the output folders if they do not exist yet
os.makedirs(data_output_path, exist_ok=True)
os.makedirs(plots_output_path, exist_ok=True)

Now, we load the Control and Stress results separately in two pandas dataframes, and save the results.

**Note**: regions to exclude are automatically excluded.

In [9]:
def read_group_slices(animal_root: str, animal_dirs: list[str]) -> list[list[pd.DataFrame]]:
    animals_slices_paths = [os.path.join(animal_root, animal, 'results') for animal in animal_dirs]
    animals_excluded_regions = [list_regions_to_exclude(os.path.join(animal_root, animal)) for animal in animal_dirs]
    # load_cell_counts() -> list[pd.DataFrame], set[str], list[dict[str, str]]
    return [load_cell_counts(input_path, excluded_regions, AllenBrain, area_key, tracer_key, marker_key)[0] for (input_path, excluded_regions) in zip(animals_slices_paths, animals_excluded_regions)]

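def area_µm2_to_mm2(group) -> None:
    # convert the 'area' column of every slice in place: 1 mm^2 = 1e6 µm^2
    for slices in group:
        for brain_slice in slices:  # avoid shadowing the built-in `slice`
            brain_slice['area'] = brain_slice['area'] * 1e-06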

# for each brain region, aggregate marker counts from all the animal's slices into one value.
# methods:
# - sum
# - avg & std of marker/area ratio (density)

def sum_cell_counts(slices: list[pd.DataFrame]) -> pd.DataFrame:# methods: Callable[[int, int], int]):
    slices_df = pd.concat(slices)
    slices_df = slices_df.groupby(slices_df.index).sum()  # `axis=0` is the default and is deprecated in recent pandas
    return slices_df

def animal_cell_density(slices: list[pd.DataFrame], marker_key: str) -> pd.Series:
    slices_marker_densities = [slice[marker_key] / slice['area'] for slice in slices]
    return pd.concat(slices_marker_densities)

def avg_cell_density(slices: list[pd.DataFrame], marker_key: str) -> pd.Series:
    marker_densities = animal_cell_density(slices, marker_key)
    avg_marker_densities = marker_densities.groupby(marker_densities.index).mean()
    return avg_marker_densities

def std_cell_density(slices: list[pd.DataFrame], marker_key: str) -> pd.Series:
    marker_densities = animal_cell_density(slices, marker_key)
    std_marker_densities = marker_densities.groupby(marker_densities.index).std()
    return std_marker_densities[~std_marker_densities.isnull()] # remove NaN: when there is only one slice, the std can't be computed and instead outputs NaN

# a: confidence threshold
# TODO: determine k parameter (https://en.wikipedia.org/wiki/Normal_distribution#Confidence_intervals)
# NOTE: avg not used atm
def check_density_distribution(animals_slices: list[list[pd.DataFrame]], animal_names: list[str], a=0.001) -> None:
    animals_avg_density = [avg_cell_density(brain_slices, marker_key) for brain_slices in animals_slices]
    animals_std_density = [std_cell_density(brain_slices, marker_key) for brain_slices in animals_slices]
    for i in range(len(animal_names)):
        animal = animal_names[i]
        region_in_confidence_interval = animals_std_density[i] < a
        print(f"Animal {animal}: out of {len(region_in_confidence_interval)} brain regions, {(~region_in_confidence_interval).sum()} are outside the confidence interval (a={a})")
        # print(region_in_confidence_interval.index[~region_in_confidence_interval])

def write_brains(root_output_path: str, animal_names: list[str], animal_brains: list[pd.DataFrame]) -> None:
    assert len(animal_names) == len(animal_brains),\
        f"The number of animals read and analysed ({len(animal_brains)}) differs from the number of animals in the input group ({len(animal_names)})"
    for i in range(len(animal_names)):
        brain = animal_brains[i]
        name = animal_names[i]
        output_path = os.path.join(root_output_path, animal_names[i])
        os.makedirs(output_path, exist_ok=True)
        output_path = os.path.join(output_path, name+'_summed.csv')
        brain.to_csv(output_path, sep='\t', mode='w')
        print(f'Raw summed cell counts are saved to {output_path}')

def analyze(animal_names: list[str], animal_brains: list[pd.DataFrame], marker_key: str, AllenBrain: AllenBrainHierarchy) -> pd.DataFrame:
    brain = pd.concat({animal: normalize_cell_counts(brain, marker_key) for animal,brain in zip(animal_names, animal_brains)})
    brain = pd.concat({marker_key: brain}, axis=1)
    brain = brain.reorder_levels([1,0], axis=0)
    ordered_indices = itertools.product(AllenBrain.brain_region_dict.keys(), animal_names)
    brain = brain.reindex(ordered_indices, fill_value=np.nan)
    return brain
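
The aggregation helpers above can be tried on toy data before running them on real slices. The sketch below uses two made-up slices that share one region, and assumes (as the functions above do) that each slice is a DataFrame indexed by region with an `area` column and a marker column.

```python
import pandas as pd

# Two made-up slices; 'CA1' appears in both, 'LA' only in the first.
slice_a = pd.DataFrame({"area": [2.0, 1.0], "CFos": [10, 4]}, index=["CA1", "LA"])
slice_b = pd.DataFrame({"area": [4.0], "CFos": [40]}, index=["CA1"])
slices = [slice_a, slice_b]

# Sum of counts and areas per region (the idea behind sum_cell_counts).
summed = pd.concat(slices).groupby(level=0).sum()

# Per-slice densities, then mean and standard deviation per region.
densities = pd.concat([s["CFos"] / s["area"] for s in slices])
mean_density = densities.groupby(level=0).mean()
std_density = densities.groupby(level=0).std().dropna()  # NaN when a region has a single slice

print(summed.loc["CA1", "CFos"])  # 50
print(mean_density["CA1"])        # 7.5
```

Note that the mean of the per-slice densities (7.5) differs from the density of the summed counts (50 / 6 ≈ 8.33), so the two aggregation methods are not interchangeable.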

In [10]:
# TODO: change to group_1_slices / group_2_slices
# NOTE: *_total_cell_counts still discriminate Right from Left hemisphere. sort_hemispheres() to sum them.
group_1_slices = read_group_slices(animals_root, group_1_dirs)
area_μm2_to_mm2(group_1_slices)
print(f'Imported all brain slices from {str(len(group_1_slices))} animals of {group_1_name} group.')
check_density_distribution(group_1_slices, group_1_dirs)

Imported all brain slices from 3 animals of Control group.
Animal Control_17C: out of 1469 brain regions, 1452 are outside the confidence interval (a=0.001)
Animal Control_18C: out of 1266 brain regions, 1170 are outside the confidence interval (a=0.001)
Animal Control_19C: out of 1310 brain regions, 1197 are outside the confidence interval (a=0.001)


In [11]:
group_1_brains = [sum_cell_counts(cell_count_slices) for cell_count_slices in group_1_slices]
# NOTE: brains are being written WITH Left/Right discrimination
write_brains(data_output_path, group_1_dirs, group_1_brains)

Raw summed cell counts are saved to ./data/python_output/Control_17C/Control_17C_summed.csv
Raw summed cell counts are saved to ./data/python_output/Control_18C/Control_18C_summed.csv
Raw summed cell counts are saved to ./data/python_output/Control_19C/Control_19C_summed.csv


In [12]:
group_2_slices = read_group_slices(animals_root, group_2_dirs)
area_μm2_to_mm2(group_2_slices)
print(f'\nImported all brain slices from {str(len(group_2_slices))} animals of {group_2_name} group.')
check_density_distribution(group_2_slices, group_2_dirs)


Imported all brain slices from 9 animals of Stress group.
Animal Stress_5S: out of 1188 brain regions, 1115 are outside the confidence interval (a=0.001)
Animal Stress_8S: out of 1575 brain regions, 1503 are outside the confidence interval (a=0.001)
Animal Stress_10S: out of 1336 brain regions, 1321 are outside the confidence interval (a=0.001)
Animal Stress_13S: out of 1187 brain regions, 1048 are outside the confidence interval (a=0.001)
Animal Resilient_1R: out of 1479 brain regions, 1435 are outside the confidence interval (a=0.001)
Animal Resilient_2R: out of 1522 brain regions, 1460 are outside the confidence interval (a=0.001)
Animal Resilient_3R: out of 1372 brain regions, 1265 are outside the confidence interval (a=0.001)
Animal Resilient_4R: out of 1451 brain regions, 1416 are outside the confidence interval (a=0.001)
Animal Resilient_11R: out of 1407 brain regions, 1360 are outside the confidence interval (a=0.001)


In [13]:
group_2_brains = [sum_cell_counts(cell_count_slices) for cell_count_slices in group_2_slices]
write_brains(data_output_path, group_2_dirs, group_2_brains)

Raw summed cell counts are saved to ./data/python_output/Stress_5S/Stress_5S_summed.csv
Raw summed cell counts are saved to ./data/python_output/Stress_8S/Stress_8S_summed.csv
Raw summed cell counts are saved to ./data/python_output/Stress_10S/Stress_10S_summed.csv
Raw summed cell counts are saved to ./data/python_output/Stress_13S/Stress_13S_summed.csv
Raw summed cell counts are saved to ./data/python_output/Resilient_1R/Resilient_1R_summed.csv
Raw summed cell counts are saved to ./data/python_output/Resilient_2R/Resilient_2R_summed.csv
Raw summed cell counts are saved to ./data/python_output/Resilient_3R/Resilient_3R_summed.csv
Raw summed cell counts are saved to ./data/python_output/Resilient_4R/Resilient_4R_summed.csv
Raw summed cell counts are saved to ./data/python_output/Resilient_11R/Resilient_11R_summed.csv


In [14]:
# fgh = group_1_brains[0].loc[['Left: root', 'Right: root']].sum()
# fgh.CFos / fgh.area

In [15]:
group_1_results = analyze(group_1_dirs, group_1_brains, marker_key, AllenBrain)
group_2_results = analyze(group_2_dirs, group_2_brains, marker_key, AllenBrain)

In [16]:

# Save results
save_results(group_1_results, data_output_path, f'results_cell_counts_{group_1_name}.csv')
save_results(group_2_results, data_output_path, f'results_cell_counts_{group_2_name}.csv')


! A results_python folder already existed in root. I am overwriting previous results!

Results are saved in ./data/python_output/

Done!

! A results_python folder already existed in root. I am overwriting previous results!

Results are saved in ./data/python_output/

Done!


True

The data are now stored in ```group_1_results``` and ```group_2_results```.

# Partial Least Squares  

The analysis done below is taken from the tutorial written by [Krishnan et al.](https://www.sciencedirect.com/science/article/pii/S1053811910010074).  
Run the 2 cells below to get started.

In [17]:
# PLS
animal_list = group_2_dirs + group_1_dirs
normalization = 'Density'   # Normalize on Density rather than Percentage
rank = 1

# Create a PLS object
# TODO: see what happens if analyze() does not include the NaN rows
cfosPLS = PLS(group_1_results, group_2_results, group_1_dirs, group_2_dirs, selected_regions, 'CFos', normalization)

# Show the matrix X
cfosPLS.X

,CLA,LA,PA,MEV,SCO,CUN,VTN,PPN,MO,SS,...,CS,LDT,PRNr,RPO,CN,NTB,RM,CENT3,"CUL4, 5",ANcr1
Control_17C,395.448533,294.466332,351.548048,255.482971,133.169247,172.258244,61.48988,81.499616,372.02762,744.84778,...,139.833447,293.427463,98.920689,245.715652,249.98934,640.907383,445.429518,366.545751,260.415203,567.933443
Control_18C,31.023108,45.436144,27.700546,63.640862,19.948905,28.555457,11.221561,25.773282,26.634032,58.523083,...,30.798738,54.079084,37.114195,9.380548,8.577175,16.621015,9.604191,21.789653,14.014456,0.0
Control_19C,29.899975,68.725095,23.632022,117.930471,147.416091,143.723712,0.0,185.351458,169.720883,216.331126,...,327.190503,295.792306,184.169286,420.132227,209.249782,213.91457,258.007337,314.985859,284.749293,127.955908
Stress_5S,242.563113,130.919413,5.621496,129.395713,0.0,58.639078,35.728112,65.32752,101.040393,133.046604,...,66.235131,133.807827,71.566753,63.433874,62.669816,378.176764,169.865483,34.961775,45.810527,30.795556
Stress_8S,528.300528,259.073975,315.136646,336.553685,520.337851,241.555347,478.434936,326.34997,622.334564,846.17248,...,292.400041,347.440934,289.988104,262.361645,307.888525,410.88537,356.448668,298.997657,250.916688,345.982519
Stress_10S,274.422365,226.602931,206.914564,580.199194,35.128019,295.095712,90.644471,282.302569,468.575452,700.13575,...,203.751887,393.280342,98.799978,282.749907,220.57283,0.0,144.616642,497.654559,359.42926,234.168866
Stress_13S,279.261334,51.868285,1.253955,221.364705,944.187919,75.093095,193.491458,97.50249,82.884552,194.448994,...,99.265626,334.928609,110.252029,113.189566,169.469382,272.797451,117.496517,497.237838,271.582713,157.175788
Resilient_1R,185.900003,116.631952,101.355517,338.634331,469.932585,116.687337,110.210072,89.373651,180.287852,334.144393,...,84.201465,165.60612,63.989727,103.276598,175.381945,70.321581,76.916474,150.91749,165.701284,229.130717
Resilient_2R,118.590286,167.916156,186.991045,330.397935,223.933071,152.574947,34.311107,123.693781,48.427927,111.047727,...,145.916286,132.695773,88.760858,71.720499,46.132345,20.702726,58.824829,44.829133,68.434812,104.724735
Resilient_3R,47.581731,67.976036,51.369273,0.0,161.762158,60.299886,31.455753,110.559238,28.525546,79.243296,...,88.576995,183.171618,94.186895,406.166286,9.57736,41.541645,116.281059,0.0,26.113479,24.300069


In [18]:
# Show the matrix Y
pd.get_dummies(cfosPLS.y).rename(columns={0: group_2_name, 1: group_1_name})

,Stress,Control
Control_17C,0,1
Control_18C,0,1
Control_19C,0,1
Stress_5S,1,0
Stress_8S,1,0
Stress_10S,1,0
Stress_13S,1,0
Resilient_1R,1,0
Resilient_2R,1,0
Resilient_3R,1,0


The two matrices printed above (X and Y) illustrate the data on which the PLS is done.  
- ```X:``` The rows in this matrix are the mice. The columns in the matrix are the regions selected for analysis. The values in the matrix are the **normalized counts of marked cells in that region, relative to the whole brain.** 
The available normalization methods are:
  + Density
  + Percentage (on the total number of detected marked cells outside of excluded regions)
  + RelativeDensity
- ```Y:``` The rows in this matrix are the mice. The columns in the matrix are the 2 groups. **A value in this matrix is 1 if the mouse belongs to the specified group, and 0 otherwise**.
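
As a rough illustration of how the three normalizations differ, the sketch below computes them for made-up counts and areas. The formulas follow the plot titles used at the end of this notebook (`# / area`, `# / brain`, and density divided by whole-brain density). The actual `normalize_cell_counts` implementation in `readCSV_helpers.py` may differ in details, for example in how the whole-brain totals are defined.

```python
import pandas as pd

# Made-up summed counts and areas (mm^2) for three regions of one animal.
brain = pd.DataFrame({"CFos": [30, 10, 60], "area": [1.0, 0.5, 2.5]},
                     index=["HIP", "LA", "MO"])

total_cells = brain["CFos"].sum()  # 100
total_area = brain["area"].sum()   # 4.0

density = brain["CFos"] / brain["area"]                  # cells per mm^2
percentage = brain["CFos"] / total_cells                 # fraction of all marked cells
relative_density = density / (total_cells / total_area)  # region density / whole-brain density

print(pd.DataFrame({"Density": density,
                    "Percentage": percentage,
                    "RelativeDensity": relative_density}))
```

In this toy example the whole-brain density is 25 cells per mm^2, so a region with a relative density above 1 (here `HIP`, at 30 cells per mm^2) is denser than the brain as a whole.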

In brief, PLS analyzes the relationship (correlation) between the columns of ```X``` and ```Y```. In our specific case, there will be 2 important outputs:
- **Salience scores**: Each brain region has a salience score. A high salience score means that the brain region explains much of the correlation between ```X``` and ```Y```.  
- **Singular values**: These are the singular values of the correlation matrix $R = Y^TX$, obtained from its singular value decomposition $R = U \Delta V^T$. They are the square roots of the eigenvalues of $R^TR$.
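
A minimal NumPy sketch of how saliences and singular values can be obtained from $R = Y^TX$, assuming centred and normalized columns as in Krishnan et al. It is not the code in `pls_helpers.py`; the random data and all names (`normalize_columns`, `groups`, `saliences`) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_regions = 12, 5
X = rng.normal(size=(n_animals, n_regions))  # illustrative data, not real counts
groups = np.array([0] * 9 + [1] * 3)         # 9 animals in one group, 3 in the other
Y = np.eye(2)[groups]                        # one-hot group membership

def normalize_columns(M):
    """Centre each column and scale it to unit norm."""
    M = M - M.mean(axis=0)
    return M / np.linalg.norm(M, axis=0)

R = normalize_columns(Y).T @ normalize_columns(X)  # 2 x n_regions correlation matrix
U, singular_values, Vt = np.linalg.svd(R, full_matrices=False)

saliences = Vt.T  # column k holds the brain-region saliences of latent variable k
print(singular_values)
```

With only two groups, the centred columns of `Y` are mirror images of each other, so the second singular value is numerically zero and only the first latent variable carries information.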

## Random permutations to see whether we can differentiate signal from noise. 
Here, we randomly shuffle the group to which a mouse belongs, and calculate the singular values of the permuted dataset.  
From [Krishnan et al.](https://www.sciencedirect.com/science/article/pii/S1053811910010074):  
> The set of all the (permuted) singular values provides a sampling distribution of the singular values under the null hypothesis and, therefore can be used as a null hypothesis test.
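
The idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the implementation of `randomly_permute_singular_values`: the data are random, and `first_singular_value` is a hypothetical helper.

```python
import numpy as np

def first_singular_value(X, labels):
    """Largest singular value of R = Y^T X for centred, unit-norm columns."""
    Y = np.eye(labels.max() + 1)[labels]  # one-hot group membership
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc, axis=0)
    Yc = Yc / np.linalg.norm(Yc, axis=0)
    return np.linalg.svd(Yc.T @ Xc, compute_uv=False)[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))          # illustrative data without a real group effect
labels = np.array([0] * 9 + [1] * 3)

observed = first_singular_value(X, labels)
# Shuffle group membership to sample the first singular value under H0.
null = np.array([first_singular_value(X, rng.permutation(labels)) for _ in range(1000)])
p_value = (null >= observed).mean()   # share of permutations at least as extreme
print(p_value)
```

Shuffling the labels keeps the group sizes fixed and breaks any link between group and cell counts, which is what makes the permuted singular values a null distribution.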

*Note: running the cell below will take a few minutes.*

In [19]:
num_permutations = 5000
print(f'Randomly permuting singular values {num_permutations} times ...')
s,singular_values = cfosPLS.randomly_permute_singular_values(num_permutations)
print('Done!\n')

Randomly permuting singular values 5000 times ...
Done!



In [20]:
# TODO: move to Plotly

# Plot distribution of singular values
# plt.figure(figsize=(10,4))
# plt.hist(singular_values[:,0],bins=10)
# plt.axvline(cfosPLS.s[0], color='r')
# plt.xlabel('First singular value')
# plt.ylabel('Frequency')
# plt.legend([f'Experiment','Sampling distribution\nunder H0 (%d permutations)'%num_permutations])
# plt.show()

In [21]:
# Calculate p-value = Probability(experiment | H0)
p = (singular_values[:,0] > s[0]).sum() / num_permutations
print('p-value = '+str(p))

p-value = 0.5226


## Bootstrap to identify stable salience scores

Here, we use [bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) (= sampling of the mice in the dataset, with replacement) to get an estimate of which salience scores are stable.

From [Krishnan et al.](https://www.sciencedirect.com/science/article/pii/S1053811910010074):  
> When a vector of saliences is considered generalizable and is kept for further analysis, we need to identify its elements that are stable through resampling. In practice, the stability of an element is evaluated by dividing it by its standard error. [...] To estimate the standard errors, we create bootstrap samples which are obtained by sampling with replacement the observations in $X$ and $Y$ (Efron and Tibshirani, 1986). A salience standard error is then estimated as the standard error of the saliences from a large number of these bootstrap samples (say 1000 or 10000). **The ratios are akin to a Z-score, therefore when they are larger than 2 the corresponding saliences are considered significantly stable.**
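
A compact sketch of such bootstrap ratios on random data is given here. It is not the implementation of `bootstrap_salience_scores`: resampling within each group and aligning signs with a simple flip are assumptions made for this example, and `first_saliences` is a hypothetical helper.

```python
import numpy as np

def first_saliences(X, labels):
    """Region saliences: first right singular vector of R = Y^T X."""
    Y = np.eye(2)[labels]                 # one-hot group membership
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc, axis=0)  # centred, unit-norm columns
    Yc = Yc / np.linalg.norm(Yc, axis=0)
    _, _, Vt = np.linalg.svd(Yc.T @ Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))              # illustrative data, not real counts
labels = np.array([0] * 9 + [1] * 3)

reference = first_saliences(X, labels)
boot = []
for _ in range(500):
    # Resample animals with replacement, separately within each group (an assumption).
    idx = np.concatenate([rng.choice(np.flatnonzero(labels == g), size=(labels == g).sum())
                          for g in (0, 1)])
    v = first_saliences(X[idx], labels[idx])
    boot.append(v if v @ reference >= 0 else -v)  # singular vectors have an arbitrary sign

bootstrap_ratios = reference / np.std(boot, axis=0, ddof=1)
stable = np.abs(bootstrap_ratios) > 2     # threshold quoted above
print(np.round(bootstrap_ratios, 2), stable)
```

The sign flip matters: without it, bootstrap saliences that differ only by a global sign would inflate the standard error and hide stable regions.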

*Note: Running the cell below will take a few minutes.*

In [22]:
num_bootstrap = 5000
print(f'Bootstrapping salience scores {num_bootstrap} times...')
u_salience_scores,v_salience_scores = cfosPLS.bootstrap_salience_scores(rank,num_bootstrap)
print('Done!')

Bootstrapping salience scores 5000 times...
Done!


In [23]:
data_output_path

'./data/python_output/'

In [24]:
# Plot PLS salience scores
plot_threshold = 1.2 # Only brain regions with a salience higher than plot_threshold are shown. 2 is the significance threshold.

file_title = 'PLS_CFos' + '_' + normalization + '.png'

tp, salient_regions = cfosPLS.plot_salience_scores(plot_threshold, plots_output_path, file_title, brain_region_dict,
                              fig_width=1000, fig_height=2000)

##### salient_regions.reset_index()['index']

In [25]:
df = salient_regions.reset_index()
df.columns = ['region', 'salience']
df['salience'] = df['salience'].abs()
df = df.sort_values(by='salience')
df.to_csv('./data/R_results/salient_regions.csv', sep=';', index=False)
df

,region,salience
7,HIP,1.20218
38,SCsg,1.226886
48,TRN,1.259545
5,RSP,1.287647
9,BMAa,1.294284
13,AAA,1.308575
6,PTLp,1.355499
8,BLAa,1.366841
14,IA,1.401529
16,GPi,1.409025


In [26]:
pls_filename = 'PLS_CFos_' + normalization + '_salience_scores.csv'
save_results(v_salience_scores.rename(columns={0:'salience score'}), data_output_path, pls_filename)


! A results_python folder already existed in root. I am overwriting previous results!

Results are saved in ./data/python_output/

Done!


True

# Plot percentages

In [27]:
# In this case we wanted to normalize based on the Density, rather than the Percentage.
# The labels in the plot were not updated: the focus was on adapting the code to our dataset rather than polishing it.

tracer_to_plot = 'CFos'
normalization = 'Density' # 'Density','Percentage','RelativeDensity'
threshold = 1e-2 # Only plot bars with value larger than threshold (1e-6, 1e-2, 3)
y_axis_label = 'region_names' # change this to 'acronym' to have acronyms on the y-axis

# Calculate mean values
group_1_df = pd.DataFrame(group_1_results[(tracer_to_plot,normalization)].rename('cell counts'))
group_1_avg = group_1_df.reset_index().groupby('level_0').mean(numeric_only=True)
group_1_sem = group_1_df.reset_index().groupby('level_0').sem(numeric_only=True)

group_2_df = pd.DataFrame(group_2_results[(tracer_to_plot,normalization)].rename('cell counts'))
group_2_avg = group_2_df.reset_index().groupby('level_0').mean(numeric_only=True)
group_2_sem = group_2_df.reset_index().groupby('level_0').sem(numeric_only=True)

# Determine which regions to plot  
mean_sum = group_1_avg + group_2_avg
#regs_to_plot = mean_sum[(mean_sum['cell counts']>threshold) & (mean_sum['cell counts'].notnull())].sort_values(by='cell counts').index.to_list()
regs_to_plot = cfosPLS.X.columns.to_list()

# y-axis, with separate values for each region
y_axis_il, ticklabels = pd.factorize(group_1_df.loc[regs_to_plot].reset_index()['level_0'])
y_axis_bla, ticklabels = pd.factorize(group_2_df.loc[regs_to_plot].reset_index()['level_0'])
if(y_axis_label=='region_names'):
    ticklabels = [AllenBrain.brain_region_dict[reg] for reg in ticklabels]
     
fig = go.Figure()

# Barplot
fig.add_trace(go.Bar(
                     x = group_1_avg.loc[regs_to_plot]['cell counts'],
                     name = f'{group_1_name} mean',
                     error_x = dict(
                         type='data',
                         array=group_1_sem.loc[regs_to_plot]['cell counts']
                     )
              )
)

fig.add_trace(go.Bar(
                     x = group_2_avg.loc[regs_to_plot]['cell counts'],
                     name = f'{group_2_name} mean',
                     error_x = dict(
                         type='data',
                         array=group_2_sem.loc[regs_to_plot]['cell counts']
                     )
              )
)

fig.update_layout(barmode='group', colorway=['rgb(0,255,0)', 'rgb(255,0,0)'])

# Scatterplot (animals)
fig.add_trace(go.Scatter(
                    mode = 'markers',
                    y = y_axis_il - 0.2,
                    x = group_1_df.loc[regs_to_plot]['cell counts'],
                    name = f'{group_1_name} animals',
                    opacity=0.5,
                    marker=dict(
                        color='rgb(0,255,0)',
                        size=5,
                        line=dict(
                            color='rgb(0,0,0)',
                            width=1
                        )
                    )
              )
)

fig.add_trace(go.Scatter(
                    mode = 'markers',
                    y = y_axis_bla + 0.2,
                    x = group_2_df.loc[regs_to_plot]['cell counts'],
                    name = f'{group_2_name} animals',
                    opacity=0.5,
                    marker=dict(
                        color='rgb(255,0,0)',
                        size=5,
                        line=dict(
                            color='rgb(0,0,0)',
                            width=1
                        )
                    )
              )
)

# Figure title
title = ''
if normalization=='RelativeDensity':
    title = '[#'+tracer_to_plot+ ' / area] / ['+tracer_to_plot+'(brain) / area(brain)].'
elif normalization=='Density':
    title = '[#'+tracer_to_plot+ ' / area]'
elif normalization=='Percentage':
    title = '[#'+tracer_to_plot+ ' / brain]'

# Update layout
fig.update_layout(
    title = title,
    yaxis = dict(
        tickmode = 'array',
        tickvals = np.arange(0,len(regs_to_plot)),
        ticktext = ticklabels
    ),
    xaxis=dict(
        title = 'CFos density (relative to brain)'
    ),
    width=900, height=5000,
    hovermode="x unified",
    yaxis_range = [-1,len(regs_to_plot)+1]
)

fig.show()

# Save figure as PNG
file_title = 'barplot_' + tracer_to_plot + '_' + normalization + 'CvS.png'
fig.write_image(os.path.join(plots_output_path, file_title))