# Analysis of Data Using PLS and Bootstrap: Group WILD without father exposed to scent marks

## Overview
This Python script analyzes brain-region activity with Partial Least Squares (PLS) analysis, comparing experimental groups on four region-level measures: cell counts, energy, density, and relative density. It loads the per-mouse measurements together with a database of region volumes, formats them for PLS analysis, and produces statistical output that identifies the regions contributing significantly to the differences between experimental conditions. The script also applies bootstrap testing to assess the robustness of the PLS results, and it plots the outcomes.
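As a point of reference, mean-centered task PLS can be sketched in a few lines of NumPy: average each region within each group, center the group means, and take the SVD. This is a generic sketch, not the implementation in `area_pls.py` (which is not shown here). The function name `task_pls`, the toy data, and the group labels are hypothetical, and real pipelines often standardize the data first.

```python
import numpy as np

def task_pls(X, groups):
    """Mean-centered task PLS on a subjects x regions matrix X.

    groups holds one label per row of X. Returns (saliences, singular_values,
    contrasts, labels): saliences is regions x latent variables, contrasts is
    groups x latent variables.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # one row per group: mean value of every region within that group
    M = np.vstack([X[groups == g].mean(axis=0) for g in labels])
    # subtract the grand mean of the group means so only group differences remain
    M_centered = M - M.mean(axis=0)
    # SVD: columns of U are group contrasts, rows of Vt are region saliences
    U, s, Vt = np.linalg.svd(M_centered, full_matrices=False)
    return Vt.T, s, U, labels

# toy data: 9 mice (3 per group) x 5 regions, random values
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 5))
groups = np.repeat(["control", "fam", "unfam"], 3)
saliences, s, contrasts, labels = task_pls(X, groups)
# fraction of the total squared singular values carried by each latent variable
explained = s**2 / np.sum(s**2)
```

With three groups, centering leaves at most two informative latent variables, so the last singular value is numerically zero.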

## Workflow Summary
1. **Load Volume Data**: The script loads and cleans a volume database of brain regions.
2. **Set Directories and Load Data**: The root directories and experimental details are defined, followed by loading the results from a precomputed dictionary containing region-specific measurements.
3. **Data Formatting for PLS**: The data is formatted for PLS analysis, focusing on the following:
   - Cell counts (`n_cells`)
   - Energy (`energy`)
   - Density (`density`)
   - Relative density (`relative_density`)

   These datasets are processed, cleaned, and saved as CSV files for later use.
   
4. **PLS Analysis**: The script runs PLS analysis with an external Python script (`area_pls.py`) on each dataset (cell counts, energy, density, relative density) and saves the salience and contrast results.
   
5. **Plot Results**: It visualizes the contrasts and saliences for each dataset across brain regions, using custom plotting functions.

6. **Identify Significant Areas**: The script identifies brain regions that show significant differences in the PLS analysis across the experimental groups. It highlights overlapping significant regions across different metrics (e.g., cell counts, energy).

7. **Compare Experimental Groups**: Some sections of the script restrict the comparison to two experimental groups, excluding the control group, to identify brain regions that differ significantly between familiar and unfamiliar stimuli.
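Restricting the analysis to two groups amounts to dropping the control mice before the PLS input is built. A minimal pandas sketch, assuming a table with one row per mouse and a hypothetical `group` column; the helper `keep_groups` and the labels `control`, `fam`, and `unfam` are placeholders, not the notebook's actual names.

```python
import pandas as pd

def keep_groups(data, groups_to_keep, group_col="group"):
    """Return only the rows (mice) whose label in group_col is in groups_to_keep."""
    mask = data[group_col].isin(groups_to_keep)
    return data.loc[mask].reset_index(drop=True)

# toy table: one row per mouse (hypothetical labels and values)
toy = pd.DataFrame({"group": ["control", "fam", "unfam", "fam", "control", "unfam"],
                    "ACA": [1.0, 2.0, 3.0, 2.5, 1.2, 3.1]})
fam_vs_unfam = keep_groups(toy, ["fam", "unfam"])
print(fam_vs_unfam["group"].tolist())  # ['fam', 'unfam', 'fam', 'unfam']
```

With only two groups, mean-centered task PLS yields a single informative latent variable, so the result reduces to one contrast (familiar vs unfamiliar) and one set of saliences.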

## Key Functions and Steps

- **`ace.clean_volumes_database()`**: Cleans and loads the brain volume database.
- **`upls.reformat_dict_acronym()`**: Reformats the dictionary of results, standardizing brain region acronyms.
- **`upls.format_data_pls()`**: Formats the data for PLS analysis based on the specified batch and data table (cell counts, energy, etc.).
- **PLS Execution**: Calls to the external `area_pls.py` script run PLS analysis on the prepared datasets.
- **`upls.plot_panel_contrasts()` and `upls.plot_panel_saliences()`**: Plot the contrast and salience data, respectively, for each dataset and experimental condition.
- **`upls.identify_pls_sig_areas()`**: Identifies brain regions that are statistically significant in the PLS analysis based on a threshold value (2.56).
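The bootstrap step and the 2.56 cutoff can be illustrated with a bootstrap-ratio sketch: resample mice with replacement within each group, recompute the saliences, and divide each original salience by its bootstrap standard deviation. This assumes the saliences file read by `upls.identify_pls_sig_areas()` holds bootstrap ratios, which the notebook does not state. The helpers `_pls_saliences` and `bootstrap_ratios` are hypothetical, and aligning bootstrap samples by a sign flip is a simplification (Procrustes rotation is a common alternative).

```python
import numpy as np

def _pls_saliences(X, groups, labels):
    # mean-centered group means -> SVD; keep the (n_groups - 1) informative LVs
    M = np.vstack([X[groups == g].mean(axis=0) for g in labels])
    _, _, Vt = np.linalg.svd(M - M.mean(axis=0), full_matrices=False)
    return Vt[: len(labels) - 1].T  # regions x latent variables

def bootstrap_ratios(X, groups, n_boot=500, seed=0):
    """Original saliences divided by their bootstrap standard deviation."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    V0 = _pls_saliences(X, groups, labels)
    members = [np.flatnonzero(groups == g) for g in labels]
    rng = np.random.default_rng(seed)
    boot = np.empty((n_boot,) + V0.shape)
    for b in range(n_boot):
        # resample mice with replacement inside each group, keeping group sizes
        idx = np.concatenate([rng.choice(m, size=m.size, replace=True) for m in members])
        Vb = _pls_saliences(X[idx], groups[idx], labels)
        # SVD signs are arbitrary: flip each bootstrap LV to agree with the original
        signs = np.where(np.sum(Vb * V0, axis=0) < 0, -1.0, 1.0)
        boot[b] = Vb * signs
    return V0 / boot.std(axis=0, ddof=1)

# toy data: region 0 is shifted upward in the "fam" and "unfam" groups
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 6))
X[4:, 0] += 5.0
groups = np.repeat(["control", "fam", "unfam"], 4)
bsr = bootstrap_ratios(X, groups)
# regions whose bootstrap ratio on the first latent variable exceeds the cutoff
significant = np.flatnonzero(np.abs(bsr[:, 0]) > 2.56)
```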

## Outputs
The script produces several outputs:
- CSV files with the formatted data for PLS analysis.
- CSV files containing PLS results for saliences and contrasts.
- Plots for each metric (cell counts, energy, density, relative density) displaying contrasts and saliences across brain regions.
- Lists of significant brain regions, highlighting overlaps between different experimental conditions and metrics.

## Requirements
The script requires several Python libraries, including:
- `numpy`, `pandas`, `matplotlib`, `seaborn` for data manipulation and visualization.
- Custom modules: `analyze_cells_energy`, `utils_PLS`, and `utils` for specific data processing and analysis tasks.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import analyze_cells_energy as ace
import re
import utils
import itertools
import seaborn as sns
import utils_PLS as upls

In [None]:
# load query file where we added volumes for each area
volumes = ace.clean_volumes_database()

In [None]:
root_directory = '/home/stella/Documents/Torino/projects/'
experiment = 'SexualImprinting'
experimental_group = 'WILD_ScentMarks_Exposure_wof'
batch = 'WILD_scent_marks_wof'
data_directory = root_directory + experiment + '/' + experimental_group + '/'
# precomputed per-mouse, per-region results for this batch
dict_results_across_mice = np.load('dict_results/newvolumes/dict_results_across_mice_' + batch + '.npy',
                                   allow_pickle=True).item()

In [None]:
dict_results_across_mice = upls.reformat_dict_acronym(dict_results=dict_results_across_mice, volumes=volumes)

# Format data for task PLS

In [None]:
# one PLS input table per metric, saved as ./results_pls/<batch>_<table>.csv
data_pls = {}
for table in ['n_cells', 'energy', 'density', 'relative_density']:
    data_pls[table] = upls.format_data_pls(dict_results=dict_results_across_mice,
                                           batch=batch, table=table)
    # drop every column that contains at least one missing value
    data_pls[table].dropna(inplace=True, axis=1)
    data_pls[table].to_csv('./results_pls/' + batch + '_' + table + '.csv')
data_ncells = data_pls['n_cells']
data_energy = data_pls['energy']
data_density = data_pls['density']
data_relative_density = data_pls['relative_density']
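`dropna(axis=1)` removes a column as soon as any row is missing, so, assuming rows are mice and columns are regions, a region missing in a single mouse is excluded for every group. A toy illustration with made-up acronyms and values:

```python
import numpy as np
import pandas as pd

# toy table: rows are mice, columns are region acronyms (values are made up)
toy = pd.DataFrame({"ACA": [10, 12, 9], "MOB": [5, np.nan, 7], "BLA": [3, 4, 6]},
                   index=["mouse1", "mouse2", "mouse3"])
cleaned = toy.dropna(axis=1)            # removes every column with at least one NaN
print(list(cleaned.columns))            # ['ACA', 'BLA']
lenient = toy.dropna(axis=1, thresh=2)  # keeps columns with at least 2 non-NaN values
print(list(lenient.columns))            # ['ACA', 'MOB', 'BLA']
```

The `thresh` option would keep more regions, but it leaves NaNs in the table that would have to be handled before PLS, which expects a complete matrix.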

# Format data for hierarchy plotting

In [None]:
df_levels = upls.create_df_levels(volumes)

# Apply task PLS

In [None]:
%%bash
python area_pls.py -i results_pls/WILD_scent_marks_wof_n_cells.csv -o './results_pls/WILD_scent_marks_wof_ncells'

In [None]:
%%bash
python area_pls.py -i results_pls/WILD_scent_marks_wof_energy.csv -o './results_pls/WILD_scent_marks_wof_energy'

In [None]:
%%bash
python area_pls.py -i results_pls/WILD_scent_marks_wof_density.csv -o './results_pls/WILD_scent_marks_wof_density'

In [None]:
%%bash
python area_pls.py -i results_pls/WILD_scent_marks_wof_relative_density.csv -o './results_pls/WILD_scent_marks_wof_relative_density'
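The four `%%bash` cells differ only in the table name, so they can also be driven from Python. A sketch that reuses only the `-i` and `-o` flags shown in those cells; `pls_commands` and `run_all` are hypothetical helpers, and the `n_cells` to `ncells` renaming mirrors the output prefix used above and in the plotting cells. It assumes `area_pls.py` sits in the working directory.

```python
import subprocess
import sys

# input table name -> output prefix used by the cells above (n_cells becomes ncells)
OUTPUT_NAMES = {"n_cells": "ncells", "energy": "energy",
                "density": "density", "relative_density": "relative_density"}

def pls_commands(batch, results_dir="results_pls"):
    """Build one area_pls.py command per table, using only the -i/-o flags."""
    return [[sys.executable, "area_pls.py",
             "-i", f"{results_dir}/{batch}_{table}.csv",
             "-o", f"./{results_dir}/{batch}_{out}"]
            for table, out in OUTPUT_NAMES.items()]

def run_all(batch):
    for cmd in pls_commands(batch):
        # check=True raises CalledProcessError at the first failing run
        subprocess.run(cmd, check=True)

# usage: run_all("WILD_scent_marks_wof")
```

Using `sys.executable` runs `area_pls.py` with the notebook kernel's interpreter rather than whichever `python` comes first on the `PATH`.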

# Plot results

In [None]:
upls.plot_panel_contrasts(batch=batch, variable='ncells')
upls.plot_panel_saliences(batch=batch, variable='ncells', df_levels=df_levels)

# PLS energy

In [None]:
upls.plot_panel_contrasts(batch=batch, variable='energy')
upls.plot_panel_saliences(batch=batch, variable='energy', df_levels=df_levels)

# PLS density

In [None]:
upls.plot_panel_contrasts(batch=batch, variable='density')
upls.plot_panel_saliences(batch=batch, variable='density', df_levels=df_levels)

# PLS relative density

In [None]:
upls.plot_panel_contrasts(batch=batch, variable='relative_density')
upls.plot_panel_saliences(batch=batch, variable='relative_density', df_levels=df_levels)

# Identify area overlap

In [None]:
# significant areas for each metric
overlap = {'ncells': [], 'energy': [], 'density': [], 'relative_density': []}
for variable in overlap.keys():
    saliences = pd.read_csv('./results_pls/' + batch + '_' + variable + '_saliences.csv')
    overlap[variable] = set(upls.identify_pls_sig_areas(saliences=saliences,
                                                        threshold=2.56,
                                                        volumes=volumes))
# number of significant areas per metric
[len(overlap[key]) for key in overlap.keys()]
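The cell above counts significant areas per metric; the overlap between metrics is a set intersection. A sketch with a hypothetical helper `shared_areas` and toy sets whose keys match `overlap` (the area acronyms are placeholders, not results):

```python
from itertools import combinations

def shared_areas(overlap):
    """Return areas significant for every metric, and the overlap for each pair of metrics."""
    common = set.intersection(*(set(v) for v in overlap.values()))
    pairwise = {(a, b): set(overlap[a]) & set(overlap[b])
                for a, b in combinations(overlap, 2)}
    return common, pairwise

# toy input with the same keys as `overlap`; acronyms are placeholders
toy_overlap = {"ncells": {"ACA", "BLA"}, "energy": {"BLA", "MOB"},
               "density": {"BLA"}, "relative_density": {"ACA", "BLA"}}
common, pairwise = shared_areas(toy_overlap)
print(common)                                           # {'BLA'}
print(sorted(pairwise[("ncells", "relative_density")]))  # ['ACA', 'BLA']
```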

In [None]:
# significant areas for relative density, with all groups included
variable = 'relative_density'
saliences = pd.read_csv('./results_pls/' + batch + '_' + variable + '_saliences.csv')
control_vs_fam_vs_unfam = upls.identify_pls_sig_areas(saliences=saliences,
                                                      threshold=2.56,
                                                      volumes=volumes)
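If the saliences are bootstrap ratios treated as approximate z-scores (an assumption; the notebook does not say so), the 2.56 threshold corresponds to a two-tailed p of about 0.0105, close to the conventional 0.01 cutoff at z of about 2.576. A quick check with the standard library (`two_tailed_p` is a hypothetical helper):

```python
import math

def two_tailed_p(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_tailed_p(2.56), 4))   # 0.0105
print(round(two_tailed_p(2.576), 4))  # 0.01
```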