# Compute mean and std

This notebook computes the mean and standard deviation of the EEG-to-fMRI and MEG-to-fMRI distances for the POA and COG analyses. It uses the DataFrame generated either by the `generation_comparison_modalities_df` function in the `MEEG_fMRI_whole_compa_script.py` script or by the `Generation_df_comparison` notebook.

It mirrors the functionality of the `mean_std_computation_step` function in `MEEG_fMRI_whole_compa_script.py`.

The analysis covers the following combinations:
- **All Conditions Combined**: Data pooled across all conditions and time points.
- **Conditions Separate**: Each condition analyzed individually, with time points pooled.
- **TP Separate**: Each time point (TP) analyzed individually, with conditions pooled.
- **Both Separate**: Each combination of condition and time point analyzed individually.
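These four groupings amount to grouping the long-format table by progressively more keys. The notebook itself filters each subset in explicit loops; the sketch below shows an equivalent way to obtain the `Dist_norm` statistics with pandas `groupby`. The toy data, the condition labels `'A'`/`'B'`, and the helper `summarize` are hypothetical; only the column names are taken from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per condition / time point / modality.
# Column names follow those used in this notebook; the values are made up.
df = pd.DataFrame({
    'modality': ['eeg'] * 4 + ['meg'] * 4,
    'Info_condition': ['A', 'A', 'B', 'B'] * 2,
    'Info_tpindex': [0, 1, 0, 1] * 2,
    'Dist_norm': [10.0, 12.0, 8.0, np.nan, 5.0, 7.0, 6.0, 4.0],
})


def summarize(data, keys):
    """Mean and population std (ddof=0) of Dist_norm per group, skipping NaNs."""
    return (
        data.groupby(keys)['Dist_norm']
        .agg(dist_mean='mean', dist_std=lambda s: s.std(ddof=0))
        .round(2)
    )


all_combined = summarize(df, ['modality'])                                    # all conditions and time points
conditions_separate = summarize(df, ['modality', 'Info_condition'])           # one row per condition
tp_separate = summarize(df, ['modality', 'Info_tpindex'])                     # one row per time point
both_separate = summarize(df, ['modality', 'Info_condition', 'Info_tpindex']) # one row per pair
print(both_separate)
```

A group whose values are all missing (here `eeg`, condition `B`, time point 1) yields NaN rather than raising an error.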


## Import the necessary libraries

```python
import pandas as pd
from pathlib import Path
import numpy as np
import sys

# Personal Imports
# Add the directory that contains the utils package to sys.path
sys.path.append(str(Path('..').resolve()))
from utils.utils import flatten_columns, load_df_from_csv
from utils.df_utils import reshape_dataframe
```
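The column names used later (`Info_condition`, `Info_tpindex`, `Dist_x`) suggest that the comparison CSV has a two-level header that `flatten_columns` joins into single names. The helper below is a hypothetical stand-in written under that assumption, not the actual `utils.utils.flatten_columns`, whose behavior may differ.

```python
import pandas as pd


def flatten_columns_sketch(columns):
    """Hypothetical stand-in for utils.utils.flatten_columns.

    Assumption: MultiIndex levels are joined with '_' (empty levels skipped);
    a flat Index is returned unchanged as a list.
    """
    if isinstance(columns, pd.MultiIndex):
        return ['_'.join(str(level) for level in col if str(level) != '') for col in columns]
    return list(columns)


# Made-up two-level header, e.g. as read from a CSV with header=[0, 1]
cols = pd.MultiIndex.from_tuples([('Info', 'condition'), ('Info', 'tpindex'), ('Dist', 'x')])
print(flatten_columns_sketch(cols))  # ['Info_condition', 'Info_tpindex', 'Dist_x']
```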

## Define the Paths

Before running the notebooks, ensure that you update the paths in `config.py` to match your local setup:

- **`LOCAL_DIR`**: Set this to the directory where your BIDS-formatted data is stored.
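All input and output locations in this notebook are derived from `LOCAL_DIR`. A minimal sketch, assuming `config.py` only needs to define this one string (the path below is a placeholder), together with the files that the computation cell reads and writes for one analysis type:

```python
from pathlib import Path

# Sketch of config.py: placeholder path, replace it with the root of your BIDS dataset
LOCAL_DIR = '/path/to/bids_dataset'

# Paths built in the same way as in the computation cell of this notebook
analysis_type = 'COG'
df_outdir = Path(LOCAL_DIR) / 'derivatives' / 'results' / analysis_type
csv_path = df_outdir / f'all_subjects_analysis-{analysis_type}_modality_comparison.csv'
eeg_out_path = df_outdir / f'all_subjects_analysis-{analysis_type}_modality-eeg_stats-MeanStd_python.csv'

print(csv_path.as_posix())
print(eeg_out_path.as_posix())
```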

```python
from config import LOCAL_DIR

local_dir = LOCAL_DIR
```

## Compute Mean and Standard Deviation

The following cell calculates the mean and standard deviation of the distance metrics (`Dist_x`, `Dist_y`, `Dist_z`, and `Dist_norm`) for the EEG and MEG modalities, for both the Peak of Activation (POA) and Center of Gravity (COG) analyses.

For each analysis type, the cell reads the CSV file containing the modality comparisons, computes the statistics for several combinations of conditions and time points, and saves the results to a separate CSV file for each modality and analysis type.

The process includes:

1. **Define Paths**: Establish paths for loading input data and saving output results.
2. **Load and Prepare Data**: Load the specified CSV file, flatten the column names, and reshape the DataFrame for analysis.
3. **Initialize Results DataFrame**: Create an empty DataFrame to store computed mean and standard deviation values.
4. **Compute Statistics**:
    - Filter data by modality ('meg' or 'eeg').
    - Compute the mean and standard deviation pooled over all time points and conditions.
    - Compute them for each time point (conditions pooled), for each condition (time points pooled), and for each combination of condition and time point.
5. **Save Results**: Export the computed statistics to CSV files, one per modality and analysis type.
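Step 4 relies on NumPy's NaN-aware reductions. The short illustration below, on made-up values, shows that `np.nanmean` and `np.nanstd` skip missing distances and that `np.nanstd` defaults to `ddof=0`. The reported standard deviations are therefore population standard deviations, whereas pandas' `Series.std` defaults to the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd

# Made-up distances with one missing value
dist = pd.Series([4.0, 6.0, np.nan, 8.0])

mean = np.round(np.nanmean(dist), 2)     # NaN ignored: (4 + 6 + 8) / 3
std_pop = np.round(np.nanstd(dist), 2)   # ddof=0 (population), as used in this notebook
std_sample = np.round(dist.std(), 2)     # pandas default ddof=1 (sample)

print(mean, std_pop, std_sample)
```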


```python
# Columns of the results table: one row per (modality, condition, time point) subset
MEAN_STD_COLUMNS = [
    'modality', 'conditions', 'tp',
    'dist_x_mean', 'dist_x_std',
    'dist_y_mean', 'dist_y_std',
    'dist_z_mean', 'dist_z_std',
    'dist_mean', 'dist_std'
]


def fill_mean_std_df_(df, modality, tp, condition):
    """
    Adds a new row of summary statistics to the global `mean_std_df` DataFrame.

    For the given modality, time point (tp), and condition, this function computes the
    mean and standard deviation of each distance metric, ignoring NaN values, and rounds
    them to two decimals. `np.nanstd` uses ddof=0 (population standard deviation).

    Parameters
    ----------
    df : pd.DataFrame
        Subset of the data with columns 'Dist_x', 'Dist_y', 'Dist_z', and 'Dist_norm'.

    modality : str
        The modality of the data ('eeg' or 'meg').

    tp : int or str
        The time point index (0, 1, or 2), or 'all' when time points are pooled.

    condition : str
        The experimental condition, or 'all' when conditions are pooled.

    Returns
    -------
    None
        The function updates the global `mean_std_df` DataFrame by appending a new row.
    """
    global mean_std_df

    # Calculate statistics
    stats = {
        'modality': modality,
        'conditions': condition,
        'tp': tp,
        'dist_x_mean': np.round(np.nanmean(df['Dist_x']), 2),
        'dist_x_std': np.round(np.nanstd(df['Dist_x']), 2),
        'dist_y_mean': np.round(np.nanmean(df['Dist_y']), 2),
        'dist_y_std': np.round(np.nanstd(df['Dist_y']), 2),
        'dist_z_mean': np.round(np.nanmean(df['Dist_z']), 2),
        'dist_z_std': np.round(np.nanstd(df['Dist_z']), 2),
        'dist_mean': np.round(np.nanmean(df['Dist_norm']), 2),
        'dist_std': np.round(np.nanstd(df['Dist_norm']), 2)
    }

    # Create a DataFrame for the new row
    new_row = pd.DataFrame([stats])

    # Append the new row (the first row replaces the empty placeholder DataFrame)
    if mean_std_df.empty:
        mean_std_df = new_row
    else:
        mean_std_df = pd.concat([mean_std_df, new_row], ignore_index=True)


# Iterate over each analysis type
for analysis_type in ['COG', 'POA']:

    print(f'\nStarting {analysis_type} mean and std computation.')

    # Define the results directory and load the modality-comparison DataFrame
    df_outdir = Path(local_dir) / 'derivatives' / 'results' / analysis_type
    csv_path = df_outdir / f'all_subjects_analysis-{analysis_type}_modality_comparison.csv'
    df_ = load_df_from_csv(csv_path)

    # Flatten column names and reshape the DataFrame
    df_.columns = flatten_columns(df_.columns)
    reshaped_df_ = reshape_dataframe(df_)

    # Compute the statistics separately for each modality
    for modality in ['meg', 'eeg']:
        # Start an empty results table so that each output file only contains its own modality
        mean_std_df = pd.DataFrame(columns=MEAN_STD_COLUMNS)

        # Filter the data for the current modality
        reshaped_df_modality = reshaped_df_[reshaped_df_['modality'] == modality]

        # Compute mean and std over all time points and conditions
        fill_mean_std_df_(reshaped_df_modality, modality, 'all', 'all')

        # Compute mean and std for each time point (conditions pooled)
        for tp in range(3):
            reshaped_df_modality_tp = reshaped_df_modality[reshaped_df_modality['Info_tpindex'] == tp]
            fill_mean_std_df_(reshaped_df_modality_tp, modality, tp, 'all')

        # Compute mean and std for each condition (time points pooled)
        for condition in np.unique(reshaped_df_modality['Info_condition']):
            reshaped_df_modality_condition = reshaped_df_modality[reshaped_df_modality['Info_condition'] == condition]
            fill_mean_std_df_(reshaped_df_modality_condition, modality, 'all', condition)

            # Compute mean and std for each time point within each condition
            for tp in range(3):
                reshaped_df_modality_condition_tp = reshaped_df_modality_condition[
                    reshaped_df_modality_condition['Info_tpindex'] == tp
                ]
                fill_mean_std_df_(reshaped_df_modality_condition_tp, modality, tp, condition)

        # Save the results to a CSV file
        mean_std_df_path = df_outdir / f'all_subjects_analysis-{analysis_type}_modality-{modality}_stats-MeanStd_python.csv'
        mean_std_df.to_csv(mean_std_df_path, index=False)

    print(f'\n{analysis_type} mean and std computation completed successfully.')
```


```
Starting COG mean and std computation.

COG mean and std computation completed successfully.

Starting POA mean and std computation.

POA mean and std computation completed successfully.
```
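The saved tables can be reshaped for reporting. The example below uses a made-up excerpt of the output format (only some of the columns, with a hypothetical condition label `'A'`) and keeps the "Both Separate" rows, then pivots them into a condition × time point table of `dist_mean`. Because the `tp` column mixes integer indices and the string `'all'`, `read_csv` loads it as strings.

```python
import io

import pandas as pd

# Made-up excerpt of a ..._stats-MeanStd_python.csv file (values are illustrative only)
csv_text = """modality,conditions,tp,dist_mean,dist_std
eeg,all,all,10.0,2.0
eeg,all,0,9.0,1.0
eeg,all,1,11.0,1.5
eeg,A,all,10.5,1.2
eeg,A,0,9.5,0.8
eeg,A,1,11.5,1.1
"""
stats = pd.read_csv(io.StringIO(csv_text))

# Keep the rows where both the condition and the time point are specified
both = stats[(stats['conditions'] != 'all') & (stats['tp'] != 'all')]
table = both.pivot(index='conditions', columns='tp', values='dist_mean')
print(table)
```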
