# R location statistic

This notebook corresponds to the section titled "Compares dMEG and dEEG means to zero." and to the `mean_stats_analysis` function in the `MEEG_fMRI_whole_compa_script.py` script.

Within this notebook, you can perform the mean comparison analysis using the R script `diff_zero.R`, run with the `mean_diff_zero` function from `utils.stats_utils`.

## Necessary Libraries

In [None]:
from pathlib import Path
import sys

# Personal Imports
# Add the directory that contains the utils package to sys.path
sys.path.append(str(Path('..').resolve()))
from utils.stats_utils import mean_diff_zero

## Necessary paths

Before running the notebooks, ensure that you update the paths in `config.py` to match your local setup:

- **`LOCAL_DIR`**: Set this to the directory where your BIDS-formatted data is stored.
- **`R_WORKING_DIRECTORY`**: Specify the directory where your R scripts are saved.
- **`RSCRIPT_EXECUTABLE`**: Provide the path to the `Rscript` executable on your computer. For more details, refer to `config.md`.
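
As a rough illustration of the three settings listed above, a `config.py` could look like the sketch below. Every path shown is a placeholder chosen for this example, not a value from the project, and whether the real file stores strings or `Path` objects is an assumption.

```python
# config.py (illustrative sketch; all paths are placeholders, not project values)
from pathlib import Path

# Directory holding the BIDS-formatted data (placeholder path)
LOCAL_DIR = str(Path.home() / "data" / "bids_dataset")

# Directory containing the R scripts, e.g. diff_zero.R (placeholder path)
R_WORKING_DIRECTORY = str(Path.home() / "project" / "R")

# Bare command name if Rscript is on PATH, otherwise the full path to the executable
RSCRIPT_EXECUTABLE = "Rscript"
```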

In [None]:
from config import LOCAL_DIR, R_WORKING_DIRECTORY, RSCRIPT_EXECUTABLE

local_dir = LOCAL_DIR
working_directory = R_WORKING_DIRECTORY

# Define the Rscript executable path if necessary (otherwise, use just 'Rscript')
rscript_executable = RSCRIPT_EXECUTABLE  # or provide the full path if needed

## Run the script

### Explanation of the R Process

1. **Loading Data:**
   - The function begins by loading POA (Point of Analysis) or wCOG (weighted Center of Gravity) data from an Excel file that contains pre-computed Euclidean distances.

2. **Data Extraction:**
   - EEG and MEG distances are loaded separately using the `prepare_modality_df` and `clean_column_names` functions from `statistical_zeros_analysis_utils.R`. This step is needed to test whether the mean of dEEG (or dMEG) deviates significantly from zero.

3. **Location Analysis:**
   - Next, the `compute_location_analysis` function from `statistical_zeros_analysis_utils.R` performs the analysis for each combination of time point (tp) and condition. It applies a multivariate nonparametric test (either a sign test or a signed-rank test) through the `sr.loc.test` function from the `SpatialNP` package. This test assesses the location of one or more samples based on spatial signs or ranks. For a single sample, the null hypothesis is that the location equals a specified value. For multiple samples, the null hypothesis is that all samples have the same location.

4. **Results and Plotting:**
   - The results from the location tests are compiled into a DataFrame and saved in the specified `local_dir` directory. The file is named `all_subjects_analysis-POA/COG_modality_comparison_analysis-location.csv`.

5. **Parameters:**
   - The function allows for the specification of the following parameters:
     - `nb_permu`: The number of permutations used in the location test, which affects the precision of the p-value estimate.
     - `null_value`: The value against which the dEEG (or dMEG) mean is compared during the analysis.
     - `score`: Specifies the type of statistical test performed, which can be either 'sign' or 'rank'.
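
To make the hand-off from Python to R more concrete, here is a minimal sketch of how a wrapper could assemble an `Rscript` command line from these parameters. The helper `build_rscript_command`, the argument order, and the rendering of `None` as `"NULL"` are all assumptions made for illustration; the actual behaviour of `mean_diff_zero` and the arguments expected by `diff_zero.R` may differ.

```python
import subprocess
from pathlib import Path


def build_rscript_command(rscript_executable, working_directory, script_name, *args):
    """Assemble the argument list for running an R script with Rscript.

    Hypothetical helper: the positional arguments expected by the real
    R script are not documented here, so they are placeholders.
    """
    script_path = Path(working_directory) / script_name
    # Rendering None as "NULL" is an illustrative convention, not the project's
    rendered = ["NULL" if a is None else str(a) for a in args]
    return [str(rscript_executable), str(script_path), *rendered]


cmd = build_rscript_command("Rscript", "R", "diff_zero.R", "POA", 999, None, "sign")
print(cmd)

# To actually launch the process (requires R to be installed):
# subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.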

### **References:**
   - For more information on the `sr.loc.test` function, refer to the [SpatialNP Documentation](https://search.r-project.org/CRAN/refmans/SpatialNP/html/locationtests.html).
   - For theoretical background on multivariate sign and rank tests, consult the following article:
       - Oja, H., & Randles, R. H. (2004). Multivariate Nonparametric Tests. *Statistical Science, 19*(4), 598–605. [Link to article](http://www.jstor.org/stable/4144430).


### Details about the Results DataFrame:

The columns in the results DataFrame are:

- **modality**: The modality under study.
- **condition**: The specific condition for which the statistics were computed.
- **tp**: The time point at which the statistics were computed.
- **method_name**: Indicates the method used, either 'sign' or 'rank'.
- **q_2**:
  - **For Sign Tests**: The statistic \( Q^2 \), defined as \( Q^2 = n \, p \, \lVert \bar{S} \rVert^2 \), where \( n \) is the sample size and \( p \) is the dimension of the data. This measures the deviation of the average spatial sign vector \( \bar{S} \) from the origin, reflecting how strongly the data directions concentrate on one side of the null location.
  - **For Rank Tests**: The statistic \( U^2 \). This measures how much the average rank vectors deviate from zero after transformation and normalization. It assesses the distribution of ranks in the multivariate setting, accounting for affine invariance.
- **sample_size**: The number of data points in the sample.
- **number_of_groups**: The number of groups in the analysis (in this case, always one).
- **comparison_value**: The value to which the mean is compared.
- **p_value**: The p-value obtained from the test.
- **number_of_permutations**: The number of permutations performed to compute the statistics.
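
The sign-test statistic described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the `SpatialNP` implementation: it uses raw spatial signs without any affine-invariant standardization, tests against the origin only, and estimates the p-value with a random sign-flipping scheme, which is one common choice for one-sample tests. The function names are hypothetical.

```python
import math
import random


def spatial_signs(points):
    """Map each non-zero vector x to its unit-length direction x / ||x||."""
    signs = []
    for x in points:
        norm = math.sqrt(sum(v * v for v in x))
        # A point exactly at the origin has no direction; use the zero vector
        signs.append([v / norm for v in x] if norm > 0 else [0.0] * len(x))
    return signs


def q2_statistic(points):
    """Q^2 = n * p * ||mean spatial sign||^2 (no inner standardization)."""
    n, p = len(points), len(points[0])
    signs = spatial_signs(points)
    mean_sign = [sum(s[j] for s in signs) / n for j in range(p)]
    return n * p * sum(m * m for m in mean_sign)


def sign_flip_p_value(points, nb_permu=999, seed=0):
    """Estimate a p-value by randomly flipping the sign of each observation."""
    rng = random.Random(seed)
    observed = q2_statistic(points)
    exceed = 0
    for _ in range(nb_permu):
        flipped = []
        for x in points:
            flip = rng.choice((-1.0, 1.0))
            flipped.append([flip * v for v in x])
        if q2_statistic(flipped) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero
    return (exceed + 1) / (nb_permu + 1)


# Toy input (made up for illustration): three points in the same direction
same_direction = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
# Toy input (made up for illustration): two points symmetric about the origin
symmetric = [[1.0, 0.0], [-1.0, 0.0]]

q2_same = q2_statistic(same_direction)
q2_sym = q2_statistic(symmetric)
p_same = sign_flip_p_value(same_direction, nb_permu=199)
```

When all points share a direction, the mean sign has unit length and the statistic reaches its maximum \( n \, p \); when directions cancel out, it drops to zero.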




In [None]:
nb_permu = 999
null_value = None
score = 'sign'

try:
    print('Beginning POA analysis.')
    mean_diff_zero(working_directory, local_dir, rscript_executable, analysis_type='POA', nb_permu=nb_permu, null_value=null_value, score=score)
except Exception as e:
    print(f"An error occurred during the POA analysis: {e}")
    raise

try:
    print('Beginning COG analysis.')
    mean_diff_zero(working_directory, local_dir, rscript_executable, analysis_type='COG', nb_permu=nb_permu, null_value=null_value, score=score)
except Exception as e:
    print(f"An error occurred during the COG analysis: {e}")
    raise