# Statistical analysis for the experimental data

🎯 **Goal**: Confirm that the effect produced by adding PDGF-BB is statistically significant.

---

## Context
To make a more informed decision on whether the presence and spatial location of PDGF-BB lead to different migration patterns, we run statistical tests on the experimental data.

Since we are comparing three groups (control, monolayer chamber (MC) and opposite chamber (OC)), we use a one-way ANOVA to test whether any of the group means differ. If the ANOVA detects a difference, we follow up with post-hoc Tukey HSD tests to identify which pairs of groups differ from each other. When only two groups are available (control and OC), a single one-way ANOVA suffices and no post-hoc test is needed.
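
The two-group case can be checked numerically: with only two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic, and both tests return the same p-value. The sketch below uses synthetic data; the group means, spreads and sample sizes are made up for illustration and are not the experimental values.

```python
import numpy as np
from scipy import stats

# Synthetic migration distances; NOT the experimental data
rng = np.random.default_rng(seed=0)
control = rng.normal(loc=100.0, scale=20.0, size=50)
opposite_chamber = rng.normal(loc=130.0, scale=20.0, size=60)

# One-way ANOVA on two groups
f_value, p_anova = stats.f_oneway(control, opposite_chamber)

# Student's t-test with pooled variance (equal_var=True is the SciPy default)
t_value, p_ttest = stats.ttest_ind(control, opposite_chamber, equal_var=True)

# F equals t squared, and the p-values coincide up to floating-point error
print(np.isclose(f_value, t_value**2), np.isclose(p_anova, p_ttest))
```

This is why a single ANOVA call is enough when only control and OC are compared: there is just one pair of groups, so there is nothing left for a post-hoc test to separate.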

We run these tests separately for each day of the experiment and report the corresponding p-value.

In [7]:
from pathlib import Path

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

DATA_PATH = Path('processed-data/')

In [8]:
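def return_asterisks(pvalue):
    """Convert a p-value into significance asterisks.

    Returns "***" for p <= 0.001, "**" for p <= 0.01, "*" for p <= 0.05,
    and "-" when the result is not significant at the 0.05 level.
    """
    if pvalue <= 0.001:
        return "***"
    elif pvalue <= 0.01:
        return "**"
    elif pvalue <= 0.05:
        return "*"

    return "-"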

## 2.5 mg/mL matrix - 96 hours

### One-way ANOVA between the three conditions

In [9]:
# Read the processed data into a DF
single_chamber_df = pd.read_csv(DATA_PATH / '1chamber-96hours.csv')

# Perform a one-way ANOVA between all conditions
for day in single_chamber_df['day'].unique():
    day_data = single_chamber_df[single_chamber_df['day'] == day]

    fvalue, pvalue = stats.f_oneway(day_data[day_data['condition'] == 'Control']['fixed_distance'], 
                                    day_data[day_data['condition'] == 'Monolayer chamber']['fixed_distance'],
                                    day_data[day_data['condition'] == 'Opposite chamber']['fixed_distance'])

    print(f'p-value for day {day}: {pvalue} ({return_asterisks(pvalue)})')

p-value for day 1: 2.243438544672415e-23 (***)
p-value for day 2: 1.9979550780179422e-32 (***)
p-value for day 3: 2.2080571988330536e-45 (***)
p-value for day 4: 2.3840512134165863e-37 (***)
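
The per-day loop above selects each condition by name. As a possible alternative, the conditions can be taken from the data with `groupby`, so the same code covers experiments with two or three conditions. The following is only a sketch: `anova_per_day` is a hypothetical helper, and the data frame is synthetic, built to mimic the `day`, `condition` and `fixed_distance` columns of the processed CSV files.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_per_day(df, value_col='fixed_distance', group_col='condition', day_col='day'):
    """Run a one-way ANOVA across all conditions found on each day.

    Hypothetical helper: returns a dict mapping day -> p-value.
    """
    pvalues = {}
    for day, day_data in df.groupby(day_col):
        # One array of measurements per condition present on this day
        samples = [group[value_col].to_numpy() for _, group in day_data.groupby(group_col)]
        _, pvalue = stats.f_oneway(*samples)
        pvalues[day] = pvalue
    return pvalues


# Synthetic stand-in for the processed CSV; NOT the experimental data
rng = np.random.default_rng(seed=1)
rows = []
for day in (1, 2):
    for condition, mean in (('Control', 100.0),
                            ('Monolayer chamber', 120.0),
                            ('Opposite chamber', 150.0)):
        for value in rng.normal(loc=mean, scale=15.0, size=30):
            rows.append({'day': day, 'condition': condition, 'fixed_distance': value})
demo_df = pd.DataFrame(rows)

demo_pvalues = anova_per_day(demo_df)
for day, pvalue in demo_pvalues.items():
    print(f'p-value for day {day}: {pvalue:.3g}')
```

Note that `groupby` iterates over days in sorted order, whereas `unique()` keeps the order of appearance; for data already sorted by day the two coincide.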


### Post-hoc tests

In [10]:
# Significance level (family-wise error rate) used by the Tukey HSD test.
# The summary title rounds it to two decimals, hence "FWER=0.00" in the output.
rejection_pvalue = 0.001

for day in single_chamber_df['day'].unique():
    day_data = single_chamber_df[single_chamber_df['day'] == day]
    tukey = pairwise_tukeyhsd(day_data['fixed_distance'], day_data['condition'], alpha=rejection_pvalue)
    
    print(f'\nResults for day {day}')
    print(tukey)


Results for day 1
           Multiple Comparison of Means - Tukey HSD, FWER=0.00           
      group1            group2      meandiff p-adj  lower   upper  reject
-------------------------------------------------------------------------
          Control Monolayer chamber  34.0436 0.001 10.3955 57.6917   True
          Control  Opposite chamber  66.6714 0.001 43.7371 89.6056   True
Monolayer chamber  Opposite chamber  32.6278 0.001 12.5389 52.7166   True
-------------------------------------------------------------------------

Results for day 2
           Multiple Comparison of Means - Tukey HSD, FWER=0.00            
      group1            group2      meandiff p-adj  lower   upper   reject
--------------------------------------------------------------------------
          Control Monolayer chamber  58.0939 0.001 27.3304  88.8575   True
          Control  Opposite chamber 103.4103 0.001 73.7044 133.1161   True
Monolayer chamber  Opposite chamber  45.3164 0.001 18.9752  71.6575  
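
The printed summaries are convenient to read but awkward to filter or export. One way to work with them programmatically, sketched here on synthetic data rather than the experimental measurements, is to convert the summary table into a pandas DataFrame. The sketch assumes a statsmodels version whose `summary()` returns a `SimpleTable` with the header in its first row; the names `demo_df`, `tukey_df` and `significant_pairs` are illustrative only.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic data for three conditions; NOT the experimental data
rng = np.random.default_rng(seed=2)
conditions = ('Control', 'Monolayer chamber', 'Opposite chamber')
means = (100.0, 130.0, 160.0)
demo_df = pd.DataFrame({
    'condition': np.repeat(conditions, 40),
    'fixed_distance': np.concatenate(
        [rng.normal(loc=m, scale=15.0, size=40) for m in means]
    ),
})

tukey = pairwise_tukeyhsd(demo_df['fixed_distance'], demo_df['condition'], alpha=0.001)

# The summary is a SimpleTable; its first row holds the column headers
table = tukey.summary().data
tukey_df = pd.DataFrame(table[1:], columns=table[0])

# tukey.reject is a boolean array with one entry per pair of groups
significant_pairs = tukey_df[tukey.reject]
print(significant_pairs)
```

The resulting frame has one row per pair of conditions, which makes it straightforward to collect the results of every day into a single table.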

## 2.5 mg/mL matrix - 216 hours

In [5]:
# Read the processed data into a DF
three_chambers_df = pd.read_csv(DATA_PATH / '3ch-216hours.csv')

# Perform a one-way ANOVA between the two conditions
for day in three_chambers_df['day'].unique():
    day_data = three_chambers_df[three_chambers_df['day'] == day]

    fvalue, pvalue = stats.f_oneway(day_data[day_data['condition'] == 'Control']['fixed_distance'], 
                                    day_data[day_data['condition'] == 'Opposite chamber']['fixed_distance'])

    print(f'p-value for day {day}: {pvalue} ({return_asterisks(pvalue)})')

p-value for day 1: 6.94244762434078e-10 (***)
p-value for day 2: 1.5871213114475845e-22 (***)
p-value for day 3: 4.391514061384971e-25 (***)
p-value for day 4: 1.0421255815115056e-20 (***)
p-value for day 7: 3.6514562993049014e-41 (***)
p-value for day 8: 9.182031264343604e-39 (***)
p-value for day 9: 5.9527983136208366e-40 (***)


## 4 mg/mL matrix - 96 hours

In [6]:
# Read the processed data into a DF
high_density_df = pd.read_csv(DATA_PATH / '4mgml-96hours.csv')

# Perform a one-way ANOVA between the two conditions
for day in high_density_df['day'].unique():
    day_data = high_density_df[high_density_df['day'] == day]

    fvalue, pvalue = stats.f_oneway(day_data[day_data['condition'] == 'Control']['fixed_distance'], 
                                    day_data[day_data['condition'] == 'Opposite channel']['fixed_distance'])

    print(f'p-value for day {day}: {pvalue} ({return_asterisks(pvalue)})')

p-value for day 1: 1.9279825716816e-15 (***)
p-value for day 2: 1.956489563333031e-13 (***)
p-value for day 3: 1.0423642322416357e-15 (***)
p-value for day 4: 8.882902647100095e-23 (***)
