<a href="https://colab.research.google.com/github/etemadism/Courses/blob/main/03_paired_t_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Paired t-test

# Statistical Analysis for Paired Data Using Paired t-Test

This Google Colab tutorial demonstrates how to perform a paired t-test to analyze the effect of a treatment on a continuous variable (e.g., enzyme activity). It walks through the implementation step by step using pandas and SciPy.

## Overview

In this tutorial, we will:

- Calculate the differences between paired samples (pre- and post-treatment).
- Check for statistical significance using a paired t-test.
- Interpret the results at a 5% significance level.
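
As a minimal sketch of these three steps, the toy example below computes the paired differences, runs `ttest_rel`, and checks that the result matches a one-sample t-test on the differences, which is what a paired t-test amounts to. The `pre` and `post` arrays are made-up illustrative values, not taken from the tutorial's dataset.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Hypothetical readings for five subjects (illustrative values only)
pre = np.array([10.2, 11.5, 9.8, 12.1, 10.9])
post = np.array([11.0, 12.4, 9.9, 13.0, 11.6])

# Step 1: differences between paired samples (same subject, post minus pre)
diff = post - pre

# Step 2: paired t-test on the two aligned arrays
t_stat, p_value = ttest_rel(post, pre)

# The same statistic from a one-sample t-test of the differences against 0
t_check, p_check = ttest_1samp(diff, 0.0)

# And by hand: t = mean(d) / (sd(d) / sqrt(n)), using the sample standard deviation
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

# Step 3: interpret at the 5% significance level
alpha = 0.05
significant = p_value < alpha

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant at 5%: {significant}")
```

Because the test only looks at the per-subject differences, it is meaningful only when element `i` of `pre` and element `i` of `post` come from the same subject.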

### Data availability

You can find datasets and extended examples on Etemadism’s GitHub.




## Step 1: Import libraries
We'll use pandas to load the data, scipy.stats for the t-test, and numpy for calculations.

In [2]:
import pandas as pd
import numpy as np
from scipy.stats import ttest_rel



## Step 2: Define the data
Load the dataset and split it into pre-treatment (treatment = 0) and post-treatment (treatment = 1) groups.

In [6]:
# Load dataset
data = pd.read_csv("/content/Paired_t_test_sample_data.csv")
data

Unnamed: 0,Thiamine monophosphate,Riboflavin,Flavin mononucleotide,Neopterin,Tryptophan,Kynurenine,treatment
0,4.84,5.130,7.16,8.69,67.5,2.52,0
1,5.78,18.500,12.60,12.00,62.1,1.23,1
2,2.98,4.260,8.44,15.90,75.0,1.74,1
3,12.10,3.260,7.05,34.50,32.3,1.68,1
4,6.08,17.400,11.50,11.40,68.5,1.86,0
...,...,...,...,...,...,...,...
395,4.34,0.489,3.80,16.00,55.2,3.17,0
396,7.64,3.310,4.32,16.10,53.8,2.11,1
397,6.94,7.040,8.66,16.90,33.4,1.78,0
398,4.92,8.780,11.50,7.39,56.8,1.71,1


In [12]:
# Separate data into pre- and post-treatment groups
pre_treatment = data[data['treatment'] == 0]
post_treatment = data[data['treatment'] == 1]

In [14]:
print(f"Pre-treatment size: {len(pre_treatment)}")
print(f"Post-treatment size: {len(post_treatment)}")


Pre-treatment size: 242
Post-treatment size: 158


In [15]:
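# Truncate both groups to the size of the smaller one so they have equal length.
# Caution: this aligns rows by position only. A paired t-test is valid only if
# row i of each group comes from the same subject.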
min_size = min(len(pre_treatment), len(post_treatment))
pre_treatment = pre_treatment.iloc[:min_size]
post_treatment = post_treatment.iloc[:min_size]
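
The dataset shown here has no subject identifier, so the truncation above can only align rows by position. As a hedged sketch of how explicit pairing could look if such an identifier were available, the example below assumes a hypothetical `subject_id` column and made-up values; neither is part of the tutorial's CSV.

```python
import pandas as pd
from scipy.stats import ttest_rel

# Hypothetical long-format data; 'subject_id' is an assumed column and
# the values are illustrative only
long_df = pd.DataFrame({
    "subject_id": [1, 1, 2, 2, 3, 3, 4],
    "treatment":  [0, 1, 0, 1, 0, 1, 0],
    "Tryptophan": [60.0, 54.6, 72.0, 67.3, 66.0, 58.5, 51.0],
})

# Reshape to one row per subject and one column per condition (0 = pre, 1 = post)
wide = long_df.pivot(index="subject_id", columns="treatment", values="Tryptophan")

# Keep only subjects measured under both conditions (subject 4 has no post value)
paired = wide.dropna()

# Paired t-test on measurements matched by subject
t_stat, p_value = ttest_rel(paired[1], paired[0])

print(paired)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Matching on an identifier drops unpaired subjects explicitly instead of discarding whichever rows happen to fall beyond the smaller group's length.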


In [19]:
post_treatment

Unnamed: 0,Thiamine monophosphate,Riboflavin,Flavin mononucleotide,Neopterin,Tryptophan,Kynurenine,treatment
1,5.78,18.50,12.60,12.00,62.1,1.23,1
2,2.98,4.26,8.44,15.90,75.0,1.74,1
3,12.10,3.26,7.05,34.50,32.3,1.68,1
6,3.21,7.85,9.69,15.50,69.0,1.89,1
9,2.14,10.40,12.30,8.65,50.1,1.22,1
...,...,...,...,...,...,...,...
388,6.69,3.26,9.34,12.20,61.1,1.66,1
390,8.07,15.00,13.40,15.40,62.8,2.13,1
391,3.96,2.05,4.57,9.59,56.0,1.16,1
396,7.64,3.31,4.32,16.10,53.8,2.11,1


## Step 3: Perform the paired t-test

In [24]:
# Get the list of metabolites (all columns except 'treatment')
metabolites = [col for col in data.columns if col != 'treatment']

# Initialize a dictionary to store results
results = {}

# Perform paired t-tests
for metabolite in metabolites:
    # Perform t-test
    t_stat, p_value = ttest_rel(post_treatment[metabolite], pre_treatment[metabolite])

    # Store results
    results[metabolite] = {'t_statistic': t_stat, 'p_value': p_value}

# Convert results to a pandas DataFrame
results_df = pd.DataFrame.from_dict(results, orient='index').reset_index()
results_df.columns = ['Metabolite', 'T-Statistic', 'P-Value']

# Display the table
results_df

Unnamed: 0,Metabolite,T-Statistic,P-Value
0,Thiamine monophosphate,2.782008,0.006065
1,Riboflavin,0.181046,0.856565
2,Flavin mononucleotide,-0.240881,0.809961
3,Neopterin,0.556965,0.578344
4,Tryptophan,-4.141465,5.6e-05
5,Kynurenine,-1.804188,0.073119
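
To interpret these results at the 5% significance level, one option is to flag each metabolite whose p-value falls below 0.05. The sketch below rebuilds the results table from the values printed above so that it runs on its own; the `Significant` and `Direction` columns are additions for illustration. Since `ttest_rel` was called with the post-treatment values first, a positive t-statistic indicates a higher post-treatment mean.

```python
import numpy as np
import pandas as pd

# Values copied from the results table above
results_df = pd.DataFrame({
    "Metabolite": [
        "Thiamine monophosphate", "Riboflavin", "Flavin mononucleotide",
        "Neopterin", "Tryptophan", "Kynurenine",
    ],
    "T-Statistic": [2.782008, 0.181046, -0.240881, 0.556965, -4.141465, -1.804188],
    "P-Value": [0.006065, 0.856565, 0.809961, 0.578344, 5.6e-05, 0.073119],
})

alpha = 0.05  # 5% significance level

# Flag metabolites whose p-value is below alpha
results_df["Significant"] = results_df["P-Value"] < alpha

# Sign of t shows the direction of the mean difference (post minus pre)
results_df["Direction"] = np.where(
    results_df["T-Statistic"] > 0, "higher post-treatment", "lower post-treatment"
)

significant = results_df[results_df["Significant"]]
print(significant[["Metabolite", "T-Statistic", "P-Value", "Direction"]])
```

With these values, thiamine monophosphate and tryptophan fall below the 5% threshold, while kynurenine (p ≈ 0.073) does not. Because the two groups were aligned by truncation rather than by subject, these p-values are best read as an illustration of the workflow.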
