# Imports and settings

### Microbiome-based Obesity Analysis <br> Statistical Association and Predictive Modeling

In this notebook, we analyze gut microbiome data to study the relationship between bacterial taxa and obesity.

The goals of this analysis are twofold:
1. **Statistical association**: identify bacterial taxa whose abundances differ significantly between lean and obese individuals.
2. **Predictive modeling**: evaluate whether gut microbiome composition can be used to predict obesity status.

The dataset was obtained from the *LeChatelierE_2013* study via the curatedMetagenomicData framework.  
Initial data acquisition, filtering, and preprocessing were performed in a separate notebook and are summarized below.
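
To make the first goal concrete, the sketch below shows one common way to test for per-taxon abundance differences: a Mann-Whitney U test for each taxon, followed by Benjamini-Hochberg correction across taxa. It runs on synthetic data, not the study data. The taxon names, group sizes, effect size, and the 0.05 threshold are illustrative assumptions, and the notebook's actual analysis may differ.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

# Synthetic stand-in data (NOT the study data): 40 samples x 5 hypothetical taxa.
rng = np.random.default_rng(0)
n_per_group = 20
taxa = [f"taxon_{i}" for i in range(5)]
abundance = pd.DataFrame(
    rng.lognormal(mean=0.0, sigma=1.0, size=(2 * n_per_group, len(taxa))),
    columns=taxa,
)
is_obese = np.array([False] * n_per_group + [True] * n_per_group)

# Assumption for illustration: one taxon is far more abundant in the obese group.
abundance.loc[is_obese, "taxon_0"] *= 20.0

# One two-sided Mann-Whitney U test per taxon (lean vs. obese).
p_values = np.array([
    mannwhitneyu(
        abundance.loc[~is_obese, taxon],
        abundance.loc[is_obese, taxon],
        alternative="two-sided",
    ).pvalue
    for taxon in taxa
])

# Benjamini-Hochberg correction controls the false discovery rate across taxa.
reject, q_values, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

results = pd.DataFrame(
    {"taxon": taxa, "p_value": p_values, "q_value": q_values, "significant": reject}
).sort_values("q_value")
print(results)
```

The correction step matters because one test is run per taxon: with hundreds of taxa, some raw p-values fall below 0.05 by chance alone.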


In [1]:
import numpy as np
import pandas as pd

from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix, classification_report,
    roc_auc_score, roc_curve
)

import matplotlib.pyplot as plt
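
The scikit-learn imports above suggest a scaled logistic regression evaluated with stratified cross-validation. As a minimal sketch of how these pieces can fit together, the example below trains such a pipeline on synthetic data (not the study data). The feature matrix, labels, number of folds, and hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 samples, 10 features, balanced binary labels.
rng = np.random.default_rng(0)
n_samples, n_features = 100, 10
X = rng.normal(size=(n_samples, n_features))
y = np.array([0, 1] * (n_samples // 2))
X[y == 1, 0] += 2.0  # assumption: only feature 0 carries signal

# Scaling lives inside the pipeline so it is fit on the training folds only,
# which avoids leaking test-fold statistics into the model.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_aucs = []
for train_idx, test_idx in cv.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    proba = model.predict_proba(X[test_idx])[:, 1]
    fold_aucs.append(roc_auc_score(y[test_idx], proba))

mean_auc = float(np.mean(fold_aucs))
print(f"AUC per fold: {np.round(fold_aucs, 3)}; mean: {mean_auc:.3f}")
```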

# Loading the data