# 📉 4.5 Regression Modelling

This notebook introduces regression modelling to predict nutrition outcomes.

**Objectives**:
- Build linear regression models.
- Evaluate model performance.
- Apply regression to `vitamin_trial.csv`.

**Context**: A regression model estimates how an outcome, such as vitamin D level, changes with one or more predictors recorded in a trial, and uses that relationship to predict the outcome.

<details><summary>Fun Fact</summary>
Regression is like a hippo predicting its next meal’s size—patterns guide the guess! 🦛
</details>

In [None]:
# Setup for Google Colab: fetch datasets automatically or manually.
# bootstrap.py installs the requirements and the editable package.
%run ../../bootstrap.py

import fns_toolkit as fns

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression  # For regression
from sklearn.metrics import r2_score  # For model evaluation

print('Environment ready.')

## Data Preparation

Load `vitamin_trial.csv` and prepare features for regression.

In [None]:
# Load the dataset
df = fns.get_dataset('vitamin_trial')  # Load the dataset by name via the toolkit

# Prepare features and target
X = df[['Time']]  # Feature: Time (double brackets keep X two-dimensional, as scikit-learn expects)
y = df['Vitamin_D']  # Target: Vitamin D levels
print(f'Features shape: {X.shape}, Target shape: {y.shape}')  # Display shapes

Features shape: (200, 1), Target shape: (200,)


## Linear Regression

Fit a linear regression of `Vitamin_D` on `Time` and evaluate the fit with the R² score, computed here on the same data used for fitting.
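
For reference, the single-predictor model and the R² score can be written in standard textbook notation (the symbols are not taken from the notebook's code):

$$
\hat{y}_i = \beta_0 + \beta_1 x_i, \qquad
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

Here $x_i$ is `Time`, $y_i$ is the observed `Vitamin_D`, $\hat{y}_i$ is the model's prediction, and $\bar{y}$ is the mean of the observed values. An R² of 1 means the predictions match the observations exactly; an R² of 0 means the model does no better than always predicting $\bar{y}$.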

In [3]:
# Initialize and fit model
model = LinearRegression()  # Create regression model
model.fit(X, y)  # Fit model to data

# Predict and evaluate
y_pred = model.predict(X)  # Predict Vitamin D levels
r2 = r2_score(y, y_pred)  # Calculate R² score
print(f'R² score: {round(r2, 2)}')  # Display R²

R² score: 0.72
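
A common complement to the in-sample score is to evaluate the model on rows it has not seen. The sketch below illustrates this with `train_test_split` on synthetic data; the generated values, the 25% test fraction, and the variable names are assumptions for illustration and do not come from `vitamin_trial.csv`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the trial data (assumed values, arbitrary units)
rng = np.random.default_rng(0)
time = rng.uniform(0, 12, size=200)
vitamin_d = 20 + 1.5 * time + rng.normal(0, 2, size=200)

X = time.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = vitamin_d

# Hold out 25% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training rows only
model = LinearRegression().fit(X_train, y_train)

# Compare the fit on seen and unseen rows
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f'Train R²: {r2_train:.2f}, test R²: {r2_test:.2f}')
print(f'Slope: {model.coef_[0]:.2f}, intercept: {model.intercept_:.2f}')
```

A test R² well below the training R² would suggest the model does not generalise to new data. With a single predictor and a linear signal, the two scores are usually close.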


## Exercise 1: Build a Model

Build a regression model using `Time` and a dummy variable for `Group` (Control=0, Treatment=1). Report the R² score. Document your code.

**Guidance**: Use `pd.get_dummies()` to encode `Group`.
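
As a hint for the encoding step only, here is a minimal sketch on a hypothetical toy frame. The rows are made up, and the options `drop_first=True` and `dtype=int` are one possible choice rather than the required one. Fitting the model is left to the exercise.

```python
import pandas as pd

# Hypothetical toy data; the real columns come from vitamin_trial.csv
toy = pd.DataFrame({
    'Time': [0, 0, 6, 6],
    'Group': ['Control', 'Treatment', 'Control', 'Treatment'],
})

# drop_first=True drops the alphabetically first level ('Control'),
# leaving a single 0/1 column in which Treatment = 1
encoded = pd.get_dummies(toy, columns=['Group'], drop_first=True, dtype=int)
print(encoded)
```

The resulting `Group_Treatment` column can be passed to the model alongside `Time` as a second feature.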

**Answer**:

My regression code is...

## Conclusion

You’ve learned to build and evaluate regression models for nutrition data.

**Next Steps**: Explore Bayesian methods in 5.1.

**Resources**:
- [Scikit-Learn Regression](https://scikit-learn.org/stable/modules/linear_model.html)
- [Regression Guide](https://www.datacamp.com/community/tutorials/linear-regression-python)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)