## 📉 Linear vs. Non-linear Modeling

In greenhouse and crop data analysis, the form of the relationship between variables determines which model is appropriate, so it is worth identifying before fitting anything.

---

### 🔷 Linear Relationship

A **linear model** assumes a straight-line relationship between variables:

$$
y = a \cdot x + b
$$

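Here $a$ is the slope (the change in $y$ per unit change in $x$) and $b$ is the intercept (the value of $y$ at $x = 0$). An example would be CO₂ concentration rising at a constant rate as temperature increases.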
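
As a minimal sketch of what fitting such a line looks like, the snippet below simulates noisy linear data and recovers the slope and intercept with `np.polyfit`. The true values (slope 2, intercept 5), the noise level, and the names `x_demo` and `y_demo` are assumptions chosen for illustration only.

```python
import numpy as np

# Simulated data: true slope 2 and intercept 5 are assumed for illustration
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 100)
y_demo = 2 * x_demo + 5 + rng.normal(0, 1, size=x_demo.size)

# A degree-1 polynomial fit returns coefficients highest power first: [a, b]
a_hat, b_hat = np.polyfit(x_demo, y_demo, deg=1)
print(f"Estimated slope a = {a_hat:.2f}, intercept b = {b_hat:.2f}")
```

The estimates should land close to the values used in the simulation, with the gap shrinking as the noise decreases or the sample grows.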

---

### 🔶 Non-linear Relationship

In real-world systems, responses are often **non-linear**:
- They may **saturate** (e.g., photosynthetic rate vs. light intensity)
- Or **accelerate or decay** as the driving variable changes

Common non-linear models include:

**Exponential model**:
$$
y = a \cdot e^{b \cdot x}
$$

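**Michaelis-Menten (saturation) model**:
$$
y = \frac{a \cdot x}{b + x}
$$

Here $a$ is the maximum value that $y$ approaches as $x$ grows, and $b$ is the value of $x$ at which $y$ reaches half of $a$.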
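
The two formulas above can be written as small Python functions and evaluated at a few points to see how they behave. This is only an illustrative sketch: the function names `exponential_model` and `michaelis_menten` and the parameter values are assumptions, not part of any library.

```python
import numpy as np

def exponential_model(x, a, b):
    """y = a * exp(b * x): grows when b > 0, decays when b < 0."""
    return a * np.exp(b * x)

def michaelis_menten(x, a, b):
    """y = a * x / (b + x): rises quickly, then levels off toward a."""
    return (a * x) / (b + x)

a, b = 20.0, 5.0  # illustrative parameter values

# At x = b the saturation curve sits at exactly half of its plateau
print(michaelis_menten(b, a, b))  # 10.0

# For very large x the curve approaches, but never exceeds, the plateau a
print(michaelis_menten(1e6, a, b))

# At x = 0 the exponential model equals its scale parameter a
print(exponential_model(0.0, 2.0, 0.5))  # 2.0
```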

---

### 🔍 Why This Matters

- Linear models are simpler to fit and interpret but may not capture biological complexity.
- Non-linear models often better represent plant-environment dynamics like CO₂ assimilation or radiation response.

We will now simulate both types and compare them visually.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Fix the random seed so the simulated noise is reproducible
np.random.seed(42)

# Simulate x values
x = np.linspace(0, 10, 100)

# Linear data with noise
y_linear = 2 * x + 5 + np.random.normal(0, 1, size=len(x))

# Non-linear (saturation) data with noise
y_nonlinear = (20 * x) / (5 + x) + np.random.normal(0, 1, size=len(x))

# Create DataFrame
df = pd.DataFrame({
    'x': x,
    'y_linear': y_linear,
    'y_nonlinear': y_nonlinear
})

# Plot
plt.figure(figsize=(10, 5))

# Linear plot
plt.subplot(1, 2, 1)
plt.scatter(df['x'], df['y_linear'], label='Linear Data', color='blue')
plt.title("Linear Relationship")
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)

# Non-linear plot
plt.subplot(1, 2, 2)
plt.scatter(df['x'], df['y_nonlinear'], label='Non-linear Data', color='green')
plt.title("Non-linear Relationship (Saturation)")
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)

plt.tight_layout()
plt.show()


In [None]:
# 📦 Import lmfit for non-linear regression
!pip install lmfit --quiet
from lmfit import Model

# 📐 Define the non-linear model function (Michaelis-Menten type)
def saturation_model(x, a, b):
    return (a * x) / (b + x)

# 🧪 Create the model
model = Model(saturation_model)

# 🧲 Provide initial parameter guesses
params = model.make_params(a=20, b=5)

# 🧮 Fit the model to the simulated non-linear data
result = model.fit(df['y_nonlinear'], params, x=df['x'])

# 📃 Print a fit report
print(result.fit_report())

# 📊 Plot the original data and the best-fit curve
plt.figure(figsize=(8, 5))
plt.scatter(df['x'], df['y_nonlinear'], label='Observed Data', color='green')
plt.plot(df['x'], result.best_fit, label='Fitted Curve', color='black')
plt.title('Non-linear Fit using Michaelis-Menten Model')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()


## 🧪 Statistical Testing

Statistical tests help determine whether observed differences in your data are **significant** or just due to **random variation**.

---

### 🔍 1. Normality Test (Shapiro-Wilk)
Tests whether a variable is normally distributed.

$$
H_0: \text{Data is normally distributed}
$$
$$
H_1: \text{Data is not normally distributed}
$$

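If the p-value is below 0.05, we **reject the null hypothesis** and treat the data as non-normal. A p-value of 0.05 or above means we found no evidence against normality, not that normality is proven.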
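
To see the decision rule in action, the sketch below runs `scipy.stats.shapiro` on one sample drawn from a normal distribution and one drawn from a strongly skewed (exponential) distribution. Both samples, their sizes, and the seed are assumptions made up for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two illustrative samples: one normal, one clearly skewed
normal_sample = rng.normal(loc=410, scale=5, size=200)
skewed_sample = rng.exponential(scale=5, size=200)

alpha = 0.05  # significance level
for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    stat, p = stats.shapiro(sample)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: W = {stat:.3f}, p = {p:.4f} -> {decision}")
```

The skewed sample should be rejected decisively. The normal sample will usually pass, although any single random draw can fail by chance at roughly the rate set by `alpha`.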

---

### 🔍 2. Two-Sample t-test
Used to test if the **means** of two independent groups are significantly different. The standard (Student's) form assumes roughly normal data and similar variances in both groups.

$$
H_0: \mu_1 = \mu_2
$$
$$
H_1: \mu_1 \ne \mu_2
$$
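
It can help to see where the t-statistic comes from. The sketch below computes the pooled-variance (Student's) statistic by hand and checks it against `scipy.stats.ttest_ind`, which assumes equal variances by default. The two groups `g1` and `g2` are simulated with assumed means and spread purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(410, 5, size=30)  # illustrative group 1
g2 = rng.normal(420, 5, size=30)  # illustrative group 2

n1, n2 = len(g1), len(g2)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)

# Difference in means divided by its standard error
t_manual = (g1.mean() - g2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_scipy = stats.ttest_ind(g1, g2).statistic
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}")
```

If the group variances look very different, passing `equal_var=False` to `ttest_ind` switches to Welch's t-test, which drops the equal-variance assumption.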

---

### 🔍 3. One-Way ANOVA
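Used to compare the means of **three or more groups**. It tests whether **at least one** group mean differs from the others; it does not tell you which one.

$$
H_0: \mu_1 = \mu_2 = \dots = \mu_k
$$
$$
H_1: \text{At least one } \mu_i \text{ differs}
$$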
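
The F-statistic behind a one-way ANOVA is the ratio of between-group variability to within-group variability. The following sketch computes it by hand and compares it with `scipy.stats.f_oneway`; the three simulated groups and their means are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Three illustrative groups with different assumed means
groups = [rng.normal(mu, 5, size=30) for mu in (410, 420, 430)]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variability of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variability of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
f_scipy = stats.f_oneway(*groups).statistic
print(f"manual F = {f_manual:.4f}, scipy F = {f_scipy:.4f}")
```

A large F means the group means are spread out relative to the noise inside each group.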

---

### 📌 When to Use:

| Test          | Use Case                            |
|---------------|--------------------------------------|
| Shapiro-Wilk  | Check normality of one variable      |
| t-test        | Compare two conditions or timeframes |
| ANOVA         | Compare three or more treatments     |

We'll simulate a CO₂ variable across different time periods or treatments to apply these tests.


In [None]:
from scipy import stats
import numpy as np
import pandas as pd

# Simulate 3 groups (e.g., CO2 levels under 3 lighting regimes)
np.random.seed(0)
group1 = np.random.normal(410, 5, size=30)
group2 = np.random.normal(420, 5, size=30)
group3 = np.random.normal(430, 5, size=30)

# Combine into a DataFrame
df_test = pd.DataFrame({
    'CO2': np.concatenate([group1, group2, group3]),
    'Group': ['A']*30 + ['B']*30 + ['C']*30
})

# 📊 1. Shapiro-Wilk Normality Test
print("Shapiro-Wilk Normality Test (Group A):")
shapiro_test = stats.shapiro(group1)
print(f"Statistic: {shapiro_test.statistic:.3f}, p-value: {shapiro_test.pvalue:.4f}\n")

# 📊 2. t-test between Group A and Group B
print("T-test (Group A vs B):")
ttest = stats.ttest_ind(group1, group2)
print(f"T-statistic: {ttest.statistic:.3f}, p-value: {ttest.pvalue:.4f}\n")

# 📊 3. One-way ANOVA for all three groups
print("One-way ANOVA (Group A, B, C):")
anova = stats.f_oneway(group1, group2, group3)
print(f"F-statistic: {anova.statistic:.3f}, p-value: {anova.pvalue:.4f}")


## 📊 Visualizing Group Differences

A **boxplot** shows the distribution of values in each group, including:
- Median
- Interquartile range (IQR)
- Potential outliers

We use this plot to **visually assess**:
- Whether the groups' typical values differ (a boxplot marks the median, not the mean)
- How variable each group is

This complements numerical tests like t-tests or ANOVA.
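
The quantities a boxplot displays can also be computed directly. The sketch below uses a small made-up array and the 1.5 × IQR rule for flagging outliers, which is the default whisker rule in matplotlib; the data values are an assumption for illustration.

```python
import numpy as np

# Small illustrative sample with one obviously extreme value
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

median = np.median(values)
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"median = {median}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

Here the median is 5.5, the IQR is 4.5, and only the value 100 falls outside the fences.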


In [1]:
# 📦 Install seaborn if not already installed
!pip install seaborn --quiet
import seaborn as sns
import matplotlib.pyplot as plt

# 🖼️ Boxplot for visual comparison
plt.figure(figsize=(8, 5))
sns.boxplot(x='Group', y='CO2', data=df_test, palette='Set2')
plt.title('CO₂ Concentration by Group')
plt.xlabel('Group (Lighting Regime)')
plt.ylabel('CO₂ (ppm)')
plt.grid(True)
plt.show()


## 📌 Annotating Significance on Plots

To make statistical test results more intuitive, we can **annotate p-values** or significance markers (e.g., `*`, `**`, `ns`) directly on the boxplot.

This makes the plot not only descriptive but also **statistically informative**.

We'll use the `statannotations` package to do this cleanly.
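
For reference, significance markers are just a shorthand for p-value ranges. The helper below, `p_to_stars`, is a hypothetical function written for this illustration. Its thresholds follow one common convention and are an assumption, so check which cut-offs your plotting package actually applies.

```python
def p_to_stars(p):
    """Map a p-value to a significance marker (thresholds are an assumed convention)."""
    if p <= 0.0001:
        return "****"
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"  # not significant

for p in (0.2, 0.03, 0.004, 0.00001):
    print(f"p = {p}: {p_to_stars(p)}")
```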


In [None]:
# 📦 Install statannotations if not already installed
!pip install statannotations --quiet


In [None]:
from statannotations.Annotator import Annotator

# Define pairs of groups to compare
pairs = [("A", "B"), ("A", "C"), ("B", "C")]

# Create figure and plot
plt.figure(figsize=(8, 5))
ax = sns.boxplot(x='Group', y='CO2', data=df_test, palette='Set2')

# Initialize Annotator
annotator = Annotator(ax, pairs, data=df_test, x='Group', y='CO2')
annotator.configure(test='t-test_ind', text_format='star', loc='outside')
annotator.apply_and_annotate()

# Final touches
plt.title('CO₂ Concentration by Group with Significance Annotations')
plt.grid(True)
plt.show()


## 📝 Exercises

Apply the concepts from today's lab to deepen your understanding of greenhouse data modeling and analysis.

---

### 🔁 1. Non-linear Regression Practice

Fit a **Michaelis-Menten model** to the relationship between `Light Intensity (µmol/m²/s)` and `Photosynthetic Rate (µmol CO₂/m²/s)` using simulated or real data.

- Plot the data and the fitted curve.
- Print the parameter estimates (`a`, `b`) and interpret them.
- Compare the fit to a linear model using R².
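
As a starting point for the R² comparison, here is a minimal sketch of a helper that computes the coefficient of determination from observed and predicted values. The name `r_squared` is hypothetical. Note that R² is less clear-cut for non-linear models than for linear ones, so treat it as a rough guide rather than a definitive measure.

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)        # unexplained variation
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# A perfect prediction gives 1; predicting the mean everywhere gives 0
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```

You can pass the observed values together with the predictions of each fitted model and compare the two results.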

---

### 🔬 2. Shapiro-Wilk Normality Test

Using a CO₂ dataset from two different weeks:

- Test whether each week’s data is normally distributed using `scipy.stats.shapiro`.
- Interpret the results and decide whether parametric tests (e.g., t-test) are appropriate.

---

### 📊 3. T-test on Environmental Data

Compare average **humidity** between two timeframes:
- Before and after ventilation
- Weekday vs weekend
- Or any logical split in your dataset

Use `scipy.stats.ttest_ind()` and:
- Report the t-statistic and p-value
- State whether the difference is statistically significant (p < 0.05)

---

### 🔀 4. ANOVA on Simulated Groups

Simulate or use 3+ treatments (e.g., different light colors or CO₂ levels). Run a **one-way ANOVA**:

- Use `scipy.stats.f_oneway()`
- Plot a boxplot with significance annotations
- Interpret the group differences

---

### 🌟 Bonus Challenge

Try fitting a **custom non-linear model** (e.g., logistic growth or polynomial function) using `lmfit`. Compare it to the Michaelis-Menten model using AIC or residuals.

---
