# Section 2: Principles of Experimental Design

1. Randomization: the random assignment of experimental units (subjects, plots, etc.) to treatment groups. Fisher emphasized randomization as a safeguard against both known and unknown sources of bias in experiments. Random assignment does not make any single pair of groups identical, but it makes the groups comparable on average, so that systematic differences in outcome can be attributed to the treatments rather than to factors outside the researcher's control.

Fisher’s View:

Fisher argued that randomization is essential to prevent bias in the experiment. It ensures that all experimental units have an equal chance of receiving any treatment and that the treatment effects can be estimated independently of confounding variables.


2. Replication: applying each treatment to multiple experimental units. Replication does not reduce the variability of individual observations; it lets us estimate that variability (the experimental error), which is crucial for statistical analysis, and it improves the precision of the estimated treatment effects. The standard error of a treatment mean shrinks in proportion to the square root of the number of replicates, so more replicates give more reliable estimates.

Fisher’s View:

Fisher emphasized that replication is necessary to distinguish between real treatment effects and random variation. Without replication, it is difficult to determine whether observed differences are due to the treatments or random noise.
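
To make the role of replication concrete, the sketch below compares the theoretical standard error of a treatment mean, sigma / sqrt(n), with the spread of simulated means over many repeated experiments. The true mean of 50, the standard deviation of 5, and the helper name `standard_error` are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the sketch is reproducible

true_mean, sigma = 50.0, 5.0    # assumed values, for illustration only
n_experiments = 20_000          # simulated experiments per replicate count


def standard_error(sigma, n_reps):
    """Theoretical standard error of the mean of n_reps independent units."""
    return sigma / np.sqrt(n_reps)


for n_reps in (2, 5, 10, 20):
    # Each row is one simulated experiment with n_reps replicate plots
    samples = rng.normal(true_mean, sigma, size=(n_experiments, n_reps))
    # Spread of the experiment means approximates the standard error
    empirical_se = samples.mean(axis=1).std(ddof=1)
    print(f"n = {n_reps:2d}  theoretical SE = {standard_error(sigma, n_reps):.3f}  "
          f"simulated SE = {empirical_se:.3f}")
```

The two columns should agree closely, and both fall as the number of replicates grows.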


3. Blocking: the process of grouping experimental units that are similar in some way (e.g., soil type, age, gender) into blocks, and then comparing treatments within each block. Blocking controls for known sources of variability and improves the precision of treatment comparisons by ensuring that treatment differences are not confounded with these other sources of variation.

Fisher’s View:

Fisher introduced blocking to control for known factors that would otherwise add noise to the experimental data. By organizing similar units into blocks and applying every treatment within each block, Fisher’s designs remove between-block variability from the treatment comparisons.
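
The precision gain from blocking can be illustrated with a small simulation. The sketch below assumes a balanced layout (2 soil blocks, 3 treatments, 5 replicates each), a soil effect of 10 yield units, and modest treatment effects; all of these numbers are hypothetical. It compares the residual mean square when blocks are ignored with the residual mean square from an additive block plus treatment model.

```python
import numpy as np

rng = np.random.default_rng(1)

n_blocks, n_treatments, n_reps = 2, 3, 5
block_effect = np.array([0.0, 10.0])          # assumed soil-type shift in yield
treatment_effect = np.array([0.0, 2.0, 4.0])  # assumed fertilizer effects

# yields[b, t, r]: block b, treatment t, replicate r
yields = (50.0
          + block_effect[:, None, None]
          + treatment_effect[None, :, None]
          + rng.normal(0.0, 2.0, size=(n_blocks, n_treatments, n_reps)))

n_total = yields.size
grand_mean = yields.mean()
treatment_means = yields.mean(axis=(0, 2))  # one mean per treatment
block_means = yields.mean(axis=(1, 2))      # one mean per block

# Residuals when only treatments are modelled (blocks ignored)
resid_unblocked = yields - treatment_means[None, :, None]
mse_unblocked = (resid_unblocked ** 2).sum() / (n_total - n_treatments)

# Residuals from the additive block + treatment fit (valid for a balanced design)
fitted = block_means[:, None, None] + treatment_means[None, :, None] - grand_mean
resid_blocked = yields - fitted
mse_blocked = (resid_blocked ** 2).sum() / (n_total - n_treatments - n_blocks + 1)

print(f"Residual mean square, blocks ignored:  {mse_unblocked:.2f}")
print(f"Residual mean square, blocks included: {mse_blocked:.2f}")
```

A smaller residual mean square means tighter standard errors for the treatment comparisons, which is the practical payoff of blocking when the blocking factor really does affect the response.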

Randomization Example:
We are testing three types of fertilizer (Fertilizer A, Fertilizer B, and Fertilizer C) on crop yields. We randomly assign these fertilizers to 30 fields to guard against bias from factors such as field location or previous usage.

In [1]:
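import numpy as np
import pandas as pd

n_fields = 30
fertilizers = ['fertilizer A', 'fertilizer B', 'fertilizer C']

# Fix the seed so the assignment is reproducible
np.random.seed(42)

# Draw one fertilizer per field, independently and with replacement;
# group sizes are therefore not guaranteed to be equal
fertilizer_assignment = np.random.choice(fertilizers, size=n_fields)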

In [2]:
fields = pd.DataFrame({
    'Field_ID': range(1, n_fields + 1),
    'Fertilizer': fertilizer_assignment
})

# Display the assignment of fertilizers
fields.head()

| | Field_ID | Fertilizer |
|---|---|---|
| 0 | 1 | fertilizer C |
| 1 | 2 | fertilizer A |
| 2 | 3 | fertilizer C |
| 3 | 4 | fertilizer C |
| 4 | 5 | fertilizer A |
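
Because `np.random.choice` samples with replacement, the three fertilizers can end up with unequal numbers of fields. One common alternative, sketched here as an illustration rather than as the method used above, is to build a balanced list of labels and shuffle it. The variable names `balanced_labels` and `balanced_assignment` are introduced only for this example.

```python
import numpy as np

n_fields = 30
fertilizers = ['fertilizer A', 'fertilizer B', 'fertilizer C']

rng = np.random.default_rng(42)  # seeded generator for reproducibility

# Repeat each label equally often (10 times here), then shuffle the order
reps_per_treatment = n_fields // len(fertilizers)
balanced_labels = np.repeat(fertilizers, reps_per_treatment)
balanced_assignment = rng.permutation(balanced_labels)

# Count how many fields received each fertilizer
labels, counts = np.unique(balanced_assignment, return_counts=True)
for label, count in zip(labels, counts):
    print(label, count)
```

Every ordering of the balanced list is equally likely, so each field still has the same chance of receiving any fertilizer, while each treatment is guaranteed exactly 10 fields.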


Blocking Example:
In this example, we block by soil type. Let’s assume that we have two types of soil: Soil Type 1 and Soil Type 2. We then randomize the fertilizer treatments within each soil block.

In [3]:

# Soil types (the blocking factor)
soil_types = ['Soil Type 1', 'Soil Type 2']

# Simulate a soil type for each field
soil_block = np.random.choice(soil_types, size=n_fields)

# Draw fertilizers in one batch per soil type (15 draws each).
# Note: each batch is matched to fields by position, not by the soil
# type drawn above, and the draws are made with replacement.
block_assignment = []

for soil in soil_types:
    block_fertilizers = np.random.choice(fertilizers, size=n_fields // len(soil_types))
    block_assignment.extend(block_fertilizers)

In [4]:
# Create a DataFrame for blocking
block_data = pd.DataFrame({
    'Field_ID': range(1, n_fields + 1),
    'Soil_Type': soil_block,
    'Fertilizer': block_assignment
})

# Display the blocking assignment
block_data.head()

| | Field_ID | Soil_Type | Fertilizer |
|---|---|---|---|
| 0 | 1 | Soil Type 2 | fertilizer A |
| 1 | 2 | Soil Type 2 | fertilizer B |
| 2 | 3 | Soil Type 2 | fertilizer A |
| 3 | 4 | Soil Type 2 | fertilizer B |
| 4 | 5 | Soil Type 2 | fertilizer C |
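
For the randomization to be restricted to blocks, the shuffle has to happen separately among the fields that actually belong to each soil type. The following is a minimal sketch of that idea under an assumed layout in which 15 fields have Soil Type 1 and 15 have Soil Type 2; the layout and the variable name `assignment` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)  # seeded generator for reproducibility

fertilizers = ['fertilizer A', 'fertilizer B', 'fertilizer C']

# Hypothetical layout: the first 15 fields are Soil Type 1, the rest Soil Type 2
soil_block = np.array(['Soil Type 1'] * 15 + ['Soil Type 2'] * 15)

assignment = np.full(len(soil_block), '', dtype='U12')

for soil in np.unique(soil_block):
    idx = np.flatnonzero(soil_block == soil)   # fields belonging to this block
    reps = len(idx) // len(fertilizers)        # replicates per fertilizer in the block
    labels = np.repeat(fertilizers, reps)      # balanced set of labels for the block
    assignment[idx] = rng.permutation(labels)  # shuffle within the block only

# Each soil type x fertilizer combination should contain 5 fields
for soil in np.unique(soil_block):
    for fert in fertilizers:
        n = int(np.sum((soil_block == soil) & (assignment == fert)))
        print(soil, fert, n)
```

This sketch assumes each block size is a multiple of the number of treatments; blocks of other sizes would need a rule for the leftover fields.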


Replication and ANOVA Example:
We want to test the effects of three fertilizers (Fertilizer A, Fertilizer B, Fertilizer C) on crop yield. We have 30 fields, and we know that soil type might influence crop yield. Therefore, we block by soil type (Soil Type 1, Soil Type 2) and replicate each treatment by using multiple fields per soil type and fertilizer combination.

Steps:

1. Block the fields by soil type.
2. Randomly assign the fertilizers within each block.
3. Simulate the crop yield (for simplicity, we'll use random data).
4. Analyze the results using ANOVA to check for significant differences between treatments and soil types.

In [5]:
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Simulate crop yields for every soil type x fertilizer combination
np.random.seed(42)  # For reproducibility

# All yields come from the same normal distribution (mean 50, sd 5),
# so no true fertilizer or soil effect is built into the data
yield_data = []

# Number of replicate fields per combination (30 // 6 = 5)
reps_per_cell = n_fields // (len(soil_types) * len(fertilizers))

for soil in soil_types:
    for fertilizer in fertilizers:
        # Simulate yields for this combination of soil type and fertilizer
        yields = np.random.normal(loc=50, scale=5, size=reps_per_cell)
        yield_data.extend(zip([soil] * len(yields), [fertilizer] * len(yields), yields))

# Create a DataFrame for the simulated crop yield data
yield_df = pd.DataFrame(yield_data, columns=['Soil_Type', 'Fertilizer', 'Yield'])

# Fit a linear model with Soil Type, Fertilizer, and their interaction
model = ols('Yield ~ C(Soil_Type) + C(Fertilizer) + C(Soil_Type):C(Fertilizer)', data=yield_df).fit()

# Build the ANOVA table from the fitted model
anova_results = anova_lm(model)

# Display the ANOVA table
anova_results



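| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(Soil_Type) | 1.0 | 29.550324 | 29.550324 | 1.810994 | 0.190965 |
| C(Fertilizer) | 2.0 | 69.88907 | 34.944535 | 2.141578 | 0.13938 |
| C(Soil_Type):C(Fertilizer) | 2.0 | 96.206515 | 48.103258 | 2.948012 | 0.071643 |
| Residual | 24.0 | 391.612478 | 16.317187 | | |

None of the p-values falls below 0.05. That is expected here: every simulated yield was drawn from the same distribution, so the apparent differences between groups reflect random variation rather than real fertilizer or soil effects.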


# Conclusion
In this section, we expanded on the key principles of randomization, replication, and blocking as outlined by Ronald A. Fisher in The Design of Experiments. These principles provide the foundation for designing rigorous experiments that minimize bias and allow for reliable statistical inference.

- Randomization ensures that treatment assignment is unbiased.
- Replication provides a more reliable estimate of treatment effects by repeating the experiment on multiple units.
- Blocking controls for known sources of variability, ensuring that treatment effects are not confounded with other factors.

By simulating experiments in Python, we applied these principles to test the effects of fertilizers on crop yield, demonstrating how to design, analyze, and interpret experimental results. Fisher’s work remains a cornerstone of modern experimental design, and these principles continue to guide research across fields.