## 🧪 Ensemble Learning Practice: K-Fold Cross-Validation with Bagging Regressor


In [152]:
import pandas as pd

In [153]:
df=pd.read_excel('/content/Concrete_Data.xls')

In [154]:
df.head()

Unnamed: 0,Cement (component 1)(kg in a m^3 mixture),Blast Furnace Slag (component 2)(kg in a m^3 mixture),Fly Ash (component 3)(kg in a m^3 mixture),Water (component 4)(kg in a m^3 mixture),Superplasticizer (component 5)(kg in a m^3 mixture),Coarse Aggregate (component 6)(kg in a m^3 mixture),Fine Aggregate (component 7)(kg in a m^3 mixture),Age (day),"Concrete compressive strength(MPa, megapascals)"
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05278
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075


In [155]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column                                                 Non-Null Count  Dtype  
---  ------                                                 --------------  -----  
 0   Cement (component 1)(kg in a m^3 mixture)              1030 non-null   float64
 1   Blast Furnace Slag (component 2)(kg in a m^3 mixture)  1030 non-null   float64
 2   Fly Ash (component 3)(kg in a m^3 mixture)             1030 non-null   float64
 3   Water  (component 4)(kg in a m^3 mixture)              1030 non-null   float64
 4   Superplasticizer (component 5)(kg in a m^3 mixture)    1030 non-null   float64
 5   Coarse Aggregate  (component 6)(kg in a m^3 mixture)   1030 non-null   float64
 6   Fine Aggregate (component 7)(kg in a m^3 mixture)      1030 non-null   float64
 7   Age (day)                                              1030 non-null   int64  
 8   Concrete compressive strength(MPa, megapascals)        1030 non-null   float64

🧱 **Domain: Concrete Mixture and Strength**

---

🎯 **Goal**  
Predict the compressive strength of concrete (in MPa) based on its ingredients and age.

---

📊 **Features Breakdown**

| Feature | Description | Impact on Strength |
|--------|-------------|---------------------|
| Cement | More cement typically increases strength due to better bonding | ✅ Increasing is generally good |
| Blast Furnace Slag | Can partially replace cement; improves durability but may reduce early strength | ⚖️ Moderate increase may help |
| Fly Ash | Improves workability and long-term strength, but may reduce early strength | ⚖️ Balanced use is beneficial |
| Water | Essential for hydration, but too much weakens concrete | ❌ Excess reduces strength |
| Superplasticizer | Enhances flow without extra water, allowing lower water-cement ratio | ✅ Helps increase strength |
| Coarse Aggregate | Provides bulk and compressive resistance | ✅ Generally beneficial |
| Fine Aggregate | Fills gaps and improves finish; excessive amounts may reduce strength | ⚖️ Needs balance |
| Age | Concrete gains strength over time as it cures | ✅ Longer age increases strength |

---

🧪 **Target Variable**  
- **Concrete compressive strength**: Measured in megapascals (MPa), indicates how much pressure the concrete can withstand.


In [156]:
# Shorten the feature names. The target column keeps its raw name here and is
# renamed in a later cell: its raw name does not match the plain key
# 'Concrete compressive strength', so renaming it here would have no effect.
df.rename(columns={
    'Cement (component 1)(kg in a m^3 mixture)': 'Cement_kg',
    'Blast Furnace Slag (component 2)(kg in a m^3 mixture)': 'Slag_kg',
    'Fly Ash (component 3)(kg in a m^3 mixture)': 'FlyAsh_kg',
    'Water  (component 4)(kg in a m^3 mixture)': 'Water_kg',
    'Superplasticizer (component 5)(kg in a m^3 mixture)': 'Superplasticizer_kg',
    'Coarse Aggregate  (component 6)(kg in a m^3 mixture)': 'CoarseAgg_kg',
    'Fine Aggregate (component 7)(kg in a m^3 mixture)': 'FineAgg_kg',
    'Age (day)': 'Age_days',
}, inplace=True)

In [157]:
df.isna().sum() #No missing values

Unnamed: 0,0
Cement_kg,0
Slag_kg,0
FlyAsh_kg,0
Water_kg,0
Superplasticizer_kg,0
CoarseAgg_kg,0
FineAgg_kg,0
Age_days,0
"Concrete compressive strength(MPa, megapascals)",0


In [158]:
df.columns = df.columns.str.strip().str.replace('\xa0', ' ')
df.rename(columns={'Concrete compressive strength(MPa, megapascals)': 'Concrete compressive strength'}, inplace=True)

In [159]:
from sklearn.model_selection import train_test_split , KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.neighbors import KNeighborsRegressor

In [160]:
X_train,X_test,y_train,y_test=train_test_split(
    df.drop('Concrete compressive strength',axis=1),
    df['Concrete compressive strength'],
    test_size=0.2,
    random_state=42
)

In [161]:
X_train.shape,X_test.shape,y_train.shape,y_test.shape

((824, 8), (206, 8), (824,), (206,))

In [162]:
824+206

1030
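
The 824/206 split follows directly from `test_size=0.2` applied to 1030 rows. As a small sketch, the same sizes can be reproduced on placeholder data (the array `rows` is an assumption used only to stand in for the dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the concrete dataset
rows = np.arange(1030).reshape(-1, 1)

# Same split settings as in the notebook
train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=42)
print(len(train_rows), len(test_rows))
```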

In [163]:
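lr=LinearRegression()
lr.fit(X_train,y_train)
# Note: r2_score expects (y_true, y_pred). Predictions are passed first here,
# and R2 is not symmetric, so this value differs from the conventional R2.
training_score=r2_score(lr.predict(X_train),y_train)
print(training_score)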

0.3618892962763446


In [164]:
# Arguments are in (y_pred, y_true) order here; r2_score expects (y_true, y_pred)
lr_testing_score=r2_score(lr.predict(X_test),y_test)
print(lr_testing_score)

0.42303938808034913


In [165]:
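knn=KNeighborsRegressor()  # default n_neighbors=5
knn.fit(X_train,y_train)
# Arguments are in (y_pred, y_true) order here; r2_score expects (y_true, y_pred)
training_score=r2_score(knn.predict(X_train),y_train)
print(training_score)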

0.7158572278677388


In [166]:
# Arguments are in (y_pred, y_true) order here; r2_score expects (y_true, y_pred)
knn_testing_score=r2_score(knn.predict(X_test),y_test)
print(knn_testing_score)

0.5730512109447223
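
The order of arguments to `r2_score` matters because R2 divides by the variance of whichever array is treated as the ground truth. The sketch below uses made-up numbers (not the concrete data) to show how far the two orderings can diverge:

```python
from sklearn.metrics import r2_score

y_true = [1, 2, 3, 4]  # made-up targets
y_pred = [2, 2, 3, 3]  # made-up predictions

# Conventional order: residual sum of squares 2, total sum of squares 5
conventional = r2_score(y_true, y_pred)

# Reversed order: same residuals, but the "total" is now the spread of the predictions (1)
reversed_args = r2_score(y_pred, y_true)

print(conventional, reversed_args)
```

Here the conventional call gives 0.6 while the reversed call gives -1.0, so scores computed with the two orderings should not be compared with each other.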


In [167]:
# Simple average of the two test scores. Note that a VotingRegressor averages the
# models' predictions, not their R2 scores, so its R2 can differ from this value.
average_r2_Score=(lr_testing_score+knn_testing_score)/2
print(average_r2_Score)

0.49804529951253573
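
For comparison, here is a minimal sketch of what a `VotingRegressor` actually does, assuming synthetic data from `make_regression` in place of the concrete dataset. It averages the fitted models' predictions and only then computes a score, which is generally not the same as averaging the individual scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption, not the concrete dataset)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

voter = VotingRegressor([('lr', LinearRegression()), ('knn', KNeighborsRegressor())])
voter.fit(X_tr, y_tr)

# Predictions of the fitted sub-models, and their plain average
individual = [est.predict(X_te) for est in voter.estimators_]
averaged_prediction = np.mean(individual, axis=0)

ensemble_r2 = r2_score(y_te, voter.predict(X_te))               # score of the averaged prediction
mean_of_r2 = np.mean([r2_score(y_te, p) for p in individual])   # average of the separate scores
print(ensemble_r2, mean_of_r2)
```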


# 🧪 Implementing KFold + BaggingRegressor

For more about KFold: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

For more about BaggingRegressor: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html

In [168]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingRegressor

In [169]:
model=BaggingRegressor(estimator=KNeighborsRegressor(),n_estimators=10,random_state=23,
                       oob_score=True)

In [170]:
model.fit(X_train,y_train)



In [171]:
model.get_params()

{'bootstrap': True,
 'bootstrap_features': False,
 'estimator__algorithm': 'auto',
 'estimator__leaf_size': 30,
 'estimator__metric': 'minkowski',
 'estimator__metric_params': None,
 'estimator__n_jobs': None,
 'estimator__n_neighbors': 5,
 'estimator__p': 2,
 'estimator__weights': 'uniform',
 'estimator': KNeighborsRegressor(),
 'max_features': 1.0,
 'max_samples': 1.0,
 'n_estimators': 10,
 'n_jobs': None,
 'oob_score': True,
 'random_state': 23,
 'verbose': 0,
 'warm_start': False}

In [172]:
kfold=KFold(n_splits=10,shuffle=True,random_state=23)
results=cross_val_score(model,X_train,y_train,cv=kfold)



In [173]:
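# oob_score_ comes from the earlier model.fit(X_train, y_train) call (cross_val_score fits clones).
# Each training row is scored only by the estimators whose bootstrap sample did not include it.
print('Out-Of-Bag scores',model.oob_score_)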
print('R2 scores for each fold: ',results)
print('Overall R2 score: ',results.mean())

Out-Of-Bag scores 0.6381582842265059
R2 scores for each fold:  [0.71573868 0.71888741 0.74106782 0.70016994 0.79899196 0.69060816
 0.70279863 0.59368968 0.5921397  0.57152384]
Overall R2 score:  0.6825615820238091
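
To see where out-of-bag rows come from, the sketch below draws one bootstrap sample by hand. It is an illustration with NumPy only, not the internals of `BaggingRegressor`; the generator seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(23)
n_samples = 824  # number of rows in X_train in this notebook

# One bootstrap sample: draw n_samples row indices with replacement
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Rows that were never drawn are "out-of-bag" for the estimator trained on this sample
oob_mask = ~np.isin(np.arange(n_samples), bootstrap_idx)
oob_fraction = oob_mask.mean()
print(oob_fraction)  # expected to be near (1 - 1/n)^n, roughly 0.37
```

With only 10 estimators, a few rows may end up in every bootstrap sample and therefore have no out-of-bag prediction at all, which makes the OOB estimate less reliable than it would be with more estimators.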


In [174]:
(0.71573868+0.71888741+0.74106782+0.70016994+0.79899196+0.69060816+0.70279863+0.59368968+0.5921397+0.57152384)/10

0.6825615820000001
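
What `cross_val_score` does with a `KFold` splitter can be written out as an explicit loop. This is a sketch on synthetic data (an assumption, via `make_regression`); the names `bag`, `manual_scores`, and `library_scores` are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (not the concrete dataset)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

bag = BaggingRegressor(KNeighborsRegressor(), n_estimators=10, random_state=23)
kfold = KFold(n_splits=10, shuffle=True, random_state=23)

manual_scores = []
for train_idx, val_idx in kfold.split(X):
    fold_model = clone(bag)                     # fresh, unfitted copy for every fold
    fold_model.fit(X[train_idx], y[train_idx])  # train on 9 folds
    manual_scores.append(fold_model.score(X[val_idx], y[val_idx]))  # R2 on the held-out fold

library_scores = cross_val_score(bag, X, y, cv=kfold)
print(np.mean(manual_scores), library_scores.mean())
```

Because each fold gets its own clone, the original estimator passed to `cross_val_score` is left unchanged by the cross-validation run.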

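# ✍🏽 What we can understand from this

A single model trained on a single train/test split may not generalize well, even if its test score looks high. How the model is exposed to the data during training matters: bagging fits each estimator on a different bootstrap sample, and K-fold cross-validation scores the model on ten different held-out folds. Here, KNeighborsRegressor on its own scored 0.57 on the test set without cross-validation, while K-fold plus bagging gave a mean R2 of 0.68. Note that the 0.68 is a cross-validated score computed on the training folds rather than a score on the held-out test set, so the two numbers are not strictly comparable.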

# Predicting new concrete strength

In [175]:
new_concrete = pd.DataFrame([{
    'Cement_kg': 300.0,
    'Slag_kg': 100.0,
    'FlyAsh_kg': 50.0,
    'Water_kg': 180.0,
    'Superplasticizer_kg': 10.0,
    'CoarseAgg_kg': 950.0,
    'FineAgg_kg': 750.0,
    'Age_days': 28
}])


In [176]:
predicted_strength = model.predict(new_concrete)
predicted_strength  # BaggingRegressor prediction (the model fitted on X_train; cross_val_score only scored clones of it)

array([41.59673519])

In [177]:
lr_prediction=lr.predict(new_concrete) #Linear regression prediction
knn_prediction=knn.predict(new_concrete) #KNeighborsRegressor prediction

In [178]:
lr_prediction,knn_prediction

(array([38.89656221]), array([42.67815071]))
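
The bagged prediction is the mean of the predictions made by the individual fitted estimators. The sketch below checks this on synthetic data (an assumption, via `make_regression`), using the fitted attributes `estimators_` and `estimators_features_`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (not the concrete dataset)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
bag = BaggingRegressor(KNeighborsRegressor(), n_estimators=10, random_state=23).fit(X, y)

x_new = X[:1]  # one sample to predict

# Each estimator predicts on the feature columns it was trained with
per_estimator = [
    est.predict(x_new[:, feats])
    for est, feats in zip(bag.estimators_, bag.estimators_features_)
]
manual_average = np.mean(per_estimator, axis=0)
print(manual_average, bag.predict(x_new))
```

Averaging over estimators trained on different bootstrap samples is what smooths out the variance of a single KNN model.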