# **Concrete Compressive Strength Prediction Using a Neural Network**

**In this tutorial**, we will predict the compressive strength of concrete using a neural network model. The dataset is sourced from the UCI Machine Learning Repository, and the `ucimlrepo` library will be used to fetch the dataset. We will preprocess the data, build and train a neural network, and evaluate its performance.

---

## **Step 1: Install Necessary Libraries**

Install the required libraries before proceeding. This step is needed only in environments, such as Google Colab, where they are not already installed; the leading `!` runs `pip` as a shell command from a notebook cell.




In [None]:
# Install necessary libraries for Google Colab
!pip install ucimlrepo tensorflow

## **Step 2: Import Required Libraries**

We begin by importing the necessary libraries for data handling, model building, and evaluation. These include:

- **pandas**: for data manipulation.
- **fetch_ucirepo**: from the `ucimlrepo` library to fetch the dataset.
- **train_test_split**: from `sklearn.model_selection` for splitting the data.
- **mean_squared_error**: from `sklearn.metrics` to evaluate the model's performance.
- **keras** and **layers**: from TensorFlow to build the neural network.


In [None]:
# Import necessary libraries
import pandas as pd
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from tensorflow import keras
from tensorflow.keras import layers

## **Step 3: Fetch the Dataset**

We use the `fetch_ucirepo` function to retrieve the Concrete Compressive Strength dataset from the UCI repository. This dataset contains data on various components of concrete and their corresponding compressive strength.


In [None]:
# Fetch dataset
concrete_compressive_strength = fetch_ucirepo(id=165)

## **Step 4: Prepare the Data**

The fetched object exposes the data as pandas DataFrames. The features (**X**) are the input columns, while the target variable (**y**) is the compressive strength of concrete that we want to predict.

The targets arrive as a single-column DataFrame, so we flatten them into a one-dimensional array, the format expected for a single-output regression target.

Finally, we display the shape of the feature matrix (rows, columns) to understand its dimensionality.


In [None]:
# Features as a pandas DataFrame; target as a 1-D NumPy array
X = concrete_compressive_strength.data.features
y = concrete_compressive_strength.data.targets.values.flatten()  # Shape (n, 1) -> (n,)

# Display the original data shape
print("Original data shape:")
print(X.shape)
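
To illustrate why the target is flattened, here is a minimal sketch using a small made-up array in place of the real targets (the values and the `_demo` names are illustrative only). It shows the shape change and one pitfall of mixing `(n, 1)` and `(n,)` arrays in hand-written arithmetic.

```python
import numpy as np

# Hypothetical stand-in for `data.targets.values`: one column, four rows.
targets_2d_demo = np.array([[80.0], [62.0], [40.0], [41.0]])
targets_1d_demo = targets_2d_demo.flatten()

print(targets_2d_demo.shape)  # (4, 1)
print(targets_1d_demo.shape)  # (4,)

# Subtracting a (4,) array from a (4, 1) array broadcasts to a 4x4 matrix,
# which would silently corrupt a hand-written error calculation.
mismatched_demo = targets_2d_demo - targets_1d_demo
print(mismatched_demo.shape)  # (4, 4)
```

Keeping the target one-dimensional from the start avoids this kind of accidental broadcasting later on.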


## **Step 5: Split the Data**

We split the dataset into training and testing sets using an 80/20 split. This allows us to train the model on 80% of the data and evaluate it on the remaining 20%.

- **X_train** and **y_train**: represent the training set.
- **X_test** and **y_test**: represent the testing set.


In [None]:
# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the training and testing sets
print("\nTraining set shape:", X_train.shape)
print("Testing set shape:", X_test.shape)
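
The pipeline in this tutorial feeds the raw feature values to the network. One optional preprocessing step that often helps neural networks is standardizing each feature to zero mean and unit variance. The sketch below is not part of the original notebook: it uses synthetic data and hypothetical `_demo` names, and it assumes eight feature columns on very different scales. The key point is that the statistics are computed on the training rows only and then reused for the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for X_train / X_test: 8 columns on very different scales.
scales = np.array([500.0, 300.0, 200.0, 250.0, 30.0, 1100.0, 900.0, 365.0])
X_train_demo = rng.random((80, 8)) * scales
X_test_demo = rng.random((20, 8)) * scales

# Compute the statistics on the training rows only, so that no information
# from the test set leaks into preprocessing.
train_mean = X_train_demo.mean(axis=0)
train_std = X_train_demo.std(axis=0)

X_train_scaled = (X_train_demo - train_mean) / train_std
X_test_scaled = (X_test_demo - train_mean) / train_std

print(X_train_scaled.mean(axis=0))  # approximately 0 for every column
print(X_train_scaled.std(axis=0))   # approximately 1 for every column
```

scikit-learn's `StandardScaler` performs the same calculation (`fit` on the training set, `transform` on both sets) if you prefer not to write it by hand.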


---

## **Step 6: Build the Neural Network**

We construct a fully connected neural network with two hidden layers and one output layer:
- The first hidden layer has 64 neurons and uses the ReLU activation function.
- The second hidden layer has 32 neurons, also using the ReLU activation function.
- The output layer has 1 neuron and no activation function (a linear output), since this is a regression problem where we want to predict a single continuous value (compressive strength).


In [None]:
# Build the neural network model
model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),  # One input per feature column
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)  # Output layer for regression
])
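
As a sanity check on the architecture, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes eight input features (verify with `X.shape`); `dense_params` is a hypothetical helper written for this illustration, not a Keras function.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 8  # assumption: the dataset's eight input columns
layer_sizes = [64, 32, 1]

total_params = 0
n_in = n_features
for n_units in layer_sizes:
    count = dense_params(n_in, n_units)
    print(f"Dense({n_units}): {count} parameters")
    total_params += count
    n_in = n_units  # This layer's outputs feed the next layer

print("Total:", total_params)  # 576 + 2080 + 33 = 2689
```

Calling `model.summary()` on the built model should report the same total if the input really has eight features.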


---

## **Step 7: Compile the Model**

We compile the model with:
- **Optimizer:** Adam, a popular optimization algorithm for neural networks.
- **Loss function:** Mean Squared Error (MSE), suitable for regression problems where we measure the average squared difference between actual and predicted values.


In [None]:
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
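
To make the loss concrete, here is a minimal NumPy sketch of the MSE calculation on three made-up values. The function name `mean_squared_error_manual` is hypothetical and only for illustration; Keras computes the loss internally during training.

```python
import numpy as np


def mean_squared_error_manual(y_true, y_pred) -> float:
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))


actual_demo = [30.0, 45.0, 50.0]
predicted_demo = [28.0, 47.0, 56.0]

# Errors are 2, -2, and -6, so the squared errors are 4, 4, and 36.
print(mean_squared_error_manual(actual_demo, predicted_demo))  # 44 / 3, about 14.67
```

Note how the single error of 6 contributes 36 of the 44 total: squaring makes the loss especially sensitive to large mistakes.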


---

## **Step 8: Train the Model**

We train the neural network on the training set (`X_train`, `y_train`) with the following settings:
- **Epochs:** 100 full passes over the data used for fitting.
- **Batch size:** 32, meaning the weights are updated after each batch of 32 samples.
- **Validation split:** 20%. Keras holds out the last 20% of the training data, does not train on it, and reports the loss on it at the end of each epoch.


In [None]:
# Train the model
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2, verbose=1)
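
The settings above determine how many rows end up in each subset and how many weight updates happen per epoch. The following sketch works through the arithmetic, assuming the dataset has 1,030 rows (the size listed by UCI; confirm with `X.shape`) and assuming the rounding rules used by scikit-learn (test size rounded up) and Keras (fitting size rounded down).

```python
import math

n_rows = 1030            # assumption: dataset size as listed by UCI
test_size = 0.2          # from train_test_split
validation_split = 0.2   # from model.fit
batch_size = 32

# Train/test split: the test set size is rounded up.
n_test = math.ceil(n_rows * test_size)
n_train = n_rows - n_test

# Validation split: the first part of the training rows is used for fitting,
# and the remaining rows at the end are held out for validation.
n_fit = math.floor(n_train * (1 - validation_split))
n_val = n_train - n_fit

# One weight update per batch; the last batch may be smaller than 32.
steps_per_epoch = math.ceil(n_fit / batch_size)

print("Test rows:", n_test)                 # 206
print("Training rows:", n_train)            # 824
print("Rows used for fitting:", n_fit)      # 659
print("Validation rows:", n_val)            # 165
print("Batches per epoch:", steps_per_epoch)  # 21
```

Under these assumptions, only about 64% of the full dataset is used to update the weights; the rest is split between validation and final testing.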


---

## **Step 9: Make Predictions**

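After the model is trained, we make predictions on the test set (`X_test`). These predictions are the model's estimates of compressive strength for data it did not see during training. `model.predict` returns a two-dimensional array of shape `(n_samples, 1)`, one row per test sample.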


In [None]:
# Make predictions on the test set
y_pred = model.predict(X_test)


---

## **Step 10: Evaluate the Model**

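We evaluate the model's performance by calculating the Mean Squared Error (MSE) on the test set. A lower MSE indicates better model performance. Because MSE is expressed in squared units of the target, its square root (RMSE) is often easier to interpret.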


In [None]:
# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print("\nMean Squared Error on the test set:", mse)
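
MSE on its own can be hard to read. Two common companions are the root mean squared error (RMSE), which is in the same units as the target (megapascals for this dataset, as far as the UCI description indicates), and the coefficient of determination (R²). The sketch below computes both on a few made-up values; the `_demo` names and numbers are illustrative only and are not results from the model.

```python
import numpy as np

# Made-up actual and predicted strengths, for illustration only.
y_true_demo = np.array([30.0, 45.0, 50.0, 35.0])
y_pred_demo = np.array([28.0, 47.0, 56.0, 35.0])

# MSE: mean of squared errors (4 + 4 + 36 + 0) / 4
mse_demo = np.mean((y_true_demo - y_pred_demo) ** 2)

# RMSE: square root of MSE, back in the units of the target
rmse_demo = np.sqrt(mse_demo)

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_demo = 1.0 - ss_res / ss_tot

print("MSE:", mse_demo)             # 11.0
print("RMSE:", round(rmse_demo, 3))  # 3.317
print("R^2:", round(r2_demo, 3))     # 0.824
```

With the real data, the same quantities could be obtained by applying `np.sqrt` to `mse` and by calling `r2_score` from `sklearn.metrics` on `y_test` and `y_pred`.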


---

## **Step 11: View Sample Predictions**

We display a few sample predictions alongside their actual values for comparison. This helps us understand how well the model is performing at a glance.


In [None]:
# Display a few predictions
predictions_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred.flatten()})
print("\nSample predictions:")
print(predictions_df.head())
