# Problem 10: Concrete Compressive Strength Prediction

This notebook implements the tenth problem statement: applying Linear Regression to predict the compressive strength of concrete based on its ingredient proportions.

### Task 1: Setup and Data Loading

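First, we import the necessary libraries and load the Concrete Compressive Strength dataset. The dataset is a legacy Excel (`.xls`) file, which `pd.read_excel` reads through the `xlrd` package, so `xlrd` must be installed.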

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

sns.set_theme(style="whitegrid")

In [None]:
# Load the dataset from a remote URL
url = 'https://raw.githubusercontent.com/pratik-27/datasets/main/Concrete_Data.xls'
df = pd.read_excel(url)

print("First 5 rows of the dataset:")
df.head()
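
Before modelling, a quick structural check on the loaded frame can catch missing values or non-numeric columns early. The sketch below runs on a small made-up frame rather than the real data; the helper name `summarize_frame` and the column names are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

def summarize_frame(frame: pd.DataFrame) -> dict:
    """Return basic structural facts about a DataFrame."""
    return {
        "shape": frame.shape,
        "missing": int(frame.isna().sum().sum()),
        "non_numeric": [c for c in frame.columns
                        if not pd.api.types.is_numeric_dtype(frame[c])],
    }

# Hypothetical stand-in for the concrete data (values are made up)
demo = pd.DataFrame({
    "cement": [540.0, 332.5, None],
    "water": [162.0, 228.0, 192.0],
    "label": ["a", "b", "c"],
})
summary = summarize_frame(demo)
print(summary)
```

In the notebook, calling `summarize_frame(df)` right after loading would report the same facts for the real dataset.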

### Task 2: Data Pre-processing and Splitting

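We will define our features (X) and target (y). Standardizing the features does not change the predictions of ordinary least squares, but it puts the coefficients on a comparable scale and becomes important if a regularized model is tried later, so we apply `StandardScaler`.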

In [None]:
# Separate features and target (note the trailing space in the column name)
target_col = 'Concrete compressive strength(MPa, megapascals) '
X = df.drop(columns=target_col)
y = df[target_col]

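# Split first so the test rows do not influence the scaling statistics
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)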
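
An alternative to scaling by hand is to bundle the scaler and the model in a scikit-learn pipeline, so the scaling statistics are always learned from whatever data `fit` receives. The following is a minimal sketch on synthetic data; the arrays and the names `X_demo`, `y_demo`, and `pipe` are assumptions made for illustration and do not come from the concrete dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 3 features on very different scales
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 500.0])
y_demo = X_demo @ np.array([2.0, 0.1, 0.01]) + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training rows only, then fits the model
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)
pred_scaled = pipe.predict(X_te)

# Plain least squares on the raw features, for comparison
pred_raw = LinearRegression().fit(X_tr, y_tr).predict(X_te)

# Check whether the two sets of predictions agree up to rounding
print(np.allclose(pred_scaled, pred_raw))
```

The comparison at the end illustrates that, for unregularized least squares, scaling changes the coefficients but not the predictions.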

### Task 3: Apply Linear Regression and Evaluate

We will now train the Linear Regression model and evaluate its performance using the specified metrics.

In [None]:
# Initialize and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("--- Linear Regression Performance ---")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"R-squared (R²): {r2:.2f}")
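
To make the three metrics concrete, the sketch below computes them by hand with NumPy on four made-up strength values. It also derives the root mean squared error (RMSE), which is not reported above but is expressed in MPa rather than MPa squared. All numbers and variable names here are illustrative assumptions.

```python
import numpy as np

# Made-up strengths in MPa, purely for illustration
y_true = np.array([30.0, 40.0, 50.0, 60.0])
y_hat = np.array([28.0, 43.0, 49.0, 66.0])

residuals = y_true - y_hat
mse_demo = np.mean(residuals ** 2)       # units: MPa squared
rmse_demo = np.sqrt(mse_demo)            # units: MPa
mae_demo = np.mean(np.abs(residuals))    # units: MPa

# R-squared compares the residual error with the spread around the mean
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_demo = 1.0 - ss_res / ss_tot

print(f"MSE={mse_demo:.2f}, RMSE={rmse_demo:.2f}, MAE={mae_demo:.2f}, R2={r2_demo:.2f}")
# prints: MSE=12.50, RMSE=3.54, MAE=3.00, R2=0.90
```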

### Task 4: Visualize the Model's Predictions

A scatter plot of actual vs. predicted values gives a quick visual check of a regression model. If the model were perfect, every point would lie on the line y = x, drawn here as a dashed red 45-degree line; the vertical distance of a point from that line is its prediction error.

In [None]:
plt.figure(figsize=(10, 8))
plt.scatter(y_test, y_pred, alpha=0.7, edgecolors='k')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.title('Actual vs. Predicted Compressive Strength')
plt.xlabel('Actual Strength (MPa)')
plt.ylabel('Predicted Strength (MPa)')
plt.show()
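
A residual plot is a common companion to the actual-vs-predicted scatter: it shows the errors directly, which makes curvature or a widening spread easier to spot. The helper `plot_residuals` below is a hypothetical sketch, demonstrated on made-up values rather than on the notebook's results.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_residuals(y_true, y_hat, ax=None):
    """Scatter residuals (actual minus predicted) against predicted values."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    residuals = np.asarray(y_true) - np.asarray(y_hat)
    ax.scatter(y_hat, residuals, alpha=0.7, edgecolors='k')
    ax.axhline(0.0, color='r', linestyle='--', lw=2)  # zero-error reference
    ax.set_title('Residuals vs. Predicted Strength')
    ax.set_xlabel('Predicted Strength (MPa)')
    ax.set_ylabel('Residual (MPa)')
    return ax

# Made-up values for illustration only
ax = plot_residuals([30.0, 40.0, 50.0, 60.0], [28.0, 43.0, 49.0, 66.0])
```

In the notebook, `plot_residuals(y_test, y_pred)` followed by `plt.show()` would draw the plot for the fitted model.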

### Conclusion

We have successfully implemented a Linear Regression model to predict concrete compressive strength.

**Code Quality and Clarity:**
- The notebook is concise and follows a standard machine learning workflow.
- Feature scaling is included; it does not change ordinary least squares predictions, but it makes the coefficients comparable across features.
- The evaluation metrics (MSE, MAE, R²) are correctly calculated and reported.
- The visualization clearly shows the relationship between the model's predictions and the actual values.

**Model Performance:**
- An R-squared of about 0.61 means the model explains roughly 61% of the variance in compressive strength on the test set, a reasonable baseline but not a strong result.
- The scatter plot confirms this; while there is a clear positive correlation, the points are quite spread out from the ideal 45-degree line, indicating that the model's predictions have a fair amount of error.

**Potential Improvements:**
- **Non-linear Models:** The relationship between the ingredients and the final strength might be non-linear. Using models like Polynomial Regression, Random Forest Regressor, or Gradient Boosting could capture these complex interactions and likely result in a higher R-squared value.
- **Feature Engineering:** Exploring interactions between features (e.g., the ratio of water to cement) could create more predictive inputs for the model.
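
As a minimal sketch of the feature-engineering idea, the snippet below derives a water-to-cement ratio column with pandas. The short column names `cement` and `water` and the mix values are assumptions for illustration; the real dataset uses longer column names that would need to be substituted.

```python
import pandas as pd

# Hypothetical short column names and made-up mix proportions (kg per m^3)
mixes = pd.DataFrame({
    "cement": [540.0, 332.5, 198.6],
    "water": [162.0, 228.0, 192.0],
})

# Water-to-cement ratio: lower values generally go with stronger concrete
mixes["water_cement_ratio"] = mixes["water"] / mixes["cement"]
print(mixes.round(3))
```

A derived column like this would be added to the feature matrix before the train/test split, so that it is scaled and split along with the other features.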