
# 📘 Student Score Prediction Using Simple Linear Regression

This notebook builds a **Simple Linear Regression** model that predicts student exam scores from the **number of hours studied**, using a realistic dataset of study hours and scores.
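
As background for the model used throughout, simple linear regression fits the line `score ≈ intercept + slope * hours` that minimizes the squared error. The sketch below computes that line from the closed-form least-squares formulas in plain Python. The data points and the helper name `fit_simple_linear_regression` are illustrative assumptions, not part of this notebook's dataset or of scikit-learn.

```python
# Made-up (hours, score) pairs that lie exactly on a line, so the result
# is easy to check by hand. They are NOT rows from student_scores_real.csv.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [20.0, 30.0, 40.0, 50.0, 60.0]


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations of x and y, and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression(hours, scores)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

For a single feature, `LinearRegression` solves the same least-squares problem, so on identical data its `coef_[0]` and `intercept_` should agree with these values up to floating-point error.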


In [None]:

# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


## 📂 Load the Dataset

In [None]:

# Load the dataset
data = pd.read_csv("student_scores_real.csv")

# Display first few rows
data.head()


## 🔍 Basic Data Exploration

In [None]:

# Check shape of the dataset
print("Shape of dataset:", data.shape)

# Check for missing values
print("\nMissing values:")
print(data.isnull().sum())

# Basic statistics
data.describe()


## 📊 Visualize the Relationship (Hours vs Scores)

In [None]:

plt.figure(figsize=(8, 5))
plt.scatter(data["Hours"], data["Scores"])
plt.xlabel("Hours Studied")
plt.ylabel("Scores")
plt.title("Hours Studied vs Exam Scores")
plt.grid(True)
plt.show()


## 🧱 Prepare Features and Target

In [None]:

# Feature (X) and target (y)
X = data[["Hours"]]   # 2D DataFrame of shape (n_samples, 1), as scikit-learn expects
y = data["Scores"]    # 1D Series of target values

X.head(), y.head()


## ✂️ Train-Test Split

In [None]:

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training samples:", X_train.shape[0])
print("Testing samples:", X_test.shape[0])
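
To make the idea behind the split concrete, here is a minimal plain-Python sketch of a shuffled hold-out split. The helper `holdout_split` is hypothetical and only illustrates the concept; it is not scikit-learn's procedure, so it will select different rows than `train_test_split` even with the same seed.

```python
import math
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    # A seeded generator makes the shuffle reproducible across runs
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_fraction * len(shuffled))
    # The first n_test shuffled rows form the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for 10 dataset rows
train_rows, test_rows = holdout_split(rows)
print("Training samples:", len(train_rows))
print("Testing samples:", len(test_rows))
```

Fixing the seed plays the same role as `random_state=42` above: the same rows land in the test set on every run, which keeps the evaluation numbers comparable.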


## 🤖 Train the Linear Regression Model

In [None]:

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Model parameters
print("Slope (coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
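
The two printed parameters define the fitted line, `score = intercept + slope * hours`: the slope is the expected change in score per extra hour, and the intercept is the predicted score at zero hours. A small sketch of turning them into a prediction by hand follows. The values 9.5 and 2.5 are placeholders chosen for illustration, not the numbers this notebook prints, and `predict_score` is a hypothetical helper.

```python
def predict_score(hours, slope, intercept):
    """Evaluate the fitted line at a given number of study hours."""
    return intercept + slope * hours


# Placeholder parameters; substitute model.coef_[0] and model.intercept_
example_slope = 9.5
example_intercept = 2.5

print(predict_score(4.0, example_slope, example_intercept))  # 40.5
```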


## 📈 Make Predictions

In [None]:

# Predict on the test set
y_pred = model.predict(X_test)

# Compare actual vs predicted
comparison = pd.DataFrame({
    "Hours (X_test)": X_test["Hours"].values,
    "Actual Scores": y_test.values,
    "Predicted Scores": y_pred
})

comparison


## 🧮 Model Evaluation

In [None]:

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("R² Score:", r2)
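
For readers who want to see what these four metrics measure, the sketch below computes each one from its definition in plain Python. The actual and predicted values are hypothetical numbers picked to keep the arithmetic simple; they are not outputs of the model above, and the `*_manual` names are used only to avoid clashing with the variables in the cell.

```python
import math

# Hypothetical actual and predicted scores (every error is +5 or -5)
actual = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 35.0, 65.0, 75.0]

n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]

# MAE: average size of the error, in score points
mae_manual = sum(abs(e) for e in errors) / n
# MSE: average squared error, which penalizes large misses more heavily
mse_manual = sum(e ** 2 for e in errors) / n
# RMSE: square root of MSE, back in score points
rmse_manual = math.sqrt(mse_manual)

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2_manual = 1 - ss_res / ss_tot

print("MAE:", mae_manual)
print("MSE:", mse_manual)
print("RMSE:", rmse_manual)
print("R²:", round(r2_manual, 4))
```

An R² of 1 means the predictions match the actual values exactly, while 0 means the model does no better than always predicting the mean score.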


## 📉 Plot Regression Line

In [None]:

plt.figure(figsize=(8, 5))

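# Scatter plot of the actual data
plt.scatter(X["Hours"], y, label="Actual Data")

# Regression line: the model's predictions over the observed hours,
# drawn in a contrasting color so it stands out from the points
line = model.predict(X)
plt.plot(X["Hours"], line, color="red", label="Regression Line")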

plt.xlabel("Hours Studied")
plt.ylabel("Scores")
plt.title("Simple Linear Regression - Hours vs Scores")
plt.legend()
plt.grid(True)
plt.show()


## 🔮 Predict Score for Custom Study Hours

In [None]:

# Predict the score for a custom number of study hours
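hours = float(input("Enter number of study hours: "))

# Pass a DataFrame with the training column name so scikit-learn
# does not warn about missing feature names
predicted_score = model.predict(pd.DataFrame({"Hours": [hours]}))[0]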

print(f"Predicted Score for {hours} hours of study: {predicted_score:.2f}")
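
The cell above stops with a raw traceback if the input is not a number. One possible way to make it more forgiving is to validate the text before predicting, sketched below. The helper `parse_hours` and its rules (finite, non-negative) are assumptions added for illustration, not part of the original notebook.

```python
import math


def parse_hours(text):
    """Convert user input to a non-negative, finite number of hours.

    Raises ValueError with a readable message otherwise.
    """
    try:
        hours = float(text)
    except ValueError:
        raise ValueError(f"Not a number: {text!r}") from None
    # float() also accepts "nan" and "inf", which make no sense as hours
    if not math.isfinite(hours):
        raise ValueError(f"Hours must be finite, got {text!r}")
    if hours < 0:
        raise ValueError(f"Hours cannot be negative, got {hours}")
    return hours


print(parse_hours("7.5"))
```

In the notebook, this could wrap the `input(...)` call, for example `hours = parse_hours(input("Enter number of study hours: "))`.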



## ✅ Conclusion

In this project, we:

- Loaded a realistic dataset of **student study hours vs exam scores**  
- Explored and visualized the relationship between variables  
- Built and trained a **Simple Linear Regression** model  
- Evaluated the model using MAE, MSE, RMSE, and R² score  
- Used the model to predict scores for custom study hours  

This beginner-friendly project shows how linear regression can be applied to educational data such as study hours and exam scores.
