
**1. LearnNow – Online Course Platform**


In [9]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# 1. Prepare the data
data = {
    'Number of Emails Sent': [1, 3, 2, 1, 3],
    'Enrollments': [80, 120, 95, 85, 130]
}
df = pd.DataFrame(data)

X = df[['Number of Emails Sent']]
y = df['Enrollments']

# 2. Perform Linear Regression Analysis
model = LinearRegression()
model.fit(X, y)

# Get the coefficients
m = model.coef_[0]
c = model.intercept_

print("Linear Regression Coefficients:")
print(f"Slope (m): {m:.2f}")
print(f"Intercept (c): {c:.2f}")
print(f"Linear Regression Equation: y = {m:.2f}x + {c:.2f}\n")

# 3. Calculate the Baseline Value
baseline_value = np.mean(y)
print(f"Baseline Value (Mean of Enrollments): {baseline_value:.2f}\n")

# 4. Calculate SHAP values and Predictions
predictions = model.predict(X)
shap_values = predictions - baseline_value

df['Predicted Enrollments'] = predictions
df['SHAP Value'] = shap_values
df['Final Prediction (Baseline + SHAP)'] = baseline_value + shap_values

# 5. Interpret the Results
df['Over/Underprediction'] = np.where(df['Predicted Enrollments'] > df['Enrollments'], 'Overprediction', 'Underprediction')

print("Results Table:")
print(df)
print("\n")

# 6. Summary Analysis
print("Summary Analysis:")
print("-------------------")

# Accuracy of the model
mae = np.mean(np.abs(predictions - y))
r2 = model.score(X, y)  # coefficient of determination on the (training) data
print(f"The Mean Absolute Error (MAE) of the model is approximately {mae:.2f}, i.e. the predictions are off by about this many enrollments on average.")
print(f"The R² of the fit is {r2:.2f}: the number of emails explains this share of the variance in enrollments (on only five data points).")

# Trend analysis
print("\nTrend Analysis:")
print(f"The positive slope of {m:.2f} indicates a positive association: for every additional email sent, the model predicts {m:.2f} more enrollments.")

# SHAP interpretation insights
# For a single-feature linear model, the SHAP value is m * (x - mean(x)), which equals prediction - baseline.
x_mean = X['Number of Emails Sent'].mean()
print("\nSHAP Interpretation Insights:")
print("The SHAP values explain each prediction's deviation from the baseline.")
print(f"For example, a SHAP value of {m * (1 - x_mean):.2f} for x=1 shows that sending 1 email puts the prediction {abs(m * (1 - x_mean)):.2f} enrollments below the baseline.")
print(f"Conversely, a SHAP value of {m * (3 - x_mean):.2f} for x=3 shows that sending 3 emails puts the prediction {m * (3 - x_mean):.2f} enrollments above the baseline.")
print("A SHAP value of 0 for x=2 (the mean number of emails) means that sending 2 emails yields a prediction exactly at the baseline.")

Linear Regression Coefficients:
Slope (m): 21.25
Intercept (c): 59.50
Linear Regression Equation: y = 21.25x + 59.50

Baseline Value (Mean of Enrollments): 102.00

Results Table:
   Number of Emails Sent  Enrollments  Predicted Enrollments  SHAP Value  \
0                      1           80                  80.75      -21.25   
1                      3          120                 123.25       21.25   
2                      2           95                 102.00        0.00   
3                      1           85                  80.75      -21.25   
4                      3          130                 123.25       21.25   

   Final Prediction (Baseline + SHAP) Over/Underprediction  
0                               80.75       Overprediction  
1                              123.25       Overprediction  
2                              102.00       Overprediction  
3                               80.75      Underprediction  
4                              123.25      Underprediction 
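The coefficients and SHAP values above can be reproduced by hand. For a single feature, least squares gives slope m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2) and intercept c = mean(y) - m * mean(x), and the exact SHAP value of a linear model is m * (x - mean(x)). The sketch below uses only NumPy and the five rows from the table; `ols_and_shap` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# LearnNow data from the table above
emails = np.array([1, 3, 2, 1, 3], dtype=float)
enrollments = np.array([80, 120, 95, 85, 130], dtype=float)

def ols_and_shap(x, y):
    """Hypothetical helper: closed-form simple OLS, Pearson r, and exact linear SHAP values."""
    dx, dy = x - x.mean(), y - y.mean()
    m = np.sum(dx * dy) / np.sum(dx ** 2)                               # slope = S_xy / S_xx
    c = y.mean() - m * x.mean()                                         # intercept
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))    # Pearson correlation
    shap_vals = m * dx                                                  # SHAP value = m * (x - mean(x))
    return m, c, r, shap_vals

m_hand, c_hand, r_hand, shap_hand = ols_and_shap(emails, enrollments)
pred_hand = m_hand * emails + c_hand
baseline_hand = enrollments.mean()  # equals the mean prediction for OLS with an intercept

print(f"m = {m_hand:.2f}, c = {c_hand:.2f}, r = {r_hand:.3f}")
print("SHAP values:", shap_hand)
print("baseline + SHAP equals prediction:", np.allclose(baseline_hand + shap_hand, pred_hand))
```

The correlation of about 0.97 is what justifies calling the relationship strong; the slope alone gives only its direction and size in enrollments per email.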

**2. ShopEase – E-commerce Revenue Prediction using Multiple Linear Regression and SHAP Analysis**

In [10]:
# 1. Prepare the data for Multiple Linear Regression
data = {
    'AdSpend': [200, 300, 250, 150, 100],
    'Discount(%)': [10, 15, 5, 10, 0],
    'Revenue(y)': [1500, 2000, 1700, 1400, 1000]
}
df_shop = pd.DataFrame(data)

X_shop = df_shop[['AdSpend', 'Discount(%)']]
y_shop = df_shop['Revenue(y)']

In [11]:
# 2. Perform Multiple Linear Regression Analysis
model_shop = LinearRegression()
model_shop.fit(X_shop, y_shop)

# Get the coefficients
m1 = model_shop.coef_[0]
m2 = model_shop.coef_[1]
c_shop = model_shop.intercept_

print("Multiple Linear Regression Coefficients:")
print(f"Coefficient for AdSpend (m1): {m1:.2f}")
print(f"Coefficient for Discount(%) (m2): {m2:.2f}")
print(f"Intercept (c): {c_shop:.2f}")
print(f"Linear Regression Equation: y = {m1:.2f}*AdSpend + {m2:.2f}*Discount(%) + {c_shop:.2f}\n")

Multiple Linear Regression Coefficients:
Coefficient for AdSpend (m1): 3.90
Coefficient for Discount(%) (m2): 14.07
Intercept (c): 628.15
Linear Regression Equation: y = 3.90*AdSpend + 14.07*Discount(%) + 628.15
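The intercept and the two coefficients are the least-squares solution for the design matrix [1, AdSpend, Discount(%)]. As a sketch that assumes only NumPy (not the notebook's scikit-learn model), `np.linalg.lstsq` reproduces the printed equation:

```python
import numpy as np

# ShopEase data: AdSpend, Discount(%) and Revenue(y)
ad_spend = np.array([200, 300, 250, 150, 100], dtype=float)
discount = np.array([10, 15, 5, 10, 0], dtype=float)
revenue = np.array([1500, 2000, 1700, 1400, 1000], dtype=float)

# Design matrix with a leading column of ones for the intercept
A = np.column_stack([np.ones_like(ad_spend), ad_spend, discount])

# Least-squares solution of A @ [c, m1, m2] ≈ revenue
(c_ls, m1_ls, m2_ls), *_ = np.linalg.lstsq(A, revenue, rcond=None)
print(f"y = {m1_ls:.2f}*AdSpend + {m2_ls:.2f}*Discount(%) + {c_ls:.2f}")
```

Solving the normal equations exactly gives m1 = 526/135 ≈ 3.896 and m2 = 380/27 ≈ 14.074, which round to the values printed above.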



In [12]:
# 3. Calculate the Baseline Value
baseline_value_shop = np.mean(y_shop)
print(f"Baseline Value (Mean of Revenue): {baseline_value_shop:.2f}\n")

Baseline Value (Mean of Revenue): 1520.00
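Using the mean of y as the SHAP baseline relies on a property of least squares with an intercept: the residuals on the training data sum to zero, so the average prediction equals the average target. A short check on the ShopEase data (the `_demo` names are illustrative and separate from the notebook's variables):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

X_demo = pd.DataFrame({'AdSpend': [200, 300, 250, 150, 100],
                       'Discount(%)': [10, 15, 5, 10, 0]})
y_demo = pd.Series([1500, 2000, 1700, 1400, 1000], name='Revenue(y)')

model_demo = LinearRegression().fit(X_demo, y_demo)
pred_demo = model_demo.predict(X_demo)

residuals = y_demo - pred_demo
print("Sum of residuals:", residuals.sum())  # zero up to floating-point error
print("Mean prediction:", pred_demo.mean(), "Mean of y:", y_demo.mean())
```

The same zero-sum property means that, on the training data, overpredictions and underpredictions must offset each other; on a held-out test set (as in the diabetes example) it no longer holds exactly.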



In [13]:
# 4. Calculate SHAP values and Predictions
predictions_shop = model_shop.predict(X_shop)

# Calculate total SHAP value for each prediction
shap_values_total = predictions_shop - baseline_value_shop

# Per-feature contributions as coefficient * raw feature value.
# Note: these terms are NOT SHAP values. Together with the intercept they add up to the prediction:
#   prediction = c + m1*AdSpend + m2*Discount(%)
# For a linear model with independent features, the exact SHAP value of feature i is
#   m_i * (x_i - mean(x_i)),
# and those centered terms add up to prediction - baseline. The raw terms below differ from
# them by the constant m_i * mean(x_i), so they only give a rough sense of each feature's scale.
shap_adspend = (X_shop['AdSpend'] * m1)
shap_discount = (X_shop['Discount(%)'] * m2)


df_shop['Predicted Revenue'] = predictions_shop
df_shop['Total SHAP Value (Approx)'] = shap_values_total
df_shop['SHAP Contribution (AdSpend Approx)'] = shap_adspend
df_shop['SHAP Contribution (Discount% Approx)'] = shap_discount


# 5. Compute Final Prediction for Each Record (Verification)
# The final prediction equals Baseline + Total SHAP value for that instance.
# Baseline + sum of the raw coefficient*value terms above does NOT reproduce the prediction;
# it is the intercept (not the baseline) plus those terms that does.

df_shop['Baseline + Total SHAP'] = baseline_value_shop + df_shop['Total SHAP Value (Approx)']

# Verify prediction = Baseline + Total SHAP
# This will be true by definition of how Total SHAP Value was calculated
print("Verification: Predicted Revenue vs Baseline + Total SHAP")
print(df_shop[['Revenue(y)', 'Predicted Revenue', 'Baseline + Total SHAP']])
print("\n")

Verification: Predicted Revenue vs Baseline + Total SHAP
   Revenue(y)  Predicted Revenue  Baseline + Total SHAP
0        1500        1548.148148            1548.148148
1        2000        2008.148148            2008.148148
2        1700        1672.592593            1672.592593
3        1400        1353.333333            1353.333333
4        1000        1017.777778            1017.777778
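For a linear model with independent features, the exact SHAP value of feature i for a record is m_i * (x_i - mean(x_i)). The sketch below computes these centered contributions for the ShopEase data and checks that, unlike the raw coefficient * value columns, they add up to each record's total SHAP value (prediction - baseline). The `_demo` names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

X_demo = pd.DataFrame({'AdSpend': [200, 300, 250, 150, 100],
                       'Discount(%)': [10, 15, 5, 10, 0]})
y_demo = pd.Series([1500, 2000, 1700, 1400, 1000])

model_demo = LinearRegression().fit(X_demo, y_demo)
baseline_demo = y_demo.mean()
pred_demo = model_demo.predict(X_demo)

# Exact linear SHAP: coefficient * (feature value - feature mean), column by column
shap_exact = (X_demo - X_demo.mean()) * model_demo.coef_

# Raw coefficient * value terms, as in the notebook's 'Approx' columns
raw_terms = X_demo * model_demo.coef_

print(shap_exact.round(2))
print("Centered terms sum to prediction - baseline:",
      np.allclose(shap_exact.sum(axis=1), pred_demo - baseline_demo))
print("Raw terms sum to prediction - intercept:",
      np.allclose(raw_terms.sum(axis=1), pred_demo - model_demo.intercept_))
```

For record 0 (AdSpend = 200, exactly its mean), the AdSpend SHAP value is 0, whereas the raw column reports 779.26; the centered version attributes that record's 28.15 deviation entirely to its above-average discount.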




In [14]:
# 6. Interpret the Results
df_shop['Over/Underprediction'] = np.where(df_shop['Predicted Revenue'] > df_shop['Revenue(y)'], 'Overprediction', 'Underprediction')

print("Results Table:")
print(df_shop)
print("\n")

print("Summary Analysis:")
print("-------------------")

# Accuracy of the model
mae_shop = np.mean(np.abs(predictions_shop - y_shop))
print(f"The Mean Absolute Error (MAE) of the model is approximately {mae_shop:.2f}, indicating that on average, the model's predictions are off by this many revenue units.")

# Trend analysis
print(f"\nTrend Analysis:")
print(f"The coefficient for AdSpend ({m1:.2f}) suggests that for every unit increase in AdSpend, the model predicts an increase of {m1:.2f} in revenue, holding Discount(%) constant.")
print(f"The coefficient for Discount(%) ({m2:.2f}) suggests that for every percentage point increase in Discount(%), the model predicts a change of {m2:.2f} in revenue, holding AdSpend constant.")
print(f"Since the Discount(%) coefficient is {'positive' if m2 > 0 else 'negative'}, a larger discount is associated with {'higher' if m2 > 0 else 'lower'} predicted revenue in this data.")


# SHAP interpretation insights
print("\nSHAP Interpretation Insights:")
print("The 'Total SHAP Value (Approx)' column is each prediction's deviation from the baseline revenue (prediction - baseline); despite its label, this value is exact.")
print("The 'SHAP Contribution (... Approx)' columns hold coefficient * raw feature value. They are not SHAP values: they sum to prediction - intercept, not prediction - baseline.")
print("For a linear model with independent features, the exact SHAP value of each feature is coefficient * (value - feature mean); the `shap` library's LinearExplainer computes these.")


# Compare predicted vs actual revenue and comment on over/underprediction
print("\nComparison of Predicted vs Actual Revenue and Over/Underprediction:")
for index, row in df_shop.iterrows():
    print(f"Record {index + 1}: Actual Revenue = {row['Revenue(y)']:.2f}, Predicted Revenue = {row['Predicted Revenue']:.2f}. This is an {row['Over/Underprediction']}.")

print("\nPossible reasons for over/underprediction could include factors not included in the model, non-linear relationships between variables, or random noise in the data.")

Results Table:
   AdSpend  Discount(%)  Revenue(y)  Predicted Revenue  \
0      200           10        1500        1548.148148   
1      300           15        2000        2008.148148   
2      250            5        1700        1672.592593   
3      150           10        1400        1353.333333   
4      100            0        1000        1017.777778   

   Total SHAP Value (Approx)  SHAP Contribution (AdSpend Approx)  \
0                  28.148148                          779.259259   
1                 488.148148                         1168.888889   
2                 152.592593                          974.074074   
3                -166.666667                          584.444444   
4                -502.222222                          389.629630   

   SHAP Contribution (Discount% Approx)  Baseline + Total SHAP  \
0                            140.740741            1548.148148   
1                            211.111111            2008.148148   
2                            

**3. Regression with Diabetes Dataset**

In [15]:
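from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from IPython.display import display  # built in to Jupyter; imported explicitly so the cell also runs as a script
import numpy as np
import pandas as pd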

# Load the diabetes dataset
diabetes = load_diabetes()
X_dia, y_dia = diabetes.data, diabetes.target

# Convert to DataFrame for easier handling and interpretation
X_dia_df = pd.DataFrame(X_dia, columns=diabetes.feature_names)
y_dia_df = pd.Series(y_dia, name='disease_progression')

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_dia_df, y_dia_df, test_size=0.2, random_state=42)

# 1. Perform Multiple Linear Regression Analysis
model_dia = LinearRegression()
model_dia.fit(X_train, y_train)

# Get the coefficients
coefficients = model_dia.coef_
intercept = model_dia.intercept_

print("Multiple Linear Regression Coefficients:")
for feature, coef in zip(X_train.columns, coefficients):
    print(f"Coefficient for {feature}: {coef:.2f}")
print(f"Intercept: {intercept:.2f}\n")

# 2. Calculate the Baseline Value
baseline_value_dia = np.mean(y_train)
print(f"Baseline Value (Mean of Training Disease Progression): {baseline_value_dia:.2f}\n")

# 3. Calculate SHAP values and Predictions (Approximation using coefficients)
predictions_test = model_dia.predict(X_test)

# Calculate total SHAP value for each prediction
shap_values_total_test = predictions_test - baseline_value_dia

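# Per-feature contributions as coefficient * raw feature value (a simplification).
# These are not exact SHAP values: they sum to prediction - intercept. The exact linear SHAP value
# is coefficient * (x - training mean of x), which sums to prediction - baseline.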
shap_contributions_test = X_test * coefficients

# 4. Compute Final Prediction for Each Record (Verification)
df_test_results = X_test.copy()
df_test_results['Actual Progression'] = y_test
df_test_results['Predicted Progression'] = predictions_test
df_test_results['Total SHAP Value (Approx)'] = shap_values_total_test

# Add approximate SHAP contributions for each feature
for feature in X_test.columns:
    df_test_results[f'SHAP Contribution ({feature} Approx)'] = shap_contributions_test[feature]

# Verify prediction = Baseline + Total SHAP
# This will be true by definition of how Total SHAP Value was calculated
df_test_results['Baseline + Total SHAP'] = baseline_value_dia + df_test_results['Total SHAP Value (Approx)']

print("Verification: Predicted Progression vs Baseline + Total SHAP (for test set)")
print(df_test_results[['Actual Progression', 'Predicted Progression', 'Baseline + Total SHAP']].head())
print("\n")

# 5. Interpret the Results
df_test_results['Over/Underprediction'] = np.where(df_test_results['Predicted Progression'] > df_test_results['Actual Progression'], 'Overprediction', 'Underprediction')

print("Results Table (first 5 test records):")
display(df_test_results.head())
print("\n")

print("Summary Analysis (for test set):")
print("-------------------")

# Accuracy of the model
mae_test = np.mean(np.abs(predictions_test - y_test))
print(f"The Mean Absolute Error (MAE) of the model on the test set is approximately {mae_test:.2f}, indicating that on average, the model's predictions are off by this many disease progression units.")

print("\nSHAP Interpretation Insights (Approximate for first 5 test records):")
print("The 'Total SHAP Value (Approx)' column shows the approximate deviation of each prediction from the baseline progression.")
print("The 'SHAP Contribution (Feature Approx)' columns provide a simplified view of how each feature approximately contributed to that deviation based on their coefficients.")
print("For a more rigorous SHAP analysis, a dedicated SHAP library (`shap`) would be used to account for feature interactions and the intercept term correctly.")


# Compare predicted vs actual progression and comment on over/underprediction
print("\nComparison of Predicted vs Actual Progression and Over/Underprediction (first 5 test records):")
for index, row in df_test_results.head().iterrows():
    print(f"Record Index: {index}: Actual Progression = {row['Actual Progression']:.2f}, Predicted Progression = {row['Predicted Progression']:.2f}. This is an {row['Over/Underprediction']}.")
    print("Approximate SHAP Contributions for this record:")
    for feature in X_test.columns:
        print(f"  {feature}: {row[f'SHAP Contribution ({feature} Approx)']:.2f}")
    print("-" * 20)

print("\nPossible reasons for over/underprediction could include factors not included in the model, non-linear relationships between variables, or random noise in the data.")

Multiple Linear Regression Coefficients:
Coefficient for age: 37.90
Coefficient for sex: -241.96
Coefficient for bmi: 542.43
Coefficient for bp: 347.70
Coefficient for s1: -931.49
Coefficient for s2: 518.06
Coefficient for s3: 163.42
Coefficient for s4: 275.32
Coefficient for s5: 736.20
Coefficient for s6: 48.67
Intercept: 151.35

Baseline Value (Mean of Training Disease Progression): 153.74

Verification: Predicted Progression vs Baseline + Total SHAP (for test set)
     Actual Progression  Predicted Progression  Baseline + Total SHAP
287               219.0             139.547558             139.547558
211                70.0             179.517208             179.517208
72                202.0             134.038756             134.038756
321               230.0             291.417029             291.417029
73                111.0             123.789659             123.789659


Results Table (first 5 test records):


index,age,sex,bmi,bp,s1,s2,s3,s4,s5,s6,...,SHAP Contribution (bmi Approx),SHAP Contribution (bp Approx),SHAP Contribution (s1 Approx),SHAP Contribution (s2 Approx),SHAP Contribution (s3 Approx),SHAP Contribution (s4 Approx),SHAP Contribution (s5 Approx),SHAP Contribution (s6 Approx),Baseline + Total SHAP,Over/Underprediction
287,0.045341,-0.044642,-0.006206,-0.015999,0.125019,0.125198,0.019187,0.034309,0.032432,-0.00522,...,-3.366288,-5.562905,-116.453527,64.860413,3.135539,9.445843,23.87664,-0.254051,139.547558,Underprediction
211,0.092564,-0.044642,0.036907,0.021872,-0.02496,-0.016658,0.000779,-0.039493,-0.022517,-0.021788,...,20.019163,7.605113,23.250109,-8.62996,0.127273,-10.873235,-16.576642,-1.060448,179.517208,Overprediction
72,0.063504,0.05068,-0.00405,-0.012556,0.103003,0.04879,0.056003,-0.002592,0.084492,-0.017646,...,-2.197015,-4.365813,-95.946572,25.276194,9.152071,-0.713696,62.202568,-0.858849,134.038756,Underprediction
321,0.096197,-0.044642,0.051996,0.079265,0.054845,0.036577,-0.076536,0.141322,0.098648,0.061054,...,28.20407,27.560645,-51.087606,18.949209,-12.507444,38.908507,72.62459,2.971534,291.417029,Overprediction
73,0.012648,0.05068,-0.020218,-0.002228,0.038334,0.053174,-0.006584,0.034309,-0.005142,-0.009362,...,-10.966559,-0.774535,-35.707389,27.54742,-1.076034,9.445843,-3.785674,-0.45565,123.789659,Overprediction




Summary Analysis (for test set):
-------------------
The Mean Absolute Error (MAE) of the model on the test set is approximately 42.79, indicating that on average, the model's predictions are off by this many disease progression units.

SHAP Interpretation Insights (Approximate for first 5 test records):
The 'Total SHAP Value (Approx)' column shows the approximate deviation of each prediction from the baseline progression.
The 'SHAP Contribution (Feature Approx)' columns provide a simplified view of how each feature approximately contributed to that deviation based on their coefficients.
For a more rigorous SHAP analysis, a dedicated SHAP library (`shap`) would be used to account for feature interactions and the intercept term correctly.

Comparison of Predicted vs Actual Progression and Over/Underprediction (first 5 test records):
Record Index: 287: Actual Progression = 219.00, Predicted Progression = 139.55. This is an Underprediction.
Approximate SHAP Contributions for this record
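The same centering applies to the diabetes model. Because the baseline above is the training-set mean, this sketch assumes the training split as the background data and subtracts the training feature means from the test records; the `_d`/`_te` names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data, split and model as in the cell above
X_d, y_d = load_diabetes(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_d, y_d, test_size=0.2, random_state=42)
model_d = LinearRegression().fit(X_tr, y_tr)

baseline_d = y_tr.mean()  # equals the mean training prediction
pred_te = model_d.predict(X_te)

# Exact linear SHAP relative to the training data: coef * (x - training mean of x)
shap_te = (X_te - X_tr.mean()) * model_d.coef_

# Each row of SHAP values adds up to prediction - baseline
print(np.allclose(shap_te.sum(axis=1), pred_te - baseline_d))
print(shap_te.head().round(2))
```

Each exact value differs from the notebook's raw 'Approx' value for the same feature by the constant coef_i * mean(X_train[i]); summed over features, these constants equal the gap between the baseline (153.74) and the intercept (151.35).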

**4. Regression with Student Performance Dataset**

In [16]:
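# Generate a synthetic stand-in for a student performance dataset
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd

# make_regression draws random Gaussian features and a linear target with added noise.
# The feature names below are illustrative labels only, so coefficient signs (e.g. positive
# for 'absences' and 'failures') and the score scale (including negative "scores") carry no
# real-world meaning.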
X_perf, y_perf = make_regression(n_samples=100, n_features=5, random_state=42, noise=10)

# Create feature names
feature_names = ['studytime', 'parental_education', 'absences', 'failures', 'health']
X_perf_df = pd.DataFrame(X_perf, columns=feature_names)
y_perf_df = pd.Series(y_perf, name='final_exam_score')

# Split data into training and testing sets
X_train_perf, X_test_perf, y_train_perf, y_test_perf = train_test_split(X_perf_df, y_perf_df, test_size=0.2, random_state=42)

# 1. Perform Multiple Linear Regression Analysis
model_perf = LinearRegression()
model_perf.fit(X_train_perf, y_train_perf)

# Get the coefficients
coefficients_perf = model_perf.coef_
intercept_perf = model_perf.intercept_

print("Multiple Linear Regression Coefficients:")
for feature, coef in zip(X_train_perf.columns, coefficients_perf):
    print(f"Coefficient for {feature}: {coef:.2f}")
print(f"Intercept: {intercept_perf:.2f}\n")

# 2. Calculate the Baseline Value
baseline_value_perf = np.mean(y_train_perf)
print(f"Baseline Value (Mean of Training Final Exam Scores): {baseline_value_perf:.2f}\n")

# 3. Calculate SHAP values and Predictions (Approximation using coefficients)
predictions_test_perf = model_perf.predict(X_test_perf)

# Calculate total SHAP value for each prediction
shap_values_total_test_perf = predictions_test_perf - baseline_value_perf

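# Per-feature contributions as coefficient * raw feature value (a simplification: these sum to
# prediction - intercept; exact linear SHAP values would use coefficient * (x - training mean of x))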
shap_contributions_test_perf = X_test_perf * coefficients_perf

# 4. Compute Final Prediction for Each Record (Verification)
df_test_results_perf = X_test_perf.copy()
df_test_results_perf['Actual Score'] = y_test_perf
df_test_results_perf['Predicted Score'] = predictions_test_perf
df_test_results_perf['Total SHAP Value (Approx)'] = shap_values_total_test_perf

# Add approximate SHAP contributions for each feature
for feature in X_test_perf.columns:
    df_test_results_perf[f'SHAP Contribution ({feature} Approx)'] = shap_contributions_test_perf[feature]

# Verify prediction = Baseline + Total SHAP
df_test_results_perf['Baseline + Total SHAP'] = baseline_value_perf + df_test_results_perf['Total SHAP Value (Approx)']

print("Verification: Predicted Score vs Baseline + Total SHAP (for test set)")
display(df_test_results_perf[['Actual Score', 'Predicted Score', 'Baseline + Total SHAP']].head())
print("\n")

# 5. Interpret the Results
df_test_results_perf['Over/Underprediction'] = np.where(df_test_results_perf['Predicted Score'] > df_test_results_perf['Actual Score'], 'Overprediction', 'Underprediction')

print("Results Table (first 5 test records):")
display(df_test_results_perf.head())
print("\n")

print("Summary Analysis (for test set):")
print("-------------------")

# Accuracy of the model
mae_test_perf = np.mean(np.abs(predictions_test_perf - y_test_perf))
print(f"The Mean Absolute Error (MAE) of the model on the test set is approximately {mae_test_perf:.2f}, indicating that on average, the model's predictions are off by this many score units.")

print("\nSHAP Interpretation Insights (Approximate for first 5 test records):")
print("The 'Total SHAP Value (Approx)' column shows the approximate deviation of each prediction from the baseline score.")
print("The 'SHAP Contribution (Feature Approx)' columns provide a simplified view of how each feature approximately contributed to that deviation based on their coefficients.")
print("For a more rigorous SHAP analysis, a dedicated SHAP library (`shap`) would be used to account for feature interactions and the intercept term correctly.")


# Compare predicted vs actual score and comment on over/underprediction
print("\nComparison of Predicted vs Actual Score and Over/Underprediction (first 5 test records):")
for index, row in df_test_results_perf.head().iterrows():
    print(f"Record Index: {index}: Actual Score = {row['Actual Score']:.2f}, Predicted Score = {row['Predicted Score']:.2f}. This is an {row['Over/Underprediction']}.")
    print("Approximate SHAP Contributions for this record:")
    for feature in X_test_perf.columns:
        print(f"  {feature}: {row[f'SHAP Contribution ({feature} Approx)']:.2f}")
    print("-" * 20)

print("\nPossible reasons for over/underprediction could include factors not included in the model, non-linear relationships between variables, or random noise in the data.")

Multiple Linear Regression Coefficients:
Coefficient for studytime: 61.52
Coefficient for parental_education: 98.47
Coefficient for absences: 61.10
Coefficient for failures: 55.54
Coefficient for health: 35.97
Intercept: 0.55

Baseline Value (Mean of Training Final Exam Scores): 10.32

Verification: Predicted Score vs Baseline + Total SHAP (for test set)


index,Actual Score,Predicted Score,Baseline + Total SHAP
83,163.122207,177.568564,177.568564
53,-298.232215,-292.406774,-292.406774
70,114.031569,111.540003,111.540003
45,180.251948,178.393669,178.393669
44,409.19936,400.616364,400.616364




Results Table (first 5 test records):


index,studytime,parental_education,absences,failures,health,Actual Score,Predicted Score,Total SHAP Value (Approx),SHAP Contribution (studytime Approx),SHAP Contribution (parental_education Approx),SHAP Contribution (absences Approx),SHAP Contribution (failures Approx),SHAP Contribution (health Approx),Baseline + Total SHAP,Over/Underprediction
83,-0.315269,0.651391,0.570891,1.135566,0.954002,163.122207,177.568564,167.24935,-19.396023,64.144149,34.881239,63.07102,34.319367,177.568564,Overprediction
53,-0.252568,-1.024388,-0.92693,-0.059525,-3.241267,-298.232215,-292.406774,-302.725988,-15.53852,-100.87405,-56.635173,-3.306128,-116.601716,-292.406774,Overprediction
70,-0.112328,0.21398,1.964725,0.035264,-0.699726,114.031569,111.540003,101.220788,-6.910656,21.071145,120.044115,1.958591,-25.172004,111.540003,Underprediction
45,1.17944,1.831459,-0.898415,0.491919,-1.320233,180.251948,178.393669,168.074454,72.56162,180.348391,-54.892866,27.321929,-47.494218,178.393669,Underprediction
44,1.119575,3.078881,-0.249036,0.576557,0.31125,409.19936,400.616364,390.297149,68.878587,303.185204,-15.216027,32.022839,11.196948,400.616364,Underprediction




Summary Analysis (for test set):
-------------------
The Mean Absolute Error (MAE) of the model on the test set is approximately 8.44, indicating that on average, the model's predictions are off by this many score units.

SHAP Interpretation Insights (Approximate for first 5 test records):
The 'Total SHAP Value (Approx)' column shows the approximate deviation of each prediction from the baseline score.
The 'SHAP Contribution (Feature Approx)' columns provide a simplified view of how each feature approximately contributed to that deviation based on their coefficients.
For a more rigorous SHAP analysis, a dedicated SHAP library (`shap`) would be used to account for feature interactions and the intercept term correctly.

Comparison of Predicted vs Actual Score and Over/Underprediction (first 5 test records):
Record Index: 83: Actual Score = 163.12, Predicted Score = 177.57. This is an Overprediction.
Approximate SHAP Contributions for this record:
  studytime: -19.40
  parental_educatio
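Because the student data come from `make_regression`, the true generating coefficients are known: with `coef=True` the function also returns them. This sketch compares them with fitted coefficients; as a simplification it fits on all 100 rows rather than the training split.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

feature_names = ['studytime', 'parental_education', 'absences', 'failures', 'health']

# Same call as above, but also ask for the ground-truth coefficients
X_gen, y_gen, true_coef = make_regression(n_samples=100, n_features=5, random_state=42,
                                          noise=10, coef=True)

fitted = LinearRegression().fit(X_gen, y_gen)

for name, true_c, fit_c in zip(feature_names, true_coef, fitted.coef_):
    print(f"{name:>20}: true {true_c:7.2f}  fitted {fit_c:7.2f}")
print(f"intercept: true 0.00  fitted {fitted.intercept_:.2f}")  # make_regression's default bias is 0.0
```

The true coefficients are 100 times a uniform draw on [0, 1), so they are non-negative by construction; the positive signs for 'absences' and 'failures' reflect how the data were generated, not anything about students.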