Linear regression

The code is designed to train a linear regression model to predict the total quantity of lessons taken by past swim club members. The goal is to progressively remove data on the classes taken and evaluate how well the model can still predict the total lessons taken. The model is trained on past members, and the approach is based on class progression (i.e., sequence and quantity of classes taken).

Key Steps:
Data Preparation:

The dataset is first split to focus on leavers (past members) who are no longer with the swim club. These leavers provide the historical data needed for training.
A list of classes (PreSchool 1 to INTERMEDIATE) is used, representing the sequence of lessons available in the swim club.
Feature Engineering:

The Variance to Median (VTM) is calculated for each class column: the number of lessons a member took in that class minus the median for that class, where the median is computed only over members who took at least one lesson (zeros excluded). Members with zero lessons still receive a VTM value (the negative of the median). This centers the lesson counts so that progression is comparable across members.
Training on Reduced Class Data:

The model is trained iteratively, each time using fewer class columns. Starting with all 14 classes, the three most advanced remaining classes are dropped in each iteration (14, 11, 8, 5, then 2 classes) to see how well the model can still predict the total lessons taken when data is progressively withheld.
The target variable is the Total Quantity of Lessons, and each iteration uses less and less class progression information to predict this target.
Training and Testing:

For each iteration, the code splits the data into training and testing sets (70% training, 30% testing).
The features are scaled using a StandardScaler to normalize the data.
A linear regression model is trained using the scaled data, and predictions are made on the test set.
The model’s performance is evaluated using Mean Squared Error (MSE) and R-squared (a measure of how well the model fits the data).
Performance Evaluation:

The results are printed for each iteration, showing how the model’s performance changes as more class data is removed.
Additionally, the actual and predicted total lessons are compared, showing how close the model's predictions are to the true values.
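
The VTM step described above can be illustrated on a toy column. This is a minimal sketch using only the standard library rather than pandas, and the lesson counts are made-up values, not taken from the dataset. It mirrors the notebook's logic: the median ignores zeros, but every member (including those with zero lessons) gets a VTM value.

```python
from statistics import median

# Hypothetical lesson counts for one class column (0 = never took the class).
lessons = [0, 4, 10, 6, 0]

# Median over members who actually took the class (zeros excluded).
nonzero_median = median(x for x in lessons if x > 0)

# VTM: each member's count minus that median; zero rows are kept.
vtm = [x - nonzero_median for x in lessons]

print(nonzero_median)  # 6
print(vtm)             # [-6, -2, 4, 0, -6]
```

Because the median is a constant per column, VTM shifts each column without changing its spread, so after StandardScaler the scaled features are the same as if the raw counts had been used.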

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: Load your dataset
df = pd.read_excel('swimclass_rawdata.xlsx')

In [12]:
# Step 2: Prepare the data for leavers (ignore current members for now)
leavers_df = df[df['Current_member'] == 0].copy()  # Use .copy() to avoid SettingWithCopyWarning

# Step 3: Define the class columns
lesson_columns = ['PreSchool 1', 'PreSchool 2', 'PreSchool 3', 'PreSchool 4', 'PreSchool 5',
                  'PreSchool 6', 'Academy 1', 'Academy 2', 'Academy 3', 'Academy 4', 
                  'Academy 5', 'Academy 6', 'BEGINNERS', 'INTERMEDIATE']

# Step 4: Calculate Variance to Median (VTM) for each class column
for col in lesson_columns:
    median_value = leavers_df.loc[leavers_df[col] > 0, col].median()  # Exclude zero values and use .loc[]
    leavers_df.loc[:, f'{col}_VTM'] = leavers_df[col] - median_value  # Explicit .loc[] usage to avoid the warning

# Step 5: Train on progressively reduced class data
for classes_used in range(len(lesson_columns), 1, -3):  # Remove class data progressively, 3 at a time
    selected_classes = lesson_columns[:classes_used]
    features = [f'{col}_VTM' for col in selected_classes]  # VTM features only; the target stays out of X
    
    # Step 6: Define the features and the target (total lessons we want to predict)
    X_leavers = leavers_df[features]
    y_leavers = leavers_df['TOTAL QUANTITY OF LESSONS']  # Total lessons as target
    
    # Step 7: Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)
    
    # Step 8: Scale the data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Step 9: Train the linear regression model
    linear_reg = LinearRegression()
    linear_reg.fit(X_train_scaled, y_train)
    
    # Step 10: Make predictions and evaluate the model
    y_pred = linear_reg.predict(X_test_scaled)
    
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    # Print results for each iteration
    print(f"Classes used: {classes_used} | Mean Squared Error: {mse:.2f} | R-squared: {r2:.2f}")
    
    # Optional: Compare actual vs predicted total lessons
    results_df = pd.DataFrame({'Actual_TOTAL_LESSONS': y_test, 'Predicted_TOTAL_LESSONS': y_pred})
    results_df['DIFFERENCE'] = results_df['Predicted_TOTAL_LESSONS'] - results_df['Actual_TOTAL_LESSONS']
    print(results_df.head())


Classes used: 14 | Mean Squared Error: 0.71 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                38.997279   -0.002721
30546                   105               104.934708   -0.065292
20806                    35                34.975545   -0.024455
39438                    30                30.002425    0.002425
18229                   134               134.005573    0.005573
Classes used: 11 | Mean Squared Error: 6.84 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.014868    0.014868
30546                   105               104.760644   -0.239356
20806                    35                34.958702   -0.041298
39438                    30                30.039038    0.039038
18229                   134               133.944120   -0.055880
Classes used: 8 | Mean Squared Error: 251.91 | R-squared: 0.91
       Actual_TOTAL_LESSONS  Pred

Logistic Regression

Summary of the Code
Load the Dataset: The dataset contains information about members of the swim club, including the number of lessons they’ve taken across different classes.
Prepare the Data: The data for leavers is extracted. The lesson columns (e.g., PreSchool 1, PreSchool 2, etc.) are selected, and the "variance to median" (VTM) for each class is calculated. The VTM represents how much a member deviates from the median number of lessons taken for each class.
Binary Target Variable: The total number of lessons (TOTAL QUANTITY OF LESSONS) is converted into a binary target, with members categorized as having taken more than 50 lessons (1) or 50 or fewer lessons (0).
Train Logistic Regression: The model is trained on progressively fewer classes, removing three classes at a time. For each iteration:
The logistic regression model is trained on the VTM columns of the selected classes only; TOTAL QUANTITY OF LESSONS is used solely to derive the binary target.
The dataset is split into training and testing sets.
The model is fitted, and predictions are made.
Performance metrics like accuracy, classification report, and confusion matrix are printed.
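
To make the binary target and the confusion matrix layout concrete, here is a small standard-library sketch. The helper names `to_binary` and `confusion_counts` are hypothetical, the totals reuse five sample values from the printed results plus one made-up edge case (exactly 50), and the predictions are invented to show a single false positive.

```python
THRESHOLD = 50  # same cut-off as the notebook: more than 50 lessons -> 1

def to_binary(total_lessons):
    """Return 1 if the member took more than THRESHOLD lessons, else 0."""
    return 1 if total_lessons > THRESHOLD else 0

def confusion_counts(actual, predicted):
    """Return [[TN, FP], [FN, TP]]: rows are actual, columns are predicted."""
    counts = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

totals = [39, 105, 35, 30, 134, 50]      # 50 is a made-up edge case
actual = [to_binary(t) for t in totals]  # exactly 50 maps to class 0
predicted = [0, 1, 0, 1, 1, 0]           # hypothetical model output

matrix = confusion_counts(actual, predicted)
accuracy = (matrix[0][0] + matrix[1][1]) / len(actual)

print(actual)    # [0, 1, 0, 0, 1, 0]
print(matrix)    # [[3, 1], [0, 2]]
print(accuracy)  # 5 correct out of 6
```

The row/column layout matches what `confusion_matrix` from scikit-learn prints, so the off-diagonal entries are the misclassified members.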

In [None]:
# Optional if already loaded
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 1: Load your dataset
df = pd.read_excel('swimclass_rawdata.xlsx')

In [11]:
# Step 2: Prepare the data for leavers (ignore current members for now)
leavers_df = df[df['Current_member'] == 0].copy()

# Step 3: Define the class columns
lesson_columns = ['PreSchool 1', 'PreSchool 2', 'PreSchool 3', 'PreSchool 4', 'PreSchool 5',
                  'PreSchool 6', 'Academy 1', 'Academy 2', 'Academy 3', 'Academy 4', 
                  'Academy 5', 'Academy 6', 'BEGINNERS', 'INTERMEDIATE']

# Step 4: Calculate Variance to Median (VTM) for each class column
for col in lesson_columns:
    median_value = leavers_df.loc[leavers_df[col] > 0, col].median()  # Exclude zero values
    leavers_df.loc[:, f'{col}_VTM'] = leavers_df[col] - median_value  # Ensure .loc is used properly

# Step 5: Create a binary target (whether TOTAL_QUANTITY_OF_LESSONS > 50)
leavers_df.loc[:, 'LESSONS_BINARY'] = (leavers_df['TOTAL QUANTITY OF LESSONS'] > 50).astype(int)

# Step 6: Train on progressively reduced class data
for classes_used in range(len(lesson_columns), 1, -3):  # Remove class data progressively, 3 at a time
    selected_classes = lesson_columns[:classes_used]
    features = [f'{col}_VTM' for col in selected_classes]  # Exclude TOTAL_QUANTITY_OF_LESSONS from features
    
    # Step 7: Define the target (Binary: Did they take more than 50 lessons?)
    X_leavers = leavers_df[features].copy()
    y_leavers = leavers_df['LESSONS_BINARY'].copy()
    
    # Step 8: Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)
    
    # Step 9: Scale the data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Step 10: Train the logistic regression model
    log_reg = LogisticRegression()
    log_reg.fit(X_train_scaled, y_train)
    
    # Step 11: Make predictions and evaluate the model
    y_pred = log_reg.predict(X_test_scaled)
    
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Classes used: {classes_used} | Accuracy: {accuracy:.2f}")
    
    # Print detailed classification metrics
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    
    # Print confusion matrix
    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))
    
    # Optional: Compare actual vs predicted binary lessons
    results_df = pd.DataFrame({'Actual_BINARY_LESSONS': y_test, 'Predicted_BINARY_LESSONS': y_pred})
    results_df['DIFFERENCE'] = results_df['Predicted_BINARY_LESSONS'] - results_df['Actual_BINARY_LESSONS']
    print(results_df.head())


Classes used: 14 | Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      8336
           1       1.00      1.00      1.00      4591

    accuracy                           1.00     12927
   macro avg       1.00      1.00      1.00     12927
weighted avg       1.00      1.00      1.00     12927

Confusion Matrix:
[[8336    0]
 [   0 4591]]
       Actual_BINARY_LESSONS  Predicted_BINARY_LESSONS  DIFFERENCE
20075                      0                         0           0
30546                      1                         1           0
20806                      0                         0           0
39438                      0                         0           0
18229                      1                         1           0
Classes used: 11 | Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      8336
   

Random Forest

The goal of the code is to predict the Total Quantity of Lessons taken by swim club members who have left the club, based on the sequence of classes they took. The code progressively removes class data to test how the model's prediction accuracy changes with less available information.

Steps:
Loading the Dataset: The dataset is loaded from an Excel file, focusing on members who have left (Current_member == 0).

Defining Class Columns: A list of swim class levels (e.g., 'PreSchool 1', 'Academy 1', etc.) is created. These columns represent different swim class levels that members have taken.

Calculating Variance to Median (VTM): For each class, the median number of lessons is calculated over the members who took at least one lesson in that class (zeros are excluded). Then, the Variance to Median (VTM) is computed for each member as their lesson count minus that median. This provides a median-centered feature to work with for modeling.

Training the Model with Progressively Reduced Class Data: The core of the code tests how well the Random Forest model performs when progressively fewer class data points are available:

The classes are reduced by 3 in each iteration (14, 11, 8, 5, then 2 class columns), always dropping the most advanced remaining classes.
In each iteration, the remaining classes' VTM features are used to train the model.
The target variable is Total Quantity of Lessons, which represents the total lessons a member took before leaving.
Modeling with Random Forest:

The model uses Random Forest regression to predict the total number of lessons based on the available class progression data.
The training and testing sets are created using train_test_split from scikit-learn.
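The features are scaled using StandardScaler to keep the pipeline consistent with the other models; tree-based models such as Random Forest are largely insensitive to feature scaling.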
The model is then trained, and predictions are made on the test set.
Evaluating Model Performance: After each iteration, the model's performance is evaluated using Mean Squared Error (MSE) and R-squared (R²) metrics:

MSE measures the average squared difference between the predicted and actual total lessons.
R² indicates how well the model fits the data, with 1.00 being a perfect fit.
Results for each iteration (with progressively fewer classes) are printed out, including the MSE, R², and the difference between actual and predicted lessons.
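
As a worked example of the two metrics, the sketch below computes MSE and R² by hand for the five sample rows printed in the 14-class Random Forest output. It uses only the standard library, and the function names are illustrative. The numbers differ from the reported MSE of 39.16 because that figure covers the whole test set, not just these five rows.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Five sample rows from the 14-class Random Forest output.
actual = [39, 105, 35, 30, 134]
predicted = [39.00, 108.96, 34.98, 30.00, 130.38]

mse = mean_squared_error_manual(actual, predicted)
r2 = r2_score_manual(actual, predicted)

print(f"MSE: {mse:.4f}")  # MSE: 5.7573
print(f"R2:  {r2:.4f}")   # R2:  0.9968
```

MSE is in squared lessons, so an MSE of about 5.76 corresponds to a typical error of roughly 2.4 lessons on these rows.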

In [14]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
# Optional step: run only if the dataset is not already loaded


# Step 1: Load your dataset
df = pd.read_excel('swimclass_rawdata.xlsx')

In [15]:
# Step 2: Prepare the data for leavers (ignore current members for now)
leavers_df = df[df['Current_member'] == 0].copy()  # Use .copy() to avoid SettingWithCopyWarning

# Step 3: Define the class columns
lesson_columns = ['PreSchool 1', 'PreSchool 2', 'PreSchool 3', 'PreSchool 4', 'PreSchool 5',
                  'PreSchool 6', 'Academy 1', 'Academy 2', 'Academy 3', 'Academy 4', 
                  'Academy 5', 'Academy 6', 'BEGINNERS', 'INTERMEDIATE']

# Step 4: Calculate Variance to Median (VTM) for each class column
for col in lesson_columns:
    median_value = leavers_df.loc[leavers_df[col] > 0, col].median()  # Exclude zero values and use .loc[]
    leavers_df.loc[:, f'{col}_VTM'] = leavers_df[col] - median_value  # Explicit .loc[] usage to avoid the warning

# Step 5: Train on progressively reduced class data
for classes_used in range(len(lesson_columns), 1, -3):  # Remove class data progressively, 3 at a time
    selected_classes = lesson_columns[:classes_used]
    features = [f'{col}_VTM' for col in selected_classes]  # VTM features only; the target stays out of X
    
    # Step 6: Define the features and the target (total lessons we want to predict)
    X_leavers = leavers_df[features]
    y_leavers = leavers_df['TOTAL QUANTITY OF LESSONS']  # Total lessons as target
    
    # Step 7: Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)
    
    # Step 8: Scale the data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Step 9: Train the Random Forest model
    rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
    rf_reg.fit(X_train_scaled, y_train)
    
    # Step 10: Make predictions and evaluate the model
    y_pred = rf_reg.predict(X_test_scaled)
    
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    # Print results for each iteration
    print(f"Classes used: {classes_used} | Mean Squared Error: {mse:.2f} | R-squared: {r2:.2f}")
    
    # Compare actual vs predicted total lessons
    results_df = pd.DataFrame({'Actual_TOTAL_LESSONS': y_test, 'Predicted_TOTAL_LESSONS': y_pred})
    results_df['DIFFERENCE'] = results_df['Predicted_TOTAL_LESSONS'] - results_df['Actual_TOTAL_LESSONS']
    print(results_df.head())


Classes used: 14 | Mean Squared Error: 39.16 | R-squared: 0.99
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                    39.00        0.00
30546                   105                   108.96        3.96
20806                    35                    34.98       -0.02
39438                    30                    30.00        0.00
18229                   134                   130.38       -3.62
Classes used: 11 | Mean Squared Error: 40.97 | R-squared: 0.98
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                    39.00        0.00
30546                   105                   108.93        3.93
20806                    35                    34.98       -0.02
39438                    30                    30.00        0.00
18229                   134                   130.20       -3.80
Classes used: 8 | Mean Squared Error: 239.60 | R-squared: 0.91
       Actual_TOTAL_LESSONS  Pr

XGBoost

This code builds an XGBoost regression model to predict the Total Quantity of Lessons taken by leavers from a swim club based on the classes they have taken. It progressively reduces the number of classes used for training to observe how the model performs with fewer features. 

Dataset Preparation:

The dataset is loaded, and leavers (students who have left the club) are selected.
The classes taken by leavers are listed, and a new feature (_VTM or Variance to Median) is created for each class. This represents how much a student's class count differs from the median value for that class.
Training and Prediction Process:

The model is trained using a progressively reduced number of class features (starting with 14 classes and reducing by 3 at each step).
The target variable is the TOTAL QUANTITY OF LESSONS, which the model aims to predict.
For each iteration:
The data is split into training and testing sets.
The features are scaled using StandardScaler.
An XGBoost regressor is trained using the current set of features.
Predictions are made on the test set.
Mean Squared Error (MSE) and R-squared scores are computed to evaluate the model's performance.
The actual and predicted lesson totals are compared, and the differences are printed.
Performance Monitoring:

As the number of features (classes) is reduced, the model's performance is evaluated to see how well it can predict the total number of lessons based on progressively fewer data points.
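
The reduction schedule used in every model section can be checked in isolation. This short sketch reproduces the `range(len(lesson_columns), 1, -3)` loop from the notebook and shows which class is the last one kept at each step; the variable names `schedule` and `last_kept` are illustrative.

```python
lesson_columns = ['PreSchool 1', 'PreSchool 2', 'PreSchool 3', 'PreSchool 4', 'PreSchool 5',
                  'PreSchool 6', 'Academy 1', 'Academy 2', 'Academy 3', 'Academy 4',
                  'Academy 5', 'Academy 6', 'BEGINNERS', 'INTERMEDIATE']

# Same loop bounds as the notebook: start at 14, stop before 1, step -3.
schedule = list(range(len(lesson_columns), 1, -3))

# Slicing from the front keeps the earliest classes and drops the latest ones.
last_kept = {n: lesson_columns[:n][-1] for n in schedule}

print(schedule)  # [14, 11, 8, 5, 2]
for n, name in last_kept.items():
    print(f"{n:>2} classes -> last class kept: {name}")
```

With 8 classes the model sees nothing beyond 'Academy 2', which lines up with the point where the printed metrics start to fall away from the 14-class and 11-class results.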

In [17]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb

In [None]:
# Optional step: run only if the dataset is not already loaded


# Step 1: Load your dataset
df = pd.read_excel('swimclass_rawdata.xlsx')

In [20]:
# Step 2: Prepare the data for leavers (ignore current members for now)
leavers_df = df[df['Current_member'] == 0].copy()  # Use .copy() to avoid SettingWithCopyWarning

# Step 3: Define the class columns
lesson_columns = ['PreSchool 1', 'PreSchool 2', 'PreSchool 3', 'PreSchool 4', 'PreSchool 5',
                  'PreSchool 6', 'Academy 1', 'Academy 2', 'Academy 3', 'Academy 4', 
                  'Academy 5', 'Academy 6', 'BEGINNERS', 'INTERMEDIATE']

# Step 4: Calculate Variance to Median (VTM) for each class column
for col in lesson_columns:
    median_value = leavers_df.loc[leavers_df[col] > 0, col].median()  # Exclude zero values
    leavers_df.loc[:, f'{col}_VTM'] = leavers_df[col] - median_value

# Step 5: Train on progressively reduced class data
for classes_used in range(len(lesson_columns), 1, -3):  # Remove class data progressively, 3 at a time
    selected_classes = lesson_columns[:classes_used]
    features = [f'{col}_VTM' for col in selected_classes]  # VTM features only; the target stays out of X
    
    # Step 6: Define the features and the target (total lessons we want to predict)
    X_leavers = leavers_df[features]
    y_leavers = leavers_df['TOTAL QUANTITY OF LESSONS']  # Total lessons as target
    
    # Step 7: Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)
    
    # Step 8: Scale the data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Step 9: Train the XGBoost regression model
    xgb_reg = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=5, random_state=42)
    xgb_reg.fit(X_train_scaled, y_train)
    
    # Step 10: Make predictions and evaluate the model
    y_pred = xgb_reg.predict(X_test_scaled)
    
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    # Print results for each iteration
    print(f"Classes used: {classes_used} | Mean Squared Error: {mse:.2f} | R-squared: {r2:.2f}")
    
    # Optional: Compare actual vs predicted total lessons
    results_df = pd.DataFrame({'Actual_TOTAL_LESSONS': y_test, 'Predicted_TOTAL_LESSONS': y_pred})
    results_df['DIFFERENCE'] = results_df['Predicted_TOTAL_LESSONS'] - results_df['Actual_TOTAL_LESSONS']
    print(results_df.head())


Classes used: 14 | Mean Squared Error: 23.69 | R-squared: 0.99
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.050774    0.050774
30546                   105               121.588165   16.588165
20806                    35                32.085907   -2.914093
39438                    30                31.703869    1.703869
18229                   134               137.799164    3.799164
Classes used: 11 | Mean Squared Error: 26.64 | R-squared: 0.99
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.783947    0.783947
30546                   105               120.549370   15.549370
20806                    35                32.642548   -2.357452
39438                    30                31.172129    1.172129
18229                   134               136.474884    2.474884
Classes used: 8 | Mean Squared Error: 224.35 | R-squared: 0.92
       Actual_TOTAL_LESSONS  Pr

ANN approach

ANN Model Setup:

A Sequential model is created with:
Input Layer: Accepts the scaled feature set.
3 Hidden Layers: With 128, 64, and 32 neurons respectively, each using the ReLU activation function.
Output Layer: Predicts the total number of lessons.
The model uses the Adam optimizer and minimizes the mean_squared_error loss function.
Model Training:

The model is trained for 100 epochs with a batch size of 32, holding out 20% of the training data for validation (validation_split=0.2). With verbose=2, Keras prints one summary line per epoch.
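
To give a sense of the network's size, the sketch below counts the trainable parameters of the dense stack described above. It is plain arithmetic and does not need TensorFlow. The input width of 14 is an assumption that corresponds to the first iteration (all 14 VTM features), and `dense_params` is a hypothetical helper name.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

# Input width (assumed 14 for the first iteration), three hidden layers, one output.
layer_sizes = [14, 128, 64, 32, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [1920, 8256, 2080, 33]
print(total)      # 12289
```

Only the first layer changes as classes are removed: with 2 input features it shrinks to 2 * 128 + 128 = 384 parameters, while the rest of the network stays the same.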


In [22]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

In [None]:
# Optional step: run only if the dataset is not already loaded


# Step 1: Load your dataset
df = pd.read_excel('swimclass_rawdata.xlsx')

In [28]:
# Step 2: Prepare the data for leavers (ignore current members for now)
leavers_df = df[df['Current_member'] == 0].copy()  # Use .copy() to avoid SettingWithCopyWarning

# Step 3: Define the class columns
lesson_columns = ['PreSchool 1', 'PreSchool 2', 'PreSchool 3', 'PreSchool 4', 'PreSchool 5',
                  'PreSchool 6', 'Academy 1', 'Academy 2', 'Academy 3', 'Academy 4', 
                  'Academy 5', 'Academy 6', 'BEGINNERS', 'INTERMEDIATE']

# Step 4: Calculate Variance to Median (VTM) for each class column
for col in lesson_columns:
    median_value = leavers_df.loc[leavers_df[col] > 0, col].median()  # Exclude zero values
    leavers_df.loc[:, f'{col}_VTM'] = leavers_df[col] - median_value  # Proper use of .loc to avoid warnings

# Step 5: Train on progressively reduced class data
for classes_used in range(len(lesson_columns), 1, -3):  # Remove class data progressively, 3 at a time
    selected_classes = lesson_columns[:classes_used]
    features = [f'{col}_VTM' for col in selected_classes]  # VTM features only; the target stays out of X
    
    # Step 6: Define the features and the target (total lessons we want to predict)
    X_leavers = leavers_df[features]
    y_leavers = leavers_df['TOTAL QUANTITY OF LESSONS']  # Total lessons as target
    
    # Step 7: Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)
    
    # Step 8: Scale the data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Step 9: Build and train the ANN model
    model = Sequential()
    model.add(Input(shape=(X_train_scaled.shape[1],)))  # Use Input(shape) for the first layer
    model.add(Dense(128, activation='relu'))  # 1st hidden layer
    model.add(Dense(64, activation='relu'))   # 2nd hidden layer
    model.add(Dense(32, activation='relu'))   # 3rd hidden layer
    model.add(Dense(1))  # Output layer for regression (single target)

    model.compile(optimizer='adam', loss='mean_squared_error')
    
    # Train the model; verbose=2 prints one summary line per epoch
    model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, verbose=2, validation_split=0.2)
    
    # Step 10: Make predictions and evaluate the model
    y_pred = model.predict(X_test_scaled)
    
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    # Print results for each iteration
    print(f"Classes used: {classes_used} | Mean Squared Error: {mse:.2f} | R-squared: {r2:.2f}")
    
    # Optional: Compare actual vs predicted total lessons
    results_df = pd.DataFrame({'Actual_TOTAL_LESSONS': y_test, 'Predicted_TOTAL_LESSONS': y_pred.flatten()})
    results_df['DIFFERENCE'] = results_df['Predicted_TOTAL_LESSONS'] - results_df['Actual_TOTAL_LESSONS']
    print(results_df.head())


Epoch 1/100
755/755 - 4s - 6ms/step - loss: 412.2234 - val_loss: 14.7607
Epoch 2/100
755/755 - 2s - 3ms/step - loss: 9.0928 - val_loss: 4.6396
Epoch 3/100
755/755 - 3s - 4ms/step - loss: 3.5924 - val_loss: 6.9199
Epoch 4/100
755/755 - 1s - 2ms/step - loss: 3.0069 - val_loss: 2.0406
Epoch 5/100
755/755 - 1s - 2ms/step - loss: 2.4038 - val_loss: 2.2554
Epoch 6/100
755/755 - 1s - 2ms/step - loss: 2.0165 - val_loss: 2.6981
Epoch 7/100
755/755 - 1s - 2ms/step - loss: 1.9333 - val_loss: 2.1351
Epoch 8/100
755/755 - 1s - 2ms/step - loss: 1.3222 - val_loss: 1.0432
Epoch 9/100
755/755 - 1s - 2ms/step - loss: 2.2088 - val_loss: 0.9423
Epoch 10/100
755/755 - 1s - 2ms/step - loss: 1.6912 - val_loss: 1.4644
Epoch 11/100
755/755 - 2s - 3ms/step - loss: 1.4298 - val_loss: 1.2481
Epoch 12/100
755/755 - 2s - 3ms/step - loss: 1.7476 - val_loss: 1.9115
Epoch 13/100
755/755 - 1s - 2ms/step - loss: 1.5259 - val_loss: 25.8289
Epoch 14/100
755/755 - 3s - 3ms/step - loss: 3.1470 - val_loss: 0.9379
Epoch 15/10

Epoch 9/100
755/755 - 1s - 2ms/step - loss: 10.6828 - val_loss: 10.3282
Epoch 10/100
755/755 - 1s - 2ms/step - loss: 10.7753 - val_loss: 10.3512
Epoch 11/100
755/755 - 1s - 2ms/step - loss: 10.1942 - val_loss: 12.0977
Epoch 12/100
755/755 - 1s - 2ms/step - loss: 10.2167 - val_loss: 12.5963
Epoch 13/100
755/755 - 3s - 3ms/step - loss: 10.8805 - val_loss: 10.0427
Epoch 14/100
755/755 - 2s - 2ms/step - loss: 9.9687 - val_loss: 9.8668
Epoch 15/100
755/755 - 2s - 2ms/step - loss: 9.7150 - val_loss: 9.8809
Epoch 16/100
755/755 - 1s - 2ms/step - loss: 10.3278 - val_loss: 9.0032
Epoch 17/100
755/755 - 1s - 2ms/step - loss: 9.9755 - val_loss: 8.8046
Epoch 18/100
755/755 - 1s - 2ms/step - loss: 9.7714 - val_loss: 18.5902
Epoch 19/100
755/755 - 1s - 2ms/step - loss: 10.2763 - val_loss: 11.2492
Epoch 20/100
755/755 - 1s - 2ms/step - loss: 9.5668 - val_loss: 8.2163
Epoch 21/100
755/755 - 1s - 2ms/step - loss: 9.3991 - val_loss: 9.7532
Epoch 22/100
755/755 - 1s - 2ms/step - loss: 9.3130 - val_loss: 

Epoch 16/100
755/755 - 2s - 2ms/step - loss: 251.7899 - val_loss: 234.1118
Epoch 17/100
755/755 - 2s - 3ms/step - loss: 253.0278 - val_loss: 233.9031
Epoch 18/100
755/755 - 1s - 2ms/step - loss: 248.9523 - val_loss: 241.4491
Epoch 19/100
755/755 - 1s - 2ms/step - loss: 252.1931 - val_loss: 233.5018
Epoch 20/100
755/755 - 1s - 2ms/step - loss: 250.8020 - val_loss: 235.7004
Epoch 21/100
755/755 - 1s - 2ms/step - loss: 248.8542 - val_loss: 230.9794
Epoch 22/100
755/755 - 1s - 2ms/step - loss: 248.5922 - val_loss: 227.2683
Epoch 23/100
755/755 - 1s - 2ms/step - loss: 247.1836 - val_loss: 239.4077
Epoch 24/100
755/755 - 1s - 2ms/step - loss: 247.7066 - val_loss: 224.1320
Epoch 25/100
755/755 - 1s - 2ms/step - loss: 247.2053 - val_loss: 226.1307
Epoch 26/100
755/755 - 1s - 2ms/step - loss: 248.7511 - val_loss: 232.0926
Epoch 27/100
755/755 - 1s - 2ms/step - loss: 245.1638 - val_loss: 229.2112
Epoch 28/100
755/755 - 1s - 2ms/step - loss: 244.7131 - val_loss: 231.1870
Epoch 29/100
755/755 - 2s

Epoch 19/100
755/755 - 1s - 2ms/step - loss: 773.8607 - val_loss: 761.8495
Epoch 20/100
755/755 - 1s - 2ms/step - loss: 775.5841 - val_loss: 760.3206
Epoch 21/100
755/755 - 1s - 2ms/step - loss: 773.5093 - val_loss: 752.1154
Epoch 22/100
755/755 - 1s - 2ms/step - loss: 773.7682 - val_loss: 755.7438
Epoch 23/100
755/755 - 1s - 2ms/step - loss: 772.5822 - val_loss: 741.0767
Epoch 24/100
755/755 - 1s - 2ms/step - loss: 772.0166 - val_loss: 758.3414
Epoch 25/100
755/755 - 2s - 3ms/step - loss: 771.9504 - val_loss: 740.2029
Epoch 26/100
755/755 - 1s - 2ms/step - loss: 769.0836 - val_loss: 744.7178
Epoch 27/100
755/755 - 1s - 2ms/step - loss: 771.5261 - val_loss: 744.9844
Epoch 28/100
755/755 - 2s - 2ms/step - loss: 771.3478 - val_loss: 738.9144
Epoch 29/100
755/755 - 1s - 2ms/step - loss: 769.9266 - val_loss: 739.8315
Epoch 30/100
755/755 - 1s - 2ms/step - loss: 769.8257 - val_loss: 748.9245
Epoch 31/100
755/755 - 2s - 2ms/step - loss: 767.6950 - val_loss: 759.1513
Epoch 32/100
755/755 - 1s

Epoch 21/100
755/755 - 1s - 2ms/step - loss: 2307.9954 - val_loss: 2269.8230
Epoch 22/100
755/755 - 1s - 2ms/step - loss: 2310.1643 - val_loss: 2265.6365
Epoch 23/100
755/755 - 1s - 2ms/step - loss: 2308.0464 - val_loss: 2274.1318
Epoch 24/100
755/755 - 1s - 2ms/step - loss: 2307.1418 - val_loss: 2265.8433
Epoch 25/100
755/755 - 1s - 2ms/step - loss: 2306.5825 - val_loss: 2271.5562
Epoch 26/100
755/755 - 1s - 2ms/step - loss: 2305.6079 - val_loss: 2267.4973
Epoch 27/100
755/755 - 1s - 2ms/step - loss: 2309.3877 - val_loss: 2265.2998
Epoch 28/100
755/755 - 1s - 2ms/step - loss: 2305.8220 - val_loss: 2267.6597
Epoch 29/100
755/755 - 1s - 2ms/step - loss: 2306.6101 - val_loss: 2271.0466
Epoch 30/100
755/755 - 1s - 2ms/step - loss: 2304.9790 - val_loss: 2266.2671
Epoch 31/100
755/755 - 1s - 2ms/step - loss: 2306.8665 - val_loss: 2268.6763
Epoch 32/100
755/755 - 1s - 2ms/step - loss: 2306.8708 - val_loss: 2275.8232
Epoch 33/100
755/755 - 1s - 2ms/step - loss: 2305.4404 - val_loss: 2269.3584

Results summary from ANN

Classes used: 14 | Mean Squared Error: 0.88 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.270248    0.270248
30546                   105               105.376823    0.376823
20806                    35                35.337696    0.337696
39438                    30                30.255575    0.255575
18229                   134               134.399765    0.399765

Classes used: 11 | Mean Squared Error: 10.60 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.504517    0.504517
30546                   105               105.519890    0.519890
20806                    35                34.829407   -0.170593
39438                    30                30.320866    0.320866
18229                   134               135.448410    1.448410

Classes used: 8 | Mean Squared Error: 214.65 | R-squared: 0.92
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                36.863304   -2.136696
30546                   105               114.073128    9.073128
20806                    35                33.848129   -1.151871
39438                    30                28.835672   -1.164328
18229                   134               129.458054   -4.541946

Classes used: 5 | Mean Squared Error: 700.22 | R-squared: 0.74
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                48.857971    9.857971
30546                   105               118.195877   13.195877
20806                    35                56.994919   21.994919
39438                    30                37.618988    7.618988
18229                   134               140.749390    6.749390

Classes used: 2 | Mean Squared Error: 2245.45 | R-squared: 0.16
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                45.878181    6.878181
30546                   105                40.134293  -64.865707
20806                    35                45.878181   10.878181
39438                    30                45.878181   15.878181
18229                   134               100.031418  -33.968582
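
As a worked illustration of the two metrics reported above, the sketch below recomputes MSE and R-squared by hand for the five sample rows shown under "Classes used: 2". It uses only those five rows, so its values will not match the full test-set figures (2245.45 and 0.16). The helper names are illustrative and are not part of the notebook.

```python
# Five sample rows copied from the "Classes used: 2" table above.
actual = [39, 105, 35, 30, 134]
predicted = [45.878181, 40.134293, 45.878181, 45.878181, 100.031418]


def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between predicted and actual values."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


sample_mse = mean_squared_error_manual(actual, predicted)
sample_r2 = r_squared_manual(actual, predicted)
print(f"MSE on the 5 sample rows: {sample_mse:.2f}")
print(f"R-squared on the 5 sample rows: {sample_r2:.2f}")
```

A single large miss (row 30546, about 65 lessons too low) accounts for most of the squared error in this sample, which shows how strongly MSE penalises individual large errors.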

Hyperparameter tuning

In [43]:
!pip install optuna

Collecting optuna
  Downloading optuna-4.0.0-py3-none-any.whl.metadata (16 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.13.2-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.8.2-py3-none-any.whl.metadata (10 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.5-py3-none-any.whl.metadata (2.9 kB)
Downloading optuna-4.0.0-py3-none-any.whl (362 kB)
Downloading alembic-1.13.2-py3-none-any.whl (232 kB)
Downloading colorlog-6.8.2-py3-none-any.whl (11 kB)
Downloading Mako-1.3.5-py3-none-any.whl (78 kB)

The code below performs hyperparameter tuning for a neural network model using Optuna, a framework for hyperparameter optimization. Summary of the key steps and components:

Libraries:
Optuna: Used for optimizing hyperparameters.
TensorFlow/Keras: To define and train the neural network model.
Scikit-learn: For data splitting, scaling, and calculating performance metrics.
Steps Breakdown:
Data Preparation:

The features (X_leavers) and the target (y_leavers) are split into training and testing sets using train_test_split().
The features are scaled using StandardScaler to standardize the data for better neural network performance.
Objective Function:

An objective function is defined for Optuna to optimize. It contains:
Hyperparameters to be tuned:
optimizer: Chooses between 'adam' and 'rmsprop'.
neurons: Specifies the number of neurons in the hidden layer (between 32 and 128).
batch_size: Specifies the batch size for training (between 16 and 64).
epochs: Specifies the number of training epochs (between 10 and 100).
Inside the objective function:
A simple feed-forward neural network (ANN) is created using Keras:
One hidden layer with a variable number of neurons and a relu activation.
A single output layer for regression.
The model is compiled with a loss function of mean_squared_error and the selected optimizer.
The model is trained using the specified hyperparameters (epochs and batch_size).
The Mean Squared Error (MSE) is calculated on the test data and returned as the metric to be minimized by Optuna.
Optimization with Optuna:

A study is created using optuna.create_study() with the goal to minimize the MSE.
The optimisation runs for 20 trials, exploring different combinations of the hyperparameters defined in the objective function.
Best Hyperparameters:

Once the study completes, the best combination of hyperparameters (optimizer, neurons, batch_size, epochs) and the best MSE are printed.
Purpose:
The goal is to find the best hyperparameters for the neural network model (optimizer, number of neurons, batch size, and epochs) that minimize the MSE on the test data, ensuring the best possible performance for predicting the target variable.
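
The trial loop described above can be sketched without any third-party libraries. The example below is a simplified stand-in: it samples uniformly at random from the same ranges, whereas Optuna's default sampler is more sophisticated, and `toy_objective` is a hypothetical placeholder for "train the network and return the test MSE". It only shows the shape of the process: sample a configuration, score it, keep the best.

```python
import random


def sample_params(rng):
    """Draw one configuration from the same ranges as the search space above."""
    return {
        "optimizer": rng.choice(["adam", "rmsprop"]),
        "neurons": rng.randint(32, 128),    # both bounds inclusive
        "batch_size": rng.randint(16, 64),
        "epochs": rng.randint(10, 100),
    }


def toy_objective(params):
    """Hypothetical stand-in for training a model and returning its MSE.

    It simply rewards more neurons and more epochs so the loop has something
    to minimise; it says nothing about the real network.
    """
    return 1000.0 / params["neurons"] + 500.0 / params["epochs"]


def run_random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    history = []
    for trial_number in range(n_trials):
        params = sample_params(rng)
        history.append((objective(params), trial_number, params))
    best_value, _, best_params = min(history, key=lambda item: item[0])
    return best_params, best_value, history


best_params, best_value, history = run_random_search(toy_objective, n_trials=20)
print("Best hyperparameters:", best_params)
print("Best objective value:", round(best_value, 3))
```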

In [44]:
import optuna
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define the objective function for Optuna
def objective(trial):
    # Hyperparameters to be tuned
    optimizer = trial.suggest_categorical('optimizer', ['adam', 'rmsprop'])
    neurons = trial.suggest_int('neurons', 32, 128)
    batch_size = trial.suggest_int('batch_size', 16, 64)
    epochs = trial.suggest_int('epochs', 10, 100)

    # Create the Keras model
    model = Sequential()
    model.add(Dense(neurons, input_dim=X_train_scaled.shape[1], activation='relu'))
    model.add(Dense(1))  # Output layer for regression
    model.compile(optimizer=optimizer, loss='mean_squared_error')

    # Train the model
    model.fit(X_train_scaled, y_train, epochs=epochs, batch_size=batch_size, verbose=0)

    # Predict and calculate the MSE on the test set
    y_pred = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)

    return mse  # We aim to minimize MSE

# Create a study and optimize it
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)

# Output the best hyperparameters
print("Best hyperparameters: ", study.best_params)
print("Best MSE: ", study.best_value)



[I 2024-09-23 08:55:59,202] A new study created in memory with name: no-name-f69b08a1-f504-49fe-9f0d-d4b11fc0c1f1
[I 2024-09-23 08:56:40,738] Trial 0 finished with value: 2255.6711421187574 and parameters: {'optimizer': 'rmsprop', 'neurons': 94, 'batch_size': 34, 'epochs': 40}. Best is trial 0 with value: 2255.6711421187574.
[I 2024-09-23 08:56:59,942] Trial 1 finished with value: 2263.8592430864646 and parameters: {'optimizer': 'rmsprop', 'neurons': 43, 'batch_size': 42, 'epochs': 21}. Best is trial 0 with value: 2255.6711421187574.
[I 2024-09-23 08:58:02,404] Trial 2 finished with value: 2254.902327413596 and parameters: {'optimizer': 'rmsprop', 'neurons': 68, 'batch_size': 47, 'epochs': 82}. Best is trial 2 with value: 2254.902327413596.
[I 2024-09-23 08:59:02,821] Trial 3 finished with value: 2253.7992168423284 and parameters: {'optimizer': 'adam', 'neurons': 99, 'batch_size': 37, 'epochs': 63}. Best is trial 3 with value: 2253.7992168423284.
[I 2024-09-23 09:00:05,776] Trial 4 finished with value: 2253.006090115664 and parameters: {'optimizer': 'adam', 'neurons': 121, 'batch_size': 53, 'epochs': 80}. Best is trial 4 with value: 2253.006090115664.
[I 2024-09-23 09:01:34,187] Trial 5 finished with value: 2254.877044764769 and parameters: {'optimizer': 'adam', 'neurons': 104, 'batch_size': 32, 'epochs': 51}. Best is trial 4 with value: 2253.006090115664.
[I 2024-09-23 09:01:52,387] Trial 6 finished with value: 2260.7255594967114 and parameters: {'optimizer': 'adam', 'neurons': 54, 'batch_size': 55, 'epochs': 26}. Best is trial 4 with value: 2253.006090115664.
[I 2024-09-23 09:03:21,821] Trial 7 finished with value: 2253.499593491012 and parameters: {'optimizer': 'rmsprop', 'neurons': 51, 'batch_size': 27, 'epochs': 72}. Best is trial 4 with value: 2253.006090115664.
[I 2024-09-23 09:05:42,089] Trial 8 finished with value: 2252.480927465534 and parameters: {'optimizer': 'adam', 'neurons': 127, 'batch_size': 22, 'epochs': 69}. Best is trial 8 with value: 2252.480927465534.
[I 2024-09-23 09:06:54,771] Trial 9 finished with value: 2254.6647442811823 and parameters: {'optimizer': 'adam', 'neurons': 42, 'batch_size': 21, 'epochs': 44}. Best is trial 8 with value: 2252.480927465534.
[I 2024-09-23 09:07:52,737] Trial 10 finished with value: 2252.142872192731 and parameters: {'optimizer': 'adam', 'neurons': 124, 'batch_size': 64, 'epochs': 99}. Best is trial 10 with value: 2252.142872192731.
[I 2024-09-23 09:11:22,755] Trial 11 finished with value: 2249.1038366338084 and parameters: {'optimizer': 'adam', 'neurons': 127, 'batch_size': 16, 'epochs': 100}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:12:23,354] Trial 12 finished with value: 2252.3622141676296 and parameters: {'optimizer': 'adam', 'neurons': 111, 'batch_size': 61, 'epochs': 100}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:15:57,171] Trial 13 finished with value: 2249.4579214534833 and parameters: {'optimizer': 'adam', 'neurons': 82, 'batch_size': 16, 'epochs': 100}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:18:56,338] Trial 14 finished with value: 2250.5223471601216 and parameters: {'optimizer': 'adam', 'neurons': 80, 'batch_size': 17, 'epochs': 90}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:21:54,776] Trial 15 finished with value: 2250.4025316068564 and parameters: {'optimizer': 'adam', 'neurons': 84, 'batch_size': 17, 'epochs': 89}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:23:54,288] Trial 16 finished with value: 2250.8668674994015 and parameters: {'optimizer': 'adam', 'neurons': 69, 'batch_size': 26, 'epochs': 89}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:26:02,499] Trial 17 finished with value: 2250.484392976516 and parameters: {'optimizer': 'adam', 'neurons': 113, 'batch_size': 27, 'epochs': 99}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:28:07,821] Trial 18 finished with value: 2255.0626471140467 and parameters: {'optimizer': 'rmsprop', 'neurons': 91, 'batch_size': 16, 'epochs': 61}. Best is trial 11 with value: 2249.1038366338084.
[I 2024-09-23 09:30:14,354] Trial 19 finished with value: 2254.8053084699727 and parameters: {'optimizer': 'adam', 'neurons': 64, 'batch_size': 22, 'epochs': 81}. Best is trial 11 with value: 2249.1038366338084.


Best hyperparameters:  {'optimizer': 'adam', 'neurons': 127, 'batch_size': 16, 'epochs': 100}
Best MSE:  2249.1038366338084
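
To put the best value in context, the short sketch below summarises the 20 trial results from the log above. The values are copied from the log and rounded to four decimals; the variable names are illustrative.

```python
# Test-set MSE for trials 0..19, copied from the Optuna log (rounded to 4 decimals).
trial_mse = [
    2255.6711, 2263.8592, 2254.9023, 2253.7992, 2253.0061,
    2254.8770, 2260.7256, 2253.4996, 2252.4809, 2254.6647,
    2252.1429, 2249.1038, 2252.3622, 2249.4579, 2250.5223,
    2250.4025, 2250.8669, 2250.4844, 2255.0626, 2254.8053,
]

best_trial = min(range(len(trial_mse)), key=trial_mse.__getitem__)
best_mse = min(trial_mse)
worst_mse = max(trial_mse)

# Gap between the worst and best trial, as a percentage of the best MSE.
spread_pct = (worst_mse - best_mse) / best_mse * 100

print(f"Best trial: {best_trial} (MSE {best_mse:.2f})")
print(f"Worst MSE: {worst_mse:.2f}")
print(f"Spread between best and worst: {spread_pct:.2f}% of the best MSE")
```

The gap between the best and worst trial is well under 1% of the best MSE, so the choice among these hyperparameter combinations had little effect on this metric.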


Revised, cleaner tuning code (explicit Input layer to avoid the Keras warning)

In [None]:
import optuna
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input  # Input is required by the explicit input layer below
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

def objective(trial):
    # Hyperparameter search space
    optimizer = trial.suggest_categorical('optimizer', ['adam', 'rmsprop'])
    neurons = trial.suggest_int('neurons', 32, 128)
    batch_size = trial.suggest_int('batch_size', 16, 64)
    epochs = trial.suggest_int('epochs', 10, 100)
    
    # Build the model using Input layer to avoid warnings
    model = Sequential()
    model.add(Input(shape=(X_train_scaled.shape[1],)))  # Use Input layer instead of input_dim
    model.add(Dense(neurons, activation='relu'))
    model.add(Dense(1))  # Output layer for regression

    model.compile(optimizer=optimizer, loss='mean_squared_error')

    # Train the model
    model.fit(X_train_scaled, y_train, epochs=epochs, batch_size=batch_size, verbose=0)

    # Predict on the test set
    y_pred = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)

    return mse


# Create a study and optimize it
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)

# Output the best hyperparameters
print("Best hyperparameters: ", study.best_params)
print("Best MSE: ", study.best_value)



Results: the hyperparameter optimisation process using Optuna found the following best hyperparameters for the model:

Optimizer: adam
Neurons: 127 (almost the upper boundary of the search space)
Batch Size: 16 (smaller batch sizes often allow for more detailed updates to the weights)
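Epochs: 100 (the maximum value in the range, suggesting that the model may benefit from extended training)
The best mean squared error (MSE) obtained was 2249.10. Across all 20 trials the MSE stayed between roughly 2249 and 2264, so the differences between configurations were small.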

Interpretation:
Optimizer (Adam): Adam is a widely used optimizer for neural networks that combines ideas from two other popular optimizers, AdaGrad and RMSProp. It adapts the learning rate for each parameter during training, which often results in faster and more stable convergence.
Neurons (127): The search settled on a large hidden layer, close to the upper bound of 128. More neurons allow the network to capture more complex relationships between features but also increase the risk of overfitting if the model is not regularized.
Batch Size (16): A smaller batch size means noisier but more frequent updates to the model weights. This can speed up convergence per epoch, but each epoch needs more update steps and therefore takes longer to run.
Epochs (100): The best trial used the maximum allowed number of epochs, which suggests that the model was still reducing the error and might benefit from even longer training. Because this value sits on the boundary of the search range, the range would need to be widened to confirm it.
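
To make the batch-size trade-off concrete: the number of weight updates per epoch is the number of training samples divided by the batch size, rounded up. The sketch below uses a hypothetical sample count of 24,136, chosen only because it is consistent with the 1509 steps per epoch shown in the batch-size-16 training log further down. The actual dataset size is not stated in this notebook section.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of weight updates in one epoch (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


# Hypothetical count of samples used for fitting (assumption, see note above).
N_FIT_SAMPLES = 24_136

steps_by_batch = {b: steps_per_epoch(N_FIT_SAMPLES, b) for b in (16, 32, 64)}
for batch_size, steps in steps_by_batch.items():
    print(f"batch_size={batch_size:>2}: {steps} updates per epoch")
```

Halving the batch size roughly doubles the number of updates per epoch, which is the main reason small-batch trials take longer to finish.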

Updated ANN Code with Best Hyperparameters:

Changes Made Based on Best Hyperparameters:
Neurons: Set to 127 in the first hidden layer (the second and third hidden layers use 64 and 32 neurons, which were not part of the tuning).
Batch Size: Set to 16.
Epochs: Set to 100.
Optimizer: Adam (as found to be the best in the hyperparameter tuning).
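
For a sense of model size, the trainable parameters of a fully connected network can be counted as (inputs + 1) x units for each layer, where the +1 is the bias. The sketch below applies this to the layer widths used in the updated code (127, 64, 32 and 1), for each number of class features the loop trains on. The function name is illustrative.

```python
def dense_param_count(n_inputs, layer_units):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += (fan_in + 1) * units  # weights plus one bias per unit
        fan_in = units
    return total


# Hidden and output layer widths from the updated ANN code.
LAYER_UNITS = [127, 64, 32, 1]

# One input feature per class column used in each iteration.
param_counts = {n: dense_param_count(n, LAYER_UNITS) for n in (14, 11, 8, 5, 2)}
for n_classes, count in param_counts.items():
    print(f"Classes used: {n_classes:>2} -> {count} trainable parameters")
```

Most of the parameters sit between the hidden layers, so removing input classes changes the model size only slightly.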

In [46]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import mean_squared_error, r2_score  # used in the evaluation step below

In [None]:
# Step 1: Load your dataset
df = pd.read_excel('swimclass_rawdata.xlsx')


In [47]:
# Step 2: Prepare the data for leavers (ignore current members for now)
leavers_df = df[df['Current_member'] == 0].copy()  # Use .copy() to avoid SettingWithCopyWarning

# Step 3: Define the class columns
lesson_columns = ['PreSchool 1', 'PreSchool 2', 'PreSchool 3', 'PreSchool 4', 'PreSchool 5',
                  'PreSchool 6', 'Academy 1', 'Academy 2', 'Academy 3', 'Academy 4', 
                  'Academy 5', 'Academy 6', 'BEGINNERS', 'INTERMEDIATE']

# Step 4: Calculate Variance to Median (VTM) for each class column
for col in lesson_columns:
    median_value = leavers_df.loc[leavers_df[col] > 0, col].median()  # Exclude zero values
    leavers_df.loc[:, f'{col}_VTM'] = leavers_df[col] - median_value  # Proper use of .loc to avoid warnings

# Step 5: Train on progressively reduced class data
for classes_used in range(len(lesson_columns), 1, -3):  # Remove class data progressively, 3 at a time
    selected_classes = lesson_columns[:classes_used]
    features = [f'{col}_VTM' for col in selected_classes] + ['TOTAL QUANTITY OF LESSONS']
    
    # Step 6: Define the target (Total lessons we want to predict)
    leavers_df['PAST_LESSONS'] = leavers_df[lesson_columns].sum(axis=1)  # Known lessons taken so far
    
    # Remove TOTAL_QUANTITY_OF_LESSONS for training
    X_leavers = leavers_df[features].drop(columns=['TOTAL QUANTITY OF LESSONS'])
    y_leavers = leavers_df['TOTAL QUANTITY OF LESSONS']  # Total lessons as target
    
    # Step 7: Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_leavers, y_leavers, test_size=0.3, random_state=42)
    
    # Step 8: Scale the data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Step 9: Build the ANN model using the best hyperparameters from Optuna
    model = Sequential()
    model.add(Input(shape=(X_train_scaled.shape[1],)))  # Use Input(shape) for the first layer
    model.add(Dense(127, activation='relu'))  # Use 127 neurons based on Optuna results
    model.add(Dense(64, activation='relu'))   # 2nd hidden layer
    model.add(Dense(32, activation='relu'))   # 3rd hidden layer
    model.add(Dense(1))  # Output layer for regression (single target)

    # Step 10: Compile the model using Adam optimizer
    optimizer = Adam()
    model.compile(optimizer=optimizer, loss='mean_squared_error')
    
    # Step 11: Train the model using the best batch size and epochs
    model.fit(X_train_scaled, y_train, epochs=100, batch_size=16, verbose=2, validation_split=0.2)
    
    # Step 12: Make predictions and evaluate the model
    y_pred = model.predict(X_test_scaled)
    
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    # Print results for each iteration
    print(f"Classes used: {classes_used} | Mean Squared Error: {mse:.2f} | R-squared: {r2:.2f}")
    
    # Optional: Compare actual vs predicted total lessons
    results_df = pd.DataFrame({'Actual_TOTAL_LESSONS': y_test, 'Predicted_TOTAL_LESSONS': y_pred.flatten()})
    results_df['DIFFERENCE'] = results_df['Predicted_TOTAL_LESSONS'] - results_df['Actual_TOTAL_LESSONS']
    print(results_df.head())


Epoch 1/100
1509/1509 - 5s - 3ms/step - loss: 220.5011 - val_loss: 13.3238
Epoch 2/100
1509/1509 - 3s - 2ms/step - loss: 4.6661 - val_loss: 8.6396
Epoch 3/100
1509/1509 - 3s - 2ms/step - loss: 2.8487 - val_loss: 1.8805
Epoch 4/100
1509/1509 - 5s - 3ms/step - loss: 2.8752 - val_loss: 1.6906
Epoch 5/100
1509/1509 - 2s - 2ms/step - loss: 2.3150 - val_loss: 4.5611
Epoch 6/100
1509/1509 - 2s - 2ms/step - loss: 2.0016 - val_loss: 1.7564
Epoch 7/100
1509/1509 - 2s - 2ms/step - loss: 2.2598 - val_loss: 4.1352
Epoch 8/100
1509/1509 - 3s - 2ms/step - loss: 1.7847 - val_loss: 1.9785
Epoch 9/100
1509/1509 - 2s - 2ms/step - loss: 1.7392 - val_loss: 1.1422
Epoch 10/100
1509/1509 - 2s - 2ms/step - loss: 2.2629 - val_loss: 0.8827
Epoch 11/100
1509/1509 - 3s - 2ms/step - loss: 1.3024 - val_loss: 2.5697
Epoch 12/100
1509/1509 - 2s - 2ms/step - loss: 1.5353 - val_loss: 2.5259
Epoch 13/100
1509/1509 - 2s - 2ms/step - loss: 1.4062 - val_loss: 2.1284
Epoch 14/100
1509/1509 - 2s - 2ms/step - loss: 1.3901 - v

1509/1509 - 2s - 2ms/step - loss: 11.3922 - val_loss: 9.5188
Epoch 7/100
1509/1509 - 2s - 2ms/step - loss: 11.7905 - val_loss: 8.9450
Epoch 8/100
1509/1509 - 2s - 2ms/step - loss: 11.0213 - val_loss: 9.4896
Epoch 9/100
1509/1509 - 2s - 2ms/step - loss: 11.5141 - val_loss: 11.3660
Epoch 10/100
1509/1509 - 2s - 2ms/step - loss: 11.0538 - val_loss: 10.6995
Epoch 11/100
1509/1509 - 2s - 2ms/step - loss: 10.7906 - val_loss: 9.8024
Epoch 12/100
1509/1509 - 3s - 2ms/step - loss: 11.0463 - val_loss: 11.1155
Epoch 13/100
1509/1509 - 2s - 2ms/step - loss: 10.6585 - val_loss: 8.7053
Epoch 14/100
1509/1509 - 2s - 2ms/step - loss: 11.5449 - val_loss: 10.7944
Epoch 15/100
1509/1509 - 2s - 2ms/step - loss: 10.1927 - val_loss: 9.0600
Epoch 16/100
1509/1509 - 2s - 2ms/step - loss: 10.5383 - val_loss: 10.2212
Epoch 17/100
1509/1509 - 2s - 2ms/step - loss: 10.4383 - val_loss: 8.1885
Epoch 18/100
1509/1509 - 3s - 2ms/step - loss: 10.2943 - val_loss: 14.8674
Epoch 19/100
1509/1509 - 3s - 2ms/step - loss: 1

1509/1509 - 3s - 2ms/step - loss: 258.3735 - val_loss: 236.2105
Epoch 11/100
1509/1509 - 3s - 2ms/step - loss: 255.2215 - val_loss: 246.5443
Epoch 12/100
1509/1509 - 5s - 4ms/step - loss: 256.0258 - val_loss: 239.5229
Epoch 13/100
1509/1509 - 3s - 2ms/step - loss: 254.9734 - val_loss: 232.1377
Epoch 14/100
1509/1509 - 3s - 2ms/step - loss: 255.2583 - val_loss: 232.6532
Epoch 15/100
1509/1509 - 3s - 2ms/step - loss: 251.9284 - val_loss: 245.5881
Epoch 16/100
1509/1509 - 5s - 3ms/step - loss: 253.2455 - val_loss: 238.2728
Epoch 17/100
1509/1509 - 3s - 2ms/step - loss: 251.8938 - val_loss: 254.2750
Epoch 18/100
1509/1509 - 3s - 2ms/step - loss: 251.5737 - val_loss: 236.9374
Epoch 19/100
1509/1509 - 3s - 2ms/step - loss: 252.0130 - val_loss: 244.9591
Epoch 20/100
1509/1509 - 3s - 2ms/step - loss: 250.9172 - val_loss: 233.9496
Epoch 21/100
1509/1509 - 3s - 2ms/step - loss: 247.9893 - val_loss: 234.2617
Epoch 22/100
1509/1509 - 2s - 2ms/step - loss: 251.0495 - val_loss: 238.7755
Epoch 23/100

Epoch 10/100
1509/1509 - 3s - 2ms/step - loss: 793.2783 - val_loss: 770.9091
Epoch 11/100
1509/1509 - 5s - 3ms/step - loss: 787.7170 - val_loss: 753.2531
Epoch 12/100
1509/1509 - 2s - 2ms/step - loss: 788.0262 - val_loss: 772.6821
Epoch 13/100
1509/1509 - 2s - 2ms/step - loss: 787.5667 - val_loss: 768.2220
Epoch 14/100
1509/1509 - 3s - 2ms/step - loss: 783.2938 - val_loss: 753.4684
Epoch 15/100
1509/1509 - 3s - 2ms/step - loss: 780.1913 - val_loss: 788.0505
Epoch 16/100
1509/1509 - 3s - 2ms/step - loss: 784.5383 - val_loss: 756.8053
Epoch 17/100
1509/1509 - 2s - 2ms/step - loss: 781.4944 - val_loss: 766.0422
Epoch 18/100
1509/1509 - 2s - 2ms/step - loss: 781.6238 - val_loss: 743.0609
Epoch 19/100
1509/1509 - 3s - 2ms/step - loss: 779.9602 - val_loss: 795.2603
Epoch 20/100
1509/1509 - 2s - 2ms/step - loss: 775.9935 - val_loss: 746.7892
Epoch 21/100
1509/1509 - 2s - 2ms/step - loss: 778.0149 - val_loss: 759.0793
Epoch 22/100
1509/1509 - 3s - 2ms/step - loss: 780.6810 - val_loss: 756.1238

Epoch 10/100
1509/1509 - 2s - 2ms/step - loss: 2312.2263 - val_loss: 2289.5181
Epoch 11/100
1509/1509 - 3s - 2ms/step - loss: 2313.6382 - val_loss: 2278.5383
Epoch 12/100
1509/1509 - 3s - 2ms/step - loss: 2311.6509 - val_loss: 2273.5703
Epoch 13/100
1509/1509 - 2s - 2ms/step - loss: 2310.6736 - val_loss: 2270.8738
Epoch 14/100
1509/1509 - 2s - 2ms/step - loss: 2313.0371 - val_loss: 2269.6497
Epoch 15/100
1509/1509 - 2s - 2ms/step - loss: 2310.4836 - val_loss: 2292.1177
Epoch 16/100
1509/1509 - 3s - 2ms/step - loss: 2311.6963 - val_loss: 2276.3057
Epoch 17/100
1509/1509 - 3s - 2ms/step - loss: 2310.4504 - val_loss: 2268.5686
Epoch 18/100
1509/1509 - 3s - 2ms/step - loss: 2307.8704 - val_loss: 2267.5649
Epoch 19/100
1509/1509 - 3s - 2ms/step - loss: 2307.8879 - val_loss: 2266.3069
Epoch 20/100
1509/1509 - 2s - 2ms/step - loss: 2309.3774 - val_loss: 2275.7422
Epoch 21/100
1509/1509 - 2s - 2ms/step - loss: 2307.7131 - val_loss: 2273.6089
Epoch 22/100
1509/1509 - 2s - 2ms/step - loss: 2308.

Results summary from ANN with tuned hyperparameters (post-tuning attempt)

Classes used: 14 | Mean Squared Error: 0.82 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.052773    0.052773
30546                   105               105.231911    0.231911
20806                    35                35.077221    0.077221
39438                    30                30.023224    0.023224
18229                   134               134.296921    0.296921


Classes used: 11 | Mean Squared Error: 8.21 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                38.634995   -0.365005
30546                   105               104.251366   -0.748634
20806                    35                35.083950    0.083950
39438                    30                29.920155   -0.079845
18229                   134               133.617310   -0.382690

Classes used: 8 | Mean Squared Error: 212.28 | R-squared: 0.92
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                40.080444    1.080444
30546                   105               117.914139   12.914139
20806                    35                33.823544   -1.176456
39438                    30                30.939425    0.939425
18229                   134               131.397751   -2.602249

Classes used: 5 | Mean Squared Error: 702.24 | R-squared: 0.74
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                50.266125   11.266125
30546                   105               118.840439   13.840439
20806                    35                56.597195   21.597195
39438                    30                38.875069    8.875069
18229                   134               135.221466    1.221466

Classes used: 2 | Mean Squared Error: 2253.07 | R-squared: 0.16
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                47.526028    8.526028
30546                   105                38.257729  -66.742271
20806                    35                47.526028   12.526028
39438                    30                47.526028   17.526028
18229                   134                97.894379  -36.105621

Results from ANN pretuning

Classes used: 14 | Mean Squared Error: 0.88 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.270248    0.270248
30546                   105               105.376823    0.376823
20806                    35                35.337696    0.337696
39438                    30                30.255575    0.255575
18229                   134               134.399765    0.399765

Classes used: 11 | Mean Squared Error: 10.60 | R-squared: 1.00
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                39.504517    0.504517
30546                   105               105.519890    0.519890
20806                    35                34.829407   -0.170593
39438                    30                30.320866    0.320866
18229                   134               135.448410    1.448410

Classes used: 8 | Mean Squared Error: 214.65 | R-squared: 0.92
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                36.863304   -2.136696
30546                   105               114.073128    9.073128
20806                    35                33.848129   -1.151871
39438                    30                28.835672   -1.164328
18229                   134               129.458054   -4.541946

Classes used: 5 | Mean Squared Error: 700.22 | R-squared: 0.74
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                48.857971    9.857971
30546                   105               118.195877   13.195877
20806                    35                56.994919   21.994919
39438                    30                37.618988    7.618988
18229                   134               140.749390    6.749390

Classes used: 2 | Mean Squared Error: 2245.45 | R-squared: 0.16
       Actual_TOTAL_LESSONS  Predicted_TOTAL_LESSONS  DIFFERENCE
20075                    39                45.878181    6.878181
30546                   105                40.134293  -64.865707
20806                    35                45.878181   10.878181
39438                    30                45.878181   15.878181
18229                   134               100.031418  -33.968582

Summary of tuning approach for ANN and benefits

Original ANN (Pre-tuning) vs. Post-tuned Model:
Classes used: 14

Original MSE: 0.88 | Post-tuning MSE: 0.82
Original R-squared: 1.00 | Post-tuning R-squared: 1.00
Differences:
The MSE improved slightly (from 0.88 to 0.82), showing that tuning the hyperparameters made a small improvement.
Prediction errors reduced slightly (e.g., row 20075 from 0.27 to 0.05).
Overall, this is an incremental improvement in prediction accuracy.
Classes used: 11

Original MSE: 10.60 | Post-tuning MSE: 8.21
Original R-squared: 1.00 | Post-tuning R-squared: 1.00
Differences:
The MSE improved from 10.60 to 8.21, indicating better accuracy.
Some sample predictions moved closer to the actual values (e.g., row 18229 from 1.45 to -0.38), while others moved slightly further away (e.g., row 30546 from 0.52 to -0.75).
Classes used: 8

Original MSE: 214.65 | Post-tuning MSE: 212.28
Original R-squared: 0.92 | Post-tuning R-squared: 0.92
Differences:
Very small improvement in MSE (from 214.65 to 212.28).
Prediction errors for some rows fell (e.g., row 39438, from -1.16 to 0.93), but others remained large or grew (e.g., row 30546, from 9.07 to 12.91).
Classes used: 5

Original MSE: 700.22 | Post-tuning MSE: 702.24
Original R-squared: 0.74 | Post-tuning R-squared: 0.74
Differences:
MSE remained essentially the same, with a very small increase (from 700.22 to 702.24).
Errors for some rows remained high (e.g., row 20075 increased from 9.86 to 11.27).
Classes used: 2

Original MSE: 2245.45 | Post-tuning MSE: 2253.07
Original R-squared: 0.16 | Post-tuning R-squared: 0.16
Differences:
MSE worsened slightly (from 2245.45 to 2253.07), reflecting no meaningful improvement.
Predictions remained far from the actual values (e.g., the error for row 30546 worsened from -64.87 to -66.74).
Overall Comparison:
Measurable Improvements:

Classes used: 14 and 11: MSE fell in both cases (0.88 to 0.82 and 10.60 to 8.21), while R-squared stayed at 1.00. Post-tuning, these models are slightly more accurate than before.
Marginal Changes:

Classes used: 8: Minimal improvement. While the MSE decreased slightly, prediction errors remain large for some rows.
No Improvement or Regression:

Classes used: 5 and 2: These models saw no improvement. MSE rose slightly in both cases after hyperparameter tuning (700.22 to 702.24 and 2245.45 to 2253.07), R-squared was unchanged, and prediction errors stayed large.
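
To put the before and after figures on a common scale, the relative change in MSE can be computed for each class count. This is a small sketch using only the MSE values reported above. The names `mse_by_classes` and `relative_change_pct` are illustrative and are not part of the original code.

```python
# (original MSE, post-tuning MSE) per number of classes used,
# copied from the tuning summary above.
mse_by_classes = {
    14: (0.88, 0.82),
    11: (10.60, 8.21),
    8: (214.65, 212.28),
    5: (700.22, 702.24),
    2: (2245.45, 2253.07),
}


def relative_change_pct(before, after):
    """Percentage change in MSE; negative means tuning reduced the error."""
    return (after - before) / before * 100.0


changes = {n: relative_change_pct(b, a) for n, (b, a) in mse_by_classes.items()}

for n, pct in changes.items():
    print(f"Classes used: {n:>2} | MSE change after tuning: {pct:+.1f}%")
```

On these figures the reduction is roughly 7% with 14 classes and 23% with 11 classes, about 1% with 8 classes, and a rise of well under 1% with 5 and 2 classes.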

Conclusion:
Hyperparameter tuning lowered MSE for the larger class sets (14 and 11 classes), but the gain shrank to about 1% with 8 classes and disappeared with 5 and 2 classes, where MSE rose slightly. Tuning helps when most of the class progression data is available, but it cannot compensate for the information lost when class data is withheld.

Overall comparison of models used and best approach:

1. Classes Used: 8
Model          Mean Squared Error (MSE)  R-squared  Comments
Linear                           251.91       0.91  Decent performance but struggles with high deviations.
Random Forest                    239.60       0.91  Similar to linear; moderate prediction accuracy.
XGBoost                          224.35       0.92  Good performance with lower MSE and slightly better R².
ANN Tuned                        212.28       0.92  Best performance; captures patterns better with low MSE.

Best Model: ANN Tuned
Reason: It has the lowest MSE (212.28) with the same R² (0.92) as XGBoost, indicating a slight edge over other models.

2. Classes Used: 5
Model          Mean Squared Error (MSE)  R-squared  Comments
Linear                           917.67       0.66  High error, poor performance, large deviations.
Random Forest                    784.51       0.71  Moderate improvement over linear, but still high errors.
XGBoost                          706.84       0.74  Better than Random Forest, though still substantial errors.
ANN Tuned                        702.24       0.74  Marginally the best; lowest MSE but similar to XGBoost.

Best Model: ANN Tuned
Reason: Though the difference between ANN and XGBoost is marginal, ANN has the lowest MSE (702.24), giving it a slight advantage for predicting in limited data cases.

3. Classes Used: 2
Model          Mean Squared Error (MSE)  R-squared  Comments
Linear                          2370.54       0.11  Poor performance with significant errors.
Random Forest                   2290.42       0.14  Slightly better than linear, but still very high error.
XGBoost                         2239.14       0.16  Lowest MSE of the four models, but errors remain substantial.
ANN Tuned                       2253.07       0.16  Similar to XGBoost but slightly higher MSE.

Best Model: XGBoost
Reason: XGBoost and ANN Tuned are close and share the same R² (0.16), but XGBoost has the lowest MSE (2239.14), making it slightly better for very limited data.

Summary of Best Models for Limited Data
Classes Used: 8: ANN Tuned performs the best.
Classes Used: 5: ANN Tuned edges out other models with the lowest MSE.
Classes Used: 2: XGBoost is slightly better than ANN in this very limited data scenario.
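
The selection rule used in this summary, lowest MSE for each class count, can be sketched as follows. The MSE values are copied from the three tables above. The names `results`, `best_model` and `gap_to_runner_up` are illustrative and are not part of the original code.

```python
# MSE per model for each number of classes used, copied from the tables above.
results = {
    8: {"Linear": 251.91, "Random Forest": 239.60, "XGBoost": 224.35, "ANN Tuned": 212.28},
    5: {"Linear": 917.67, "Random Forest": 784.51, "XGBoost": 706.84, "ANN Tuned": 702.24},
    2: {"Linear": 2370.54, "Random Forest": 2290.42, "XGBoost": 2239.14, "ANN Tuned": 2253.07},
}


def best_model(mse_by_model):
    """Return the model name with the lowest MSE."""
    return min(mse_by_model, key=mse_by_model.get)


def gap_to_runner_up(mse_by_model):
    """Absolute MSE gap between the best and second-best model."""
    lowest, second = sorted(mse_by_model.values())[:2]
    return second - lowest


best = {n: best_model(m) for n, m in results.items()}
gaps = {n: gap_to_runner_up(m) for n, m in results.items()}

for n in results:
    print(f"Classes used: {n} | best: {best[n]} | gap to runner-up: {gaps[n]:.2f}")
```

The gap to the runner-up gives a sense of how decisive each choice is. A small gap relative to the MSE itself suggests the two leading models are close to interchangeable for that class count.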

Overall approach:
ANN Tuned is the model of choice for Classes Used: 8 and 5, as it has the lowest MSE and an R² that matches the best of the other models.
For Classes Used: 2, XGBoost performs marginally better, but the difference from ANN Tuned is minor. Either model could be acceptable.