# Question B4 (10 marks)

Model degradation is a common issue faced when deploying machine learning models (including neural networks) in the real world. New data points could exhibit a different pattern from older data points due to factors such as changes in government policy or market sentiments. For instance, housing prices in Singapore have been increasing and the Singapore government has introduced 3 rounds of cooling measures over the past years (16 December 2021, 30 September 2022, 27 April 2023).

In such situations, the distribution of the new data points could differ from the original data distribution on which the models were trained. Recall that machine learning models often assume that the test distribution is similar to the training distribution. When this assumption is violated, model performance will be adversely impacted. In the last part of this assignment, we will investigate to what extent model degradation has occurred.




---



Your co-investigators used a linear regression model to rapidly test out several combinations of train/test splits and shared their findings with you in a brief report attached in Appendix A below. You wish to investigate whether your deep learning model corroborates their findings.

In [39]:
pip install alibi-detect




In [22]:
pip install pytorch-tabular




In [41]:
pip install torch-optimizer




In [42]:
SEED = 42

import os

import random
random.seed(SEED)

import numpy as np
np.random.seed(SEED)

import pandas as pd
import pytorch_tabular
import math
from sklearn.metrics import r2_score, mean_squared_error
from alibi_detect.cd import TabularDrift

1. Evaluate your model from B1 on data from year 2022 and report the test R2.

In [43]:
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
)

In [44]:
df = pd.read_csv('hdb_price_prediction.csv')

# TODO: Enter your code here
# Training Data Set: Year 2019 and before
df_train = df[df['year'] <= 2019].copy()
# Validation Data Set: Year 2020
df_val = df[df['year'] == 2020].copy()
# Testing Data Set: Year 2022
df_test_2022 = df[df['year'] == 2022].copy()
df_test_2023 = df[df['year'] == 2023].copy()

# Dropping unnecessary columns
df_train.drop(columns=['year','full_address'], inplace=True)
df_val.drop(columns=['year','full_address'], inplace=True)
df_test_2022.drop(columns=['year','full_address'], inplace=True)
df_test_2023.drop(columns=['year','full_address'], inplace=True)

print("Training Data (2019):", df_train.shape)
print("Testing Data (2022):", df_test_2022.shape)
print("Testing Data (2023):", df_test_2023.shape)


Training Data (2019): (64057, 12)
Testing Data (2022): (26702, 12)
Testing Data (2023): (16424, 12)


In [45]:
num_col_names = ['dist_to_nearest_stn','dist_to_dhoby','degree_centrality','eigenvector_centrality',
                 'remaining_lease_years','floor_area_sqm']
cat_col_names = ['month','town','flat_model_type','storey_range']

In [46]:
data_config = DataConfig(
    target=["resale_price"],  
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=50,
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="regression",
    layers="50",  
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)

In [47]:
from torch_optimizer import QHAdam
# Training Tabular Model
tabular_model.fit(df_train, 
                  validation=df_val, 
                  optimizer=QHAdam)

Seed set to 42


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  X_encoded[col].fillna(self._imputed, inplace=True)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory /Users/mihirbhupathiraju/Desktop/sc4001/saved_models exists and is not empty.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


Finding best initial lr:   0%|          | 0/100 [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_steps=100` reached.
Learning rate set to 0.5754399373371567
Restoring states from the checkpoint path at /Users/mihirbhupathiraju/Desktop/sc4001/.lr_find_d56bf353-9484-4803-98ed-d0c3184a6ba5.ckpt
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_fabric/utilities/cloud_io.py:56: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We 

Output()

  return torch.load(f, map_location=map_location)


<pytorch_lightning.trainer.trainer.Trainer at 0x2ab179cf0>

In [48]:
evaluation = tabular_model.evaluate(df_test_2022)
predicted = tabular_model.predict(df_test_2022)

Output()

In [49]:
# Make predictions on the 2022 test set
predictions_2022 = tabular_model.predict(df_test_2022)

# Extract the actual target values from the 2022 test dataset
tar_val_2022 = df_test_2022['resale_price']

# np, mean_squared_error and r2_score are already imported at the top of the notebook

# Calculate RMSE for 2022
rmse_2022 = np.sqrt(mean_squared_error(tar_val_2022, predictions_2022))

# Calculate R² for 2022
r2_2022 = r2_score(tar_val_2022, predictions_2022)

In [50]:
# Print the results
print(f"Test RMSE for 2022: {rmse_2022}")
print(f"Test R² for 2022: {r2_2022}")

Test RMSE for 2022: 127883.69476366475
Test R² for 2022: 0.4358391314704636
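
For reference, the two metrics reported above can be computed from first principles. The sketch below is a plain-Python illustration, not the scikit-learn implementation; the function names `rmse` and `r2` and the toy prices are made up for the example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in target units."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy resale prices (hypothetical values, not taken from the dataset)
actual = [400_000, 500_000, 600_000]
perfect = list(actual)           # predictions that match exactly
always_mean = [500_000] * 3      # predictions that ignore the features

print(r2(actual, perfect))       # 1.0
print(r2(actual, always_mean))   # 0.0
```

An R² of 0 means the model does no better than always predicting the mean of the test targets, which puts the 2022 score of about 0.44 in context.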


2. Evaluate your model from B1 on data from year 2023 and report the test R2.

In [51]:
# TODO: Enter your code here

# Testing Data Set: Year 2023
df_test_2 = df[df['year'] == 2023].copy()
df_test_2.drop(columns=['year','full_address','nearest_stn'], inplace=True)

print("Training Data (2019):", df_train.shape)
print("Testing Data (2023):", df_test_2.shape)

Training Data (2019): (64057, 12)
Testing Data (2023): (16424, 11)


In [52]:
evaluation_2 = tabular_model.evaluate(df_test_2)
predicted_2 = tabular_model.predict(df_test_2)

Output()

In [53]:
# Make predictions on the 2023 test set
predictions_2023 = tabular_model.predict(df_test_2)

# Extract the actual target values from the 2023 test dataset
tar_val_2023 = df_test_2['resale_price']

# np, mean_squared_error and r2_score are already imported at the top of the notebook

# Calculate RMSE for 2023
rmse_2023 = np.sqrt(mean_squared_error(tar_val_2023, predictions_2023))

# Calculate R² for 2023
r2_2023 = r2_score(tar_val_2023, predictions_2023)

In [54]:
# Print the results
print(f"Test RMSE for 2023: {rmse_2023}")
print(f"Test R² for 2023: {r2_2023}")

Test RMSE for 2023: 156263.96101348396
Test R² for 2023: 0.17172324635759506


3. Did model degradation occur for the deep learning model?


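Yes, model degradation occurred. The test R² fell from about 0.436 on 2022 data to about 0.172 on 2023 data, while the test RMSE rose from about 127,884 to about 156,264. This agrees with the linear regression results in Appendix A, where a model trained on years up to 2020 scored 0.41 on 2022 and 0.10 on 2023.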




---



4. Model degradation could be caused by [various data distribution shifts](https://huyenchip.com/2022/02/07/data-distribution-shifts-and-monitoring.html#data-shift-types): covariate shift (features), label shift and/or concept drift (altered relationship between features and labels).
There are various conflicting terminologies in the [literature](https://www.sciencedirect.com/science/article/pii/S0950705122002854#tbl1). Let’s stick to this reference for this assignment.

> Using the **Alibi Detect** library, apply the **TabularDrift** function with the training data (year 2019 and before) used as the reference and **detect which features have drifted** in the 2023 test dataset. Before running the statistical tests, ensure you **sample 1000 data points** each from the train and test data. Do not use the whole train/test data. (Hint: use this example as a guide https://docs.seldon.io/projects/alibi-detect/en/stable/examples/cd_chi2ks_adult.html)
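
As background for this exercise: TabularDrift applies a two-sample Kolmogorov-Smirnov (K-S) test to features treated as continuous and a Chi-squared test to features declared categorical. The sketch below computes only the K-S statistic, the largest gap between two empirical CDFs, in plain Python to show what the reported distance measures. `ks_statistic` is a hypothetical helper, not part of Alibi Detect, and it does not compute a p-value.

```python
from bisect import bisect_right

def ks_statistic(ref, new):
    """Largest absolute difference between the empirical CDFs of two samples."""
    ref_sorted = sorted(ref)
    new_sorted = sorted(new)
    # Both ECDFs are step functions, so the largest gap occurs at an observed value
    points = ref_sorted + new_sorted
    return max(
        abs(bisect_right(ref_sorted, x) / len(ref_sorted)
            - bisect_right(new_sorted, x) / len(new_sorted))
        for x in points
    )

# Toy samples (hypothetical values, not taken from the dataset)
print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(ks_statistic([1, 2, 3], [4, 5, 6]))        # 1.0
```

A statistic near 0 means the two samples look alike; a value near 1 means they barely overlap.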


In [55]:
# YOUR CODE HERE
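# Drop resale_price since it's the target; month is also left out of the drift test
train_copy = df_train.copy()
train_copy.drop(columns=['resale_price','month'],inplace=True)

# NOTE: the question asks about the 2023 test set (df_test_2023), but this cell
# uses df_test_2022, so the outputs below describe drift in the 2022 data
test_copy = df_test_2022.copy()
test_copy.drop(columns=['resale_price','month'],inplace=True)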

feature_names = train_copy.columns
feature_names

Index(['town', 'nearest_stn', 'dist_to_nearest_stn', 'dist_to_dhoby',
       'degree_centrality', 'eigenvector_centrality', 'flat_model_type',
       'remaining_lease_years', 'floor_area_sqm', 'storey_range'],
      dtype='object')

In [56]:
# TODO: Enter your code here
# Sample 1000 rows each from the reference (train) data and the test data
sample_train = train_copy.sample(1000, random_state=SEED)
sample_test = test_copy.sample(1000, random_state=SEED)

# Every column index is listed as categorical here, so all features
# (including the continuous ones) are compared with a Chi-squared test
categories_per_feature = {f: None for f in range(sample_train.values.shape[1])}
cd = TabularDrift(sample_train.values, 
                  p_val=.05, 
                  categories_per_feature=categories_per_feature)
preds = cd.predict(sample_test.values)
labels = ['No!', 'Yes!']
print('Drift? {}'.format(labels[preds['data']['is_drift']]))

Drift? Yes!


In [57]:
fpreds = cd.predict(sample_test.values, drift_type='feature')
for f in range(cd.n_features):
    stat = 'Chi2' if f in list(categories_per_feature.keys()) else 'K-S'
    fname = feature_names[f]
    is_drift = fpreds['data']['is_drift'][f]
    stat_val, p_val = fpreds['data']['distance'][f], fpreds['data']['p_val'][f]
    print(f'{fname} -- Drift? {labels[is_drift]} -- {stat} {stat_val:.3f} -- p-value {p_val:.3f}')

town -- Drift? Yes! -- Chi2 44.147 -- p-value 0.010
nearest_stn -- Drift? Yes! -- Chi2 100.058 -- p-value 0.040
dist_to_nearest_stn -- Drift? No! -- Chi2 1787.333 -- p-value 0.251
dist_to_dhoby -- Drift? No! -- Chi2 1787.333 -- p-value 0.251
degree_centrality -- Drift? No! -- Chi2 3.300 -- p-value 0.348
eigenvector_centrality -- Drift? Yes! -- Chi2 100.058 -- p-value 0.040
flat_model_type -- Drift? Yes! -- Chi2 66.537 -- p-value 0.000
remaining_lease_years -- Drift? Yes! -- Chi2 833.791 -- p-value 0.000
floor_area_sqm -- Drift? Yes! -- Chi2 154.319 -- p-value 0.007
storey_range -- Drift? No! -- Chi2 19.503 -- p-value 0.108
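
In the run above every column index appears in `categories_per_feature`, so all ten features, including the continuous distances and centralities, were compared with a Chi-squared test. One possible refinement, sketched here, lists only the categorical columns so that the remaining ones fall through to the K-S test. `categorical_feature_map` is a hypothetical helper, the choice of categorical columns is an assumption, and the outputs above were not produced with it.

```python
def categorical_feature_map(feature_names, categorical_names):
    """Map the column index of each categorical feature to None.

    A value of None asks TabularDrift to infer the category levels
    from the reference data.
    """
    categorical = set(categorical_names)
    return {i: None for i, name in enumerate(feature_names) if name in categorical}

# Column order as printed for train_copy in the notebook
feature_names = ['town', 'nearest_stn', 'dist_to_nearest_stn', 'dist_to_dhoby',
                 'degree_centrality', 'eigenvector_centrality', 'flat_model_type',
                 'remaining_lease_years', 'floor_area_sqm', 'storey_range']

# Assumption: these four columns hold discrete labels; the rest are numeric
categorical_names = ['town', 'nearest_stn', 'flat_model_type', 'storey_range']

categories_per_feature = categorical_feature_map(feature_names, categorical_names)
print(categories_per_feature)  # {0: None, 1: None, 6: None, 9: None}
```

The resulting dictionary could be passed as `categories_per_feature=` when constructing `TabularDrift`. The numeric columns may also need casting to a float dtype first, since `.values` on a mixed-type DataFrame yields an object array.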


5. Assuming that the flurry of housing measures has made an impact on the relationship between all the features and resale_price (i.e. P(Y|X) changes), which type of data distribution shift possibly led to model degradation?


Under the stated assumption that P(Y|X) changes, the shift responsible is concept drift: flats with the same attributes now sell at different prices because the cooling measures altered how features map to resale price. This differs from covariate shift, where the feature distribution P(X) changes while P(Y|X) stays the same, and from label shift, where P(Y) changes while P(X|Y) stays the same. The TabularDrift results suggest that some feature distributions also changed, so covariate shift may contribute, but a change in P(Y|X) is by definition concept drift.
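
Following the terminology of the reference linked in question 4, the three shift types can be summarised by which part of the joint distribution changes between training and test data and which part is assumed fixed:

```latex
\begin{aligned}
\text{Covariate shift:} \quad & P_{\text{train}}(X) \neq P_{\text{test}}(X), & P_{\text{train}}(Y \mid X) &= P_{\text{test}}(Y \mid X) \\
\text{Label shift:} \quad & P_{\text{train}}(Y) \neq P_{\text{test}}(Y), & P_{\text{train}}(X \mid Y) &= P_{\text{test}}(X \mid Y) \\
\text{Concept drift:} \quad & P_{\text{train}}(Y \mid X) \neq P_{\text{test}}(Y \mid X), & P_{\text{train}}(X) &= P_{\text{test}}(X)
\end{aligned}
```

In practice more than one of these can occur at the same time, so the fixed part of each definition is an idealisation.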


6. From your analysis via TabularDrift, which features contribute to this shift?




TabularDrift reports a test statistic and a p-value for each feature. A feature is flagged as drifted when its p-value is below the 0.05 significance level. In that case we reject the null hypothesis that the reference and test samples share the same distribution for that feature, and conclude that there is evidence of drift.

The flagged features are `town`, `nearest_stn`, `eigenvector_centrality`, `flat_model_type`, `remaining_lease_years` and `floor_area_sqm`.
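
Because ten features are tested at once, a few p-values just under 0.05 could arise by chance. As a hedged illustration, the snippet below applies a Bonferroni-adjusted threshold (0.05 divided by the number of features) to the p-values printed in the TabularDrift output. Those values are rounded to three decimals, so borderline cases could differ with the exact numbers. `bonferroni_flags` is a hypothetical helper.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return the features whose p-value falls below alpha / number_of_tests."""
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p < threshold]

# p-values copied from the per-feature TabularDrift output (rounded to 3 decimals)
p_values = {
    'town': 0.010,
    'nearest_stn': 0.040,
    'dist_to_nearest_stn': 0.251,
    'dist_to_dhoby': 0.251,
    'degree_centrality': 0.348,
    'eigenvector_centrality': 0.040,
    'flat_model_type': 0.000,
    'remaining_lease_years': 0.000,
    'floor_area_sqm': 0.007,
    'storey_range': 0.108,
}

flagged = bonferroni_flags(p_values)
print(flagged)  # ['flat_model_type', 'remaining_lease_years']
```

With this stricter threshold of 0.005 only `flat_model_type` and `remaining_lease_years` remain flagged, which suggests these two carry the strongest evidence of drift in this sample.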

7. Suggest 1 way to address model degradation and implement it, showing improved test R2 for year 2023.


One way is to retrain the model on more recent data, so that the training distribution is closer to that of the test year.

Here the training data covers all years up to and including 2021, the validation data is year 2022, and the test data is year 2023.
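
The idea of keeping the training window close to the test year can be expressed as a rolling split. The helper below is a hypothetical sketch (the name `rolling_year_splits` and its arguments are not from the assignment) that lists, for each test year, a training cutoff two years earlier and a validation year one year earlier.

```python
def rolling_year_splits(first_test_year, last_test_year):
    """Yield (train_up_to, val_year, test_year), validating on the year before the test year."""
    for test_year in range(first_test_year, last_test_year + 1):
        yield (test_year - 2, test_year - 1, test_year)

for train_up_to, val_year, test_year in rolling_year_splits(2022, 2023):
    print(f"train: year <= {train_up_to}, validate: {val_year}, test: {test_year}")
# train: year <= 2020, validate: 2021, test: 2022
# train: year <= 2021, validate: 2022, test: 2023
```

The last tuple corresponds to the split described in this answer: train on 2021 and earlier, validate on 2022, test on 2023.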

In [58]:
# YOUR CODE HERE
# Train on 2021 and earlier: Appendix A shows that training sets closer to the test year do better
final_train = df[(df['year'] <= 2021)]
final_val = df[df['year'] == 2022]
final_test = df[df['year'] == 2023]

data_config = DataConfig(
    target=["resale_price"],  
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=50,
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="regression",
    layers="50",  
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)

In [59]:
# Training Tabular Model
tabular_model.fit(final_train, 
                  validation=final_val, 
                  optimizer=QHAdam)

Seed set to 42


GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory /Users/mihirbhupathiraju/Desktop/sc4001/saved_models exists and is not empty.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


Finding best initial lr:   0%|          | 0/100 [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_steps=100` reached.
Learning rate set to 0.5754399373371567
Restoring states from the checkpoint path at /Users/mihirbhupathiraju/Desktop/sc4001/.lr_find_647a7cbf-9e1e-46ed-9829-66cc41286735.ckpt
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_fabric/utilities/cloud_io.py:56: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We 

Output()

  return torch.load(f, map_location=map_location)


<pytorch_lightning.trainer.trainer.Trainer at 0x2b6121c60>

In [60]:
final_evaluation = tabular_model.evaluate(final_test)
final_predicted = tabular_model.predict(final_test)


Output()

In [61]:
# Make predictions on the 2023 test set with the retrained model
predictions_2023 = tabular_model.predict(final_test)

# Extract the actual target values from the 2023 test dataset
tar_val_2023 = final_test['resale_price']

# np, mean_squared_error and r2_score are already imported at the top of the notebook

# Calculate RMSE for 2023
rmse_2023 = np.sqrt(mean_squared_error(tar_val_2023, predictions_2023))

# Calculate R² for 2023
r2_2023 = r2_score(tar_val_2023, predictions_2023)

In [62]:
print(f"Test RMSE for 2023: {rmse_2023}")
print(f"Test R² for 2023: {r2_2023}")

Test RMSE for 2023: 126610.74850597786
Test R² for 2023: 0.4562503782993892


### Appendix A



Here are our results from a linear regression model. We used StandardScaler for continuous variables and OneHotEncoder for categorical variables.

While 2021 data can be predicted well, test R2 dropped rapidly for 2022 and 2023.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| Year <= 2020 | 2021     | 0.76    |
| Year <= 2020 | **2022**     | 0.41    |
| Year <= 2020 | **2023**     | **0.10**   |



Similarly, a model trained on 2017 data can predict 2018-2021 well (with slight degradation in performance for 2021), but drops drastically in 2022 and 2023.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2017         | 2018     | 0.90    |
|              | 2019     | 0.89    |
|              | 2020     | 0.87    |
|              | 2021     | 0.72    |
|              | **2022**     | **0.37**    |
|              | **2023**     | **0.09**    |

With the test set fixed at year 2021, training on data from 2017-2020 still works well on the test data, with minimal degradation. Training sets closer to year 2021 generally do better.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2020         | 2021     | 0.81    |
| 2019         | 2021     | 0.75    |
| 2018         | 2021     | 0.73    |
| 2017         | 2021     | 0.72    |