# KNN Bikeshare Regression

The problem is to predict the total number of bike rentals in a given hour (`cnt`, the sum of casual and registered rentals), given the date, time of day, weather, and other features.

The dataset:

    H. Fanaee-T. "Bike Sharing," UCI Machine Learning Repository, 2013. [Online]. Available: https://doi.org/10.24432/C5W894.

## Load the data

In [1]:
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

In [2]:
bikeshare = fetch_ucirepo(id=275)
X = bikeshare.data.features
y = bikeshare.data.targets

In [3]:
X.shape

(17379, 13)

In [4]:
y.shape

(17379, 1)

In [5]:
print(bikeshare.variables)

          name     role         type demographic
0      instant       ID      Integer        None   
1       dteday  Feature         Date        None   
2       season  Feature  Categorical        None   
3           yr  Feature  Categorical        None   
4         mnth  Feature  Categorical        None   
5           hr  Feature  Categorical        None   
6      holiday  Feature       Binary        None   
7      weekday  Feature  Categorical        None   
8   workingday  Feature       Binary        None   
9   weathersit  Feature  Categorical        None   
10        temp  Feature   Continuous        None   
11       atemp  Feature   Continuous        None   
12         hum  Feature   Continuous        None   
13   windspeed  Feature   Continuous        None   
14      casual    Other      Integer        None   
15  registered    Other      Integer        None   
16         cnt   Target      Integer        None   

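Four columns are not included in `X`: `instant` (a row ID), `cnt` (the target, returned separately as `y`), and `casual` and `registered` (role `Other`). Since `casual + registered = cnt`, using either of the last two as a feature would leak the target into the model.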
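The leakage argument can be checked directly. A minimal sketch, assuming the full table (for example `bikeshare.data.original`, which `ucimlrepo` returns alongside `features` and `targets`) is available as a pandas DataFrame; the helper `columns_sum_to_target` is hypothetical and the toy values are illustrative.

```python
import pandas as pd


def columns_sum_to_target(df: pd.DataFrame, parts: list, target: str) -> bool:
    """Return True if the `parts` columns add up to `target` on every row.

    A True result means any of the `parts` columns would leak the target
    if it were used as a feature.
    """
    return bool((df[parts].sum(axis="columns") == df[target]).all())


# Toy frame with the same column names (values are illustrative).
toy = pd.DataFrame({"casual": [3, 8, 5], "registered": [13, 32, 27], "cnt": [16, 40, 32]})
print(columns_sum_to_target(toy, ["casual", "registered"], "cnt"))  # True
```

On the real data the call would be `columns_sum_to_target(bikeshare.data.original, ["casual", "registered"], "cnt")`, assuming that frame contains all three columns.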

In [6]:
X

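|  | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0000 |
| 1 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.80 | 0.0000 |
| 2 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.80 | 0.0000 |
| 3 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0000 |
| 4 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17374 | 2012-12-31 | 1 | 1 | 12 | 19 | 0 | 1 | 1 | 2 | 0.26 | 0.2576 | 0.60 | 0.1642 |
| 17375 | 2012-12-31 | 1 | 1 | 12 | 20 | 0 | 1 | 1 | 2 | 0.26 | 0.2576 | 0.60 | 0.1642 |
| 17376 | 2012-12-31 | 1 | 1 | 12 | 21 | 0 | 1 | 1 | 1 | 0.26 | 0.2576 | 0.60 | 0.1642 |
| 17377 | 2012-12-31 | 1 | 1 | 12 | 22 | 0 | 1 | 1 | 1 | 0.26 | 0.2727 | 0.56 | 0.1343 |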


In [7]:
X.dtypes

dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
dtype: object

In [8]:
y

|  | cnt |
|---|---|
| 0 | 16 |
| 1 | 40 |
| 2 | 32 |
| 3 | 13 |
| 4 | 1 |
| ... | ... |
| 17374 | 119 |
| 17375 | 89 |
| 17376 | 90 |
| 17377 | 61 |


## Prepare the data

In [9]:
# dteday is a string (object dtype); yr, mnth and weekday already encode the date numerically.
X = X.drop(columns="dteday")
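`KNeighborsRegressor` compares rows by distance (Minkowski with `p=2`, i.e. Euclidean, by default), so columns with large ranges such as `hr` (0 to 23) outweigh columns already scaled to [0, 1] such as `temp` and `hum`. One common remedy, sketched here rather than used in this notebook, is to standardize the features inside a scikit-learn `Pipeline` so the scaler is fit on the training rows only. The synthetic `X_demo`/`y_demo` arrays are hypothetical stand-ins for `X_train`/`y_train`; whether scaling improves the scores on this dataset would have to be measured.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: column 0 mimics an hour (0..23), column 1 a [0, 1] temperature.
X_demo = np.column_stack([rng.integers(0, 24, 500), rng.random(500)])
y_demo = 10 * X_demo[:, 0] + 200 * X_demo[:, 1] + rng.normal(0, 5, 500)

# The scaler learns means/std devs from whatever is passed to fit(): the training rows only.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
scaled_knn.fit(X_demo[:400], y_demo[:400])
y_demo_pred = scaled_knn.predict(X_demo[400:])
print(y_demo_pred.shape)  # (100,)
```

In the notebook the same pipeline would be fit on `X_train, y_train` and evaluated on `X_test`, so no test-set statistics reach the scaler.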

## Train the model

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1009)

In [11]:
X_train.shape

(13903, 12)

In [12]:
X_test.shape

(3476, 12)
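These shapes follow from how `train_test_split` rounds a fractional `test_size`: the test set gets ceil(0.2 × 17379) = ceil(3475.8) = 3476 rows and the training set gets the remaining 13903. A quick check with a dummy index array of the same length:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 17379
n_test = math.ceil(0.2 * n_samples)  # 3475.8 rounds up to 3476
n_train = n_samples - n_test         # 13903

idx_train, idx_test = train_test_split(np.arange(n_samples), test_size=0.2, random_state=1009)
print(n_train, n_test, len(idx_train), len(idx_test))  # 13903 3476 13903 3476
```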

In [13]:
knn_regressor = KNeighborsRegressor(n_neighbors=5)
knn_regressor.fit(X_train, y_train)
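With the default `weights="uniform"`, a `KNeighborsRegressor` prediction is the mean target of the `n_neighbors` closest training rows. The sketch below reproduces that by hand with NumPy on small random data and compares it with scikit-learn; `knn_predict_by_hand` is a hypothetical helper for illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def knn_predict_by_hand(X_train, y_train, X_query, k=5):
    """Uniform-weight k-NN regression: average the targets of the k nearest rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest training rows
    return y_train[nearest].mean(axis=1)


rng = np.random.default_rng(42)
X_tr, y_tr = rng.random((200, 3)), rng.random(200)
X_q = rng.random((10, 3))

manual = knn_predict_by_hand(X_tr, y_tr, X_q, k=5)
sk = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr).predict(X_q)
print(np.allclose(manual, sk))  # True
```

The brute-force distance matrix is fine for a demo; scikit-learn switches to tree-based neighbor search (`algorithm="auto"`) for larger inputs but returns the same neighbors when there are no distance ties.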

## Make predictions

In [14]:
y_pred = knn_regressor.predict(X_test)

In [15]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

Mean Squared Error: 2881.9067203682393
R-squared: 0.9164161698064326
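Both scores follow directly from their definitions: MSE is the mean of the squared residuals, and R² = 1 − SS_res / SS_tot compares those residuals with the spread of the true values around their own mean. Taking the square root of the MSE puts the error back in rentals: √2881.9 ≈ 53.7, so the RMSE is roughly 54 rentals per hour. A small sketch on illustrative numbers checks the hand-computed values against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([16.0, 40.0, 32.0, 13.0, 1.0])  # illustrative values
y_hat = np.array([20.0, 35.0, 30.0, 15.0, 4.0])

mse_manual = np.mean((y_true - y_hat) ** 2)
ss_res = np.sum((y_true - y_hat) ** 2)            # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
print(np.sqrt(2881.9067203682393))  # RMSE of the KNN model above, about 53.68
```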


## SHAP Explainer

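SHAP values represent the contribution of each feature to the prediction for a specific instance, relative to the model's average prediction over a background dataset (the explainer's `expected_value`). Positive SHAP values indicate that the feature pushed the prediction above that average, while negative values indicate that it pushed the prediction below. The values are additive: the base value plus the sum of an instance's SHAP values equals the model's prediction for that instance.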
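For a model f, an instance x and background data, SHAP values φ_i satisfy f(x) = E[f(X)] + Σ φ_i. `KernelExplainer` approximates the φ_i by sampling feature subsets; with only a few features they can be computed exactly by enumerating every subset. The sketch below does that with NumPy, filling "missing" features from the background rows (an assumption that mirrors how `KernelExplainer` uses its background data). `exact_shapley` and the linear stand-in model are hypothetical; the cost grows as 2^M, so this is only practical for small M and small backgrounds.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, background):
    """Exact Shapley values of model f at instance x (1-D array of M features).

    The value of a feature subset S is the mean of f over the background rows
    after overwriting the columns in S with x's values.
    """
    M = x.shape[0]

    def value(subset):
        z = background.astype(float)  # astype returns a copy; background is untouched
        idx = list(subset)
        if idx:
            z[:, idx] = x[idx]
        return f(z).mean()

    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):  # |S| = 0 .. M-1
            weight = math.factorial(size) * math.factorial(M - size - 1) / math.factorial(M)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


w = np.array([2.0, -1.0, 0.5])


def f_linear(data):
    """Linear stand-in for a model's predict(), e.g. knn_regressor.predict."""
    return data @ w + 3.0


rng = np.random.default_rng(0)
background = rng.random((50, 3))
x = np.array([0.9, 0.1, 0.4])

phi = exact_shapley(f_linear, x, background)
base = f_linear(background).mean()  # E[f(X)] over the background, the "expected value"
print(np.isclose(base + phi.sum(), f_linear(x[None, :])[0]))  # True: base + SHAP values = prediction
```

For a linear model the result reduces to φ_i = w_i (x_i − mean of background column i), which is a convenient sanity check.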

In [None]:
import shap
shap.initjs()  # load the JavaScript that interactive force plots need in a notebook

## Explain a single prediction from the test set

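See sample code at: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/model_agnostic/Iris%20classification%20with%20scikit-learn.html

That example explains a classifier's `predict_proba`; `KNeighborsRegressor` has no such method, so a regression model must be explained through `predict`.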

In [None]:
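# Regressors have no predict_proba, so explain predict(); ravel() flattens its (n, 1) output.
# KernelExplainer is slow, so summarize X_train with a 100-row background sample.
background = shap.sample(X_train, 100)
explainer = shap.KernelExplainer(lambda data: knn_regressor.predict(data).ravel(), background)
shap_values = explainer.shap_values(X_test.iloc[0, :])
shap.force_plot(explainer.expected_value, shap_values, X_test.iloc[0, :])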

## Explain all the predictions in the test set

See sample code at: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/model_agnostic/Iris%20classification%20with%20scikit-learn.html

In [None]:
# Slow: KernelExplainer makes many model calls per row; try a slice such as X_test.iloc[:100] first.
shap_values = explainer.shap_values(X_test)
shap.force_plot(explainer.expected_value, shap_values, X_test)
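A common global summary of per-row SHAP values is the mean absolute SHAP value of each feature, which is what `shap.plots.bar` displays for a multi-row Explanation. A minimal sketch, assuming the SHAP values are a (rows × features) array; the helper `mean_abs_shap` is hypothetical and the toy matrix is illustrative only.

```python
import numpy as np
import pandas as pd


def mean_abs_shap(shap_values, feature_names) -> pd.Series:
    """Rank features by mean |SHAP value| across the explained rows, largest first."""
    importance = np.abs(np.asarray(shap_values)).mean(axis=0)
    return pd.Series(importance, index=list(feature_names)).sort_values(ascending=False)


# Toy SHAP matrix: 3 explained rows x 3 features (illustrative values).
toy_shap = np.array([[120.0, -5.0, 10.0],
                     [-80.0, 3.0, -20.0],
                     [100.0, -1.0, 15.0]])
ranking = mean_abs_shap(toy_shap, ["hr", "windspeed", "temp"])
print(ranking)  # hr: 100, temp: 15, windspeed: 3
```

In the notebook this would be called as `mean_abs_shap(shap_values, X_test.columns)` on the array returned above.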

## Sample code from Gemini ... untested!!

In [None]:
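# KernelExplainer needs a function, not the fitted estimator. shap.Explainer wraps
# knn_regressor.predict with a background masker and returns an Explanation object.
background = shap.sample(X_train, 100)
explainer = shap.Explainer(lambda data: knn_regressor.predict(data).ravel(), background)
shap_values = explainer(X_test.iloc[:200, :])  # a subset keeps the run time manageable

# Visualize the results
# Summary plot
shap.plots.beeswarm(shap_values)

# Force plot for an individual prediction (the Explanation carries base value and data)
shap.plots.force(shap_values[0])

# Dependence plot: SHAP value of one feature ("hr" here) against the feature's value
shap.plots.scatter(shap_values[:, "hr"], color=shap_values)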