<h1 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: center"> Bike Sharing System</h1> 
<div class = 'image'> <img style="float:center; border:5px solid grey; width:75%" align=center src = https://storage.googleapis.com/kaggle-competitions/kaggle/3948/media/bikes.png> 
</div>

Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people can rent a bike from one location and return it to a different one on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.

The data generated by these systems makes them attractive for researchers because the duration of travel, departure location, arrival location, and time elapsed are explicitly recorded. Bike sharing systems therefore function as a sensor network, which can be used for studying mobility in a city. In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.

<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left"> Data Fields Description</h2> 
<ol>
    <li> <strong>datetime : </strong> hourly date + timestamp </li>
    <li> <strong>season : </strong>  1 = spring, 2 = summer, 3 = fall, 4 = winter  </li>
    <li> <strong>holiday : </strong> whether the day is considered a holiday</li>
    <li> <strong>workingday : </strong> whether the day is neither a weekend nor holiday </li>
    <li> <strong>weather : </strong> weather category code
        <ul>
            <li>1: Clear, Few clouds, Partly cloudy</li>
            <li>2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist</li>
            <li>3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds</li>
            <li>4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog</li>
        </ul>
    </li>
    
<li> <strong>temp : </strong> temperature in Celsius  </li>
<li> <strong>atemp : </strong> "feels like" temperature in Celsius  </li>
<li> <strong>humidity : </strong> relative humidity </li>
<li> <strong>windspeed : </strong>  wind speed  </li>
<li> <strong>casual : </strong> number of non-registered user rentals initiated</li>
<li> <strong>registered : </strong>  number of registered user rentals initiated </li>
<li> <strong>count : </strong>   number of total rentals </li>
</ol>
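The integer codes for `season` and `weather` can be turned into readable labels with a simple lookup. The sketch below uses a tiny hand-made frame rather than the real data; the names `SEASON_LABELS`, `WEATHER_LABELS`, and `demo`, and the shortened weather labels, are illustrative assumptions and not part of the dataset.

```python
import pandas as pd

# Lookups based on the field descriptions above (the short weather names are illustrative).
SEASON_LABELS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
WEATHER_LABELS = {1: "clear", 2: "mist", 3: "light precipitation", 4: "heavy precipitation"}

# Tiny hand-made frame standing in for the real data.
demo = pd.DataFrame({"season": [1, 3, 4], "weather": [2, 1, 4]})

# Map each integer code to its label.
demo["season_name"] = demo["season"].map(SEASON_LABELS)
demo["weather_name"] = demo["weather"].map(WEATHER_LABELS)
print(demo)
```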

<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left"> Import Libraries and Utilities </h2> 

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
%matplotlib  inline

<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left"> Data Loading</h2> 

In [None]:
train =  pd.read_csv('/kaggle/input/bike-sharing-demand/train.csv')
test = pd.read_csv('../input/bike-sharing-demand/test.csv', parse_dates = ['datetime'])
sampsub = pd.read_csv("../input/bike-sharing-demand/sampleSubmission.csv")

In [None]:
# show top 5 rows
train.head()

In [None]:
#size of dataframe
print(train.shape)
print(test.shape)

<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left"> Variable Identification and Typecasting</h2> 

In [None]:
print(train.info())
print(test.info())

In [None]:
train["season"] = train["season"].astype("category")
train["weather"] = train["weather"].astype("category")
train["holiday"] =  train["holiday"].astype("category")
train["workingday"] =  train["workingday"].astype("category")

test["season"] = test["season"].astype("category")
test["weather"] = test["weather"].astype("category")
test["holiday"] =  test["holiday"].astype("category")
test["workingday"] =  test["workingday"].astype("category")

In [None]:
# Extract month , dow, hour from date
date = pd.to_datetime(train['datetime'])
train['month'] = date.dt.month_name().astype("category")
train['dow'] = date.dt.day_name().astype("category")
train['hour'] = date.dt.hour.astype("category")

date1 = pd.to_datetime(test['datetime'])
test['month'] = date1.dt.month_name().astype("category")
test['dow'] = date1.dt.day_name().astype("category")
test['hour'] = date1.dt.hour.astype("category")
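To see what the `.dt` accessors used above return, here is a minimal sketch on two made-up timestamps in the same hourly format as the `datetime` column. The frame `demo` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps in the same format as the `datetime` column.
demo = pd.DataFrame({"datetime": ["2011-01-01 00:00:00", "2011-07-04 17:00:00"]})
stamp = pd.to_datetime(demo["datetime"])

# Month name, day-of-week name and hour of day, as extracted in the cell above.
demo["month"] = stamp.dt.month_name()
demo["dow"] = stamp.dt.day_name()
demo["hour"] = stamp.dt.hour
print(demo)
```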

In [None]:
train.head()

In [None]:
test.head()

<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left"> Univariate Analysis - Numerical</h2> 

In [None]:
def univariate_analysis_numeric(data,field):
    '''
    Perform univariate analysis of a numeric variable.

    Parameters
    ----------
    data  : dataframe
    field : string, feature name

    Draws a histogram, box plot and probability plot of the variable and shows
    skewness, kurtosis, mean, median, (min, max, range) and the
    (mean - std, mean + std) interval in the figure title.
    '''
    skewness = data[field].skew()
    kurtosis = data[field].kurt()
    mean = data[field].mean()
    minval = data[field].min()
    maxval = data[field].max()
    st_dev = data[field].std()
    median = data[field].median()
    rangeval = maxval - minval
    points = mean-st_dev, mean+st_dev
    fig, axes = plt.subplots(1, 3,figsize=(15,6))
    sns.histplot(data=data, x=field,kde=True, ax=axes[0])
    sns.boxplot(data=data, x=field,  ax=axes[1])
    stats.probplot(data[field],plot=axes[2])
    fig.suptitle('std_dev = {}; kurtosis = {};\nskew = {}; range = {}\nmean = {}; median = {}'.format((round(points[0],2),round(points[1],2)),
                                                                                                   round(kurtosis,2),
                                                                                                   round(skewness,2),
                                                                                                   (round(minval,2),round(maxval,2),round(rangeval,2)),
                                                                                                   round(mean,2),
                                                                                                   round(median,2)),fontsize="13")    
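As a rough guide to reading the skewness reported in the figure title, the sketch below compares a symmetric sample with a right-skewed one. Both series are made up for illustration and are not taken from the bike data.

```python
import pandas as pd

# Made-up samples: one symmetric, one with a long right tail.
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 1, 2, 10])

for name, s in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    # A positive skew goes together with mean > median (tail on the right).
    print(name, "skew =", round(s.skew(), 2), "mean =", s.mean(), "median =", s.median())
```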

In [None]:
univariate_analysis_numeric(train,"temp")

In [None]:
univariate_analysis_numeric(train,"atemp")

In [None]:
univariate_analysis_numeric(train,"humidity")

In [None]:
univariate_analysis_numeric(train,"windspeed")

In [None]:
univariate_analysis_numeric(train,"casual")

In [None]:
univariate_analysis_numeric(train,"registered")

In [None]:
univariate_analysis_numeric(train,"count")


<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left"> Univariate Analysis - Categorical</h2> 

In [None]:
def univariate_analysis_categorical(data,field):
    norm_count = data[field].value_counts(normalize = True)
    val_count =  data[field].value_counts()
    n_uni = data[field].nunique()
    norm_df = pd.DataFrame(norm_count)
    norm_df.columns = ["fraction"]
    #Plotting the variable with every information
    fig,axes = plt.subplots(1,2,figsize=(15,6))
    ax = sns.countplot(data=data,x=field,ax=axes[0])
    ax.set_title('n_uniques = {} \n value counts \n {};'.format(n_uni,val_count))
    ax1 = sns.barplot(x=norm_df.index,y="fraction" ,data=norm_df, ax=axes[1])
    ax1.set_title('n_uniques = {} \n value counts \n {};'.format(n_uni,norm_count))

In [None]:
univariate_analysis_categorical(train,"weather")

In [None]:
univariate_analysis_categorical(train,"holiday")

In [None]:
univariate_analysis_categorical(train,"hour")

In [None]:
univariate_analysis_categorical(train,"dow")

In [None]:
univariate_analysis_categorical(train,"month")

<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left">  Bivariate Analysis</h2> 

In [None]:
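# correlate numeric columns only; non-numeric columns (e.g. the datetime strings) are skipped
sns.heatmap(train.corr(numeric_only=True),annot=True)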

In [None]:
sns.pairplot(train)

In [None]:

def draw_ridge_plot(data,catfield,numfield):
    # Initialize the FacetGrid object
    g = sns.FacetGrid(data, col=catfield,hue=catfield)

    # # Draw the densities in a few steps
    g.map(sns.kdeplot, numfield, fill=False)
    g.map(plt.axhline, y=0, lw=2)

    # # Define and use a simple function to label the plot in axes coordinates
    def label(x, color, label):
        ax = plt.gca()
        ax.text(0, .2, label, color=color, ha="left", va="center", transform=ax.transAxes)
    g.map(label, numfield)

    # Set the subplots to overlap
    # g.fig.subplots_adjust(hspace=-.25)

    # # Remove axes details that don't play well with overlap
    # g.set_titles("")
    g.set(yticks=[])
    g.despine(bottom=True, left=True)

    g1 = sns.catplot(data=data,x=catfield,kind="box",y=numfield,col=catfield)
    g1.set(xticks=[])

In [None]:
draw_ridge_plot(train,"season","count")

In [None]:
draw_ridge_plot(train,"season","humidity")

In [None]:
draw_ridge_plot(train,"season","windspeed")

In [None]:
draw_ridge_plot(train,"season","temp")

In [None]:
draw_ridge_plot(train,"season","atemp")

In [None]:
draw_ridge_plot(train,"season","registered")

In [None]:
draw_ridge_plot(train,"season","casual")

In [None]:
draw_ridge_plot(train,"weather","count")

In [None]:
draw_ridge_plot(train,"weather","humidity")

In [None]:
draw_ridge_plot(train,"weather","atemp")

In [None]:
draw_ridge_plot(train,"weather","temp")

In [None]:
draw_ridge_plot(train,"weather","windspeed")

In [None]:
draw_ridge_plot(train,"weather","registered")

In [None]:
draw_ridge_plot(train,"weather","casual")

In [None]:
draw_ridge_plot(train,"month","count")

In [None]:
draw_ridge_plot(train,"month","humidity")

In [None]:
draw_ridge_plot(train,"month","temp")

In [None]:
draw_ridge_plot(train,"month","atemp")

In [None]:
draw_ridge_plot(train,"month","windspeed")

In [None]:
draw_ridge_plot(train,"month","registered")

In [None]:
draw_ridge_plot(train,"month","casual")

In [None]:
draw_ridge_plot(train,"dow","count")

In [None]:
draw_ridge_plot(train,"dow","temp")

In [None]:
draw_ridge_plot(train,"dow","humidity")

In [None]:
draw_ridge_plot(train,"dow","windspeed")

In [None]:
draw_ridge_plot(train,"dow","registered")

In [None]:
draw_ridge_plot(train,"dow","casual")

In [None]:
draw_ridge_plot(train,"hour","count")

In [None]:
draw_ridge_plot(train,"hour","temp")

In [None]:
draw_ridge_plot(train,"hour","humidity")

In [None]:
draw_ridge_plot(train,"hour","casual")

In [None]:
draw_ridge_plot(train,"hour","registered")

In [None]:
import numpy as np
def BVA_categorical_plot(data, tar, cat):
  '''
  take data and two categorical variables,
  calculates the chi2 significance between the two variables 
  and prints the result with countplot & CrossTab
  '''
  #isolating the variables
  data = data[[cat,tar]][:]

  #forming a crosstab
  table = pd.crosstab(data[tar],data[cat])

  #performing chi2 test on the full contingency table (all rows, not only the first two)
  from scipy.stats import chi2_contingency
  chi, p, dof, expected = chi2_contingency(table)
  '''   P-value ≤ α: The variables have a statistically significant association (Reject H0)
        If the p-value is less than or equal to the significance level, you reject the null hypothesis and conclude that there is a statistically significant association between the variables. 
    P-value > α: Cannot conclude that the variables are associated (Fail to reject H0)
        If the p-value is larger than the significance level, you fail to reject the null hypothesis because there is not enough evidence to conclude that the variables are associated.   '''
    
  #checking whether results are significant
  if p<0.05:
    sig = True
  else:
    sig = False

  #plotting grouped plot
#   plt.title("p-value = {}\n difference significant? = {}\n".format(round(p,8),sig))

  #plotting percent stacked bar plot
  groupeddata = data.groupby(cat)[tar].value_counts(normalize=True).unstack()
  plt.figure(figsize=(20, 20))
  ax1 = plt.subplot(2,3,1)
  ax2 = plt.subplot(2,3,2)
  ax3 = plt.subplot(2,3,3)
  sns.countplot(x=cat, hue=tar, data=data,ax=ax1)
  ax1.set_title("p-value = {}\n difference significant? = {}\n".format(round(p,8),sig))
  groupeddata.plot(kind='bar', stacked=True, ax=ax2)
  tbl = ax3.table(cellText=groupeddata.values.round(decimals=2),
          rowLabels=groupeddata.index,
          colLabels=groupeddata.columns,
          cellLoc = 'right', rowLoc = 'center',loc="center",bbox = [0.1, 0, 2, 1])
  tbl.auto_set_font_size(False)
  tbl.set_fontsize(10)
  tbl.scale(2, 2)
  ax3.axis("off")
  int_level = data[cat].value_counts()
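The chi-square test of independence works on a full contingency table of two categorical variables. Below is a minimal sketch on a small hand-made frame; `demo` and its values are hypothetical and only illustrate the call to `scipy.stats.chi2_contingency`.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hand-made data: 6 observations per season, 3 weather categories.
demo = pd.DataFrame({
    "season": ["spring"] * 6 + ["fall"] * 6,
    "weather": ["clear", "clear", "clear", "clear", "mist", "rain",
                "clear", "mist", "mist", "rain", "rain", "rain"],
})

# Contingency table of observed counts (2 rows x 3 columns here).
table = pd.crosstab(demo["season"], demo["weather"])

# The test returns the statistic, the p-value, the degrees of freedom
# and the counts expected under independence.
chi2, p, dof, expected = chi2_contingency(table)
print(table)
print("p-value:", round(p, 4), "dof:", dof)
```

With such a small sample the expected counts are low, so the p-value here is only illustrative.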

In [None]:
BVA_categorical_plot(train,"season","weather")

In [None]:
BVA_categorical_plot(train,"season","holiday")

In [None]:
BVA_categorical_plot(train,"season","workingday")

In [None]:
BVA_categorical_plot(train,"dow","weather")

In [None]:
BVA_categorical_plot(train,"weather","hour")

<h2 style = "background-color: white; color : #fe346e; font-size: 30px; font-family:garamond; font-weight:normal; border-radius: 75px 150px; text-align: left">  Multivariate Analysis </h2> 

In [None]:
fig, scatter = plt.subplots(figsize = (20,15))
sns.scatterplot(x="temp",y="casual",hue="season",style="weather",data=train,ax=scatter)

In [None]:
fig, scatter = plt.subplots(figsize = (20,15))
sns.scatterplot(x="temp",y="registered",hue="season",style="weather",data=train,ax=scatter)

In [None]:
# fig, scatter = plt.subplots(figsize = (20,15))
sns.catplot(data=train,x="hour",y="casual",row="season",col="weather")

In [None]:
# fig, scatter = plt.subplots(figsize = (20,15))
sns.catplot(data=train,x="season",y="casual",row="hour",col="weather",kind="box")

In [None]:
sns.catplot(data=train,x="season",y="casual",row="hour",col="workingday" ,kind="box")

# Feature Engineering

In [None]:
train.head()

In [None]:
test.head()

### Drop Unwanted Columns

<table>
    <thead>
        <th>Column</th>
        <th>Reason</th>        
    </thead>
    <tbody>
        <tr>
            <td>datetime</td>
            <td>Hour, month, and day of week have already been extracted from it</td>
        </tr>
        <tr>
            <td>casual, registered</td>
            <td>Their sum equals count, so keeping them would leak the target</td>
        </tr>
        <tr>
            <td>atemp</td>
            <td>temp and atemp are highly correlated</td>
        </tr>
        <tr>
            <td>holiday</td>
            <td>Most rentals happen on working days, which the workingday column already captures (the code below keeps holiday; holiday_1 is excluded later during feature selection)</td>
        </tr>
    </tbody>
</table>

In [None]:
train0 = train.drop(["datetime","casual","registered","atemp"],axis=1)
test0  =  test.drop(["datetime","atemp"],axis=1) 

### Categorical Encoding

In [None]:
train_df =  pd.get_dummies(train0,drop_first=True)
test_df = pd.get_dummies(test0,drop_first=True)
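Encoding train and test separately only yields matching columns when both contain the same category levels. If they differ, one possible approach (an assumption here, not something the notebook does) is to reindex the test columns to the training columns. The frames `train_demo` and `test_demo` are hypothetical.

```python
import pandas as pd

# Hypothetical frames whose category levels differ between train and test.
train_demo = pd.DataFrame({"weather": pd.Categorical([1, 2, 3])})
test_demo = pd.DataFrame({"weather": pd.Categorical([1, 2, 4])})

train_enc = pd.get_dummies(train_demo, drop_first=True)
test_enc = pd.get_dummies(test_demo, drop_first=True)

# Keep exactly the training columns: levels unseen in train are dropped,
# levels missing from test are added as all-zero columns.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(train_enc.columns), list(test_enc.columns), list(test_aligned.columns))
```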

In [None]:
train_df.info()
test_df.info()

### Scaling

In [None]:
feature_scale =  ["temp","humidity","windspeed"]
from sklearn.preprocessing import StandardScaler
# fit the scaler on the training data only, then apply the same transform to test
scaler = StandardScaler()
scaler.fit(train_df[feature_scale])
train_df[feature_scale] = scaler.transform(train_df[feature_scale])
test_df[feature_scale] = scaler.transform(test_df[feature_scale])
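A common convention is to fit the scaler on the training data only and reuse the fitted mean and standard deviation for every other split, so that all splits share one scale. A minimal sketch on made-up values (`train_vals` and `test_vals` are hypothetical):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature arrays standing in for train and test.
train_vals = np.array([[10.0], [20.0], [30.0]])
test_vals = np.array([[20.0], [40.0]])

# Learn mean and standard deviation from the training values only.
scaler = StandardScaler().fit(train_vals)

# Apply the same fitted parameters to both splits.
train_scaled = scaler.transform(train_vals)
test_scaled = scaler.transform(test_vals)
print(train_scaled.ravel(), test_scaled.ravel())
```

A test value equal to the training mean maps to 0, and a value outside the training range maps outside the range of the scaled training values.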

In [None]:
final_data=train_df

# Model Building

In [None]:
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.linear_model import Ridge,Lasso,RidgeCV, LassoCV, ElasticNet, ElasticNetCV, LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error,r2_score
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
import pickle

In [None]:
# Let's create a function to create adjusted R-Squared
def adj_r2(x,r2):
    n = x.shape[0]
    p = x.shape[1]
    adjusted_r2 = 1-(1-r2)*(n-1)/(n-p-1)
    return adjusted_r2
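As a worked example of the same formula, the sketch below takes the sample size `n` and the number of features `p` directly instead of a dataframe. The function name `adjusted_r2` and the numbers are illustrative assumptions.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n samples and p features."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative numbers: R2 = 0.70 with 100 samples and 10 features.
value = adjusted_r2(0.70, n=100, p=10)
print(value)
```

The adjusted value is always at most the plain R² when p ≥ 1, and the gap widens as more features are added for the same sample size.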

In [None]:
np.random.seed(0)
# split the train and validation data
df_train, df_test = train_test_split(final_data, train_size = 0.7, test_size = 0.3, random_state = 100)
y_train = df_train["count"]
X_train = df_train.drop(["count"],axis=1)
X_val =  df_test.drop(["count"],axis=1)
y_val = df_test["count"]

In [None]:
X_train.info()

In [None]:
'''
    train_X: Independent features for training  
    train_Y: target feature for training 
    Val_X  : Independent feature for Validation 
    Val_Y  : target feature for Validation
'''
def build_lr_model(train_X,train_Y,val_X,val_Y,modelname):
    lregression = LinearRegression()
    lregression.fit(train_X,train_Y)
    reg_predict_model(lregression,val_X, val_Y)
    save_model(modelname,lregression)
    
def build_lasso_model(train_X,train_Y,val_X, val_Y,modelname):
    # Lasso Regularization
    # LassoCV picks the best alpha using 10-fold cross-validation.
    # The `normalize` argument was removed in scikit-learn 1.2; the numeric features are already standardized above.
    lasscv = LassoCV(cv =10, max_iter = 100000)
    lasscv.fit(train_X, train_Y)    
    alpha = lasscv.alpha_
    print("Best alpha : ",alpha)
    lasso_reg = Lasso(alpha)
    lasso_reg.fit(train_X, train_Y)
    reg_predict_model(lasso_reg,val_X, val_Y)
#     modelname = 'lasso_finalized_model.pickle'
    save_model(modelname,lasso_reg)

def build_ridge_model(train_X,train_Y,val_X, val_Y,modelname):
    # Using Ridge regression model
    # RidgeCV picks the best alpha using 10-fold cross-validation.
    # We pass an array of random candidate values for RidgeCV to choose the best alpha from.
    # The `normalize` argument was removed in scikit-learn 1.2.
    alphas = np.random.uniform(low=0, high=10, size=(50,))
    ridgecv = RidgeCV(alphas = alphas,cv=10)
    ridgecv.fit(train_X, train_Y)
    print("Best alpha : ",ridgecv.alpha_)
    ridge_model = Ridge(alpha=ridgecv.alpha_)
    ridge_model.fit(train_X, train_Y)
    reg_predict_model(ridge_model,val_X, val_Y)    
#     modelname = 'ridge_finalized_model.pickle'
    save_model(modelname,ridge_model)

def build_elasticnet_model(train_X,train_Y,val_X, val_Y,modelname):
    # Elastic net
    elasticCV = ElasticNetCV(cv =10)
    elasticCV.fit(train_X, train_Y)
    print("Best alpha : ",elasticCV.alpha_)
    print(" l1 ratio : ",elasticCV.l1_ratio_)
    # reuse the l1 ratio selected by cross-validation instead of hard-coding it
    elasticnet_reg = ElasticNet(alpha = elasticCV.alpha_,l1_ratio=elasticCV.l1_ratio_)
    elasticnet_reg.fit(train_X, train_Y)    
    reg_predict_model(elasticnet_reg,val_X, val_Y)    
#     modelname = 'elasticnet_finalized_model.pickle'
    save_model(modelname,elasticnet_reg)

def build_knn_model(train_X,train_Y,val_X,val_Y,modelname):
    neigh = KNeighborsRegressor(n_neighbors=5)
    neigh.fit(train_X, train_Y)
    reg_predict_model(neigh,val_X, val_Y)    
    save_model(modelname,neigh)    
    
def save_model(modelname,model):
    # saving the model to the local file system
    with open(modelname, 'wb') as f:
        pickle.dump(model, f)

def reg_predict_model(model,val_X, val_Y):    
    predict_Y =  model.predict(val_X)
    mse = mean_squared_error(val_Y,predict_Y)
    r2 = r2_score(val_Y,predict_Y)
    print("Root Mean Squared Error",np.sqrt(mse))
    print("Mean Absolute Error",mean_absolute_error(val_Y,predict_Y))
    print("R2",r2)
    print("adjR2",adj_r2(val_X,r2))
    print("Mean Squared Error",mse)
    
def model_stats_analysis(data,exclude_features,target_feature):
    lm = smf.ols(formula=target_feature+'~ '+'+'.join(data.columns.difference(exclude_features)), data=data).fit()
    print(lm.summary())

def find_vif(data):
    vif = pd.DataFrame()
    vif['Features'] = data.columns
    vif['VIF'] = [variance_inflation_factor(data.values, i) for i in range(data.shape[1])]
    vif['VIF'] = round(vif['VIF'], 2)
    vif = vif.sort_values(by = "VIF", ascending = False)
    print(vif)  

### Analyse the model 

In [None]:
# here consider all columns and analyse the performance of the model
model_stats_analysis(final_data,['count'],'count')

In [None]:
find_vif(X_train)


**VIF and P-Value of features**

**P-Value**

- Null hypothesis: the feature has no relationship with the target column.
- Significance level: 0.05.
- p-value < 0.05: reject the null hypothesis and keep the feature, because it is related to the target column.
- p-value > 0.05: fail to reject the null hypothesis and exclude the feature, because there is not enough evidence that it affects the target column.

**Variance Inflation Factor**

$$\mathrm{VIF} = \frac{1}{1 - R^2}$$

Here R² comes from regressing one independent feature on all the others. If R² is 0.90, then VIF = 1/(1 - 0.9) = 10, so a VIF of 10 means 90% of that feature's variance is explained by the other features, which indicates multicollinearity.

<table style="width:50%; text-align:center">
    <thead>
        <th>Column</th>
        <th>Reason</th>        
    </thead>
    <tbody>
        <tr>
            <td>dow_Saturday</td>
            <td>High P Value</td>
        </tr>
        <tr>
            <td>dow_Tuesday</td>
            <td>High P value</td>
        </tr>
        <tr>
            <td>weather_4</td>
            <td>High P value</td>
        </tr>
        <tr>
            <td>holiday_1</td>
            <td>High P_value </td>
        </tr>
        <tr>
            <td>dow_Wednesday</td>
            <td>High P_value </td>
        </tr>
        <tr>
            <td>dow_Thursday</td>
            <td>High P_value </td>
        </tr>
        <tr>
            <td>windspeed</td>
            <td>High P_value </td>
        </tr>
        <tr>
            <td>workingday_1</td>
            <td>High VIF </td>
        </tr>
        <tr>
            <td>dow_Monday</td>
            <td>High P_value </td>
        </tr>
    </tbody>
</table>
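The VIF definition can be checked by hand: regress each feature on the others, take R², and compute 1/(1 - R²). The sketch below does this with NumPy on synthetic data. The helper `vif_from_r2` is hypothetical, and because it adds an intercept to each regression its numbers may differ from what a library routine reports.

```python
import numpy as np

def vif_from_r2(X):
    """VIF of each column, from regressing it on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()   # R2 of column j explained by the rest
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic features: c is nearly a copy of a, b is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)

vifs = vif_from_r2(np.column_stack([a, b, c]))
print(vifs.round(2))
```

The two nearly collinear columns get a large VIF, while the independent column stays close to 1.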

In [None]:
model_stats_analysis(final_data,['count','dow_Monday','month_July','month_October','month_September',"month_August","month_June","month_November","month_December",'month_January','month_February','month_March','dow_Sunday','dow_Tuesday','weather_4','workingday_1','windspeed','holiday_1','dow_Wednesday','dow_Thursday'],'count')

In [None]:
exclude_columns = ['count','dow_Monday','month_July','month_October','month_September',"month_August","month_June","month_November","month_December",'month_January','month_February','month_March','dow_Sunday','dow_Tuesday','weather_4','workingday_1','windspeed','holiday_1','dow_Wednesday','dow_Thursday']
selected_columns = X_train.columns.difference(exclude_columns)
find_vif(X_train[selected_columns])

In [None]:
X_val.head()

In [None]:
print("------------Simple model with all features-------------")
# build simple model with all features
build_lr_model(X_train,y_train,X_val,y_val,"simple_model_with_all_features.pickle")
print("------------Basic model with selected features-------------")
# build simple model selected  features
build_lr_model(X_train[selected_columns],y_train,X_val[selected_columns],y_val,"simple_model_with_selected_features.pickle")

In [None]:
print("------------Lasso with all features-------------")
build_lasso_model(X_train,y_train,X_val,y_val,"lasso_model_with_all_features.pickle")
# print("----------Lasso with selected features---------")
# build_lasso_model(X_train[selected_columns],y_train,X_val[selected_columns],y_val)

In [None]:
print("-------------Ridge with all Features---------")
build_ridge_model(X_train,y_train,X_val,y_val,"ridge_model_with_all_features.pickle")
# print("-------------Ridge with Selected Features---------")
# build_ridge_model(X_train[selected_columns],y_train,X_val[selected_columns],y_val)

In [None]:
print("-------------elasticnet with all Features---------")
build_elasticnet_model(X_train,y_train,X_val,y_val,"elasticnet_model_with_all_features.pickle")
# print("-------------elasticnet with Selected Features---------")
# build_elasticnet_model(X_train[selected_columns],y_train,X_val[selected_columns],y_val)

In [None]:
print("-------------KNN with all Features---------")
build_knn_model(X_train,y_train,X_val,y_val,"knn_model_with_all_features.pickle")
print("-------------knn with Selected Features---------")
build_knn_model(X_train[selected_columns],y_train,X_val[selected_columns],y_val,"knn_model_with_selected_features.pickle")

In [None]:
# import os
# os.remove("/kaggle/working/elasticnet_finalized_model.pickle")

In [None]:
test_df[selected_columns].info()

In [None]:
with open("knn_model_with_selected_features.pickle", 'rb') as f:
    loaded_model = pickle.load(f)
a = loaded_model.predict(test_df[selected_columns])

In [None]:
submission = pd.DataFrame({ "datetime": sampsub.datetime, "count": a})

In [None]:
submission["count"] = np.where(submission["count"]>0,submission["count"],0)

In [None]:
submission.to_csv("submission_knn.csv", index=False)