# **BLUE BIKE TRIP DURATION PREDICTION (Total: / 15 points)**

Context: You have received data from Blue Bikes, Boston's bike-sharing service. They have asked you to build a predictive model that can accurately predict how long a bike rental will last, at the moment the rental begins. The company wants to know how long a customer will keep the bike at the start of each rental, so that it can manage operations across the bike network more efficiently. Note that when customers initiate a rental, they enter the IDs of their starting and ending stations into the mobile app.

# *Import and Pre-process Data*

In [3]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

bluebikes = pd.read_csv('https://raw.githubusercontent.com/gburtch/NYU---ModernAI/refs/heads/main/Session%204/datasets/bluebikes_sample.csv')

# This function MUST return a pair of objects (predictors, labels, in that order) as numpy arrays.
def processData(data):

    # Work on a copy so the caller's DataFrame is not modified.
    data = data.copy()

    # 'starttime' is stored as MM:SS.s, so the field before the colon is the minute within the hour.
    data['startmin'] = data['starttime'].str.split(':').str[0].astype(int)

    # Binary indicator: 1 for subscribers, 0 for everyone else.
    data['usertype_bin'] = np.where(data['usertype'] == 'Subscriber', 1, 0)

    # Drop the string columns (station names, timestamps, usertype).
    data = data.select_dtypes([np.number])

    # One-hot encode the categorical variables.
    gender_onehot = keras.utils.to_categorical(data['gender'])
    usertype_onehot = keras.utils.to_categorical(data['usertype_bin'])

    # Pull out the continuous predictors and standardize each column to mean 0, standard deviation 1.
    # You could also do this inside your model, e.g. with a Normalization() or BatchNormalization() layer.
    predictors_cont = data[['start station latitude', 'start station longitude',
                            'end station latitude', 'end station longitude',
                            'birth year', 'startmin']].to_numpy()
    predictors_cont = (predictors_cont - predictors_cont.mean(axis=0)) / predictors_cont.std(axis=0)

    # Put everything back together: the label in column 0, the predictors after it.
    data = np.concatenate((data[['tripduration']].to_numpy(), predictors_cont,
                           usertype_onehot, gender_onehot), axis=1)

    # Split into the labels vector and the matrix of predictors.
    labels = data[:, 0]
    predictors = data[:, 1:]

    return predictors, labels
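For comparison, here is a sketch of the same pre-processing using only pandas and NumPy, with no Keras dependency. The names `process_data_vectorized`, `one_hot`, and `CONT_COLS` are hypothetical, and `one_hot` is assumed to mirror `to_categorical`'s default of creating `max(value) + 1` columns. The output keeps the same column layout as `processData`: six standardized continuous predictors, then the usertype one-hot columns, then the gender one-hot columns.

```python
import numpy as np
import pandas as pd

CONT_COLS = ['start station latitude', 'start station longitude',
             'end station latitude', 'end station longitude',
             'birth year', 'startmin']

def one_hot(values):
    # Assumed to match to_categorical's default: one column per integer 0..max(values).
    values = np.asarray(values, dtype=int)
    return np.eye(values.max() + 1)[values]

def process_data_vectorized(data):
    """Hypothetical NumPy/pandas-only version of processData (no Keras needed)."""
    data = data.copy()  # leave the caller's DataFrame untouched
    # starttime is 'MM:SS.s'; the field before the colon is the minute within the hour.
    data['startmin'] = data['starttime'].str.split(':').str[0].astype(int)
    usertype_bin = (data['usertype'] == 'Subscriber').astype(int)

    cont = data[CONT_COLS].to_numpy(dtype=float)
    # Standardize each column; assumes no column is constant (std > 0).
    cont = (cont - cont.mean(axis=0)) / cont.std(axis=0)

    predictors = np.concatenate(
        [cont, one_hot(usertype_bin), one_hot(data['gender'])], axis=1)
    labels = data['tripduration'].to_numpy(dtype=float)
    return predictors, labels
```

Vectorized string operations avoid the per-row `data.loc[i, ...]` loop, which also silently assumed a default 0..n-1 index.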


In [4]:
bluebikes.head()

| Unnamed: 0 | tripduration | starttime | stoptime | start station id | start station name | start station latitude | start station longitude | end station id | end station name | end station latitude | end station longitude | bikeid | usertype | birth year | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1584 | 09:36.7 | 36:00.9 | 442 | Hyde Park Ave at Walk Hill St | 42.296067 | -71.116012 | 122 | Burlington Ave at Brookline Ave | 42.345733 | -71.100694 | 4587 | Subscriber | 1967 | 1 |
| 1 | 894 | 40:48.2 | 55:43.0 | 80 | MIT Stata Center at Vassar St / Main St | 42.362131 | -71.091156 | 144 | Rogers St & Land Blvd | 42.365758 | -71.076994 | 2340 | Subscriber | 1994 | 1 |
| 2 | 973 | 58:05.4 | 14:18.4 | 57 | Columbus Ave at Massachusetts Ave | 42.340543 | -71.081388 | 68 | Central Square at Mass Ave / Essex St | 42.36507 | -71.1031 | 2910 | Subscriber | 1994 | 1 |
| 3 | 606 | 46:45.0 | 56:51.4 | 149 | 175 N Harvard St | 42.363796 | -71.129164 | 221 | Verizon Innovation Hub 10 Ware Street | 42.372509 | -71.113054 | 4526 | Subscriber | 1992 | 1 |
| 4 | 428 | 49:27.9 | 56:36.7 | 426 | Surface Rd at Summer St | 42.352946 | -71.056564 | 420 | Charles St at Pinckney St | 42.358725 | -71.070795 | 3780 | Subscriber | 1989 | 1 |
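The five rows above suggest how the time columns are encoded: `starttime` and `stoptime` look like `MM:SS.s` (minutes and seconds past the hour, with the hour itself missing), and `tripduration` looks like elapsed time in seconds. This is an inference from these rows, not something the dataset documents. The sketch below checks it; the helper names are hypothetical, and trips of an hour or longer could not be recovered this way.

```python
def seconds_past_hour(stamp):
    """Convert an 'MM:SS.s' string (assumed format) to seconds past the hour."""
    minutes, seconds = stamp.split(':')
    return int(minutes) * 60 + float(seconds)

def implied_duration(start, stop):
    """Elapsed seconds between two 'MM:SS.s' stamps, assuming the trip is under one hour."""
    # The modulo wraps trips that cross the top of the hour, e.g. 58:05.4 -> 14:18.4.
    return (seconds_past_hour(stop) - seconds_past_hour(start)) % 3600

rows = [  # (starttime, stoptime, tripduration) copied from the head() output above
    ('09:36.7', '36:00.9', 1584),
    ('40:48.2', '55:43.0', 894),
    ('58:05.4', '14:18.4', 973),
    ('46:45.0', '56:51.4', 606),
    ('49:27.9', '56:36.7', 428),
]
for start, stop, duration in rows:
    print(start, stop, duration, round(implied_duration(start, stop), 1))
```

On these rows the implied durations land within one second of `tripduration` (for example, 09:36.7 to 36:00.9 is 1584.2 s). If that holds across the full file, `startmin` carries only the minute within the hour, not the time of day. Because `stoptime` is only known once the trip ends, it must not be used as a predictor.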


# *Specify Your Neural Network Architecture, Pre-Process Your Sample*

Calling the data pre-processing function on the sample.

In [None]:
predictors, labels = processData(bluebikes)
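Before training, it helps to hold out a validation set. Keras's `validation_split` argument takes the last fraction of rows before any shuffling, so if the CSV happens to be ordered, that hold-out may not be representative. A sketch of a shuffled split, using a hypothetical helper `train_val_split`:

```python
import numpy as np

def train_val_split(predictors, labels, val_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle rows once, then split off a validation set."""
    rng = np.random.default_rng(seed)      # fixed seed so the split is reproducible
    idx = rng.permutation(len(labels))     # one shared permutation keeps rows aligned
    n_val = int(round(len(labels) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (predictors[train_idx], labels[train_idx],
            predictors[val_idx], labels[val_idx])
```

Usage: `x_train, y_train, x_val, y_val = train_val_split(predictors, labels)`. Note that `processData` computes its means and standard deviations on the full sample before this split, so the validation rows slightly influence the scaling.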

Specifying my neural network's structure.

In [None]:
def build_model():

    # specify your model architecture here using the Keras sequential API
    # compile your model, specifying the loss and other metrics you might want to track, plus the optimizer

    return model
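One possible starting point, offered as a sketch rather than a recommended final design: a small fully connected regression network with a single linear output unit. Unlike the template's `build_model()`, this hypothetical version takes `n_features` (for example `predictors.shape[1]`) plus a few illustrative hyperparameters. It uses mean absolute error as the loss, which is less sensitive than squared error to any very long rentals and is reported in the same units as `tripduration`.

```python
import tensorflow as tf
from tensorflow import keras

def build_model(n_features, hidden_units=(64, 32), learning_rate=1e-3):
    """Sketch: a small dense regression network (layer sizes are illustrative)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        # One ReLU hidden layer per entry in hidden_units.
        *[keras.layers.Dense(units, activation='relu') for units in hidden_units],
        keras.layers.Dense(1),  # linear output: predicted trip duration
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss='mean_absolute_error',
        metrics=['mean_squared_error'],
    )
    return model
```

For example, `model = build_model(predictors.shape[1])`, then `model.summary()` to inspect the layers.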

# *Train Your Neural Network Here*

In [None]:
model = build_model()

history = model.fit() ## specify your data and other parameters here for model fit
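A sketch of a training call, assuming a validation set has already been held out as arrays `x_train`, `y_train`, `x_val`, `y_val`. The helper name and the default `epochs`, `batch_size`, and `patience` values are illustrative. `EarlyStopping` halts training once `val_loss` stops improving and, with `restore_best_weights=True`, rolls the model back to the weights from its best epoch when it stops early.

```python
from tensorflow import keras

def fit_with_early_stopping(model, x_train, y_train, x_val, y_val,
                            epochs=200, batch_size=64, patience=10):
    """Hypothetical helper: train with an explicit validation set and early stopping."""
    early_stop = keras.callbacks.EarlyStopping(
        monitor='val_loss',         # watch the validation loss each epoch
        patience=patience,          # epochs without improvement before stopping
        restore_best_weights=True,  # restore the best epoch's weights when stopping
    )
    return model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[early_stop],
        verbose=0,  # set to 1 to see per-epoch progress
    )
```

Usage: `history = fit_with_early_stopping(model, x_train, y_train, x_val, y_val)`. The returned object is the usual Keras `History`, so it can be plotted the same way as the output of a plain `model.fit(...)`.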

Plot your model performance over training here:

In [None]:
import matplotlib.pyplot as plt

# Build your plot.

plt.show()
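A minimal plotting helper, assuming `history` is the object returned by `model.fit(...)`. Keras stores per-epoch values in `history.history`, a dict keyed by names such as `'loss'` and, when validation data is supplied, `'val_loss'`. The function name `plot_history` is hypothetical.

```python
import matplotlib.pyplot as plt

def plot_history(history_dict, metric='loss'):
    """Plot training (and, if present, validation) curves from a History.history dict."""
    fig, ax = plt.subplots(figsize=(6, 4))
    epochs = range(1, len(history_dict[metric]) + 1)
    ax.plot(epochs, history_dict[metric], label=f'training {metric}')
    val_key = f'val_{metric}'
    if val_key in history_dict:  # only present when validation data was used
        ax.plot(epochs, history_dict[val_key], label=f'validation {metric}')
    ax.set_xlabel('epoch')
    ax.set_ylabel(metric)
    ax.legend()
    return fig, ax
```

Usage: `plot_history(history.history)` followed by `plt.show()`. A validation curve that flattens or rises while the training curve keeps falling is the usual sign of overfitting.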

# *Choose Final Configuration and Produce That Model Here:*

In [None]:
model = build_model()
model.fit(predictors, labels, epochs=80, batch_size=50)  # for example
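One caveat when applying the final model to new rentals: `processData` computes means and standard deviations from whatever DataFrame it receives, so scaling a single new rental would use that row's own statistics (a standard deviation of zero, giving NaN). A sketch of storing the training statistics and reusing them, with a hypothetical `Standardizer` class; scikit-learn's `StandardScaler` follows the same fit/transform pattern.

```python
import numpy as np

class Standardizer:
    """Hypothetical helper: remember column means/stds from training data and reuse them."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.mean_ = x.mean(axis=0)
        self.std_ = x.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # guard against constant columns
        return self

    def transform(self, x):
        # Always scale with the *training* statistics, never the new data's own.
        return (np.asarray(x, dtype=float) - self.mean_) / self.std_
```

Usage: fit it once on the six continuous training columns (`scaler = Standardizer().fit(train_cont)`), then call `scaler.transform(new_cont)` on each new rental before `model.predict`. The names `train_cont` and `new_cont` are placeholders.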

Here's what the resulting model looks like.

In [None]:
model.summary()
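To judge whether the final model is useful, compare its error against a constant baseline. For mean absolute error, the best constant prediction is the median, so a sketch with a hypothetical helper `baseline_mae`:

```python
import numpy as np

def baseline_mae(train_labels, eval_labels):
    """MAE of always predicting the training median duration (a constant baseline)."""
    constant = np.median(train_labels)  # the median minimizes absolute error for a constant
    return float(np.mean(np.abs(np.asarray(eval_labels) - constant)))
```

If the model's MAE on held-out data (for example from `model.evaluate(x_val, y_val)` with MAE as the loss) is not well below `baseline_mae(y_train, y_val)`, the features are adding little. Both numbers are in the units of `tripduration`.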