<div style="text-align:right">Mario Stanke, University of Greifswald, Germany</div>  

# Linear Regression, Bike Sharing Demand
## Solution using TensorFlow 2

The biggest difference: we need to know neither the gradient nor a closed-form solution. TensorFlow computes the gradient for us by automatic differentiation.
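For reference, these are the two things we can skip. They are written for the error $E$ minimized below, with the $m \times 2$ design matrix $X$ (a column of ones and the rescaled temperature) and the target vector $y$. The gradient is what `tape.gradient` computes for us. The closed form is the normal-equation solution and assumes $X^\top X$ is invertible:

```latex
E(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(x_i^\top\theta - y_i\right)^2 = \frac{1}{2m}\lVert X\theta - y\rVert^2,
\qquad
\nabla_\theta E = \frac{1}{m}X^\top\left(X\theta - y\right),
\qquad
\theta^\ast = \left(X^\top X\right)^{-1}X^\top y.
```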

In [2]:
import tensorflow as tf
import numpy as np
import pandas as pd

In [3]:
# load the training data
df = pd.read_csv("bikes-summerdays.tbl", sep=r'\s+') # data frame; columns are separated by whitespace
print(df.head())
print(df.dtypes)
df['count'] = df['count'].astype(float) # convert count data to floats as regression target
m = df.shape[0] # training set size
print("m =", m)
meancount = np.mean(df['count'])
print("mean count = ", meancount)

    temp  count
0  13.12    173
1  13.12     75
2  13.12     89
3  13.94     95
4  14.76    110
temp     float64
count      int64
dtype: object
m = 1482
mean count =  311.0


### Prepare training data as NumPy arrays

In [4]:
target = df.pop('count') # remove the to-be-predicted variable from the data frame; WARNING: raises a KeyError if called again
# features x: all remaining columns of the data frame, here only temp

X = np.ones((m, 2)) # column 0 stays 1: it multiplies the intercept theta[0]
X[:,1:2] = np.array(df.values) / 10 - 2.5 # normalize temperatures: temp / 10 - 2.5
y = np.array(target.values)
print(X[0:5,])
print(y[0:5])

[[ 1.    -1.188]
 [ 1.    -1.188]
 [ 1.    -1.188]
 [ 1.    -1.106]
 [ 1.    -1.024]]
[173.  75.  89.  95. 110.]
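To make the layout of `X` explicit, here is a minimal NumPy sketch of the same construction on the five temperatures shown by `df.head()`. It uses its own variable names (`temps_head`, `X_head`, chosen for illustration) so it does not overwrite `X` from the notebook.

```python
import numpy as np

# The first five temperatures printed by df.head() above
temps_head = np.array([13.12, 13.12, 13.12, 13.94, 14.76])

# Design matrix: column 0 is all ones (multiplies the intercept),
# column 1 holds the rescaled temperature temp / 10 - 2.5
X_head = np.ones((temps_head.shape[0], 2))
X_head[:, 1] = temps_head / 10 - 2.5

print(X_head)
```

The printed rows should match the first five rows of `X` shown above.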


## Solution 1: Low-Level

In [5]:
# model parameters (weights)
theta = tf.Variable([[2000.], [3000.]], dtype=tf.float64, name="weights")
# a better starting value for theta would be [[meancount], [0]]

In [6]:
# construct the model and its error function E(theta)
def mean_squared_error():
    pred = tf.linalg.matmul(X, theta)
    pred = tf.reshape(pred, [m]) # reshape from m x 1 matrix to vector
    E = tf.reduce_sum((pred - y)**2) / 2 / m
    return E
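Computing this gradient by hand gives $\nabla E = X^\top(X\theta - y)/m$. The NumPy sketch below checks that formula against central finite differences; `tape.gradient` instead gives the exact gradient by automatic differentiation, without any formula from us. Assumptions: synthetic toy data rather than the bike data, and hypothetical helper names `error`, `error_grad` and `numerical_grad`.

```python
import numpy as np

def error(Xm, yv, th):
    """E(theta) = sum((X @ theta - y)**2) / (2m), as in mean_squared_error() above."""
    r = Xm @ th - yv
    return np.sum(r ** 2) / 2 / Xm.shape[0]

def error_grad(Xm, yv, th):
    """Hand-derived gradient of E: X^T (X theta - y) / m."""
    return Xm.T @ (Xm @ th - yv) / Xm.shape[0]

def numerical_grad(f, th, h=1e-4):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(th)
    for j in range(th.size):
        e = np.zeros_like(th)
        e[j] = h
        g[j] = (f(th + e) - f(th - e)) / (2 * h)
    return g

# Synthetic toy data (NOT the bike data): intercept column plus one feature
rng = np.random.default_rng(0)
X_toy = np.column_stack([np.ones(50), rng.normal(size=50)])
y_toy = 300 + 100 * X_toy[:, 1] + rng.normal(scale=10, size=50)
theta0 = np.array([2000.0, 3000.0])  # same starting point as theta above

g_analytic = error_grad(X_toy, y_toy, theta0)
g_numeric = numerical_grad(lambda t: error(X_toy, y_toy, t), theta0)
print(g_analytic)
print(g_numeric)
```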

In [7]:
N = 100
alpha = 0.5 # learning rate
for i in range(N):
    with tf.GradientTape() as tape:
        E = mean_squared_error()
    grad = tape.gradient(E, theta) # let tf compute the derivative of E with respect to theta
    theta.assign(theta - alpha * grad) # gradient descent update
    if i % 10 == 0:
        print("error after {} iterations: {}".format(i, E))

print("final error = ", mean_squared_error().numpy())  
print("theta after optimization:\n", theta.numpy())  

error after 0 iterations: 2281157.3448043186
error after 10 iterations: 91310.51605397751
error after 20 iterations: 19156.243921148453
error after 30 iterations: 14007.496586335401
error after 40 iterations: 13640.089430363834
error after 50 iterations: 13613.871787603943
error after 60 iterations: 13612.000934571513
error after 70 iterations: 13611.86743320753
error after 80 iterations: 13611.857906743458
error after 90 iterations: 13611.857226948776
final error =  13611.857178439612
theta after optimization:
 [[315.48111677]
 [115.54233945]]
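Without TensorFlow, the same loop needs the gradient written out by hand. A NumPy sketch, assuming synthetic toy data with one roughly standardized feature (not the bike data), compares the gradient-descent result with the closed-form least-squares solution from `np.linalg.lstsq`:

```python
import numpy as np

# Synthetic toy data (NOT the bike data), shaped like X and y above
rng = np.random.default_rng(1)
m_toy = 200
X_toy = np.column_stack([np.ones(m_toy), rng.normal(size=m_toy)])
y_toy = 300 + 100 * X_toy[:, 1] + rng.normal(scale=20, size=m_toy)

theta_gd = np.array([2000.0, 3000.0])  # same poor starting point as in Solution 1
alpha_toy = 0.5                        # same learning rate as in Solution 1
for i in range(100):
    grad = X_toy.T @ (X_toy @ theta_gd - y_toy) / m_toy  # hand-derived gradient of E
    theta_gd = theta_gd - alpha_toy * grad

# Closed-form least-squares solution for comparison
theta_ls, *_ = np.linalg.lstsq(X_toy, y_toy, rcond=None)
print(theta_gd)
print(theta_ls)
```

On the bike data, the analogous check is to compare the `theta` printed above with `np.linalg.lstsq(X, y, rcond=None)[0]`.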


## Solution 2: High-Level
This approach, which builds on predefined model classes, is recommended for most applications.


In [8]:
optimizer = tf.keras.optimizers.SGD(learning_rate=.5)
# SGD: stochastic gradient descent
loss_object = tf.keras.losses.MeanSquaredError()
# mean of the squared residuals: this is 2x the error E defined in class (and in Solution 1)
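Keras's `MeanSquaredError` averages the squared residuals, $\frac{1}{m}\sum_i (y_i - \hat y_i)^2$, without the factor $\frac{1}{2}$ used in $E$. A quick NumPy check with the first five counts from above and made-up predictions (`pred_toy` is purely illustrative):

```python
import numpy as np

y_head = np.array([173.0, 75.0, 89.0, 95.0, 110.0])     # first five counts printed above
pred_toy = np.array([180.0, 90.0, 80.0, 100.0, 120.0])  # made-up predictions
n = y_head.shape[0]

mse_toy = np.mean((y_head - pred_toy) ** 2)       # what MeanSquaredError computes
E_toy = np.sum((pred_toy - y_head) ** 2) / 2 / n  # the error E from Solution 1
print(mse_toy, E_toy)  # prints 96.0 48.0
```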

In [9]:
dataset = tf.data.Dataset.from_tensor_slices((X, y))
# a tf.data.Dataset provides commonly used functions for random shuffling, batching and taking subsets,
# and for iterating over large data sets on disk, e.g. many images stored as TFRecord files
dataset = dataset.shuffle(m).batch(m) # random order; use the whole dataset as a single batch
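Because `batch(m)` puts all $m$ examples into one batch, shuffling only reorders the summands of the gradient, not its value, so each epoch below is one full-batch gradient-descent step, as in Solution 1. With a smaller batch size, e.g. `batch(32)`, the same loop would become true minibatch SGD. A NumPy sketch on synthetic toy data, with a random permutation standing in for `shuffle` (the helper `batch_grad` is hypothetical):

```python
import numpy as np

# Synthetic toy data (NOT the bike data)
rng = np.random.default_rng(2)
n_toy = 100
X_toy = np.column_stack([np.ones(n_toy), rng.normal(size=n_toy)])
y_toy = 300 + 100 * X_toy[:, 1] + rng.normal(scale=20, size=n_toy)
theta_toy = np.array([2000.0, 3000.0])

def batch_grad(Xb, yb, th):
    """Gradient of E on one batch: Xb^T (Xb theta - yb) / batch size."""
    return Xb.T @ (Xb @ th - yb) / Xb.shape[0]

perm = rng.permutation(n_toy)  # a random order, standing in for dataset.shuffle(m)
g_ordered = batch_grad(X_toy, y_toy, theta_toy)
g_shuffled = batch_grad(X_toy[perm], y_toy[perm], theta_toy)  # batch(m): one batch with all rows
print(g_ordered)
print(g_shuffled)
```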

In [10]:
# get a predefined linear model with one single output variable and one weight per input
model = tf.keras.layers.Dense(1, use_bias=False, # bias would be redundant: column 0 of X is all ones
                              kernel_initializer=tf.constant_initializer([2000, 3000]))
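`Dense(1, use_bias=False)` computes `inputs @ kernel` with a kernel of shape (2, 1), which is exactly $X\theta$ from Solution 1. Because column 0 of `X` is all ones, the first kernel entry already plays the role of a bias. A NumPy sketch of this equivalence (no TensorFlow; the names `w` and `b` are illustrative):

```python
import numpy as np

temps_toy = np.array([-1.188, -1.106, -1.024])   # rescaled temperatures from above
X_toy = np.column_stack([np.ones(3), temps_toy])  # intercept column + feature
kernel_toy = np.array([[2000.0], [3000.0]])       # shape (2, 1), like the Dense kernel

# Dense(1, use_bias=False) computes inputs @ kernel
pred_no_bias = X_toy @ kernel_toy

# Same predictions from a weight w and a bias b applied to the raw feature only
w, b = 3000.0, 2000.0
pred_with_bias = temps_toy[:, None] * w + b

print(pred_no_bias.ravel())
print(pred_with_bias.ravel())
```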

In [11]:
# one gradient descent step
def train_step(features, labels):
    with tf.GradientTape() as tape:
        pred = model(features)
        pred = tf.reshape(pred, [-1]) # reshape from batch_size x 1 matrix to vector
        E = loss_object(labels, pred)

    grads = tape.gradient(E, model.trainable_variables)
    # this makes a parameter update using the gradient
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return E
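`train_step` applies one plain SGD update, $w \leftarrow w - \eta\,\nabla \text{loss}$ (`tf.keras.optimizers.SGD` with its default momentum of 0). Since the Keras loss is $2E$, its gradient is $2\nabla E$, so `learning_rate=0.5` here takes the same steps as $\alpha = 1$ would in Solution 1, twice Solution 1's $\alpha = 0.5$. Both runs head for the same least-squares minimum, which is why the final errors agree. A NumPy sketch of one step on synthetic toy data (not the bike data):

```python
import numpy as np

# Synthetic toy data (NOT the bike data)
rng = np.random.default_rng(3)
n_toy = 100
X_toy = np.column_stack([np.ones(n_toy), rng.normal(size=n_toy)])
y_toy = 300 + 100 * X_toy[:, 1] + rng.normal(scale=20, size=n_toy)
theta_toy = np.array([2000.0, 3000.0])

residual = X_toy @ theta_toy - y_toy
grad_E = X_toy.T @ residual / n_toy          # gradient of E = sum(r**2) / (2m)
grad_mse = 2 * (X_toy.T @ residual) / n_toy  # gradient of MSE = sum(r**2) / m = 2E

# Plain SGD update (momentum 0): w <- w - learning_rate * gradient
step_keras = theta_toy - 0.5 * grad_mse  # Solution 2 with learning_rate=0.5
step_alpha1 = theta_toy - 1.0 * grad_E   # Solution 1 with alpha = 1
print(step_keras)
print(step_alpha1)
```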

In [12]:
for epoch in range(100):
    # loop over batches of dataset
    # Here there is just a single big batch, for comparability with Solution 1.
    for (batch, (f, t)) in enumerate(dataset):
        # batch: running batch number
        # f: batch_size x 2 feature matrix (rows of X)
        # t: batch labels (entries of y)
        E = train_step(f, t)

In [13]:
# final loss computed on all training data
pred = model(X)
pred = tf.reshape(pred, [m])
E = loss_object(y, pred)
print("final error = ", E.numpy() / 2) # divide by 2 for comparability with custom loss function above
print(model.trainable_variables[0].numpy())

final error =  13611.857421875
[[315.48083425]
 [115.53685644]]
