# TensorFlow Exercises: Linear Regression

## Scope
Use TensorFlow 2.0 and pandas to build a simple linear regression model, then test the results on a hold-out set.


## Load Data/Format Data

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
import sklearn

In [2]:
df = pd.read_csv("cars.csv")
df

Unnamed: 0,Dimensions.Height,Dimensions.Length,Dimensions.Width,Engine Information.Driveline,Engine Information.Engine Type,Engine Information.Hybrid,Engine Information.Number of Forward Gears,Engine Information.Transmission,Fuel Information.City mpg,Fuel Information.Fuel Type,Fuel Information.Highway mpg,Identification.Classification,Identification.ID,Identification.Make,Identification.Model Year,Identification.Year,Engine Information.Engine Statistics.Horsepower,Engine Information.Engine Statistics.Torque
0,140,143,202,All-wheel drive,Audi 3.2L 6 cylinder 250hp 236ft-lbs,True,6,6 Speed Automatic Select Shift,18,Gasoline,25,Automatic transmission,2009 Audi A3 3.2,Audi,2009 Audi A3,2009,250,236
1,140,143,202,Front-wheel drive,Audi 2.0L 4 cylinder 200 hp 207 ft-lbs Turbo,True,6,6 Speed Automatic Select Shift,22,Gasoline,28,Automatic transmission,2009 Audi A3 2.0 T AT,Audi,2009 Audi A3,2009,200,207
2,140,143,202,Front-wheel drive,Audi 2.0L 4 cylinder 200 hp 207 ft-lbs Turbo,True,6,6 Speed Manual,21,Gasoline,30,Manual transmission,2009 Audi A3 2.0 T,Audi,2009 Audi A3,2009,200,207
3,140,143,202,All-wheel drive,Audi 2.0L 4 cylinder 200 hp 207 ft-lbs Turbo,True,6,6 Speed Automatic Select Shift,21,Gasoline,28,Automatic transmission,2009 Audi A3 2.0 T Quattro,Audi,2009 Audi A3,2009,200,207
4,140,143,202,All-wheel drive,Audi 2.0L 4 cylinder 200 hp 207 ft-lbs Turbo,True,6,6 Speed Automatic Select Shift,21,Gasoline,28,Automatic transmission,2009 Audi A3 2.0 T Quattro,Audi,2009 Audi A3,2009,200,207
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5071,13,253,201,Front-wheel drive,Honda 3.5L 6 Cylinder 250 hp 253 ft-lbs,True,5,5 Speed Automatic,18,Gasoline,25,Automatic transmission,2012 Honda Pilot EX-L,Honda,2012 Honda Pilot,2012,250,253
5072,141,249,108,All-wheel drive,Lamborghini 5.2L 10 cylinder 552 hp 398 ft-lbs,True,6,6 Speed Manual,12,Gasoline,20,Manual transmission,2012 Lamborghini Gallardo Coupe LP 560-4,Lamborghini,2012 Lamborghini Gallardo Coup,2012,552,398
5073,160,249,108,All-wheel drive,Lamborghini 5.2L 10 cylinder 552 hp 398 ft-lbs,True,6,6 Speed Manual,12,Gasoline,20,Manual transmission,2012 Lamborghini Gallardo LP 560-4 Spyder,Lamborghini,2012 Lamborghini Gallardo Spyder,2012,552,398
5074,200,210,110,Rear-wheel drive,BMW 3.0L 6 cylinder 315hp 330 ft-lbs Turbo,True,6,6 Speed Automatic Select Shift,17,Gasoline,25,Automatic transmission,2012 BMW 740i Sedan,BMW,2012 BMW 7 Series,2012,315,330


TensorFlow works with arrays rather than DataFrames, so we select the columns we need and take their `values` attribute, which returns the underlying NumPy array.

In [3]:
indep_vars = df[[ "Identification.Year","Engine Information.Engine Statistics.Horsepower","Engine Information.Engine Statistics.Torque"]].values
dep_var = df["Fuel Information.City mpg"].values
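As a small illustration of this conversion, the sketch below uses a made-up three-row frame (the column names and values are hypothetical, not taken from `cars.csv`). It shows that `.values` on integer columns yields an integer array (`int64` on most platforms), and that casting to `float32` up front gives a dtype that float arithmetic in TensorFlow can use directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few columns of cars.csv.
toy = pd.DataFrame({
    "year": [2009, 2010, 2012],
    "horsepower": [250, 200, 315],
    "city_mpg": [18, 22, 17],
})

features = toy[["year", "horsepower"]].values   # 2-D NumPy array, integer dtype
labels = toy["city_mpg"].values                 # 1-D NumPy array, integer dtype

# Cast once so later float arithmetic does not mix integer and float data.
features_f32 = features.astype(np.float32)
labels_f32 = labels.astype(np.float32).reshape(-1, 1)  # column vector, shape (3, 1)

print(features.dtype, features_f32.dtype, labels_f32.shape)
```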

## Functions

In [4]:
# Mean squared error: the loss that is minimized to fit the model.
def mean_squared_error( Y , y_pred ):
    return tf.reduce_mean( tf.square( y_pred - Y ) )

# Derivative helper: the mean of 2 * ( y_pred - Y ) over the batch, returned as a 1x1 tensor.
def mean_squared_error_deriv( Y , y_pred ):
    return tf.reshape( tf.reduce_mean( 2 * ( y_pred - Y ) ) , [ 1 , 1 ] )

# Linear hypothesis: X @ weights + bias.
def h ( X , weights , bias ):
    return tf.tensordot( X , weights , axes=1 ) + bias
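
For reference, the gradients that these helpers feed into can be written out. This is the standard derivation rather than something stated in the notebook; the symbols $N$ (batch size), $X$ (the batch, one row per sample), $W$, $b$ and $\hat{y} = XW + b$ are notation introduced here.

$$
J = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2,
\qquad
\frac{\partial J}{\partial \hat{y}_i} = \frac{2}{N}(\hat{y}_i - y_i)
$$

$$
\frac{\partial J}{\partial W} = \frac{2}{N}\,X^{\top}(\hat{y} - y),
\qquad
\frac{\partial J}{\partial b} = \frac{2}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)
$$

Read against these formulas, `mean_squared_error_deriv` returns the single number $\frac{2}{N}\sum_i(\hat{y}_i - y_i)$, which matches $\partial J / \partial b$. The weight gradient additionally pairs each residual with its own row of $X$ before averaging, and has one entry per weight.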

## Fit Regression model

Initialize the hyperparameters for the model:

1. Use mini-batch gradient descent with batches of size 10.
2. Train the model for 100 epochs.
3. Use a step size (learning rate) of 0.001 for the gradient descent update.


In [5]:
indep_vars = tf.constant(indep_vars, dtype = tf.float32)
dep_vars = tf.constant(dep_var, dtype = tf.float32)

num_epochs = 100
num_samples = indep_vars.shape[0]
batch_size = 10
learning_rate = .001

dataset = tf.data.Dataset.from_tensor_slices(( indep_vars , dep_var )) 
dataset = dataset.shuffle( 500 ).repeat( num_epochs ).batch( batch_size )
iterator = dataset.__iter__()
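
To see what a pipeline like this yields, here is a toy sketch with 25 made-up samples instead of the real data (the sizes and values are assumptions chosen for illustration). Because `repeat` is applied before `batch`, batches are cut from the concatenated stream of all epochs, so a single batch can straddle an epoch boundary.

```python
import tensorflow as tf

# Toy stand-in: 25 samples with 3 features each (values are arbitrary).
toy_x = tf.reshape(tf.range(75, dtype=tf.float32), (25, 3))
toy_y = tf.range(25, dtype=tf.float32)

toy_epochs = 2
toy_batch_size = 10

toy_dataset = tf.data.Dataset.from_tensor_slices((toy_x, toy_y))
toy_dataset = toy_dataset.shuffle(25).repeat(toy_epochs).batch(toy_batch_size)

# 25 samples * 2 epochs = 50 elements, cut into 5 batches of 10.
batch_shapes = [(xb.shape.as_list(), yb.shape.as_list()) for xb, yb in toy_dataset]
print(len(batch_shapes), batch_shapes[0])
```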

In [6]:
num_features = indep_vars.shape[1]
weights = tf.random.normal((num_features,1))
bias = 0
epochs_plot = list()
loss_plot = list()

for i in range( num_epochs ) :
    
    epoch_loss = list()
    for b in range( int(num_samples/batch_size) ):
        x_batch , y_batch = iterator.get_next()
   
        output = h( x_batch , weights , bias ) 
        epoch_loss.append( mean_squared_error( y_batch , output ).numpy() )
    
        dJ_dH = mean_squared_error_deriv( y_batch , output)
        dH_dW = x_batch
        dJ_dW = tf.reduce_mean( dJ_dH * dH_dW )
        dJ_dB = tf.reduce_mean( dJ_dH )
    
        weights -= ( learning_rate * dJ_dW )
        bias -= ( learning_rate * dJ_dB ) 
        
    loss = np.array( epoch_loss ).mean()
    epochs_plot.append( i + 1 )
    loss_plot.append( loss ) 
    
    print( 'Loss is {}'.format( loss ) ) 

InvalidArgumentError: cannot compute Sub as input #1(zero-based) was expected to be a float tensor but is a int64 tensor [Op:Sub]
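
The error message points at the subtraction inside `mean_squared_error`: input #1 is `Y`, and it arrives as an `int64` tensor. Looking at the cells above, the dataset is built from the NumPy array `dep_var` rather than the `float32` tensor `dep_vars`, which would explain the mismatch (this is an inference from the code, not something the notebook states). The sketch below reproduces the situation on made-up values and shows one way around it, casting the labels before subtracting.

```python
import tensorflow as tf

preds = tf.constant([[18.5], [21.0]], dtype=tf.float32)  # made-up predictions
labels_int = tf.constant([[18], [22]], dtype=tf.int64)    # integer labels

try:
    tf.square(preds - labels_int)
    mismatch_raised = False
except (tf.errors.InvalidArgumentError, TypeError, ValueError):
    # Mixed float32/int64 arithmetic is rejected by default;
    # the exact exception type can differ between TensorFlow versions.
    mismatch_raised = True

# Casting the labels to float32 lets the subtraction go through.
labels_f32 = tf.cast(labels_int, tf.float32)
mse = tf.reduce_mean(tf.square(preds - labels_f32))
print(mismatch_raised, float(mse))
```

In the notebook itself, passing `dep_vars` to `from_tensor_slices` should have the same effect as the cast.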

## Review Regression Coefficients

## Test Regression Results

Example of the use of `tf.GradientTape`:

In general we might derive the analytic derivative by hand, but that is often infeasible or undesirable. Instead, we can have TensorFlow compute the derivative by automatic differentiation (backpropagation).

The main tool for this is `tf.GradientTape()`. It is a context manager that records the operations executed inside it, so that the gradient of a result with respect to some inputs can be computed afterwards.

A simple example: get the derivative of a quadratic expression:

In [7]:
z = tf.Variable(3.0)
with tf.GradientTape() as tape:
    w = z**2 + 3*z + 2
    
dy_dx = tape.gradient(w, z)
dy_dx.numpy()

9.0

We can actually add a third layer to this! Let's add something annoying and arbitrary:

In [8]:
z = tf.Variable(3.0)
with tf.GradientTape() as tape:
    w = z**2 + 3*z + 2
    second_level = 4**w+2**z+3*z*w+1
    
    
dy_dx = tape.gradient(second_level,w, z)
dy_dx.numpy()

4572740300000.0
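
One caveat about the call above, based on the documented signature `tape.gradient(target, sources, output_gradients=None, ...)`: the third positional argument is `output_gradients`, not a second source. The value printed is therefore the derivative of `second_level` with respect to `w`, scaled by `z` (3.0 here), rather than a derivative with respect to both. If the intent was to get both derivatives, a sketch along these lines passes the sources as a list:

```python
import math
import tensorflow as tf

z = tf.Variable(3.0)
with tf.GradientTape() as tape:
    w = z**2 + 3*z + 2                      # w = 20 at z = 3
    second_level = 4**w + 2**z + 3*z*w + 1

# One call, two sources: the result is a list in the same order.
d_dw, d_dz = tape.gradient(second_level, [w, z])

# Hand-computed derivative with respect to w, holding z fixed: ln(4) * 4**w + 3*z.
expected_d_dw = math.log(4) * 4**20 + 3 * 3.0
print(d_dw.numpy(), expected_d_dw)
```

Three times `expected_d_dw` is about 4.57e12, which agrees with the output shown above.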

We can still differentiate only the first level, `w` with respect to `z`:

In [9]:
z = tf.Variable(3.0)
with tf.GradientTape() as tape:
    w = z**2 + 3*z + 2
    second_level = 4**w+2**z+3*z*w+1
    
    
dy_dx = tape.gradient(w, z)
dy_dx.numpy()

9.0

The same approach extends to matrix-valued variables. The cell below sets up a small linear layer and evaluates its forward pass:

In [10]:
w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]
print( x @ w + b)

tf.Tensor([[-8.954019   -0.97417885]], shape=(1, 2), dtype=float32)
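
Extending the cell above, the tape can differentiate a scalar loss with respect to both `w` and `b` in one call. In this sketch the loss (the mean of the squared outputs) is an arbitrary choice for illustration, and `tf.ones` replaces the random initial weights so that the result is reproducible.

```python
import tensorflow as tf

w = tf.Variable(tf.ones((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    y = x @ w + b                 # shape (1, 2); both entries are 6.0 here
    loss = tf.reduce_mean(y**2)   # arbitrary scalar loss

# Gradients come back in the same order as the sources and share their shapes.
dl_dw, dl_db = tape.gradient(loss, [w, b])
print(dl_dw.shape, dl_db.shape)
```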


Finally, let's calculate the gradient of the MSE:

In [78]:
Y = tf.constant([[2., 2., 3.]])
y_pred = tf.Variable([1,2,3])


def mean_squared_error_deriv( Y , y_pred ):
    return tf.reshape( tf.reduce_mean( 2 * ( y_pred - Y ) ) , [ 1 , 1 ] )    



In [86]:
mean_squared_error_deriv(Y = [2., 2., 3.], y_pred = [1,2,3])

TypeError: unsupported operand type(s) for -: 'list' and 'list'
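
The `TypeError` comes from Python itself: `y_pred - Y` is evaluated on two plain lists before any TensorFlow op is reached, and lists do not support subtraction. A minimal way around it, assuming the same helper as above, is to wrap the inputs in float tensors first:

```python
import tensorflow as tf

def mean_squared_error_deriv(Y, y_pred):
    return tf.reshape(tf.reduce_mean(2 * (y_pred - Y)), [1, 1])

Y = tf.constant([2., 2., 3.])
y_pred = tf.constant([1., 2., 3.])   # floats, so both operands share a dtype

result = mean_squared_error_deriv(Y=Y, y_pred=y_pred)
print(result.numpy())                # a 1x1 array holding mean(2 * (y_pred - Y))
```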

In [20]:
def mean_squared_error( Y , y_pred ):
    return tf.reduce_mean( tf.square( y_pred - Y ) )

 

tf.Tensor([0. 0. 0. 0.], shape=(4,), dtype=float32)


In [25]:
Y = tf.constant([2.0,2.0,2.0,2.0],dtype = 'float32')
y_pred = tf.Variable([1.0,2.0,2.0,2.0],dtype = 'float32')    
with tf.GradientTape() as tape:
    mse = tf.reduce_mean( tf.square( y_pred - Y ) )
grads = tape.gradient(mse, y_pred)
print(grads)

tf.Tensor([-0.5  0.   0.   0. ], shape=(4,), dtype=float32)


In [83]:
x_one = tf.constant([[1., 2., 3.]])
x_two = tf.constant([[2., 2., 3.]])

with tf.GradientTape() as tape:
    end_res = mean_squared_error(Y = x_one, y_pred = x_two)
    
hold = tape.gradient(end_res, x_one,x_two)

In [85]:
print(hold)

None
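
The `None` result has a likely explanation: tensors created with `tf.constant` are not tracked by the tape unless `tape.watch` is called on them, and the third positional argument of `tape.gradient` is `output_gradients` rather than a second source. A sketch that addresses both points:

```python
import tensorflow as tf

def mean_squared_error(Y, y_pred):
    return tf.reduce_mean(tf.square(y_pred - Y))

x_one = tf.constant([[1., 2., 3.]])
x_two = tf.constant([[2., 2., 3.]])

with tf.GradientTape() as tape:
    tape.watch([x_one, x_two])   # constants must be watched explicitly
    end_res = mean_squared_error(Y=x_one, y_pred=x_two)

# Sources are passed as one list; the gradients come back in the same order.
grad_one, grad_two = tape.gradient(end_res, [x_one, x_two])
print(grad_one.numpy(), grad_two.numpy())
```

Using `tf.Variable` instead of `tf.constant` would remove the need for `tape.watch`, since trainable variables are watched automatically.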


In [38]:
Y

<tf.Tensor: shape=(4060, 1), dtype=float32, numpy=
array([[12.],
       [15.],
       [26.],
       ...,
       [18.],
       [12.],
       [21.]], dtype=float32)>

In [37]:
output

<tf.Tensor: shape=(50, 1), dtype=float32, numpy=
array([[16.263437 ],
       [22.87446  ],
       [14.933083 ],
       [20.99522  ],
       [15.14272  ],
       [16.178055 ],
       [12.628096 ],
       [20.890894 ],
       [25.122879 ],
       [28.302065 ],
       [16.058205 ],
       [28.302065 ],
       [12.562002 ],
       [11.213196 ],
       [19.673159 ],
       [12.153124 ],
       [12.203455 ],
       [14.6565075],
       [20.27971  ],
       [17.736618 ],
       [17.38979  ],
       [24.154129 ],
       [23.883762 ],
       [15.776989 ],
       [19.673159 ],
       [17.310509 ],
       [19.5929   ],
       [19.088959 ],
       [11.469494 ],
       [17.738205 ],
       [22.306158 ],
       [16.26185  ],
       [14.69145  ],
       [24.155716 ],
       [23.974167 ],
       [23.560411 ],
       [22.71113  ],
       [12.094118 ],
       [ 9.052976 ],
       [ 9.455627 ],
       [14.382736 ],
       [14.933083 ],
       [17.419239 ],
       [19.089584 ],
       [12.20187  ],
      

In [44]:
#y_pred = tf.Variable([1.0,2.0,3.0,4.0],dtype = tf.float32)
#Y = tf.Variable( [1.0,2.0,3.0,4.0], dtype=tf.float32 ) 
output = tf.Variable( output, dtype=tf.float32 ) 
y_batch = tf.Variable( y_batch, dtype=tf.float32 ) 
with tf.GradientTape() as tape:
    mse_val = tf.reduce_mean( tf.square( output - y_batch ) )
    
tape.gradient(mse_val,output)

<tf.Tensor: shape=(50, 1), dtype=float32, numpy=
array([[-0.06946251],
       [ 0.2749784 ],
       [-0.12267669],
       [ 0.03980881],
       [-0.19429119],
       [-0.03287781],
       [-0.5748762 ],
       [ 0.35563576],
       [ 0.40491515],
       [ 0.65208256],
       [ 0.16232818],
       [ 0.53208256],
       [-0.2975199 ],
       [-0.47147214],
       [ 0.26692635],
       [-0.31387505],
       [-0.3518618 ],
       [-0.0537397 ],
       [ 0.45118842],
       [ 0.06946472],
       [ 0.01559158],
       [ 0.56616515],
       [ 0.43535048],
       [-0.00892044],
       [ 0.42692634],
       [ 0.01242035],
       [ 0.26371595],
       [ 0.00355835],
       [-0.38122025],
       [ 0.0695282 ],
       [ 0.37224633],
       [-0.06952599],
       [-0.01234199],
       [ 0.48622862],
       [ 0.43896666],
       [ 0.42241645],
       [ 0.2684452 ],
       [-0.35623527],
       [-0.757881  ],
       [-0.6217749 ],
       [-0.10469055],
       [-0.12267669],
       [ 0.13676956],
     

In [41]:
mean_squared_error_deriv( y_batch , output)

<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.48277283]], dtype=float32)>
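
The two results above are related. The tape returns one entry per prediction, `2 * (output_i - y_i) / N`, while `mean_squared_error_deriv` returns the mean of `2 * (output_i - y_i)`, which is the sum of those entries. The NumPy sketch below checks this on four made-up values (the numbers are assumptions, not taken from the batch shown):

```python
import numpy as np

y_true = np.array([[12.], [15.], [26.], [18.]])   # made-up labels
y_hat = np.array([[16.], [14.], [25.], [21.]])    # made-up predictions
n = y_true.shape[0]

per_element = 2.0 * (y_hat - y_true) / n          # per-prediction gradient, shape (4, 1)
collapsed = np.mean(2.0 * (y_hat - y_true))       # single number, as in mean_squared_error_deriv

print(per_element.ravel(), collapsed)
```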

In [50]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

data = pd.read_csv( 'cars.csv' )
data.head()

continuous_features = data[ [ "Identification.Year","Engine Information.Engine Statistics.Horsepower","Engine Information.Engine Statistics.Torque"] ].values / 100 
#categorical_research_features = data[ [ 'Research' ] ].values 

X = np.concatenate( [ continuous_features ] , axis=1 )
Y = data[ [ 'Fuel Information.City mpg' ] ].values

train_features , test_features ,train_labels, test_labels = train_test_split( X , Y , test_size=0.2 )

X = tf.constant( train_features , dtype=tf.float32 )
Y = tf.constant( train_labels , dtype=tf.float32 ) 
                                                          
test_X = tf.constant( test_features , dtype=tf.float32 ) 
test_Y = tf.constant( test_labels , dtype=tf.float32 ) 

def mean_squared_error( Y , y_pred ):
    return tf.reduce_mean( tf.square( y_pred - Y ) )
# Analytic gradient of the mean squared error:
def mean_squared_error_deriv( Y , y_pred ):
    return tf.reshape( tf.reduce_mean( 2 * ( y_pred - Y ) ) , [ 1 , 1 ] )    

# The gradient obtained by backpropagation is computed with tf.GradientTape inside the training loop below.

# Apply the matrix multiplication to the vector of betas. Add a bias term instead of appending a column of ones to the design matrix.
def h ( X , weights , bias ):
    return tf.tensordot( X , weights , axes=1 ) + bias

# Arbitrary choices. Note to self: How to optimize these for performance/convergence?
num_epochs = 100
num_samples = X.shape[0]
batch_size = 1
learning_rate = 0.001

# The tf.data.Dataset call below makes the data available within TensorFlow and allows transformations to be applied to it.
dataset = tf.data.Dataset.from_tensor_slices(( X , Y )) 
dataset = dataset.shuffle( 500 ).repeat( num_epochs ).batch( batch_size )
iterator = dataset.__iter__()

num_features = X.shape[1]
weights = tf.random.normal( ( num_features , 1 ) )
bias = 0

epochs_plot = list()
loss_plot = list()

for i in range( num_epochs ) :
    
    epoch_loss = list()
    for b in range( int(num_samples/batch_size) ):
        x_batch , y_batch = iterator.get_next()
   
        output = h( x_batch , weights , bias ) 
        epoch_loss.append( mean_squared_error( y_batch , output ).numpy() )
    
    
        output = tf.Variable( output, dtype=tf.float32 ) 
        y_batch = tf.Variable( y_batch, dtype=tf.float32 ) 
        with tf.GradientTape() as tape:
            mse_val = tf.reshape( tf.reduce_mean( tf.square( output - y_batch ) )  , [ 1 , 1 ] )    
    
        dJ_dH = tape.gradient(mse_val,output)
    
        # Analytic alternative:
        # dJ_dH = mean_squared_error_deriv( y_batch , output )
        # print( dJ_dH )
        
        
        dH_dW = x_batch
        dJ_dW = tf.reduce_mean( dJ_dH * dH_dW )
        dJ_dB = tf.reduce_mean( dJ_dH )
    
        weights -= ( learning_rate * dJ_dW )
        bias -= ( learning_rate * dJ_dB ) 
        
    loss = np.array( epoch_loss ).mean()
    epochs_plot.append( i + 1 )
    loss_plot.append( loss ) 
    
    print( 'Loss is {}'.format( loss ) ) 

Loss is 81.42101287841797
Loss is 72.92327117919922
Loss is 69.03453063964844
Loss is 63.12940216064453
Loss is 58.98978805541992
Loss is 55.490230560302734
Loss is 51.039493560791016
Loss is 47.757041931152344
Loss is 44.857513427734375
Loss is 41.416664123535156
Loss is 39.01887130737305
Loss is 36.765480041503906
Loss is 34.76766586303711
Loss is 32.530059814453125
Loss is 30.165863037109375
Loss is 29.10077667236328
Loss is 27.17170524597168
Loss is 26.260286331176758
Loss is 24.8105411529541
Loss is 23.622365951538086
Loss is 23.105907440185547
Loss is 21.522336959838867
Loss is 20.898897171020508
Loss is 20.23624038696289
Loss is 19.429529190063477
Loss is 18.677927017211914
Loss is 17.74614715576172
Loss is 17.66293716430664
Loss is 17.229711532592773
Loss is 16.668922424316406
Loss is 16.346702575683594
Loss is 15.848626136779785
Loss is 15.572113990783691
Loss is 15.224883079528809
Loss is 14.993885040283203
Loss is 14.937765121459961
Loss is 14.286945343017578
Loss is 14.1529

In [26]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

data = pd.read_csv( 'cars.csv' )
data.head()

continuous_features = data[ [ "Identification.Year","Engine Information.Engine Statistics.Horsepower","Engine Information.Engine Statistics.Torque"] ].values / 100 
#categorical_research_features = data[ [ 'Research' ] ].values 

X = np.concatenate( [ continuous_features ] , axis=1 )
Y = data[ [ 'Fuel Information.City mpg' ] ].values

train_features , test_features ,train_labels, test_labels = train_test_split( X , Y , test_size=0.2 )

X = tf.constant( train_features , dtype=tf.float32 )
Y = tf.constant( train_labels , dtype=tf.float32 ) 
                                                          
test_X = tf.constant( test_features , dtype=tf.float32 ) 
test_Y = tf.constant( test_labels , dtype=tf.float32 ) 

def mean_squared_error( Y , y_pred ):
    return tf.reduce_mean( tf.square( y_pred - Y ) )
# Analytic gradient of the mean squared error:
def mean_squared_error_deriv( Y , y_pred ):
    return tf.reshape( tf.reduce_mean( 2 * ( y_pred - Y ) ) , [ 1 , 1 ] )    

# Apply the matrix multiplication to the vector of betas. Add a bias term instead of appending a column of ones to the design matrix.
def h ( X , weights , bias ):
    return tf.tensordot( X , weights , axes=1 ) + bias

# Arbitrary choices. Note to self: How to optimize these for performance/convergence?
num_epochs = 1000
num_samples = X.shape[0]
batch_size = 50
learning_rate = 0.001

# The tf.data.Dataset call below makes the data available within TensorFlow and allows transformations to be applied to it.
dataset = tf.data.Dataset.from_tensor_slices(( X , Y )) 
dataset = dataset.shuffle( 500 ).repeat( num_epochs ).batch( batch_size )
iterator = dataset.__iter__()

num_features = X.shape[1]
weights = tf.random.normal( ( num_features , 1 ) )
bias = 0

epochs_plot = list()
loss_plot = list()

for i in range( num_epochs ) :
    
    epoch_loss = list()
    for b in range( int(num_samples/batch_size) ):
        x_batch , y_batch = iterator.get_next()
   
        output = h( x_batch , weights , bias ) 
        epoch_loss.append( mean_squared_error( y_batch , output ).numpy() )
    
        dJ_dH = mean_squared_error_deriv( y_batch , output)
        dH_dW = x_batch
        dJ_dW = tf.reduce_mean( dJ_dH * dH_dW )
        dJ_dB = tf.reduce_mean( dJ_dH )
    
        weights -= ( learning_rate * dJ_dW )
        bias -= ( learning_rate * dJ_dB ) 
        
    loss = np.array( epoch_loss ).mean()
    epochs_plot.append( i + 1 )
    loss_plot.append( loss ) 
    
    print( 'Loss is {}'.format( loss ) ) 

Loss is 39.903804779052734
Loss is 39.946510314941406
Loss is 39.99775314331055
Loss is 39.776546478271484
Loss is 40.196739196777344
Loss is 39.92945098876953
Loss is 39.87966537475586
Loss is 40.149742126464844
Loss is 39.769683837890625
Loss is 39.77928161621094
Loss is 40.140892028808594
Loss is 39.89653396606445
Loss is 40.01445388793945
Loss is 39.97110366821289
Loss is 40.02952575683594
Loss is 39.89756774902344
Loss is 39.82274627685547
Loss is 40.13584899902344
Loss is 39.74961853027344
Loss is 39.86760711669922
Loss is 40.324928283691406
Loss is 40.04932403564453
Loss is 39.83456039428711
Loss is 39.68816375732422
Loss is 40.21446228027344
Loss is 39.97736740112305
Loss is 39.53959655761719
Loss is 40.21420669555664
Loss is 39.90022277832031
Loss is 39.67402267456055
Loss is 40.03942108154297
Loss is 40.137638092041016
Loss is 40.00418472290039
Loss is 39.95606231689453
Loss is 39.80842208862305
Loss is 40.097686767578125
Loss is 39.55204391479492
Loss is 40.20629119873047
...

Loss is 39.93193435668945
Loss is 39.53889846801758
Loss is 39.09602737426758
Loss is 39.99235153198242
Loss is 39.48787307739258
Loss is 39.66982650756836
Loss is 40.08349609375
Loss is 38.80995178222656
Loss is 40.04288101196289
Loss is 39.75446319580078
Loss is 40.139095306396484
Loss is 39.021793365478516
Loss is 39.8248176574707
Loss is 39.70288848876953
Loss is 39.34148406982422
Loss is 39.530250549316406
Loss is 39.7408447265625
Loss is 39.78431701660156
Loss is 39.443695068359375
Loss is 39.72011947631836
Loss is 39.36939239501953
Loss is 39.64442825317383
Loss is 39.44346618652344
Loss is 39.77347183227539
Loss is 39.38665771484375
Loss is 39.49243927001953
Loss is 39.62996292114258
Loss is 39.47174072265625
Loss is 39.65428924560547
Loss is 39.87224578857422
Loss is 39.567588806152344
Loss is 39.120426177978516
Loss is 40.540924072265625
Loss is 39.04827880859375
Loss is 39.81692123413086
Loss is 39.457298278808594
Loss is 39.70067596435547
Loss is 39.38643264770508
...

Loss is 39.40446090698242
Loss is 39.182350158691406
Loss is 39.92704391479492
Loss is 38.75687789916992
Loss is 39.31351089477539
Loss is 39.35899353027344
Loss is 39.30873107910156
Loss is 39.43804168701172
Loss is 39.07075119018555
Loss is 38.9127197265625
Loss is 39.432132720947266
Loss is 39.15298080444336
Loss is 39.442806243896484
Loss is 39.33467102050781
Loss is 39.22709274291992
Loss is 39.13779830932617
Loss is 39.054508209228516
Loss is 40.03168869018555
Loss is 38.64311599731445
Loss is 39.50849533081055
Loss is 38.761756896972656
Loss is 39.96044158935547
Loss is 38.768367767333984
Loss is 39.53134536743164
Loss is 39.246158599853516
Loss is 39.15071487426758
Loss is 39.05927658081055
Loss is 39.15628433227539
Loss is 39.89268112182617
Loss is 39.01598358154297
Loss is 38.89332962036133
Loss is 39.615631103515625
Loss is 38.965335845947266
Loss is 39.47858810424805
Loss is 39.04080581665039
Loss is 39.37290954589844
Loss is 39.014835357666016
Loss is 39.169681549072266
...

Loss is 39.15685272216797
Loss is 38.491241455078125
Loss is 38.88188171386719
Loss is 39.051239013671875
Loss is 38.782169342041016
Loss is 38.70978546142578
Loss is 39.13188552856445
Loss is 39.137603759765625
Loss is 38.87211608886719
Loss is 38.99895477294922
Loss is 38.467063903808594
Loss is 38.9178466796875
Loss is 39.29508972167969
Loss is 38.740699768066406
Loss is 38.890926361083984
Loss is 38.6588249206543
Loss is 39.27604293823242
Loss is 39.026432037353516
Loss is 38.77635955810547
Loss is 39.020050048828125
Loss is 38.90385055541992
Loss is 38.69395446777344
Loss is 39.28723907470703
Loss is 38.88602828979492
Loss is 38.60912322998047
Loss is 39.15458297729492
Loss is 38.28508377075195
Loss is 39.48484802246094
Loss is 38.66511917114258
Loss is 38.60062789916992
Loss is 39.04462814331055
Loss is 38.8216552734375
Loss is 38.96780776977539
Loss is 38.362972259521484
Loss is 39.38046646118164
Loss is 38.73121643066406
Loss is 38.847835540771484
Loss is 38.66238784790039
...
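
In this run the loss decreases only slowly (from roughly 39.9 to roughly 38.7 in the output shown), whereas the `GradientTape` run with `batch_size = 1` fell steadily. Part of the difference is simply that a batch size of 50 gives about 50 times fewer updates per epoch. Separately, in both loops `tf.reduce_mean( dJ_dH * dH_dW )` collapses the weight gradient to a single number, so every weight receives the same update. For comparison, the NumPy sketch below keeps one gradient entry per weight, `(2/N) * X.T @ (y_hat - y)`, and runs full-batch gradient descent on synthetic data; the data, seed, and hyperparameters are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: 200 samples, 3 features, known coefficients.
X = rng.normal(size=(200, 3))
true_w = np.array([[2.0], [-1.0], [0.5]])
y = X @ true_w + 3.0 + 0.1 * rng.normal(size=(200, 1))

w = np.zeros((3, 1))
b = 0.0
learning_rate = 0.05
losses = []

for epoch in range(200):
    y_hat = X @ w + b
    residual = y_hat - y
    losses.append(float(np.mean(residual**2)))

    grad_w = 2.0 * X.T @ residual / len(X)   # shape (3, 1): one entry per weight
    grad_b = 2.0 * float(np.mean(residual))  # scalar: derivative with respect to the bias

    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(losses[0], losses[-1])
print(w.ravel(), b)
```

Whether the same change would help on the cars data is untested here; feature scaling and the learning rate would likely matter as well.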

In [None]:
import matplotlib.pyplot as plt
plt.plot( epochs_plot , loss_plot ) 
plt.show()
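
The stated scope includes testing on a hold-out set, which the cells above stop short of. A minimal sketch of that step follows, written in NumPy so it runs on its own. The fitted `weights` and `bias` and the three hold-out rows are hypothetical values, and the comparison against a predict-the-mean baseline is a suggestion added here, not part of the notebook.

```python
import numpy as np

def h(X, weights, bias):
    return X @ weights + bias

def mean_squared_error(Y, y_pred):
    return float(np.mean((y_pred - Y) ** 2))

# Hypothetical fitted parameters and a made-up hold-out set of three rows.
# Features follow the notebook's scaling (year, horsepower, torque, each divided by 100).
weights = np.array([[0.5], [-2.0], [-1.0]])
bias = 20.0
test_X = np.array([[20.09, 2.50, 2.36],
                   [20.10, 2.00, 2.07],
                   [20.12, 3.15, 3.30]])
test_Y = np.array([[18.], [22.], [17.]])

test_pred = h(test_X, weights, bias)
test_mse = mean_squared_error(test_Y, test_pred)

# Baseline: always predict the mean of the hold-out labels.
baseline_mse = mean_squared_error(test_Y, np.full_like(test_Y, test_Y.mean()))
print(test_mse, baseline_mse)
```

In the notebook, the same two helper calls could be applied to `test_X` and `test_Y` with the trained `weights` and `bias`.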
