# Python Library: TensorFlow 2.0

**Run this from tf env, with tensorflow 2.0+**

In [1]:
import tensorflow as tf

In [2]:
tf.__version__

'2.0.0'

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 0. Introduction

TensorFlow is an open-source library for graph-based numerical computation.  
It has both low-level and high-level APIs.  
It can be used to perform addition, multiplication and differentiation.  
It can be used to train ML models.  

Important changes in TensorFlow 2.0:  
Eager execution is now enabled by default, which allows users to write simpler and more intuitive code.  
Model building is now centered around the Keras and Estimators high-level APIs.  

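As a minimal sketch of what eager execution means in practice, the snippet below checks that eager mode is on and shows that an operation returns a concrete value straight away, with no separate session step. It assumes a TensorFlow 2.x installation; the variable names are illustrative only.

```python
import tensorflow as tf

# In TensorFlow 2.x eager execution is on by default
eager_enabled = tf.executing_eagerly()

# The result of an operation is available immediately as a tensor
total = tf.add(1, 2)
total_value = int(total.numpy())

print(eager_enabled)
print(total_value)
```

Because the result is an ordinary tensor with a value, it can be inspected with .numpy() right after the call, which is the style used in the rest of this notebook.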


In [4]:
# 1D tensor with a single element, shape (1,); a true 0D (scalar) tensor would be tf.ones(())
d0 = tf.ones((1,))
d0

<tf.Tensor: id=5, shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>

In [5]:
# 1D Tensor
d1 = tf.ones((2,))
d1

<tf.Tensor: id=8, shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>

In [6]:
# 2D Tensor
d2 = tf.ones((2,2))
d2

<tf.Tensor: id=11, shape=(2, 2), dtype=float32, numpy=
array([[1., 1.],
       [1., 1.]], dtype=float32)>

In [7]:
# 3D Tensor
d3 = tf.ones((2,2,2))
d3

<tf.Tensor: id=14, shape=(2, 2, 2), dtype=float32, numpy=
array([[[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]]], dtype=float32)>

In [8]:
# Print the 3D tensor
print(d3.numpy())

[[[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]]


### 0.1 Defining constants in TensorFlow

A constant is the simplest category of tensor.      
A constant does not change and cannot be trained.      

It can, however, have any dimension.      


In [9]:
from tensorflow import constant

# Define a 2x3 constant tensor of 3s.
a = constant(3, shape=[2,3])

# define a 2x2 tensor, which is constructed from the 1D tensor: 1,2,3,4
b = constant([1,2,3,4],shape=[2,2])

In [10]:
print(a.numpy())
print(b.numpy())

[[3 3 3]
 [3 3 3]]
[[1 2]
 [3 4]]


In [12]:
# some convenient ways of defining constant tensors
input_tensor = constant([1,2,3,4],shape=[2,2])

a = tf.constant([1,2,3])

b = tf.zeros([2,2])

c = tf.zeros_like(input_tensor)

d = tf.ones([2,2])

e = tf.ones_like(input_tensor)

f = tf.fill([3,3],7)

print(a.numpy())
print(b.numpy())
print(c.numpy())
print(d.numpy())
print(e.numpy())
print(f.numpy())

[1 2 3]
[[0. 0.]
 [0. 0.]]
[[0 0]
 [0 0]]
[[1. 1.]
 [1. 1.]]
[[1 1]
 [1 1]]
[[7 7 7]
 [7 7 7]
 [7 7 7]]


### 0.2 Defining and initialising variables

Unlike a constant, a variable's value can change during computation. The value of a variable is shared, persistent and modifiable.         
Its data type and shape are fixed.         



In [13]:
# Define a variable
a0 = tf.Variable([1,2,3,4,5,6], dtype=tf.float32)
a1 = tf.Variable([1,2,3,4,5,6], dtype=tf.int16)

In [14]:
# Define a constant
b = tf.constant(2,tf.float32)

In [15]:
# compute their product
c0 = tf.multiply(a0,b)
c1 = a0*b

In [16]:
print(c0.numpy())
print(c1.numpy())

[ 2.  4.  6.  8. 10. 12.]
[ 2.  4.  6.  8. 10. 12.]

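The claim that a variable's value can change while its data type and shape stay fixed can be sketched as follows. The variable v and its values are made up for illustration; assign() and assign_add() are the in-place update methods of tf.Variable, and the exception types caught are an assumption about how the shape mismatch is reported.

```python
import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float32)

# assign() replaces the stored value in place
v.assign([4.0, 5.0, 6.0])

# assign_add() increments the stored value in place
v.assign_add([1.0, 1.0, 1.0])
updated = v.numpy().tolist()

# The shape is fixed: assigning a value of a different shape is rejected
try:
    v.assign([1.0, 2.0])
    shape_error = False
except (ValueError, tf.errors.InvalidArgumentError):
    shape_error = True

print(updated)
print(shape_error)
```

A constant has no assign() method at all, which is the practical difference between the two kinds of tensor.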

Defining data as constants:        

After importing constant, you can use it to transform a NumPy array, credit_numpy, into a TensorFlow constant, credit_constant. This array contains feature columns from a dataset on credit card holders. We will return to this dataset in later chapters.  

Note that TensorFlow 2.0 allows you to use data as either a NumPy array or a TensorFlow constant object. Using a constant will ensure that any operations performed with that object are done in TensorFlow.

In [4]:
credit_numpy = np.array([[ 2.0000e+00,  2.4000e+01,  1.0000e+00,  3.9130e+03],
       [ 2.0000e+00,  2.6000e+01,  2.0000e+00,  2.6820e+03],
       [ 2.0000e+00,  3.4000e+01,  2.0000e+00,  2.9239e+04],
       [ 2.0000e+00,  3.7000e+01,  2.0000e+00,  3.5650e+03],
       [ 3.0000e+00,  4.1000e+01,  1.0000e+00, -1.6450e+03],
       [ 2.0000e+00,  4.6000e+01,  1.0000e+00,  4.7929e+04]])

In [5]:
# Import constant from TensorFlow
from tensorflow import constant

# Convert the credit_numpy array into a tensorflow constant
credit_constant = constant(credit_numpy)

# Print constant datatype
print('The datatype is:', credit_constant.dtype)

# Print constant shape
print('The shape is:', credit_constant.shape)

The datatype is: <dtype: 'float64'>
The shape is: (6, 4)


## 1. Basic Operations of TensorFlow

TensorFlow has a model of computation that revolves around the use of graphs. A TensorFlow graph contains edges and nodes, where the edges are tensors and the nodes are operations.  

The add() operation performs element-wise **addition** with two tensors.  
Element-wise addition requires both tensors to have the same shape (or shapes that can be broadcast to a common shape).  
The add() operation is overloaded, which means that we can also perform addition using the plus symbol.  


In [4]:
# addition operator
from tensorflow import constant, add

# Define single-element tensors of shape (1,); a true 0-dimensional scalar would be constant(1)
A0 = constant([1])
B0 = constant([2])

# Define 1-dimensional tensors
A1 = constant([1,2])
B1 = constant([3,4])

# Define 2-dimensional tensors
A2 = constant([[1,2],[3,4]])
B2 = constant([[5,6],[7,8]])

# Perform tensor addition with add()
C0 = add(A0,B0) # single-element addition
C1 = add(A1,B1) # vector addition
C2 = add(A2,B2) # matrix addition

print(C0.numpy())
print(C1.numpy())
print(C2.numpy())

[3]
[4 6]
[[ 6  8]
 [10 12]]

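The overloaded plus symbol is not exercised in the cell above, so here is a small sketch that compares it with tf.add(). It also shows broadcasting, where a tensor of shape (2,) is added to every row of a 2x2 tensor. The tensors A, B and row are illustrative values, not part of the original exercise.

```python
import tensorflow as tf

A = tf.constant([[1, 2], [3, 4]])
B = tf.constant([[5, 6], [7, 8]])

# add() and the + operator give the same element-wise result
sum_op = tf.add(A, B)
sum_plus = A + B
same = bool(tf.reduce_all(tf.equal(sum_op, sum_plus)).numpy())

# Broadcasting: the shape-(2,) tensor is added to each row of A
row = tf.constant([10, 20])
broadcast_sum = (A + row).numpy().tolist()

print(same)
print(broadcast_sum)
```

The operator form is usually easier to read, while the function form makes it explicit that a TensorFlow operation is being called.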

**Element-wise multiplication** is performed using the multiply() operation.  
The tensors involved must have the same shape.  
**Matrix multiplication** is performed with the matmul() operation.  
The matmul(A,B) operation multiplies A by B.  
The number of columns of A must equal the number of rows of B.  


In [9]:
from tensorflow import ones, matmul, multiply

# Define tensors
A0 = ones(1)
A31 = ones([3,1])
A34 = ones([3,4])
A43 = ones([4,3])

ex1 = multiply(A0,A0)
ex2 = multiply(A31,A31)
ex3 = multiply(A34,A34)

print(ex1.numpy())
print(ex2.numpy())
print(ex3.numpy())

[1.]
[[1.]
 [1.]
 [1.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

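The cell above imports matmul and defines A43 but only runs multiply(). A minimal sketch of the matrix-multiplication rule, reusing the same tensor names, might look like this; the exception types caught for the mismatched case are an assumption about how the error surfaces in eager mode.

```python
import tensorflow as tf

A34 = tf.ones([3, 4])
A43 = tf.ones([4, 3])

# (3x4) times (4x3): inner dimensions match, result is 3x3
product = tf.matmul(A34, A43)
product_shape = product.shape.as_list()

# (3x4) times (3x4): columns of the first do not match rows of the second
try:
    tf.matmul(A34, A34)
    mismatch_error = False
except (tf.errors.InvalidArgumentError, ValueError):
    mismatch_error = True

print(product.numpy())
print(mismatch_error)
```

Each entry of the product is a sum of four ones, so the result is a 3x3 matrix of 4s, whereas multiply(A34, A34) would keep the 3x4 shape.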

Summing over tensor dimensions:          

The reduce_sum() operator sums over the dimensions of a tensor.      
reduce_sum(A) sums over all dimensions of A.      
reduce_sum(A,i) sums over dimension i.       
In each case we reduce the size of the tensor by summing over one of its dimensions.

In [12]:
from tensorflow import ones, reduce_sum

# Define a 2x3x4 tensor of ones
A = ones([2,3,4])

# Sum over all dimensions
B = reduce_sum(A)

# Sum over dimension 0,1,2
B0 = reduce_sum(A,0) # 3x4 matrix of 2s
B1 = reduce_sum(A,1) # 2x4 matrix of 3s
B2 = reduce_sum(A,2) # 2x3 matrix of 4s

print(A.numpy())
print(B.numpy())
print(B0.numpy())
print(B1.numpy())
print(B2.numpy())

[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
24.0
[[2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]]
[[3. 3. 3. 3.]
 [3. 3. 3. 3.]]
[[4. 4. 4.]
 [4. 4. 4.]]


In [14]:
from tensorflow import ones_like

# Define tensors A1 and A23 as constants
A1 = constant([1, 2, 3, 4])
A23 = constant([[1, 2, 3], [1, 6, 4]])

# Define B1 and B23 to have the correct shape
B1 = ones_like(A1)
B23 = ones_like(A23)

# Perform element-wise multiplication
C1 = multiply(A1,B1)
C23 = multiply(A23,B23)

# Print the tensors C1 and C23
print('C1: {}'.format(C1.numpy()))
print('C23: {}'.format(C23.numpy()))

C1: [1 2 3 4]
C23: [[1 2 3]
 [1 6 4]]


In [15]:
# Define features, params, and bill as constants
features = constant([[2, 24], [2, 26], [2, 57], [1, 37]])
params = constant([[1000], [150]])
bill = constant([[3913], [2682], [8617], [64400]])

# Compute billpred using features and params
billpred = matmul(features,params)

# Compute and print the error
error = bill - billpred
print(error.numpy())

[[-1687]
 [-3218]
 [-1933]
 [57850]]


In [16]:
wealth = constant([[11, 50], [7, 2], [4, 60], [3, 0],[25,10]])

wealth_all = reduce_sum(wealth)
# Sum over dimensions 0 and 1
wealth0 = reduce_sum(wealth,0) # column sums: vector of length 2
wealth1 = reduce_sum(wealth,1) # row sums: vector of length 5


print(wealth.numpy())
print(wealth_all.numpy())
print(wealth0.numpy())
print(wealth1.numpy())


[[11 50]
 [ 7  2]
 [ 4 60]
 [ 3  0]
 [25 10]]
172
[ 50 122]
[61  9 64  3 35]


## 2. Advanced operations

gradient():         
Computes the slope of a function at a point.       

reshape():        
Reshape a tensor (e.g. 10x10 to 100x1)         

random():  
Populates a tensor with entries drawn from a probability distribution.  


**Finding the optimum (gradient())**        

In many ML problems, we will want to find the optimum of a function, e.g. minimise the loss function or maximise the objective function.  

Minimum: lowest value of a loss function.  
Maximum: highest value of an objective function.  

We can do this using the gradient() operation.  
We pass points to the gradient operation until we find one where the gradient is 0; that point is an **optimum**.  
Minimum: change in gradient > 0 (the slope is increasing through the point).  
Maximum: change in gradient < 0 (the slope is decreasing through the point).  




In [17]:
# Define x
x = tf.Variable(-1.0)

# Define y to be x^2 within an instance of GradientTape.
# tape.watch(x) tells the tape to track x so that we can compute the rate of
# change of y with respect to x (trainable variables are watched automatically,
# so the call is only strictly needed for constants).
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.multiply(x,x)
    
# Evaluate the gradient of y at x = -1, using the GradientTape instance
g = tape.gradient(y,x)
print(g.numpy())

# slope is -2 at x = -1

-2.0

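The rule "minimum: change in gradient > 0" can be checked numerically by differentiating twice. The sketch below nests two GradientTape contexts for the same y = x^2 function, evaluated at x = 0; the nesting pattern and the names inner and outer are choices made for this illustration.

```python
import tensorflow as tf

x = tf.Variable(0.0)

# The outer tape records the computation of the first derivative,
# so it can be differentiated again
with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = tf.multiply(x, x)
    first = inner.gradient(y, x)
second = outer.gradient(first, x)

first_value = float(first.numpy())
second_value = float(second.numpy())

print(first_value)
print(second_value)
```

The first derivative is 0 at x = 0 and the second derivative is positive, so under the rule above the point is a minimum.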

**Images as tensor (reshape())**        

A grayscale image has a natural representation as a matrix with values between 0 and 255. Some algorithms require us to reshape matrices into vectors before using them as inputs.

In [19]:
# generate grayscale image
gray = tf.random.uniform([2,2],maxval=255, dtype="int32")

# reshape grayscale image
gray = tf.reshape(gray, [2*2,1])

print(gray.numpy())
# generate a color image
color = tf.random.uniform([2,2,3],maxval=255, dtype="int32")

# reshape color image
color = tf.reshape(color,[2*2,3])
print(color.numpy())

[[221]
 [232]
 [206]
 [111]]
[[105 109  38]
 [145 170 187]
 [ 94 225  87]
 [ 50 136 170]]


You are given a loss function, y = x^2, which you want to minimize. You can do this by computing the slope with tf.GradientTape() at different values of x. If the slope is positive, you can decrease the loss by lowering x. If it is negative, you can decrease it by increasing x. This is how gradient descent works.

In [21]:
def compute_gradient(x0):
    # Define x as a variable with an initial value of x0
    x = tf.Variable(x0)
    with tf.GradientTape() as tape:
        tape.watch(x)
        # Define y using the multiply operation
        y = tf.multiply(x, x)
    # Return the gradient of y with respect to x
    return tape.gradient(y, x).numpy()

# Compute and print gradients at x = -1, 1, and 0
print(compute_gradient(-1.0))
print(compute_gradient(1.0))
print(compute_gradient(0.0))

-2.0
2.0
0.0

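The slopes printed above only say which way to move x. A minimal sketch of the full gradient descent loop, which repeatedly steps x against the slope, could look like this. The learning rate of 0.1 and the 50 iterations are arbitrary choices for the illustration.

```python
import tensorflow as tf

x = tf.Variable(-1.0)
learning_rate = 0.1  # assumed step size, chosen for illustration

for step in range(50):
    with tf.GradientTape() as tape:
        y = tf.multiply(x, x)
    slope = tape.gradient(y, x)
    # Move x in the direction that lowers y
    x.assign_sub(learning_rate * slope)

final_x = float(x.numpy())
print(final_x)
```

Starting from x = -1 the slope is negative, so each update increases x, and the value settles close to the minimum at x = 0.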

## 3. Linear Model

### 3.1 Input data

When we train an ML model, we will want to import data from an external source. Numeric data will need to be assigned a type, and text and image data will need to be converted into a usable format.  

External datasets can be imported using TensorFlow, which is useful for complex data pipelines. We can also use pandas to import the data and then convert it into a NumPy array, which we can use without further modification in TensorFlow.  
A dataset will typically contain columns with different data types.

In [27]:
housing = pd.read_csv("../Machine_Learning_basics/data/kc_house_data.csv")
housing.head()

| | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 20141013T000000 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | 0 | 0 | ... | 7 | 1180 | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 20141209T000000 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | 0 | 0 | ... | 7 | 2170 | 400 | 1951 | 1991 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 20150225T000000 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | 0 | 0 | ... | 6 | 770 | 0 | 1933 | 0 | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 20141209T000000 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | 0 | 0 | ... | 7 | 1050 | 910 | 1965 | 0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 20150218T000000 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | 0 | 0 | ... | 8 | 1680 | 0 | 1987 | 0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


In [28]:
housing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
id               21613 non-null int64
date             21613 non-null object
price            21613 non-null float64
bedrooms         21613 non-null int64
bathrooms        21613 non-null float64
sqft_living      21613 non-null int64
sqft_lot         21613 non-null int64
floors           21613 non-null float64
waterfront       21613 non-null int64
view             21613 non-null int64
condition        21613 non-null int64
grade            21613 non-null int64
sqft_above       21613 non-null int64
sqft_basement    21613 non-null int64
yr_built         21613 non-null int64
yr_renovated     21613 non-null int64
zipcode          21613 non-null int64
lat              21613 non-null float64
long             21613 non-null float64
sqft_living15    21613 non-null int64
sqft_lot15       21613 non-null int64
dtypes: float64(5), int64(15), object(1)
memory usage: 3.5+ MB


In [26]:
#convert to numpy array
housing_arr = np.array(housing)

In [None]:
# setting data types with NumPy arrays

# convert the price column to float32
price = np.array(housing["price"],np.float32)

# convert the waterfront column to Boolean
# waterfront is a NumPy array after conversion
# (np.bool_ works across NumPy versions; the plain np.bool alias was deprecated in NumPy 1.20)
waterfront = np.array(housing["waterfront"], np.bool_)


In [None]:
# setting data types using cast in TensorFlow

# convert the price column to float32
price = tf.cast(housing["price"],tf.float32)

# convert the waterfront column to Boolean
# waterfront is a tf.Tensor after conversion
waterfront = tf.cast(housing["waterfront"],tf.bool)



In [29]:
# Import numpy and tensorflow with their standard aliases
import tensorflow as tf
import numpy as np

# Use a numpy array to define price as a 32-bit float
price = np.array(housing['price'], np.float32)

# Define waterfront as a Boolean using cast
waterfront = tf.cast(housing['waterfront'], tf.bool)

# Print price and waterfront
print(price)
print(waterfront)

[221900. 538000. 180000. ... 402101. 400000. 325000.]
tf.Tensor([False False False ... False False False], shape=(21613,), dtype=bool)

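Section 3.1 mentions that TensorFlow itself can import data for more complex pipelines, without showing how. One possible sketch uses tf.data.Dataset.from_tensor_slices() to wrap arrays and batch() to group rows. The five price and sqft_living values are copied from the housing.head() preview, and the batch size of 2 is an arbitrary choice for the illustration.

```python
import numpy as np
import tensorflow as tf

# First five rows of price and sqft_living from the preview above
price = np.array([221900.0, 538000.0, 180000.0, 604000.0, 510000.0], np.float32)
size = np.array([1180.0, 2570.0, 770.0, 1960.0, 1680.0], np.float32)

# Pair each feature with its target, then group the pairs into batches of 2
dataset = tf.data.Dataset.from_tensor_slices((size, price)).batch(2)

batch_sizes = []
for size_batch, price_batch in dataset:
    batch_sizes.append(int(size_batch.shape[0]))

print(batch_sizes)
```

The last batch is smaller than the others because five rows do not divide evenly by two.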

### 3.2 Loss functions

We need loss functions to train models because they tell us how well our model explains the data.  
A loss function is a measure of model fit: without this feedback, it is unclear how to adjust model parameters during the training process.  
A high loss value indicates that the model fit is poor, so we typically want to minimise the loss function. In some cases we might want to maximise a function instead; we can always place a minus sign before the function we want to maximise and minimise that instead.  

TensorFlow has operations for common loss functions (for linear model):       
1) Mean squared error (MSE)           
2) Mean absolute error (MAE)        
3) Huber error           

Loss functions are accessible from the tf.keras.losses module:  
tf.keras.losses.mse()         
tf.keras.losses.mae()      
tf.keras.losses.Huber()       

MSE:       
Strongly penalise outliers      
High sensitivity near minimum       

MAE:    
Scales linearly with size of error      
Low sensitivity near minimum  

Huber:    
Similar to MSE near minimum      
Similar to MAE away from minimum  

For greater sensitivity near the minimum, use MSE or Huber.      
To minimise the impact of outliers, use MAE or Huber.          

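To make the comparison concrete, the sketch below evaluates the three losses on four made-up target and prediction pairs, three of which are exact and one of which is off by 10. It uses the class-based forms tf.keras.losses.MeanSquaredError, MeanAbsoluteError and Huber (with its default delta) as an assumption about which entry points are available in your installed version.

```python
import tensorflow as tf

# Illustrative data: the last prediction is an outlier with an error of 10
targets = tf.constant([[1.0], [2.0], [3.0], [4.0]])
predictions = tf.constant([[1.0], [2.0], [3.0], [14.0]])

mse = float(tf.keras.losses.MeanSquaredError()(targets, predictions).numpy())
mae = float(tf.keras.losses.MeanAbsoluteError()(targets, predictions).numpy())
huber = float(tf.keras.losses.Huber()(targets, predictions).numpy())

# MSE squares the single large error, so it is far larger than MAE or Huber
print(mse, mae, huber)
```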
In many cases, the training process will require us to supply a function that accepts our model's variables and data and returns a loss.

In [None]:
# Define a loss function

# compute the MSE loss.
# need two tensors to compute it: the actual values or target
# predicted values or predictions

loss = tf.keras.losses.mse(targets, predictions)

In [None]:
# define a linear regression model
def linear_regression(intercept, slope = slope, features = features):
    return intercept + features*slope

# define a loss function to compute the MSE
def loss_function(intercept, slope, targets = targets, features = features):
    # compute the predictions for a linear model, passing features through
    predictions = linear_regression(intercept, slope, features)
    
    # return the loss
    return tf.keras.losses.mse(targets, predictions)

In [None]:
# compute the loss for test data inputs
loss_function(intercept, slope, test_targets, test_features)

In [None]:
# Initialize a variable named scalar
scalar = tf.Variable(1.0, dtype=tf.float32)

# Define the model
def model(scalar, features = features):
    return scalar * features

# Define a loss function
def loss_function(scalar, features = features, targets = targets):
    # Compute the predicted values
    predictions = model(scalar, features)

    # Return the mean absolute error loss
    return tf.keras.losses.mae(targets, predictions)

# Evaluate the loss function and print the loss
print(loss_function(scalar).numpy())

### 3.3 Linear Regression

A linear regression model assumes a linear relationship (univariate regression):       
price = intercept + size * slope + error (to predict house price with size of house)      



In [39]:
# Linear regression in TensorFlow

# Define the targets and features
price = np.array(housing["price"],np.float32)
size = np.array(housing["sqft_living"],np.float32)

# Define the intercept and slope, initialisation
intercept = tf.Variable(0.1, dtype=np.float32)
slope = tf.Variable(0.1, dtype=np.float32)

# define a model: linear regression
def linear_regression(intercept, slope, features = size):
    return intercept + features*slope

# compute the predictions and the loss
def loss_function(intercept, slope, targets = price, features = size):
    predictions = linear_regression(intercept, slope, features)
    return tf.keras.losses.mse(targets, predictions)

# Define an optimisation operation
# learning rate = 0.5
opt = tf.keras.optimizers.Adam(0.5)

# minimise the loss function and print the loss
for j in range(1000):
    opt.minimize(lambda: loss_function(intercept,slope), var_list=[intercept,slope])
    if j % 100 == 0:
        print(loss_function(intercept, slope).numpy())
    
# print the trained parameter
print(intercept.numpy(), slope.numpy())

424840500000.0
305881000000.0
219757840000.0
160266650000.0
121212770000.0
97023190000.0
82984410000.0
75397860000.0
71600530000.0
69847540000.0
246.97383 253.7275

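To see what the optimizer step does internally, the same kind of univariate model can be trained with an explicit tf.GradientTape and plain gradient descent instead of the Adam optimizer used above. This is only a sketch: the data is synthetic and noise-free (targets = 2 + 3 * features), and the learning rate and number of steps are assumptions picked so that this small problem converges.

```python
import numpy as np
import tensorflow as tf

# Synthetic data (assumption): an exact line with intercept 2 and slope 3
features = tf.constant(np.linspace(0.0, 1.0, 50).astype(np.float32))
targets = 2.0 + 3.0 * features

intercept = tf.Variable(0.1, dtype=tf.float32)
slope = tf.Variable(0.1, dtype=tf.float32)
learning_rate = 0.5

def loss_function(intercept, slope, targets, features):
    # Mean squared error of the linear model's predictions
    predictions = intercept + features * slope
    return tf.reduce_mean(tf.square(targets - predictions))

for step in range(500):
    with tf.GradientTape() as tape:
        loss = loss_function(intercept, slope, targets, features)
    grad_intercept, grad_slope = tape.gradient(loss, [intercept, slope])
    # Step each parameter against its gradient
    intercept.assign_sub(learning_rate * grad_intercept)
    slope.assign_sub(learning_rate * grad_slope)

print(intercept.numpy(), slope.numpy())
```

On this toy data the trained parameters end up close to the intercept and slope that generated it. The features here lie between 0 and 1, which is part of why a fixed step size works; unscaled inputs such as square footage would likely need rescaling or an adaptive optimizer.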

### 3.4 Batch training

We can use batch training to handle large datasets.  

If the dataset is too large to fit in memory, we can instead divide it into batches and train on those batches sequentially.  

A single pass over all of the batches is called an epoch, and the process itself is called batch training.  
Batch training also allows us to update model weights and optimizer parameters after each batch, rather than at the end of the epoch.  

pd.read_csv() allows us to load data in batches, which avoids loading the entire dataset at once.  
We can do this by using the chunksize parameter to provide the batch size.  

Full sample:       
1) One update per epoch         
2) Accepts datasets w/o modification       
3) Limited by memory      

Batch Training:       
1) Multiple updates per epoch       
2) Requires division of dataset      
3) No limit on dataset size          

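How chunksize divides a file can be checked on a tiny in-memory CSV before pointing it at a large one. In this sketch the five rows are taken from the housing preview, io.StringIO stands in for a file on disk, and the chunk size of 2 is an arbitrary choice.

```python
import io
import pandas as pd

# Small in-memory CSV standing in for the housing file
csv_text = (
    "price,sqft_living\n"
    "221900,1180\n"
    "538000,2570\n"
    "180000,770\n"
    "604000,1960\n"
    "510000,1680\n"
)

# With chunksize set, read_csv yields one DataFrame per batch
chunk_lengths = []
for batch in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_lengths.append(len(batch))

print(chunk_lengths)
```

Only one chunk is held in memory at a time, which is what removes the limit on dataset size.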


In [None]:
# load data in batches

for batch in pd.read_csv("../Machine_Learning_basics/data/kc_house_data.csv", chunksize=100):
    #extract price column
    price = np.array(batch["price"],np.float32)
    
    #extract size column (the dataset has no "size" column; sqft_living holds the living area)
    size = np.array(batch["sqft_living"],np.float32)

In [41]:
# train a linear model in batches, minimal example

# Define trainable variable
intercept = tf.Variable(0.1, dtype=np.float32)
slope = tf.Variable(0.1,dtype=np.float32)

# Define model
def linear_regression(intercept, slope, features):
    return intercept + features*slope

# Compute predicted values and return loss function
def loss_function(intercept, slope,targets,features):
    predictions = linear_regression(intercept,slope,features)
    return tf.keras.losses.mse(targets,predictions)

# Define optimizer
opt = tf.keras.optimizers.Adam()

# load data in batches
for batch in pd.read_csv("../Machine_Learning_basics/data/kc_house_data.csv", chunksize=100):
    #extract the target and feature columns
    price_batch = np.array(batch["price"],np.float32)
    size_batch = np.array(batch["sqft_lot"],np.float32)
    
    #minimize the loss function
    opt.minimize(lambda: loss_function(intercept, slope, price_batch, size_batch),var_list=[intercept,slope])
    
print(intercept.numpy(),slope.numpy())

0.31781912 0.29831016
