# linear regression in tensorflow

In this notebook, you will use tensorflow to fit a linear model to simulated training data.

[Tensorflow](https://tensorflow.org) is a high-performance library for doing computations using "tensors", which is just a 'technical' term for the same "n-dimensional arrays" used by [numpy](https://numpy.org).

Technically "dimension" is a bit off. Really, we should be thinking about the "rank" of the "array" or "tensor". A "tensor" is a mathematical object that has 'magnitude' (a number) in multiple 'directions'. The number of possible simultaneous 'directions' is the tensor's "rank".

Well, that's pretty abstract, but that's math for you. Let's look at some examples.

A simple, regular number (called a "scalar") has rank zero. It's 'just a number'; the "base case".

A "vector" has rank one. It can be described as a 'list' of numbers, one for each "dimension" in the vector. For example, here's a vector with three dimensions (although it has rank 1!):

    [-1.0, 0.0, 2.38]

There are 3 numbers (scalars) in the vector, so its dimension is 3. The following vector has 4 numbers in it (dimension=4); its rank is still one, though.

    [0.28, 12.51, -13.60, 22.79]

And this vector is 1-dimensional:

    [0.21]

It's also *different* from the scalar value, 0.21, which is a rank-zero entity.

Okay, so vectors have rank-1 and can be *any*-dimensional.

Matrices have rank two, and they can also be *any*-dimensional. For example, this matrix is 2x2-dimensional:

    [ 0.1  0.2 ]
    [ 1.9  2.1 ]

And this one is 2x3-dimensional (in row-major order):

    [ 0.2  0.1  0.0 ]
    [ 0.0  0.0  9.3 ]

In python, matrices and higher-ranked objects are written as nested lists, with commas between the elements and an outer-most pair of brackets, like this:

    [ [ 0.1, 0.2 ],
      [ 1.9, 2.1 ] ]

Which makes it a bit more 'obvious' that the matrix is a 'list' of vectors. Sort of.

Once you get "higher-rank" than two, there is no 'special' name for the object, so you just call it a "tensor". More correctly, a "vector" is a "special name" for a rank-1 tensor, and a "matrix" is a "special name" for a rank-2 tensor. Can you guess what "scalar" is a "special name" for?
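Here's a minimal sketch of the rank examples above, using numpy rather than tensorflow. It relies on numpy's own names for these ideas: `ndim` is the rank, and `shape` lists the size along each 'direction'. The variable names are just for illustration.

```python
import numpy as np

scalar = np.array(0.21)                     # rank 0: 'just a number'
vector = np.array([-1.0, 0.0, 2.38])        # rank 1, dimension 3
one_d = np.array([0.21])                    # rank 1, dimension 1 (not a scalar!)
matrix = np.array([[0.2, 0.1, 0.0],
                   [0.0, 0.0, 9.3]])        # rank 2, 2x3-dimensional
tensor3 = np.zeros((2, 3, 4))               # rank 3: no special name, just a "tensor"

for name, t in [("scalar", scalar), ("vector", vector), ("one_d", one_d),
                ("matrix", matrix), ("tensor3", tensor3)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)
```

Note that `scalar` and `one_d` hold the same number but report different ranks, which is the distinction made above between 0.21 and [0.21].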

So, for most of statistics, machine-learning and artificial-intelligence, just about *everything* is just a "tensor". The "tensorflow" library is just a library that provides super-fast, GPU-accelerated operations on "tensors". Because so many things in AI are just operations on tensors, we can use tensorflow to do AI-related things super-fast, which is good for us!

To start using tensorflow, we need to import the tensorflow library. Because we don't like to type a lot of characters (and "tensorflow" is a lot of characters), it is common to import tensorflow like this:

In [1]:
import tensorflow as tf

Then we can use it like this:

In [None]:
print(tf.__version__)

If you get a 'name error', remember to run the first code cell, first!

Otherwise, you should see the current version of tensorflow, which when I first ran this code, was

    2.5.0


## simulating data for tensorflow

To fit a linear model in tensorflow, the first thing we need to do is generate some data.

Fortunately, we can just use scikit-learn to simulate our x,y training data, and then convert the numpy array from scikit-learn into a "tensor" dataset in tensorflow.

First, simulate the data using scikit-learn:

In [3]:
import sklearn.datasets

x,y = sklearn.datasets.make_regression(n_samples=100,
                                       n_features=1,
                                       bias=50.0,
                                       noise=10.0,
                                       random_state=221891)

Provided there is no error, that should have generated 100 x,y sample points.

We can plot them, of course:

In [None]:
import matplotlib.pyplot as plt

plt.scatter(x,y, marker='o', s=2)

That looks like 100 x,y coordinates lying approximately on a diagonal line with a y-intercept of about 50 (the "bias" we asked for in the simulation).
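If you'd rather check the intercept with numbers than by eye, an ordinary least-squares line fit will do it. The sketch below does not use the scikit-learn data; it builds a hand-rolled stand-in, assuming the simulation amounts to y = slope * x + bias + Gaussian noise. The slope of 30 is made up for illustration; the bias (50) and noise (10) match the values used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hand-rolled stand-in for the simulated data (hypothetical slope of 30).
true_slope, true_bias = 30.0, 50.0
x_sim = rng.normal(size=100)
y_sim = true_slope * x_sim + true_bias + rng.normal(scale=10.0, size=100)

# Fit a straight line (degree-1 polynomial) by ordinary least squares.
# np.polyfit returns coefficients from highest power to lowest: slope, then intercept.
est_slope, est_bias = np.polyfit(x_sim, y_sim, deg=1)

print("estimated slope:      ", est_slope)
print("estimated y-intercept:", est_bias)
```

The estimated intercept should land close to 50, with some wobble from the noise.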

Now we have our simulated data, but it isn't in a form that tensorflow can use. We need to 'convert' our simulated data from a numpy array to a tensorflow dataset.

Fortunately, there is a tensorflow function that does exactly what we want!

A data set in tensorflow is encapsulated as a "tensorflow.data.Dataset" object. Tensorflow is what's called an "object oriented" library, which just means that it uses a programming concept called an "object", which is just about as abstract a thing as you can have, even for a programming language.

An "object" in computer programming is a conceptual block of code that consists of data and operations (called "methods") that can be done on the data. A "Dataset" in tensorflow is an object that consists of "data" and "methods" that can be executed on the data.

Fortunately for us, unless we need to write our own new objects (we won't in this class!), we don't need to worry *too much* about what objects in tensorflow are. We just need to know how to use them!

To create a Dataset in tensorflow, we just need to use the "static method" called "from_tensor_slices". A "static method" is technically a method that is called on the object's "class", without actually creating an object! But again, we don't need to really *understand* the technicalities of object-oriented programming in order to use it!
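For the curious, here is a toy sketch of "object", "method" and "static method" in plain python. The `Pairs` class and its `from_lists` method are made up for illustration; they are not part of tensorflow, and the real Dataset class is far more involved.

```python
class Pairs:
    """A toy object: some data plus methods that operate on it."""

    def __init__(self, xs, ys):
        self.xs = list(xs)   # the object's data
        self.ys = list(ys)

    def size(self):
        """An ordinary method: called on an object that already exists."""
        return len(self.xs)

    @staticmethod
    def from_lists(xs, ys):
        """A static method: called on the class itself to build a new object."""
        return Pairs(xs, ys)


# Call the static method on the class; no Pairs object exists beforehand.
toy = Pairs.from_lists([1.0, 2.0], [51.0, 52.0])
print(toy.size())
```

The call `Pairs.from_lists(...)` has the same 'shape' as `tf.data.Dataset.from_tensor_slices(...)`: class name, dot, method name.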

In [5]:
data = tf.data.Dataset.from_tensor_slices((x,y))

Provided there are no errors, we now have a tensorflow "Dataset" object stored in the "data" variable. The Dataset consists of our simulated x,y pairs.

A tensorflow Dataset object is *not* very useful *to us*. We can't plot it directly, and inspecting, extracting or adding elements is clumsier than it is with numpy arrays.

But that tensorflow Dataset object is *very* useful *to tensorflow*. Packaging our data as a tensorflow Dataset object will allow tensorflow to *efficiently* use our data to train AI systems. It also makes coding the AI-training process *much* easier (for us!). So, we pay a little bit of up-front headache packaging our numpy data into a tensorflow Dataset object, but it's a 'win-win' in the long-run.

The take-home from the previous code-block is that we can create a tensorflow Dataset object by using the static method:

    tf.data.Dataset.from_tensor_slices(INPUT)

where we replace "INPUT" with our own data, wherever it came from. In our case, our simulated x,y pairs were stored in the "x" and "y" variables, respectively. We created the INPUT parameter by 'packing' x and y into a python "tuple", using the "(" and ")" symbols (open- and close-parenthesis, respectively).

A python "tuple" is an ordered list of items, which is immutable (cannot be changed) after its creation. If we have python variables "x" and "y", we can create a "tuple" holding x,y pairs with the python syntax:

    (x,y)

which is the INPUT to the from_tensor_slices method.
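A quick sketch of what 'packing' and 'immutable' mean in practice, using two short stand-in lists (`x_vals` and `y_vals` are made-up names, not the simulated data):

```python
x_vals = [1.0, 2.0]
y_vals = [51.0, 52.0]

pair = (x_vals, y_vals)    # pack the two objects into a tuple
first, second = pair       # unpack them again, in the same order

try:
    pair[0] = [9.0, 9.0]   # tuples do not allow their items to be replaced
    reassigned = True
except TypeError:
    reassigned = False

print(len(pair), reassigned)
```

The tuple always holds exactly the two items it was created with, in the order they were given.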

We stored our tensorflow Dataset in the variable "data", so we can use it later.
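Even though a Dataset is less convenient than a numpy array, you can still pull its elements out for a quick look. This is a minimal sketch on a tiny stand-in data set (the `x_small` and `y_small` arrays are made up, but laid out like the simulated x and y), using the Dataset's `as_numpy_iterator` method:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in arrays with the same layout as the simulated data.
x_small = np.array([[0.0], [1.0], [2.0]])   # shape (3, 1): 3 samples, 1 feature
y_small = np.array([50.0, 60.0, 70.0])      # shape (3,):   3 target values

small_data = tf.data.Dataset.from_tensor_slices((x_small, y_small))

# The data are 'sliced' along the first axis, so each element is one (x, y) pair.
pairs = list(small_data.as_numpy_iterator())
for xi, yi in pairs:
    print(xi, yi)
```

Each element pairs one rank-1 x (holding a single value) with one scalar y, which is where the name "from_tensor_slices" comes from.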

## tensorflow linear model

We have simulated our training data and packaged it for tensorflow. The next thing we need is a linear model.

Well, a neural network can quite easily be used to implement a linear model.

In fact, we know that *any* neural network lacking non-linear activations is just a linear model!

But why 'waste' model parameters; we'll go with the *simplest* neural network that is a linear model, which is just a single-neuron with linear activation (ie, no non-linear activation).
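The claim that stacked linear layers are 'just' a linear model can be checked numerically. This is a sketch in numpy with made-up random weights (nothing here comes from tensorflow): two linear layers applied one after the other give the same output as a single linear layer with combined weights.

```python
import numpy as np

rng = np.random.default_rng(1)
x_in = rng.normal(size=(5, 3))                          # 5 samples, 3 features each

W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)    # first linear layer (4 neurons)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)    # second linear layer (1 neuron)

# Two layers, no non-linear activation in between.
two_layers = (x_in @ W1 + b1) @ W2 + b2

# The same computation collapsed into a single linear layer.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x_in @ W + b

print(np.allclose(two_layers, one_layer))
```

The two-layer version has more parameters to train, but it cannot represent anything the single layer can't.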

Even though this *particular* network is very simple, the *exact same* procedure is used to build *any* neural network in tensorflow (at least approximately; some really odd models require you to build your own objects, but these are pretty rare cases). Once you get the basic procedure down, you can use the same procedure to generate *very* complex state-of-the-art models.

The approach we'll use to build our neural network models is based on the "Keras" application programming interface (API, a specification for communicating information across programs). Basically, the "Keras API" specifies how to create and use neural network models, and the implementation of that "interface" is entirely up to you.

Tensorflow has implemented the Keras API, so we can use that API to create and train our model, and tensorflow will take care of the rest!

The simplest type of neural network is a simple "feed-forward" model, in which information proceeds sequentially through each layer in the model. Interestingly, a *large number* of existing neural network models are 'just' sequential models using simple building blocks. Even more 'cutting edge' architectures that use multiple complex data branches and joins (like "residual" or "dense" architectures, as well as U-shaped networks used in various fields) can be implemented as sequential models of more complex computations (using custom sub-classes to create new objects). The take-home message here is that sequential models (although 'simple') actually account for a large share of the neural network architectures in common use. It *may* be that *any* neural network could be encoded sequentially; can any directed acyclic graph be embedded in a linear sequence? I'll leave that math for those better than I, though :)

For our purposes, we can implement *a lot* of *very* complex models using the Keras Sequential framework.

So, let's build a Sequential model using Keras in tensorflow...

In [6]:
model = tf.keras.models.Sequential()

whew... that's some heavy-lifting! :)

That one line of python code created a new sequential model.

Unfortunately... that model is "empty". There's "nothing" in it, at all. No computation. No data. nothing. kinda sad... empty... model...

So, let's create a linear model and put it in our model as the first layer.

We'll use a Keras "Layer" object to implement our linear model. A "Layer" in Keras implements a neural network "layer"; a collection of neurons all connected to the same inputs. Layers can be as complex as you'd like, and there are many 'advanced' neuron types that do fairly complex operations. All of these 'advanced' neurons are built on smaller components, which are built using smaller components... which are implemented using linear models (combined with non-linear activations). Once we really 'understand' the simplest case, we can use the same code to run much more complex models, just by 'swapping in' a different type of "Layer" object.

But all we need *now* is a simple linear model, so we can use the Keras "Dense" Layer. The "Dense" layer implements a 'densely connected' neural network layer. The Dense layer can consist of *any number* of neurons, each of which will be connected to *all* the layer's inputs. Each neuron has one "weight" (trainable parameter) for each input and (optionally) one "bias". Each neuron accepts *all* the inputs and returns the linear-sum of the weighted input values, plus its bias term. All weights and biases are trainable parameters of the model.

That sounds pretty complicated, but if we use a layer having only 1 neuron, we know a single neuron is just a simple linear model, so we have our linear model implementation as a single-neuron Dense Layer Sequential Model. If we want to make the model more complex, we can add more neurons in the Dense layer - this would create a multi-linear model (multiple ys fit to the same xs), but we don't need that for our problem.
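Here is what that computation looks like 'by hand', as a sketch in numpy. The `dense_forward` function and the weight values are made up for illustration; the only thing borrowed from Keras is the layout of the weights, with one row per input and one column per neuron.

```python
import numpy as np

def dense_forward(inputs, kernel, bias):
    """Weighted sum of the inputs plus a bias, computed for every neuron at once."""
    return inputs @ kernel + bias

x_batch = np.array([[0.0], [1.0], [2.0]])    # 3 samples, 1 input value each

# One neuron: one weight and one bias, i.e. the simple linear model y = w*x + b.
kernel_1 = np.array([[2.0]])                 # shape (inputs=1, neurons=1)
bias_1 = np.array([50.0])                    # shape (neurons=1,)
out_1 = dense_forward(x_batch, kernel_1, bias_1)
print(out_1)

# Two neurons: two weights and two biases, i.e. two lines fit to the same x.
kernel_2 = np.array([[2.0, -1.0]])           # shape (inputs=1, neurons=2)
bias_2 = np.array([50.0, 0.0])               # shape (neurons=2,)
out_2 = dense_forward(x_batch, kernel_2, bias_2)
print(out_2)
```

With one neuron, each input x gives one output y; with two neurons, each x gives two outputs, one per neuron.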

Let's build the network layer and add it to our model, implementing the linear model in tensorflow.

In [7]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=1, input_shape=[1]))

Line 1 is just reproduced from the code cell above, so we don't end up inadvertently adding multiple layers to our model. It just creates an empty sequential model.

Line 2 implements the simple linear model.

We're actually doing three things here.

1.  implementing a simple linear model in 1 dimension using a "Dense" neural network layer with 1 neuron.
2.  adding the linear model to our sequential model, so we can fit it to data later on.
3.  specifying the rank and dimensionality of the input tensor, so tensorflow knows how to build the model.

To implement the single-neuron Dense layer (ie, 'simple linear model'), we specify the "units=1" option to the "tf.keras.layers.Dense(...)" constructor, telling it we want a single neuron layer. If we wanted 2 neurons, we could specify

    units=2

or any number we'd wish, but for now we just want a simple linear model.

You can edit *any* of these code cells, so you can feel free to play around if you are interested. In many cases, 'tinkering' with neural networks in this way will 'work' out-of-the-box (meaning they will calculate and not cause an error); whether the model is actually *useful* for anything...

In any case, you can always pull a 'fresh' jupyter notebook from the github repository (clicking directly on the link to the notebook from the course website *should* give you a fresh notebook). If you want to save your notebook for your personal use, you can save it as a copy to your google drive (howto is [here](https://bryankolaczkowski.github.io/lessons/jupyter_colab_workflow.html)).

But for now, we don't need a multi-linear model, so we'll just stick with

    units=1

to implement a simple linear model.

The code

    model.add(...)

adds the new Dense layer to the model, which was previously an empty model. If you wanted another layer, you could just execute another call to

    model.add(...)

with a different layer specification. But for now we only need 1 layer, so we don't need to call "model.add" again.

Finally, we need to specify 'the rank and dimensionality of the input tensor'. That sounds like it could trip us up. Thankfully, the 'syntax' is pretty simple, especially in our case. We have 1-dimensional data; one input scalar "x" is mapped to one output scalar "y". We specify this 'shape' as

    [1]

to tell the tensorflow model to be built in 1 dimension ("1"), with input tensors of rank 1 (because there is a single value between "[", and "]"). We'll need more complex input shapes later on, when we work with multi-dimensional and multi-rank predictors, but for now the input specification is pretty simple.

We specify the shape of the input data using the

    input_shape=INPUT

option, when we create the *first* layer in our network. There are alternative ways to specify the input shape, but this is convenient in most cases.

In our case, we set input_shape equal to the data specification of our simple linear model in 1 dimension.

    input_shape=[1]
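One way to see where the "[1]" comes from is to look at the shape of a *single* sample. The sketch below uses a stand-in array of zeros with the same layout as the simulated x (100 samples, 1 feature each); the input shape describes one sample, so the leading 'sample' axis is left out.

```python
import numpy as np

x_like = np.zeros((100, 1))             # stand-in with the same layout as the simulated x

print(x_like.shape)                     # the whole data set: 100 samples, 1 value each
print(x_like[0].shape)                  # a single sample: rank 1, dimension 1

input_shape = list(x_like.shape[1:])    # drop the leading 'sample' axis
print(input_shape)
```

If each sample held 3 predictor values instead of 1, the same recipe would give an input shape of [3].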

Now that our simple linear model is built, we can 'see' it using the model's "summary" method:

In [None]:
model.summary()
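For this model, the summary should list one Dense layer with 2 trainable parameters: one weight and one bias. The parameter count of a Dense layer is simple arithmetic, sketched here as a small helper function (`dense_param_count` is a made-up name, not a Keras function):

```python
def dense_param_count(n_inputs, units, use_bias=True):
    """Number of trainable parameters in a densely connected layer."""
    weights = n_inputs * units            # one weight per (input, neuron) pair
    biases = units if use_bias else 0     # one bias per neuron
    return weights + biases

print(dense_param_count(n_inputs=1, units=1))   # our simple linear model
print(dense_param_count(n_inputs=1, units=2))   # the two-neuron variant
```

This is a handy sanity check when you start tinkering with the "units" option.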

## create the loss function

Our neural network needs something to optimize, so it can fit the model's parameters to the training data.
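The loss used in the final code cell of this notebook is the mean squared error (MSE): the average of the squared differences between the true y values and the model's predictions. A minimal sketch in numpy, with made-up numbers (the `mean_squared_error` function here is hand-written, not the Keras one):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true values and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [50.0, 60.0, 70.0]
perfect = mean_squared_error(y_true, [50.0, 60.0, 70.0])   # every prediction exact
off_by_2 = mean_squared_error(y_true, [52.0, 58.0, 70.0])  # two predictions off by 2

print(perfect, off_by_2)
```

The loss is zero for a perfect fit and grows as predictions move away from the true values, so 'fitting the model' means making this number small.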

The remaining steps are:

1. specify loss function
2. specify optimizer
3. create training data set from simulated data - concept of batching and training epochs
4. train model (ie, fit model to data over multiple epochs)
5. see loss function (MSE) going down.
6. model is 'fit'; let's use it to predict y values for our simulated x values, and print the y_hat values against y_true, to 'see' how well fit our model is
7. SUMMARY: end-to-end example, with brief summary afterwards
8. take-home messages and discussion

In [None]:
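# Configure training: stochastic gradient descent minimizing the mean squared error.
model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss=tf.keras.losses.MeanSquaredError())

# Group the samples into batches of 10. Using a new variable name keeps "data"
# unbatched, so re-running this cell does not batch the batches.
batched_data = data.batch(10)
# Fit the model: 100 passes (epochs) over the batched training data.
model.fit(batched_data, epochs=100)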
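To get a feel for what the training loop is doing with batches and epochs, here is a hand-written sketch of mini-batch gradient descent on the MSE loss, in plain numpy. It is *not* how tensorflow implements training internally. The data are a stand-in (hypothetical slope of 30, with the bias of 50 and noise of 10 used in the simulation), and the learning rate of 0.01 is an assumption chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in training data: y = 30*x + 50 + noise (the slope is made up).
x_train = rng.normal(size=(100, 1))
y_train = 30.0 * x_train[:, 0] + 50.0 + rng.normal(scale=10.0, size=100)

w, b = 0.0, 0.0                      # the model's two trainable parameters
learning_rate, batch_size = 0.01, 10

losses = []
for epoch in range(100):             # one epoch = one full pass over the data
    for start in range(0, len(x_train), batch_size):
        xb = x_train[start:start + batch_size, 0]
        yb = y_train[start:start + batch_size]
        error = (w * xb + b) - yb    # prediction minus truth, for this batch
        # Step downhill along the gradient of the batch MSE.
        w -= learning_rate * 2.0 * np.mean(error * xb)
        b -= learning_rate * 2.0 * np.mean(error)
    # Record the MSE over the whole data set at the end of each epoch.
    losses.append(float(np.mean((w * x_train[:, 0] + b - y_train) ** 2)))

print("w:", w, "b:", b)
print("MSE after first and last epoch:", losses[0], losses[-1])
```

With 100 samples in batches of 10, each epoch makes 10 parameter updates. The loss should drop over the epochs while w and b drift toward the slope and intercept used to simulate the data.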
