# Introduction to Machine Learning - TF

We are going to use TensorFlow for this part of the demo. It works on the same principles as JAX, but hides them deeper inside the library.

TensorFlow and PyTorch are two of the most popular ML frameworks out there. Both are easy to use, and there are loads of tutorials available on the web.

## Some Straight Line Data

Who hasn't been forced to fit a straight line?

Let's create some data that approximates a straight line, with some "errors". We'll fit this!

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

In [None]:
# Straight line with jitter
x = np.random.normal(size=10000)
jitter = np.random.normal(size=10000)
y = 3*x + 2 + 0.5*jitter

We have 10,000 points like this, so it is best to look at them graphically.

## Plotting the Jitter-y straight line

Next, let's plot this to make sure we got what we wanted: a line that is $y = f(x) = mx + b + jitter$, where $m=3$ and $b=2$.

In [None]:
from matplotlib import pyplot as plt

plt.scatter(x, y)
plt.plot(x, 3*x + 2, color='red')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

As expected - a messy line!
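
The mess can also be checked numerically. The sketch below regenerates data with the same recipe (the fixed seed and the `_demo` names are assumptions added here for reproducibility, not part of the notebook) and looks at the residuals around the true line. They should average close to 0 with a spread close to the 0.5 jitter scale.

```python
import numpy as np

# Fixed seed is an assumption for reproducibility; the notebook itself is unseeded.
rng = np.random.default_rng(42)
x_demo = rng.normal(size=10000)
y_demo = 3 * x_demo + 2 + 0.5 * rng.normal(size=10000)

# Distance of each point from the true line y = 3x + 2
residuals = y_demo - (3 * x_demo + 2)
print(f"mean residual:    {residuals.mean():.3f}")
print(f"std of residuals: {residuals.std():.3f}")
```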

## Create a network to fit this line

Create a network with two fully connected hidden layers of 16 nodes each, followed by a single-node output layer. It takes a single value as input.

In [None]:
model = tf.keras.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(1,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1))
model.summary()  # prints the table itself; wrapping it in print() also prints "None"

Where does the parameter column come from? Can we account for the total number of trainable parameters?
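
One way to account for them: a fully connected layer has one weight per (input, node) pair plus one bias per node. The sketch below does that arithmetic by hand for the three layers defined above; `dense_param_count` is a helper name made up for this example, not a Keras function.

```python
# (number of inputs, number of nodes) for each Dense layer in the model above
layer_shapes = [(1, 16), (16, 16), (16, 1)]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per node."""
    return n_inputs * n_units + n_units

counts = [dense_param_count(n_in, n_out) for n_in, n_out in layer_shapes]
total_params = sum(counts)
print(f"per layer: {counts}, total: {total_params}")
```

This gives 32, 272, and 17 parameters for the three layers, 321 in total, which should match the summary table.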

## Test and Training

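We split the data into test and training samples, so the model can be checked against data it never saw during training. This avoids a biased estimate of how well it performs.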

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

In [None]:
print(f"Number of training samples in x: {len(x_train)}")
print(f"Number of testing samples in x: {len(x_test)}")
print(f"Number of training samples in y: {len(y_train)}")
print(f"Number of testing samples in y: {len(y_test)}")
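
With `test_size=0.3` and 10,000 points, we expect 7,000 training and 3,000 testing samples. As a rough illustration of what a shuffled split does (a hand-rolled sketch, not how `train_test_split` is implemented; the seed and the `_demo` names are assumptions made here):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed is an assumption for reproducibility
n_samples = 10000
x_demo = rng.normal(size=n_samples)
y_demo = 3 * x_demo + 2 + 0.5 * rng.normal(size=n_samples)

# Shuffle the indices, then carve off 30% of them for testing
indices = rng.permutation(n_samples)
n_test = int(round(0.3 * n_samples))
test_idx, train_idx = indices[:n_test], indices[n_test:]

# Apply the same indices to x and y so the pairs stay matched
x_train_demo, x_test_demo = x_demo[train_idx], x_demo[test_idx]
y_train_demo, y_test_demo = y_demo[train_idx], y_demo[test_idx]
print(len(x_train_demo), len(x_test_demo))
```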


## Fit

Here we fit the function to the input data. First, we compile the model, which tells TensorFlow which optimizer and which loss function to use during training.

In [None]:
model.compile(optimizer='rmsprop', loss='mse')

The `mse` stands for Mean Squared Error, which is exactly what we're using:

```python
mse = tf.reduce_mean(tf.square(y_true - y_pred))
```
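
To make the formula concrete, here is the same calculation in plain NumPy on three made-up values (the numbers are purely illustrative). Two predictions are exact and one is off by 2, so the result is $(0 + 0 + 4)/3$.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])

# Square each difference, then average over all points
mse_value = np.mean(np.square(y_true - y_pred))
print(f"mse = {mse_value:.4f}")
```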

In [None]:
model.fit(x_train, y_train, epochs=30, validation_data=(x_test, y_test))
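
Conceptually, fitting means repeatedly nudging the parameters in the direction that lowers the loss. The sketch below shows that idea in its simplest form: plain gradient descent on the MSE for a two-parameter line $y = mx + b$. It is an illustration only. The Keras model above has 321 parameters and uses the RMSprop optimizer on mini-batches, and the seed, learning rate, and step count here are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed is an assumption for reproducibility
x_demo = rng.normal(size=10000)
y_demo = 3 * x_demo + 2 + 0.5 * rng.normal(size=10000)

m, b = 0.0, 0.0        # start from a flat line through the origin
learning_rate = 0.1    # illustrative choice

for step in range(200):
    error = (m * x_demo + b) - y_demo
    # Gradients of mean(error**2) with respect to m and b
    grad_m = 2 * np.mean(error * x_demo)
    grad_b = 2 * np.mean(error)
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(f"m = {m:.3f}, b = {b:.3f}")
```

The recovered values should land close to the true $m=3$ and $b=2$.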

## Plotting the result!

Let's run the trained model over a range of $x$ values and plot the results as we did last time.

In [None]:
x_coord = np.linspace(-3, 3, 100)
y_coord = model.predict(x_coord)

In [None]:
plt.scatter(x_test, y_test, label='Test Data', color='black')
plt.plot(x, 3*x + 2, color='green', label='Real Line')
plt.plot(x_coord, y_coord, color='red', label='Model Prediction')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

Looks great - but note we can't read the slope off the parameters!

In [None]:
for i, layer in enumerate(model.layers):
    print(f"Layer {i+1} weights:")
    print(layer.get_weights())
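
To see why the slope is not visible in any single weight, consider a tiny hand-built ReLU network (the weights below are hypothetical, picked by hand for this example, and are not the trained model's weights). It reproduces $y = 3x + 2$ exactly, yet the slope is spread across several weights. An effective slope can still be recovered by fitting a line to the network's output.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical hand-picked weights for a 1 -> 2 -> 1 ReLU network
w1 = np.array([[1.0, -1.0]])     # input to hidden, shape (1, 2)
b1 = np.zeros(2)
w2 = np.array([[3.0], [-3.0]])   # hidden to output, shape (2, 1)
b2 = np.array([2.0])

def tiny_net(x_values):
    hidden = relu(x_values.reshape(-1, 1) @ w1 + b1)
    return (hidden @ w2 + b2).ravel()

x_grid = np.linspace(-3, 3, 101)
y_grid = tiny_net(x_grid)

# Fit a straight line to the network output to get an effective slope and intercept
slope, intercept = np.polyfit(x_grid, y_grid, 1)
print(f"effective slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The same `np.polyfit` trick could be applied to `x_coord` and `y_coord.ravel()` from the trained model, with the caveat that a trained network is only approximately linear.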
