In [None]:
# Use seaborn for pairplot
!pip install -q seaborn

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


# Make numpy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)

In [None]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

print(tf.__version__)

### Get the data
First load the dataset from the CSV file using pandas:

In [None]:
url = '../input/yeh-concret-data/Concrete_Data_Yeh.csv'
column_names = [
    "cement", "slag", "flyash", "water", "superplasticizer",
    "coarseaggregate", "fineaggregate", "age", "csMPa",
]

raw_dataset = pd.read_csv(url)

In [None]:
dataset = raw_dataset.copy()
dataset.tail()

### Clean the data

Check whether the dataset contains any unknown values:

In [None]:
dataset.isna().sum()

Drop any rows with unknown values to keep this initial tutorial simple.

In [None]:
dataset = dataset.dropna()

All columns in this dataset are numeric, so no categorical (one-hot) encoding is needed.

Note: If your own data includes categorical columns, you can set up the `keras.Model` to do this kind of transformation for you. That's beyond the scope of this tutorial. See the [preprocessing layers](../structured_data/preprocessing_layers.ipynb) or [Loading CSV data](../load_data/csv.ipynb) tutorials for examples.

### Split the data into train and test

Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of our models.

In [None]:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
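
As a sanity check on this split, the sketch below repeats the same `sample`/`drop` pattern on a small made-up DataFrame. The `toy` frame and its values are illustrative only and are not part of the concrete dataset. It shows that the two subsets are disjoint and together cover every row, which relies on the index labels being unique.

```python
import pandas as pd

# Hypothetical toy data: 10 rows with a default, unique RangeIndex.
toy = pd.DataFrame({"feature": range(10), "label": range(100, 110)})

# Same pattern as above: sample 80% for training, keep the rest for testing.
toy_train = toy.sample(frac=0.8, random_state=0)
toy_test = toy.drop(toy_train.index)

print(len(toy_train), len(toy_test))
# The two index sets should not overlap.
print(set(toy_train.index) & set(toy_test.index))
```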

### Inspect the data

Have a quick look at the joint distribution of a few pairs of columns from the training set.

Looking at the bottom row (`csMPa`), you can see how the compressive strength relates to each of the other parameters. The other rows show how the features relate to each other.

In [None]:
sns.pairplot(train_dataset[["cement","slag","flyash","water","superplasticizer","coarseaggregate","fineaggregate","age","csMPa"]], diag_kind='kde')

Also look at the overall statistics. Note how each feature covers a very different range:

In [None]:
train_dataset.describe().transpose()

### Split features from labels

Separate the target value, the "label", from the features. This label is the value that you will train the model to predict.

In [None]:
train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('csMPa')
test_labels = test_features.pop('csMPa')

## Normalization

In the table of statistics it's easy to see how different the ranges of each feature are.

In [None]:
train_dataset.describe().transpose()[['mean', 'std']]

It is good practice to normalize features that use different scales and ranges. 

One reason this is important is that the features are multiplied by the model weights. So the scale of the outputs and the scale of the gradients are affected by the scale of the inputs.

Although a model *might* converge without feature normalization, normalization makes training much more stable. 
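
To make the point about gradient scale concrete, here is a minimal NumPy sketch. It assumes a plain linear model and a mean squared error loss (chosen for its simple gradient, not the loss used later in this tutorial), and the two features `small` and `large` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two hypothetical features on very different scales.
small = rng.normal(size=n)            # standard deviation around 1
large = 1000.0 * rng.normal(size=n)   # standard deviation around 1000
X = np.stack([small, large], axis=1)

# A target to which both features contribute equally once rescaled.
y = small + large / 1000.0

# Gradient of the mean squared error of y_hat = X @ w, evaluated at w = 0.
w = np.zeros(2)
residual = X @ w - y
grad = 2.0 * X.T @ residual / n

print(grad)
```

The gradient component for the large-scale feature is several orders of magnitude bigger than the other one, so a single learning rate cannot suit both weights. Normalizing the inputs removes this imbalance.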

### The Normalization layer
The `preprocessing.Normalization` layer is a clean and simple way to build that preprocessing into your model.

The first step is to create the layer:

In [None]:
normalizer = preprocessing.Normalization()

Then `.adapt()` it to the data:

In [None]:
normalizer.adapt(np.array(train_features))

This calculates the mean and variance, and stores them in the layer. 

In [None]:
print(normalizer.mean.numpy())

When the layer is called it returns the input data, with each feature independently normalized:

In [None]:
first = np.array(train_features[:1])

with np.printoptions(precision=2, suppress=True):
  print('First example:', first)
  print()
  print('Normalized:', normalizer(first).numpy())
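
Conceptually, the layer subtracts each feature's mean and divides by its standard deviation. The NumPy sketch below reproduces that arithmetic on a small made-up array (the `features` values are hypothetical). The real layer also guards against division by zero for constant features, which this sketch omits.

```python
import numpy as np

# Hypothetical feature matrix: 5 examples, 2 features on different scales.
features = np.array([[540.0, 28.0],
                     [332.5, 270.0],
                     [198.6, 360.0],
                     [266.0, 90.0],
                     [380.0, 365.0]])

# "adapt": compute the per-feature (column-wise) mean and variance.
mean = features.mean(axis=0)
variance = features.var(axis=0)

# "call": shift and scale each feature independently.
normalized = (features - mean) / np.sqrt(variance)

print(normalized.mean(axis=0))  # approximately 0 for every feature
print(normalized.std(axis=0))   # approximately 1 for every feature
```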

## Linear regression

Before building a DNN model, start with a linear regression.

### One Variable

Start with a single-variable linear regression, to predict `csMPa` from `superplasticizer`.

Training a model with `tf.keras` typically starts by defining the model architecture.

In this case use a `keras.Sequential` model. This model represents a sequence of steps. In this case there are two steps:

* Normalize the input `superplasticizer`.
* Apply a linear transformation ($y = mx+b$) to produce 1 output using `layers.Dense`.

The number of _inputs_ can either be set by the `input_shape` argument, or automatically when the model is run for the first time.

First create the `superplasticizer` `Normalization` layer:

In [None]:
superplasticizer = np.array(train_features['superplasticizer'])

superplasticizer_normalizer = preprocessing.Normalization(input_shape=[1,])
superplasticizer_normalizer.adapt(superplasticizer)

Build the sequential model:

In [None]:
superplasticizer_model = tf.keras.Sequential([
    superplasticizer_normalizer,
    layers.Dense(units=1)
])

superplasticizer_model.summary()

This model will predict `csMPa` from `superplasticizer`.

Run the untrained model on the first 10 `superplasticizer` values. The output won't be good, but you'll see that it has the expected shape, `(10,1)`:

In [None]:
superplasticizer_model.predict(superplasticizer[:10])
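
To see where the `(10,1)` shape comes from, the following NumPy sketch mimics the two steps of the model by hand. The input values and the weight `m` are random stand-ins, not the values Keras would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for 10 values of the single input feature.
x = rng.uniform(0.0, 30.0, size=10)

# Step 1: normalize the input, as the adapted Normalization layer would.
x_norm = (x - x.mean()) / x.std()

# Step 2: Dense(units=1) computes y = m*x + b with a 1x1 kernel m and a bias b.
m = rng.normal(size=(1, 1))   # random, i.e. "untrained"
b = np.zeros(1)               # Keras initializes biases to zero by default

y = x_norm.reshape(-1, 1) @ m + b
print(y.shape)
```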

Once the model is built, configure the training procedure using the `Model.compile()` method. The most important arguments to `compile` are the `loss` and the `optimizer`, since these define what will be optimized (`mean_absolute_error`) and how (using `optimizers.Adam`).

In [None]:
superplasticizer_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=0.1),
    loss='mean_absolute_error')
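
For reference, the mean absolute error is the average of the absolute differences between targets and predictions. A minimal NumPy version, with made-up strength values, might look like this (the helper name `mean_absolute_error` is just for illustration):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Hypothetical strengths in MPa: the absolute errors are 1, 0 and 2.
mae = mean_absolute_error([30.0, 45.0, 60.0], [31.0, 45.0, 58.0])
print(mae)  # 1.0
```

Because the loss is in the same units as the label, a value of 1.0 here means the predictions are off by 1 MPa on average.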

Once the training is configured, use `Model.fit()` to execute the training:

In [None]:
%%time
history = superplasticizer_model.fit(
    train_features['superplasticizer'], train_labels,
    epochs=100,
    # suppress logging
    verbose=0,
    # Calculate validation results on 20% of the training data
    validation_split = 0.2)
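
According to the Keras documentation, `validation_split` holds out the last fraction of the samples you pass in, before any shuffling, rather than a random subset. The sketch below imitates that behaviour on a made-up array. It is an approximation for illustration, not the Keras implementation.

```python
import numpy as np

# Hypothetical training data: 10 examples in their original order.
x = np.arange(10)
validation_split = 0.2

# The first 80% are used for fitting, the last 20% for validation.
split_at = int(len(x) * (1 - validation_split))
x_fit, x_val = x[:split_at], x[split_at:]

print(x_fit)
print(x_val)
```

Since `train_dataset` was produced by `DataFrame.sample`, its rows are already in random order, so holding out the tail is reasonable here.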

Visualize the model's training progress using the stats stored in the `history` object.

In [None]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [None]:
def plot_loss(history):
  plt.plot(history.history['loss'], label='loss')
  plt.plot(history.history['val_loss'], label='val_loss')
  plt.ylim([0, 100])
  plt.xlabel('Epoch')
  plt.ylabel('Error [csMPa]')
  plt.legend()
  plt.grid(True)

In [None]:
plot_loss(history)

Collect the results on the test set, for later:

In [None]:
test_results = {}

test_results['superplasticizer'] = superplasticizer_model.evaluate(
    test_features['superplasticizer'],
    test_labels, verbose=0)

Since this is a single-variable regression, it's easy to look at the model's predictions as a function of the input:

In [None]:
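# Evaluate the model over the observed range of the input feature.
x = tf.linspace(float(superplasticizer.min()), float(superplasticizer.max()), 101)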
y = superplasticizer_model.predict(x)

In [None]:
def plot_superplasticizer(x, y):
  plt.scatter(train_features['superplasticizer'], train_labels, label='Data')
  plt.plot(x, y, color='k', label='Predictions')
  plt.xlabel('superplasticizer')
  plt.ylabel('csMPa')
  plt.legend()

In [None]:
plot_superplasticizer(x,y)

### Multiple inputs

You can use an almost identical setup to make predictions based on multiple inputs. This model still does the same $y = mx+b$ except that $m$ is a matrix and $b$ is a vector.

This time use the `Normalization` layer that was adapted to the whole dataset.

In [None]:
linear_model = tf.keras.Sequential([
    normalizer,
    layers.Dense(units=1)
])

When you call this model on a batch of inputs, it produces `units=1` outputs for each example.

In [None]:
linear_model.predict(train_features[:10])

When you call the model, its weight matrices will be built. Now you can see that the `kernel` (the $m$ in $y=mx+b$) has a shape of `(8,1)`, one row per input feature.

In [None]:
linear_model.layers[1].kernel
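
The same computation can be sketched in NumPy to show how the shapes fit together. The inputs and weights below are random placeholders. The only assumption carried over from the dataset is that 8 feature columns remain once the `csMPa` label is removed.

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples, n_features = 10, 8   # 8 input columns remain after removing the label

# Hypothetical, already-normalized batch of inputs.
X = rng.normal(size=(n_examples, n_features))

# Dense(units=1): a kernel of shape (n_features, 1) and a bias of shape (1,).
kernel = rng.normal(size=(n_features, 1))
bias = np.zeros(1)

y = X @ kernel + bias
print(kernel.shape, y.shape)
```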

Use the same `compile` and `fit` calls as for the single-input `superplasticizer` model:

In [None]:
linear_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=0.1),
    loss='mean_absolute_error')

In [None]:
%%time
history = linear_model.fit(
    train_features, train_labels, 
    epochs=100,
    # suppress logging
    verbose=0,
    # Calculate validation results on 20% of the training data
    validation_split = 0.2)

Using all the inputs achieves a much lower training and validation error than the `superplasticizer` model:

In [None]:
plot_loss(history)

Collect the results on the test set, for later:

In [None]:
test_results['linear_model'] = linear_model.evaluate(
    test_features, test_labels, verbose=0)

## A DNN regression

The previous section implemented linear models for single and multiple inputs.

This section implements single-input and multiple-input DNN models. The code is basically the same except the model is expanded to include some "hidden" nonlinear layers. The name "hidden" here just means not directly connected to the inputs or outputs.

These models will contain a few more layers than the linear model:

* The normalization layer.
* Two hidden, nonlinear, `Dense` layers using the `relu` nonlinearity.
* A linear single-output layer.

Both will use the same training procedure so the `compile` method is included in the `build_and_compile_model` function below.

In [None]:
def build_and_compile_model(norm):
  model = keras.Sequential([
      norm,
      layers.Dense(64, activation='relu'),
      layers.Dense(64, activation='relu'),
      layers.Dense(1)
  ])

  model.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))
  return model
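
As a rough picture of what this architecture computes, here is a hand-written forward pass in NumPy: two ReLU hidden layers followed by a linear output. The helper names `relu` and `forward` and the random weights are illustrative only. A trained Keras model would have learned its own weights.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(z, 0.0)

def forward(x, params):
    """Forward pass: two ReLU hidden layers followed by a linear output."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(x @ w1 + b1)
    h2 = relu(h1 @ w2 + b2)
    return h2 @ w3 + b3

rng = np.random.default_rng(0)
n_features = 8
sizes = [(n_features, 64), (64, 64), (64, 1)]
# Random stand-in weights with zero biases.
params = [(rng.normal(scale=0.1, size=s), np.zeros(s[1])) for s in sizes]

batch = rng.normal(size=(5, n_features))  # hypothetical normalized inputs
print(forward(batch, params).shape)
```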

### One variable

Start with a DNN model for a single input: `superplasticizer`.

In [None]:
dnn_superplasticizer_model = build_and_compile_model(superplasticizer_normalizer)

This model has quite a few more trainable parameters than the linear models.

In [None]:
dnn_superplasticizer_model.summary()
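
The trainable parameter counts reported by `summary()` can be checked by hand: each `Dense` layer has one weight per input-unit pair plus one bias per unit. The small helpers below (`dense_params` and `count_params`, names chosen for this sketch) do that arithmetic. The `Normalization` layer's stored statistics are not trainable, so they are left out.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a Dense layer: weights plus biases."""
    return n_inputs * n_units + n_units

def count_params(n_inputs, layer_units):
    """Sum the Dense parameters of a stack of layers."""
    total = 0
    for units in layer_units:
        total += dense_params(n_inputs, units)
        n_inputs = units  # this layer's outputs feed the next layer
    return total

print(count_params(1, [1]))           # linear model, one input: 2
print(count_params(1, [64, 64, 1]))   # DNN, one input: 4353
print(count_params(8, [64, 64, 1]))   # DNN, eight inputs: 4801
```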

Train the model:

In [None]:
%%time
history = dnn_superplasticizer_model.fit(
    train_features['superplasticizer'], train_labels,
    validation_split=0.2,
    verbose=0, epochs=100)

This model does slightly better than the linear `superplasticizer` model.

In [None]:
plot_loss(history)

If you plot the predictions as a function of `superplasticizer`, you'll see how this model takes advantage of the nonlinearity provided by the hidden layers:

In [None]:
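# Evaluate the model over the observed range of the input feature.
x = tf.linspace(float(superplasticizer.min()), float(superplasticizer.max()), 101)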
y = dnn_superplasticizer_model.predict(x)

In [None]:
plot_superplasticizer(x, y)

Collect the results on the test set, for later:

In [None]:
test_results['dnn_superplasticizer_model'] = dnn_superplasticizer_model.evaluate(
    test_features['superplasticizer'], test_labels,
    verbose=0)

### Full model

If you repeat this process using all the inputs, performance on the validation dataset should improve.

In [None]:
dnn_model = build_and_compile_model(normalizer)
dnn_model.summary()

In [None]:
%%time


history = dnn_model.fit(
    train_features, train_labels,
    validation_split=0.2,
    verbose=0, epochs=100)

In [None]:
plot_loss(history)

Collect the results on the test set:

In [None]:
test_results['dnn_model'] = dnn_model.evaluate(test_features, test_labels, verbose=0)

## Performance

Now that all the models are trained, check the test-set performance and see how they did:

In [None]:
pd.DataFrame(test_results, index=['Mean absolute error [csMPa]']).T

These results should roughly match the validation error seen during training.

### Make predictions

Finally, have a look at the errors made by the model when making predictions on the test set:

In [None]:
test_predictions = dnn_model.predict(test_features).flatten()

a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [csMPa]')
plt.ylabel('Predictions [csMPa]')
lims = [0, max(test_labels.max(), test_predictions.max())]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)


It looks like the model predicts reasonably well. 

Now take a look at the error distribution:

In [None]:
error = test_predictions - test_labels
plt.hist(error, bins=25)
plt.xlabel('Prediction Error [csMPa]')
_ = plt.ylabel('Count')

If you're happy with the model save it for later use:

In [None]:
dnn_model.save('model')

If you reload the model, it gives identical output:

In [None]:
print(test_features)

In [None]:
reloaded = tf.keras.models.load_model('model')

test_results['reloaded'] = reloaded.evaluate(
    test_features, test_labels, verbose=0)

In [None]:
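# Take the first training row as a template and substitute values to build a new sample.
# Note: DataFrame.replace matches by value (not by column), so each listed old value
# is replaced wherever it occurs in the row.
new_data = train_features[:1].copy()
new_data = new_data.replace(
    [500.0, 0.0, 0.0, 200.0, 0.0, 1125.0, 613.0, 3],
    [498.0, 0.0, 0.0, 200.0, 0.0, 1125.0, 613.0, 4])
prediction = reloaded.predict(new_data)
print(prediction)
print(new_data)
# Save a diagram of the model architecture to model.png.
tf.keras.utils.plot_model(
    reloaded, to_file='model.png', show_shapes=False, show_layer_names=True,
    rankdir='TB', expand_nested=False, dpi=96)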

In [None]:
pd.DataFrame(test_results, index=['Mean absolute error [csMPa]']).T