<a href="https://colab.research.google.com/github/sargyri/Drop_Lev/blob/master/Machine_learning/Lev_droplet_water_predict_vol_volt_from_coord.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2018 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [0]:
#@title MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

# Lev Droplet 


A neural network (NN) that predicts the droplet volume and the voltage from the droplet coordinates in a polar coordinate system.

In [0]:
# Use seaborn for pairplot
!pip install seaborn
#!pip install talos

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import tensorflow as tf
#tf.disable_v2_behavior()

from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

### Get the data
First upload the dataset (a tab-separated CSV file) from your local machine to the Colab runtime.

In [0]:
from google.colab import files
uploaded = files.upload()

In [0]:
import io
csvfilename=str('TritonX100_conci_0.0494_all.csv')
input = pd.read_csv(io.BytesIO(uploaded[csvfilename]))
#input = pd.read_csv(csvfilename)
# Dataset is now stored in a Pandas Dataframe
#input.size

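Define the column names and import the data using pandas. Each row holds 300 points in polar coordinates (`rho_*`, `phi_*`, `theta_*`), followed by the droplet descriptors and the two targets (`volume` and `volt`).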

In [0]:
# 300 points per droplet: all rho values, then all phi values, then all theta values
column_names = (['rho_%d' % i for i in range(300)]
                + ['phi_%d' % i for i in range(300)]
                + ['theta_%d' % i for i in range(300)])
# Droplet descriptors and the two targets ('volume' and 'volt')
column_names += ['height', 'width', 'volume', 'R_sph', 'volt',
                 'TritonX_conc', 'Ar', 'st']
# Columns disabled in this version of the notebook:
# 'Intensity', 'time' (after 'volt'); 'ST_calib', 'Ps', 'Ps_err', 'Ps_stderr', 'Ps_dB' (after 'Ar')
raw_dataset = pd.read_csv(csvfilename, sep="\t", names=column_names, na_values="?",
                          comment='#', skipinitialspace=True)
dataset = raw_dataset.copy()
dataset.tail()
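
The column list above implies that each row stores the 300 `rho` values first, then the 300 `phi` values, then the 300 `theta` values. A minimal sketch of how one row can be regrouped into one `(rho, phi, theta)` triple per point, assuming that ordering; the helper `row_to_points` is hypothetical (not part of the notebook) and is shown on a toy row with only 4 points:

```python
import numpy as np

N_POINTS = 300  # points per droplet, as in the column list above

def row_to_points(row, n_points=N_POINTS):
  """Reshape a flat row [rho_0.., phi_0.., theta_0.., extras] into shape (n_points, 3)."""
  coords = np.asarray(row[:3 * n_points], dtype=float)
  # reshape(3, n) gives the rows (rho, phi, theta); transposing yields one point per row
  return coords.reshape(3, n_points).T

# Toy row with 4 points: rho = 0..3, phi = 10..13, theta = 20..23, then two extra columns
toy_row = list(range(4)) + list(range(10, 14)) + list(range(20, 24)) + [1.5, 2.5]
points = row_to_points(toy_row, n_points=4)
print(points.shape)  # (4, 3)
print(points[1])     # rho, phi, theta of the second point
```

Under the same assumption, `row_to_points(dataset.iloc[0].values)` would return a `(300, 3)` array for the first droplet.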

### Split the data into train and test

Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of our model.

In [0]:
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)
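
`sample(frac=0.8, random_state=0)` draws a reproducible random 80% of the rows, and `drop(train_dataset.index)` keeps the rest. A small sketch on a toy frame (standing in for `dataset`) that checks the two parts are disjoint and together cover every row:

```python
import pandas as pd

# Hypothetical toy frame standing in for `dataset`
toy = pd.DataFrame({'x': range(10), 'volume': range(10, 20)})

train = toy.sample(frac=0.8, random_state=0)  # 8 random rows, reproducible
test = toy.drop(train.index)                  # the remaining rows

print(len(train), len(test))  # 8 2
```

This relies on the index labels being unique, which holds for the default `RangeIndex` that `read_csv` assigns; with duplicate labels, `drop` could remove more rows than were sampled.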

### Inspect the data

Have a quick look at the joint distribution of a few pairs of columns from the training set.

In [0]:
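# A pairplot of all ~900 columns is impractical, so plot only a few of them
sns.pairplot(train_dataset[['height', 'width', 'R_sph', 'volume', 'volt']], diag_kind="kde")
plt.show()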




Also look at the overall statistics:

In [0]:
train_stats = train_dataset.describe()
train_stats.pop('volt')
train_stats.pop('volume')
train_stats = train_stats.transpose()
train_stats

### Split features from labels

Separate the target values, or "labels" (`volume` and `volt`), from the features. These labels are the values that you will train the model to predict.

In [0]:
train_labels = train_dataset[['volume', 'volt']]
train_dataset.pop('volume')
train_dataset.pop('volt')
test_labels = test_dataset[['volume', 'volt']]
test_dataset.pop('volume')
test_dataset.pop('volt')
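
`DataFrame.pop` removes a column in place and returns it. An equivalent sketch that leaves the input frame untouched by using `drop(columns=...)` instead; the helper `split_features_labels` and the toy frame are hypothetical:

```python
import pandas as pd

def split_features_labels(df, label_cols=('volume', 'volt')):
  """Return (features, labels) without modifying df in place."""
  labels = df[list(label_cols)].copy()
  features = df.drop(columns=list(label_cols))
  return features, labels

# Toy frame using the same label column names as the notebook
toy = pd.DataFrame({'rho_0': [1.0, 2.0], 'height': [0.5, 0.6],
                    'volume': [3.0, 4.0], 'volt': [10.0, 11.0]})
features, labels = split_features_labels(toy)
print(list(features.columns))  # ['rho_0', 'height']
print(list(labels.columns))    # ['volume', 'volt']
```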

### Normalize the data

Look again at the `train_stats` block above and note how different the ranges of each feature are.

It is good practice to normalize features that use different scales and ranges. Although the model *might* converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.

Note: Although we intentionally generate these statistics from only the training dataset, these statistics will also be used to normalize the test dataset. We need to do that to project the test dataset into the same distribution that the model has been trained on.

In [0]:
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
#normed_train_data = train_dataset
#normed_test_data = test_dataset
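
One caveat with `norm`: if a feature is constant across the training split, its standard deviation is zero and the division yields NaN or infinite values. We have not checked whether any column in this file is constant (for example `TritonX_conc`, since the file name suggests a single surfactant concentration). A minimal NumPy sketch, assuming such columns should be centered but left unscaled, that replaces a zero standard deviation by 1; `fit_standardizer` and `standardize` are hypothetical helpers:

```python
import numpy as np

def fit_standardizer(train):
  """Compute per-column mean and std from the training data only."""
  mean = train.mean(axis=0)
  std = train.std(axis=0, ddof=1)  # ddof=1 matches the std reported by pandas .describe()
  # A constant column has std == 0; dividing by it would give NaN or inf
  std = np.where(std == 0, 1.0, std)
  return mean, std

def standardize(x, mean, std):
  return (x - mean) / std

train = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])  # second column is constant
test = np.array([[2.0, 5.0]])
mean, std = fit_standardizer(train)  # statistics come from the training split only
normed_test = standardize(test, mean, std)
print(normed_test)
```

In the notebook, the same guard could be applied with `train_stats['std'].replace(0, 1)` before calling `norm`.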


This normalized data is what we will use to train the model.

Caution: The statistics used to normalize the inputs here (mean and standard deviation) need to be applied to any other data that is fed to the model. That includes the test set as well as live data when the model is used in production.
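
Because the model is only meaningful together with these training statistics, one option is to store them next to the saved model. A sketch that writes the per-column mean and standard deviation to JSON and reloads them to normalize new data; the file name `norm_stats.json` and both helper functions are hypothetical:

```python
import json
import os
import tempfile

import numpy as np

def save_norm_stats(path, columns, mean, std):
  """Write per-column mean/std to JSON so they can be reused at inference time."""
  stats = {c: {'mean': float(m), 'std': float(s)} for c, m, s in zip(columns, mean, std)}
  with open(path, 'w') as f:
    json.dump(stats, f)

def load_and_normalize(path, columns, x):
  """Normalize rows of x (columns ordered like `columns`) with the saved statistics."""
  with open(path) as f:
    stats = json.load(f)
  mean = np.array([stats[c]['mean'] for c in columns])
  std = np.array([stats[c]['std'] for c in columns])
  return (np.asarray(x, dtype=float) - mean) / std

path = os.path.join(tempfile.mkdtemp(), 'norm_stats.json')
save_norm_stats(path, ['height', 'width'], [2.0, 4.0], [0.5, 2.0])
normed = load_and_normalize(path, ['height', 'width'], [[2.5, 0.0]])
print(normed)
```

In the notebook this could be called as `save_norm_stats('norm_stats.json', train_stats.index, train_stats['mean'], train_stats['std'])`.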

## The model

### Build the model

Let's build our model. Here, we'll use a `Sequential` model with three densely connected hidden layers (800, 200, and 20 units with sigmoid activations) and an output layer that returns two continuous values: the volume and the voltage. The model building steps are wrapped in a function, `build_model`, since we'll create a second model later on.

In [0]:
def build_model():
  model = keras.Sequential([
    layers.Dense(800, activation=tf.nn.sigmoid, input_shape=[len(train_dataset.keys())]),
    layers.Dense(200, activation=tf.nn.sigmoid),
    layers.Dense(20, activation=tf.nn.sigmoid),
    layers.Dense(2)
  ])

  #optimizer = tf.keras.optimizers.RMSprop(0.0000001)
  #optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.0001)
  optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
  #optimizer = tf.keras.optimizers.Adamax(learning_rate=0.001, beta_1=0.8, beta_2=0.995)

  #model.compile(loss='mean_squared_error',
  #              optimizer=optimizer,
  #              metrics=['mean_absolute_error', 'mean_squared_error'])
  
  model.compile(loss='mean_absolute_error',
                optimizer=optimizer,
                metrics=['mean_absolute_error', 'mean_squared_error'])
  
  
  return model

In [0]:
model = build_model()

### Inspect the model

Use the `.summary()` method to print a simple description of the model.

In [0]:
model.summary()
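
A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below applies this to the layer sizes in `build_model`, assuming all 908 columns except the two labels are used as features (906 inputs); if that assumption holds, the total should match the one reported by `model.summary()`:

```python
def dense_param_count(layer_sizes):
  """Weights + biases of a stack of fully connected layers.

  layer_sizes = [n_inputs, hidden_1, ..., n_outputs]
  """
  return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 3 x 300 coordinates + 8 other columns, minus the 2 labels
n_features = 3 * 300 + 8 - 2
print(n_features)                                        # 906
print(dense_param_count([n_features, 800, 200, 20, 2]))  # 889862
```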


Now try out the model. Take a batch of `10` examples from the training data and call `model.predict` on it.

In [0]:
example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result

It seems to be working, and it produces a result of the expected shape and type.

### Train the model

Train the model for `EPOCHS` (200) epochs, and record the training and validation errors in the `history` object.

In [0]:
# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs):
    if epoch % 100 == 0: print('')
    print('.', end='')

EPOCHS = 200

history = model.fit(
  normed_train_data, train_labels,
  epochs=EPOCHS, validation_split = 0.2, verbose=0,
  callbacks=[PrintDot()])
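
With `validation_split=0.2`, Keras holds out the last 20% of the samples passed to `fit` as validation data; the Keras documentation states that this slice is taken before any shuffling. Because `train_dataset` was produced by `sample`, its rows are already in random order, so the held-out slice is effectively a random subset. A NumPy sketch of the slicing; the helper name is hypothetical, and the exact rounding Keras uses may differ between versions:

```python
import math

import numpy as np

def keras_style_validation_split(x, y, validation_split):
  """Hold out the *last* fraction of samples, as validation_split does."""
  split_at = int(math.floor(len(x) * (1.0 - validation_split)))
  return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(10).reshape(10, 1)
y = np.arange(10) * 2.0
(x_tr, y_tr), (x_val, y_val) = keras_style_validation_split(x, y, 0.2)
print(len(x_tr), len(x_val))  # 8 2
```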

Visualize the model's training progress using the stats stored in the `history` object.

In [0]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [0]:
def plot_history(history):
  hist = pd.DataFrame(history.history)
  hist['epoch'] = history.epoch

  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Abs Error [volume and volt, averaged]')
  plt.plot(hist['epoch'], hist['mean_absolute_error'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mean_absolute_error'],
           label = 'Val Error')
  plt.ylim([0,5])
  plt.legend()

  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Square Error [volume and volt, averaged]')
  plt.plot(hist['epoch'], hist['mean_squared_error'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mean_squared_error'],
           label = 'Val Error')
  plt.ylim([0,20])
  plt.legend()
  plt.show()


plot_history(history)

If the graph shows little improvement, or even a degradation, in the validation error after a certain number of epochs, update the `model.fit` call to stop training automatically when the validation score stops improving. We'll use an *EarlyStopping callback* that checks a training condition at the end of every epoch. If a set number of epochs elapses without improvement, training stops automatically.

You can learn more about this callback [here](https://www.tensorflow.org/versions/master/api_docs/python/tf/keras/callbacks/EarlyStopping).
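
The patience logic can be illustrated with a simplified pure-Python sketch. It ignores `min_delta`, `baseline`, and `restore_best_weights`, so it approximates rather than reproduces `keras.callbacks.EarlyStopping`; the function name is hypothetical:

```python
def early_stopping_epoch(val_losses, patience):
  """Return the 0-based epoch after which training would stop."""
  best = float('inf')
  wait = 0
  for epoch, loss in enumerate(val_losses):
    if loss < best:   # improvement: remember it and reset the counter
      best = loss
      wait = 0
    else:             # no improvement in this epoch
      wait += 1
      if wait >= patience:
        return epoch
  return len(val_losses) - 1  # all epochs ran without triggering the stop

print(early_stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.4], patience=3))  # 5
```

With `EPOCHS = 200` and `patience=100` in the cell below, training can end early only if the validation loss stops improving within roughly the first 100 epochs.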

In [0]:
model = build_model()

# The patience parameter is the number of epochs with no improvement after which training stops
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)

history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])

plot_history(history)

The graph shows the validation error over the course of training. Whether that error is small enough depends on the precision you need for the volume and voltage estimates; we'll leave that decision up to you.

Let's see how well the model generalizes by using the **test** set, which we did not use when training the model. This tells us how well we can expect the model to predict when we use it in the real world.

In [0]:
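loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)

# Keras averages the error over both outputs (volume in μL and volt in V), so units are mixed
print("Testing set Mean Abs Error (volume and volt averaged): {:5.2f}".format(mae))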
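
`model.evaluate` reports a single MAE that averages the absolute errors of both outputs, so the volume and voltage errors are folded into one number. A sketch that reports them separately; the helper `per_output_mae` and the toy arrays are hypothetical:

```python
import numpy as np

def per_output_mae(y_true, y_pred):
  """MAE for each target column separately (column 0: volume, column 1: volt)."""
  return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)), axis=0)

y_true = np.array([[10.0, 100.0], [20.0, 200.0]])
y_pred = np.array([[11.0, 90.0], [19.0, 230.0]])
mae_per_output = per_output_mae(y_true, y_pred)
print(mae_per_output)         # volume MAE 1.0, volt MAE 20.0
print(mae_per_output.mean())  # 10.5, the single combined number
```

In the notebook: `per_output_mae(test_labels.values, model.predict(normed_test_data))`.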

### Make predictions

Finally, predict the volume and voltage for both the test set and the training set, and compare the predictions with the true values:

In [0]:
test_predictions = model.predict(normed_test_data)
train_predictions = model.predict(normed_train_data)

def plot_true_vs_predicted(true_values, predicted_values, title, quantity):
  """Scatter true vs. predicted values with a y = x reference line."""
  plt.scatter(true_values, predicted_values)
  plt.title(title)
  plt.xlabel('True Values [%s]' % quantity)
  plt.ylabel('Predictions [%s]' % quantity)
  plt.axis('equal')
  plt.axis('square')
  plt.xlim([0, plt.xlim()[1]])
  plt.ylim([0, plt.ylim()[1]])
  _ = plt.plot([-100, 100], [-100, 100])
  plt.show()

# Column 0: volume, column 1: voltage
plot_true_vs_predicted(test_labels.iloc[:, 0], test_predictions[:, 0],
                       'Normalized test data', 'volume - μL')
plot_true_vs_predicted(train_labels.iloc[:, 0], train_predictions[:, 0],
                       'Normalized training data', 'volume - μL')
plot_true_vs_predicted(test_labels.iloc[:, 1], test_predictions[:, 1],
                       'Normalized test data', 'voltage - Volt')
plot_true_vs_predicted(train_labels.iloc[:, 1], train_predictions[:, 1],
                       'Normalized training data', 'voltage - Volt')


Now look at the distribution of the prediction errors on the test set for each target:

In [0]:
error = test_predictions[:, 0] - test_labels.iloc[:, 0]
plt.hist(error, bins = 50)
plt.xlabel("Prediction Error [Volume - μL]")
_ = plt.ylabel("Count")
plt.show()
#################################################################
error = test_predictions[:, 1] - test_labels.iloc[:, 1]
plt.hist(error, bins = 50)
plt.xlabel("Prediction Error [Voltage - Volt]")
_ = plt.ylabel("Count")
plt.show()
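
The histograms can also be summarized numerically: the mean error indicates a systematic bias and the standard deviation indicates the spread. A minimal sketch using the population standard deviation (`ddof=0`, NumPy's default); the helper `error_summary` and the toy arrays are hypothetical:

```python
import numpy as np

def error_summary(y_true, y_pred, names=('volume', 'volt')):
  """Mean (bias) and standard deviation of prediction errors per target."""
  err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
  return {name: (err[:, i].mean(), err[:, i].std()) for i, name in enumerate(names)}

y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[2.0, 9.0], [3.0, 21.0], [4.0, 30.0]])
summary = error_summary(y_true, y_pred)
print(summary)  # volume: constant +1 bias, no spread; volt: no bias, some spread
```

In the notebook: `error_summary(test_labels.values, test_predictions)`.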

# Saving the Model

When you are satisfied with the model and its accuracy, continue from here.

Save the model in the **SavedModel** format:

In [0]:
!pip install -q pyyaml h5py  # Required to save models in HDF5 format

In [0]:
import os

In [0]:
# Save the entire model as a SavedModel.

!mkdir -p saved_model  # Creates a folder named "saved_model"
model.save('saved_model/DropLev_volt_vol_prediction_model') 

The SavedModel format is a directory containing a protobuf binary and a TensorFlow checkpoint. Inspect the saved model directory:

In [0]:
# DropLev_volt_vol_prediction_model directory
!ls saved_model

# Contains an assets folder, saved_model.pb, and variables folder.
!ls saved_model/DropLev_volt_vol_prediction_model

Save the model in the **HDF5** format:

In [0]:
# Save the entire model to a HDF5 file.
# The '.h5' extension indicates that the model should be saved to HDF5.
model.save('DropLev_volt_vol_prediction_model.h5') 

# Reload a fresh Keras model from the saved model:

For **SavedModel** format

In [0]:
new_model = tf.keras.models.load_model('saved_model/DropLev_volt_vol_prediction_model')

# Check its architecture
new_model.summary()

For **HDF5** format

In [0]:
# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('DropLev_volt_vol_prediction_model.h5')

# Show the model architecture
new_model.summary()

Evaluate the restored model (for both formats)

In [0]:
# Evaluation
loss, mae, mse = new_model.evaluate(normed_test_data, test_labels)#, verbose=0)
print('Restored model, Mean absolute error: {:5.2f}'.format(mae))

print(new_model.predict(normed_test_data).shape)