## Challenge exercise

Create a neural network that is capable of finding the volume of a cylinder given the radius of its base (r) and its height (h). Assume that the radius and height of the cylinder are both in the range 0.5 to 2.0. Unlike in the challenge exercise for c_estimator.ipynb, assume that your measurements of r, h and V are all rounded off to the nearest 0.1. Simulate the necessary training dataset. This time, you will need a lot more data to get a good predictor.

Hint (highlight to see):
<p style='color:white'>
Create random values for r and h and compute V. Then, round off r, h and V (i.e., the volume is computed from the true value of r and h; it's only your measurement that is rounded off). Your dataset will consist of the round values of r, h and V. Do this for both the training and evaluation datasets.
</p>

Now modify the "noise" so that instead of just rounding off the value, there is up to a 10% error (uniformly distributed) in the measurement followed by rounding off.

In [1]:
import tensorflow as tf
import shutil
print(tf.__version__)

1.15.0


In [2]:
import pandas as pd
import numpy as np
from typing import Tuple, Dict

In [3]:
tf.enable_eager_execution()

# Creating fake data

In [4]:
import math

def add_round_noise(v: float) -> float:
    return math.floor(v * 10 + 0.5) / 10

def volume(r: float, h: float) -> float:
    return add_round_noise(math.pi * r ** 2 * h)

volume(2, 3)

37.7

In [5]:
def rand(data_size, min_val, max_val):
    r = np.random.random(data_size)
    return list((r + min_val) * (max_val - min_val))

rand(3, 0.5, 2.0)

[1.2742404469366355, 2.2051396315068557, 1.9081067880572844]

In [6]:
def create_data(file_path, data_size):
    df = pd.DataFrame({
        'r': rand(train_data_size, 0.5, 2.0),
        'h': rand(train_data_size, 0.5, 2.0),
    })
    df['v'] = df.apply(lambda x: volume(x['r'], x['h']), axis=1)
    df.to_csv(file_path, index=False)

In [7]:
train_data_size = 100000
train_csv_path = 'cylinder_train.csv'
eval_data_size = 1000
eval_csv_path = 'cylinder_eval.csv'

create_data(train_csv_path, train_data_size)
create_data(eval_csv_path, eval_data_size)

In [8]:
!head 'cylinder_train.csv'

h,r,v
1.696460927899504,1.6230925450593552,14.0
2.0359477976286646,1.5057178587118276,14.5
0.9006614468655583,1.0591731938427562,3.2
1.4151538120003724,1.9466229873368162,16.8
1.4948395876296874,0.8571580037105848,3.5
2.052116860978505,1.593828024225184,16.4
1.7112787281230688,1.610704828978848,13.9
1.180002745604126,1.180618590214592,5.2
1.7225918047720534,0.8837887689581585,4.2


In [9]:
!head 'cylinder_eval.csv'

h,r,v
0.888658107415006,1.550455670358788,6.7
1.910877694137314,1.1957766222808017,8.6
1.924271372383691,0.9403067950068089,5.3
1.0671182789484401,1.7583366830839484,10.4
0.9196097568858635,1.7855797677458498,9.2
1.6183310587875859,1.9954173688282038,20.2
1.110292591661596,1.874900388051557,12.3
1.459788824484539,1.2149361556919744,6.8
1.6461979292623592,1.123337340116479,6.5


# Data Function

In [10]:
FEATURE_NAMES = ['h', 'r']

def parse_row(row: str) -> Tuple[Dict[str, tf.Tensor], tf.Tensor]:
    cols = tf.decode_csv(row, record_defaults=[0.0, 0.0, 0.0])
    label = cols.pop()
    features = dict(zip(FEATURE_NAMES, cols))
    return features, label

parse_row('1.0, 2.0, 3.0')

({'h': <tf.Tensor: id=4, shape=(), dtype=float32, numpy=1.0>,
  'r': <tf.Tensor: id=5, shape=(), dtype=float32, numpy=2.0>},
 <tf.Tensor: id=6, shape=(), dtype=float32, numpy=3.0>)

In [11]:
def training_input_fn(path, batch_size = 128):
    dataset = tf.data.TextLineDataset(path).skip(1).map(parse_row).cache()
    return dataset.repeat().shuffle(1024).batch(batch_size)

def eval_input_fn(path, batch_size = 128):
    return tf.data.TextLineDataset(path).skip(1).map(parse_row).batch(batch_size)

In [12]:
next(iter(training_input_fn(train_csv_path, batch_size=2)))

({'h': <tf.Tensor: id=51, shape=(2,), dtype=float32, numpy=array([0.9521232, 1.8231676], dtype=float32)>,
  'r': <tf.Tensor: id=52, shape=(2,), dtype=float32, numpy=array([2.0135014, 1.4123929], dtype=float32)>},
 <tf.Tensor: id=53, shape=(2,), dtype=float32, numpy=array([12.1, 11.4], dtype=float32)>)

In [13]:
next(iter(eval_input_fn(eval_csv_path, batch_size=2)))

({'h': <tf.Tensor: id=89, shape=(2,), dtype=float32, numpy=array([0.8886581, 1.9108777], dtype=float32)>,
  'r': <tf.Tensor: id=90, shape=(2,), dtype=float32, numpy=array([1.5504557, 1.1957766], dtype=float32)>},
 <tf.Tensor: id=91, shape=(2,), dtype=float32, numpy=array([6.7, 8.6], dtype=float32)>)

# Model

In [14]:
feature_columns = [tf.feature_column.numeric_column(name) for name in FEATURE_NAMES]
model_dir = 'cy_model'
lr = 0.001

shutil.rmtree(model_dir, ignore_errors=True)

model = tf.estimator.DNNRegressor(
    hidden_units=[128, 128],
    feature_columns=feature_columns,
    model_dir=model_dir,
    optimizer=tf.train.AdamOptimizer(lr),
    config=tf.estimator.RunConfig(tf_random_seed=1),
)

INFO:tensorflow:Using config: {'_eval_distribute': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_task_type': 'worker', '_num_ps_replicas': 0, '_model_dir': 'cy_model', '_service': None, '_device_fn': None, '_experimental_max_worker_delay_secs': None, '_is_chief': True, '_task_id': 0, '_experimental_distribute': None, '_evaluation_master': '', '_log_step_count_steps': 100, '_save_checkpoints_steps': None, '_num_worker_replicas': 1, '_protocol': None, '_train_distribute': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_master': '', '_tf_random_seed': 1, '_session_creation_timeout_secs': 7200, '_keep_checkpoint_max': 5, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc2d93a71d0>, '_global_id_in_cluster': 0, '_keep_checkpoint_every_n_hours': 10000}


In [17]:
model.train(lambda: training_input_fn(train_csv_path, batch_size=128), steps=10000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from cy_model/model.ckpt-1000
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1000 into cy_model/model.ckpt.
INFO:tensorflow:loss = 22.795193, step = 1000
INFO:tensorflow:global_step/sec: 126.673
INFO:tensorflow:loss = 13.352621, step = 1100 (0.791 sec)
INFO:tensorflow:global_step/sec: 137.247
INFO:tensorflow:loss = 8.445463, step = 1200 (0.730 sec)
INFO:tensorflow:global_step/sec: 135.405
INFO:tensorflow:loss = 8.710244, step = 1300 (0.736 sec)
INFO:tensorflow:global_step/sec: 141.625
INFO:tensorflow:loss = 5.968074, step = 1400 (0.706 sec)
INFO:tensorflow:global_step/sec: 134.454
INFO:tensorflow:loss = 5.1130958, step = 1500 (0.744 sec)
INFO:tensorflow:gl

<tensorflow_estimator.python.estimator.canned.dnn.DNNRegressor at 0x7fc2d93e4668>

In [18]:
metrics = model.evaluate(lambda: eval_input_fn(eval_csv_path))
metrics['average_loss']**0.5

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-11-21T13:48:35Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from cy_model/model.ckpt-11000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-11-21-13:48:41
INFO:tensorflow:Saving dict for global step 11000: average_loss = 0.002301367, global_step = 11000, label/mean = 11.455752, loss = 0.29429245, prediction/mean = 11.459977
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 11000: cy_model/model.ckpt-11000


0.04797256507364776