In [1]:
import tensorflow as tf
from tensorflow.feature_column import numeric_column
from tensorflow.feature_column import crossed_column
from tensorflow.feature_column import indicator_column
from tensorflow.feature_column import categorical_column_with_identity
from tensorflow_transform.tf_metadata import dataset_schema

tf.__version__

'1.13.1'

In [2]:
import os

# Read the directory that holds the training TFRecord shards
with open('temp_dir.txt') as f:
    temp_dir = f.read()
file_pattern = os.path.join(temp_dir, "training.tfr-*")
file_pattern

'C:\\Users\\wgi\\AppData\\Local\\Temp\\tmpj9tjlo0n\\training.tfr-*'

We use `make_tfr_input_fn`, as described in [InputFunctions.ipynb](InputFunctions.ipynb):

In [3]:
from training_functions import make_tfr_input_fn

In [4]:
train_input_fn = make_tfr_input_fn(
    filename_pattern=file_pattern,
    batch_size=2, 
    options={'num_epochs': None,  # repeat infinitely
             'shuffle_buffer_size': 1000,
             'prefetch_buffer_size': 1000,
             'reader_num_threads': 10,
             'parser_num_threads': 10,
             'sloppy_ordering': True,
             'distribute': False})

### Creating the input layer
We create a $170$-dimensional input layer: $168$ dimensions for the one-hot-encoded hour of the week and two more for $\beta_1$ and $\beta_2$. We start from the original data, except that $\beta_1$ and $\beta_2$ have already been processed, i.e. scaled.

In [5]:
features = train_input_fn()[0]  # keep only the features; the 'humidity' label is omitted
features

{'beta1': <tf.Tensor 'IteratorGetNext:0' shape=(2, 1) dtype=float32>,
 'beta2': <tf.Tensor 'IteratorGetNext:1' shape=(2, 1) dtype=float32>,
 'hour': <tf.Tensor 'IteratorGetNext:2' shape=(2, 1) dtype=int64>,
 'weekday': <tf.Tensor 'IteratorGetNext:3' shape=(2, 1) dtype=int64>}

The betas are simple numeric columns.

In [6]:
beta1 = numeric_column('beta1')
beta2 = numeric_column('beta2')

---
Remember: there were particular hours on particular days at which the quality of our humidity prediction suddenly decreased significantly. We therefore encode the hour of the week, on the assumption that it has a substantial influence on the problem.
We create a $24 \times 7 = 168$-dimensional feature cross for the one-hot-encoded *hour of the week*.

In [7]:
weekday = categorical_column_with_identity('weekday', num_buckets=7)
hour = categorical_column_with_identity('hour', num_buckets=24)
# crossed_column hashes each (weekday, hour) pair into one of 24*7 buckets
hour_of_week = indicator_column(crossed_column([weekday, hour], hash_bucket_size=24*7))
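To make the $24 \times 7 = 168$ arithmetic concrete, here is a minimal NumPy sketch of a one-hot *hour of the week* encoding. It is not what `crossed_column` does internally: `crossed_column` hashes each (weekday, hour) pair into a bucket, so bucket positions follow no simple formula and two pairs may in principle share a bucket. The helper `hour_of_week_one_hot` and the index formula `weekday * 24 + hour` are assumptions made for illustration only.

```python
import numpy as np

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def hour_of_week_one_hot(weekday, hour):
    """Return a (batch, 168) one-hot array for integer weekday/hour values.

    Hypothetical helper: uses the deterministic index weekday * 24 + hour
    instead of the hashing that tf.feature_column.crossed_column applies.
    """
    weekday = np.asarray(weekday).reshape(-1)
    hour = np.asarray(hour).reshape(-1)
    index = weekday * HOURS_PER_DAY + hour
    one_hot = np.zeros((index.size, HOURS_PER_DAY * DAYS_PER_WEEK),
                       dtype=np.float32)
    # Set exactly one entry per row to 1
    one_hot[np.arange(index.size), index] = 1.0
    return one_hot


encoded = hour_of_week_one_hot(weekday=[0, 6], hour=[5, 23])
print(encoded.shape)           # two rows, 168 columns
print(encoded.argmax(axis=1))  # positions 5 and 167
```

With this deterministic index, every hour of the week gets its own position, which is the behaviour the hashed cross only approximates.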

In [8]:
all_feature_columns = [beta1, beta2, hour_of_week]

input_layer = tf.feature_column.input_layer(
    features,
    feature_columns=all_feature_columns)
input_layer

Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
Instructions for updating:
Use tf.cast instead.

<tf.Tensor 'input_layer/concat:0' shape=(2, 170) dtype=float32>

In [9]:
with tf.Session() as sess:
    inp170 = sess.run(input_layer)

Below you can see that we have 2 records (the batch size we chose). Each consists of two float features, the $\beta$s, followed by $168$ values of which exactly one is $1$; its position indicates the hour of the week at which the $\beta$s were measured. What may look like a massive waste is actually a common and effective way of handling categorical values in machine learning.
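The row layout described here can be checked with a short sketch. It builds a synthetic 170-dimensional row rather than using `inp170`, so it runs without TensorFlow. The helper `split_input_row` is hypothetical, and the layout (two numeric values first, then the 168 one-hot entries) is assumed from the output shown in this notebook.

```python
import numpy as np


def split_input_row(row, num_numeric=2):
    """Split one input row into its numeric part and the active one-hot position.

    Hypothetical helper: assumes the numeric features come first, followed
    by a one-hot block containing a single 1.
    """
    numeric = row[:num_numeric]
    one_hot = row[num_numeric:]
    bucket = int(np.argmax(one_hot))  # index of the single 1
    return numeric, bucket


# Synthetic row: two made-up beta values and a 1 at one-hot position 17
row = np.zeros(170, dtype=np.float32)
row[:2] = [0.42, 0.91]
row[2 + 17] = 1.0

betas, bucket = split_input_row(row)
print(betas.shape, bucket)
```

For rows produced by the hashed feature cross, `bucket` is a hash bucket, so it identifies an hour of the week but cannot be converted back to a weekday and hour by simple arithmetic.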

In [10]:
inp170

array([[0.4236383 , 0.90770316, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.  