https://colab.research.google.com/notebooks/mlcc/first_steps_with_tensor_flow.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=firststeps-colab&hl=en#scrollTo=EL8-9d4ZJNR7

Very broadly speaking, here's the pseudocode for a linear classification program implemented in tf.estimator:

In [1]:
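import tensorflow as tf

# Pseudocode: feature_columns, train_input_fn and predict_input_fn are
# placeholders here; the steps below build the equivalent pieces
# (feature columns and an input function) for a regression model.

# Set up a linear classifier.
classifier = tf.estimator.LinearClassifier(feature_columns=feature_columns)

# Train the model on some example data.
classifier.train(input_fn=train_input_fn, steps=2000)

# Use it to predict.
predictions = classifier.predict(input_fn=predict_input_fn)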


Load the necessary libraries.

In [8]:
from __future__ import print_function

import math
from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

Next, we'll load our data set.

In [9]:
california_housing_dataframe = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv", sep=",")

Next, we'll randomise the data to make sure there are no pathological ordering effects that could harm the performance of stochastic gradient descent.
We'll also scale `median_house_value` to units of thousands, so that it can be learned a little more easily with learning rates in a typical range.

In [10]:
california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))
california_housing_dataframe["median_house_value"] /= 1000.0
california_housing_dataframe

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 1703 | -117.2 | 32.7 | 32.0 | 4164.0 | 701.0 | 1277.0 | 607.0 | 6.7 | 500.0 |
| 15727 | -122.4 | 37.8 | 52.0 | 464.0 | 202.0 | 286.0 | 148.0 | 1.6 | 112.5 |
| 3976 | -118.0 | 33.8 | 25.0 | 1323.0 | 208.0 | 852.0 | 229.0 | 4.6 | 237.3 |
| 12004 | -121.4 | 36.2 | 28.0 | 1057.0 | 249.0 | 288.0 | 130.0 | 3.1 | 146.9 |
| 7341 | -118.3 | 33.8 | 33.0 | 2194.0 | 469.0 | 987.0 | 397.0 | 5.1 | 318.9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10079 | -119.8 | 36.7 | 31.0 | 2214.0 | 432.0 | 1326.0 | 416.0 | 2.2 | 66.7 |
| 6199 | -118.2 | 34.0 | 44.0 | 448.0 | 116.0 | 504.0 | 96.0 | 1.9 | 98.6 |
| 11179 | -121.0 | 37.5 | 32.0 | 946.0 | 198.0 | 624.0 | 173.0 | 2.0 | 97.9 |
| 13659 | -122.0 | 37.6 | 31.0 | 2878.0 | 478.0 | 1276.0 | 485.0 | 6.2 | 282.5 |
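As a rough illustration of why the rescaling helps, consider a squared-error loss (the usual loss for linear regression). At a starting bias of 0, its gradient with respect to the bias is proportional to the mean label, so labels in dollars produce steps roughly 1,000 times larger than labels in thousands at the same learning rate. The sketch below uses the rounded sample values shown above; `mse_bias_gradient` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Rounded median_house_value sample rows from the table above, in thousands.
scaled = np.array([500.0, 112.5, 237.3, 146.9, 318.9, 66.7, 98.6, 97.9, 282.5])
raw = scaled * 1000.0  # the same values expressed in dollars


def mse_bias_gradient(targets, bias=0.0):
    """Gradient of mean((bias - y)^2) with respect to a constant bias."""
    # d/db mean((b - y)^2) = 2 * mean(b - y)
    return 2.0 * np.mean(bias - targets)


grad_raw = mse_bias_gradient(raw)
grad_scaled = mse_bias_gradient(scaled)
print(grad_raw, grad_scaled, grad_raw / grad_scaled)
```

The ratio is about 1,000, so a learning rate tuned for one scale would be far too large or far too small for the other.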


## Examine the Data
It's a good idea to get to know the data a little before working with it.

In [11]:
california_housing_dataframe.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207.3 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 116.0 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 15.0 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119.4 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180.4 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265.0 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500.0 |
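To make the rows of `describe()` concrete, here is a minimal NumPy sketch that computes the same statistics for a single column. It assumes pandas' defaults (sample standard deviation with `ddof=1` and linearly interpolated percentiles); `describe` here is a hypothetical helper, applied to the rounded `median_house_value` sample rows shown earlier rather than to the full data set.

```python
import numpy as np


def describe(values):
    """Summary statistics in the spirit of pandas' DataFrame.describe()."""
    values = np.asarray(values, dtype=float)
    return {
        "count": values.size,
        "mean": values.mean(),
        "std": values.std(ddof=1),          # sample standard deviation
        "min": values.min(),
        "25%": np.percentile(values, 25),   # linear interpolation
        "50%": np.percentile(values, 50),   # the median
        "75%": np.percentile(values, 75),
        "max": values.max(),
    }


# Rounded median_house_value values from the sample rows shown earlier.
sample = [500.0, 112.5, 237.3, 146.9, 318.9, 66.7, 98.6, 97.9, 282.5]
for name, value in describe(sample).items():
    print(f"{name:>5}: {value:.1f}")
```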


## Build the first model
In this exercise, we'll try to predict `median_house_value`, which will be our label. We'll use `total_rooms` as our input feature.

**NOTE:** The data is at the city-block level, so this feature represents the total number of rooms in each block.

To train the model, we'll use the `LinearRegressor` interface provided by the TensorFlow Estimator API. This API takes care of much of the low-level model plumbing and exposes convenient methods for model training, evaluation, and inference.

### Step 1: Define Features and Configure Feature Columns
In order to import our training data into TensorFlow, we need to specify what type of data each feature contains. There are two main types of data that will be used in this course:
  - **Categorical Data:** Textual data
  - **Numerical Data:** Integer or float

In TensorFlow, we indicate a feature's data type using a construct called a **feature column**. Feature columns store only a description of the feature data; they do not contain the feature data itself.
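To illustrate the idea that a feature column is only a description, here is a plain-Python sketch. `NumericColumnSpec` and `CategoricalColumnSpec` are hypothetical stand-ins, not TensorFlow classes: each stores only a key (plus a vocabulary in the categorical case) and looks up the actual values in a separate features dict when asked. The `city_type` column is invented for the example; the California housing data used here has only numeric columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class NumericColumnSpec:
    """Hypothetical numeric column: stores only the feature's key."""
    key: str

    def transform(self, features: Dict[str, Sequence]) -> List[float]:
        # The values live in `features`; the column only knows where to look.
        return [float(v) for v in features[self.key]]


@dataclass(frozen=True)
class CategoricalColumnSpec:
    """Hypothetical categorical column: a key plus a fixed vocabulary."""
    key: str
    vocabulary: Tuple[str, ...]

    def transform(self, features: Dict[str, Sequence]) -> List[int]:
        # Map each string to its vocabulary index (-1 for unknown values).
        return [self.vocabulary.index(v) if v in self.vocabulary else -1
                for v in features[self.key]]


# The feature data is kept separately from the column descriptions.
features = {
    "total_rooms": [4164, 464, 1323],              # numerical
    "city_type": ["urban", "rural", "suburban"],   # categorical (hypothetical)
}
rooms = NumericColumnSpec("total_rooms")
city = CategoricalColumnSpec("city_type", ("urban", "rural"))
print(rooms.transform(features))  # [4164.0, 464.0, 1323.0]
print(city.transform(features))   # [0, 1, -1]
```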

To start, we're going to use just one numeric input feature, `total_rooms`. The following code pulls the `total_rooms` data from our `california_housing_dataframe` and defines the feature column using `numeric_column`, which specifies its data is numeric:

In [12]:
# Define the input feature: total_rooms.
my_feature = california_housing_dataframe[["total_rooms"]]

# Configure a numeric feature column for total_rooms.
feature_columns = [tf.feature_column.numeric_column("total_rooms")]

**NOTE:** Each example in the `total_rooms` data is a single number (the total number of rooms in one block), so the data as a whole is a one-dimensional array. One value per example matches the default `shape=(1,)` of `numeric_column`, so we don't have to pass a shape argument.

### Step 2: Define the Target
Next, we'll define our target, which is `median_house_value`. Again, we can pull it from our `california_housing_dataframe`:

In [13]:
# Define the label.
targets = california_housing_dataframe["median_house_value"]

### Step 3: Configure the LinearRegressor
Next, we'll configure a linear regression model using `LinearRegressor`. We'll train this model using the `GradientDescentOptimizer`, which implements Mini-Batch Stochastic Gradient Descent (SGD). The `learning_rate` argument controls the size of each gradient step.
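A minimal NumPy sketch of mini-batch SGD for a one-feature linear model shows what `learning_rate` controls. It is an illustration under simple assumptions (mean squared error loss, noiseless synthetic data, random batches drawn without replacement), not how `GradientDescentOptimizer` is implemented; `minibatch_sgd` is a hypothetical helper.

```python
import numpy as np


def minibatch_sgd(x, y, learning_rate=0.01, batch_size=4, steps=500, seed=0):
    """Fit y ~ w * x + b with mini-batch stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Sample a random mini-batch of examples.
        idx = rng.choice(len(x), size=batch_size, replace=False)
        xb, yb = x[idx], y[idx]
        error = (w * xb + b) - yb
        # Gradients of the mean squared error over the mini-batch only.
        grad_w = 2.0 * np.mean(error * xb)
        grad_b = 2.0 * np.mean(error)
        # learning_rate scales the size of each gradient step.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 1.0  # noiseless line: slope 3, intercept 1
w, b = minibatch_sgd(x, y, learning_rate=0.1, steps=2000)
print(round(w, 2), round(b, 2))
```

On this noiseless line the sketch should recover values close to the true slope 3 and intercept 1. With a far smaller learning rate, each step is tiny and many more steps are needed to get there.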

**NOTE:** To be safe, we also apply gradient clipping to our optimiser via `clip_gradients_by_norm`. Gradient clipping ensures the magnitude of the gradients does not become too large during training, which could cause gradient descent to fail.
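The core of clipping by norm can be sketched in a few lines of NumPy: if a gradient's L2 norm exceeds the threshold, the gradient is rescaled to exactly that norm while keeping its direction. This sketch clips a single gradient vector; whether it matches every detail of `clip_gradients_by_norm` (which may combine the gradients of several variables differently) is an assumption. `clip_by_norm` here is a hypothetical helper.

```python
import numpy as np


def clip_by_norm(grad, clip_norm):
    """Rescale grad so its L2 norm is at most clip_norm (direction unchanged)."""
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        return grad * (clip_norm / norm)
    return grad


large = np.array([30.0, 40.0])   # L2 norm 50: will be scaled down to norm 5
small = np.array([0.3, 0.4])     # L2 norm 0.5: left unchanged
print(clip_by_norm(large, 5.0))
print(clip_by_norm(small, 5.0))
```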

In [14]:
# Use gradient descent as the optimiser for training the model,
# with a learning rate of 0.0000001.
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0000001)
# Clip the gradients to a maximum norm of 5.0.
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

# Configure the linear regression model with our feature columns and optimizer.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)

### Step 4: Define the Input Function
To import our California housing data into our `LinearRegressor`, we need to define an input function, which instructs TensorFlow how to preprocess the data, as well as how to batch, shuffle, and repeat it during model training.

First, we'll convert our *pandas* feature data into a dict of NumPy arrays. We can then use the TensorFlow Dataset API to construct a dataset object from our data, break it into batches of `batch_size`, and repeat it for the specified number of epochs (`num_epochs`).

**NOTE:** When the default value of `num_epochs=None` is passed to `repeat()`, the input data will be repeated indefinitely.
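A plain-Python sketch of what batching and repeating do to the data follows. `batch_and_repeat` is a hypothetical generator, not part of the Dataset API; like `ds.batch(batch_size).repeat(num_epochs)`, it keeps a smaller final batch at the end of each epoch, and `num_epochs=None` makes it loop forever.

```python
import itertools

import numpy as np


def batch_and_repeat(features, targets, batch_size=1, num_epochs=None):
    """Yield (features, labels) batches, repeating the data num_epochs times.

    num_epochs=None repeats forever, mirroring repeat(None).
    """
    # Convert each column to a NumPy array, as my_input_fn does.
    features = {key: np.asarray(value) for key, value in features.items()}
    targets = np.asarray(targets)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        for start in range(0, len(targets), batch_size):
            stop = start + batch_size
            yield ({key: value[start:stop] for key, value in features.items()},
                   targets[start:stop])


batches = batch_and_repeat({"total_rooms": [4164.0, 464.0, 1323.0]},
                           [500.0, 112.5, 237.3], batch_size=2, num_epochs=2)
for batch_features, batch_labels in batches:
    print(batch_features["total_rooms"], batch_labels)
```

With three examples, `batch_size=2` and `num_epochs=2`, this yields four batches of sizes 2, 1, 2 and 1. With `num_epochs=None`, the consumer has to stop on its own, which is the role of the `steps` argument to `train()`.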

Next, if `shuffle` is set to `True`, we'll shuffle the data so that it's passed to the model randomly during training. The `buffer_size` argument specifies how many elements `shuffle` holds in a buffer and randomly samples from.
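To see what `buffer_size` means, here is a pure-Python sketch of buffer-based shuffling in the spirit of `Dataset.shuffle`; `buffered_shuffle` is hypothetical and its details (for example, how the leftover buffer is drained) are assumptions. Each output is drawn at random from a buffer of at most `buffer_size` elements, so a buffer at least as large as the data gives a full shuffle, while `buffer_size=1` leaves the order unchanged. With the 17,000 training rows described above and `buffer_size=10000`, the buffer covers most, but not all, of the data.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     seed: Optional[int] = None) -> Iterator[T]:
    """Yield items in a randomized order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # Input exhausted: emit the remaining buffered items in random order.
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(10), buffer_size=3, seed=42)))
print(list(buffered_shuffle(range(5), buffer_size=1)))  # [0, 1, 2, 3, 4]
```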

Finally, our input function constructs an iterator for the dataset and returns the next batch of data to the `LinearRegressor`.

In [20]:
def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Feeds data to a linear regression model of one feature.
    
    Args:
      features: pandas DataFrame of features
      targets: pandas Series of targets
      batch_size: Size of batches to be passed to the model
      shuffle: True or False. Whether to shuffle the data
      num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely
    Returns:
      Tuple of (features, labels) for the next data batch
    """
    
    # Convert pandas data into a dict of np arrays.
    features = {key:np.array(value) for key,value in dict(features).items()}
    
    # Construct a dataset from the in-memory arrays (warning: 2GB limit).
    ds = Dataset.from_tensor_slices((features, targets))
    
    # Shuffle individual examples before batching, if specified.
    if shuffle:
        ds = ds.shuffle(buffer_size=10000)
    
    # Configure batching/repeating.
    ds = ds.batch(batch_size).repeat(num_epochs)
    
    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels