# Recommender Systems: Deep and Wide

All Rights Reserved to Google.

<img src="https://www.balabit.com/wp-content/uploads/2017/02/collaborative_filtering4-e1438083772522.png"  width="400">

In the previous session we learned about Collaborative Filtering, and the exercises were relatively easy. The exercises in this session are more demanding.

The Wide & Deep TensorFlow model was developed at Google and jointly trains a linear model and a deep neural network. This architecture helped Google achieve better app recommendations for Play Store users. The idea for this tutorial comes from the well-known <a href="https://ai.google/research/pubs/pub45413" style="color: #6D00FF;">paper</a> that introduced it.

In this notebook we present the basics of a modern recommender-system architecture to get you started in this field.

<img src="images/wide_n_deep.png"  width="1000">

The figure above shows a comparison of a wide model (logistic regression with sparse features and transformations), a deep model (feed-forward neural network with an embedding layer and several hidden layers), and a Wide & Deep model (joint training of both). 

### GOALS: 

In this tutorial we will learn about:
- Wide models, deep models, and Wide & Deep models
- Exporting models for prediction
- Running predictions


Ready?

## 0. Making the necessary imports

This tutorial mostly uses TensorFlow (1.x API), pandas and NumPy.

In [2]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import shutil
import sys

import tensorflow as tf
import numpy as np
import pandas as pd

## 1. Feature management



### Organizing the data

These steps organize the CSV features into two categories: numerical and categorical. This is a crucial step, and these features are the building blocks for the models we will build throughout this session.


Assume the features fall into these two groups:

- categorical: workclass, education, marital_status, occupation, relationship, race, gender, native_country, income_bracket

- numerical: age, fnlwgt, education_num, capital_gain, capital_loss, hours_per_week
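
To make the two categories concrete, here is a plain-Python sketch (no TensorFlow) of what they mean when a single CSV row is parsed: numerical fields become numbers and categorical fields stay strings. The helper `parse_census_row`, the names `CSV_COLUMNS`/`NUMERICAL_COLUMNS`, and the sample row are illustrative assumptions, not part of the tutorial's code.

```python
# Illustrative only: plain-Python parsing of one census row (no TensorFlow).
CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket',
]
# Assumed split: the six numerical columns listed above; everything else is categorical.
NUMERICAL_COLUMNS = {'age', 'fnlwgt', 'education_num', 'capital_gain',
                     'capital_loss', 'hours_per_week'}


def parse_census_row(line):
    """Hypothetical helper: split a row and type each field by its category."""
    values = [value.strip() for value in line.strip().split(',')]
    if len(values) != len(CSV_COLUMNS):
        raise ValueError('expected %d fields, got %d' % (len(CSV_COLUMNS), len(values)))
    return {name: float(value) if name in NUMERICAL_COLUMNS else value
            for name, value in zip(CSV_COLUMNS, values)}


row = parse_census_row(
    '39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, '
    'Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K')
print(row['age'], row['education'])  # 39.0 Bachelors
```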

<font color='red'>
##TODO: How would you construct `_CSV_COLUMN_STRUCTURE`, given that categorical features are represented by `['']` and numerical features by `[0]`?
</font>

In [None]:
_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket'
]

_CSV_COLUMN_STRUCTURE = ##YOUR CODE

### Define feature columns

In this tutorial we use the Census Income Dataset to predict the probability that an individual has an annual income of over 50,000 dollars. This is a binary prediction over a mix of sparse categorical and continuous features, the same shape of problem that many recommender systems solve (for example, predicting whether a Play Store user will install an app). Although this problem could be solved with logistic regression, we can use a more advanced model to try to make more accurate predictions. The data can be found in `census_data/` and will be loaded later. Let's define the base categorical and continuous feature columns that we'll use. These base columns are the building blocks for both the wide part and the deep part of the model.
#### Wide Columns
The wide model is a linear model with a wide set of sparse and crossed feature columns. Wide models with crossed feature columns can memorize sparse interactions between features effectively. That being said, one limitation of crossed feature columns is that they do not generalize to feature combinations that have not appeared in the training data. 
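
The memorization trade-off can be illustrated with a minimal sketch of a hashed feature cross, assuming `zlib.crc32` as a stand-in for the hash TensorFlow uses internally (so the bucket ids will not match those of `crossed_column`). The helper `cross_bucket`, the `'_X_'` separator and the toy weights are hypothetical.

```python
import zlib

HASH_BUCKET_SIZE = 1000  # same size as the crossed columns defined later


def cross_bucket(*values, num_buckets=HASH_BUCKET_SIZE):
    """Hypothetical helper: hash a combination of categorical values to a bucket id.

    zlib.crc32 stands in for TensorFlow's internal hash, so ids differ from TF's.
    """
    key = '_X_'.join(values).encode('utf-8')
    return zlib.crc32(key) % num_buckets


# Toy "training": the linear model only learns weights for crosses it has seen.
weights = {}
for edu, occ in [('Bachelors', 'Exec-managerial'), ('HS-grad', 'Craft-repair')]:
    weights[cross_bucket(edu, occ)] = 0.5  # pretend-learned weight

seen = cross_bucket('Bachelors', 'Exec-managerial')    # memorized combination
unseen = cross_bucket('Doctorate', 'Farming-fishing')  # never seen in training
# The unseen cross usually lands in a bucket with no learned weight (barring a collision).
print(weights.get(seen, 0.0), weights.get(unseen, 0.0))
```

The seen combination gets its own learned weight, while the unseen one contributes nothing, which is exactly why crossed columns memorize but do not generalize.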


- <font color='red'>(##TODO: How would you define the categorical columns? ##HINT: You may find <a href="https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list">this</a> helpful.)</font>


In [None]:
# Continuous columns
age = tf.feature_column.numeric_column('age')
education_num = tf.feature_column.numeric_column('education_num')
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')

education = ##CODE HERE(
    'education', [
        'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
        'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
        '5th-6th', '10th', '1st-4th', 'Preschool', '12th'])

marital_status = ##CODE HERE(
    'marital_status', [
        'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',
        'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])

relationship = ##CODE HERE(
    'relationship', [
        'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
        'Other-relative'])

workclass = ##CODE HERE(
    'workclass', [
        'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',
        'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])

- <font color='red'>(##TODO: Assume the values of `occupation` are strings and that you want to distribute them into a finite number of buckets by hashing. How can you do this? Use a `hash_bucket_size` of 1000. ##HINT: You may find <a href="https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket">this</a> helpful.)</font>
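
A minimal sketch of hash bucketing in plain Python, assuming `zlib.crc32` in place of TensorFlow's own string hash; `hash_bucket` and `NUM_BUCKETS` are hypothetical names. It shows the key property of hashed categorical columns: any string, including one never seen before, maps to an id in `[0, 1000)` without a vocabulary list, at the cost of possible collisions.

```python
import zlib

NUM_BUCKETS = 1000  # the hash_bucket_size asked for in the TODO


def hash_bucket(value, num_buckets=NUM_BUCKETS):
    """Hypothetical helper: map any string to an id in [0, num_buckets).

    zlib.crc32 stands in for TensorFlow's string hash, so ids will differ from TF's.
    """
    return zlib.crc32(value.encode('utf-8')) % num_buckets


occupation_names = ['Tech-support', 'Craft-repair', 'Other-service', 'Sales',
                    'Exec-managerial', 'Prof-specialty', 'Handlers-cleaners',
                    'Machine-op-inspct', 'Adm-clerical', 'Farming-fishing',
                    'Transport-moving', 'Priv-house-serv', 'Protective-serv',
                    'Armed-Forces']
bucket_ids = {name: hash_bucket(name) for name in occupation_names}
# Different values can collide in one bucket; with 14 values and 1000 buckets
# that is unlikely but possible.
print(len(set(bucket_ids.values())), 'distinct buckets for', len(occupation_names), 'values')
```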

In [None]:
# Hashing:
occupation = ##YOUR CODE HERE

# Transformations.
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])

# Base columns for the wide model; the crossed columns are defined in the next cell
base_columns = [
    education, marital_status, relationship, workclass, occupation,
    age_buckets,
]
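
The bucket boundaries above follow the `[lower, upper)` convention documented for `tf.feature_column.bucketized_column`: 10 boundaries give 11 buckets, with everything below 18 in bucket 0 and everything from 65 up in bucket 10. A plain-Python sketch of that mapping using `bisect` (the helpers `age_bucket` and `one_hot` are hypothetical):

```python
import bisect

AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def age_bucket(age, boundaries=AGE_BOUNDARIES):
    """Bucket i covers [boundaries[i-1], boundaries[i]); bucket 0 is below the first boundary."""
    return bisect.bisect_right(boundaries, age)


def one_hot(index, size):
    """One-hot vector, the form in which a linear model sees a bucketized value."""
    return [1 if i == index else 0 for i in range(size)]


num_buckets = len(AGE_BOUNDARIES) + 1  # 10 boundaries -> 11 buckets
print([age_bucket(a) for a in (17, 18, 24, 25, 64, 65, 90)])  # [0, 1, 1, 2, 9, 10, 10]
```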


- <font color='red'>(From what was taught in class, how would you define the wide columns?)</font>

In [None]:
crossed_columns = [
    tf.feature_column.crossed_column(
        ['education', 'occupation'], hash_bucket_size=1000),
    tf.feature_column.crossed_column(
        [age_buckets, 'education', 'occupation'], hash_bucket_size=1000),
]

## wide_columns = ##YOUR CODE HERE

#### Deep Columns
The deep model is a feed-forward neural network. Each of the sparse, high-dimensional categorical features is first converted into a low-dimensional, dense, real-valued vector, often referred to as an embedding vector. These embedding vectors are concatenated with the continuous features and then fed into the hidden layers of the neural network in the forward pass. The embedding values are initialized randomly and are trained along with all other model parameters to minimize the training loss.
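
A NumPy sketch of the embedding step, assuming a small hypothetical occupation vocabulary and made-up continuous values: each categorical value selects one row of a randomly initialized table, and that row is concatenated with the continuous features to form the network's input. In a real model the table would be updated by backpropagation together with the hidden layers.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical 4-value vocabulary; the notebook uses dimension=8 for occupation.
vocab = ['Tech-support', 'Craft-repair', 'Sales', 'Exec-managerial']
embedding_dim = 8

# Randomly initialized embedding table: one trainable row per category.
embedding_table = rng.normal(scale=0.1, size=(len(vocab), embedding_dim))


def embed(value):
    """Look up the dense vector for one categorical value."""
    return embedding_table[vocab.index(value)]


# Made-up continuous features, e.g. age, education_num, hours_per_week.
continuous = np.array([39.0, 13.0, 40.0])
dense_input = np.concatenate([continuous, embed('Sales')])
print(dense_input.shape)  # (11,)
```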


In [None]:
deep_columns = [
    age,
    education_num,
    capital_gain,
    capital_loss,
    hours_per_week,
    tf.feature_column.indicator_column(workclass),
    tf.feature_column.indicator_column(education),
    tf.feature_column.indicator_column(marital_status),
    tf.feature_column.indicator_column(relationship),
    tf.feature_column.embedding_column(occupation, dimension=8),
]
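
As a sanity check, the width of the dense vector that `deep_columns` feeds into the first hidden layer can be worked out by hand, assuming each `numeric_column` has the default shape of 1 and each indicator column is as wide as its vocabulary list (no out-of-vocabulary buckets by default). The vocabulary sizes below are counted from the lists defined earlier; the variable names are hypothetical.

```python
# Widths of the pieces that make up deep_columns (counted from the cells above).
deep_numeric_columns = ['age', 'education_num', 'capital_gain',
                        'capital_loss', 'hours_per_week']
indicator_vocab_sizes = {'workclass': 9, 'education': 16,
                         'marital_status': 7, 'relationship': 6}
embedding_dims = {'occupation': 8}

input_width = (len(deep_numeric_columns)              # one value per numeric column
               + sum(indicator_vocab_sizes.values())  # one slot per vocabulary entry
               + sum(embedding_dims.values()))        # embedding dimension
print(input_width)  # 51
```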

## 2. Input Function
To create our models we will use TensorFlow's <a href="https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator" style="color: #6D00FF;">tf.estimator API</a>. For that we need to define an input function. It parses the input data and uses the Dataset API to represent the input pipeline as a collection of elements (nested structures of tensors) plus a "logical plan" of transformations that act on those elements.
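
It may help to see the intended order of operations in plain Python first. The hypothetical generator `batched_epochs` mimics `shuffle` → `repeat(num_epochs)` → `batch(batch_size)`: each epoch is reshuffled (as `tf.data` does by default when shuffling before repeating), batches may span epoch boundaries, and the last batch can be smaller than `batch_size`. Unlike `tf.data`, it materializes everything in memory, so treat it as an illustration only.

```python
import random


def batched_epochs(records, num_epochs, shuffle, batch_size, seed=0):
    """Hypothetical plain-Python analogue of shuffle -> repeat(num_epochs) -> batch(batch_size)."""
    rng = random.Random(seed)
    stream = []
    for _ in range(num_epochs):
        epoch = list(records)
        if shuffle:
            rng.shuffle(epoch)  # reshuffle every epoch
        stream.extend(epoch)
    # Batching after repeating lets a batch span two epochs; the last batch may be short.
    for start in range(0, len(stream), batch_size):
        yield stream[start:start + batch_size]


batches = list(batched_epochs(range(10), num_epochs=2, shuffle=True, batch_size=3))
print([len(batch) for batch in batches])  # [3, 3, 3, 3, 3, 3, 2]
```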


<font color='red'>##TODO: Complete the code so that it parses each record into tensors, repeats the input `num_epochs` times, batches it, and creates the `iterator` used at the end of the function. ##HINT: You may find <a href="https://www.tensorflow.org/programmers_guide/datasets">this</a> helpful.</font>


In [None]:
def input_fn(data_file, num_epochs, shuffle, batch_size):
  """Generate an input function for the Estimator."""
  assert tf.gfile.Exists(data_file), (
      '%s not found. Please make sure you have either run data_download.py or '
      'set both arguments --train_data and --test_data.' % data_file)

  def parse_csv(value):
    print('Parsing', data_file)
    columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_STRUCTURE)
    features = dict(zip(_CSV_COLUMNS, columns))
    labels = features.pop('income_bracket')
    return features, tf.equal(labels, '>50K')

  # Extract lines from input files using the Dataset API.
  dataset = tf.data.TextLineDataset(data_file)

  if shuffle:
    dataset = dataset.shuffle(buffer_size=_NUM_EXAMPLES['train'])
    
  ## YOUR CODE HERE

  features, labels = iterator.get_next()
  return features, labels


## 3. Modeling
Now, on to the "fun" part: modeling. For this part we will create 3 different models: wide, deep and wide-and-deep and compare the results from each one.

<font color='red'>##TODO: You can play with `hidden_units` to see how it affects the loss of your model. This only affects models with a deep component (the deep and Wide & Deep models).</font>
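
To get a feel for how `hidden_units` changes model capacity, here is a hypothetical helper that counts the weights and biases of the fully connected layers. It assumes the 51-wide input produced by `deep_columns` and a single output logit for the binary classifier, and it ignores the occupation embedding table (1000 x 8 values), which is trained as well.

```python
def dense_param_count(input_width, hidden_units, output_units=1):
    """Hypothetical helper: weights + biases of input -> hidden_units -> output dense layers."""
    sizes = [input_width] + list(hidden_units) + [output_units]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


# 51 = 5 numeric + 38 indicator + 8 embedding values from deep_columns.
print(dense_param_count(51, [100, 75, 50, 25]))  # 17876
print(dense_param_count(51, [64, 32]))           # 5441
```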

In [None]:
hidden_units = [100, 75, 50, 25]
train_epochs = 40
epochs_per_eval = 2
batch_size = 40
_NUM_EXAMPLES = {
    'train': 32561,
    'validation': 16281,
}


run_config = tf.estimator.RunConfig().replace(session_config=tf.ConfigProto(device_count={'GPU': 0}))
train_data = 'census_data/adult.data'
test_data = 'census_data/adult.test'

### Running Wide Model

As described in the Wide Columns section, the wide model is a linear model over sparse and crossed feature columns: it memorizes sparse feature interactions well but does not generalize to feature combinations that never appeared in the training data.
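
A minimal sketch of how a linear (wide) model scores one example: the logit is the bias plus the learned weight of every active sparse or crossed feature, passed through a sigmoid. The readable string feature ids and the weights are made up for illustration; a real linear estimator keys its weights by vocabulary or hashed ids. A feature that never appeared in training simply contributes nothing.

```python
import math


def linear_logit(active_features, weights, bias):
    """Wide-model logit: bias plus the weight of every active feature (0 if never learned)."""
    return bias + sum(weights.get(feature, 0.0) for feature in active_features)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up weights keyed by readable feature ids, including one crossed feature.
weights = {
    'education=Bachelors': 0.8,
    'occupation=Exec-managerial': 0.6,
    'education=Bachelors_X_occupation=Exec-managerial': 0.9,
}
active = list(weights)  # all three features are active for this example
p = sigmoid(linear_logit(active, weights, bias=-2.0))
print(round(p, 3))  # 0.574
```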

<font color='red'>##TODO: You can create the wide model in several ways; the easiest is to use a canned estimator. If you have already done that, think about how you would build it from scratch. Call the instructor if you want to run your ideas by them.
</font> 

In [None]:
model_dir = 'wide_model'
#model =  ##YOUR CODE HERE 

Now evaluate. <font color='red'>##TODO: Add code to evaluate your estimator. ##HINT: You may find <a href="https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#evaluate">this</a> helpful.
</font> 

In [None]:
# Train and evaluate the model every epochs_per_eval epochs.
for n in range(train_epochs // epochs_per_eval):
  model.train(input_fn=lambda: input_fn(
      train_data, epochs_per_eval, True, batch_size))

  results = ##YOUR CODE HERE

  # Display evaluation metrics
  print('Results at epoch', (n + 1) * epochs_per_eval)
  print('-' * 60)

  for key in sorted(results):
    print('%s: %s' % (key, results[key]))


### Running Deep Model

As described in the Deep Columns section, the deep model is a feed-forward neural network (see the figure at the top of this notebook): sparse categorical features are mapped to low-dimensional embedding vectors that are learned jointly with the other model parameters, concatenated with the continuous features, and fed through the hidden layers. We will learn more about embeddings in the last session of today's tutorial.

Another way to represent categorical columns to feed into a neural network is via a one-hot or multi-hot representation. This is often appropriate for categorical columns with only a few possible values. As an example of a one-hot representation, for the relationship column, "Husband" can be represented as [1, 0, 0, 0, 0, 0], and "Not-in-family" as [0, 1, 0, 0, 0, 0], etc. This is a fixed representation, whereas embeddings are more flexible and calculated at training time.
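
The one-hot representation described above can be sketched in plain Python with the relationship vocabulary from the feature-column cell; passing several values yields a multi-hot vector. The helper `multi_hot` is hypothetical, and its handling of unknown values (all zeros) is an assumption meant to resemble a vocabulary list with no out-of-vocabulary buckets.

```python
RELATIONSHIP_VOCAB = ['Husband', 'Not-in-family', 'Wife', 'Own-child',
                      'Unmarried', 'Other-relative']


def multi_hot(values, vocab):
    """Hypothetical helper: 0/1 vector with a 1 at each known value's position.

    A single value gives a one-hot vector; unknown values are ignored.
    """
    vector = [0] * len(vocab)
    for value in values:
        if value in vocab:
            vector[vocab.index(value)] = 1
    return vector


print(multi_hot(['Husband'], RELATIONSHIP_VOCAB))        # [1, 0, 0, 0, 0, 0]
print(multi_hot(['Not-in-family'], RELATIONSHIP_VOCAB))  # [0, 1, 0, 0, 0, 0]
```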

We configure an embedding for the high-cardinality `occupation` column with `embedding_column`, use `indicator_column` to create multi-hot representations of the low-cardinality categorical columns, and concatenate both with the continuous columns.

<font color='red'>##TODO: You can create the deep model in several ways; the easiest is to use a canned estimator. If you have already done that, think about how you would build it from scratch. Call the instructor if you want to run your ideas by them.
</font> 

In [None]:
model_dir = 'deep_model'
#model =  ##YOUR CODE HERE 

Now evaluate. <font color='red'>##TODO: Add code to evaluate your estimator. ##HINT: You may find <a href="https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#evaluate">this</a> helpful.
</font> 

In [None]:
# Train and evaluate the model every epochs_per_eval epochs.
for n in range(train_epochs // epochs_per_eval):
  model.train(input_fn=lambda: input_fn(
      train_data, epochs_per_eval, True, batch_size))

  results = ##YOUR CODE HERE

  # Display evaluation metrics
  print('Results at epoch', (n + 1) * epochs_per_eval)
  print('-' * 60)

  for key in sorted(results):
    print('%s: %s' % (key, results[key]))


### Running Wide and Deep Model
<font color='red'>##TODO: You can create the Wide & Deep model in several ways; the easiest is to use the canned estimator `DNNLinearCombinedClassifier`. If you have already done that, think about how you would build it from scratch. Call the instructor if you want to run your ideas by them.
</font> 
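
Whichever way you build it, the joint model combines both parts in a single logit. The Wide & Deep paper writes the prediction as P(Y = 1 | x) = σ(w_wide · [x, φ(x)] + w_deep · a_final + b), where φ(x) are the cross-product transformations and a_final is the last hidden layer's activation. A NumPy sketch of that forward pass, assuming ReLU hidden layers and toy random weights (`wide_and_deep_probability` is a hypothetical helper, not the estimator's internals):

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def wide_and_deep_probability(wide_x, w_wide, deep_x, hidden_layers, w_deep, bias):
    """Hypothetical forward pass: sigmoid(w_wide . wide_x + w_deep . a_final + bias)."""
    activation = deep_x
    for weight, layer_bias in hidden_layers:
        activation = np.maximum(0.0, activation @ weight + layer_bias)  # ReLU layer
    logit = wide_x @ w_wide + activation @ w_deep + bias
    return sigmoid(logit)


rng = np.random.default_rng(seed=0)
wide_x = np.array([1.0, 0.0, 1.0])  # toy sparse/crossed indicators
deep_x = rng.normal(size=4)         # toy embeddings + continuous features
hidden_layers = [(rng.normal(size=(4, 5)), np.zeros(5)),
                 (rng.normal(size=(5, 3)), np.zeros(3))]
p = wide_and_deep_probability(wide_x, rng.normal(size=3), deep_x,
                              hidden_layers, rng.normal(size=3), 0.0)
print(p)
```

Because both parts feed one logit and one loss, training updates the wide and deep weights together (joint training) rather than combining two separately trained models.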

In [None]:
model_dir = 'wide_and_deep_model'
# model = ##YOUR CODE HERE

Now evaluate. <font color='red'>##TODO: Add code to evaluate your estimator. ##HINT: You may find <a href="https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#evaluate">this</a> helpful.
</font> 

In [None]:
# Train and evaluate the model every epochs_per_eval epochs.
for n in range(train_epochs // epochs_per_eval):
  model.train(input_fn=lambda: input_fn(
      train_data, epochs_per_eval, True, batch_size))

  results = ##YOUR CODE HERE

  # Display evaluation metrics
  print('Results at epoch', (n + 1) * epochs_per_eval)
  print('-' * 60)

  for key in sorted(results):
    print('%s: %s' % (key, results[key]))


## 4. Exporting your model
<font color='red'>##TODO: Now export your model. ##HINT: You may find <a href="https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#export_savedmodel">this</a> helpful.
</font> 

In [None]:
## YOUR CODE
# servable_model_path = ## YOUR CODE
print("*********** Done Exporting at Path - %s" % servable_model_path)

A `tf.train.Feature` holds one of three list types, `BytesList`, `FloatList`, or `Int64List`, each of which may contain zero or more values. The helpers below wrap a single Python value in the matching list type.

In [None]:
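def _float_feature(value):
    """Wrap a single number in a FloatList feature (used for numeric columns)."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def _bytes_feature(value):
    """Wrap a single bytes string in a BytesList feature (used for categorical columns)."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    """Wrap a single integer in an Int64List feature."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))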

## 5. Making Predictions

Now we are ready to use the saved model to make predictions. Isn't that exciting? You have built not one, not two, but three models in this section, and you will now load the one you exported with the Estimator API to make predictions.


<font color='red'>##TODO: Use pandas to read the input CSV. ##HINT: You may find <a href="https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html">this</a> helpful.
</font> 

In [None]:
predictionoutputfile = 'census_output.csv'
predictioninputfile = 'census_input.csv'

input_file = ## YOUR CODE HERE
input_file.head()

<font color='red'>##TODO: Load the saved model and get the predictor. ##HINT: You may find <a href="https://www.tensorflow.org/api_docs/python/tf/contrib/predictor/from_saved_model">this</a> helpful.
</font> 

In [None]:
with tf.Session() as sess:

    ##YOUR CODE HERE:
    # LOAD SAVED MODEL
    # GET PREDICTOR
    #predictor = 

    # Input rows contain every CSV column except the label, in the same order
    input_columns = _CSV_COLUMNS[:-1]
    numeric_columns = {'age', 'fnlwgt', 'education_num', 'capital_gain',
                       'capital_loss', 'hours_per_week'}

    # Open the output file with `with` so it is flushed and closed when done
    with open(predictionoutputfile, 'w') as prediction_OutFile:

        # Write the header for the output CSV file
        prediction_OutFile.write(",".join(input_columns + ['predicted_income_bracket', 'probability']))
        prediction_OutFile.write('\n')

        # Read the input file and create a feature_dict for each record
        with open(predictioninputfile) as inf:
            # Skip the header row
            next(inf)
            for line in inf:

                # Split the row and strip stray whitespace around each field.
                # A dict (instead of variables named age, education, ...) avoids
                # overwriting the feature columns defined earlier in the notebook.
                fields = [field.strip() for field in line.strip().split(",")]
                record = dict(zip(input_columns, fields))

                # FloatList for numeric columns, BytesList for categorical ones,
                # matching the feature columns used at training time
                feature_dict = {
                    name: (_float_feature(value=float(value)) if name in numeric_columns
                           else _bytes_feature(value=value.encode()))
                    for name, value in record.items()
                }

                # Prepare the model input: a serialized tf.train.Example
                model_input = tf.train.Example(features=tf.train.Features(feature=feature_dict))
                model_input = model_input.SerializeToString()
                output_dict = predictor({"inputs": [model_input]})

                print("Prediction label is", output_dict['classes'])
                print('Probability: ' + str(output_dict['scores']))

                # One example per request, so use the first (only) row of scores
                label_index = np.argmax(output_dict['scores'][0])
                probability = output_dict['scores'][0][label_index]

                # Write the input fields followed by the predicted class index and its probability
                prediction_OutFile.write(",".join(fields + [str(label_index), str(probability)]))
                prediction_OutFile.write('\n')
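
The loop above writes the class index; if you prefer the label text, here is a small sketch that turns one row of `scores` into a readable label and probability. It assumes class 1 means `>50K`, which follows from the `tf.equal(labels, '>50K')` label built in `input_fn`; `decode_prediction` and `LABELS` are hypothetical names.

```python
import numpy as np

# Class 0 / class 1, following the boolean label tf.equal(labels, '>50K').
LABELS = ['<=50K', '>50K']


def decode_prediction(scores):
    """Hypothetical helper: turn a [1, 2] score array (one example) into (label, probability)."""
    row = np.asarray(scores)[0]
    index = int(np.argmax(row))
    return LABELS[index], float(row[index])


print(decode_prediction([[0.2, 0.8]]))  # ('>50K', 0.8)
```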


Now that you have finished this session, you may want to iterate on the models to obtain a lower evaluation loss. Some things you can change:

- Batch Size
- Number of Epochs

Can you try?

## Credits / Source:

[0] https://www.tensorflow.org/tutorials/wide_and_deep

[1] https://ai.google/research/pubs/pub45413