
Different feature columns in TensorFlow:


1.   Numeric Column
2.   Bucketized Column
3.   Indicator Column
4.   Embedding Column
5.   Categorical Column with vocabulary list
6.   Categorical Column with vocabulary file
7.   Categorical Column with identity
8.   Categorical Column with hash bucket
9.   Crossed Column


https://medium.com/ml-book/demonstration-of-tensorflow-feature-columns-tf-feature-column-3bfcca4ca5c4

https://www.tensorflow.org/tutorials/structured_data/feature_columns


**TensorFlow 2.0 is recommended** (the cells below were run with 2.0.0-beta1)



In [1]:
!pip install tensorflow==2.0.0-beta1
import tensorflow as tf
import numpy as np
import pandas as pd





In [0]:
data = {'marks': [55,21,63,88,74,54,95,41,84,52],
        'grade': ['average','poor','average','good','good','average','good','average','good','average'],
        'point': ['c','f','c+','b+','b','c','a','d+','b+','c']}

In [15]:
df = pd.DataFrame(data)
df

   marks    grade point
0     55  average     c
1     21     poor     f
2     63  average    c+
3     88     good    b+
4     74     good     b
5     54  average     c
6     95     good     a
7     41  average    d+
8     84     good    b+
9     52  average     c


In [0]:
def demo(feature_column):
  # Build a DenseFeatures layer from the column and print its dense output for `data`.
  feature_layer = tf.keras.layers.DenseFeatures(feature_column)
  print(feature_layer(data).numpy())

**Numeric Column**

In [10]:
marks = tf.feature_column.numeric_column('marks')
demo(marks)

[[55.]
 [21.]
 [63.]
 [88.]
 [74.]
 [54.]
 [95.]
 [41.]
 [84.]
 [52.]]


**Bucketized Column**

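Only a numeric column can be bucketized.

In the example below, the 7 boundaries form 8 buckets: below 30, [30, 40), [40, 50), [50, 60), [60, 70), [70, 80), [80, 90), and 90 or above.

A bucketized column is a categorical feature whose categories are the buckets; the output is its one-hot encoding.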

In [11]:
marks = tf.feature_column.numeric_column('marks')
marks_bucket = tf.feature_column.bucketized_column(marks,boundaries=[30,40,50,60,70,80,90])
demo(marks_bucket)

[[0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]]
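
The bucket assignment above can be reproduced without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: the helper names `bucketize` and `one_hot` are hypothetical, and it assumes that a value equal to a boundary belongs to the upper bucket.

```python
from bisect import bisect_right

marks = [55, 21, 63, 88, 74, 54, 95, 41, 84, 52]
boundaries = [30, 40, 50, 60, 70, 80, 90]

def bucketize(value, boundaries):
    """Return the bucket index; a value equal to a boundary goes to the upper bucket."""
    return bisect_right(boundaries, value)

def one_hot(index, depth):
    """Return a list of `depth` floats with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]

num_buckets = len(boundaries) + 1  # 7 boundaries -> 8 buckets
bucket_ids = [bucketize(m, boundaries) for m in marks]
encoded = [one_hot(b, num_buckets) for b in bucket_ids]

print(bucket_ids)  # [3, 0, 4, 6, 5, 3, 7, 2, 6, 3]
```

Each index is the position of the 1 in the corresponding row of the output above.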


**Indicator Column**

An indicator column can only be created from a categorical column.

It is the one-hot representation of the categorical feature.

The vocabulary can be passed as a list using categorical_column_with_vocabulary_list, or loaded from a file using categorical_column_with_vocabulary_file.

In [14]:
grades_cat = tf.feature_column.categorical_column_with_vocabulary_list('grade', ['good','average','poor'])
# demo(grades_cat)  # raises an error: a categorical column must be wrapped first
grades_ind = tf.feature_column.indicator_column(grades_cat)
demo(grades_ind)

[[0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]]
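
As a rough illustration of what the vocabulary lookup and the indicator wrapper do together, here is a pure-Python sketch. It is not TensorFlow code: `load_vocabulary`, `indicator` and the file name `grades.txt` are hypothetical, the vocabulary file is assumed to hold one token per line, and an out-of-vocabulary value is assumed to encode as all zeros.

```python
import os
import tempfile

grades = ['average', 'poor', 'average', 'good', 'good',
          'average', 'good', 'average', 'good', 'average']

def load_vocabulary(path):
    """Read a vocabulary file with one token per line."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]

def indicator(value, vocabulary):
    """One-hot encode `value`; an out-of-vocabulary value yields all zeros (assumption)."""
    return [1.0 if value == token else 0.0 for token in vocabulary]

# Write a throwaway vocabulary file to mimic the file-based variant.
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, 'grades.txt')  # hypothetical file name
    with open(vocab_path, 'w', encoding='utf-8') as f:
        f.write('good\naverage\npoor\n')
    vocabulary = load_vocabulary(vocab_path)

encoded = [indicator(g, vocabulary) for g in grades]
print(encoded[0])  # [0.0, 1.0, 0.0]
```

The column order follows the vocabulary order, which is why 'good' occupies the first column in the output above.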


**Embedding Column**

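Only a categorical column can be embedded.

The size of the embedding is set with the dimension argument.

An embedding is generally used when the number of categories is large, because the output width stays at dimension instead of growing with the vocabulary. Below, the 7 distinct point values map to 4-dimensional vectors, while the indicator column needs 7 columns. The embedding values shown are the random initial weights; they are learned during training.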

In [16]:
point_cat = tf.feature_column.categorical_column_with_vocabulary_list('point',df.point.unique())
point_emb = tf.feature_column.embedding_column(point_cat,dimension=4)
demo(point_emb)
point_ind = tf.feature_column.indicator_column(point_cat)
demo(point_ind)

[[ 0.46624577 -0.33630025 -0.29890206 -0.15033424]
 [-0.35583702  0.28632733  0.34825355 -0.04789528]
 [-0.61125535  0.34375608  0.59248054  0.09033636]
 [ 0.13905767  0.47210488  0.03239832  0.428059  ]
 [ 0.7185869   0.16123182  0.27988112 -0.26894826]
 [ 0.46624577 -0.33630025 -0.29890206 -0.15033424]
 [-0.1681554  -0.95208335 -0.26689884  0.07343774]
 [-0.5371247   0.68602467  0.39954835 -0.12581506]
 [ 0.13905767  0.47210488  0.03239832  0.428059  ]
 [ 0.46624577 -0.33630025 -0.29890206 -0.15033424]]
[[1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0.]]
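
Conceptually, an embedding column is a lookup table with one row per category. The sketch below shows that idea in pure Python under stated assumptions: the names `table` and `embed` are hypothetical, the weights are random stand-ins (nothing is trained), and the initialization scale of 1/sqrt(dimension) is a choice made for this sketch only.

```python
import math
import random

points = ['c', 'f', 'c+', 'b+', 'b', 'c', 'a', 'd+', 'b+', 'c']
vocabulary = list(dict.fromkeys(points))  # unique values in order of first appearance
dimension = 4

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# One row per category; random values stand in for weights that training would learn.
table = [[rng.gauss(0.0, 1.0 / math.sqrt(dimension)) for _ in range(dimension)]
         for _ in vocabulary]

def embed(value):
    """Look up the embedding row for `value`."""
    return table[vocabulary.index(value)]

embedded = [embed(p) for p in points]
print(len(vocabulary), len(embedded[0]))  # 7 4
```

As in the output above, rows that hold the same category (for example the three 'c' rows) receive identical vectors.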


*Every categorical column (5-9 in the initial list) has to be wrapped in either an indicator column or an embedding column before it is passed to DenseFeatures.*



**Hash Bucket**

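An alternative to a vocabulary list when the number of categories is large or not known in advance: each value is hashed into one of hash_bucket_size buckets. Like any categorical column, the result still has to be wrapped in an indicator or embedding column.

hash_bucket_size is tunable. When it is smaller than the number of distinct values, different values collide: below, 7 point values share 5 buckets, and c, f and b all land in bucket 2.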

In [18]:
point_hash = tf.feature_column.categorical_column_with_hash_bucket('point',hash_bucket_size=5)
point_ind = tf.feature_column.indicator_column(point_hash)
demo(point_ind)

[[0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]]


**Crossed Column**

Combining features into a single feature, better known as feature crosses, enables a model to learn separate weights for each combination of features.

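hash_bucket_size is tunable here as well: each combination is hashed into one of that many buckets, so distinct combinations can collide. Below, the rows for marks 63 (average) and 74 (good) share column 4 although both their bucket and their grade differ.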

In [20]:
marks = tf.feature_column.numeric_column('marks')
marks_bucket = tf.feature_column.bucketized_column(marks,boundaries=[30,40,50,60,70,80,90])

grades_cat = tf.feature_column.categorical_column_with_vocabulary_list('grade', ['good','average','poor'])

crossed_feature = tf.feature_column.crossed_column([marks_bucket,grades_cat],hash_bucket_size=20)
crossed_ind = tf.feature_column.indicator_column(crossed_feature)
demo(crossed_ind)

[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
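
The crossing step can be sketched in the same pure-Python style. The helper `cross`, the `_X_` key format and the MD5 stand-in hash are assumptions made for illustration, so the column positions will differ from the TensorFlow output above.

```python
import hashlib
from bisect import bisect_right

marks = [55, 21, 63, 88, 74, 54, 95, 41, 84, 52]
grades = ['average', 'poor', 'average', 'good', 'good',
          'average', 'good', 'average', 'good', 'average']
boundaries = [30, 40, 50, 60, 70, 80, 90]
hash_bucket_size = 20

def cross(bucket_id, grade, num_buckets):
    """Hash a (marks bucket, grade) combination into [0, num_buckets)."""
    key = f'{bucket_id}_X_{grade}'  # hypothetical key format
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()  # stand-in hash (assumption)
    return int(digest, 16) % num_buckets

crossed_ids = [cross(bisect_right(boundaries, m), g, hash_bucket_size)
               for m, g in zip(marks, grades)]

# Marks 55, 54 and 52 fall in the same bucket and share the grade 'average'.
print(crossed_ids[0] == crossed_ids[5] == crossed_ids[9])  # True
```

Rows with the same bucket and the same grade always receive the same crossed id, which is what lets a model learn one weight per combination.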
