##### Using an autoencoder, we will build an embedding vector for each building_id.
I expect the autoencoder to extract informative features through unsupervised learning.<br>
For example, similar patterns between buildings could be represented as numeric values.<br>
If we know these values in advance, the final model should be more robust, since it won't need to learn them during the final training step.
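
As a toy illustration of the idea (not the trained model), the sketch below treats an embedding as a lookup table with one row per building and compares rows with cosine similarity. The table values, the 3-dimensional size, and the `cosine_similarity` helper are made up for illustration; the real table would have 1449 rows of 32 learned values.

```python
import numpy as np

# Hypothetical embedding table: one row per building (values are made up).
table = np.array([
    [1.0, 0.0, 0.0],   # building 0
    [0.0, 1.0, 0.5],   # building 1
    [0.0, 0.9, 0.6],   # building 2, similar pattern to building 1
    [-1.0, 0.2, 0.0],  # building 3
])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# An embedding lookup is just row indexing by building_id.
ids = np.array([1, 2, 1])
vectors = table[ids]  # shape (3, 3): one vector per requested ID

sim_similar = cosine_similarity(table[1], table[2])
sim_different = cosine_similarity(table[0], table[1])
print(vectors.shape, round(sim_similar, 3), round(sim_different, 3))
```

Buildings with similar rows score close to 1.0, which is the kind of relationship the final model could reuse instead of learning it from scratch.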

#### Load modules

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as K

from sklearn.preprocessing import StandardScaler

#### Load dataset

In [2]:
df = pd.read_pickle('train.pickle')
df.dropna(axis='columns', inplace=True)

building_id = df.building_id.values.reshape(-1, 1)
others = StandardScaler().fit_transform(df.drop(columns='building_id'))

In [3]:
building_id.shape

(19869988, 1)

In [4]:
others.shape

(19869988, 18)

#### Build autoencoder

In [5]:
def construct():
    K.clear_session()
    
    # Integer building_id -> 32-dimensional learned embedding vector
    building_id = keras.layers.Input(shape=(1,))
    building_embedding = keras.layers.Embedding(1449, 32)(building_id)
    building_embedding = keras.layers.Flatten()(building_embedding)
    
    # The 18 standardized numeric features
    others = keras.layers.Input(shape=(18,))
    prev = keras.layers.Concatenate()([building_embedding, others])
        
    prev = keras.layers.Dense(16)(prev)
    prev = keras.layers.BatchNormalization(momentum=0.8)(prev)
    prev = keras.layers.Activation('elu')(prev)
    
    prev = keras.layers.Dense(8)(prev)
    prev = keras.layers.BatchNormalization(momentum=0.8)(prev)
    prev = keras.layers.Activation('elu')(prev)
    
    prev = keras.layers.Dense(16)(prev)
    prev = keras.layers.BatchNormalization(momentum=0.8)(prev)
    prev = keras.layers.Activation('elu')(prev)
    
    prev = keras.layers.Dense(18)(prev)
    
    autoencoder = keras.Model(inputs=[building_id, others], outputs=prev)
    embedding = keras.Model(inputs=building_id, outputs=building_embedding)
    
    return autoencoder, embedding
    
autoencoder, embedding = construct()

In [6]:
early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

# No optimizer is given, so Keras falls back to its default ('rmsprop')
autoencoder.compile(loss='mse')
autoencoder.fit(
    x=[building_id, others],
    y=others,
    batch_size=65536,
    epochs=100,
    callbacks=[early_stopping],
    validation_split=0.2
)

Train on 15895990 samples, validate on 3973998 samples
Epoch 1/100
...
Epoch 73/100

<tensorflow.python.keras.callbacks.History at 0x1ed898805c8>
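
The sample counts in the log follow from how `validation_split` works: Keras holds out the last fraction of the arrays, taken before any shuffling, so the validation set here is the tail of the DataFrame rather than a random sample. A minimal sketch of that arithmetic, where `n_samples` is copied from the shapes printed earlier and the variable names are only illustrative:

```python
n_samples = 19869988     # rows in building_id / others (from the shapes above)
validation_split = 0.2   # value passed to fit()

# The first 80% of rows are used for training, the last 20% for validation.
split_at = int(n_samples * (1.0 - validation_split))
n_train = split_at
n_val = n_samples - split_at

print(n_train, n_val)
```

If the rows are ordered (for example by time or by building), shuffling the arrays before calling `fit`, or passing an explicit `validation_data`, would give a different kind of validation set.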

#### Save trained embedding model

In [7]:
embedding.save('building')

INFO:tensorflow:Assets written to: building\assets
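
Once saved, the embedding model can serve as a lookup from `building_id` to a vector. The sketch below shows the idea on a small stand-in model built the same way (10 IDs, 4 dimensions, untrained weights), so it does not assume the `building` directory exists. With the real model, one would presumably reload it in the same TensorFlow 2 environment via `keras.models.load_model('building')` and predict on all 1449 IDs instead. The names `toy_embedding`, `emb_layer`, `ids`, and `vectors` are illustrative.

```python
import numpy as np
from tensorflow import keras

# Small stand-in for the saved embedding model (sizes are illustrative).
n_ids, dim = 10, 4
inp = keras.layers.Input(shape=(1,))
emb_layer = keras.layers.Embedding(n_ids, dim)
out = keras.layers.Flatten()(emb_layer(inp))
toy_embedding = keras.Model(inputs=inp, outputs=out)

# Predict on every ID once to obtain the full lookup table, one row per ID.
ids = np.arange(n_ids).reshape(-1, 1)
vectors = toy_embedding.predict(ids, verbose=0)  # shape (n_ids, dim)

# The same values are stored as the Embedding layer's weight matrix.
weights = emb_layer.get_weights()[0]
print(vectors.shape, np.allclose(vectors, weights))
```

The resulting array could then be joined onto the training data by `building_id` as extra input features for the final model.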
