# Motivation

Encoder-decoder architectures are widely used in seq2seq models and other applications.

Beyond that, CNNs have achieved some of the strongest results in image classification.

We hypothesize that an RNN-CNN network can handle incomplete sketches better, because this structure takes the stroke sequence of a sketch into account. The CNN then predicts the labels from the hidden feature vectors provided by the RNN encoder.

## 1. Data Loading

In [3]:
import sys
sys.executable

'/home/heylamourding/miniconda3/bin/python'

In [4]:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
import glob
import os.path as path
import ndjson
import random
import pickle
import numpy as np
import torch
import torch.nn as nn
import time
import math

ModuleNotFoundError: No module named 'tensorflow'

Our team will use the same preprocessed raw data files for modelling so that our results can be compared.

In [2]:
from read_data import *
dataset_path='/home/heylamourding/quickdraw/dsy'

In [3]:
train_X,train_Y,test_X,test_Y=get_dataset(dataset_path,test_r=1.0)

In [7]:
import pandas as pd
df_trainX = pd.DataFrame({'drawing':train_X})
df_testX = pd.DataFrame({'drawing':test_X})
# Calculate Stroke Number of Each Image
df_trainX['stroke_number'] = df_trainX['drawing'].str.len()
df_testX['stroke_number'] = df_testX['drawing'].str.len()

## 2. Data Preprocessing

We will use the dataset format proposed in Sketch-RNN. Each example in the dataset is stored as a list of coordinate offsets, ∆x and ∆y, together with a binary value indicating whether the pen is lifted away from the paper.
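
As a minimal sketch of this format, the snippet below converts one toy stroke, given as absolute x and y coordinates, into rows of (∆x, ∆y, pen lifted). The helper name `to_stroke3` and the toy coordinates are assumptions made for illustration and are not part of the dataset tooling. The `create_stroke` function further down stores the same three quantities, but flattened as all ∆x, then all ∆y, then all pen flags.

```python
import numpy as np

def to_stroke3(xs, ys):
    """Convert the absolute points of one stroke to (dx, dy, pen_lifted) rows.

    Hypothetical helper, for illustration only.
    """
    dx = np.diff(np.asarray(xs))
    dy = np.diff(np.asarray(ys))
    pen = np.zeros(len(dx), dtype=int)
    pen[-1:] = 1  # the pen is lifted after the last offset of the stroke
    return np.stack([dx, dy, pen], axis=1)

# Toy stroke with four points, which yields three offsets
toy = to_stroke3([10, 12, 15, 15], [5, 5, 9, 14])
print(toy)
```

Each row describes one pen movement, so a stroke with n points produces n - 1 rows.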

In this notebook, we will try two types of structures.



### 2.1 1st Structure

### 2.1.1 Data Analysis

Dimension = (samples) × (max strokes of all drawings) × (3 × max points per stroke of all drawings)

In [8]:
import numpy as np
def create_stroke(df):
    """Convert each drawing into a list of flattened per-stroke offset vectors."""
    final = []
    # Loop over images
    for i in range(df.shape[0]):
        num = df.loc[i,'stroke_number']
        stroke_ls = []
        # Loop over the strokes of image i
        for stroke in range(num):
            X = df.loc[i,'drawing'][stroke][0] # x coordinates of the stroke's points
            Y = df.loc[i,'drawing'][stroke][1] # y coordinates of the stroke's points
            X_offset = np.diff(np.array(X)) # ∆x between consecutive points
            Y_offset = np.diff(np.array(Y)) # ∆y between consecutive points
            # Pen state: 0 while drawing, 1 on the last offset (pen lifted)
            binary = np.zeros(X_offset.shape[0], dtype=int)
            binary[-1:] = 1
            # Flattened layout: all ∆x, then all ∆y, then all pen flags
            stroke_ar = np.vstack((X_offset,Y_offset,binary)).reshape(-1)
            stroke_ls.append(stroke_ar)
        final.append(stroke_ls)
    return final

In [9]:
trainX_final = create_stroke(df_trainX)

In [10]:
print('Max stroke is: ', df_trainX[['stroke_number']].max())
print('Min stroke is: ', df_trainX[['stroke_number']].min())

Max stroke is:  stroke_number    30
dtype: int64
Min stroke is:  stroke_number    1
dtype: int64


In [11]:
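max_stroke = 0 
min_stroke = 500
for i in range(len(trainX_final)):
    for j in range(len(trainX_final[i])):
        # Length of the flattened stroke vector (3 values per offset)
        temp = trainX_final[i][j].shape[0]
        # Check both bounds independently so one value can update either of them
        if temp < min_stroke:
            min_stroke = temp
        if temp > max_stroke:
            max_stroke = temp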
print('Max dimensions *3 of overall stroke', max_stroke) 
print('Min dimensions *3 of overall stroke', min_stroke)  

Max dimensions *3 of overall stroke 597
Min dimensions *3 of overall stroke 3


As mentioned above, I will preprocess trainX_final into an array of shape **N\*30\*597**: N drawings, 30 strokes per drawing, and 597 values per stroke.

Before this, since our computational resources are limited, only part of the training data will be used. To avoid the effect of class imbalance, I will use the following function to select a balanced subset.

### 2.1.2 Balanced Data filtering

In [12]:
from random import shuffle
def balance_filter(ls, label_list, name, SL, EL):
    # Collect, for each of the 5 labels, the indices SL..EL of its samples
    label_ar = np.array(label_list)
    train_indices = []
    for label in range(5):
        train_indices.extend(np.where(label_ar == label)[0][SL:EL].tolist())
    # Shuffle indices so that the classes are interleaved
    shuffle(train_indices)
    # Select X & Y in the shuffled order
    final_X_ls = [ls[indice] for indice in train_indices]
    final_label_ar = np.array([label_list[indice] for indice in train_indices])
    # Report how many samples each label contributes
    for label in range(5):
        print(name + 'label ' + str(label) + ' has', sum(final_label_ar==label))

    return final_X_ls, final_label_ar

In [13]:
train_X_balance, train_Y_balance = balance_filter(trainX_final, train_Y, 'train', 0, 3000)

trainlabel 0 has 3000
trainlabel 1 has 3000
trainlabel 2 has 3000
trainlabel 3 has 3000
trainlabel 4 has 3000


In [14]:
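max_stroke = 0 
min_stroke = 500
for i in range(len(train_X_balance)):
    for j in range(len(train_X_balance[i])):
        # Length of the flattened stroke vector (3 values per offset)
        temp = train_X_balance[i][j].shape[0]
        # Check both bounds independently so one value can update either of them
        if temp < min_stroke:
            min_stroke = temp
        if temp > max_stroke:
            max_stroke = temp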
print('Max dimensions *3 of overall stroke', max_stroke) 
print('Min dimensions *3 of overall stroke', min_stroke)  

Max dimensions *3 of overall stroke 597
Min dimensions *3 of overall stroke 3


### 2.1.3 Padding Data

To simplify the RNN training process, we will pad the training data to the same number of time steps and the same dimensions.

Two operations are needed: padding the points of every stroke to a length of 597, and padding the strokes of every image to 30.
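
The two padding operations can be sketched on toy data as follows. The helper `pad_drawings` and the small sizes (3 strokes, 6 values per stroke) are assumptions chosen to keep the example readable; the notebook itself uses 30 and 597 and performs the two steps in separate functions.

```python
import numpy as np

def pad_drawings(drawings, max_strokes, max_len):
    """Zero-pad (or truncate) ragged drawings to shape (N, max_strokes, max_len).

    Illustrative helper; it builds a new array and leaves its input untouched.
    """
    out = np.zeros((len(drawings), max_strokes, max_len))
    for i, strokes in enumerate(drawings):
        for j, stroke in enumerate(strokes[:max_strokes]):
            stroke = np.asarray(stroke)[:max_len]
            out[i, j, :len(stroke)] = stroke
    return out

toy_drawings = [
    [np.array([1, 2, 3]), np.array([4, 5, 6, 7, 8, 9])],  # drawing with 2 strokes
    [np.array([1, 1, 1])],                                 # drawing with 1 stroke
]
padded = pad_drawings(toy_drawings, max_strokes=3, max_len=6)
print(padded.shape)
print(padded[0])
```

Missing values inside a stroke and missing strokes inside a drawing are both filled with zeros, so every drawing ends up with the same shape.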

In [15]:
def pad_points(data_stroke, max_pts = 597):
    """Zero-pad (or truncate) every stroke vector to max_pts values.

    Note: data_stroke is modified in place.
    """
    final = []
    for i in range(len(data_stroke)):
        # ith image 
        for j in range(len(data_stroke[i])):
            # length (3 * number of offsets) of the jth stroke 
            orig = len(data_stroke[i][j])
            if orig < max_pts:
                pad = np.zeros(max_pts-orig, dtype=int)
                data_stroke[i][j] = np.hstack((data_stroke[i][j],pad))
            else:
                data_stroke[i][j] = data_stroke[i][j][:max_pts]
        # stack the strokes of the ith image into a (num_strokes, max_pts) array
        final.append(np.array(data_stroke[i]))
    return final

In [16]:
trainX_input = pad_points(train_X_balance)

In [17]:
def pad_stroke(data_stroke, max_stroke, max_pts):
    """Append all-zero strokes so that every image has max_stroke rows.

    Note: data_stroke is modified in place.
    """
    for i in range(len(data_stroke)):
        # number of strokes in the ith image
        orig = data_stroke[i].shape[0]
        pad = np.zeros(((max_stroke-orig),max_pts))
        data_stroke[i] = np.vstack((data_stroke[i],pad))
    return data_stroke

In [18]:
trainX_ar = np.array(pad_stroke(trainX_input, max_stroke=30, max_pts=597))

In [19]:
print(trainX_ar.shape)

(15000, 30, 597)


In [22]:
import gc
# Drop the references first, then collect, so the memory can actually be reclaimed
del trainX_input, train_X_balance, trainX_final
gc.collect()

### 2.1.4 Data Normalization

In [23]:
from sklearn import preprocessing
X_train = preprocessing.scale(trainX_ar.reshape(trainX_ar.shape[0],-1))

In [24]:
# Reshape back to (samples, strokes, values per stroke)
X_train = X_train.reshape(trainX_ar.shape)
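
Applied to the flattened array, `preprocessing.scale` standardizes each column (one position in the 30 × 597 grid) across all samples to zero mean and unit variance. A minimal NumPy sketch of that column-wise standardization on toy data is given below. The helper `standardize_columns` is hypothetical, and leaving constant columns (for example positions that are always padding) at zero is an assumption that, as far as I understand, mirrors how scikit-learn treats zero-variance features.

```python
import numpy as np

def standardize_columns(x):
    """Column-wise zero-mean, unit-variance scaling of a 2-D array."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant columns (e.g. pure padding) stay at 0
    return (x - mean) / std

# Three samples with three features; the middle feature is constant
toy = np.array([[1.0, 0.0, 10.0],
                [3.0, 0.0, 20.0],
                [5.0, 0.0, 30.0]])
scaled = standardize_columns(toy)
print(scaled.mean(axis=0))
print(scaled.std(axis=0))
```

Because the statistics are computed per column, padded positions and real offsets are scaled independently of each other.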

### 2.1.5 Categorical 

In [25]:
from keras.utils import to_categorical

categorical_labels = to_categorical(train_Y_balance, num_classes=5)

Using TensorFlow backend.
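
For intuition, the one-hot encoding produced here can be reproduced with plain NumPy. The toy label vector below is an assumption for illustration; the real labels come from `train_Y_balance`.

```python
import numpy as np

labels = np.array([0, 3, 1, 4, 2])   # toy integer class labels
one_hot = np.eye(5)[labels]          # row k of the identity matrix encodes class k
print(one_hot)

# argmax over the class axis recovers the original integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```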


In [26]:
np.save(dataset_path+'/X_train.npy',X_train)
np.save(dataset_path+'/Y_train.npy',categorical_labels)

In [1]:
import numpy as np
X_train = np.load(dataset_path+'/X_train.npy')
categorical_labels = np.load(dataset_path+'/Y_train.npy')

In [28]:
X_train.shape

(15000, 30, 597)

## 3. Simple Model

We start with a small dataset for hyperparameter tuning.

We will use Adam as our optimizer with its default parameters, so the learning rate will not be tuned.

The following hyperparameters will be tuned:

1. Batch Size [32, 64, 256]
2. LSTM Units [100, 300, 500]
3. Dropout Rate [0.2, 0.4, 0.5]
4. Encoder Dense [256, 400, 625, 1024]
5. Decoder Dense [16, 32, 64, 128]
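
As a rough sketch of the size of this search space, the candidate values can be enumerated with `itertools.product`. The dictionary keys reuse the names of the `args` dictionary defined below; enumerating the full grid is an assumption for illustration, since the notebook does not state which search strategy is used. Each Encoder Dense value is a perfect square, so the encoder output can be reshaped into a square single-channel map; the model below hard-codes the 16 × 16 case for 256 units.

```python
import itertools
import math

grid = {
    'batch': [32, 64, 256],
    'LSTMUnits': [100, 300, 500],
    'dropout': [0.2, 0.4, 0.5],
    'enc_dense': [256, 400, 625, 1024],
    'dec_dense': [16, 32, 64, 128],
}

# Every combination of the candidate values, one dict per configuration
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(len(configs))  # 3 * 3 * 3 * 4 * 4 = 432

# Side length of the square feature map for each encoder width
sides = {n: math.isqrt(n) for n in grid['enc_dense']}
print(sides)
```

With 432 combinations, training every configuration is expensive, which is one reason to start with a small dataset.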


In [55]:
# List the GPUs visible to Keras
# (_get_available_gpus is a private helper of the TensorFlow backend in standalone Keras 2.x)
from keras import backend as K
K.tensorflow_backend._get_available_gpus()

In [49]:
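# Keras model: LSTM encoder followed by a small CNN classifier
import tensorflow as tf
import keras
# device_count={'GPU': 0} hides all GPUs, so this session runs on the CPU
config = tf.ConfigProto( device_count = {'GPU': 0 } ) 
sess = tf.Session(config=config) 
keras.backend.set_session(sess)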

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, GRU, LSTM
from keras.layers import Conv2D, MaxPooling2D, Input, Reshape
from keras.layers.convolutional import ZeroPadding2D
from keras.models import load_model

In [50]:
args = {}
args['LSTMUnits'] = 300
args['batch'] = 256
args['epochs'] = 1
args['dropout'] = 0.2
args['len_category'] = 5
args['enc_dense'] = 256
args['num_filters'] = 1
args['kernelS'] = 3
args['stride'] = 2
args['poolS'] = 2 
args['dec_dense'] = 32

In [51]:
model = Sequential()
model.add(LSTM(args['LSTMUnits'],return_sequences=False,input_shape=(30,597)))
model.add(Dense(args['enc_dense'], activation='relu'))
model.add(Reshape((16,16,1)))
model.add(Conv2D(args['num_filters'],args['kernelS'],strides=(args['stride'],args['stride']), activation ='relu'))
#model.add(MaxPooling2D(pool_size=(args['poolS'],args['poolS'])))
model.add(Flatten())
model.add(Dense(args['dec_dense'], activation='relu'))
model.add(Dropout(args['dropout']))
model.add(Dense(args['len_category'], activation='softmax'))

# Categorical cross-entropy matches the one-hot labels and the softmax output
model.compile(loss='categorical_crossentropy',
          optimizer='adam',
          metrics=['accuracy'])

In [52]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_5 (LSTM)                (None, 300)               1077600   
_________________________________________________________________
dense_13 (Dense)             (None, 256)               77056     
_________________________________________________________________
reshape_5 (Reshape)          (None, 16, 16, 1)         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 7, 7, 1)           10        
_________________________________________________________________
flatten_5 (Flatten)          (None, 49)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 32)                1600      
_________________________________________________________________
dropout_5 (Dropout)          (None, 32)                0         
__________
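
The parameter counts and the 7 × 7 convolution output listed in the summary can be checked by hand. The helper functions below are written for this check only and are not Keras APIs; they use the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights and a bias), a dense layer, and a convolution without padding.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

def conv2d_params(kernel, in_channels, filters):
    # one kernel x kernel x in_channels weight block and one bias per filter
    return kernel * kernel * in_channels * filters + filters

def conv_out_size(size, kernel, stride):
    # output side length without padding ('valid')
    return (size - kernel) // stride + 1

print(lstm_params(597, 300))     # 1077600, matches lstm_5
print(dense_params(300, 256))    # 77056, matches dense_13
print(conv_out_size(16, 3, 2))   # 7, matches the (7, 7, 1) output of conv2d_5
print(conv2d_params(3, 1, 1))    # 10, matches conv2d_5
print(dense_params(7 * 7, 32))   # 1600, matches dense_14
```

Almost all trainable parameters sit in the LSTM encoder; the convolutional part contributes only 10.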

In [53]:
model.fit(X_train, categorical_labels,
          batch_size = args['batch'], epochs = args['epochs'], 
          verbose=1,validation_split=0.2)


Train on 12000 samples, validate on 3000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10

KeyboardInterrupt: 