# Classification using Keras

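The target values must be stored in a column of the dataframe. They do not have to be numerical: with `class_mode="categorical"`, `flow_from_dataframe` assigns each distinct value of that column a class index and yields one-hot encoded labels, so the string-valued `genre` column can be used as-is.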
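Turning string labels into numerical targets takes two steps: give each distinct label an integer index, then one-hot encode that index. Here is a minimal NumPy sketch with made-up labels. It illustrates the idea and is not the `keras_preprocessing` code, and the alphabetical class order is our own choice.

```python
import numpy as np

# Hypothetical labels standing in for the `genre` column.
labels = ["portrait", "landscape", "portrait", "still life"]

# Assign each distinct label an integer index (alphabetical order is our choice).
classes = sorted(set(labels))
class_indices = {name: i for i, name in enumerate(classes)}

# Integer-encode, then one-hot encode by picking rows of an identity matrix.
y_int = np.array([class_indices[name] for name in labels])
y_onehot = np.eye(len(classes))[y_int]

print(class_indices)  # {'landscape': 0, 'portrait': 1, 'still life': 2}
print(y_onehot)
```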

IMPORTANT: Notice that we are importing `ImageDataGenerator` from `keras_preprocessing` instead of `keras.preprocessing`. At the time of writing, the version of `flow_from_dataframe` used below (with the `has_ext` option) was only available in the standalone `keras_preprocessing` package.

See [this tutorial on `ImageDataGenerator` with `flow_from_dataframe`](https://medium.com/@vijayabhaskar96/tutorial-on-keras-imagedatagenerator-with-flow-from-dataframe-8bd5776e45c1) for details.

In [27]:
from keras.models import Sequential
from keras_preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras import regularizers, optimizers
from sklearn.model_selection import train_test_split

import pandas as pd
import numpy as np

## Load and prepare the data

The CSV file `db.csv` contains all the metadata we extracted from WikiArt: each row corresponds to an artwork, and the `_id` column contains the filename of the associated image in the `data/images/` directory.

In [50]:
nrows = 5000  # work with the first 5000 artworks only
df = pd.read_csv("../data/db.csv", nrows=nrows, na_values="?")  # "?" marks missing values
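To see what `nrows` and `na_values="?"` do, here is a sketch on a tiny in-memory CSV. The rows are made up and only mimic the layout of `db.csv`.

```python
import io

import pandas as pd

# Made-up rows that only mimic the layout of db.csv.
csv_text = """_id,genre,year
a1,portrait,1899
a2,?,1900
a3,landscape,?
"""

# nrows=2 reads only the first two data rows (the header is not counted);
# na_values="?" turns every "?" cell into NaN.
df_demo = pd.read_csv(io.StringIO(csv_text), nrows=2, na_values="?")

print(len(df_demo))                   # 2
print(df_demo["genre"].isna().sum())  # 1
```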

Split the data into training (80%) and test (20%) sets:

In [57]:
df_train, df_test = train_test_split(df, 
                                     test_size=0.2,
                                     shuffle=True)
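With many rare genres, a random split can put some labels in only one of the two sets. A quick sanity check is to compare the label sets of the splits. The sketch below uses a hypothetical dataframe, and `random_state` is added only to make the demo reproducible. Passing `stratify=` to `train_test_split` would keep class proportions similar, but scikit-learn raises an error when a class has fewer than two members, which is likely here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labels: "mural" occurs once, so it can land in only one split.
df_demo = pd.DataFrame({
    "_id": [f"img{i}" for i in range(10)],
    "genre": ["portrait"] * 5 + ["landscape"] * 4 + ["mural"],
})

df_tr, df_te = train_test_split(df_demo, test_size=0.2, shuffle=True, random_state=0)

# Compare the sets of (non-missing) labels seen by each split.
train_classes = set(df_tr["genre"].dropna())
test_classes = set(df_te["genre"].dropna())

print("only in train:", train_classes - test_classes)
print("only in test:", test_classes - train_classes)
```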

In [58]:
df_train.head()

| Unnamed: 0 | _id | artistname | genre | image | image_size_data | style | title | year |
|---|---|---|---|---|---|---|---|---|
| 3713 | 577275aeedc2cb3880cedba0 | maurice prendergast | genre painting | https://uploads8.wikiart.org/images/maurice-pr... | [{'sizekb': 15, 'width': 210, 'height': 243, '... | impressionism | nantasket | 1900-1905 |
| 4515 | 5772890dedc2cb38800c8133 | howard pyle | illustration | https://uploads6.wikiart.org/images/howard-pyl... | [{'sizekb': 7, 'width': 210, 'height': 151, 'u... | romanticism | sorrow | |
| 443 | 57727b3eedc2cb3880e0fccf | giovanni battista piranesi | mythological painting | https://uploads4.wikiart.org/images/giovanni-b... | [{'sizekb': 11, 'width': 210, 'height': 262, '... | neoclassicism | venus in kythera | |
| 1664 | 57727195edc2cb3880c22fbf | camille pissarro | cityscape | https://uploads5.wikiart.org/images/camille-pi... | [{'sizekb': 7, 'width': 210, 'height': 174, 'u... | impressionism | garden of the louvre fog effect | 1899 |
| 259 | 57727b35edc2cb3880e0f133 | giovanni battista piranesi | design | https://uploads2.wikiart.org/images/giovanni-b... | [{'sizekb': 6, 'width': 210, 'height': 139, 'u... | neoclassicism | the roman antiquities, t. 4, plate xlii. vista... | |


## Create data generators

Let's initialize an image data generator: `ImageDataGenerator` is a convenient interface to many of Keras's image (pre)processing methods, including utilities for data augmentation.

In [59]:
# rescale maps pixels to [0, 1]; validation_split is unused here (no subset= is passed below)
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
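`rescale` is a factor that every pixel value is multiplied by, so `1./255` maps 8-bit values from [0, 255] onto [0, 1]. Writing `1./255` rather than `1/255` also avoids integer division, which would give 0 under Python 2. A NumPy sketch on a made-up 2×2 RGB image:

```python
import numpy as np

# A made-up 2x2 RGB image with 8-bit pixel values.
img = np.array([[[0, 128, 255], [64, 32, 16]],
                [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# Multiply every pixel by the rescale factor, working in float32.
scaled = img.astype(np.float32) * (1.0 / 255)

print(scaled.min(), scaled.max())
```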

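We will use the `flow_from_dataframe` method to stream batches of images, loaded from `directory` using the filenames in the `x_col` column, together with their labels from the `y_col` column.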

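With `has_ext=False`, the `_id` values are matched against the image filenames with their extension stripped, so the files on disk still need an image extension such as `.jpg`. If they do not have one, run this inside `data/images/` (it appends `.jpg` to every file in the current directory):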

    $ for f in *; do mv "$f" "$f.jpg"; done
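The shell loop renames every file, including ones that already have an extension (running it twice yields `.jpg.jpg`). A slightly safer sketch in Python with `pathlib` renames only regular, non-hidden files that have no suffix. The function name is hypothetical, and like the original loop it assumes the images are JPEGs.

```python
from pathlib import Path


def add_missing_extension(directory, ext=".jpg"):
    """Append `ext` to every regular, non-hidden file in `directory` that has no suffix.

    Files that already have an extension are left alone; hidden files are
    skipped, as the shell glob `*` also skips them. Returns the new paths.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and not path.name.startswith(".") and path.suffix == "":
            target = path.with_name(path.name + ext)
            path.rename(target)
            renamed.append(target)
    return renamed
```

Usage: `add_missing_extension("../data/images/")`.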

In [72]:
train_generator = datagen.flow_from_dataframe(
                                df_train,
                                directory="../data/images/",
                                x_col="_id",
                                has_ext=False,
                                target_size=(32,32),
                                y_col="genre",
                                batch_size=32,
                                class_mode="categorical")

Found 4000 images belonging to 49 classes.


The number of classes sets the width of the network's final softmax layer. The generator stores the label-to-index mapping in `class_indices`:

In [80]:
train_generator.class_indices.keys()

dict_keys(['genre painting', 'illustration', 'mythological painting', 'cityscape', 'design', 'landscape', 'shan shui', 'flower painting', 'religious painting', 'figurative', 'sketch and study', 'capriccio', 'portrait', 'animal painting', 'abstract', 'nude painting (nu)', 'interior', 'graffiti', 'history painting', 'sculpture', 'wildlife painting', 'still life', 'marina', nan, 'caricature', 'veduta', 'symbolic painting', 'advertisement', 'photo', 'allegorical painting', 'self-portrait', 'literary painting', 'installation', 'religious painting,landscape', 'pastorale', 'design,cityscape', 'history painting,landscape', 'landscape,religious painting', 'battle painting', 'bird-and-flower painting', 'portrait,tronie', 'mythological painting,nude painting (nu)', 'cloudscape', 'mural', 'landscape,literary painting', 'portrait,allegorical painting', 'history painting,battle painting', 'poster', 'marina,battle painting'])
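`nan` appears among the classes because rows with a missing `genre` were kept, so the generator treated NaN as one more label. Combined labels such as `'religious painting,landscape'` also count as classes of their own. A sketch of dropping unlabeled rows before building the generators, on a hypothetical dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing genre.
df_demo = pd.DataFrame({
    "_id": ["a", "b", "c", "d"],
    "genre": ["portrait", np.nan, "landscape", "portrait"],
})

print(df_demo["genre"].unique())  # NaN is reported as a distinct value

# Keep only rows that actually have a label.
df_labeled = df_demo.dropna(subset=["genre"])
print(sorted(df_labeled["genre"].unique()))  # ['landscape', 'portrait']
```

In the notebook this would be `df = df.dropna(subset=["genre"])` right after loading, before the train/test split.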

In [61]:
nclass = train_generator.num_classes

In [70]:
test_generator = datagen.flow_from_dataframe(
                                df_test,
                                directory="../data/images/",
                                x_col="_id",
                                has_ext=False,
                                target_size=(32,32),
                                y_col="genre",
                                batch_size=20,
                                shuffle=False,
                                class_mode="categorical")

Found 1000 images belonging to 39 classes.


## Model architecture

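We'll start with a basic architecture: two convolutional blocks (each with two 3×3 convolutions with ReLU, 2×2 max pooling and dropout), followed by a fully connected layer and a softmax output with one unit per class.

Using transfer learning from a pretrained network would be a natural next step.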

In [44]:
model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nclass, activation='softmax'))
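Tracing the tensor shapes by hand shows where the input size of `Dense(512)` comes from. With stride 1, a 3×3 convolution keeps the spatial size when `padding='same'` and shrinks it by 2 otherwise (Keras's default is `'valid'`). 2×2 max pooling halves the size, rounding down. The pure-Python sketch below does this bookkeeping and counts trainable parameters (weights plus biases); activation and dropout layers have none. The helper names are ours, and the numbers are our arithmetic rather than Keras output, though they should agree with `model.summary()`.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(kernel, in_ch, out_ch):
    return kernel * kernel * in_ch * out_ch + out_ch  # weights + biases


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


def trace(nclass, size=32):
    """Return (flattened feature size, total trainable parameters)."""
    size = conv_out(size, padding="same")  # 32x32x32
    params = conv_params(3, 3, 32)
    size = conv_out(size)                  # 30x30x32
    params += conv_params(3, 32, 32)
    size = pool_out(size)                  # 15x15x32
    size = conv_out(size, padding="same")  # 15x15x64
    params += conv_params(3, 32, 64)
    size = conv_out(size)                  # 13x13x64
    params += conv_params(3, 64, 64)
    size = pool_out(size)                  # 6x6x64
    flat = size * size * 64
    params += dense_params(flat, 512) + dense_params(512, nclass)
    return flat, params


print(trace(nclass=49))  # (2304, 1270865)
```

With 49 classes, the `Dense(512)` layer alone holds 1,180,160 of the roughly 1.27 million parameters, so it dominates the model size.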

Pick an optimizer (RMSprop with a small learning rate and learning-rate decay) and compile the model:

In [45]:
model.compile(
    optimizer=optimizers.RMSprop(lr=0.0001, decay=1e-6),
    loss="categorical_crossentropy",
    metrics=["accuracy"])
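For reference, `categorical_crossentropy` compares each one-hot target row with the corresponding softmax row, so both must have exactly `nclass` columns. A NumPy sketch of the batch loss: the clipping that avoids `log(0)` is similar in spirit to what Keras does, but this is not its implementation, and the shape check is our addition.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[:, k] * log(y_pred[:, k])."""
    if y_true.shape != y_pred.shape:
        raise ValueError(f"target shape {y_true.shape} != prediction shape {y_pred.shape}")
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


# Two samples, three classes: one-hot targets and softmax-like predictions.
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]])
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.4581
```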

## Training

In [46]:
model.fit_generator(generator=train_generator,
                    epochs=10)
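`fit_generator` is called here without `steps_per_epoch`. In the Keras 2.x versions this notebook targets, a generator that is a Keras `Sequence` (as the `flow_from_dataframe` iterators are) makes Keras fall back to `len(generator)`, the number of batches in one pass over the data. That count is simple arithmetic:

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Batches needed to see every sample once; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)


train_steps = batches_per_epoch(4000, 32)  # 4000 training images, batch_size=32
test_steps = batches_per_epoch(1000, 20)   # 1000 test images, batch_size=20
print(train_steps, test_steps)  # 125 50
```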

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f74a7e99898>

## Evaluation

In [49]:
model.evaluate_generator(generator=test_generator)

ValueError: Error when checking target: expected dense_12 to have shape (22,) but got array with shape (19,)
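The error comes from the labels, not the images. Each `flow_from_dataframe` call builds its own class list from the dataframe it receives, so the training generator found 49 genres and the test generator only 39, and the test targets end up narrower than the model's softmax output. The 22 and 19 in the message do not match those counts: judging by the `In [...]` counters, the model and evaluation cells were run before the current generator cells, but the cause is the same. One fix is to build a single class list for both splits and pass it to both generators through the `classes` argument of `flow_from_dataframe`, then rebuild the model with `nclass = len(classes)`. The helpers `shared_classes` and `one_hot` below are hypothetical. The one-hot step only shows that, with a fixed class list, every split gets the same target width.

```python
import numpy as np
import pandas as pd


def shared_classes(*frames, label_col="genre"):
    """Sorted list of every non-missing label that appears in any of the frames."""
    labels = pd.concat([f[label_col] for f in frames]).dropna()
    return sorted(labels.unique())


def one_hot(frame, classes, label_col="genre"):
    """One-hot encode `label_col` against a fixed class list."""
    codes = pd.Categorical(frame[label_col], categories=classes).codes
    if (codes < 0).any():  # -1 marks a missing label or one outside `classes`
        raise ValueError("found a missing label or one outside `classes`")
    return np.eye(len(classes))[codes]


# Hypothetical splits: the test split has no "mural" or "landscape".
train = pd.DataFrame({"genre": ["portrait", "landscape", "mural"]})
test = pd.DataFrame({"genre": ["portrait", "portrait"]})

classes = shared_classes(train, test)
train_y, test_y = one_hot(train, classes), one_hot(test, classes)
print(classes)                     # ['landscape', 'mural', 'portrait']
print(train_y.shape, test_y.shape)  # (3, 3) (2, 3)
```

In the notebook, that would mean dropping rows with a missing `genre` before the split, passing `classes=shared_classes(df_train, df_test)` to both `flow_from_dataframe` calls, and re-running the model cells in order so that `nclass` matches. This assumes the installed `keras_preprocessing` version accepts `classes` in `flow_from_dataframe`; check its docstring.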