The target values must be in a column of the dataframe, and must be a numerical datatype, so we must convert them before training.

IMPORTANT: Notice that we are importing `ImageDataGenerator` from `keras_preprocessing` instead of `keras.preprocessing`, because the `flow_from_dataframe` method used below was not yet available in the version bundled with Keras when this notebook was written.

See the discussion [here](https://medium.com/@vijayabhaskar96/tutorial-on-keras-imagedatagenerator-with-flow-from-dataframe-8bd5776e45c1).

In [1]:
from keras_preprocessing.image import ImageDataGenerator
from keras.models import load_model

import pandas as pd
import numpy as np

Using TensorFlow backend.


# Load data and preparation

The CSV file `db.csv` contains all the metadata we have extracted from wikiart: every entry corresponds to an artwork, and the `_id` column contains the filename of the image in the `data/images/` directory associated with that artwork.

In [35]:
nrows = 5     # small sample, handy for quick tests
nrows = None  # None loads all rows (overrides the line above)
df = pd.read_csv("../data/db.csv", nrows=nrows, na_values="?")
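
As a small illustration of what `na_values="?"` does, the sketch below reads a made-up two-row CSV from memory. Only the column names come from this notebook; the rows are invented for the example. Any cell equal to `?` should be parsed as a missing value.

```python
import io

import pandas as pd

# Hypothetical miniature of db.csv; the rows are invented for illustration.
csv_text = "_id,genre,year\na1,portrait,1910\nb2,?,?\n"

sample = pd.read_csv(io.StringIO(csv_text), na_values="?")

# Count the missing values per column: cells that held "?" are now NaN.
print(sample.isna().sum())
```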

Let's look at this dataframe:

In [36]:
df.head()

Unnamed: 0,_id,artistname,genre,image,image_size_data,style,title,year
0,57727b22edc2cb3880e0d820,giovanni battista piranesi,design,https://uploads1.wikiart.org/images/giovanni-b...,"[{'sizekb': 13, 'width': 210, 'height': 266, '...",neoclassicism,"plan, elevation and details of doric temples i...",
1,57727b22edc2cb3880e0d830,giovanni battista piranesi,design,https://uploads0.wikiart.org/images/giovanni-b...,"[{'sizekb': 8, 'width': 210, 'height': 149, 'u...",neoclassicism,plans of elevations and sections of thermopolium,
2,57727b22edc2cb3880e0d840,giovanni battista piranesi,design,https://uploads3.wikiart.org/images/giovanni-b...,"[{'sizekb': 7, 'width': 210, 'height': 148, 'u...",neoclassicism,plants related to the houses opposite to that ...,
3,57727b22edc2cb3880e0d850,giovanni battista piranesi,sketch and study,https://uploads2.wikiart.org/images/giovanni-b...,"[{'sizekb': 6, 'width': 210, 'height': 161, 'u...",neoclassicism,pluto,
4,57727b23edc2cb3880e0d860,giovanni battista piranesi,sketch and study,https://uploads8.wikiart.org/images/giovanni-b...,"[{'sizekb': 10, 'width': 210, 'height': 296, '...",neoclassicism,pluto and proserpina,


Let's initialize an image data generator: it is a nice interface to many (pre)processing methods in Keras, including some utilities for data augmentation.

We will use the amazing `flow_from_dataframe` function to serve the data we need.

If the files do not have an extension, run this in a shell:

    $ for f in *; do mv "$f" "$f.jpg"; done
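
The same renaming can be sketched in Python with `pathlib`. The helper name `add_jpg_extension` is made up for this example. Unlike the shell loop, which renames every entry, this version assumes that files which already have an extension should be left alone.

```python
from pathlib import Path


def add_jpg_extension(directory):
    """Append '.jpg' to every file in `directory` that has no extension."""
    renamed = []
    # sorted() materializes the listing first, so renaming does not
    # interfere with the iteration.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix == "":
            target = path.with_name(path.name + ".jpg")
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

For this notebook the call would be `add_jpg_extension("../data/images/")`.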

In [37]:
df.describe()

Unnamed: 0,_id,artistname,genre,image,image_size_data,style,title,year
count,152014,152014,149142,152014,152014,148475,152011,118672
unique,152014,2854,363,152010,152010,921,115378,2693
top,57728505edc2cb3880ffba8a,vincent van gogh,portrait,https://uploads3.wikiart.org/images/nathan-alt...,"[{'sizekb': 10, 'width': 210, 'height': 189, '...",impressionism,untitled,1910
freq,1,1927,21514,3,3,14639,4440,1303


## Classes

Decide here which feature we want to predict, and save all its possible values in the `classes` list: these are the values that appear at least once in the database.

In [38]:
feature = "genre"
classes = list(set(df[feature]))
nclass = len(classes)
print(nclass)

364
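
This count is one higher than the 363 unique genres reported by `df.describe()`, because `set(df[feature])` also picks up the missing value `NaN`. If that extra class is not wanted, one option is to build the list from the non-missing values only. The sketch below uses toy data and is not what this notebook does.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df[feature]; the values are invented for the example.
genres = pd.Series(["portrait", np.nan, "poster", "portrait", np.nan])

# dropna() discards missing labels before the unique values are collected.
toy_classes = sorted(genres.dropna().unique())
print(toy_classes, len(toy_classes))
```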


A few examples of the kind of classes we will predict:

In [39]:
print(list(classes)[:10])

[nan, 'poster', 'cityscape,symbolic painting', 'stabile,sculpture', "trompe-l'œil,nude painting (nu)", 'cloudscape,landscape', 'self-portrait,symbolic painting', 'architecture,religious painting', 'figurative,sculpture', 'shan shui']


Notice how some of these labels are composite: there are subclasses. For simplicity, during training we dropped the lower-level specifications and kept only the top-level tag. For example, "cubism,precisionism" is simply classified as "cubism".
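
A minimal sketch of that simplification, assuming the top-level tag is always the part before the first comma (as in the example above). The labels are toy values, and the training code itself is not shown in this notebook, so this is only one plausible way to do it.

```python
import pandas as pd

# Toy labels: "cubism,precisionism" is the example from the text,
# the others are made up; None stands for a missing label.
labels = pd.Series(["cubism,precisionism", "portrait", "cloudscape,landscape", None])

# Split on the comma and keep the first piece; missing labels stay missing.
top_level = labels.str.split(",").str[0]
print(top_level.tolist())
```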

The number of classes will be needed later, because the neural network must know the dimension of its output.

## Image Size

The images will be scaled down to this size:

In [40]:
img_size = (256,256)

## Create data generators

In [41]:
datagen = ImageDataGenerator(rescale=1/255,validation_split=0.2)
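
The `rescale=1/255` argument multiplies every pixel value by that factor, so 8-bit intensities in [0, 255] end up in [0, 1]. Here is a NumPy-only sketch of the same arithmetic on a made-up 2x2 image; it mimics the scaling and does not call Keras.

```python
import numpy as np

# Made-up 2x2 grayscale image with 8-bit pixel values.
image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Multiply by the rescale factor, as rescale=1/255 does for each image.
scaled = image.astype(np.float32) * (1 / 255)
print(scaled.min(), scaled.max())
```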

In [42]:
test_generator = datagen.flow_from_dataframe(
    df,
    directory="../data/images/",
    x_col="_id",
    has_ext=False,
    target_size=img_size,
    y_col=feature,
    batch_size=32,
    classes=classes)

Found 152014 images belonging to 364 classes.
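
The generator serves the images in batches, and the `steps` argument of the generator-based model methods counts batches, not images. A quick calculation of how many steps one full pass over the data would take, using the image count reported above and the `batch_size` passed to `flow_from_dataframe`:

```python
import math

n_images = 152014   # reported by flow_from_dataframe above
batch_size = 32     # value passed to flow_from_dataframe

# Number of batches needed to see every image once; the last batch is partial.
steps_per_pass = math.ceil(n_images / batch_size)
last_batch = n_images - (steps_per_pass - 1) * batch_size
print(steps_per_pass, last_batch)
```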


# Load Model 

In [14]:
model = load_model('../data/model48.h5')



# Evaluate

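Note that `predict_generator` returns the model's predictions for one batch (`steps=1`), not the loss and accuracy; those are returned by `evaluate_generator`.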

In [None]:
model.predict_generator(test_generator,steps=1)
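
Assuming the model ends in a softmax over the classes, the array returned by `predict_generator` has one row per image and one column per class. Below is a minimal sketch of turning such an array into labels. The probabilities and the three-class list are invented stand-ins for the real model output. The sketch also assumes that the column order matches the `classes` list given to the generator; the generator's `class_indices` attribute holds the actual mapping.

```python
import numpy as np

# Hypothetical class list and predictions for a batch of 3 images.
toy_classes = ["portrait", "poster", "landscape"]
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
])

# argmax over the class axis gives the index of the most likely class per image.
predicted_idx = probabilities.argmax(axis=1)
predicted_labels = [toy_classes[i] for i in predicted_idx]
print(predicted_labels)
```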