# Introduction

This is ShaCo, a simple Shape Counting dataset. The motivation is to provide a simple, easily customizable benchmark for which as many training images as needed can be generated.

This first version includes 50,000 images, split into a training set of 42,500 and a test set of 7,500.
The images contain anywhere from 1 to 5 blue circles, 1 to 4 green squares, and 0 to 3 red squares.
The number of shapes for a given image is selected at random. The size of the squares and circles also has some small random variation.
Blue circle and green square quantities are randomly generated with a normal distribution. Red squares are generated depending on the number of green squares, to simulate how in many counting problems there might be some type of object that is less frequent.
Overlapping of the shapes is allowed, and transparency is used to avoid problems of shapes fully covering one another.
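
The sampling scheme described above can be sketched roughly as follows. This is only an illustration: the means, standard deviations, and the exact rule tying red squares to green squares are assumptions made for the example, not the settings used to build ShaCo, and `sample_counts` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_counts(rng):
    """Draw (blue circles, green squares, red squares) for one image.

    The distribution parameters and the red-square rule below are
    assumptions for illustration, not the actual ShaCo settings.
    """
    # Normal draws, rounded and clipped to the ranges stated in the text.
    blue = int(np.clip(np.rint(rng.normal(3.0, 1.0)), 1, 5))
    green = int(np.clip(np.rint(rng.normal(2.5, 1.0)), 1, 4))
    # Hypothetical dependence: at most as many red squares as green ones, capped at 3.
    red = int(rng.integers(0, min(green, 3) + 1))
    return blue, green, red

counts = np.array([sample_counts(rng) for _ in range(1000)])
print("min per shape:", counts.min(axis=0))
print("max per shape:", counts.max(axis=0))
```

With a rule like this, red squares end up less frequent than green squares on average, which is the effect the dataset aims for.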

This is a work in progress.

# Imports

In [None]:
import numpy as np
import pandas as pd
import pickle
import os

print(os.listdir("../input/shape count"))

# Reading the input files

The images are provided both as Keras-friendly .pkl files and as image files.
Here we load them from the pickle objects.
The labels are loaded from the .csv files and transformed into numpy arrays for Keras.

In [None]:
with open('../input/shape count/train_images_transparent.pkl', 'rb') as inputfile:
    x_train = pickle.load(inputfile)
    
with open('../input/shape count/test_images_transparent.pkl', 'rb') as inputfile:
    x_test = pickle.load(inputfile)
    
y_train = pd.read_csv('../input/shape count/train_labels.csv', header = None)

y_test = pd.read_csv('../input/shape count/test_labels.csv',header = None)

y_train = np.array(y_train)

y_test = np.array(y_test)

The images are 100x100 pixels and contain a random number of semi-transparent shapes (blue circles and red or green squares).
Let's see an example:

In [None]:
from matplotlib import pyplot as plt
import random

# randrange excludes the upper bound; randint(0, len(x_train)) could index out of range
rand = random.randrange(len(x_train))

print('This image has:')
print(str(y_train[rand][0]) + ' blue circles' )
print(str(y_train[rand][1]) + ' green squares' )
print(str(y_train[rand][2]) + ' red squares' )
plt.imshow(x_train[rand]);

The shapes are semi-transparent so that even if a full overlap occurs, it should still be possible to tell how many shapes there are.
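
To see why transparency keeps fully overlapping shapes countable, here is a minimal sketch of alpha blending in NumPy. The white background, the alpha value of 0.5, and the `draw_square` helper are assumptions for the example and may differ from how the ShaCo images were actually rendered.

```python
import numpy as np

def draw_square(canvas, top, left, size, color, alpha=0.5):
    """Alpha-blend a filled square onto an RGB canvas with values in [0, 1]."""
    region = canvas[top:top + size, left:left + size]
    # Standard "over" compositing: new = alpha * color + (1 - alpha) * old
    canvas[top:top + size, left:left + size] = (
        alpha * np.asarray(color, dtype=canvas.dtype) + (1.0 - alpha) * region
    )

canvas = np.ones((100, 100, 3), dtype=np.float32)  # white background (assumed)
green = (0.0, 1.0, 0.0)

draw_square(canvas, 10, 10, 20, green)  # a single square
draw_square(canvas, 60, 60, 20, green)  # two squares stacked...
draw_square(canvas, 60, 60, 20, green)  # ...on exactly the same spot

single = canvas[15, 15]
stacked = canvas[65, 65]
print("one layer:", single)
print("two layers:", stacked)
```

The stacked region comes out more saturated than the single square, so the pixel values still carry information about how many shapes overlap.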

Let's make sure that the images are in the proper format. (They should already be if loaded from the .pkl objects).

In [None]:
from keras import backend as K

img_rows, img_cols = 100,100

if K.image_data_format() == 'channels_first':
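    # Assumes the pickled arrays are channels-last; transpose moves the channel axis (a bare reshape would scramble pixels)
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 3).transpose(0, 3, 1, 2)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 3).transpose(0, 3, 1, 2)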
    input_shape = (3, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols,3)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols,3)
    input_shape = (img_rows, img_cols, 3)
    
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
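
One detail worth keeping in mind for the `channels_first` case: converting channels-last data needs a transpose, because a reshape only reinterprets the flat buffer and leaves the values in the same order. A minimal sketch with a tiny made-up batch (not the ShaCo arrays):

```python
import numpy as np

# Tiny stand-in batch: 1 image, 2x2 pixels, 3 channels (channels-last).
batch = np.arange(12).reshape(1, 2, 2, 3)

reshaped = batch.reshape(1, 3, 2, 2)            # reinterprets the flat buffer
transposed = np.transpose(batch, (0, 3, 1, 2))  # actually moves the channel axis

first_channel = batch[0, :, :, 0]  # channel 0 of every pixel
print("transpose keeps channel 0 intact:", np.array_equal(transposed[0, 0], first_channel))
print("reshape keeps channel 0 intact:  ", np.array_equal(reshaped[0, 0], first_channel))
```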

Now let's do some simple pre-processing: normalizing the images by dividing by 255.

In [None]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

We are ready to build our models in Keras. For the purpose of this example, let's get a baseline using a pair of densely connected layers (a simple multilayer perceptron), and let's also try a small convolutional network.

We will need two models.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D

model1 = Sequential()
model1.add(Flatten(input_shape=x_train.shape[1:4]))
model1.add(Dense(512, activation='relu'))
model1.add(Dense(y_train.shape[1], activation='relu'))

model2 = Sequential()
model2.add(Conv2D(filters = 32, kernel_size=(3,3), padding = 'same', input_shape=x_train.shape[1:4]))
model2.add(Conv2D(filters = 64, kernel_size=(3,3), padding = 'same'))
model2.add(MaxPooling2D())
model2.add(Flatten())
model2.add(Dense(512, activation='relu'))
model2.add(Dense(y_train.shape[1], activation='relu'))
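
As a rough sanity check on model size, the trainable parameters of both networks can be counted by hand. The sketch below assumes the 100x100x3 input and 3 outputs described in this notebook, and the default `MaxPooling2D` pool size of (2, 2); `dense_params` and `conv_params` are hypothetical helpers written for this example.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a square 2D convolution."""
    return kernel * kernel * c_in * c_out + c_out

rows, cols, channels, n_outputs = 100, 100, 3, 3

# MLP: Flatten -> Dense(512) -> Dense(3)
mlp_total = dense_params(rows * cols * channels, 512) + dense_params(512, n_outputs)

# CNN: Conv(32) -> Conv(64) -> MaxPool(2x2) -> Flatten -> Dense(512) -> Dense(3)
conv_total = conv_params(3, channels, 32) + conv_params(3, 32, 64)
flat = (rows // 2) * (cols // 2) * 64  # 'same' padding keeps 100x100 until pooling
cnn_total = conv_total + dense_params(flat, 512) + dense_params(512, n_outputs)

print("MLP parameters:", mlp_total)
print("CNN parameters:", cnn_total)
```

In both models nearly all parameters sit in the first dense layer, so the convolutional block itself adds very little to the size. The figures can be compared against `model1.summary()` and `model2.summary()`.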

Finally, let's compile our models. We can try MSE as the loss, since this is a regression problem in which we try to count the number of objects in the image. The output of the network is a list of 3 values corresponding to the amounts of each type of shape.
For the sake of this simple example, we won't use cross-validation or any type of regularization or optimization. These can be tried later on to improve results. Let's just see how these simple models do on this task.

In [None]:
batch_size = 128
epochs = 5

model1.compile(loss='mean_squared_error',
              optimizer='adam')

model1.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1)

score1 = model1.evaluate(x_test, y_test, verbose=1)

model2.compile(loss='mean_squared_error',
              optimizer='adam')

model2.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1)

score2 = model2.evaluate(x_test, y_test, verbose=1)

print('MLP loss:', score1)
print('CNN loss:', score2)
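
To make the loss values more concrete, here is a small sketch of what mean squared error measures on count vectors, using made-up labels and predictions rather than real model output. The `mse` and `miscounts` helpers are written for this example only.

```python
import numpy as np

y_true = np.array([[2, 1, 0], [3, 2, 1]], dtype=np.float64)

def mse(y_true, y_pred):
    """Mean of squared differences over all outputs and samples."""
    return float(np.mean((y_true - y_pred) ** 2))

def miscounts(y_true, y_pred):
    """Total absolute count error after rounding predictions to integers."""
    return int(np.abs(y_true - np.round(y_pred)).sum())

# Case A: every output is off by 0.4, which rounding fully absorbs.
pred_a = y_true + 0.4

# Case B: everything is exact except one output, off by almost a whole shape.
pred_b = y_true.copy()
pred_b[0, 2] += np.sqrt(0.96)

mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)
miss_a, miss_b = miscounts(y_true, pred_a), miscounts(y_true, pred_b)
print("MSE:", mse_a, mse_b)
print("miscounts:", miss_a, miss_b)
```

Both cases have practically the same MSE, yet only the second one produces a miscount after rounding.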

But it's not obvious from the MSE alone how many miscounts happened.
We retrieve the number of miscounts on the test set below:

In [None]:
miscounts1 = []
miscounts2 = []

for i in range(len(x_test)):  # cover every test sample, including the last one

    sample = x_test[i]
    sample = np.expand_dims(sample,axis=0)
    sample1 = np.round(model1.predict(sample))
    sample2 = np.round(model2.predict(sample))
    sample1 = np.abs(y_test[i] - sample1)
    sample2 = np.abs(y_test[i] - sample2)
    miscounts1.append(sample1)
    miscounts2.append(sample2)
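
Calling `predict` once per image can be slow. The same quantity can be computed in a vectorized way from a single `model.predict(x_test)` call; the sketch below shows the idea on small made-up arrays, where `preds` stands in for the model output and `y_true` for `y_test`.

```python
import numpy as np

# Stand-ins: y_true mimics y_test, preds mimics the output of model.predict(x_test).
y_true = np.array([[2, 1, 0],
                   [3, 2, 1],
                   [5, 4, 3]])
preds = np.array([[2.2, 0.9, 0.1],
                  [3.6, 2.1, 0.8],
                  [4.9, 2.8, 3.4]])

# Round predictions to whole shapes, then take the absolute count error.
miscount_matrix = np.abs(y_true - np.round(preds))  # shape (n_samples, 3)
per_shape = miscount_matrix.sum(axis=0)             # one total per shape type

print("miscounts per shape:", per_shape)
print("total miscounts:", per_shape.sum())
```

This yields an array of shape `(n_samples, 3)` directly, so no `squeeze` is needed before summing per shape.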
    


In [None]:
totals = pd.DataFrame(pd.DataFrame(y_test).sum())
totals1 = pd.DataFrame(pd.DataFrame(np.array(miscounts1).squeeze()).sum())
totals2 = pd.DataFrame(pd.DataFrame(np.array(miscounts2).squeeze()).sum())
totals = pd.concat([totals,totals1,totals2],axis=1)


totals.columns = ['Test set totals', 'MLP miscounts', 'CNN miscounts']
totals.rename(index={0:'Blue Circles',1:'Green Squares', 2:'Red Squares'}, inplace=True)
totals.loc['Total'] = totals.sum()

print(totals)

We see that these models can learn how to count the shapes relatively well with little training.
However, given that the dataset is relatively simple, can the result be optimized until there are no miscounts?
Using a small convolutional block changed the number of miscounts by an order of magnitude.

Miscounts of green squares happen more often than miscounts of either blue circles (the most frequent shape) or red squares (the least frequent shape). It is not immediately clear why.

Preliminary tests with VGG16 indicated that deeper models aren't necessarily better for this dataset out of the box, so some more fine-tuning might be necessary, or maybe different architectures altogether.

If and when this dataset is fully solved and there are no more miscounts, it is easy to make the challenge harder with more shapes, more variability, and other sources of difficulty.

Suggestions on how to improve the dataset, the models, or anything else are very welcome!