# Analysing results

Now that we have scored our images, we can start analysing the results.

At this point we have two goals.

1. Select a number of pictures to use for advertising from the scored set.
2. Analyse the usefulness of the aesthetic score itself.

In [None]:
# Required for showing plots within the notebook.
%matplotlib inline
import pandas as pd

Let's start by looking at the performance metric `cpa_std` against the aesthetic score.

In [None]:
training_data_file = './../data/active_products_clean.csv'
train = pd.read_csv(training_data_file)

In [None]:
predict_data_filename = './../data/10kProducts_scores.csv'
scores = pd.read_csv(predict_data_filename)
scores.hist(bins=20)
scores.head()
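
One way to compare `cpa_std` with the aesthetic score is to join the two tables and compute a correlation. The sketch below assumes that `cpa_std` is a column of the training data and that both tables share a product identifier; the column name `product_id` is hypothetical, and the toy frames are made up purely for illustration.

```python
import pandas as pd


def performance_vs_aesthetic(train, scores, key="product_id"):
    """Join performance data to scores and return the Pearson correlation
    between `cpa_std` and `aesthetic`.

    `key` is a hypothetical join column; replace it with the real identifier.
    """
    merged = train.merge(scores, on=key, how="inner")
    return merged["cpa_std"].corr(merged["aesthetic"])


# Toy frames for illustration only; these are not real results.
toy_train = pd.DataFrame(
    {"product_id": [1, 2, 3, 4], "cpa_std": [0.9, 0.6, 0.5, 0.2]}
)
toy_scores = pd.DataFrame(
    {"product_id": [1, 2, 3, 4], "aesthetic": [0.1, 0.3, 0.6, 0.8]}
)
toy_correlation = performance_vs_aesthetic(toy_train, toy_scores)
print(toy_correlation)
```

With the real data, the call would be `performance_vs_aesthetic(train, scores, key=...)` with the actual identifier column.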

In [None]:
# Prices look odd. Restricting the histogram bins gives a clearer view of the price distribution.
prices = scores['price'].fillna(0)
prices.hist(bins=range(10, 1000, 50))
prices.max()
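
Restricting the bins hides every price outside the plotted range, including the missing values that `fillna(0)` moves to 0. The sketch below shows one possible way to quantify how much data the histogram leaves out; the helper `price_coverage` is hypothetical and the toy series is illustrative only.

```python
import numpy as np
import pandas as pd


def price_coverage(prices, bin_edges):
    """Return the share of prices that are missing, inside the histogram
    range, and outside it. `bin_edges` are the edges passed to `hist`."""
    low, high = min(bin_edges), max(bin_edges)
    missing = prices.isna().mean()
    # `between` is inclusive on both ends and evaluates to False for NaN.
    in_range = prices.between(low, high).mean()
    return {
        "missing": missing,
        "in_range": in_range,
        "out_of_range": 1.0 - missing - in_range,
    }


# Toy prices for illustration only.
toy_prices = pd.Series([5, 20, 500, 960, 2000, np.nan])
coverage = price_coverage(toy_prices, range(10, 1000, 50))
print(coverage)
```

Note that `range(10, 1000, 50)` ends at 960, so prices between 960 and 1000 fall outside the last bin as well.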

In [None]:
# Look at the 500 products with the highest aesthetic score separately:
high_aes = scores.nlargest(500, 'aesthetic')
high_aes.hist(bins=20)
high_aes.head()
high_aes.plot(x='average', y='aesthetic', kind='scatter')
high_aes.corr()

# Selecting a number of pictures from 10k products

Our current baseline is a purely random selection. Let's see if we can improve on it.

In [None]:
NUM_OF_PRODUCTS = 100

# Purely random choice
sample = scores.sample(NUM_OF_PRODUCTS)
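
A random baseline is easier to compare against if it can be reproduced. A minimal sketch, assuming a fixed seed is acceptable here: `DataFrame.sample` accepts a `random_state` argument, so the same rows are drawn on every run. The toy frame and the seed value are illustrative.

```python
import pandas as pd

# Toy scores for illustration only.
toy_scores = pd.DataFrame({"aesthetic": [0.1, 0.5, 0.9, 0.3, 0.7]})
TOY_NUM_OF_PRODUCTS = 3

# The same seed yields the same sample on every run.
baseline_a = toy_scores.sample(TOY_NUM_OF_PRODUCTS, random_state=42)
baseline_b = toy_scores.sample(TOY_NUM_OF_PRODUCTS, random_state=42)
print(baseline_a.equals(baseline_b))
```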

In [None]:
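# Combined selection: keep products with price > 500 and aesthetic > 0.4.
# Boolean indexing drops the non-matching rows; `where` would keep them as NaN rows.
price_aes = scores[(scores['price'] > 500) & (scores['aesthetic'] > 0.4)]
# Sample from the remaining products, weighted by the 'average' column.
selected_products = price_aes.sample(NUM_OF_PRODUCTS, weights=price_aes['average'])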
selected_products.head()
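
To judge whether the combined selection improves on the random baseline, the two samples can be compared directly. The sketch below wraps the selection in a hypothetical helper `select_products` and compares the mean aesthetic score of both samples on synthetic data. It assumes the `average` weights are non-negative and not all zero; the thresholds mirror the ones used above, and the generated numbers are not real results.

```python
import numpy as np
import pandas as pd


def select_products(scores, n, min_price=500, min_aesthetic=0.4, seed=0):
    """Filter by price and aesthetic score, then draw a sample weighted by
    `average`. Draws fewer than `n` rows if too few candidates remain."""
    mask = (scores["price"] > min_price) & (scores["aesthetic"] > min_aesthetic)
    candidates = scores[mask].dropna(subset=["average"])
    n = min(n, len(candidates))
    return candidates.sample(n, weights=candidates["average"], random_state=seed)


# Synthetic scores for illustration only.
rng = np.random.default_rng(0)
toy_scores = pd.DataFrame(
    {
        "price": rng.uniform(10, 1000, size=1000),
        "aesthetic": rng.uniform(0, 1, size=1000),
        "average": rng.uniform(0.1, 1, size=1000),
    }
)

selected = select_products(toy_scores, n=100)
baseline = toy_scores.sample(100, random_state=0)

comparison = pd.DataFrame(
    {
        "selected": selected[["price", "aesthetic", "average"]].mean(),
        "baseline": baseline[["price", "aesthetic", "average"]].mean(),
    }
)
print(comparison)
```

On the real data, a higher mean aesthetic score in the selection is expected by construction, so the more informative comparison is on a performance column such as `cpa_std`, if it is available for these products.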