# Analysing results

Now that we have scored our images, we can start analysing the results.

At this point we have two goals:

1. Analyse the usefulness of the aesthetic score and other features.
2. Select a number of pictures from the scored set to use for advertising.

In [None]:
# Required for showing plots within the notebook.
%matplotlib inline
import pandas as pd

Try to find out:
- Does the aesthetic score correlate with any other variable?
- Does a high aesthetic score perform better than a low one?
- Does price reflect picture quality?
- Which variables would you use to measure ad quality?


Here is some Python to get you started:

In [None]:
# Loading the data
training_data_file = './../data/active_products_scored.csv'
predict_data_file = './../data/10kProducts_scored.csv'
train = pd.read_csv(training_data_file)
scores = pd.read_csv(predict_data_file)

In [None]:
scores.hist(bins=20)
scores.head()

In [None]:
# The price histogram looks odd. Restricting the bin range gives a clearer picture of the distribution.
prices = scores['price'].fillna(0)
prices.hist(bins=range(10, 1000, 50))
prices.max()
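
The bin range above is a hand-picked guess at where most prices fall. As a minimal sketch of an alternative, the upper limit can be derived from a quantile of the data instead. The toy prices and the 0.95 cut-off below are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical toy prices standing in for scores['price']; one extreme outlier.
toy_prices = pd.Series([20, 35, 50, 80, 120, 150, 200, 300, 450, 25000], dtype=float)

# Ignore missing prices for the purpose of the plot.
known_prices = toy_prices.dropna()

# Upper limit taken from a quantile (0.95 is an assumed cut-off; tune as needed).
upper = known_prices.quantile(0.95)

# Keep only the bulk of the distribution for plotting.
bulk = known_prices[known_prices <= upper]
print(f"kept {len(bulk)} of {len(known_prices)} prices")
# In the notebook: bulk.hist(bins=20)
```

With the real data, `toy_prices` would be replaced by `scores['price']`.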

In [None]:
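# Looking at the products with the highest aesthetic scores separately:
high_aes = scores.nlargest(500, 'aesthetic')
high_aes.hist(bins=20)
# Only the last expression of a cell is displayed automatically, so print this one.
print(high_aes.head())
high_aes.plot(x='average', y='aesthetic', kind='scatter')
# numeric_only=True skips any non-numeric columns, which raise an error in pandas 2.x.
high_aes.corr(numeric_only=True)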
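
To see at a glance which variables move with the aesthetic score, the full correlation matrix can be reduced to a single ranked column. The sketch below uses a small hypothetical frame; only the column names `aesthetic`, `average` and `price` come from the notebook, and the values are made up.

```python
import pandas as pd

# Hypothetical toy frame; only the column names mirror the notebook's data.
toy = pd.DataFrame({
    "aesthetic": [0.2, 0.4, 0.5, 0.7, 0.9],
    "average":   [0.3, 0.35, 0.6, 0.65, 0.8],
    "price":     [900, 100, 400, 250, 50],
    "name":      ["a", "b", "c", "d", "e"],  # non-numeric, skipped by numeric_only
})

# Correlation of every numeric column with 'aesthetic', strongest first
# (ranked by absolute value so strong negative correlations are not buried).
aes_corr = (
    toy.corr(numeric_only=True)["aesthetic"]
    .drop("aesthetic")
    .sort_values(key=abs, ascending=False)
)
print(aes_corr)
```

Pearson correlation only captures linear relationships, so a scatter plot of the top-ranked variables is still worth a look.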

# Selecting a number of pictures from 10k products

Our current baseline is a purely random selection. Let's see if we can improve on it.

In [None]:
NUM_OF_PRODUCTS = 100

# Purely random choice
sample = scores.sample(NUM_OF_PRODUCTS)
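
A random baseline is easier to compare against if it can be reproduced between runs. As a minimal sketch, `DataFrame.sample` accepts a `random_state` seed; the toy frame and the seed value 42 are assumptions for illustration.

```python
import pandas as pd

NUM_OF_PRODUCTS = 3  # small value for the toy data; the notebook uses 100

# Hypothetical toy frame standing in for `scores`.
toy_scores = pd.DataFrame({"aesthetic": [0.1, 0.3, 0.5, 0.7, 0.9, 0.6]})

# A fixed random_state makes the random baseline repeatable.
baseline_a = toy_scores.sample(NUM_OF_PRODUCTS, random_state=42)
baseline_b = toy_scores.sample(NUM_OF_PRODUCTS, random_state=42)

# Both draws pick the same rows.
print(baseline_a.index.equals(baseline_b.index))
```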

Most likely, we will want to combine several parameters.
Here is a simple rule that selects products whose price and aesthetic score are both above a threshold.

In [None]:
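# Combined selection by parameters:
# keep only products with both a higher price and a higher aesthetic score.
# Boolean indexing drops non-matching rows; .where() would keep them as all-NaN rows.
price_aes = scores[(scores['price'] > 500) & (scores['aesthetic'] > 0.5)]
# Weighted sample; this raises if fewer than NUM_OF_PRODUCTS rows pass the filter.
selected_products = price_aes.sample(NUM_OF_PRODUCTS, weights=price_aes['average'])
selected_products.head()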
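
The threshold-and-sample step can be wrapped in a small function so that different thresholds are easy to try. This is a minimal sketch, not a definitive implementation: `select_products` is a hypothetical helper, the toy values are made up, and only the column names `price`, `aesthetic` and `average` come from the notebook.

```python
import pandas as pd

def select_products(df, n, min_price, min_aesthetic, weight_col="average"):
    """Hypothetical helper: filter by thresholds, then draw a weighted sample."""
    mask = (df["price"] > min_price) & (df["aesthetic"] > min_aesthetic)
    candidates = df[mask]
    # Never ask for more rows than survive the filter.
    n = min(n, len(candidates))
    return candidates.sample(n, weights=candidates[weight_col], random_state=0)

# Hypothetical toy frame standing in for `scores`.
toy_scores = pd.DataFrame({
    "price":     [100, 600, 700, 800, 900, 1200],
    "aesthetic": [0.9, 0.4, 0.6, 0.7, 0.8, 0.55],
    "average":   [0.5, 0.5, 0.4, 0.6, 0.7, 0.3],
})

selected = select_products(toy_scores, n=3, min_price=500, min_aesthetic=0.5)
print(selected)
print("selected mean aesthetic:", selected["aesthetic"].mean())
print("overall mean aesthetic: ", toy_scores["aesthetic"].mean())
```

Comparing the mean of the selection with the overall mean gives a quick check that the rule actually shifts the sample toward higher scores.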