# Aesthetic Classification

In this notebook we use several helper functions to build a model and obtain results from image descriptors.
It serves as an example for writing scripts that automatically generate the results for our paper.

## A bit of set up

We need numpy and pandas for data handling, pickle and gzip for reading the extracted features, the folder that contains our own functions, and several models from scikit-learn.

In [1]:
# set up Python environment: numpy for numerical routines
import numpy as np
import pandas as pd

# for storing the results
from six.moves import cPickle as pickle
import gzip

# our code (utilsData needs a view)
import sys
sys.path.append('../pycode/')
import utilsData
from preprocess import utilities

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2


In [2]:
from sklearn.metrics import roc_auc_score, accuracy_score
import full_models

In [3]:
%load_ext memory_profiler

## AVA dataset
We start with the AVA data. First, an info package must be loaded. It contains information about votes, style features, labels and IDs. Then we read the ARFF file with the readARFF function to extract the features together with their IDs. Finally, both sources are combined.

In [4]:
features_file = '../features/AVA/GHIST.arff'
#features_file = '../features/AesNet_CaffeNet_fc7.pklz'
output_file = '../prueba.pklz'
selected_model = 'NBG'
decaf_discrete = 'True'

In [5]:
if features_file.endswith('pklz'):
    # the third argument of open() is a buffer size, not a pickle protocol, so it is dropped
    with open(features_file,'rb') as f:
        features = pickle.load(f)
else:
    features = utilsData.readARFF(features_file)
    
features['id'] = features['id'].astype(int)
# for testing in notebooks
#features = features.iloc[:,-101:]

# we take the names of the features and delete the ID
features_names = np.array(features.columns)
index = np.argwhere(features_names=='id')
features_names = np.delete(features_names, index)

# this line normalizes the decaf features
if (decaf_discrete == 'True'):
    features[features_names],_ = utilities.reference_forward_implementation(np.array(features[features_names]),5,2,1.5,0.75)

# the third positional argument of gzip.open is a compression level, which is unused when reading
with gzip.open('../packages/AVA_info.pklz','rb') as f:
    data = pickle.load(f)

In [6]:
data=data.merge(features, on='id', copy=False)
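
With its default `how='inner'`, `merge` keeps only the images whose `id` appears in both tables, so images without extracted features drop out without any message. A minimal sketch with toy frames shows one way to check for that. The column name `var1` and all values are invented for illustration and are not taken from AVA.

```python
import pandas as pd

# toy stand-ins for the info package and the feature table (values are invented)
info = pd.DataFrame({'id': [1, 2, 3], 'VotesMean': [4.2, 5.9, 6.1]})
feats = pd.DataFrame({'id': [2, 3, 4], 'var1': [0.1, 0.0, 0.7]})

# default inner join: only ids present in both frames survive
merged = info.merge(feats, on='id')

# an outer join with indicator=True records which side each id came from
check = info.merge(feats, on='id', how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']

print(len(merged), unmatched['id'].tolist())
```

Comparing `len(merged)` with the length of the info package before the merge gives the same information at lower cost when only the count matters.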

In [7]:
num_images = data.shape[0]

# to free space
del features

In [16]:
data.sort_values(['VotesMean'])

| | Unnamed: 0 | line | id | vote1 | vote2 | vote3 | vote4 | vote5 | vote6 | vote7 | vote8 | ... | var247 | var248 | var249 | var250 | var251 | var252 | var253 | var254 | var255 | var256 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 135021 | 135198 | 7143 | 176 | 37 | 19 | 20 | 6 | 3 | 3 | 1 | ... | 0.022848 | 0.001029 | 0.000010 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| | 54043 | 54044 | 221721 | 218 | 72 | 52 | 29 | 9 | 1 | 0 | 0 | ... | 0.000002 | 0.000002 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| | 112908 | 113085 | 309716 | 106 | 45 | 45 | 17 | 3 | 1 | 0 | 0 | ... | 0.005940 | 0.005435 | 0.004465 | 0.001258 | 0.000426 | 0.000120 | 0.000010 | 0.000004 | 0.000000 | 0.000000 |
| | 174244 | 174421 | 8791 | 138 | 57 | 31 | 26 | 10 | 4 | 0 | 1 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| | 189545 | 189722 | 11120 | 119 | 26 | 31 | 14 | 7 | 2 | 0 | 1 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| | 7532 | 7533 | 212523 | 169 | 56 | 39 | 27 | 8 | 1 | 2 | 1 | ... | 0.000476 | 0.000391 | 0.000476 | 0.000527 | 0.000425 | 0.000425 | 0.000901 | 0.001423 | 0.004762 | 0.092610 |
| | 224076 | 224254 | 104023 | 195 | 16 | 4 | 14 | 17 | 7 | 2 | 2 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| | 131106 | 131283 | 335172 | 150 | 61 | 41 | 26 | 4 | 2 | 0 | 2 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| | 224150 | 224328 | 80279 | 155 | 66 | 32 | 24 | 9 | 8 | 1 | 0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| | 215641 | 215819 | 45563 | 106 | 51 | 36 | 25 | 11 | 1 | 0 | 0 | ... | 0.000031 | 0.000019 | 0.000010 | 0.000015 | 0.000019 | 0.000019 | 0.000012 | 0.000004 | 0.000006 | 0.000008 |


In [8]:
# .copy() gives an independent frame, so the assignment below does not raise SettingWithCopyWarning
data_aux = data[np.append(features_names,['Class'])].copy()
data_aux['Class'] = pd.Categorical(data_aux['Class'],categories=range(0,len(data_aux['Class'].unique())))
del data
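
Passing an explicit category list to `pd.Categorical` fixes the integer code of each class, and the later cells read those codes through `.cat.codes`. A small sketch with an invented label column illustrates the mapping, including what happens to a label that is not in the category list:

```python
import pandas as pd

labels = pd.Series([1, 0, 1, 1, 0])  # invented binary labels

# categories 0..k-1 in that order, so code i corresponds to class i
cats = pd.Categorical(labels, categories=range(labels.nunique()))
codes = pd.Series(cats).cat.codes.tolist()

# a value outside the category list becomes missing and gets the code -1
outside = pd.Categorical([0, 1, 2], categories=[0, 1]).codes.tolist()

print(codes, outside)
```

A code of -1 in the real data would therefore indicate a label outside the expected range, which is worth checking before training.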


In [9]:
np.random.seed(1000)
num_folds = 5
# each row is one fold; images that do not fit evenly into the folds are left out
folds = np.random.choice(range(0,num_images),replace=False,size=(num_folds,num_images//num_folds))
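
The fold construction can be checked on a toy problem. The sketch below uses an invented size of 11 images, chosen so that it does not divide evenly by the number of folds, and shows that the folds are disjoint and that the remainder is never used:

```python
import numpy as np

np.random.seed(1000)
num_images, num_folds = 11, 5  # toy size, not the AVA size
fold_size = num_images // num_folds

# sample num_folds * fold_size distinct indices and arrange one fold per row
folds = np.random.choice(num_images, replace=False, size=(num_folds, fold_size))

# fold 0 is the test set, the remaining rows are flattened into the training set
test_indices = folds[0]
train_indices = np.delete(folds, 0, axis=0).reshape(-1)

# indices that belong to no fold (here 11 - 5 * 2 = 1 image)
unused = np.setdiff1d(np.arange(num_images), folds)

print(folds.shape, len(train_indices), len(unused))
```

Because sampling is done without replacement, no image can appear in both the training and the test set of the same iteration.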

In [10]:
results = {}
results['balanced']=0
results['AUC']=0
results['accuracy']=0

for i in range(0, num_folds):
    
    train_indices = np.delete(folds,i,axis=0).reshape(-1)
    train_indices = train_indices[utilities.balance_class(data_aux['Class'].cat.codes[train_indices])]
    
    test_indices = folds[i]
    
    if selected_model == 'NB':
        predictions = full_models.fullNB(data_aux, train_indices, test_indices)
        
    elif selected_model == 'AODE':
        predictions = full_models.fullAODE(data_aux, train_indices, test_indices)
    
    elif selected_model == 'NBG':
        predictions = full_models.fullNBG(data_aux, train_indices, test_indices, features_names, 'Class')
    
    elif selected_model == 'SVM':
        predictions = full_models.fullSVM(data_aux, train_indices, test_indices, features_names, 'Class')
        
    elif selected_model == 'ELM':
        predictions = full_models.fullELM(data_aux, train_indices, test_indices, features_names, 'Class')
        
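    elif selected_model == 'GBoost':
        predictions = full_models.fullGBoost(data_aux, train_indices, test_indices, features_names, 'Class')

    else:
        # fail early instead of scoring an undefined or stale `predictions`
        raise ValueError('Unknown model: %s' % selected_model)
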
    results['balanced'] += utilsData.balanced_accuracy(data_aux['Class'].cat.codes[test_indices], predictions)
    results['AUC'] += roc_auc_score(data_aux['Class'].cat.codes[test_indices], predictions)
    results['accuracy'] += accuracy_score(data_aux['Class'].cat.codes[test_indices], (predictions >= 0.5).astype(int))
    
results['balanced'] /= num_folds
results['AUC'] /= num_folds
results['accuracy'] /= num_folds
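
The loop above balances the training indices with `utilities.balance_class`, whose implementation is not shown in this notebook. A minimal sketch of one common approach, assuming that it undersamples every class to the size of the smallest one and returns positions into the array it receives. The function name `balance_by_undersampling` is hypothetical.

```python
import numpy as np

def balance_by_undersampling(labels, seed=0):
    """Return sorted positions that keep the same number of samples per class."""
    labels = np.asarray(labels)
    rng = np.random.RandomState(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()  # size of the smallest class
    keep = [rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
            for c in classes]
    return np.sort(np.concatenate(keep))

labels = np.array([0, 0, 0, 0, 1, 1])  # invented, imbalanced labels
positions = balance_by_undersampling(labels)
balanced_counts = np.bincount(labels[positions]).tolist()

print(positions, balanced_counts)
```

The returned positions can index an array of training indices in the same way the loop indexes `train_indices`.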

In [11]:
results

{'AUC': 0.50069373981168153,
 'accuracy': 0.51046406892500484,
 'balanced': 0.50071524531063916}
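
All three values sit close to 0.5, which is the chance level for a two-class problem. Plain accuracy and balanced accuracy can diverge on imbalanced data, which is why both are reported. The sketch below assumes that balanced accuracy means the mean of the per-class recalls. `utilsData.balanced_accuracy` is not shown here, so this definition is an assumption, and the labels are invented.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition, not the utilsData code)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

y_true = np.array([0, 0, 0, 0, 1, 1])  # invented, imbalanced labels
y_pred = np.zeros_like(y_true)         # a classifier that always predicts class 0

acc = float(np.mean(y_true == y_pred))   # rewards the majority-class guess
bal = balanced_accuracy(y_true, y_pred)  # each class weighs the same

print(acc, bal)
```

A classifier that ignores its input scores well above 0.5 on accuracy here while its balanced accuracy stays at 0.5.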

In [None]:
# the with block closes the gzip file, so the data is flushed to disk
with gzip.open(output_file, "wb") as f:
    pickle.dump(results, f, 2)
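
To confirm that a stored result file can be read back, a round trip such as the following can be used. This is a sketch with placeholder values and a throwaway temporary path, and it uses the standard library `pickle` instead of the `six.moves` import from the set up cell.

```python
import gzip
import os
import pickle
import tempfile

results = {'AUC': 0.5, 'accuracy': 0.5, 'balanced': 0.5}  # placeholder values

path = os.path.join(tempfile.mkdtemp(), 'results.pklz')  # throwaway location

# protocol 2 keeps the file readable from both Python 2 and Python 3
with gzip.open(path, 'wb') as f:
    pickle.dump(results, f, 2)

with gzip.open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == results)
```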