# Aesthetic Classification

In this notebook we use several functions to build a model and obtain results from image descriptors.
It serves as an example for writing scripts that automatically generate the results for our paper.

## A bit of set up

We need numpy and pandas for data handling, pickle and gzip to read the extracted features, the folder containing the code of our own functions, and several models from scikit-learn.

In [1]:
# set up Python environment: numpy for numerical routines
import numpy as np
import pandas as pd

# for storing the results
from six.moves import cPickle as pickle
import gzip

# our code (utilsData needs a view)
import sys
sys.path.append('../pycode/')
import utilsData
from preprocess import utilities

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2


In [2]:
from sklearn.metrics import roc_auc_score, accuracy_score
import full_models

SyntaxError: invalid syntax (nb.py, line 117)

In [None]:
%load_ext memory_profiler

## AVA dataset
We start with the AVA data. First, an info package must be loaded; it contains information about votes, style features, labels and IDs. Then, using the ARFF file and the readARFF function, we extract the features together with their IDs. Finally, both sources are combined.
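
As a minimal sketch of that last combination step, the toy example below joins two small frames on `id`. The frames, their column names `f0` and `f1`, and all values are made up for illustration; only the `id` and `Class` columns mirror names used in this notebook.

```python
import pandas as pd

# Toy stand-in for the info package (labels and IDs); values are made up.
info = pd.DataFrame({'id': [101, 102, 103, 104],
                     'Class': [0, 1, 1, 0]})

# Toy stand-in for the descriptors read from the ARFF file.
features = pd.DataFrame({'id': [104, 102, 101],
                         'f0': [0.4, 0.2, 0.1],
                         'f1': [4.0, 2.0, 1.0]})

# Default inner join on 'id': images without descriptors (here 103) are dropped.
data = info.merge(features, on='id')
print(data)
```

Because the join is an inner join, the number of rows after merging can be smaller than in the info package, so it is worth checking `data.shape[0]` before building the folds.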

In [3]:
features_file = '../features/AVA/GHIST.arff'
#features_file = '../features/AesNet_CaffeNet_fc6.pklz'
output_file = '../prueba.pklz'
selected_model = 'NBG'
decaf_discrete = 'False'

In [4]:
if features_file.endswith('pklz'):
    # the third positional argument of open() is a buffer size, not a pickle protocol, so it is dropped
    with open(features_file, 'rb') as f:
        features = pickle.load(f)
else:
    features = utilsData.readARFF(features_file)
    
features['id'] = features['id'].astype(int)
# for testing in notebooks
#features = features.iloc[:,-101:]

# we take the names of the features and delete the ID
features_names = np.array(features.columns)
index = np.argwhere(features_names=='id')
features_names = np.delete(features_names, index)

# this line normalizes the decaf features
if (decaf_discrete == 'True'):
    features[features_names],_ = utilities.reference_forward_implementation(np.array(features[features_names]),5,2,1.5,0.75)

data = pickle.load(gzip.open('../packages/AVA_info.pklz','rb',2))

In [5]:
data=data.merge(features, on='id', copy=False)

In [6]:
num_images = data.shape[0]

# to free space
del features

In [27]:
# .copy() makes data_aux an independent frame, which avoids the SettingWithCopyWarning
data_aux = data[np.append(features_names,['Class'])].copy()
data_aux['Class'] = pd.Categorical(data_aux['Class'],[0,1])
# 'Mala' = bad, 'Buena' = good; rename_categories replaces the direct assignment to .cat.categories
data_aux['Class'] = data_aux['Class'].cat.rename_categories(['Mala','Buena'])


In [28]:
np.random.seed(1000)
num_folds = 5
folds = np.random.choice(range(0,num_images),replace=False,size=(num_folds,int(num_images/num_folds)))
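
The fold construction above can be checked on a small scale. The sketch below uses hypothetical sizes (23 images, 5 folds) and a local `RandomState` instead of the global seed. It illustrates two properties of this scheme: sampling without replacement keeps the folds disjoint, and images beyond `num_folds * fold_size` are never assigned to any fold.

```python
import numpy as np

rng = np.random.RandomState(1000)     # local generator, same seed value as above
num_images, num_folds = 23, 5         # hypothetical sizes for illustration
fold_size = num_images // num_folds   # 4 here; the 3 leftover images are unused

# Sampling without replacement guarantees that no index appears in two folds.
folds = rng.choice(num_images, size=(num_folds, fold_size), replace=False)

# Fold i is the test set; the remaining folds are flattened into the training set.
i = 0
test_indices = folds[i]
train_indices = np.delete(folds, i, axis=0).reshape(-1)

print(folds.shape, len(train_indices), len(test_indices))
```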

## All of this is for testing

In [29]:
i=0
train_indices = np.delete(folds,i,axis=0).reshape(-1)
train_indices = train_indices[utilities.balance_class(data_aux['Class'].cat.codes[train_indices])]

test_indices = folds[i]
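
The code of `utilities.balance_class` is not shown in this notebook. From the way it is used, it appears to return positions into the training labels that yield a balanced subset. One common way to do that is random undersampling, sketched below. The helper `balance_by_undersampling` and the label vector are hypothetical and may differ from what our function actually does.

```python
import numpy as np

def balance_by_undersampling(labels, rng):
    """Hypothetical helper: return sorted positions that keep the same number
    of samples per class by randomly discarding samples of larger classes."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    kept = [rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
            for c in classes]
    return np.sort(np.concatenate(kept))

rng = np.random.RandomState(0)
labels = np.array([1, 1, 1, 0, 1, 0, 1, 1])   # made-up: 6 positives, 2 negatives
positions = balance_by_undersampling(labels, rng)
balanced = labels[positions]
print(positions, balanced)
```

Only the training indices are balanced in this notebook; the test fold keeps its original class distribution, which is why balanced accuracy and AUC are reported alongside plain accuracy.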

In [30]:
# own models and functions
from preprocess.mdl import MDL_method
from preprocess.unsupervised import Unsupervised_method
from models.nb import Naive_Bayes
from models.aode_fast import AODE_fast

In [31]:
discretization = Unsupervised_method()
discretization.frequency = True
discretization.bins = 5
discretization.train(data_aux.loc[train_indices])
data_fold = discretization.process(data_aux)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  data[to_change] = k
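
With `frequency = True` and `bins = 5`, `Unsupervised_method` presumably performs equal-frequency discretization: cut points are learned from the training rows only and then applied to every row. The sketch below shows that idea with plain numpy on made-up values. The functions `fit_equal_frequency` and `apply_cuts` are hypothetical and are not the implementation in `preprocess.unsupervised`.

```python
import numpy as np

def fit_equal_frequency(values, bins):
    """Interior cut points splitting `values` into `bins` groups of roughly equal size."""
    quantiles = np.linspace(0, 1, bins + 1)[1:-1]
    return np.quantile(values, quantiles)

def apply_cuts(values, cuts):
    """Map each value to the index of its interval, from 0 to len(cuts)."""
    return np.searchsorted(cuts, values, side='right')

train = np.arange(1.0, 11.0)               # made-up training values 1..10
cuts = fit_equal_frequency(train, bins=5)  # learned on training data only
codes = apply_cuts(train, cuts)            # each bin receives two training values

# Unseen values outside the training range fall into the first or last bin.
outside = apply_cuts(np.array([0.0, 100.0]), cuts)
print(cuts, codes, outside)
```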


In [32]:
model = Naive_Bayes()
model.fit(data_fold.loc[train_indices])

  


In [33]:
classes_test = model.predict_class(data_fold.loc[test_indices])

In [15]:
classes_test

[1, 1, 1, 0, 1, ..., 1, 1, 1, 1, 0]
Length: 51070
Categories (2, int64): [0, 1]

In [26]:
classes_test

[Buena, Buena, Buena, Mala, Buena, ..., Buena, Buena, Buena, Buena, Mala]
Length: 51070
Categories (2, object): [Mala, Buena]

In [34]:
classes_test

[1, 1, 1, 0, 1, ..., 1, 1, 1, 1, 0]
Length: 51070
Categories (2, int64): [1, 0]

In [35]:
accuracy_score(data_aux['Class'].cat.codes[test_indices], (model.predict_probs(data_fold.loc[test_indices])[1] >= 0.5).astype(int))

0.56851380458194634

In [None]:
aux=model._predict_probs_base(data_fold.loc[test_indices])

In [None]:
model.variables_dict[model.class_index]

In [None]:
results = {}
results['balanced']=0
results['AUC']=0
results['accuracy']=0

for i in range(0, num_folds):
    
    train_indices = np.delete(folds,i,axis=0).reshape(-1)
    train_indices = train_indices[utilities.balance_class(data_aux['Class'].cat.codes[train_indices])]
    
    test_indices = folds[i]
    
    if selected_model == 'NB':
        predictions = full_models.fullNB(data_aux, train_indices, test_indices)
        
    elif selected_model == 'AODE':
        predictions = full_models.fullAODE(data_aux, train_indices, test_indices)
    
    elif selected_model == 'NBG':
        predictions = full_models.fullNBG(data_aux, train_indices, test_indices, features_names, 'Class')
    
    elif selected_model == 'SVM':
        predictions = full_models.fullSVM(data_aux, train_indices, test_indices, features_names, 'Class')
        
    elif selected_model == 'ELM':
        predictions = full_models.fullELM(data_aux, train_indices, test_indices, features_names, 'Class')
        
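    elif selected_model == 'GBoost':
        predictions = full_models.fullGBoost(data_aux, train_indices, test_indices, features_names, 'Class')

    else:
        # fail early instead of silently reusing predictions from a previous run
        raise ValueError('Unknown model: ' + selected_model)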
    
    results['balanced'] += utilsData.balanced_accuracy(data_aux['Class'].cat.codes[test_indices], predictions)
    results['AUC'] += roc_auc_score(data_aux['Class'].cat.codes[test_indices], predictions)
    results['accuracy'] += accuracy_score(data_aux['Class'].cat.codes[test_indices], (predictions >= 0.5).astype(int))
    
results['balanced'] /= num_folds
results['AUC'] /= num_folds
results['accuracy'] /= num_folds
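
To make the difference between the two accuracy figures concrete, the sketch below computes both on a tiny made-up example. It assumes that balanced accuracy means the average of the per-class recalls after thresholding the scores at 0.5. `utilsData.balanced_accuracy` is not shown here and may be defined differently, so `balanced_accuracy_sketch` is a hypothetical stand-in.

```python
import numpy as np

def balanced_accuracy_sketch(y_true, scores, threshold=0.5):
    """Mean of per-class recall after thresholding the scores (assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

y_true = np.array([1, 1, 1, 1, 0, 0])               # made-up labels
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.6, 0.1])   # made-up probabilities of class 1

accuracy = float(np.mean((scores >= 0.5).astype(int) == y_true))
balanced = balanced_accuracy_sketch(y_true, scores)
print(accuracy, balanced)
```

On unbalanced test folds the plain accuracy is dominated by the majority class, while the per-class average weights both classes equally.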

In [None]:
results

In [None]:
# the with block closes the gzip file so the archive is fully written
with gzip.open(output_file, "wb") as f:
    pickle.dump(results, f, 2)

## Testing models with the same partition as in the finetuning

In [None]:
train_indices = pickle.load(gzip.open('../models/train_indexes_AesNet.pklz','rb',2))
test_indices = pickle.load(gzip.open('../models/test_indexes_AesNet.pklz','rb',2))

In [None]:
predictions = full_models.fullNBG(data_aux, train_indices, test_indices, features_names, 'Class')

In [None]:
results = {}
results['balanced'] = utilsData.balanced_accuracy(data_aux['Class'].cat.codes[test_indices], predictions)
results['AUC'] = roc_auc_score(data_aux['Class'].cat.codes[test_indices], predictions)
results['accuracy'] = accuracy_score(data_aux['Class'].cat.codes[test_indices], (predictions >= 0.5).astype(int))

In [None]:
results