## Use the emotion-histograms extracted with the corresponding notebook to train an image-to-emotion (emotion-distribution) classifier.
- the notebook to make the histograms is at __analysis/extract_emotion_histogram_per_image.ipynb__ 
- if you do not want to train the model, you can load the pretrained one (set do_training=False)
- the pretrained one is located at https://www.dropbox.com/s/8dfj3b36q15iieo/best_model.pt?dl=0

#### Friendly Remarks from Panos :
    - Predicting the emotional-responses without the text/explanations is very hard. 
    - Predicting the emotional-responses given the explanations is much easier.  
    - Predicting the emotional-responses given the text & image is not (significantly) easier than relying only on text.
    
###### <=> people can have very different emotional-reactions given an image. 

very fine-grained remarks:
    - I also trained the image2emotion with "cleaner" data (e.g., dropping images for which the emotion maximizer has less than 0.30 of the total mass/votes). It does not make better predictions on test images w.r.t. the majority emotion.
    - This is interesting stuff... if you want to play/work together, let me know.
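
The "cleaner" data filter mentioned above can be sketched roughly as follows. This is only an illustration on made-up distributions; the names `dists` and `min_mass` are hypothetical and not part of the artemis code base.

```python
import numpy as np

# Toy emotion distributions, one row per image (made-up values; each row sums to 1).
dists = np.array([
    [0.60, 0.20, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.28, 0.28, 0.22, 0.22],
    [0.10, 0.40, 0.30, 0.20],
])

# Hypothetical threshold: keep an image only if its most-voted emotion
# holds at least this fraction of the total mass/votes.
min_mass = 0.30

keep = dists.max(axis=1) >= min_mass
cleaner = dists[keep]

print('kept', keep.sum(), 'of', len(dists), 'images')
```

On a DataFrame, the same boolean mask could be computed with `.apply(max)` on the distribution column and used to index the rows.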

In [1]:
import torch
import argparse
import pandas as pd
import os.path as osp
import numpy as np
from ast import literal_eval
from plotly.offline import init_notebook_mode


from artemis.in_out.neural_net_oriented import torch_load_model, torch_save_model, save_state_dicts
from artemis.in_out.neural_net_oriented import image_emotion_distribution_df_to_pytorch_dataset
from artemis.in_out.basics import create_dir
from artemis.utils.visualization import plot_confusion_matrix
from artemis.emotions import ARTEMIS_EMOTIONS

from artemis.neural_models.mlp import MLP
from artemis.neural_models.resnet_encoder import ResnetEncoder
from artemis.neural_models.image_emotion_clf import ImageEmotionClassifier
from artemis.neural_models.image_emotion_clf import single_epoch_train, evaluate_on_dataset

init_notebook_mode()
%load_ext autoreload
%autoreload 2

In [2]:
# Load saved histograms of emotion choices as computed with "previous" notebook (see top-README if you are lost)
image_hists_file = '../../../data/image-emotion-histogram.csv'
image_hists = pd.read_csv(image_hists_file)

# this literal_eval brings the saved string to its corresponding native (list) type
image_hists.emotion_histogram = image_hists.emotion_histogram.apply(literal_eval)

# normalize the histograms
image_hists.emotion_histogram = image_hists.emotion_histogram.apply(lambda x: (np.array(x) / float(sum(x))).astype('float32'))

print(f'Histograms corresponding to {len(image_hists)} images')

Histograms corresponding to 80031 images
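
To make the two `apply` steps above concrete, here is a minimal sketch on a single made-up histogram string. The vote counts in `raw` are hypothetical, not taken from the CSV.

```python
from ast import literal_eval
import numpy as np

# A histogram as it would be stored in a CSV cell: a string, not a list.
raw = "[0, 1, 2, 0, 0, 1, 1, 2, 3]"  # hypothetical vote counts for 9 emotions

# literal_eval safely parses the string back to a native Python list.
counts = literal_eval(raw)

# Normalize the counts so they form a probability distribution (float32 for PyTorch).
dist = (np.array(counts) / float(sum(counts))).astype('float32')

print(type(counts).__name__, dist, dist.sum())
```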


#### In cell below you need to use YOUR PATHS.
- I will use the pre-processed ArtEmis dataset, as prepared by the script __preprocess_artemis_data.py --preprocess-for-deep-nets True__ (see STEP.1 at top-README) 

- Specifically, this way I can utilize the same train/test/val splits across all my neural-based experiments.

In [3]:
artemis_preprocessed_dir = '/home/optas/DATA/OUT/artemis/preprocessed_data/for_neural_nets'
save_dir = '/home/optas/DATA/OUT/artemis/neural_nets/img_to_emotion'  # for trained model
wikiart_img_dir = '/home/optas/DATA/Images/Wiki-Art/rescaled_max_size_to_600px_same_aspect_ratio'

create_dir(save_dir)
checkpoint_file = osp.join(save_dir, 'best_model.pt')

# minor parameters
GPU_ID = 0 

In [4]:
## Prepare the artemis dataset (merge it with the emotion-histograms.)
artemis_data = pd.read_csv(osp.join(artemis_preprocessed_dir, 'artemis_preprocessed.csv'))
print('Annotations loaded:', len(artemis_data))

## keep each image once.
artemis_data = artemis_data.drop_duplicates(subset=['art_style', 'painting'])
artemis_data.reset_index(inplace=True, drop=True)

# keep only relevant info + merge
artemis_data = artemis_data[['art_style', 'painting', 'split']] 
artemis_data = artemis_data.merge(image_hists)
artemis_data = artemis_data.rename(columns={'emotion_histogram': 'emotion_distribution'})

n_emotions = len(image_hists.emotion_histogram[0])
print('Using {} emotion-classes.'.format(n_emotions))
assert all(image_hists.emotion_histogram.apply(len) == n_emotions)

Annotations loaded: 429431
Using 9 emotion-classes.


In [5]:
# to see the emotion_distribution column
artemis_data.head()

| | art_style | painting | split | emotion_distribution |
|---|---|---|---|---|
| 0 | Post_Impressionism | vincent-van-gogh_portrait-of-madame-ginoux-l-a... | train | [0.0, 0.1, 0.2, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3] |
| 1 | Expressionism | wassily-kandinsky_study-for-autumn-1909 | train | [0.2857143, 0.42857143, 0.0, 0.0, 0.0, 0.14285... |
| 2 | Impressionism | konstantin-korovin_yaroslavna-s-lament-1909 | train | [0.42857143, 0.14285715, 0.2857143, 0.0, 0.0, ... |
| 3 | Impressionism | paul-gauguin_mette-gauguin-in-an-evening-dress... | test | [0.2857143, 0.14285715, 0.14285715, 0.14285715... |
| 4 | Impressionism | pericles-pantazis_still-life-with-quinces-1880 | train | [0.0, 0.0, 0.85714287, 0.0, 0.0, 0.0, 0.0, 0.0... |


In [6]:
parser = argparse.ArgumentParser() # use for convenience instead of say a dictionary
args = parser.parse_args([])

# deep-net data-handling params. note if you want to reuse this net with neural-speaker 
# it makes sense to keep some of the (image-oriented) parameters the same across the nets.
args.lanczos = True
args.img_dim = 256
args.num_workers = 8
args.batch_size = 128
args.gpu_id = 0

args.img_dir = wikiart_img_dir

In [7]:
## prepare data
data_loaders, datasets = image_emotion_distribution_df_to_pytorch_dataset(artemis_data, args)

In [8]:
## Prepare the Neural-Net Stuff (model, optimizer etc.)
## This is what I used for the paper with minimal hyper-param-tuning. You can use different nets/configs here...

In [9]:
device = torch.device("cuda:" + str(args.gpu_id) if torch.cuda.is_available() else "cpu")
criterion = torch.nn.KLDivLoss(reduction='batchmean').to(device)

img_encoder = ResnetEncoder('resnet34', adapt_image_size=1).unfreeze(level=7, verbose=True)
img_emb_dim = img_encoder.embedding_dimension()

# here we make an MLP closing with LogSoftmax since we want to train this net via KLDivLoss
clf_head = MLP(img_emb_dim, [100, n_emotions], dropout_rate=0.3, b_norm=True, closure=torch.nn.LogSoftmax(dim=-1))

model = ImageEmotionClassifier(img_encoder, clf_head).to(device);
optimizer = torch.optim.Adam([{'params': filter(lambda p: p.requires_grad, model.parameters()), 'lr': 5e-4}])

From 8 layers, you are unfreezing the last 1
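
Why the MLP closes with `LogSoftmax`: `KLDivLoss` expects its input as log-probabilities and its target as plain probabilities. Below is a NumPy sketch of what I understand `reduction='batchmean'` to compute (pointwise terms summed, then divided by the batch size). The helpers `log_softmax` and `kl_div_batchmean` are hypothetical stand-ins written for illustration, not the PyTorch implementation.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def kl_div_batchmean(log_pred, target):
    # Sum of target * (log(target) - log_pred), with zero-mass classes contributing 0,
    # divided by the number of examples in the batch.
    safe_target = np.where(target > 0, target, 1.0)  # log(1) = 0 avoids log(0)
    pointwise = target * (np.log(safe_target) - log_pred)
    return pointwise.sum() / target.shape[0]

# Two toy examples with 3 classes; all-zero logits give a uniform prediction.
logits = np.zeros((2, 3))
target = np.array([[1.0, 0.0, 0.0],        # all votes on one emotion
                   [1/3, 1/3, 1/3]])       # votes spread uniformly

loss = kl_div_batchmean(log_softmax(logits), target)
print(loss)  # first row contributes log(3), second row contributes 0
```

The loss is zero only when the predicted distribution matches the annotators' distribution, which is why the net is trained on the full histogram rather than on the majority emotion alone.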


In [10]:
## helper function.
## to evaluate how well the model does according to the class that it finds most likely
## note it only concerns the predictions on examples (images) with a single -unique maximizer- emotion
def evaluate_argmax_prediction(dataset, guesses):
    labels = dataset.labels
    labels = np.vstack(labels.to_numpy())
    unique_max = (labels == labels.max(1, keepdims=True)).sum(1) == 1
    umax_ids = np.where(unique_max)[0]
    gt_max = np.argmax(labels[unique_max], 1)
    max_pred = np.argmax(guesses[umax_ids], 1)
    return (gt_max == max_pred).mean()
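
A toy run of the same logic, to show which examples get counted. This standalone variant takes the label array directly instead of a dataset object; the name `argmax_accuracy_on_unique_max` and all values are made up for illustration.

```python
import numpy as np

def argmax_accuracy_on_unique_max(labels, guesses):
    # Keep only rows whose maximum value is attained by exactly one class.
    unique_max = (labels == labels.max(1, keepdims=True)).sum(1) == 1
    gt_max = np.argmax(labels[unique_max], 1)
    max_pred = np.argmax(guesses[unique_max], 1)
    return unique_max, (gt_max == max_pred).mean()

labels = np.array([[0.5, 0.5, 0.0],    # tie between two emotions: ignored
                   [0.2, 0.7, 0.1],    # unique maximizer: class 1
                   [0.1, 0.3, 0.6]])   # unique maximizer: class 2
guesses = np.array([[0.90, 0.05, 0.05],
                    [0.10, 0.80, 0.10],   # predicts class 1: correct
                    [0.60, 0.30, 0.10]])  # predicts class 0: wrong

mask, acc = argmax_accuracy_on_unique_max(labels, guesses)
print(mask, acc)
```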

In [9]:
do_training = True
max_train_epochs = 25
no_improvement = 0
min_eval_loss = np.inf

if do_training:
    for epoch in range(1, max_train_epochs+1):
        train_loss = single_epoch_train(model, data_loaders['train'], criterion, optimizer, device)
        print('Train Loss: {:.3f}'.format(train_loss))

        eval_loss, _ = \
        evaluate_on_dataset(model, data_loaders['val'], criterion, device, detailed=False)
        print('Eval Loss: {:.3f}'.format(eval_loss))

        if eval_loss < min_eval_loss:
            min_eval_loss = eval_loss
            no_improvement = 0
            print('Epoch {}. Validation loss improved!'.format(epoch))
            torch_save_model(model, checkpoint_file)
                
            test_loss, test_confidence = \
            evaluate_on_dataset(model, data_loaders['test'], criterion, device, detailed=True)
            print('Test Loss: {:.3f}'.format(test_loss))                

            dataset = data_loaders['test'].dataset        
            arg_max_acc = evaluate_argmax_prediction(dataset, test_confidence)
            print('Test arg_max_acc: {:.3f}'.format(arg_max_acc))
        else:
            no_improvement += 1
        
        if no_improvement >=5 :
            print('Breaking at epoch {}. Since for 5 epoch we observed no (validation) improvement.'.format(epoch))
            break

Train Loss: 0.843
Eval Loss: 0.797
Epoch 1. Validation loss improved!
Test Loss: 0.788
Test arg_max_acc: 0.478

Train Loss: 0.778
Eval Loss: 0.785
Epoch 2. Validation loss improved!
Test Loss: 0.779
Test arg_max_acc: 0.483

Train Loss: 0.745
Eval Loss: 0.800

Train Loss: 0.699
Eval Loss: 0.817

Train Loss: 0.634
Eval Loss: 0.842

Train Loss: 0.563
Eval Loss: 0.910

Train Loss: 0.490
Eval Loss: 0.965
Breaking at epoch 7. Since for 5 epoch we observed no (validation) improvement.
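
The stopping rule used in the loop (patience of 5 epochs on the validation loss) can be replayed on the eval losses printed above. This is a plain-Python sketch; `early_stopping_trace` is a hypothetical helper, not part of the artemis package.

```python
def early_stopping_trace(eval_losses, patience=5):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best = float('inf')
    best_epoch = None
    no_improvement = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best:
            # New best validation loss: this is where the checkpoint would be saved.
            best, best_epoch, no_improvement = loss, epoch, 0
        else:
            no_improvement += 1
        if no_improvement >= patience:
            return best_epoch, epoch
    return best_epoch, len(eval_losses)

# Validation losses as printed in the training log of this notebook.
eval_losses = [0.797, 0.785, 0.800, 0.817, 0.842, 0.910, 0.965]
best_epoch, stop_epoch = early_stopping_trace(eval_losses)
print(best_epoch, stop_epoch)
```

So the checkpoint on disk corresponds to epoch 2, while the train loss keeps dropping afterwards (the net overfits quickly).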


### Below is rudimentary analysis of the trained system.

In [12]:
load_best_model = True

if not do_training or load_best_model:
    model = torch_load_model(checkpoint_file)
    test_loss, test_confidence = evaluate_on_dataset(model, data_loaders['test'], criterion, device, detailed=True)



In [13]:
## How often are the most and second-most likely predicted emotions positive vs. negative?
preds = torch.from_numpy(test_confidence)
top2 = preds.topk(2).indices
has_pos = torch.any(top2 <= 3, -1)
has_neg = torch.any((top2 >=4) & (top2 !=8), -1)
has_else = torch.any(top2 == 8, -1)
pn = (has_pos & has_neg).double().mean().item()
pne = ((has_pos & has_neg) | (has_pos & has_else) | (has_neg & has_else)).double().mean().item()
print('The classifier finds the 1st/2nd most likely emotions to be negative/positive, or contain something-else')
print(pn, pne)

The classifier finds the 1st/2nd most likely emotions to be negative/positive, or contain something-else
0.2183285011975293 0.3268624732131602
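
The same top-2 bookkeeping on three made-up predictions, written with NumPy only. It assumes the index convention the cell above relies on (indices 0-3 positive, 4-7 negative, 8 "something else"); the probability values are hypothetical.

```python
import numpy as np

preds = np.array([
    [0.40, 0.05, 0.05, 0.00, 0.30, 0.10, 0.05, 0.00, 0.05],  # top-2: one positive, one negative
    [0.50, 0.30, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02],  # top-2: both positive
    [0.10, 0.00, 0.00, 0.00, 0.50, 0.00, 0.00, 0.00, 0.40],  # top-2: negative + something-else
])

# Indices of the two most likely classes per row (descending probability).
top2 = np.argsort(-preds, axis=1)[:, :2]

has_pos = np.any(top2 <= 3, axis=-1)
has_neg = np.any((top2 >= 4) & (top2 != 8), axis=-1)
has_else = np.any(top2 == 8, axis=-1)

# Fraction of images whose top-2 mixes positive and negative emotions.
pn = (has_pos & has_neg).mean()
# Same, but also counting mixes that involve something-else.
pne = ((has_pos & has_neg) | (has_pos & has_else) | (has_neg & has_else)).mean()
print(pn, pne)
```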


In [14]:
# How well does it do on test images that have a strong majority emotion?
labels = data_loaders['test'].dataset.labels
labels = np.vstack(labels.to_numpy())



for use_strong_domi in [True, False]:
    print('use_strong_domi:', use_strong_domi)
    if use_strong_domi:
        dominant_max = (labels.max(1) > 0.5)
    else:
        dominant_max = (labels.max(1) >= 0.5)

    umax_ids = np.where(dominant_max)[0]
    gt_max = np.argmax(labels[dominant_max], 1)
    max_pred = np.argmax(test_confidence[umax_ids], 1)    

    print('Test images with dominant majority', dominant_max.mean())
    print('Guess-correctly', (gt_max == max_pred).mean(), '\n')

use_strong_domi: True
Test images with dominant majority 0.3810664313626623
Guess-correctly 0.6023817399933841 

use_strong_domi: False
Test images with dominant majority 0.454556914156057
Guess-correctly 0.5757071547420965 



In [18]:
plot_confusion_matrix(ground_truth=gt_max, predictions=max_pred, labels=ARTEMIS_EMOTIONS)

In [20]:
# For the curious. Images where people "together" agree on anger are rare. Why?
plot_confusion_matrix(ground_truth=gt_max, predictions=max_pred, labels=ARTEMIS_EMOTIONS, normalize=False)
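
For reference, the raw counts behind such a plot can be computed in a few lines. This is a NumPy sketch on toy labels; `confusion_counts` is a hypothetical helper, and normalizing per ground-truth row is my assumption about what a normalized confusion matrix shows, not a statement about `plot_confusion_matrix` internals.

```python
import numpy as np

def confusion_counts(ground_truth, predictions, n_classes):
    # cm[i, j] = number of examples with ground-truth class i predicted as class j.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (ground_truth, predictions), 1)
    return cm

gt = np.array([0, 0, 1, 2, 2, 2])       # toy ground-truth majority emotions
pred = np.array([0, 1, 1, 2, 2, 0])     # toy arg-max predictions

cm = confusion_counts(gt, pred, n_classes=3)

# Row-normalized view: each row sums to 1, which hides how rare a class is.
row_norm = cm / cm.sum(axis=1, keepdims=True)
print(cm)
print(row_norm)
```

The un-normalized counts are what reveal rare classes: a row with very few examples looks the same as any other row once normalized.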