created on: Thu Jan 16 13:35:03 2020
<br>
Group 7
<br>
@author: V.B., C.L.

<h1>Group 7 - Images sociales<span class="tocSkip"></span></h1>
    
<br>    
<center>SeatGuru results</center>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Environment" data-toc-modified-id="Environment-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Environment</a></span><ul class="toc-item"><li><span><a href="#Libraries" data-toc-modified-id="Libraries-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Libraries</a></span></li><li><span><a href="#Parameters-and-data" data-toc-modified-id="Parameters-and-data-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Parameters and data</a></span></li></ul></li><li><span><a href="#Results" data-toc-modified-id="Results-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Results</a></span><ul class="toc-item"><li><span><a href="#View" data-toc-modified-id="View-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>View</a></span></li><li><span><a href="#Exteriors" data-toc-modified-id="Exteriors-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Exteriors</a></span></li><li><span><a href="#Interiors" data-toc-modified-id="Interiors-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Interiors</a></span></li><li><span><a href="#DataFrames-with-predictions-errors" data-toc-modified-id="DataFrames-with-predictions-errors-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>DataFrames with predictions errors</a></span></li></ul></li></ul></div>

# Introduction

This notebook analyses the scores obtained by our models after all SeatGuru images went through the pipeline.
If you evaluate other aircraft manufacturers, change the `man`, `types_ext` and `types_int` lists accordingly.



# Highlights


The View model obtained very high scores: 99.0% when images annotated as "Others" are excluded, and 96.7% when they are included.

We compared the probabilities obtained on Airbus and Boeing images with those obtained on images of other aircraft manufacturers, but found no difference that would allow us to tell the two groups apart reliably.
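
A comparison of this kind can be sketched by summarising `manufacturer_proba` separately for trained and untrained manufacturers. The frame below is made-up data that only reuses the column names of `df`; the `group` column and the `known` list are introduced here for illustration.

```python
import pandas as pd

# Toy rows with the same column names as `df`; the values are made up for illustration.
toy = pd.DataFrame({
    'manufacturer_x': ['Airbus', 'Boeing', 'Other', 'Other', 'Airbus'],
    'manufacturer_proba': [0.99, 0.95, 0.97, 0.60, 0.80],
})

known = ['Airbus', 'Boeing']  # manufacturers the model was trained on

# Label each row by whether its ground-truth manufacturer is a trained one
toy['group'] = toy['manufacturer_x'].isin(known).map(
    {True: 'Airbus/Boeing', False: 'Other'})

# Summary statistics of the predicted probability for each group
proba_summary = toy.groupby('group')['manufacturer_proba'].describe()
print(proba_summary)
```

If the two rows of the summary overlap heavily on the real data, the probability alone is unlikely to separate the groups.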

For manufacturer detection on Interiors (Int_man), the score looks satisfactory at first glance but must be interpreted with caution:

With only Airbus and Boeing, the score reaches 88%.
When all manufacturers are taken into account, the score is about 76%.
However, while the training accuracy reached nearly 1, the test accuracy was only 0.6. We cannot rule out that the model's good scores here were obtained because it learnt the training images "by heart" rather than learning to generalize.

The models could be more robust if they were trained on images from more diverse social media. Furthermore, few people appear in SeatGuru images, whereas Instagram contains many selfies; we think this difference requires further consideration.

For aircraft type detection on Exteriors (Ext_typ), the accuracy is very low: 14% over the 11 aircraft types considered, even though the model performed well on Airliners images. Our hypothesis is that images taken by professional photographers differ markedly from images taken by travellers, so Airliners is probably not the best source for training this model.

To conclude, training on a single source does not seem to produce a model that generalizes well, and more data is needed.

# Environment
This notebook requires Python 3.6 or later, since it uses f-strings.
## Libraries

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

## Parameters and data

In [3]:
path_project = './../'
path_pred = path_project + 'Results/'
path_real = path_project + 'ImagesStats/'
pred_file_name = 'g7_pred_SEATGURU_4'

In [6]:
# Retrieve predictions DataFrame and ground truth DataFrame
df_pred = pd.read_csv(path_pred + pred_file_name + '.csv', sep=';')
df_real = pd.read_csv(path_real + 'g7_SEATGURU_annotate.csv', sep=';')

df_real.rename(columns={'aircraft_manufacturer': 'manufacturer',
                        'aircraft_type': 'type', 'name': 'img'}, inplace=True)

# Merge on the image name; pandas suffixes the shared columns:
# '_x' columns hold the ground truth, '_y' columns hold the predictions
df = pd.merge(df_real, df_pred, on='img')

df = df.drop(['format', 'height', 'width', 'height_to_width', 'ncol'], axis=1)

df = df.reindex(pd.Index(['img', 'view_x', 'view_y', 'view_proba',
                          'manufacturer_x', 'manufacturer_y', 'manufacturer_proba',
                          'type_x', 'type_y', 'type_proba']), axis=1).reset_index(drop=True)

df.head()

Unnamed: 0,img,view_x,view_y,view_proba,manufacturer_x,manufacturer_y,manufacturer_proba,type_x,type_y,type_proba
0,Cathay_Pacific_Airways_Boeing_777-300ER_C_0.jpg,Meal,Meal,0.999996,Boeing,,,777,,
1,KLM_Airbus_A330-300_1.jpg,Int,Int,0.999998,Airbus,Airbus,0.99875,A330,A330,0.998699
2,American_Airlines_Boeing_767-300_3.jpg,Int,Int,1.0,Boeing,Boeing,0.999102,767,777,0.998976
3,Air_Canada_Boeing_767-300ER_v2_3.jpg,Meal,Meal,0.999946,Boeing,,,767,,
4,United_Airlines_Q400_A_2.jpg,Ext,Ext,0.99999,Other,Boeing,,,757,0.44616
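
Because `pd.merge` performs an inner join by default, an image present in only one of the two files is dropped without warning. A minimal sketch of how this could be checked is given below; the two frames are made-up stand-ins for `df_real` and `df_pred`.

```python
import pandas as pd

# Toy frames standing in for df_real and df_pred; the rows are made up.
real = pd.DataFrame({'img': ['a.jpg', 'b.jpg', 'c.jpg'],
                     'view': ['Int', 'Ext', 'Meal']})
pred = pd.DataFrame({'img': ['a.jpg', 'b.jpg', 'd.jpg'],
                     'view': ['Int', 'Int', 'Ext']})

# An outer join with indicator=True adds a `_merge` column
# that tells which side each row came from.
check = pd.merge(real, pred, on='img', how='outer', indicator=True)

only_real = check.loc[check['_merge'] == 'left_only', 'img'].tolist()
only_pred = check.loc[check['_merge'] == 'right_only', 'img'].tolist()
print('Annotated but not predicted:', only_real)
print('Predicted but not annotated:', only_pred)
```

Two empty lists on the real files would mean that the inner join used above loses no image.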


# Results

In [7]:
man = ['Airbus', 'Boeing']

# Types for which exterior model was trained
types_ext = ['A320', 'A321', 'A330', 'A340', 'A350',
             'A380', '737', '747', '757', '777', '787']

# Types for which interior model was trained
int_airb = ['A320','A321','A330','A340','A350']
int_boeing = ['737','747','757','777', '787']
types_int = int_airb + int_boeing
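
The introduction notes that these lists depend on the chosen manufacturers. One possible way to keep that choice in a single place is sketched below; `INT_TYPES_BY_MAN` and `types_for` are hypothetical names that do not exist elsewhere in the project, and the types are copied from the interior lists above.

```python
# Types per manufacturer, copied from the interior lists above.
# `INT_TYPES_BY_MAN` and `types_for` are hypothetical names.
INT_TYPES_BY_MAN = {
    'Airbus': ['A320', 'A321', 'A330', 'A340', 'A350'],
    'Boeing': ['737', '747', '757', '777', '787'],
}


def types_for(manufacturers, types_by_man):
    """Return the flat list of aircraft types for the chosen manufacturers."""
    return [t for m in manufacturers for t in types_by_man[m]]


man = ['Airbus', 'Boeing']
types_int = types_for(man, INT_TYPES_BY_MAN)
print(types_int)
```

Restricting `man` to one manufacturer then restricts `types_int` as well, without editing several lists by hand.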

## View

In [8]:
df_view = df[df['view_x'] == df['view_y']]

print('Scores for the viewpoint:')
print(
    f'{round(len(df_view) / len(df[df["view_x"] != "Others"]), 6)} (without Others category)')
print(f'{round(len(df_view) / len(df), 6)}')

Scores for the viewpoint:
0.989988 (without Others category)
0.967136
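
The same kind of ratio is computed in every section of this notebook. A small helper could compute it, as sketched below on made-up rows; `accuracy` is a hypothetical name, not a function from the project. Note that the cell above filters only the denominator for the "without Others" score, which matches the helper's result only if no image annotated as "Others" is predicted as "Others"; the helper applies the filter to both sides.

```python
import pandas as pd


def accuracy(frame, truth_col, pred_col, mask=None):
    """Share of rows where the prediction equals the ground truth.

    If `mask` (a boolean Series) is given, only the selected rows are scored.
    """
    if mask is not None:
        frame = frame[mask]
    if len(frame) == 0:
        return float('nan')
    return (frame[truth_col] == frame[pred_col]).mean()


# Toy rows with the same column names as `df`; the values are made up.
toy = pd.DataFrame({
    'view_x': ['Int', 'Ext', 'Meal', 'Others', 'Int'],
    'view_y': ['Int', 'Ext', 'Int', 'Int', 'Int'],
})

score_all = accuracy(toy, 'view_x', 'view_y')
score_no_others = accuracy(toy, 'view_x', 'view_y', toy['view_x'] != 'Others')
print(score_all, score_no_others)
```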


## Exteriors

In [9]:
df_ext_pred_OK = df[(df['view_x'] == 'Ext') & (df['view_y'] == 'Ext')]

df_ext_ab = df_ext_pred_OK[df_ext_pred_OK['manufacturer_x'].isin(man)]
df_ext_man_OK = df_ext_ab[df_ext_ab['manufacturer_x']
                          == df_ext_ab['manufacturer_y']]

print(
    f'Score for manufacturer detection for planes detected as "Ext": \n{round(len(df_ext_man_OK) / len(df_ext_ab), 6)}')
print(
    f'\nWith other manufacturers: \n{round(len(df_ext_man_OK) / len(df_ext_pred_OK), 6)}')

Score for manufacturer detection for planes detected as "Ext": 
0.584211

With other manufacturers: 
0.406593


In [10]:
df_type_ab = df_ext_pred_OK[df_ext_pred_OK['type_x'].isin(types_ext)]
df_ext_type_OK = df_ext_pred_OK[df_ext_pred_OK['type_x']
                                == df_ext_pred_OK['type_y']]

print('Score for aircraft type detection for planes detected as "Ext" (aircraft included in training): ')
print(round(len(df_ext_type_OK)/len(df_type_ab), 6))

Score for aircraft type detection for planes detected as "Ext" (aircraft included in training): 
0.141243
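
A single accuracy figure does not show which types are confused with which. A confusion matrix built with `pd.crosstab`, together with a per-type score, could make this visible. The sketch below uses made-up rows that only reuse the `type_x` and `type_y` column names.

```python
import pandas as pd

# Toy rows with the same column names as `df`; the values are made up.
toy = pd.DataFrame({
    'type_x': ['A320', 'A320', '737', '737', '777', 'A380'],
    'type_y': ['A320', '737', '737', '737', '787', '747'],
})

# Rows: ground truth, columns: predicted type
confusion = pd.crosstab(toy['type_x'], toy['type_y'],
                        rownames=['ground truth'], colnames=['predicted'])
print(confusion)

# Share of each ground-truth type that was predicted correctly
per_type = (toy['type_x'] == toy['type_y']).groupby(toy['type_x']).mean()
print(per_type)
```

On the real data, a column that collects most predictions would suggest that the model falls back on one or two types for traveller photos.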


## Interiors

In [11]:
df_int_pred_OK = df[(df['view_x'] == 'Int') & (
    df['view_y'] == 'Int')]

df_int_ab = df_int_pred_OK[df_int_pred_OK['manufacturer_x'].isin(man)]
df_int_man_OK = df_int_ab[df_int_ab['manufacturer_x']
                          == df_int_ab['manufacturer_y']]

print('Score for manufacturer detection for planes detected as "Int" (Airbus and Boeing only): ')
print(round(len(df_int_man_OK)/len(df_int_ab), 6))
print('\nScore for manufacturer detection for planes detected as "Int" (with all manufacturers): ')
print(round(len(df_int_man_OK)/len(df_int_pred_OK), 6))

Score for manufacturer detection for planes detected as "Int" (Airbus and Boeing only): 
0.882279

Score for manufacturer detection for planes detected as "Int" (with all manufacturers): 
0.757527


In [12]:
df_type_ab = df_int_pred_OK[df_int_pred_OK['type_x'].isin(types_int)]
df_int_type_OK = df_int_pred_OK[df_int_pred_OK['type_x']
                                == df_int_pred_OK['type_y']]

print(
    f'Score for aircraft type detection for planes detected as "Int": \n{round(len(df_int_type_OK)/len(df_type_ab), 6)}')

Score for aircraft type detection for planes detected as "Int": 
0.66867


In [13]:
df_int_airbus = df_int_pred_OK[df_int_pred_OK['type_x'].isin(int_airb)]
df_int_airbus_OK = df_int_airbus[df_int_airbus['type_x']
                                 == df_int_airbus['type_y']]

print('Score for aircraft type detection for planes detected as Airbus interiors (types included in training): ')
print(round(len(df_int_airbus_OK) / len(df_int_airbus), 6))

Score for aircraft type detection for planes detected as Airbus interiors (types included in training): 
0.651613


In [14]:
df_int_boeing = df_int_pred_OK[df_int_pred_OK['type_x'].isin(int_boeing)]
df_int_boeing_OK = df_int_boeing[df_int_boeing['type_x']
                                 == df_int_boeing['type_y']]

print('Score for aircraft type detection for planes detected as Boeing interiors (types included in training): ')
print(round(len(df_int_boeing_OK)/len(df_int_boeing), 6))

Score for aircraft type detection for planes detected as Boeing interiors (types included in training): 
0.642757


## DataFrames with predictions errors

In [15]:
df['pred_OK'] = df['view_x'] == df['view_y']
df_view = df[['img', 'view_x', 'view_y', 'view_proba', 'pred_OK']]

df_view_OK = df_view[df_view['pred_OK']]
df_view_OK

Unnamed: 0,img,view_x,view_y,view_proba,pred_OK
0,Cathay_Pacific_Airways_Boeing_777-300ER_C_0.jpg,Meal,Meal,0.999996,True
1,KLM_Airbus_A330-300_1.jpg,Int,Int,0.999998,True
2,American_Airlines_Boeing_767-300_3.jpg,Int,Int,1.000000,True
3,Air_Canada_Boeing_767-300ER_v2_3.jpg,Meal,Meal,0.999946,True
4,United_Airlines_Q400_A_2.jpg,Ext,Ext,0.999990,True
...,...,...,...,...,...
2551,Air_Canada_Airbus_A330_C_0.jpg,Int,Int,0.999976,True
2552,Qatar_Airways_Airbus_A321_2.jpg,Int,Int,0.999851,True
2553,Frontier_Airlines_Airbus_A320_4.jpg,Ext_Int,Ext_Int,0.999627,True
2554,Spirit_Airlines_Airbus_A320_V2_3.jpg,Int,Int,0.998991,True


In [16]:
df_view_err_no_others = df_view[~df_view['pred_OK'] & (
    df_view['view_x'] != 'Others')]
df_view_err_no_others

Unnamed: 0,img,view_x,view_y,view_proba,pred_OK
180,KLM_Boeing_777-200ER_2.jpg,Int,Meal,0.646653,False
298,Austrian_Airlines_Airbus_A320_4.jpg,Ext_Int,Ext,0.569161,False
356,United_Airlines_Boeing_737-900_E_4.jpg,Ext,Ext_Int,0.963786,False
494,Turkish_Airlines_Airbus_A330-300_2.jpg,Int,Meal,0.99967,False
617,Delta_Airlines_Boeing_767-300ER_C_0.jpg,Int,Meal,0.990855,False
632,Hainan_Airlines_Airbus_A330-300_0.jpg,Ext_Int,Ext,0.998867,False
708,Cathay_Pacific_Airways_Boeing_777-300_B_4.jpg,Meal,Int,0.849697,False
845,British_Airways_Boeing_787-9_1.jpg,Ext_Int,Int,0.661129,False
932,Hawaiian_Airlines_ATR_42-500_0.jpg,Ext_Int,Ext,0.999878,False
978,TAM_Airlines_Boeing_767-300ER_4.jpg,Meal,Int,0.759586,False


In [27]:
df_view_err_others = df_view[~df_view['pred_OK'] & (
    df_view['view_x'] == 'Others')]
df_view_err_others

Unnamed: 0,img,view_x,view_y,view_proba,pred_OK
121,Air_India_Airbus_A319_V1_3.jpg,Others,Meal,0.966262,False
122,Avianca_Airbus_A319-100_4.jpg,Others,Ext,0.521139,False
123,British_Airways_Airbus_A319_D_1.jpg,Others,Int,0.999406,False
124,LAN_Airlines_Airbus_A319_V2_2.jpg,Others,Int,0.863998,False
125,Spirit_Airlines_Airbus_A319_B_4.jpg,Others,Int,0.999255,False
364,Aer_Lingus_Airbus_A320_3.jpg,Others,Int,0.803373,False
365,Alitalia_Airlines_Airbus_A320_B_new_3.jpg,Others,Meal,0.898313,False
366,Avianca_Airbus_A320-200_0.jpg,Others,Int,0.995585,False
367,Avianca_Airbus_A320-200_1.jpg,Others,Int,0.844808,False
368,Avianca_Airbus_A320-200_3.jpg,Others,Ext_Int,0.924222,False
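
One idea that could be tested on these errors is to relabel low-confidence predictions as "Others". The sketch below runs on made-up rows, and the 0.9 threshold is an arbitrary value chosen for illustration, not a tuned one.

```python
import pandas as pd

# Toy rows with the same column names as `df_view`; the values are made up.
toy = pd.DataFrame({
    'view_x': ['Int', 'Others', 'Ext', 'Others', 'Meal', 'Others'],
    'view_y': ['Int', 'Int', 'Ext', 'Ext', 'Meal', 'Int'],
    'view_proba': [0.999, 0.80, 0.95, 0.52, 0.97, 0.999],
})

threshold = 0.9  # arbitrary value chosen for illustration

# Keep the prediction when it is confident enough, otherwise answer 'Others'
toy['view_y_thr'] = toy['view_y'].where(toy['view_proba'] >= threshold, 'Others')

before = (toy['view_x'] == toy['view_y']).mean()
after = (toy['view_x'] == toy['view_y_thr']).mean()
print(f'accuracy before: {before:.3f}, after: {after:.3f}')
```

As the table above shows, several "Others" images are misclassified with a probability above 0.99, so a threshold of this kind could only remove part of these errors.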
