In this script, we conduct ensemble learning on the Boston Test set, basing on the analyses of each single model's confusion matrix and performance. After 1. Averaging ensemble; 2. Conditional ensemble; 3. Weighted ensemble, we produced an "All results" table. We also tried 4. Subdistrict conditional ensemble, but decided not to consider it.


##### 1. Averaging ensemble

We tried 4 averaging ensemble: 

- all single models; 

- models with best accuracy (cv0cv2cv4full);

- models best at predicting safety = 0 (cv3full);

- models best at predicting safety = 1 (cv0cv1cv2cv4own).

Averaging ensembled models had better accuracy than single models. Model 'cv3full' was best at predicting safety = 0 (TN_Rate 0.756619) and model 'cv0cv1cv2cv4own' was best at predicting safety = 1 (TP_Rate 0.779363), but they were weak at predicting the other target respectively. Therefore, we would use these 2 models in conditional and weighted ensemble.

##### 2. Averaging + Conditional ensemble

'Conditional ensemble1': When predicting safety = 0, we use the prediction by 'cv3full'; else we use the prediction by 'cv0cv1cv2cv4own'. It got TN_Rate 0.800279. However the TP_Rate was as low as 0.643249.

'Conditional ensemble2': When predicting safety = 1, we use the prediction by 'cv0cv1cv2cv4own'; else we use the prediction by 'cv3full'. It got TP_Rate 0.819978. However the TN_Rate was as low as 0.647933.

Comparing to 1. Averaging ensemble, these two conditional ensembled models were even better at predicting only one of the targets each, but were weaker at predicting the other target. Therefore, we looked at weighted ensemble.

##### 3. Averaging + Weighted ensemble

'weighted ensemble1': We gave prediction by model 'cv3full' 0.4 weight; and prediction by model 'cv0cv1cv2cv4own' 0.6 weight. It got the best accuracy among all models: 0.741132. But this model was better at predicting safety = 1 than safety = 0 (TP_Rate 0.750274; TN_Rate 0.733395)


'weighted ensemble2': We gave prediction by model 'cv3full' 0.45 weight; and prediction by model 'cv0cv1cv2cv4own' 0.55 weight. It got the 2nd best accuracy among all models: 0.740881. This model performance was relatively even at predicting both safety = 1 and safety = 0 (TP_Rate 0.747530; TN_Rate 0.735253)

'weighted ensemble3': We gave prediction by model 'cv3full' 0.55 weight; and prediction by model 'cv0cv1cv2cv4own' 0.45 weight. It got the 3rd best accuracy among all models: 0.735094. This model is better at predicting safety = 0 than safety = 1 (TP_Rate 0.731065; TN_Rate 0.738504)

##### 4. Subdistrict conditional ensemble

We analyzed the confusion matrix by subdistrict, and tried one ensembled model based on the sigle model's performance by subdistrict. This model's performance was not better than the models produced in 1,2,3. Also considering that the subdistrict is not a feature in neural networks of the images, we would not consider this ensemble methond.


##### Conclusion
Examining the "All results" table, we decided to choose 'weighted ensemble2' as our final model, as it is a balance of overall accuracy and the prediction of both safety = 0 and safety = 1.

Next, we would use the ensemble strategy of 'weighted ensemble2' to predict Toronto street views (ensemble_toronto.ipynb).

In [160]:
import pandas as pd
import numpy as np
import glob
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

### Before ensembling: single model performance

#### Single model performance of models produced in Transfer Learning
Six models (best models produced from cv0, cv1, cv2, cv3, cv4 and whole training dataset) were produced in the transfer learning process. Validation accuracy showed that the models' accuracy was around 70%. Later we would show that averaging ensemble could boost the accuracy.

In [161]:
cv_score = pd.read_pickle("/Users/zhanglingling/Desktop/ML1030/boston_train_evaluate/cv_score.pickle")
cv_score 

CV round,0,1,2,3,4,mean,std
train_loss,0.167847,0.070979,0.085303,0.051624,0.052649,0.08568,0.042939
train_acc,0.9471,0.972844,0.970884,0.980929,0.979752,0.970302,0.012226
val_loss,1.802394,1.540489,1.852702,1.808886,1.562203,1.713335,0.133567
val_acc,0.669805,0.672003,0.691994,0.672214,0.692622,0.679727,0.010308


In [163]:
wholedata_score = pd.read_pickle("/Users/zhanglingling/Desktop/ML1030/boston_train_evaluate/wholedata_score.pickle")
wholedata_score 

Unnamed: 0,loss,acc
0,0.058055,0.978464


#### Single model performance of model produced by our own cnn
Training loss: 0.3158, Training acc: 0.8696, val_loss: 0.6401, val_acc: 0.680

We calculate the confusion matrix of each single model

#### Data preparation

As boston_prediction_own_cnn.csv and the other predictions were produced in 2 different VMs, we need to unify them first. 

boston_prediction_own_cnn.csv has one more row than the other predictions. That is because on the VM that run transfer learning, one test image was not able to be fetched by the Google Street View API. So we removed that row in boston_prediction_own_cnn.csv as well.


In [166]:
# p = pd.read_csv('/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston_prediction_own_cnn.csv')     
# t = pd.read_csv('/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv4.prediction.csv')
# df = pd.merge(p, t, how='left', on='_file', 
#                    indicator=True)
# df[df['_merge'] == 'left_only']
# p = p[p['_file'] != 'gsv_1578.jpg']
# p.to_csv('/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston_prediction_own_cnn.csv', index = False)

In [191]:
predict_dir = '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/'
file_list = list(glob.glob(predict_dir + "*.csv*"))
file_list.sort()
file_list

['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv0.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv1.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv2.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv3.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv4.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.wholedata.hdf5.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston_prediction_own_cnn.csv']

In [192]:
df_list = []
for f in file_list:
    df = pd.read_csv(f)
    df = df.sort_values("_file")
    df_list.append(df)

In [193]:
test_csv = "/Users/zhanglingling/Desktop/ML1030/us_safety/boston_test_fetched_with_target.csv"  
test_df = pd.read_csv(test_csv)
test_df = test_df.sort_values("_file")
target = "safety"
img_name_col = "_file"
test_df = test_df[[img_name_col, target]]
print(test_df.shape)
test_df.head()

(3976, 2)


Unnamed: 0,_file,safety
0,gsv_0.jpg,1
1,gsv_1.jpg,1
10,gsv_10.jpg,0
99,gsv_100.jpg,1
992,gsv_1000.jpg,1


Note that test_df also have 3976 samples. In later functions we used df = df[df['_merge'] == 'both'] to resovle this.

In [194]:
df = pd.merge(test_df, df_list[0], how='left', on='_file', 
                   indicator=True)
df[df['_merge'] == 'left_only']

Unnamed: 0,_file,safety,0,1,_merge
640,gsv_1578.jpg,0,,,left_only


In test, set we have 2153 actual safety = 0 and 1822 actual safety =1

In [195]:
df = df[df['_merge'] == 'both']
df.groupby(['safety']).count()

Unnamed: 0_level_0,_file,0,1,_merge
safety,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,2153,2153,2153,2153
1,1822,1822,1822,1822


In [202]:
# function that calculates each single model's confusion matrix and performance tables
def matrix_performance_singlemodel(test_df, data):
    
    #for a single model, no need to average
    prediction = data 
    prediction['pred_safety'] =  np.where(prediction['0'] > 0.5, 0, 1)
    
    #prepare y_true, y_pred
    df = pd.merge(test_df, prediction, how='left', on='_file', 
                   indicator=True)
    df[df['_merge'] == 'left_only']
    df = df[df['_merge'] == 'both']
    df['pred_safety'] = df['pred_safety'].astype(int)
    
    y_true = df['safety']
    y_pred = df['pred_safety']
    
    #confusion matrix
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    matrix = pd.DataFrame([{'tn': tn, 'fp': fp, 'fn': fn, 'tp': tp}])
    matrix['tn_rate'] = matrix['tn'] / (matrix['tn'] + matrix['fp'])
    matrix['tp_rate_recall'] = matrix['tp'] / (matrix['fn'] + matrix['tp'])

    #matrix['fp_rate'] = matrix['fp'] / (matrix['tn'] + matrix['fp'])
    #matrix['fn_rate'] = matrix['fn'] / (matrix['fn'] + matrix['tp'])
  
    #performance
    matrix['accuracy'] = accuracy_score(y_true, y_pred) # accuracy: (tp + tn) / (p + n)
    matrix['f1_score'] = f1_score(y_true, y_pred) # f1: 2 tp / (2 tp + fp + fn)
    matrix['precision'] = precision_score(y_true, y_pred) # precision tp / (tp + fp)

    
    return matrix

In [203]:
conf_matrix_table = pd.DataFrame(columns=['fn','fp','tn', 'tp', 'tn_rate', 'tp_rate_recall', 'accuracy', 'f1_score', 'precision'])

for i in range(len(df_list)):
    conf_matrix = matrix_performance_singlemodel(test_df, df_list[i])
    conf_matrix_table = pd.concat([conf_matrix_table, conf_matrix])


conf_matrix_table['model'] = ['cv0', 'cv1', 'cv2', 'cv3', 'cv4', 'full', 'own']


In [204]:
conf_matrix_table

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,485,704,1449,1337,0.673014,0.733809,0.700881,0.692208,0.655071,cv0
0,442,775,1378,1380,0.640037,0.757409,0.693836,0.69399,0.640371,cv1
0,537,649,1504,1285,0.69856,0.705269,0.701635,0.684239,0.664426,cv2
0,653,558,1595,1169,0.740827,0.641603,0.695346,0.658777,0.676896,cv3
0,505,684,1469,1317,0.682304,0.722832,0.700881,0.688988,0.658171,cv4
0,579,536,1617,1243,0.751045,0.682217,0.719497,0.690364,0.698707,full
0,552,741,1412,1270,0.655829,0.697036,0.674717,0.662666,0.631527,own


From above results, we can see that for the 7 single models:

2 models - cv3, full are better than the others in predicting 0;

5 models - cv0, cv1, cv2, cv4, own are better than the others in predicting 1

## 1. Averaging ensemble

In [205]:
# function that calculates averging ensemble model's confusion matrix and performance tables
def matrix_performance(test_df, file_list):
    
    #produce averaging ensembled model's prediction - 'averaged_prediction'
    df_list = []
    for f in file_list:
        df = pd.read_csv(f)
        df = df.sort_values("_file")
        df_list.append(df)
        
    averaged_prediction = pd.concat(df_list).groupby('_file').mean()
    averaged_prediction.reset_index(level=0, inplace=True)
    averaged_prediction['pred_safety'] =  np.where(averaged_prediction['0'] > 0.5, 0, 1)
    
    #prepare y_true, y_pred
    df = pd.merge(test_df, averaged_prediction, how='left', on='_file', 
                   indicator=True)
    df[df['_merge'] == 'left_only']
    df = df[df['_merge'] == 'both']
    df['pred_safety'] = df['pred_safety'].astype(int)
    
    
    y_true = df['safety']
    y_pred = df['pred_safety']
    
    #confusion matrix
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    matrix = pd.DataFrame([{'tn': tn, 'fp': fp, 'fn': fn, 'tp': tp}])
    
    matrix['tn_rate'] = matrix['tn'] / (matrix['tn'] + matrix['fp'])
    matrix['tp_rate_recall'] = matrix['tp'] / (matrix['fn'] + matrix['tp'])
    
    #matrix['fp_rate'] = matrix['fp'] / (matrix['tn'] + matrix['fp'])
    #matrix['fn_rate'] = matrix['fn'] / (matrix['fn'] + matrix['tp'])
    
    #performance
    matrix['accuracy'] = accuracy_score(y_true, y_pred) # accuracy: (tp + tn) / (p + n)
    matrix['f1_score'] = f1_score(y_true, y_pred) # f1: 2 tp / (2 tp + fp + fn)
    matrix['precision'] = precision_score(y_true, y_pred) # precision tp / (tp + fp)
    #recall = recall_score(y_true, y_pred) # recall: tp / (tp + fn)
    
    
    return matrix


### 1.1 Averaging ensemble of all single models

In [206]:
predict_dir = '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/'
file_list = list(glob.glob(predict_dir + "*.csv*"))
file_list.sort()
file_list

['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv0.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv1.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv2.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv3.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv4.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.wholedata.hdf5.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston_prediction_own_cnn.csv']

In [207]:
conf_matrix_all = matrix_performance(test_df, file_list)
conf_matrix_all['model'] = ['all models']
display(conf_matrix_all)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,435,602,1551,1387,0.72039,0.761251,0.739119,0.727893,0.697335,all models


### 1.2 Averaging ensemble of models withs best accuracy only (cv0, cv2, cv4, full)

In [208]:
file_list = ['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv0.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv2.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv4.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.wholedata.hdf5.prediction.csv']
conf_matrix_cv0cv2cv4full = matrix_performance(test_df, file_list)
conf_matrix_cv0cv2cv4full['model'] = ['cv0cv2cv4full']

display(conf_matrix_cv0cv2cv4full)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,462,620,1533,1360,0.71203,0.746432,0.727799,0.715413,0.686869,cv0cv2cv4full


### 1.4 Averaging ensemble of models best at predicting target 0 only (cv3, full)


In [209]:
file_list = ['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv3.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.wholedata.hdf5.prediction.csv']
conf_matrix_cv3full = matrix_performance(test_df, file_list)
conf_matrix_cv3full['model'] = ['cv3full']

display(conf_matrix_cv3full)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,576,524,1629,1246,0.756619,0.683864,0.72327,0.693764,0.703955,cv3full


### 1.5 Averaging ensemble of models best at predicting target 1 only (cv0, cv1,  cv2, cv4, own)

In [210]:
file_list = ['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv0.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv1.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv2.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv4.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston_prediction_own_cnn.csv']
conf_matrix_cv0cv1cv2cv4own = matrix_performance(test_df, file_list)
conf_matrix_cv0cv1cv2cv4own['model'] = ['cv0cv1cv2cv4own']

display(conf_matrix_cv0cv1cv2cv4own)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,402,664,1489,1420,0.691593,0.779363,0.731824,0.727087,0.681382,cv0cv1cv2cv4own


In [211]:
pd.concat([conf_matrix_table, 
          conf_matrix_all,  conf_matrix_cv0cv2cv4full,
         conf_matrix_cv3full,
          conf_matrix_cv0cv1cv2cv4own])

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,485,704,1449,1337,0.673014,0.733809,0.700881,0.692208,0.655071,cv0
0,442,775,1378,1380,0.640037,0.757409,0.693836,0.69399,0.640371,cv1
0,537,649,1504,1285,0.69856,0.705269,0.701635,0.684239,0.664426,cv2
0,653,558,1595,1169,0.740827,0.641603,0.695346,0.658777,0.676896,cv3
0,505,684,1469,1317,0.682304,0.722832,0.700881,0.688988,0.658171,cv4
0,579,536,1617,1243,0.751045,0.682217,0.719497,0.690364,0.698707,full
0,552,741,1412,1270,0.655829,0.697036,0.674717,0.662666,0.631527,own
0,435,602,1551,1387,0.72039,0.761251,0.739119,0.727893,0.697335,all models
0,462,620,1533,1360,0.71203,0.746432,0.727799,0.715413,0.686869,cv0cv2cv4full
0,576,524,1629,1246,0.756619,0.683864,0.72327,0.693764,0.703955,cv3full


### Conclusion of averaging ensemble

Above results showed that

1) The performance of all averaging ensemble models was better than that of single models.

2) In terms of accuracy, 'all models' yielded the best accuracy (73.9%), that is because it get on average both good tn_rate (72.0%) and good tp_rate (76.1%). However, this ensembled model is not best when predicting safety = 0 alone and safety = 1 alone.

3) When predicting safety = 0, 'cv3full' yielded the best prediction (tn_rate 75.7%).

4) When predicting safety = 1, 'cv0cv1cv2cv4own' yielded the best prediction (tp_rate 77.9%).


## 2. Conditional ensemble
Next, we would try whether conditional ensemble could further boost the model performance.

In [212]:
# function that calculates conditional ensembling 
def cond_ensemble(file_list0, file_list1):
    
    #produce averaging ensembled model's prediction - 'averaged_prediction'
    df_list0 = []
    for f in file_list0:
        df = pd.read_csv(f)
        df = df.sort_values("_file")
        df_list0.append(df)
        
    prediction0 = pd.concat(df_list0).groupby('_file').mean()
    prediction0.reset_index(level=0, inplace=True)
    prediction0['pred_safety'] =  np.where(prediction0['0'] > 0.5, 0, 1)
    
    
    df_list1 = []
    for f in file_list1:
        df = pd.read_csv(f)
        df = df.sort_values("_file")
        df_list1.append(df)
        
    prediction1 = pd.concat(df_list1).groupby('_file').mean()
    prediction1.reset_index(level=0, inplace=True)
    prediction1['pred_safety'] =  np.where(prediction1['0'] > 0.5, 0, 1)
    
    
    pred = pd.merge(prediction0, prediction1, how='outer', on='_file')
    pred.columns = ['_file', 'pred0_0', 'pred0_1', 'pred0_pred_safety',  'pred1_0', 'pred1_1', 'pred1_pred_safety']

    return pred


In [213]:
# function that calculates each single model's confusion matrix and performance tables
def matrix_performance_cond_ensemble_model(test_df, data):
    
    #for a single model, no need to average
    prediction = data 
    
    #prepare y_true, y_pred
    df = pd.merge(test_df, prediction, how='left', on='_file', 
                   indicator=True)
    df[df['_merge'] == 'left_only']
    df = df[df['_merge'] == 'both']
    df['pred_safety'] = df['pred_safety'].astype(int)
    
    y_true = df['safety']
    y_pred = df['pred_safety']
    
    #confusion matrix
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    matrix = pd.DataFrame([{'tn': tn, 'fp': fp, 'fn': fn, 'tp': tp}])
    matrix['tn_rate'] = matrix['tn'] / (matrix['tn'] + matrix['fp'])
    matrix['tp_rate_recall'] = matrix['tp'] / (matrix['fn'] + matrix['tp'])
    

    #matrix['fp_rate'] = matrix['fp'] / (matrix['tn'] + matrix['fp'])
    #matrix['fn_rate'] = matrix['fn'] / (matrix['fn'] + matrix['tp'])
    
    
   #performance
    matrix['accuracy'] = accuracy_score(y_true, y_pred) # accuracy: (tp + tn) / (p + n)
    matrix['f1_score'] = f1_score(y_true, y_pred) # f1: 2 tp / (2 tp + fp + fn)
    matrix['precision'] = precision_score(y_true, y_pred) # precision tp / (tp + fp)
    #recall = recall_score(y_true, y_pred) # recall: tp / (tp + fn)
    
    
    return matrix

### 2.1 Conditional ensemble 1
When predicting safety = 0, we use the prediction by 'cv3full'; else we use the prediction by 'cv0cv1cv2cv4own'.

In [214]:
file_list0 = ['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv3.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.wholedata.hdf5.prediction.csv']


file_list1 = ['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv0.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv1.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv2.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv4.prediction.csv',
             '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston_prediction_own_cnn.csv']

In [215]:
pred = cond_ensemble(file_list0, file_list1)
#pred.head()
pred['pred_safety'] = np.where(pred['pred0_pred_safety'] == 0, 0, pred['pred1_pred_safety'])
conf_matrix_cond1 = matrix_performance_cond_ensemble_model(test_df, pred)
conf_matrix_cond1['model'] = ['conditional ensemble1']

display(conf_matrix_cond1)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,650,430,1723,1172,0.800279,0.643249,0.728302,0.684579,0.731586,conditional ensemble1


### Conditional ensemble 2
When predicting safety = 1, we use the prediction by 'cv0cv1cv2cv4own'; else we use the prediction by 'cv3full'.

In [216]:
pred = cond_ensemble(file_list0, file_list1)
pred['pred_safety'] = np.where(pred['pred1_pred_safety'] == 1, 1, pred['pred0_pred_safety'])
conf_matrix_cond2 = matrix_performance_cond_ensemble_model(test_df, pred)
conf_matrix_cond2['model'] = ['conditional ensemble2']

display(conf_matrix_cond2)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,328,758,1395,1494,0.647933,0.819978,0.726792,0.733432,0.66341,conditional ensemble2


### Conclusion of conditional ensemble
Above results show that 'conditional ensemble 1' model is good at prediction target 0, at the expense of prediction 1.

On the contrary, 'conditional ensemble 2' model is good at prediction target 1, at the expense of prediction 0.

### 3. Weighted ensemble

### 3.1 Weighted ensemble 1

We give prediction by model 'cv3full' 0.4 weight; and prediction by model 'cv0cv1cv2cv4own' 0.6 weight.

In [217]:
pred = cond_ensemble(file_list0, file_list1)
pred['weight_pred_0'] = pred['pred0_0'] * 0.4 + pred['pred1_0'] * 0.6
pred['weight_pred_1'] = pred['pred0_1'] * 0.4 + pred['pred1_1'] * 0.6
pred['pred_safety'] =  np.where(pred['weight_pred_0'] > 0.5, 0, 1)
conf_matrix_w1 = matrix_performance_cond_ensemble_model(test_df, pred)
conf_matrix_w1['model'] = ['weighted ensemble1']

display(conf_matrix_w1)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,455,574,1579,1367,0.733395,0.750274,0.741132,0.726548,0.704276,weighted ensemble1


### 3.2 Weighted ensemble 2
We give prediction by model 'cv3full' 0.45 weight; and prediction by model 'cv0cv1cv2cv4own' 0.55 weight.

In [218]:
pred = cond_ensemble(file_list0, file_list1)
pred['weight_pred_0'] = pred['pred0_0'] * 0.45 + pred['pred1_0'] * 0.55
pred['weight_pred_1'] = pred['pred0_1'] * 0.45 + pred['pred1_1'] * 0.55
pred['pred_safety'] =  np.where(pred['weight_pred_0'] > 0.5, 0, 1)
conf_matrix_w2 = matrix_performance_cond_ensemble_model(test_df, pred)
conf_matrix_w2['model'] = ['weighted ensemble2']

display(conf_matrix_w2)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,460,570,1583,1362,0.735253,0.74753,0.740881,0.725626,0.704969,weighted ensemble2


### 3.3 Weighted ensemble 3
We give prediction by model 'cv3full' 0.55 weight; and prediction by model 'cv0cv1cv2cv4own' 0.45 weight.

In [219]:
pred = cond_ensemble(file_list0, file_list1)
pred['weight_pred_0'] = pred['pred0_0'] * 0.55 + pred['pred1_0'] * 0.45
pred['weight_pred_1'] = pred['pred0_1'] * 0.55 + pred['pred1_1'] * 0.45
pred['pred_safety'] =  np.where(pred['weight_pred_0'] > 0.5, 0, 1)
conf_matrix_w3 = matrix_performance_cond_ensemble_model(test_df, pred)
conf_matrix_w3['model'] = ['weighted ensemble3']

display(conf_matrix_w3)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,490,563,1590,1332,0.738504,0.731065,0.735094,0.716707,0.702902,weighted ensemble3


### All results

In [220]:
pd.concat([conf_matrix_table, 
          conf_matrix_all,  conf_matrix_cv0cv2cv4full,
          conf_matrix_cv3full,
          conf_matrix_cv0cv1cv2cv4own,
          conf_matrix_cond1, conf_matrix_cond2, conf_matrix_w1, conf_matrix_w2, conf_matrix_w3])

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate_recall,accuracy,f1_score,precision,model
0,485,704,1449,1337,0.673014,0.733809,0.700881,0.692208,0.655071,cv0
0,442,775,1378,1380,0.640037,0.757409,0.693836,0.69399,0.640371,cv1
0,537,649,1504,1285,0.69856,0.705269,0.701635,0.684239,0.664426,cv2
0,653,558,1595,1169,0.740827,0.641603,0.695346,0.658777,0.676896,cv3
0,505,684,1469,1317,0.682304,0.722832,0.700881,0.688988,0.658171,cv4
0,579,536,1617,1243,0.751045,0.682217,0.719497,0.690364,0.698707,full
0,552,741,1412,1270,0.655829,0.697036,0.674717,0.662666,0.631527,own
0,435,602,1551,1387,0.72039,0.761251,0.739119,0.727893,0.697335,all models
0,462,620,1533,1360,0.71203,0.746432,0.727799,0.715413,0.686869,cv0cv2cv4full
0,576,524,1629,1246,0.756619,0.683864,0.72327,0.693764,0.703955,cv3full


We consider "weighted ensemble2" as the best model, it has an accuracy of 74.1%, and good at predicting both safety = 0 and safety = 1.

In [223]:
# save final modle "weighted ensemble2"

pred = cond_ensemble(test_df, file_list0, file_list1)
pred['weight_pred_0'] = pred['pred0_0'] * 0.45 + pred['pred1_0'] * 0.55
pred['weight_pred_1'] = pred['pred0_1'] * 0.45 + pred['pred1_1'] * 0.55
pred['pred_safety'] =  np.where(pred['weight_pred_0'] > 0.5, 0, 1)
pred.head()

Unnamed: 0,_file,pred0_0,pred0_1,pred0_pred_safety,pred1_0,pred1_1,pred1_pred_safety,weight_pred_0,weight_pred_1,pred_safety
0,gsv_0.jpg,0.00124,0.998759,1,0.172197,0.827803,1,0.095266,0.904734,1
1,gsv_1.jpg,0.625578,0.374422,0,0.439558,0.560442,1,0.523267,0.476733,0
2,gsv_10.jpg,0.00035,0.99965,1,0.144033,0.855967,1,0.079376,0.920624,1
3,gsv_100.jpg,0.588373,0.411627,0,0.326131,0.673869,1,0.44414,0.55586,1
4,gsv_1000.jpg,0.032821,0.967179,1,0.186796,0.813204,1,0.117507,0.882493,1


In [233]:
final_pred_weighted_ensemble2 = pred[['_file', 'weight_pred_0', 'weight_pred_1', 'pred_safety']]
final_pred_weighted_ensemble2.columns = ['_file', '0', '1', 'pred_safety']

In [234]:
final_pred_weighted_ensemble2.head()

Unnamed: 0,_file,0,1,pred_safety
0,gsv_0.jpg,0.095266,0.904734,1
1,gsv_1.jpg,0.523267,0.476733,0
2,gsv_10.jpg,0.079376,0.920624,1
3,gsv_100.jpg,0.44414,0.55586,1
4,gsv_1000.jpg,0.117507,0.882493,1


In [235]:
final_pred_weighted_ensemble2.to_csv("/Users/zhanglingling/Desktop/ML1030/final_pred_boston_test_weighted_ensemble2.csv", index=False)

### 4. Subdistrict conditional ensemble

### 4.1 Subdistrict conditional ensemble 1

In [150]:
test_csv = "/Users/zhanglingling/Desktop/ML1030/us_safety/boston_test_fetched_with_target.csv"  
test_df = pd.read_csv(test_csv)
test_df = test_df.sort_values("_file")
test_df = test_df[['_file', 'safety', 'SUBDISTRIC']]

predict_dir = '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/'
file_list = list(glob.glob(predict_dir + "*.csv*"))
file_list.sort()
file_list

['/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv0.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv1.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv2.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv3.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.hdf5.cv4.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston.test_bestmodel.wholedata.hdf5.prediction.csv',
 '/Users/zhanglingling/Desktop/ML1030/boston_test_prediction/boston_prediction_own_cnn.csv']

In [151]:
df_list = []
for f in file_list:
    df = pd.read_csv(f)
    df = df.sort_values("_file")
    df_list.append(df)

In [152]:
def subdistrict_conf_matrix(test_df, df):
    
    df['pred_safety'] =  np.where(df['0'] > 0.5, 0, 1)
    
    
    df = pd.merge(test_df, df, how='left', on='_file', 
                   indicator=True)
    df[df['_merge'] == 'left_only']
    df = df[df['_merge'] == 'both']
    df['pred_safety'] = df['pred_safety'].astype(int)

    # subdistrict count for all test data
    subdistrict_count_all = df.groupby(['SUBDISTRIC']).count()['_file']
    count_all = pd.DataFrame(subdistrict_count_all)
    count_all.reset_index(level=0, inplace=True)

    #tp
    subdistrict_count = df.loc[np.where((df['safety'] == 1) & (df['pred_safety']== 1))].groupby(['SUBDISTRIC']).count()['_file']
    tp_subdistrict = pd.DataFrame(subdistrict_count)
    tp_subdistrict.reset_index(level=0, inplace=True)
    tp_subdistrict.columns = ['SUBDISTRIC', 'tp']


    #tn
    subdistrict_count = df.loc[np.where((df['safety'] == 0) & (df['pred_safety']== 0))].groupby(['SUBDISTRIC']).count()['_file']
    tn_subdistrict = pd.DataFrame(subdistrict_count)
    tn_subdistrict.reset_index(level=0, inplace=True)
    tn_subdistrict.columns = ['SUBDISTRIC', 'tn']


    #fn
    subdistrict_count = df.loc[np.where((df['safety'] == 1) & (df['pred_safety']== 0))].groupby(['SUBDISTRIC']).count()['_file']
    fn_subdistrict = pd.DataFrame(subdistrict_count)
    fn_subdistrict.reset_index(level=0, inplace=True)
    fn_subdistrict.columns = ['SUBDISTRIC', 'fn']


    #fp
    subdistrict_count = df.loc[np.where((df['safety'] == 0) & (df['pred_safety']== 1))].groupby(['SUBDISTRIC']).count()['_file']
    fp_subdistrict = pd.DataFrame(subdistrict_count)
    fp_subdistrict.reset_index(level=0, inplace=True)
    fp_subdistrict.columns = ['SUBDISTRIC', 'fp']

    subdistrict = pd.merge(tp_subdistrict, tn_subdistrict, how='outer', on='SUBDISTRIC')
    subdistrict =  pd.merge(subdistrict, fn_subdistrict, how='outer', on='SUBDISTRIC')
    subdistrict =  pd.merge(subdistrict, fp_subdistrict, how='outer', on='SUBDISTRIC')


    # subdistrict count for all test data
    subdistrict_count_all = df.groupby(['SUBDISTRIC']).count()['_file']
    count_all = pd.DataFrame(subdistrict_count_all)
    count_all.columns = ['count_all']
    count_all.reset_index(level=0, inplace=True)


    #fp_subdistrict['fp_rate'] = fp_subdistrict['fp_count']
    subdistrict = pd.merge(subdistrict, count_all, how='right', on='SUBDISTRIC')

    subdistrict['accuracy'] = (subdistrict['tp'] + subdistrict['tn'])/ subdistrict['count_all']
    
    subdistrict['tp_rate'] = subdistrict['tp'] / subdistrict['count_all']
    subdistrict['tn_rate'] = subdistrict['tn'] / subdistrict['count_all']
    #subdistrict['fn_rate'] = subdistrict['fn'] / subdistrict['count_all']
    #subdistrict['fp_rate'] = subdistrict['fp'] / subdistrict['count_all']


    return subdistrict 

In [153]:
subdistrict_list = []
for i in range(len(df_list)):
    subdistrict = subdistrict_conf_matrix(test_df, df_list[i])
    subdistrict_list.append(subdistrict)
subdistrict_list[0]['model'] = 'cv0'
subdistrict_list[1]['model'] = 'cv1'
subdistrict_list[2]['model'] = 'cv2'
subdistrict_list[3]['model'] = 'cv3'
subdistrict_list[4]['model'] = 'cv4'
subdistrict_list[5]['model'] = 'full'
subdistrict_list[6]['model'] = 'own'

In [154]:
pd.concat([subdistrict_list[0], subdistrict_list[1], subdistrict_list[2],
          subdistrict_list[3], subdistrict_list[4], subdistrict_list[5],
          subdistrict_list[6]]).sort_values(by = ['SUBDISTRIC', 'accuracy'])

Unnamed: 0,SUBDISTRIC,tp,tn,fn,fp,count_all,accuracy,tp_rate,tn_rate,model
0,Business,64,90,32,45,231,0.666667,0.277056,0.38961,own
0,Business,69,86,27,49,231,0.670996,0.298701,0.372294,cv1
0,Business,56,101,40,34,231,0.679654,0.242424,0.437229,cv3
0,Business,73,87,23,48,231,0.692641,0.316017,0.376623,cv4
0,Business,65,96,31,39,231,0.69697,0.281385,0.415584,cv2
0,Business,68,98,28,37,231,0.718615,0.294372,0.424242,cv0
0,Business,64,108,32,27,231,0.744589,0.277056,0.467532,full
1,Comm/Instit,30,26,10,20,86,0.651163,0.348837,0.302326,cv2
1,Comm/Instit,29,29,11,17,86,0.674419,0.337209,0.337209,cv1
1,Comm/Instit,30,31,10,15,86,0.709302,0.348837,0.360465,cv3


In [155]:
def cond_ensemble(test_df, file_list0, file_list1):
    
    #produce averaging ensembled model's prediction - 'averaged_prediction'
    df_list0 = []
    for f in file_list0:
        df = pd.read_csv(f)
        df = df.sort_values("_file")
        df_list0.append(df)
        
    prediction0 = pd.concat(df_list0).groupby('_file').mean()
    prediction0.reset_index(level=0, inplace=True)
    prediction0['pred_safety'] =  np.where(prediction0['0'] > 0.5, 0, 1)
    
    
    df_list1 = []
    for f in file_list1:
        df = pd.read_csv(f)
        df = df.sort_values("_file")
        df_list1.append(df)
        
    prediction1 = pd.concat(df_list1).groupby('_file').mean()
    prediction1.reset_index(level=0, inplace=True)
    prediction1['pred_safety'] =  np.where(prediction1['0'] > 0.5, 0, 1)
    
    
    pred = pd.merge(prediction0, prediction1, how='outer', on='_file')
    pred.columns = ['_file', 'pred0_0', 'pred0_1', 'pred0_pred_safety',  'pred1_0', 'pred1_1', 'pred1_pred_safety']

    return pred


In [156]:
df.head()

Unnamed: 0,_file,0,1,pred_safety
0,gsv_0.jpg,0.656174,0.343826,0
1,gsv_1.jpg,0.807845,0.192155,0
10,gsv_10.jpg,0.588457,0.411543,0
99,gsv_100.jpg,0.352693,0.647307,1
992,gsv_1000.jpg,0.023238,0.976762,1


In [157]:
df_cv0 = df_list[0][['_file', '0',  'pred_safety']]
df_cv0.columns = ['_file', 'cv0_0', 'cv0_pred_safety']

df_cv1 = df_list[1][['_file', '0',  'pred_safety']]
df_cv1.columns = ['_file', 'cv1_0', 'cv1_pred_safety']

df_cv2 = df_list[2][['_file', '0',  'pred_safety']]
df_cv2.columns = ['_file', 'cv2_0', 'cv2_pred_safety']

df_cv3 = df_list[3][['_file', '0',  'pred_safety']]
df_cv3.columns = ['_file', 'cv3_0',  'cv3_pred_safety']

df_cv4 = df_list[4][['_file', '0', 'pred_safety']]
df_cv4.columns = ['_file', 'cv4_0', 'cv4_pred_safety']

df_full = df_list[5][['_file', '0', 'pred_safety']]
df_full.columns = ['_file', 'full_0', 'full_pred_safety']

df_own = df_list[5][['_file', '0', 'pred_safety']]
df_own.columns = ['_file', 'own_0', 'full_own_safety']

In [158]:
df = pd.merge(df_cv0, df_cv1, how='outer', on='_file')
df = pd.merge(df, df_cv2, how='outer', on='_file')
df = pd.merge(df, df_cv3, how='outer', on='_file')
df = pd.merge(df, df_cv4, how='outer', on='_file')
df = pd.merge(df, df_full, how='outer', on='_file')
df = pd.merge(df, df_own, how='outer', on='_file')

In [159]:
df = pd.merge(test_df, df, how='right', on='_file')
df.head(10)

Unnamed: 0,_file,safety,SUBDISTRIC,cv0_0,cv0_pred_safety,cv1_0,cv1_pred_safety,cv2_0,cv2_pred_safety,cv3_0,cv3_pred_safety,cv4_0,cv4_pred_safety,full_0,full_pred_safety,own_0,full_own_safety
0,gsv_0.jpg,1,Industrial,0.172792,1,0.000897,1,0.00147,1,1e-05,1,0.029651,1,0.002470684,1,0.002470684,1
1,gsv_1.jpg,1,Mixed Use,0.001679,1,0.1416,1,0.489313,1,0.255481,1,0.757354,0,0.9956748,0,0.9956748,0
2,gsv_10.jpg,0,Open Space,0.12476,1,0.003538,1,0.002887,1,0.000699,1,0.000525,1,6.89e-07,1,6.89e-07,1
3,gsv_100.jpg,1,Residential,0.329768,1,0.838563,0,0.104049,1,0.564805,0,0.005583,1,0.6119415,0,0.6119415,0
4,gsv_1000.jpg,1,Residential,5e-06,1,0.001298,1,0.003349,1,0.000621,1,0.906091,0,0.06502168,1,0.06502168,1
5,gsv_1001.jpg,0,Residential,0.999497,0,0.998825,0,1.0,0,1.0,0,0.997662,0,0.9999574,0,0.9999574,0
6,gsv_1002.jpg,0,Residential,0.786304,0,0.101245,1,0.001166,1,0.135368,1,0.001162,1,0.113818,1,0.113818,1
7,gsv_1003.jpg,1,Residential,0.999889,0,0.917877,0,0.999985,0,0.995524,0,0.999337,0,0.7822251,0,0.7822251,0
8,gsv_1004.jpg,0,Industrial,0.986578,0,0.967329,0,0.999464,0,0.999999,0,0.999802,0,0.9992079,0,0.9992079,0
9,gsv_1005.jpg,0,Residential,1.0,0,0.969467,0,1.0,0,1.0,0,1.0,0,0.9994709,0,0.9994709,0


In [147]:
df.loc[df['SUBDISTRIC'] == "Business", "pred_safety"] = df["full_pred_safety"]
#df.loc[df['SUBDISTRIC'] == "Business", "pred_safety"] = np.where((df.loc[df['SUBDISTRIC'] == "Business"]["full_0"] + df.loc[df['SUBDISTRIC'] == "Business"]["cv4_0"]) / 2 > 0.5, 0, 1)
df.loc[df['SUBDISTRIC'] == "Comm/Instit", "pred_safety"] = df["cv4_pred_safety"]

df.loc[df['SUBDISTRIC'] == "Industrial", "pred_safety"] = df["cv4_pred_safety"]
df.loc[df['SUBDISTRIC'] == "Miscellaneous", "pred_safety"] = df["cv1_pred_safety"]

df.loc[df['SUBDISTRIC'] == "Mixed Use", "pred_safety"] = df["cv4_pred_safety"]
df.loc[df['SUBDISTRIC'] == "Open Space", "pred_safety"] = df["cv1_pred_safety"]

df.loc[df['SUBDISTRIC'] == "Residential", "pred_safety"] = df["full_pred_safety"]
#df.loc[df['SUBDISTRIC'] == "Residential", "pred_safety"] = np.where((df.loc[df['SUBDISTRIC'] == "Residential"]["full_0"] + df.loc[df['SUBDISTRIC'] == "Residential"]["cv1_0"]) / 2 > 0.5, 0, 1)
df["pred_safety"] = df["pred_safety"].astype(int)

In [148]:
def matrix_performance_cond_ensemble_model2(df):
    
    y_true = df['safety']
    y_pred = df['pred_safety']
    
    #confusion matrix
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    matrix = pd.DataFrame([{'tn': tn, 'fp': fp, 'fn': fn, 'tp': tp}])
    matrix['tn_rate'] = matrix['tn'] / (matrix['tn'] + matrix['fp'])
    matrix['tp_rate'] = matrix['tp'] / (matrix['fn'] + matrix['tp'])
    

    matrix['fp_rate'] = matrix['fp'] / (matrix['tn'] + matrix['fp'])
    matrix['fn_rate'] = matrix['fn'] / (matrix['fn'] + matrix['tp'])
    
    
    #performance
    accuracy = accuracy_score(y_true, y_pred) # accuracy: (tp + tn) / (p + n)
    precision = precision_score(y_true, y_pred) # precision tp / (tp + fp)
    recall = recall_score(y_true, y_pred) # recall: tp / (tp + fn)
    f1 = f1_score(y_true, y_pred) # f1: 2 tp / (2 tp + fp + fn)
    performance = pd.DataFrame([{'Accuracy': accuracy, 'Precision': precision, 'Recall': recall, 'F1 score': f1}])
    
    return matrix, performance

In [149]:
conf_matrix_subdistrict_c1, performance_subdistrict_c1 = matrix_performance_cond_ensemble_model2(df)
display(conf_matrix_subdistrict_c1, performance_subdistrict_c1)

Unnamed: 0,fn,fp,tn,tp,tn_rate,tp_rate,fp_rate,fn_rate
0,527,614,1539,1295,0.714817,0.710757,0.285183,0.289243


Unnamed: 0,Accuracy,F1 score,Precision,Recall
0,0.712956,0.694184,0.678366,0.710757
