# Foreword & Remarks

- This work was adapted from the 1st place solution of the Jigsaw 2020 competition: https://www.kaggle.com/rafiko1/1st-place-jigsaw-post-processing-example
- Details of the first place post-processing can be found at: https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/discussion/160862. The main idea is to track the change (delta) in each sample's prediction across successful submissions, average those deltas, and then 'push' the predictions in the same direction.
- I made only minor changes so that it can be used in this competition.
- This notebook uses **only public notebook submission files**! In my own case, I use my ensemble as the "best sub" with minor tweaks, which gives a small boost on the LB.
- Since we are allowed to make 3 final submissions, feel free to use this pipeline for one of them if you are willing to take the risk.
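
To make the idea concrete, here is a minimal sketch on toy numbers. The frame `toy` and its columns `sub_old`, `sub_new` and `pushed` are hypothetical names and values chosen for illustration, not real submissions. With several submission pairs, the deltas would be averaged row-wise before the push.

```python
import numpy as np
import pandas as pd

WEIGHT = 1  # same role as WEIGHT in this notebook

# Toy predictions (hypothetical values, not real submissions)
toy = pd.DataFrame({
    "sub_old": [0.20, 0.60, 0.50],   # an earlier submission
    "sub_new": [0.30, 0.40, 0.50],   # a later submission that scored better
    "sub_best": [0.50, 0.50, 0.50],  # the submission to be adjusted
})

# Per-sample delta between the two submissions
toy["diff_avg"] = toy["sub_new"] - toy["sub_old"]

shift = WEIGHT * toy["diff_avg"]
toy["pushed"] = np.where(
    toy["diff_avg"] < 0,
    (1 + shift) * toy["sub_best"],          # negative delta: scale toward 0
    (1 - shift) * toy["sub_best"] + shift,  # positive delta: move toward 1
)
print(toy)
```

Samples whose prediction rose between the two submissions are nudged up, samples whose prediction fell are nudged down, and samples with no change are left alone.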

# Update Log

- **Version 6**: As per Chris' feedback, submission dataframes are now sorted before their predictions are concatenated to the test dataframe. Also swapped one public sub for another, as the original submission could no longer be extracted.

# Imports

In [2]:
# General imports
import numpy as np
import pandas as pd
import os
from matplotlib import pyplot as plt

In [1]:
WEIGHT = 1  # the original authors recommend keeping this between 1 and 2

In [10]:
submission = pd.read_csv('../input/jpeg-melanoma-256x256/sample_submission.csv')
test = pd.read_csv("../input/jpeg-melanoma-256x256/test.csv")
# sub_best = pd.read_csv('../input/eda-modelling-of-the-external-data-inc-ensemble/external_meta_ensembled.csv')
test

Unnamed: 0,image_name,patient_id,sex,age_approx,anatom_site_general_challenge,width,height
0,ISIC_0052060,IP_3579794,male,70.0,,6000,4000
1,ISIC_0052349,IP_7782715,male,40.0,lower extremity,6000,4000
2,ISIC_0058510,IP_7960270,female,55.0,torso,6000,4000
3,ISIC_0073313,IP_6375035,female,50.0,torso,6000,4000
4,ISIC_0073502,IP_0589375,female,45.0,lower extremity,1920,1080
...,...,...,...,...,...,...,...
10977,ISIC_9992485,IP_4152479,male,40.0,torso,640,480
10978,ISIC_9996992,IP_4890115,male,35.0,torso,2592,1936
10979,ISIC_9997917,IP_2852390,male,25.0,upper extremity,640,480
10980,ISIC_9998234,IP_8861963,male,65.0,lower extremity,6000,4000


In [11]:
path = "../exp/256-0/submission_No0_256.csv"
sub = pd.read_csv(path)
sub

Unnamed: 0,image_name,target
0,ISIC_0052060,0.000205
1,ISIC_0052349,0.000021
2,ISIC_0058510,0.000011
3,ISIC_0073313,0.000017
4,ISIC_0073502,0.002016
...,...,...
10977,ISIC_9992485,0.000649
10978,ISIC_9996992,0.006972
10979,ISIC_9997917,0.025113
10980,ISIC_9998234,0.000075


In [12]:
submission

Unnamed: 0,image_name,target
0,ISIC_0052060,0
1,ISIC_0052349,0
2,ISIC_0058510,0
3,ISIC_0073313,0
4,ISIC_0073502,0
...,...,...
10977,ISIC_9992485,0
10978,ISIC_9996992,0
10979,ISIC_9997917,0
10980,ISIC_9998234,0


In [None]:
files_sub = [
    '../input/minmax-ensemble-0-9526-lb/submission.csv',
    '../input/new-basline-np-log2-ensemble-top-10/submission.csv',
    '../input/stacking-ensemble-on-my-submissions/submission_mean.csv',
    '../input/analysis-of-melanoma-metadata-and-effnet-ensemble/ensembled.csv',
    '../input/eda-modelling-of-the-external-data-inc-ensemble/external_meta_ensembled.csv',
    '../input/submission-exploration/submission.csv',
    '../input/rc-fork-siim-isic-melanoma-384x384/sub_EfficientNetB2_384.csv',
    '../input/train-cv/submission.csv',
    '../input/triple-stratified-kfold-with-tfrecords/submission.csv',
    '../input/rank-then-blend/blend_sub.csv',
    '../input/siim-isic-melanoma-classification-ensemble/submission.csv'
]
files_sub = sorted(files_sub)
print(len(files_sub))
files_sub

In [None]:
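for file in files_sub:
    # Sort by image_name and reset the index so values are assigned in sorted order;
    # this assumes test itself is sorted by image_name.
    test[file.replace(".csv", "")] = pd.read_csv(file).sort_values('image_name').reset_index(drop=True)["target"]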
test['id'] = test.index
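
pandas aligns a Series on its index when it is assigned to a DataFrame column, so row order alone does not decide where each prediction lands. The sketch below uses toy data (the frames `test_toy` and `sub_toy` are hypothetical) to compare assigning a sorted Series with a lookup keyed on `image_name`, which does not depend on row order at all.

```python
import pandas as pd

# Toy frames (hypothetical data): the submission rows are in a different order than test
test_toy = pd.DataFrame({"image_name": ["A", "B", "C"]})
sub_toy = pd.DataFrame({"image_name": ["C", "A", "B"], "target": [0.3, 0.1, 0.2]})

# Assigning a sorted Series still aligns on the original index labels,
# so the sort has no effect on which row each value lands in.
test_toy["by_index"] = sub_toy.sort_values("image_name")["target"]

# Looking each image_name up in the submission aligns rows by key instead.
test_toy["by_name"] = test_toy["image_name"].map(sub_toy.set_index("image_name")["target"])
print(test_toy)
```

Only the `by_name` column pairs every image with its own prediction in this toy case. The lookup assumes each `image_name` appears once per submission file.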

In [None]:
test.head()

In [None]:
test.columns

In [None]:
# Per-sample prediction deltas between pairs of submissions; the good/bad names record whether the score increased or decreased
test["diff_good1"] = test['../input/rank-then-blend/blend_sub'] - test['../input/triple-stratified-kfold-with-tfrecords/submission']
test["diff_good2"] = test['../input/train-cv/submission'] - test['../input/siim-isic-melanoma-classification-ensemble/submission']
test["diff_good3"] = test['../input/rc-fork-siim-isic-melanoma-384x384/sub_EfficientNetB2_384'] - test['../input/submission-exploration/submission']
test["diff_good4"] = test['../input/analysis-of-melanoma-metadata-and-effnet-ensemble/ensembled'] - test['../input/new-basline-np-log2-ensemble-top-10/submission']

test["diff_bad1"] = test['../input/stacking-ensemble-on-my-submissions/submission_mean'] - test['../input/minmax-ensemble-0-9526-lb/submission']

In [None]:
test["sub_best"] = test['../input/eda-modelling-of-the-external-data-inc-ensemble/external_meta_ensembled']
col_comment = ["id", "image_name", "patient_id", "sub_best"]
col_diff = [column for column in test.columns if "diff" in column]
test_diff = test[col_comment + col_diff].reset_index(drop=True)

test_diff["diff_avg"] = test_diff[col_diff].mean(axis=1) # the mean trend

In [None]:
# Apply the post-processing technique (as explained in the pseudo-code of the 1st place author's post):
# a negative mean delta scales the prediction toward 0, a positive one moves it toward 1.
shift = WEIGHT * test_diff["diff_avg"]
test_diff["sub_new"] = np.where(test_diff["diff_avg"] < 0, (1 + shift) * test_diff["sub_best"], (1 - shift) * test_diff["sub_best"] + shift)
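
Written as a formula, the update in the cell above reads as follows. The symbols are shorthand introduced here, not taken from the original post: $p_{\text{best}}$ is `sub_best`, $\bar d$ is `diff_avg`, and $w$ is `WEIGHT`.

$$
p_{\text{new}} =
\begin{cases}
(1 + w\,\bar d)\,p_{\text{best}}, & \bar d < 0,\\
(1 - w\,\bar d)\,p_{\text{best}} + w\,\bar d, & \bar d \ge 0.
\end{cases}
$$

The second branch can be rewritten as $p_{\text{best}} + w\,\bar d\,(1 - p_{\text{best}})$, so a positive delta closes a fraction $w\,\bar d$ of the gap to 1, while a negative delta shrinks the prediction by the fraction $w\,|\bar d|$. Provided $p_{\text{best}} \in [0, 1]$ and $|w\,\bar d| \le 1$, the result stays in $[0, 1]$.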

In [None]:
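# Start from the best submission's predictions (loaded here because the earlier read is commented out)
sub_best = pd.read_csv('../input/eda-modelling-of-the-external-data-inc-ensemble/external_meta_ensembled.csv')
submission["target"] = sub_best["target"]
submission.head()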

In [None]:
test_diff.head()

In [None]:
submission.loc[test["id"], "target"] = test_diff["sub_new"].values

In [None]:
submission.to_csv("submission.csv", index=False)
submission.head()

In [None]:
plt.hist(submission.target,bins=100)
plt.show()