# Objective

### Disclaimer: This notebook is not intended to achieve a high LB score or show an awesome blend, but rather to show how blending can be done.

Here, I blend multiple public submissions with 3 personal submissions (which you can replace with yours) to improve the LB score. My submissions scored between 0.87 and 0.955 on the LB. If you start from better LB scores, you can likely get better results than mine. Note that blending is the final step of your classification pipeline: develop your model first, then polish the results with this technique.

Why does this approach work? Imagine one of your models correctly recognizes an image as a melanoma and gives it a high score (e.g. 0.89), while the other models fail to identify it and score it low (e.g. 0.3). Averaging the scores can then, depending on the weights used, push the blended score above the decision threshold so that the image is classified correctly. This is admittedly a hand-waving argument!
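To make the argument concrete, here is a minimal sketch with made-up numbers: one model scores a melanoma image 0.89 and two others score it 0.3. The 0.5 threshold and both sets of weights are illustrative assumptions, not values used in this notebook.

```python
import numpy as np

# Hypothetical scores for one melanoma image from three models
scores = np.array([0.89, 0.30, 0.30])

# Equal weights: the two weaker models pull the average down
equal = np.average(scores, weights=[1 / 3, 1 / 3, 1 / 3])

# More weight on the model we trust most (assumed weights, summing to 1)
weighted = np.average(scores, weights=[0.6, 0.2, 0.2])

threshold = 0.5  # assumed decision threshold, for illustration only
print(f"equal weights:   {equal:.3f} -> hit: {equal > threshold}")
print(f"trusted-model weights: {weighted:.3f} -> hit: {weighted > threshold}")
```

With equal weights the blended score stays just below the threshold, while weighting the stronger model more heavily lifts it above. This is why the choice of weights matters.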

In [None]:
!pip install xgboost

import numpy as np
import pandas as pd

import xgboost as xgb

In [None]:
train = pd.read_csv('../input/siim-isic-melanoma-classification/train.csv')
test = pd.read_csv('../input/siim-isic-melanoma-classification/test.csv')
train.head()

train.target.value_counts()

In [None]:
# Fill missing values the same way in train and test
cat_cols = ['sex', 'anatom_site_general_challenge']
for df in (train, test):
    df[cat_cols] = df[cat_cols].fillna('na')
    df['age_approx'] = df['age_approx'].fillna(0)

# Encode categoricals with one category list shared by train and test,
# so that the same value always maps to the same integer code
for col in cat_cols:
    categories = sorted(set(train[col]) | set(test[col]))
    train[col] = pd.Categorical(train[col], categories=categories).codes + 1
    test[col] = pd.Categorical(test[col], categories=categories).codes + 1

train.head()

In [None]:
test.head()

In [None]:
x_train = train[['sex', 'age_approx','anatom_site_general_challenge']]
y_train = train['target']

x_test = test[['sex', 'age_approx','anatom_site_general_challenge']]

train_DMatrix = xgb.DMatrix(x_train, label= y_train)
test_DMatrix = xgb.DMatrix(x_test)

In [None]:
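clf = xgb.XGBClassifier(n_estimators=3000,
                        max_depth=18,
                        learning_rate=0.15,
                        # binary target; scale_pos_weight is intended for binary objectives
                        objective='binary:logistic',
                        random_state=0,
                        n_jobs=-1,
                        # up-weight the positive class: 32542 negatives / 584 positives
                        scale_pos_weight=(32542. / 584.))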

clf.fit(x_train, y_train)
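The ratio 32542/584 passed as `scale_pos_weight` is the number of negative training examples divided by the number of positive ones. As a minimal sketch, the same ratio can be derived from the labels instead of being hard-coded. The toy `y` series below is a stand-in for `y_train`, and the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for y_train: 8 negatives and 2 positives
y = pd.Series([0] * 8 + [1] * 2)

# Count each class, then divide negatives by positives
counts = y.value_counts()
scale_pos_weight = counts[0] / counts[1]
print(scale_pos_weight)
```

Deriving the value from the data keeps it correct if the training set is filtered or extended later.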

In [None]:
sub_xgb = pd.read_csv('../input/siim-isic-melanoma-classification/sample_submission.csv')
# Probability of the positive class; assumes sample_submission.csv and test.csv share the same row order
sub_xgb.target = clf.predict_proba(x_test)[:, 1]

In [None]:
sub_new = pd.read_csv('../input/siimisicmysubmissions/sub-new.csv')
sub_mean = pd.read_csv('../input/siimisicmysubmissions/sub-mean.csv')

submission = pd.read_csv('../input/siim-isic-melanoma-classification/sample_submission.csv')
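# Weighted average of the three submissions; the weights sum to 1.
# This relies on all files listing the images in the same row order.
submission.target = sub_mean.target * 0.71 + sub_new.target * 0.15 + sub_xgb.target * 0.14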
submission.head()
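The blend above combines files by row position. If the files might list images in a different order, a safer variant aligns them on the ID column first. The sketch below uses toy data and assumes each submission has `image_name` and `target` columns; the weights 0.7 and 0.3 are arbitrary.

```python
import pandas as pd

# Toy stand-ins for two submission files; rows are deliberately in a different order
sub_a = pd.DataFrame({"image_name": ["ISIC_1", "ISIC_2", "ISIC_3"],
                      "target": [0.9, 0.2, 0.4]})
sub_b = pd.DataFrame({"image_name": ["ISIC_3", "ISIC_1", "ISIC_2"],
                      "target": [0.6, 0.7, 0.1]})

# Align on the ID column instead of relying on row position
merged = sub_a.merge(sub_b, on="image_name", suffixes=("_a", "_b"),
                     validate="one_to_one")
assert len(merged) == len(sub_a), "some IDs are missing from one of the files"

# Weighted average of the aligned predictions (assumed weights, summing to 1)
blend = merged[["image_name"]].copy()
blend["target"] = 0.7 * merged["target_a"] + 0.3 * merged["target_b"]
print(blend)
```

The `validate="one_to_one"` argument makes the merge fail loudly if an ID is duplicated in either file.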

In [None]:
submission.to_csv('submission.csv', index = False)

### Just blend (or rank then blend, see https://www.kaggle.com/ragnar123/rank-then-blend) your submissions and enjoy!
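As a rough illustration of the "rank then blend" idea, the sketch below converts each model's scores to percentile ranks before averaging. This puts models with very different score scales on an equal footing, which suits metrics that depend only on the ordering of predictions, such as AUC. It is not taken from the linked notebook; the data, column names and weights are made up.

```python
import pandas as pd

# Toy predictions from two hypothetical models on very different scales
preds = pd.DataFrame({
    "model_a": [0.02, 0.10, 0.05, 0.90],
    "model_b": [0.40, 0.45, 0.55, 0.60],
})
weights = {"model_a": 0.5, "model_b": 0.5}  # assumed weights, summing to 1

# Replace each score by its percentile rank in (0, 1], column by column
ranks = preds.rank(pct=True)

# Weighted average of the ranks
rank_blend = sum(w * ranks[name] for name, w in weights.items())
print(rank_blend.tolist())
```

In a plain average, `model_a`'s single 0.90 score would dominate. After ranking, both models contribute equally to the final ordering.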