## Problem Statement

* Congratulations! You have been hired as Chief Data Scientist of MedCamp, a not-for-profit organization **dedicated to making health conditions better for working professionals**. MedCamp was started because its founders saw their family suffer due to poor work-life balance and neglected health.

* MedCamp organizes health camps in several cities with poor work-life balance. They reach out to working people and ask them to register for these health camps. For those who attend, **MedCamp provides the facility to undergo health checks or increase awareness by visiting various stalls** (depending on the format of the camp).

* MedCamp has conducted 65 such events over a period of 4 years, and they see a **high drop-off between “Registration” and the number of people taking tests at the camps**. Over the last 4 years, they have stored data on ~110,000 registrations.

* One of the biggest costs in arranging these camps is the amount of inventory you need to carry. If you carry more inventory than required, you incur unnecessarily high costs. On the other hand, if you carry less inventory than required for conducting these medical checks, people end up having a bad experience.

**The Process**:

* MedCamp employees / volunteers reach out to people and drive registrations.
  During the camp, people who “ShowUp” either undergo the medical tests or visit stalls, depending on the format of the health camp.

**Other things to note**:

* Since this is a completely voluntary activity for the working professionals, MedCamp usually has little profile information about these people.

* For a few camps, there was a hardware failure, so some information about the date and time of registration is lost.

* MedCamp runs 3 formats of these camps. The first and second formats provide people with an instantaneous health score. The third format provides information about several health issues through various awareness stalls.

**Favorable outcome**:

* **For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least one stall**.

* **You need to predict the chances (probability) of having a favourable outcome**.

## Import Libraries

In [3]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

## Load dataset

In [6]:
train = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\Train.csv")
display(train.head())
train.shape

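|   | Patient_ID | Health_Camp_ID | Registration_Date | Var1 | Var2 | Var3 | Var4 | Var5 |
|---|---|---|---|---|---|---|---|---|
| 0 | 489652 | 6578 | 10-Sep-05 | 4 | 0 | 0 | 0 | 2 |
| 1 | 507246 | 6578 | 18-Aug-05 | 45 | 5 | 0 | 0 | 7 |
| 2 | 523729 | 6534 | 29-Apr-06 | 0 | 0 | 0 | 0 | 0 |
| 3 | 524931 | 6535 | 07-Feb-04 | 0 | 0 | 0 | 0 | 0 |
| 4 | 521364 | 6529 | 28-Feb-06 | 15 | 1 | 0 | 0 | 7 |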


(75278, 8)

In [5]:
test = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\test_l0Auv8Q.csv")
display(test.head())
test.shape

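|   | Patient_ID | Health_Camp_ID | Registration_Date | Var1 | Var2 | Var3 | Var4 | Var5 |
|---|---|---|---|---|---|---|---|---|
| 0 | 505701 | 6548 | 21-May-06 | 1 | 0 | 0 | 0 | 2 |
| 1 | 500633 | 6584 | 02-Jun-06 | 0 | 0 | 0 | 0 | 0 |
| 2 | 506945 | 6582 | 10-Aug-06 | 0 | 0 | 0 | 0 | 0 |
| 3 | 497447 | 6551 | 27-Aug-06 | 0 | 0 | 0 | 0 | 0 |
| 4 | 496446 | 6533 | 19-Sep-06 | 0 | 0 | 0 | 0 | 0 |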


(35249, 8)

In [4]:
submission = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\sample_submmission.csv")
display(submission.head(2))
submission.shape

|   | Patient_ID | Health_Camp_ID | Outcome |
|---|---|---|---|
| 0 | 505701 | 6548 | 0.5 |
| 1 | 500633 | 6584 | 0.5 |


(35249, 3)

## Import the other data files

In [20]:
fhc = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\First_Health_Camp_Attended.csv")
print('fhc (first_health_camp):', fhc.shape)
shc = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\Second_Health_Camp_Attended.csv")
print('shc (second_health_camp):', shc.shape)
thc = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\Third_Health_Camp_Attended.csv")
print('thc (third_health_camp):', thc.shape)
hc = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\Health_Camp_Detail.csv")
print('hc (health_camps):', hc.shape)
pp = pd.read_csv(r"D:\case_studies(eda)\Janatahack_healthcare_analysis\Patient_Profile.csv")
print('pp (patient_profile):', pp.shape)

fhc (first_health_camp): (6218, 5)
shc (second_health_camp): (7819, 3)
thc (third_health_camp): (6515, 4)
hc (health_camps): (65, 6)
pp (patient_profile): (37633, 11)


In [11]:
fhc.columns

Index(['Patient_ID', 'Health_Camp_ID', 'Donation', 'Health_Score',
       'Unnamed: 4'],
      dtype='object')

In [12]:
shc.columns

Index(['Patient_ID', 'Health_Camp_ID', 'Health Score'], dtype='object')

In [13]:
thc.columns

Index(['Patient_ID', 'Health_Camp_ID', 'Number_of_stall_visited',
       'Last_Stall_Visited_Number'],
      dtype='object')

In [14]:
hc.columns

Index(['Health_Camp_ID', 'Camp_Start_Date', 'Camp_End_Date', 'Category1',
       'Category2', 'Category3'],
      dtype='object')

In [15]:
pp.columns

Index(['Patient_ID', 'Online_Follower', 'LinkedIn_Shared', 'Twitter_Shared',
       'Facebook_Shared', 'Income', 'Education_Score', 'Age',
       'First_Interaction', 'City_Type', 'Employer_Category'],
      dtype='object')

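## Inferences
* Patient_ID is present in every file except Health_Camp_Detail (`hc`), which is keyed by Health_Camp_ID; these two IDs are the keys used to join the files below.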

## Let's combine train and test

In [16]:
combined = pd.concat([train, test], ignore_index= True)

In [17]:
combined.shape, train.shape, test.shape

((110527, 8), (75278, 8), (35249, 8))

## Let's join patient profile to the combined dataset

In [19]:
combined = pd.merge(combined, pp, on = ['Patient_ID'], how = 'left')

## Merge the attendance records of the three camp formats with the dataset

In [21]:
combined = pd.merge(combined, fhc, on = ['Patient_ID', 'Health_Camp_ID'], how = 'left')

In [22]:
combined = pd.merge(combined, shc, on = ['Patient_ID', 'Health_Camp_ID'], how = 'left')

In [23]:
combined = pd.merge(combined, thc, on = ['Patient_ID', 'Health_Camp_ID'], how = 'left')

## Combine the health camp details with the dataset

In [25]:
combined = pd.merge(combined, hc, on = ['Health_Camp_ID'], how = 'left')

In [26]:
combined.shape, fhc.shape, shc.shape, thc.shape, hc.shape

((110527, 29), (6218, 5), (7819, 3), (6515, 4), (65, 6))
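Every merge above is a left join, so the row count should stay at 110,527, which the shape check confirms. A minimal sketch of a stricter guard, on hypothetical toy frames, uses two arguments of `pd.merge`: `validate="many_to_one"` raises `pandas.errors.MergeError` if the right-hand key is duplicated (which would silently multiply registration rows), and `indicator=True` adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

# Hypothetical stand-ins for the registration table and the patient profile
registrations = pd.DataFrame({
    "Patient_ID": [1, 1, 2, 3],
    "Health_Camp_ID": [10, 11, 10, 12],
})
profile = pd.DataFrame({"Patient_ID": [1, 2, 3], "Age": [41, 55, None]})

# Passes: each Patient_ID appears only once in `profile`
merged = pd.merge(registrations, profile, on="Patient_ID", how="left",
                  validate="many_to_one", indicator=True)
print(len(merged), merged["_merge"].value_counts().to_dict())

# Fails: a duplicated profile row would otherwise duplicate registrations
dup_profile = pd.concat([profile, profile.iloc[[0]]], ignore_index=True)
try:
    pd.merge(registrations, dup_profile, on="Patient_ID", how="left",
             validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print("duplicate key detected:", duplicate_detected)
```

The same `validate` argument could be added to each merge in this notebook; it costs one uniqueness check per join.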

# Let's preview the data

In [27]:
combined.head()

Unnamed: 0,Patient_ID,Health_Camp_ID,Registration_Date,Var1,Var2,Var3,Var4,Var5,Online_Follower,LinkedIn_Shared,...,Health_Score,Unnamed: 4,Health Score,Number_of_stall_visited,Last_Stall_Visited_Number,Camp_Start_Date,Camp_End_Date,Category1,Category2,Category3
0,489652,6578,10-Sep-05,4,0,0,0,2,0,0,...,,,,2.0,1.0,16-Aug-05,14-Oct-05,Third,G,2
1,507246,6578,18-Aug-05,45,5,0,0,7,0,0,...,,,,,,16-Aug-05,14-Oct-05,Third,G,2
2,523729,6534,29-Apr-06,0,0,0,0,0,0,0,...,,,0.402054,,,17-Oct-05,07-Nov-07,Second,A,2
3,524931,6535,07-Feb-04,0,0,0,0,0,0,0,...,,,,,,01-Feb-04,18-Feb-04,First,E,2
4,521364,6529,28-Feb-06,15,1,0,0,7,0,0,...,,,0.845597,,,30-Mar-06,03-Apr-06,Second,A,2


# Feature Engineering

### Summing the online-follower and social-media-sharing flags into a 'Social_Media' column

In [28]:
combined['Social_Media'] = combined.Online_Follower+\
combined.LinkedIn_Shared+\
combined.Twitter_Shared+ combined.Facebook_Shared

In [29]:
combined.Social_Media.nunique()

5

### Registration date
* Registration_Date, First_Interaction, Camp_Start_Date and Camp_End_Date are stored as strings (dtype object), so we convert them to datetime.

In [30]:
combined['Registration_Date'] = pd.to_datetime(combined.Registration_Date, dayfirst = True)
combined['First_Interaction'] = pd.to_datetime(combined.First_Interaction, dayfirst = True)
combined['Camp_Start_Date'] = pd.to_datetime(combined.Camp_Start_Date, dayfirst = True)
combined['Camp_End_Date'] = pd.to_datetime(combined.Camp_End_Date, dayfirst = True)
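The raw dates look like `10-Sep-05` (day, abbreviated month name, two-digit year). Instead of letting pandas infer the layout with `dayfirst = True`, one option is to pass the format string explicitly, so that any value that does not match raises an error instead of being guessed. A minimal sketch, assuming all four date columns share this layout and use English month abbreviations:

```python
import pandas as pd

# Same layout as the raw CSV dates; None stands in for a lost registration date
raw = pd.Series(["10-Sep-05", "18-Aug-05", None])

# %d = day of month, %b = abbreviated month name, %y = two-digit year
# Missing values become NaT
parsed = pd.to_datetime(raw, format="%d-%b-%y")
print(parsed)
```

In the notebook this would read, for example, `pd.to_datetime(combined.Registration_Date, format = '%d-%b-%y')`.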

### Iteraction_Days: days between First_Interaction and Registration_Date

In [31]:
combined['Iteraction_Days'] = combined['Registration_Date']-\
combined['First_Interaction']

### Convert the timedelta to an integer number of days

In [32]:
combined['Iteraction_Days'] = combined.Iteraction_Days.dt.days

### Camp_Duration: length of the camp in days

In [33]:
combined['Camp_Duration'] = (combined['Camp_End_Date'] - combined['Camp_Start_Date']).dt.days

### magic1: absolute gap in days between Camp_Start_Date and Registration_Date

In [35]:
combined['magic1'] = np.abs((combined['Camp_Start_Date'] - combined['Registration_Date']).dt.days)

### magic2: absolute gap in days between Camp_End_Date and Registration_Date

In [36]:
combined['magic2'] = np.abs((combined['Camp_End_Date'] - combined['Registration_Date']).dt.days)

### Patient_Duration: absolute gap in days between First_Interaction and Camp_End_Date

In [37]:
combined['Patient_Duration'] = np.abs((combined['Camp_End_Date'] - combined['First_Interaction']).dt.days)
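Each gap above is a datetime subtraction followed by `.dt.days`. A small sketch on hypothetical rows shows that a missing (NaT) Registration_Date propagates as NaN through these features. This is consistent with Iteraction_Days, magic1 and magic2 sharing the same missing count later on, while Camp_Duration and Patient_Duration, which do not use the registration date, have none.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second registration date was lost (NaT)
df = pd.DataFrame({
    "Registration_Date": pd.to_datetime(["2006-02-28", None]),
    "Camp_Start_Date": pd.to_datetime(["2006-03-30", "2005-08-16"]),
    "Camp_End_Date": pd.to_datetime(["2006-04-03", "2005-10-14"]),
})

# Datetime subtraction gives timedeltas; .dt.days turns them into whole days,
# and a NaT operand comes through as NaN
df["Camp_Duration"] = (df["Camp_End_Date"] - df["Camp_Start_Date"]).dt.days
df["magic1"] = np.abs((df["Camp_Start_Date"] - df["Registration_Date"]).dt.days)
print(df[["Camp_Duration", "magic1"]])
```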

## Our understanding of the health camp process

### * First_Interaction >> Registration >> Camp is organised >> Person comes >> Health score or stall visit

#### * Expected chronological order: First Interaction Date >> Registration Date >> Camp Date
#### * dates_seq (registered before the camp started) = 1 when Camp_End_Date > Camp_Start_Date > Registration_Date, else 0

In [41]:
def dates_between(start, reg, end):
    if(end>start>reg):
        return 1
    else:
        return 0
    

### Apply dates_between to each row to create the 'dates_seq' column

In [43]:
combined['dates_seq'] = combined.apply(lambda x: dates_between(x['Camp_Start_Date'], x['Registration_Date'], x['Camp_End_Date']), axis = 1)
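`DataFrame.apply` with `axis = 1` calls the Python function once per row, which is slow on ~110k rows. A vectorized sketch of the same rule, on hypothetical rows, compares whole columns at once. Comparisons against NaT evaluate to False, so, as with `dates_between`, a lost registration date yields 0.

```python
import pandas as pd

# Hypothetical rows: registered early, registered mid-camp, registration date lost
df = pd.DataFrame({
    "Registration_Date": pd.to_datetime(["2006-02-28", "2005-09-10", None]),
    "Camp_Start_Date": pd.to_datetime(["2006-03-30", "2005-08-16", "2005-08-16"]),
    "Camp_End_Date": pd.to_datetime(["2006-04-03", "2005-10-14", "2005-10-14"]),
})

# Same rule as dates_between(): 1 only when end > start > registration
df["dates_seq"] = (
    (df["Camp_End_Date"] > df["Camp_Start_Date"])
    & (df["Camp_Start_Date"] > df["Registration_Date"])
).astype(int)
print(df["dates_seq"].tolist())  # [1, 0, 0]
```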

### Create registration day, month and year columns

In [50]:
combined['Registration_Days'] = combined.Registration_Date.dt.day
combined['Registration_Month'] = combined.Registration_Date.dt.month
combined['Registration_Year'] = combined.Registration_Date.dt.year

### Camp start year, camp end year and first-interaction year

In [52]:
combined['Camp_Start_Year'] = combined.Camp_Start_Date.dt.year
combined['Camp_End_Year'] = combined.Camp_End_Date.dt.year
combined['First_Int_Year'] = combined.First_Interaction.dt.year

In [53]:
combined.head()

Unnamed: 0,Patient_ID,Health_Camp_ID,Registration_Date,Var1,Var2,Var3,Var4,Var5,Online_Follower,LinkedIn_Shared,...,magic1,magic2,Patient_Duration,dates_seq,Registration_Days,Registration_Month,Registration_Year,Camp_Start_Year,Camp_End_Year,First_Int_Year
0,489652,6578,2005-09-10,4,0,0,0,2,0,0,...,25.0,34.0,312,0,10.0,9.0,2005.0,2005,2005,2004
1,507246,6578,2005-08-18,45,5,0,0,7,0,0,...,2.0,57.0,401,0,18.0,8.0,2005.0,2005,2005,2004
2,523729,6534,2006-04-29,0,0,0,0,0,0,0,...,194.0,557.0,1233,0,29.0,4.0,2006.0,2005,2007,2004
3,524931,6535,2004-02-07,0,0,0,0,0,0,0,...,6.0,11.0,11,0,7.0,2.0,2004.0,2004,2004,2004
4,521364,6529,2006-02-28,15,1,0,0,7,0,0,...,30.0,34.0,1004,1,28.0,2.0,2006.0,2006,2006,2003


### Unique patients per registration day of the month (pooled across all months and years)

In [57]:
combined['Patients_Per_Day'] = combined.groupby('Registration_Days')['Patient_ID'].transform('nunique')

### Unique patients per registration month (pooled across all years)

In [58]:
combined['Patients_Per_Month'] = combined.groupby('Registration_Month')['Patient_ID'].transform('nunique')

### Unique patients per registration year

In [59]:
combined['Patients_Per_Year'] = combined.groupby('Registration_Year')['Patient_ID'].transform('nunique')

### Patient frequency: number of distinct registration days, months and years per patient (a proxy for how often the same patient registers)

In [61]:
combined['Patient_Frequency_Per_Day'] = combined.groupby('Patient_ID')['Registration_Days'].transform('nunique')
combined['Patient_Frequency_Per_Month'] = combined.groupby('Patient_ID')['Registration_Month'].transform('nunique')
combined['Patient_Frequency_Per_Year'] = combined.groupby('Patient_ID')['Registration_Year'].transform('nunique')
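The `nunique` features count distinct calendar values, so two registrations by the same patient in the same month count once. A sketch on hypothetical rows contrasting it with a plain per-patient registration count (`Registrations_Per_Patient` is a hypothetical extra feature, not one built in this notebook):

```python
import pandas as pd

# Hypothetical registrations: patient 1 registers three times, twice in month 9
reg = pd.DataFrame({
    "Patient_ID": [1, 1, 1, 2],
    "Registration_Month": [9, 9, 10, 4],
})

# transform() broadcasts each group's result back onto every row of the group
reg["Patient_Frequency_Per_Month"] = (
    reg.groupby("Patient_ID")["Registration_Month"].transform("nunique")
)
reg["Registrations_Per_Patient"] = (
    reg.groupby("Patient_ID")["Registration_Month"].transform("size")
)
print(reg)
```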

### Number of unique patients registered for each health camp

In [63]:
combined['Patient_Freq_HC'] = combined.groupby('Health_Camp_ID')['Patient_ID'].transform('nunique')

### Number of health camps ending in each year

In [88]:
combined['Health_Camps_Year'] = combined.groupby('Camp_End_Year')['Health_Camp_ID'].transform('nunique')

In [91]:
pd.set_option('display.max_columns', 52)
display(combined.head())
combined.shape

Unnamed: 0,Patient_ID,Health_Camp_ID,Registration_Date,Var1,Var2,Var3,Var4,Var5,Online_Follower,LinkedIn_Shared,Twitter_Shared,Facebook_Shared,Income,Education_Score,Age,First_Interaction,City_Type,Employer_Category,Donation,Health_Score,Unnamed: 4,Health Score,Number_of_stall_visited,Last_Stall_Visited_Number,Camp_Start_Date,Camp_End_Date,Category1,Category2,Category3,Social_Media,Iteraction_Days,Camp_Duration,magic1,magic2,Patient_Duration,dates_seq,Registration_Days,Registration_Month,Registration_Year,Camp_Start_Year,Camp_End_Year,First_Int_Year,Patients_Per_Day,Patients_Per_Month,Patients_Per_Year,Patient_Frequency_Per_Day,Patient_Frequency_Per_Month,Patient_Frequency_Per_Year,Patient_Freq_HC,Health_Camps_Year
0,489652,6578,2005-09-10,4,0,0,0,2,0,0,0,0,,,,2004-12-06,,,,,,,2.0,1.0,2005-08-16,2005-10-14,Third,G,2,0,278.0,59,25.0,34.0,312,0,10.0,9.0,2005.0,2005,2005,2004,2649.0,6585.0,15710.0,9,7,3,2837,19
1,507246,6578,2005-08-18,45,5,0,0,7,0,0,0,0,1.0,75.0,40.0,2004-09-08,C,Others,,,,,,,2005-08-16,2005-10-14,Third,G,2,0,344.0,59,2.0,57.0,401,0,18.0,8.0,2005.0,2005,2005,2004,3620.0,5804.0,15710.0,16,12,4,2837,19
2,523729,6534,2006-04-29,0,0,0,0,0,0,0,0,0,,,,2004-06-22,,,,,,0.402054,,,2005-10-17,2007-11-07,Second,A,2,0,676.0,751,194.0,557.0,1233,0,29.0,4.0,2006.0,2005,2007,2004,2518.0,4785.0,19318.0,5,4,2,3597,9
3,524931,6535,2004-02-07,0,0,0,0,0,0,0,0,0,,,,2004-02-07,I,,,,,,,,2004-02-01,2004-02-18,First,E,2,0,0.0,17,6.0,11.0,11,0,7.0,2.0,2004.0,2004,2004,2004,2363.0,5029.0,9646.0,4,4,3,1882,14
4,521364,6529,2006-02-28,15,1,0,0,7,0,0,0,1,1.0,70.0,40.0,2003-07-04,I,Technology,,,,0.845597,,,2006-03-30,2006-04-03,Second,A,2,1,970.0,4,30.0,34.0,1004,1,28.0,2.0,2006.0,2006,2006,2003,3012.0,5029.0,19318.0,17,9,4,3823,18


(110527, 50)

## Deciding our target variable
### Target = 1 if Health_Score > 0, Health Score > 0, Number_of_stall_visited > 0 or Last_Stall_Visited_Number > 0
#### Return 1, else 0 (missing values compare as False, so non-attendees get 0)

In [93]:
def tgt(hs, hs_, stall_visit, stall_no):
    if((hs>0) or (hs_>0) or (stall_visit>0) or (stall_no>0)):
        return(1)
    else:
        return(0)

In [95]:
combined.columns

Index(['Patient_ID', 'Health_Camp_ID', 'Registration_Date', 'Var1', 'Var2',
       'Var3', 'Var4', 'Var5', 'Online_Follower', 'LinkedIn_Shared',
       'Twitter_Shared', 'Facebook_Shared', 'Income', 'Education_Score', 'Age',
       'First_Interaction', 'City_Type', 'Employer_Category', 'Donation',
       'Health_Score', 'Unnamed: 4', 'Health Score', 'Number_of_stall_visited',
       'Last_Stall_Visited_Number', 'Camp_Start_Date', 'Camp_End_Date',
       'Category1', 'Category2', 'Category3', 'Social_Media',
       'Iteraction_Days', 'Camp_Duration', 'magic1', 'magic2',
       'Patient_Duration', 'dates_seq', 'Registration_Days',
       'Registration_Month', 'Registration_Year', 'Camp_Start_Year',
       'Camp_End_Year', 'First_Int_Year', 'Patients_Per_Day',
       'Patients_Per_Month', 'Patients_Per_Year', 'Patient_Frequency_Per_Day',
       'Patient_Frequency_Per_Month', 'Patient_Frequency_Per_Year',
       'Patient_Freq_HC', 'Health_Camps_Year'],
      dtype='object')

In [98]:
combined['Target'] = combined.apply(lambda x : tgt(x['Health_Score'], x['Health Score'], x['Number_of_stall_visited'], 
                              x['Last_Stall_Visited_Number']) , axis = 1)
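`tgt` is applied row by row. Because `NaN > 0` evaluates to False, the same rule can be expressed column-wise with `(... > 0).any(axis=1)`. A sketch on hypothetical rows covering each camp format plus a non-attendee:

```python
import numpy as np
import pandas as pd

outcome_cols = ["Health_Score", "Health Score",
                "Number_of_stall_visited", "Last_Stall_Visited_Number"]

# Hypothetical rows: format-1 score, format-2 score, stall visit, no attendance
df = pd.DataFrame({
    "Health_Score": [0.7, np.nan, np.nan, np.nan],
    "Health Score": [np.nan, 0.4, np.nan, np.nan],
    "Number_of_stall_visited": [np.nan, np.nan, 2.0, np.nan],
    "Last_Stall_Visited_Number": [np.nan, np.nan, 1.0, np.nan],
})

# Reproduces tgt(): 1 if any outcome column is strictly positive, else 0
df["Target"] = (df[outcome_cols] > 0).any(axis=1).astype(int)
print(df["Target"].tolist())  # [1, 1, 1, 0]
```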

#### checking the count of 'Target'

In [99]:
combined.Target.value_counts()

0    89993
1    20534
Name: Target, dtype: int64

### Dropping unnecessary columns

In [100]:
combined.columns

Index(['Patient_ID', 'Health_Camp_ID', 'Registration_Date', 'Var1', 'Var2',
       'Var3', 'Var4', 'Var5', 'Online_Follower', 'LinkedIn_Shared',
       'Twitter_Shared', 'Facebook_Shared', 'Income', 'Education_Score', 'Age',
       'First_Interaction', 'City_Type', 'Employer_Category', 'Donation',
       'Health_Score', 'Unnamed: 4', 'Health Score', 'Number_of_stall_visited',
       'Last_Stall_Visited_Number', 'Camp_Start_Date', 'Camp_End_Date',
       'Category1', 'Category2', 'Category3', 'Social_Media',
       'Iteraction_Days', 'Camp_Duration', 'magic1', 'magic2',
       'Patient_Duration', 'dates_seq', 'Registration_Days',
       'Registration_Month', 'Registration_Year', 'Camp_Start_Year',
       'Camp_End_Year', 'First_Int_Year', 'Patients_Per_Day',
       'Patients_Per_Month', 'Patients_Per_Year', 'Patient_Frequency_Per_Day',
       'Patient_Frequency_Per_Month', 'Patient_Frequency_Per_Year',
       'Patient_Freq_HC', 'Health_Camps_Year', 'Target'],
      dtype='object')

In [101]:
newdata = combined.drop(['Patient_ID', 'Health_Camp_ID', 'Registration_Date', 'Online_Follower', 'LinkedIn_Shared',
                          'Twitter_Shared', 'Facebook_Shared', 'First_Interaction','Unnamed: 4',
                           'Camp_Start_Date', 'Camp_End_Date'], axis = 1)

In [103]:
newdata.head()

Unnamed: 0,Var1,Var2,Var3,Var4,Var5,Income,Education_Score,Age,City_Type,Employer_Category,Donation,Health_Score,Health Score,Number_of_stall_visited,Last_Stall_Visited_Number,Category1,Category2,Category3,Social_Media,Iteraction_Days,Camp_Duration,magic1,magic2,Patient_Duration,dates_seq,Registration_Days,Registration_Month,Registration_Year,Camp_Start_Year,Camp_End_Year,First_Int_Year,Patients_Per_Day,Patients_Per_Month,Patients_Per_Year,Patient_Frequency_Per_Day,Patient_Frequency_Per_Month,Patient_Frequency_Per_Year,Patient_Freq_HC,Health_Camps_Year,Target
0,4,0,0,0,2,,,,,,,,,2.0,1.0,Third,G,2,0,278.0,59,25.0,34.0,312,0,10.0,9.0,2005.0,2005,2005,2004,2649.0,6585.0,15710.0,9,7,3,2837,19,1
1,45,5,0,0,7,1.0,75.0,40.0,C,Others,,,,,,Third,G,2,0,344.0,59,2.0,57.0,401,0,18.0,8.0,2005.0,2005,2005,2004,3620.0,5804.0,15710.0,16,12,4,2837,19,0
2,0,0,0,0,0,,,,,,,,0.402054,,,Second,A,2,0,676.0,751,194.0,557.0,1233,0,29.0,4.0,2006.0,2005,2007,2004,2518.0,4785.0,19318.0,5,4,2,3597,9,1
3,0,0,0,0,0,,,,I,,,,,,,First,E,2,0,0.0,17,6.0,11.0,11,0,7.0,2.0,2004.0,2004,2004,2004,2363.0,5029.0,9646.0,4,4,3,1882,14,0
4,15,1,0,0,7,1.0,70.0,40.0,I,Technology,,,0.845597,,,Second,A,2,1,970.0,4,30.0,34.0,1004,1,28.0,2.0,2006.0,2006,2006,2003,3012.0,5029.0,19318.0,17,9,4,3823,18,1


### Statistical Test on 'City_Type' vs 'Target'

In [104]:
tbl = pd.crosstab(newdata.City_Type, newdata.Target)

import scipy.stats as stats
teststats, pvalue, df, exp_freq = stats.chi2_contingency(tbl)
print(pvalue)

1.136816151637158e-10


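## Inferences
* City_Type is significantly associated with Target (chi-square p ≈ 1.1e-10). Education Score and Employer Category are also reported as related to Target, although those tests are not shown here.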
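Only City_Type is tested in the cell above. A small helper (hypothetical name `chi2_pvalues`) runs the same `chi2_contingency` test over several categorical columns at once, for example City_Type, Employer_Category and Education_Score. When applying it, it is safer to pass only the training rows (for example `newdata.iloc[:train.shape[0]]`), because the test registrations have no observed outcome. Demonstrated here on toy data:

```python
import pandas as pd
import scipy.stats as stats

def chi2_pvalues(df, cols, target="Target"):
    """Chi-square test of independence between each column and the target."""
    pvalues = {}
    for col in cols:
        table = pd.crosstab(df[col], df[target])  # rows with NaN in col are left out
        _, pvalue, _, _ = stats.chi2_contingency(table)
        pvalues[col] = pvalue
    return pd.Series(pvalues).sort_values()

# Hypothetical data: City_Type fully determines Target, Employer_Category is unrelated
toy = pd.DataFrame({
    "City_Type": ["A", "A", "B", "B"] * 50,
    "Employer_Category": ["X", "Y"] * 100,
    "Target": [1, 1, 0, 0] * 50,
})
pv = chi2_pvalues(toy, ["City_Type", "Employer_Category"])
print(pv)
```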

### Label-encode Category1 and Category2 with pd.factorize (Category3 is already numeric)

In [107]:
newdata['Category1'] = pd.factorize(newdata.Category1)[0]
newdata['Category2'] = pd.factorize(newdata.Category2)[0]

# pd.factorize is another way to apply label encoding (each distinct value gets an integer code)
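A quick illustration of what `pd.factorize` returns, on a hypothetical Series: integer codes in order of first appearance, the sentinel -1 for missing values, and the unique values themselves. Passing `sort=True` assigns the codes by the sorted unique values instead.

```python
import pandas as pd

s = pd.Series(["Third", "Second", None, "Third", "First"])

# Codes follow order of first appearance; missing values get -1
codes, uniques = pd.factorize(s)
print(codes)  # [ 0  1 -1  0  2]
print(uniques)

# sort=True: First -> 0, Second -> 1, Third -> 2
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)
print(sorted_codes)  # [ 2  1 -1  2  0]
```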

# Missing value Treatment

In [111]:
newdata.isnull().sum()/newdata.shape[0]

Var1                           0.000000
Var2                           0.000000
Var3                           0.000000
Var4                           0.000000
Var5                           0.000000
Income                         0.000000
Education_Score                0.000000
Age                            0.000000
City_Type                      0.421635
Employer_Category              0.822993
Donation                       0.943742
Health_Score                   0.943742
Health Score                   0.929257
Number_of_stall_visited        0.941055
Last_Stall_Visited_Number      0.941055
Category1                      0.000000
Category2                      0.000000
Category3                      0.000000
Social_Media                   0.000000
Iteraction_Days                0.003022
Camp_Duration                  0.000000
magic1                         0.003022
magic2                         0.003022
Patient_Duration               0.000000
dates_seq                      0.000000


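## Donation
* The currency of the donations is unknown and 94% of the values are missing. Donation also comes from First_Health_Camp_Attended, so it is only recorded for people who attended a first-format camp and would leak the outcome. We therefore drop the Donation column.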

In [112]:
newdata.drop('Donation', axis = 1, inplace = True)

## Replace the string 'None' with np.nan (Income, Education_Score and Age store missing values as 'None', which is why they show 0% missing above)

In [114]:
newdata.replace(to_replace= 'None', value = np.nan, inplace = True)

## Drop the outcome columns used to build Target (keeping them would leak the label)

In [116]:
newdata.drop(['Health_Score', 'Health Score', 'Number_of_stall_visited', 
              'Last_Stall_Visited_Number'], axis = 1, inplace = True )

In [118]:
newdata.head()

Unnamed: 0,Var1,Var2,Var3,Var4,Var5,Income,Education_Score,Age,City_Type,Employer_Category,Category1,Category2,Category3,Social_Media,Iteraction_Days,Camp_Duration,magic1,magic2,Patient_Duration,dates_seq,Registration_Days,Registration_Month,Registration_Year,Camp_Start_Year,Camp_End_Year,First_Int_Year,Patients_Per_Day,Patients_Per_Month,Patients_Per_Year,Patient_Frequency_Per_Day,Patient_Frequency_Per_Month,Patient_Frequency_Per_Year,Patient_Freq_HC,Health_Camps_Year,Target
0,4,0,0,0,2,,,,,,0,0,2,0,278.0,59,25.0,34.0,312,0,10.0,9.0,2005.0,2005,2005,2004,2649.0,6585.0,15710.0,9,7,3,2837,19,1
1,45,5,0,0,7,1.0,75.0,40.0,C,Others,0,0,2,0,344.0,59,2.0,57.0,401,0,18.0,8.0,2005.0,2005,2005,2004,3620.0,5804.0,15710.0,16,12,4,2837,19,0
2,0,0,0,0,0,,,,,,1,1,2,0,676.0,751,194.0,557.0,1233,0,29.0,4.0,2006.0,2005,2007,2004,2518.0,4785.0,19318.0,5,4,2,3597,9,1
3,0,0,0,0,0,,,,I,,2,2,2,0,0.0,17,6.0,11.0,11,0,7.0,2.0,2004.0,2004,2004,2004,2363.0,5029.0,9646.0,4,4,3,1882,14,0
4,15,1,0,0,7,1.0,70.0,40.0,I,Technology,1,1,2,1,970.0,4,30.0,34.0,1004,1,28.0,2.0,2006.0,2006,2006,2003,3012.0,5029.0,19318.0,17,9,4,3823,18,1


### Replacing missing/NaN values by encoding
* `pd.factorize` maps each distinct value to an integer code in order of first appearance (pass `sort=True` for sorted order). Missing values (NaN) receive the sentinel code -1, so they effectively become their own category.
* Note: the codes are arbitrary labels, not magnitudes. For columns that are numeric in nature, such as Income or Education_Score, the original ordering of the values is not preserved.

In [119]:
newdata['Income'] = pd.factorize(newdata.Income)[0]
newdata['Education_Score'] = pd.factorize(newdata.Education_Score)[0]
newdata['City_Type'] = pd.factorize(newdata.City_Type)[0]
newdata['Employer_Category'] = pd.factorize(newdata.Employer_Category)[0]

In [121]:
newdata.head()

Unnamed: 0,Var1,Var2,Var3,Var4,Var5,Income,Education_Score,Age,City_Type,Employer_Category,Category1,Category2,Category3,Social_Media,Iteraction_Days,Camp_Duration,magic1,magic2,Patient_Duration,dates_seq,Registration_Days,Registration_Month,Registration_Year,Camp_Start_Year,Camp_End_Year,First_Int_Year,Patients_Per_Day,Patients_Per_Month,Patients_Per_Year,Patient_Frequency_Per_Day,Patient_Frequency_Per_Month,Patient_Frequency_Per_Year,Patient_Freq_HC,Health_Camps_Year,Target
0,4,0,0,0,2,-1,-1,,-1,-1,0,0,2,0,278.0,59,25.0,34.0,312,0,10.0,9.0,2005.0,2005,2005,2004,2649.0,6585.0,15710.0,9,7,3,2837,19,1
1,45,5,0,0,7,0,0,40.0,0,0,0,0,2,0,344.0,59,2.0,57.0,401,0,18.0,8.0,2005.0,2005,2005,2004,3620.0,5804.0,15710.0,16,12,4,2837,19,0
2,0,0,0,0,0,-1,-1,,-1,-1,1,1,2,0,676.0,751,194.0,557.0,1233,0,29.0,4.0,2006.0,2005,2007,2004,2518.0,4785.0,19318.0,5,4,2,3597,9,1
3,0,0,0,0,0,-1,-1,,1,-1,2,2,2,0,0.0,17,6.0,11.0,11,0,7.0,2.0,2004.0,2004,2004,2004,2363.0,5029.0,9646.0,4,4,3,1882,14,0
4,15,1,0,0,7,0,1,40.0,1,1,1,1,2,1,970.0,4,30.0,34.0,1004,1,28.0,2.0,2006.0,2006,2006,2003,3012.0,5029.0,19318.0,17,9,4,3823,18,1


### Dealing with missing values of 'Age'

In [122]:
newdata.Age.describe()

count     32602
unique       50
top          41
freq       2568
Name: Age, dtype: object

In [124]:
newdata.Age.info()

<class 'pandas.core.series.Series'>
Int64Index: 110527 entries, 0 to 110526
Series name: Age
Non-Null Count  Dtype 
--------------  ----- 
32602 non-null  object
dtypes: object(1)
memory usage: 5.7+ MB


### Age has dtype object, so we convert it to float

In [125]:
newdata['Age'] = newdata.Age.astype('float')

In [126]:
newdata.Age.describe()

count    32602.000000
mean        48.208760
std         11.969104
min         30.000000
25%         40.000000
50%         44.000000
75%         51.000000
max         80.000000
Name: Age, dtype: float64

### Encode 'Age' with pd.factorize, which maps missing values to -1 (the ages themselves are replaced by appearance-order codes)

In [129]:
newdata['Age'] = pd.factorize(newdata.Age)[0]
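Factorizing Age turns a numeric variable into appearance-order codes, so the model no longer sees that 30 is younger than 41. If preserving the ordering matters, one alternative (a sketch, not what this notebook does) is to keep the float ages and fill the missing ones with a sentinel such as -1, which tree-based models can still split on:

```python
import pandas as pd

# Hypothetical raw ages, with the string 'None' used for missing values
age = pd.to_numeric(pd.Series(["41", "None", "30", "41"]), errors="coerce")

# Factorizing: 41 -> 0 (seen first), 30 -> 1, missing -> -1; the order 30 < 41 is lost
age_codes = pd.factorize(age)[0]
print(age_codes)  # [ 0 -1  1  0]

# Alternative: keep the real ages and mark missing ones with -1
age_filled = age.fillna(-1)
print(age_filled.tolist())  # [41.0, -1.0, 30.0, 41.0]
```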

In [135]:
newdata.isnull().sum()[newdata.isnull().sum()!=0]

Iteraction_Days       334
magic1                334
magic2                334
Registration_Days     334
Registration_Month    334
Registration_Year     334
Patients_Per_Day      334
Patients_Per_Month    334
Patients_Per_Year     334
dtype: int64

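## Impute the remaining missing values with the column median
* All nine columns are missing for the same 334 rows, which is consistent with those registrations having lost their Registration_Date (the hardware failure mentioned in the problem statement): every feature derived from that date is missing too.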

In [136]:
cols = ['Iteraction_Days', 'magic1','magic2', 'Registration_Days', 'Registration_Month', 'Registration_Year',
        'Patients_Per_Day', 'Patients_Per_Month', 'Patients_Per_Year']

In [138]:
for i in cols:
    # assign back explicitly; an inplace fillna on a .loc slice may not update newdata
    newdata[i] = newdata[i].fillna(newdata[i].median())

In [140]:
newdata.isnull().sum()

Var1                           0
Var2                           0
Var3                           0
Var4                           0
Var5                           0
Income                         0
Education_Score                0
Age                            0
City_Type                      0
Employer_Category              0
Category1                      0
Category2                      0
Category3                      0
Social_Media                   0
Iteraction_Days                0
Camp_Duration                  0
magic1                         0
magic2                         0
Patient_Duration               0
dates_seq                      0
Registration_Days              0
Registration_Month             0
Registration_Year              0
Camp_Start_Year                0
Camp_End_Year                  0
First_Int_Year                 0
Patients_Per_Day               0
Patients_Per_Month             0
Patients_Per_Year              0
Patient_Frequency_Per_Day      0
Patient_Fr

In [141]:
newdata.head()

Unnamed: 0,Var1,Var2,Var3,Var4,Var5,Income,Education_Score,Age,City_Type,Employer_Category,Category1,Category2,Category3,Social_Media,Iteraction_Days,Camp_Duration,magic1,magic2,Patient_Duration,dates_seq,Registration_Days,Registration_Month,Registration_Year,Camp_Start_Year,Camp_End_Year,First_Int_Year,Patients_Per_Day,Patients_Per_Month,Patients_Per_Year,Patient_Frequency_Per_Day,Patient_Frequency_Per_Month,Patient_Frequency_Per_Year,Patient_Freq_HC,Health_Camps_Year,Target
0,4,0,0,0,2,-1,-1,-1,-1,-1,0,0,2,0,278.0,59,25.0,34.0,312,0,10.0,9.0,2005.0,2005,2005,2004,2649.0,6585.0,15710.0,9,7,3,2837,19,1
1,45,5,0,0,7,0,0,0,0,0,0,0,2,0,344.0,59,2.0,57.0,401,0,18.0,8.0,2005.0,2005,2005,2004,3620.0,5804.0,15710.0,16,12,4,2837,19,0
2,0,0,0,0,0,-1,-1,-1,-1,-1,1,1,2,0,676.0,751,194.0,557.0,1233,0,29.0,4.0,2006.0,2005,2007,2004,2518.0,4785.0,19318.0,5,4,2,3597,9,1
3,0,0,0,0,0,-1,-1,-1,1,-1,2,2,2,0,0.0,17,6.0,11.0,11,0,7.0,2.0,2004.0,2004,2004,2004,2363.0,5029.0,9646.0,4,4,3,1882,14,0
4,15,1,0,0,7,0,1,0,1,1,1,1,2,1,970.0,4,30.0,34.0,1004,1,28.0,2.0,2006.0,2006,2006,2003,3012.0,5029.0,19318.0,17,9,4,3823,18,1


# Modelling

## Split the data in train and test

In [143]:
newtrain = newdata.loc[0 : train.shape[0]-1, :]
newtest = newdata.loc[train.shape[0]:, :]

## Drop the target from train and test

In [144]:
X = newtrain.drop('Target', axis =1)
newtest = newtest.drop('Target', axis = 1)
y = newtrain.Target

# Import the Libraries

In [145]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

In [146]:
from sklearn.model_selection import cross_val_score, StratifiedKFold

def base_models():
    models = dict()
    models['lg'] = LogisticRegression()
    models['tree'] = DecisionTreeClassifier(criterion = 'entropy')
    models['rf'] = RandomForestClassifier(criterion = 'entropy')
    models['gbm'] = GradientBoostingClassifier()
    models['xgb'] = XGBClassifier()
    models['lgbm'] = LGBMClassifier()
    models['catboost'] = CatBoostClassifier()
    return models

### Evaluation function: stratified 5-fold cross-validated ROC AUC

In [147]:
def eval_score(model):
    # 5 stratified, shuffled folds; each fold is scored with ROC AUC
    cv = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)
    score = cross_val_score(model, X, y, scoring = 'roc_auc', cv = cv, error_score = 'raise', n_jobs = -1)
    return score
## Build the Models...


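In [148]:
models = base_models()
result, names = list(), list()
for name, model in models.items():
    finalscore = eval_score(model)
    result.append(finalscore)
    names.append(name)
    # report this model's own fold scores, not the running list of all models
    print('%s %.3f (%.3f)' % (name, np.mean(finalscore), np.std(finalscore)))

The scores below were printed with `np.mean(result)` and `np.std(result)`, i.e. over the fold scores of every model evaluated so far, so each line is a running average across models rather than that model's own AUC:

lg 0.701 (0.004)
tree 0.719 (0.018)
rf 0.776 (0.082)
gbm 0.798 (0.080)
xgb 0.814 (0.079)
lgbm 0.824 (0.076)
catboost 0.832 (0.073)
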

In [None]:
# `labels` is named `tick_labels` in Matplotlib 3.9 and later
plt.boxplot(result, labels = names, showmeans = True)
plt.xticks(rotation = 90)
plt.show()

# Taking three boosting models: XGBoost, LightGBM and CatBoost

### 1. Let's start with XGBoost

In [150]:
xgb = XGBClassifier()

kfold = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)

pred_xgb = []
for train_index, test_index in kfold.split(X, y):
    xtrain = X.iloc[train_index]
    ytrain = y.iloc[train_index]
    pred_xgb.append(xgb.fit(xtrain, ytrain).predict_proba(newtest))
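The loop above fits one model per fold but never uses `test_index`, so it produces no held-out score, and each fold's test predictions are stored separately. A minimal sketch of the usual pattern, shown with `LogisticRegression` on synthetic data so it runs without XGBoost: each fold model scores its held-out rows (out-of-fold predictions, evaluated with ROC AUC), and its test probabilities are averaged across folds. `kfold_predict` is a hypothetical helper name; any classifier with `predict_proba` should work in its place.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def kfold_predict(model, X, y, X_test, n_splits=5, seed=42):
    """Return out-of-fold ROC AUC and fold-averaged positive-class test probabilities."""
    X, y, X_test = np.asarray(X), np.asarray(y), np.asarray(X_test)  # DataFrames work too
    kfold = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros(len(y))
    test_pred = np.zeros(len(X_test))
    for train_idx, valid_idx in kfold.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy for each fold
        fold_model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = fold_model.predict_proba(X[valid_idx])[:, 1]
        test_pred += fold_model.predict_proba(X_test)[:, 1] / n_splits
    return roc_auc_score(y, oof), test_pred

# Synthetic stand-in data
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=0)
auc, test_pred = kfold_predict(LogisticRegression(max_iter=1000),
                               X_demo[:800], y_demo[:800], X_demo[800:])
print(f"OOF AUC: {auc:.3f}, test predictions: {test_pred.shape}")
```

With this notebook's data the call would look like `kfold_predict(XGBClassifier(), X, y, newtest)`, and the returned `test_pred` can go straight into `submission['Outcome']`.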

### Generate the predicted values ...

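In [153]:
# np.array(pred_xgb) has shape (n_folds, n_test_rows, 2): average the positive-class column over the folds
pd.DataFrame(np.mean(pred_xgb, axis = 0)[:, 1])

Indexing the stacked predictions as `np.array(pred_xgb)[0][1]` instead returns the two class probabilities (0.54404 and 0.45596) of a single test row from the first fold, not one probability per test row.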


## Getting file ready for submission

In [None]:
submission['Outcome'] = np.mean(pred_xgb, axis = 0)[:, 1]  # fold-averaged probability of a favourable outcome

In [154]:
submission.to_csv('XGBModel_health.csv', index = False)  # 0.5

# 2. Let's try the LightGBM model

In [155]:
lgbm = LGBMClassifier()

kfold = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)

pred_lgbm = []
for train_index, test_index in kfold.split(X, y):
    xtrain = X.iloc[train_index]
    ytrain = y.iloc[train_index]
    pred_lgbm.append(lgbm.fit(xtrain, ytrain).predict_proba(newtest))

## Getting file ready for submission

In [156]:
submission['Outcome'] = np.mean(pred_lgbm, axis = 0)[:, 1]  # fold-averaged probability of a favourable outcome

In [157]:
submission.to_csv('LGBMModel_health.csv', index = False)

# 3. Let's try the CatBoost model

In [159]:
cat = CatBoostClassifier()

kfold = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)

pred_cat = []
for train_index, test_index in kfold.split(X, y):
    xtrain = X.iloc[train_index]
    ytrain = y.iloc[train_index]
    pred_cat.append(cat.fit(xtrain, ytrain).predict_proba(newtest))

Learning rate set to 0.059277
0:	learn: 0.6561714	total: 181ms	remaining: 3m
...
999:	learn: 0.3248636	total: 18s	remaining: 0us
Learning rate set to 0.059277
0:	learn: 0.6561297	total: 29.6ms	remaining: 29.6s
...
999:	learn: 0.3248264	total: 18.8s	remaining: 0us
Learning rate set to 0.059277
0:	learn: 0.6567329	total: 20.2ms	remaining: 20.2s
...
982:	learn: 0.3272277	total: 19s	remaining: 329ms
...
133:	learn: 0.3905483	total: 2.5s	remaining: 16.2s
...
980:	learn: 0.3244054	total: 18.4s	remaining: 357ms
...
124:	learn: 0.3901664	total: 2.28s	remaining: 16s
...
972:	learn: 0.3242177	total: 18.4s	remaining: 511ms
...
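The `learn:` column in the CatBoost log is the training loss at each boosting iteration. Assuming the default `Logloss` objective that `CatBoostClassifier` uses for a binary target (the model's parameters are not shown in this section), each value is the mean binary cross-entropy over the training rows. The sketch below computes that quantity in plain Python; `binary_log_loss` is a hypothetical helper and the labels and probabilities are made up for illustration.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels (1 = favorable outcome) and predicted probabilities
labels = [1, 0, 0, 1]
uninformed = [0.5, 0.5, 0.5, 0.5]
better = [0.8, 0.2, 0.3, 0.9]

print(round(binary_log_loss(labels, uninformed), 6))  # ln(2) = 0.693147
print(round(binary_log_loss(labels, better), 6))
```

Because only the training loss is logged, its steady fall across the 1,000 iterations does not by itself show how well each fold generalizes; CatBoost's `fit` accepts an `eval_set` argument if a validation column is wanted in the log.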

## Getting file ready for submission

In [160]:
submission['Outcome'] = np.mean(pred_cat, axis=0)[:, 1]  # average P(Outcome = 1) across the folds
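Each fold appends one `predict_proba(newtest)` array of shape `(n_test, 2)` to `pred_cat`, so `np.array(pred_cat)` has shape `(n_folds, n_test, 2)`. Indexing it with `[0][1]` picks fold 0 and test row 1 (just two numbers), whereas the submission needs one probability per test row. The sketch below uses made-up numbers (three hypothetical folds, four test rows) to contrast that indexing with averaging over folds and then taking the positive-class column.

```python
import numpy as np

# Hypothetical predict_proba outputs: 3 folds x 4 test rows x 2 classes [P(0), P(1)]
pred_folds = [
    np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]),
    np.array([[0.8, 0.2], [0.4, 0.6], [0.5, 0.5], [0.1, 0.9]]),
    np.array([[0.7, 0.3], [0.3, 0.7], [0.7, 0.3], [0.2, 0.8]]),
]

stacked = np.array(pred_folds)        # shape (n_folds, n_rows, 2)
wrong = stacked[0][1]                 # fold 0, test row 1: only two probabilities
outcome = stacked.mean(axis=0)[:, 1]  # per-row P(class 1), averaged over folds

print(stacked.shape, wrong.shape, outcome.shape)
print(outcome)
```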

In [161]:
submission.to_csv('CatModel_health.csv', index = False)

# Stacking model

In [None]:
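from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Level-0 base learner(s); LightGBM acts as the level-1 meta-learner
base = [('xgb', XGBClassifier())]

# Inner CV that StackingClassifier uses to build out-of-fold meta-features
cv = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)

stack = StackingClassifier(estimators = base, final_estimator = LGBMClassifier(), cv = cv)

# Outer loop: refit the whole stack on each training split and predict the test set
pred_stack = []
for train_index, _ in kfold.split(X, y):
    xtrain = X.iloc[train_index]
    ytrain = y.iloc[train_index]
    pred_stack.append(stack.fit(xtrain, ytrain).predict_proba(newtest))

# Generate the prediction file: average P(Outcome = 1) over the outer folds
submission['Outcome'] = np.mean(pred_stack, axis=0)[:, 1]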

In [None]:
submission.to_csv('StackModel_health.csv', index = False)  # 0.75
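`StackingClassifier` trains its `final_estimator` on out-of-fold predictions: each base estimator is scored with `cross_val_predict` over `cv`, those probabilities become the meta-learner's features, and the base estimators are then refit on all rows to produce features for new data. A minimal hand-rolled version of that idea is sketched below. It is an illustration under assumptions, not the notebook's pipeline: the data come from `make_classification`, and `LogisticRegression` and `DecisionTreeClassifier` stand in for the XGBoost and LightGBM models to keep the example light.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the registration data (hypothetical)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, y_train, X_new = X[:400], y[:400], X[400:]

base_models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=4, random_state=0),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Level 0: out-of-fold P(class 1) from each base model, one meta-feature per model
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on all training rows to build the same features for new data
new_feats = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_new)[:, 1] for m in base_models
])

# Level 1: the meta-learner sees only the base-model probabilities
meta = LogisticRegression().fit(oof, y_train)
stack_proba = meta.predict_proba(new_feats)[:, 1]
print(oof.shape, new_feats.shape, stack_proba.shape)
```

Stacking tends to help when the base models make different errors. The stacking cell in this notebook uses a single base estimator (`xgb`), so its LightGBM meta-learner receives only one feature and can at most remap that model's probabilities.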