## Google Play Store App Preference Determination

Dataset: https://www.kaggle.com/lava18/google-play-store-apps#googleplaystore.csv

In [1]:
#Import Libraries:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from catboost import CatBoostClassifier ,Pool
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict, GridSearchCV

#### Step 1: How we chose the number of records to generate
#### Suppose we want to estimate the proportion of the population that prefers a mobile app. We assume this proportion is about 50% (p = 0.5, the most conservative choice because p * (1 - p) is largest there). We want a 95% confidence interval (z = 1.96) with a margin of error e of at most 2%.

#### With a 2% margin of error:
####  n = z^2 * p * (1 - p) / e^2
####  n = 1.96 * 1.96 * 0.5 * 0.5 / (0.02 * 0.02) = 2401
#### With a 1% margin of error:
####  n = 1.96 * 1.96 * 0.5 * 0.5 / (0.01 * 0.01) = 9604
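The same calculation can be written as a small helper. This is a sketch that assumes the normal-approximation formula above and rounds up to whole records. `sample_size` is a name introduced here.

```python
import math

def sample_size(p: float, margin: float, z: float = 1.96) -> int:
    """Records needed to estimate a proportion p within +/- margin at the confidence level given by z."""
    n = z ** 2 * p * (1 - p) / margin ** 2
    # Strip floating-point noise (e.g. 2401.0000000000005) before rounding up to a whole record.
    return math.ceil(round(n, 6))

print(sample_size(0.5, 0.02))  # 2401
print(sample_size(0.5, 0.01))  # 9604
```

For comparison, the cleaned data used for modelling has 8,889 rows. That covers the 2,401 needed for a 2% margin but not the 9,604 needed for a 1% margin.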

In [3]:
# Read data from an Excel copy of googleplaystore.csv:
googlePlayStore = pd.read_excel("D:/MSIS/Course Materials - Sem 1/1 Data Mining/Assignment/Classification Problem/googleplaystore.xlsx")
googlePlayStore.head(3)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up


In [4]:
googlePlayStore.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
App               10840 non-null object
Category          10841 non-null object
Rating            9367 non-null float64
Reviews           10841 non-null object
Size              10841 non-null object
Installs          10841 non-null object
Type              10840 non-null object
Price             10841 non-null object
Content Rating    10840 non-null object
Genres            10841 non-null object
Last Updated      10841 non-null object
Current Ver       10833 non-null object
Android Ver       10838 non-null object
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


1. Attributes with very little impact - NO_OF_USER_REVIEWS, CONTENT_RATING


2. Attributes that do not have any impact - ANDROID_VER


3. Some attributes are converted to categorical values -

    a. Column NO_OF_INSTALLS is converted from numerical to categorical values 'Low', 'Medium', 'High', 'VeryHigh' using the bin edges 0, 100000, 1000000, 10000000, 1000000000.
    
    b. Column CATEGORY is converted from object to two categorical values - "ENTERTAINMENT" and "NON ENTERTAINMENT".
    

4. A new attribute "Usability Rating" is generated with random integer values from 1 to 5.


5. The 'Genres', 'Last Updated' and 'Current Ver' columns are removed from the dataset.


6. Data cleaning is done on these columns -
    SIZE_OF_APP - Converted from object to a numerical datatype.
    NO_OF_INSTALLS - Converted from object to a numerical datatype.
    PRICE - Converted from object to a numerical datatype.
    OVERALL_RATING - Rows with null values are removed.

#### Remove Duplicates :

In [5]:
googlePlayStore.duplicated().sum()

483

In [6]:
googlePlayStore.drop_duplicates(inplace=True)

In [7]:
googlePlayStore.duplicated().sum()

0

#### Delete irrelevant attributes: the 'Genres', 'Last Updated' and 'Current Ver' columns

In [8]:
# Remove column-Genres from table:
googlePlayStore.drop('Genres',inplace=True,axis=1)

In [9]:
# Remove column-Last Updated from table:
googlePlayStore.drop('Last Updated',inplace=True,axis=1)

In [10]:
# Remove column-Current Ver from table:
googlePlayStore.drop('Current Ver',inplace=True,axis=1)

#### Add a new column 'Usability Rating' with random integers from 1 to 5.

In [11]:
googlePlayStore['Usability Rating'] = np.random.randint(1, 6, googlePlayStore.shape[0])

In [12]:
googlePlayStore['Usability Rating'].value_counts()

4    2129
3    2098
1    2093
2    2048
5    1990
Name: Usability Rating, dtype: int64
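`np.random.randint(1, 6, n)` excludes the upper bound, so it produces the integers 1 to 5. No seed is set, so the counts above change on every run. Below is a reproducible variant. It is a sketch that assumes NumPy's `Generator` API (`default_rng`, `integers`) and uses an arbitrary seed of 42. `usability_ratings` is a name introduced here.

```python
import numpy as np

def usability_ratings(n_rows: int, seed: int = 42) -> np.ndarray:
    """Draw n_rows random integer ratings from 1 to 5 inclusive, reproducibly."""
    rng = np.random.default_rng(seed)
    # integers(low, high) excludes high, so (1, 6) yields 1..5, like randint(1, 6).
    return rng.integers(1, 6, size=n_rows)

ratings = usability_ratings(10_000)
print(np.unique(ratings))  # [1 2 3 4 5]
```

In the notebook this would read `googlePlayStore['Usability Rating'] = usability_ratings(googlePlayStore.shape[0])`.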

#### Rename Columns:

In [13]:
googlePlayStore.rename(columns={'App':'APP_NAME','Category':'CATEGORY','Rating':'OVERALL_RATING','Usability Rating':'USABILITY_STAR',
                               'Reviews':'NO_OF_USER_REVIEWS','Size':'SIZE_OF_APP_MB','Installs':'NO_OF_INSTALLS','Type':'TYPE',
                               'Price':'PRICE','Content Rating':'CONTENT_RATING','Android Ver':'ANDROID_VER'
                               },inplace=True)
googlePlayStore.head(2)

Unnamed: 0,APP_NAME,CATEGORY,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,NO_OF_INSTALLS,TYPE,PRICE,CONTENT_RATING,ANDROID_VER,USABILITY_STAR
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,4.0.3 and up,5
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,4.0.3 and up,5


#### Delete rows with null values

In [14]:
googlePlayStore.isnull().sum()

APP_NAME                 1
CATEGORY                 0
OVERALL_RATING        1465
NO_OF_USER_REVIEWS       0
SIZE_OF_APP_MB           0
NO_OF_INSTALLS           0
TYPE                     1
PRICE                    0
CONTENT_RATING           1
ANDROID_VER              3
USABILITY_STAR           0
dtype: int64

In [15]:
googlePlayStore.replace(["NaN", 'NaT'], np.nan, inplace = True)
googlePlayStore = googlePlayStore.dropna()

In [16]:
googlePlayStore.isnull().sum()

APP_NAME              0
CATEGORY              0
OVERALL_RATING        0
NO_OF_USER_REVIEWS    0
SIZE_OF_APP_MB        0
NO_OF_INSTALLS        0
TYPE                  0
PRICE                 0
CONTENT_RATING        0
ANDROID_VER           0
USABILITY_STAR        0
dtype: int64

In [17]:
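# Displays a re-indexed copy only: the result is not assigned, so googlePlayStore keeps its original index labels.
googlePlayStore.reset_index(drop=True)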

Unnamed: 0,APP_NAME,CATEGORY,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,NO_OF_INSTALLS,TYPE,PRICE,CONTENT_RATING,ANDROID_VER,USABILITY_STAR
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,4.0.3 and up,5
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,4.0.3 and up,5
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,4.0.3 and up,3
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,4.2 and up,3
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,4.4 and up,2
5,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6M,"50,000+",Free,0,Everyone,2.3 and up,2
6,Smoke Effect Photo Maker - Smoke Editor,ART_AND_DESIGN,3.8,178,19M,"50,000+",Free,0,Everyone,4.0.3 and up,2
7,Infinite Painter,ART_AND_DESIGN,4.1,36815,29M,"1,000,000+",Free,0,Everyone,4.2 and up,2
8,Garden Coloring Book,ART_AND_DESIGN,4.4,13791,33M,"1,000,000+",Free,0,Everyone,3.0 and up,4
9,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,4.0.3 and up,5


#### Cleaning the column SIZE_OF_APP_MB
#### a. Change the value 'Varies with device' to NaN
#### b. Convert all sizes to MB

In [18]:
len(googlePlayStore[googlePlayStore.SIZE_OF_APP_MB == 'Varies with device'])

1468

In [19]:
googlePlayStore['SIZE_OF_APP_MB'].replace('Varies with device', np.nan, inplace = True )

In [20]:
len(googlePlayStore[googlePlayStore.SIZE_OF_APP_MB == 'Varies with device'])

0

In [21]:
googlePlayStore.SIZE_OF_APP_MB = (googlePlayStore.SIZE_OF_APP_MB.replace(r'[kM]+$', '', regex=True).astype(float) * \
             googlePlayStore.SIZE_OF_APP_MB.str.extract(r'[\d\.]+([kM]+)', expand=False).fillna(1).replace(['k','M'], [10**3,10**6]).astype(int))


In [22]:
googlePlayStore.SIZE_OF_APP_MB.isnull().sum()

1468

In [23]:
googlePlayStore['SIZE_OF_APP_MB'].fillna(googlePlayStore['SIZE_OF_APP_MB'].mean(),inplace=True)

In [24]:
googlePlayStore.SIZE_OF_APP_MB.isnull().sum()

0

In [25]:
# Change the columns in terms of Megabytes
googlePlayStore['SIZE_OF_APP_MB'] = googlePlayStore['SIZE_OF_APP_MB'].apply(lambda x: x/10**6)
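The size cleaning above has four steps: strip the `k`/`M` suffix, scale to bytes, impute the mean, and divide by 10**6. These steps can be folded into a single conversion straight to megabytes. This is a sketch that assumes sizes look like `19M`, `201k` or a bare byte count, and that treats `Varies with device` as missing. `size_to_mb` and `TO_MB` are names introduced here.

```python
import numpy as np
import pandas as pd

# Multiplier that turns each suffix into megabytes; no suffix means plain bytes.
TO_MB = {"M": 1.0, "k": 1e-3, "": 1e-6}

def size_to_mb(sizes: pd.Series) -> pd.Series:
    """Convert Play Store size strings to float megabytes; unknown sizes become NaN."""
    s = sizes.astype(str).replace("Varies with device", np.nan)
    parts = s.str.extract(r"^([\d.]+)([kM]?)$")  # column 0: number, column 1: suffix
    return parts[0].astype(float) * parts[1].map(TO_MB)

mb = size_to_mb(pd.Series(["19M", "201k", "500", "Varies with device"]))
mb = mb.fillna(mb.mean())  # mean imputation, as in the cells above
```

Imputing the mean in megabytes instead of bytes gives the same result, because dividing by 10**6 scales the mean by the same factor.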

#### Cleaning the column NO_OF_INSTALLS
#### a. Remove the '+' and ',' characters and convert the values to integers.

In [26]:
googlePlayStore.NO_OF_INSTALLS = googlePlayStore.NO_OF_INSTALLS.apply(lambda x: x.replace(',',''))
googlePlayStore.NO_OF_INSTALLS = googlePlayStore.NO_OF_INSTALLS.apply(lambda x: x.replace('+',''))
googlePlayStore.NO_OF_INSTALLS = googlePlayStore.NO_OF_INSTALLS.apply(lambda x: int(x))
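The three `apply` calls can be replaced by vectorized string methods. The sketch below passes `regex=False` because `+` is a regular-expression metacharacter and should be removed literally. `installs_to_int` is a name introduced here.

```python
import pandas as pd

def installs_to_int(installs: pd.Series) -> pd.Series:
    """Turn strings like '10,000+' into integers (10000)."""
    return (installs.astype(str)
            .str.replace(",", "", regex=False)
            .str.replace("+", "", regex=False)
            .astype("int64"))

print(installs_to_int(pd.Series(["10,000+", "500,000+", "0"])).tolist())  # [10000, 500000, 0]
```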

#### Cleaning the column - PRICE
#### Remove the dollar sign and convert the data type to numeric.

In [36]:
googlePlayStore[googlePlayStore.PRICE!=0].index

Int64Index([  234,   235,   427,   476,   477,   481,   571,   851,   852,
              853,
            ...
            10594, 10645, 10675, 10679, 10682, 10690, 10697, 10760, 10782,
            10785],
           dtype='int64', length=612)

In [39]:
googlePlayStore.PRICE = googlePlayStore.PRICE.astype('str').apply(lambda x: x.replace('$','')).astype('float64')

In [41]:
googlePlayStore.PRICE[234]

4.99

#### Cleaning the column- OVERALL_RATING

In [42]:
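# OVERALL_RATING is float64 with no nulls after dropna(), so it cannot contain the string 'Varies with device'; this replace is a no-op safeguard.
googlePlayStore['OVERALL_RATING'].replace('Varies with device',googlePlayStore['OVERALL_RATING'].mean() , inplace = True )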

#### Now there is no missing data

In [43]:
#missing data
null_records = googlePlayStore.isnull().sum().sort_values(ascending=False)
null_records

USABILITY_STAR        0
ANDROID_VER           0
CONTENT_RATING        0
PRICE                 0
TYPE                  0
NO_OF_INSTALLS        0
SIZE_OF_APP_MB        0
NO_OF_USER_REVIEWS    0
OVERALL_RATING        0
CATEGORY              0
APP_NAME              0
dtype: int64

In [44]:
googlePlayStore.head(2)

Unnamed: 0,APP_NAME,CATEGORY,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,NO_OF_INSTALLS,TYPE,PRICE,CONTENT_RATING,ANDROID_VER,USABILITY_STAR
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,10000,Free,0.0,Everyone,4.0.3 and up,5
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,Everyone,4.0.3 and up,5


#### Convert NO_OF_INSTALLS from numerical to categorical-'Low','Medium','High','VeryHigh' 
#### based on the bin size -0,100000,1000000,10000000,1000000000

In [45]:
googlePlayStore['NO_OF_INSTALLS_CAT'] = pd.cut(googlePlayStore['NO_OF_INSTALLS'],bins=[0,100000,1000000,10000000,1000000000],
                           labels=['Low','Medium','High','VeryHigh'])
googlePlayStore['NO_OF_INSTALLS_CAT'].value_counts()

Low         4321
Medium      2002
High        1815
VeryHigh     751
Name: NO_OF_INSTALLS_CAT, dtype: int64
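`pd.cut` bins are right-closed by default. So exactly 100,000 installs falls in `Low`, and a value of 0 lies outside the first bin and becomes NaN. The four counts above sum to 8,889, so no rows were lost here. A quick check of the boundary behaviour with made-up values:

```python
import pandas as pd

bins = [0, 100000, 1000000, 10000000, 1000000000]
labels = ['Low', 'Medium', 'High', 'VeryHigh']

# Intervals are (0, 1e5], (1e5, 1e6], (1e6, 1e7], (1e7, 1e9]; 0 is not inside any of them.
cats = pd.cut(pd.Series([100000, 100001, 5000000, 1000000000, 0]), bins=bins, labels=labels)
print(cats.tolist())  # ['Low', 'Medium', 'High', 'VeryHigh', nan]
```

Passing `include_lowest=True` to `pd.cut` would place 0 in `Low` if zero-install apps ever need to be kept.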

In [46]:
googlePlayStore.drop(['NO_OF_INSTALLS'], axis = 1, inplace = True)

#### Convert CATEGORY from object to categorical - 'ENTERTAINMENT' and 'NON ENTERTAINMENT'

In [47]:
# Categories grouped as entertainment; each is replaced by the single label 'ENTERTAINMENT'
entertainment = ['FAMILY', 'GAME', 'PHOTOGRAPHY', 'LIFESTYLE', 'SPORTS', 'NEWS_AND_MAGAZINES',
                 'HEALTH_AND_FITNESS', 'SOCIAL', 'BEAUTY', 'COMICS', 'ART_AND_DESIGN', 'FOOD_AND_DRINK',
                 'DATING', 'VIDEO_PLAYERS', 'SHOPPING', 'TRAVEL_AND_LOCAL']
googlePlayStore['CATEGORY'].replace(entertainment, 'ENTERTAINMENT', inplace=True)

In [48]:
# Categories grouped as non-entertainment; each is replaced by the single label 'NON ENTERTAINMENT'
non_entertainment = ['TOOLS', 'PRODUCTIVITY', 'FINANCE', 'PERSONALIZATION', 'COMMUNICATION',
                     'MEDICAL', 'BUSINESS', 'BOOKS_AND_REFERENCE', 'EDUCATION', 'MAPS_AND_NAVIGATION',
                     'WEATHER', 'AUTO_AND_VEHICLES', 'HOUSE_AND_HOME', 'LIBRARIES_AND_DEMO',
                     'PARENTING', 'EVENTS']
googlePlayStore['CATEGORY'].replace(non_entertainment, 'NON ENTERTAINMENT', inplace=True)

In [49]:
googlePlayStore.CATEGORY.value_counts()

ENTERTAINMENT        5511
NON ENTERTAINMENT    3378
Name: CATEGORY, dtype: int64

#### Encode the attributes used to generate the y label:
#### 1. Convert CATEGORY to a numerical attribute: ENTERTAINMENT - 0 and NON ENTERTAINMENT - 1.
#### 2. Convert TYPE to a numerical attribute: Free - 0 and Paid - 1.
#### 3. Convert NO_OF_INSTALLS_CAT to a numerical attribute: VeryHigh - 4, High - 3, Medium - 2, Low - 1.
#### 4. Convert CONTENT_RATING to a numerical attribute: 'Everyone' - 5, 'Teen' - 4, 'Mature 17+' - 3, 'Everyone 10+' - 2, 'Adults only 18+' - 3, 'Unrated' - 1.

In [50]:
df_list = ['CATEGORY', 'OVERALL_RATING', 'NO_OF_USER_REVIEWS',
             'SIZE_OF_APP_MB', 'PRICE',
             'TYPE', 'NO_OF_INSTALLS_CAT','CONTENT_RATING','USABILITY_STAR']
googlePlayStore_df = googlePlayStore[df_list]

In [51]:
googlePlayStore_df.reset_index(drop=True)

Unnamed: 0,CATEGORY,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,PRICE,TYPE,NO_OF_INSTALLS_CAT,CONTENT_RATING,USABILITY_STAR
0,ENTERTAINMENT,4.1,159,19.000000,0.0,Free,Low,Everyone,5
1,ENTERTAINMENT,3.9,967,14.000000,0.0,Free,Medium,Everyone,5
2,ENTERTAINMENT,4.7,87510,8.700000,0.0,Free,High,Everyone,3
3,ENTERTAINMENT,4.5,215644,25.000000,0.0,Free,VeryHigh,Teen,3
4,ENTERTAINMENT,4.3,967,2.800000,0.0,Free,Low,Everyone,2
5,ENTERTAINMENT,4.4,167,5.600000,0.0,Free,Low,Everyone,2
6,ENTERTAINMENT,3.8,178,19.000000,0.0,Free,Low,Everyone,2
7,ENTERTAINMENT,4.1,36815,29.000000,0.0,Free,Medium,Everyone,2
8,ENTERTAINMENT,4.4,13791,33.000000,0.0,Free,Medium,Everyone,4
9,ENTERTAINMENT,4.7,121,3.100000,0.0,Free,Low,Everyone,5


In [52]:
googlePlayStore_df.isnull().sum()

CATEGORY              0
OVERALL_RATING        0
NO_OF_USER_REVIEWS    0
SIZE_OF_APP_MB        0
PRICE                 0
TYPE                  0
NO_OF_INSTALLS_CAT    0
CONTENT_RATING        0
USABILITY_STAR        0
dtype: int64

In [53]:
#replace ENTERTAINMENT - 0 and NON ENTERTAINMENT -1
googlePlayStore_df['CATEGORY'].replace(['ENTERTAINMENT', 'NON ENTERTAINMENT'],
                                  [0, 1], inplace = True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)


In [54]:
googlePlayStore_df['TYPE'].replace(['Free', 'Paid'],
                                  [0, 1], inplace = True)

In [55]:
googlePlayStore_df['NO_OF_INSTALLS_CAT'].replace(['VeryHigh', 'High','Medium','Low'],
                                  [4,3,2,1], inplace = True)

In [56]:
googlePlayStore_df['CONTENT_RATING'].replace(['Everyone', 'Teen','Mature 17+','Everyone 10+','Adults only 18+','Unrated'],
                                  [5,4,3,2,3,1], inplace = True)
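The `replace` calls above leave any value that is not in the list unchanged, so an unexpected level would survive silently as a string. The alternative sketch below uses `Series.map` with explicit dictionaries, and the mappings are copied from the cells above. With `map`, unmapped values become NaN and are easy to detect. `ENCODINGS` and `encode` are names introduced here.

```python
import pandas as pd

# Mappings copied from the replace calls above ('Adults only 18+' shares code 3 with 'Mature 17+').
ENCODINGS = {
    'CATEGORY': {'ENTERTAINMENT': 0, 'NON ENTERTAINMENT': 1},
    'TYPE': {'Free': 0, 'Paid': 1},
    'NO_OF_INSTALLS_CAT': {'Low': 1, 'Medium': 2, 'High': 3, 'VeryHigh': 4},
    'CONTENT_RATING': {'Everyone': 5, 'Teen': 4, 'Mature 17+': 3,
                       'Everyone 10+': 2, 'Adults only 18+': 3, 'Unrated': 1},
}

def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with each mapped column converted to integer codes."""
    out = df.copy()
    for col, mapping in ENCODINGS.items():
        out[col] = out[col].astype(object).map(mapping)
        unmapped = out[col].isna() & df[col].notna()
        if unmapped.any():
            raise ValueError(f"unmapped values in {col}: {df.loc[unmapped, col].unique()}")
    return out
```

Working on a copy also sidesteps the "value is trying to be set on a copy of a slice" warning shown earlier.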

In [57]:
googlePlayStore_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8889 entries, 0 to 10840
Data columns (total 9 columns):
CATEGORY              8889 non-null int64
OVERALL_RATING        8889 non-null float64
NO_OF_USER_REVIEWS    8889 non-null object
SIZE_OF_APP_MB        8889 non-null float64
PRICE                 8889 non-null float64
TYPE                  8889 non-null int64
NO_OF_INSTALLS_CAT    8889 non-null int64
CONTENT_RATING        8889 non-null int64
USABILITY_STAR        8889 non-null int64
dtypes: float64(3), int64(5), object(1)
memory usage: 1014.5+ KB


In [58]:
df_l = ['CATEGORY', 'OVERALL_RATING', 'SIZE_OF_APP_MB',
             'TYPE', 'NO_OF_INSTALLS_CAT','USABILITY_STAR']
googlePlayStore_n = googlePlayStore_df[df_l]

In [59]:
googlePlayStore_df_norm = (googlePlayStore_n - googlePlayStore_n.min()) / (googlePlayStore_n.max() - googlePlayStore_n.min())

In [60]:
googlePlayStore_df_norm.head(5)

Unnamed: 0,CATEGORY,OVERALL_RATING,SIZE_OF_APP_MB,TYPE,NO_OF_INSTALLS_CAT,USABILITY_STAR
0,0.0,0.775,0.189931,0.0,0.0,1.0
1,0.0,0.725,0.139927,0.0,0.333333,1.0
2,0.0,0.925,0.086922,0.0,0.666667,0.5
3,0.0,0.875,0.249936,0.0,1.0,0.5
4,0.0,0.825,0.027917,0.0,0.0,0.25
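Min-max scaling maps each column's minimum to 0 and its maximum to 1 via (x - min) / (max - min). For example, USABILITY_STAR ranges from 1 to 5 and becomes (x - 1) / 4, so a 5-star row shows 1.0 and a 3-star row shows 0.5 in the output above. The sketch below adds a guard for constant columns. That guard is an assumption of this sketch; the original expression would return NaN for such columns.

```python
import pandas as pd

def min_max(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to [0, 1] with (x - min) / (max - min)."""
    span = df.max() - df.min()
    # A constant column has span 0; dividing by 1 instead maps it to 0 rather than NaN.
    return (df - df.min()) / span.replace(0, 1)

print(min_max(pd.DataFrame({'USABILITY_STAR': [1, 3, 5], 'TYPE': [0, 0, 0]})))
```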


In [61]:
#create a data frame for the function value and class labels
class_label = pd.DataFrame(columns = ['Class_value'])

In [62]:
googlePlayStore_df_norm.reset_index(inplace=True)

In [63]:
googlePlayStore_df_norm.SIZE_OF_APP_MB.value_counts()

0.227473    1468
0.139927     154
0.129926     152
0.119925     151
0.149928     149
0.109924     149
0.169929     126
0.249936     125
0.159929     112
0.209933     112
0.239935     111
0.189931     110
0.199932     107
0.259937     106
0.229935      99
0.099923      99
0.179930      99
0.219934      92
0.269938      85
0.279939      74
0.369946      71
0.329943      70
0.299940      69
0.349945      66
0.309941      65
0.289940      63
0.032918      62
0.439952      56
0.399949      55
0.459954      54
            ... 
0.008906       1
0.009776       1
0.001445       1
0.002795       1
0.003695       1
0.000735       1
0.008006       1
0.006386       1
0.001355       1
0.005996       1
0.005735       1
0.006876       1
0.001945       1
0.006956       1
0.005165       1
0.009216       1
0.004025       1
0.009396       1
0.007046       1
0.000055       1
0.008486       1
0.004585       1
0.001515       1
0.002715       1
0.006276       1
0.002845       1
0.000245       1
0.001775      

In [64]:
# Generation of the class label:
# High-impact attributes (OVERALL_RATING, USABILITY_STAR and NO_OF_INSTALLS_CAT) get larger weights;
# low-impact attributes (CATEGORY, SIZE_OF_APP_MB and TYPE) get smaller weights.

In [65]:
# Class value: weighted sum of the normalized attributes for each row
for i in range(0, googlePlayStore_df_norm.shape[0]):
    value = (- 0.5 * googlePlayStore_df_norm['CATEGORY'][i]
             + 20 * googlePlayStore_df_norm['OVERALL_RATING'][i]
             - 1 * googlePlayStore_df_norm['SIZE_OF_APP_MB'][i]
             + 3 * googlePlayStore_df_norm['TYPE'][i]
             + 15 * googlePlayStore_df_norm['NO_OF_INSTALLS_CAT'][i]
             + 30 * googlePlayStore_df_norm['USABILITY_STAR'][i])
    class_label.loc[i] = value
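The loop above fills `class_label` one row at a time with `.loc`, which is slow for about 9,000 rows. The same weighted sum can be computed column-wise in one expression. This is a sketch using the weights from the loop; `WEIGHTS` and `class_value` are names introduced here. The sample row reuses the normalized values of row 0 shown earlier.

```python
import pandas as pd

# Weights copied from the loop above.
WEIGHTS = {'CATEGORY': -0.5, 'OVERALL_RATING': 20, 'SIZE_OF_APP_MB': -1,
           'TYPE': 3, 'NO_OF_INSTALLS_CAT': 15, 'USABILITY_STAR': 30}

def class_value(norm: pd.DataFrame) -> pd.Series:
    """Weighted sum of the normalized feature columns, one value per row."""
    return sum(weight * norm[col] for col, weight in WEIGHTS.items())

first_row = pd.DataFrame({'CATEGORY': [0.0], 'OVERALL_RATING': [0.775], 'SIZE_OF_APP_MB': [0.189931],
                          'TYPE': [0.0], 'NO_OF_INSTALLS_CAT': [0.0], 'USABILITY_STAR': [1.0]})
print(class_value(first_row)[0])  # about 45.31
```

The score of about 45.31 is above the `high_value` threshold of 43.27 computed below, which is consistent with row 0 being labelled High. In the notebook, `class_label = pd.DataFrame({'Class_value': class_value(googlePlayStore_df_norm)})` would replace the loop.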

In [66]:
# Thresholds: the class values at positions 2500 and 5000 of the descending sort
ranked = class_label.sort_values('Class_value', ascending = False).reset_index(drop = True)['Class_value']
high_value = ranked[2500]
avg_value = ranked[5000]

In [67]:
high_value

43.27252740920386

In [68]:
avg_value

32.77252740920385

In [69]:
# Assign the class label: High, Average or Low
for i in range (0, class_label.shape[0]):
    if class_label['Class_value'][i] >= high_value:
        y_label = "High"
    elif class_label['Class_value'][i] >= avg_value:
        y_label = "Average"
    else:
        y_label = "Low"
    class_label.loc[i] = y_label
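The same thresholding can be done without a loop. The cut-offs are the values at positions 2500 and 5000 of the descending sort, and position 2500 itself counts as High. That gives 2,501 High rows, and ties can add more, which matches the value counts shown later. This is a sketch using `np.select`; `label_by_rank` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def label_by_rank(scores: pd.Series, high_pos: int, avg_pos: int) -> pd.Series:
    """Label scores High / Average / Low using the values at two positions of the descending sort."""
    ordered = scores.sort_values(ascending=False).reset_index(drop=True)
    high_cut, avg_cut = ordered[high_pos], ordered[avg_pos]
    labels = np.select([scores >= high_cut, scores >= avg_cut], ['High', 'Average'], default='Low')
    return pd.Series(labels, index=scores.index)

print(label_by_rank(pd.Series([10, 9, 8, 7, 6, 5]), high_pos=1, avg_pos=3).tolist())
# ['High', 'High', 'Average', 'Average', 'Low', 'Low']
```

In the notebook, the call would be `label_by_rank(class_label['Class_value'], 2500, 5000)`.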

In [70]:
googlePlayStore_catboost = pd.concat([googlePlayStore, class_label], axis = 1)
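`pd.concat(..., axis=1)` aligns rows by index label, not by position. `googlePlayStore` still carries its original labels (0 to 10840, with gaps), while `class_label` is numbered 0 to 8888. The union of the two label sets explains the 10,395 rows and the NaN-padded columns in the `info()` output below. The toy illustration below uses made-up frames and shows the position-based alternative of resetting the index first.

```python
import pandas as pd

features = pd.DataFrame({'RATING': [4.1, 3.9, 4.7]}, index=[0, 2, 5])  # gappy index after dropna
labels = pd.DataFrame({'Class_value': ['High', 'Low', 'Average']})      # index 0, 1, 2

by_label = pd.concat([features, labels], axis=1)
print(len(by_label))  # 4 rows: index union {0, 1, 2, 5}, with NaN where one side is missing

by_position = pd.concat([features.reset_index(drop=True), labels], axis=1)
print(len(by_position))  # 3 rows, each feature row paired with the label computed from it
```

In `by_label`, row 2 pairs the rating 3.9 with 'Average', although 'Low' was the label computed from that row. Resetting the index before concatenating avoids this mismatch.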

In [71]:
googlePlayStore.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8889 entries, 0 to 10840
Data columns (total 11 columns):
APP_NAME              8889 non-null object
CATEGORY              8889 non-null object
OVERALL_RATING        8889 non-null float64
NO_OF_USER_REVIEWS    8889 non-null object
SIZE_OF_APP_MB        8889 non-null float64
TYPE                  8889 non-null object
PRICE                 8889 non-null float64
CONTENT_RATING        8889 non-null object
ANDROID_VER           8889 non-null object
USABILITY_STAR        8889 non-null int64
NO_OF_INSTALLS_CAT    8889 non-null category
dtypes: category(1), float64(3), int64(1), object(6)
memory usage: 1.1+ MB


In [72]:
googlePlayStore_catboost.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10395 entries, 0 to 10840
Data columns (total 12 columns):
APP_NAME              8889 non-null object
CATEGORY              8889 non-null object
OVERALL_RATING        8889 non-null float64
NO_OF_USER_REVIEWS    8889 non-null object
SIZE_OF_APP_MB        8889 non-null float64
TYPE                  8889 non-null object
PRICE                 8889 non-null float64
CONTENT_RATING        8889 non-null object
ANDROID_VER           8889 non-null object
USABILITY_STAR        8889 non-null float64
NO_OF_INSTALLS_CAT    8889 non-null category
Class_value           8889 non-null object
dtypes: category(1), float64(4), object(7)
memory usage: 984.9+ KB


In [73]:
googlePlayStore_catboost.Class_value.value_counts()

Low        3886
Average    2502
High       2501
Name: Class_value, dtype: int64

In [74]:
googlePlayStore_catboost.head(3)

Unnamed: 0,APP_NAME,CATEGORY,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,TYPE,PRICE,CONTENT_RATING,ANDROID_VER,USABILITY_STAR,NO_OF_INSTALLS_CAT,Class_value
0,Photo Editor & Candy Camera & Grid & ScrapBook,ENTERTAINMENT,4.1,159,19.0,Free,0.0,Everyone,4.0.3 and up,5.0,Low,High
1,Coloring book moana,ENTERTAINMENT,3.9,967,14.0,Free,0.0,Everyone,4.0.3 and up,5.0,Medium,High
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",ENTERTAINMENT,4.7,87510,8.7,Free,0.0,Everyone,4.0.3 and up,3.0,High,High


### 4. Training and Test Sets
#### Split the data you generated into a training set and a test set. Explain how you chose the proportions. How did
#### the knowledge of your dataset influence the size of the training dataset? Feel free to use external sources for
#### guidance on how to choose proportions, but do not forget to reference the sources.

#### Case 1: Random selection to determine the best split

#### When the train size is 0.5 ---> accuracy is 97.75%
#### When the train size is 0.6 ---> accuracy is 97.67%
#### When the train size is 0.7 ---> accuracy is 98.11%
#### When the train size is 0.75 ---> accuracy is 97.78%
#### When the train size is 0.8 ---> accuracy is 97.77%
#### The differences are small (at most 0.44 percentage points), but train size = 0.7 gave the highest accuracy in these runs, so it is used for the split.
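The test sets here hold roughly 1,800 to 4,400 rows. At those sizes, the sampling error on an accuracy near 97.8% is itself roughly ±0.4 to ±0.7 percentage points at 95% confidence. So the 0.44-point spread between the splits may be noise rather than a real effect. The sketch below performs that check and reuses the normal-approximation formula from the sample-size section. Its test-set sizes are approximate and assume the test set is whatever the training fraction leaves over from the 8,889 rows. `accuracy_margin` is a name introduced here.

```python
import math

def accuracy_margin(accuracy: float, n_test: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a test-set accuracy."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n_test)

n = 8889
# (train_size, reported accuracy) pairs taken from the list above.
for train_size, acc in [(0.5, 0.9775), (0.6, 0.9767), (0.7, 0.9811), (0.75, 0.9778), (0.8, 0.9777)]:
    n_test = n - math.floor(train_size * n)  # approximate test-set size
    print(f"train_size={train_size}: accuracy {acc:.2%} +/- {accuracy_margin(acc, n_test):.2%}")
```

All five intervals overlap. Comparing splits with repeated k-fold cross-validation, as done in the training steps, gives a steadier estimate than a single split.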

In [75]:
googlePlayStore_catboost.columns

Index(['APP_NAME', 'CATEGORY', 'OVERALL_RATING', 'NO_OF_USER_REVIEWS',
       'SIZE_OF_APP_MB', 'TYPE', 'PRICE', 'CONTENT_RATING', 'ANDROID_VER',
       'USABILITY_STAR', 'NO_OF_INSTALLS_CAT', 'Class_value'],
      dtype='object')

In [76]:

googlePlayStore_cat = googlePlayStore_catboost[['CATEGORY', 'OVERALL_RATING','NO_OF_USER_REVIEWS','SIZE_OF_APP_MB','USABILITY_STAR','PRICE','NO_OF_INSTALLS_CAT', 'Class_value']]

In [77]:
googlePlayStore_cat.shape

(10395, 8)

In [78]:
googlePlayStore_cat.head(3)

Unnamed: 0,CATEGORY,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,USABILITY_STAR,PRICE,NO_OF_INSTALLS_CAT,Class_value
0,ENTERTAINMENT,4.1,159,19.0,5.0,0.0,Low,High
1,ENTERTAINMENT,3.9,967,14.0,5.0,0.0,Medium,High
2,ENTERTAINMENT,4.7,87510,8.7,3.0,0.0,High,High


In [79]:
x_label = googlePlayStore_cat.drop('Class_value', axis=1)
y_label = googlePlayStore_cat.Class_value

In [80]:
x_label.shape

(10395, 7)

In [81]:
y_label.shape

(10395,)

In [82]:
y_label.replace(to_replace=dict(High=2, Average=1, Low=0), inplace=True)

In [83]:
x_label.replace(["NaN", 'NaT'], np.nan, inplace = True)
x_label = x_label.dropna()
x_label.head(3)

Unnamed: 0,CATEGORY,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,USABILITY_STAR,PRICE,NO_OF_INSTALLS_CAT
0,ENTERTAINMENT,4.1,159.0,19.0,5.0,0.0,Low
1,ENTERTAINMENT,3.9,967.0,14.0,5.0,0.0,Medium
2,ENTERTAINMENT,4.7,87510.0,8.7,3.0,0.0,High


In [84]:
x_label.shape

(8889, 7)

In [85]:
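# Without drop=True the old row labels are kept as an 'index' column, which get_dummies passes through as a model feature.
x_label.reset_index(inplace=True)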

In [86]:
x_label.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8889 entries, 0 to 8888
Data columns (total 8 columns):
index                 8889 non-null int64
CATEGORY              8889 non-null object
OVERALL_RATING        8889 non-null float64
NO_OF_USER_REVIEWS    8889 non-null float64
SIZE_OF_APP_MB        8889 non-null float64
USABILITY_STAR        8889 non-null float64
PRICE                 8889 non-null float64
NO_OF_INSTALLS_CAT    8889 non-null category
dtypes: category(1), float64(5), int64(1), object(1)
memory usage: 495.1+ KB


In [87]:
x_label_train = pd.get_dummies(x_label)

In [88]:
x_label_train.head(2)

Unnamed: 0,index,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,USABILITY_STAR,PRICE,CATEGORY_ENTERTAINMENT,CATEGORY_NON ENTERTAINMENT,NO_OF_INSTALLS_CAT_Low,NO_OF_INSTALLS_CAT_Medium,NO_OF_INSTALLS_CAT_High,NO_OF_INSTALLS_CAT_VeryHigh
0,0,4.1,159.0,19.0,5.0,0.0,1,0,1,0,0,0
1,1,3.9,967.0,14.0,5.0,0.0,1,0,0,1,0,0


In [89]:
x_label_train

Unnamed: 0,index,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,USABILITY_STAR,PRICE,CATEGORY_ENTERTAINMENT,CATEGORY_NON ENTERTAINMENT,NO_OF_INSTALLS_CAT_Low,NO_OF_INSTALLS_CAT_Medium,NO_OF_INSTALLS_CAT_High,NO_OF_INSTALLS_CAT_VeryHigh
0,0,4.1,159.0,19.000000,5.0,0.0,1,0,1,0,0,0
1,1,3.9,967.0,14.000000,5.0,0.0,1,0,0,1,0,0
2,2,4.7,87510.0,8.700000,3.0,0.0,1,0,0,0,1,0
3,3,4.5,215644.0,25.000000,3.0,0.0,1,0,0,0,0,1
4,4,4.3,967.0,2.800000,2.0,0.0,1,0,1,0,0,0
5,5,4.4,167.0,5.600000,2.0,0.0,1,0,1,0,0,0
6,6,3.8,178.0,19.000000,2.0,0.0,1,0,1,0,0,0
7,7,4.1,36815.0,29.000000,2.0,0.0,1,0,0,1,0,0
8,8,4.4,13791.0,33.000000,4.0,0.0,1,0,0,1,0,0
9,9,4.7,121.0,3.100000,5.0,0.0,1,0,1,0,0,0


In [90]:
#x_label_train = (x_label_train - x_label_train.mean()) / (x_label_train.max() - x_label_train.min())

In [91]:
x_label_train.head(3)

Unnamed: 0,index,OVERALL_RATING,NO_OF_USER_REVIEWS,SIZE_OF_APP_MB,USABILITY_STAR,PRICE,CATEGORY_ENTERTAINMENT,CATEGORY_NON ENTERTAINMENT,NO_OF_INSTALLS_CAT_Low,NO_OF_INSTALLS_CAT_Medium,NO_OF_INSTALLS_CAT_High,NO_OF_INSTALLS_CAT_VeryHigh
0,0,4.1,159.0,19.0,5.0,0.0,1,0,1,0,0,0
1,1,3.9,967.0,14.0,5.0,0.0,1,0,0,1,0,0
2,2,4.7,87510.0,8.7,3.0,0.0,1,0,0,0,1,0


In [92]:
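# Positions of the non-float columns (np.float was removed in NumPy 1.24; np.float64 gives the same check)
categorical_features = np.where(x_label_train.dtypes != np.float64)[0]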

In [93]:
y_label.dropna(inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(result)


In [94]:
x_label.shape

(8889, 8)

In [95]:
y_label.shape

(8889,)

In [96]:
y_label.head(3)

0    2.0
1    2.0
2    2.0
Name: Class_value, dtype: float64

In [97]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x_label_train, y_label, train_size=0.7, random_state=100)



#### 5. Train the Model
#### In this step, you will use the training dataset to train a classification model using the CatBoost library for
#### gradient boosting with support for categorical features (https://catboost.yandex/). CatBoost is available in three
#### forms: as a python library (could be installed using pip), as a package for R, the statistical computing software,
#### and as a command-line utility. Feel free to choose any of these options.
#### Install CatBoost and use it to train the classification model using the training dataset. Read the documentation
#### and look at the examples to grasp the idea of how to use the library.

#### Step 1: Selecting feature columns
####  ----------------------------------------------------------------------------
#### Determine the best model by trying different sets of feature columns and checking which gives the highest accuracy.
#### With feature columns 'CATEGORY', 'OVERALL_RATING', 'NO_OF_USER_REVIEWS', 'SIZE_OF_APP_MB', 'USABILITY_STAR', 'PRICE', 'NO_OF_INSTALLS_CAT':
#### Accuracy when using k-fold cross-validation is 97.48%
#### Accuracy when using a 70-30 train-test split is 97.54%

#### Step 2: GridSearchCV
#### --------------------------------------------------------------------------
#### 1. Used GridSearchCV to determine n_estimators and learning_rate. The best parameters found were
#### n_estimators = 350, learning_rate = 0.1, max_depth = 6, and these were added to CatBoostClassifier.
#### Accuracy is 97.39%.
#### 2. With n_estimators = 350, learning_rate = 0.1, max_depth = 9, accuracy is 97.73%.

#### Step 3: Normalized x_label values
#### --------------------------------------------------------------------------
#### Rescaled x_label to values between 0 and 1 and used it as input to train the model.
#### Accuracy: 97.73% (no change)
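GridSearchCV tries every combination of the listed hyperparameters and keeps the one with the best cross-validated score. The number of fits therefore grows multiplicatively: 3 learning rates x 3 depths x 3 estimator counts gives 27 candidates, each fitted once per CV fold. Below is a dependency-free sketch of that search loop. `score_fn` is a hypothetical stand-in for something like `cross_val_score(CatBoostClassifier(**params), X, y, cv=10).mean()`. The grid and the toy scoring function in the usage example are made up.

```python
from itertools import product

def grid_search(param_grid: dict, score_fn):
    """Evaluate every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. mean cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Made-up score that peaks at learning_rate=0.1, n_estimators=350.
    return -(params['learning_rate'] - 0.1) ** 2 - ((params['n_estimators'] - 350) / 1000) ** 2

best, score = grid_search({'learning_rate': [0.03, 0.1, 0.3], 'n_estimators': [100, 350]}, toy_score)
print(best)
```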

In [103]:
model = CatBoostClassifier(
    custom_loss=['Accuracy'],
    random_seed=100,
    loss_function='MultiClass',
    n_estimators=350,
    learning_rate=0.1,
    max_depth=6
)

In [104]:
# Step 1: accuracy for different feature sets (base = 'CATEGORY', 'OVERALL_RATING', 'NO_OF_USER_REVIEWS', 'SIZE_OF_APP_MB')
# base                                                       --> 57.62%
# base + 'USABILITY_STAR'                                    --> 89.77%
# base + 'USABILITY_STAR' + 'TYPE'                           --> 93.65%
# base + 'USABILITY_STAR' + 'PRICE'                          --> 93.75%
# base + 'USABILITY_STAR' + 'CONTENT_RATING'                 --> 89.80%
# base + 'USABILITY_STAR' + 'NO_OF_INSTALLS_CAT'             --> 93.82%
# base + 'USABILITY_STAR' + 'PRICE' + 'NO_OF_INSTALLS_CAT'   --> 97.78%
print(cross_val_score(model,x_label_train,y_label,cv=10,scoring="accuracy").mean())



0:	learn: -0.9704222	total: 43.6ms	remaining: 15.2s
1:	learn: -0.8580668	total: 76.8ms	remaining: 13.4s
2:	learn: -0.7717834	total: 110ms	remaining: 12.8s
3:	learn: -0.7007520	total: 143ms	remaining: 12.4s
4:	learn: -0.6382375	total: 178ms	remaining: 12.3s
5:	learn: -0.5856060	total: 228ms	remaining: 13.1s
6:	learn: -0.5432447	total: 267ms	remaining: 13.1s
7:	learn: -0.4996126	total: 308ms	remaining: 13.2s
8:	learn: -0.4629293	total: 343ms	remaining: 13s
9:	learn: -0.4287694	total: 376ms	remaining: 12.8s
10:	learn: -0.4010832	total: 410ms	remaining: 12.6s
11:	learn: -0.3790517	total: 445ms	remaining: 12.5s
12:	learn: -0.3553570	total: 479ms	remaining: 12.4s
13:	learn: -0.3337921	total: 522ms	remaining: 12.5s
14:	learn: -0.3152165	total: 560ms	remaining: 12.5s
15:	learn: -0.2975089	total: 595ms	remaining: 12.4s
16:	learn: -0.2834540	total: 629ms	remaining: 12.3s
17:	learn: -0.2691756	total: 664ms	remaining: 12.2s
18:	learn: -0.2555966	total: 703ms	remaining: 12.2s
19:	learn: -0.2430858	

159:	learn: -0.0329711	total: 6.03s	remaining: 7.17s
160:	learn: -0.0327872	total: 6.08s	remaining: 7.14s
161:	learn: -0.0325982	total: 6.12s	remaining: 7.11s
162:	learn: -0.0322789	total: 6.17s	remaining: 7.07s
163:	learn: -0.0320369	total: 6.21s	remaining: 7.05s
164:	learn: -0.0318514	total: 6.26s	remaining: 7.02s
165:	learn: -0.0317967	total: 6.3s	remaining: 6.99s
166:	learn: -0.0314990	total: 6.35s	remaining: 6.96s
167:	learn: -0.0311507	total: 6.4s	remaining: 6.94s
168:	learn: -0.0310056	total: 6.45s	remaining: 6.9s
169:	learn: -0.0307814	total: 6.49s	remaining: 6.87s
170:	learn: -0.0305377	total: 6.54s	remaining: 6.85s
171:	learn: -0.0303603	total: 6.59s	remaining: 6.82s
172:	learn: -0.0300130	total: 6.63s	remaining: 6.79s
173:	learn: -0.0298659	total: 6.68s	remaining: 6.76s
174:	learn: -0.0296870	total: 6.73s	remaining: 6.73s
175:	learn: -0.0295349	total: 6.78s	remaining: 6.7s
176:	learn: -0.0294248	total: 6.82s	remaining: 6.67s
177:	learn: -0.0290983	total: 6.86s	remaining: 6.6

317:	learn: -0.0161387	total: 13.5s	remaining: 1.36s
318:	learn: -0.0161113	total: 13.6s	remaining: 1.32s
319:	learn: -0.0160355	total: 13.6s	remaining: 1.28s
320:	learn: -0.0160103	total: 13.7s	remaining: 1.23s
321:	learn: -0.0159759	total: 13.7s	remaining: 1.19s
322:	learn: -0.0159268	total: 13.8s	remaining: 1.15s
323:	learn: -0.0159015	total: 13.8s	remaining: 1.11s
324:	learn: -0.0158770	total: 13.9s	remaining: 1.06s
325:	learn: -0.0158366	total: 13.9s	remaining: 1.02s
326:	learn: -0.0157752	total: 14s	remaining: 982ms
327:	learn: -0.0157277	total: 14s	remaining: 940ms
328:	learn: -0.0157001	total: 14.1s	remaining: 898ms
329:	learn: -0.0156625	total: 14.1s	remaining: 856ms
330:	learn: -0.0156318	total: 14.2s	remaining: 813ms
331:	learn: -0.0156085	total: 14.2s	remaining: 770ms
332:	learn: -0.0155737	total: 14.3s	remaining: 728ms
333:	learn: -0.0155342	total: 14.3s	remaining: 686ms
334:	learn: -0.0154911	total: 14.4s	remaining: 643ms
335:	learn: -0.0154269	total: 14.4s	remaining: 601



0:	learn: -0.9575756	total: 51.1ms	remaining: 17.8s
1:	learn: -0.8552035	total: 86.7ms	remaining: 15.1s
2:	learn: -0.7641639	total: 130ms	remaining: 15s
3:	learn: -0.6926778	total: 179ms	remaining: 15.5s
4:	learn: -0.6338848	total: 226ms	remaining: 15.6s
5:	learn: -0.5817022	total: 276ms	remaining: 15.8s
6:	learn: -0.5354797	total: 323ms	remaining: 15.8s
7:	learn: -0.4958029	total: 365ms	remaining: 15.6s
8:	learn: -0.4576926	total: 419ms	remaining: 15.9s
9:	learn: -0.4266598	total: 466ms	remaining: 15.8s
10:	learn: -0.3998292	total: 518ms	remaining: 16s
11:	learn: -0.3737551	total: 573ms	remaining: 16.1s
12:	learn: -0.3513110	total: 625ms	remaining: 16.2s
13:	learn: -0.3294541	total: 673ms	remaining: 16.2s
14:	learn: -0.3118831	total: 719ms	remaining: 16s
15:	learn: -0.2960765	total: 765ms	remaining: 16s
16:	learn: -0.2818477	total: 809ms	remaining: 15.8s
17:	learn: -0.2678380	total: 851ms	remaining: 15.7s
18:	learn: -0.2552766	total: 892ms	remaining: 15.5s
19:	learn: -0.2425596	total:

161:	learn: -0.0306126	total: 7.11s	remaining: 8.25s
162:	learn: -0.0304353	total: 7.16s	remaining: 8.21s
163:	learn: -0.0303664	total: 7.2s	remaining: 8.16s
164:	learn: -0.0302277	total: 7.24s	remaining: 8.12s
165:	learn: -0.0300399	total: 7.29s	remaining: 8.07s
166:	learn: -0.0299916	total: 7.35s	remaining: 8.05s
167:	learn: -0.0298212	total: 7.4s	remaining: 8.01s
168:	learn: -0.0297890	total: 7.44s	remaining: 7.97s
169:	learn: -0.0294356	total: 7.49s	remaining: 7.92s
170:	learn: -0.0290423	total: 7.53s	remaining: 7.88s
171:	learn: -0.0286065	total: 7.57s	remaining: 7.84s
172:	learn: -0.0283593	total: 7.62s	remaining: 7.8s
173:	learn: -0.0282255	total: 7.66s	remaining: 7.75s
174:	learn: -0.0281485	total: 7.7s	remaining: 7.7s
175:	learn: -0.0280907	total: 7.75s	remaining: 7.66s
176:	learn: -0.0279836	total: 7.79s	remaining: 7.62s
177:	learn: -0.0277467	total: 7.84s	remaining: 7.58s
178:	learn: -0.0276796	total: 7.89s	remaining: 7.54s
179:	learn: -0.0276023	total: 7.93s	remaining: 7.49

319:	learn: -0.0150581	total: 14.6s	remaining: 1.37s
320:	learn: -0.0150009	total: 14.7s	remaining: 1.32s
321:	learn: -0.0149504	total: 14.7s	remaining: 1.28s
322:	learn: -0.0149345	total: 14.8s	remaining: 1.24s
323:	learn: -0.0148818	total: 14.8s	remaining: 1.19s
324:	learn: -0.0148536	total: 14.9s	remaining: 1.15s
325:	learn: -0.0148319	total: 14.9s	remaining: 1.1s
326:	learn: -0.0147947	total: 15s	remaining: 1.05s
327:	learn: -0.0147626	total: 15.1s	remaining: 1.01s
328:	learn: -0.0147332	total: 15.1s	remaining: 965ms
329:	learn: -0.0146569	total: 15.2s	remaining: 919ms
330:	learn: -0.0145336	total: 15.2s	remaining: 874ms
331:	learn: -0.0144913	total: 15.3s	remaining: 829ms
332:	learn: -0.0144507	total: 15.3s	remaining: 783ms
333:	learn: -0.0144084	total: 15.4s	remaining: 737ms
334:	learn: -0.0143653	total: 15.4s	remaining: 691ms
335:	learn: -0.0143234	total: 15.5s	remaining: 646ms
336:	learn: -0.0142796	total: 15.6s	remaining: 600ms
337:	learn: -0.0142230	total: 15.6s	remaining: 55



0:	learn: -0.9644690	total: 45ms	remaining: 15.7s
1:	learn: -0.8570692	total: 88.1ms	remaining: 15.3s
2:	learn: -0.7675438	total: 128ms	remaining: 14.8s
3:	learn: -0.6931675	total: 161ms	remaining: 13.9s
4:	learn: -0.6316193	total: 195ms	remaining: 13.5s
5:	learn: -0.5799562	total: 234ms	remaining: 13.4s
6:	learn: -0.5351082	total: 274ms	remaining: 13.4s
7:	learn: -0.4925615	total: 311ms	remaining: 13.3s
8:	learn: -0.4577640	total: 347ms	remaining: 13.1s
9:	learn: -0.4268703	total: 384ms	remaining: 13.1s
10:	learn: -0.4001432	total: 422ms	remaining: 13s
11:	learn: -0.3746718	total: 465ms	remaining: 13.1s
12:	learn: -0.3529082	total: 518ms	remaining: 13.4s
13:	learn: -0.3324349	total: 567ms	remaining: 13.6s
14:	learn: -0.3133735	total: 617ms	remaining: 13.8s
15:	learn: -0.2970075	total: 667ms	remaining: 13.9s
16:	learn: -0.2836032	total: 724ms	remaining: 14.2s
17:	learn: -0.2680223	total: 775ms	remaining: 14.3s
18:	learn: -0.2543391	total: 827ms	remaining: 14.4s
19:	learn: -0.2429321	to

161:	learn: -0.0306643	total: 7.71s	remaining: 8.95s
162:	learn: -0.0305921	total: 7.76s	remaining: 8.9s
163:	learn: -0.0304935	total: 7.81s	remaining: 8.85s
164:	learn: -0.0303267	total: 7.85s	remaining: 8.8s
165:	learn: -0.0299547	total: 7.89s	remaining: 8.75s
166:	learn: -0.0297789	total: 7.94s	remaining: 8.7s
167:	learn: -0.0296542	total: 7.99s	remaining: 8.65s
168:	learn: -0.0295354	total: 8.03s	remaining: 8.6s
169:	learn: -0.0293323	total: 8.07s	remaining: 8.55s
170:	learn: -0.0292459	total: 8.12s	remaining: 8.5s
171:	learn: -0.0291401	total: 8.16s	remaining: 8.45s
172:	learn: -0.0288616	total: 8.21s	remaining: 8.4s
173:	learn: -0.0288589	total: 8.24s	remaining: 8.33s
174:	learn: -0.0287522	total: 8.28s	remaining: 8.28s
175:	learn: -0.0284206	total: 8.32s	remaining: 8.23s
176:	learn: -0.0282703	total: 8.37s	remaining: 8.18s
177:	learn: -0.0281069	total: 8.42s	remaining: 8.13s
178:	learn: -0.0280518	total: 8.47s	remaining: 8.09s
179:	learn: -0.0278710	total: 8.51s	remaining: 8.04s

318:	learn: -0.0153346	total: 15.8s	remaining: 1.53s
319:	learn: -0.0153033	total: 15.9s	remaining: 1.49s
320:	learn: -0.0152530	total: 15.9s	remaining: 1.44s
321:	learn: -0.0152338	total: 16s	remaining: 1.39s
322:	learn: -0.0151643	total: 16s	remaining: 1.34s
323:	learn: -0.0150605	total: 16s	remaining: 1.29s
324:	learn: -0.0150384	total: 16.1s	remaining: 1.24s
325:	learn: -0.0150379	total: 16.1s	remaining: 1.19s
326:	learn: -0.0150073	total: 16.2s	remaining: 1.14s
327:	learn: -0.0149841	total: 16.2s	remaining: 1.09s
328:	learn: -0.0148891	total: 16.3s	remaining: 1.04s
329:	learn: -0.0148019	total: 16.4s	remaining: 991ms
330:	learn: -0.0147826	total: 16.4s	remaining: 941ms
331:	learn: -0.0146484	total: 16.4s	remaining: 892ms
332:	learn: -0.0146092	total: 16.5s	remaining: 842ms
333:	learn: -0.0145891	total: 16.5s	remaining: 792ms
334:	learn: -0.0145567	total: 16.6s	remaining: 743ms
335:	learn: -0.0145354	total: 16.6s	remaining: 694ms
336:	learn: -0.0144984	total: 16.7s	remaining: 644ms



0:	learn: -0.9600750	total: 35.4ms	remaining: 12.3s
1:	learn: -0.8501155	total: 72.8ms	remaining: 12.7s
2:	learn: -0.7628555	total: 106ms	remaining: 12.3s
3:	learn: -0.6886991	total: 143ms	remaining: 12.4s
4:	learn: -0.6294984	total: 182ms	remaining: 12.6s
5:	learn: -0.5774557	total: 232ms	remaining: 13.3s
6:	learn: -0.5324956	total: 277ms	remaining: 13.6s
7:	learn: -0.4934626	total: 325ms	remaining: 13.9s
8:	learn: -0.4578175	total: 378ms	remaining: 14.3s
9:	learn: -0.4267458	total: 436ms	remaining: 14.8s
10:	learn: -0.3990855	total: 498ms	remaining: 15.3s
11:	learn: -0.3746609	total: 548ms	remaining: 15.4s
12:	learn: -0.3539609	total: 596ms	remaining: 15.4s
13:	learn: -0.3342478	total: 650ms	remaining: 15.6s
14:	learn: -0.3145868	total: 699ms	remaining: 15.6s
15:	learn: -0.3005567	total: 742ms	remaining: 15.5s
16:	learn: -0.2857245	total: 784ms	remaining: 15.4s
17:	learn: -0.2696824	total: 827ms	remaining: 15.3s
18:	learn: -0.2563903	total: 875ms	remaining: 15.2s
19:	learn: -0.243100

161:	learn: -0.0308534	total: 7.93s	remaining: 9.21s
162:	learn: -0.0306130	total: 8.01s	remaining: 9.19s
163:	learn: -0.0304385	total: 8.09s	remaining: 9.17s
164:	learn: -0.0303982	total: 8.16s	remaining: 9.15s
165:	learn: -0.0301204	total: 8.23s	remaining: 9.12s
166:	learn: -0.0299156	total: 8.28s	remaining: 9.08s
167:	learn: -0.0297709	total: 8.35s	remaining: 9.04s
168:	learn: -0.0296918	total: 8.44s	remaining: 9.04s
169:	learn: -0.0294088	total: 8.53s	remaining: 9.03s
170:	learn: -0.0292856	total: 8.65s	remaining: 9.05s
171:	learn: -0.0292112	total: 8.76s	remaining: 9.06s
172:	learn: -0.0290868	total: 8.82s	remaining: 9.03s
173:	learn: -0.0289926	total: 8.9s	remaining: 9s
174:	learn: -0.0289889	total: 8.96s	remaining: 8.96s
175:	learn: -0.0289127	total: 9.05s	remaining: 8.95s
176:	learn: -0.0286883	total: 9.12s	remaining: 8.91s
177:	learn: -0.0285800	total: 9.19s	remaining: 8.88s
178:	learn: -0.0283959	total: 9.27s	remaining: 8.85s
179:	learn: -0.0282918	total: 9.32s	remaining: 8.8

319:	learn: -0.0157929	total: 17.2s	remaining: 1.62s
320:	learn: -0.0157682	total: 17.3s	remaining: 1.56s
321:	learn: -0.0157269	total: 17.3s	remaining: 1.51s
322:	learn: -0.0156897	total: 17.4s	remaining: 1.45s
323:	learn: -0.0156518	total: 17.4s	remaining: 1.4s
324:	learn: -0.0156061	total: 17.5s	remaining: 1.35s
325:	learn: -0.0155826	total: 17.6s	remaining: 1.29s
326:	learn: -0.0155581	total: 17.6s	remaining: 1.24s
327:	learn: -0.0155277	total: 17.7s	remaining: 1.18s
328:	learn: -0.0154932	total: 17.7s	remaining: 1.13s
329:	learn: -0.0154754	total: 17.8s	remaining: 1.08s
330:	learn: -0.0154442	total: 17.8s	remaining: 1.02s
331:	learn: -0.0153855	total: 17.9s	remaining: 968ms
332:	learn: -0.0153275	total: 17.9s	remaining: 914ms
333:	learn: -0.0152195	total: 18s	remaining: 860ms
334:	learn: -0.0151727	total: 18s	remaining: 807ms
335:	learn: -0.0151475	total: 18.1s	remaining: 753ms
336:	learn: -0.0150985	total: 18.1s	remaining: 700ms
337:	learn: -0.0149526	total: 18.2s	remaining: 646m



0:	learn: -0.9599748	total: 50.1ms	remaining: 17.5s
1:	learn: -0.8511654	total: 86.6ms	remaining: 15.1s
2:	learn: -0.7629104	total: 133ms	remaining: 15.3s
3:	learn: -0.6888705	total: 173ms	remaining: 15s
4:	learn: -0.6301850	total: 214ms	remaining: 14.8s
5:	learn: -0.5785237	total: 257ms	remaining: 14.7s
6:	learn: -0.5355853	total: 304ms	remaining: 14.9s
7:	learn: -0.4964155	total: 349ms	remaining: 14.9s
8:	learn: -0.4613347	total: 393ms	remaining: 14.9s
9:	learn: -0.4308000	total: 436ms	remaining: 14.8s
10:	learn: -0.4032951	total: 483ms	remaining: 14.9s
11:	learn: -0.3778794	total: 532ms	remaining: 15s
12:	learn: -0.3562561	total: 578ms	remaining: 15s
13:	learn: -0.3339319	total: 619ms	remaining: 14.9s
14:	learn: -0.3159560	total: 661ms	remaining: 14.8s
15:	learn: -0.3008167	total: 702ms	remaining: 14.6s
16:	learn: -0.2861254	total: 743ms	remaining: 14.6s
17:	learn: -0.2716394	total: 787ms	remaining: 14.5s
18:	learn: -0.2586404	total: 832ms	remaining: 14.5s
19:	learn: -0.2482463	tota

158:	learn: -0.0319447	total: 7.92s	remaining: 9.51s
159:	learn: -0.0317175	total: 7.97s	remaining: 9.46s
160:	learn: -0.0315677	total: 8.01s	remaining: 9.41s
161:	learn: -0.0314108	total: 8.06s	remaining: 9.36s
162:	learn: -0.0311076	total: 8.12s	remaining: 9.31s
163:	learn: -0.0309382	total: 8.17s	remaining: 9.27s
164:	learn: -0.0308608	total: 8.22s	remaining: 9.21s
165:	learn: -0.0305960	total: 8.27s	remaining: 9.16s
166:	learn: -0.0304857	total: 8.31s	remaining: 9.11s
167:	learn: -0.0303258	total: 8.36s	remaining: 9.06s
168:	learn: -0.0302107	total: 8.41s	remaining: 9.01s
169:	learn: -0.0300843	total: 8.46s	remaining: 8.96s
170:	learn: -0.0299044	total: 8.5s	remaining: 8.9s
171:	learn: -0.0298291	total: 8.55s	remaining: 8.85s
172:	learn: -0.0296442	total: 8.6s	remaining: 8.8s
173:	learn: -0.0296402	total: 8.63s	remaining: 8.73s
174:	learn: -0.0295478	total: 8.68s	remaining: 8.68s
175:	learn: -0.0293143	total: 8.73s	remaining: 8.63s
176:	learn: -0.0291869	total: 8.78s	remaining: 8.5

317:	learn: -0.0156568	total: 16.1s	remaining: 1.62s
318:	learn: -0.0155384	total: 16.2s	remaining: 1.57s
319:	learn: -0.0154861	total: 16.3s	remaining: 1.52s
320:	learn: -0.0154281	total: 16.3s	remaining: 1.48s
321:	learn: -0.0153722	total: 16.4s	remaining: 1.43s
322:	learn: -0.0152653	total: 16.4s	remaining: 1.37s
323:	learn: -0.0152061	total: 16.5s	remaining: 1.32s
324:	learn: -0.0151510	total: 16.6s	remaining: 1.27s
325:	learn: -0.0151276	total: 16.6s	remaining: 1.22s
326:	learn: -0.0150951	total: 16.7s	remaining: 1.17s
327:	learn: -0.0150748	total: 16.7s	remaining: 1.12s
328:	learn: -0.0150475	total: 16.8s	remaining: 1.07s
329:	learn: -0.0149661	total: 16.9s	remaining: 1.02s
330:	learn: -0.0149108	total: 16.9s	remaining: 973ms
331:	learn: -0.0148439	total: 17s	remaining: 922ms
332:	learn: -0.0148294	total: 17.1s	remaining: 871ms
333:	learn: -0.0148029	total: 17.1s	remaining: 820ms
334:	learn: -0.0147688	total: 17.2s	remaining: 770ms
335:	learn: -0.0147545	total: 17.2s	remaining: 7



0:	learn: -0.9595424	total: 51.3ms	remaining: 17.9s
1:	learn: -0.8512669	total: 102ms	remaining: 17.8s
2:	learn: -0.7626042	total: 157ms	remaining: 18.1s
3:	learn: -0.6884842	total: 212ms	remaining: 18.4s
4:	learn: -0.6285192	total: 277ms	remaining: 19.1s
5:	learn: -0.5769989	total: 328ms	remaining: 18.8s
6:	learn: -0.5328180	total: 407ms	remaining: 19.9s
7:	learn: -0.4924826	total: 476ms	remaining: 20.3s
8:	learn: -0.4600190	total: 534ms	remaining: 20.2s
9:	learn: -0.4275729	total: 593ms	remaining: 20.2s
10:	learn: -0.3999923	total: 655ms	remaining: 20.2s
11:	learn: -0.3741484	total: 720ms	remaining: 20.3s
12:	learn: -0.3536110	total: 776ms	remaining: 20.1s
13:	learn: -0.3330159	total: 829ms	remaining: 19.9s
14:	learn: -0.3145783	total: 882ms	remaining: 19.7s
15:	learn: -0.2996223	total: 941ms	remaining: 19.7s
16:	learn: -0.2844343	total: 1s	remaining: 19.6s
17:	learn: -0.2720535	total: 1.06s	remaining: 19.6s
18:	learn: -0.2578691	total: 1.13s	remaining: 19.6s
19:	learn: -0.2456400	to

161:	learn: -0.0302951	total: 8.14s	remaining: 9.44s
162:	learn: -0.0301889	total: 8.18s	remaining: 9.38s
163:	learn: -0.0299479	total: 8.22s	remaining: 9.32s
164:	learn: -0.0297976	total: 8.27s	remaining: 9.27s
165:	learn: -0.0296487	total: 8.31s	remaining: 9.21s
166:	learn: -0.0294377	total: 8.36s	remaining: 9.16s
167:	learn: -0.0292579	total: 8.41s	remaining: 9.12s
168:	learn: -0.0291134	total: 8.46s	remaining: 9.06s
169:	learn: -0.0289449	total: 8.5s	remaining: 9s
170:	learn: -0.0287244	total: 8.54s	remaining: 8.94s
171:	learn: -0.0282488	total: 8.59s	remaining: 8.89s
172:	learn: -0.0281434	total: 8.63s	remaining: 8.83s
173:	learn: -0.0279691	total: 8.7s	remaining: 8.8s
174:	learn: -0.0279660	total: 8.73s	remaining: 8.73s
175:	learn: -0.0278237	total: 8.79s	remaining: 8.69s
176:	learn: -0.0276602	total: 8.84s	remaining: 8.64s
177:	learn: -0.0275327	total: 8.89s	remaining: 8.59s
178:	learn: -0.0272691	total: 8.94s	remaining: 8.54s
179:	learn: -0.0271152	total: 8.99s	remaining: 8.49s

319:	learn: -0.0146235	total: 15.7s	remaining: 1.47s
320:	learn: -0.0145939	total: 15.8s	remaining: 1.42s
321:	learn: -0.0145630	total: 15.8s	remaining: 1.38s
322:	learn: -0.0145271	total: 15.9s	remaining: 1.33s
323:	learn: -0.0144395	total: 15.9s	remaining: 1.28s
324:	learn: -0.0143915	total: 16s	remaining: 1.23s
325:	learn: -0.0143772	total: 16s	remaining: 1.18s
326:	learn: -0.0143315	total: 16.1s	remaining: 1.13s
327:	learn: -0.0142983	total: 16.1s	remaining: 1.08s
328:	learn: -0.0141755	total: 16.1s	remaining: 1.03s
329:	learn: -0.0141473	total: 16.2s	remaining: 981ms
330:	learn: -0.0141326	total: 16.2s	remaining: 932ms
331:	learn: -0.0140818	total: 16.3s	remaining: 883ms
332:	learn: -0.0140576	total: 16.3s	remaining: 834ms
333:	learn: -0.0140310	total: 16.4s	remaining: 785ms
334:	learn: -0.0140123	total: 16.4s	remaining: 736ms
335:	learn: -0.0139617	total: 16.5s	remaining: 687ms
336:	learn: -0.0139107	total: 16.5s	remaining: 638ms
337:	learn: -0.0138798	total: 16.6s	remaining: 589



0:	learn: -0.9600329	total: 41.4ms	remaining: 14.5s
1:	learn: -0.8493813	total: 80.6ms	remaining: 14s
2:	learn: -0.7610726	total: 141ms	remaining: 16.3s
3:	learn: -0.6848166	total: 179ms	remaining: 15.5s
4:	learn: -0.6261047	total: 216ms	remaining: 14.9s
5:	learn: -0.5745671	total: 255ms	remaining: 14.6s
6:	learn: -0.5302043	total: 293ms	remaining: 14.4s
7:	learn: -0.4920261	total: 331ms	remaining: 14.1s
8:	learn: -0.4589904	total: 365ms	remaining: 13.8s
9:	learn: -0.4289649	total: 401ms	remaining: 13.6s
10:	learn: -0.4001331	total: 437ms	remaining: 13.5s
11:	learn: -0.3754830	total: 475ms	remaining: 13.4s
12:	learn: -0.3543804	total: 517ms	remaining: 13.4s
13:	learn: -0.3346487	total: 560ms	remaining: 13.4s
14:	learn: -0.3155868	total: 597ms	remaining: 13.3s
15:	learn: -0.2991907	total: 636ms	remaining: 13.3s
16:	learn: -0.2817304	total: 677ms	remaining: 13.3s
17:	learn: -0.2668233	total: 721ms	remaining: 13.3s
18:	learn: -0.2545103	total: 763ms	remaining: 13.3s
19:	learn: -0.2426142	

159:	learn: -0.0304973	total: 7.37s	remaining: 8.76s
160:	learn: -0.0302659	total: 7.42s	remaining: 8.71s
161:	learn: -0.0300567	total: 7.47s	remaining: 8.66s
162:	learn: -0.0299310	total: 7.51s	remaining: 8.62s
163:	learn: -0.0296605	total: 7.56s	remaining: 8.57s
164:	learn: -0.0295602	total: 7.61s	remaining: 8.53s
165:	learn: -0.0295590	total: 7.64s	remaining: 8.47s
166:	learn: -0.0292244	total: 7.69s	remaining: 8.43s
167:	learn: -0.0290732	total: 7.74s	remaining: 8.38s
168:	learn: -0.0288358	total: 7.79s	remaining: 8.34s
169:	learn: -0.0287618	total: 7.83s	remaining: 8.29s
170:	learn: -0.0286266	total: 7.88s	remaining: 8.25s
171:	learn: -0.0284076	total: 7.94s	remaining: 8.22s
172:	learn: -0.0283008	total: 7.99s	remaining: 8.18s
173:	learn: -0.0281080	total: 8.05s	remaining: 8.14s
174:	learn: -0.0279580	total: 8.1s	remaining: 8.1s
175:	learn: -0.0276595	total: 8.14s	remaining: 8.05s
176:	learn: -0.0275418	total: 8.19s	remaining: 8.01s
177:	learn: -0.0274947	total: 8.24s	remaining: 7

318:	learn: -0.0149322	total: 14.8s	remaining: 1.44s
319:	learn: -0.0149026	total: 14.9s	remaining: 1.39s
320:	learn: -0.0148492	total: 14.9s	remaining: 1.35s
321:	learn: -0.0147233	total: 15s	remaining: 1.3s
322:	learn: -0.0146846	total: 15s	remaining: 1.25s
323:	learn: -0.0145805	total: 15.1s	remaining: 1.21s
324:	learn: -0.0145275	total: 15.1s	remaining: 1.16s
325:	learn: -0.0144975	total: 15.1s	remaining: 1.11s
326:	learn: -0.0144109	total: 15.2s	remaining: 1.07s
327:	learn: -0.0143754	total: 15.2s	remaining: 1.02s
328:	learn: -0.0143319	total: 15.3s	remaining: 975ms
329:	learn: -0.0142945	total: 15.3s	remaining: 929ms
330:	learn: -0.0142841	total: 15.4s	remaining: 882ms
331:	learn: -0.0142160	total: 15.4s	remaining: 835ms
332:	learn: -0.0141815	total: 15.4s	remaining: 789ms
333:	learn: -0.0141476	total: 15.5s	remaining: 742ms
334:	learn: -0.0140340	total: 15.5s	remaining: 696ms
335:	learn: -0.0140199	total: 15.6s	remaining: 649ms
336:	learn: -0.0139889	total: 15.6s	remaining: 603m



0:	learn: -0.9602264	total: 45.5ms	remaining: 15.9s
1:	learn: -0.8523815	total: 80ms	remaining: 13.9s
2:	learn: -0.7627923	total: 115ms	remaining: 13.3s
3:	learn: -0.6900282	total: 149ms	remaining: 12.9s
4:	learn: -0.6305502	total: 183ms	remaining: 12.7s
5:	learn: -0.5792516	total: 224ms	remaining: 12.9s
6:	learn: -0.5335482	total: 262ms	remaining: 12.8s
7:	learn: -0.4937455	total: 298ms	remaining: 12.7s
8:	learn: -0.4589651	total: 335ms	remaining: 12.7s
9:	learn: -0.4297709	total: 370ms	remaining: 12.6s
10:	learn: -0.4007300	total: 406ms	remaining: 12.5s
11:	learn: -0.3786892	total: 441ms	remaining: 12.4s
12:	learn: -0.3577415	total: 477ms	remaining: 12.4s
13:	learn: -0.3373575	total: 515ms	remaining: 12.4s
14:	learn: -0.3184421	total: 551ms	remaining: 12.3s
15:	learn: -0.3018356	total: 586ms	remaining: 12.2s
16:	learn: -0.2868657	total: 624ms	remaining: 12.2s
17:	learn: -0.2722497	total: 660ms	remaining: 12.2s
18:	learn: -0.2606193	total: 696ms	remaining: 12.1s
19:	learn: -0.2486473	

158:	learn: -0.0312562	total: 6.75s	remaining: 8.11s
159:	learn: -0.0312161	total: 6.79s	remaining: 8.07s
160:	learn: -0.0310572	total: 6.84s	remaining: 8.03s
161:	learn: -0.0308987	total: 6.88s	remaining: 7.99s
162:	learn: -0.0306970	total: 6.93s	remaining: 7.95s
163:	learn: -0.0305404	total: 6.97s	remaining: 7.91s
164:	learn: -0.0302830	total: 7.02s	remaining: 7.87s
165:	learn: -0.0301644	total: 7.07s	remaining: 7.83s
166:	learn: -0.0300611	total: 7.11s	remaining: 7.79s
167:	learn: -0.0298886	total: 7.16s	remaining: 7.75s
168:	learn: -0.0297724	total: 7.2s	remaining: 7.71s
169:	learn: -0.0295851	total: 7.25s	remaining: 7.68s
170:	learn: -0.0292649	total: 7.3s	remaining: 7.64s
171:	learn: -0.0291355	total: 7.34s	remaining: 7.6s
172:	learn: -0.0288650	total: 7.38s	remaining: 7.56s
173:	learn: -0.0287331	total: 7.43s	remaining: 7.51s
174:	learn: -0.0284458	total: 7.48s	remaining: 7.48s
175:	learn: -0.0283328	total: 7.52s	remaining: 7.44s
176:	learn: -0.0282087	total: 7.58s	remaining: 7.

318:	learn: -0.0154297	total: 14s	remaining: 1.36s
319:	learn: -0.0153971	total: 14.1s	remaining: 1.32s
320:	learn: -0.0153610	total: 14.1s	remaining: 1.27s
321:	learn: -0.0153332	total: 14.2s	remaining: 1.23s
322:	learn: -0.0152728	total: 14.2s	remaining: 1.19s
323:	learn: -0.0152325	total: 14.2s	remaining: 1.14s
324:	learn: -0.0151774	total: 14.3s	remaining: 1.1s
325:	learn: -0.0151515	total: 14.3s	remaining: 1.05s
326:	learn: -0.0151265	total: 14.4s	remaining: 1.01s
327:	learn: -0.0151089	total: 14.4s	remaining: 967ms
328:	learn: -0.0150399	total: 14.5s	remaining: 923ms
329:	learn: -0.0149389	total: 14.5s	remaining: 879ms
330:	learn: -0.0149351	total: 14.5s	remaining: 835ms
331:	learn: -0.0148470	total: 14.6s	remaining: 791ms
332:	learn: -0.0147723	total: 14.6s	remaining: 747ms
333:	learn: -0.0147364	total: 14.7s	remaining: 703ms
334:	learn: -0.0147232	total: 14.7s	remaining: 660ms
335:	learn: -0.0146582	total: 14.8s	remaining: 616ms
336:	learn: -0.0146089	total: 14.8s	remaining: 57



0:	learn: -0.9599719	total: 43.6ms	remaining: 15.2s
1:	learn: -0.8518735	total: 76.3ms	remaining: 13.3s
2:	learn: -0.7623442	total: 109ms	remaining: 12.7s
3:	learn: -0.6873741	total: 143ms	remaining: 12.4s
4:	learn: -0.6291073	total: 177ms	remaining: 12.2s
5:	learn: -0.5789243	total: 231ms	remaining: 13.2s
6:	learn: -0.5343115	total: 266ms	remaining: 13s
7:	learn: -0.4960114	total: 302ms	remaining: 12.9s
8:	learn: -0.4623850	total: 338ms	remaining: 12.8s
9:	learn: -0.4319792	total: 376ms	remaining: 12.8s
10:	learn: -0.4028334	total: 415ms	remaining: 12.8s
11:	learn: -0.3780109	total: 456ms	remaining: 12.8s
12:	learn: -0.3558645	total: 499ms	remaining: 12.9s
13:	learn: -0.3356462	total: 538ms	remaining: 12.9s
14:	learn: -0.3170475	total: 579ms	remaining: 12.9s
15:	learn: -0.3005892	total: 620ms	remaining: 12.9s
16:	learn: -0.2842094	total: 660ms	remaining: 12.9s
17:	learn: -0.2694211	total: 701ms	remaining: 12.9s
18:	learn: -0.2560133	total: 741ms	remaining: 12.9s
19:	learn: -0.2450063	

160:	learn: -0.0310770	total: 7.69s	remaining: 9.03s
161:	learn: -0.0309111	total: 7.74s	remaining: 8.98s
162:	learn: -0.0307662	total: 7.78s	remaining: 8.93s
163:	learn: -0.0306685	total: 7.83s	remaining: 8.88s
164:	learn: -0.0304495	total: 7.87s	remaining: 8.83s
165:	learn: -0.0302461	total: 7.92s	remaining: 8.78s
166:	learn: -0.0299402	total: 7.96s	remaining: 8.73s
167:	learn: -0.0296945	total: 8.01s	remaining: 8.67s
168:	learn: -0.0294207	total: 8.06s	remaining: 8.63s
169:	learn: -0.0292780	total: 8.1s	remaining: 8.58s
170:	learn: -0.0290441	total: 8.14s	remaining: 8.53s
171:	learn: -0.0288265	total: 8.19s	remaining: 8.47s
172:	learn: -0.0286898	total: 8.23s	remaining: 8.42s
173:	learn: -0.0286114	total: 8.28s	remaining: 8.37s
174:	learn: -0.0282497	total: 8.32s	remaining: 8.32s
175:	learn: -0.0282169	total: 8.36s	remaining: 8.27s
176:	learn: -0.0280850	total: 8.4s	remaining: 8.21s
177:	learn: -0.0279558	total: 8.45s	remaining: 8.16s
178:	learn: -0.0278921	total: 8.49s	remaining: 8

317:	learn: -0.0156271	total: 15s	remaining: 1.51s
318:	learn: -0.0155303	total: 15s	remaining: 1.46s
319:	learn: -0.0154978	total: 15.1s	remaining: 1.41s
320:	learn: -0.0154783	total: 15.1s	remaining: 1.37s
321:	learn: -0.0154636	total: 15.2s	remaining: 1.32s
322:	learn: -0.0154313	total: 15.2s	remaining: 1.27s
323:	learn: -0.0153851	total: 15.3s	remaining: 1.23s
324:	learn: -0.0153605	total: 15.3s	remaining: 1.18s
325:	learn: -0.0152915	total: 15.4s	remaining: 1.13s
326:	learn: -0.0152446	total: 15.4s	remaining: 1.08s
327:	learn: -0.0151935	total: 15.5s	remaining: 1.04s
328:	learn: -0.0151125	total: 15.5s	remaining: 990ms
329:	learn: -0.0149928	total: 15.6s	remaining: 943ms
330:	learn: -0.0149493	total: 15.6s	remaining: 896ms
331:	learn: -0.0149302	total: 15.6s	remaining: 848ms
332:	learn: -0.0147853	total: 15.7s	remaining: 801ms
333:	learn: -0.0147587	total: 15.7s	remaining: 754ms
334:	learn: -0.0147240	total: 15.8s	remaining: 708ms
335:	learn: -0.0146952	total: 15.9s	remaining: 661



0:	learn: -0.9598067	total: 34.2ms	remaining: 11.9s
1:	learn: -0.8484653	total: 71.2ms	remaining: 12.4s
2:	learn: -0.7619022	total: 105ms	remaining: 12.2s
3:	learn: -0.6871602	total: 142ms	remaining: 12.3s
4:	learn: -0.6293838	total: 178ms	remaining: 12.3s
5:	learn: -0.5790671	total: 215ms	remaining: 12.3s
6:	learn: -0.5343461	total: 254ms	remaining: 12.4s
7:	learn: -0.4959361	total: 290ms	remaining: 12.4s
8:	learn: -0.4622444	total: 325ms	remaining: 12.3s
9:	learn: -0.4298269	total: 367ms	remaining: 12.5s
10:	learn: -0.4012994	total: 403ms	remaining: 12.4s
11:	learn: -0.3766461	total: 442ms	remaining: 12.4s
12:	learn: -0.3549028	total: 479ms	remaining: 12.4s
13:	learn: -0.3331799	total: 518ms	remaining: 12.4s
14:	learn: -0.3141300	total: 558ms	remaining: 12.5s
15:	learn: -0.2982254	total: 596ms	remaining: 12.4s
16:	learn: -0.2817828	total: 634ms	remaining: 12.4s
17:	learn: -0.2676253	total: 669ms	remaining: 12.3s
18:	learn: -0.2542911	total: 710ms	remaining: 12.4s
19:	learn: -0.242321

159:	learn: -0.0311103	total: 6.66s	remaining: 7.91s
160:	learn: -0.0309422	total: 6.71s	remaining: 7.87s
161:	learn: -0.0309175	total: 6.75s	remaining: 7.84s
162:	learn: -0.0306384	total: 6.8s	remaining: 7.8s
163:	learn: -0.0305505	total: 6.85s	remaining: 7.76s
164:	learn: -0.0302062	total: 6.89s	remaining: 7.72s
165:	learn: -0.0300256	total: 6.93s	remaining: 7.68s
166:	learn: -0.0296588	total: 6.98s	remaining: 7.65s
167:	learn: -0.0294539	total: 7.03s	remaining: 7.61s
168:	learn: -0.0292095	total: 7.07s	remaining: 7.57s
169:	learn: -0.0291360	total: 7.11s	remaining: 7.53s
170:	learn: -0.0289374	total: 7.15s	remaining: 7.49s
171:	learn: -0.0287159	total: 7.19s	remaining: 7.44s
172:	learn: -0.0285071	total: 7.24s	remaining: 7.4s
173:	learn: -0.0283852	total: 7.28s	remaining: 7.37s
174:	learn: -0.0282198	total: 7.33s	remaining: 7.33s
175:	learn: -0.0279266	total: 7.38s	remaining: 7.29s
176:	learn: -0.0278719	total: 7.42s	remaining: 7.26s
177:	learn: -0.0277077	total: 7.47s	remaining: 7.

320:	learn: -0.0154778	total: 14.1s	remaining: 1.27s
321:	learn: -0.0154294	total: 14.1s	remaining: 1.23s
322:	learn: -0.0154151	total: 14.2s	remaining: 1.18s
323:	learn: -0.0154074	total: 14.2s	remaining: 1.14s
324:	learn: -0.0153762	total: 14.2s	remaining: 1.1s
325:	learn: -0.0153482	total: 14.3s	remaining: 1.05s
326:	learn: -0.0153143	total: 14.3s	remaining: 1.01s
327:	learn: -0.0152955	total: 14.4s	remaining: 964ms
328:	learn: -0.0152800	total: 14.4s	remaining: 920ms
329:	learn: -0.0152163	total: 14.5s	remaining: 877ms
330:	learn: -0.0151926	total: 14.5s	remaining: 833ms
331:	learn: -0.0151511	total: 14.6s	remaining: 789ms
332:	learn: -0.0150888	total: 14.6s	remaining: 746ms
333:	learn: -0.0150502	total: 14.6s	remaining: 702ms
334:	learn: -0.0150064	total: 14.7s	remaining: 658ms
335:	learn: -0.0149628	total: 14.7s	remaining: 614ms
336:	learn: -0.0148801	total: 14.8s	remaining: 570ms
337:	learn: -0.0147921	total: 14.8s	remaining: 526ms
338:	learn: -0.0147497	total: 14.9s	remaining: 

In [105]:
model.fit(x_train, y_train, cat_features=categorical_features)



0:	learn: -0.9649289	total: 34.2ms	remaining: 11.9s
1:	learn: -0.8574045	total: 64.2ms	remaining: 11.2s
2:	learn: -0.7719394	total: 91.8ms	remaining: 10.6s
3:	learn: -0.7006360	total: 120ms	remaining: 10.4s
4:	learn: -0.6383493	total: 147ms	remaining: 10.2s
5:	learn: -0.5888691	total: 174ms	remaining: 9.98s
6:	learn: -0.5466153	total: 203ms	remaining: 9.94s
7:	learn: -0.5059781	total: 238ms	remaining: 10.2s
8:	learn: -0.4686378	total: 273ms	remaining: 10.3s
9:	learn: -0.4357573	total: 306ms	remaining: 10.4s
10:	learn: -0.4070923	total: 341ms	remaining: 10.5s
11:	learn: -0.3835723	total: 369ms	remaining: 10.4s
12:	learn: -0.3615763	total: 397ms	remaining: 10.3s
13:	learn: -0.3399869	total: 426ms	remaining: 10.2s
14:	learn: -0.3229596	total: 458ms	remaining: 10.2s
15:	learn: -0.3079278	total: 494ms	remaining: 10.3s
16:	learn: -0.2923627	total: 523ms	remaining: 10.2s
17:	learn: -0.2785814	total: 551ms	remaining: 10.2s
18:	learn: -0.2674802	total: 582ms	remaining: 10.1s
19:	learn: -0.25347

159:	learn: -0.0332498	total: 4.91s	remaining: 5.83s
160:	learn: -0.0330993	total: 4.94s	remaining: 5.8s
161:	learn: -0.0328692	total: 4.97s	remaining: 5.77s
162:	learn: -0.0326235	total: 5s	remaining: 5.74s
163:	learn: -0.0325277	total: 5.05s	remaining: 5.72s
164:	learn: -0.0322292	total: 5.08s	remaining: 5.7s
165:	learn: -0.0320650	total: 5.13s	remaining: 5.68s
166:	learn: -0.0318037	total: 5.17s	remaining: 5.66s
167:	learn: -0.0316371	total: 5.21s	remaining: 5.64s
168:	learn: -0.0314773	total: 5.24s	remaining: 5.61s
169:	learn: -0.0313535	total: 5.27s	remaining: 5.58s
170:	learn: -0.0313305	total: 5.3s	remaining: 5.55s
171:	learn: -0.0311371	total: 5.33s	remaining: 5.51s
172:	learn: -0.0309368	total: 5.36s	remaining: 5.49s
173:	learn: -0.0307355	total: 5.39s	remaining: 5.46s
174:	learn: -0.0305291	total: 5.43s	remaining: 5.43s
175:	learn: -0.0303585	total: 5.46s	remaining: 5.4s
176:	learn: -0.0299838	total: 5.49s	remaining: 5.37s
177:	learn: -0.0298617	total: 5.53s	remaining: 5.34s


318:	learn: -0.0171256	total: 10.3s	remaining: 1000ms
319:	learn: -0.0170886	total: 10.3s	remaining: 968ms
320:	learn: -0.0170149	total: 10.4s	remaining: 937ms
321:	learn: -0.0169290	total: 10.4s	remaining: 905ms
322:	learn: -0.0168960	total: 10.4s	remaining: 873ms
323:	learn: -0.0168691	total: 10.5s	remaining: 840ms
324:	learn: -0.0167308	total: 10.5s	remaining: 808ms
325:	learn: -0.0166928	total: 10.5s	remaining: 776ms
326:	learn: -0.0166576	total: 10.6s	remaining: 743ms
327:	learn: -0.0166138	total: 10.6s	remaining: 711ms
328:	learn: -0.0165834	total: 10.6s	remaining: 679ms
329:	learn: -0.0165492	total: 10.7s	remaining: 646ms
330:	learn: -0.0165381	total: 10.7s	remaining: 614ms
331:	learn: -0.0164528	total: 10.7s	remaining: 582ms
332:	learn: -0.0164193	total: 10.8s	remaining: 549ms
333:	learn: -0.0163978	total: 10.8s	remaining: 517ms
334:	learn: -0.0163640	total: 10.8s	remaining: 484ms
335:	learn: -0.0163202	total: 10.9s	remaining: 452ms
336:	learn: -0.0163010	total: 10.9s	remaining

<catboost.core.CatBoostClassifier at 0x1e5a7961208>
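You can use the `remaining` column in these logs to work out the iteration budget. The sketch below parses one log line and solves for the total number of iterations. It assumes CatBoost extrapolates the remaining time linearly from the average time per iteration; that is an assumption, not something these logs state. The helper names `to_seconds` and `estimate_iterations` are hypothetical.

```python
import re

# Matches lines such as "174:\tlearn: -0.0305291\ttotal: 5.43s\tremaining: 5.43s"
LOG_LINE = re.compile(
    r"(?P<i>\d+):\s+learn:\s+(?P<loss>-?[\d.]+)\s+"
    r"total:\s+(?P<total>[\d.]+)(?P<tu>ms|s)\s+"
    r"remaining:\s+(?P<rem>[\d.]+)(?P<ru>ms|s)"
)


def to_seconds(value: str, unit: str) -> float:
    """Convert a logged duration such as ('34.2', 'ms') to seconds."""
    return float(value) / 1000 if unit == "ms" else float(value)


def estimate_iterations(line: str) -> int:
    """Estimate the total iteration count N from one training-log line.

    Assumes remaining = total / done * (N - done), i.e. a constant time
    per iteration, which rearranges to N = done * (total + remaining) / total.
    """
    m = LOG_LINE.search(line)
    if m is None:
        raise ValueError(f"not a CatBoost log line: {line!r}")
    done = int(m["i"]) + 1                     # iterations finished so far
    total = to_seconds(m["total"], m["tu"])    # elapsed time
    remaining = to_seconds(m["rem"], m["ru"])  # projected time left
    return round(done * (total + remaining) / total)


line = "174:\tlearn: -0.0305291\ttotal: 5.43s\tremaining: 5.43s"
print(estimate_iterations(line))  # 350
```

The first line of the same run (`0: ... total: 34.2ms remaining: 11.9s`) gives an estimate of about 349. That agrees with a budget of roughly 350 iterations; the small gap comes from rounding in the printed times.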

In [106]:
y_test.value_counts()

0.0    1174
2.0     771
1.0     722
Name: Class_value, dtype: int64
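These class counts give a reference point for the accuracy reported next. A classifier that always predicts the most frequent class (`0.0`, i.e. "Low") would be correct on 1174 of the 2667 test rows. Here is a minimal sketch of that majority-class baseline, built only from the counts printed above:

```python
# Test-set class counts copied from the y_test.value_counts() output above
class_counts = {0.0: 1174, 2.0: 771, 1.0: 722}

n_test = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / n_test

print(n_test)                       # 2667
print(majority_class)               # 0.0
print(round(baseline_accuracy, 4))  # 0.4402
```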

In [107]:
model.score(x_test, y_test)

0.9951256092988376
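For a classifier, `score` returns mean accuracy on the given data. Multiplying it by the test-set size recovers the raw counts. The sketch below assumes the 2667 test rows implied by the class counts above:

```python
n_test = 1174 + 771 + 722       # test rows, from y_test.value_counts()
accuracy = 0.9951256092988376   # value returned by model.score(x_test, y_test)

correct = round(accuracy * n_test)
misclassified = n_test - correct

print(correct, misclassified)   # 2654 13
```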

In [108]:
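param_grid = {'n_estimators': [50, 100, 150, 200, 250, 300, 350, 400],  # CatBoost alias for iterations
              'learning_rate': [10 ** x for x in range(-3, 3)],          # 0.001, 0.01, 0.1, 1, 10, 100
              'max_depth': range(1, 10, 1)}                               # CatBoost alias for depth: 1 to 9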
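This grid is large. `learning_rate` expands to the six values 0.001, 0.01, 0.1, 1, 10 and 100, and `max_depth` expands to the nine depths 1 to 9. The sketch below counts how many models `GridSearchCV` trains with `cv=5`. It assumes the default `refit=True`, which adds one final fit of the best candidate on the full training set.

```python
from itertools import product

# Same grid as the cell above
param_grid = {
    "n_estimators": [50, 100, 150, 200, 250, 300, 350, 400],
    "learning_rate": [10 ** x for x in range(-3, 3)],
    "max_depth": range(1, 10, 1),
}

# Every combination of the three lists is one candidate model
n_candidates = len(list(product(*param_grid.values())))
cv_folds = 5
# One fit per candidate per fold, plus one refit of the best candidate
n_fits = n_candidates * cv_folds + 1

print(n_candidates, n_fits)  # 432 2161
```

Each of those fits is itself a boosted ensemble of up to 400 trees, so the search costs far more than the single fit above. You could shrink it by narrowing the grid, for example by dropping learning rates above 1, which are unusually large for gradient boosting.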

In [110]:
grid_cat = GridSearchCV(estimator=CatBoostClassifier(verbose=0),  # silence per-iteration training logs
                        param_grid=param_grid,
                        cv=5,
                        verbose=True, n_jobs=-1)

In [None]:
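# cat_features is forwarded to CatBoostClassifier.fit, as in the single-model fit above
grid_result = grid_cat.fit(x_train, y_train, cat_features=categorical_features)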


In [None]:
grid_result

In [None]:
grid_result.best_params_

In [None]:
# Pair each column name with its CatBoost importance score, highest first
importances = model.get_feature_importance(Pool(x_label_train, label=y_label, cat_features=categorical_features))
feature_score = pd.DataFrame(list(zip(x_label_train.columns, importances)),
                             columns=['Feature', 'Score'])
feature_score = feature_score.sort_values(by='Score', ascending=False)

In [None]:
feature_score

In [None]:
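# Side-by-side actual and predicted classes for the test set
cross_matrix = pd.DataFrame()
cross_matrix['Actual'] = y_test
cross_matrix['Predict'] = model.predict(x_test).ravel()  # flatten in case predict returns an (n, 1) array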

In [None]:
userPreference = {0:'Low', 1: 'Average', 2: 'High'}
userPrefPredict = {0.0:'Low', 1.0: 'Average', 2.0: 'High'}
cross_matrix = cross_matrix.replace({'Actual': userPreference, 'Predict': userPrefPredict})
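The two dictionaries above differ only in key type (`0` versus `0.0`). For plain Python dictionary lookups that difference does not matter: `0 == 0.0`, and equal numbers hash to the same value, so one mapping would serve both columns. A small check:

```python
user_preference = {0: "Low", 1: "Average", 2: "High"}

# int and float keys that compare equal also hash equally, so float labels
# such as those in y_test (0.0, 1.0, 2.0) find the int keys.
print(user_preference[0.0], user_preference[2.0])  # Low High
print(hash(1) == hash(1.0))                         # True
```

This demonstrates dictionary semantics only. Whether `DataFrame.replace` matches the keys the same way should be confirmed on the pandas version in use before merging the two mappings.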

In [None]:
pd.crosstab(cross_matrix['Actual'], cross_matrix['Predict'], margins=True)
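`pd.crosstab` counts each (actual, predicted) pair. The diagonal holds the correct predictions, and with `margins=True` the `All` row and column hold the totals. The sketch below reproduces the same tally in pure Python and adds per-class recall. It runs on a hypothetical toy set of labels, not on the notebook's data.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs, like an un-normalised crosstab."""
    return Counter(zip(actual, predicted))


def per_class_recall(actual, predicted):
    """Fraction of each actual class that was predicted correctly."""
    pairs = confusion_counts(actual, predicted)
    totals = Counter(actual)  # row totals, i.e. the 'All' column of the crosstab
    return {label: pairs[(label, label)] / n for label, n in totals.items()}


# Hypothetical toy labels, not taken from the notebook
actual = ["Low", "Low", "High", "Average", "High", "Low"]
predicted = ["Low", "Low", "High", "High", "High", "Low"]

print(confusion_counts(actual, predicted)[("Average", "High")])  # 1
print(per_class_recall(actual, predicted))
# {'Low': 1.0, 'High': 1.0, 'Average': 0.0}
```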