## 4. LightGBM Binary Classification, Multi-Class Classification, Regression using Python

<div style="text-align: right"> <b>Author : Kwang Myung Yu</b></div>
<div style="text-align: right"> Initial upload: 2021.9.24 </div>
<div style="text-align: right"> Last update: 2021.9.24</div>

- Source: https://nitin9809.medium.com/lightgbm-binary-classification-multi-class-classification-regression-using-python-4f22032b36a2

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings; warnings.filterwarnings('ignore')
plt.style.use('ggplot')
%matplotlib inline

from scipy.signal import find_peaks

In [2]:
colors = ["#00798c", "#d1495b", '#edae49', '#66a182', '#4a4a4a',
          '#1a508b', '#e3120b', '#c5a880', '#9F5F80', '#6F9EAF',
          '#0278ae','#F39233', '#A7C5EB', '#54E346', '#ABCE74',
        '#d6b0b1', '#58391c', '#cdd0cb', '#ffb396', '#6930c3']
sns.color_palette(colors[:10])

Types of tasks supported by LightGBM:
- Regression
- Binary Classification
- Multi-Class Classification
- Cross-Entropy
- LambdaRank

Before we get started, note that the datasets used here are toy datasets with few records, so **they are prone to overfitting in LightGBM.** **To limit overfitting, we can tune the max_depth value.** You may wonder whether max_depth implies level-wise growth; rest assured that the tree still grows leaf-wise when max_depth is specified, because the parameter only caps how deep the tree may grow.
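As a rough illustration of the point about overfitting, the sketch below collects a few LightGBM parameters that are commonly adjusted together with `max_depth` on small datasets. The values are illustrative assumptions, not settings from the original article. `num_leaves`, `min_data_in_leaf`, `feature_fraction` and `lambda_l2` are standard LightGBM parameter names, while the helper `max_leaves_for_depth` is hypothetical.

```python
# Illustrative values only (assumptions, not taken from the article).
params_small_data = {
    "objective": "binary",
    "learning_rate": 0.03,
    "max_depth": 4,           # cap on tree depth; growth is still leaf-wise
    "num_leaves": 15,         # main complexity control for leaf-wise trees
    "min_data_in_leaf": 10,   # require at least this many rows per leaf
    "feature_fraction": 0.8,  # sample 80% of the features for each tree
    "lambda_l2": 1.0,         # L2 regularization on leaf values
}


def max_leaves_for_depth(max_depth):
    """A binary tree of depth d has at most 2**d leaves."""
    return 2 ** max_depth


# Keep num_leaves at or below 2**max_depth so the two settings agree.
leaf_budget = max_leaves_for_depth(params_small_data["max_depth"])
num_leaves_ok = params_small_data["num_leaves"] <= leaf_budget
print(leaf_budget, num_leaves_ok)
```

A dictionary like this could be passed to `lgb.train` in place of the `params` used in the sections that follow; which values work best depends on the data.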

In [3]:
#importing libraries
from collections import Counter
import lightgbm as lgb
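# note: load_boston was removed in scikit-learn 1.2, so this import needs an older scikit-learn release
from sklearn.datasets import load_breast_cancer,load_boston,load_wine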
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error,roc_auc_score,precision_score
pd.options.display.max_columns = 999

### 1. Binary classification

In [9]:
#loading the breast cancer dataset
X=load_breast_cancer()
df=pd.DataFrame(X.data,columns=X.feature_names)
y=X.target 

In [10]:
df.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.0186,0.0134,0.01389,0.003532,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.00615,0.04006,0.03832,0.02058,0.0225,0.004571,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,0.4956,1.156,3.445,27.23,0.00911,0.07458,0.05661,0.01867,0.05963,0.009208,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.01149,0.02461,0.05688,0.01885,0.01756,0.005115,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [11]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [12]:
#scaling the features using Standard Scaler
sc=StandardScaler()
#fit_transform fits and transforms in one step, so a separate sc.fit(df) call is not needed
X_scaled=pd.DataFrame(sc.fit_transform(df), columns=X.feature_names)
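The scaler in this notebook is fitted on the full dataset before the train/test split, which lets test-set statistics influence the transformation. A common alternative is to estimate the mean and standard deviation on the training rows only and reuse them for the test rows. The sketch below shows that idea in plain Python for a single column, using the first five `mean radius` values from `df.head()`. The split into four training rows and one test row, and the helper names, are assumptions for illustration. Like `StandardScaler`, it uses the population standard deviation.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return the mean and population standard deviation of a column."""
    mu = fmean(column)
    sigma = pstdev(column) or 1.0  # guard against constant columns
    return mu, sigma


def standardize(column, mu, sigma):
    """Apply z = (x - mu) / sigma with previously fitted statistics."""
    return [(value - mu) / sigma for value in column]


# Hypothetical split of the first five "mean radius" values shown above.
train_radius = [17.99, 20.57, 19.69, 11.42]
test_radius = [20.29]

mu, sigma = fit_standardizer(train_radius)               # fit on train only
train_scaled = standardize(train_radius, mu, sigma)
test_scaled = standardize(test_radius, mu, sigma)        # reuse train stats
print(train_scaled, test_scaled)
```

Tree-based models split on feature thresholds, so standardization is generally not required for LightGBM; the step is kept here to follow the original walkthrough.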

In [14]:
X_scaled.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,1.097064,-2.073335,1.269934,0.984375,1.568466,3.283515,2.652874,2.532475,2.217515,2.255747,2.489734,-0.565265,2.833031,2.487578,-0.214002,1.316862,0.724026,0.66082,1.148757,0.907083,1.88669,-1.359293,2.303601,2.001237,1.307686,2.616665,2.109526,2.296076,2.750622,1.937015
1,1.829821,-0.353632,1.685955,1.908708,-0.826962,-0.487072,-0.023846,0.548144,0.001392,-0.868652,0.499255,-0.876244,0.263327,0.742402,-0.605351,-0.692926,-0.44078,0.260162,-0.80545,-0.099444,1.805927,-0.369203,1.535126,1.890489,-0.375612,-0.430444,-0.146749,1.087084,-0.24389,0.28119
2,1.579888,0.456187,1.566503,1.558884,0.94221,1.052926,1.363478,2.037231,0.939685,-0.398008,1.228676,-0.780083,0.850928,1.181336,-0.297005,0.814974,0.213076,1.424827,0.237036,0.293559,1.51187,-0.023974,1.347475,1.456285,0.527407,1.082932,0.854974,1.955,1.152255,0.201391
3,-0.768909,0.253732,-0.592687,-0.764464,3.283553,3.402909,1.915897,1.451707,2.867383,4.910919,0.326373,-0.110409,0.286593,-0.288378,0.689702,2.74428,0.819518,1.115007,4.73268,2.047511,-0.281464,0.133984,-0.249939,-0.550021,3.394275,3.893397,1.989588,2.175786,6.046041,4.93501
4,1.750297,-1.151816,1.776573,1.826229,0.280372,0.53934,1.371011,1.428493,-0.00956,-0.56245,1.270543,-0.790244,1.273189,1.190357,1.483067,-0.04852,0.828471,1.144205,-0.361092,0.499328,1.298575,-1.46677,1.338539,1.220724,0.220556,-0.313395,0.613179,0.729259,-0.868353,-0.3971


In [16]:
#train_test_split 
X_train,X_test,y_train,y_test=train_test_split(X_scaled,y,test_size=0.3,random_state=0)

In [17]:
#converting the dataset into proper LGB format 
d_train=lgb.Dataset(X_train, label=y_train)

In [18]:
d_train.data

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
478,-0.749028,-1.093640,-0.740560,-0.710995,0.586383,-0.418088,-0.448455,-0.753936,-0.119089,0.417114,-0.728238,-0.092266,-0.643074,-0.571980,-0.694688,-0.242457,0.320505,-0.609606,-0.255754,-0.068427,-0.801242,-0.615097,-0.751235,-0.725988,0.124117,-0.338840,-0.060394,-0.613574,0.065106,0.435246
303,-1.033042,-0.158159,-1.034246,-0.911788,0.742947,-0.711836,-0.826485,-0.802687,-1.203419,0.453972,-0.926383,0.628029,-0.906430,-0.665707,0.611365,-0.900724,-0.458022,-0.421281,-0.318715,-0.180768,-1.078732,-0.185190,-1.087219,-0.888068,0.391516,-0.953351,-0.901735,-0.751071,-1.112638,-0.306218
155,-0.533178,-0.314072,-0.564266,-0.553431,-0.698865,-0.711647,-0.627112,-0.660562,0.578241,-0.073377,-0.668325,-0.425561,-0.684162,-0.524236,-0.507680,-0.550967,-0.396350,-0.628098,-0.309029,-0.495851,-0.554813,-0.074456,-0.615412,-0.556174,-0.467667,-0.480063,-0.373672,-0.494807,0.343365,-0.145512
186,1.187949,-0.165140,1.096935,1.098139,-0.745834,-0.372605,-0.089257,0.237843,-0.695938,-1.211713,-0.532258,-1.344707,-0.519316,-0.251195,-1.391716,-0.910840,-0.589324,-0.823561,-1.192902,-1.024268,1.043864,0.111186,0.951324,0.930669,-0.393146,-0.062119,0.391533,0.647036,0.493818,-0.807177
101,-2.029648,-1.363580,-1.984504,-1.454443,1.468835,-0.543168,-1.114873,-1.261820,0.432204,2.180614,-0.653527,0.528240,-0.650005,-0.671142,1.049716,-0.818119,-1.057501,-1.913447,0.732247,0.115403,-1.726901,-0.999409,-1.693361,-1.222423,1.141110,-0.852841,-1.305831,-1.745063,0.050546,0.547186
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
277,1.329956,0.160649,1.191672,1.271629,-0.507430,-0.862311,-0.107964,0.245323,-0.955155,-1.819865,-0.277447,-0.705514,-0.249030,-0.079142,0.176681,-0.801352,-0.187792,0.459680,-0.150416,-0.801478,0.764302,-0.224272,0.647508,0.624792,-0.353694,-0.879559,-0.245578,0.225259,-0.539944,-1.472721
9,-0.473535,1.105439,-0.329482,-0.509063,1.582699,2.563358,1.738872,0.941760,0.797298,2.783096,-0.388250,0.693345,-0.409420,-0.360764,0.036008,2.609587,1.509848,0.409395,-0.321136,2.377346,-0.244190,2.443109,-0.286278,-0.297409,2.320295,5.112877,3.995433,1.620015,2.370444,6.846856
359,-1.332393,-0.225644,-1.324225,-1.070205,0.323071,-0.848666,-0.774633,-0.899156,-1.115796,0.962892,0.370767,0.054696,0.198478,-0.216873,-0.068329,-0.921962,-0.278974,-0.848541,-0.135886,-0.409232,-0.879933,-0.107025,-0.937396,-0.775210,0.040829,-0.950170,-0.756994,-0.975815,-0.722753,-0.143295
192,-1.251733,-0.248914,-1.286742,-1.043186,-1.911524,-1.533193,-1.114873,-1.261820,-0.579108,0.237079,-0.185052,6.655279,-0.314869,-0.410268,-1.776065,-1.047490,-1.057501,-1.913447,2.112542,-0.796939,-1.304866,-0.789340,-1.340697,-1.013934,-2.682695,-1.443878,-1.305831,-1.745063,-1.604443,-1.017203


In [19]:
#Specifying the parameter
params={}
params['learning_rate']=0.03
params['boosting_type']='gbdt' #GradientBoostingDecisionTree
params['objective']='binary' #Binary target feature
params['metric']='binary_logloss' #metric for binary classification
params['max_depth']=10

In [20]:
#train the model 
clf=lgb.train(params,d_train,100) #train the model for 100 boosting rounds

[LightGBM] [Info] Number of positive: 249, number of negative: 149
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3970
[LightGBM] [Info] Number of data points in the train set: 398, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.625628 -> initscore=0.513507
[LightGBM] [Info] Start training from score 0.513507
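The `pavg` and `initscore` values in the log above are consistent with the positive-class fraction of the training labels and its log-odds. The short check below reproduces both numbers from the class counts in the first log line. It is a sanity check on the printed values, not a description of LightGBM internals.

```python
import math

# Class counts reported in the training log above.
n_pos, n_neg = 249, 149

p_avg = n_pos / (n_pos + n_neg)                # fraction of positive labels
init_score = math.log(p_avg / (1.0 - p_avg))   # log-odds of that fraction

print(round(p_avg, 6), round(init_score, 6))
```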


In [21]:
#prediction on the test set
y_pred=clf.predict(X_test)

In [22]:
y_pred

array([0.0382581 , 0.93938292, 0.97661526, 0.97477828, 0.95249041,
       0.98159454, 0.96405854, 0.98139151, 0.97336296, 0.97714962,
       0.63541415, 0.91738267, 0.97538054, 0.44957024, 0.7612895 ,
       0.11064176, 0.8370553 , 0.03666349, 0.03303545, 0.03295332,
       0.0436788 , 0.03540915, 0.94307263, 0.98151254, 0.06781437,
       0.98155683, 0.97915965, 0.06741623, 0.98153833, 0.03064858,
       0.97605446, 0.0476607 , 0.90658661, 0.03877576, 0.9791487 ,
       0.03587542, 0.96935081, 0.04695059, 0.97712772, 0.03412578,
       0.55283163, 0.97711538, 0.51898677, 0.98131514, 0.77475243,
       0.03082835, 0.97915438, 0.98112882, 0.97051128, 0.03677367,
       0.0347962 , 0.07025932, 0.04206208, 0.98107221, 0.98204527,
       0.97961681, 0.95543472, 0.92849396, 0.97160477, 0.03123831,
       0.05267959, 0.0265726 , 0.98234984, 0.98054089, 0.03623822,
       0.76855256, 0.03083428, 0.02990212, 0.03534412, 0.97405216,
       0.79798605, 0.0304825 , 0.9815473 , 0.4174706 , 0.05860

Converting the predictions to the proper format

In [23]:
#rounding the values
y_pred=y_pred.round(0)

In [24]:
y_pred

array([0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 1.,
       0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1., 0.,
       1., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 0.,
       0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0.,
       0., 1., 1., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 1., 0.,
       1., 1., 1., 0., 0., 1., 0., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1.,
       1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1.,
       1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1.,
       1., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1.,
       1.])

In [25]:
#converting from float to integer
y_pred=y_pred.astype(int)
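Rounding and then casting to `int` amounts to thresholding the predicted probabilities at 0.5. The sketch below writes that threshold explicitly, which also makes it easy to try a different cut-off. The helper `to_labels` and the 0.7 threshold are hypothetical; the probabilities are copied from the prediction output above.

```python
# A few probabilities copied from the prediction output above.
probs = [0.0382581, 0.93938292, 0.63541415, 0.44957024]


def to_labels(probabilities, threshold=0.5):
    """Map each positive-class probability to a 0/1 integer label."""
    return [int(p >= threshold) for p in probabilities]


labels = to_labels(probs)                        # default 0.5 cut-off
strict_labels = to_labels(probs, threshold=0.7)  # stricter, assumed cut-off
print(labels, strict_labels)
```

One small difference from `round(0)`: NumPy rounds exact halves to the nearest even value, so a probability of exactly 0.5 rounds to 0, whereas the `>=` comparison maps it to 1.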

In [26]:
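#roc_auc_score metric
# note: scikit-learn's signature is roc_auc_score(y_true, y_score); here the arguments are swapped and y_pred holds hard 0/1 labels rather than probabilities
roc_auc_score(y_pred,y_test)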

0.965424739195231
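ROC AUC is normally computed from the true labels and the predicted scores, in that order, and it measures how often a positive example is ranked above a negative one. The sketch below implements that pairwise definition in plain Python on toy data (an assumption for illustration, not the notebook's test set) and shows that thresholding the scores first can change the result.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, assumed for illustration.
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.6, 0.55, 0.8, 0.9]

auc_from_probs = roc_auc(y_true, y_prob)
# Thresholding at 0.5 first discards the ranking information.
auc_from_labels = roc_auc(y_true, [int(p >= 0.5) for p in y_prob])
print(auc_from_probs, auc_from_labels)
```

With scikit-learn, the equivalent call would be `roc_auc_score(y_test, clf.predict(X_test))`, using the unrounded probabilities.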

### 2. Multi-Class Classification using the Wine dataset

In [27]:
X = load_wine()
df = pd.DataFrame(data=X.data, columns=X.feature_names)
y = X.target

In [28]:
df.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0


Scaling

In [30]:
sc = StandardScaler()
sc.fit(df)
X_scaled = pd.DataFrame(data=sc.transform(df), columns=X.feature_names)

In [31]:
X_scaled.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
0,1.518613,-0.56225,0.232053,-1.169593,1.913905,0.808997,1.034819,-0.659563,1.224884,0.251717,0.362177,1.84792,1.013009
1,0.24629,-0.499413,-0.827996,-2.490847,0.018145,0.568648,0.733629,-0.820719,-0.544721,-0.293321,0.406051,1.113449,0.965242
2,0.196879,0.021231,1.109334,-0.268738,0.088358,0.808997,1.215533,-0.498407,2.135968,0.26902,0.318304,0.788587,1.395148
3,1.69155,-0.346811,0.487926,-0.809251,0.930918,2.491446,1.466525,-0.981875,1.032155,1.186068,-0.427544,1.184071,2.334574
4,0.2957,0.227694,1.840403,0.451946,1.281985,0.808997,0.663351,0.226796,0.401404,-0.319276,0.362177,0.449601,-0.037874


Train/test split

In [32]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=0)

In [48]:
print(X_train.shape)
print(X_test.shape)

(124, 13)
(54, 13)


In [39]:
np.unique(y_train, return_counts=True)

(array([0, 1, 2]), array([40, 49, 35], dtype=int64))

In [40]:
np.unique(y_test, return_counts=True)

(array([0, 1, 2]), array([19, 22, 13], dtype=int64))

Converting the dataset to LGB format

In [49]:
d_train = lgb.Dataset(X_train, label=y_train)

Setting up the parameters

In [50]:
params={}
params['learning_rate']=0.03
params['boosting_type']='gbdt' #GradientBoostingDecisionTree
params['objective']='multiclass' #Multi-class target feature
params['metric']='multi_logloss' #metric for multi-class
params['max_depth']=10
params['num_class']=3 #number of classes in the target; labels must be the integers 0, 1, 2
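Rather than hard-coding `num_class`, it can be derived from the labels. LightGBM's multiclass objective expects integer labels in the range 0 to `num_class - 1`, so it is worth checking that as well. The sketch below rebuilds a label list from the `y_train` class counts printed earlier in this section (40, 49 and 35 rows); the variable names are hypothetical.

```python
# Label counts as reported for y_train in this section: 40, 49 and 35 rows.
y_train_labels = [0] * 40 + [1] * 49 + [2] * 35

classes = sorted(set(y_train_labels))
num_class = len(classes)

# The multiclass objective expects integer labels 0 .. num_class - 1.
labels_are_contiguous = classes == list(range(num_class))
print(num_class, labels_are_contiguous)
```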

Training

In [51]:
clf = lgb.train(params, d_train, 100) # training the model for 100 boosting rounds

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 456
[LightGBM] [Info] Number of data points in the train set: 124, number of used features: 13
[LightGBM] [Info] Start training from score -1.131402
[LightGBM] [Info] Start training from score -0.928461
[LightGBM] [Info] Start training from score -1.264934
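The three "Start training from score" values in the log are consistent with the natural logarithm of each class's share of the training rows. The check below reproduces them from the `y_train` class counts shown earlier (40, 49 and 35 out of 124). It is a sanity check on the printed numbers under that assumption.

```python
import math

class_counts = [40, 49, 35]   # y_train counts for classes 0, 1, 2
total = sum(class_counts)     # 124 training rows

# Log of each class prior, one starting score per class.
log_priors = [math.log(count / total) for count in class_counts]
print([round(value, 6) for value in log_priors])
```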


Prediction

In [52]:
y_pred = clf.predict(X_test)

In [53]:
y_pred

array([[0.98496123, 0.00853119, 0.00650758],
       [0.02076345, 0.04205365, 0.9371829 ],
       [0.11787569, 0.84252622, 0.03959809],
       [0.98466342, 0.00826664, 0.00706994],
       [0.0272622 , 0.94234744, 0.03039036],
       [0.47623182, 0.41420143, 0.10956674],
       [0.9852414 , 0.00763133, 0.00712727],
       [0.00982261, 0.06427323, 0.92590416],
       [0.0059612 , 0.98939507, 0.00464372],
       [0.00556692, 0.98248364, 0.01194944],
       [0.11777327, 0.04104257, 0.84118416],
       [0.00858824, 0.0588461 , 0.93256566],
       [0.98426238, 0.00846873, 0.00726889],
       [0.20415278, 0.73197623, 0.06387099],
       [0.08834691, 0.01748978, 0.89416331],
       [0.012631  , 0.98136569, 0.00600331],
       [0.97204709, 0.02006339, 0.00788952],
       [0.9723269 , 0.02001837, 0.00765473],
       [0.07075943, 0.24544748, 0.68379308],
       [0.96924992, 0.0239537 , 0.00679637],
       [0.38336023, 0.5546589 , 0.06198088],
       [0.93164127, 0.04986886, 0.01848986],
       [0.

In [54]:
# predict class  
y_pred = [np.argmax(line) for line in y_pred]

In [55]:
y_pred

[0,
 2,
 1,
 0,
 1,
 0,
 0,
 2,
 1,
 1,
 2,
 2,
 0,
 1,
 2,
 1,
 0,
 0,
 2,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 0,
 0,
 1,
 0,
 0,
 0,
 2,
 1,
 1,
 2,
 0,
 0,
 1,
 1,
 1,
 0,
 2,
 1,
 2,
 0,
 2,
 2,
 0,
 2]
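The class-prediction step above picks, for each row of probabilities, the index of the largest value. The sketch below applies the same idea in plain Python to the first three rows of the probability output, and also confirms that each row sums to 1. The helper `argmax` is a hypothetical stand-in for `np.argmax`.

```python
# First three probability rows copied from the prediction output above.
proba_rows = [
    [0.98496123, 0.00853119, 0.00650758],
    [0.02076345, 0.04205365, 0.9371829],
    [0.11787569, 0.84252622, 0.03959809],
]


def argmax(row):
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(row)), key=row.__getitem__)


predicted = [argmax(row) for row in proba_rows]
row_sums = [sum(row) for row in proba_rows]
print(predicted, row_sums)
```

On the full NumPy array, `y_pred.argmax(axis=1)` gives the same labels in a single call.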

In [58]:
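# note: scikit-learn's signature is precision_score(y_true, y_pred); with the arguments swapped as here, the values are per-class recall
precision_score(y_pred, y_test, average = None)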

array([1.        , 0.90909091, 1.        ])

In [59]:
precision_score(y_pred, y_test, average = None).mean()

0.9696969696969697
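Argument order matters for precision: per-class precision divides the correct predictions of a class by all predictions of that class, and swapping the two arguments turns the result into per-class recall. The sketch below shows this on toy labels (assumed for illustration, not the Wine test split); `per_class_precision` is a hypothetical helper, not a scikit-learn function.

```python
def per_class_precision(y_true, y_pred, classes):
    """For each class c: correct predictions of c / all predictions of c."""
    scores = []
    for c in classes:
        truths_where_predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        hits = sum(1 for t in truths_where_predicted_c if t == c)
        total = len(truths_where_predicted_c)
        scores.append(hits / total if total else 0.0)
    return scores


# Toy labels, assumed for illustration.
y_true = [0, 0, 1, 1, 1, 2]
y_hat = [0, 1, 1, 1, 2, 2]

precision = per_class_precision(y_true, y_hat, classes=[0, 1, 2])
# Swapping the arguments yields per-class recall instead.
recall = per_class_precision(y_hat, y_true, classes=[0, 1, 2])
print(precision, recall)
```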

### 3. Regression using the Boston dataset

In [61]:
X=load_boston()
df=pd.DataFrame(X.data,columns=X.feature_names)
y=X.target

Scaling

In [62]:
sc = StandardScaler()
sc.fit(df)
X_scaled = pd.DataFrame(data = sc.transform(df), columns=X.feature_names)

Train/test split

In [63]:
X_train,X_test,y_train,y_test=train_test_split(X_scaled,y,test_size=0.3,random_state=0)

Dataset transformation

In [65]:
d_train = lgb.Dataset(X_train, label = y_train)

Parameters

In [66]:
params={}
params['learning_rate']=0.03
params['boosting_type']='gbdt' #GradientBoostingDecisionTree
params['objective']='regression'#regression task
params['n_estimators']=100 #alias of num_iterations; duplicates the 100 rounds passed to lgb.train below
params['max_depth']=10

Training

In [67]:
clf = lgb.train(params, d_train, 100)

You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 884
[LightGBM] [Info] Number of data points in the train set: 354, number of used features: 13
[LightGBM] [Info] Start training from score 22.745480


Prediction

In [68]:
y_pred=clf.predict(X_test)

Evaluation

In [69]:
#using the MSE error metric (take the square root of this value to get RMSE)
mean_squared_error(y_test,y_pred)

20.1990255464617
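The value printed above is a mean squared error, so it is in squared target units. Taking the square root gives the RMSE, which is on the same scale as the target. The sketch below shows both metrics on toy values (assumed for illustration) and then converts the reported MSE; `mse` is a hypothetical helper, not scikit-learn's function.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared differences; symmetric in its two arguments."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, assumed for illustration.
toy_true = [24.0, 21.6, 34.7]
toy_pred = [22.0, 23.6, 33.7]
toy_mse = mse(toy_true, toy_pred)   # squared errors are about 4, 4 and 1
toy_rmse = math.sqrt(toy_mse)

# RMSE implied by the MSE value printed above.
reported_mse = 20.1990255464617
reported_rmse = math.sqrt(reported_mse)
print(round(toy_mse, 6), round(reported_rmse, 4))
```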