## Saving XGBoost models

XGBoost uses the gradient boosting algorithm to build some of the best-performing models for tabular data.


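Once a model has been trained, it is usually good practice to save it to a file. You can then load it later to make predictions on new test and validation datasets, as well as on entirely new data.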


## Serialize your XGBoost model with pickle

```python
# Continue with the Pima Indians diabetes dataset: train an XGBoost model,
# save it to a file with pickle, then load it back and make predictions.
import pickle

import xgboost
from numpy import loadtxt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# load data (skiprows=1 skips the CSV header row)
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",", skiprows=1)
# split data into x (the 8 input features) and y (the class label in the last column)
x = dataset[:, 0:8]
y = dataset[:, 8]
# split data into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=7)
# fit model on training data
model = xgboost.XGBClassifier()
model.fit(x_train, y_train)
# save model to file; the "with" block makes sure the file is closed
with open("pima.pickle.dat", "wb") as f:
    pickle.dump(model, f)

# some time later...
# load model from file
with open("pima.pickle.dat", "rb") as f:
    loaded_model = pickle.load(f)
# make predictions for test data; predict() already returns class labels
predictions = loaded_model.predict(x_test)
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100))
```

Accuracy: 72.83%


After the model is loaded and used to predict the test set, the printed accuracy is 72.83%.

## Serialize your XGBoost model with joblib

Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.


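The joblib API provides utilities for saving and loading Python objects that make efficient use of NumPy data structures. For very large models, it may be faster than pickle.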
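A plain NumPy array shows what "efficient use of NumPy data structures" means in practice. `joblib.dump` can compress its output through the `compress` argument. For an uncompressed dump, `joblib.load(..., mmap_mode="r")` can memory-map the array instead of reading it fully into RAM. In the minimal sketch below, the array is only a stand-in for the numeric payload of a large model, and the file names are hypothetical. How much this speeds up saving an XGBoost model in particular depends on how much of that model is actually held in NumPy arrays.

```python
import os
import tempfile

import joblib
import numpy as np

# A large array standing in for the numeric payload of a big model (illustrative only)
weights = np.arange(1_000_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "weights.joblib")    # hypothetical file names
    zip_path = os.path.join(tmp, "weights.joblib.z")

    joblib.dump(weights, raw_path)                    # uncompressed dump
    joblib.dump(weights, zip_path, compress=3)        # zlib-compressed dump, level 3

    raw_size = os.path.getsize(raw_path)
    zip_size = os.path.getsize(zip_path)

    # Memory-map the uncompressed file read-only; pages are loaded lazily
    mapped = joblib.load(raw_path, mmap_mode="r")
    # A compressed file is always read fully into memory
    restored = joblib.load(zip_path)

    ok = bool(np.array_equal(mapped, weights) and np.array_equal(restored, weights))
    is_memmap = isinstance(mapped, np.memmap)
    del mapped  # release the memory map before the temporary directory is removed

print(raw_size, zip_size, ok, is_memmap)
```

`mmap_mode` has no effect on compressed files. You therefore trade disk space (compression) against lazy loading (memory mapping) and cannot have both for the same file.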

```python
# Continue with the Pima Indians diabetes dataset: train an XGBoost model,
# save it to a file with joblib, then load it back and make predictions.
import joblib
import xgboost
from numpy import loadtxt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# load data (skiprows=1 skips the CSV header row)
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",", skiprows=1)
# split data into x (the 8 input features) and y (the class label in the last column)
x = dataset[:, 0:8]
y = dataset[:, 8]
# split data into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=7)
# fit model on training data
model = xgboost.XGBClassifier()
model.fit(x_train, y_train)
# save model to file; joblib opens and closes the file itself
joblib.dump(model, "pima.joblib.dat")

# some time later...
# load model from file
loaded_model = joblib.load("pima.joblib.dat")
# make predictions for test data; predict() already returns class labels
predictions = loaded_model.predict(x_test)
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100))
```

Accuracy: 72.83%


Here the model is saved to `pima.joblib.dat`, and the printed accuracy is again 72.83%, the same result as with pickle.