In [2]:
# =============================================================
# Copyright © 2020 Intel Corporation
# 
# SPDX-License-Identifier: MIT
# =============================================================

# XGBoost Getting Started Example on Linear Regression
## Importing and Organizing Data
In this example, we predict median house values for California districts from the features of each district, using the Intel optimizations for XGBoost shipped as part of the oneAPI AI Analytics Toolkit.
Let's start by **importing** all necessary packages.

In [1]:
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


Now let's **load** the dataset and **organize** it for use with our model.

In [2]:
#loading the data
california = fetch_california_housing()

#converting data into a pandas dataframe
data = pd.DataFrame(california.data)
data.columns = california.feature_names

#setting price as value to be predicted
data['PRICE'] = california.target

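#separating the feature columns (X) from the target column (y)
X, y = data.iloc[:, :-1], data.iloc[:, -1]

#building a DMatrix, XGBoost's native data container (not used by the scikit-learn API calls below)
data_dmatrix = xgb.DMatrix(data=X, label=y)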

#splitting data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1693)
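To make the roles of `test_size` and `random_state` concrete, here is a minimal standard-library sketch of a seeded shuffle split. It is not scikit-learn's implementation: the helper name `shuffled_split` is hypothetical, and rounding the test size up is an assumption made for this illustration.

```python
import math
import random


def shuffled_split(n_samples, test_size, seed):
    """Return (train_idx, test_idx) from a seeded shuffle of row indices.

    Hypothetical helper for illustration only; not scikit-learn's algorithm.
    """
    indices = list(range(n_samples))
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(indices)
    # Assumption: the test share is rounded up to a whole number of rows.
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffled_split(n_samples=20, test_size=0.25, seed=1693)
print(len(train_idx), len(test_idx))  # 15 5
```

Calling the helper again with the same seed returns the same partition, which is why fixing `random_state` makes the reported error reproducible between runs.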



**Instantiate and define the XGBoost regression object** by calling the `XGBRegressor()` class from the library. Use hyperparameters to configure the object. Intel optimizations for XGBoost training can be enabled by selecting the `hist` tree method in the parameters, as shown below.

In [3]:
xg_reg = xgb.XGBRegressor(
    objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.1,
    max_depth=5, alpha=10, n_estimators=10, tree_method='hist',
)
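For readers more familiar with XGBoost's native API, the sketch below spells out the same hyperparameters as a plain parameter dictionary, with a comment on what each one controls. The variable names `params` and `num_boost_round` are chosen for illustration, and the final commented line only indicates how such a dictionary might be passed to `xgb.train`; it is not part of the original sample.

```python
# Native-API spelling of the hyperparameters used above (illustrative only).
params = {
    "objective": "reg:squarederror",  # squared-error loss for regression
    "colsample_bytree": 0.3,          # fraction of features sampled for each tree
    "eta": 0.1,                       # step-size shrinkage; alias of learning_rate
    "max_depth": 5,                   # maximum depth of each tree
    "alpha": 10,                      # L1 regularization on weights (reg_alpha in the scikit-learn wrapper)
    "tree_method": "hist",            # histogram-based split finding
}
num_boost_round = 10  # number of boosting rounds; plays the role of n_estimators

# With xgboost installed and the training split available, this could be used as:
# booster = xgb.train(params, xgb.DMatrix(X_train, label=y_train), num_boost_round=num_boost_round)
```

With only 10 boosting rounds and a learning rate of 0.1, this is a deliberately small model intended to run quickly rather than to be well tuned.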

## Training and Evaluating the Model

**Fit the model** on the training set and predict values for the test set. Note that Intel optimizations for XGBoost inference are enabled by default.

In [4]:
xg_reg.fit(X_train,y_train)
preds = xg_reg.predict(X_test)

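**Compute the root mean squared error (RMSE)** of the predictions against the held-out test targets. The target in this dataset is expressed in units of $100,000, so the RMSE is in the same units.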

In [5]:
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE:",rmse)

RMSE: 1.0823382872176526
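As a reminder of what this metric measures, here is a small dependency-free sketch of the RMSE calculation on made-up toy values. The function name `root_mean_squared_error_sketch` and the numbers are hypothetical and unrelated to the housing data.

```python
import math


def root_mean_squared_error_sketch(y_true, y_pred):
    """Square each error, average the squares, then take the square root."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration: the errors are 1, 0, -2 and -1.
toy_true = [3.0, 5.0, 2.0, 7.0]
toy_pred = [2.0, 5.0, 4.0, 8.0]

# Squared errors sum to 6, their mean is 1.5, and sqrt(1.5) is about 1.2247.
toy_rmse = root_mean_squared_error_sketch(toy_true, toy_pred)
print("Toy RMSE:", toy_rmse)
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones.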


## Saving the Results

Now let's **export the predicted values to a CSV file**.

In [6]:
pd.DataFrame(preds).to_csv('foo.csv',index=False)

In [7]:
print("[CODE_SAMPLE_COMPLETED_SUCCESFULLY]")

[CODE_SAMPLE_COMPLETED_SUCCESFULLY]
