# **Getting Started**

Welcome to Segmind. This JupyterLab interface is a next-generation notebook environment that supports a wide range of workflows in data science, scientific computing, and machine learning. If you are new to notebooks, we recommend reading JupyterLab's official [documentation](https://jupyterlab.readthedocs.io/en/latest/user/interface.html).
Kick-start your project with Segmind without worrying about environment setup.

## XGBoost Sample Project - Iris dataset 

In [1]:
import xgboost
from sklearn import datasets
from sklearn.model_selection import train_test_split


Load the Iris dataset and split it into training and test sets:

In [2]:

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
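
As a quick sanity check on the split, the sketch below repeats the same call and prints the resulting shapes. Iris has 150 samples with 4 features each, so `test_size=0.2` should hold out 30 samples. The `return_X_y=True` shortcut is only a convenience used here; it is not part of the original cell.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load features and labels directly as NumPy arrays
X, y = datasets.load_iris(return_X_y=True)

# Same split as above: 20% of the 150 samples are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

Fixing `random_state` makes the split reproducible, so rerunning the notebook yields the same training and test sets.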

XGBoost's native training API expects NumPy arrays to be wrapped in its own `DMatrix` format:

In [3]:
dtrain = xgboost.DMatrix(X_train, label=y_train)
dtest = xgboost.DMatrix(X_test, label=y_test)

Then let’s set up XGBoost params:

In [4]:
param = {
    'max_depth': 3,                 # the maximum depth of each tree
    'eta': 0.3,                     # learning rate (step size shrinkage applied at each boosting round)
    'silent': 1,                    # logging mode - quiet (deprecated; triggers the warning shown below)
    'objective': 'multi:softmax',   # multiclass classification using the softmax objective
    'num_class': 3                  # the number of classes that exist in this dataset
}
num_round = 20  # the number of training iterations (boosting rounds)

For details on the available settings, see [XGBoost Parameters](https://xgboost.readthedocs.io/en/latest/parameter.html).

Now train the model. If you want to see what the trained model looks like, dump it to a text file and inspect it.

In [5]:
import os

# Create the output directory if it does not already exist
os.makedirs('models', exist_ok=True)

bstmodel = xgboost.train(param, dtrain, num_round)
bstmodel.dump_model('models/dump.bstmodel.txt')

Parameters: { silent } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




Then predict on test data and calculate accuracy score:

In [6]:
preds = bstmodel.predict(dtest)

In [7]:
from sklearn import metrics
acc = metrics.accuracy_score(y_test, preds)
acc

1.0
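
The accuracy score is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on a small made-up example (the label lists are hypothetical, not taken from the Iris run) to make the calculation explicit.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Return the fraction of predictions equal to the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels: 4 of the 5 predictions are correct
y_true = [0, 1, 2, 2, 1]
y_pred = [0.0, 1.0, 2.0, 1.0, 1.0]  # multi:softmax returns labels as floats

score = accuracy(y_true, y_pred)
print(score)  # 0.8
```

Keep in mind that the test set here contains only 30 samples, so a score of 1.0 is not a strong guarantee of performance on new data.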