# Jane Street Market Prediction: Baseline (Part 2)
![janestreet](https://www.janestreet.com/assets/logo_horizontal.png)

### “Buy low, sell high.” It sounds so easy….

In reality, trading for profit has always been a difficult problem to solve, even more so in today’s fast-moving and complex financial markets. Electronic trading allows for thousands of transactions to occur within a fraction of a second, resulting in nearly unlimited opportunities to potentially find and take advantage of price differences in real time.

## This is the second part of my notebook

## [Jane Street Market Prediction: EDA, PCA, Baseline](https://www.kaggle.com/maksymshkliarevskyi/jane-street-market-prediction-eda-pca-baseline), containing the baseline model.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import gc
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# ignoring warnings
import warnings
warnings.simplefilter("ignore")

import janestreet  # competition API; only available inside the Kaggle competition environment

In [None]:
train_df = pd.read_csv('../input/jane-street-market-prediction/train.csv')
features_df = pd.read_csv('../input/jane-street-market-prediction/features.csv')
example_test = pd.read_csv('../input/jane-street-market-prediction/example_test.csv')
sample_prediction_df = pd.read_csv('../input/jane-street-market-prediction/example_sample_submission.csv')

print('Train dataset shape: {}'.format(train_df.shape))
print('Features dataset shape: {}'.format(features_df.shape))
print('Example test dataset shape: {}'.format(example_test.shape))

# Baseline model

In [None]:
# Loading prediction work space
env = janestreet.make_env()
iter_test = env.iter_test()

In [None]:
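# Preparing the data: drop zero-weight rows (they contribute nothing to the utility score)
train_df = train_df[train_df['weight'] != 0]
# Target: 1 if the trade has a positive weighted return, else 0
train_df['action'] = ((train_df['weight'].values * train_df['resp'].values) > 0).astype('int')

# Use only the anonymized feature_* columns as inputs
X_train = train_df.loc[:, train_df.columns.str.contains('feature')]
y_train = train_df.loc[:, 'action']

# Sentinel for missing values; XGBoost is told about it below via missing=-999
X_train = X_train.fillna(-999)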
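To make the labelling rule concrete, here is a toy run of the same three steps (drop zero-weight rows, label `weight * resp > 0` as 1, fill NaNs with -999). The rows are made-up values, not real competition data. Note that a trade with `resp` exactly 0 gets label 0, and, assuming weights are non-negative as in this competition's data, the rule reduces to `resp > 0` once zero-weight rows are gone.

```python
import numpy as np
import pandas as pd

# Made-up rows mimicking the 'weight', 'resp' and feature_* columns of train.csv
toy = pd.DataFrame({
    "weight": [0.0, 1.5, 0.8, 2.0],
    "resp": [0.01, 0.02, -0.03, 0.0],
    "feature_0": [1.0, np.nan, -0.5, 0.3],
})

# Same steps as the notebook: drop zero-weight rows, label profitable trades as 1
toy = toy[toy["weight"] != 0].copy()
toy["action"] = ((toy["weight"].values * toy["resp"].values) > 0).astype("int")

# Replace NaNs with the -999 sentinel that XGBoost later treats as missing
features = toy.loc[:, toy.columns.str.contains("feature")].fillna(-999)

print(toy["action"].tolist())        # [1, 0, 0]
print(features["feature_0"].tolist())  # [-999.0, -0.5, 0.3]
```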

In [None]:
del train_df
gc.collect()

In [None]:
y_train.astype('str').hist()
plt.show()

The two target classes are roughly balanced.
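The histogram gives a visual check; printing the exact class shares is a quick numeric complement. The helper name `class_balance` is illustrative, and the labels below are made up; on the real data you would pass `y_train`.

```python
import pandas as pd

def class_balance(y: pd.Series) -> pd.Series:
    """Share of each label, sorted by label value."""
    return y.value_counts(normalize=True).sort_index()

# Toy target with made-up labels
y_toy = pd.Series([1, 0, 1, 1, 0, 0, 1, 0])
shares = class_balance(y_toy)
print(shares)  # 0 -> 0.5, 1 -> 0.5
```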

In [None]:
X_tr, X_valid, y_tr, y_valid = train_test_split(X_train, y_train,
                                                train_size=0.85,
                                                random_state=0)
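The split above shuffles rows at random, so validation rows come from the same days as training rows. Because the data is a time series, a time-ordered holdout is a common alternative. This is not what the notebook does. The sketch below is a hypothetical helper (`time_holdout`) that assumes the `date` column from `train.csv` is kept alongside the features, since the `feature` filter used above drops it.

```python
import pandas as pd

def time_holdout(df: pd.DataFrame, date_col: str = "date", train_frac: float = 0.85):
    """Split rows so that every validation day comes after every training day.

    Assumes 0 < train_frac < 1 and that df has a sortable day column.
    """
    days = sorted(df[date_col].unique())
    cutoff = days[int(len(days) * train_frac)]  # first day assigned to validation
    train_part = df[df[date_col] < cutoff]
    valid_part = df[df[date_col] >= cutoff]
    return train_part, valid_part

# Toy frame with made-up values: 10 distinct days
toy = pd.DataFrame({"date": [0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                    "feature_0": range(12)})
tr, va = time_holdout(toy)
print(sorted(tr["date"].unique().tolist()), sorted(va["date"].unique().tolist()))
```

With 10 days and `train_frac=0.85`, days 0 to 7 go to training and days 8 and 9 to validation.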

In [None]:
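params = {'n_estimators': 500,
          'max_depth': 10,
          'learning_rate': 0.05,
          'missing': -999,            # matches the fillna(-999) sentinel above
          'random_state': 0,
          'tree_method': 'gpu_hist',  # needs a GPU; XGBoost >= 2.0 uses tree_method='hist' with device='cuda'
          'verbosity': 1}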

model = XGBClassifier(**params)

model.fit(X_tr, y_tr)

In [None]:
# ROC AUC needs scores, not hard labels: use the predicted probability of class 1
print('ROC AUC score: %.3f'
      % roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1]))
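ROC AUC measures how well the scores rank positives above negatives, so feeding it the 0/1 output of `predict` throws away most of that ranking. The toy example below (made-up scores) shows the effect: the scores rank the classes perfectly, but all fall below the 0.5 threshold, so the thresholded labels are all 0 and the AUC drops to chance level.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
# Made-up scores: perfect ranking, but every score is below 0.5
proba = np.array([0.1, 0.2, 0.3, 0.4])
labels = (proba > 0.5).astype(int)  # roughly what predict() returns for a binary classifier

auc_proba = roc_auc_score(y_true, proba)    # uses the full ranking
auc_labels = roc_auc_score(y_true, labels)  # all predictions tied at 0
print(auc_proba, auc_labels)  # 1.0 0.5
```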

In [None]:
# Final model: slightly deeper trees with row subsampling, refit on all training data
params = {'n_estimators': 500,
          'max_depth': 11,
          'subsample': 0.9,
          'learning_rate': 0.05,
          'missing': -999,
          'random_state': 0,
          'tree_method': 'gpu_hist'}

model = XGBClassifier(**params)

model.fit(X_train, y_train)

In [None]:
for (test_df, sample_prediction_df) in iter_test:
    X_test = test_df.loc[:, test_df.columns.str.contains('feature')]
    X_test = X_test.fillna(-999)  # fillna returns a new frame; assign it back
    preds = model.predict(X_test)
    sample_prediction_df['action'] = preds
    env.predict(sample_prediction_df)
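The `janestreet` module only exists inside the competition environment, so the prediction loop cannot be run locally as-is. A minimal offline sketch of the same loop is below. `fake_iter_test` and `ThresholdModel` are hypothetical stand-ins, not part of the competition API; they only mimic the pattern used above (pairs of frames, one prediction written into `action` per pair). Appending to a list stands in for `env.predict`.

```python
import numpy as np
import pandas as pd

def fake_iter_test(n_batches: int = 3, n_features: int = 4, seed: int = 0):
    """Hypothetical stand-in for env.iter_test(): yields (test_df, sample_prediction_df)."""
    rng = np.random.default_rng(seed)
    for i in range(n_batches):
        test_df = pd.DataFrame(rng.normal(size=(1, n_features)),
                               columns=[f"feature_{j}" for j in range(n_features)])
        test_df.iloc[0, 0] = np.nan  # simulate a missing feature value
        test_df["weight"] = 1.0      # non-feature column, filtered out below
        sample_prediction_df = pd.DataFrame({"action": [0]}, index=[i])
        yield test_df, sample_prediction_df

class ThresholdModel:
    """Hypothetical model: predicts 1 when feature_1 is positive."""
    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return (X["feature_1"].values > 0).astype(int)

dummy_model = ThresholdModel()
submitted = []
for test_df, sample_prediction_df in fake_iter_test():
    X_test = test_df.loc[:, test_df.columns.str.contains("feature")]
    X_test = X_test.fillna(-999)  # keep the result: fillna is not in-place by default
    assert not X_test.isna().any().any()
    sample_prediction_df["action"] = dummy_model.predict(X_test)
    submitted.append(sample_prediction_df)  # stands in for env.predict(...)

print(pd.concat(submitted))
```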