#### Install Packages
You can install packages interactively (in the notebook), at the command line (using a shell/terminal), or through a Custom Environment (preferred).

To install packages interactively, you can use:
- `%package install <package_name>`: Bloomberg's recommended method, which installs from the Conda repositories.
- `%pip install <package_name>`: installs from the PyPI repository. I use `pip` for most installations.

`pip` is faster and has a more up-to-date catalog for most pure-Python packages, but Conda packages can also include non-Python dependencies (such as compiled libraries).
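Before running an install cell, it can help to check which packages are already importable. The sketch below uses only the standard library's `importlib.util.find_spec`. The package list is an assumption taken from the imports used later in this notebook. Install names and import names can differ: for example, scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import names for the packages this notebook uses (scikit-learn imports as "sklearn")
required = ["plotly", "sklearn", "xgboost"]

# find_spec returns None when a top-level module can't be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages are available")
```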

In [None]:
# Uncomment the line below if the packages aren't installed.
# This only needs to run once per session, which is why it's commented out.

# %package install plotly scikit-learn xgboost

In [None]:
import bql
import plotly.express as px
import xgboost

# Some default settings for charts, mostly for blogging purposes
# Adjust for your needs

import plotly.io as pio
pio.renderers.default = 'plotly_mimetype+notebook'
pio.templates[pio.templates.default].layout.height = 200
pio.templates[pio.templates.default].layout.margin = dict(l=50, r=50, t=50, b=50)

#### Run a BQL Query

This query retrieves 30 days of `px_last` for a single security. BQL date ranges are inclusive, so `range(-29d, 0d)` covers 30 days, including today's (current) value.
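The inclusive-range arithmetic can be checked with pandas as an analogy. This is not BQL, and the fixed "today" below is hypothetical. `pd.date_range` includes both endpoints by default, so a start 29 days back plus the end date gives 30 daily dates.

```python
import datetime
import pandas as pd

# Hypothetical "today"; in BQL, 0d refers to the current date
today = datetime.date(2024, 1, 31)
start = today - datetime.timedelta(days=29)  # analogous to -29d

# Both endpoints are included, mirroring the inclusive BQL range
window = pd.date_range(start, today, freq="D")
print(len(window))                          # 30
print(window[0].date(), window[-1].date())  # 2024-01-02 2024-01-31
```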

`fill=prev` fills empty values with the previous value. This fills in days when the market wasn't open and avoids gaps in the results. When doing analysis, consider carefully how filling will influence your values. Filled days show up as zero-change days, which can, for instance, make change rates seem more correlated.
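To see why filled days turn into zero-change days, here is a small pandas sketch. It uses `Series.ffill()` as a stand-in that is similar in spirit to BQL's `fill=prev`; the prices are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical closes: Fri 2024-01-05 through Tue 2024-01-09, market closed Sat/Sun
dates = pd.date_range("2024-01-05", periods=5, freq="D")
prices = pd.Series([100.0, np.nan, np.nan, 102.0, 101.0], index=dates)

# Carry the last known value forward over the weekend
filled = prices.ffill()
print(filled.tolist())   # [100.0, 100.0, 100.0, 102.0, 101.0]

# The filled weekend days appear as days with zero change
changes = filled.diff()
print(changes.tolist())  # [nan, 0.0, 0.0, 2.0, -1.0]
```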

`date_ordinal` is added because most models need numeric X values rather than dates. `toordinal()` converts each date into an integer day number.
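Here is a quick look at what `toordinal()` produces, using a few hypothetical dates. Consecutive calendar days map to consecutive integers, even across a month boundary. That is what lets a linear model treat the date as an evenly spaced X value. The conversion is also reversible with `datetime.date.fromordinal`.

```python
import datetime
import pandas as pd

dates = pd.Series(pd.to_datetime(["2024-01-30", "2024-01-31", "2024-02-01"]))

# Timestamp.toordinal() gives the proleptic Gregorian ordinal (0001-01-01 is day 1)
ordinals = dates.apply(lambda ts: ts.toordinal())

# Consecutive days differ by exactly 1
print(ordinals.diff().dropna().tolist())  # [1.0, 1.0]

# Convert back to dates, e.g. for plotting predictions
back = ordinals.apply(datetime.date.fromordinal)
print(back.tolist()[0])  # 2024-01-30
```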

In [None]:
bql_svc = bql.Service()
basics_query = """get(
      px_last
    ) for(
      ['IBM US Equity']
    ) with(
      dates=range(-29d, 0d),
      fill=prev,
      currency=USD
    )"""
response = bql_svc.execute(basics_query)
base_df = bql.combined_df(response)

# Reset the index: bql's combined_df returns ID as a sole index.
base_df = base_df.reset_index() 

# Convert each date to an integer day number so it can be used as a model feature
base_df['date_ordinal'] = base_df['DATE'].apply(lambda x: x.toordinal())


#### Plot the result

In [None]:
px.line(base_df, x="DATE", y="px_last")

#### Draw a Simple Moving Average

In [None]:
import pandas as pd

# Make a copy of the dataframe
df_withavgs = base_df.copy()

df_withavgs["sma_3day"] = df_withavgs["px_last"].rolling(3).mean()
px.line(df_withavgs, x="DATE", y=["px_last", "sma_3day"])
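`rolling(3).mean()` averages over the last 3 rows. This assumes one row per day, as the filled query above returns, so a 3-row window spans 3 calendar days. The first two rows have no full window and come out as NaN. The sketch below uses hypothetical prices and also shows the `min_periods` parameter, which averages whatever rows are available instead.

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# A 3-row window needs 3 observations, so the first two results are NaN
sma_3 = prices.rolling(3).mean()
print(sma_3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]

# min_periods=1 averages the rows available at the start instead
sma_3_partial = prices.rolling(3, min_periods=1).mean()
print(sma_3_partial.tolist())  # [10.0, 10.5, 11.0, 12.0, 13.0]
```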

#### LinearRegression - Fit to Entire Data

In [None]:
# This example uses scikit-learn (also called sklearn) to perform a simple linear
# regression. This pattern of fitting a model and then predicting unlocks a lot of other tools;
# you'll see that xgboost uses the same flow.
# This model uses the entire date range, with no train/test split.

from sklearn.linear_model import LinearRegression

linear_model_full = LinearRegression()
X = df_withavgs[['date_ordinal']]
y = df_withavgs['px_last']

linear_model_full.fit(X, y)
df_withavgs["px_last_pred_fulltrain"] = linear_model_full.predict(X)

px.line(df_withavgs, x="DATE", y=["px_last", "sma_3day", "px_last_pred_fulltrain"])
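With a single feature, `LinearRegression` fits an ordinary least squares line: a slope per day plus an intercept. The sketch below reproduces that with NumPy's degree-1 `np.polyfit` on hypothetical, perfectly linear data, so there are no extra dependencies. "Predicting" is then just evaluating the line at each X.

```python
import numpy as np

# Hypothetical data: 10 consecutive day ordinals with a price rising 0.5 per day
x = np.arange(738886, 738896)
y = 150.0 + 0.5 * (x - x[0])

# Degree-1 least squares: the same ordinary least squares line LinearRegression fits
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 6))  # 0.5

# Prediction is evaluating the fitted line at each ordinal
y_pred = slope * x + intercept
print(np.allclose(y_pred, y))  # True
```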

#### LinearRegression w/ Train - Test Split
The last example wasn't very interesting: we fit the model to all of the data, which tells us little about whether the model is useful.
Instead, let's split the data into train and test sets. Because this is a time series, we'll train on the first 3 weeks (21 days) and forecast (test) the remaining 9 days.
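For time series, the split is done by position rather than by random shuffling. scikit-learn's `train_test_split` shuffles by default, which would leak later prices into training. The sketch below uses a hypothetical 30-row frame shaped like `base_df` and checks that every test date comes after every train date.

```python
import pandas as pd

# Hypothetical 30-day frame shaped like base_df (DATE plus a price column)
df = pd.DataFrame({
    "DATE": pd.date_range("2024-01-02", periods=30, freq="D"),
    "px_last": range(100, 130),
})

n_train = 21  # first 3 weeks train, remaining 9 days test

# Slice by position so the test period is strictly after the training period
train = df.iloc[:n_train]
test = df.iloc[n_train:].copy()  # copy so new columns can be added safely

print(len(train), len(test))                     # 21 9
print(train["DATE"].max() < test["DATE"].min())  # True
```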


In [None]:
train_df = base_df.iloc[:21]
# .copy() so adding a prediction column doesn't trigger SettingWithCopyWarning
test_df = base_df[["DATE", "date_ordinal", "px_last"]].iloc[21:].copy()

linear_model_split = LinearRegression()
X_train = train_df[["date_ordinal"]]
y_train = train_df["px_last"]

# Fit only on the first 21 days
linear_model_split.fit(X_train, y_train)

# Predict the 9 held-out days
X_test = test_df[["date_ordinal"]]
test_df["px_last_pred_split"] = linear_model_split.predict(X_test)

# ignore_index avoids adding a stray "index" column
df_withpreds = pd.concat([train_df, test_df], ignore_index=True)
px.line(df_withpreds, x="DATE", y=["px_last", "px_last_pred_split"])
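The chart shows the fit visually. A single number such as mean absolute error (MAE) makes "is the model useful?" easier to compare across models. This is a minimal sketch with hypothetical test-period values. In the notebook, the inputs would be `test_df["px_last"]` and `test_df["px_last_pred_split"]`. scikit-learn also provides `sklearn.metrics.mean_absolute_error`.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values, in price units."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical test-period prices and predictions
actual = [101.0, 103.0, 102.0]
predicted = [100.0, 104.0, 102.5]
print(round(mean_absolute_error(actual, predicted), 4))  # 0.8333
```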


#### Predicting into the Future

But what about the future?
Using the model fit on the full 30 days, let's predict 14 days into the future.
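Building the future dates is where an off-by-one error can creep in, because Python's `range()` excludes its end. The sketch below uses `pd.date_range` with `periods=14`, which yields exactly 14 days. It then confirms the matching ordinal `range()` needs a `+1`. The last observed date is hypothetical.

```python
import pandas as pd

last_date = pd.Timestamp("2024-01-31")  # hypothetical last observed DATE
future_range = 14

# Start the day after the last observation and request exactly future_range days
future_dates = pd.date_range(last_date + pd.Timedelta(days=1), periods=future_range, freq="D")
future_ordinals = [d.toordinal() for d in future_dates]

print(len(future_dates))                                # 14
print(future_dates[0].date(), future_dates[-1].date())  # 2024-02-01 2024-02-14

# range() excludes its end, so the equivalent ordinal range needs +1
start = last_date.toordinal() + 1
assert future_ordinals == list(range(start, start + future_range))
```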


In [None]:
import datetime

future_range = 14  # days to forecast past the last observed date
last_ordinal = base_df['date_ordinal'].max()
last_index = base_df.index.max()

# range() excludes its end, so +1 yields exactly future_range new days
df_future = pd.DataFrame(
    {"date_ordinal": range(last_ordinal + 1, last_ordinal + future_range + 1)},
    index=range(last_index + 1, last_index + future_range + 1),
)

# Reuse the model fit on all 30 days
df_future["px_last_pred_future"] = linear_model_full.predict(df_future[["date_ordinal"]])
# Convert ordinals back to dates for plotting
df_future["DATE"] = pd.to_datetime(df_future["date_ordinal"].apply(datetime.date.fromordinal))

df_with_future = pd.concat([df_withavgs, df_future])

px.line(df_with_future, x="DATE", y=["px_last", "px_last_pred_fulltrain", "px_last_pred_future"])

#### Using XGBoost

The point of this example isn't XGBoost itself (a useful tool for many problems), but that the same fit/predict flow we used for scikit-learn's LinearRegression applies to *other* regression models.

If you can follow the LinearRegression example, you can do a lot of other things without needing to learn much more Python.
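Both `LinearRegression` and `XGBRegressor` follow scikit-learn's estimator convention: `fit(X, y)` trains and returns the model, and `predict(X)` returns predictions. The sketch below shows that any object following the convention can be swapped into the same code. `MeanRegressor` and `fit_and_predict` are hypothetical names made up for illustration; they are not part of any library.

```python
import numpy as np

class MeanRegressor:
    """Hypothetical toy model: predicts the training mean for every row.

    Follows the scikit-learn convention: fit(X, y) returns self, predict(X) returns an array.
    """

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

def fit_and_predict(model, X_train, y_train, X_test):
    # Works for any estimator with fit/predict (LinearRegression, XGBRegressor, or this toy)
    model.fit(X_train, y_train)
    return model.predict(X_test)

X_train = [[1], [2], [3]]
y_train = [10.0, 12.0, 14.0]
X_test = [[4], [5]]
print(fit_and_predict(MeanRegressor(), X_train, y_train, X_test))  # [12. 12.]
```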

In [None]:
import xgboost as xgb

xg_df = base_df.copy()

xg_df['month'] = xg_df['DATE'].dt.month
xg_df['day_of_week'] = xg_df['DATE'].dt.dayofweek

# Same chronological split as before: first 21 days train, remaining 9 days test
train_df = xg_df.iloc[:21]
test_df = xg_df.iloc[21:]

X_train = train_df[['month', 'day_of_week']]
y_train = train_df["px_last"]

xmodel = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, objective='reg:squarederror')
xmodel.fit(X_train, y_train)

# Only the test rows get predictions; train rows stay NaN in this column
xg_df.loc[test_df.index, 'xgpredicted_px_last'] = xmodel.predict(test_df[['month', 'day_of_week']])

px.line(xg_df, x="DATE", y=["px_last", "xgpredicted_px_last"])
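The XGBoost cell builds its features with pandas' `.dt` accessor. This sketch, using a hypothetical week of dates, shows what those columns contain. `month` runs 1 to 12, and `day_of_week` runs from Monday=0 to Sunday=6. Because these values repeat rather than increase with time, a model using only them can learn calendar patterns but not a trend.

```python
import pandas as pd

# Hypothetical week: Monday 2024-01-29 through Sunday 2024-02-04
dates = pd.Series(pd.date_range("2024-01-29", periods=7, freq="D"))

features = pd.DataFrame({
    "DATE": dates,
    "month": dates.dt.month,            # 1-12
    "day_of_week": dates.dt.dayofweek,  # Monday=0 ... Sunday=6
})
print(features)
```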
