# introduction

This notebook gives a quick intro to the `ubiquant_utils` module, which emulates the Ubiquant time-series API locally. It gives you the flexibility to:
1. feed different slices of the train dataset into it
2. create as many emulator instances as you want within one session
3. call an LB score method at any point during the iteration

It enforces similar constraints to the real API and produces realistic error messages. 
The code adapts my [Local API Emulator](https://www.kaggle.com/jagofc/local-api-emulator) from the G-Research Crypto Competition.
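
To give a feel for what "emulating the API" means, here is a minimal sketch of the idea. It is not the `ubiquant_utils` code: the class `ToyAPI`, its attributes, its error messages and the toy column names are all hypothetical, chosen only to illustrate serving one time_id at a time and insisting on a `predict` call before the next batch.

```python
import pandas as pd


class ToyAPI:
    """Hypothetical, simplified emulator (NOT the ubiquant_utils implementation)."""

    def __init__(self, df):
        self.df = df
        self.time_ids = sorted(df["time_id"].unique())
        self.predictions = []
        self.awaiting_prediction = False

    def __iter__(self):
        for time_id in self.time_ids:
            # Constraint: the previous batch must have been predicted.
            if self.awaiting_prediction:
                raise RuntimeError("Call predict() before requesting the next batch.")
            batch = self.df[self.df["time_id"] == time_id]
            data_df = batch.drop(columns=["target"])  # features only
            pred_df = pd.DataFrame({"row_id": batch["row_id"].values, "target": 0.0})
            self.awaiting_prediction = True
            yield data_df, pred_df

    def predict(self, pred_df):
        # Constraint: one predict() call per batch.
        if not self.awaiting_prediction:
            raise RuntimeError("Request the next batch before calling predict() again.")
        self.predictions.append(pred_df.copy())
        self.awaiting_prediction = False


# Toy data with two time_ids (made up for illustration).
toy_df = pd.DataFrame({
    "row_id": ["0_1", "0_2", "1_1", "1_2"],
    "time_id": [0, 0, 1, 1],
    "f_0": [0.1, 0.2, 0.3, 0.4],
    "target": [1.0, -1.0, 0.5, -0.5],
})

toy_api = ToyAPI(toy_df)
for data_df, pred_df in toy_api:
    pred_df["target"] = 0.0
    toy_api.predict(pred_df)

n_batches = len(toy_api.predictions)
print(f"Batches predicted: {n_batches}")
```

Because the emulator keeps the targets it was given, it can score the collected predictions at any point, which the real API cannot do.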

For a quick introduction to importing utility scripts see [this intro video](https://www.youtube.com/watch?v=C4h88PfN5jA&ab_channel=Kaggle).

# import

To use the module in your notebook you need to:
1. In the notebook menu select File > Add Utility Script
2. Search for "ubiquant_utils" and (double-) click Add.
3. Import `ubiquant_utils` as you would any module. For example:

In [None]:
import ubiquant_utils as uu

# data

In [None]:
import gc
import time
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

In [None]:
# thanks for the parquet file @robikscube. upvoted.
# takes a minute or so to load.
train_df = pd.read_parquet('../input/ubiquant-parquet/train.parquet')

# demo #1
### example loop with dummy predictions

Take a slice of `train_df` containing the rows with `time_id <= 250` for testing. Note that the public LB is estimated to contain [around 200 to 250 time_ids](https://www.kaggle.com/lucasmorin/don-t-mind-me-just-probing-the-lb) (upvoted).

In [None]:
test_df = train_df[train_df.time_id <= 250]
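
The filter is inclusive and time_ids need not be contiguous, so it is worth counting how many distinct time_ids a slice actually holds. A small sketch on made-up data (the frame `toy_df` and its values are hypothetical); the same `nunique()` call can be run on `test_df`:

```python
import pandas as pd

# Toy frame with gaps in time_id (2 and 5 are missing on purpose).
toy_df = pd.DataFrame({
    "time_id": [0, 0, 1, 3, 3, 4, 6],
    "target": [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2],
})

# Same kind of inclusive filter as above, with a smaller cutoff.
toy_slice = toy_df[toy_df.time_id <= 3]

n_rows = len(toy_slice)
n_time_ids = toy_slice.time_id.nunique()
print(f"{n_rows} rows across {n_time_ids} distinct time_ids")
```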

Delete `train_df` and take out the trash to free up enough RAM for a couple of examples.

In [None]:
del train_df
gc.collect()

Create an API instance:

In [None]:
api = uu.API(test_df)

An example loop making dummy predictions of target=0:

In [None]:
start_time = time.time()

for (data_df, pred_df) in tqdm(api):
    
    # dummy prediction - insert yours here.
    pred_df['target'] = 0.
    api.predict(pred_df)
    
finish_time = time.time()

total_time = finish_time - start_time
mean_iter_speed = api.init_num_times/total_time

print(f"Iterations/s = {mean_iter_speed:.2g}.")
test_iters = 250
print(f"Expected number of iterations in test set is approx. {test_iters}",
      f"which will take {test_iters / (mean_iter_speed * 3600):.2g} hours",
      "using this API emulator while making dummy predictions.")

The API has a `score` method, which returns:
+ a dataframe containing your predictions and the targets.
+ the LB score: the Pearson correlation between predictions and targets, computed separately for each time_id and then averaged.

In [None]:
score_df, score = api.score()

In [None]:
print(f"Final LB score is {score:.4g}")

In [None]:
score_df.head(5)
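
For reference, the metric described above can be sketched by hand. This is an illustration on toy data, not the module's internals: the function name `mean_corr_by_time_id` and the column names `prediction` and `target` are assumptions, so check the actual columns of `score_df` before reusing it.

```python
import pandas as pd


def mean_corr_by_time_id(df, pred_col="prediction", target_col="target"):
    """Mean over time_ids of the Pearson correlation between two columns."""
    per_time_id = pd.Series({
        time_id: group[pred_col].corr(group[target_col])  # Pearson by default
        for time_id, group in df.groupby("time_id")
    })
    return per_time_id.mean()


# Toy data: perfect correlation at time_id 0 and 2, perfect anti-correlation at 1.
toy_scores = pd.DataFrame({
    "time_id":    [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "prediction": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "target":     [1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 2.0, 4.0, 6.0],
})

toy_score = mean_corr_by_time_id(toy_scores)
print(f"Toy score: {toy_score:.4g}")
```

Pearson correlation is undefined when one of the series is constant, and `Series.corr` returns NaN in that case. In this sketch such time_ids are skipped by `mean()`; how `ubiquant_utils` treats constant predictions such as the `target=0` dummy above is not shown here.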

# demo #2
### example loop with random predictions and regular scoring calls

In [None]:
api = uu.API(test_df)

for i, (data_df, pred_df) in enumerate(api):
    
    # random prediction
    pred_df['target'] = np.random.randn(len(pred_df))
    api.predict(pred_df)
    
    # regular scoring
    if i % 10 == 0:
        _, cum_score = api.score()
        print(f"LB score at {i:<3}: {cum_score:>10.4g}")
    
    
score_df, score = api.score()

print(f"Final LB score is {score:.4g}")

# end