# Detailed API Introduction

In [1]:
import sys
sys.path.insert(0, '../data') # Paul: insert directory containing gresearch_crypto module into system path
import gresearch_crypto
import pandas as pd
import os

You can only submit from Kaggle Notebooks.

## TL;DR: End-to-End Usage Example

```
import pandas as pd
import gresearch_crypto

env = gresearch_crypto.make_env()  # can only be called once per kernel session

# Training data is in the competition dataset as usual
train_df = pd.read_csv('/kaggle/input/g-research-crypto-forecasting/train.csv', low_memory=False)

# tgt_1_model and tgt_2_model are placeholders for your own models;
# they are not defined in this notebook.
tgt_1_model.fit(train_df)
tgt_2_model.fit(train_df)

iter_test = env.iter_test()
for (test_df, sample_prediction_df) in iter_test:
    sample_prediction_df['Target'] = tgt_1_model.predict(test_df)
    env.predict(sample_prediction_df)
```

## Introduction

You can only call `make_env()` **once**, so don't lose the returned `env`! Paul: To call it again, you have to restart the kernel.

In [2]:
env = gresearch_crypto.make_env()

In [3]:
for dirname, _, filenames in os.walk('../data'): # Paul: '/kaggle/input' in Kaggle Notebook?
    for filename in filenames:
        print(os.path.join(dirname, filename))

../data/example_test.csv
../data/supplemental_train.csv
../data/train.csv
../data/asset_details.csv
../data/example_sample_submission.csv
../data/gresearch_crypto/competition.cpython-37m-x86_64-linux-gnu.so
../data/gresearch_crypto/__init__.py
../data/gresearch_crypto/__pycache__/__init__.cpython-37.pyc


In [5]:
# Paul: the commented-out read below froze my Ubuntu VM
# train_df = pd.read_csv('../data/train.csv', low_memory=False,
#                        dtype={'Asset_ID': 'int8', 'Count': 'int32', 'row_id': 'int32',
#                               'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64',
#                               'Volume': 'float64', 'VWAP': 'float64'
#                              }
#                       )
# train_df.head(3)
train_df = pd.read_csv('../data/train.csv')
train_df.head(3)

| | timestamp | Asset_ID | Count | Open | High | Low | Close | Volume | VWAP | Target |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1514764860 | 2 | 40.0 | 2376.58 | 2399.5 | 2357.14 | 2374.59 | 19.233005 | 2373.116392 | -0.004218 |
| 1 | 1514764860 | 0 | 5.0 | 8.53 | 8.53 | 8.53 | 8.53 | 78.38 | 8.53 | -0.014399 |
| 2 | 1514764860 | 1 | 229.0 | 13835.194 | 14013.8 | 13666.11 | 13850.176 | 31.550062 | 13827.062093 | -0.014643 |
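
If loading the whole file at once strains memory, one option is to read it in chunks and filter each piece before concatenating. The sketch below is only an illustration: `read_in_chunks` is a hypothetical helper, and the three-row CSV is an in-memory stand-in built from the rows shown above so the example runs without `train.csv`. Chunking by itself does not shrink the final DataFrame; the saving comes from dropping rows (here, by `Asset_ID`) or narrowing dtypes per chunk.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for train.csv (rows copied from the output above).
csv_text = """timestamp,Asset_ID,Count,Open,High,Low,Close,Volume,VWAP,Target
1514764860,2,40.0,2376.58,2399.5,2357.14,2374.59,19.233005,2373.116392,-0.004218
1514764860,0,5.0,8.53,8.53,8.53,8.53,78.38,8.53,-0.014399
1514764860,1,229.0,13835.194,14013.8,13666.11,13850.176,31.550062,13827.062093,-0.014643
"""


def read_in_chunks(source, chunksize, asset_ids=None):
    """Read a CSV piece by piece, optionally keeping only some assets."""
    reader = pd.read_csv(source, dtype={"Asset_ID": "int8"}, chunksize=chunksize)
    kept = []
    for chunk in reader:
        if asset_ids is not None:
            # Filter before concatenating so unwanted rows never accumulate.
            chunk = chunk[chunk["Asset_ID"].isin(asset_ids)]
        kept.append(chunk)
    return pd.concat(kept, ignore_index=True)


all_rows = read_in_chunks(io.StringIO(csv_text), chunksize=2)
only_btc_eth = read_in_chunks(io.StringIO(csv_text), chunksize=2, asset_ids=[1, 2])
```

With the real file you would pass the path (for example `'../data/train.csv'`) and a much larger `chunksize`; a suitable value depends on available memory and is not something this notebook establishes.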


## `iter_test` function

You have direct access to the example test rows for convenience, but your code can only get rows from the real test set via the API. `iter_test` yields `test_df` and `sample_prediction_df` for each batch until a call fails; once you call `predict`, you can continue on to the next batch. Paul: What does this mean? Can I do this offline, or does it have to be in a Kaggle Notebook?

In [6]:
# You can only iterate through a result from `env.iter_test()` once
# so be careful not to lose it once you start iterating.
iter_test = env.iter_test()

In [7]:
(test_df, sample_prediction_df) = next(iter_test)
test_df.head(3)

This version of the API is not optimized and should not be used to estimate the runtime of your code on the hidden test set.


| | timestamp | Asset_ID | Count | Open | High | Low | Close | Volume | VWAP | row_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1623542400 | 3 | 1201 | 1.478556 | 1.48603 | 1.478 | 1.483681 | 654799.561103 | 1.481439 | 0 |
| 1 | 1623542400 | 2 | 1020 | 580.306667 | 583.89 | 579.91 | 582.276667 | 1227.988328 | 581.697038 | 1 |
| 2 | 1623542400 | 0 | 626 | 343.7895 | 345.108 | 343.64 | 344.598 | 1718.832569 | 344.441729 | 2 |


The API will require 0.5 GB of memory after initialization. The initialization step (`env.iter_test()`) will require meaningfully more memory than that; we recommend you do not load your model until after making that call. The API will also consume less than 30 minutes of runtime for loading and serving the data. Paul: How do I plan for the impact of the API on my notebook's runtime and memory use?
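
One way to start planning for runtime is to time your own per-batch work and extrapolate. The sketch below uses only the standard library; `BatchTimer` is a hypothetical helper, the `time.sleep` call stands in for feature building and `model.predict`, and the batch count passed to `projected_total` is a made-up figure. It measures your code only, not the API's own overhead, which the notice above says should not be estimated from this version of the API.

```python
import time


class BatchTimer:
    """Record wall-clock seconds spent on each batch (hypothetical helper)."""

    def __init__(self):
        self.durations = []
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.durations.append(time.perf_counter() - self._start)
        return False  # never swallow exceptions raised inside the block

    def projected_total(self, n_batches):
        """Extrapolate total seconds for `n_batches` from the mean so far."""
        if not self.durations:
            return 0.0
        return n_batches * sum(self.durations) / len(self.durations)


timer = BatchTimer()
for _ in range(3):
    with timer:
        time.sleep(0.01)  # stand-in for your per-batch work

# 1000 is an arbitrary illustrative batch count, not a competition figure.
estimate_seconds = timer.projected_total(n_batches=1000)
```

Inside the real loop, the `with timer:` block would wrap everything between receiving `test_df` and calling `env.predict`.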

In [8]:
sample_prediction_df.head(3)

| | row_id | Target |
|---|---|---|
| 0 | 0 | 0.0 |
| 1 | 1 | 0.0 |
| 2 | 2 | 0.0 |


We'll get an error if we try to continue on to the next batch without making our predictions for the current batch.
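
To experiment with this contract offline, a small mock can help. The sketch below is not the `gresearch_crypto` implementation: `MockCryptoEnv` is a hypothetical stand-in that batches rows by `timestamp`, the exception type and messages are assumptions, and the third demo row is made up for illustration. It only imitates the behaviour described here: one pass through `iter_test`, and `predict` required before the next batch.

```python
import pandas as pd


class MockCryptoEnv:
    """Offline stand-in that mimics the iter_test/predict contract."""

    def __init__(self, test_df):
        self._test_df = test_df
        self._iterated = False
        self._awaiting_prediction = False
        self.predictions = []

    def iter_test(self):
        if self._iterated:
            raise RuntimeError("iter_test() can only be iterated once")
        self._iterated = True
        return self._batches()

    def _batches(self):
        # One batch per timestamp, in ascending order.
        for _, batch in self._test_df.groupby("timestamp", sort=True):
            sample = pd.DataFrame(
                {"row_id": batch["row_id"].to_numpy(), "Target": 0.0}
            )
            self._awaiting_prediction = True
            yield batch.reset_index(drop=True), sample
            # Execution resumes here when the caller asks for the next batch.
            if self._awaiting_prediction:
                raise RuntimeError("call predict() before the next batch")

    def predict(self, predictions_df):
        if not self._awaiting_prediction:
            raise RuntimeError("predict() must follow a successful iteration")
        self.predictions.append(predictions_df.copy())
        self._awaiting_prediction = False


# First two rows follow the example output; the third is illustrative.
demo_test = pd.DataFrame({
    "timestamp": [1623542400, 1623542400, 1623542460],
    "Asset_ID": [3, 2, 3],
    "Close": [1.483681, 582.276667, 1.49],
    "row_id": [0, 1, 2],
})

mock_env = MockCryptoEnv(demo_test)
mock_iter = mock_env.iter_test()
first_batch, first_sample = next(mock_iter)

# Asking for the next batch without calling predict() raises an error.
try:
    next(mock_iter)
    skipping_allowed = True
except RuntimeError:
    skipping_allowed = False

# A well-behaved loop on a fresh mock: predict once per batch.
full_env = MockCryptoEnv(demo_test)
for batch_df, sample_df in full_env.iter_test():
    sample_df["Target"] = 0
    full_env.predict(sample_df)
```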

## `predict` function

Stores your predictions for the current batch. Expects the same format as sample_prediction_df.

Args:

    predictions_df: DataFrame which must have the same format as sample_prediction_df.

This function will raise an Exception if not called after a successful iteration of the iter_test generator.

In [9]:
env.predict(sample_prediction_df)
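
Because `predict` expects the same format as `sample_prediction_df`, it can be worth checking a frame before handing it over. The sketch below is one possible pre-flight check; `check_predictions` is a hypothetical helper, and reading "same format" as identical columns, identical `row_id` values, and finite `Target` values is my assumption rather than a documented rule of the API.

```python
import numpy as np
import pandas as pd


def check_predictions(predictions_df, sample_prediction_df):
    """Raise ValueError if predictions_df does not match the sample's layout."""
    expected_cols = list(sample_prediction_df.columns)
    if list(predictions_df.columns) != expected_cols:
        raise ValueError(f"expected columns {expected_cols}")
    if len(predictions_df) != len(sample_prediction_df):
        raise ValueError("row count differs from sample_prediction_df")
    same_ids = (
        predictions_df["row_id"].to_numpy()
        == sample_prediction_df["row_id"].to_numpy()
    ).all()
    if not same_ids:
        raise ValueError("row_id values or order differ from the sample")
    targets = predictions_df["Target"].to_numpy(dtype="float64")
    if not np.isfinite(targets).all():
        raise ValueError("Target contains NaN or infinite values")


sample = pd.DataFrame({"row_id": [0, 1, 2], "Target": [0.0, 0.0, 0.0]})

good = sample.copy()
good["Target"] = [0.01, -0.02, 0.0]
check_predictions(good, sample)  # passes silently

bad = good.copy()
bad.loc[1, "Target"] = float("nan")
try:
    check_predictions(bad, sample)
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

In the notebook, the check would sit immediately before `env.predict(sample_prediction_df)`.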

## Main Loop/Basic Submission Template

When writing your own notebooks, be sure to write robust code that makes as few assumptions about the `iter_test`/`predict` loop as possible. For example, there may be large gaps between timestamps for one or more cryptoassets. In the unlikely event that a cryptoasset were dropped from enough exchanges, it might go missing from the dataset entirely.

You may assume that the structure of sample_prediction_df will not change in this competition.

In [10]:
for (test_df, sample_prediction_df) in iter_test:
    sample_prediction_df['Target'] = 0
    env.predict(sample_prediction_df)
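
The robustness advice above can be sketched as a helper that never assumes every asset is present in a batch. `fill_targets` is a hypothetical function, `per_asset_prediction` stands in for whatever your model returns, and falling back to `0.0` for unseen assets is just one possible default. Predictions are matched to rows through `row_id` rather than by position.

```python
import pandas as pd


def fill_targets(test_df, sample_prediction_df, per_asset_prediction, default=0.0):
    """Map per-asset predictions onto the sample rows via row_id.

    Assets missing from `per_asset_prediction` receive `default`.
    """
    by_row = (
        test_df.set_index("row_id")["Asset_ID"]
        .map(per_asset_prediction)  # unknown assets become NaN
        .fillna(default)
    )
    out = sample_prediction_df.copy()
    out["Target"] = out["row_id"].map(by_row).fillna(default).astype("float64")
    return out


batch = pd.DataFrame({"row_id": [0, 1, 2], "Asset_ID": [3, 2, 0]})
sample = pd.DataFrame({"row_id": [0, 1, 2], "Target": [0.0, 0.0, 0.0]})

# Hypothetical model output with no prediction for Asset_ID 0.
asset_predictions = {3: 0.01, 2: -0.02}

filled = fill_targets(batch, sample, asset_predictions)
```

Inside the main loop, `env.predict(fill_targets(test_df, sample_prediction_df, asset_predictions))` would replace the constant `0` assignment.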