# Introduction to the `kagglegym` API

Code Competitions are a new style of competition where you submit code rather than the predictions that your code creates. This allows for new types of competitions like this time-series competition hosted by Two Sigma. This notebook gives an overview of the API, `kagglegym`, which was heavily influenced by [OpenAI's Gym](https://gym.openai.com/docs) API for reinforcement learning challenges.

## Data Overview

Another difference in this competition is that, because of the size of the data, it's stored in an [HDF5 file](https://support.hdfgroup.org/HDF5/) instead of a CSV file. You can still easily read it and manipulate it for exploration:

In [6]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [2]:
# Here's an example of loading the HDF5 file using pandas' built-in HDF5 support:
import pandas as pd

with pd.HDFStore("../data/train.h5", "r") as train:
    # Note that the "train" dataframe is the only dataframe in the file
    df = train.get("train")

In [3]:
# Let's see how many rows are in the full training set
len(df)

1710756

In [4]:
df.head()

Unnamed: 0,id,timestamp,derived_0,derived_1,derived_2,derived_3,derived_4,fundamental_0,fundamental_1,fundamental_2,...,technical_36,technical_37,technical_38,technical_39,technical_40,technical_41,technical_42,technical_43,technical_44,y
0,10,0,0.370326,-0.006316,0.222831,-0.21303,0.729277,-0.335633,0.113292,1.621238,...,0.775208,,,,-0.414776,,,-2.0,,-0.011753
1,11,0,0.014765,-0.038064,-0.017425,0.320652,-0.034134,0.004413,0.114285,-0.210185,...,0.02559,,,,-0.273607,,,-2.0,,-0.00124
2,12,0,-0.010622,-0.050577,3.379575,-0.157525,-0.06855,-0.155937,1.219439,-0.764516,...,0.151881,,,,-0.17571,,,-2.0,,-0.02094
3,25,0,,,,,,0.178495,,-0.007262,...,1.035936,,,,-0.211506,,,-2.0,,-0.015959
4,26,0,0.176693,-0.025284,-0.05768,0.0151,0.180894,0.139445,-0.125687,-0.018707,...,0.630232,,,,-0.001957,,,0.0,,-0.007338


In [5]:
# How many timestamps are in the full training set?
len(df["timestamp"].unique())

1813

**Important Note**: the raw training file is only available for exploration kernels. It will not be available when you make a competition submission. You should only use the raw training file for exploration purposes.

## API Overview

The "kagglegym" API is based on OpenAI's Gym API, a toolkit for developing and comparing reinforcement learning algorithms. Read OpenAI's Gym API [documentation](https://gym.openai.com/docs) for more details. Note that ours is named "kagglegym" and not "gym" to prevent possible conflicts with OpenAI's "gym" library. This section will give an overview of the concepts to get you started on this competition.

The API is exposed through a `kagglegym` library. Let's import it to get started:

In [8]:
from src import kagglegym

Now, we need to create an "environment". This will be our primary interface to the API. The `kagglegym` API has the concept of a default environment name for a competition, so just calling `make()` will create the appropriate one for this competition.

In [9]:
# Create environment
env = kagglegym.make()

To properly initialize things, we need to "reset" the environment. This will also give us our first "observation":

In [10]:
# Get first observation
observation = env.reset()

Observations are the means by which our code "observes" the world. The very first observation has a special property called "train", a dataframe that we can use to train our model:

In [11]:
# Look at first few rows of the train dataframe
observation.train.head()

Unnamed: 0,id,timestamp,derived_0,derived_1,derived_2,derived_3,derived_4,fundamental_0,fundamental_1,fundamental_2,...,technical_36,technical_37,technical_38,technical_39,technical_40,technical_41,technical_42,technical_43,technical_44,y
0,10,0,0.370326,-0.006316,0.222831,-0.21303,0.729277,-0.335633,0.113292,1.621238,...,0.775208,,,,-0.414776,,,-2.0,,-0.011753
1,11,0,0.014765,-0.038064,-0.017425,0.320652,-0.034134,0.004413,0.114285,-0.210185,...,0.02559,,,,-0.273607,,,-2.0,,-0.00124
2,12,0,-0.010622,-0.050577,3.379575,-0.157525,-0.06855,-0.155937,1.219439,-0.764516,...,0.151881,,,,-0.17571,,,-2.0,,-0.02094
3,25,0,,,,,,0.178495,,-0.007262,...,1.035936,,,,-0.211506,,,-2.0,,-0.015959
4,26,0,0.176693,-0.025284,-0.05768,0.0151,0.180894,0.139445,-0.125687,-0.018707,...,0.630232,,,,-0.001957,,,0.0,,-0.007338


Note that this "train" is about half the size of the full training dataframe. This is because we're in an exploratory mode where we simulate the full environment by reserving the first half of timestamps for training and the second half for simulating the public leaderboard.

In [12]:
# Get length of the train dataframe
len(observation.train)

806298

In [13]:
# Get number of unique timestamps in train
len(observation.train["timestamp"].unique())

906

In [14]:
# Note that this is about half of all timestamps in the full training set:
len(df["timestamp"].unique())

1813

In [15]:
# Here's proof that it's the first half:
unique_times = list(observation.train["timestamp"].unique())
(min(unique_times), max(unique_times))

(0, 905)
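
The split described above can be sketched in plain Python. This is only an illustration: it assumes the timestamps are the contiguous integers 0 to 1812 and that the split is a simple integer halving. `split_timestamps` is a hypothetical helper, not part of `kagglegym`, whose internal logic may differ.

```python
def split_timestamps(timestamps):
    """Split unique timestamps into a first half (train) and a second half (held out).

    Hypothetical helper for illustration only; not taken from kagglegym.
    """
    ordered = sorted(set(timestamps))
    n_train = len(ordered) // 2  # integer halving: 1813 // 2 == 906
    return ordered[:n_train], ordered[n_train:]


# Assumption: timestamps are the contiguous integers 0..1812 (1813 unique values).
train_ts, held_out_ts = split_timestamps(range(1813))
print(len(train_ts), min(train_ts), max(train_ts))  # 906 0 905
print(held_out_ts[0])                               # 906
```

Under these assumptions the numbers match the outputs above: 906 training timestamps spanning 0 to 905, with 906 as the first held-out timestamp.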

Each observation also has a "features" dataframe which contains features for the timestamp you'll be asked to predict in the next "step." Note that these features are for timestamp 906, which is just past the last training timestamp. Also, note that the "features" dataframe does *not* have the target "y" column:

In [16]:
# Look at the first few rows of the features dataframe
observation.features.head()

Unnamed: 0,id,timestamp,derived_0,derived_1,derived_2,derived_3,derived_4,fundamental_0,fundamental_1,fundamental_2,...,technical_35,technical_36,technical_37,technical_38,technical_39,technical_40,technical_41,technical_42,technical_43,technical_44
0,0,906,0.246848,0.102251,0.002781,-0.029337,0.400748,-0.273942,0.535335,0.243197,...,-0.083835,-0.193842,-4.642828e-10,-4.642828e-10,-4.436047e-10,-0.057845,-0.054224,4.406503e-05,-1.999907,-0.052079
1,7,906,0.217796,1.922894,0.752644,-0.237133,0.149876,-0.324906,-0.110032,-0.49329,...,-0.003478,-0.139345,-0.015625,-0.015625,-0.015625,-0.017113,-0.067621,-0.015625,-3.330669e-16,0.010135
2,11,906,-0.074121,-0.057722,1.595148,0.3118,-0.00356,0.274813,0.088182,-0.170704,...,-0.073679,-0.083068,-3.359413e-15,-7.448059000000001e-23,-0.7128695,-0.045347,-0.094896,-0.7277103,-1.96875,0.025098
3,12,906,0.160421,-0.037247,0.455026,0.022488,0.154796,-0.079147,0.865164,0.016182,...,0.471345,0.467192,-5.4633659999999996e-24,-0.00069384,-6.1447310000000004e-33,-0.030753,-0.24019,0.0,-1.301181e-13,-0.001249
4,13,906,-0.025554,-0.046826,,-0.026069,,-0.049894,0.046646,,...,0.115415,0.09082,-3.1055289999999997e-20,-1.614121e-21,0.0,-0.133357,-0.068176,-5.743766e-24,-3.330669e-16,0.019396


The final part of the observation is the "target" dataframe, which is what we're asking you to fill in. It includes the "id"s for the timestamp you'll predict in the next step.

In [17]:
# Look at the first few rows of the target dataframe
observation.target.head()

Unnamed: 0,id,y
0,0,0.0
1,7,0.0
2,11,0.0
3,12,0.0
4,13,0.0


This target is a valid submission for the step. In OpenAI Gym terms, what you submit at each step is called an "action". Each step of the environment returns four things: "observation", "reward", "done", and "info".

In [18]:
# Each step is an "action"
action = observation.target

# Each "step" of the environment returns four things:
observation, reward, done, info = env.step(action)

The "done" variable tells us if we're done. In this case, we still have plenty of timestamps to go, so it returns "False".

In [19]:
# Print done
done

False

The "info" variable is just a dictionary used for debugging. In this particular environment, we only make use of it at the end (when "done" is True).

In [20]:
# Print info
info

{}

We see that "observation" has the same properties as the one returned by `reset()`. However, notice that it's for the next "timestamp":

In [21]:
# Look at the first few rows of the observation dataframe for the next timestamp
observation.features.head()

Unnamed: 0,id,timestamp,derived_0,derived_1,derived_2,derived_3,derived_4,fundamental_0,fundamental_1,fundamental_2,...,technical_35,technical_36,technical_37,technical_38,technical_39,technical_40,technical_41,technical_42,technical_43,technical_44
0,0,907,0.241634,0.101084,0.00255,-0.028882,0.398276,-0.27257,0.538591,0.241426,...,-0.081623,-0.194489,-4.041816e-10,-4.041816e-10,-3.861803e-10,-0.057166,-0.055835,3.836084e-05,-1.999919,-0.049153
1,7,907,0.204701,1.925502,0.753222,-0.237052,0.154132,-0.324891,-0.110032,-0.495813,...,-0.005009,-0.144726,-0.01360235,-0.01360235,-0.01360235,-0.017764,-0.087472,-0.01360235,-3.330669e-16,0.008787
2,11,907,-0.073517,-0.057697,1.611172,0.310672,-0.003293,0.272259,0.088949,-0.171558,...,-0.071035,-0.082874,-2.924539e-15,-6.483912e-23,-0.7500384,-0.045536,-0.10429,-0.762958,-1.972795,0.021927
3,12,907,0.160591,-0.037544,0.450525,0.021005,0.153457,-0.077793,0.858471,0.01649,...,0.474488,0.46915,-4.756136e-24,-0.0006040228,-5.349299e-33,-0.030443,-0.222433,0.0,-1.132427e-13,-0.001827
4,13,907,-0.025554,-0.046826,,-0.026069,,-0.049894,0.046646,,...,0.115415,0.09082,-2.70352e-20,-1.4051739999999999e-21,0.0,-0.133357,-0.054863,-5.000239e-24,-3.330669e-16,0.019338


In [22]:
# Note that this timestamp has more id's/rows
len(observation.features)

968

Perhaps most interesting is the "reward" variable, which tells you how well you're doing. In reinforcement learning contexts, the goal is to maximize the reward. In this competition, the reward is the R value, which ranges from -1 to 1 (higher is better). Note that we submitted all 0's, so we got a score that's below 0. If we had correctly predicted the true mean value, we would have gotten a score of 0. If we had made extreme predictions (e.g. all `-1000`'s), our score would have been capped at -1.

In [24]:
# Print reward
reward

-0.2000274424971549
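
To make the behavior of the reward more concrete, here is a rough sketch of how an R value with these properties could be computed. It assumes the signed definition R = sign(R²)·sqrt(|R²|), with R² = 1 - Σ(y - ŷ)² / Σ(y - ȳ)², clipped below at -1. The exact formula used by `kagglegym` is not shown in this notebook, so `r_score` is a hypothetical helper and the target values are made up.

```python
import math


def r_score(y_true, y_pred):
    """Signed square root of R^2, clipped below at -1.

    Assumed definition for illustration; not taken from the kagglegym source.
    Requires y_true to contain at least two distinct values.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    r = math.copysign(math.sqrt(abs(r2)), r2)
    return max(-1.0, r)


y = [-0.02, 0.01, 0.03, -0.01, 0.04]  # made-up target values
mean_y = sum(y) / len(y)

print(r_score(y, y))                    # perfect predictions give 1.0
print(r_score(y, [mean_y] * len(y)))    # predicting the true mean gives 0.0
print(r_score(y, [0.0] * len(y)) < 0)   # all zeros scores below 0 here: True
print(r_score(y, [-1000.0] * len(y)))   # extreme predictions are capped at -1.0
```

Under this assumed definition, all-zero predictions score below 0 whenever the true mean is nonzero, which is consistent with the negative reward printed above.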

Since we're in exploratory mode, we have access to the ground truth (obviously not available in submit mode):

In [25]:
perfect_action = df[df["timestamp"] == observation.features["timestamp"][0]][["id", "y"]].reset_index(drop=True)

In [26]:
# Look at the first few rows of perfect action
perfect_action.head()

Unnamed: 0,id,y
0,0,-0.003758
1,7,-0.009357
2,11,-0.001851
3,12,0.00309
4,13,0.008478


Let's see what happens when we submit a "perfect" action:

In [27]:
# Submit a perfect action
observation, reward, done, info = env.step(perfect_action)

As expected, we get the maximum reward of 1 by submitting the perfect values:

In [28]:
# Print reward
reward

1.0

## Making a complete submission

We've covered all of the basic components of the `kagglegym` API. You now know how to create an environment for the competition, get observations, examine features, and submit target values for a reward. But, we're still not done as there are more observations/timestamps left.

In [29]:
# Print done ... still more timestamps remaining
done

False

Now that we've gotten the basics out of the way, we can create a basic loop that runs until we're "done". That is, we'll make a prediction for each remaining timestamp in the data:

In [30]:
while True:
    target = observation.target
    timestamp = observation.features["timestamp"][0]
    if timestamp % 100 == 0:
        print("Timestamp #{}".format(timestamp))

    observation, reward, done, info = env.step(target)
    if done:
        break

Timestamp #1000
Timestamp #1100
Timestamp #1200
Timestamp #1300
Timestamp #1400
Timestamp #1500
Timestamp #1600
Timestamp #1700
Timestamp #1800
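
To experiment with the shape of this loop without the competition data, here is a minimal sketch of a stand-in environment. `ToyEnv` and `Observation` are hypothetical names. They only mimic the `reset()`/`step()` pattern described above, use plain lists of dicts instead of dataframes, return a placeholder reward of 0.0 rather than the competition's R value, and return `None` as the final observation (an arbitrary choice for this toy).

```python
from collections import namedtuple

# Hypothetical stand-in for an observation; only the two fields the loop uses.
Observation = namedtuple("Observation", ["features", "target"])


class ToyEnv:
    """Toy environment that mimics the reset()/step() loop shape. Not kagglegym."""

    def __init__(self, first_timestamp, last_timestamp, ids):
        self.first_timestamp = first_timestamp
        self.last_timestamp = last_timestamp
        self.ids = list(ids)
        self.current = None

    def _observe(self):
        # features: one row per id for the current timestamp
        features = [{"id": i, "timestamp": self.current} for i in self.ids]
        # target: zero-filled y per id, ready to be overwritten with predictions
        target = [{"id": i, "y": 0.0} for i in self.ids]
        return Observation(features=features, target=target)

    def reset(self):
        self.current = self.first_timestamp
        return self._observe()

    def step(self, action):
        # A real environment would score `action`; this toy returns a placeholder.
        reward = 0.0
        done = self.current >= self.last_timestamp
        info = {}
        if done:
            info["steps"] = self.last_timestamp - self.first_timestamp + 1
            return None, reward, done, info
        self.current += 1
        return self._observe(), reward, done, info


toy_env = ToyEnv(first_timestamp=906, last_timestamp=910, ids=[0, 7, 11])
toy_obs = toy_env.reset()
seen = []
while True:
    seen.append(toy_obs.features[0]["timestamp"])
    toy_obs, toy_reward, toy_done, toy_info = toy_env.step(toy_obs.target)
    if toy_done:
        break

print(seen)      # [906, 907, 908, 909, 910]
print(toy_info)  # {'steps': 5}
```

The `toy_` prefixes keep this sketch from overwriting the `env`, `observation`, `done`, and `info` variables used in the rest of the notebook.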


Now we can confirm that we're done:

In [31]:
# Print done
done

True

And since we're "done", we can take a look at "info", our dictionary used for debugging. Recall that in this environment, we only make use of it when "done" is True.

In [32]:
# Print info
info

{'public_score': 0.016651181430087304}

Our score is better than 0 because we had that one submission that was perfect.

In [33]:
# Print "public score" from info
info["public_score"]

0.016651181430087304

This concludes our overview of the `kagglegym` API. We encourage you to ask questions in the competition forums or share public kernels for feedback on your approach. Good luck!