This tutorial is an extended version of the [Quickstart Guide](../quickstart.html).

In [None]:
import polaris as po

## Login
We first need to authenticate ourselves using our Polaris account. If you don't have an account yet, you can create one [here](https://polarishub.io/sign-up).

In [None]:
from polaris.hub.client import PolarisHubClient

with PolarisHubClient() as client:
    client.login()

## Load from the Hub
Datasets and benchmarks are identified by an `owner/slug` ID.

In [None]:
benchmark = po.load_benchmark("polaris/hello-world-benchmark")

Loading a benchmark will automatically load the underlying dataset. 

You can also load the dataset directly. 

In [None]:
dataset = po.load_dataset("polaris/hello-world")
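
To make the `owner/slug` convention concrete, here is a minimal sketch that splits such an ID into its two parts. The helper `split_artifact_id` is hypothetical and not part of the Polaris API; it only illustrates the format of the IDs used above.

```python
def split_artifact_id(artifact_id: str) -> tuple[str, str]:
    """Split an 'owner/slug' ID into (owner, slug).

    Hypothetical helper for illustration only; not part of Polaris.
    """
    owner, sep, slug = artifact_id.partition("/")
    # Require exactly one separator and two non-empty parts.
    if not sep or not owner or not slug or "/" in slug:
        raise ValueError(f"Expected an ID of the form 'owner/slug', got {artifact_id!r}")
    return owner, slug


owner, slug = split_artifact_id("polaris/hello-world-benchmark")
print(owner, slug)
```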

## The Benchmark API
The benchmark object provides two main methods.

- `get_train_test_split()`: For creating objects through which we can access the different dataset partitions.
- `evaluate()`: For evaluating a set of predictions in accordance with the benchmark protocol.

### Train-test split

In [None]:
train, test = benchmark.get_train_test_split()

The returned objects offer several ways to access the data:
- The objects are iterable;
- The objects can be indexed;
- The objects have properties to access all data at once.

In [None]:
for x, y in train:
    pass

In [None]:
for i in range(len(train)):
    x, y = train[i]

In [None]:
x = train.inputs
y = train.targets
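
The three access patterns above can be mimicked with a small toy class. This is only a sketch of the interface, assuming in-memory lists; `ToySubset` is a hypothetical name and not how Polaris implements its split objects.

```python
class ToySubset:
    """Toy stand-in for a train split (hypothetical; NOT the Polaris class)."""

    def __init__(self, inputs, targets):
        self._inputs = list(inputs)
        self._targets = list(targets)

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, i):
        # Indexing returns one (input, target) pair.
        return self._inputs[i], self._targets[i]

    def __iter__(self):
        # Iterating yields (input, target) pairs in order.
        return iter(zip(self._inputs, self._targets))

    @property
    def inputs(self):
        # All inputs at once.
        return list(self._inputs)

    @property
    def targets(self):
        # All targets at once.
        return list(self._targets)


toy_train = ToySubset(["CCO", "CCN", "CCC"], [0.1, 0.2, 0.3])
pairs_by_iteration = [(x, y) for x, y in toy_train]
pairs_by_index = [toy_train[i] for i in range(len(toy_train))]
pairs_by_property = list(zip(toy_train.inputs, toy_train.targets))
```

All three lists hold the same pairs, so the choice between the access patterns is a matter of convenience.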

To avoid accidental access to the test targets, the test object does not expose the labels and will raise an error if you try to access them explicitly.

In [None]:
for x in test:
    pass

In [None]:
for i in range(len(test)):
    x = test[i]

In [None]:
x = test.inputs

# NOTE: The line below will raise an error!
# y = test.targets
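
One way to get this behaviour is a property that raises on access. The sketch below is a toy illustration under that assumption; `ToyTestSubset` and `HiddenTargetsError` are hypothetical names, and the exception Polaris actually raises may be of a different type.

```python
class HiddenTargetsError(Exception):
    """Hypothetical error type for this sketch; Polaris may raise a different one."""


class ToyTestSubset:
    """Toy stand-in for a test split (hypothetical; NOT the Polaris class)."""

    def __init__(self, inputs, targets):
        self._inputs = list(inputs)
        self._targets = list(targets)  # stored, but never handed out

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, i):
        # Indexing returns the input only, never the target.
        return self._inputs[i]

    def __iter__(self):
        return iter(self._inputs)

    @property
    def inputs(self):
        return list(self._inputs)

    @property
    def targets(self):
        raise HiddenTargetsError("Test targets are not accessible.")


toy_test = ToyTestSubset(["CCO", "CCN"], [0.1, 0.2])
try:
    toy_test.targets
except HiddenTargetsError as err:
    print(f"Blocked: {err}")
```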

We also support conversion to other typical formats.

In [None]:
df_train = train.as_dataframe()

### Submit your results

In this example, we will train a simple random forest model on ECFP fingerprints, using [scikit-learn](https://scikit-learn.org/stable/) for the model and [datamol](https://github.com/datamol-io/datamol) for the featurization.

In [None]:
import datamol as dm
from sklearn.ensemble import RandomForestRegressor

# We will recreate the split to pass a featurization function.
train, test = benchmark.get_train_test_split(featurization_fn=dm.to_fp)

# Define a model and train
model = RandomForestRegressor(max_depth=2, random_state=0)
model.fit(train.X, train.y)

In [None]:
predictions = model.predict(test.X)
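
Before evaluating, it can help to sanity-check the predictions locally. The sketch below assumes a single-target regression task with one prediction per test input; `check_predictions` is a hypothetical helper, not part of Polaris, and the benchmark may apply its own validation.

```python
import math


def check_predictions(predictions, expected_length):
    """Return predictions as a list of floats after basic checks.

    Hypothetical helper for illustration only; not part of Polaris.
    Assumes a flat (1-D) sequence of numeric predictions.
    """
    preds = [float(p) for p in predictions]
    if len(preds) != expected_length:
        raise ValueError(f"Expected {expected_length} predictions, got {len(preds)}")
    # NaN or infinite values usually point to a problem in the model or features.
    bad = [i for i, p in enumerate(preds) if not math.isfinite(p)]
    if bad:
        raise ValueError(f"Non-finite predictions at positions {bad}")
    return preds


checked = check_predictions([0.5, 1.25, 2.0], expected_length=3)
```

In the tutorial's setting, this could be called as `check_predictions(predictions, len(test))`.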

As mentioned above, predictions should be evaluated through the benchmark's `evaluate()` method.

In [None]:
results = benchmark.evaluate(predictions)
results

Before uploading the results to the Hub, you can attach additional metadata, which will be displayed on the Polaris Hub.

In [None]:
# For a complete list of metadata, check out the BenchmarkResults object
results.name = "hello-world-result"
results.github_url = "https://github.com/polaris-hub/polaris-hub"
results.paper_url = "https://polarishub.io/"
results.description = "Hello, World!"
results.tags = ["random_forest", "ecfp"]
results.user_attributes = {"Framework": "Scikit-learn"}

Finally, let's upload the results to the Hub!

In [None]:
results.upload_to_hub(owner="my-username", access="public")

That's it! Just like that, you have submitted a result to a Polaris benchmark.

---

The End.
