# The Polaris Library Basics

`Polaris` is a library that standardizes and streamlines dataset construction, ML model training, and the evaluation of new ML techniques in biology, chemistry, and drug discovery. \
It is designed to handle diverse dataset types, including small molecules and cell painting images. \
This notebook demonstrates the basic usage of Polaris for working with datasets and benchmarks on **small molecules**.


**Overview**:
- [How to retrieve a dataset](#dataset)
- [How to retrieve a benchmark](#benchmark)
- [How to submit your evaluation results](#evaluation)


In [24]:
%load_ext autoreload
%autoreload 2
import tempfile
import datamol as dm
import pandas as pd
import polaris as po
from polaris import curation
from polaris.hub.client import PolarisHubClient



### Log in with your Polaris Hub account

In [25]:
client = PolarisHubClient()
client.login()

2023-10-12 16:26:36.988 | INFO     | polaris.hub.client:login:234 - You are already logged in to the Polaris Hub as luzhu (lu@valencediscovery.com). Set `overwrite=True` to force re-authentication.


### Retrieve a dataset from the Polaris Hub
A dataset is identified on the hub by `<owner name>/<dataset name>`, and that identifier is all you need to load it.

In [31]:
dataset = po.load_dataset("polaristest/tutorial_rdkit_solublity")
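
Identifiers follow the `<owner name>/<dataset name>` pattern described above. As a minimal sketch, the helper below splits such an identifier into its two parts and rejects malformed input before any request is sent to the hub. Note that `split_artifact_id` is a hypothetical name introduced here for illustration; it is not part of the Polaris API.

```python
def split_artifact_id(artifact_id: str) -> tuple[str, str]:
    """Split "<owner>/<name>" into (owner, name).

    Hypothetical helper for illustration only; it is not part of Polaris.
    """
    parts = artifact_id.split("/")
    # Require exactly two non-empty parts separated by a single slash.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected '<owner>/<name>', got {artifact_id!r}")
    owner, name = parts
    return owner, name


owner, name = split_artifact_id("polaristest/tutorial_rdkit_solublity")
print(owner)  # polaristest
print(name)   # tutorial_rdkit_solublity
```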

### Retrieve a benchmark from the Polaris Hub

In [38]:
benchmark = po.load_benchmark("polaristest/tutorial_benchmark_rdkit_solublity")

In [39]:
benchmark



| Field | Value |
| --- | --- |
| name | tutorial_benchmark_rdkit_solublity |
| description | Single task regression for solibility |
| tags | |
| user_attributes | |
| owner | organizationId: org_2VuFFDUgVqc7MI80w5LtujZBxSQ; userId: None; slug: polaristest; owner: org_2VuFFDUgVqc7MI80w5LtujZBxSQ |
| target_cols | SOL |
| input_cols | smiles |
| metrics | mean_squared_error, mean_absolute_error |
| main_metric | mean_squared_error |
| md5sum | b84ede1f98143356a564ee03524d02b6 |


#### Get train and test splits

In [40]:
train, test = benchmark.get_train_test_split()

In [44]:
train.targets.shape

(988,)

**Users do not have access to the labels of the test set.**

In [47]:
# test.targets
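
Because the test labels are withheld, any model selection has to rely on the training split alone. One common approach is to hold out part of the training data for validation. The sketch below uses plain Python lists as stand-ins for `train.inputs` and `train.targets`; the `holdout_split` helper and the 20% validation fraction are illustrative assumptions, not something the benchmark prescribes.

```python
import random


def holdout_split(inputs, targets, valid_fraction=0.2, seed=0):
    """Shuffle indices and split paired inputs/targets into train and validation parts."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    indices = list(range(len(inputs)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_valid = max(1, int(len(indices) * valid_fraction))
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]
    train_part = ([inputs[i] for i in train_idx], [targets[i] for i in train_idx])
    valid_part = ([inputs[i] for i in valid_idx], [targets[i] for i in valid_idx])
    return train_part, valid_part


# Toy stand-ins for train.inputs (SMILES) and train.targets (made-up values).
smiles = ["CCO", "CCC", "c1ccccc1", "CC(=O)O", "CCN", "CO", "CCCC", "C1CCCCC1", "CC=O", "CCCl"]
labels = [0.1 * i for i in range(len(smiles))]

(train_x, train_y), (valid_x, valid_y) = holdout_split(smiles, labels)
print(len(train_x), len(valid_x))  # 8 2
```

The validation part can then be used to compare featurizers or hyperparameters before producing predictions for the real test set.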

### Train a predictive model

In [48]:
from sklearn.ensemble import RandomForestRegressor

# Featurize the SMILES strings; any molecular featurizer can be used here
train_fps = [dm.to_fp(smi) for smi in train.inputs]

# Define a model and train
model = RandomForestRegressor(max_depth=2, random_state=0)
model.fit(train_fps, train.targets)
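
The only thing the model needs from a featurizer is a fixed-length numeric vector per molecule. To make that contract concrete, here is a toy stand-in written with the standard library only. It hashes character bigrams of the SMILES string into a bit vector; `toy_featurize` is a hypothetical function for illustration, it is not chemically meaningful, and it is not how `dm.to_fp` works.

```python
import zlib


def toy_featurize(smiles: str, n_bits: int = 64) -> list[int]:
    """Map a SMILES string to a fixed-length 0/1 vector by hashing character bigrams.

    Toy illustration of the featurizer interface only; not a real fingerprint.
    """
    vector = [0] * n_bits
    for i in range(len(smiles) - 1):
        bigram = smiles[i : i + 2]
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(bigram.encode("utf-8")) % n_bits
        vector[index] = 1
    return vector


features = [toy_featurize(smi) for smi in ["CCO", "c1ccccc1"]]
print(len(features), len(features[0]))  # 2 64
```

Any function with this shape (string in, fixed-length vector out) can replace the fingerprint call in the cell above.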

**Make predictions on the test set**

In [49]:
test_fps = [dm.to_fp(smi) for smi in test.inputs]
predictions = model.predict(test_fps)

**Compute metrics provided by the benchmark**

In [50]:
results = benchmark.evaluate(predictions)
results.results

{<Metric.mean_squared_error: MetricInfo(fn=<function mean_squared_error at 0x15a402f20>, is_multitask=False)>: 2.687513982094899,
 <Metric.mean_absolute_error: MetricInfo(fn=<function mean_absolute_error at 0x15a402b60>, is_multitask=False)>: 1.2735690161081497}
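
The two reported metrics are the mean squared error and the mean absolute error. As a sketch of what they measure, assuming the usual definitions, the snippet below computes both by hand on made-up numbers (not the benchmark data).

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 5.0, 2.0]

print(mean_squared_error(y_true, y_pred))   # 2.0
print(mean_absolute_error(y_true, y_pred))  # 1.0
```

Lower is better for both metrics, and the benchmark designates `mean_squared_error` as its main metric.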

**Provide information about your ML method**

In [54]:
# TODO: to be updated to the new Result structure
results.name = "tutorial_result"
results.github_url = "https://github.com/polaris-hub/polaris-hub"
results.paper_url = "https://polaris-hub.vercel.app"
results.description = "This is a test result"
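
The result metadata includes two URL fields, so a quick sanity check before uploading can catch typos early. This is a minimal sketch using only the standard library; `looks_like_http_url` is a hypothetical helper and does not reflect how the hub itself validates these fields.

```python
from urllib.parse import urlparse


def looks_like_http_url(value: str) -> bool:
    """Return True if value has an http(s) scheme and a host part."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# The same URLs that are assigned to the result above.
metadata = {
    "github_url": "https://github.com/polaris-hub/polaris-hub",
    "paper_url": "https://polaris-hub.vercel.app",
}

for field, url in metadata.items():
    print(field, looks_like_http_url(url))
```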

#### Upload results to the hub

In [56]:
response = client.upload_results(results)

2023-10-12 16:34:58.346 | SUCCESS  | polaris.hub.client:upload_results:370 - Your result has been successfully uploaded to the Hub. View it here: https://polarishub.io//results/6DQW1HvMGj2nwGMyuF0aV
