In [1]:
import datamol as dm
import os

In [2]:
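import os
from os import path

# "__file__" is a string literal here (notebooks do not define __file__),
# so abspath resolves it against the current working directory and the two
# dirname calls move up to the parent of that directory.
os.chdir(path.dirname(path.dirname(path.abspath("__file__"))))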

In [3]:
os.getcwd()

'/Users/lu.zhu/Documents/Codebase/ValenceLab/polaris-recipes'

In [34]:
dataset_name = "drewry_ret_wt_v804l_y791f_v1"

In [35]:
dataset_readme = """

- **RET (Rearranged during Transfection)** is a proto-oncogene that codes for a receptor tyrosine kinase. This means it produces a protein that plays a role in signaling pathways within cells, particularly related to cell growth and differentiation. When activated, RET helps regulate cell survival, proliferation, and differentiation. Mutations or alterations in the RET gene can lead to uncontrolled cell growth and potentially the development of cancer.
    - **RET wild type**: In some cases, combination therapies that target both mutant and wild-type RET can be more effective than targeting only one form.
    In certain cancer types, such as some subtypes of non-small cell lung cancer (NSCLC), the RET signaling pathway can interact with other oncogenic pathways, such as the EGFR (epidermal growth factor receptor) pathway. Targeting both pathways simultaneously might offer a synergistic effect and improve treatment outcomes.
    - **RET-V804L:** The V804L mutation causes a structural change in the RET protein, resulting in its continuous activation, even in the absence of ligand binding. The mutation leads to uncontrolled cell growth and division, contributing to oncogenesis. The V804L mutation in RET has been identified in various cancer types, particularly in thyroid cancers, including papillary thyroid carcinoma (PTC) and medullary thyroid carcinoma (MTC). It is often associated with aggressive tumor behavior and resistance to conventional therapies.
    V804L confers a gain of function on the Ret protein, as indicated by increased kinase activity and cell transformation (PMID: 9242375). It is considered a gatekeeper mutation because it does not respond to some inhibitors, including cabozantinib and vandetanib (PMID: 27712045).
    - **RET-Y791F:** The Y791F mutation disrupts a crucial phosphorylation site within the RET protein. Consequently, the tyrosine at position 791 cannot be phosphorylated effectively or at all. This disruption interferes with the normal signaling pathways that rely on this particular phosphorylation event, leading to dysregulated downstream signaling, such as MAPK and PI3K. The Y791F mutation, by abrogating this phosphorylation site, alters these cellular responses and may influence tumor development and progression. The Y791F mutation is found in the RET protein, which is implicated in several types of cancer, particularly medullary thyroid carcinoma (MTC). Y791F confers a gain of function on the Ret protein, resulting in ligand independent kinase activity and activation of Stat3 signaling through Src, Jak1, and Jak2-dependent pathways in cell culture (PMID: 15753368).

**Description of readout:**
- **Readouts**: `RET`, `RET_(V804L_mutant)`, `RET_(Y791F_mutant)`
- **Bioassay readout**: Percentage of inhibition.
- **Optimization objective**: Higher values are better (a higher percentage of inhibition indicates a more potent compound).

**Published RET datasets**:
- **PKIS 1**: A set of 364 unique small-molecule ATP-competitive kinase inhibitors that was screened in activity assays with 224 recombinant kinases and 24 G protein-coupled receptors, and in cellular assays of cancer cell proliferation and angiogenesis ([ref](https://pubmed.ncbi.nlm.nih.gov/26501955/)).


"""

In [36]:
with dm.fs.fsspec.open(
    f"gs://polaris-public/datasets/readmes/{dataset_name}_readme.md", "w"
) as f:
    f.write(dataset_readme)

In [51]:
benchmark_name = "drewry_ret_wildtype_singletask_clf_v1"

**Benchmarking goal:**
The goal of this benchmark is to select the best predictive model for:
- Optimization of the bioactivity % inhibition.
- Discovery of potential hits in new chemical space.

**Molecule data resource**: 
https://www.ncbi.nlm.nih.gov/pubmed/28767711

**Train/test split**
A scaffold-based splitting approach was applied to ensure that the training and test sets contain distinct chemical structures while maintaining scaffold diversity.

**Distribution of the train/test in the chemical space**
![image](https://storage.googleapis.com/polaris-public/datasets/kinases/ret/figures/drewry_ret_wildtype_v1_umap_scaffold_split.png)

**For more details of this benchmark** -> [notebook](https://github.com/polaris-hub/polaris-recipes/blob/mvp/03_Kinases/RET/mutants/02_RET_mutants_benchmark.ipynb)
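
As a rough illustration of what a scaffold-based split does, the sketch below groups molecules by a scaffold identifier and assigns whole groups to either the training or the test set, so no scaffold appears in both. It is not the implementation used for this benchmark (see the linked notebook for that). The function name `scaffold_split`, the `test_fraction` parameter, and the toy scaffold labels are all hypothetical, and the scaffold identifiers are assumed to be precomputed (for example as Murcko scaffold SMILES).

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split molecule indices so that no scaffold is shared between sets.

    `scaffolds` holds one scaffold identifier per molecule. The identifiers
    are assumed to be precomputed; this sketch does not parse molecules.
    """
    # Group molecule indices by their scaffold identifier.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Visit the largest scaffold groups first.
    ordered = sorted(groups.values(), key=len, reverse=True)

    n_test_target = int(round(test_fraction * len(scaffolds)))
    train_cutoff = len(scaffolds) - n_test_target

    train_idx, test_idx = [], []
    for group in ordered:
        # Keep a group in training while it fits; otherwise send the
        # whole group to the test set.
        if len(train_idx) + len(group) <= train_cutoff:
            train_idx.extend(group)
        else:
            test_idx.extend(group)
    return train_idx, test_idx


# Toy example: letters stand in for scaffold identifiers.
scaffolds = ["A", "A", "A", "B", "B", "C", "D", "A", "C", "E"]
train_idx, test_idx = scaffold_split(scaffolds, test_fraction=0.2)
print("train:", train_idx)
print("test:", test_idx)
```

Because whole groups are moved at once, the realized test fraction can differ from `test_fraction` when scaffold groups are large.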

In [52]:
benchmark_readme = """

**Benchmarking goal:**
The goal of this benchmark is to select the best predictive model for:
- Optimization of the bioactivity % inhibition.
- Discovery of potential hits in new chemical space.

**Molecule data resource**: 
https://www.ncbi.nlm.nih.gov/pubmed/28767711

**Train/test split**
A scaffold-based splitting approach was applied to ensure that the training and test sets contain distinct chemical structures while maintaining scaffold diversity.

**Distribution of the train/test in the chemical space**
![image](https://storage.googleapis.com/polaris-public/datasets/kinases/ret/figures/drewry_ret_wildtype_v1_umap_scaffold_split.png)

**For more details of this benchmark** -> [notebook](https://github.com/polaris-hub/polaris-recipes/blob/mvp/03_Kinases/RET/mutants/02_RET_mutants_benchmark.ipynb)
"""

In [53]:
with dm.fs.fsspec.open(f"readme/benchmarks/{benchmark_name}_readme.md", "w") as f:
    f.write(benchmark_readme)
with dm.fs.fsspec.open(
    f"gs://polaris-public/benchmarks/readmes/{benchmark_name}_readme.md", "w"
) as f:
    f.write(benchmark_readme)
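
The local write above relies on the `readme/benchmarks/` directory already existing. Depending on how the local filesystem backend is configured, opening a file for writing in a missing directory may fail. The following is a minimal standard-library sketch of creating the parent directories before writing. The helper name `write_readme` and the temporary directory used as a stand-in for the repository root are assumptions for illustration, not part of the original workflow.

```python
import tempfile
from pathlib import Path


def write_readme(root, benchmark_name, text):
    """Write `text` to <root>/readme/benchmarks/<benchmark_name>_readme.md."""
    target = Path(root) / "readme" / "benchmarks" / f"{benchmark_name}_readme.md"
    # Create the parent directories if they are missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


# A temporary directory stands in for the repository root.
with tempfile.TemporaryDirectory() as tmp:
    written = write_readme(
        tmp, "drewry_ret_wildtype_singletask_clf_v1", "**Benchmarking goal:**\n"
    )
    # Read the file back to confirm the content was written as expected.
    round_trip = written.read_text(encoding="utf-8")
```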

In [None]:
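# `dataset`, `benchmark_clf`, and `benchmark_reg` are not defined in this
# notebook; they are assumed to have been created beforehand.
response = dataset.upload_to_hub(access="public")
response = benchmark_clf.upload_to_hub(access="public")
response = benchmark_reg.upload_to_hub(access="public")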