# Querying the public gReLU model zoo on Weights and Biases (wandb)

This tutorial shows how to programmatically query our public model zoo and download models and datasets. You can also visit the model zoo in your browser at https://wandb.ai/grelu/. 

## Rules

- wandb projects are the main storage units for datasets and the models trained on them. Each project keeps the links between the raw dataset, the preprocessed dataset and the models trained on them, which supports reproducibility and documentation and makes results easier to sanity-check.
  
- The ideal wandb lineage is shown below. This lineage allows us to query project-model-dataset links via the API.

- Each project contains a notebook describing the details of data preprocessing, model training and model testing (e.g. performance metrics on holdout data). For models trained by us, the training logs are also available and can be seen by visiting the model zoo website. 

![image.png](lineage.png)

In [1]:
import os
import anndata
import grelu.resources



## List all available projects in the zoo

The `grelu.resources` module contains functions for interacting with the model zoo. First, we can list all available projects in the zoo:

In [2]:
grelu.resources.projects()

['GM12878_dnase',
 'demo',
 'human-mpra-agrawal-2023',
 'binary_atac_cell_lines',
 'model-zoo-test',
 'alzheimers-variant-tutorial',
 'microglia-scatac-tutorial',
 'human-chromhmm-fullstack',
 'human-atac-catlas',
 'borzoi',
 'corces-microglia-scatac',
 'yeast-gpra',
 'enformer']
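
Since `grelu.resources.projects()` returns a plain Python list of names, ordinary list operations are enough to narrow it down. The sketch below copies the list printed above into a variable, so that it runs without a wandb connection, and uses a hypothetical helper, `find_projects` (not part of gReLU), to select the projects whose names contain a keyword.

```python
# Project names copied from the output above. In practice this list would
# come from grelu.resources.projects().
projects = [
    'GM12878_dnase',
    'demo',
    'human-mpra-agrawal-2023',
    'binary_atac_cell_lines',
    'model-zoo-test',
    'alzheimers-variant-tutorial',
    'microglia-scatac-tutorial',
    'human-chromhmm-fullstack',
    'human-atac-catlas',
    'borzoi',
    'corces-microglia-scatac',
    'yeast-gpra',
    'enformer',
]


def find_projects(names, keyword):
    """Return the project names that contain `keyword` (case-insensitive).

    Hypothetical helper for illustration; it is not a gReLU function.
    """
    keyword = keyword.lower()
    return [name for name in names if keyword in name.lower()]


atac_projects = find_projects(projects, 'atac')
print(atac_projects)
```

Note that a substring match on `'atac'` also picks up the single-cell (`scatac`) projects.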

We choose the 'binary_atac_cell_lines' project to interact with.

## List all datasets and models in a project

In [3]:
project_name = 'binary_atac_cell_lines'

Individual objects such as datasets and models are stored as 'artifacts' under each project. Artifacts can be of different types, but the ones that we are generally interested in are "dataset" (the preprocessed dataset) and "model" (the trained model). We can search for these under the project of interest:

In [5]:
grelu.resources.artifacts(project_name, type_is="model")

['model']

This tells us that there is an artifact called "model" which is of the "model" type.
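
To list both artifact types mentioned above in one pass, the same call can be repeated per type. The following is a minimal sketch, assuming only the calling convention shown above (`artifacts(project, type_is=...)` returning a list of names). The helper `list_by_type` and the offline stand-in `stub_artifacts` are hypothetical, and the stub returns placeholder names rather than real query results.

```python
def list_by_type(project, list_artifacts, types=("dataset", "model")):
    """Map each artifact type to the artifact names found in `project`.

    `list_artifacts` is any callable that can be called like
    grelu.resources.artifacts(project, type_is=...) in the cell above.
    Hypothetical helper for illustration; it is not a gReLU function.
    """
    return {t: list_artifacts(project, type_is=t) for t in types}


def stub_artifacts(project, type_is):
    """Offline stand-in for grelu.resources.artifacts.

    Returns placeholder names only, so the sketch runs without wandb.
    """
    placeholder = {"dataset": ["dataset"], "model": ["model"]}
    return placeholder.get(type_is, [])


catalog = list_by_type("binary_atac_cell_lines", stub_artifacts)
print(catalog)
```

With a wandb connection, passing `grelu.resources.artifacts` in place of `stub_artifacts` should return the real listing for the project.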

## Download a dataset

Let us now select the "dataset" artifact.

## One-step downloading and loading a model

In [6]:
model = grelu.resources.load_model(
    project=project_name,
    model_name='model'
) # that's it!


wandb: Downloading large artifact model:latest, 272.79MB. 1 files...
wandb:   1 of 1 files downloaded.
Done. 0:0:5.1
wandb: Downloading large artifact human_state_dict:latest, 939.29MB. 1 files...
wandb:   1 of 1 files downloaded.
Done. 0:0:0.9


In [10]:
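# Parameters of the training data are stored with the model
model.data_params.keys()

model.data_params['train_seq_len']
# Only the last expression in the cell is displayed: the model's tasks
model.data_params['tasks']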

{'name': ['GM12878', 'HepG2', 'IMR90', 'Jurkat', 'K562', 'SKNSH', 'WTC11']}
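
The task names are often needed as a lookup from name to position, for example to pick out one cell line. The sketch below copies the dictionary printed above so that it runs without downloading the model. It assumes, without confirmation from this tutorial, that the order of the names matches the order of the model's outputs.

```python
# Copied from the output above. In practice:
# tasks = model.data_params['tasks']
tasks = {'name': ['GM12878', 'HepG2', 'IMR90', 'Jurkat', 'K562', 'SKNSH', 'WTC11']}

# Map each task name to its position in the list.
# Assumption: this position corresponds to the model's output index.
task_index = {name: i for i, name in enumerate(tasks['name'])}

print(task_index['K562'])  # 4
```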