# How to use `scivision`

In this notebook, we will:

1. Demonstrate using the scivision [Python API](https://scivision.readthedocs.io/en/latest/api.html) to load several pretrained image classification models
2. Use the scivision catalog to find a matching dataset, which the model can be run on
3. Run the model on the data, performing simple model inference
4. Use the scivision catalog to find another model that can be run on the same dataset

First let's import some things from scivision: `default_catalog` is a scivision **catalog** that will let us discover models and datasets, and `load_pretrained_model` provides a convenient way to load and run a model.

In [None]:
from scivision import default_catalog, load_pretrained_model

## Inspecting our model in the scivision catalog

A scivision catalog is a collection of **models** and **datasources**.

For this example, we want to find datasources compatible with the model catalog entry "image-classifiers". But first, let's use the catalog to retrieve the "image-classifiers" repository URL, then look at the entry in the *default catalog* (the built-in catalog, distributed as part of scivision) to see how it is structured.

In [None]:
# Get the model repo url
models_catalog = default_catalog.models.to_dataframe()
model_repo = models_catalog[models_catalog.name == "image-classifiers"].url.item()
model_repo # Why not paste the repo link into your browser and see how it looks?

In [None]:
# Inspecting model entry and its metadata in the default catalog
models_catalog[models_catalog.name == "image-classifiers"]
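
The lookup above relies on the filter matching exactly one catalog row, because pandas' `.item()` raises a `ValueError` when the selection does not contain exactly one element. The plain-Python sketch below illustrates the same "exactly one match" idea on a toy catalog. The entry names, the URLs and the `url_for` helper are hypothetical and are not part of scivision.

```python
# Hypothetical catalog rows: the names and URLs are placeholders, not real entries.
toy_catalog = [
    {"name": "model-a", "url": "https://example.org/model-a"},
    {"name": "model-b", "url": "https://example.org/model-b"},
]


def url_for(catalog, name):
    """Return the url of the single entry called `name`, or raise if it is not unique."""
    matches = [entry["url"] for entry in catalog if entry["name"] == name]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one entry named {name!r}, found {len(matches)}"
        )
    return matches[0]


print(url_for(toy_catalog, "model-a"))
```

If a name is misspelled or duplicated, failing loudly at the lookup is usually easier to debug than passing an unexpected value on to `load_pretrained_model`.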

## Loading the model

Some model entries in the scivision catalog contain a single loadable model, while others contain several. The "image-classifiers" entry is one of the latter: it contains each of the models in the [image-classifiers](https://pypi.org/project/image-classifiers/) package. We can load models from it with the `load_pretrained_model` function. We'll load two models (`resnet18` and `densenet121`) using the `model_selection` argument. This argument is optional: catalog entries with more than one model have a default value, and for entries with a single model it need not be set.

**Note: if you see an error message in the code cell below, read on!**

In [None]:
# Load the resnet model from the scivision_classifier package:
resnet_model = load_pretrained_model(model_repo, model_selection='resnet18')

OK, that probably didn't work the first time you ran it. Did you get an error message like the one below? You need to install the Python package containing the model before you can use it. Paste the suggested install command into another cell and run it. You can then delete that cell, restart the Jupyter kernel, and load the resnet model again (run all the code cells above).

Error message:
```
Exception: Package does not exist. Try installing it with: 
`!pip install -e git+https://github.com/alan-turing-institute/scivision_classifier@main#egg=scivision_classifier`
```
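
If you would rather check for the package before calling `load_pretrained_model`, the standard library can tell you whether a module is importable. This is a minimal sketch, not part of scivision: the `is_installed` helper is hypothetical, and it assumes the importable module is called `scivision_classifier`, matching the package name in the error message above.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None


# Assumption: the module name matches the package name from the error message.
if is_installed("scivision_classifier"):
    print("scivision_classifier is available.")
else:
    print("scivision_classifier is missing; run the suggested pip command first.")
```

Remember that a package installed while the kernel is running may only become importable after the kernel is restarted.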

In [None]:
# Now load the densenet model from the scivision_classifier package:
densenet_model = load_pretrained_model(model_repo, model_selection='densenet121')

In [None]:
# let's explore one of the model objects
resnet_model

Later, we'll use these models to make predictions on image data found in the scivision catalog.

## Query the default scivision data catalog

Now let's use the `default_catalog` to identify datasources in the catalog that are compatible with our models (based on `tasks`, `format` and `labels_provided`/`labels_required`).

In [None]:
compatible_datasources = default_catalog.compatible_datasources("image-classifiers").to_dataframe()
compatible_datasources
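
To make the idea of compatibility more concrete, here is a rough sketch of how a match on `tasks`, `format` and `labels_provided`/`labels_required` could be decided. It is an illustration only: the `ModelEntry` and `DataEntry` classes, the `is_compatible` rule and the toy entries are assumptions made for this example, and scivision's actual matching logic may differ.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    name: str
    tasks: set
    format: str
    labels_required: bool


@dataclass
class DataEntry:
    name: str
    tasks: set
    format: str
    labels_provided: bool


def is_compatible(model, data):
    """Assumed rule: shared task, same format, and labels present if the model needs them."""
    shares_task = bool(model.tasks & data.tasks)
    same_format = model.format == data.format
    labels_ok = data.labels_provided or not model.labels_required
    return shares_task and same_format and labels_ok


# Toy entries with made-up names and values.
classifier = ModelEntry("toy-classifier", {"classification"}, "image", labels_required=False)
images = DataEntry("toy-images", {"classification", "segmentation"}, "image", labels_provided=False)
table = DataEntry("toy-table", {"classification"}, "tabular", labels_provided=True)

for data in (images, table):
    print(data.name, is_compatible(classifier, data))
```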

Let's use `data-003`, an image dataset containing a single image of a koala.

In [None]:
target_datasource = compatible_datasources.loc[compatible_datasources['name'] == 'data-003']
target_datasource

## Load the dataset

Now let's load the dataset using the scivision Python API, specifically the [load_dataset](https://scivision.readthedocs.io/en/latest/api.html#scivision.io.reader.load_dataset) function. It takes as input the URL of the data repository (structured as per [this template](https://scivision.readthedocs.io/en/latest/data_repository_template.html)), which we can get from the target datasource:

In [None]:
from scivision import load_dataset

In [None]:
data_url = target_datasource['url'].item()

The data config object returned by the `load_dataset` function is an "intake catalog". You can read our [documentation](https://scivision.readthedocs.io/en/latest/data_repository_template.html#data-config-file) to understand this better, but for now, let's inspect this config:

In [None]:
data_config = load_dataset(data_url)
data_config

In [None]:
list(data_config)

Clicking the `path` link, which points to the location of this data config file online (in the dataset repo), reveals that there is one data source called `test_image`, and that the `intake_xarray.image.ImageSource` "intake driver" is being used. We can retrieve the test image data in an image format that the model will accept, like so:

In [None]:
data_config['test_image']()

In [None]:
test_image = data_config.test_image().to_dask() # xarray.DataArray is one of the formats accepted by our models
test_image

Let's take a look at the image with `matplotlib`:

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.imshow(test_image)

## Model predictions

Now let's use the loaded models on the test image data we found via the catalog.

In [None]:
resnet_model.predict(test_image)

In [None]:
densenet_model.predict(test_image)

As you can see, the models have given predictions for the test image, with a confidence score. Check out the code in the [model repo](https://github.com/alan-turing-institute/scivision_classifier) to see how this was determined!
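
As a rough illustration of where a confidence score can come from, the sketch below turns raw class scores into probabilities with a softmax and reports the highest one. This is one common approach and may not match what the model repo does. The labels and scores are made up for the example, and the `softmax` and `top_prediction` helpers are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(scores, labels):
    """Return the label with the highest probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up labels and scores, not real model output.
labels = ["koala", "wombat", "teddy"]
scores = [4.0, 1.0, 0.5]

label, confidence = top_prediction(scores, labels)
print(f"{label}: {confidence:.2%}")
```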

## Query the default scivision model catalog

Using our test koala image dataset, let's search the default scivision catalog for other models that can be used with it:

In [None]:
compatible_models = default_catalog.compatible_models("data-003").to_dataframe()
compatible_models

The `huggingface-classifiers` catalog entry can be used to load some of the most popular image classification models from [Hugging Face](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads). See the list of included models in the [model repo](https://github.com/alan-turing-institute/scivision_huggingface). As before, let's load one of the named models and run it on our test image:

In [None]:
huggingface_repo = models_catalog[models_catalog.name == "huggingface-classifiers"].url.item()
microsoft_model = load_pretrained_model(huggingface_repo, model_selection='microsoft_swin_tiny_patch4_window7_224', allow_install=True)

In [None]:
microsoft_model.predict(test_image)