# Zero-shot classification model

In this notebook we will load and use a zero-shot classification language model. Learn about the zero-shot classification task <a href="https://huggingface.co/tasks/zero-shot-classification" target="_blank" rel="noopener">here</a>.

We will run SQL queries using the <a href="https://jupysql.ploomber.io/en/latest/quick-start.html" target="_blank" rel="noopener">JupySQL</a> SQL magic.

## Prerequisites

Before using this notebook, complete the following steps:
1. [Configure the sandbox](../sendbox_config.ipynb).
2. [Initialize the Transformer Extension](te_init.ipynb).

## Set up

In [23]:
#TODO: start using the secret store.

from collections import UserDict

class Secrets(UserDict):
    """This class mimics the Secret Store we will start using soon."""

    def save(self, key: str, value: str) -> "Secrets":
        self[key] = value
        return self

# For now just hardcode the configuration.
sb_config = Secrets({
    'EXTERNAL_HOST_NAME': '192.168.124.93',
    'HOST_PORT': '8888',
    'USER': 'sys',
    'PASSWORD': 'exasol',
    'BUCKETFS_PORT': '6666',
    'BUCKETFS_USER': 'w',
    'BUCKETFS_PASSWORD': 'write',
    'BUCKETFS_USE_HTTPS': 'False',
    'BUCKETFS_SERVICE': 'bfsdefault',
    'BUCKETFS_BUCKET': 'default',
    'SCRIPT_LANGUAGE_NAME': 'PYTHON3_60',
    'UDF_FLAVOR': 'python3-ds-EXASOL-6.0.0',
    'UDF_RELEASE': '20190116',
    'UDF_CLIENT': 'exaudfclient_py3',
    'SCHEMA': 'IDA',
    'TE_TOKEN': '',
    'TE_TOKEN_CONN': '',
    'TE_BFS_CONN': 'MyBFSConn',
    'TE_BFS_DIR': 'my_storage',
    'TE_MODELS_BFS_DIR': 'models',
    'TE_MODELS_CACHE_DIR': 'models_cache'
})

EXTERNAL_HOST = f"{sb_config.get('EXTERNAL_HOST_NAME')}:{sb_config.get('HOST_PORT')}"
WEBSOCKET_URL = f"exa+websocket://{sb_config.get('USER')}:{sb_config.get('PASSWORD')}" \
    f"@{EXTERNAL_HOST}/{sb_config.get('SCHEMA')}?SSLCertificate=SSL_VERIFY_NONE"
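
The connection URL above interpolates the user name and password verbatim, which works for the simple credentials used here. If a password contains characters such as `@`, `/` or `:`, the URL would be parsed incorrectly. The following is a minimal sketch of one way to guard against that, assuming percent-encoding with the standard library is acceptable to the driver; `build_websocket_url` is a hypothetical helper, not part of the notebook or of sqlalchemy-exasol.

```python
from urllib.parse import quote


def build_websocket_url(user, password, host, port, schema):
    """Build an exa+websocket URL with percent-encoded credentials.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() encode every reserved character, including "/".
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return (
        f"exa+websocket://{safe_user}:{safe_password}"
        f"@{host}:{port}/{schema}?SSLCertificate=SSL_VERIFY_NONE"
    )


# Illustrative values; the password is made up to show the encoding.
example_url = build_websocket_url("sys", "p@ss/word", "192.168.124.93", "8888", "IDA")
print(example_url)
```

Note that `SSL_VERIFY_NONE` disables certificate verification, which is convenient in a sandbox but should be reconsidered for any other environment.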

Let's bring up JupySQL and connect to the database via SQLAlchemy. Please refer to the <a href="https://github.com/exasol/sqlalchemy-exasol" target="_blank" rel="noopener">sqlalchemy-exasol</a> documentation for details on how to connect to the database using the Exasol SQLAlchemy driver.

In [24]:
from sqlalchemy import create_engine

engine = create_engine(WEBSOCKET_URL)

%load_ext sql
%sql engine

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


## Get language model

To demonstrate the zero-shot classification task we will use the [Cross-Encoder for Natural Language Inference](https://huggingface.co/cross-encoder/nli-deberta-base) model.
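
For intuition, here is a rough sketch of how a natural language inference model is commonly used for zero-shot classification: each candidate label is turned into a hypothesis sentence, the model scores how strongly the text entails each hypothesis, and the entailment scores are normalized across labels. The notebook does not describe how the Transformer Extension does this internally, so treat the hypothesis template and the logit values below as illustrative assumptions; no model is called here.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["space & cosmos", "microbiology", "robots"]
hypotheses = build_hypotheses(labels)

# Made-up entailment logits, one per hypothesis. A real NLI model
# would produce these from (text, hypothesis) pairs.
entailment_logits = [3.0, -1.0, 0.5]

scores = softmax(entailment_logits)
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

Because the labels only appear in the hypotheses, the model can rank classes it never saw during training.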

We need to load the model from the Hugging Face Hub into BucketFS. This can take a long time, and unfortunately we cannot tell exactly when it has finished. The notebook's hourglass is not a reliable indicator, because BucketFS may still be doing some work after the call issued by the notebook returns. Please wait a few moments after that before querying the model.
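
Instead of waiting for a fixed amount of time, one option is to poll until the model appears to be usable. The following is a generic sketch of such a polling loop. It assumes you can supply a readiness check of your own; the notebook does not provide one, so `is_ready` is a hypothetical callable (for example, one that runs a small test query and returns `True` when it succeeds).

```python
import time


def wait_until(is_ready, timeout_s=600.0, poll_interval_s=5.0):
    """Call is_ready() repeatedly until it returns True or the timeout expires.

    is_ready is a caller-supplied, hypothetical readiness check.
    Returns True if the check succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A caller would invoke it as `wait_until(my_check)` after the model upload returns and only proceed to the query when the result is `True`.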

In [25]:
# This is the name of the model on the Hugging Face Hub.
MODEL_NAME = 'cross-encoder/nli-deberta-base'

In [None]:
%run ./model_retrieval.ipynb
load_huggingface_model(MODEL_NAME, sb_config)

## Use language model

In [27]:
# Text to be classified.
MY_TEXT = """
A new model offers an explanation for how the Galilean satellites formed around the solar system’s largest world. 
Konstantin Batygin did not set out to solve one of the solar system’s most puzzling mysteries when he went for a
run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, best known for his contributions to the search
for the solar system’s missing “Planet Nine,” spotted a beer bottle. At a steep, 20 degree grade, he wondered why
it wasn’t rolling down the hill. He realized there was a breeze at his back holding the bottle in place. Then he
had a thought that would only pop into the mind of a theoretical astrophysicist: “Oh! This is how Europa formed.”
Europa is one of Jupiter’s four large Galilean moons. And in a paper published Monday in the Astrophysical Journal,
Dr. Batygin and a co-author, Alessandro Morbidelli, a planetary scientist at the Côte d’Azur Observatory in France,
present a theory explaining how some moons form around gas giants like Jupiter and Saturn, suggesting that
millimeter-sized grains of hail produced during the solar system’s formation became trapped around these massive
worlds, taking shape one at a time into the potentially habitable moons we know today.
"""

# Make sure our texts can be used in an SQL statement.
MY_TEXT = MY_TEXT.replace("'", "''")

# Classes not seen during model training.
MY_LABELS = 'space & cosmos, scientific discovery, microbiology, robots, archeology'
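
The text is escaped because it is substituted into a single-quoted SQL string literal in the query below, where an unescaped `'` would end the literal early. The sample text happens to use typographic apostrophes (`’`), which need no escaping, so the effect is easier to see on a small example. The sketch below is illustrative only: `sql_escape` and `split_labels` are hypothetical helpers, and splitting on commas is an assumption based on the shape of `MY_LABELS`, not a documented behavior of the UDF.

```python
def sql_escape(text):
    """Double every single quote so the text is safe inside a '...' SQL literal."""
    return text.replace("'", "''")


def split_labels(labels_csv):
    """Split a comma-separated label string into a list of trimmed labels."""
    return [label.strip() for label in labels_csv.split(",") if label.strip()]


sample = "Jupiter's moons"
escaped = sql_escape(sample)
print(escaped)

labels = split_labels("space & cosmos, scientific discovery, microbiology, robots, archeology")
print(labels)
```

Since a comma is the apparent separator, a label that itself contains a comma would presumably be read as two labels.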

In [None]:
%%sql
WITH MODEL_OUTPUT AS
(
    SELECT TE_ZERO_SHOT_TEXT_CLASSIFICATION_UDF(
        NULL,
        '{{sb_config.get("TE_BFS_CONN")}}',
        '{{sb_config.get("TE_TOKEN_CONN")}}',
        '{{sb_config.get("TE_MODELS_BFS_DIR")}}',
        '{{MODEL_NAME}}',
        '{{MY_TEXT}}',
        '{{MY_LABELS}}'
    )
)
SELECT label, score, error_message FROM MODEL_OUTPUT ORDER BY SCORE DESC