In this notebook we will load and use a zero-shot classification language model. Learn about the zero-shot classification task <a href="https://huggingface.co/tasks/zero-shot-classification" target="_blank" rel="noopener">here</a>.

We will run SQL queries using the <a href="https://jupysql.ploomber.io/en/latest/quick-start.html" target="_blank" rel="noopener">JupySQL</a> SQL magic.

Before using this notebook, complete the following steps:
1. [Create the database schema](../setup_db.ipynb).
2. [Initialize the Transformer Extension](te_init.ipynb).

In [26]:
# TODO: Move this to a separate configuration notebook. Here we just need to load this configuration from a store.
from dataclasses import dataclass

@dataclass
class SandboxConfig:
    EXTERNAL_HOST_NAME = "192.168.124.93"
    HOST_PORT = "8888"

    @property
    def EXTERNAL_HOST(self):
        return f"""{self.EXTERNAL_HOST_NAME}:{self.HOST_PORT}"""

    USER = "sys"
    PASSWORD = "exasol"
    BUCKETFS_PORT = "6666"
    BUCKETFS_USER = "w"
    BUCKETFS_PASSWORD = "write"
    BUCKETFS_USE_HTTPS = False
    BUCKETFS_SERVICE = "bfsdefault"
    BUCKETFS_BUCKET = "default"

    @property
    def EXTERNAL_BUCKETFS_HOST(self):
        return f"""{self.EXTERNAL_HOST_NAME}:{self.BUCKETFS_PORT}"""

    @property
    def BUCKETFS_URL_PREFIX(self):
        return "https://" if self.BUCKETFS_USE_HTTPS else "http://"

    @property
    def BUCKETFS_PATH(self):
        # Filesystem-Path to the read-only mounted BucketFS inside the running UDF Container
        return f"/buckets/{self.BUCKETFS_SERVICE}/{self.BUCKETFS_BUCKET}"

    SCRIPT_LANGUAGE_NAME = "PYTHON3_60"
    UDF_FLAVOR = "python3-ds-EXASOL-6.0.0"
    UDF_RELEASE = "20190116"
    UDF_CLIENT = "exaudfclient"  # newer versions of the flavor use exaudfclient_py3
    SCHEMA = "IDA"

    @property
    def SCRIPT_LANGUAGES(self):
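        # Must be a single line: SCRIPT_LANGUAGES entries are separated by whitespace,
        # so embedded newlines or indentation would split the URL into bogus entries.
        return (f"{self.SCRIPT_LANGUAGE_NAME}=localzmq+protobuf:///{self.BUCKETFS_SERVICE}/"
                f"{self.BUCKETFS_BUCKET}/{self.UDF_FLAVOR}?lang=python#buckets/{self.BUCKETFS_SERVICE}/"
                f"{self.BUCKETFS_BUCKET}/{self.UDF_FLAVOR}/exaudf/{self.UDF_CLIENT}")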

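    @property
    def connection_params(self):
        # Keyword arguments for a pyexasol-style connect() call; the key is "dsn".
        return {"dsn": self.EXTERNAL_HOST, "user": self.USER, "password": self.PASSWORD, "compression": True}

    @property
    def WEBSOCKET_URL(self):
        # SQLAlchemy URL for the sqlalchemy-exasol websocket dialect, used by create_engine() below.
        return f"exa+websocket://{self.USER}:{self.PASSWORD}@{self.EXTERNAL_HOST}/{self.SCHEMA}"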

    @property
    def params(self):
        return {
            "script_languages": self.SCRIPT_LANGUAGES,
            "script_language_name": self.SCRIPT_LANGUAGE_NAME,
            "schema": self.SCHEMA,
            "BUCKETFS_PORT": self.BUCKETFS_PORT,
            "BUCKETFS_USER": self.BUCKETFS_USER,
            "BUCKETFS_PASSWORD": self.BUCKETFS_PASSWORD,
            "BUCKETFS_USE_HTTPS": self.BUCKETFS_USE_HTTPS,
            "BUCKETFS_BUCKET": self.BUCKETFS_BUCKET,
            "BUCKETFS_PATH": self.BUCKETFS_PATH
        }

    # Name of the BucketFS connection
    BFS_CONN = 'MyBFSConn'

    # Name of a sub-directory of the bucket root
    BFS_DIR = 'my_storage'

    # We will store all models in this sub-directory at BucketFS
    TE_MODELS_DIR = 'models'
    
    # Models downloaded to the local machine are cached in this sub-directory of the current directory.
    TE_MODELS_CACHE_DIR = 'models_cache'

conf = SandboxConfig()
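Exasol reads a `SCRIPT_LANGUAGES` value as a whitespace-separated list of `ALIAS=URL` entries, which is why the URL built by the configuration has to stay on one line. The sketch below is a hypothetical helper (`parse_script_languages` is not part of any Exasol library) that splits such a value into its entries and rejects any entry whose alias is not a plain identifier, which is what a URL broken across lines tends to produce. The example value is illustrative only.

```python
import re

# Aliases such as PYTHON3 or PYTHON3_60: letters, digits and underscores.
_ALIAS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_script_languages(value: str) -> dict[str, str]:
    """Split a SCRIPT_LANGUAGES value into an {alias: url} mapping."""
    languages = {}
    for entry in value.split():  # entries are separated by any whitespace
        alias, sep, url = entry.partition("=")  # split at the first '=' only
        if not sep or not url or not _ALIAS.fullmatch(alias):
            raise ValueError(f"malformed script language entry: {entry!r}")
        languages[alias] = url
    return languages


example = (
    "PYTHON3=builtin_python3 "
    "PYTHON3_60=localzmq+protobuf:///bfsdefault/default/python3-ds-EXASOL-6.0.0"
    "?lang=python#buckets/bfsdefault/default/python3-ds-EXASOL-6.0.0/exaudf/exaudfclient"
)
print(parse_script_languages(example)["PYTHON3_60"].endswith("exaudfclient"))  # True
```

Calling it on `conf.SCRIPT_LANGUAGES` is a quick way to check that the property yields one well-formed entry before it is used in an `ALTER SESSION` statement.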

First, let's load JupySQL and connect to the database through SQLAlchemy. Please refer to the <a href="https://github.com/exasol/sqlalchemy-exasol" target="_blank" rel="noopener">sqlalchemy-exasol</a> documentation for details on how to connect to the database using the Exasol SQLAlchemy driver.
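The engine is created from a websocket URL. A minimal sketch of assembling one, assuming the `exa+websocket://user:password@host:port/schema` form described in the sqlalchemy-exasol README (the README also documents optional query parameters, for instance for certificate validation, which are not covered here). The helper name `exasol_websocket_url` is hypothetical; it percent-encodes the credentials so that characters such as `@`, `:` or `/` in a password cannot break the URL.

```python
from urllib.parse import quote


def exasol_websocket_url(host: str, port: str, user: str, password: str, schema: str = "") -> str:
    # Percent-encode user and password; safe="" also encodes '/'.
    url = f"exa+websocket://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    # Append the default schema only when one is given.
    return f"{url}/{schema}" if schema else url


print(exasol_websocket_url("192.168.124.93", "8888", "sys", "exasol", "IDA"))
# exa+websocket://sys:exasol@192.168.124.93:8888/IDA
```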

In [None]:
from sqlalchemy import create_engine

engine = create_engine(conf.WEBSOCKET_URL)

%load_ext sql
%sql engine

Now we will download a model from the Hugging Face Hub and put it into BucketFS.

There are two ways of doing this:
1. Using the `TE_MODEL_DOWNLOADER_UDF` UDF.
2. Downloading the model to a local drive and then uploading it into BucketFS using a CLI.

The first method requires the database machine to have internet access. Here we assume this condition is met; otherwise, please refer to the other notebook, which demonstrates the second method.

To demonstrate the zero-shot classification task we will use the [DistilBERT base model](https://huggingface.co/typeform/distilbert-base-uncased-mnli) fine-tuned on the MNLI natural language inference dataset.

This is a public model, so the last parameter of the UDF, the name of the connection holding the Hugging Face token, can be an empty string.

Please note that loading a model, especially a large one, may take considerable time. At the time of writing there is no way to check when this process has completed, and the notebook's busy indicator is not a reliable signal: BucketFS may still be processing the model after the call issued by the notebook returns. Please wait a few moments after the call returns before querying the model.
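Since there is no completion signal, one option is to poll until a probe succeeds. A minimal sketch of a generic polling loop; `wait_until` is a hypothetical helper, and the probe `is_ready` is an assumption you would supply yourself, for example a function that runs a small classification query and returns `True` once its `error_message` column is empty. Any exception raised by the probe is treated as "not ready yet".

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Call is_ready() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if is_ready():
                return True
        except Exception:
            # Errors (e.g. model files not yet visible in BucketFS) count as "not ready".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The function returns `False` instead of raising on timeout, so the caller decides whether to keep waiting or to inspect the upload.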

In [28]:
# Name of the model on the Hugging Face Hub
MODEL_NAME = 'typeform/distilbert-base-uncased-mnli'

In [None]:
%%sql
SELECT TE_MODEL_DOWNLOADER_UDF(
    '{{MODEL_NAME}}',
    '{{conf.TE_MODELS_DIR}}',
    '{{conf.BFS_CONN}}',
    ''
)

In [30]:
# Text to be classified.
MY_TEXT = """
A new model offers an explanation for how the Galilean satellites formed around the solar system’s largest world. 
Konstantin Batygin did not set out to solve one of the solar system’s most puzzling mysteries when he went for a
run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, best known for his contributions to the search
for the solar system’s missing “Planet Nine,” spotted a beer bottle. At a steep, 20 degree grade, he wondered why
it wasn’t rolling down the hill. He realized there was a breeze at his back holding the bottle in place. Then he
had a thought that would only pop into the mind of a theoretical astrophysicist: “Oh! This is how Europa formed.”
Europa is one of Jupiter’s four large Galilean moons. And in a paper published Monday in the Astrophysical Journal,
Dr. Batygin and a co-author, Alessandro Morbidelli, a planetary scientist at the Côte d’Azur Observatory in France,
present a theory explaining how some moons form around gas giants like Jupiter and Saturn, suggesting that
millimeter-sized grains of hail produced during the solar system’s formation became trapped around these massive
worlds, taking shape one at a time into the potentially habitable moons we know today.
"""

# Escape single quotes by doubling them, so the text can be embedded in an SQL string literal.
MY_TEXT = MY_TEXT.replace("'", "''")

# Candidate labels: classes not seen during model training, separated by commas.
MY_LABELS = 'space & cosmos, scientific discovery, microbiology, robots, archeology'
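Doubling each single quote is the standard SQL rule for putting a `'` inside a string literal. As a sketch, the same rule can be packaged as a small helper that also adds the surrounding quotes; `sql_string_literal` is a hypothetical name, not part of JupySQL or the Transformers Extension. Curly apostrophes (`’`), such as those in `MY_TEXT`, are ordinary characters to SQL and are left untouched. For untrusted input, bound query parameters are generally the safer choice than string escaping.

```python
def sql_string_literal(text: str) -> str:
    # Double every straight single quote, then wrap the result in single quotes,
    # giving a literal that SQL reads back as the original text.
    return "'" + text.replace("'", "''") + "'"


print(sql_string_literal("Côte d'Azur"))  # 'Côte d''Azur'
```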

In [None]:
%%sql
WITH MODEL_OUTPUT AS
(
    SELECT TE_ZERO_SHOT_TEXT_CLASSIFICATION_UDF(
        NULL,
        '{{conf.BFS_CONN}}',
        NULL,
        '{{conf.TE_MODELS_DIR}}',
        '{{MODEL_NAME}}',
        '{{MY_TEXT}}',
        '{{MY_LABELS}}'
    )
)
SELECT label, score, error_message FROM MODEL_OUTPUT ORDER BY score DESC
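How can an MNLI model score labels it never saw? In the Hugging Face `zero-shot-classification` pipeline, each candidate label is turned into an NLI hypothesis (by default `"This example is {}."`), the text is used as the premise, and in single-label mode the entailment logits of all labels are normalised with a softmax. The sketch below reproduces only that final scoring step, assuming the UDF wraps this pipeline in its default mode; the logits are made up for illustration, whereas a real run gets them from the model.

```python
import math


def hypotheses(labels: list[str], template: str = "This example is {}.") -> list[str]:
    # One NLI hypothesis per candidate label; the input text serves as the premise.
    return [template.format(label) for label in labels]


def zero_shot_scores(entailment_logits: list[float], labels: list[str]) -> list[tuple[str, float]]:
    if len(entailment_logits) != len(labels):
        raise ValueError("need exactly one entailment logit per label")
    # Numerically stable softmax across labels: scores sum to 1.
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    # Best label first, like ORDER BY score DESC in the query above.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


labels = [s.strip() for s in "space & cosmos, scientific discovery, microbiology, robots, archeology".split(",")]
print(hypotheses(labels)[0])  # This example is space & cosmos.
for label, score in zero_shot_scores([3.1, 2.4, -1.0, -2.2, -0.5], labels):  # illustrative logits
    print(f"{label}: {score:.3f}")
```

Because of the softmax, the scores compete with each other: adding or removing a candidate label changes the scores of all the others.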