# Generative text model

In this notebook, we will load and use a generative language model that can produce a continuation for a given text. Learn more about the Text Generation task <a href="https://huggingface.co/tasks/text-generation" target="_blank" rel="noopener">here</a>. Please also refer to the Transformer Extension <a href="https://github.com/exasol/transformers-extension/blob/main/doc/user_guide/user_guide.md" target="_blank" rel="noopener">User Guide</a> to find more information about the UDF used in this notebook.

To execute queries and load data from the Exasol database, we will use the <a href="https://github.com/exasol/pyexasol" target="_blank" rel="noopener">`pyexasol`</a> module.

## Prerequisites

Before using this notebook, complete the following steps:
1. [Configure the sandbox](../sandbox_config.ipynb).
2. [Initialize the Transformer Extension](te_init.ipynb).

## Set up

### Access configuration

In [None]:
%run ../utils/access_store_ui.ipynb
display(get_access_store_ui('../'))

In [None]:
EXTERNAL_HOST = f"{sb_config.EXTERNAL_HOST_NAME}:{sb_config.HOST_PORT}"

WEBSOCKET_URL = f"exa+websocket://{sb_config.USER}:{sb_config.PASSWORD}" \
    f"@{EXTERNAL_HOST}/{sb_config.SCHEMA}?SSLCertificate=SSL_VERIFY_NONE"

## Get language model

To demonstrate the text generation task we will use [Open Pretrained Transformers (OPT)](https://huggingface.co/facebook/opt-125m), a decoder-only pre-trained transformer from Facebook.

We need to load the model from the Hugging Face Hub into BucketFS. This can take a long time, and we cannot tell exactly when it has finished: the notebook's hourglass is not a reliable indicator, because BucketFS is still doing some work when the call issued by the notebook returns. Please wait a few moments after the call returns before querying the model.
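
Since there is no reliable completion signal, one option is to retry the first query a few times instead of guessing how long to wait. The helper below is only a sketch under assumptions: `retry_until_ready` and `query_fn` are hypothetical names, not part of the Transformers Extension or `pyexasol`, and `query_fn` is expected to raise an exception whenever the model is not yet usable (for example, after inspecting the `ERROR_MESSAGE` column itself).

```python
import time


def retry_until_ready(query_fn, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call query_fn until it succeeds or the attempts run out.

    query_fn is a hypothetical zero-argument callable that runs the model
    query and raises an exception while the model is not available yet.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return query_fn()
        except Exception as error:  # assumption: any failure means "not ready yet"
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # give BucketFS time to finish its work
    raise RuntimeError(
        f"Model still unavailable after {attempts} attempts"
    ) from last_error
```

The `sleep` parameter exists only so the waiting can be replaced in tests. In the notebook, the default `time.sleep` would be used.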

In [None]:
# This is the name of the model on the Hugging Face Hub
MODEL_NAME = 'facebook/opt-125m'

In [None]:
%run utils/model_retrieval.ipynb
load_huggingface_model(MODEL_NAME, sb_config)

## Use language model

Let's put the start of the text we want the model to continue in a variable.

In [None]:
MY_TEXT = 'The bar-headed goose can fly at much'

# Escape single quotes so the text can be embedded in an SQL string literal.
MY_TEXT = MY_TEXT.replace("'", "''")
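
The escaping step above can be wrapped in a small helper, which makes it harder to forget or to apply twice. This is a minimal sketch. `escape_sql_literal` is a hypothetical name introduced here for illustration, and it covers only single quotes, exactly like the `replace` call in the notebook.

```python
def escape_sql_literal(text: str) -> str:
    """Double every single quote so the text can sit inside '...' in SQL."""
    return text.replace("'", "''")


prompt = "The goose's flight"
# Prints: SELECT 'The goose''s flight'
print(f"SELECT '{escape_sql_literal(prompt)}'")
```

Escaping should be applied exactly once per query. Running it on text that is already escaped doubles the quotes again and changes the prompt.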

In [None]:
# Let's put a limit on the length of text the model can generate in one call.
# The underlying Hugging Face pipeline counts this limit in tokens, not characters.
MAX_LENGTH = 30

In [None]:
import pyexasol

We will update the `MY_TEXT` variable at every call to the model.
Please run the next cell multiple times to see how the text evolves.

In [None]:
sql = f"""
SELECT {sb_config.SCHEMA}.TE_TEXT_GENERATION_UDF(
    NULL,
    '{sb_config.TE_BFS_CONN}',
    '{sb_config.TE_TOKEN_CONN}',
    '{sb_config.TE_MODELS_BFS_DIR}',
    '{MODEL_NAME}',
    '{MY_TEXT}',
    {MAX_LENGTH},
    True
)
"""

with pyexasol.connect(dsn=EXTERNAL_HOST, user=sb_config.USER, password=sb_config.PASSWORD, compression=True) as conn:
    result = conn.export_to_pandas(query_or_table=sql).squeeze()
    MY_TEXT = result['GENERATED_TEXT']
    # If the generation fails, the reason is reported in result['ERROR_MESSAGE']

print(MY_TEXT)
MY_TEXT = MY_TEXT.replace("'", "''")
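
Instead of re-running the cell by hand, the same feed-back step can be driven from a loop. The sketch below assumes a hypothetical callable `generate_fn` that takes the current text, runs the query shown above (including the quote escaping), and returns one result row with the `GENERATED_TEXT` and `ERROR_MESSAGE` columns. Neither `continue_text` nor `generate_fn` exists in the extension. A stub stands in for the database call so the example runs on its own.

```python
def continue_text(text, generate_fn, steps=3):
    """Feed the text back into generate_fn `steps` times and return the result."""
    for _ in range(steps):
        row = generate_fn(text)
        error = row.get("ERROR_MESSAGE")
        # Treat only a non-empty string as an error; NULL values are ignored.
        if isinstance(error, str) and error.strip():
            raise RuntimeError(f"Text generation failed: {error}")
        text = row["GENERATED_TEXT"]
    return text


def fake_generate(text):
    """Stub standing in for the UDF call; it appends a fixed word."""
    return {"GENERATED_TEXT": text + " higher", "ERROR_MESSAGE": None}


print(continue_text("The bar-headed goose can fly at much", fake_generate, steps=2))
```

Raising on a populated `ERROR_MESSAGE` stops the loop early, so a failed call does not overwrite the text accumulated so far with an empty value.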