# Text AI Preprocessing


Here we will demonstrate how Text AI can be used to build a data-preprocessing workflow. We will use a dataset of customer support tickets, which contains unstructured data in the form of ticket descriptions. We will sort these tickets into "urgent" and "not urgent" cases, and find important named entities and keywords within the text. This will be achieved using a Text AI Workflow.

We will also demonstrate Text AI's ability to determine whether data was already processed, and to skip it if applicable.

## Prerequisites

Prior to using this notebook one needs to complete the following steps:

1. [Load the Customer Support Ticket Dataset](../data/data_customer_support.ipynb)
2. [Initialize the Text AI Extension](txaie_init.ipynb)


## Natural Language Processing Introduction

This section contains a short introduction to Natural Language Processing, and the processes we will use in this notebook.

### NLP

Natural Language Processing is the processing (e.g. classification, information retrieval) of unannotated language, also called "unstructured data" or "free text".

There are tasks in Natural Language Processing (NLP) which seem easy to us humans, but are very hard for a machine to do. For example, inferring the opinion the speaker has about a topic (Opinion Extraction/Mining). Doing these tasks on un-annotated text is even harder. Therefore, multiple ways to annotate a natural language text with additional information were developed. These annotated texts are then better suited for higher-level NLP tasks.
                                                                                                                                                   
Depending on the amount of data/text which should be processed, annotating by hand is mostly not an option these days, since with increasing dataset sizes the resources needed quickly become unrealistic. Therefore, Exasol Text AI provides you with tools you can use for annotating your data in various ways.
                                                                                                                     
In this Notebook, we will show you our three default preprocessing workflow steps. Of course, it is possible for you to define your own workflow later on.
Let's explain these three steps before we dive into how to run the preprocessing.
                                                                                                                     
### Topic Classification
                                                                                                                     
Topic Classification is the task of assigning topics to text/documents/datapoints. In Topic Classification, a given set of topics is used, and each data point is assigned a relevance score and rank for each of the topics. These relevance scores and ranks can then be used to select the best matching topic for each data point.
Given that a document is about a particular topic, particular words are expected to appear in the document more or less frequently. However, it is not required for the exact words to describe the topic to be found in a text. This means that topics can be inferred, even if their name/description/topic synonyms are not found in the data.

![Diagram: a document text with added topics](images/topics.drawio.png)

Topic Classification takes a given set of topics as input and assigns each topic a relevance score indicating how strongly the text is about that topic. It is usually trained using supervised learning. It can also be used with Zero-Shot Classification models, which can assign classes/topics that were not seen during training. This is in contrast to other approaches like topic extraction, which is often unsupervised and does not need a list of topics as input, instead extracting them from the data itself.
                                                                                                                     
### Keyword Search
                                                                               
Keyword Search is about identifying the most relevant words or phrases (keywords/keyphrases) in a given text.
These can then help in further steps, e.g. summarizing the content of texts and recognizing the main topics discussed.
Unlike topics, keywords or phrases need to be present in the text.
For example:

![Diagram: a document text with highlighted keywords](images/keywords.drawio.png)


### Named Entity Recognition

Named entity recognition (NER) is about locating and classifying so-called "named entities" mentioned in a text document. Depending on the model, entities are e.g. person names, organizations, locations, or vehicles, so "things that have names". The model seeks out those entities, returning their positions in the document, as well as their class.

### Example Result of 3 Steps

Let's look at an example of what the combined output of these three steps might look like. Consider a document with the content "I'm having an issue with the GoPro Hero. It's affecting my productivity.". We may use a topic classifier with the input topic set "urgent, not urgent" to infer urgency from the ticket content. The NER and Keyword Search steps do not need additional input; they just work with the document itself. The output of a preprocessing pipeline containing all three steps could then look something like this:

![Diagram showing a document text with the found entity, keyword and topic](images/document_annotated.drawio.png)




## General Setup

As a first step, we need to get access to the AI-Lab secret store:

In [None]:
%run ../utils/access_store_ui.ipynb
display(get_access_store_ui('../'))

Then we can get the activation SQL for our previously installed Script Language Containers. This will be used to activate those SLCs in order to use their UDFs.

We also want to import some of the Python functions of the Text AI and Notebook Connector modules.

In [15]:
from exasol.nb_connector.connections import open_pyexasol_connection
from exasol.nb_connector.text_ai_extension_wrapper import Extraction
from exasol.ai.text.extraction.abstract_extraction import Defaults, Output
from exasol.ai.text.extractors.standard_extractor import StandardExtractor
from exasol.ai.text.extractors.extractor import PipelineExtractor
from exasol.ai.text.extractors.source_table_extractor import SourceTableExtractor, SchemaSource, TableSource, NameSelector

The next call will make it possible to run SQL directly in this notebook, which makes it easier to display the results of our preprocessing. The one below it sets the maximum number of rows our SQL statements display in the notebook.

In [None]:
%run ../utils/jupysql_init.ipynb

In [6]:
%config SqlMagic.displaylimit = 20

## Create an Example Data Source

We will be using a dataset which holds information on customer support tickets. For loading the dataset, please refer to the [prerequisites](#prerequisites). We will split this data into 2 sets, in order to demonstrate how the preprocessing tasks handle new data being added to a data source.

You can verify if the dataset is already available with the query below. It should return `3606`.

In [None]:
text_column="TICKET_DESCRIPTION"
key_column="TICKET_ID"
table="CUSTOMER_SUPPORT_TICKETS"
schema=ai_lab_config.db_schema

In [None]:
%%sql
SELECT COUNT(*) FROM "{{schema}}"."{{table}}";

### Create a View on the Data

This dataset has ~3600 entries. You could run the preprocessing for the whole dataset, but it would take quite some time. Instead, we will create a view containing only a part of the dataset, and use this view as the base for our preprocessing.
We set the size of this view here. If you want to see how the AI-Lab handles bigger datasets on your Exasol instance, you can set the `view_size` higher.

In [None]:
view="TICKETS_SAMPLE"
view_size = 10

In [48]:
%%sql
CREATE OR REPLACE VIEW "{{schema}}"."{{view}}" AS 
SELECT * FROM "{{schema}}"."{{table}}" 
ORDER BY "TICKET_ID" 
LIMIT {{view_size}};

Let's check the size of our created view:

In [None]:
%%sql
SELECT COUNT(*) FROM "{{schema}}"."{{view}}";

As you can see, we now have only our defined 10 data points in the view.

Let's now see what our data contains:

In [None]:
%%sql
DESC "{{schema}}"."{{view}}"

We can see a ticket ID column, as well as some columns containing information about the customer, like name. There are columns containing some metadata for the ticket itself, such as the purchase date, ticket status and ticket channel. The ticket description contains the actual text of the ticket.

In [None]:
%%sql
SELECT 
    TICKET_ID,
    CUSTOMER_NAME,
    DATE_OF_PURCHASE,
    TICKET_SUBJECT, 
    TICKET_DESCRIPTION,
    TICKET_STATUS,
    TICKET_CHANNEL
FROM "{{schema}}"."{{view}}"
LIMIT 5

### Cleaning up Results of previous Runs of the Notebook

To make sure the tables the preprocessing will use don't already exist, for example from a previous run of this notebook, we are going to drop them.
First, we define a list of tables to drop:

In [None]:
# A list of tables which the steps below create automatically. If you run the notebook multiple times they need to be dropped in between.
table_list = [
    "TXAIE_AUDIT_LOG",
    "DOCUMENTS",
    f"DOCUMENTS_{schema}_TICKETS_SAMPLE",
    "NAMED_ENTITY",
    "NAMED_ENTITY_LOOKUP_ENTITY_TYPE",
    "NAMED_ENTITY_LOOKUP_SETUP",
    "KEYWORD_SEARCH",
    "KEYWORD_SEARCH_LOOKUP_KEYWORD",
    "KEYWORD_SEARCH_LOOKUP_SETUP",
    "TOPIC_CLASSIFIER",
    "TOPIC_CLASSIFIER_LOOKUP_TOPIC",
    "TOPIC_CLASSIFIER_LOOKUP_SETUP"
]


If you are curious about which tables are generated and what they look like, you can find that information in the [Results](#results) section below.
Next, we define a function which drops these tables, and then call it.

**Note:** If you run into technical issues during the running of this notebook, you might want to run the `delete_text_ai_preprocessing_tables` function again, in order to re-run the workflow from scratch. This will ensure all data gets processed again.

In [None]:
def delete_text_ai_preprocessing_tables():
    with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
        for drop_table in table_list:
            conn.execute(f"""DROP TABLE IF EXISTS "{schema}"."{drop_table}" """)

In [None]:
delete_text_ai_preprocessing_tables()

## Download NLP Models

We will use multiple different open source Hugging Face Transformers models to run our preprocessing with. 
* For Named Entity Extraction: [guishe/nuner-v2_fewnerd_fine_super](https://huggingface.co/guishe/nuner-v2_fewnerd_fine_super)
* For Topic Classification: [tasksource/ModernBERT-base-nli](https://huggingface.co/tasksource/ModernBERT-base-nli)
* For Keyword Extraction: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

These models were already installed during the [initialization](txaie_init.ipynb) of Text AI.

You can download your own models as follows:

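```python
from transformers import (
    AutoModel,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
)
from exasol.nb_connector.text_ai_extension_wrapper import install_model, TransformerModel

# Each call installs one Hugging Face model for the given task.
# Replace the placeholder strings with the names of the models you want to use.
# `ai_lab_config` is the configuration object used throughout this notebook.
install_model(ai_lab_config, TransformerModel(
    "your keyword search model",
    "feature-extraction", AutoModel))
install_model(ai_lab_config, TransformerModel(
    "your named entity model",
    "token-classification", AutoModelForTokenClassification))
install_model(ai_lab_config, TransformerModel(
    "your zero-shot classification model",
    "zero-shot-classification", AutoModelForSequenceClassification))
```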

## Configure the Text AI Workflow

The Text-AI-Extension allows you to configure a data processing workflow. In this Notebook we will be using a basic example where the workflow is defined by a so-called `StandardExtractor`.

### Configure Default Values

Here, we will configure how our workflow should run. In general, each NLP extractor has its own configuration parameters. The `Defaults` object is a helper object allowing us to set these parameters once and apply these settings to all extractors.

How you need to set these defaults will depend on your Database. This demonstration should work on a rather small Docker-DB. Therefore, we set the `batch_size` to only 10, so only 10 rows will be processed at once in each UDF instance, and also our `parallelism_per_node` is set to the low value of 1. `parallelism_per_node` determines how many parallel UDF instances are run on each node of your database. If you have a bigger Database to run this Notebook on, you can try to increase both values.
The model repository is a data object pointing to the location of the model files we downloaded during the extension initialization or in the previous step.


In [11]:
defaults = Defaults(
    parallelism_per_node=1,
    batch_size=10
)
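
To make the effect of `batch_size` more concrete, here is a minimal sketch of how a list of rows can be split into batches in plain Python. It only illustrates the idea. The helper `split_into_batches` and the sample rows are hypothetical and not part of Text AI, which handles batching inside its UDFs.

```python
from typing import Iterator, List, Sequence


def split_into_batches(rows: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` rows (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# 25 made-up rows with batch_size=10 end up in three batches.
rows = [f"ticket text {i}" for i in range(25)]
batch_sizes = [len(batch) for batch in split_into_batches(rows, batch_size=10)]
print(batch_sizes)
```

A larger `batch_size` means fewer, bigger batches, which usually needs more memory per UDF instance.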

### Define the Extractor

Now we need to define an extractor to run our extraction/preprocessing. We will use a `StandardExtractor` which has 3 standard preprocessing steps built in, namely topic classification, keyword search and named entity recognition. It is possible to disable each of these steps in the `StandardExtractor` by setting its model to `None`. You can also use a different model instead of the built-in one, by setting its model to a specific HuggingFace model. But here we will use the `StandardExtractor` as is.

For the topic classification model we will use the topics "urgent", and "not urgent".

In [56]:
topics={"urgent", "not urgent"}

std_extractor =  StandardExtractor(
                        # If you want to disable a step, set it to None:
                        # named_entity_recognition_model = None,
                        # topic_classification_model = None,
                        
                        # If you want to use a different(not default) model, set its name:
                        # keyword_search_model = HuggingFaceModel(name="MY_KEYWORD_SEARCH_MODEL"),
                        topics=topics
                    )

We will also need a `SourceTableExtractor`, which holds information on which data we want to use as a source for our preprocessing, and feed it to the `StandardExtractor`.
We give it our schema and view as a data source, and tell it to run the preprocessing on the column `TICKET_DESCRIPTION`, since that is where the natural-language text of our data is. We also tell it to use the `TICKET_ID` column as an id/key.

In [57]:
text_column="TICKET_DESCRIPTION"
key_column="TICKET_ID"

src_extractor = SourceTableExtractor(
                        name='DOCUMENTS',
                        sources=[
                            SchemaSource(
                                db_schema=NameSelector(pattern=schema),
                                tables=[
                                    TableSource(
                                        table=NameSelector(pattern=view),
                                        columns=[NameSelector(pattern=text_column)],
                                        keys=[NameSelector(pattern=key_column)]
                                    )
                                ]
                            )
                        ]
                    )

Now, we can give these two extractors as steps to a `PipelineExtractor`. It builds a pipeline that executes the steps one after the other and feeds the output of the first step into the second step:

In [58]:
p_extractor = PipelineExtractor(
                steps=[
                    src_extractor,
                    std_extractor
                ]
            )
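
The idea of chaining steps can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how `PipelineExtractor` is implemented. The names `run_pipeline`, `load_documents` and `annotate_length`, as well as the sample texts, are hypothetical.

```python
from functools import reduce
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def run_pipeline(steps: List[Step], initial: Any = None) -> Any:
    """Run the steps in order, passing each result on to the next step."""
    return reduce(lambda data, step: step(data), steps, initial)


def load_documents(_: Any) -> List[str]:
    # Stand-in for a source step: ignores its input and produces documents.
    return ["My printer is broken.", "Thanks for the quick help!"]


def annotate_length(documents: List[str]) -> List[dict]:
    # Stand-in for a processing step: consumes the output of the source step.
    return [{"text": doc, "length": len(doc)} for doc in documents]


result = run_pipeline([load_documents, annotate_length])
print(result)
```

The order of the steps matters: the second step can only work on what the first step produced.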

Next, we will wrap our `PipelineExtractor` in an `Extraction`. This allows us to configure where the output should be stored and which defaults to use, and to run the extractor.

We feed it our `PipelineExtractor` as the extractor, tell it to put the `Output` into our schema, and also give it our `Defaults`.

In [61]:
extraction = Extraction(extractor=p_extractor,
                        output=Output(db_schema=schema),
                        defaults=defaults)

Then the only step left is to define a convenience function which calls our preprocessing, and then run it in the next section.

In [62]:
def run_text_ai_preprocessing():
    extraction.run(ai_lab_config)

## Run the Preprocessing

Time to run our preprocessing. First, let's verify how many entries our view has:

In [None]:
%%sql
SELECT COUNT(ALL TICKET_ID) FROM "{{schema}}"."{{view}}";

Then we call our preprocessing function. This will use our view as input, and produce new tables and views using the models we installed into the Exasol Database.

Also, take note of the time this operation takes on your setup.

In [None]:
%%time
run_text_ai_preprocessing()

**Note**: If the previous operation fails with an error indicating a lost connection, please increase the size of your database and try again. The models are each around 1-2 GB in size and also need that much main memory on each node of your Database.

## Results

Now, we will take a look at some of the tables and views our preprocessing has created for us. 
First, let's look at the tables created by our preprocessing:


In [None]:
%%sql
SELECT TABLE_SCHEMA, TABLE_NAME FROM EXA_ALL_TABLES WHERE TABLE_SCHEMA='{{schema}}'

As you can see, there are a number of new tables related to our preprocessing. There is our original data table `CUSTOMER_SUPPORT_TICKETS`, and a new log table `TXAIE_AUDIT_LOG` which we will take a closer look at below. The `DOCUMENTS` table contains our input texts together with an identifying Span. We will take a look at that as well.
There is also a `DOCUMENTS_AI_LAB_TICKETS_SAMPLE` table, which contains IDs of the input text and documents, as well as the name of the column the input text originated from.
This enables you to trace back documents (and their associated results) to the original input data point.

And then there are 3 tables per step of our preprocessing: a main output table named after the preprocessing step, and some support tables. Multiple tables are required because the output content is usually normalized.
The support tables are lookup tables and have names formatted like `<main_table_name>_LOOKUP_<normalized_column_name>`. We won't look at them in detail, but if you are curious, feel free to look at the contents of these tables on your own.
Later we will discuss the contents of views built on top of these tables, which present them in a denormalized form.

If we want to find out how these new tables are structured, we can get a description from the Exasol Database. For example, let's see what the resulting `DOCUMENTS` table looks like.

### DOCUMENTS Table


In [None]:
%%sql
DESC "{{schema}}".DOCUMENTS

It looks like this table contains a `TEXT_DOC_ID`, `TEXT_CHAR_BEGIN`, `TEXT_CHAR_END` and a `TEXT` column.
The `TEXT` column includes the text of the document.
`TEXT_DOC_ID` is an ID assigned to each document.
`TEXT_CHAR_BEGIN` and `TEXT_CHAR_END` indicate which part of the original document each specific row contains. This triplet of `TEXT_DOC_ID`, `TEXT_CHAR_BEGIN` and `TEXT_CHAR_END` is called a "Span". Together, the three values form an identifier for a section of text. You will encounter Spans for many kinds of text subsections. For example, keywords found in a text are also identified by a Span in our result tables (see below).
                                                                                                                                                        
The usage of these Spans allows you to do various operations on top of these results, such as joining results on the document-id, or checking the order in which keywords appear in a document.
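
As a small illustration of working with Spans, the following sketch models a Span as a Python tuple and sorts a few keyword Spans by their order of appearance. The class `Span` and the helper `span_of` are hypothetical names, the keyword list is made up, and the sketch assumes that the end position is exclusive, which this notebook does not specify.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Identifier for a section of text (field names follow the column suffixes)."""
    doc_id: int
    char_begin: int
    char_end: int


text = "I'm having an issue with the GoPro Hero. It's affecting my productivity."
document = Span(doc_id=1, char_begin=0, char_end=len(text))


def span_of(phrase: str) -> Span:
    # Locate the phrase in the example text; char_end is treated as exclusive here.
    begin = text.index(phrase)
    return Span(document.doc_id, begin, begin + len(phrase))


# Listed out of order on purpose.
keywords = [span_of("productivity"), span_of("issue"), span_of("GoPro Hero")]

# Tuples sort by doc_id first, then char_begin, which gives the order of appearance.
in_order = sorted(keywords)
print([text[s.char_begin:s.char_end] for s in in_order])
```

Because the document ID is part of every Span, the same sort also groups results by document when several documents are mixed.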

We can also check the number of unique TEXT_DOC_IDs in our table:



In [None]:
%%sql
SELECT COUNT(DISTINCT TEXT_DOC_ID) FROM "{{schema}}".DOCUMENTS;

It's identical to the number of rows in our input view. So all the data was converted successfully.

Now, let's look at what the content of our table looks like:

In [None]:
%%sql
SELECT * FROM "{{schema}}".DOCUMENTS WHERE TEXT_DOC_ID < 5

## Resulting Views

There are also some new views:

In [None]:
%%sql
SELECT VIEW_SCHEMA, VIEW_NAME FROM EXA_ALL_VIEWS WHERE VIEW_SCHEMA='{{schema}}'

Text AI stores the results of the three preprocessing steps in tables, which contain the data in normalized form. So, for instance, instead of the topic name you will see a number in the table. The names are collected in a supporting lookup table, named like `<main_table_name>_LOOKUP_<normalized_column_name>`.
For easier usage, Text AI creates views which join the respective tables together to provide human-readable information. Such a view is created for each of the preprocessing steps.
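
The relation between a normalized table, its lookup table and a readable view can be sketched in plain Python. This is an illustration under assumptions: the rows, the key names (`doc_id`, `topic_id`, `score`) and the helper `denormalize` are made up and do not reflect the actual table layout.

```python
# Made-up rows of a normalized main table: topics are stored as numeric IDs.
main_rows = [
    {"doc_id": 1, "topic_id": 1, "score": 0.9},
    {"doc_id": 1, "topic_id": 2, "score": 0.1},
]

# Made-up lookup table mapping each ID to its readable name.
topic_lookup = {1: "urgent", 2: "not urgent"}


def denormalize(rows, lookup):
    """Replace each topic_id with its readable name, like a join on the lookup table."""
    return [
        {"doc_id": row["doc_id"], "topic": lookup[row["topic_id"]], "score": row["score"]}
        for row in rows
    ]


readable_rows = denormalize(main_rows, topic_lookup)
for row in readable_rows:
    print(row)
```

Storing the name only once in the lookup table keeps the main table compact, while the view restores the readable form on demand.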

The `DOCUMENTS_AI_LAB_TICKETS_SAMPLE_VIEW` is a view on top of our input data, with the addition of the span identifier (`TEXT_DOC_ID`, `TEXT_CHAR_BEGIN`, `TEXT_CHAR_END`) for the text column of each row. This can be used to join the original data with the preprocessing results.

![A diagram showing multiple table names with their respective columns. Starting at "CUSTOMER_SUPPORT_TICKETS" followed by "TICKETS_SAMPLE", then flowing to "DOCUMENTS" and "DOCUMENTS_AI_LAB_TICKETS_SAMPLE_VIEW". The columns containing the text document span are highlighted.](images/data_model_1.drawio.png)

Let's take a closer look at the results of the topic classification step in our preprocessing now. These can be found in the view `TOPIC_CLASSIFIER_VIEW`.

### Topic Classifier View


In [None]:
%%sql
DESC "{{schema}}".TOPIC_CLASSIFIER_VIEW

This view contains the cross product of the input texts and the topics. For each text-topic pair, it provides the computed `TOPIC_SCORE`. The `TOPIC_SCORE` approximates a normalized relevance of the topic to the text.

The value in the `TOPIC_RANK` column ranks the topics for each source document by their `TOPIC_SCORE` value. For our example, we had only two topics, so each document was assigned each of the topics, with different scores. The one with the higher score for a given document will have rank 1, the one with the lower score will have rank 2.
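
The ranking described above can be reproduced in a few lines of Python. The following sketch uses made-up scores, not real model output, and the helper `rank_topics` is hypothetical. It assigns rank 1 to the topic with the highest score per document and then picks the best topic for each document.

```python
from collections import defaultdict

# Made-up (doc_id, topic, score) rows, not real model output.
scores = [
    (1, "urgent", 0.8),
    (1, "not urgent", 0.2),
    (2, "urgent", 0.3),
    (2, "not urgent", 0.7),
]


def rank_topics(rows):
    """Return (doc_id, topic, score, rank) with rank 1 for the highest score per document."""
    by_doc = defaultdict(list)
    for doc_id, topic, score in rows:
        by_doc[doc_id].append((topic, score))
    ranked = []
    for doc_id, topic_scores in by_doc.items():
        ordered = sorted(topic_scores, key=lambda pair: pair[1], reverse=True)
        for rank, (topic, score) in enumerate(ordered, start=1):
            ranked.append((doc_id, topic, score, rank))
    return ranked


ranked = rank_topics(scores)
best_topic = {doc_id: topic for doc_id, topic, _, rank in ranked if rank == 1}
print(best_topic)
```

Filtering on rank 1 is the usual way to reduce the cross product of texts and topics to one label per document.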

There is also a column for error messages encountered during classification, as well as a `SETUP` column documenting which setup (i.e. model and model settings) was used to obtain this result.

As you remember, we wanted to use the classifier to sort our user tickets into "urgent" and "not urgent" issues. So those are the topics we expect to see in the results. Let's check how these results look:

In [None]:
%%sql
SELECT * FROM "{{schema}}".TOPIC_CLASSIFIER_VIEW LIMIT 5

Next, we look at the identified named entities for our input documents. These can be found in the `NAMED_ENTITY_VIEW`.

### Named Entity View


In [None]:
%%sql
DESC "{{schema}}".NAMED_ENTITY_VIEW

Similar to the `TOPIC_CLASSIFIER_VIEW`, the `NAMED_ENTITY_VIEW` also has the Span (`TEXT_DOC_ID`, `TEXT_CHAR_BEGIN`, `TEXT_CHAR_END`) identifying the input document the entity was found in. Then there are the found named entities in the `ENTITY` column, as well as an `ENTITY_TYPE` and an `ENTITY_SCORE`, both of which are assigned to the entity by the model. Additionally, we have an identifying span for the entity itself: `ENTITY_DOC_ID`, `ENTITY_CHAR_BEGIN`, `ENTITY_CHAR_END`. This span represents exactly where in our input data the entity was found.

![A text with an ID number. The text contains the named entity subtext "GoPro Hero". From the ID, subtext begin and subtext end, arrows point to the ID, begin and end of the entity span.](images/entity_span.drawio.png)

Since the named entity was found in the text identified by `TEXT_DOC_ID, TEXT_CHAR_BEGIN, TEXT_CHAR_END`, it follows that `TEXT_DOC_ID`=`ENTITY_DOC_ID` for a given row. Similarly, both `ENTITY_CHAR_BEGIN` and `ENTITY_CHAR_END` are between `TEXT_CHAR_BEGIN` and `TEXT_CHAR_END`. You can use these spans for further processing down the line. For example, you could check how close together named entities of the same document were found, and then check if certain named entity clusters are indicative of different topics. However, this post-processing is not part of this tutorial.
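
A minimal sketch of such span-based post-processing is shown below. It checks that entity spans lie inside their document span and measures the character distance between consecutive entities. The positions are made up, and the names `Span`, `contains` and `gaps_between` are hypothetical helpers rather than part of Text AI.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    doc_id: int
    char_begin: int
    char_end: int


def contains(outer: Span, inner: Span) -> bool:
    """True if both spans belong to the same document and inner lies within outer."""
    return (
        outer.doc_id == inner.doc_id
        and outer.char_begin <= inner.char_begin
        and inner.char_end <= outer.char_end
    )


def gaps_between(entities: List[Span]) -> List[int]:
    """Character distance between consecutive entities of one document."""
    ordered = sorted(entities)
    return [max(0, nxt.char_begin - cur.char_end) for cur, nxt in zip(ordered, ordered[1:])]


# Made-up positions for one document and three entities found in it.
text_span = Span(doc_id=7, char_begin=0, char_end=100)
entities = [Span(7, 60, 70), Span(7, 10, 20), Span(7, 25, 40)]

all_inside = all(contains(text_span, entity) for entity in entities)
gaps = gaps_between(entities)
print(all_inside, gaps)
```

Small gaps indicate entities that are mentioned close together, which could serve as a starting point for the clustering idea mentioned above.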

The `NAMED_ENTITY_VIEW` also includes an error message column and a setup column like the `TOPIC_CLASSIFIER_VIEW` above.


In [None]:
# We set this lower to show only a preview of the views.
%config SqlMagic.displaylimit = 10

In [None]:
%%sql
SELECT TEXT_DOC_ID, 
    TEXT_CHAR_BEGIN, 
    TEXT_CHAR_END,
    ENTITY, 
    ENTITY_TYPE, 
    ENTITY_SCORE, 
    ENTITY_DOC_ID, 
    ENTITY_CHAR_BEGIN, 
    ENTITY_CHAR_END 
FROM "{{schema}}".NAMED_ENTITY_VIEW

### Keyword-Search View

Lastly, our preprocessing created a view containing the results of the keyword search step, the `KEYWORD_SEARCH_VIEW`. This one is structured similarly to the `NAMED_ENTITY_VIEW`:

In [None]:
%%sql
DESC "{{schema}}".KEYWORD_SEARCH_VIEW

The `TEXT_DOC_ID`, `TEXT_CHAR_BEGIN` and `TEXT_CHAR_END` are again the input document span. But instead of an entity with an entity score and an entity span, we now have a keyword column, a keyword score and a span (`KEYWORD_DOC_ID`, `KEYWORD_CHAR_BEGIN`, `KEYWORD_CHAR_END`) identifying the found keyword in the text. The view also contains the `ERROR_MESSAGE` and `SETUP` columns.

In [None]:
%%sql
SELECT TEXT_DOC_ID, 
    TEXT_CHAR_BEGIN, 
    TEXT_CHAR_END,
    KEYWORD, 
    KEYWORD_SCORE, 
    KEYWORD_DOC_ID, 
    KEYWORD_CHAR_BEGIN, 
    KEYWORD_CHAR_END 
FROM "{{schema}}".KEYWORD_SEARCH_VIEW WHERE TEXT_DOC_ID < 5

You might notice some seemingly duplicated keywords for a given document. But take a look at the keyword spans of those "duplicates". They are different. This means the same keyword was found multiple times in the same document.
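
The following sketch shows why the same keyword can appear with different spans. It uses plain string matching on a made-up sentence. Text AI finds keywords with a model, not with this code, so the helper `find_occurrences` is only a hypothetical illustration of the span idea.

```python
import re
from typing import List, Tuple


def find_occurrences(text: str, keyword: str) -> List[Tuple[int, int]]:
    """Return (char_begin, char_end) for every occurrence of the keyword."""
    return [match.span() for match in re.finditer(re.escape(keyword), text)]


# Made-up ticket text in which the same word occurs twice.
text = "The camera does not charge. I need a new camera."
occurrences = find_occurrences(text, "camera")

for begin, end in occurrences:
    print(begin, end, text[begin:end])
```

Both results carry the same keyword text, but their begin and end positions differ, so they are distinct rows.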

### Result Summary

Here is an overview of the data model our preprocessing created.
    
![A diagram showing multiple table names with their respective columns. Starting at "DOCUMENTS" and then the three result views. The columns containing the text document span are highlighted.](images/data_model_2.drawio.png)


## Adding Data to Source View

Now, let's try and run the preprocessing again, using the exact same input.

In [None]:
%%time
run_text_ai_preprocessing()

See how quickly it runs this time? This is because Text AI does not recompute results that were already computed in previous runs. We can test this behaviour further. Let's add more entries to our dataset, and see how long the preprocessing takes then.

So, in the next call let's double the data in our input view:

In [78]:
%%sql
CREATE OR REPLACE VIEW "{{schema}}"."{{view}}" AS 
SELECT * FROM "{{schema}}"."{{table}}" 
ORDER BY "TICKET_ID" 
LIMIT {{view_size}}*2;

In [None]:
%%sql
SELECT COUNT(ALL TICKET_ID) FROM "{{schema}}"."{{view}}";

You might expect this next run to take twice as long as the first run we did. However, thanks to the way Text AI is implemented, it takes only roughly the same time as the first run. Text AI detects which documents were already processed and only processes new documents.
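
The general idea of skipping already processed documents can be sketched as follows. This is not Text AI's actual mechanism, which this notebook does not describe. The function `process_new_documents` and the sample data are hypothetical and only mirror the two runs with 10 and 20 rows.

```python
def process_new_documents(documents, processed_ids, process):
    """Process only documents whose key is not yet in processed_ids."""
    newly_processed = []
    for doc_id, text in documents.items():
        if doc_id in processed_ids:
            continue  # a result already exists from an earlier run
        process(text)
        processed_ids.add(doc_id)
        newly_processed.append(doc_id)
    return newly_processed


processed = set()
first_view = {i: f"ticket {i}" for i in range(1, 11)}   # 10 made-up rows
second_view = {i: f"ticket {i}" for i in range(1, 21)}  # the same 10 rows plus 10 new ones

first_run = process_new_documents(first_view, processed, process=len)
second_run = process_new_documents(second_view, processed, process=len)
print(len(first_run), len(second_run))
```

Although the second input is twice as large, both runs handle the same number of documents, which matches the runtime behaviour described above.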

In [None]:
%%time
run_text_ai_preprocessing()

In [None]:
%%sql
SELECT COUNT (*) FROM "{{schema}}".DOCUMENTS;

Remember, the processing time is dependent on a lot of factors, such as the actual size of the data points, the batch size, parallelism per node, as well as available memory and number of nodes of the used Exasol Database. So the actual speedup you experience will differ from case to case.

If you want to experiment with this further, feel free to add even more data, for example. We do not demonstrate this in this Notebook, because the calls would take too long for demonstration purposes.

## Audit Log

Lastly, let's look at the audit log table Text AI has generated for us. This table documents each run Text AI performs on our Exasol Database. It contains information on runtime, how many data entries were used or created, and error messages. This can be very helpful if you suspect a problem with one of your pipelines and want to know where it is coming from, or if you want to see how much data came from a specific step or which pipeline step is taking too long.


In [85]:
%config SqlMagic.displaylimit = 20

In [None]:
%%sql
DESC "{{schema}}".TXAIE_AUDIT_LOG

In [None]:
from pandas import option_context
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    audit_log = conn.export_to_pandas(f"""
        SELECT 
            RUN_ID,
            DB_OBJECT_NAME,
            EVENT_NAME,
            ROW_COUNT,
            LOG_TIMESTAMP 
        FROM "{schema}".TXAIE_AUDIT_LOG
    """)
    with option_context('display.max_rows', 20, 'display.max_colwidth', 1000):
        display(audit_log)

You can now continue with the [Text AI Analytics Notebook](txaie_analytics.ipynb).