# Text AI Extension initialization

Here we will get the Text AI Extension up and running. We will do this with a single call to a function in the support library.

## Prerequisites

Prior to using this notebook one needs to complete the following steps:
1. [Configure the AI-Lab](../main_config.ipynb).

## Setup

### Open Secure Configuration Storage

In [1]:
%run ../utils/access_store_ui.ipynb
display(get_access_store_ui('../'))

Output()

Box(children=(Box(children=(Label(value='Configuration Store', layout=Layout(border_bottom='solid 1px', border…

### Configure access to a pre-release version

In [2]:
%run utils/txaie_init_ui.ipynb
display(get_txaie_pre_release_ui(ai_lab_config))

Box(children=(Box(children=(Label(value='Pre-release Access', layout=Layout(border_bottom='solid 1px', border_…

## Initialize the extension

<b>This operation normally takes a considerable amount of time to complete</b>

When the initialization finishes, the printed output should suggest activating the script language container (SLC). We take this as an indication that the initialization procedure completed successfully. The language container is activated at the session level once a connection to the database has been established. In tutorials that use JupySQL, the container activation is included in the routine that enables JupySQL.

In [3]:
import subprocess
from exasol.nb_connector.text_ai_extension_wrapper import initialize_text_ai_extension, download_pre_release

with download_pre_release(ai_lab_config) as unzipped_files:
    project_wheel, slc_tar_gz = unzipped_files

    pip_cmd = ['pip', 'install', str(project_wheel)]
    subprocess.run(pip_cmd, check=True, capture_output=True)

    initialize_text_ai_extension(ai_lab_config, container_file=slc_tar_gz)




In SQL, you can activate the SLC
by using the following statements:

To activate the SLC only for the current session:
ALTER SESSION SET SCRIPT_LANGUAGES='R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PYTHON3_TXAIE=localzmq+protobuf:///bfsdefault/default/TXAIE/exasol_text_ai_extension_container_release?lang=python#/buckets/bfsdefault/default/TXAIE/exasol_text_ai_extension_container_release/exaudf/exaudfclient_py3';

To activate the SLC on the system:
ALTER SYSTEM SET SCRIPT_LANGUAGES='R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PYTHON3_TXAIE=localzmq+protobuf:///bfsdefault/default/TXAIE/exasol_text_ai_extension_container_release?lang=python#/buckets/bfsdefault/default/TXAIE/exasol_text_ai_extension_container_release/exaudf/exaudfclient_py3';
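
The initialization cell above runs `pip` with `check=True` and `capture_output=True`, so a failed installation raises `subprocess.CalledProcessError` while pip's own messages stay hidden inside the exception. The following is a minimal sketch of how those messages could be surfaced. The helper name `run_and_report` is hypothetical, and the failing child command only stands in for a failing `pip install`.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; on failure, print the captured stderr before re-raising."""
    try:
        return subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # With capture_output=True the child's stderr is only available here.
        print(err.stderr, file=sys.stderr)
        raise


# Stand-in for a failing `pip install`: a child process that writes to
# stderr and exits with a non-zero status.
failing_cmd = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('install failed'); sys.exit(1)",
]

try:
    run_and_report(failing_cmd)
except subprocess.CalledProcessError as err:
    captured = err.stderr
    return_code = err.returncode
```

In the notebook, `pip_cmd` would take the place of `failing_cmd`. Invoking pip as `[sys.executable, '-m', 'pip', ...]` is a common way to make sure the wheel is installed into the environment of the running kernel.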



In [177]:
from exasol.nb_connector.transformers_extension_wrapper import initialize_te_extension

initialize_te_extension(ai_lab_config, version="2.2.1")




In SQL, you can activate the SLC
by using the following statements:

To activate the SLC only for the current session:
ALTER SESSION SET SCRIPT_LANGUAGES='R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PYTHON3_TE=localzmq+protobuf:///bfsdefault/default/TE/exasol_transformers_extension_container_release?lang=python#/buckets/bfsdefault/default/TE/exasol_transformers_extension_container_release/exaudf/exaudfclient_py3';

To activate the SLC on the system:
ALTER SYSTEM SET SCRIPT_LANGUAGES='R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PYTHON3_TE=localzmq+protobuf:///bfsdefault/default/TE/exasol_transformers_extension_container_release?lang=python#/buckets/bfsdefault/default/TE/exasol_transformers_extension_container_release/exaudf/exaudfclient_py3';
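
The activation statements printed for `PYTHON3_TXAIE` and `PYTHON3_TE` follow the same pattern: the built-in languages, followed by one alias that points at a container in the BucketFS. Since `SCRIPT_LANGUAGES` is a single parameter, each `ALTER` statement replaces the whole list, so both aliases have to appear in one statement when both are needed. The sketch below only reproduces the string pattern visible in the output. The function names are hypothetical and are not part of the notebook connector.

```python
BUILTIN_LANGUAGES = ["R=builtin_r", "JAVA=builtin_java", "PYTHON3=builtin_python3"]


def slc_language_definition(alias, bfs_service, bucket, path_in_bucket):
    """Compose one ALIAS=... entry following the pattern in the printed output."""
    container = f"{bfs_service}/{bucket}/{path_in_bucket}"
    return (
        f"{alias}=localzmq+protobuf:///{container}?lang=python"
        f"#/buckets/{container}/exaudf/exaudfclient_py3"
    )


def activation_statement(scope, definitions):
    """Build an ALTER SESSION/SYSTEM statement from a list of language entries."""
    if scope not in ("SESSION", "SYSTEM"):
        raise ValueError(f"unexpected scope: {scope}")
    joined = " ".join(definitions)
    return f"ALTER {scope} SET SCRIPT_LANGUAGES='{joined}';"


txaie = slc_language_definition(
    "PYTHON3_TXAIE", "bfsdefault", "default",
    "TXAIE/exasol_text_ai_extension_container_release",
)
te = slc_language_definition(
    "PYTHON3_TE", "bfsdefault", "default",
    "TE/exasol_transformers_extension_container_release",
)

# Reproduces the session-level statement printed for the Text AI Extension.
session_sql = activation_statement("SESSION", BUILTIN_LANGUAGES + [txaie])

# One statement that keeps both aliases active in the same session.
combined_sql = activation_statement("SESSION", BUILTIN_LANGUAGES + [txaie, te])
```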



In [151]:
!pip install exasol-udf-mock-python



In [14]:
from exasol.ai.text.extraction import *
from exasol.ai.text.extraction.extraction import Extraction
from exasol.ai.text.extraction.abstract_extraction import Output

In [255]:
schema = ai_lab_config.db_schema
table = "MY_TABLE"
text_column = "TEXT_COLUMN"
key_column = "KEY_COLUMN"
NAMED_ENTITY_MODEL = "guishe/nuner-v2_fewnerd_fine_super"
NLI_MODEL_NAME_1 = "tasksource/ModernBERT-base-nli"
NLI_MODEL_NAME_2 = "facebook/bart-large-mnli"
NLI_MODEL_NAME_3 = "tasksource/ModernBERT-large-nli"
FEATURE_EXTRACTION_MODEL = "answerdotai/ModernBERT-base"
LABELS = ["urgent", "not urgent"]
BUCKETFS_CONNECTION_NAME = ai_lab_config.te_bfs_connection
BUCKETFS_SUB_DIR = ai_lab_config.te_models_bfs_dir
OUTPUT_SCHEMA = ai_lab_config.db_schema

In [None]:
%run ../transformers/utils/model_retrieval.ipynb

In [None]:
load_huggingface_model(ai_lab_config, NAMED_ENTITY_MODEL, 'token-classification')

In [192]:
load_huggingface_model(ai_lab_config, NLI_MODEL_NAME_1, 'zero-shot-classification')

In [None]:
load_huggingface_model(ai_lab_config, NLI_MODEL_NAME_2, 'zero-shot-classification')

In [None]:
load_huggingface_model(ai_lab_config, NLI_MODEL_NAME_3, 'zero-shot-classification')

In [232]:
load_huggingface_model(ai_lab_config, FEATURE_EXTRACTION_MODEL, 'feature-extraction')

In [276]:
from typing import List


def default_extractors(labels: List[str], parallelism_per_node: int = 1):
    return BranchExtractor(
                paths=[
                    NamedEntityExtractor(named_entity_settings=HftNamedEntitySettings(
                        model_name=NAMED_ENTITY_MODEL,
                        bucketfs_conn_name=BUCKETFS_CONNECTION_NAME,
                        sub_dir=BUCKETFS_SUB_DIR,
                    ), parallelism_per_node=parallelism_per_node),
                    TopicClassifierExtractor(topic_settings=HftTopicSettings(
                        model_name=NLI_MODEL_NAME_1,
                        bucketfs_conn_name=BUCKETFS_CONNECTION_NAME,
                        sub_dir=BUCKETFS_SUB_DIR,
                        topics=labels
                    ), parallelism_per_node=parallelism_per_node),
                    KeywordSearchExtractor(
                        keyword_settings=PatternRankKeywordSettings(
                            model_name=FEATURE_EXTRACTION_MODEL,
                            bucketfs_conn_name=BUCKETFS_CONNECTION_NAME,
                            sub_dir=BUCKETFS_SUB_DIR,
                        ),
                        parallelism_per_node=parallelism_per_node
                    )
                ]
            )

In [281]:
extraction = Extraction(
    extractor=PipelineExtractor(
        steps=[
            SourceTableExtractor(sources=[
                SchemaSource(db_schema=NameSelector(pattern=schema),
                     tables=[
                         TableSource(table=NameSelector(pattern=table),
                                     columns=[NameSelector(pattern=text_column)],
                                     keys=[NameSelector(pattern=key_column)])
                     ])
            ]),
            default_extractors(labels=LABELS, parallelism_per_node=1)
        ]
    ),
    output=Output(db_schema=OUTPUT_SCHEMA)
)
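
The extraction above nests a `BranchExtractor` inside a `PipelineExtractor`. As a rough mental model only, and not the library's implementation, the sketch below assumes that a pipeline hands each step's output to the next step, while a branch applies every path to the same input and collects the results side by side. All names in it (`pipeline`, `branch`, and the toy steps) are hypothetical.

```python
from typing import Callable, Dict, List


def pipeline(steps: List[Callable]) -> Callable:
    """Chain steps: the output of one step is the input of the next."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run


def branch(paths: Dict[str, Callable]) -> Callable:
    """Fan out: every path sees the same input; results are kept per path."""
    def run(data):
        return {name: path(data) for name, path in paths.items()}
    return run


def source(_):
    # Toy stand-in for reading the text column of the source table.
    return ["This is a test.", "Office 2000 is awesome."]


def word_counts(docs):
    return [len(doc.split()) for doc in docs]


def char_counts(docs):
    return [len(doc) for doc in docs]


toy_extraction = pipeline([
    source,
    branch({"words": word_counts, "chars": char_counts}),
])
result = toy_extraction(None)
```

Under this reading, the source step in the notebook supplies the documents once, and the named-entity, topic, and keyword extractors each process that same set of documents.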

In [278]:
from exasol.nb_connector.connections import open_pyexasol_connection
from exasol.nb_connector.language_container_activation import get_activation_sql

activation_sql = get_activation_sql(ai_lab_config)

In [279]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."TOPIC_CLASSIFIER" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."TOPIC_CLASSIFIER_LOOKUP_TOPIC" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."TOPIC_CLASSIFIER_LOOKUP_SETUP" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."NAMED_ENTITY" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."NAMED_ENTITY_LOOKUP_ENTITY_NAME" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."NAMED_ENTITY_LOOKUP_SETUP" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."DOCUMENTS" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."DOCUMENTS_AI_LAB_MY_TABLE" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."KEYWORD_SEARCH" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."KEYWORD_SEARCH_LOOKUP_KEYWORD" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{OUTPUT_SCHEMA}"."KEYWORD_SEARCH_LOOKUP_SETUP" """)
    conn.execute(f"""DROP TABLE IF EXISTS "{schema}"."{table}" """)
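
The cleanup cell above repeats one statement per table. A minimal sketch of generating the same statements from a list of names follows. The helper names are hypothetical, the table list is copied from the cell above, and `"AI_LAB"` is used as an example schema because it appears in the outputs of this notebook.

```python
def quote_identifier(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_table_statements(schema, tables):
    """Return one DROP TABLE IF EXISTS statement per table name."""
    return [
        f"DROP TABLE IF EXISTS {quote_identifier(schema)}.{quote_identifier(table)}"
        for table in tables
    ]


OUTPUT_TABLES = [
    "TOPIC_CLASSIFIER",
    "TOPIC_CLASSIFIER_LOOKUP_TOPIC",
    "TOPIC_CLASSIFIER_LOOKUP_SETUP",
    "NAMED_ENTITY",
    "NAMED_ENTITY_LOOKUP_ENTITY_NAME",
    "NAMED_ENTITY_LOOKUP_SETUP",
    "DOCUMENTS",
    "DOCUMENTS_AI_LAB_MY_TABLE",
    "KEYWORD_SEARCH",
    "KEYWORD_SEARCH_LOOKUP_KEYWORD",
    "KEYWORD_SEARCH_LOOKUP_SETUP",
]

statements = drop_table_statements("AI_LAB", OUTPUT_TABLES)
```

Each generated statement could then be passed to `conn.execute(...)` inside the existing `with` block.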

In [280]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    conn.execute(query=activation_sql)
    conn.execute(
        f"""
        CREATE OR REPLACE TABLE "{schema}"."{table}" (
          "{key_column}" INTEGER,
          "{text_column}" VARCHAR(2000000) UTF8
        )
        """)
    conn.execute(
        f"""INSERT INTO "{schema}"."{table}" VALUES (1, 'This is a test.')""")
    conn.execute(
        f"""INSERT INTO "{schema}"."{table}" VALUES (2, 'Office 2000 is awesome.')""")
    conn.execute(
        f"""INSERT INTO "{schema}"."{table}" VALUES (3, 'Please assist, with troubleshooting Lotus.')""")
    conn.execute(
        f"""INSERT INTO "{schema}"."{table}" VALUES (4, 'ASAP. Please assist, with troubleshooting Lotus.')""")
    extraction.run(conn, schema, "PYTHON3_TXAIE")
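
The sample rows above are written as SQL literals, which only works as long as the text contains no single quote. The sketch below shows one way to build a single multi-row `INSERT` with quotes escaped. It assumes that the multi-row `VALUES` form is accepted by the target database, and the helper names are hypothetical.

```python
def sql_string_literal(value):
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def insert_statement(schema, table, rows):
    """Build one INSERT statement for a list of (key, text) rows."""
    values = ", ".join(
        f"({int(key)}, {sql_string_literal(text)})" for key, text in rows
    )
    return f'INSERT INTO "{schema}"."{table}" VALUES {values}'


sample_rows = [
    (1, "This is a test."),
    (2, "It's urgent."),  # contains a single quote that must be escaped
]
insert_sql = insert_statement("AI_LAB", "MY_TABLE", sample_rows)
```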

In [268]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    result = conn.export_to_pandas("SELECT TABLE_SCHEMA, TABLE_NAME FROM EXA_ALL_TABLES")
result

Unnamed: 0,TABLE_SCHEMA,TABLE_NAME
0,AI_LAB,1828919947603214336_4_11_1
1,AI_LAB,MY_TABLE
2,AI_LAB,DOCUMENTS
3,AI_LAB,DOCUMENTS_AI_LAB_MY_TABLE
4,AI_LAB,TOPIC_CLASSIFIER
5,AI_LAB,TOPIC_CLASSIFIER_LOOKUP_TOPIC
6,AI_LAB,TOPIC_CLASSIFIER_LOOKUP_SETUP
7,AI_LAB,KEYWORD_SEARCH
8,AI_LAB,KEYWORD_SEARCH_LOOKUP_KEYWORD
9,AI_LAB,KEYWORD_SEARCH_LOOKUP_SETUP


In [269]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    result = conn.export_to_pandas("SELECT VIEW_SCHEMA, VIEW_NAME FROM EXA_ALL_VIEWS")
result

Unnamed: 0,VIEW_SCHEMA,VIEW_NAME
0,AI_LAB,1828852495322120192_6_1
1,AI_LAB,1828852813227687936_6_1
2,AI_LAB,1828852985202933760_6_1
3,AI_LAB,1828853040310976512_6_1
4,AI_LAB,1828853125968691200_6_1
5,AI_LAB,1828853237315076096_6_1
6,AI_LAB,1828853321762799616_6_1
7,AI_LAB,1828865968121577472_6_1
8,AI_LAB,1828866082098118656_6_1
9,AI_LAB,1828866095866773504_6_1


In [270]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    result=conn.export_to_pandas(f"""
    SELECT * 
    FROM {OUTPUT_SCHEMA}.DOCUMENTS as d
    """)
result

Unnamed: 0,text_doc_id,text_char_begin,text_char_end,text
0,1,0,15,This is a test.
1,2,0,23,Office 2000 is awesome.
2,3,0,42,"Please assist, with troubleshooting Lotus."
3,4,0,48,"ASAP. Please assist, with troubleshooting Lotus."


In [260]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    result=conn.export_to_pandas(f"""
    SELECT * 
    FROM {OUTPUT_SCHEMA}.TOPIC_CLASSIFIER_VIEW as t
    JOIN {OUTPUT_SCHEMA}.DOCUMENTS as d
    ON d."text_doc_id" = t."text_doc_id"
    """)
result

Unnamed: 0,text_doc_id,text_char_begin,text_char_end,topic,topic_score,topic_rank,error_message,setup,text_doc_id.1,text_char_begin.1,text_char_end.1,text
0,1,0,15,not urgent,0.514206,2,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",1,0,15,This is a test.
1,1,0,15,urgent,0.485794,1,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",1,0,15,This is a test.
2,2,0,23,not urgent,0.96887,2,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",2,0,23,Office 2000 is awesome.
3,2,0,23,urgent,0.03113,1,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",2,0,23,Office 2000 is awesome.
4,3,0,42,urgent,0.67012,2,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",3,0,42,"Please assist, with troubleshooting Lotus."
5,3,0,42,not urgent,0.32988,1,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",3,0,42,"Please assist, with troubleshooting Lotus."
6,4,0,48,urgent,0.680033,2,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",4,0,48,"ASAP. Please assist, with troubleshooting Lotus."
7,4,0,48,not urgent,0.319967,1,,"{""model_name"":""tasksource/ModernBERT-base-nli"",""topics"":[""urgent"",""not urgent""],""hypothesis_template"":null,""multi_label"":false}",4,0,48,"ASAP. Please assist, with troubleshooting Lotus."
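
The view returns one row per document and topic. To reduce this to a single label per document, one could keep the topic with the highest `topic_score`. The sketch below does so in plain Python on the values copied from the output above. The function name is hypothetical.

```python
# (text_doc_id, topic, topic_score) copied from the result above.
rows = [
    (1, "not urgent", 0.514206),
    (1, "urgent", 0.485794),
    (2, "not urgent", 0.96887),
    (2, "urgent", 0.03113),
    (3, "urgent", 0.67012),
    (3, "not urgent", 0.32988),
    (4, "urgent", 0.680033),
    (4, "not urgent", 0.319967),
]


def best_topic_per_document(rows):
    """Keep, for every document id, the topic with the highest score."""
    best = {}
    for doc_id, topic, score in rows:
        if doc_id not in best or score > best[doc_id][1]:
            best[doc_id] = (topic, score)
    return best


best = best_topic_per_document(rows)
urgent_docs = sorted(doc_id for doc_id, (topic, _) in best.items() if topic == "urgent")
```

For the four sample texts this selects the two troubleshooting requests (documents 3 and 4) as urgent.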


In [271]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    result=conn.export_to_pandas(f"""
    SELECT * FROM {OUTPUT_SCHEMA}.NAMED_ENTITY_VIEW as e
    JOIN {OUTPUT_SCHEMA}.DOCUMENTS as d
    ON d."text_doc_id" = e."text_doc_id"
    """)
result

Unnamed: 0,text_doc_id,text_char_begin,text_char_end,entity_name,entity_score,entity,entity_doc_id,entity_char_begin,entity_char_end,error_message,setup,text_doc_id.1,text_char_begin.1,text_char_end.1,text
0,4,0,48,product_software,0.719111,Lotus,4,42,47,,"{""model_name"":""guishe/nuner-v2_fewnerd_fine_super"",""ignore_labels"":null,""aggregation_strategy"":""simple""}",4,0,48,"ASAP. Please assist, with troubleshooting Lotus."
1,3,0,42,product_software,0.762711,Lotus,3,36,41,,"{""model_name"":""guishe/nuner-v2_fewnerd_fine_super"",""ignore_labels"":null,""aggregation_strategy"":""simple""}",3,0,42,"Please assist, with troubleshooting Lotus."
2,2,0,23,product_software,0.702919,Office 2000,2,0,11,,"{""model_name"":""guishe/nuner-v2_fewnerd_fine_super"",""ignore_labels"":null,""aggregation_strategy"":""simple""}",2,0,23,Office 2000 is awesome.
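
Judging by the rows above, `entity_char_begin` and `entity_char_end` appear to be zero-based character offsets into the text, with an exclusive end. Under that assumption, slicing the document text with the offsets recovers the entity, as this small check on the values from the output shows.

```python
# Document texts and entity rows copied from the result above.
documents = {
    2: "Office 2000 is awesome.",
    3: "Please assist, with troubleshooting Lotus.",
    4: "ASAP. Please assist, with troubleshooting Lotus.",
}

# (entity_doc_id, entity_char_begin, entity_char_end, entity)
entities = [
    (4, 42, 47, "Lotus"),
    (3, 36, 41, "Lotus"),
    (2, 0, 11, "Office 2000"),
]

# Slice each document with the reported offsets.
spans = [documents[doc_id][begin:end] for doc_id, begin, end, _ in entities]
offsets_match = all(span == entity for span, (_, _, _, entity) in zip(spans, entities))
```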


In [272]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    result=conn.export_to_pandas(f"""
    SELECT * FROM {OUTPUT_SCHEMA}.KEYWORD_SEARCH_VIEW as e
    JOIN {OUTPUT_SCHEMA}.DOCUMENTS as d
    ON d."text_doc_id" = e."text_doc_id"
    """)
result

Unnamed: 0,text_doc_id,text_char_begin,text_char_end,keyword,keyword_score,keyword_doc_id,keyword_char_begin,keyword_char_end,error_message,setup,text_doc_id.1,text_char_begin.1,text_char_end.1,text
0,1,0,15,test,0.792,1,10,14,,"[{""model_name"":""answerdotai/ModernBERT-base"",""bucketfs_conn_name"":""TE_BFS_sys"",""sub_dir"":""te_models"",""vec_kwargs"":{""spacy_pipeline"":null,""max_df"":null,""min_df"":null},""kbx_kwargs"":{""top_n"":5,""use_maxsum"":false,""use_mmr"":false,""diversity"":0.5,""nr_candidates"":20}},{""phrase_column"":""keyword"",""phrase_column_flags"":4,""fuzziness"":0}]",1,0,15,This is a test.
1,2,0,23,office,0.5886,2,0,6,,"[{""model_name"":""answerdotai/ModernBERT-base"",""bucketfs_conn_name"":""TE_BFS_sys"",""sub_dir"":""te_models"",""vec_kwargs"":{""spacy_pipeline"":null,""max_df"":null,""min_df"":null},""kbx_kwargs"":{""top_n"":5,""use_maxsum"":false,""use_mmr"":false,""diversity"":0.5,""nr_candidates"":20}},{""phrase_column"":""keyword"",""phrase_column_flags"":4,""fuzziness"":0}]",2,0,23,Office 2000 is awesome.
2,3,0,42,lotus,0.7196,3,36,41,,"[{""model_name"":""answerdotai/ModernBERT-base"",""bucketfs_conn_name"":""TE_BFS_sys"",""sub_dir"":""te_models"",""vec_kwargs"":{""spacy_pipeline"":null,""max_df"":null,""min_df"":null},""kbx_kwargs"":{""top_n"":5,""use_maxsum"":false,""use_mmr"":false,""diversity"":0.5,""nr_candidates"":20}},{""phrase_column"":""keyword"",""phrase_column_flags"":4,""fuzziness"":0}]",3,0,42,"Please assist, with troubleshooting Lotus."
3,4,0,48,asap,0.743,4,0,4,,"[{""model_name"":""answerdotai/ModernBERT-base"",""bucketfs_conn_name"":""TE_BFS_sys"",""sub_dir"":""te_models"",""vec_kwargs"":{""spacy_pipeline"":null,""max_df"":null,""min_df"":null},""kbx_kwargs"":{""top_n"":5,""use_maxsum"":false,""use_mmr"":false,""diversity"":0.5,""nr_candidates"":20}},{""phrase_column"":""keyword"",""phrase_column_flags"":4,""fuzziness"":0}]",4,0,48,"ASAP. Please assist, with troubleshooting Lotus."
4,4,0,48,lotus,0.7117,4,42,47,,"[{""model_name"":""answerdotai/ModernBERT-base"",""bucketfs_conn_name"":""TE_BFS_sys"",""sub_dir"":""te_models"",""vec_kwargs"":{""spacy_pipeline"":null,""max_df"":null,""min_df"":null},""kbx_kwargs"":{""top_n"":5,""use_maxsum"":false,""use_mmr"":false,""diversity"":0.5,""nr_candidates"":20}},{""phrase_column"":""keyword"",""phrase_column_flags"":4,""fuzziness"":0}]",4,0,48,"ASAP. Please assist, with troubleshooting Lotus."
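
The `setup` column holds the settings used for the run as JSON text (the doubled quotes in the listing above come from the tabular rendering). A minimal sketch of inspecting it with the standard `json` module follows. The string is copied from the output, and the names `keyword_settings` and `search_settings` are only descriptive labels chosen here for the two list elements.

```python
import json

setup = (
    '[{"model_name":"answerdotai/ModernBERT-base",'
    '"bucketfs_conn_name":"TE_BFS_sys","sub_dir":"te_models",'
    '"vec_kwargs":{"spacy_pipeline":null,"max_df":null,"min_df":null},'
    '"kbx_kwargs":{"top_n":5,"use_maxsum":false,"use_mmr":false,'
    '"diversity":0.5,"nr_candidates":20}},'
    '{"phrase_column":"keyword","phrase_column_flags":4,"fuzziness":0}]'
)

# The value is a JSON list with two objects.
keyword_settings, search_settings = json.loads(setup)

model_name = keyword_settings["model_name"]
top_n = keyword_settings["kbx_kwargs"]["top_n"]
fuzziness = search_settings["fuzziness"]
```

The keywords themselves are reported in lower case, so comparing them with a slice of the text requires a case-insensitive match, for example `"Office 2000 is awesome."[0:6].lower() == "office"`.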
