# Text AI Preprocessing for the Demo

Here we run the text preprocessing for the demo.

## Prerequisites

Before using this notebook, complete the following step:
1. [Configure the AI-Lab](../main_config.ipynb).

## Setup

### Open Secure Configuration Storage

In [1]:
%run ../../utils/access_store_ui.ipynb
display(get_access_store_ui('../../'))


### Load Utilities and Imports

In [2]:
%run utils/xp_default_extractor.ipynb

In [6]:
%run ../../utils/jupysql_init.ipynb

In [3]:
from exasol.ai.text.extraction import *
from exasol.ai.text.extraction.extraction import Extraction
from exasol.ai.text.extraction.abstract_extraction import Output

In [4]:
schema = ai_lab_config.db_schema

In [23]:
from exasol.nb_connector.connections import open_pyexasol_connection
from exasol.nb_connector.language_container_activation import get_activation_sql

activation_sql = get_activation_sql(ai_lab_config)

In [38]:
%config SqlMagic.displaylimit = 20

## Run Preprocessing for new Data

### Customer Support Ticket Dataset

Source: https://www.kaggle.com/datasets/suraj520/customer-support-ticket-dataset

In [39]:
%%sql
SELECT * FROM {{schema}}.CUSTOMER_SUPPORT_TICKETS as d LIMIT 5

ticket_id,customer_name,date_of_purchase,ticket_subject,ticket_description,ticket_status,ticket_channel
6525,Raymond Dickerson,2021-05-21,Battery life,"I'm having an issue with the Fitbit Charge. Please assist. 1) Don't add an item or use a service to add another item, this will leave the price to continue. 2) Do you want to charge/re I'm experiencing this issue on multiple devices of the same model, so it seems to be a widespread problem.",Closed,Social media
6526,Megan Colon,2020-10-22,Product setup,"I'm having an issue with the Lenovo ThinkPad. Please assist. (2) If you have any questions about shipping with your package, please email the manufacturer. (3) **NOTE** When creating an invoice you must I've tried using different cables, adapters, or peripherals with my Lenovo ThinkPad, but the issue persists.",Open,Social media
6527,George Mitchell,2020-06-21,Hardware issue,"I'm having an issue with the LG oLED. Please assist. If I can, please send a message. In the meantime, don't forget to include the address information in the email I sent to the company. I know there I've performed a factory reset on my LG oLED, hoping it would resolve the problem, but it didn't help.",Pending Customer Response,Email
6528,Joshua Pollard,2021-01-03,Product recommendation,"I've accidentally deleted important data from my Microsotf Office. Is there any way to recover the deleted files? I need them urgently. https://help.vulnapp.org/showthread.php?c= I've recently updated the firmware of my Microsotf Office, and the issue started happening afterward. Could it be related to the update?",Closed,Social media
6529,Melissa Thomas,2021-06-24,Delivery problem,"I'm facing a problem with my MacBook Pro. The MacBook Pro is not turning on. It was working fine until yesterday, but now it doesn't respond. In the near future, if you're running I'm unable to find the option to perform the desired action in the MacBook Pro. Could you please guide me through the steps?",Closed,Phone


### Adding new Data to the CUSTOMER_SUPPORT_TICKETS Table

We processed the original dataset beforehand, because processing its roughly 7,000 rows takes a few hours on a CPU. Here we demonstrate how to run an extraction in general, and that Exasol Text AI can process only the newly added data.
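
The idea of processing only new data can be illustrated with a small, self-contained sketch. It is not how Exasol Text AI works internally (the source does not describe that); it only shows the general anti-join pattern of finding source rows that have no matching document yet. The in-memory SQLite tables and the simplified table and column names are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the demo tables; names are simplified and hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, ticket_description TEXT);
    CREATE TABLE documents (ticket_id INTEGER PRIMARY KEY, text TEXT);
    INSERT INTO tickets VALUES (1, 'Battery drains fast'), (2, 'Screen flickers');
    -- Pretend the first two tickets were already processed.
    INSERT INTO documents SELECT ticket_id, ticket_description FROM tickets;
    -- A new ticket arrives after the initial run.
    INSERT INTO tickets VALUES (3, 'Hardware problem with my Dell XPS');
""")


def unprocessed_tickets(conn):
    """Return tickets that have no matching row in documents (anti-join)."""
    return conn.execute("""
        SELECT t.ticket_id, t.ticket_description
        FROM tickets AS t
        LEFT JOIN documents AS d ON d.ticket_id = t.ticket_id
        WHERE d.ticket_id IS NULL
        ORDER BY t.ticket_id
    """).fetchall()


new_rows = unprocessed_tickets(conn)
print(new_rows)  # [(3, 'Hardware problem with my Dell XPS')]
```

Only the ticket added after the initial run is returned, so a second run would have a single row to process instead of the whole table.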

#### First, let's look at how many tickets and documents we have in total before adding new data:

In [13]:
%%sql
SELECT count(*) FROM {{schema}}.CUSTOMER_SUPPORT_TICKETS as d

COUNT(*)
7352


In [14]:
%%sql
SELECT count(*) FROM {{schema}}.DOCUMENTS as d

COUNT(*)
7352


#### Now we add new data:

In [19]:
%%sql
INSERT INTO {{schema}}.CUSTOMER_SUPPORT_TICKETS VALUES (
    (SELECT MAX(TICKET_ID)+1 FROM {{schema}}.CUSTOMER_SUPPORT_TICKETS), 
    'Steven Davis MD', 
    '2020-06-01', 
    'Hardware issue',
    'There seems to be a hardware problem with my Dell XPS.',
    'Open',
    'Phone'
)

Now we have a new total count of tickets:

In [20]:
%%sql
SELECT count(*) FROM {{schema}}.CUSTOMER_SUPPORT_TICKETS as d

COUNT(*)
7353


The new ticket in the table:

In [28]:
%%sql
SELECT * FROM {{schema}}.CUSTOMER_SUPPORT_TICKETS WHERE TICKET_ID = (SELECT MAX(TICKET_ID) FROM {{schema}}.CUSTOMER_SUPPORT_TICKETS)

ticket_id,customer_name,date_of_purchase,ticket_subject,ticket_description,ticket_status,ticket_channel
8470,Steven Davis MD,2020-06-01,Hardware issue,There seems to be a hardware problem with my Dell XPS.,Open,Phone


### Running the Extraction

In [24]:
extraction = Extraction(
    extractor=PipelineExtractor(
        steps=[
            # Step 1: read TICKET_DESCRIPTION from CUSTOMER_SUPPORT_TICKETS,
            # using TICKET_ID as the key.
            SourceTableExtractor(sources=[
                SchemaSource(
                    db_schema=NameSelector(pattern=schema),
                    tables=[
                        TableSource(
                            table=NameSelector(pattern="CUSTOMER_SUPPORT_TICKETS"),
                            columns=[NameSelector(pattern="TICKET_DESCRIPTION")],
                            keys=[NameSelector(pattern="TICKET_ID")],
                        )
                    ],
                )
            ]),
            # Step 2: run named entity recognition, topic classification
            # and keyword search on the extracted texts.
            DefaultExtractor(
                named_entity_recognition_enabled=True,
                topic_classification_enabled=True,
                keyword_search_enabled=True,
                topics=["urgent", "not urgent"],
                parallelism_per_node=2,
            ),
        ]
    ),
    # Write the result tables to the same schema.
    output=Output(db_schema=schema),
)

In [25]:
with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    conn.execute(query=activation_sql)
    extraction.run(conn, schema, "PYTHON3_TXAIE")

Our document count increased:

In [26]:
%%sql
SELECT count(*) FROM {{schema}}.DOCUMENTS as d

COUNT(*)
7353
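
The before/after count checks in this section can also be done from Python. The following is a minimal sketch, assuming a pyexasol-style connection (an object whose `execute()` returns a statement with `fetchval()`) and assuming one document per source row, as is the case in this demo. The helper names `count_rows` and `pending_documents` are hypothetical and not part of any Exasol package.

```python
def count_rows(conn, schema: str, table: str) -> int:
    """Count the rows of schema.table via a pyexasol-style connection."""
    statement = conn.execute(f"SELECT COUNT(*) FROM {schema}.{table}")
    return int(statement.fetchval())


def pending_documents(conn, schema: str,
                      source_table: str = "CUSTOMER_SUPPORT_TICKETS",
                      documents_table: str = "DOCUMENTS") -> int:
    """Number of source rows that do not have a document yet.

    Assumes exactly one document per source row (true for this demo).
    """
    return (count_rows(conn, schema, source_table)
            - count_rows(conn, schema, documents_table))
```

With the counts shown in this notebook, `pending_documents(conn, schema)` would be expected to return 1 after the insert and 0 after the extraction run.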


### Let's have a look at the results of the extraction:

In [40]:
%%sql
SELECT TABLE_SCHEMA, TABLE_NAME FROM EXA_ALL_TABLES

table_schema,table_name
AI_LAB,PRODUCTS
AI_LAB,DOCUMENTS
AI_LAB,DOCUMENTS_AI_LAB_CUSTOMER_SUPPORT_TICKETS
AI_LAB,NAMED_ENTITY
AI_LAB,NAMED_ENTITY_LOOKUP_ENTITY_TYPE
AI_LAB,NAMED_ENTITY_LOOKUP_SETUP
AI_LAB,TOPIC_CLASSIFIER
AI_LAB,TOPIC_CLASSIFIER_LOOKUP_TOPIC
AI_LAB,TOPIC_CLASSIFIER_LOOKUP_SETUP
AI_LAB,KEYWORD_SEARCH


In [41]:
%%sql
SELECT VIEW_SCHEMA, VIEW_NAME FROM EXA_ALL_VIEWS

view_schema,view_name
AI_LAB,ENTITIES_WITH_TOPICS
AI_LAB,URGENT_PRODUCTS
AI_LAB,KEYWORD_SEARCH_VIEW
AI_LAB,TOPIC_CLASSIFIER_VIEW
AI_LAB,NAMED_ENTITY_VIEW


In [44]:
%%sql
SELECT t.TOPIC, t.TOPIC_SCORE, d.TEXT_DOC_ID, d.TEXT_CHAR_BEGIN, d.TEXT_CHAR_END, d.TEXT
FROM {{schema}}.TOPIC_CLASSIFIER_VIEW as t JOIN {{schema}}.DOCUMENTS as d ON d.TEXT_DOC_ID = t.TEXT_DOC_ID 
WHERE t.TOPIC_RANK=1 ORDER BY t.TOPIC_SCORE DESC

topic,topic_score,text_doc_id,text_char_begin,text_char_end,TEXT
urgent,0.9878994226455688,4828,0,262,"I'm facing a problem with my Dell XPS. The Dell XPS is not turning on. It was working fine until yesterday, but now it doesn't respond. Please send out the refund immediately. I need assistance as soon as possible because it's affecting my work and productivity."
urgent,0.987150490283966,3720,0,234,"I've accidentally deleted important data from my Sony K4 HDR TV. Is there any way to recover the deleted files? I need them urgently. I rely heavily on my Sony K4 HDR TV for my daily tasks, and this issue is hindering my productivity."
urgent,0.9862364530563354,3761,0,312,I've accidentally deleted important data from my Amazon cEho. Is there any way to recover the deleted files? I need them urgently. A large amount of time has passed since a lot of users decided to re-download files from their I need assistance as soon as possible because it's affecting my work and productivity.
urgent,0.9859652519226074,145,0,263,"I'm facing a problem with my Canon EOS. The Canon EOS is not turning on. It was working fine until yesterday, but now it doesn't respond. A call for help has been filed. Please I need assistance as soon as possible because it's affecting my work and productivity."
urgent,0.9855612516403198,4164,0,268,I'm having an issue with the Nest Thermostat. Please assist. I'm having an issue with the Nest Thermostat. Please assist. I'm having an issue with the Nest Thermostat. Please Casino I need assistance as soon as possible because it's affecting my work and productivity.
urgent,0.9854793548583984,5630,0,342,"I've accidentally deleted important data from my Nintendo Switch Pro Controller. Is there any way to recover the deleted files? I need them urgently. There are no easy ways to recover the files but I've tried. I've sent my support team I've checked for any available software updates for my Nintendo Switch Pro Controller, but there are none."
urgent,0.9820414185523988,780,0,143,I'm having an issue with the HP Pavilion. Please assist. I need assistance as soon as possible because it's affecting my work and productivity.
urgent,0.9819363951683044,3500,0,274,"There seems to be a hardware problem with my PlayStation. The screen is flickering, and I'm unable to use it. What should I do? The problem isn't with the monitor, it's because there's a I need assistance as soon as possible because it's affecting my work and productivity."
urgent,0.9817339181900024,2489,0,264,I'm having an issue with the Philips Hue Lgiths. Please assist. I'm having an issue with the Philips Hue Lgiths. Please assist. I'm having an issue with the Philips Hue Lgiths. I need assistance as soon as possible because it's affecting my work and productivity.
urgent,0.98157399892807,5758,0,291,"I'm unable to access my Canon DSLR Camera account. It keeps displaying an 'Invalid Credentials' error, even though I'm using the correct login information. How can I regain access to my account? You want I need assistance as soon as possible because it's affecting my work and productivity."


In [45]:
%%sql
SELECT t.TOPIC, t.TOPIC_SCORE, d.TEXT_DOC_ID, d.TEXT_CHAR_BEGIN, d.TEXT_CHAR_END, d.TEXT
FROM {{schema}}.TOPIC_CLASSIFIER_VIEW as t JOIN {{schema}}.DOCUMENTS as d ON d.TEXT_DOC_ID = t.TEXT_DOC_ID 
WHERE t.TOPIC_RANK=1 ORDER BY t.TOPIC_SCORE ASC

topic,topic_score,text_doc_id,text_char_begin,text_char_end,TEXT
not urgent,0.500121533870697,1891,0,296,I'm having an issue with the Fitbit Charge. Please assist. I'm having an issue with the Fitbit Charge. Please assist. I'm in the middle of a battle with the dragon and I'm unable to find the option to perform the desired action in the Fitbit Charge. Could you please guide me through the steps?
not urgent,0.5001460909843445,5923,0,284,"I'm having an issue with the LG Smart TV. Please assist. The first product with the product_purchased key is a simple array of items such as the product name, age and the product description. I've tried different settings and configurations on my LG Smart TV, but the issue persists."
urgent,0.5001549124717712,5873,0,287,"I've encountered a data loss issue with my Fitbit Versa Smartwatch. All the files and documents seem to have disappeared. Can you guide me on how to retrieve them? [07/Aug/2016 04:18:39 PMiability: I've already contacted customer support multiple times, but the issue remains unresolved."
not urgent,0.5002934336662292,470,0,299,"I'm having an issue with the GoPro Hero. Please assist. In addition to those steps, I can use a combination of these to find the item you wish to receive. Please contact me with a list or a screenshot if you I've checked for software updates, and my GoPro Hero is already running the latest version."
urgent,0.5002939105033875,1752,0,262,I'm having an issue with the Microsoft surface. Please assist. I'm having an issue with the Microsoft surface. Please assist. I'm having an issue with the Microsoft surface. I've checked the device settings and made sure that everything is configured correctly.
not urgent,0.5003679990768433,6800,0,281,"I'm having an issue with the Xbox. Please assist. 3. Remove the product from my account and start the ""Clean Up"" button in the bottom left corner of the screen. The page will tell you which I'm concerned about the security of my Xbox and would like to ensure that my data is safe."
urgent,0.5005078315734863,2773,0,306,"I'm having an issue with the Amazon Echo. Please assist. The product can't be ordered before October 28th, 2016 at 07:00AM (EST). Selling to Europe: Belgium Categories Disc I've recently updated the firmware of my Amazon Echo, and the issue started happening afterward. Could it be related to the update?"
not urgent,0.5005123019218445,2368,0,258,"I'm having an issue with the Amazon Echo. Please assist. Note: You can use the app's app-name to call other app owners. The app can also ask you for your password. I've checked for software updates, and my Amazon Echo is already running the latest version."
urgent,0.5005147457122803,7276,0,252,"I'm having an issue with the Nintendo Switch. Please assist. I'm not having an issue with the Nintendo Switch. Please assist. Last edited by takyo; 08-19-2015 at 14: I've tried troubleshooting steps mentioned in the user manual, but the issue persists."
not urgent,0.5005958676338196,4895,0,310,"I'm having an issue with the Lenovo ThinkPad. Please assist. Sorry, this product is no longer available. We have an additional promotional code, available for purchase. It is offered for the first year, and I rely heavily on my Lenovo ThinkPad for my daily tasks, and this issue is hindering my productivity."
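
The lowest rank-1 scores above sit just over 0.5, which for two topics is close to a coin flip, so these classifications deserve less trust than the ones near 0.99. A minimal sketch of flagging such borderline results follows. It assumes that the scores of the configured topics sum to 1 (suggested by the results, but not confirmed by the source). The `margin` of 0.05 is an arbitrary illustrative threshold, and `TopicResult` and `is_ambiguous` are hypothetical names.

```python
from typing import NamedTuple


class TopicResult(NamedTuple):
    doc_id: int
    topic: str
    score: float


def is_ambiguous(result: TopicResult, num_topics: int = 2, margin: float = 0.05) -> bool:
    """Flag a top-ranked topic whose score is barely above chance level."""
    chance_level = 1.0 / num_topics
    return result.score - chance_level < margin


# Three rows copied from the query results above.
results = [
    TopicResult(4828, "urgent", 0.9878994226455688),
    TopicResult(1891, "not urgent", 0.500121533870697),
    TopicResult(5873, "urgent", 0.5001549124717712),
]

needs_review = [r.doc_id for r in results if is_ambiguous(r)]
print(needs_review)  # [1891, 5873]
```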


In [46]:
%%sql
SELECT e.ENTITY_DOC_ID, e.ENTITY_CHAR_BEGIN, e.ENTITY_CHAR_END, e.ENTITY_TYPE, e.ENTITY, e.ENTITY_SCORE, d.TEXT
FROM {{schema}}.NAMED_ENTITY_VIEW as e JOIN {{schema}}.DOCUMENTS as d ON d.TEXT_DOC_ID = e.TEXT_DOC_ID

entity_doc_id,entity_char_begin,entity_char_end,entity_type,entity,entity_score,TEXT
2604,,,,,,I'm having an issue with the eNst Thermostat. Please assist. I need assistance as soon as possible because it's affecting my work and productivity.
5217,75.0,84.0,person_other,Lidia-Ann,0.9261261820793152,I'm having an issue with the Garmin Forerunner. Please assist. My name is Lidia-Ann and I am currently a licensed clinical pharmacist using a combination of prescription medicine (Vicovir and Prozac) I've checked the device settings and made sure that everything is configured correctly.
5217,181.0,188.0,other_medical,Vicovir,0.855757474899292,I'm having an issue with the Garmin Forerunner. Please assist. My name is Lidia-Ann and I am currently a licensed clinical pharmacist using a combination of prescription medicine (Vicovir and Prozac) I've checked the device settings and made sure that everything is configured correctly.
5217,193.0,199.0,other_medical,Prozac,0.8536580801010132,I'm having an issue with the Garmin Forerunner. Please assist. My name is Lidia-Ann and I am currently a licensed clinical pharmacist using a combination of prescription medicine (Vicovir and Prozac) I've checked the device settings and made sure that everything is configured correctly.
5218,29.0,59.0,product_other,Nintendo Switch Pro Controller,0.9500516653060912,"I'm having an issue with the Nintendo Switch Pro Controller. Please assist. Thank you. It's a lot of work to get the best bang for your buck, but this is what we did for you. If you need help I'm concerned about the security of my Nintendo Switch Pro Controller and would like to ensure that my data is safe."
5218,232.0,262.0,product_other,Nintendo Switch Pro Controller,0.9521684050559998,"I'm having an issue with the Nintendo Switch Pro Controller. Please assist. Thank you. It's a lot of work to get the best bang for your buck, but this is what we did for you. If you need help I'm concerned about the security of my Nintendo Switch Pro Controller and would like to ensure that my data is safe."
5219,29.0,40.0,product_other,Sony Xperia,0.915447235107422,"I'm having an issue with the Sony Xperia. Please assist. Thanks. Rated 1 out of 5 by Fartfan from Not For The Money It doesn't work. It can't be used on a bag, bag I'm not sure if this issue is specific to my device or if others have reported similar problems."
5219,86.0,93.0,person_other,Fartfan,0.7744524478912354,"I'm having an issue with the Sony Xperia. Please assist. Thanks. Rated 1 out of 5 by Fartfan from Not For The Money It doesn't work. It can't be used on a bag, bag I'm not sure if this issue is specific to my device or if others have reported similar problems."
5220,43.0,53.0,product_other,GoPro Hero,0.9142147302627563,"I've encountered a data loss issue with my GoPro Hero. All the files and documents seem to have disappeared. Can you guide me on how to retrieve them? I've encountered a data loss issue with my website after updating The issue I'm facing is intermittent. Sometimes it works fine, but other times it acts up unexpectedly."
5221,34.0,57.0,product_other,Fitbit Versa Smartwatch,0.7948973178863525,"I've forgotten my password for my Fitbit Versa Smartwatch account, and the password reset option is not working. How can I recover my account? To resolve this, you will have to restart your app and start again. After I'm experiencing this issue on multiple devices of the same model, so it seems to be a widespread problem."
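
Judging by the rows above, `ENTITY_CHAR_BEGIN` and `ENTITY_CHAR_END` appear to be character offsets into `TEXT`, with the end offset exclusive. The sketch below checks this on two of the rows. The texts are shortened to their first sentences for readability, and `entity_span` is a hypothetical helper.

```python
def entity_span(text: str, char_begin: float, char_end: float) -> str:
    """Slice the entity out of its document text.

    The offsets are shown as floats in the result set above, so they are cast to int.
    """
    return text[int(char_begin):int(char_end)]


# (doc_id, char_begin, char_end, entity, shortened text) taken from the results above.
rows = [
    (5218, 29.0, 59.0, "Nintendo Switch Pro Controller",
     "I'm having an issue with the Nintendo Switch Pro Controller. Please assist."),
    (5219, 29.0, 40.0, "Sony Xperia",
     "I'm having an issue with the Sony Xperia. Please assist."),
]

mismatches = [doc_id for doc_id, begin, end, entity, text in rows
              if entity_span(text, begin, end) != entity]
print(mismatches)  # []
```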


In [47]:
%%sql
SELECT
    k.KEYWORD_DOC_ID, k.KEYWORD_CHAR_BEGIN, k.KEYWORD_CHAR_END, 
    k.KEYWORD, k.KEYWORD_SCORE,
    d.TEXT
FROM {{schema}}.KEYWORD_SEARCH_VIEW as k
JOIN {{schema}}.DOCUMENTS as d
ON d.TEXT_DOC_ID = k.TEXT_DOC_ID
ORDER BY k.KEYWORD_DOC_ID, k.KEYWORD_SCORE DESC

keyword_doc_id,keyword_char_begin,keyword_char_end,keyword,keyword_score,TEXT
1,54,64,fi network,0.7114,"I'm having trouble connecting my iPhone to my home Wi-Fi network. It doesn't detect any networks, although other devices are connecting fine. What can be done to resolve this issue? The Wi-Fi The issue I'm facing is intermittent. Sometimes it works fine, but other times it acts up unexpectedly."
1,11,18,trouble,0.6919,"I'm having trouble connecting my iPhone to my home Wi-Fi network. It doesn't detect any networks, although other devices are connecting fine. What can be done to resolve this issue? The Wi-Fi The issue I'm facing is intermittent. Sometimes it works fine, but other times it acts up unexpectedly."
1,33,39,iphone,0.6882,"I'm having trouble connecting my iPhone to my home Wi-Fi network. It doesn't detect any networks, although other devices are connecting fine. What can be done to resolve this issue? The Wi-Fi The issue I'm facing is intermittent. Sometimes it works fine, but other times it acts up unexpectedly."
1,88,96,networks,0.6825,"I'm having trouble connecting my iPhone to my home Wi-Fi network. It doesn't detect any networks, although other devices are connecting fine. What can be done to resolve this issue? The Wi-Fi The issue I'm facing is intermittent. Sometimes it works fine, but other times it acts up unexpectedly."
1,260,271,other times,0.6698,"I'm having trouble connecting my iPhone to my home Wi-Fi network. It doesn't detect any networks, although other devices are connecting fine. What can be done to resolve this issue? The Wi-Fi The issue I'm facing is intermittent. Sometimes it works fine, but other times it acts up unexpectedly."
2,117,138,| order_total=3540.00,0.867,I'm having an issue with the Canon DSLR Camear. Please assist. Please try again later. | product_purchased=product | order_total=3540.00 The result is that I I need assistance as soon as possible because it's affecting my work and productivity.
2,234,246,productivity,0.7713,I'm having an issue with the Canon DSLR Camear. Please assist. Please try again later. | product_purchased=product | order_total=3540.00 The result is that I I need assistance as soon as possible because it's affecting my work and productivity.
2,29,34,canon,0.7544,I'm having an issue with the Canon DSLR Camear. Please assist. Please try again later. | product_purchased=product | order_total=3540.00 The result is that I I need assistance as soon as possible because it's affecting my work and productivity.
2,40,46,camear,0.735,I'm having an issue with the Canon DSLR Camear. Please assist. Please try again later. | product_purchased=product | order_total=3540.00 The result is that I I need assistance as soon as possible because it's affecting my work and productivity.
2,108,115,product,0.7173,I'm having an issue with the Canon DSLR Camear. Please assist. Please try again later. | product_purchased=product | order_total=3540.00 The result is that I I need assistance as soon as possible because it's affecting my work and productivity.
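
Some of the keywords above are debris from the generated ticket texts, such as `| order_total=3540.00`. As a hedged illustration of post-processing these results on the client side, the sketch below keeps the top keywords per document and drops entries that look like key=value fragments. The `looks_like_noise` heuristic is hypothetical and deliberately crude. It is not part of Exasol Text AI.

```python
from collections import defaultdict

# (keyword_doc_id, keyword, keyword_score) copied from the results above.
rows = [
    (1, "fi network", 0.7114),
    (1, "trouble", 0.6919),
    (1, "iphone", 0.6882),
    (2, "| order_total=3540.00", 0.867),
    (2, "productivity", 0.7713),
    (2, "canon", 0.7544),
]


def looks_like_noise(keyword: str) -> bool:
    """Hypothetical heuristic: treat key=value or pipe-separated debris as noise."""
    return "=" in keyword or "|" in keyword


def top_keywords(rows, n: int = 2) -> dict:
    """Return the n highest-scoring non-noise keywords per document."""
    per_doc = defaultdict(list)
    for doc_id, keyword, score in rows:
        if not looks_like_noise(keyword):
            per_doc[doc_id].append((keyword, score))
    return {
        doc_id: [kw for kw, _ in sorted(items, key=lambda item: item[1], reverse=True)[:n]]
        for doc_id, items in per_doc.items()
    }


best = top_keywords(rows)
print(best)  # {1: ['fi network', 'trouble'], 2: ['productivity', 'canon']}
```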
