# HeatWave GenAI

HeatWave GenAI is the industry's first automated in-database generative AI service. It integrates large language models (LLMs) and embedding generation within the database, letting you generate new and realistic content, speed up manual or repetitive tasks such as summarizing large documents, perform retrieval-augmented generation (RAG), and engage in natural-language interactions. Refer to https://www.oracle.com/heatwave/genai for further details on HeatWave GenAI.

This notebook demonstrates how to use [ML_GENERATE](https://dev.mysql.com/doc/heatwave/en/mys-hwgenai-ml-generate.html) for content generation, using data from the 2024 Olympic Games.

### References
- https://www.economicsobservatory.com/what-happened-at-the-2024-olympics
- https://en.wikipedia.org/wiki/2024_Summer_Olympics

### Prerequisites
Install the following packages:

- mysql-connector-python
- pandas

##### Import Python packages

In [1]:
# import Python packages
import time
import json
import numpy as np
import pandas as pd
import mysql.connector
from mysql.connector.errors import OperationalError, InterfaceError

### Connect to the HeatWave instance
We create a connection to an active [HeatWave](https://www.oracle.com/mysql/) instance using [MySQL Connector/Python](https://dev.mysql.com/doc/connector-python/en/), and define a helper function, `execute_sql`, that runs a SQL query through a cursor, reconnects once if the connection has dropped, and returns the result as a pandas DataFrame. Modify the variables below to point to your HeatWave instance. On AWS, set `USE_BASTION` to `False`. On OCI, create an SSH tunnel through the bastion host on your machine with the following command, substituting each variable with its value:

ssh -o ServerAliveInterval=60 -i BASTION_PKEY -L LOCAL_PORT:DBSYSTEM_IP:DBSYSTEM_PORT BASTION_USER@BASTION_IP
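If you want to generate the tunnel command from the same values you set in the next cell, the sketch below assembles it in Python. `tunnel_command` is a hypothetical helper, not part of the original notebook; it only builds the argument list with the same flags as the command above and does not start ssh. The IP addresses and key file name in the example call are placeholders.

```python
import shlex
from typing import List


def tunnel_command(bastion_ip: str, bastion_user: str, bastion_pkey: str,
                   dbsystem_ip: str, dbsystem_port: int, local_port: int) -> List[str]:
    """Build the ssh argument list for a local port forward through the bastion host.

    Hypothetical helper: it mirrors the ssh command above and does not run it.
    """
    return [
        "ssh",
        "-o", "ServerAliveInterval=60",                        # keep the connection alive while idle
        "-i", bastion_pkey,                                    # private key for the bastion host
        "-L", f"{local_port}:{dbsystem_ip}:{dbsystem_port}",   # local port -> DB system
        f"{bastion_user}@{bastion_ip}",
    ]


# Placeholder values for illustration only
cmd = tunnel_command("203.0.113.10", "opc", "bastion_key.pem", "10.0.0.5", 3306, 3306)
print(shlex.join(cmd))
# ssh -o ServerAliveInterval=60 -i bastion_key.pem -L 3306:10.0.0.5:3306 opc@203.0.113.10
```

`shlex.join` (Python 3.8+) quotes any argument that needs it, so the printed line can be pasted into a terminal as-is.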

In [None]:
# Connection settings: replace the placeholder values with your own
BASTION_IP = "ip_address"
BASTION_USER = "opc"
BASTION_PKEY = "private_key_file"
DBSYSTEM_IP = "ip_address"
DBSYSTEM_PORT = 3306
DBSYSTEM_USER = "username"
DBSYSTEM_PASSWORD = "password"
DBSYSTEM_SCHEMA = "mlcorpus"
LOCAL_PORT = 3306
USE_BASTION = True

if USE_BASTION:
    # Connect through the local end of the SSH tunnel
    DBSYSTEM_IP = "127.0.0.1"
else:
    # Connect directly to the DB system
    LOCAL_PORT = DBSYSTEM_PORT

mydb = None  # global handle we keep fresh

def _get_conn():
    """Return a live MySQL connection, recreating it if needed."""
    global mydb
    if mydb is None or not mydb.is_connected():
        try:
            if mydb:
                mydb.close()
        except Exception:
            pass
        mydb = mysql.connector.connect(
            host=DBSYSTEM_IP,
            port=LOCAL_PORT,
            user=DBSYSTEM_USER,
            password=DBSYSTEM_PASSWORD,
            database=DBSYSTEM_SCHEMA,
            autocommit=True,
            connection_timeout=10,
        )
    return mydb

# Helper function to execute SQL queries and return the results as a Pandas DataFrame
def execute_sql(sql: str, _retry: bool = True) -> pd.DataFrame:
    """
    Execute SQL and return a DataFrame (empty for DDL/DML without a result set).
    Ensures the connection is alive; retries once on OperationalError or InterfaceError.
    """
    global mydb
    conn = _get_conn()
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            if cur.description is None:
                return pd.DataFrame()
            rows = cur.fetchall()
            cols = [d[0] for d in cur.description]
            return pd.DataFrame(rows, columns=cols)
    except (OperationalError, InterfaceError):
        if _retry:
            # Discard the stale connection and retry once with a fresh one
            time.sleep(0.5)
            try:
                conn.close()
            except Exception:
                pass
            mydb = None
            return execute_sql(sql, _retry=False)
        raise


# ML_GENERATE operation

You can perform content generation using the [ML_GENERATE](https://dev.mysql.com/doc/heatwave/en/mys-hwgenai-ml-generate.html) function.

### Content generation

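Call the [ML_GENERATE](https://dev.mysql.com/doc/heatwave/en/mys-hwgenai-ml-generate.html) function with a natural-language prompt to query the LLM. It returns a JSON document, and the generated response is in its `text` field.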

In [3]:
df = execute_sql("""SELECT JSON_PRETTY(sys.ML_GENERATE("In which year were the first modern Olympic Games held?", NULL));""")
json.loads(df.iat[0,0])["text"]

'The first modern Olympic Games were held in 1896, in Athens, Greece. They were organized by the International Olympic Committee (IOC) and took place from April 6 to April 15, 1896.'
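The cell above embeds the prompt directly in a SQL string literal, which breaks as soon as the prompt itself contains a double quote. A minimal sketch of an alternative, assuming MySQL Connector/Python's `%s` parameter binding: `ml_generate` is a hypothetical helper that sends the prompt and the options as bound parameters and extracts the `text` field from the returned JSON document. Passing `None` for the options binds SQL `NULL`, matching the `NULL` argument used above.

```python
import json
from typing import Any, Dict, Optional


def ml_generate(conn, prompt: str, options: Optional[Dict[str, Any]] = None) -> str:
    """Call sys.ML_GENERATE with bound parameters and return the generated text.

    Hypothetical helper; `conn` is assumed to be an open MySQL Connector/Python connection.
    """
    sql = "SELECT sys.ML_GENERATE(%s, CAST(%s AS JSON))"
    # json.dumps escapes quotes and other special characters in the options
    options_json = json.dumps(options) if options is not None else None
    cur = conn.cursor()
    try:
        cur.execute(sql, (prompt, options_json))
        (result,) = cur.fetchone()
    finally:
        cur.close()
    # The JSON value may arrive as str or bytes; json.loads accepts both
    return json.loads(result)["text"]
```

For example, `ml_generate(_get_conn(), "In which year were the first modern Olympic Games held?")` should return the same kind of answer as the cell above, without `JSON_PRETTY`, which only affects formatting.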

In [4]:
df = execute_sql("""SELECT JSON_PRETTY(sys.ML_GENERATE("How many Olympians participated in the 2024 Games?", NULL));""")
json.loads(df.iat[0,0])["text"]

"I don't have information about the number of Olympians who will participate in the 2024 Games, as my training data is current up to 2021 and I do not have real-time access to future events. However, I can suggest checking the official Olympic website or other reliable news sources for the most up-to-date information on the participants of the 2024 Paris Olympics."

The model cannot answer this question because its training data predates the 2024 Games.

Let's supply up-to-date information through the `context` option, passed as a JSON object in the second argument of ML_GENERATE.
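On the Python side, the `JSON_OBJECT("context", ...)` argument used in the next cell is a one-key JSON document. The sketch below assumes you would rather assemble that document client-side, for example to bind it as a query parameter instead of inlining it in SQL. `context_options` is a hypothetical helper; `json.dumps` takes care of escaping quotes in the passage.

```python
import json


def context_options(context: str) -> str:
    """Serialize a context passage as the options document {"context": ...}.

    Hypothetical helper for illustration. json.dumps escapes quotes and
    backslashes; ensure_ascii=False keeps characters such as the en dash
    as-is rather than converting them to escape sequences.
    """
    return json.dumps({"context": context}, ensure_ascii=False)


# Passage taken from the next cell
passage = (
    "Paris has been the host for the 2024 Olympic and Paralympic Games. "
    "Over the course of the summer, the city – and other venues across France – "
    "have welcomed 10,500 Olympians (competing in 32 sports) and over 4,000 "
    "Paralympians (competing in 22 sports)."
)

options_json = context_options(passage)
print(options_json)
```

Bound as a parameter and converted server-side with `CAST(%s AS JSON)`, this string yields the same document as `JSON_OBJECT("context", ...)`.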

In [5]:
df = execute_sql("""SELECT JSON_PRETTY(sys.ML_GENERATE("How many Olympians participated in the 2024 Games?", JSON_OBJECT("context", "Paris has been the host for the 2024 Olympic and Paralympic Games. Over the course of the summer, the city – and other venues across France – have welcomed 10,500 Olympians (competing in 32 sports) and over 4,000 Paralympians (competing in 22 sports).")));""")
json.loads(df.iat[0,0])["text"]

'According to the text, there were 10,500 Olympians who competed in the 2024 Olympic Games.'