# MySQL AI
MySQL AI lets developers build rich applications on MySQL using built-in machine learning, GenAI, LLMs, and semantic search. Developers can create vectors from documents stored in a local file system. Customers can deploy these AI applications on premises or migrate them to MySQL HeatWave for lower cost, higher performance, richer functionality, and the latest LLMs, with no change to their application. This gives developers the flexibility to build their applications on MySQL EE and then deploy them either on premises or in the cloud.

This notebook will showcase the use of [ML_RAG](https://dev.mysql.com/doc/heatwave/en/mys-hwgenai-ml-rag.html) for Retrieval Augmented Generation (RAG) and [HEATWAVE_CHAT](https://dev.mysql.com/doc/heatwave/en/mys-hwgenai-hw-chat.html) for engaging in natural language interactions using data from the 2024 Olympic Games.

### References
- https://blogs.oracle.com/mysql/post/announcing-mysql-ai
- https://dev.mysql.com/doc/mysql-ai/9.4/en/
- https://dev.mysql.com/doc/dev/mysql-studio/latest/#overview
- https://www.economicsobservatory.com/what-happened-at-the-2024-olympics
- https://en.wikipedia.org/wiki/2024_Summer_Olympics

### Prerequisites

- mysql-connector-python
- pandas 

##### Import Python packages

In [1]:
# import Python packages
import os
import json
import mysql.connector
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

### Connect to the MySQL AI instance
We create a connection to an active MySQL AI instance using [MySQL Connector/Python](https://dev.mysql.com/doc/connector-python/en/). We also define a helper function that executes a SQL query through a cursor and returns the result as a Pandas DataFrame. Modify the variables below to point to your MySQL AI instance.

 - In MySQL Studio, connections only allow `localhost` as the host.
 - In MySQL Studio, the only accepted password values are the string `'unused'` or `None`.

In [None]:
HOST = 'localhost'
PORT = 3306
USER = 'root'
PASSWORD = 'unused'
DATABASE = 'mlcorpus'


myconn = mysql.connector.connect(
    host=HOST,
    port=PORT,
    user=USER,
    password=PASSWORD,
    database=DATABASE,
    allow_local_infile=True,
    use_pure=True,
    autocommit=True,
)
mycursor = myconn.cursor()


# Helper function to execute SQL queries and return the results as a Pandas DataFrame
def execute_sql(sql: str) -> pd.DataFrame:
    mycursor.execute(sql)
    return pd.DataFrame(mycursor.fetchall(), columns=mycursor.column_names)
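
Outside MySQL Studio, you may prefer not to hard-code the connection settings in the notebook. The following is a minimal sketch that reads them from environment variables and falls back to the defaults used above. The `MYSQL_AI_*` variable names and the `connection_settings` helper are hypothetical and chosen only for illustration.

```python
import os


def connection_settings(env=os.environ) -> dict:
    """Build connection keyword arguments from environment variables.

    The MYSQL_AI_* names are hypothetical; the defaults mirror the
    values used in the cell above.
    """
    return {
        "host": env.get("MYSQL_AI_HOST", "localhost"),
        "port": int(env.get("MYSQL_AI_PORT", "3306")),
        "user": env.get("MYSQL_AI_USER", "root"),
        "password": env.get("MYSQL_AI_PASSWORD", "unused"),
        "database": env.get("MYSQL_AI_DATABASE", "mlcorpus"),
    }
```

The resulting dictionary can be unpacked into the connect call, for example `mysql.connector.connect(**connection_settings(), allow_local_infile=True, use_pure=True, autocommit=True)`.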

# ML_RAG operation

The available LLMs are trained on public datasets, so they have no knowledge of your proprietary data. With MySQL AI, you can generate content based on that data through Retrieval Augmented Generation (RAG). MySQL AI converts your data into embeddings and stores them in a vector store, and these embeddings are then used to provide your enterprise's specific context to the LLMs.
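
To make the retrieval step concrete, here is a toy sketch of the general idea. It is not how MySQL AI implements it internally: the three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions), and `cosine_similarity` and `top_k` are hypothetical helpers. The sketch ranks stored text segments by similarity to a query vector and places the best match into the prompt as context.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the k segments whose embeddings are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy vector store: (segment text, made-up embedding)
store = [
    ("The Games were held in France.", [0.9, 0.1, 0.0]),
    ("Breaking made its Olympic debut.", [0.1, 0.9, 0.0]),
    ("The closing date was August 11, 2024.", [0.0, 0.2, 0.9]),
]

# Made-up embedding for the question "Where were the Games held?"
query_vec = [1.0, 0.0, 0.1]

context = top_k(query_vec, store, k=1)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: Where were the Games held?"
print(prompt)
```

In the notebook, embedding, storage, retrieval, and prompt assembly are all handled by the `VECTOR_STORE_LOAD` and `ML_RAG` routines used below.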

In [3]:
# Drop the vector_store_data_1 table if it exists
execute_sql("""DROP TABLE IF EXISTS mlcorpus.vector_store_data_1;""")

### Use VECTOR_STORE_LOAD to create a vector store from the files that contain the proprietary data.

To do that, first copy the files into the `/var/lib/mysql-files` folder.

Example: `sudo cp /home/john_doe/2024_Summer_Olympics_Wikipedia.pdf /var/lib/mysql-files`

In [4]:
df = execute_sql("""CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/2024_Summer_Olympics_Wikipedia.pdf', JSON_OBJECT("schema_name","mlcorpus","table_name","vector_store_data_1"))""")

In [0]:
# Run the status query returned by VECTOR_STORE_LOAD and report the load progress
df_status = execute_sql(df['task_status_query'][0])
print(f"Progress:{json.loads(df_status.iloc[0,0])['progress']}%")

Progress:100%
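
Loading can take a while for large documents, so a single status check may report less than 100%. The following is a minimal sketch of a polling loop, assuming only what the cell above shows: the status query returns a JSON string with a `progress` field. The `wait_for_load` function and its `fetch_status` callable are hypothetical names introduced for illustration.

```python
import json
import time
from typing import Callable


def wait_for_load(fetch_status: Callable[[], str],
                  poll_seconds: float = 5.0,
                  timeout_seconds: float = 600.0) -> int:
    """Poll a task-status JSON string until its progress reaches 100.

    fetch_status is a hypothetical callable that returns the JSON text
    produced by the task status query (the value read with iloc[0,0] above).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        progress = int(json.loads(fetch_status())["progress"])
        if progress >= 100:
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"load still at {progress}% after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

In this notebook, it could be called as `wait_for_load(lambda: execute_sql(df['task_status_query'][0]).iloc[0, 0])`, where `df` is the result of the `VECTOR_STORE_LOAD` call.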


Invoke the [ML_RAG](https://dev.mysql.com/doc/heatwave/en/mys-hwgenai-ml-rag.html) procedure with your query.

In [24]:
execute_sql("""CALL sys.ML_RAG("Where were the 2024 Summer Olympics held?", @output, NULL);""")
df = execute_sql("""SELECT JSON_PRETTY(@output);""")
json.loads(df.iat[0,0])["text"]

'The 2024 Summer Olympics were held in France.'

In [25]:
execute_sql("""CALL sys.ML_RAG("What were the opening and closing dates of the Games?", @output, NULL);""")
df = execute_sql("""SELECT JSON_PRETTY(@output);""")
json.loads(df.iat[0,0])["text"]

'The opening date of the 2024 Summer Olympics was July 26, 2024, and the closing date was August 11, 2024.'

In [26]:
execute_sql("""CALL sys.ML_RAG("Which sports made their Olympic debut in 2024?", @output, NULL);""")
df = execute_sql("""SELECT JSON_PRETTY(@output);""")
json.loads(df.iat[0,0])["text"]

'According to the text, one sport that made its Olympic debut in 2024 is "breaking" (also known as breakdancing).'
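
The three cells above repeat the same call, read, and parse steps, and they embed the question directly in the SQL text, which breaks if the question contains a double quote. The following is a minimal sketch of a reusable wrapper that binds the question as a query parameter instead. The `ask_rag` name and the `options_json` argument are hypothetical. The sketch assumes a DB-API style cursor such as `mycursor`, and that the options can be supplied as a JSON string cast with `CAST(... AS JSON)`, which you should verify against your MySQL AI version.

```python
import json


def ask_rag(cursor, question: str, options_json=None) -> dict:
    """Run sys.ML_RAG for one question and return the parsed @output JSON.

    cursor is any DB-API style cursor (for example mycursor from above).
    options_json is an optional JSON string passed as the third argument;
    None is sent as NULL, matching the cells above.
    """
    cursor.execute(
        "CALL sys.ML_RAG(%s, @output, CAST(%s AS JSON))",
        (question, options_json),
    )
    # Drain any rows left by the CALL before issuing the next statement.
    if getattr(cursor, "with_rows", False):
        cursor.fetchall()
    cursor.execute("SELECT JSON_PRETTY(@output)")
    row = cursor.fetchone()
    return json.loads(row[0])
```

With this helper, a cell reduces to `ask_rag(mycursor, "Where were the 2024 Summer Olympics held?")["text"]`. The returned dictionary holds the full output object, so any fields besides `text` can be inspected as well.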

# HEATWAVE_CHAT operation

Thanks to native support for vector stores and Retrieval Augmented Generation (RAG), you can load and query unstructured documents using natural language within the MySQL AI ecosystem.
This section briefly demonstrates HeatWave Chat, a built-in chatbot that uses LLMs to understand and respond to your inputs.

Use VECTOR_STORE_LOAD to create a vector store from the files that contain the proprietary data (see the previous section), then use the [HEATWAVE_CHAT](https://dev.mysql.com/doc/heatwave/en/mys-hwgenai-hw-chat.html) procedure to start a conversation with HeatWave.

In [27]:
df = execute_sql("""CALL sys.HEATWAVE_CHAT('What were the opening and closing dates of the Olympic Games?')""")
df["response"].iloc[0]

'The opening date of the 2024 Summer Olympics was July 26, 2024, and the closing date was August 11, 2024.'

In [0]:
df = execute_sql("""CALL sys.HEATWAVE_CHAT('Where was the location?')""")
df["response"].iloc[0]

'The location of the 2024 Summer Olympics was France.'
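
The follow-up question above only makes sense because the chat keeps the conversation context within the session. The following is a minimal sketch of two helpers, one to send a message and one to start a fresh conversation. It assumes that the chat state lives in the `@chat_options` session variable, as described in the HeatWave GenAI documentation; please verify this for your MySQL AI version. The `chat` and `reset_chat` names are hypothetical.

```python
def chat(cursor, message: str) -> str:
    """Send one message to sys.HEATWAVE_CHAT and return the response text.

    cursor is any DB-API style cursor exposing column_names
    (for example mycursor from above).
    """
    cursor.execute("CALL sys.HEATWAVE_CHAT(%s)", (message,))
    rows = cursor.fetchall()
    # Look up the response column by name, as the cells above do.
    first_row = dict(zip(cursor.column_names, rows[0]))
    return first_row["response"]


def reset_chat(cursor) -> None:
    """Clear the chat state so the next message starts a new conversation.

    Assumption: the state is held in the @chat_options session variable.
    """
    cursor.execute("SET @chat_options = NULL")
```

Calling `reset_chat(mycursor)` before an unrelated question avoids carrying over context from the Olympics conversation.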