
# SQL query from table names - Continued

In [9]:
!pip install openai



In [10]:
from google.colab import userdata

# Read the OpenAI key stored in Colab's secrets under the name 'chat_gpt'.
OPENAI_API_KEY = userdata.get('chat_gpt')

## The Old Prompt

In [11]:
# The old prompt
old_context = [ {'role':'system', 'content':"""
You are a bot that assists in creating SQL commands. All your answers should start with \
"This is your SQL", and after that an SQL query that does what the user requests. \
Your database is an SQL database with some tables. \
Try to keep the SQL query simple.
Put the SQL command in white letters with a black background, and just after it \
a simple and concise text explaining how it works.
If the user asks for something that cannot be solved with an SQL query, \
just answer something nice and simple, maximum 10 words, asking them for something that \
can be solved with SQL.
"""} ]

old_context.append( {'role':'system', 'content':"""
first table:
{
  "tableName": "employees",
  "fields": [
    {
      "name": "ID_usr",
      "type": "int"
    },
    {
      "name": "name",
      "type": "varchar"
    }
  ]
}
"""
})

old_context.append( {'role':'system', 'content':"""
second table:
{
  "tableName": "salary",
  "fields": [
    {
      "name": "ID_usr",
      "type": "int"
    },
    {
      "name": "year",
      "type": "date"
    },
    {
      "name": "salary",
      "type": "float"
    }
  ]
}
"""
})

old_context.append( {'role':'system', 'content':"""
third table:
{
  "tableName": "studies",
  "fields": [
    {
      "name": "ID",
      "type": "int"
    },
    {
      "name": "ID_usr",
      "type": "int"
    },
    {
      "name": "educational_level",
      "type": "int"
    },
    {
      "name": "Institution",
      "type": "varchar"
    },
    {
      "name": "Years",
      "type": "date"
    },
    {
      "name": "Speciality",
      "type": "varchar"
    }
  ]
}
"""
})
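Hand-written JSON inside a prompt is easy to get wrong: key names drift between spellings or languages, and one missing comma between two field objects makes the whole description invalid JSON. A minimal sketch, assuming you build the old-style table descriptions in Python, is to generate the JSON with `json.dumps` so it is always well-formed; `table_json` is a hypothetical helper, not part of the original notebook.

```python
import json

# Hypothetical helper: describe one table as JSON with consistent key names.
def table_json(table_name, fields):
    """fields is a list of (name, type) tuples."""
    description = {
        "tableName": table_name,
        "fields": [{"name": name, "type": type_} for name, type_ in fields],
    }
    return json.dumps(description, indent=2)

studies_json = table_json("studies", [
    ("ID", "int"),
    ("ID_usr", "int"),
    ("educational_level", "int"),
    ("Institution", "varchar"),
    ("Years", "date"),
    ("Speciality", "varchar"),
])

# Same message shape as the old_context entries above.
message = {'role': 'system', 'content': "third table:\n" + studies_json}
print(message['content'])
```

Because the text comes from `json.dumps`, it always parses back with `json.loads`, which makes a quick sanity check before sending it to the model.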

## New Prompt
We are going to improve it following the instructions of a paper from The Ohio State University: [How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings](https://arxiv.org/abs/2305.11853). I recommend reading it.

For each table, we define the structure using the same syntax as an SQL `CREATE TABLE` statement, and add some sample rows of its content.
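If the schema already lives in a SQLite database, this `CREATE TABLE` plus sample-rows text can be generated instead of typed by hand. A minimal sketch: `build_schema_prompt` is a hypothetical helper, and the INSERT-style sample rows simply copy the layout of the prompt cell below rather than any exact format from the paper.

```python
import sqlite3

# Hypothetical helper: render each table's CREATE TABLE statement followed by up
# to `n_rows` sample rows as INSERT statements, mirroring the new prompt's layout.
def build_schema_prompt(conn, n_rows=2):
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    parts = ["-- Table Definitions"]
    for _, create_sql in tables:
        parts.append(create_sql + ";\n")
    parts.append("-- Sample Data")
    for name, _ in tables:
        rows = conn.execute(f"SELECT * FROM {name} LIMIT ?", (n_rows,)).fetchall()
        for row in rows:
            # repr() quotes strings with single quotes, which is enough for this
            # sample data but is not a general-purpose SQL literal encoder.
            values = ", ".join(repr(v) for v in row)
            parts.append(f"INSERT INTO {name} VALUES ({values});")
    return "\n".join(parts)

# Usage with the two employees from the new prompt, keeping one sample row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id_usr INT PRIMARY KEY,
    name VARCHAR(100),
    surname VARCHAR(100)
);
INSERT INTO employees VALUES (1, 'John', 'Smith');
INSERT INTO employees VALUES (2, 'Mary', 'Johnson');
""")
schema_text = build_schema_prompt(conn, n_rows=1)
print(schema_text)
```

The resulting string can be used as the `content` of the first system message, and `n_rows` controls how much real data the model sees.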

Finally, at the end of the prompt, we include some example questions together with the SQL the model should generate for each. This technique is called few-shot prompting: the examples in the prompt help the model generate the correct SQL.
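Keeping the few-shot examples as data makes it easy to add, remove, or swap them between prompt versions. A minimal sketch; `format_few_shot` is a hypothetical helper that produces the same `Q:`/`A:` layout as the few-shot cell below.

```python
# Hypothetical helper: render (question, SQL) pairs in the Q:/A: layout.
def format_few_shot(instruction, examples):
    lines = [instruction, ""]
    for question, sql in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {sql}")
        lines.append("")  # blank line between examples
    return "\n".join(lines)

examples = [
    ("Find employees with a PhD degree",
     "SELECT e.name, e.surname, s.institution FROM employees e "
     "JOIN studies s ON e.id_usr = s.id_usr WHERE s.educational_level = 'PhD';"),
]

few_shot_message = {
    'role': 'system',
    'content': format_few_shot(
        "-- Using valid SQLite, answer the following questions for the tables above.",
        examples,
    ),
}
print(few_shot_message['content'])
```

The message can then be appended to the schema context with `context.append(few_shot_message)`.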


In [13]:
context = [({'role': 'system', 'content': """
-- Table Definitions
CREATE TABLE employees (
    id_usr INT PRIMARY KEY,
    name VARCHAR(100),
    surname VARCHAR(100)
);

CREATE TABLE salary (
    id_usr INT,
    year INT,
    date DATE,
    salary FLOAT,
    FOREIGN KEY (id_usr) REFERENCES employees(id_usr)
);

CREATE TABLE studies (
    id INT PRIMARY KEY,
    educational_level VARCHAR(50),
    institution VARCHAR(100),
    year INT,
    specialty VARCHAR(100),
    id_usr INT,
    FOREIGN KEY (id_usr) REFERENCES employees(id_usr)
);

-- Sample Data
INSERT INTO employees VALUES (1, 'John', 'Smith');
INSERT INTO employees VALUES (2, 'Mary', 'Johnson');

INSERT INTO salary VALUES (1, 2023, '2023-01-01', 75000.00);
INSERT INTO salary VALUES (2, 2023, '2023-01-01', 82000.00);

INSERT INTO studies VALUES (1, 'Masters', 'MIT', 2020, 'Computer Science', 1);
INSERT INTO studies VALUES (2, 'PhD', 'Stanford', 2019, 'Data Science', 2);
"""})]


In [14]:
# Few-shot samples
context.append({'role':'system', 'content':"""
-- Keep the SQL as simple and efficient as you can. Using valid SQLite, answer the following questions for the tables provided above.

Q: List all employees with their latest salary
A: SELECT e.name, e.surname, s.salary
   FROM employees e
   JOIN salary s ON e.id_usr = s.id_usr
   WHERE s.date = (SELECT MAX(date) FROM salary);

Q: Find employees with a PhD degree
A: SELECT e.name, e.surname, s.institution
   FROM employees e
   JOIN studies s ON e.id_usr = s.id_usr
   WHERE s.educational_level = 'PhD';
"""})
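The first few-shot answer compares each salary row against the latest date in the *whole* `salary` table, so it only returns employees whose most recent salary falls on that single date. With the two sample rows (both dated 2023-01-01) this does not show, so the check below adds a made-up 2024 raise for John to a small in-memory SQLite database. A correlated subquery is one way to get each employee's own latest salary; this is a suggested alternative, not the query used in the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id_usr INT PRIMARY KEY, name VARCHAR(100), surname VARCHAR(100));
CREATE TABLE salary (id_usr INT, year INT, date DATE, salary FLOAT);
INSERT INTO employees VALUES (1, 'John', 'Smith');
INSERT INTO employees VALUES (2, 'Mary', 'Johnson');
-- Made-up test data: John gets a 2024 raise, Mary's latest row is from 2023.
INSERT INTO salary VALUES (1, 2023, '2023-01-01', 75000.00);
INSERT INTO salary VALUES (1, 2024, '2024-01-01', 80000.00);
INSERT INTO salary VALUES (2, 2023, '2023-01-01', 82000.00);
""")

# Few-shot answer as written: compares against the latest date in the whole table.
global_max = conn.execute("""
    SELECT e.name, e.surname, s.salary
    FROM employees e
    JOIN salary s ON e.id_usr = s.id_usr
    WHERE s.date = (SELECT MAX(date) FROM salary)
""").fetchall()

# Correlated subquery: compares against each employee's own latest date.
per_employee = conn.execute("""
    SELECT e.name, e.surname, s.salary
    FROM employees e
    JOIN salary s ON e.id_usr = s.id_usr
    WHERE s.date = (SELECT MAX(s2.date) FROM salary s2 WHERE s2.id_usr = s.id_usr)
    ORDER BY e.id_usr
""").fetchall()

print(global_max)    # Mary is missing: her latest date is not the global latest
print(per_employee)  # one row per employee, each with their own latest salary
```

Because the model tends to copy few-shot answers closely (see the outputs below), a subtle bug in an example can propagate into generated queries.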

In [17]:
from openai import OpenAI  # openai is installed in the first cell

# Function to call the model.
def return_CCRMSQL(user_message, context):
    # The key comes from Colab secrets, not from the OPENAI_API_KEY
    # environment variable, so it has to be passed explicitly.
    client = OpenAI(api_key=OPENAI_API_KEY)

    # Work on a copy so the caller's context list is not modified.
    newcontext = context.copy()
    newcontext.append({'role': 'user', 'content': "question: " + user_message})

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=newcontext,
        temperature=0,  # low randomness makes prompt versions easier to compare
    )

    return response.choices[0].message.content
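The replies below come in different shapes: bare SQL, SQL inside a fenced `sql` Markdown block surrounded by extra text, or even a copy of the `Q:`/`A:` examples. Before running or comparing a reply, it helps to pull out just the SQL. A minimal sketch; `extract_sql` is a hypothetical helper that only handles the first fenced block or a bare reply.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the block readable

# Non-greedy match of the first fenced block, with an optional "sql" tag.
SQL_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:sql)?\s*(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)

def extract_sql(reply):
    """Return the SQL inside the first fenced block, or the whole reply if none."""
    match = SQL_BLOCK.search(reply)
    sql = match.group(1) if match else reply
    return sql.strip()

reply = f"This is your SQL:\n{FENCE}sql\nSELECT * FROM employees;\n{FENCE}\nThis SQL query selects all data."
print(extract_sql(reply))
```

In the notebook this could wrap the model call, for example `extract_sql(return_CCRMSQL(question, context))`.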



## NL2SQL Samples
Below, we compare outputs generated with the old prompt against outputs generated with the new prompt.

In [18]:
#new
context_user = context.copy()
print(return_CCRMSQL("""YOUR QUERY HERE""", context_user))

SELECT e.name, e.surname, s.salary 
FROM employees e 
JOIN salary s ON e.id_usr = s.id_usr 
WHERE s.date = (SELECT MAX(date) FROM salary);


In [19]:
#old
old_context_user = old_context.copy()
print(return_CCRMSQL("YOUR QUERY HERE", old_context_user))

This is your SQL:
```sql

SELECT * 
FROM employees;
```
This SQL query selects all data from the "employees" table.


In [20]:
#new
print(return_CCRMSQL("YOUR QUERY HERE", context_user))

Q: List all employees with their latest salary
A: SELECT e.name, e.surname, s.salary 
   FROM employees e 
   JOIN salary s ON e.id_usr = s.id_usr 
   WHERE s.date = (SELECT MAX(date) FROM salary);

Q: Find employees with a PhD degree
A: SELECT e.name, e.surname, s.institution 
   FROM employees e 
   JOIN studies s ON e.id_usr = s.id_usr 
   WHERE s.educational_level = 'PhD';


In [21]:
#old
print(return_CCRMSQL("YOUR QUERY HERE", old_context_user))

I'm here to help with SQL queries. What would you like to do?


# Exercise
 - Complete the prompts, similar to what we did in class.
     - Try at least 3 versions.
     - Be creative.
 - Write a one-page report summarizing your findings.
     - Were there variations that didn't work well, i.e., where GPT hallucinated or was wrong?
     - What did you learn?

In [24]:
# Version 1: Default Prompt
context_default = [({'role': 'system', 'content': """
You are a bot that assists in creating SQL commands. All your answers should start with \
"This is your SQL", and after that an SQL query that does what the user requests. \
Your responses should be useful and concise, without adding more tables. \
Try to keep the SQL query simple.
Put the SQL command in white letters with a black background, and just after it \
a short text that explains how it works.
If the user asks for something that cannot be solved with an SQL query, \
just answer something nice and simple, maximum 30 words, asking them for something that \
can be solved with SQL.
"""})]

# Version 2: Schema-First
context_schema = [({'role': 'system', 'content': """
CREATE TABLE employees (
    id_usr INT PRIMARY KEY,
    name VARCHAR(100),
    surname VARCHAR(100)
);

CREATE TABLE salary (
    id_usr INT,
    year INT,
    date DATE,
    salary FLOAT,
    FOREIGN KEY (id_usr) REFERENCES employees(id_usr)
);

CREATE TABLE studies (
    id INT PRIMARY KEY,
    educational_level VARCHAR(50),
    institution VARCHAR(100),
    year INT,
    specialty VARCHAR(100),
    id_usr INT,
    FOREIGN KEY (id_usr) REFERENCES employees(id_usr)
);
"""})]

# Version 3: Examples-Based
context_examples = [({'role': 'system', 'content': """
Q: List all employees with their latest salary
A: SELECT e.name, e.surname, s.salary
   FROM employees e
   JOIN salary s ON e.id_usr = s.id_usr
   WHERE s.date = (SELECT MAX(date) FROM salary);

Q: Find employees with a PhD degree
A: SELECT e.name, e.surname, s.institution
   FROM employees e
   JOIN studies s ON e.id_usr = s.id_usr
   WHERE s.educational_level = 'PhD';
"""})]

# Test each version
def test_prompts(user_message):
    print("\nTesting Default Prompt:")
    result1 = return_CCRMSQL(user_message, context_default)
    print(result1)

    print("\nTesting Schema-First Prompt:")
    result2 = return_CCRMSQL(user_message, context_schema)
    print(result2)

    print("\nTesting Examples-Based Prompt:")
    result3 = return_CCRMSQL(user_message, context_examples)
    print(result3)

# Run test with sample query
test_prompts("Find all employees with salary greater than 50000")



Testing Default Prompt:
This is your SQL:
```sql
SELECT * FROM employees WHERE salary > 50000;
```
This SQL query selects all employees from the table "employees" where the salary is greater than 50000.

Testing Schema-First Prompt:
```sql
SELECT e.id_usr, e.name, e.surname
FROM employees e
JOIN salary s ON e.id_usr = s.id_usr
WHERE s.salary > 50000;
```

Testing Examples-Based Prompt:
```sql
SELECT e.name, e.surname, s.salary 
FROM employees e 
JOIN salary s ON e.id_usr = s.id_usr 
WHERE s.salary > 50000;
```
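The Default prompt's answer filters `employees` on a `salary` column that does not exist. One cheap guard, sketched here under the assumption that SQLite is an acceptable stand-in for the real database, is to compile each generated query against the schema before trusting it. `SCHEMA` and `check_sql` are hypothetical names; `EXPLAIN` makes SQLite prepare the statement without running it, so unknown tables or columns are reported even though the tables are empty.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employees (id_usr INT PRIMARY KEY, name VARCHAR(100), surname VARCHAR(100));
CREATE TABLE salary (id_usr INT, year INT, date DATE, salary FLOAT,
                     FOREIGN KEY (id_usr) REFERENCES employees(id_usr));
CREATE TABLE studies (id INT PRIMARY KEY, educational_level VARCHAR(50),
                      institution VARCHAR(100), year INT, specialty VARCHAR(100),
                      id_usr INT, FOREIGN KEY (id_usr) REFERENCES employees(id_usr));
"""

def check_sql(sql):
    """Return None if SQLite can compile the query, else the error message."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        # Drop a trailing semicolon so exactly one statement is prepared.
        conn.execute("EXPLAIN " + sql.strip().rstrip(";"))
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()

# The Default and Schema-First answers from the test above.
candidates = {
    "Default": "SELECT * FROM employees WHERE salary > 50000;",
    "Schema-First": "SELECT e.id_usr, e.name, e.surname FROM employees e "
                    "JOIN salary s ON e.id_usr = s.id_usr WHERE s.salary > 50000;",
}
for label, sql in candidates.items():
    print(label, "->", check_sql(sql) or "OK")
```

This catches invented tables and columns, but not queries that compile and still answer the wrong question, so the generated SQL still needs a human look.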


# My Adventure with SQL Prompts! 🚀

Hey there! Just wanted to share my journey testing different ways to get AI to write SQL queries. It's been quite a ride, and I've learned so much - both the "oops" moments and the "aha!" victories.


### The Basic Approach
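First, I tried the simplest way: just asking the AI to write SQL, without showing it the tables. Oh boy, was that interesting! Sometimes it was like asking for directions and getting told to "go that way" 😅. The AI would write super basic queries, often forgetting that tables need to talk to each other. In the salary test, it looked for a `salary` column in `employees` instead of joining the `salary` table.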

### The Blueprint Method
Then I got a bit smarter and showed the AI our database layout first. Kind of like giving someone a map before asking for directions. Better, but still not perfect! It's like the AI knew where everything was but sometimes forgot how to get there.

### The Learn-by-Example Method
Finally, I showed the AI some example queries. Game changer! Even without the table definitions, the examples gave away the table and column names, and it's like teaching by example: once it saw how good queries should look, it started copying that style. Pretty neat, right?



Look, we all make mistakes, and our AI friend had some funny ones:
- Sometimes it invented columns that didn't exist (creative writing gone wrong!)
- Occasionally forgot how JOINs work (we've all been there)
- Got a bit too excited and tried to use features that weren't in our database

## What I Learned 🎓

1. AI needs examples like we do! Just like when we're learning, seeing good examples helps way more than just reading the theory.

2. Being clear is super important. The more specific we are about what we want, the better the results. It's like the difference between "make me a sandwich" and giving an actual recipe.

3. Start simple, then get fancy. Once the AI gets the basics right, you can start asking for more complex stuff.

## My Takeaways

The biggest thing I learned? AI is like a helpful intern - super capable but needs clear guidance and good examples. Once you figure out how to explain what you want, it becomes an amazing helper!

Keep experimenting, keep learning, and don't be afraid to make mistakes - that's how we figure out what works! 🚀

P.S. Does anyone else find it weirdly satisfying when a query finally works just right? No? Just me? 😆