# SQL query from table names - Continued

In [21]:
from openai import OpenAI
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

OPENAI_API_KEY  = os.getenv('OPENAI_API_KEY')

## The old Prompt

In [22]:
#The old prompt
old_context = [ {'role':'system', 'content':"""
You are a bot that helps create SQL commands. All your answers should start with \
"This is your SQL", followed by an SQL command that does what the user requests. \
Your database is a SQL database with some tables. \
Try to keep the SQL command simple.
Put the SQL command in white letters with a black background, and right after it \
add a simple and concise text explaining how it works.
If the user asks for something that cannot be solved with an SQL command, \
just answer something nice and simple, maximum 10 words, asking them for something that \
can be solved with SQL.
"""} ]

old_context.append( {'role':'system', 'content':"""
first table:
{
  "tableName": "employees",
  "fields": [
    {
      "name": "ID_usr",
      "type": "int"
    },
    {
      "name": "name",
      "type": "varchar"
    }
  ]
}
"""
})

old_context.append( {'role':'system', 'content':"""
second table:
{
  "tableName": "salary",
  "fields": [
    {
      "name": "ID_usr",
      "type": "int"
    },
    {
      "name": "year",
      "type": "date"
    },
    {
      "name": "salary",
      "type": "float"
    }
  ]
}
"""
})

old_context.append( {'role':'system', 'content':"""
third table:
{
  "tableName": "studies",
  "fields": [
    {
      "name": "ID",
      "type": "int"
    },
    {
      "name": "ID_usr",
      "type": "int"
    },
    {
      "name": "educational_level",
      "type": "int"
    },
    {
      "name": "Institution",
      "type": "varchar"
    },
    {
      "name": "Years",
      "type": "date"
    },
    {
      "name": "Speciality",
      "type": "varchar"
    }
  ]
}
"""
})

## New Prompt.
We are going to improve it by following the recommendations of a paper from The Ohio State University: [How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings](https://arxiv.org/abs/2305.11853). I recommend reading it.

For each table, we will define the structure using the same syntax as a SQL `CREATE TABLE` command, and add a few sample rows of its content.

Finally, at the end of the prompt, we'll include some example questions with the SQL that the model should generate. This technique is called few-shot prompting: we give the model a few examples to help it generate the correct SQL.
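If the tables already exist in a database, the schema part of such a prompt does not have to be typed by hand. The following is a minimal sketch, assuming a SQLite database; `build_schema_prompt` is a hypothetical helper (not part of the paper or the OpenAI library), and the in-memory table only mirrors the `pokemon` table used in this notebook.

```python
import sqlite3

def build_schema_prompt(conn, sample_rows=2):
    """Return the CREATE TABLE statement and a few sample rows for each table."""
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    parts = []
    for table_name, create_sql in tables:
        # SQLite stores the original CREATE TABLE text in sqlite_master.sql
        parts.append(create_sql + ";")
        rows = conn.execute(
            f'SELECT * FROM "{table_name}" LIMIT ?', (sample_rows,)
        ).fetchall()
        parts.append(f"Sample rows {table_name}:")
        for row in rows:
            parts.append("\t".join(str(value) for value in row))
        parts.append("")  # blank line between tables
    return "\n".join(parts)

# Assumed example data: an in-memory copy of the pokemon table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pokemon(pokemon_id int not null, name varchar, evolved_state varchar)"
)
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [(125, "Charmander", "Charizard"), (268, "Cubone", "Marowak")],
)
schema_prompt = build_schema_prompt(conn)
print(schema_prompt)
```

The resulting string could be used as the `content` of the first system message instead of a hand-written schema.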


In [23]:
context = [ {'role':'system', 'content':"""
    CREATE TABLE pokemon(
    pokemon_id      int not null,
    name            varchar,
    evolved_state   varchar
    )
    
    CREATE TABLE types(
    type_id         int not null,
    name            varchar,
    pokemon_id      int not null
    )
    
    CREATE TABLE attacks(
    attack_id           int not null,
    name                varchar,
    attack_strength     int not null,
    level               int not null,
    pokemon_id          int not null
    )
             
    Sample rows pokemon:
    125 Charmander  Charizard
    268 Cubone      Marowak
    ...
             
    Sample rows types:
    1   fire    125
    2   earth   268
    ...
             
    Sample rows attacks:
    1   fire wheel      10  5   689
    2   electroshock    15  20  189
    ...
             
"""} ]



In [29]:
#FEW SHOT SAMPLES
context.append( {'role':'system', 'content':"""
 -- Keep the SQL as simple and efficient as you can, using valid SQLite, and answer the following questions for the tables provided above.

Query: What is the correct command to determine the attack with the highest attack strength?
Answer: 
SELECT * 
FROM attacks 
ORDER BY attack_strength DESC 
LIMIT 1;

Query: What is the correct command to determine the pokemon that can use this attack?
Answer:
SELECT p.name AS pokemon_name
FROM pokemon p
JOIN attacks a ON p.pokemon_id = a.pokemon_id
WHERE a.attack_id = (SELECT attack_id FROM attacks ORDER BY attack_strength DESC LIMIT 1);
                 
Query: What is the correct command to determine what type of attack it is? 
Answer:                
SELECT t.name AS attack_type
FROM types t
JOIN attacks a ON t.pokemon_id = a.pokemon_id
WHERE a.attack_id = (SELECT attack_id FROM attacks ORDER BY attack_strength DESC LIMIT 1);
                 
"""
})
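Because the model tends to imitate the few-shot examples, it is worth checking that they are valid SQLite before putting them in the prompt. A minimal sketch, assuming the schema above with the column written as `evolved_state`; `check_queries` is a hypothetical helper that runs each query against an empty in-memory database and collects any errors.

```python
import sqlite3

# Schema from the prompt (assumption: evolved_state written with an underscore)
SCHEMA = """
CREATE TABLE pokemon(pokemon_id int not null, name varchar, evolved_state varchar);
CREATE TABLE types(type_id int not null, name varchar, pokemon_id int not null);
CREATE TABLE attacks(attack_id int not null, name varchar,
                     attack_strength int not null, level int not null,
                     pokemon_id int not null);
"""

# The three few-shot answers from the prompt
FEW_SHOT_SQL = [
    "SELECT * FROM attacks ORDER BY attack_strength DESC LIMIT 1;",
    "SELECT p.name AS pokemon_name FROM pokemon p "
    "JOIN attacks a ON p.pokemon_id = a.pokemon_id "
    "WHERE a.attack_id = (SELECT attack_id FROM attacks "
    "ORDER BY attack_strength DESC LIMIT 1);",
    "SELECT t.name AS attack_type FROM types t "
    "JOIN attacks a ON t.pokemon_id = a.pokemon_id "
    "WHERE a.attack_id = (SELECT attack_id FROM attacks "
    "ORDER BY attack_strength DESC LIMIT 1);",
]

def check_queries(schema, queries):
    """Run each query on an empty in-memory database; return a list of error messages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    errors = []
    for sql in queries:
        try:
            conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            errors.append(f"{exc}: {sql}")
    conn.close()
    return errors

errors = check_queries(SCHEMA, FEW_SHOT_SQL)
print(errors or "All few-shot queries are valid SQLite for this schema.")
```

This only checks that the queries parse and reference existing tables and columns; it does not check that they answer the question correctly.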

In [30]:
#Function to call the model.
def return_CCRMSQL(user_message, context):
    # api_key could be omitted: the client reads OPENAI_API_KEY from the environment by default.
    client = OpenAI(api_key=OPENAI_API_KEY)

    # Copy the context so the caller's list of messages is not modified.
    newcontext = context.copy()
    newcontext.append({'role': 'user', 'content': "question: " + user_message})

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=newcontext,
        temperature=0,
    )

    return response.choices[0].message.content

## NL2SQL Samples
We're going to review some examples generated with the old prompt and others with the new prompt.

In [31]:
#new
context_user = context.copy()
print(return_CCRMSQL("""Which pokemon has the id of 200?""", context_user))

To find the pokemon with the id of 200, you can use the following SQL query:

```sql
SELECT name
FROM pokemon
WHERE pokemon_id = 200;
```

This query will return the name of the pokemon with the id of 200 from the `pokemon` table.


In [32]:
#old
old_context_user = old_context.copy()
print(return_CCRMSQL("Which employee has the highest salary?", old_context_user))

This is your SQL:
```sql
SELECT e.name
FROM employees e
JOIN salary s ON e.ID_usr = s.ID_usr
ORDER BY s.salary DESC
LIMIT 1;
```

This SQL query selects the name of the employee with the highest salary by joining the "employees" table with the "salary" table on the employee ID. It then orders the result by salary in descending order and limits the output to only one row, which represents the employee with the highest salary.


In [33]:
#new
print(return_CCRMSQL("Which pokemon has the highest number of different attacks?", context_user))

Query:
```sql
SELECT p.name AS pokemon_name, COUNT(a.attack_id) AS num_attacks
FROM pokemon p
JOIN attacks a ON p.pokemon_id = a.pokemon_id
GROUP BY p.name
ORDER BY num_attacks DESC
LIMIT 1;
```

This query will return the pokemon that has the highest number of different attacks along with the count of attacks it has.


In [34]:
#old
print(return_CCRMSQL("Which study institutions did most of the employees attend?", old_context_user))

This is your SQL:
```sql
SELECT Institution, COUNT(ID_usr) AS num_employees
FROM studies
GROUP BY Institution
ORDER BY num_employees DESC;
```

This SQL query selects the institution from the `studies` table and counts the number of employees who attended each institution. It then groups the results by institution, orders them by the number of employees in descending order, showing the institutions that most employees attended at the top.
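
Both prompts return the SQL inside a fenced `sql` block followed by an explanation. To run or compare the generated queries, one option is to pull the SQL out of the reply first. A minimal sketch, assuming the reply follows that pattern; `extract_sql` and `FENCE` are hypothetical names, not part of the notebook or the OpenAI library.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed
SQL_BLOCK = re.compile(FENCE + r"sql\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)

def extract_sql(reply):
    """Return the first fenced sql block in a model reply, or None if there is none."""
    match = SQL_BLOCK.search(reply)
    return match.group(1).strip() if match else None

# Assumed example reply, shaped like the outputs shown above
reply = (
    "This is your SQL:\n"
    + FENCE + "sql\n"
    + "SELECT name\nFROM pokemon\nWHERE pokemon_id = 200;\n"
    + FENCE + "\n\n"
    + "This query returns the name of the pokemon with the id of 200."
)
sql = extract_sql(reply)
print(sql)
```

If the model answers without a fenced block, `extract_sql` returns `None`, so the caller can decide whether to retry or skip that reply.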


# Exercise
 - Complete the prompts similar to what we did in class. 
     - Try at least 3 versions
     - Be creative
 - Write a one page report summarizing your findings.
     - Were there variations that didn't work well, i.e., where GPT either hallucinated or was wrong?
     - What did you learn?

- The model's answers seemed correct in both the old and the new versions
- This was the case even though the old version was much more limited in the information that was given
- The new version I created was based on pokemon types and their attacks, and the model was not thrown off by this change in context
- The model even explained its answers, which helped me follow its reasoning and confirmed that it understood the query
- I decided not to change the temperature or other parameters in this exercise, as we already experimented with this in previous labs