# Task
Create an LLM pipeline that transforms any free-text query into a SQL query. The key points of this task are:

* Create a representation of the SQL tables that supports semantic search, so that the tables most relevant to a given free-text query come back as the top results

* Based on that table representation and the user's free-text query, the LLM has to generate a real SQL query that can be used immediately

* The LLM has to support queries at different levels of complexity, not only the simplest ones

* The LLM has to support queries that fetch data from different database schemas

* When an error is encountered in an LLM-generated SQL query, the LLM should attempt to self-correct
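The self-correction requirement can be illustrated with a small retry loop. This is only a sketch under stated assumptions, not the logic in utils.py: `validate_sql`, `generate_with_retry` and `fake_generate` are hypothetical names, the LLM call is replaced by a stub, and an in-memory SQLite database stands in for the real target database.

```python
import sqlite3
from typing import Callable, Optional


def validate_sql(sql: str, conn: sqlite3.Connection) -> Optional[str]:
    """Return None if the statement compiles, otherwise the error message."""
    try:
        # EXPLAIN makes SQLite compile the statement without touching the data.
        conn.execute(f"EXPLAIN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_retry(
    generate: Callable[[str, Optional[str]], str],
    conn: sqlite3.Connection,
    question: str,
    max_attempts: int = 3,
) -> str:
    """Call the generator until its output validates, feeding errors back in."""
    feedback = None
    error = None
    for _ in range(max_attempts):
        sql = generate(question, feedback)
        error = validate_sql(sql, conn)
        if error is None:
            return sql
        # Hypothetical feedback format: the failed query plus the database error.
        feedback = f"Previous query: {sql}\nError: {error}"
    raise RuntimeError(f"No valid query after {max_attempts} attempts: {error}")


def fake_generate(question: str, feedback: Optional[str]) -> str:
    """Stub for the LLM: wrong table name first, corrected once feedback arrives."""
    if feedback is None:
        return "SELECT first_name FROM employee"
    return "SELECT first_name FROM employees"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT)")
result = generate_with_retry(fake_generate, conn, "List employee first names.")
print(result)
```

Capping the number of attempts keeps a persistently failing query from looping forever; the last database error is surfaced in the exception so it can be logged.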

The SQL tables are defined in the database.sql file
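One way to turn a schema file into searchable table representations is to split it into one document per `CREATE TABLE` statement and rank those documents against the question. The sketch below is an illustration only: the schema text is a hypothetical stand-in for database.sql, `split_tables`, `tokens` and `top_tables` are made-up helper names, and plain token overlap is used as a simple substitute for embedding-based semantic search.

```python
import re

# Hypothetical schema text; the real definitions live in database.sql.
SCHEMA_SQL = """
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    hire_date DATE,
    department_id INT
);
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name TEXT
);
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE
);
"""


def split_tables(schema_sql):
    """Map each table name to its full CREATE TABLE statement."""
    pattern = re.compile(r"CREATE TABLE\s+([\w.]+)\s*\(.*?\);", re.S | re.I)
    return {m.group(1): m.group(0) for m in pattern.finditer(schema_sql)}


def tokens(text):
    """Lower-case word tokens; snake_case identifiers are split on underscores."""
    return set(re.findall(r"[a-z]+", text.lower()))


def top_tables(question, tables, k=2):
    """Rank tables by how many tokens they share with the question."""
    q = tokens(question)
    ranked = sorted(tables, key=lambda name: len(q & tokens(tables[name])), reverse=True)
    return ranked[:k]


tables = split_tables(SCHEMA_SQL)
best = top_tables("first name and department name of each employee", tables)
print(best)
```

In a real pipeline the overlap score would typically be replaced by a similarity between embeddings of the question and of each table document, while the per-table splitting step stays the same.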

LLM used for this task: https://huggingface.co/TheBloke/sqlcoder2-GGUF

Templates, prompts and functions are defined in /utils/utils.py

In [1]:
from langchain.llms import LlamaCpp
from utils import text_to_sql_pipe, save_queries_to_csv

In [2]:
# A forward slash avoids the invalid "\s" escape sequence and also works on Windows
llm = LlamaCpp(model_path="sqlcoder2-GGUF/sqlcoder2.Q8_0.gguf",
               n_batch=512,
               n_ctx=2048,
               n_gpu_layers=30,
               verbose=False)

### Demonstration on a single example

In [3]:
query = "List the first names, last names, and department names of employees who were hired after January 1, 2022, and include their department names."
generated_query = text_to_sql_pipe(llm, query)

In [4]:
print(generated_query)

SELECT e.first_name, e.last_name, d.department_name FROM public.employees e JOIN public.departments d ON e.department_id = d.department_id WHERE e.hire_date > '2022-01-01';


### Test on multiple queries
Results are saved to "sql_queries_results.csv"

In [5]:
queries = [
    "Show me the names of employees and their departments.",
    "What is the highest salary?",
    "List all orders that were made in the last 30 days.",
    "What are the total sales grouped by region?",
    "List employees whose salary is greater than $50,000.",
    "Get the names of customers and their lifetime value.",
    "Show products where the sales quantity is above 100.",
    "Which departments are managed by the employee with ID 5?",
    "List all sales representatives working in the 'West' region.",
    "What is the most recent sales report date?",
    "Find the average salary of employees in each department.",
    "List customers who have placed more than 5 orders.",
    "Find the total revenue generated by each product.",
    "Show the department with the highest number of employees.",
    "Get the total number of sales representatives in each region.",
    "Find the average lifetime value of customers grouped by region.",
    "List the top 3 products with the highest revenue.",
    "Find employees who have not received a salary in the last year.",
    "Show the total number of orders per customer.",
    "List all departments with more than one manager."
]

In [6]:
csv_filename = "sql_queries_results.csv"
save_queries_to_csv(llm, queries, csv_filename)
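The body of `save_queries_to_csv` lives in utils.py and is not shown here. As a rough idea of what such a batch helper can look like, the following sketch assumes a two-column output (question, generated SQL) and records failures in the row instead of aborting the run; `save_queries_sketch` and its column names are hypothetical, and a plain callable stands in for the LLM pipeline.

```python
import csv
import os
import tempfile


def save_queries_sketch(generate, queries, csv_filename):
    """Run `generate` on every question and write one CSV row per question."""
    with open(csv_filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "generated_sql"])  # assumed column names
        for question in queries:
            try:
                sql = generate(question)
            except Exception as exc:
                # Keep going so one failing question does not lose the whole batch.
                sql = f"ERROR: {exc}"
            writer.writerow([question, sql])


# Demo with a stub generator and a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_results.csv")
save_queries_sketch(lambda q: "SELECT 1;", ["What is the highest salary?"], demo_path)

with open(demo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Writing through the `csv` module rather than joining strings by hand matters here, because generated SQL routinely contains commas and quotes.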