Create an LLM pipeline that transforms any free-text query into a SQL query. The key points of this task are:

* Create a representation of the SQL tables that supports semantic search, so that the tables most relevant to a given free-text query are returned as the top results

* Based on that table representation and the user's free-text query, the LLM has to produce a real SQL query that can be used immediately

* The LLM has to support queries of different levels of complexity, not only the simplest ones

* The LLM has to support queries that fetch data from different database schemas

* When an error is encountered in an LLM-generated SQL query, the pipeline should attempt to self-correct
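
The first point, matching a free-text query to table representations, can be sketched as follows. This is a minimal illustration and not the code in `utils`: it ranks tables by bag-of-words cosine similarity, a lexical stand-in for a real embedding-based semantic search. `TABLE_DOCS`, `tokenize`, `cosine` and `top_tables` are hypothetical names, and any column not visible in the output further down is an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical one-line descriptions of each table (assumed, not the real schema text).
TABLE_DOCS = {
    "sales.employees": "employees: employee_id, first_name, last_name, salary, department_id",
    "sales.departments": "departments: department_id, department_name, manager_id",
    "sales.orders": "orders: order_id, customer_id, order_date, total_amount",
}

def tokenize(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_tables(query, docs, k=2):
    """Return the names of the k table descriptions most similar to the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda name: cosine(q, tokenize(docs[name])), reverse=True)
    return ranked[:k]

print(top_tables("What is the highest salary?", TABLE_DOCS, k=1))
```

In a real pipeline the `cosine` over token counts would be replaced by similarity between embedding vectors, and the selected table descriptions would be placed into the prompt sent to the model.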

In [1]:
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from utils import text_to_sql_pipe, save_queries_to_csv

In [2]:
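llm = LlamaCpp(model_path=r"sqlcoder2-GGUF\sqlcoder2.Q8_0.gguf",  # raw string keeps the backslash literal
               n_batch=512,      # prompt tokens processed per batch
               n_ctx=2048,       # context window in tokens
               n_gpu_layers=30,  # number of layers offloaded to the GPU
               verbose=True)     # print llama.cpp loading and timing logs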

llama_model_loader: loaded meta data with 19 key-value pairs and 485 tensors from sqlcoder2-GGUF\sqlcoder2.Q8_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = starcoder
llama_model_loader: - kv   1:                               general.name str              = StarCoder
llama_model_loader: - kv   2:                   starcoder.context_length u32              = 8192
llama_model_loader: - kv   3:                 starcoder.embedding_length u32              = 6144
llama_model_loader: - kv   4:              starcoder.feed_forward_length u32              = 24576
llama_model_loader: - kv   5:                      starcoder.block_count u32              = 40
llama_model_loader: - kv   6:             starcoder.attention.head_count u32              = 48
llama_model_loader: - kv   7:          starcoder.attention.head_count_kv u32     

In [None]:
queries = [
    "Show me the names of employees and their departments.",
    "What is the highest salary?",
    "List all orders that were made in the last 30 days.",
    "What are the total sales grouped by region?",
    "List employees whose salary is greater than $50,000.",
    "Get the names of customers and their lifetime value.",
    "Show products where the sales quantity is above 100.",
    "Which departments are managed by the employee with ID 5?",
    "List all sales representatives working in the 'West' region.",
    "What is the most recent sales report date?",
    "Find the average salary of employees in each department.",
    "List customers who have placed more than 5 orders.",
    "Find the total revenue generated by each product.",
    "Show the department with the highest number of employees.",
    "Get the total number of sales representatives in each region.",
    "Find the average lifetime value of customers grouped by region.",
    "List the top 3 products with the highest revenue.",
    "Find employees who have not received a salary in the last year.",
    "Show the total number of orders per customer.",
    "List all departments with more than one manager."
]

In [4]:
csv_filename = "sql_queries_results.csv"
save_queries_to_csv(llm, queries, csv_filename)


llama_print_timings:        load time =   21620.70 ms
llama_print_timings:      sample time =       6.77 ms /    45 runs   (    0.15 ms per token,  6645.01 tokens per second)
llama_print_timings: prompt eval time =   31300.14 ms /   681 tokens (   45.96 ms per token,    21.76 tokens per second)
llama_print_timings:        eval time =   15094.61 ms /    44 runs   (  343.06 ms per token,     2.91 tokens per second)
llama_print_timings:       total time =   46461.16 ms /   725 tokens


Generated SQL Query:
SELECT e.first_name, e.last_name, d.department_name FROM sales.departments AS d JOIN sales.employees AS e ON d.manager_id = e.employee_id;

Original SQL Query:
SELECT e.first_name, e.last_name, d.department_name FROM sales.departments AS d JOIN sales.employees AS e ON d.manager_id = e.employee_id;


Fixed SQL Query:
SELECT
    e.first_name,
    e.last_name,
    d.department_name
FROM sales.departments AS d
INNER JOIN sales.employees AS e ON d.manager_id = e.employee_id;



Final SQL Query After Verification and Correction:
SELECT e.first_name, e.last_name, d.department_name FROM sales.departments AS d JOIN sales.employees AS e ON d.manager_id = e.employee_id;

Saved query: Show me the names of employees and their departments.
Generated SQL:
SELECT e.first_name, e.last_name, d.department_name FROM sales.departments AS d JOIN sales.employees AS e ON d.manager_id = e.employee_id;
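
The trace above shows a generated query going through a verification and correction step before it is saved. A minimal sketch of such a self-correction loop is given below. It is an assumption about the general shape of the technique, not the implementation in `utils`: `validate_sql`, `generate_with_retry` and `fake_llm` are hypothetical names, SQLite stands in for the real database, and the `salary` column is assumed.

```python
import sqlite3

def validate_sql(conn, sql):
    """Return None if SQLite can plan the query, otherwise the error message."""
    try:
        conn.execute("EXPLAIN QUERY PLAN " + sql)
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)

def generate_with_retry(generate, question, conn, max_attempts=3):
    """Call generate(question, feedback) until the SQL validates or attempts run out."""
    feedback = None
    sql = None
    for _ in range(max_attempts):
        sql = generate(question, feedback)
        error = validate_sql(conn, sql)
        if error is None:
            return sql
        # Feed the failing query and the database error back into the next attempt.
        feedback = f"Previous query: {sql}\nDatabase error: {error}"
    raise ValueError(f"no valid query after {max_attempts} attempts; last: {sql}")

def fake_llm(question, feedback):
    """Stand-in for the model: wrong column first, corrected once feedback arrives."""
    if feedback is None:
        return "SELECT MAX(wage) FROM sales.employees;"
    return "SELECT MAX(salary) FROM sales.employees;"

# Attach an in-memory database under the name `sales` so schema-qualified names resolve.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("CREATE TABLE sales.employees (employee_id INTEGER, salary REAL)")

result = generate_with_retry(fake_llm, "What is the highest salary?", conn)
print(result)
```

Validating with `EXPLAIN QUERY PLAN` catches unknown tables, unknown columns and syntax errors without running the query. It does not check that the query answers the question, so a semantically wrong join would still pass.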

