# Phase zero
Block designed to generate the table descriptions that will later be used in the Pre-Filter.

### Required libraries

In [None]:
import sys
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

### Connection to the local DB

In [None]:
# Dummy database used for testing.
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:////home/llmuser/DB/laloss2.db", sample_rows_in_table_info=0)
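
With `sample_rows_in_table_info=0`, the table info handed to the model should contain only the schema, with no sample rows. As a rough illustration of what that schema text looks like, the sketch below reads the `CREATE TABLE` statements straight from SQLite with the standard library. It uses an in-memory database with one made-up table as a stand-in for the real file, and `list_table_schemas` is a hypothetical helper, not part of LangChain and not how `SQLDatabase` works internally.

```python
import sqlite3

# In-memory stand-in for the real database file (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ROLE ("
    "ROLE_ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "DESCRIPTION TEXT NOT NULL, "
    "PRIVILEGES TEXT NOT NULL)"
)


def list_table_schemas(connection):
    """Return {table_name: CREATE TABLE statement} for user tables."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return {name: sql for name, sql in rows}


schemas = list_table_schemas(conn)
for name, ddl in schemas.items():
    print(name)
    print(ddl)
```

The `NOT LIKE 'sqlite_%'` filter hides SQLite's internal bookkeeping tables, such as the one created by `AUTOINCREMENT`.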

In [None]:
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

### Loading the Mixtral model in LlamaCpp

In [None]:
# Model loaded through llama_cpp
llm = LlamaCpp(
    model_path="dockerFolder/mixtral-8x7b-v0.1.Q5_K_M.gguf",
    temperature=0.0,
    n_gpu_layers=-1,
    max_tokens=1000,
    n_ctx=1024*10,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)
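
The settings above leave a fixed budget for the prompt: the context window (`n_ctx`) has to hold both the few-shot prompt and up to `max_tokens` of generated text. The sketch below works out that budget and checks a prompt against it. `rough_token_count` is a hypothetical stand-in that assumes about four characters per token, which is only a crude guess; a real tokenizer count would be more reliable.

```python
from typing import Callable

N_CTX = 1024 * 10   # same value as n_ctx in the cell above
MAX_TOKENS = 1000   # same value as max_tokens in the cell above


def rough_token_count(text: str) -> int:
    """Crude estimate (assumption): roughly four characters per token."""
    return len(text) // 4 + 1


def fits_in_context(
    prompt: str,
    count_tokens: Callable[[str], int] = rough_token_count,
    n_ctx: int = N_CTX,
    max_tokens: int = MAX_TOKENS,
) -> bool:
    """True if the prompt plus the generation budget fits in the context window."""
    return count_tokens(prompt) + max_tokens <= n_ctx


prompt_budget = N_CTX - MAX_TOKENS
print(prompt_budget)
```

LangChain language models expose a `get_num_tokens` method that could be passed as `count_tokens`; whether it uses the Mixtral tokenizer depends on the installed version, so treat that as something to verify.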

## Prompt with few-shot learning and chain-of-thought
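
The prompt in the next cell is written out by hand as three question/answer pairs, each pairing a `CREATE TABLE` statement with a summary sentence and a column-by-column breakdown. As a minimal sketch of the same layout built programmatically, the hypothetical helper below joins any number of example pairs and then appends the new question with the start of its answer. The example schemas and answers are shortened placeholders, not the ones used in this notebook.

```python
QUESTION = 'Q: "Please give me a description about this table:"'


def build_few_shot_prompt(examples, schema, table_name):
    """Build a few-shot prompt.

    examples: list of (schema, answer) pairs shown to the model as solved cases.
    schema: CREATE TABLE statement of the table to describe.
    table_name: used to start the answer so the model continues the sentence.
    """
    parts = []
    for example_schema, answer in examples:
        parts.append(f"{QUESTION}\n\n{example_schema}\n\nA: {answer}\n")
    # The final block is left open so the model completes the answer.
    parts.append(f"{QUESTION}\n\n{schema}\n\nA: The {table_name} table ")
    return "\n".join(parts)


examples = [
    (
        "CREATE TABLE ROLE(ROLE_ID INTEGER PRIMARY KEY, DESCRIPTION TEXT NOT NULL);",
        "The ROLE table defines the roles within an organization.",
    ),
]
prompt = build_few_shot_prompt(
    examples,
    "CREATE TABLE ADDRESS(ADDRESS_ID INTEGER PRIMARY KEY, CITY TEXT NOT NULL);",
    "ADDRESS",
)
print(prompt)
```

Keeping the examples as data makes it easier to add or swap a solved case without editing one long string.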

In [None]:
description_prompt = """
Q: "Please give me a description about this table:"

CREATE TABLE EMPLOYEE(
  EMPLOYEE_ID     INTEGER PRIMARY KEY AUTOINCREMENT, 
  NAME   TEXT NOT NULL, 
  PHONE_NUMBRE TEXT NOT NULL,
  EMAIL TEXT NOT NULL,
  FK_EMPLOYEE_ADDRESS INTEGER REFERENCES ADDRESS(ADDRESS_ID),
  FK_EMPLOYEE_ROLE_ID INTEGER REFERENCES ROLE(ROLE_ID)
);

A: This table, named EMPLOYEE, is designed to store information about employees.

Here's a breakdown of its structure and the purpose of each column:
- EMPLOYEE_ID: A unique identifier for each employee. It's an integer value that auto-increments every time a new employee is added, ensuring that each employee has a distinct ID.
- NAME: This column stores the name of the employee. It's a text field and is marked as NOT NULL, meaning every record must include the employee's name.
- PHONE_NUMBRE: A text field intended to store the employee's phone number. The misspelling of "number" seems like a typo. This field is also NOT NULL.
- EMAIL: This column keeps the employee's email address, allowing for contact and identification beyond just a name or ID. It's a text field and must be provided for each employee.
- FK_EMPLOYEE_ADDRESS: A foreign key linking to the ADDRESS table via the ADDRESS_ID column. This establishes a relationship between each employee and their address, indicating where the employee lives. It's an integer reflecting the ADDRESS_ID from the ADDRESS table.
- FK_EMPLOYEE_ROLE_ID: Another foreign key, this time linking to the ROLE table by the ROLE_ID column. This connection defines the role or position the employee holds within the organization. It's an integer that matches a ROLE_ID in the ROLE table, detailing the employee's specific role or job function.

Q: "Please give me a description about this table:"

CREATE TABLE ROLE(
  ROLE_ID     INTEGER PRIMARY KEY AUTOINCREMENT, 
  DESCRIPTION	TEXT NOT NULL, 
  PRIVILEGES	TEXT NOT NULL 
);

A: The ROLE table is structured to define various roles within an organization, including a description and the privileges associated with each role.

Here's a breakdown of its structure and the purpose of each column:
- ROLE_ID: This is the primary key for the table, uniquely identifying each role. It's an integer that auto-increments whenever a new role is added to the table, ensuring that each role has a unique identifier.
- DESCRIPTION: A text field that provides a detailed description of the role. This could include information about the role's responsibilities, its place within the organizational hierarchy, or any other relevant details. This column is marked as NOT NULL, meaning that a description must be provided for each role.
- PRIVILEGES: Another text field that outlines the specific privileges or permissions associated with the role. This could detail access levels, tasks that the role is authorized to perform, or other rights granted to the role. Like the DESCRIPTION column, this field is also NOT NULL, requiring that privileges are explicitly defined for each role.

Q: "Please give me a description about this table:"

CREATE TABLE ADDRESS(
  ADDRESS_ID     INTEGER PRIMARY KEY AUTOINCREMENT, 
  STREET	TEXT NOT NULL, 
  CITY	TEXT NOT NULL,
  STATE	TEXT NOT NULL,
  POSTAL_CODE	TEXT NOT NULL,
  COUNTRY	TEXT NOT NULL
);

A: The ADDRESS table is designed to store detailed information about various addresses. This table is essential for maintaining accurate location data for individuals, businesses, or other entities.

Here's a breakdown of its structure and the purpose of each column:
- ADDRESS_ID: This is the primary key of the table, which uniquely identifies each address entry. It is an integer value that automatically increments with each new entry, ensuring that each address has a unique identifier.
- STREET: A text field that holds the street name and number of the address. This column is marked as NOT NULL, meaning an entry must include this information.
- CITY: This text field stores the name of the city for the address. It is also marked as NOT NULL, requiring that every address entry specifies a city.
- STATE: A text field for the state, province, or regional equivalent in which the address is located. Like the previous fields, this is a NOT NULL column, indicating that the state must be included in every address entry.
- POSTAL_CODE: This text field contains the postal or ZIP code for the address. It is essential for mail delivery and other location-based services. This column is also NOT NULL.
- COUNTRY: A text field specifying the country of the address. This is critical for international addresses and is also a NOT NULL field, requiring an entry for every address.

"""

In [None]:
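# Iterate over every table in the DB and generate a description for each one.
# The output is stored in two files: one for the table summaries and one for the column breakdowns.
tables_list = open("table_list.txt", "w")
columns_list = open("columns_list.txt", "w")
table_names = db.get_usable_table_names()
for table in table_names:
    table_schema = db.get_table_info([table]).strip()
    # Start the answer with the name of the current table so the model continues the sentence.
    answer_prefix = f"The {table} table "
    prompt = (
        description_prompt
        + 'Q: "Please give me a description about this table:"\n\n'
        + table_schema
        + "\n\nA: "
        + answer_prefix
    )
    # Stop at the next "Q:" so the model does not go on to invent further examples.
    table_description = answer_prefix + llm(prompt, stop=["\nQ:"])
    aux = table_description.split("\n")
    # Line 0 is the one-sentence summary; everything after the blank line is the column breakdown.
    table_summary = aux[0]
    columns = "\n".join(aux[2:])
    # End each entry with a newline so entries from different tables do not run together.
    tables_list.write(table_summary + "\n")
    columns_list.write(columns.strip() + "\n\n")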
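
The split in the loop relies on the model following the few-shot layout exactly: a summary on the first line, a blank line, then the breakdown. As a sketch of a slightly more defensive alternative, the hypothetical helper below takes the first line as the summary and keeps only the lines that look like `- COLUMN: ...` bullets. The sample completion is hand-written for illustration and is not real model output.

```python
def split_description(answer_prefix, completion):
    """Split a generated description into (summary, column_lines).

    answer_prefix: text that the prompt already started the answer with.
    completion: text returned by the model.
    """
    text = (answer_prefix + completion).strip()
    lines = text.split("\n")
    summary = lines[0].strip()
    # Keep only the bullet lines, ignoring blank lines and the "Here's a breakdown" header.
    column_lines = [line.strip() for line in lines if line.lstrip().startswith("- ")]
    return summary, column_lines


# Hand-written sample completion (assumption, for illustration only).
sample_completion = (
    "stores the professions known to the system.\n"
    "\n"
    "Here's a breakdown of its structure and the purpose of each column:\n"
    "- PROFESSION_ID: The primary key of the table.\n"
    "- NAME: The name of the profession.\n"
)
summary, column_lines = split_description("The PROFESSION table ", sample_completion)
print(summary)
print(column_lines)
```

Filtering on the bullet prefix means a missing blank line or an extra header sentence does not shift the column entries.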

In [None]:
tables_list.close()
columns_list.close()