# GOAL:

This notebook walks step by step through the reasoning that led to `main.py`.

## PRE-REQUISITES:
- Have a downloaded `data.csv` containing transactions (this is the file you load to talk with the LLM chain).
- Define `.env` as appropriate; see `configs.py` for the variables it needs. Place your API tokens in `.env` as well, so that `configs.py` can load them.
- Have the minimum RAM required to run the `.gguf` model (about 5 to 6 GB).
  - Because of hardware constraints, mistral-7b is run from a quantized `.gguf` file, which trades some output quality for lower resource usage.
  - You can extend the code to load full models from Hugging Face or Ollama if you do not face such hardware constraints.
- The Python version used is `3.11.5`.
- Have a conda environment or equivalent created, e.g. `conda create -n venv python=3.11.5`.
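
As a rough illustration of the `.env` / `configs.py` relationship described above, the sketch below reads settings from environment variables with fallbacks. The helper `get_setting` is hypothetical, the default values are assumptions (only `data.db` and the model path appear in this notebook's output), and actually loading the `.env` file into the environment is typically done by a separate package, which is omitted here.

```python
import os


def get_setting(name: str, default: str) -> str:
    """Hypothetical helper: return the environment value, or the default if unset or empty."""
    value = os.getenv(name)
    return value if value else default


# Variable names match the imports used later in this notebook;
# the default values are assumptions for illustration only.
DATABASE_PATH = get_setting("DATABASE_PATH", "data.db")
DATABASE_URL = get_setting("DATABASE_URL", f"sqlite:///{DATABASE_PATH}")
DEFAULT_MODEL_PATH = get_setting(
    "DEFAULT_MODEL_PATH", "./models/mistral-7b-instruct-v0.1.Q4_K_M.gguf"
)
```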

## IMPORTS / SET-UPs

Set up the logger and the database dependency.

In [60]:
! conda activate myprojects # or replace with your desired environment name
! pip install -r requirements.txt
! pip list

/bin/bash: line 1: conda: command not found


Package                  Version
------------------------ -----------
accelerate               1.3.0
aiohappyeyeballs         2.4.4
aiohttp                  3.11.11
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.8.0
asttokens                3.0.0
attrs                    24.3.0
certifi                  2024.12.14
charset-normalizer       3.4.1
comm                     0.2.2
dataclasses-json         0.6.7
debugpy                  1.8.12
decorator                5.1.1
diskcache                5.6.3
executing                2.2.0
filelock                 3.17.0
frozenlist               1.5.0
fsspec                   2024.12.0
greenlet                 3.1.1
h11                      0.14.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
huggingface-hub          0.27.1
idna                     3.10
ipykernel                6.29.5
ipython                  8.31.0
jedi                     0.19.2
Jinja2      

In [61]:
import logging
import os
import time
from typing import Any, List, Optional, Tuple, Type, Union
from llama_cpp import Llama

from langchain_experimental.sql import SQLDatabaseChain, SQLDatabaseSequentialChain
from langchain_huggingface import HuggingFacePipeline
from langchain_community.utilities import SQLDatabase
from langchain.schema.cache import BaseCache
from langchain.callbacks.base import Callbacks
from langchain.schema import BaseOutputParser
from langchain.llms.base import LLM
from langchain.prompts import BasePromptTemplate, PromptTemplate
from pydantic import Field

from classes import GracefulSQLDatabaseChain
from mydatabase import initialize_database
from utils import BenchmarkReport, setup_logger, truncate_conversation_history
from myprompts import ALL_PROMPT_STRINGS, DEFAULT_SQLITE_PROMPT, prompt_template_generator, _sqlite_prompt1, _sqlite_prompt2, _sqlite_prompt3
import myprompts
from configs import DATABASE_PATH, DATABASE_URL, DEFAULT_CHAT_OUTPUT_FILEPATH, DEFAULT_MODEL_PATH


# Import the model loader and the default context window size
from main import load_local_model, DEFAULT_CONTEXT_WINDOW_SIZE


In [62]:
from main import test_database_context


logger = setup_logger("jupyter_notebook", "jupyter.log", level=logging.INFO)

if os.path.exists(DATABASE_PATH):
    os.remove(DATABASE_PATH)
    print(f"Existing database '{DATABASE_PATH}' has been deleted.")
# Reinitialize the database
initialize_database()


Existing database 'data.db' has been deleted.


2025-01-24 15:45:47,813 - mydatabase - INFO - Loaded data from data.csv into 'transactions' table.
2025-01-24 15:45:47,843 - mydatabase - INFO - Loaded data from clients.csv into 'clients' table.


Data from data.csv has been successfully loaded into 'transactions' table.
Data from clients.csv has been successfully loaded into 'clients' table.
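
The log lines above show CSV files being loaded into SQLite tables. A minimal sketch of that idea, using only the standard library, might look like the following. The function `load_csv_into_table`, the column names and the sample rows are all made up for illustration; this is not the implementation of `initialize_database`.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_csv_into_table(conn: sqlite3.Connection, csv_file: TextIO, table: str) -> int:
    """Recreate `table` from the CSV header, insert every row, and return the row count."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so that column names containing spaces still work.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Made-up sample data standing in for data.csv
sample_csv = io.StringIO("client_id,amount\nC001,12.50\nC002,7.25\nC001,3.00\n")
conn = sqlite3.connect(":memory:")
inserted = load_csv_into_table(conn, sample_csv, "transactions")
count = conn.execute('SELECT COUNT(*) FROM "transactions"').fetchone()[0]
print(inserted, count)
```

Dropping and recreating the table mirrors the notebook's choice to delete the existing database before reinitializing it, so repeated runs start from a clean state.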


## Step by step: Build > Run

In [63]:
from main import CustomLlamaLLM
from main import load_database_connection
from main import create_llm_chain

Load the model from local storage and wrap it in a subclass of langchain's `LLM` class.

In [64]:
llama_model = load_local_model(DEFAULT_MODEL_PATH, DEFAULT_CONTEXT_WINDOW_SIZE)
# Initialize the custom Llama LLM
llm = CustomLlamaLLM(llama_model, DEFAULT_CONTEXT_WINDOW_SIZE)


llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from ./models/mistral-7b-instruct-v0.1.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 l

llm_load_vocab: token to piece cache size = 0.1637 MB
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_
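
The wrapping step can be pictured with a small adapter. This is only a sketch and not the `CustomLlamaLLM` from `main.py`: `LlamaAdapter`, `fake_model`, the `max_tokens` value and the context window of 2048 are assumptions made for illustration, and the langchain base class is left out so the snippet runs without a model file. It relies on the fact that llama-cpp-python completions come back as a dict with a `choices` list.

```python
from typing import Any, Callable, Dict, List, Optional


class LlamaAdapter:
    """Hypothetical adapter that turns a llama-cpp-style callable into a str -> str call."""

    def __init__(self, model: Callable[..., Dict[str, Any]], context_window: int) -> None:
        self.model = model
        # Stored for prompt-size checks, which this sketch does not implement.
        self.context_window = context_window

    def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Each completion carries its generated text under choices[i]["text"].
        result = self.model(prompt, max_tokens=256, stop=stop or [])
        return result["choices"][0]["text"].strip()


def fake_model(prompt: str, max_tokens: int, stop: List[str]) -> Dict[str, Any]:
    """Stand-in for a loaded model so the sketch runs without a .gguf file."""
    return {"choices": [{"text": " SELECT COUNT(*) FROM transactions; "}]}


adapter = LlamaAdapter(fake_model, context_window=2048)
print(adapter("How many transactions are in the database?"))
```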

After initializing the database during setup, load the database connection from its path. This assumes a local SQLite database, but could be extended to use other connectors.

In [65]:

sql_database = load_database_connection(DATABASE_URL)

2025-01-24 15:45:49,403 - main - INFO - Database connected successfully.


Define a prompt template appropriate for your SQL assistant.

In [66]:
# Create a SQL prompt template
PROMPT_TEMPLATE = """
You are an assistant for a banking client. Answer questions based on the provided database schema and data.

{table_info}

User Query: {input}
"""

prompt = PromptTemplate(
    input_variables=["table_info", "input"],
    template=PROMPT_TEMPLATE
)
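
To see what the two placeholders turn into, the template string can be filled in by hand with plain `str.format`, which behaves like langchain's default f-string style templates for simple placeholders such as these. The schema text below is made up; in the real chain `table_info` is derived from the connected database.

```python
PROMPT_TEMPLATE = """
You are an assistant for a banking client. Answer questions based on the provided database schema and data.

{table_info}

User Query: {input}
"""

# Made-up schema text, for illustration only.
example_table_info = "CREATE TABLE transactions (client_id TEXT, amount REAL)"

filled_prompt = PROMPT_TEMPLATE.format(
    table_info=example_table_info,
    input="How many transactions are in the database?",
)
print(filled_prompt)
```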


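The build phase is complete once the SQL LLM chain has been built from the dependencies above. Note that this cell passes `DEFAULT_SQLITE_PROMPT` from `myprompts`, not the `prompt` defined in the previous cell.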

In [72]:
llm_chain = create_llm_chain(
    database=sql_database,
    llm=llm,
    prompt=DEFAULT_SQLITE_PROMPT,
    database_chain_cls=SQLDatabaseChain
)

2025-01-24 15:47:07,660 - main - INFO - Banking assistant created successfully.


Finally, run the SQL LLM chain (LLM + database) against a pre-defined set of questions.
Order matters, because the conversation history is preserved in the LLM context window sequentially.

In [None]:
# Simulate user queries to the assistant
questions = [
    "How many transactions are in the database?",
    "List all transactions by client ID C001.",
    "What is the total amount spent by client ID C002?"
]

for ques in questions:
    print(f"User Query: {ques}")
    response = llm_chain.run(ques)
    print(f"Assistant Response: {response}\n")

User Query: How many transactions are in the database?


> Entering new SQLDatabaseChain chain...
How many transactions are in the database?
SQLQuery:
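
Because earlier turns occupy space in the context window, the history eventually has to be trimmed. The sketch below shows one simple way to do that: keep only the most recent turns that fit a budget. It is not the `truncate_conversation_history` helper from `utils`; the function name, the character-based budget (a crude stand-in for a token count) and the sample turns are assumptions.

```python
from typing import List


def truncate_history(turns: List[str], max_chars: int) -> List[str]:
    """Keep the most recent turns whose combined length fits within max_chars."""
    kept: List[str] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(turns):
        if used + len(turn) > max_chars:
            break
        kept.append(turn)
        used += len(turn)
    kept.reverse()  # restore chronological order
    return kept


history = [
    "User: How many transactions are in the database?",
    "Assistant: (answer 1)",
    "User: List all transactions by client ID C001.",
    "Assistant: (answer 2)",
]
recent = truncate_history(history, max_chars=80)
print(recent)
```

Dropping the oldest turns first keeps the latest question and answer intact, which is usually what a follow-up question depends on.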

## Running the main loop with minimal code

Run the chat loop as though you are speaking to the SQL DB chain LLM in real time.
<br>
**Type "exit" to exit the loop.**

In [None]:
# from main import main_run_loop
# main_run_loop()
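
The shape of such a loop can be sketched as follows. This is not the code of `main_run_loop`: `chat_loop` and its three parameters are hypothetical, and the input and output functions are injected so the example can run on scripted input instead of blocking on the keyboard.

```python
from typing import Callable, Iterator, List


def chat_loop(
    ask: Callable[[str], str],
    read_input: Callable[[], str],
    write: Callable[[str], None],
) -> int:
    """Forward each user line to `ask` until the user types "exit"; return the turns handled."""
    turns = 0
    while True:
        user_text = read_input().strip()
        if user_text.lower() == "exit":
            break
        if not user_text:
            continue  # ignore empty lines
        write(f"Assistant Response: {ask(user_text)}")
        turns += 1
    return turns


# Scripted input standing in for a real user session.
scripted: Iterator[str] = iter(["How many transactions are in the database?", "exit"])
outputs: List[str] = []
handled = chat_loop(
    ask=lambda question: f"(answer to: {question})",
    read_input=lambda: next(scripted),
    write=outputs.append,
)
print(handled, outputs)
```

For interactive use, the same function could be called with `read_input=lambda: input("You: ")`, `write=print`, and `ask` bound to the chain built earlier.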

## USAGE from the CLI with plain Python scripts

**NOTE: `--simulate` and `--benchmark` cannot be used together. The script throws an error if both are provided.**

In [1]:
# default usage:
! python main.py

To simulate a chat run against a pre-defined set of questions:

In [None]:
! python main.py --simulate

To run benchmarking tests against a pre-defined set of questions and pre-defined sets of LLM configurations:

In [None]:
! python main.py --benchmark

To enable memory, so that the LLM chain is no longer stateless, add `--memory`:

In [None]:
! python main.py --memory
! python main.py --simulate --memory
! python main.py --benchmark --memory
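
The flag rules above (two mutually exclusive modes plus an independent `--memory` switch) map naturally onto `argparse`. The sketch below is one way to express them and is an assumption about structure, not a copy of the parser in `main.py`; `build_parser` and the help texts are made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the cells above."""
    parser = argparse.ArgumentParser(description="SQL LLM chain runner (sketch)")
    # argparse rejects the command line if more than one flag of this group is given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--simulate", action="store_true",
                      help="run against a pre-defined set of questions")
    mode.add_argument("--benchmark", action="store_true",
                      help="run benchmarking tests")
    parser.add_argument("--memory", action="store_true",
                        help="keep conversation history between questions")
    return parser


args = build_parser().parse_args(["--simulate", "--memory"])
print(args.simulate, args.benchmark, args.memory)
```

With this layout, passing both `--simulate` and `--benchmark` makes `argparse` print a usage error and exit, so no manual check is needed.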

# JUSTIFICATIONS AND EXPLANATIONS
For considerations and justifications, refer to `README.md`.