<br>

## Initial Setup

---

In [1]:
# Import the libraries
import os
import pandas as pd
import importlib
import langchain
from langchain.document_loaders import *
from sentence_transformers import SentenceTransformer
from langchain.schema import Document

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [2]:
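# Move up one level so the working directory is the project root
# (run this cell only once per kernel session; each rerun moves up another level)
os.chdir('..')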

# Print the current working directory
os.getcwd()  

'/home/predator/Desktop/code base/enigma-code'

In [3]:
# Import the local modules
import enigma_code
import enigma_code as ec
from enigma_code import data
from enigma_code import vectorstore

In [4]:
# Reload the local modules so source edits are picked up without restarting the kernel.
# `ec` is an alias for the same module object as `enigma_code`, so one reload covers both.
importlib.reload(enigma_code)

# Submodules imported by name must be reloaded separately
importlib.reload(data)
importlib.reload(vectorstore)

<module 'enigma_code.vectorstore' from '/home/predator/Desktop/code base/enigma-code/enigma_code/vectorstore.py'>

<br>

## Load Dataset

---

In this section, we load data from two sources: a local text file and a MySQL database.

In [5]:
# Instantiate the data handler
data_handler = ec.data.DataHandler()

In [6]:
# Load a text dataset
dataset = data_handler.load_dataset("datasets/rules_and_regulations.txt")
dataset

[Document(metadata={'source': 'datasets/rules_and_regulations.txt'}, page_content="1. Guests will be charged per night.\n2. Check-in time is 2:00 p.m. on the day of arrival and the check-out time is 12:00 p.m. on the day of departure.\n3. If upon check-in the guest does not specify the duration of his or her stay the hotel will presume it is for one night.\n4. Guest must show up before 2 p.m., and then the booking will be cancelled if there is no contact from guest.\n5. Only 2 guests are allowed to stay in a room. Children older than 7 years old or the third guest must pay 500 Thai baht for extra bed.\n6. Baby cot can be provided to a child under 2 years old upon request.\n7. Bills must be settled by cash or valid credit card, personal cheques are not accepted.\n8. Guests wishing to check out later than 2:00 p.m. are kindly requested to ask for permission from our reception before 12:00 p.m. (subject to room availability, otherwise it will be charged for one additional night.\n9. Smoki

In [7]:
# Split the dataset into one chunk per line (chunk_size and chunk_overlap are left unset)
chunks = data_handler.chunk_dataset(documents=dataset, chunk_type="newline", chunk_size=None, chunk_overlap=None)
chunks

[Document(page_content='1. Guests will be charged per night.'),
 Document(page_content='2. Check-in time is 2:00 p.m. on the day of arrival and the check-out time is 12:00 p.m. on the day of departure.'),
 Document(page_content='3. If upon check-in the guest does not specify the duration of his or her stay the hotel will presume it is for one night.'),
 Document(page_content='4. Guest must show up before 2 p.m., and then the booking will be cancelled if there is no contact from guest.'),
 Document(page_content='5. Only 2 guests are allowed to stay in a room. Children older than 7 years old or the third guest must pay 500 Thai baht for extra bed.'),
 Document(page_content='6. Baby cot can be provided to a child under 2 years old upon request.'),
 Document(page_content='7. Bills must be settled by cash or valid credit card, personal cheques are not accepted.'),
 Document(page_content='8. Guests wishing to check out later than 2:00 p.m. are kindly requested to ask for permission from our 

In [8]:
len(dataset)

1

In [10]:
len(chunks)

34
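
The counts above show one source document turning into 34 line-level chunks. As a rough illustration of what a `"newline"` chunking strategy could look like, here is a minimal sketch. It is not the actual `DataHandler.chunk_dataset` implementation: `Doc` is a hypothetical stand-in for `langchain.schema.Document`, and `chunk_by_newline` is an illustrative name. Unlike the output above, this sketch copies the source metadata onto every chunk.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain.schema.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunk_by_newline(documents):
    """Split each document into one chunk per non-empty line."""
    chunks = []
    for doc in documents:
        for line in doc.page_content.splitlines():
            line = line.strip()
            if line:  # skip blank lines between rules
                # Copy the metadata so each chunk can be traced back to its source
                chunks.append(Doc(page_content=line, metadata=dict(doc.metadata)))
    return chunks


sample = [
    Doc(
        page_content="1. Guests will be charged per night.\n\n2. Check-in time is 2:00 p.m.",
        metadata={"source": "example.txt"},
    )
]
sample_chunks = chunk_by_newline(sample)
print(len(sample_chunks))
```

Line-level chunks suit a numbered rule list like this one, because each rule is a self-contained statement that can be retrieved on its own.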

In [11]:
# TODO: Add functions for cleaning (e.g., Beautiful Soup, regex, etc.)
# TODO: Add functions for preprocessing (e.g., tokenization, lemmatization, etc.)
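
As a starting point for the cleaning TODO, a regex-only helper might look like the sketch below. It uses only the standard library. `clean_text` is a hypothetical name and not part of `enigma_code`. For real HTML input a proper parser such as Beautiful Soup would likely be more robust than a regex.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")  # naive match for HTML-like tags
_WS_RE = re.compile(r"\s+")       # any run of whitespace


def clean_text(text):
    """Strip HTML-like tags, decode entities, and collapse whitespace."""
    text = _TAG_RE.sub(" ", text)  # remove tags first
    text = html.unescape(text)     # then decode entities such as &amp;
    return _WS_RE.sub(" ", text).strip()


cleaned = clean_text("<p>Bills must be settled by cash   &amp; card.</p>")
print(cleaned)
```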

In [12]:
# Instantiate the database query tool
db_query_tool = ec.data.DatabaseQueryTool(
    database_type="mysql",
    username=os.environ.get("MYSQL_USERNAME"),
    password=os.environ.get("MYSQL_PASSWORD"),
    host=os.environ.get("MYSQL_HOST"),
)
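
Note that `os.environ.get` returns `None` for an unset variable, so a missing credential would only surface later as a connection error. A small guard can fail fast instead. This is a sketch: `require_env` is a hypothetical helper, not part of `enigma_code`.

```python
import os


def require_env(*names):
    """Return the values of the given environment variables, in order.

    Raises RuntimeError listing every variable that is unset or empty.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[name] for name in names]


# Assumed usage before constructing the query tool:
# username, password, host = require_env("MYSQL_USERNAME", "MYSQL_PASSWORD", "MYSQL_HOST")
```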

In [None]:
# List all available tables
db_query_tool.run_sql_query("SHOW TABLES")

In [None]:
# Load the tables
chat_history_df = pd.read_sql_query(sql="SELECT * FROM company_x.chat_history", con=db_query_tool.engine)
company_knowledge_base_df = pd.read_sql_query(sql="SELECT * FROM company_x.company_knowledge_base", con=db_query_tool.engine)
faq_df = pd.read_sql_query(sql="SELECT * FROM company_x.faq", con=db_query_tool.engine)
users_df = pd.read_sql_query(sql="SELECT * FROM company_x.users", con=db_query_tool.engine)

# Preview the first rows of each table
display("Chat History: ", chat_history_df.head())
display("Company Knowledge Base: ", company_knowledge_base_df.head())
display("FAQ: ", faq_df.head())
display("Users: ", users_df.head())

<br>

## Vectorstore

---

In this section, we embed the knowledge base documents and store the resulting vectors in the vector store, where they can be retrieved during retrieval-augmented generation (RAG).
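
To make the idea concrete, the sketch below walks through the embed, store, and search steps in plain Python. It deliberately swaps in a toy bag-of-words vector for a real embedding model and a Python list for the vector store, so it shows the mechanics only. None of these function names come from `enigma_code`, Pinecone, or OpenAI.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (token -> count). NOT a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, store, k=1):
    """Return the k stored texts most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


texts = [
    "Check-in time is 2:00 p.m.",
    "Personal cheques are not accepted.",
]
store = [(text, embed(text)) for text in texts]  # the "vector store"
top_hit = search("When is check-in time?", store)
print(top_hit)
```

A real embedding model maps text to dense vectors that capture meaning rather than exact token overlap, but the ranking step works the same way.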

In [12]:
# Initialize the VectorStoreManager
vs = ec.vectorstore.VectorStoreManager(
    vectorstore_name="pinecone",
    embedding_model="openai",
    index_name="index-name"
)

In [13]:
# Load the vectorstore
vectorstore = vs.load_vectorstore()

In [14]:
# Create a few test documents
documents = [
    Document(page_content="This is a test document about AI.", metadata={"source": "test1"}),
    Document(page_content="Vector databases are useful for similarity search.", metadata={"source": "test2"}),
    Document(page_content="LangChain is a framework for developing applications powered by language models.", metadata={"source": "test3"})
]

# Add documents to the vectorstore
vectorstore.add_documents(documents=documents)

['c1c65162-e1bc-47e0-9b03-19a5556b431d',
 'bc78a297-ed66-480d-8b20-dc2bbbaad53f',
 '537423aa-e8c9-40e9-bee8-f63ef3736cc6']

In [23]:
# Search for similar documents
query = "This is a test document about AI."
vectorstore.search(query=query, search_type="similarity")

[Document(metadata={'source': 'test1'}, page_content='This is a test document about AI.'),
 Document(metadata={'source': 'test1'}, page_content='This is a test document about AI.'),
 Document(metadata={'source': 'test1'}, page_content='This is a test document about AI.'),
 Document(metadata={'source': 'test1'}, page_content='This is a test document about AI.')]
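
The four identical hits are consistent with the same test document having been added more than once, for example by rerunning the add cell, since each call returned freshly generated IDs. One way to make repeated adds idempotent is to derive each ID from the document itself. The sketch below assumes the store upserts by ID and that `add_documents` accepts an `ids` argument, neither of which is confirmed here. `stable_id` is a hypothetical helper.

```python
import hashlib


def stable_id(page_content, source=""):
    """Derive a deterministic ID from a document's source and content."""
    digest = hashlib.sha256(f"{source}\n{page_content}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; still effectively unique here


ids = [
    stable_id("This is a test document about AI.", "test1"),
    stable_id("Vector databases are useful for similarity search.", "test2"),
]

# Assumed usage (not verified against this vector store):
# vectorstore.add_documents(documents=documents, ids=ids)
```

With content-derived IDs, re-adding an unchanged document would overwrite its existing entry instead of creating a duplicate, provided the backend treats a repeated ID as an update.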

<br>

## Fine-Tune Model

---

In this section, we will fine-tune the base model using the chat dataset.

<br>

## Agentic Tools

---

<br>

## Prompt Engineering

---