# Dependencies

In [1]:
# !conda install psycopg2
# !pip install sqlalchemy
# !pip install langchain
# !pip install llama-index
# !pip install llama-index-llms-huggingface
# !pip install torch
# !pip install llama-index-embeddings-langchain
# !pip install bitsandbytes
# !pip install sentence_transformers
# %pip install llama-index-readers-web
# %pip install llama-index-vector-stores-postgres

# Embedding Model

The BAAI BGE model, developed by the Beijing Academy of Artificial Intelligence (BAAI), is an embedding model that encodes text into high-dimensional vectors. These vectors capture semantic relationships in the text, which makes them useful for a wide range of natural language processing (NLP) tasks. In this blog we will use this model to encode our text data, and at a later stage we will use the encoded data to run a similarity search for a given query.

In [2]:
embedding_model_name = "BAAI/bge-large-en-v1.5"

In [3]:
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding

# Create the embedding model using the HuggingFaceBgeEmbeddings class
embed_model = LangchainEmbedding(
  HuggingFaceBgeEmbeddings(model_name=embedding_model_name)
)

# Get the embedding dimension of the model by doing a forward pass with a dummy input
embed_dim = len(embed_model.get_text_embedding("Hello world")) # 1024
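To make the idea of a similarity search concrete, here is a minimal sketch of ranking documents by cosine similarity. The 3-dimensional vectors and the names `query_vec` and `doc_vecs` are made up for illustration and stand in for the real 1024-dimensional embeddings. Cosine similarity is a common choice for comparing embeddings, but treating it as the metric used later in this blog is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real embeddings
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "pricing": [0.9, 0.1, 0.8],
    "refunds": [0.0, 1.0, 0.0],
}

# Rank document names from most to least similar to the query
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

With real embeddings, the vectors would come from `embed_model.get_text_embedding(...)` instead of being written by hand.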

# Knowledge

There are many ways to build a knowledge base. The most common sources are PDF and text files, but HTML files and live webpages work as well. Since we're building a chatbot for the E2E website to help users find the information they need, we will use the webpages directly. Llama-Index provides a simple way to create a document from a webpage. We mainly use the homepage and the product, about us, contact us, contact sales, terms and conditions, privacy policy, and FAQ pages; you can change this list to suit your requirements.

Note that a high-quality knowledge base is key to the chatbot's success, so a well-curated PDF containing just the essential information is the best source. For the purposes of this blog, though, the webpages are enough.

In [6]:
from llama_index.readers.web import SimpleWebPageReader

documents = SimpleWebPageReader(html_to_text=True).load_data(
    [
        "https://www.e2enetworks.com/",
        "https://www.e2enetworks.com/products",
        "https://www.e2enetworks.com/about-us",
        "https://www.e2enetworks.com/contact-us",
        "https://www.e2enetworks.com/contact-sales",
        "https://www.e2enetworks.com/policies/service-level-agreement",
        "https://www.e2enetworks.com/policies/terms-of-service",
        "https://www.e2enetworks.com/policies/privacy-policy",
        "https://www.e2enetworks.com/policies/refund-policy",
        "https://www.e2enetworks.com/policy-faq",
        "https://www.e2enetworks.com/countries-served",
    ]
)

# Settings

By default, llama-index uses OpenAI's LLM. We do not want that; we want to use our own local LLM, Mixtral 8x7B, so we need to change llama-index's settings to use our local LLM and embedding model. Indexing itself only needs the embedding model, so in the cell below the LLM is set to None for now.

We also need to configure how the knowledge base is created. The most important settings are chunk_size and chunk_overlap. A document in our knowledge base can be very long. Indexing a whole document as a single unit is inefficient and leads to poor retrieval quality, so we split each document into chunks and index every chunk separately. When we query the knowledge base, we then retrieve only the relevant chunks rather than the whole document, which makes retrieval faster and more efficient. chunk_size is the size of each chunk and chunk_overlap is the amount of content shared between consecutive chunks (in llama-index, both are measured in tokens).

In [7]:
from llama_index.core import Settings

In [8]:
Settings.llm = None
Settings.embed_model = embed_model

Settings.chunk_size = 1024
Settings.chunk_overlap = 512

LLM is explicitly disabled. Using MockLLM.
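To illustrate how chunk_size and chunk_overlap interact, here is a minimal sliding-window sketch. The helper `chunk_tokens` is hypothetical and is not llama-index's actual splitter, which also respects sentence boundaries; it only shows that each new chunk starts `chunk_size - chunk_overlap` tokens after the previous one.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into overlapping windows (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Ten dummy "tokens" split with a window of 4 and an overlap of 2
tokens = list(range(10))
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

With chunk_size = 1024 and chunk_overlap = 512 as in the settings above, each chunk shares half of its content with the next one, which raises the number of stored embeddings in exchange for a lower chance of cutting a relevant passage in two.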


# Database

Next we set up the database. We use PostgreSQL with the pgvector extension as the vector store for the knowledge base.

In [None]:
connection_string = "postgresql://postgres:test123@localhost:5432"
db_name = "chatbotdb"
table_name = 'companyDocEmbeddings'

In [None]:
import psycopg2

# Connect to the database
conn = psycopg2.connect(connection_string)
# Set autocommit to True to avoid having to commit after every command
conn.autocommit = True

# Create the database
# If it already exists, then delete it and create a new one
with conn.cursor() as c:
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")

# Index the knowledge

Once we've created the database that will hold the embeddings of the knowledge base, we start indexing the documents. Here we use PGVector to store the embeddings.

In [9]:
from sqlalchemy import make_url
from llama_index.vector_stores.postgres import PGVectorStore

# Creates a URL object from the connection string
url = make_url(connection_string)

# Create the vector store
vector_store = PGVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name=table_name,
    embed_dim=embed_dim,
)
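To see which pieces the vector store receives, the connection string can be split into its components. The sketch below uses the standard library's `urlsplit` as a stand-in for `make_url`, so it runs without SQLAlchemy installed; the `params` dictionary is only an illustration and is not part of the PGVectorStore API.

```python
from urllib.parse import urlsplit

# Same connection string as defined earlier in this notebook
connection_string = "postgresql://postgres:test123@localhost:5432"

parts = urlsplit(connection_string)

# Components that map onto the host/port/user/password arguments
params = {
    "host": parts.hostname,
    "port": parts.port,
    "user": parts.username,
    "password": parts.password,
}
print(params)
```

The database name is passed separately because the connection string here does not include one.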

In [10]:
from llama_index.core.storage.storage_context import StorageContext

# Create the storage context to be used while indexing and storing the vectors
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [11]:
from llama_index.core import VectorStoreIndex

# Create the index
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)

Parsing nodes:   0%|          | 0/11 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/117 [00:00<?, ?it/s]

In [None]:
conn.close()

Great! The knowledge base is now indexed and ready to be queried. In the second part of the blog, we will use the encoded data to run a similarity search for a particular query.