# Introduction

This project uses LlamaIndex to build a simple chatbot that answers questions about your own documents with OpenAI's GPT-4 model. The steps below set up the environment, build an index from the documents in the 'data' directory, and start a chat loop where you can ask questions and receive answers based on the indexed documents.

# Install the dependencies

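1. llama-index: The LlamaIndex package, used to build and query the index.
2. langchain: A package that provides the chat models used with LlamaIndex.
3. sentence_transformers: A package for embedding sentences with transformer models.
4. python-dotenv: A package that loads environment variables, such as the OpenAI API key, from a `.env` file.
5. pypdf: A package that lets the directory reader parse PDF files.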

In [None]:
%pip install llama-index
%pip install langchain
%pip install sentence_transformers
%pip install python-dotenv
%pip install pypdf
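
After the install cell has run, a quick check like the one below can confirm that each package is importable before you continue. This is only a sketch: `missing_modules` is a hypothetical helper, not part of any of the packages above, and it expects top-level import names (note that `python-dotenv` is imported as `dotenv`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the five packages installed above.
required = ["llama_index", "langchain", "sentence_transformers", "dotenv", "pypdf"]
print("Missing modules:", missing_modules(required))
```

An empty list means every dependency was found. If a name is listed, rerun the matching `%pip install` line and restart the kernel.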

# Define the functions
The following code defines the functions we need to build the index and query it.

In [None]:
import os

from dotenv import load_dotenv
from IPython.display import Markdown, display
from langchain.chat_models import ChatOpenAI
from llama_index import (
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)


# Load environment variables (including OPENAI_API_KEY) from the .env file
load_dotenv()


# Define the functions
def construct_index(directory_path):
    # maximum input size, in tokens
    max_input_size = 4096
    # number of tokens reserved for the model's answer
    num_outputs = 2000
    # overlap between consecutive chunks, as a fraction of the chunk size
    max_chunk_overlap = 0.2
    # maximum number of tokens per chunk
    chunk_size_limit = 600

    # define prompt helper
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

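    # define LLM; read the API key from the environment, since this cell never imports the `openai` module
    llm = ChatOpenAI(openai_api_key=os.getenv("OPENAI_API_KEY"), temperature=0.5, model_name="gpt-4", max_tokens=num_outputs)
    llm_predictor = LLMPredictor(llm=llm)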

    documents = SimpleDirectoryReader(directory_path).load_data()

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)

    # save the index to disk; ask_ai() reloads it from the same directory
    index.storage_context.persist(persist_dir="./storage")

    return index

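def ask_ai():
    # rebuild storage context from the directory written by construct_index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    # load index
    index = load_index_from_storage(storage_context)
    # create the query engine once and reuse it for every question
    query_engine = index.as_query_engine()
    while True:
        query = input("Hello! How may I help today? (type 'exit' to quit) ")
        # leave the loop instead of prompting forever
        if query.strip().lower() in {"exit", "quit"}:
            break
        response = query_engine.query(query)
        display(Markdown(f"Response: <b>{response.response}</b>"))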
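
To get a feel for how the size settings in `construct_index` interact, the sketch below estimates how many retrieved chunks fit into a single prompt. It assumes the simplest possible accounting (input budget minus the tokens reserved for the answer) and a hypothetical `prompt_overhead` allowance for the question and prompt template. The real `PromptHelper` may count tokens differently, so treat the result as a rough estimate.

```python
def chunks_per_prompt(max_input_size, num_outputs, chunk_size_limit, prompt_overhead=0):
    """Estimate how many full chunks fit next to the reserved answer tokens."""
    available = max_input_size - num_outputs - prompt_overhead
    if available <= 0:
        return 0
    return available // chunk_size_limit


# Values used in construct_index: 4096 - 2000 = 2096 tokens left for context.
print(chunks_per_prompt(4096, 2000, 600))
# Same settings, with an assumed 400 tokens set aside for the question and template.
print(chunks_per_prompt(4096, 2000, 600, prompt_overhead=400))
```

With the notebook's values, about three 600-token chunks fit. Raising `num_outputs` or `chunk_size_limit` leaves room for fewer chunks per query.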

# Set OpenAI API Key

In [None]:
import openai
openai.api_key = os.getenv('OPENAI_API_KEY')
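
`os.getenv` returns `None` when the variable is missing, so a forgotten key only shows up later as an authentication error. A small guard can fail earlier with a clearer message. This is a sketch: `require_env` is a hypothetical helper, not part of `openai` or `python-dotenv`.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it before running the notebook."
        )
    return value
```

For example, `openai.api_key = require_env("OPENAI_API_KEY")` would stop the notebook at this cell if the key is missing.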

# Construct an index
Now we are ready to construct the index. This reads every file in the 'data' folder, splits it into chunks, embeds each chunk with OpenAI's embeddings API, and saves the resulting index to the `./storage` directory.

In [None]:
construct_index("data")
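
The chunking step can be pictured with a small stand-alone example. The sketch below is not LlamaIndex's splitter: it treats whitespace-separated words as "tokens", and `split_into_chunks` is a hypothetical function whose `chunk_size` and `overlap_ratio` parameters mirror the `chunk_size_limit` and `max_chunk_overlap` settings above.

```python
def split_into_chunks(text, chunk_size=600, overlap_ratio=0.2):
    """Split text into word-based chunks where neighbours share a fraction of their words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()
    overlap = int(chunk_size * overlap_ratio)
    # Advance by the non-overlapping part of each chunk (at least one token).
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
for chunk in split_into_chunks(sample, chunk_size=5, overlap_ratio=0.2):
    print(chunk)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a chunk boundary is less likely to lose its context.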

# Ask questions

In [None]:
ask_ai()
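
Conceptually, each question is embedded and compared against the stored chunk embeddings, and the closest chunks are passed to the model as context. The toy example below illustrates that idea with hand-written three-dimensional vectors and cosine similarity. The vectors, the chunk names, and the `top_k` helper are all made up for illustration; real embeddings have far more dimensions, and the query engine's actual retrieval logic may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up embeddings for three chunks.
toy_vectors = {
    "pricing": [1.0, 0.0, 0.0],
    "install": [0.0, 1.0, 0.0],
    "billing": [0.9, 0.1, 0.0],
}
print(top_k([1.0, 0.0, 0.0], toy_vectors, k=2))
```

Here the query vector points in the same direction as "pricing" and almost the same direction as "billing", so those two chunks are selected and "install" is left out.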