# Custom Knowledge Chatbot w/ LlamaIndex
By Liam Ottley - YouTube: https://www.youtube.com/@LiamOttley

Examples:
- https://gita.kishans.in/
- https://www.chatpdf.com/

In [None]:
!pip install llama_index
!pip install langchain

# Basic LlamaIndex Usage Pattern

In [2]:
import os

from dotenv import load_dotenv
load_dotenv()

os.environ['OPENAI_API_KEY'] = os.getenv('API_KEY')

In [3]:
# Load you data into 'Documents' a custom type by LlamaIndex

from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('./data').load_data()

In [4]:
# Create an index of your documents

from llama_index import GPTSimpleVectorIndex

index = GPTSimpleVectorIndex(documents)

INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 1321 tokens


In [5]:
# Query your index!

response = index.query("What do you think of Facebook's LLaMa?")
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1448 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 11 tokens



I think Facebook's LLaMa is a great step forward in democratizing access to large language models and advancing research in this subfield of AI. It is encouraging to see that they are making the model available at several sizes and providing a model card to detail how it was built in accordance with responsible AI practices. I am also glad to see that they are releasing the model under a noncommercial license to ensure integrity and prevent misuse.


# Customize your LLM for different output

In [26]:
# import necessary libraries
from llama_index import LLMPredictor, GPTSimpleVectorIndex, PromptHelper
import openai


# define LLM
class OpenAILanguageModel(openai.Completion):
    def generate_prompt(self, prompt: str, max_tokens: int, **kwargs) -> str:
        response = self.create_prompt(prompt=prompt, max_tokens=max_tokens, **kwargs)
        return response.choices[0].text

    def agenerate_prompt(self, prompt: str, max_tokens: int, **kwargs) -> str:
        response = self.create_prompt(prompt=prompt, max_tokens=max_tokens, **kwargs)
        return [choice.text for choice in response.choices]

llm_predictor = LLMPredictor(llm=OpenAILanguageModel(temperature=0, model_name="gpt-3.5-turbo"))

# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

custom_LLM_index = GPTSimpleVectorIndex(
    documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)


INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 1321 tokens


In [27]:
# Query your index!

response = custom_LLM_index.query("What do you think of Facebook's LLaMa?")
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1448 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 11 tokens



I think Facebook's LLaMa is a great step forward in democratizing access to large language models and advancing research in this subfield of AI. It is encouraging to see that they are making the model available at several sizes and providing a model card to detail how it was built in accordance with responsible AI practices. I am also glad to see that they are releasing the model under a noncommercial license to ensure integrity and prevent misuse.


# Wikipedia Example

In [13]:
from llama_index import download_loader

WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()
wikidocs = loader.load_data(pages=['Cyclone Freddy'])

# https://en.wikipedia.org/wiki/Cyclone_Freddy

In [14]:
wiki_index = GPTSimpleVectorIndex(wikidocs)

INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 7896 tokens


In [15]:
response = wiki_index.query("What is cyclone freddy?")
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 3843 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 8 tokens




Cyclone Freddy is a very intense tropical cyclone that traversed the southern Indian Ocean for more than five weeks in February and March 2023. It is both the longest-lasting and highest-ACE-producing tropical cyclone ever recorded worldwide. Additionally, it is the third-deadliest tropical cyclone recorded in the Southern Hemisphere, only behind 2019's Cyclone Idai and the 1973 Flores cyclone. It caused catastrophic flooding, wind damage, and loss of life in Madagascar, Mauritius, Réunion, Mozambique, and Malawi. In Mauritius, the cyclone passed within 200 km (120 mi) of the island, just north of Grand Bay. Strong winds and waves were observed along the northern coast of Mauritius. Winds in Port Louis reached 104 km/h (65 mph) while a peak gust of 154 km/h (96 mph) was observed on Signal Mountain. Flooding and gale-force winds also affected the country, resulting in one fatality and at least 500 displaced families in a variety of shelters across Mauritius. Contact was also lost with

# Customer Support Example

In [16]:
documents = SimpleDirectoryReader('./asos').load_data()

In [17]:
index = GPTSimpleVectorIndex(documents)

INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 12591 tokens


In [18]:
response = index.query("What premier service options do I have in the UAE?")
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1328 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 11 tokens



In the United Arab Emirates, you have the option of signing up for ASOS Premier, which gives you free Standard and Express delivery all year round when you spend over 150 AED. It costs 200 AED and is valid on the order you purchase it on. You will receive an email once it is active.


# YouTube Video Example

In [19]:
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")

loader = YoutubeTranscriptReader()
documents = loader.load_data(ytlinks=['https://www.youtube.com/watch?v=K7Kh9Ntd8VE&ab_channel=DaveNick'])

In [20]:
index = GPTSimpleVectorIndex(documents)

INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 18181 tokens


In [21]:
response = index.query("What some YouTube automation mistakes to avoid?")
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4019 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 8 tokens




1. Re-uploading other people's content without permission.
2. Using copyrighted music.
3. Not understanding how the YouTube algorithm works.
4. Not researching the best niche for YouTube automation.
5. Not optimizing the About section with relevant keywords.
6. Not creating a logo and channel art.


# Chatbot Class - Just include your index

In [22]:
import openai
import json

class Chatbot:
    def __init__(self, api_key, index):
        self.index = index
        openai.api_key = api_key
        self.chat_history = []

    def generate_response(self, user_input):
        prompt = "\n".join([f"{message['role']}: {message['content']}" for message in self.chat_history[-5:]])
        prompt += f"\nUser: {user_input}"
        response = index.query(user_input)

        message = {"role": "assistant", "content": response.response}
        self.chat_history.append({"role": "user", "content": user_input})
        self.chat_history.append(message)
        return message
    
    def load_chat_history(self, filename):
        try:
            with open(filename, 'r') as f:
                self.chat_history = json.load(f)
        except FileNotFoundError:
            pass

    def save_chat_history(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.chat_history, f)


In [23]:
documents = SimpleDirectoryReader('./data').load_data()
index = GPTSimpleVectorIndex(documents)

INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 1321 tokens


In [None]:
# Swap out your index below for whatever knowledge base you want
bot = Chatbot("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", index=index)
bot.load_chat_history("chat_history.json")

while True:
    user_input = input("You: ")
    if user_input.lower() in ["bye", "goodbye"]:
        print("Bot: Goodbye!")
        bot.save_chat_history("chat_history.json")
        break
    response = bot.generate_response(user_input)
    print(f"Bot: {response['content']}")