<a href="https://colab.research.google.com/github/royam0820/ChatGPT-Google/blob/main/update_amr_Custom_Knowledge_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Custom Knowledge Chatbot w/ LlamaIndex
This notebook has been done via input from By Liam Ottley

Github: https://github.com/wombyz/custom-knowledge-chatbot/tree/main/custom-knowledge-chatbot

This notebook has been revised on 05/2024 to take into account LangChain new revised package.

- As per llama_index 0.8.40, `GPTSimpleVectorIndex` is deprecated and replaced by `GPTVectorStoreIndex`, and
- To do a query, you need to change from `index.query` to


```
query_engine = index.as_query_engine()
response = query_engine.query("My query")
```



Examples:
- https://gita.kishans.in/
- https://www.chatpdf.com/
- https://www.chatbase.co/create-new-chatbot


In [169]:
OPENAI_API_KEY = "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"

In [170]:
# Getting the OpenAI api key
import os
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

https://platform.openai.com/account/api-keys

In [171]:
# connecting a Google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [172]:
# !git clone https://github.com/wombyz/custom-knowledge-chatbot.git

In [173]:
!pip install llama_index -q # accessing vectorized data
!pip install langchain -q # accessing OpenAI api
from IPython.display import Markdown, display # for print formatting

- **LlamaIndex** (GPT-Index) is a project that provides a central interface to connect your LLM's with external data. It will build a vector search on your data.  It offers data connectors to your existing data sources and data formats (API's, PDF's, docs, SQL, etc.) Provides indices over your unstructured and structured data for use with LLM's. https://gpt-index.readthedocs.io/en/latest/index.html

- **LangChain** is a framework built around Large Language Models (LLMs) that can be used for various natural language processing tasks. At its core, LangChain allows developers to "chain" together different components to create more advanced use cases around LLMs. LangChain is a framework built around Large Language Models (LLMs) that can be used for various natural language processing tasks. At its core, LangChain allows developers to "chain" together different components to create more advanced use cases around LLMs  https://www.pinecone.io/learn/langchain-intro/

In [174]:
# llama index modules to import
from llama_index.core import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# Basic LlamaIndex Usage Pattern

- Get an OpenAI key
- Load data into Documents, a custom type by Llamaindex
- Create a vectorized index of your documents
- Save the index as a json file
- Load the index from the saved json file
- Query your index.

## Text Document Example (one example)

In [175]:
# Load your data into 'Documents' a custom type by LlamaIndex
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader('/content/drive/MyDrive/chatbot_knowledge/data').load_data()

NB: As per the revised LangChain version the `SimpleDirectoryReader` must come from the package `llama_index.core`

In [176]:
# Create a vectorized index of your documents
from llama_index.core import GPTVectorStoreIndex
index = GPTVectorStoreIndex.from_documents(documents)

Based on the provided web search results, **GPTVectorStoreIndex** is an indexing system used to store and retrieve embeddings generated by Large Language Models (LLMs) like GPT. The embeddings are created during the index construction phase and stored in a vectorized index. This index allows for efficient querying and finding relevant parts of data based on the similarity of the query and the data [1][2]. The GPTVectorStoreIndex is one of several indexing systems available in the GPT Index project, including GPTChromaIndex and GPTTreeIndex

In [177]:
# # amr
# # test on GPTVectorStoreIndex
# from llama_index.core import StorageContext, load_index_from_disk

# index.storage_context.persist(persist_dir="./storage")

# loaded_index = load_index_from_disk(StorageContext.from_defaults(persist_dir="./storage"))

In [178]:
index.storage_context.persist()

In [179]:
# Querying your index!
query_engine = index.as_query_engine()
response = query_engine.query("What do you think of Facebook's LLaMa?")
print(response)

Facebook's LLaMA is a state-of-the-art foundational large language model designed to help researchers advance their work in the field of AI. It is released as part of Meta's commitment to open science, aiming to democratize access to large language models by providing smaller, more performant models that require less computing power and resources. LLaMA is available in several sizes and is intended for fine-tuning for a variety of tasks. Additionally, Facebook emphasizes responsible AI practices by sharing a model card detailing how LLaMA was built and releasing the model under a noncommercial license for research purposes. The company encourages collaboration across the AI community to address challenges such as bias, toxicity, and hallucinations in large language models.


In [180]:
# Displaying the response in bold, using markdown
display(Markdown(f"<b>{response}</b>"))

<b>Facebook's LLaMA is a state-of-the-art foundational large language model designed to help researchers advance their work in the field of AI. It is released as part of Meta's commitment to open science, aiming to democratize access to large language models by providing smaller, more performant models that require less computing power and resources. LLaMA is available in several sizes and is intended for fine-tuning for a variety of tasks. Additionally, Facebook emphasizes responsible AI practices by sharing a model card detailing how LLaMA was built and releasing the model under a noncommercial license for research purposes. The company encourages collaboration across the AI community to address challenges such as bias, toxicity, and hallucinations in large language models.</b>

## Customer Support Example (several documents)

In [181]:
documents = SimpleDirectoryReader('/content/drive/MyDrive/chatbot_knowledge/asos').load_data()

In [182]:
index = GPTVectorStoreIndex.from_documents(documents)

In [183]:
index.storage_context.persist()

In [184]:
# Querying your index!
query_engine = index.as_query_engine()
response = query_engine.query("What premier service options do I have in the UAE?")
print(response)

You have the ASOS Premier service option available in the United Arab Emirates.


In [185]:
# Displaying the response in bold, using markdown
display(Markdown(f"<b>{response}</b>"))

<b>You have the ASOS Premier service option available in the United Arab Emirates.</b>

## PDF Example

In [186]:
#downloading the Meta earnings pdf file
# !wget -P "/content/drive/MyDrive/pdf_docs" https://s21.q4cdn.com/399680738/files/doc_financials/2022/q4/Meta-12.31.2022-Exhibit-99.1-FINAL.pdf

In [187]:
from pathlib import Path
from llama_index.core import download_loader

PDFReader = download_loader("PDFReader")


loader = PDFReader()

documents = loader.load_data(file=Path('/content/drive/MyDrive/chatbot_knowledge/pdf/Don’t Fear the Terminator.pdf'))

  PDFReader = download_loader("PDFReader")


In [188]:
#from llama_index import GPTVectorStoreIndex
index = GPTVectorStoreIndex.from_documents(documents)

In [189]:
index.storage_context.persist()

In [190]:
# Querying your index!
query_engine = index.as_query_engine()
response = query_engine.query("Give us a summarization of the document?")
print(response)

The document discusses the fear surrounding artificial intelligence (AI) potentially surpassing human intelligence and the concerns of AI taking over the world. It highlights that the idea of AI seeking dominance is a misconception rooted in human evolutionary history. The text emphasizes that AI does not inherently possess a survival instinct like biological organisms do. Instead, AI's intelligence can be directed towards goals set by humans. The document also mentions the more realistic risks posed by AI, such as economic disruptions, life-critical failures, and weaponization. Ultimately, it suggests that the focus should be on understanding and managing the practical risks of AI rather than worrying about a scenario where AI seeks to dominate humanity.


In [191]:
# Querying your index!
query_engine = index.as_query_engine()
response = query_engine.query("can you give us key points in a bullet list")
print(response)

- Isaac Asimov's First Law of Robotics: "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
- Speculation on the link between intelligence and dominance in different species like elephants and orangutans.
- Dominance-seeking behavior is more correlated with testosterone than with intelligence.
- Concerns about AI being weaponized and disrupting the economy, potentially displacing jobs.
- AI may lead to wealth and income inequalities unless new fiscal policies are implemented.
- Risks associated with AI technology, including the "unknown unknowns."
- AI's potential to bring about profound transformations, from releasing humans from mundane work to the risk of World War III fought by superintelligent robots.
- The misconception of AI takeover due to intelligence and drive for dominance.
- Explanation of how natural intelligence in biological organisms differs from artificial intelligence in terms of survival instinct.
- AI's intelligence can 

In [192]:
# Query your index!
response = query_engine.query("who wrote this document and what year")
print(response)
#display(Markdown(f"<b>{response}</b>"))

Anthony Zador and Yann LeCun wrote the document in 2019.


## Wikipedia Example

In [193]:
!pip install -q wikipedia

In [194]:
import wikipedia

In [195]:
# Testing on a wikipedia page
page = wikipedia.page("Cyclone Freddy", auto_suggest=False)
print(page.summary)

Very Intense Tropical Cyclone Freddy, also known as Severe Tropical Cyclone Freddy, was an exceptionally long-lived, powerful, and deadly tropical cyclone that traversed the southern Indian Ocean for more than five weeks in February and March 2023. Freddy is both the longest-lasting and highest-ACE-producing tropical cyclone ever recorded worldwide, traveling across the southern Indian Ocean, Mozambique, and Madagascar for 37 days and producing 87.01 units of ACE. Additionally, it is the third-deadliest tropical cyclone recorded in the Southern Hemisphere, only behind 2019's Cyclone Idai and the 1973 Flores cyclone. Freddy was the fourth named storm of the 2022–23 Australian region cyclone season, and the second very intense tropical cyclone of the 2022–23 South-West Indian Ocean cyclone season.
Preparations for the storm in the Mascarene Islands included flight groundings, cyclone alerts, and personnel being prepped for the aftermath, among other things. In Madagascar, areas previousl

Now let's load the page and query it via LangChain

In [196]:
from langchain.document_loaders import WikipediaLoader

# Charger le contenu depuis Wikipedia en utilisant WikipediaLoader
loader = WikipediaLoader("Machine_learning")
wikidocs = loader.load()

In [197]:
type(wikidocs)

list

In [None]:
# # Issue with GPTVectorStoreIndex and Wikipedia --- NOT SOLVED
# index = GPTVectorStoreIndex.from_documents(
#     wikidocs, storage_context=storage_context,
#     service_context=service_context
# )


In [None]:
# wikidocs = str(list(wikidocs))

In [None]:
# type(wikidocs)

In [None]:
# wikidocs

In [None]:
# wiki_index = GPTVectorStoreIndex.from_documents(wikidocs)

In [None]:
# index.storage_context.persist()

In [None]:
# # Query your index!
# response = query_engine.query("what is Cyclone Freddy")
# print(response)
# #display(Markdown(f"<b>{response}</b>"))

In [None]:
# from llama_index.core import download_loader

# WikipediaReader = download_loader("WikipediaReader")

# loader = WikipediaReader()
# #wikidocs = loader.load_data(pages=['Cyclone Freddy'])
# wikidocs = loader.load_data(pages=page)

[Cyclone Freddy](https://en.wikipedia.org/wiki/Cyclone_Freddy)

In [None]:
# response = wiki_index.query("What is cyclone freddy?")
# #print(response)
# display(Markdown(f"<b>{response}</b>"))

In [None]:
# response = wiki_index.query("What country were affected by cyclone freddy?")
# #print(response)
# display(Markdown(f"<b>{response}</b>"))

NB:
- `max_input_size`: Maximum input size for the LLM.
- `num_output`: Number of outputs for the LLM.
- `max_chunk_overlap`: Maximum chunk overlap for the LLM. For instance, if you max_input_size is 1000 then the first sentence will have 100 characters and the next sentence will have 800 characters onward.


## YouTube Video Example

In [200]:
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")

loader = YoutubeTranscriptReader()
documents = loader.load_data(ytlinks=['https://www.youtube.com/watch?v=K7Kh9Ntd8VE&ab_channel=DaveNick'])

  YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")


In [201]:
# Indexing the documents
index = GPTVectorStoreIndex.from_documents(documents)

In [202]:
index.storage_context.persist()

In [203]:
# Querying your index!
query_engine = index.as_query_engine()
response = query_engine.query("Whare are some YouTube automation mistakes to avoid")
print(response)

Avoid the mistake of thinking that YouTube automation involves stealing someone else's content and reposting it on your channel for profit. This is illegal and not a legitimate way to run a YouTube automation business. It's important to understand the right way to create and manage a YouTube automation system to build a sustainable online business. Additionally, a common mistake to avoid is not properly setting up your automation systems to get views, subscribers, and ultimately make money. It's crucial to follow the correct steps and strategies to ensure the success of your YouTube automation business.


In [204]:
#print(response)
display(Markdown(f"<b>{response}</b>"))

<b>Avoid the mistake of thinking that YouTube automation involves stealing someone else's content and reposting it on your channel for profit. This is illegal and not a legitimate way to run a YouTube automation business. It's important to understand the right way to create and manage a YouTube automation system to build a sustainable online business. Additionally, a common mistake to avoid is not properly setting up your automation systems to get views, subscribers, and ultimately make money. It's crucial to follow the correct steps and strategies to ensure the success of your YouTube automation business.</b>

## SQL Example
A separate notebook entitled `amr_langchain_sql` has been done.

# Customize your LLM for different output

In [205]:
!pip install -q llama-index-llms-langchain

In [206]:
from langchain import OpenAI
from llama_index.core import llms, GPTVectorStoreIndex, PromptHelper, ServiceContext


In [207]:
# indicate which model to use
model = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.1)

In [208]:
# define prompt helper
#   set maximum input size
max_input_size = 4096
#   set number of output tokens
num_output = 256
#   set maximum chunk overlap
max_chunk_overlap = 1.
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

# custom_LLM_index = GPTVectorStoreIndex(
#     documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
# )

#service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
service_context = ServiceContext.from_defaults(llm = model, prompt_helper=prompt_helper)


  service_context = ServiceContext.from_defaults(llm = model, prompt_helper=prompt_helper)


In [209]:
# Setup your LLM - Building the bot

#from llama_index.core import LLMPredictor, GPTVectorStoreIndex, PromptHelper, ServiceContext
from llama_index.core import llms, GPTVectorStoreIndex, PromptHelper, ServiceContext


# define LLM
model = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.1)


#llm_predictor = llms(llm=OpenAI(temperature=0.1, model_name="text-davinci-002"))


# define prompt helper
#   set maximum input size
max_input_size = 4096
#   set number of output tokens
num_output = 256
#   set maximum chunk overlap
max_chunk_overlap = 1.0
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

# custom_LLM_index = GPTVectorStoreIndex(
#     documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
# )

service_context = ServiceContext.from_defaults(llm=model, prompt_helper=prompt_helper)


  service_context = ServiceContext.from_defaults(llm=model, prompt_helper=prompt_helper)


In [210]:
llm.model_name

'gpt-3.5-turbo-instruct'

In [211]:
llm.temperature

0.1

# Chatbot Class - Just include your index

In [212]:
# Defining a chatbot
import openai
import json

class Chatbot:
    def __init__(self, api_key, index):
        self.index = index
        openai.api_key = api_key
        self.chat_history = []

    def generate_response(self, user_input):
        prompt = "\n".join([f"{message['role']}: {message['content']}" for message in self.chat_history[-5:]])
        prompt += f"\nUser: {user_input}"
        #response = index.query(user_input)
        response = query_engine.query(user_input)

        message = {"role": "assistant", "content": response.response}
        self.chat_history.append({"role": "user", "content": user_input})
        self.chat_history.append(message)
        return message

    def load_chat_history(self, filename):
        try:
            with open(filename, 'r') as f:
                self.chat_history = json.load(f)
        except FileNotFoundError:
            pass

    def save_chat_history(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.chat_history, f)


In [213]:
# Load your data into 'Documents' a custom type by LlamaIndex
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader('/content/drive/MyDrive/chatbot_knowledge/data').load_data()

In [214]:
# Create a vectorized index of your documents
from llama_index.core import GPTVectorStoreIndex
index = GPTVectorStoreIndex.from_documents(documents)

In [215]:
index.storage_context.persist()

In [216]:
# prompt = "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\nHuman: Hello, who are you?\nAI: I am an AI created by OpenAI. How can I help you today?\nHuman: "

In [217]:
# Querying your index!
query_engine = index.as_query_engine()
response = query_engine.query("What do you think of Facebook's LLaMa?")
print(response)

Facebook's LLaMA is a state-of-the-art foundational large language model designed to support researchers in the field of AI. It is created to be versatile and applicable to various tasks, unlike fine-tuned models that are specific to particular functions. The model is trained on a large set of unlabeled data and is available in different sizes to facilitate research and experimentation in the AI community. Additionally, Facebook is committed to responsible AI practices and has released LLaMA under a noncommercial license for research purposes, aiming to address challenges such as bias, toxicity, and misinformation in large language models.


In [218]:
# Swap out your index below for whatever knowledge base you want
bot = Chatbot("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", index=index)
bot.load_chat_history("chat_history.json")

while True:
    user_input = input("You: ")
    if user_input.lower() in ["bye", "goodbye"]:
        #print("Bot: Goodbye!")
        display(Markdown(f"<b>Bot: Goodbye!</b>"))
        bot.save_chat_history("chat_history.json")
        break
    response = bot.generate_response(user_input)
    #print(f"Bot: {response['content']}")
    display(Markdown(f"<b>Bot: {response['content']}</b>"))


You: What is Llama?


<b>Bot: LLaMA (Large Language Model Meta AI) is a state-of-the-art foundational large language model designed to assist researchers in advancing their work in the field of AI. It is available in several sizes (7B, 13B, 33B, and 65B parameters) and is intended to be versatile for various tasks. LLaMA works by predicting the next word in a sequence of words to generate text recursively. It is trained on a large set of unlabeled data and aims to democratize access to large language models by requiring less computing power and resources compared to larger models.</b>

You: bye


<b>Bot: Goodbye!</b>

# Gradio Interface for the Chatbot
https://github.com/afizs/chatgpt-clone/blob/main/mini_ChatGPT.ipynb

In [219]:
!pip install gradio -q
import gradio as gr

In [220]:
def chatbot_response(user_input):
    if user_input.lower() in ["bye", "goodbye"]:
        #print("Bot: Goodbye!")
        return "Bot: Goodbye!"
    response = bot.generate_response(user_input)
    #print(f"Bot: {response['content']}")
    return f"Bot: {response['content']}"

iface = gr.Interface(fn=chatbot_response, inputs="text", outputs="text", title="Chatbot", description="Enter text to chat with the bot.")
iface.launch()


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://a1b152803ddcfe4a88.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [226]:
with open("chat_history.json", "r") as f:
    print(f.read())

[{"role": "user", "content": "what is Llama?"}, {"role": "assistant", "content": "I cannot provide an answer to the query as there is no mention of \"Llama\" in the provided context information."}, {"role": "user", "content": "buy"}, {"role": "assistant", "content": "You can earn commissions through affiliate marketing by promoting products in your niche and including affiliate links in your video descriptions. This allows you to earn a percentage of the sale whenever someone makes a purchase through your link."}, {"role": "user", "content": "youtube infos"}, {"role": "assistant", "content": "To optimize a YouTube channel, it is important to set up a logo or profile picture, channel art, and an informative about section with relevant keywords related to the videos being created. Utilizing tools like Photoshop or free alternatives such as Photopia or Canva can help in creating logos and channel art. Additionally, tools like VidIQ can assist in boosting YouTube views and generating video

# Ressources

- [Llamaindex documentarion](https://gpt-index.readthedocs.io/en/latest/how_to/analysis/playground.html#sample-code)

- [Llama hub](https://llamahub.ai/) and [notebooks](https://github.com/jerryjliu/llama_index/tree/main/examples/data_connectors)

- [Langchain](https://www.pinecone.io/learn/langchain-intro/#:~:text=At%20its%20core%2C%20LangChain%20is,advanced%20use%20cases%20around%20LLMs.)

- [Chatgpt api and whisper api](https://openai.com/blog/introducing-chatgpt-and-whisper-apis)

- [Openai Playground](https://platform.openai.com/playground)

- [Openai community](https://community.openai.com/)

- [Gutenberg.org for free ebooks](https://www.gutenberg.org/)

- [Don't fear the terminator](https://blogs.scientificamerican.com/observations/dont-fear-the-terminator/)

- [Openai Discord Channel](https://discord.com/channels/974519864045756446/1037561178286739466/1081542418714873929)


# Glossary
- **unstructured data**: If documents are not in a database or spreadsheet format, they’re “unstructured.” An “Unstructured Document” is a document that may contain valuable data, but the data is not organized in a fixed format. Consequently, it is difficult to “find” and capture the data for use. For instance, An email may contain important information, but it is presented in narrative text. Letters, emails, image files, text files, blogs, and social media posts are examples of unstructured documents.
- **temperature**: Temperature is a value between 0 and 1 that essentially lets you control how confident the model should be when making these predictions. Lowering temperature means it will take fewer risks, and completions will be more accurate and deterministic. Increasing temperature will result in more diverse completions.
- **maximum length**: Maximum length is a value between 1 to 4000. it is the maximum number of tokens that you allow your API call to generate. It’s just a cap: it doesn’t mean that your call will actually generate all these tokens. It should always be less than context_window - input_tokens, or you’ll get an error from OpenAI. In *text-davinci-003*, we have 4000 tokens, in *gpt-3.5-turbo* we have 2048 tokens
and in *gpt-4* we have 2048 tokens.
- **frequency penality**:  The frequency penality is a value between 0 to 2. the OpenAI Frequency Penalty setting is used to adjust how much frequency of tokens in the source material will influence the output of the model. With short (under 400 or so tokens) it’s not likely that frequency  will be noticeable… It’s when you have longer output that you start to see it in play.
- **presence penality**: The presence penality is a value between 0 to 2. Presence penalty is a flat reduction if the token has appeared at least once before, while frequency penalty is bigger if the token has appeared multiple times.
- **top P**: The value is between 0 and 1. It is a token sampling method. top_p computes the cumulative probability distribution, and cut off as soon as that distribution exceeds the value of top_p. For example, a top_p of 0.3 means that only the tokens comprising the top 30% probability mass are considered.


# Dad Jokes

In [228]:
from llama_index.core import download_loader

DadJokesReader = download_loader("DadJokesReader")

loader = DadJokesReader()
documents = loader.load_data()

  DadJokesReader = download_loader("DadJokesReader")


In [232]:
# Create a vectorized index of your documents
from llama_index.core import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(documents)



In [233]:
# Query your index!
query_engine.query
response = query_engine.query("give me another one")
#print(response)
display(Markdown(f"<b>{response}</b>"))

<b>LLaMA 13B is another model that has been trained on a significant amount of tokens, specifically 1.4 trillion tokens.</b>