**RUN THE ENTIRE CODE FOR CHATBOT**

# Download the data for your custom knowledge base
For demonstration purposes, I am going to use a custom knowledge base. You can download it to your local folder from the GitHub repository by running the code below.
Alternatively, you can put your own custom data into the local folder.
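
If you bring your own data, it can help to check what the folder contains before indexing it. The following is a minimal sketch, assuming a hypothetical helper named `list_knowledge_files` (it is not part of this notebook or of any library) and that all files sit under a single folder:

```python
from pathlib import Path

def list_knowledge_files(directory_path):
    """Return the sorted relative paths of non-hidden files under directory_path.

    Hypothetical helper for illustration only.
    """
    root = Path(directory_path)
    if not root.is_dir():
        raise FileNotFoundError(f"No such folder: {directory_path}")
    files = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip anything inside a hidden folder (such as .git) and hidden files
        hidden = any(part.startswith(".") for part in relative.parts)
        if path.is_file() and not hidden:
            files.append(str(relative))
    return sorted(files)
```

After cloning, calling `list_knowledge_files("/content/Internship/Data")` should list the documents that will be indexed later.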

In [1]:
!git clone https://github.com/sudipmondal1310/Internship.git

Cloning into 'Internship'...
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 15 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (15/15), 2.15 MiB | 6.06 MiB/s, done.


# Install the dependencies
Run the code below to install the dependencies we need for our functions.

In [2]:
!pip install llama-index==0.5.6
!pip install langchain==0.0.148

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting llama-index==0.5.6
  Downloading llama_index-0.5.6.tar.gz (165 kB)
  Preparing metadata (setup.py) ... done
Collecting dataclasses_json (from llama-index==0.5.6)
  Downloading dataclasses_json-0.5.7-py3-none-any.whl (25 kB)
Collecting langchain (from llama-index==0.5.6)
  Downloading langchain-0.0.171-py3-none-any.whl (846 kB)
Collecting openai>=0.26.4 (from llama-index==0.5.6)
  Downloading openai-0.27.6-py3-none-any.whl (71 kB)
Collecting tiktoken (from llama-index==0.5.6)
  Downloading tiktoken-0.4.0-

# Define the functions
The following code defines the function we need to construct the index and save it to disk so that it can be queried later.

In [3]:
from llama_index import SimpleDirectoryReader, GPTListIndex, readers, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI
import sys
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 2000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600 

    # define prompt helper
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
 
    documents = SimpleDirectoryReader(directory_path).load_data()
    
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

    index.save_to_disk('index.json')

    return index



# Set OpenAI API Key
You need an OpenAI API key to run this code.

If you don't have one yet, get it by [signing up](https://platform.openai.com/overview). Then click your account icon on the top right of the screen and select "View API Keys". Create an API key.

Then run the code below and paste your API key into the text input.

For demo purposes, set OPENAI_API_KEY to your own key (it starts with "sk-"). Do not share the key or save it in the notebook.

Make sure your OpenAI account has sufficient balance.

In [4]:
os.environ["OPENAI_API_KEY"] = input("Paste your OpenAI key here and hit enter:")
#os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; replace with your own key

Paste your OpenAI key here and hit enter:[key redacted]
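
Once the key has been entered, it can be worth confirming that it is actually present without printing it in full. The following is a minimal sketch; `require_api_key` and `mask_key` are hypothetical helpers written for this illustration, not part of the OpenAI or llama-index libraries:

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key

def mask_key(key, visible=4):
    """Hide all but the last `visible` characters so the key can be shown safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

For example, `print(mask_key(require_api_key()))` shows only the last four characters of the key, which is enough to tell which key is in use.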


# Construct an index
Now we are ready to construct the index. This will take every file in the 'Data' folder of the cloned repository, split it into chunks, and embed each chunk with OpenAI's embeddings API.
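
To give a feel for what splitting into overlapping chunks means, here is a minimal sketch. It is an illustration only, not llama-index's splitter: it counts words rather than tokens, and `split_into_chunks` is a hypothetical function. Its `chunk_size` and `overlap` arguments play roles analogous to `chunk_size_limit` and `max_chunk_overlap` in `construct_index`.

```python
def split_into_chunks(words, chunk_size, overlap):
    """Split a list of words into chunks of at most chunk_size words,
    where consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

text = "one two three four five six seven eight nine ten"
chunks = split_into_chunks(text.split(), chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

The overlap means that the last word of one chunk reappears as the first word of the next, so a sentence cut at a chunk boundary still has some context on both sides.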

**Notice:** running this code will cost you credits on your OpenAI account ($0.02 for every 1,000 tokens). If you've just set up your account, your free credits should be more than enough for this experiment.
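
To get a rough idea of the cost before running the cell, you can do the arithmetic yourself. The following is a minimal sketch using the rate quoted above. The helper names are hypothetical, and the figure of four characters per token is only a rule-of-thumb assumption for English text, not a real tokenizer:

```python
def estimate_cost_usd(num_tokens, usd_per_1k_tokens=0.02):
    """Estimate the charge for num_tokens at a flat per-1,000-token rate."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1000 * usd_per_1k_tokens

def rough_token_count(text, chars_per_token=4):
    """Very rough token estimate: assumes about 4 characters per token."""
    return -(-len(text) // chars_per_token)  # ceiling division

sample = "a" * 200_000  # stand-in for 200,000 characters of documents
tokens = rough_token_count(sample)
print(f"~{tokens} tokens, ~${estimate_cost_usd(tokens):.2f}")
```

Actual charges depend on the model used and on OpenAI's current pricing, so treat the result as an order-of-magnitude estimate.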

In [5]:
construct_index("/content/Internship/Data")


<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7f51189f3d00>

# Ask questions
It's time to have fun and test the chatbot. Run the function that queries GPT and type your question into the input.
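
Roughly speaking, a vector index answers a question by comparing the embedding of the question with the embedding of each chunk and passing the closest chunks to the model. The following sketch shows that idea with made-up three-dimensional vectors. It is a conceptual illustration under that assumption, not the implementation used by `GPTSimpleVectorIndex`, and `top_k_chunks` is a hypothetical function:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vector, chunk_vectors, k=1):
    """Return the ids of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        chunk_vectors,
        key=lambda chunk_id: cosine_similarity(query_vector, chunk_vectors[chunk_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy embeddings invented for this example; real embeddings have many more dimensions
toy_chunks = {
    "pricing": [0.9, 0.1, 0.0],
    "support": [0.1, 0.9, 0.0],
    "careers": [0.0, 0.2, 0.9],
}
best = top_k_chunks([0.8, 0.2, 0.0], toy_chunks, k=2)
print(best)
```

Here the query vector points mostly in the same direction as the "pricing" chunk, so that chunk ranks first.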

In [6]:
!pip install gradio


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gradio
  Downloading gradio-3.31.0-py3-none-any.whl (17.4 MB)
Collecting aiofiles (from gradio)
  Downloading aiofiles-23.1.0-py3-none-any.whl (14 kB)
Collecting fastapi (from gradio)
  Downloading fastapi-0.95.2-py3-none-any.whl (56 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.3.0.tar.gz (4.8 kB)
  Preparing metadata (setup.py) ... done
Collecting gradio-client>=0.2.4 (from gradio)
  Downloading gradio_client-0.2.5-py3-none-any.whl (288 kB)
Collecting httpx (from gradio)
  Downloa

In [7]:
def ask_ai_new(query):
    # Load the index that construct_index saved to disk
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    # Query the index; the answer text is in response.response
    response = index.query(query)
    # Wrap the answer in bold tags so the chat window renders it in bold
    return f"<b>{response.response}</b>"

In [8]:
import time
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    query = gr.Textbox(label="Enter your message here")
    clear = gr.Button("Clear")

    def user(user_message, history):
        # Clear the textbox and append the user's message with an empty bot reply
        return "", history + [[user_message, None]]

    def bot(history):
        # Answer the latest user message, then stream the reply one character at a time
        bot_message = ask_ai_new(history[-1][0])
        history[-1][1] = ""
        for character in bot_message:
            history[-1][1] += character
            time.sleep(0.05)
            yield history

    query.submit(user, [query, chatbot], [query, chatbot], queue=False).then(
        bot, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

demo.queue()
demo.launch()


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://b2116537096cafab86.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
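
The typing effect in the chat window comes from `bot` being a generator: each `yield` hands the interface a slightly longer version of the reply. The following sketch shows the same pattern without gradio and without the delay. `stream_reply` is a hypothetical stand-in for `bot`, and it copies the history on each step so that every yielded frame can be inspected afterwards:

```python
def stream_reply(history, reply):
    """Yield the chat history repeatedly, each time with one more
    character of `reply` in the last turn."""
    history = [list(turn) for turn in history]  # copy so the caller's list is untouched
    history[-1][1] = ""
    for character in reply:
        history[-1][1] += character
        yield [list(turn) for turn in history]

frames = list(stream_reply([["Hi", None]], "Hey"))
for frame in frames:
    print(frame)
```

Each printed frame is what the chat window would display at that step, ending with the complete reply.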


