## Part B: Write a chatbot prompt that iteratively builds a sequence of chats over one custom dataset

1. The chatbot should answer questions based on the text data or on multiple documents.

2. The chatbot should save the conversation in memory.

3. The chatbot should summarize the chats at the end of the conversation.

In [32]:
!pip -q install langchain
!pip -q install langchain-openai
!pip -q install langchain-community
!pip -q install langchain_experimental
!pip -q install "langchain[docarray]"
!pip -q install openai
!pip -q install tiktoken
!pip -q install faiss-gpu
!pip -q install PyPDF2
!pip -q install python-dotenv
!pip -q install templates

In [15]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [33]:
from dotenv import load_dotenv
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain


### Data Loading

In [34]:
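def get_text_from_pdf(pdf_files):
    """Concatenate the extracted text of every page in the given PDF files."""
    text = ""
    for pdf in pdf_files:
        reader = PdfReader(pdf)
        for page in reader.pages:
            # Guard against pages that yield no text (e.g. scanned images)
            text += page.extract_text() or ""
    return text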

### Chunking

In [36]:
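def chunk_text(raw_text):
    """Split the raw text into overlapping chunks for embedding."""
    splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=900,      # target maximum characters per chunk
        chunk_overlap=300,   # characters shared between neighbouring chunks
        length_function=len,
    )
    return splitter.split_text(raw_text)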
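
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of overlapping windows. It is an illustration only: `sliding_chunks` is a hypothetical helper, and it cuts at fixed character offsets, whereas `CharacterTextSplitter` first splits on the separator and then merges pieces, so the real chunk boundaries will differ.

```python
def sliding_chunks(text: str, chunk_size: int = 900, chunk_overlap: int = 300) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters.

    Hypothetical helper for illustration; not the LangChain algorithm.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Tiny values so the overlap is easy to see
print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one. With the notebook's settings (900 and 300), a sentence that straddles a boundary has a good chance of appearing whole in at least one chunk, at the cost of embedding some text more than once.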

### Embedding

In [37]:
def get_vectorstore(chunks):
    embeddings = OpenAIEmbeddings()
    vectorstore = FAISS.from_texts(texts=chunks, embedding=embeddings)
    return vectorstore
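
The vector store answers a query by finding the stored chunks whose embeddings lie closest to the query's embedding. The sketch below shows that idea with a brute-force search in plain Python. Everything in it is an assumption made for illustration: the 2-D vectors and chunk texts are invented, `l2_distance` and `top_k` are hypothetical helpers, and Euclidean distance is used as the metric. Real embeddings have far more dimensions, and FAISS uses optimized index structures rather than a sorted list.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are closest to query_vec."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Invented (text, vector) pairs standing in for embedded chunks
toy_index = [
    ("traffic congestion detection", [0.9, 0.1]),
    ("vehicle counting from images", [0.7, 0.4]),
    ("cloud billing overview", [0.1, 0.9]),
]

print(top_k([1.0, 0.0], toy_index, k=2))
# ['traffic congestion detection', 'vehicle counting from images']
```

In the notebook, `OpenAIEmbeddings` produces the vectors and `FAISS.from_texts` builds the index, so none of this needs to be written by hand.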

### Model

In [38]:
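def get_convo_chain(vectorstore):
    """Build a retrieval chain that keeps the running chat history in memory."""
    llm = ChatOpenAI()
    # Store every turn under "chat_history", the key the chain reads from
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    convo_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory,
    )
    return convo_chain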

### Inference

In [39]:
def chat_user_input(user_query, conver_chain):
    """Send one user query through the chain and return the model's reply."""
    response = conver_chain.invoke({'question': user_query})
    # The chain returns its reply under the "answer" key
    return response['answer']
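
Requirement 3 asks for a summary at the end of the conversation, and the stored chat history is the natural input for it. Below is one possible sketch, not the notebook's own method: `build_summary_prompt` and `StubMessage` are hypothetical names, and the prompt wording is an assumption. The helper only relies on each message having `.type` and `.content`, which LangChain message objects provide.

```python
from dataclasses import dataclass


@dataclass
class StubMessage:
    """Stand-in for a LangChain message; only .type and .content are used."""
    type: str
    content: str


def build_summary_prompt(messages) -> str:
    """Flatten the chat history into a transcript and wrap it in a summary request."""
    speaker = {"human": "User", "ai": "ChatBot"}
    lines = [f"{speaker.get(m.type, m.type)}: {m.content}" for m in messages]
    return (
        "Summarize the following conversation in a few sentences.\n\n"
        + "\n".join(lines)
    )


history = [
    StubMessage("human", "what is the pdf about?"),
    StubMessage("ai", "A keynote talk on smart city traffic."),
]
print(build_summary_prompt(history))

# In the notebook, after the chat loop ends, something along these lines
# could produce the summary (assumes the chain was built by get_convo_chain):
#   messages = convo_chain.memory.chat_memory.messages
#   summary = ChatOpenAI().invoke(build_summary_prompt(messages)).content
#   print("Summary:", summary)
```

Keeping the prompt construction separate from the model call makes it possible to inspect the transcript before spending an API request on it.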

### Chatbot

In [40]:
def main():
    load_dotenv()

    docs = []  # List to hold PDF file paths
    pdf_files = os.listdir("/content/drive/MyDrive/pdf_folder_chatbot")  # Directory containing PDF files

    # Read PDF files and extract text
    for file in pdf_files:
        if file.endswith(".pdf"):
            docs.append(os.path.join("/content/drive/MyDrive/pdf_folder_chatbot", file))
    pdf_content = get_text_from_pdf(docs)

    # Chunk text
    chunks = chunk_text(pdf_content)

    # Get vectorstore
    vectorstore = get_vectorstore(chunks)

    # Get conversation chain
    convo_chain = get_convo_chain(vectorstore)

    print("Ask any query regarding your data or enter 'quit' to exit")

    while True:
        user_query = input("You: ")
        if user_query.lower() == "quit":
            break
        response = chat_user_input(user_query, convo_chain)
        print("ChatBot:", response)

if __name__ == "__main__":
    main()

Ask any query regarding your data or enter 'quit' to exit
You: what is the pdf about?
ChatBot: The PDF is about a keynote talk on a Smart City Traffic Drone AI Cloud Platform presented by Jerry Gao. It discusses the use of intelligence, big data, and AI cloud infrastructure in managing city traffic, including real-time traffic information, monitoring, analysis, congestion detection, collision detection, emergency response, and more. It also covers the use of machine learning models for satellite image-based road segmentation, vehicle detection, and counting in the context of smart city traffic management.
You: what are the key concepts in the pdf?
ChatBot: The key concepts discussed in the PDF about the keynote talk on a Smart City Traffic Drone AI Cloud Platform presented by Jerry Gao include:

1. Smart City Traffic Intelligence Cloud Stack
2. Private Traffic Cloud, Public Traffic Cloud, and Hybrid Traffic Cloud
3. Real-time city traffic monitoring and flow analysis
4. Intelligent tra