This notebook demonstrates how to build an interactive conversational chatbot using the LangChain framework, OpenAI's gpt-3.5-turbo, and Pinecone for vector storage. The chatbot holds a conversation with the user, retrieves relevant passages from a vector store, and answers each question using both the retrieved context and the history of earlier turns.

Installing requirements

In [3]:
!pip install -qU \
    langchain \
    openai \
    pinecone-client \
    langchain-openai \
    langchain-pinecone \
    python-dotenv \
    pypdf \
    langchain_community


Environment setup: create a .env file in the notebook's working directory containing the variables below. The `load_dotenv()` call from the python-dotenv library (used in the following cells) reads this file into the process environment, so the OpenAI and Pinecone API keys stay out of the notebook code.

In [2]:

OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
INDEX_NAME=your_pinecone_index_name
PINECONE_CLOUD=aws  # (or any other supported cloud provider)
PINECONE_REGION=us-east-1  # (or the region where your Pinecone project is hosted)
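Because a missing or unedited key usually surfaces later as a confusing authentication error, it can help to fail fast right after `load_dotenv()`. The sketch below is a hypothetical helper (`check_env` is not part of python-dotenv or LangChain); it assumes OPENAI_API_KEY, PINECONE_API_KEY and INDEX_NAME are required, and treats any value still starting with the template prefix `your_` as unset.

```python
import os

# Hypothetical helper: the required names come from the .env template above;
# PINECONE_CLOUD and PINECONE_REGION are treated as optional (an assumption).
REQUIRED_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "INDEX_NAME")


def check_env(required=REQUIRED_VARS, env=None):
    """Return the required settings, or raise if any are missing or still placeholders."""
    env = os.environ if env is None else env
    # Names that are absent or empty
    missing = [name for name in required if not env.get(name)]
    # Names still holding a template value such as "your_openai_api_key"
    placeholders = [name for name in required if env.get(name, "").startswith("your_")]
    problems = missing + placeholders
    if problems:
        raise RuntimeError(f"Set these variables in .env: {', '.join(problems)}")
    return {name: env[name] for name in required}
```

In the notebook this would be called once, immediately after `load_dotenv()`, with no arguments so that it inspects `os.environ`.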

**OpenAIEmbeddings** are used to convert text into embeddings (vector representations), which can be compared to determine semantic similarity between queries and stored data.
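To make "compared to determine semantic similarity" concrete, here is a minimal pure-Python sketch of cosine similarity and top-k ranking, the kind of comparison a vector store performs when its index uses the cosine metric. The three-dimensional vectors in the usage note are toy values chosen for illustration; real OpenAI embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for an all-zero vector
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """stored is a list of (text, vector) pairs; return the k texts closest to the query."""
    ranked = sorted(stored, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

For example, `top_k([1, 0, 0], [("a", [1, 0, 0]), ("b", [0, 1, 0]), ("c", [0.9, 0.1, 0])])` returns `["a", "c"]`: "c" points almost the same way as the query, while "b" is orthogonal to it.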

**PineconeVectorStore** acts as a vector database that stores the embeddings and retrieves relevant chunks of information based on user queries.
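`PineconeVectorStore.from_documents(..., index_name=...)` writes into an index that is expected to exist already, and the PINECONE_CLOUD and PINECONE_REGION variables from the .env file are not read by the cells below. The sketch below shows one way to create the index up front with the Pinecone Python client (the `Pinecone` / `ServerlessSpec` API of pinecone-client v3 and later). The function names are hypothetical; the dimension of 1536 assumes `OpenAIEmbeddings` uses its default model text-embedding-ada-002, and the cosine metric is our choice.

```python
import os

# Assumption: OpenAIEmbeddings' default model (text-embedding-ada-002) produces
# 1536-dimensional vectors; change this if you configure a different model.
EMBEDDING_DIM = 1536


def index_settings_from_env(env=os.environ):
    """Hypothetical helper: collect index settings from the .env variables."""
    return {
        "name": env["INDEX_NAME"],
        "dimension": EMBEDDING_DIM,
        "metric": "cosine",  # our choice; must match how vectors are compared
        "cloud": env.get("PINECONE_CLOUD", "aws"),
        "region": env.get("PINECONE_REGION", "us-east-1"),
    }


def ensure_index(settings):
    """Hypothetical helper: create the serverless index if it does not exist yet."""
    # Imported here so the settings helper above works without the client installed
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    if settings["name"] not in pc.list_indexes().names():
        pc.create_index(
            name=settings["name"],
            dimension=settings["dimension"],
            metric=settings["metric"],
            spec=ServerlessSpec(cloud=settings["cloud"], region=settings["region"]),
        )
```

Running `ensure_index(index_settings_from_env())` once after `load_dotenv()` would prepare the index before the PDF is ingested.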

In [4]:
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Load environment variables
load_dotenv()

def load_pdf_from_terminal():
    """
    Prompts the user to input a PDF file path via the terminal.
    Returns the file path of the PDF.
    """
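    # Strip surrounding whitespace and any quotes pasted along with the path
    pdf_path = input("Please enter the full path to the PDF file: ").strip().strip("\"'")

    # Check that the path points to an existing file
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(f"No file found at {pdf_path!r}. Please enter a valid file path.")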

    return pdf_path

def process_pdf_file(pdf_path):
    """
    Loads and processes the PDF file by splitting it into text chunks.
    Then creates embeddings and stores them in Pinecone.
    """
    try:
        # Load the PDF document
        loader = PyPDFLoader(pdf_path)
        document = loader.load()

        # Split the document into chunks
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        texts = text_splitter.split_documents(document)
        print(f"Created {len(texts)} chunks")

        # Generate embeddings
        embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))

        # Store the vectors in Pinecone
        PineconeVectorStore.from_documents(texts, embeddings, index_name=os.environ.get("INDEX_NAME"))
        print("Embeddings have been successfully stored in Pinecone.")
    except Exception as e:
        print(f"An error occurred during processing: {e}")

if __name__ == "__main__":
    try:
        # Load PDF file from terminal
        pdf_file_path = load_pdf_from_terminal()
        print(f"Selected file: {pdf_file_path}")

        # Process the loaded PDF file
        process_pdf_file(pdf_file_path)
    except Exception as e:
        print(f"An error occurred: {e}")


Please enter the full path to the PDF file: /content/Harry Potter - Book 5 - The Order of the Phoenix.pdf
Selected file: /content/Harry Potter - Book 5 - The Order of the Phoenix.pdf
Created 881 chunks
Embeddings have been successfully stored in Pinecone.
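The splitter above used `chunk_size=1000` and `chunk_overlap=100`. The role of the overlap is easiest to see in a simplified sliding-window chunker. Note that this is not `CharacterTextSplitter`'s algorithm: the real splitter first splits on a separator (by default `"\n\n"`) and then merges the pieces up to `chunk_size`, so its chunks follow paragraph boundaries. `sliding_window_chunks` is a hypothetical name used only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Fixed-size windows where consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, the text `"abcdefghij"` becomes `["abcd", "defg", "ghij"]`. Each chunk repeats the last character of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk when the overlap is large enough.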


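The chatbot is powered by LangChain's ChatOpenAI wrapper, configured here to call OpenAI's **gpt-3.5-turbo** model with `temperature=0` to minimize randomness in the answers.
**ConversationalRetrievalChain** manages the retrieval-augmented conversation. For each new question it uses the chat history to rewrite the question as a standalone query, retrieves the most relevant chunks from Pinecone, and passes them to the model to produce an answer grounded in that context. With `chain_type="stuff"`, all retrieved chunks are placed into a single prompt.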
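The per-question flow of a conversational retrieval chain (condense the follow-up into a standalone question, retrieve, then "stuff" the context into one prompt) can be sketched without any LangChain objects. This is a simplified model under stated assumptions, not LangChain's implementation: the prompts are paraphrased, and `llm` and `retriever` are plain callables standing in for the real model and Pinecone retriever. The condensing step is skipped on the first turn, when there is no history.

```python
from typing import Callable, List, Tuple

# (question, answer) pairs, the same shape as chat_history in the chatbot cell
History = List[Tuple[str, str]]


def format_history(history: History) -> str:
    """Render earlier turns as plain text for the condensing prompt."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)


def conversational_retrieval_step(
    question: str,
    history: History,
    llm: Callable[[str], str],
    retriever: Callable[[str], List[str]],
) -> str:
    # Step 1: condense. With prior turns, ask the LLM to rewrite the follow-up
    # as a standalone question; on the first turn, use the question as-is.
    if history:
        standalone = llm(
            "Rephrase the follow-up question as a standalone question.\n\n"
            f"Chat history:\n{format_history(history)}\n"
            f"Follow-up question: {question}\n"
            "Standalone question:"
        )
    else:
        standalone = question

    # Step 2: retrieve the chunks most similar to the standalone question.
    chunks = retriever(standalone)

    # Step 3: "stuff" every retrieved chunk into one prompt and ask for the answer.
    context = "\n\n".join(chunks)
    return llm(
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {standalone}\nAnswer:"
    )
```

To try it against the real services, `llm` could be `lambda p: chat.invoke(p).content` and `retriever` could be `lambda q: [d.page_content for d in vectorstore.as_retriever().invoke(q)]`, reusing `chat` and `vectorstore` as constructed in the next cell. The condensing step is what lets a follow-up such as "When was the 3rd book published?" be resolved against earlier turns before the vector search runs.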

In [5]:
import os
import warnings
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI  # the langchain_community ChatOpenAI class is deprecated
from langchain_pinecone import PineconeVectorStore

warnings.filterwarnings("ignore")

# Load environment variables
load_dotenv()

# Initialize chat history
chat_history = []

def start_conversational_chatbot():
    """
    Starts the chatbot in interactive mode, dynamically answering questions until the user ends the conversation.
    """
    # Initialize embeddings and vector store
    embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))
    vectorstore = PineconeVectorStore(
        index_name=os.environ["INDEX_NAME"], embedding=embeddings
    )

    # Initialize the LLM model for chat
    chat = ChatOpenAI(verbose=True, temperature=0, model_name="gpt-3.5-turbo")

    # Initialize ConversationalRetrievalChain
    qa = ConversationalRetrievalChain.from_llm(
        llm=chat, chain_type="stuff", retriever=vectorstore.as_retriever()
    )

    print("Chatbot is ready. Type 'end' to finish the conversation.")

    while True:
        # Get user input (question)
        user_input = input("\nYou: ")

        # If user types "end", break the loop
        if user_input.lower().strip() == "end":
            print("Ending conversation. Goodbye!")
            break

        # Generate the response from the chatbot
        # .invoke() replaces the deprecated direct call qa({...})
        res = qa.invoke({"question": user_input, "chat_history": chat_history})

        # Retrieve the response and print it
        answer = res["answer"]
        print(f"Bot: {answer}")

        # Save the current conversation to the chat history
        chat_history.append((user_input, answer))

if __name__ == "__main__":
    start_conversational_chatbot()


Chatbot is ready. Type 'end' to finish the conversation.

You: Who is Harry Potter?
Bot: Harry Potter is the main character in the Harry Potter book series written by J.K. Rowling. He is a young wizard who attends Hogwarts School of Witchcraft and Wizardry and goes on various adventures throughout the series.

You: How many books are in this collection?
Bot: There are a total of seven books in the Harry Potter book series written by J.K. Rowling.

You: When did the 3rd  book published?
Bot: The 3rd book in the Harry Potter series, "Harry Potter and the Prisoner of Azkaban," was published in 1999.

You: What are the other works of J. K. Rowling?
Bot: J.K. Rowling is also known for writing "The Casual Vacancy" and the Cormoran Strike series under the pseudonym Robert Galbraith.

You: Is Game of Thones directed by J.K. Rowling?
Bot: No, "Game of Thrones" is not directed by J.K. Rowling. J.K. Rowling is the author of the Harry Potter series, while "Game of Thrones" is a television series b