<a href="https://colab.research.google.com/github/mohwarsame273/Medium-Articles/blob/main/Copy_of_AI_Quickstart_with_OpenAI_%26_Langchain_with_Python_CreateBot_based_on_Multiple_PDFs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# THANKS FOR THE INSPIRATION 

Original author: @virattt (https://twitter.com/virattt/status/1644329667673878528)

Get more like this: http://www.twitter.com/attractfunding 

## MOTIVATION

The world needs more humility
when facing The Mystery.

So, I leveraged @virattt's tutorial to make a chatbot based on a body of knowledge consisting of multiple PDFs.

Stack: 
· OpenAI
· LangChain
· Pinecone
· Python
· Google Colab

---

# Let's install everything we need

In [None]:
!pip install openai

In [None]:
!pip install langchain

In [None]:
!pip install pinecone-client

In [None]:
from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from tqdm.autonotebook import tqdm
import openai 
import pinecone
import os

In [None]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
!pip install pypdf

---

# Let's set up our environment variables for OpenAI & Pinecone

#### I found this part a bit tricky, so I'm leaving all the options on the table here in case you run into errors with the environment variables or API keys.

In [None]:
os.environ["OPENAI_API_KEY"] = ''
os.environ["PINECONE_API_KEY"] = ''
os.environ["PINECONE_API_ENV"] = ''

In [None]:
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')
PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV')

In [None]:
%env OPENAI_API_KEY=

In [None]:
%env PINECONE_API_ENV=

In [None]:
%env PINECONE_API_KEY=
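If a key comes back empty, the notebook only fails later, inside OpenAI or Pinecone, with a less obvious error. Here is a minimal sketch that fails fast instead, assuming you keep the keys in environment variables. `require_env` is a hypothetical helper, not part of OpenAI, Pinecone or LangChain:

```python
import os
from getpass import getpass


def require_env(name: str, prompt_if_missing: bool = False) -> str:
    """Return the value of environment variable `name`, stripped of whitespace.

    If it is unset or empty, optionally prompt for it with hidden input and
    store it back in os.environ; otherwise raise a clear error.
    """
    value = os.environ.get(name, "").strip()
    if not value and prompt_if_missing:
        value = getpass(f"{name}: ").strip()
        if value:
            os.environ[name] = value
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

For example, `OPENAI_API_KEY = require_env("OPENAI_API_KEY", prompt_if_missing=True)` asks for the key with hidden input when it is not already set. This also keeps the key out of the saved notebook.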

---

# Let's load the PDFs

In [None]:
pdfs = [
    "/content/SteppingOutOfSelfDeceptionRodneyStephens1.pdf", # Notes from Stepping out of Self-Deception by Rodney Stephens 1
    "/content/SteppingOutOfSelfDeceptionRodneyStephens2.pdf", # Notes from Stepping out of Self-Deception by Rodney Stephens 2
    "/content/SteppingOutOfSelfDeceptionRodneyStephens3.pdf", # Notes from Stepping out of Self-Deception by Rodney Stephens 3
]
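Hardcoding each path works for three files. The sketch below instead picks up every PDF uploaded to Colab's `/content` folder. `find_pdfs` is a hypothetical helper, and it assumes the PDFs sit directly in that folder, not in subfolders:

```python
from pathlib import Path


def find_pdfs(directory: str = "/content") -> list[str]:
    """Return the paths of all .pdf files directly inside `directory`, sorted by path."""
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        # Skip folders and non-PDF files; the extension check is case-insensitive.
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

With this helper, `pdfs = find_pdfs()` would replace the hand-written list. Sorting keeps the load order stable between runs.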

In [None]:
annual_reports = []
for pdf in pdfs:
    loader = PyPDFLoader(pdf)
    # Load the PDF document
    document = loader.load()        
    # Add the loaded document to our list
    annual_reports.append(document)

---

# Let's split the PDF into chunks

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

chunked_annual_reports = []
for annual_report in annual_reports:
    # Chunk the annual_report
    texts = text_splitter.split_documents(annual_report)
    # Add the chunks to chunked_annual_reports, which is a list of lists
    chunked_annual_reports.append(texts)
    print(f"chunked_annual_report length: {len(texts)}")

In [None]:
print(chunked_annual_reports)
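`chunk_size=500` caps each chunk at 500 characters, and `chunk_overlap=0` means consecutive chunks share no text. The sketch below illustrates those two parameters with a plain fixed-window splitter. It is a simplification: `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then newlines, then spaces, so its chunks usually end on natural boundaries and can be shorter than 500 characters. `chunk_text` is a hypothetical name:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters. Illustration only:
    this ignores paragraph, line and word boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

A non-zero overlap (for example 50) repeats the tail of each chunk at the start of the next. That can help when an idea straddles a chunk boundary, at the cost of more vectors to embed and store.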

---

# This is where the magic is. This is where you connect the chunks of data to a Pinecone vector database so that you can use OpenAI to build a chatbot on top of the dataset.

In [None]:
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [None]:
print(embeddings)

Remember to enter your Pinecone index name below

In [None]:
# Initialize Pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_API_ENV
)
index_name = "ENTER_PINECONE_INDEX_NAME_HERE"

Remember to enter your Pinecone API key and Pinecone environment below

In [None]:
pinecone.init(api_key="ENTER_PINECONE_API_KEY_HERE", environment="ENTER_PINECONE_ENVIRONMENT_HERE")

Remember to enter your Pinecone index name below

In [None]:
index = pinecone.Index("ENTER_PINECONE_INDEX_NAME_HERE")
vectorstore = Pinecone(index, embeddings.embed_query, "text")

In [None]:
!pip install tiktoken

In [None]:
# Upsert the chunked PDF notes to Pinecone via LangChain.
# from_documents() keeps each chunk's metadata (source file and page number),
# so return_source_documents=True later shows where an answer came from.
for chunks in chunked_annual_reports:
    Pinecone.from_documents(chunks, embeddings, index_name=index_name)
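Re-running the upsert cell adds a second copy of every chunk, because, as far as I know, LangChain generates a random ID for each vector when none is given. Pinecone upserts overwrite vectors that share an ID, so deterministic IDs make the cell safe to re-run. A minimal sketch follows. `chunk_id` is a hypothetical helper, and passing the IDs through assumes your LangChain version's Pinecone `from_texts`/`from_documents` accepts an `ids=` argument:

```python
import hashlib


def chunk_id(source: str, position: int, text: str) -> str:
    """Return a stable ID for the chunk at `position` in `source` with content `text`.

    The same inputs always hash to the same ID, so re-upserting overwrites the
    existing vector instead of adding a duplicate.
    """
    key = f"{source}\n{position}\n{text}".encode("utf-8")
    # 32 hex characters (128 bits) of SHA-256 is ample to avoid accidental collisions.
    return hashlib.sha256(key).hexdigest()[:32]
```

For each PDF's chunks, `ids = [chunk_id(c.metadata.get("source", ""), i, c.page_content) for i, c in enumerate(chunks)]` could then be passed as `ids=ids` in the upsert call. The IDs change if you change `chunk_size`, so re-chunking still leaves the old vectors in the index unless you delete them or start a fresh index.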

In [None]:
vectorstore = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)

In [None]:
print(vectorstore)
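When you ask a question, the retriever embeds it with the same OpenAI embedding model and asks Pinecone for the nearest chunk vectors. The pure-Python sketch below shows that ranking, assuming the index was created with the cosine metric (Pinecone also supports dot product and Euclidean distance). `cosine_similarity` and `top_k` are illustrative names, and `k=4` reflects what I understand to be LangChain's default number of retrieved chunks:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k chunk vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

Pinecone does this over many more, much longer vectors and uses approximate search rather than a full sort, but the ranking idea is the same.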

In [None]:
from langchain.chains import ConversationalRetrievalChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI

In [None]:
# Create the chain
qa = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(temperature=0), 
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
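`ConversationalRetrievalChain` handles follow-up questions in two steps. When `chat_history` is not empty, it first asks the LLM to rewrite the new question as a standalone question. It then retrieves chunks for that rewritten question and answers from them. The sketch below shows roughly what the first prompt looks like. The wording and the `Human:`/`Assistant:` transcript format are an approximation for illustration, not LangChain's exact template, and both function names are hypothetical:

```python
def format_chat_history(chat_history: list[tuple[str, str]]) -> str:
    """Render (question, answer) pairs as a plain-text transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def build_condense_prompt(chat_history: list[tuple[str, str]], question: str) -> str:
    """Build a prompt asking the LLM to rewrite a follow-up as a standalone question.

    Approximate wording for illustration only; not LangChain's exact template.
    """
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_chat_history(chat_history)}\n"
        f"Follow Up Input: {question}\n"
        "Standalone question:"
    )
```

This is why a vague follow-up such as "provide a few emotional quotes from the data" is searched in light of the earlier turns rather than on its own. It also means every question after the first costs an extra LLM call, and that call's prompt grows with the history.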

In [None]:
# Initialize chat history list
chat_history = []

# Let's start chatting

In [None]:
query = "What is the source of depression?"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

' Depression can come from conditioned patterns, self-deprecating commentary, and self-deception.'
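Each chat cell repeats the same call, record, display pattern. The sketch below wraps it in a helper. `ask` and its `max_turns` cap are hypothetical additions, not part of LangChain. Passing only the last few turns keeps the rewrite prompt from growing with every question, at the cost of the chain forgetting older turns:

```python
from typing import Callable


def ask(
    chain: Callable[[dict], dict],
    query: str,
    chat_history: list[tuple[str, str]],
    max_turns: int = 5,
) -> str:
    """Send `query` plus recent history to `chain`, record the turn, and return the answer."""
    # Guard max_turns <= 0 explicitly: chat_history[-0:] would return the whole list.
    recent = chat_history[-max_turns:] if max_turns > 0 else []
    result = chain({"question": query, "chat_history": recent})
    answer = result["answer"]
    chat_history.append((query, answer))
    return answer
```

In this notebook, a cell could then be just `ask(qa, "what is this data about?", chat_history)`, and the returned string is what the cell displays.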

In [None]:
query = "what is this data about?"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

' This data is about the qualities that persist after all the practice we have done, and the understanding that begins to surface that all pressure to change crates the backlash of its opposite. It is also about the first sobering acknowledgement of the inherent limitation of the forms of self-effort and a rudimentary sense of what we are ourselves.'

In [None]:
query = "provide a few emotional quotes from the data"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

" I don't know."

In [None]:
query = "write a 100-word essay about this data and its implication for startup founders"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

'\n\nStartup founders often face a lot of external forces that can be difficult to manage. Instead of relying on traditional methods of accountability, it is important to look at the underlying emotions that drive our need to escape. When our minds wander during business meetings, we should ask ourselves what emotion is driving the need to escape. We can then use this knowledge to activate our interest and surrender all forms of resistance and protest to the conditions here and now. Surrendering can be simpler than it seems, as it involves continuously giving up our sense of separateness and all within its definitions. \n\nThis data has important implications for startup founders. It suggests that instead of relying on traditional methods of accountability, they should focus on understanding the underlying emotions that drive their need to escape. By recognizing and accepting these emotions, they can use them to activate their interest and surrender all forms of resistance and protest 

In [None]:
query = "write a 100-word essay about this data from the perspective of a Venture Capitalist presenting to his investors after the great langchain crash of 2023"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

'\n\nIt has been two years since the great langchain crash of 2023, and the effects are still being felt. The crash was caused by a market-driven economy that promoted feelings of incompleteness until people bought certain products, together with a lifetime of comparisons and evaluations on every front. This sense of incompleteness changed over time into a strong sense of inadequacy and low self-esteem.\n\nAs a venture capitalist, I have seen firsthand the consequences of this crash. Companies that relied on the langchain have gone bankrupt, and many people have lost their jobs. The crash has also had a profound effect on our language and our view of reality. Words and views are now seen as approximations or symbols of reality, and we use them as a guide towards the real dimension.\n\nThe lesson to be learned from this crash is that we must be mindful of our spiritual intention and the interconnectedness of all things. We must be aware of the power of words and views, and how they can 

In [None]:
query = "write a 100-word essay about this data from the perspective of a Teacher teaching her 8-year old students about Buddhism and its relation to Christianity"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

"\n\nBuddhism and Christianity are two of the world's most popular religions. They both have a strong focus on spiritual growth and understanding of the world around us. Buddhism emphasizes the importance of meditation and mindfulness, while Christianity emphasizes the importance of faith and prayer.\n\nAs a teacher, I often explain to my 8-year old students the similarities and differences between Buddhism and Christianity. I explain that both religions have a strong focus on spiritual growth and understanding of the world around us. I also explain that the only way to commune with God is by going to church or taking a walk in nature.\n\nI also explain the Eightfold Path and Wise View, which are two of the core teachings of Buddhism. I explain that the Eightfold Path is the true and natural way things inherently are, and that when perceptions are freed from opinions, there is a natural realignment to drop the object of our wrath and connect with emotion.\n\nFinally, I explain the self

In [None]:
query = "using passages of text from the data, write a 100-word essay about this data from the perspective of a Venture Capitalist presenting to his investors after the great langchain crash of 2023"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

' As a Venture Capitalist, I have seen the effects of the great langchain crash of 2023 firsthand. The crash was caused by a market-driven economy that promoted feelings of incompleteness until people bought certain products. This sense of incompleteness changed over time into a strong sense of inadequacy and low self-esteem. This had a devastating effect on the language viewpoint, as our character develops over the course of time as long as our spiritual intention is wise. \n\nThe crash has taught us that words and views are only approximations or symbols of reality. We must use them as a guide towards the real dimension, but eventually the words and views that direct our practice must come to an end. We must learn to strengthen our sense of self without relying on mental control. \n\nThe great langchain crash of 2023 has been a wake-up call for us all. We must learn to be mindful of our words and views, and to be aware of the interconnectedness of all things. We must strive to develo

In [None]:
query = "using passages of text from the data, write a 100-word essay about Worthum, an Outcome-Oriented firm based on Bangalore lead by Manish Andankar that leverages Wardley Mapping to lead product-oriented teams into a world where langchain is taking over"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
result["answer"]

"\n\nWorthum is an Outcome-Oriented firm based in Bangalore, lead by Manish Andankar. At Worthum, they leverage Wardley Mapping to lead product-oriented teams into a world where langchain is taking over. Langchain is a new technology that uses words and views as symbols of reality, allowing us to use them as markers towards the virtual dimension.\n\nAt Worthum, they understand that arguing with what is, is pointless. Instead, they focus on thought-based actions that are often in protest to what is, and once they have assumed that this protest is true, they will have the accompanying emotions of that assumption and will act in accordance with those emotions.\n\nWorthum also understands that our character develops over the course of time as long as our spiritual intention is wise. They don't take developing our character as a separate task, but rather as something that will grow naturally upon following the inclination of our heart.\n\nAt Worthum, they strive to maintain presence and und