<a href="https://colab.research.google.com/github/szaveri99/LLM_Chatbot/blob/main/PdfQuery.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PDF Query Using LangChain

In [None]:
!pip install langchain
!pip install openai
!pip install PyPDF2
!pip install faiss-cpu
!pip install tiktoken
!pip install python-dotenv
!pip install streamlit
!pip install pyngrok==4.1.1

In [None]:
from dotenv import load_dotenv, find_dotenv
import os
import openai

_ = load_dotenv(find_dotenv())

openai.api_key = os.environ['OPENAI_API_KEY']
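If no `.env` file is found, or it does not define the key, `os.environ['OPENAI_API_KEY']` fails with a bare `KeyError`. The sketch below is one way to fail with an actionable message instead; `require_env` is a hypothetical helper name, not part of `python-dotenv` or `openai`.

```python
import os

def require_env(name: str) -> str:
    """Return environment variable `name` (stripped), or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical helper: explains how to fix the problem instead of a bare KeyError
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=... to a .env file "
            "next to the notebook, or export it before starting the kernel."
        )
    return value
```

Usage, after `load_dotenv(find_dotenv())`: `openai.api_key = require_env("OPENAI_API_KEY")`.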

In [None]:
from PyPDF2 import PdfReader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

In [None]:
!wget https://pgcag.files.wordpress.com/2010/01/48lawsofpower.pdf

--2024-02-04 08:39:34--  https://pgcag.files.wordpress.com/2010/01/48lawsofpower.pdf
Resolving pgcag.files.wordpress.com (pgcag.files.wordpress.com)... 192.0.72.25, 192.0.72.24
Connecting to pgcag.files.wordpress.com (pgcag.files.wordpress.com)|192.0.72.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104926 (102K) [application/pdf]
Saving to: ‘48lawsofpower.pdf’


2024-02-04 08:39:34 (29.8 MB/s) - ‘48lawsofpower.pdf’ saved [104926/104926]



In [None]:
# !mkdir docs
# !mv 48lawsofpower.pdf docs

In [None]:
# Provide the path of the PDF file.
pdfreader = PdfReader('48lawsofpower.pdf')

In [None]:
# Read the text of every page, skipping pages that yield no text
raw_text = ''
for i, page in enumerate(pdfreader.pages):
    content = page.extract_text()
    if content:
        raw_text += content
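`raw_text += content` glues the last word of one page to the first word of the next whenever the extracted page text does not end in a newline (whether it does depends on the PDF). A minimal alternative, assuming you want page boundaries preserved, is to join the pages with `"\n"`, which also gives the `separator="\n"` splitter a clean break point at every page. `join_pages` is a hypothetical helper name.

```python
def join_pages(page_texts):
    """Concatenate per-page text with a newline at every page boundary.

    Empty or missing page texts are skipped, like the `if content:` check in the loop.
    """
    return "\n".join(text for text in page_texts if text)
```

Usage: `raw_text = join_pages(page.extract_text() for page in pdfreader.pages)`.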

In [None]:
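# Split the text on newlines into chunks of at most ~800 characters (length_function=len counts characters),
# overlapping by up to 200 characters, so each chunk stays well within the model's token limit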
text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 800,
    chunk_overlap  = 200,
    length_function = len,
)
texts = text_splitter.split_text(raw_text)
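To make `chunk_size` and `chunk_overlap` concrete, here is a simplified, pure-Python sketch of separator-based chunking with overlap. It is modeled on the idea behind `CharacterTextSplitter` but is not LangChain's implementation; `split_with_overlap` is a hypothetical name, and details such as whitespace stripping differ.

```python
def split_with_overlap(text, separator="\n", chunk_size=800, chunk_overlap=200):
    """Greedily merge separator-delimited pieces into chunks of <= chunk_size characters.

    After a chunk is emitted, pieces are dropped from its front until at most
    chunk_overlap characters remain; those are carried into the next chunk.
    A single piece longer than chunk_size becomes its own oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    sep_len = len(separator)
    chunks, window, window_len = [], [], 0  # window_len counts pieces plus separators

    for piece in pieces:
        extra = len(piece) + (sep_len if window else 0)
        if window and window_len + extra > chunk_size:
            chunks.append(separator.join(window))
            # Shrink the window to the overlap (and make room for the new piece)
            while window and (window_len > chunk_overlap
                              or window_len + sep_len + len(piece) > chunk_size):
                removed = window.pop(0)
                window_len -= len(removed) + (sep_len if window else 0)
            extra = len(piece) + (sep_len if window else 0)
        window.append(piece)
        window_len += extra

    if window:
        chunks.append(separator.join(window))
    return chunks
```

With tiny numbers the overlap is easy to see: `split_with_overlap("aaaa\nbbbb\ncccc\ndddd", chunk_size=9, chunk_overlap=4)` yields three chunks, each sharing one line with its neighbor. The overlap means a passage cut at a chunk boundary still appears whole in at least one chunk.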

In [None]:
# Create the OpenAI embeddings client (the chunks are embedded via API calls when the index is built)
embeddings = OpenAIEmbeddings()

In [None]:
document_search = FAISS.from_texts(texts, embeddings)
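`FAISS.from_texts` embeds every chunk and stores the vectors in an index; as I understand LangChain's wrapper, the default is an exact (flat) L2-distance index, and `similarity_search` embeds the query the same way and returns the `k` closest chunks (`k` defaults to 4). The brute-force sketch below shows that retrieval step on hypothetical 2-D toy vectors; real OpenAI embeddings have far more dimensions, and FAISS does the same comparison much faster.

```python
import math
from typing import List, Sequence

def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_chunks(query_vec: Sequence[float],
                   chunk_vecs: List[Sequence[float]],
                   k: int = 4) -> List[int]:
    """Indices of the k chunk vectors closest to query_vec (exact brute-force search)."""
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: l2_distance(query_vec, chunk_vecs[i]))
    return order[:k]
```

For example, with toy chunk vectors `[[1, 0], [0, 1], [0.9, 0.1], [-1, 0]]` and query `[1, 0]`, the two nearest chunks are indices 0 and 2.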

In [None]:
chain = load_qa_chain(OpenAI(), chain_type="stuff")
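`chain_type="stuff"` means all retrieved chunks are "stuffed" into a single prompt and the LLM is called once (other `load_qa_chain` types such as `"map_reduce"` and `"refine"` make several calls instead). A minimal sketch of that prompt assembly follows; the instruction wording is hypothetical, and LangChain's built-in template is phrased differently. The practical constraint is that the retrieved chunks (4 by default, each at most ~800 characters here) plus the question must fit in the model's context window.

```python
from typing import List

def stuff_prompt(chunks: List[str], question: str) -> str:
    """Build one prompt containing every retrieved chunk, followed by the question."""
    context = "\n\n".join(chunks)
    # Hypothetical instruction text, not LangChain's actual template
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```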


In [None]:
query = "Can you give me an example from history where the enemy was crushed totally from the book?"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)


" One example from history where an enemy was crushed totally is the case of Emperor Sung of China in 959 A.D. Despite facing numerous enemies and potential threats to his throne, Sung was able to turn all of his enemies into loyal friends by showing generosity and sparing their lives. This ultimately led to the enemies becoming grateful and loyal allies, effectively crushing their power and eliminating any future threat to Sung's rule."

In [None]:
query = "What's the point of making myself less accessible?"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)

' The point of making yourself less accessible is to create a sense of scarcity and increase your value in the eyes of others. By creating an aura of unpredictability and keeping others in a state of suspended terror, you can gain power and respect. This can be applied in various situations, such as seduction, love, and business. It also helps to protect your power and prevent others from taking you for granted.'

In [None]:
query = "Can you tell me the story of Queen Elizabeth I from this 48 laws of power book?"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)

' From the context given, it appears that the story of Queen Elizabeth I is referenced in the chapter on "Law 37: Create Compelling Spectacles." The chapter discusses how powerful figures like Elizabeth I used spectacles and public displays to enhance their image and maintain their power. The context mentions that Elizabeth I refused to marry or commit to any suitor, making herself ungraspable and thus increasing her power and desirability. She also maintained her independence and refused to be obligated to anyone, while still seeking promises from both sides of political conflicts to secure her position. The context also mentions that she used displays and ceremonies to showcase her power and control over England. This is just a brief overview of the story of Queen Elizabeth I as mentioned in this book, but it is not a comprehensive retelling of her life and accomplishments.'

In [None]:
%%writefile model.py

from PyPDF2 import PdfReader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from dotenv import load_dotenv, find_dotenv
import os
import openai

def pdf_read():
  load_dotenv(find_dotenv())

  openai.api_key = os.environ['OPENAI_API_KEY']

  # Provide the path of the PDF file.
  pdfreader = PdfReader('48lawsofpower.pdf')
  return pdfreader

def text_gen():
  # read text from pdf
  pdfreader = pdf_read()
  raw_text = ''
  for i, page in enumerate(pdfreader.pages):
      content = page.extract_text()
      if content:
          raw_text += content

  # Split the text into chunks of at most ~800 characters, overlapping by up to 200 characters,
  # so each chunk stays well within the model's token limit
  text_splitter = CharacterTextSplitter(
      separator = "\n",
      chunk_size = 800,
      chunk_overlap  = 200,
      length_function = len,
  )
  texts = text_splitter.split_text(raw_text)
  # Create the OpenAI embeddings client (chunks are embedded when the FAISS index is built)
  embeddings = OpenAIEmbeddings()

  document_search = FAISS.from_texts(texts, embeddings)
  chain = load_qa_chain(OpenAI(), chain_type="stuff")

  return (document_search,chain)

Overwriting model.py


In [None]:
%%writefile app.py
import streamlit as st
import model

@st.cache_resource
def load_qa():
    # Build the FAISS index and QA chain once per server process. Streamlit reruns
    # this script on every interaction, so without caching every question would
    # re-read the PDF and re-embed all chunks.
    return model.text_gen()

def get_answer(question):
    # Retrieve the most similar chunks and let the "stuff" QA chain answer from them
    document_search, chain = load_qa()
    docs = document_search.similarity_search(question)
    result = chain.run(input_documents=docs, question=question)

    return result

st.title("💬 Chatbot")

if "prompt" not in st.session_state:
    st.session_state.prompt = ""

def submit():
    # Move the submitted text into `prompt` and clear the input box
    st.session_state.prompt = st.session_state.widget
    st.session_state.widget = ""

# Define the user's question using text_input
st.text_input("Enter text here", key="widget", on_change=submit)

prompt = st.session_state.prompt

if "messages" not in st.session_state:
    st.session_state["messages"] = [{"role": "assistant", "content": "How can I help you?"}]

for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt:
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    result = get_answer(prompt)

    st.session_state.messages.append({"role": "assistant", "content": result})
    st.chat_message("assistant").write(result)
    # Consume the prompt so a later rerun does not answer the same question again
    st.session_state.prompt = ""



Writing app.py
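Streamlit re-executes `app.py` from the top on every widget interaction, while `st.session_state` persists across those reruns. Because the question is kept in session state, any rerun that happens while it still holds the last question would append and answer it again unless the app clears it after answering. The sketch below models that flow with a plain dict; `rerun` is a hypothetical simulation of the script, not Streamlit's API.

```python
from typing import Callable, Dict

def rerun(state: Dict, answer: Callable[[str], str]) -> None:
    """Simulate one top-to-bottom run of the chat script against persistent state.

    `state` plays the role of st.session_state, which survives reruns.
    """
    state.setdefault("prompt", "")
    state.setdefault("messages", [])
    prompt = state["prompt"]
    if prompt:
        state["messages"].append({"role": "user", "content": prompt})
        state["messages"].append({"role": "assistant", "content": answer(prompt)})
        state["prompt"] = ""  # consume the prompt; without this every rerun repeats it
```

Calling `rerun` twice on `{"prompt": "hi"}` records the exchange once; drop the last line and the second run would add a duplicate question and answer.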


In [None]:
# Replace YOUR_NGROK_AUTHTOKEN with your own token; never commit a real token to a shared notebook
!ngrok authtoken YOUR_NGROK_AUTHTOKEN

Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml


In [None]:
from pyngrok import ngrok
!streamlit run app.py &>/dev/null &
!pgrep streamlit

125966


In [None]:
public_url = ngrok.connect(port='8501')
public_url

'http://8939-35-186-163-228.ngrok-free.app'

In [None]:
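# Optional: run a second instance in the foreground to see Streamlit's log.
# Port 8501 is already held by the background server, so this one starts on 8502.
!streamlit run /content/app.py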


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.12:8502
  External URL: http://35.186.163.228:8502

  Stopping...
  Stopping...
Exception ignored in: <module 'threading' from '/usr/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1547, in _shutdown
    _main_thread._stop()
  File "/usr/lib/python3.10/threading.py", line 1028, in _stop
    def _stop(self):
  File "/usr/local/lib/python3.10/dist-packages/streamlit/web/bootstrap.py", line 69, in signal_handler
    server.stop()
  File "/usr/local/lib/python3.10/dist-packages/streamlit/web/server/server.py", line 399, in stop
    self._runtime.stop()
  File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/runtime.py", line 311, in stop
    async_objs.eventloop.call

In [None]:
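# Install psutil
!pip install psutil

import psutil

# Terminate every process with a socket listening on the given TCP port.
# (Matching f":{port}" against proc.info['cmdline'] never fires: cmdline is a list of
# arguments, and Streamlit's command line does not contain ":8501".)
def kill_process_by_port(port):
    pids = {
        conn.pid
        for conn in psutil.net_connections(kind="inet")
        if conn.laddr and conn.laddr.port == port
        and conn.status == psutil.CONN_LISTEN and conn.pid
    }
    for pid in pids:
        proc = psutil.Process(pid)
        print(f"Killing {proc.name()} with PID {pid}")
        proc.terminate()

# Kill the Streamlit server on port 8501
kill_process_by_port(8501)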




In [None]:
!lsof -i :8501  # check whether anything is still listening on port 8501
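If `lsof` is not available in the runtime, a stdlib check is to try binding the port yourself: the bind fails with `OSError` while another socket holds it. `port_is_free` is a hypothetical helper; it binds on all interfaces (`0.0.0.0`), and the check is only a snapshot, since another process could grab the port right after it returns.

```python
import socket

def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # something else is bound to / listening on this port
        return True
```

Usage: `port_is_free(8501)` should be `False` while the background Streamlit server is running.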