<a href="https://colab.research.google.com/github/mehdi-lamrani/llm/blob/main/PDFSummarizer_langchain_openAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PDF Summarizer in a few lines of code using Gradio, OpenAI and LangChain

## Install necessary packages

[LangChain documentation](https://docs.langchain.com/docs/)

In [1]:
!pip install -q gradio openai pypdf tiktoken langchain langchain-openai

In [2]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [4]:
import gradio as gr
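from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import PyPDFLoader
# Import OpenAI from langchain_openai only; the legacy `from langchain import OpenAI`
# was shadowed by this import anyway, and PromptTemplate was never used.
from langchain_openai import OpenAI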

llm = OpenAI(temperature=0)

In [None]:
# Hugging Face model ID; not used by the cells below, which call the OpenAI API
model_path = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

## LangChain part
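#### Function that takes a PDF file path as input and returns a summary of that PDF
- LangChain's `PyPDFLoader` loads the PDF
- `load_and_split()` then splits the document into smaller chunks
- `load_summarize_chain` builds the summarization chain; with `chain_type="map_reduce"`, each chunk is summarized separately and the partial summaries are then combined into one final summary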
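
The split-then-summarize idea behind the steps above can be sketched without any LLM call. This is a toy illustration only, not LangChain's implementation: `split_into_chunks`, `toy_summarize` and `map_reduce_summarize` are hypothetical names, the splitter is a plain fixed-size character window, and the "summarizer" simply keeps the first few words where the real chain would call the model.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Cut text into fixed-size character windows that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def toy_summarize(text: str, max_words: int = 3) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first few words."""
    return " ".join(text.split()[:max_words])


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk on its own (map), then summarize the joined results (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))


if __name__ == "__main__":
    chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
    print(chunks)
    print(map_reduce_summarize(["alpha beta gamma delta", "one two three"], toy_summarize))
```

The point of the two-stage design is that no single call ever sees the whole document, so a long PDF can still fit within the model's context window one chunk at a time.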

In [5]:
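def summarize_pdf(pdf_file_path):
    """Load a PDF, split it into chunks, and summarize it with a map_reduce chain."""
    loader = PyPDFLoader(pdf_file_path)
    # Load the PDF and split it into chunk-sized documents in one call
    docs = loader.load_and_split()
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    # The result is a dict; the summary text is stored under the 'output_text' key
    summary = chain.invoke(docs)
    return summary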

In [None]:
summarize = summarize_pdf("/content/A Survey of Large Language Models.pdf")

In [11]:
summarize['output_text']

' This article discusses recent advancements in large language models (LLMs) and their impact on natural language processing (NLP). It covers pre-training, adaptation tuning, utilization, and capacity evaluation of LLMs, as well as challenges and techniques for improving their performance. The article also explores the potential applications and challenges of LLMs in various fields, such as finance and education, and discusses future directions for research. It also includes a collection of research papers and conference proceedings on LLMs and their use in tasks such as question answering and text generation.'

In [None]:
# Optional: uncomment to inspect how the loader splits a PDF into chunks
#loader = PyPDFLoader('/content/OA_Paper_2023_04_15.pdf')
#doc=loader.load_and_split()
#print(len(doc))
#doc[0]

## Create a simple Gradio UI (optional)

In [10]:

input_pdf_path = gr.components.Textbox(label="Provide the PDF file path")
output_summary = gr.components.Textbox(label="Summary")

# summarize_pdf returns a dict, so pass only the summary text to the output Textbox
interface = gr.Interface(
    fn=lambda pdf_file_path: summarize_pdf(pdf_file_path)["output_text"],
    inputs=input_pdf_path,
    outputs=output_summary,
    title="PDF Summarizer",
    description="Provide the path to a PDF file to get its summary.",
)
interface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://363bc8a78acbdadf68.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
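
The UI passes whatever is typed in the textbox straight to the summarizer, so a mistyped path surfaces as an exception. A minimal sketch of a guard that returns a readable message instead is shown below. The helper name `summarize_or_explain` and its messages are hypothetical, and the summarizer is passed in as a plain callable so the snippet runs without LangChain or an API key.

```python
import os
from typing import Callable


def summarize_or_explain(pdf_file_path: str, summarizer: Callable[[str], str]) -> str:
    """Validate the path first; call the summarizer only for an existing .pdf file."""
    path = pdf_file_path.strip()
    if not path:
        return "Please provide a PDF file path."
    if not os.path.isfile(path):
        return f"File not found: {path}"
    if not path.lower().endswith(".pdf"):
        return f"Not a PDF file: {path}"
    return summarizer(path)


if __name__ == "__main__":
    # A trivial stand-in summarizer, used only to exercise the guard.
    print(summarize_or_explain("/no/such/file.pdf", lambda p: "summary of " + p))
```

In this notebook, such a guard could be passed to `gr.Interface` as `fn=lambda p: summarize_or_explain(p, lambda q: summarize_pdf(q)["output_text"])`, assuming `summarize_pdf` still returns the chain's result dict.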
