# 📄 Amazon Bedrock + LangChain PDF Summarizer Demo
This notebook demonstrates how to summarize a PDF file using Amazon Bedrock with Claude and LangChain.

In [4]:
import boto3
from langchain_aws import BedrockLLM
from langchain.chains.summarize import load_summarize_chain
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
import PyPDF2

In [5]:
# --- CONFIGURATION ---
BEDROCK_REGION = "us-east-1"
MODEL_ID = "anthropic.claude-v2"

# --- INIT BEDROCK CLIENT ---
boto3_bedrock = boto3.client("bedrock-runtime", region_name=BEDROCK_REGION)
llm = BedrockLLM(model_id=MODEL_ID, client=boto3_bedrock)
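`BedrockLLM` wraps calls to the `bedrock-runtime` client's `invoke_model` operation. To make the underlying request less opaque, the sketch below builds roughly what a direct Claude v2 call looks like. It assumes the Anthropic Text Completions request format that Claude v2 uses on Bedrock: the prompt is wrapped in `\n\nHuman:` / `\n\nAssistant:` turns, and output length is set with `max_tokens_to_sample`. The helpers `build_claude_v2_body` and `invoke_claude_v2` are hypothetical, and the parameter values are illustrative, not the notebook's settings.

```python
import json


def build_claude_v2_body(prompt: str, max_tokens: int = 512, temperature: float = 0.0) -> str:
    """Build a Text Completions request body for anthropic.claude-v2 (hypothetical helper)."""
    return json.dumps({
        # Claude v2 expects the conversation as one string of Human/Assistant turns
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
        # Stop if the model starts writing a new Human turn
        "stop_sequences": ["\n\nHuman:"],
    })


def invoke_claude_v2(client, prompt: str, model_id: str = "anthropic.claude-v2") -> str:
    """Call Bedrock directly; `client` is assumed to be a boto3 "bedrock-runtime" client."""
    response = client.invoke_model(
        modelId=model_id,
        body=build_claude_v2_body(prompt),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; the generated text is under "completion"
    payload = json.loads(response["body"].read())
    return payload["completion"]
```

With the client created above, `invoke_claude_v2(boto3_bedrock, "Summarize: ...")` should return the raw completion string. In the LangChain version, comparable generation settings can be passed with `BedrockLLM(..., model_kwargs={...})`.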

In [6]:
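# --- FUNCTION: Load PDF and Extract Text ---
def extract_text_from_pdf(pdf_path, max_chars=4000):
    """Extract text from every page of a PDF, then truncate it to max_chars characters."""
    pages = []
    with open(pdf_path, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        for page in reader.pages:
            # Scanned, image-only pages may yield little or no text
            pages.append(page.extract_text() or "")
    text = "\n".join(pages)  # keep a line break between pages
    # Truncate by characters (a rough proxy for tokens) to keep the prompt small;
    # anything past max_chars is not summarized.
    return text[:max_chars]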
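Slicing the text to 4,000 characters keeps only the opening of the document, and the limit counts characters rather than tokens. If the whole PDF should be covered, one option is to split the text into overlapping chunks and summarize them with a multi-step chain such as `chain_type="map_reduce"`. The function below is a minimal pure-Python sketch of character-based chunking; `chunk_text`, `chunk_size`, and `overlap` are illustrative names, and LangChain's `RecursiveCharacterTextSplitter` (in `langchain_text_splitters`) is a more complete alternative.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail window is emitted
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk could then be wrapped as `Document(page_content=chunk)` and passed to `load_summarize_chain(llm, chain_type="map_reduce")`, which summarizes the chunks individually before combining the partial summaries. For that chain type, custom prompts go in `map_prompt` and `combine_prompt` rather than `prompt`, and each chunk costs one extra model call.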

In [30]:
# --- LOAD PDF ---
pdf_path = "sample1.pdf"
raw_text = extract_text_from_pdf(pdf_path)
document = Document(page_content=raw_text)

In [31]:
# --- PROMPT TEMPLATE ---
prompt_template = """Summarize the following content professionally:

{text}

Summary:
"""
prompt = PromptTemplate.from_template(prompt_template)
chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
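With `chain_type="stuff"`, the chain concatenates every document's `page_content` into the `{text}` slot of the prompt and makes a single model call. The sketch below imitates that formatting step in plain Python to make the behavior concrete. It assumes LangChain's default `"\n\n"` separator between documents; `SimpleDoc` and `stuff_prompt` are hypothetical stand-ins, not LangChain classes.

```python
from dataclasses import dataclass

PROMPT_TEMPLATE = """Summarize the following content professionally:

{text}

Summary:
"""


@dataclass
class SimpleDoc:
    """Minimal stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def stuff_prompt(docs: list[SimpleDoc], template: str = PROMPT_TEMPLATE,
                 separator: str = "\n\n") -> str:
    """Join all documents into one string and place it in the {text} slot."""
    combined = separator.join(doc.page_content for doc in docs)
    return template.format(text=combined)
```

Because everything goes into one prompt, the combined text must fit within the model's context window, which is why the notebook truncates the PDF text before running the chain.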

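The next cell turns the summary into bullets by splitting on sentence punctuation. As that cell's recorded output shows, Claude often answers with an introductory line and its own `-` bullets, so the regex yields items such as `• - 67% of organizations...` and merges lines that do not end in `.`, `!` or `?` (for example `Key findings include:` with the following point). A line-based pass is one alternative, sketched below under the assumption that the model writes one point per line; `to_bullets` and its preamble check are hypothetical helpers, not part of the notebook.

```python
import re

# Leading list markers the model may emit ("-", "*", "•", "1.", "1)"), which must be
# followed by whitespace so values such as "2.7x" or "-5%" are left intact
BULLET_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def to_bullets(summary_text: str, drop_preamble: bool = True) -> list[str]:
    """Return one cleaned item per non-empty line of the model's summary."""
    items = []
    for line in summary_text.splitlines():
        item = BULLET_PREFIX.sub("", line).strip()
        if not item:
            continue
        # Heuristic (an assumption): skip chatty lead-ins like "Here is a summary ...:"
        if drop_preamble and item.lower().startswith("here is") and item.endswith(":"):
            continue
        items.append(item)
    return items
```

The items can then be printed with `for item in to_bullets(summary["output_text"]): print(f"• {item}")`. Keeping a heading line such as `Key findings include:` as its own bullet is a deliberate simplification. Asking for one bullet per line in the prompt template is another way to make the output easier to parse.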
In [32]:
import re

def simple_sent_tokenize(text):
    # Split after ., ! or ? wherever that punctuation is followed by whitespace
    # (spaces or newlines); lines that end without such punctuation are not split.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

# --- RUN SUMMARY ---
summary = chain.invoke({"input_documents": [document]})  # returns a dict with "output_text"

if isinstance(summary, dict) and 'output_text' in summary:
    print("\n📄 Document Summary (Bullet Points):\n")

    summary_text = summary['output_text']  # separate name so raw_text keeps the PDF text
    sentences = simple_sent_tokenize(summary_text)

    for sentence in sentences:
        print(f"• {sentence}")
else:
    print("\n📄 Unexpected Output:\n")
    print(summary)



📄 Document Summary (Bullet Points):

• Here is a professional summary of the key points from the report:

The 2025 State of Tech Talent Report examines the impact of artificial intelligence (AI) on the technology workforce.
• Key findings include:

- Critical talent gaps exist across organizations, with 68% understaffed in AI/ML, 65% in cybersecurity, and 61% in FinOps.
• - 67% of organizations report AI is reshaping technical roles, requiring upskilling of the workforce.
• - With high turnover, organizations face substantial recruiting and onboarding costs.
• Open source culture boosts retention.
• - 85% prioritize practical work portfolios and open source contributions in hiring, evidencing skills.
• - 94% expect AI to deliver value, increasing the need for skilled workers.
• Upskilling is 3.2x more likely than external hiring.
• - Certifications are considered important by 71% when recruiting.
• Upskilling takes 62% less time than hiring and onboarding.
• - 2.7x more organizations 