# Biopharma Deals — RAG Demo (LangChain + FAISS)

**Goal:** Natural-language Q&A over curated BioPharma Dive deal articles with **grounded answers and citations**.

**Why RAG for customer success?** Retrieval-augmented generation (RAG) pairs an LLM with a searchable index of **client-relevant, current** documents. Grounding answers in retrieved text improves accuracy, citations build trust, and the setup is quick enough for discovery workshops and MVP demos.

In [2]:
import os, textwrap
from pathlib import Path

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

INDEX_DIR = "../faiss_index"
MODEL = "gpt-4o-mini"
K = 4

# sanity check
assert os.getenv("OPENAI_API_KEY"), "Please set OPENAI_API_KEY in your env."
if not Path(INDEX_DIR).exists():
    raise FileNotFoundError(f"{INDEX_DIR}/ not found")
print("✅ Env ready")

✅ Env ready


In [3]:
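# Must be the same embedding model that was used to build the index
embeddings = OpenAIEmbeddings()
# load_local unpickles the stored docstore; only allow this for an index you built yourself
vs = FAISS.load_local(INDEX_DIR, embeddings=embeddings, allow_dangerous_deserialization=True)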
retriever = vs.as_retriever(search_kwargs={"k": K})
print("✅ FAISS loaded")

✅ FAISS loaded
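`vs.as_retriever(search_kwargs={"k": K})` returns the K=4 chunks whose embeddings lie closest to the query embedding. If the index was built with LangChain's `FAISS.from_documents` defaults, this is, as far as we know, an exact (flat) L2 search; whether this particular index used those defaults is an assumption. The brute-force sketch below reproduces that ranking over toy 3-dimensional vectors; `l2_distance` and `top_k` are illustrative names, not part of LangChain or FAISS.

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, doc_vecs, k=4):
    """Return (index, distance) pairs for the k nearest vectors, closest first."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy 3-d "embeddings" (hypothetical; real OpenAI embeddings have far more dimensions)
docs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.85, 0.2, 0.05], [0.0, 0.1, 0.9], [0.5, 0.5, 0.0]]
query = [1.0, 0.0, 0.0]
print(top_k(query, docs, k=2))
```

With these toy vectors, documents 0 and 2 rank first. An exact scan like this is fine for a curated corpus of a few hundred articles; approximate FAISS indexes only pay off at much larger scale.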


In [4]:
PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=textwrap.dedent("""
    You are an analyst assistant. Use ONLY the provided context to answer.
    If the context is insufficient, say "I don't know".

    Format the answer as concise bullets.
    After any claim, append citations like: [Title — Source, Date].
    If a URL is available, format citations as: [Title](URL) — Source, Date.

    Context:
    {context}

    Question: {question}

    Answer:
    """).strip()
)
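The prompt asks for a citation after each claim, but nothing enforces it. A quick post-hoc check, sketched here under the assumption that answers come back as `- ` bullets (as in the recorded outputs), lists bullets with no bracketed citation. `uncited_bullets` is a hypothetical helper; run it on `res["result"]` rather than on the rendered Markdown, since the appended source list is not part of the model's answer.

```python
import re

# A bracketed citation such as "[Title - Source, Date]" or the "[Title]" part of "[Title](URL)"
CITATION_RE = re.compile(r"\[[^\]]+\]")

def uncited_bullets(answer: str) -> list[str]:
    """Return bullet lines that carry no bracketed citation.

    Bullets ending in ':' are treated as group headings (e.g. "- **Deal**:")
    and skipped, since their sub-bullets hold the actual claims.
    """
    missing = []
    for line in answer.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- ") or stripped.endswith(":"):
            continue
        if not CITATION_RE.search(stripped):
            missing.append(stripped)
    return missing
```

For example, `uncited_bullets(res["result"])` returning a non-empty list is a cheap signal to re-ask or flag the answer during a demo.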

In [5]:
llm = ChatOpenAI(model=MODEL, temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=True,
)
print("✅ QA chain ready")

✅ QA chain ready
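With `chain_type="stuff"`, RetrievalQA places all K retrieved documents into the single `{context}` slot of `PROMPT` and makes one LLM call. As far as we know, LangChain's default document formatting inserts only each document's `page_content` (joined by blank lines), so the title, URL and date stored in metadata reach the model only if they are part of the chunk text; that would be consistent with the literal "Source" and `(#)` placeholders in the answers below. This plain-Python sketch mimics the stuffing step with a hypothetical metadata header per document; `stuff_context` and `TEMPLATE` are illustrative names, not LangChain API.

```python
def stuff_context(docs, separator="\n\n"):
    """Join every retrieved document into one context string ("stuff" strategy).

    `docs` is a list of (text, metadata) pairs. The header line is a hypothetical
    addition so the model can see title, URL and date when writing citations.
    """
    blocks = []
    for text, meta in docs:
        header = (
            f"Title: {meta.get('title') or '(untitled)'} | "
            f"URL: {meta.get('source_url') or 'n/a'} | "
            f"Date: {meta.get('date') or 'n.d.'}"
        )
        blocks.append(f"{header}\n{text}")
    return separator.join(blocks)

TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

# Toy inputs: placeholder text stands in for real article chunks
docs = [
    ("(article text)", {"title": "Roche - deal - 89bio", "date": "9/18/2025"}),
    ("(article text)", {"title": "Novartis - deal - Tourmaline Bio", "date": "9/9/2025"}),
]
prompt = TEMPLATE.format(context=stuff_context(docs), question="Why did Roche acquire 89bio?")
print(prompt)
```

Stuffing keeps latency to one call, but prompt size grows linearly with K and chunk length, which is one reason to keep K small.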


In [6]:
from IPython.display import Markdown, display

def render_answer(result):
    """Pretty-print answer + deduped citations."""
    answer = result["result"]
    docs   = result.get("source_documents", []) or []
    # Dedupe by (title, url, date)
    seen, cites = set(), []
    for d in docs:
        m = d.metadata or {}
        title = m.get("title") or "(untitled)"
        url   = m.get("source_url")
        date  = m.get("date") or "n.d."
        key   = (title, url, date)
        if key in seen: 
            continue
        seen.add(key)
        if url:
            cites.append(f"- [{title}]({url}) ({date})")
        else:
            cites.append(f"- {title} ({date})")
    md = f"{answer}\n\n**Citations:**\n" + ("\n".join(cites) if cites else "- (none)")
    display(Markdown(md))
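`render_answer` lists every retrieved document under **Citations**, including ones the model never cited (for example, the Sobi/CTI Biopharma 2023 article in the first answer below). One way to tighten this, sketched here as a hypothetical helper `cited_only`, is to keep only sources whose title appears in the answer text. It works on plain metadata dicts, so inside `render_answer` you would pass `[d.metadata or {} for d in docs]`.

```python
def cited_only(answer: str, metas: list[dict]) -> list[dict]:
    """Keep only source metadata whose title appears verbatim in the answer.

    Exact substring matching is a deliberately simple heuristic: it misses
    titles the model paraphrased, but it will not list a source whose title
    never appears in the answer.
    """
    kept, seen = [], set()
    for meta in metas:
        title = (meta.get("title") or "").strip()
        if title and title in answer and title not in seen:
            seen.add(title)
            kept.append(meta)
    return kept
```

Stripping the title also absorbs stray leading whitespace in metadata before matching.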

In [9]:
questions = [
    "List all deals in 2025 that included contingent value rights (CVRs). What milestones are they tied to?",
    "Compare the CVR structures in the Roche-89bio and Pfizer-Metsera deals.",
    "Summarize the announced deal values and contingent payouts for Roche-89bio and Novartis-Tourmaline.",
    "List the most frequently mentioned financial advisors across deals in September 2025.",
    "Why did Roche acquire 89bio? What condition is its lead drug targeting?",
    "Which deals focus on cardiovascular or liver diseases?"
]

for q in questions:
    print("Q:", q)
    res = qa.invoke({"query": q})
    render_answer(res)
    print("-" * 80)

Q: List all deals in 2025 that included contingent value rights (CVRs). What milestones are they tied to?


- **Eli Lilly – Verve Therapeutics**
  - Disclosed Value: $1 billion
  - CVR tied to: Advancement of the experimental VERVE-102 treatment to dose a patient in a Phase 3 trial within 10 years of the transaction’s closing. [Eli Lilly – deal – Verve Therapeutics] — Source, 6/17/2025.

**Citations:**
- Eli Lilly – deal – Verve Therapeutics (6/17/2025)
- Sanofi – deal – Vigil Neuroscience (5/21/2025)
- Sobi – deal – CTI Biopharma (5/10/2023)
- Bayer – deal – Vividion Therapeutics (8/5/2021)

--------------------------------------------------------------------------------
Q: Compare the CVR structures in the Roche-89bio and Pfizer-Metsera deals.


- **Roche-89bio Deal**: 
  - Roche could pay an additional $6 per share via contingent value rights (CVR) if 89bio’s drug, pegozafermin, is approved and meets certain sales targets. 
  - The base deal values 89bio at $2.4 billion, potentially rising to $3.5 billion with CVR payouts [Roche – deal – 89bio](#) — Source, 9/18/2025.

- **Pfizer-Metsera Deal**: 
  - No information on contingent value rights (CVR) or additional payouts was provided in the context for this deal [Pfizer – deal – Metsera](#) — Source, 9/22/2025.

- **Comparison**: 
  - The Roche-89bio deal includes a specific CVR structure tied to drug approval and sales, while the Pfizer-Metsera deal lacks any disclosed CVR information.

**Citations:**
- Pfizer – deal – Metsera (9/22/2025)
- Roche – deal – 89bio (9/18/2025)
- Pfizer – deal – Biohaven Pharmaceuticals (5/10/2022)

--------------------------------------------------------------------------------
Q: Summarize the announced deal values and contingent payouts for Roche-89bio and Novartis-Tourmaline.


- **Roche-89bio**: 
  - Disclosed deal value: $2.4 billion.
  - Potential additional payouts: Up to $6 per share, totaling $3.5 billion if certain sales targets are met. [Roche – deal – 89bio — Source, 9/18/2025]

- **Novartis-Tourmaline**: 
  - Disclosed deal value: $1.4 billion. 
  - No contingent payouts mentioned. [Novartis – deal – Tourmaline Bio — Source, 9/9/2025]

**Citations:**
- Roche – deal – 89bio (9/18/2025)
- Novartis – deal – Tourmaline Bio (9/9/2025)
- Roche – deal – Telavant (10/23/2023)
- Novartis – deal – Morphosys (2/5/2024)

--------------------------------------------------------------------------------
Q: List the most frequently mentioned financial advisors across deals in September 2025.


- Guggenheim Securities [Pfizer – deal – Metsera — Source, 9/22/2025]
- Citi [Pfizer – deal – Metsera — Source, 9/22/2025]
- Goldman Sachs [Pfizer – deal – Metsera — Source, 9/22/2025]

**Citations:**
- Pfizer – deal – Seagen (3/13/2023)
- Pfizer – deal – Metsera (9/22/2025)
- Eli Lilly – deal – Point Biopharma (10/3/2023)
- AbbVie – deal – Allergan (06/25/2019)

--------------------------------------------------------------------------------
Q: Why did Roche acquire 89bio? What condition is its lead drug targeting?


- Roche acquired 89bio to gain access to its drug pegozafermin, which is being developed as a treatment for metabolic dysfunction-associated steatohepatitis (MASH) — a liver condition characterized by fat buildup, inflammation, and scarring [Roche – deal – 89bio](#) — Source, 9/18/2025.
- The acquisition aims to position Roche competitively in the MASH drug market, which has seen increased interest due to the condition's prevalence and the recent approval of similar treatments [Roche – deal – 89bio](#) — Source, 9/18/2025.

**Citations:**
- Roche – deal – 89bio (9/18/2025)
- Roche – deal – Poseida Therapeutics (11/26/2024)
- Roche – deal – Spark Therapeutics (02/25/2019)
- Roche – deal – Telavant (10/23/2023)

--------------------------------------------------------------------------------
Q: Which deals focus on cardiovascular or liver diseases?


- **Gilead – Myr Pharmaceuticals**: Focuses on chronic hepatitis D, a liver infection. [Gilead – deal – Myr Pharmaceuticals](#) — Source, 12/10/2020.
- **Eli Lilly – Verve Therapeutics**: Focuses on gene therapies for cardiovascular disease. [Eli Lilly – deal – Verve Therapeutics](#) — Source, 6/17/2025.

**Citations:**
- Gilead – deal – Myr Pharmaceuticals (12/10/2020)
- Bayer – deal – Asklepios Biopharmaceuticals (10/26/2020)
-  Bristol Myers Squibb – deal – Myokardia (10/05/2020)
- Eli Lilly – deal – Verve Therapeutics (6/17/2025)

--------------------------------------------------------------------------------
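Several questions are scoped to 2025 or September 2025, but similarity search has no notion of dates: the first question also retrieved the Sobi/CTI Biopharma (5/10/2023) and Bayer/Vividion Therapeutics (8/5/2021) articles, and the September 2025 advisor question retrieved articles from 2019 and 2023. A post-retrieval metadata filter is one simple mitigation. This sketch assumes the `date` metadata uses M/D/YYYY strings, as the citations suggest (with or without zero padding); `filter_by_date` is a hypothetical helper.

```python
from datetime import datetime

def filter_by_date(metas, year, month=None):
    """Keep metadata dicts whose 'date' (M/D/YYYY) matches year, and month if given."""
    kept = []
    for meta in metas:
        raw = (meta.get("date") or "").strip()
        try:
            # %m and %d accept both '6/17/2025' and '06/25/2019'
            parsed = datetime.strptime(raw, "%m/%d/%Y")
        except ValueError:
            continue  # missing or unparseable date: drop rather than guess
        if parsed.year == year and (month is None or parsed.month == month):
            kept.append(meta)
    return kept
```

Filtering after top-K can leave fewer than K documents, so in practice you would retrieve a larger candidate set, filter it, and then keep the best K. Aggregate questions such as "most frequently mentioned advisors" still need a count over the whole corpus, which top-K retrieval alone cannot provide.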
