# Researching Regulatory Status for 10k Drug Products
This notebook demonstrates using everyrow's `agent_map()` to research the current FDA regulatory status of drug products at scale.

**Use case:** You have ~10k drug product entries (trade name, ingredient, applicant, strength, dosage form) and need to determine each product's current regulatory status—is it FDA approved, discontinued, withdrawn for safety, or something else?

**Why everyrow?** Determining regulatory status requires researching each product individually against FDA databases, Orange Book listings, Federal Register notices, and other sources. Some products have straightforward histories (a single NDA approval still in effect), while others have complex timelines involving tentative approvals, voluntary withdrawals, or transitions between marketed and not-marketed status. everyrow handles this heterogeneity—each row gets as much research as it needs, and you only pay for what's necessary.

**Results:** 9,997 out of 10,000 rows returned results (99.97% success rate). Only 3 rows failed. For evals on agent accuracy, see [evals.futuresearch.ai](https://evals.futuresearch.ai/) or our [papers](https://futuresearch.ai/research).

In [None]:
from dotenv import load_dotenv
import pandas as pd
from pydantic import BaseModel, Field
from everyrow import create_session
from everyrow.ops import agent_map

pd.set_option("display.max_colwidth", None)


load_dotenv()

## Load Data
The input dataset contains ~10k drug product entries with trade name, ingredient, applicant, strength, and dosage form.

In [None]:
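# The input fields are read back from the results CSV, which keeps them alongside the agent output.
input_df = pd.read_csv(
    "regulatory_status_results.csv",
    usecols=["row_id", "trade_name", "ingredient", "applicant", "strength", "dosage_form"],
)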
print(f"{len(input_df):,} drug products")
print(f"Columns: {list(input_df.columns)}")
input_df.head(5)

## Define Response Model and Task
Each agent researches a drug product's regulatory status using its trade name, ingredient, and dosage form. The `regulatory_status` field is constrained to a fixed set of allowed values.

In [None]:
from enum import Enum


class RegulatoryStatus(str, Enum):
    NDA = "FDA approved (NDA)"
    ANDA = "FDA approved (ANDA \u2013 generic)"
    TENTATIVE = "Tentative approval"
    DISCONTINUED = "Discontinued (not withdrawn for safety)"
    WITHDRAWN = "Withdrawn for safety reasons"
    NOT_MARKETED = "Approved but currently not marketed"
    UNDER_REVIEW = "Under FDA review"
    EUA = "Emergency Use Authorization (EUA)"
    NOT_APPROVED = "Not FDA approved (compounded / ex-US only)"


class DrugRegulatoryResult(BaseModel):
    regulatory_status: RegulatoryStatus = Field(
        description="The current FDA regulatory status of this drug product."
    )


AGENT_TASK = """Research the current regulatory status based on its trade name, ingredient, and dosage form.

Allowed Values for regulatory_status:
FDA approved (NDA)
FDA approved (ANDA \u2013 generic)
Tentative approval
Discontinued (not withdrawn for safety)
Withdrawn for safety reasons
Approved but currently not marketed
Under FDA review
Emergency Use Authorization (EUA)
Not FDA approved (compounded / ex-US only)

NOTHING ELSE."""
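
The allowed values are written out twice above, once in `RegulatoryStatus` and once in the prompt text, so the two copies can drift apart if one is edited. A minimal sketch of one way to avoid that, assuming it is acceptable to generate the prompt's list from the enum. `build_agent_task` is a hypothetical helper, not part of everyrow, and the enum is repeated here only so the block runs on its own:

```python
from enum import Enum


class RegulatoryStatus(str, Enum):
    NDA = "FDA approved (NDA)"
    ANDA = "FDA approved (ANDA \u2013 generic)"
    TENTATIVE = "Tentative approval"
    DISCONTINUED = "Discontinued (not withdrawn for safety)"
    WITHDRAWN = "Withdrawn for safety reasons"
    NOT_MARKETED = "Approved but currently not marketed"
    UNDER_REVIEW = "Under FDA review"
    EUA = "Emergency Use Authorization (EUA)"
    NOT_APPROVED = "Not FDA approved (compounded / ex-US only)"


def build_agent_task(statuses) -> str:
    """Hypothetical helper: render the task prompt from the enum's values."""
    # One allowed value per line, in the enum's declaration order.
    allowed = "\n".join(member.value for member in statuses)
    return (
        "Research the current regulatory status based on its trade name, "
        "ingredient, and dosage form.\n\n"
        "Allowed Values for regulatory_status:\n"
        f"{allowed}\n\n"
        "NOTHING ELSE."
    )


AGENT_TASK = build_agent_task(RegulatoryStatus)
```

With this arrangement, adding or renaming a status in the enum updates the prompt automatically.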

## Run Agent Map
Send all 10k rows to `agent_map()`. Each row gets its own agent that researches the product's regulatory history—checking FDA Orange Book listings, approval databases, Federal Register notices, and other sources.

In [None]:
async with create_session(name="Drug Regulatory Status Research") as session:
    print(f"Session URL: {session.get_url()}\n")
    result = await agent_map(
        task=AGENT_TASK,
        input=input_df,
        response_model=DrugRegulatoryResult,
        session=session,
    )

### Cost
_(Fill in after running.)_

## Inspecting Results
Load the results CSV (downloaded from the session URL above) and analyze the regulatory status classifications.

In [None]:
results_df = pd.read_csv("regulatory_status_results.csv")
print(f"Total rows: {len(results_df):,}")
print(f"Rows with results: {results_df['regulatory_status'].notna().sum():,}")
print(f"Failed rows: {results_df['regulatory_status'].isna().sum()}")
results_df.head(3)

In [None]:
status_counts = results_df["regulatory_status"].value_counts()
status_pct = (results_df["regulatory_status"].value_counts(normalize=True) * 100).round(1)

summary = pd.DataFrame({"count": status_counts, "percent": status_pct})
print("Regulatory Status Breakdown")
print("=" * 50)
summary

In [None]:
import matplotlib.pyplot as plt

# Normalize near-duplicate labels before plotting
normalize_map = {
    "FDA approve (NDA)": "FDA approved (NDA)",
    "FDA approved (ANDA - generic)": "FDA approved (ANDA \u2013 generic)",
}
plot_df = results_df.copy()
plot_df["regulatory_status"] = plot_df["regulatory_status"].replace(normalize_map)

counts = plot_df["regulatory_status"].value_counts()

fig, ax = plt.subplots(figsize=(10, 5))
counts.plot.barh(ax=ax)
ax.set_xlabel("Number of products")
ax.set_title("FDA Regulatory Status Distribution (10k drug products)")
ax.invert_yaxis()
for i, v in enumerate(counts.values):
    ax.text(v + 30, i, f"{v:,}", va="center", fontsize=9)
plt.tight_layout()
plt.show()

### Exploring by Status Category

In [None]:
# Products withdrawn for safety reasons
withdrawn = results_df[results_df["regulatory_status"] == "Withdrawn for safety reasons"]
print(f"Withdrawn for safety: {len(withdrawn)} products")
withdrawn[["trade_name", "ingredient", "dosage_form", "research"]].sample(min(5, len(withdrawn)), random_state=42)

In [None]:
# Discontinued products (not for safety)
discontinued = results_df[results_df["regulatory_status"] == "Discontinued (not withdrawn for safety)"]
print(f"Discontinued (not for safety): {len(discontinued):,} products")
discontinued[["trade_name", "ingredient", "applicant", "research"]].sample(min(5, len(discontinued)), random_state=42)

In [None]:
# Currently approved products (NDA + ANDA)
# The labels are defined outside the f-strings: a backslash escape inside an
# f-string expression is a SyntaxError before Python 3.12.
nda_label = "FDA approved (NDA)"
anda_label = "FDA approved (ANDA \u2013 generic)"
approved = results_df[results_df["regulatory_status"].isin([nda_label, anda_label])]
print(f"Currently FDA approved: {len(approved):,} products ({len(approved)/len(results_df)*100:.1f}%)")
print(f"  NDA (brand): {(results_df['regulatory_status'] == nda_label).sum():,}")
print(f"  ANDA (generic): {(results_df['regulatory_status'] == anda_label).sum():,}")
approved[["trade_name", "ingredient", "applicant", "dosage_form", "research"]].sample(min(5, len(approved)), random_state=42)

### Research Quality
Each result comes with a `research` field containing the agent's sourced reasoning. The cell below uses text length as a rough proxy for research depth.

In [None]:
results_df["research_len"] = results_df["research"].str.len()
print("Research text length (characters):")
print(results_df["research_len"].describe().round(0).astype(int))
# Cast to int: the column is float when failed rows contribute NaN lengths.
print(f"\nShortest research entry ({int(results_df['research_len'].min())} chars):")
shortest = results_df.loc[results_df["research_len"].idxmin()]
print(f"  {shortest['trade_name']} ({shortest['ingredient']}): {shortest['research']}")
print(f"\nLongest research entry ({int(results_df['research_len'].max()):,} chars):")
longest = results_df.loc[results_df["research_len"].idxmax()]
print(f"  {longest['trade_name']} ({longest['ingredient']}): {longest['research'][:500]}...")

### Failed Rows
Only 3 out of 10,000 rows failed to return a result.

In [None]:
failed = results_df[results_df["regulatory_status"].isna()]
print(f"Failed rows: {len(failed)}")
failed[["row_id", "trade_name", "ingredient", "applicant", "dosage_form"]]
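
One way to follow up on the failures is to resubmit only those rows. A minimal sketch, assuming the failed rows can simply be run again through the same `agent_map()` call used earlier. `rows_to_retry` is a hypothetical helper, and the toy frame stands in for `results_df`:

```python
import pandas as pd

INPUT_COLS = ["row_id", "trade_name", "ingredient", "applicant", "strength", "dosage_form"]


def rows_to_retry(results: pd.DataFrame, status_col: str = "regulatory_status") -> pd.DataFrame:
    """Hypothetical helper: keep only the input columns of rows with no status."""
    failed_mask = results[status_col].isna()
    return results.loc[failed_mask, INPUT_COLS].reset_index(drop=True)


# Toy data standing in for results_df (values are placeholders, not real products).
demo = pd.DataFrame(
    {
        "row_id": [1, 2, 3],
        "trade_name": ["A", "B", "C"],
        "ingredient": ["a", "b", "c"],
        "applicant": ["X", "Y", "Z"],
        "strength": ["10MG", "20MG", "5MG"],
        "dosage_form": ["TABLET", "CAPSULE", "TABLET"],
        "regulatory_status": ["FDA approved (NDA)", None, "Under FDA review"],
        "research": ["...", None, "..."],
    }
)

retry_df = rows_to_retry(demo)
print(retry_df)
```

The resulting frame has the same columns as `input_df`, so it could be passed as `input=` in a second, much smaller run.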

### Near-Miss Status Labels
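A handful of rows returned status values that are close to, but not exactly, one of the allowed values: for example `FDA approve (NDA)`, or a plain hyphen in place of the en dash in the ANDA label. The cell below lists every row whose label falls outside the allowed set.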

In [None]:
ALLOWED = [
    "FDA approved (NDA)",
    "FDA approved (ANDA \u2013 generic)",
    "Tentative approval",
    "Discontinued (not withdrawn for safety)",
    "Withdrawn for safety reasons",
    "Approved but currently not marketed",
    "Under FDA review",
    "Emergency Use Authorization (EUA)",
    "Not FDA approved (compounded / ex-US only)",
]

off_label = results_df[~results_df["regulatory_status"].isin(ALLOWED) & results_df["regulatory_status"].notna()]
print(f"Rows with non-standard status labels: {len(off_label)} / {len(results_df):,}")
off_label[["trade_name", "ingredient", "regulatory_status", "research"]]
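
Near-miss labels like these can often be mapped back to an allowed value by string similarity rather than by a hand-written lookup. A minimal sketch using the standard library's `difflib.get_close_matches`. The helper name `nearest_allowed` and the 0.9 cutoff are assumptions chosen for illustration, and any label it cannot match confidently is returned as `None` so it can be reviewed by hand:

```python
from difflib import get_close_matches
from typing import Optional

ALLOWED = [
    "FDA approved (NDA)",
    "FDA approved (ANDA \u2013 generic)",
    "Tentative approval",
    "Discontinued (not withdrawn for safety)",
    "Withdrawn for safety reasons",
    "Approved but currently not marketed",
    "Under FDA review",
    "Emergency Use Authorization (EUA)",
    "Not FDA approved (compounded / ex-US only)",
]


def nearest_allowed(label, cutoff: float = 0.9) -> Optional[str]:
    """Hypothetical helper: map a label to the closest allowed status, else None."""
    # Missing values (NaN/None) from failed rows are not strings; leave them unmatched.
    if not isinstance(label, str):
        return None
    if label in ALLOWED:
        return label
    # get_close_matches returns at most n candidates scoring at least `cutoff`.
    matches = get_close_matches(label, ALLOWED, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(nearest_allowed("FDA approve (NDA)"))
print(nearest_allowed("FDA approved (ANDA - generic)"))
print(nearest_allowed("Unknown"))
```

Applied to the results, this could look like `results_df["regulatory_status"].map(nearest_allowed)`. A high cutoff keeps the mapping conservative, which matters here because several allowed values share long substrings.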