# FDA Adverse Events & Recalls Data Scraper

This notebook scrapes FDA adverse events and recall data for:
- **Drugs**: FAERS (FDA Adverse Event Reporting System)
- **Medical Devices**: MDR (Medical Device Reporting) 
- **Biologics**: FAERS + Recalls
- **All Products**: FDA Recall Database

## Data Sources:
1. **FAERS**: https://fis.fda.gov/content/Exports/faers_xml_YYYYQn.zip
2. **FDA Recalls**: https://www.fda.gov/safety/recalls-market-withdrawals-safety-alerts
3. **OpenFDA API**: https://open.fda.gov/apis/
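
The FAERS export URL above uses a `YYYYQn` placeholder for the year and quarter. A minimal sketch of filling it in, assuming the pattern holds for the quarter requested (the helper name `faers_quarterly_url` is hypothetical; the code cells in this notebook query the OpenFDA API and do not download these archives):

```python
# URL pattern copied from the Data Sources list; year/quarter are filled in below
FAERS_URL_TEMPLATE = "https://fis.fda.gov/content/Exports/faers_xml_{year}Q{quarter}.zip"

def faers_quarterly_url(year: int, quarter: int) -> str:
    """Build the quarterly FAERS XML export URL for a given year and quarter (1-4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter!r}")
    return FAERS_URL_TEMPLATE.format(year=year, quarter=quarter)

print(faers_quarterly_url(2024, 1))
```

Whether an archive exists for a given quarter still has to be checked by requesting the URL.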


In [18]:
import sys
import subprocess

print("Reinstalling numpy and pandas for compatibility...")
subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "--force-reinstall", "numpy", "pandas"])

print("\n" + "="*60)
print("IMPORTANT: RESTART YOUR KERNEL NOW!")
print("="*60)
print("Go to: Kernel -> Restart Kernel")
print("Then run the next cell again.")
print("="*60)


Reinstalling numpy and pandas for compatibility...
Collecting numpy
  Using cached numpy-2.2.6-cp310-cp310-macosx_14_0_arm64.whl (5.3 MB)
Collecting pandas
  Using cached pandas-2.3.3-cp310-cp310-macosx_11_0_arm64.whl (10.8 MB)
Collecting python-dateutil>=2.8.2
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting tzdata>=2022.7
  Using cached tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Collecting pytz>=2020.1
  Using cached pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting six>=1.5
  Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, numpy, python-dateutil, pandas
  Attempting uninstall: pytz
    Found existing installation: pytz 2025.2
    Uninstalling pytz-2025.2:
      Successfully uninstalled pytz-2025.2
  Attempting uninstall: tzdata
    Found existing installation: tzdata 2025.2
    Uninstalling tzdata-2025.2:
      Successfully uninstalled tzdata-2025.2
  Attempting uninstall: six
    Found exi



  Attempting uninstall: python-dateutil
    Found existing installation: python-dateutil 2.9.0.post0
    Uninstalling python-dateutil-2.9.0.post0:
      Successfully uninstalled python-dateutil-2.9.0.post0
  Attempting uninstall: pandas
    Found existing installation: pandas 2.3.3
    Uninstalling pandas-2.3.3:
      Successfully uninstalled pandas-2.3.3
Successfully installed numpy-2.2.6 pandas-2.3.3 python-dateutil-2.9.0.post0 pytz-2025.2 six-1.17.0 tzdata-2025.2

IMPORTANT: RESTART YOUR KERNEL NOW!
Go to: Kernel -> Restart Kernel
Then run the next cell again.




In [19]:
# Imports and output directories
import sys
import subprocess

# Try importing
try:
    from pathlib import Path
    import pandas as pd
    import numpy as np
    print("✓ pandas and numpy imported successfully")
except (ValueError, ImportError) as e:
    error_msg = str(e)
    if "numpy.dtype" in error_msg or "binary incompatibility" in error_msg.lower():
        print("="*60)
        print("NUMPY/PANDAS COMPATIBILITY ERROR DETECTED")
        print("="*60)
        print("\nTo fix this issue:")
        print("1. Run this command in a NEW cell:")
        print("   !pip install --upgrade --force-reinstall numpy pandas")
        print("\n2. RESTART YOUR KERNEL:")
        print("   Kernel -> Restart Kernel (or Kernel -> Restart)")
        print("\n3. Run this cell again")
        print("\n" + "="*60)
        raise ImportError("Please follow the instructions above to fix numpy/pandas compatibility") from e
    else:
        raise

# Install requests if it is missing (this check must run before requests is used)
try:
    import requests
    print("✓ requests already installed")
except ImportError:
    print("Installing requests...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])
    import requests

# Other imports
import json
import time
from datetime import datetime, timedelta
import zipfile
import io
from tqdm import tqdm

print("✓ All imports successful!")

BASE_DIR = Path("/Users/Kay Michnicki/AllCode/FDA Data Scraping")
OUTPUT_DIR = BASE_DIR / "fda_adverse_events_recalls"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)  # parents=True also creates BASE_DIR if it is missing

# Create subdirectories
(OUTPUT_DIR / "adverse_events").mkdir(exist_ok=True)
(OUTPUT_DIR / "recalls").mkdir(exist_ok=True)

print(f"Output directory: {OUTPUT_DIR}")


✓ pandas and numpy imported successfully
✓ requests already installed
✓ All imports successful!
Output directory: /Users/Kay Michnicki/AllCode/FDA Data Scraping/fda_adverse_events_recalls


In [20]:
# OpenFDA API Helper Functions

class FDADataScraper:
    """Scraper for FDA adverse events and recalls using OpenFDA API"""
    
    BASE_URL = "https://api.fda.gov"
    
    def __init__(self, limit=1000):
        """
        Initialize scraper
        
        Args:
            limit: Maximum results per API call (max 1000 per request)
        """
        self.limit = limit
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (compatible; FDA-Research-Bot/1.0)'
        })
    
    def search_adverse_events(self, search_query="*", skip=0, limit=None):
        """
        Search FDA adverse events (FAERS data)
        
        Args:
            search_query: Search query (e.g., "patient.drug.openfda.brand_name:KEYTRUDA", "*" for all)
            skip: Number of results to skip (for pagination)
            limit: Number of results to return (defaults to self.limit)
        
        Returns:
            List of adverse event records
        """
        if limit is None:
            limit = self.limit
        
        url = f"{self.BASE_URL}/drug/event.json"
        params = {
            'search': search_query,
            'limit': min(limit, 1000),  # API max is 1000
            'skip': skip
        }
        
        try:
            response = self.session.get(url, params=params, timeout=30)
            response.raise_for_status()
            data = response.json()
            return data.get('results', [])
        except Exception as e:
            print(f"Error fetching adverse events: {e}")
            return []
    
    def search_recalls_drugs(self, search_query="*", skip=0, limit=None):
        """Search drug recalls"""
        if limit is None:
            limit = self.limit
        
        url = f"{self.BASE_URL}/drug/enforcement.json"
        params = {
            'search': search_query,
            'limit': min(limit, 1000),
            'skip': skip
        }
        
        try:
            response = self.session.get(url, params=params, timeout=30)
            response.raise_for_status()
            data = response.json()
            return data.get('results', [])
        except Exception as e:
            print(f"Error fetching drug recalls: {e}")
            return []
    
    def search_recalls_devices(self, search_query="*", skip=0, limit=None):
        """Search medical device recalls"""
        if limit is None:
            limit = self.limit
        
        url = f"{self.BASE_URL}/device/enforcement.json"
        params = {
            'search': search_query,
            'limit': min(limit, 1000),
            'skip': skip
        }
        
        try:
            response = self.session.get(url, params=params, timeout=30)
            response.raise_for_status()
            data = response.json()
            return data.get('results', [])
        except Exception as e:
            print(f"Error fetching device recalls: {e}")
            return []
    
    def search_device_events(self, search_query="*", skip=0, limit=None):
        """Search medical device adverse events (MDR data)"""
        if limit is None:
            limit = self.limit
        
        url = f"{self.BASE_URL}/device/event.json"
        params = {
            'search': search_query,
            'limit': min(limit, 1000),
            'skip': skip
        }
        
        try:
            response = self.session.get(url, params=params, timeout=30)
            response.raise_for_status()
            data = response.json()
            return data.get('results', [])
        except Exception as e:
            print(f"Error fetching device events: {e}")
            return []
    
    def search_recalls_biologics(self, search_query="*", skip=0, limit=None):
        """Search biologics recalls (filtered from food/enforcement)"""
        if limit is None:
            limit = self.limit
        
        # Biologics may be in food/enforcement endpoint with specific product codes
        url = f"{self.BASE_URL}/food/enforcement.json"
        params = {
            'search': search_query,
            'limit': min(limit, 1000),
            'skip': skip
        }
        
        try:
            response = self.session.get(url, params=params, timeout=30)
            response.raise_for_status()
            data = response.json()
            # Filter for biologics-related recalls
            results = data.get('results', [])
            # Filter by product description containing biologics keywords
            biologics_keywords = ['biologic', 'vaccine', 'blood', 'plasma', 'biotechnology']
            filtered = [r for r in results if any(kw.lower() in str(r.get('product_description', '')).lower() 
                                                 for kw in biologics_keywords)]
            return filtered
        except Exception as e:
            print(f"Error fetching biologics recalls: {e}")
            return []
    
    def get_all_adverse_events_paginated(self, search_query="*", max_results=10000):
        """Get all adverse events (drugs/biologics) with pagination"""
        all_results = []
        skip = 0
        batch_size = 1000
        
        print(f"Fetching adverse events (max {max_results})...")
        with tqdm(total=max_results) as pbar:
            while len(all_results) < max_results:
                # Request only as many records as are still needed
                requested = min(batch_size, max_results - len(all_results))
                batch = self.search_adverse_events(search_query, skip=skip, limit=requested)
                if not batch:
                    break

                all_results.extend(batch)
                skip += len(batch)
                pbar.update(len(batch))

                if len(batch) < requested:  # Last batch
                    break

                time.sleep(0.5)  # Rate limiting
        
        return all_results[:max_results]
    
    def get_all_device_events_paginated(self, search_query="*", max_results=10000):
        """Get all device adverse events with pagination"""
        all_results = []
        skip = 0
        batch_size = 1000
        
        print(f"Fetching device events (max {max_results})...")
        with tqdm(total=max_results) as pbar:
            while len(all_results) < max_results:
                # Request only as many records as are still needed
                requested = min(batch_size, max_results - len(all_results))
                batch = self.search_device_events(search_query, skip=skip, limit=requested)
                if not batch:
                    break

                all_results.extend(batch)
                skip += len(batch)
                pbar.update(len(batch))

                if len(batch) < requested:  # Last batch
                    break

                time.sleep(0.5)  # Rate limiting
        
        return all_results[:max_results]

# Initialize scraper
scraper = FDADataScraper(limit=1000)
print("FDA Data Scraper initialized!")
print("\nOpenFDA API Documentation: https://open.fda.gov/apis/")
print("Note: API has rate limits, so requests are throttled")


FDA Data Scraper initialized!

OpenFDA API Documentation: https://open.fda.gov/apis/
Note: API has rate limits, so requests are throttled
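
The four `search_*` methods above differ only in the endpoint path and the error message. A minimal sketch of how the shared request-building step could be factored out, assuming the same base URL and the same 1000-record cap per request (the name `build_request` is hypothetical and not part of the class above):

```python
BASE_URL = "https://api.fda.gov"
MAX_LIMIT = 1000  # per-request cap used by the methods above

def build_request(endpoint, search_query="*", skip=0, limit=1000):
    """Return (url, params) for an openFDA endpoint path such as 'drug/event'."""
    url = f"{BASE_URL}/{endpoint.strip('/')}.json"
    params = {
        "search": search_query,
        "limit": min(limit, MAX_LIMIT),  # never ask for more than the cap
        "skip": skip,
    }
    return url, params

url, params = build_request("device/enforcement", limit=5000)
print(url)
print(params)
```

The returned pair can be passed straight to `session.get(url, params=params, timeout=30)`, so each `search_*` method would reduce to a one-line call plus its own error message.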


In [21]:
# Processing Functions for Device and Biologics Events

def process_device_event_record(record):
    """Process a medical device adverse event record"""
    try:
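        device = record.get('device', [{}])[0] if record.get('device') else {}
        # openFDA nests most device details inside the `device` list
        # (brand_name, generic_name, manufacturer_d_name, device_report_product_code);
        # the earlier top-level lookups are kept as fallbacks.
        manufacturer = device.get('manufacturer_d_name', '') or record.get('manufacturer_d_name', '') or record.get('manufacturer_name', '')
        product_code = device.get('device_report_product_code', '') or device.get('device_product_code', '') or record.get('device_product_code', '')
        device_name = device.get('brand_name', '') or device.get('generic_name', '') or device.get('device_name', '') or record.get('device_name', '')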
        
        processed = {
            'event_key': record.get('event_key', ''),
            'report_date': record.get('date_of_event', '') or record.get('date_received', ''),
            'device_name': device_name,
            'device_product_code': product_code,
            'manufacturer': manufacturer,
            'event_type': record.get('event_type', ''),
            'adverse_event_flag': record.get('adverse_event_flag', ''),
            'product_problem_flag': record.get('product_problem_flag', ''),
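            # `product_problems` is a list of strings when present; older keys kept as fallbacks
            'device_problem': ', '.join(map(str, record.get('product_problems') or [])) or record.get('device_problem', '') or record.get('event_description', ''),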
            'mdr_text': record.get('mdr_text', ''),
            'raw_data': json.dumps(record)
        }
        return processed
    except Exception as e:
        print(f"Error processing device record: {e}")
        return None

def process_biologics_event_record(record):
    """Process a biologics adverse event record (from FAERS, filtered for biologics)"""
    try:
        # Similar structure to drug adverse events but flagged as biologics
        patient = record.get('patient', {})
        drug = record.get('patient', {}).get('drug', [{}])[0] if record.get('patient', {}).get('drug') else {}
        reaction = record.get('patient', {}).get('reaction', [])
        
        # Check if it's a biologic (vaccines, blood products, etc.)
        product_name = drug.get('medicinalproduct', '') or ''
        is_biologic = any(keyword in product_name.lower() 
                         for keyword in ['vaccine', 'serum', 'plasma', 'blood', 'biologic', 'biotechnology'])
        
        processed = {
            'safetyreportid': record.get('safetyreportid', ''),
            'receivedate': record.get('receivedate', ''),
            'serious': record.get('serious', ''),
            'product_name': product_name,
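            # The active substance name is nested under `activesubstance` in openFDA drug events
            'generic_name': (drug.get('activesubstance') or {}).get('activesubstancename', ''),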
            'brand_name': drug.get('openfda', {}).get('brand_name', [''])[0] if drug.get('openfda') else '',
            'adverse_reactions': ', '.join([r.get('reactionmeddrapt', '') for r in reaction]) if reaction else '',
            'reaction_count': len(reaction) if reaction else 0,
            'patient_age': patient.get('patientonsetage', ''),
            'patient_sex': patient.get('patientsex', ''),
            'is_biologic': is_biologic,
            'raw_data': json.dumps(record)
        }
        return processed if is_biologic else None  # Only return if confirmed biologic
    except Exception as e:
        print(f"Error processing biologics record: {e}")
        return None


In [22]:
# Fetch Medical Device Adverse Events (MDR)

print("="*60)
print("FETCHING MEDICAL DEVICE ADVERSE EVENTS")
print("="*60)
print("Fetching device adverse events (MDR data)...")
print("Note: Start with a sample to test, then scale up")

# Get device adverse events
device_events_raw = scraper.get_all_device_events_paginated(search_query="*", max_results=1000)

print(f"\nFetched {len(device_events_raw)} device event records")

if device_events_raw:
    # Process records
    processed_device_events = []
    for record in tqdm(device_events_raw, desc="Processing device events"):
        processed = process_device_event_record(record)
        if processed:
            processed_device_events.append(processed)
    
    device_events_df = pd.DataFrame(processed_device_events)
    print(f"\nProcessed {len(device_events_df)} device events")
    print(f"\nSample data:")
    available_cols = [col for col in ['device_name', 'manufacturer', 'event_type', 'device_problem'] 
                     if col in device_events_df.columns]
    print(device_events_df[available_cols].head() if available_cols else device_events_df.head())
else:
    device_events_df = None
    print("No device events data retrieved")


FETCHING MEDICAL DEVICE ADVERSE EVENTS
Fetching device adverse events (MDR data)...
Note: Start with a sample to test, then scale up
Fetching device events (max 1000)...


100%|██████████| 1000/1000 [00:03<00:00, 265.90it/s]



Fetched 1000 device event records


Processing device events: 100%|██████████| 1000/1000 [00:00<00:00, 28645.70it/s]


Processed 1000 device events

Sample data:
  device_name manufacturer          event_type device_problem
0                                       Injury               
1                                       Injury               
2                           No answer provided               
3                                       Injury               
4                                  Malfunction               
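
The sample above shows empty `device_name`, `manufacturer`, and `device_problem` columns, which suggests the keys being read do not match the keys in the records. One way to check is to list every field path a raw record actually carries. A minimal sketch (the helper name `list_field_paths` is hypothetical, and the sample record is made up for illustration):

```python
def list_field_paths(record, prefix=""):
    """Return sorted dotted paths for every key in a nested dict/list record."""
    paths = set()
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths.update(list_field_paths(value, path))
    elif isinstance(record, list):
        # List items share the parent's path (no index component)
        for item in record:
            paths.update(list_field_paths(item, prefix))
    return sorted(paths)

# Made-up record for illustration only; real MDR records have many more fields
sample = {
    "event_type": "Injury",
    "device": [{"brand_name": "EXAMPLE", "openfda": {"device_name": "Example"}}],
}
print(list_field_paths(sample))
```

In the notebook, `list_field_paths(device_events_raw[0])` would show which keys are available before they are mapped in `process_device_event_record`.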





In [23]:
# Fetch Biologics Adverse Events (from FAERS, filtered)

print("="*60)
print("FETCHING BIOLOGICS ADVERSE EVENTS")
print("="*60)
print("Fetching biologics adverse events from FAERS...")
print("Note: Filtering FAERS data for biologics products (vaccines, blood products, etc.)")

# Get adverse events and filter for biologics
print("Fetching FAERS data (this may take a while)...")
all_adverse_events = scraper.get_all_adverse_events_paginated(search_query="*", max_results=2000)

print(f"\nFetched {len(all_adverse_events)} total adverse event records")
print("Filtering for biologics products...")

# Process and filter for biologics
processed_biologics_events = []
for record in tqdm(all_adverse_events, desc="Processing biologics events"):
    processed = process_biologics_event_record(record)
    if processed:  # Only biologics are returned
        processed_biologics_events.append(processed)

if processed_biologics_events:
    biologics_events_df = pd.DataFrame(processed_biologics_events)
    print(f"\nProcessed {len(biologics_events_df)} biologics adverse events")
    print(f"\nSample data:")
    available_cols = [col for col in ['product_name', 'brand_name', 'adverse_reactions', 'serious'] 
                     if col in biologics_events_df.columns]
    print(biologics_events_df[available_cols].head() if available_cols else biologics_events_df.head())
else:
    biologics_events_df = None
    print("No biologics events found in sample")


FETCHING BIOLOGICS ADVERSE EVENTS
Fetching biologics adverse events from FAERS...
Note: Filtering FAERS data for biologics products (vaccines, blood products, etc.)
Fetching FAERS data (this may take a while)...
Fetching adverse events (max 2000)...


100%|██████████| 2000/2000 [00:06<00:00, 306.12it/s]



Fetched 2000 total adverse event records
Filtering for biologics products...


Processing biologics events: 100%|██████████| 2000/2000 [00:00<00:00, 10890.39it/s]

No biologics events found in sample
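
One possible reason for the empty result is that `process_biologics_event_record` inspects only the first drug listed in each report. A minimal sketch of a check that scans every listed drug, reusing the same keyword list (the helper name `biologic_product_names` is hypothetical, the sample report is made up, and keyword matching on product names remains a rough heuristic):

```python
BIOLOGIC_KEYWORDS = ['vaccine', 'serum', 'plasma', 'blood', 'biologic', 'biotechnology']

def biologic_product_names(record, keywords=BIOLOGIC_KEYWORDS):
    """Return the product names in a report that contain any biologics keyword."""
    drugs = (record.get('patient') or {}).get('drug') or []
    matches = []
    for drug in drugs:
        name = drug.get('medicinalproduct') or ''
        if any(kw in name.lower() for kw in keywords):
            matches.append(name)
    return matches

# Made-up report for illustration: the biologic is the second drug, not the first
report = {'patient': {'drug': [{'medicinalproduct': 'IBUPROFEN'},
                               {'medicinalproduct': 'INFLUENZA VACCINE'}]}}
print(biologic_product_names(report))
```

A report would then count as a biologics event whenever the returned list is non-empty.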





In [24]:
# Fetch Adverse Events Data

def process_adverse_event_record(record):
    """Process a single adverse event record into structured format"""
    try:
        # Extract key fields
        patient = record.get('patient', {})
        drug = record.get('patient', {}).get('drug', [{}])[0] if record.get('patient', {}).get('drug') else {}
        reaction = record.get('patient', {}).get('reaction', [])
        
        processed = {
            'safetyreportid': record.get('safetyreportid', ''),
            'receivedate': record.get('receivedate', ''),
            'serious': record.get('serious', ''),
            'seriousnessdeath': record.get('seriousnessdeath', ''),
            'seriousnesslifethreatening': record.get('seriousnesslifethreatening', ''),
            'seriousnesshospitalization': record.get('seriousnesshospitalization', ''),
            'seriousnessdisabling': record.get('seriousnessdisabling', ''),
            'drug_product_name': drug.get('medicinalproduct', ''),
            # The active substance name is nested under `activesubstance` in openFDA drug events
            'drug_generic_name': (drug.get('activesubstance') or {}).get('activesubstancename', ''),
            'drug_brand_name': drug.get('openfda', {}).get('brand_name', [''])[0] if drug.get('openfda') else '',
            'adverse_reactions': ', '.join([r.get('reactionmeddrapt', '') for r in reaction]) if reaction else '',
            'reaction_count': len(reaction) if reaction else 0,
            'patient_age': patient.get('patientonsetage', ''),
            'patient_age_unit': patient.get('patientonsetageunit', ''),
            'patient_sex': patient.get('patientsex', ''),
            # reactionoutcome is a single code for the first reaction, not a list, so it is not joined
            'outcome': str(reaction[0].get('reactionoutcome', '')) if reaction else '',
            'raw_data': json.dumps(record)  # Keep raw for reference
        }
        return processed
    except Exception as e:
        print(f"Error processing record: {e}")
        return None

# Fetch sample adverse events (adjust max_results as needed)
print("Fetching recent adverse events...")
print("Note: Start with a small sample to test, then scale up")

# Get a 1000-record sample for testing (no sort is applied, so these are not necessarily the most recent reports)
adverse_events_raw = scraper.get_all_adverse_events_paginated(search_query="*", max_results=1000)

print(f"\nFetched {len(adverse_events_raw)} adverse event records")

if adverse_events_raw:
    # Process records
    processed_events = []
    for record in tqdm(adverse_events_raw, desc="Processing records"):
        processed = process_adverse_event_record(record)
        if processed:
            processed_events.append(processed)
    
    adverse_events_df = pd.DataFrame(processed_events)
    print(f"\nProcessed {len(adverse_events_df)} adverse events")
    print(f"\nSample data:")
    print(adverse_events_df[['drug_product_name', 'adverse_reactions', 'serious', 'receivedate']].head())
else:
    adverse_events_df = None
    print("No adverse events data retrieved")


Fetching recent adverse events...
Note: Start with a small sample to test, then scale up
Fetching adverse events (max 1000)...


100%|██████████| 1000/1000 [00:03<00:00, 290.64it/s]



Fetched 1000 adverse event records


Processing records: 100%|██████████| 1000/1000 [00:00<00:00, 16305.72it/s]


Processed 1000 adverse events

Sample data:
     drug_product_name                          adverse_reactions serious  \
0        DURAGESIC-100        DRUG ADMINISTRATION ERROR, OVERDOSE       1   
1               BONIVA  Vomiting, Diarrhoea, Arthralgia, Headache       1   
2            IBUPROFEN                Dyspepsia, Renal impairment       1   
3               LYRICA                           Drug ineffective       2   
4  DOXYCYCLINE HYCLATE                      Drug hypersensitivity       2   

  receivedate  
0    20080707  
1    20140306  
2    20140228  
3    20140312  
4    20140312  
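
The `receivedate` values above are `YYYYMMDD` strings rather than dates. A minimal sketch of converting them with the standard library, assuming every valid value has exactly eight digits (the helper name `parse_fda_date` is hypothetical):

```python
from datetime import date, datetime
from typing import Optional

def parse_fda_date(value) -> Optional[date]:
    """Convert a 'YYYYMMDD' string to a date; return None for missing or malformed values."""
    text = str(value).strip()
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:  # e.g. month 13 or day 32
        return None

print(parse_fda_date("20080707"))  # 2008-07-07
print(parse_fda_date(""))          # None
```

For a whole column, `pd.to_datetime(adverse_events_df['receivedate'], format='%Y%m%d', errors='coerce')` is the pandas equivalent.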





In [25]:
# Fetch Recalls Data

def process_recall_record(record, product_type='drug'):
    """Process a recall record into structured format"""
    try:
        processed = {
            'recall_number': record.get('recall_number', ''),
            'recall_initiation_date': record.get('recall_initiation_date', ''),
            'product_description': record.get('product_description', ''),
            'reason_for_recall': record.get('reason_for_recall', ''),
            'product_type': product_type,
            'recalling_firm': record.get('recalling_firm', ''),
            'status': record.get('status', ''),
            'raw_data': json.dumps(record)
        }
        return processed
    except Exception as e:
        print(f"Error processing recall record: {e}")
        return None

# Fetch Drug Recalls
print("Fetching drug recalls...")
drug_recalls_list = []
for skip in range(0, 1000, 1000):  # Get up to 1000 records
    batch = scraper.search_recalls_drugs(search_query="*", skip=skip, limit=1000)
    if not batch:
        break
    drug_recalls_list.extend(batch)
    time.sleep(0.5)

print(f"Fetched {len(drug_recalls_list)} drug recall records")

# Fetch Device Recalls
print("\nFetching medical device recalls...")
device_recalls_list = []
for skip in range(0, 1000, 1000):
    batch = scraper.search_recalls_devices(search_query="*", skip=skip, limit=1000)
    if not batch:
        break
    device_recalls_list.extend(batch)
    time.sleep(0.5)

print(f"Fetched {len(device_recalls_list)} device recall records")

# Process recalls
all_recalls = []

# Process drug recalls
for record in drug_recalls_list:
    processed = process_recall_record(record, product_type='drug')
    if processed:
        all_recalls.append(processed)

# Process device recalls
for record in device_recalls_list:
    processed = process_recall_record(record, product_type='device')
    if processed:
        all_recalls.append(processed)

if all_recalls:
    recalls_df = pd.DataFrame(all_recalls)
    print(f"\nProcessed {len(recalls_df)} total recall records")
    print(f"\nBy product type:")
    print(recalls_df['product_type'].value_counts())
    print(f"\nSample recalls:")
    print(recalls_df[['product_type', 'product_description', 'reason_for_recall', 'recall_initiation_date']].head())
else:
    recalls_df = None
    print("No recall data retrieved")


Fetching drug recalls...
Fetched 1000 drug recall records

Fetching medical device recalls...
Fetched 1000 device recall records

Processed 2000 total recall records

By product type:
product_type
drug      1000
device    1000
Name: count, dtype: int64

Sample recalls:
  product_type                                product_description  \
0         drug  Progesterone 100 mg/mL in Corn Oil Injection, ...   
1         drug  Assured Instant Hand Sanitizer Aloe & Moisturi...   
2         drug  Dextroamphetamine Saccharate, Amphetamine Aspa...   
3         drug  No Drip Nasal Spray, Oxymetazoline HCl 0.05% N...   
4         drug  2 mcg/mL Fentanyl Citrate and 0.16% Bupivacain...   

                                   reason_for_recall recall_initiation_date  
0  Lack of Assurance of Sterility:  A recall of a...               20150903  
1  CGMP Deviations: Next Advanced Antibacterial H...               20200730  
2  Some bottles may contain mixed strengths of th...               20200522  
3  

In [26]:
# Organize and Save Data

print("="*60)
print("SAVING ALL DATA")
print("="*60)

# Save Drug Adverse Events
if adverse_events_df is not None and len(adverse_events_df) > 0:
    ae_parquet_path = OUTPUT_DIR / "adverse_events" / "drug_adverse_events.parquet"
    ae_csv_path = OUTPUT_DIR / "adverse_events" / "drug_adverse_events.csv"
    
    adverse_events_df.to_parquet(ae_parquet_path, index=False)
    adverse_events_df.to_csv(ae_csv_path, index=False)
    
    print(f"\n✓ Saved {len(adverse_events_df)} DRUG adverse events to:")
    print(f"  Parquet: {ae_parquet_path}")
    print(f"  CSV: {ae_csv_path}")
    print(f"  Unique drugs: {adverse_events_df['drug_product_name'].nunique()}")
else:
    print("\n✗ No drug adverse events data to save")

# Save Device Adverse Events
if device_events_df is not None and len(device_events_df) > 0:
    device_ae_parquet_path = OUTPUT_DIR / "adverse_events" / "device_adverse_events.parquet"
    device_ae_csv_path = OUTPUT_DIR / "adverse_events" / "device_adverse_events.csv"
    
    device_events_df.to_parquet(device_ae_parquet_path, index=False)
    device_events_df.to_csv(device_ae_csv_path, index=False)
    
    print(f"\n✓ Saved {len(device_events_df)} DEVICE adverse events to:")
    print(f"  Parquet: {device_ae_parquet_path}")
    print(f"  CSV: {device_ae_csv_path}")
    if 'device_name' in device_events_df.columns:
        print(f"  Unique devices: {device_events_df['device_name'].nunique()}")
else:
    print("\n✗ No device adverse events data to save")

# Save Biologics Adverse Events
if biologics_events_df is not None and len(biologics_events_df) > 0:
    bio_ae_parquet_path = OUTPUT_DIR / "adverse_events" / "biologics_adverse_events.parquet"
    bio_ae_csv_path = OUTPUT_DIR / "adverse_events" / "biologics_adverse_events.csv"
    
    biologics_events_df.to_parquet(bio_ae_parquet_path, index=False)
    biologics_events_df.to_csv(bio_ae_csv_path, index=False)
    
    print(f"\n✓ Saved {len(biologics_events_df)} BIOLOGICS adverse events to:")
    print(f"  Parquet: {bio_ae_parquet_path}")
    print(f"  CSV: {bio_ae_csv_path}")
    if 'product_name' in biologics_events_df.columns:
        print(f"  Unique products: {biologics_events_df['product_name'].nunique()}")
else:
    print("\n✗ No biologics adverse events data to save")

# Save Recalls
if recalls_df is not None and len(recalls_df) > 0:
    # Save as Parquet and CSV
    recalls_parquet_path = OUTPUT_DIR / "recalls" / "recalls.parquet"
    recalls_csv_path = OUTPUT_DIR / "recalls" / "recalls.csv"
    
    recalls_df.to_parquet(recalls_parquet_path, index=False)
    recalls_df.to_csv(recalls_csv_path, index=False)
    
    print(f"\n✓ Saved {len(recalls_df)} recall records to:")
    print(f"  Parquet: {recalls_parquet_path}")
    print(f"  CSV: {recalls_csv_path}")
    
    # Summary statistics
    print(f"\nRecalls Summary:")
    print(f"  Total records: {len(recalls_df)}")
    print(f"  By product type:")
    print(recalls_df['product_type'].value_counts())
else:
    print("\n✗ No recall data to save")


SAVING ALL DATA

✓ Saved 1000 DRUG adverse events to:
  Parquet: /Users/Kay Michnicki/AllCode/FDA Data Scraping/fda_adverse_events_recalls/adverse_events/drug_adverse_events.parquet
  CSV: /Users/Kay Michnicki/AllCode/FDA Data Scraping/fda_adverse_events_recalls/adverse_events/drug_adverse_events.csv
  Unique drugs: 268

✓ Saved 1000 DEVICE adverse events to:
  Parquet: /Users/Kay Michnicki/AllCode/FDA Data Scraping/fda_adverse_events_recalls/adverse_events/device_adverse_events.parquet
  CSV: /Users/Kay Michnicki/AllCode/FDA Data Scraping/fda_adverse_events_recalls/adverse_events/device_adverse_events.csv
  Unique devices: 1

✗ No biologics adverse events data to save

✓ Saved 2000 recall records to:
  Parquet: /Users/Kay Michnicki/AllCode/FDA Data Scraping/fda_adverse_events_recalls/recalls/recalls.parquet
  CSV: /Users/Kay Michnicki/AllCode/FDA Data Scraping/fda_adverse_events_recalls/recalls/recalls.csv

Recalls Summary:
  Total records: 2000
  By product type:
product_type
drug   

In [27]:
# Advanced Search Functions

def search_adverse_events_by_drug(drug_name, max_results=1000):
    """Search adverse events for a specific drug"""
    print(f"Searching adverse events for: {drug_name}")
    
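    # Try multiple search strategies; field names are relative to the drug/event
    # endpoint, and quoting keeps multi-word names together as one phrase
    search_queries = [
        f'patient.drug.openfda.brand_name:"{drug_name}"',
        f'patient.drug.openfda.generic_name:"{drug_name}"',
        f'patient.drug.openfda.substance_name:"{drug_name}"',
        f'patient.drug.medicinalproduct:"{drug_name}"'
    ]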
    
    all_results = []
    for query in search_queries:
        results = scraper.get_all_adverse_events_paginated(query, max_results=max_results//len(search_queries))
        all_results.extend(results)
        time.sleep(0.5)
    
    return all_results[:max_results]

def search_recalls_by_product(product_name, product_type='drug', max_results=500):
    """Search recalls for a specific product"""
    print(f"Searching {product_type} recalls for: {product_name}")
    
    search_query = f"product_description:{product_name}"
    
    recalls = []
    if product_type == 'drug':
        for skip in range(0, max_results, 1000):
            batch = scraper.search_recalls_drugs(search_query, skip=skip, limit=1000)
            if not batch:
                break
            recalls.extend(batch)
            time.sleep(0.5)
    elif product_type == 'device':
        for skip in range(0, max_results, 1000):
            batch = scraper.search_recalls_devices(search_query, skip=skip, limit=1000)
            if not batch:
                break
            recalls.extend(batch)
            time.sleep(0.5)
    
    return recalls[:max_results]

print("Advanced search functions ready!")
print("\nExample usage:")
print('  keytruda_ae = search_adverse_events_by_drug("KEYTRUDA", max_results=500)')
print('  keytruda_recalls = search_recalls_by_product("KEYTRUDA", product_type="drug")')


Advanced search functions ready!

Example usage:
  keytruda_ae = search_adverse_events_by_drug("KEYTRUDA", max_results=500)
  keytruda_recalls = search_recalls_by_product("KEYTRUDA", product_type="drug")
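
Because `search_adverse_events_by_drug` concatenates the results of several queries, the same report can appear more than once. A minimal sketch of removing duplicates by `safetyreportid` while preserving order (the helper name `dedupe_by_key` is hypothetical, and the sample records are made up):

```python
def dedupe_by_key(records, key="safetyreportid"):
    """Return records with later duplicates (same `key` value) removed, order preserved."""
    seen = set()
    unique = []
    for record in records:
        identifier = record.get(key)
        if identifier is None:
            unique.append(record)  # keep records without an ID rather than drop them
            continue
        if identifier in seen:
            continue
        seen.add(identifier)
        unique.append(record)
    return unique

# Made-up records for illustration
combined = [{"safetyreportid": "1"}, {"safetyreportid": "2"}, {"safetyreportid": "1"}]
print(len(dedupe_by_key(combined)))  # 2
```

Applying it to `all_results` before slicing to `max_results` would keep the returned list free of repeated reports.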


## Important Notes

### API Rate Limits
- The OpenFDA API limits requests to 240 per minute; without an API key there is also a daily cap of 1,000 requests per IP address
- The scraper includes `time.sleep(0.5)` between paginated requests, which keeps it to at most about 120 requests per minute
- For higher daily volume, register for an API key: https://open.fda.gov/apis/authentication/
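
The throttling described above can be sketched as a small helper that spaces requests by a minimum interval, together with a function that adds an API key to the query parameters. This is a minimal sketch, not part of the scraper: the names `Throttle` and `with_api_key` are hypothetical, and the 240 requests per minute default comes from the limit quoted above. openFDA accepts the key as an `api_key` query parameter.

```python
import time

class Throttle:
    """Space successive calls at least `min_interval` seconds apart."""

    def __init__(self, requests_per_minute=240):
        self.min_interval = 60.0 / requests_per_minute
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the minimum interval, then record the call."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()

def with_api_key(params, api_key=None):
    """Return a copy of `params` with `api_key` added when a key is provided."""
    merged = dict(params)
    if api_key:
        merged["api_key"] = api_key
    return merged

throttle = Throttle(requests_per_minute=240)
print(throttle.min_interval)  # 0.25
```

Calling `throttle.wait()` before each `session.get(...)` would replace the fixed `time.sleep(0.5)`, and `with_api_key(params, api_key)` leaves the parameters unchanged when no key is set.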
