# PyMAUDE Getting Started Guide

This notebook demonstrates the basics of using PyMAUDE to access FDA MAUDE adverse event data.

## What You'll Learn

1. Creating and setting up a database
2. Downloading data from FDA
3. Basic search methods
4. Working with results
5. Viewing event narratives
6. Exporting results and closing the database

## 1. Database Setup

First, import PyMAUDE and create a database connection:

In [1]:
from pymaude import MaudeDatabase
import pandas as pd

# Create database (uses file in current directory)
db = MaudeDatabase('notebooks.db', verbose=True)

print("Database created successfully!")

Database created successfully!


## 2. Downloading Data

Download data for specific years. The first run is slow (roughly 20-30 minutes) because PyMAUDE has to convert the FDA data files to its database format. We'll start with just 2023:

In [None]:
# Download the device and master tables for 2023
# The first run can take 20-30 minutes while the FDA files are converted
db.add_years(2023, tables=['device', 'master'], download=True)

print("\nData downloaded and loaded!")
print("\nDatabase info:")
db.info()

## 3. Boolean Name Search

The `search_by_device_names()` method provides flexible partial matching with boolean logic. It also searches across multiple device name fields, so a report is found regardless of which of those fields holds the name.

In [None]:
# Create search index for faster searching (one-time setup)
db.create_search_index()

# Simple search - find all pacemaker events
pacemakers = db.search_by_device_names('pacemaker')

print(f"Found {len(pacemakers)} pacemaker reports")
print("\nFirst few results:")
print(pacemakers[['BRAND_NAME', 'GENERIC_NAME', 'DATE_RECEIVED']].head())

### OR Logic - Multiple Terms

In [None]:
# Find devices matching ANY of these terms ("pacemaker" OR "defibrillator")
results = db.search_by_device_names(['pacemaker', 'defibrillator'])

print(f"Found {len(results)} events (pacemaker OR defibrillator)")
print("\nUnique generic names:")
print(results['GENERIC_NAME'].value_counts().head())

### AND Logic - Multiple Required Terms

In [None]:
# Find devices matching ALL of these terms by passing a nested list.
# Here both "cardiac" and "catheter" must appear in the device name.
results = db.search_by_device_names([['cardiac', 'catheter']])

print(f"Found {len(results)} events (cardiac AND catheter)")
print("\nTop brands:")
print(results['BRAND_NAME'].value_counts().head())
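
The argument shapes used above can be read as a small boolean grammar: a plain string is one term, a flat list means OR, and a nested list means AND within the inner list. The sketch below mirrors that reading in plain Python on made-up device names. `matches_criteria` is a hypothetical helper for illustration only; it assumes case-insensitive substring matching and is not PyMAUDE's actual implementation.

```python
def matches_criteria(name, criteria):
    """Return True if `name` satisfies `criteria` (hypothetical helper).

    criteria may be:
      'pacemaker'                      -> single term
      ['pacemaker', 'defibrillator']   -> ANY term (OR)
      [['cardiac', 'catheter']]        -> ALL terms in the inner list (AND)
    """
    text = name.lower()
    if isinstance(criteria, str):
        criteria = [criteria]
    for group in criteria:
        # A bare string is a one-term group; a nested list is an AND group.
        terms = [group] if isinstance(group, str) else group
        if all(term.lower() in text for term in terms):
            return True
    return False


# Made-up device names, for illustration only
names = [
    "Dual Chamber Pacemaker",
    "Implantable Defibrillator",
    "Cardiac Ablation Catheter",
    "Urinary Catheter",
]

or_hits = [n for n in names if matches_criteria(n, ["pacemaker", "defibrillator"])]
and_hits = [n for n in names if matches_criteria(n, [["cardiac", "catheter"]])]

print(or_hits)
print(and_hits)
```

Mixing the two forms, such as `['pacemaker', ['cardiac', 'catheter']]`, would read as "pacemaker OR (cardiac AND catheter)" under this sketch; check the API reference before relying on that with the real method.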

## 4. Exact-Match Queries

For precise matching on specific fields, use `query_device()`.

Please note: MAUDE reports are inconsistent about which column holds the device name. In most cases we therefore recommend `search_by_device_names()` over `query_device()`, because it searches across multiple columns for matching device names.

In [None]:
# Query by exact generic name (case-insensitive) - not recommended in most settings!
stents = db.query_device(generic_name='Vascular Stent')

print(f"Found {len(stents)} vascular stent events")
print("\nBrands found:")
print(stents['BRAND_NAME'].value_counts())

In [None]:
# Query by product code
by_code = db.query_device(product_code='DQY')

print(f"Found {len(by_code)} events with product code DQY")
print("\nGeneric names:")
print(by_code['GENERIC_NAME'].value_counts())

## 5. Date Filtering

Both search methods support date range filtering:

In [None]:
# Boolean search with date filter
recent_catheters = db.search_by_device_names(
    'catheter',
    start_date='2023-06-01',
    end_date='2023-12-31'
)

print(f"Catheter events from June-December 2023: {len(recent_catheters)}")
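
Date strings in the `YYYY-MM-DD` form shown above can be checked before they are passed to a search. The helper below, `check_date_range`, is a hypothetical convenience function and not part of PyMAUDE. It uses only the standard library to confirm that both dates parse and that the start does not fall after the end.

```python
from datetime import date


def check_date_range(start_date, end_date):
    """Parse two 'YYYY-MM-DD' strings and verify start <= end (hypothetical helper)."""
    start = date.fromisoformat(start_date)  # raises ValueError on malformed input
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError(f"start_date {start_date} is after end_date {end_date}")
    return start, end


start, end = check_date_range("2023-06-01", "2023-12-31")
print(f"Range covers {(end - start).days} days")
```

Catching a swapped or malformed date early avoids running a long query that silently returns no rows.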

## 6. Working with Results

Results are pandas DataFrames - you can use all pandas operations:

In [None]:
# Get search results
results = db.search_by_device_names('insulin pump')

# Basic DataFrame operations
print(f"Total events: {len(results)}")
print(f"Columns: {list(results.columns[:10])}...")  # First 10 columns
print("\nEvent types:")
print(results['EVENT_TYPE'].value_counts())

In [None]:
# Filter to serious events (D=Death, IN=Injury)
serious = results[results['EVENT_TYPE'].str.contains(r'\bD\b|\bIN\b', case=False, na=False, regex=True)]

print(f"Serious events (Death or Injury): {len(serious)}/{len(results)}")
if len(results) > 0:  # avoid ZeroDivisionError when the search returns nothing
    print(f"Percentage: {100 * len(serious) / len(results):.1f}%")
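
The word boundaries in the filter above matter because a bare substring test for `D` or `IN` would also match any value that merely contains those letters. The sketch below runs the same pattern through the standard `re` module on a few made-up `EVENT_TYPE` values; the sample strings are assumptions chosen for illustration, not values taken from MAUDE.

```python
import re

# Same pattern as the pandas filter above, compiled case-insensitively
SERIOUS = re.compile(r"\bD\b|\bIN\b", re.IGNORECASE)

# Made-up sample values, for illustration only
samples = ["D", "IN", "M", "IN, M", "DINNER", "d"]

flags = {value: bool(SERIOUS.search(value)) for value in samples}
for value, is_serious in flags.items():
    print(f"{value!r}: {is_serious}")
```

Trying a pattern on a handful of hand-written strings like this is a quick way to confirm it behaves as intended before applying it to a full result set.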

In [None]:
# Group by manufacturer
by_manufacturer = results.groupby('MANUFACTURER_D_NAME').size().sort_values(ascending=False)

print("Top 5 manufacturers by event count:")
print(by_manufacturer.head())

## 7. Getting Event Narratives

To read the actual event descriptions, download text data and use `get_narratives()`:

In [None]:
# Download text data (if not already downloaded)
db.add_years(2023, tables=['text'], download=True)

# Get device events
devices = db.search_by_device_names('defibrillator')

# Get narratives for first 5 events
sample_keys = devices['MDR_REPORT_KEY'].head(5).tolist()
narratives = db.get_narratives(sample_keys)

# Display first narrative
if len(narratives) > 0:
    print(f"Report {narratives.iloc[0]['MDR_REPORT_KEY']}:")
    print(narratives.iloc[0]['FOI_TEXT'][:500])  # First 500 characters
    print("...")
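
Slicing a narrative at a fixed character count can cut a word in half, and it fails if the text field is not a string. One possible alternative is the standard library's `textwrap.shorten`, which collapses runs of whitespace and truncates at a word boundary. `preview` below is a hypothetical helper, and the sample narrative is invented for illustration.

```python
import textwrap


def preview(text, width=500):
    """Return a short, word-aligned preview of a narrative (hypothetical helper)."""
    if not isinstance(text, str):
        # Missing narratives may appear as None or NaN rather than a string
        return ""
    return textwrap.shorten(text, width=width, placeholder="...")


# Invented narrative text, for illustration only
sample = "The device   failed to deliver therapy during a routine follow-up visit."

short = preview(sample, width=40)
print(short)
print(repr(preview(None)))
```

With the real data, this could be applied as `preview(narratives.iloc[0]['FOI_TEXT'])` in place of the `[:500]` slice.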

## 8. Exporting Results

Export your results to CSV for further analysis:

In [None]:
# Get some results
results = db.search_by_device_names('catheter', start_date='2023-01-01')

# Export to CSV
results.to_csv('catheter_events_2023.csv', index=False)

print(f"Exported {len(results)} records to catheter_events_2023.csv")

## 9. Cleanup

Always close the database when done:

In [None]:
db.close()
print("Database closed.")
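
To make sure `close()` runs even when an earlier step raises, the connection can be wrapped in the standard library's `contextlib.closing`, which calls `close()` on exit for any object that has one. The sketch uses a stand-in class, `FakeDatabase`, so it runs without PyMAUDE installed; whether `MaudeDatabase` also supports the `with` statement directly is not covered here.

```python
from contextlib import closing


class FakeDatabase:
    """Stand-in for a database object with a close() method (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


db = FakeDatabase()
try:
    with closing(db):
        # Any exception raised inside the block still triggers db.close()
        raise RuntimeError("simulated failure mid-analysis")
except RuntimeError:
    pass

print(f"Closed after error: {db.closed}")
```

The same pattern should work with a real connection, for example `with closing(MaudeDatabase('notebooks.db')) as db:`, since it relies only on the `close()` method used above.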

## Next Steps

Now that you understand the basics, explore:

- **02_grouped_search.ipynb** - Compare multiple device categories
- **03_exact_queries.ipynb** - Precise field-based queries
- **04_analysis_helpers.ipynb** - Statistical analysis and visualization
- **05_advanced_workflows.ipynb** - Complex real-world workflows

For more information, see:
- [API Reference](docs/api_reference.md)
- [Research Guide](docs/research_guide.md)