# PyMAUDE Getting Started Guide

This notebook demonstrates the basics of using PyMAUDE to access FDA MAUDE adverse event data.

## What You'll Learn

1. Creating and setting up a database
2. Downloading data from FDA
3. Searching by device name and by exact field values
4. Filtering by date and working with results
5. Viewing event narratives
6. Exporting results and closing the database

## 1. Database Setup

First, import PyMAUDE and create a database connection:

In [1]:
from pymaude import MaudeDatabase
import pandas as pd

# Option 1: create a new database file in the current directory
# db = MaudeDatabase('notebooks.db', verbose=True)

# Option 2 (used here): open an existing database that already holds 2008-2025 data
db = MaudeDatabase('../analysis/venous_thrombectomy/maude_2008_2025.db', verbose=True)


print("Database created successfully!")

Database created successfully!


## 2. Downloading Data

Download data for specific years. The first run is slow (roughly 20-30 minutes) because PyMAUDE has to convert the FDA data files to its own database format. Later runs skip years that are already loaded. We'll start with just 2023:

In [2]:
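# Download device and master data for 2023
# The first run can take roughly 20-30 minutes while the FDA files are converted
# Commented out here because the database opened above is already populated
# db.add_years(2023, tables=['device', 'master'], download=True)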

print("\nData downloaded and loaded!")
print("\nDatabase info:")
db.info()


Data downloaded and loaded!

Database info:

Database: ../analysis/venous_thrombectomy/maude_2008_2025.db
_maude_load_metadata 72 records
device          23,286,013 records
master          22,649,853 records
patient         23,619,555 records
text            53,246,898 records

Date range: 2008-01-01 00:00:00 to 2025-12-31 00:00:00
Database size: 52.47 GB


## 3. Boolean Name Search

The `search_by_device_names()` method provides flexible partial matching with boolean logic. It searches several device name fields rather than a single column, which helps catch devices that are labeled inconsistently across reports.

In [3]:
# Create search index for faster searching (one-time setup)
db.create_search_index()

# Simple search - find all pacemaker events
pacemakers = db.search_by_device_names('pacemaker')

print(f"Found {len(pacemakers)} pacemaker reports")
print(f"\nFirst few results:")
print(pacemakers[['BRAND_NAME', 'GENERIC_NAME', 'DATE_RECEIVED']].head())

Search index already exists and is up to date
Found 640629 pacemaker reports

First few results:
  BRAND_NAME                       GENERIC_NAME        DATE_RECEIVED
0  LEGEND II                          PACEMAKER  2008-01-11 00:00:00
1   SYMPHONY  DXY-IMPLANTABLE CARDIAC PACEMAKER  2008-01-06 00:00:00
2   M SERIES            DEFIBRILLATOR/PACEMAKER  2008-01-02 00:00:00
3   M SERIES            DEFIBRILLATOR/PACEMAKER  2008-01-02 00:00:00
4   M SERIES            DEFIBRILLATOR/PACEMAKER  2008-01-02 00:00:00


### OR Logic - Multiple Terms

In [4]:
# Find devices matching ANY of these terms, i.e. devices matching "pacemaker" OR "defibrillator"
results = db.search_by_device_names(['pacemaker', 'defibrillator'])

print(f"Found {len(results)} events (pacemaker OR defibrillator)")
print(f"\nUnique generic names:")
print(results['GENERIC_NAME'].value_counts().head())

Found 1332062 events (pacemaker OR defibrillator)

Unique generic names:
GENERIC_NAME
PERMANENT PACEMAKER ELECTRODE                       230788
IMPLANTABLE CARDIOVERTER DEFIBRILLATOR              156677
WEARABLE CARDIOVERTER DEFIBRILLATOR                 138297
ELECTRODE, PACEMAKER, PERMANENT                      90669
IMPLANTABLE CARDIOVERTER DEFIBRILLATOR (NON-CRT)     77811
Name: count, dtype: int64


### AND Logic - Multiple Required Terms

In [5]:
# Find devices matching ALL of these terms (use nested list), i.e. "cardiac" and "catheter" must be in the device name
results = db.search_by_device_names([['cardiac', 'catheter']])

print(f"Found {len(results)} events (cardiac AND catheter)")
print(f"\nTop brands:")
print(results['BRAND_NAME'].value_counts().head())

Found 37160 events (cardiac AND catheter)

Top brands:
BRAND_NAME
THERMOCOOL® SMART TOUCH® SF BI-DIRECTIONAL NAVIGATION CATHETER    3427
ARCTIC FRONT ADVANCE CARDIAC CRYOABLATION CATHETER                1672
FARAWAVE                                                          1594
ARCTIC FRONT ADVANCE PRO¿ CARDIAC CRYOABLATION CATHETER           1284
THERMOCOOL SMARTTOUCH SF                                          1254
Name: count, dtype: int64
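
The OR and AND conventions above can be illustrated without a database. The sketch below is only a model of the semantics described in this section (a flat list means OR, a nested list means AND, matching is a case-insensitive substring test). The helper `matches` is hypothetical and is not PyMAUDE's implementation, which runs against the database and checks several name fields.

```python
def matches(name, criteria):
    """Return True if `name` satisfies the OR/AND criteria.

    `criteria` may be a single string, a list of strings (OR), or a list
    containing nested lists (each nested list is an AND group).
    """
    if isinstance(criteria, str):
        criteria = [criteria]
    text = name.lower()
    for term in criteria:
        # A bare string is treated as an AND group with one term.
        group = [term] if isinstance(term, str) else term
        if all(t.lower() in text for t in group):
            return True  # any satisfied group is enough (OR)
    return False


# Names taken from the outputs shown in this notebook
names = [
    "ARCTIC FRONT ADVANCE CARDIAC CRYOABLATION CATHETER",
    "DEFIBRILLATOR/PACEMAKER",
    "PTA BALLOON DILATATION CATHETER",
]

or_hits = [n for n in names if matches(n, ["pacemaker", "defibrillator"])]
and_hits = [n for n in names if matches(n, [["cardiac", "catheter"]])]
print(or_hits)
print(and_hits)
```

Note that the third name contains "catheter" but not "cardiac", so the AND group rejects it.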


## 4. Exact-Match Queries

For precise matching on specific fields, use `query_device()`.

Note: MAUDE reports are inconsistent about which column holds the device name. We therefore recommend `search_by_device_names()` over `query_device()` in most cases, because it searches across multiple columns for matching device names.

In [6]:
# Query by exact generic name (case-insensitive) - not recommended in most settings!
stents = db.query_device(generic_name='Vascular Stent')

print(f"Found {len(stents)} vascular stent events")
print(f"\nBrands found:")
print(stents['BRAND_NAME'].value_counts())

Found 1909 vascular stent events

Brands found:
BRAND_NAME
E-LUMINEXX VASCULAR STENT                          412
LIFESTENT 5F VASCULAR STENT                        304
LIFESTENT VASCULAR STENT                           284
LIFESTENT XL VASCULAR STENT                        252
LIFESTENT SOLO VASCULAR STENT SYSTEM               224
                                                  ... 
PROTEGE EVER FLEX-SELF EXPANDING STENT               1
UNK BARE STENT                                       1
E-LUMINEXX VASCULAR STENT - MT-6F-ZVL 8/ 60/135      1
LIESTENT VASCULAR STENT                              1
LIIFESTENT 5F VASCULAR STENT                         1
Name: count, Length: 65, dtype: int64


In [7]:
# Query by product code
by_code = db.query_device(product_code='DQY')

print(f"Found {len(by_code)} events with product code DQY")
print(f"\nGeneric names:")
print(by_code['GENERIC_NAME'].value_counts())

Found 38567 events with product code DQY

Generic names:
GENERIC_NAME
CATHETER, PERCUTANEOUS                                    18014
CATHETER PERCUTANEOUS                                      3413
DQY                                                        2794
PTA BALLOON DILATATION CATHETER                            2154
DQY CATHETER, PERCUTANEOUS                                 1735
                                                          ...  
CROSSBOSS/STINGRAY/STINGRAY GUIDEWIRE                         1
CATHETER, PERCUTANEOUS, ANTERIOR SEPTAL DEFECT CLOSURE        1
CORONARY GUIDE CATHETER                                       1
HEMOSTAT                                                      1
SEE H11                                                       1
Name: count, Length: 666, dtype: int64
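
The output above shows why exact matching on `GENERIC_NAME` is fragile: "CATHETER, PERCUTANEOUS" and "CATHETER PERCUTANEOUS" are counted as different names. A minimal sketch of one way to collapse such variants before counting is shown below. The `normalize` helper is hypothetical, the list is toy data (the lowercase entry is invented for illustration), and this normalization is an assumption about what is useful, not something PyMAUDE does for you.

```python
import re
from collections import Counter


def normalize(name):
    """Uppercase, replace punctuation runs with a space, trim the ends."""
    cleaned = re.sub(r"[^A-Z0-9]+", " ", name.upper())
    return cleaned.strip()


generic_names = [
    "CATHETER, PERCUTANEOUS",
    "CATHETER PERCUTANEOUS",
    "catheter,percutaneous",  # hypothetical variant for illustration
    "PTA BALLOON DILATATION CATHETER",
]

exact = Counter(generic_names)
normalized = Counter(normalize(n) for n in generic_names)

print(exact["CATHETER, PERCUTANEOUS"])
print(normalized["CATHETER PERCUTANEOUS"])
```

This only merges spelling and punctuation variants. Entries such as "DQY" or "SEE H11" carry no usable name at all, which is another reason to prefer the multi-field search.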


## 5. Date Filtering

Both search methods support date range filtering:

In [8]:
# Boolean search with date filter
recent_catheters = db.search_by_device_names(
    'catheter',
    start_date='2023-06-01',
    end_date='2023-12-31'
)

print(f"Catheter events from June-December 2023: {len(recent_catheters)}")

Catheter events from June-December 2023: 30280
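
If you already have a result DataFrame in memory, the same kind of window can be applied with pandas instead of re-running the search. This is a sketch on toy data; it assumes `DATE_RECEIVED` arrives as strings in the `YYYY-MM-DD HH:MM:SS` form printed earlier, and it treats both bounds as inclusive, which may or may not match how `start_date` and `end_date` behave in PyMAUDE.

```python
import pandas as pd

# Toy stand-in for a search result (brand names are placeholders)
results = pd.DataFrame({
    "BRAND_NAME": ["A", "B", "C"],
    "DATE_RECEIVED": [
        "2023-05-31 00:00:00",
        "2023-06-01 00:00:00",
        "2023-12-31 00:00:00",
    ],
})

# Parse the strings once, then build an inclusive date mask
received = pd.to_datetime(results["DATE_RECEIVED"])
mask = received.between(pd.Timestamp("2023-06-01"), pd.Timestamp("2023-12-31"))
in_range = results[mask]

print(in_range["BRAND_NAME"].tolist())
```

Passing the dates to the search itself is still preferable for large queries, since it avoids loading rows you will discard.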


## 6. Working with Results

Results are pandas DataFrames, so any pandas operation works on them:

In [9]:
# Get search results
results = db.search_by_device_names('insulin pump')

# Basic DataFrame operations
print(f"Total events: {len(results)}")
print(f"Columns: {list(results.columns[:10])}...")  # First 10 columns
print(f"\nEvent types:")
print(results['EVENT_TYPE'].value_counts())

Total events: 1850096
Columns: ['MDR_REPORT_KEY', 'EVENT_KEY', 'DATE_RECEIVED', 'EVENT_TYPE', 'BRAND_NAME', 'GENERIC_NAME', 'MANUFACTURER_D_NAME', 'DEVICE_REPORT_PRODUCT_CODE', 'device_rowid']...

Event types:
EVENT_TYPE
M     1603692
IN     243548
D        2118
O         574
Name: count, dtype: int64


In [10]:
# Filter to serious events (D=Death, IN=Injury)
serious = results[results['EVENT_TYPE'].str.contains(r'\bD\b|\bIN\b', case=False, na=False, regex=True)]

print(f"Serious events (Death or Injury): {len(serious)}/{len(results)}")
print(f"Percentage: {100 * len(serious) / len(results):.1f}%")

Serious events (Death or Injury): 245666/1850096
Percentage: 13.3%
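
When `EVENT_TYPE` holds a single code per row, as in the counts above, an exact membership test is a simpler alternative to a regular expression. The sketch below uses toy data and assumes codes may differ in case or carry stray whitespace; it would not catch a cell that combines several codes, which the regex version can.

```python
import pandas as pd

# Toy stand-in for a search result, including a missing and a lowercase value
results = pd.DataFrame({"EVENT_TYPE": ["M", "IN", "D", "O", None, " in "]})

# Normalize the codes, then keep rows whose code is exactly D or IN
codes = results["EVENT_TYPE"].str.strip().str.upper()
serious = results[codes.isin(["D", "IN"])]

print(f"Serious events: {len(serious)}/{len(results)}")
```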


In [11]:
# Group by manufacturer
by_manufacturer = results.groupby('MANUFACTURER_D_NAME').size().sort_values(ascending=False)

print("Top 5 manufacturers by event count:")
print(by_manufacturer.head())

Top 5 manufacturers by event count:
MANUFACTURER_D_NAME
TANDEM DIABETES CARE                    1146068
MEDTRONIC PUERTO RICO OPERATIONS CO.     592108
INSULET CORPORATION                       44724
MEDTRONIC MINIMED                         40895
TANDEM DIABETES CARE INC.                 10551
dtype: int64
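
The counts above list "TANDEM DIABETES CARE" and "TANDEM DIABETES CARE INC." as separate manufacturers. One possible cleanup, sketched here on toy data, strips common corporate suffixes before grouping. The suffix list and the `manufacturer_clean` column are assumptions made for this example (the fourth row is a hypothetical variant), and the approach will not merge names that differ by more than a suffix, such as the two Medtronic entities.

```python
import pandas as pd

results = pd.DataFrame({
    "MANUFACTURER_D_NAME": [
        "TANDEM DIABETES CARE",
        "TANDEM DIABETES CARE INC.",
        "INSULET CORPORATION",
        "TANDEM DIABETES CARE, INC",  # hypothetical variant
    ]
})

# Trailing corporate suffix, optionally preceded by a comma and followed by a period
suffix = r"[,\s]+(?:INC|CORP|CORPORATION|CO|LLC|LTD)\.?$"

results["manufacturer_clean"] = (
    results["MANUFACTURER_D_NAME"]
    .str.upper()
    .str.strip()
    .str.replace(suffix, "", regex=True)
)

by_manufacturer = results["manufacturer_clean"].value_counts()
print(by_manufacturer)
```

Review the cleaned names before relying on them, since aggressive suffix stripping can merge companies that are genuinely distinct.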


## 7. Getting Event Narratives

To read the actual event descriptions, download text data and use `get_narratives()`:

In [12]:
# Download text data (if not already downloaded)
db.add_years(2023, tables=['text'], download=True)

# Get device events
devices = db.search_by_device_names('defibrillator')

# Get narratives for first 5 events
sample_keys = devices['MDR_REPORT_KEY'].head(5).tolist()
narratives = db.get_narratives(sample_keys)

# Display first narrative
if len(narratives) > 0:
    print(f"Report {narratives.iloc[0]['MDR_REPORT_KEY']}:")
    print(narratives.iloc[0]['FOI_TEXT'][:500])  # First 500 characters
    print("...")


Grouping years by file for optimization...

Downloading files...
  Downloading foitext2023.zip...

Processing data files...

text for year 2023 already loaded and unchanged, skipping

Creating indexes...
debug: tables = []
debug: existing_tables = {'patient', '_maude_load_metadata', 'device', 'master', 'text'}
debug: there is NOT a master table
debug: there is NOT a device table

Database update complete
Report 863558:
FOUND DURING TESTING. ACCORDING TO THE REPORTER, THE DEVICE WOULD NOT DELIVER ENERGY WITH A THERAPY CABLE. THERE WAS NO PT USE ASSOCIATED WITH REPORTED EVENT.
...
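
To analyze narratives alongside device fields, the two DataFrames can be joined on `MDR_REPORT_KEY`. The sketch below uses toy data and assumes a report may have more than one text row, so it concatenates them first; the keys and the follow-up text are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for search results and narratives
devices = pd.DataFrame({
    "MDR_REPORT_KEY": [101, 102, 103],
    "BRAND_NAME": ["M SERIES", "M SERIES", "LEGEND II"],
})
narratives = pd.DataFrame({
    "MDR_REPORT_KEY": [101, 101, 103],
    "FOI_TEXT": [
        "FOUND DURING TESTING.",
        "FOLLOW-UP: NO PT INVOLVEMENT.",
        "DEVICE RETURNED.",
    ],
})

# Combine multiple text rows per report into a single string
combined = (
    narratives.groupby("MDR_REPORT_KEY")["FOI_TEXT"]
    .agg(" ".join)
    .reset_index()
)

# Left join keeps every device row, even those without a narrative
merged = devices.merge(combined, on="MDR_REPORT_KEY", how="left")
print(merged)
```

The key columns must share a dtype for the join to match, so cast one side if the keys come back as strings in one frame and integers in the other.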


## 8. Exporting Results

Export your results to CSV for further analysis:

In [13]:
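# Get catheter events received on or after 2023-01-01
# (no end_date is given, so events from later years in the database are also included)
results = db.search_by_device_names('catheter', start_date='2023-01-01')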

# Export to CSV
results.to_csv('catheter_events_2023.csv', index=False)

print(f"Exported {len(results)} records to catheter_events_2023.csv")

Exported 180373 records to catheter_events_2023.csv


## 9. Cleanup

Always close the database when done:

In [14]:
db.close()
print("Database closed.")

Database closed.
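
One way to make sure `close()` runs even when an earlier step raises is `contextlib.closing` from the standard library, which works with any object that has a `close()` method. The sketch below uses a hypothetical `FakeDatabase` stand-in so it runs without PyMAUDE; whether `MaudeDatabase` also supports the `with` statement directly is not shown in this notebook.

```python
from contextlib import closing


class FakeDatabase:
    """Hypothetical stand-in for any object that exposes close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


fake = FakeDatabase()
try:
    with closing(fake) as db:
        # Simulate an error partway through an analysis
        raise RuntimeError("simulated failure mid-analysis")
except RuntimeError:
    pass

# close() was still called when the with-block exited
print(fake.closed)
```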


## Next Steps

Now that you understand the basics, explore:

- **02_grouped_search.ipynb** - Compare multiple device categories
- **03_exact_queries.ipynb** - Precise field-based queries
- **04_analysis_helpers.ipynb** - Statistical analysis and visualization
- **05_advanced_workflows.ipynb** - Complex real-world workflows

For more information, see:
- [API Reference](docs/api_reference.md)
- [Research Guide](docs/research_guide.md)