# Reproducible Research Workflow

**Purpose**: Best practices for making MAUDE analyses reproducible for publication and sharing  
**Target Audience**: Researchers preparing manuscripts

## What You'll Learn

- Document data sources and versions
- Use checksum tracking for integrity
- Create reproducible analysis pipelines
- Export for publication
- Share databases with collaborators

## Why Reproducibility Matters

Reproducible research supports:
- Peer verification of results
- Replication studies by other groups
- Results that can still be checked long after publication
- Collaborative science

In [1]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent / 'src'))

from pymaude import MaudeDatabase
import pandas as pd
from datetime import datetime

## 1. Document Your Setup

Always record:
- Date of analysis
- Library version (maude_db, imported here as `pymaude`)
- Years of data
- Tables used

In [2]:
# Document analysis metadata
analysis_metadata = {
    'analysis_date': datetime.now().strftime('%Y-%m-%d'),
    'analyst': 'Your Name',
    'project': 'Device Safety Study',
    'data_years': '2020-2023',
    'tables': ['device', 'master'],
    'research_question': 'Analyze pacemaker adverse events'
}

print("Analysis Metadata:")
for key, value in analysis_metadata.items():
    print(f"  {key}: {value}")

Analysis Metadata:
  analysis_date: 2026-01-10
  analyst: Your Name
  project: Device Safety Study
  data_years: 2020-2023
  tables: ['device', 'master']
  research_question: Analyze pacemaker adverse events
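
The metadata dictionary above does not capture software versions, although the library version is on the "always record" list. A minimal sketch of one way to collect them with the standard library follows. The helper names `installed_version` and `environment_record` are hypothetical, and `"pymaude"` is only assumed to be the installed distribution name; adjust it if your installation differs.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or a placeholder string."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_record(dist_names):
    """Collect interpreter, platform, and package versions in one dictionary."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {name: installed_version(name) for name in dist_names},
    }


# Assumption: "pymaude" is the distribution name; it may differ on your system.
env = environment_record(["pymaude", "pandas"])
print(f"python: {env['python']}")
for name, version in env["packages"].items():
    print(f"  {name}: {version}")
```

The resulting dictionary could be merged into `analysis_metadata` so that the versions travel with the rest of the record.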


## 2. Create Database with Verification

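The library automatically tracks checksums of the source files to verify data integrity. This also lets it detect data that has not changed: in the output below, years that are already loaded and unchanged are skipped.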

In [3]:
db = MaudeDatabase('getting_started.db', verbose=True)

# Download data (checksums tracked automatically)
db.add_years(
    years='2020-2023',
    tables=['device', 'master'],
    download=True,
    data_dir='./maude_data'
)

print("\n✓ Data loaded with checksum tracking")


Grouping years by file for optimization...

Downloading files...
  Using cached device2020.zip
  Using cached device2021.zip
  Using cached device2022.zip
  Using cached device2023.zip
  Using cached mdrfoithru2025.zip

Processing data files...

device for year 2020 already loaded and unchanged, skipping

device for year 2021 already loaded and unchanged, skipping

device for year 2022 already loaded and unchanged, skipping

device for year 2023 already loaded and unchanged, skipping

master for years 2020-2023 already loaded and unchanged, skipping

Creating indexes...

Database update complete

✓ Data loaded with checksum tracking
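
If you also want your own record of the downloaded files, independent of the library's tracking, SHA-256 digests can be computed directly with `hashlib`. This is a sketch only and is not the library's internal mechanism. The helper names `sha256_of_file` and `build_manifest` are hypothetical, and the `*.zip` pattern is an assumption based on the file names in the output above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir, pattern="*.zip"):
    """Map each matching file name in data_dir to its SHA-256 hex digest."""
    return {
        path.name: sha256_of_file(path)
        for path in sorted(Path(data_dir).glob(pattern))
    }
```

Calling `build_manifest('./maude_data')` after the download step would give a file-name-to-digest mapping that can be saved next to the analysis outputs.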


## 3. Parameterized Analysis Pipeline

Keep all analysis parameters in one configuration dictionary so that a replication only needs to change values in one place.

In [4]:
# Analysis configuration
CONFIG = {
    'device_name': 'pacemaker',
    'start_date': '2020-01-01',
    'end_date': '2023-12-31',
    'min_events': 10  # Threshold for excluding low-count manufacturers (not applied in this notebook)
}

print("Analysis Configuration:")
for key, value in CONFIG.items():
    print(f"  {key}: {value}")

Analysis Configuration:
  device_name: pacemaker
  start_date: 2020-01-01
  end_date: 2023-12-31
  min_events: 10
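
Because every parameter lives in one dictionary, the configuration can be reduced to a short fingerprint that identifies which settings produced a given output. The sketch below is one possible approach, not part of the library; the helper name `config_fingerprint` and the 12-character length are arbitrary choices.

```python
import hashlib
import json


def config_fingerprint(config, length=12):
    """Return a short, stable hash of a JSON-serializable configuration."""
    # Sorting keys makes the result independent of dictionary ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


CONFIG = {
    "device_name": "pacemaker",
    "start_date": "2020-01-01",
    "end_date": "2023-12-31",
    "min_events": 10,
}

print(f"Config fingerprint: {config_fingerprint(CONFIG)}")
```

Two runs with identical settings give the same fingerprint, while any changed value gives a different one, which makes it easy to tell whether two sets of exported files are comparable.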


In [5]:
# Run analysis with config
results = db.query_device(
    device_name=CONFIG['device_name'],
    start_date=CONFIG['start_date'],
    end_date=CONFIG['end_date']
)

print(f"\nQuery results: {len(results):,} events")


Query results: 234,030 events


## 4. Export for Publication

Export the full results and a summary table for supplementary materials.

In [6]:
# Export results
output_prefix = f"{CONFIG['device_name']}_{analysis_metadata['analysis_date']}"

# Export full results
results.to_csv(f'{output_prefix}_full_data.csv', index=False)
print(f"✓ Exported: {output_prefix}_full_data.csv")

# Export summary statistics
summary = {
    'total_events': len(results),
    'date_range': f"{CONFIG['start_date']} to {CONFIG['end_date']}",
    'unique_reports': results['MDR_REPORT_KEY'].nunique()
}

summary_df = pd.DataFrame([summary])
summary_df.to_csv(f'{output_prefix}_summary.csv', index=False)
print(f"✓ Exported: {output_prefix}_summary.csv")

✓ Exported: pacemaker_2026-01-10_full_data.csv
✓ Exported: pacemaker_2026-01-10_summary.csv
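
The two CSV files above hold the data and the summary but not the metadata or configuration that produced them. A small JSON sidecar file is one way to keep that information with the exports. This is a sketch under assumptions: the helper name `write_sidecar` and the `_provenance.json` suffix are hypothetical.

```python
import json
from pathlib import Path


def write_sidecar(output_prefix, metadata, config, summary, directory="."):
    """Write metadata, config, and summary to one JSON file and return its path."""
    record = {"metadata": metadata, "config": config, "summary": summary}
    path = Path(directory) / f"{output_prefix}_provenance.json"
    # default=str converts values json cannot serialize (e.g. NumPy integers) to strings.
    text = json.dumps(record, indent=2, sort_keys=True, default=str)
    path.write_text(text, encoding="utf-8")
    return path
```

In this notebook it could be called as `write_sidecar(output_prefix, analysis_metadata, CONFIG, summary)` right after the CSV exports.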


## 5. Methods Section Template

Use this template for your manuscript methods section:

In [7]:
methods_text = f"""
DATA SOURCE
-----------
We analyzed data from the FDA MAUDE (Manufacturer and User Facility Device 
Experience) database for the period {CONFIG['start_date']} to {CONFIG['end_date']}.
Data were downloaded on {analysis_metadata['analysis_date']} using the maude_db 
Python library (version X.X.X).

INCLUSION CRITERIA
------------------
We included adverse event reports for devices with generic names containing 
'{CONFIG['device_name']}'. Reports were filtered to the date range specified above.

DATA INTEGRITY
--------------
Data integrity was verified using SHA-256 checksums of source files. The analysis 
database and code are available upon request for reproducibility.

ANALYSIS
--------
We analyzed {len(results):,} adverse event reports. [Describe your specific analyses here]
"""

print(methods_text)

# Save methods text
with open(f'{output_prefix}_methods.txt', 'w') as f:
    f.write(methods_text)
print(f"\n✓ Saved methods template to: {output_prefix}_methods.txt")


DATA SOURCE
-----------
We analyzed data from the FDA MAUDE (Manufacturer and User Facility Device 
Experience) database for the period 2020-01-01 to 2023-12-31.
Data were downloaded on 2026-01-10 using the maude_db 
Python library (version X.X.X).

INCLUSION CRITERIA
------------------
We included adverse event reports for devices with generic names containing 
'pacemaker'. Reports were filtered to the date range specified above.

DATA INTEGRITY
--------------
Data integrity was verified using SHA-256 checksums of source files. The analysis 
database and code are available upon request for reproducibility.

ANALYSIS
--------
We analyzed 234,030 adverse event reports. [Describe your specific analyses here]


✓ Saved methods template to: pacemaker_2026-01-10_methods.txt


## 6. Database Archiving

For long-term storage, consider:
- Uploading the database to Zenodo to obtain a DOI
- Including checksums in the README
- Documenting exact FDA file versions used
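
Checksums published in a README are only useful if a collaborator can check files against them. The sketch below assumes the README lists one `HEX  filename` pair per line (the layout printed by the `sha256sum` tool); the helper names are hypothetical and nothing here depends on the library.

```python
import hashlib
from pathlib import Path


def parse_checksum_lines(text):
    """Parse 'HEX  filename' lines into a dict of file name -> digest."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading '*' on the file name
        expected[name.lstrip("*")] = digest.lower()
    return expected


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large databases never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(expected, directory):
    """Return file name -> 'ok', 'missing', or 'mismatch' for each expected file."""
    status = {}
    for name, digest in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
        elif sha256_of_file(path) == digest:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status
```

A collaborator would paste the checksum lines from the README into `parse_checksum_lines` and pass the result to `verify_files` along with the folder holding the downloaded archive.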

In [8]:
db.close()
print("\n✓ Reproducible workflow complete!")
print("\nNext steps:")
print("  1. Upload database + code to Zenodo")
print("  2. Include data availability statement in paper")
print("  3. Share analysis notebooks as supplementary material")


✓ Reproducible workflow complete!

Next steps:
  1. Upload database + code to Zenodo
  2. Include data availability statement in paper
  3. Share analysis notebooks as supplementary material
