<a href="https://colab.research.google.com/github/managedkaos/nicoles-research-data/blob/main/Nicoles_Research_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Retrieve and list all the [MAUDE zip files on fda.gov](https://www.fda.gov/medical-devices/mandatory-reporting-requirements-manufacturers-importers-and-device-user-facilities/about-manufacturer-and-user-facility-device-experience-maude).

In [1]:
import pandas as pd

# Read every HTML table on the fda.gov web page
tables = pd.read_html('https://www.fda.gov/medical-devices/mandatory-reporting-requirements-manufacturers-importers-and-device-user-facilities/about-manufacturer-and-user-facility-device-experience-maude')

# The read should return one table; use that as the dataframe
df = tables[0]

# Drop the first row which is only used for formatting on the web page
df.drop(index=df.index[0],
        axis=0,
        inplace=True)

# Rename the columns of the table to include 'Description' and remove tabs
df.columns = [
    'File Name',
    'Compressed Size in Bytes',
    'Uncompressed Size in Bytes',
    'Total Records',
    'Description'
]

# Convert total records to integer
df = df.astype({'Total Records':'int'})

# Print the table as markdown
# print(df.to_markdown())

In [2]:
df

Unnamed: 0,File Name,Compressed Size in Bytes,Uncompressed Size in Bytes,Total Records,Description
1,mdrfoi.zip,6167KB,87864KB,263604,MAUDE Base records received to date for 2022
2,mdrfoithru2021.zip,460013KB,4253175KB,12830703,Master Record through 2021
3,mdrfoiadd.zip,6276KB,90017KB,269188,New MAUDE Base records for the current month.
4,mdrfoichange.zip,11457KB,137162KB,421553,MAUDE Base data updates: changes to existing B...
5,patient.zip,669KB,7249KB,269189,MAUDE Patient records received to date for 2022
...,...,...,...,...,...
66,foitext2020.zip,193121KB,1134242KB,3039449,Narrative Data for 2020
67,foitext2021.zip,211070KB,1255788KB,3625862,Narrative Data for 2021
68,foitext.zip,18407KB,124772KB,441898,Narrative Data received to date for 2022
69,foitextadd.zip,8583KB,56463KB,200966,New MAUDE Narrative data for the current month.
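
The two size columns in the table above hold strings such as `6167KB`, so they cannot be summed or sorted numerically as they stand. The following is a minimal sketch of one way to convert them, assuming every value is a whole number followed by `KB`. The helper name `parse_size_kb`, the `sample` frame, and the `... in KB` column names are illustrative and not part of the original notebook.

```python
import re

import pandas as pd


def parse_size_kb(value):
    """Return the integer number of kilobytes in a string such as '6167KB'."""
    match = re.fullmatch(r"\s*([\d,]+)\s*KB\s*", str(value), flags=re.IGNORECASE)
    if match is None:
        # Fail loudly rather than silently guessing at an unexpected format.
        raise ValueError(f"Unrecognized size value: {value!r}")
    return int(match.group(1).replace(",", ""))


# Small stand-in for the scraped table, built from two rows of the output above.
sample = pd.DataFrame({
    "File Name": ["mdrfoi.zip", "patient.zip"],
    "Compressed Size in Bytes": ["6167KB", "669KB"],
    "Uncompressed Size in Bytes": ["87864KB", "7249KB"],
})

# Add numeric columns next to the original string columns.
for column in ["Compressed Size in Bytes", "Uncompressed Size in Bytes"]:
    sample[column.replace("in Bytes", "in KB")] = sample[column].map(parse_size_kb)
```

The same loop could be run on `df` in place of `sample`. Keeping the original string columns makes it easy to compare the parsed numbers against what the page reported.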


Download all of the MAUDE files to local storage

In [3]:
import urllib.request

# Download each file listed in the table
for file_name in df["File Name"]:
    full_path = f"/home/{file_name}"
    url = f"https://www.accessdata.fda.gov/MAUDE/ftparea/{file_name}"
    # print(f"Downloading {url}")
    urllib.request.urlretrieve(url, full_path)
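
The loop above downloads every archive again on each run and stops at the first failed request. The following is a hedged sketch of a variant that skips files already on disk and collects failures instead of raising. The helper names `download_file` and `download_all`, the `.part` temporary suffix, and the 60-second timeout are assumptions made for illustration; only the base URL comes from the cell above.

```python
import shutil
import urllib.request
from pathlib import Path

# Same base URL as the download cell above.
BASE_URL = "https://www.accessdata.fda.gov/MAUDE/ftparea"


def download_file(url, dest_dir, overwrite=False, timeout=60):
    """Download one URL into dest_dir and return the local path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if target.exists() and not overwrite:
        return target  # already downloaded on an earlier run
    # Write to a temporary name first so an interrupted transfer
    # never leaves a truncated file under the final name.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target


def download_all(file_names, dest_dir, base_url=BASE_URL):
    """Download every named file; return a list of (name, error message) failures."""
    failed = []
    for name in file_names:
        try:
            download_file(f"{base_url}/{name}", dest_dir)
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            failed.append((name, str(error)))
    return failed
```

With the table from the first cell, a call might look like `failed = download_all(df["File Name"], "/home")`, after which `failed` lists any archives worth retrying.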
 

List the files in local storage

In [4]:
! ls /home

device2000.zip	device2018.zip		foitext2002.zip  foitext2020.zip
device2001.zip	device2019.zip		foitext2003.zip  foitext2021.zip
device2002.zip	device2020.zip		foitext2004.zip  foitextadd.zip
device2003.zip	device2021.zip		foitext2005.zip  foitextchange.zip
device2004.zip	deviceadd.zip		foitext2006.zip  foitextthru1995.zip
device2005.zip	devicechange.zip	foitext2007.zip  foitext.zip
device2006.zip	deviceproblemcodes.zip	foitext2008.zip  mdrfoiadd.zip
device2007.zip	device.zip		foitext2009.zip  mdrfoichange.zip
device2008.zip	foidev1998.zip		foitext2010.zip  mdrfoithru2021.zip
device2009.zip	foidev1999.zip		foitext2011.zip  mdrfoi.zip
device2010.zip	foidevproblem.zip	foitext2012.zip  patientadd.zip
device2011.zip	foidevthru1997.zip	foitext2013.zip  patientchange.zip
device2012.zip	foitext1996.zip		foitext2014.zip  patientproblemcode.zip
device2013.zip	foitext1997.zip		foitext2015.zip  patientproblemdata.zip
device2014.zip	foitext1998.zip		foitext2016.zip  patientthru2021.zip
device2015.z