# Process 2015 Data

Goals:
- Use the 2015 MAUDE data to create a sample dataset for experimentation

Steps:
1. Identify the common data files (those not specific to a single year)

|File                    |Description|Required|
|------------------------|-----------|--------|
|`mdrfoithru2021.zip`    |Master Record through 2021|X|
|`patientthru2021.zip`   |Patient Record through 2021|X|
|`foitextchange.zip`     |Narrative data updates: changes to existing narrative data and additional narrative data for existing base records|X|
|`patientproblemcode.zip`|Device Data for patientproblemcode||
|`patientproblemdata.zip`|Patient Problem Data||
|`patientchange.zip`     |MAUDE Patient data updates: changes to existing Base data||
|`mdrfoichange.zip`      |MAUDE Base data updates: changes to existing Base data||
|`devicechange.zip`      |Device data updates: changes to existing Device data and additional Device data for existing Base records||
|`deviceproblemcodes.zip`|Device Problem Data||
|`foidevproblem.zip`     |Device Data for foidevproblem||

2. Identify all 2015 data files

|File                    |Description|Required|
|------------------------|-----------|--------|
|`device2015.zip`        |Device Data for 2015|X|
|`foitext2015.zip`       |Narrative Data for 2015|X|

3. Create a database for each data type
4. Create a merged dataset by joining the data types on each Master Data Record ID found in the 2015 data
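
Steps 3 and 4 can be sketched with pandas as follows. This is only a minimal illustration, not the notebook's actual implementation. It assumes the extracted files are pipe-delimited text and that every table carries an `MDR_REPORT_KEY` column to join on. The tiny in-memory tables, the other column names, and the helpers `read_pipe_delimited` and `merge_on_report_key` are hypothetical and exist only for this example.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the extracted text files (assumed pipe-delimited).
MASTER_TXT = "MDR_REPORT_KEY|EVENT_TYPE\n1|M\n2|IN\n3|D\n"
DEVICE_2015_TXT = "MDR_REPORT_KEY|BRAND_NAME\n1|PUMP A\n3|STENT B\n"
TEXT_2015_TXT = "MDR_REPORT_KEY|FOI_TEXT\n1|Pump stopped.\n3|Stent migrated.\n"


def read_pipe_delimited(source):
    """Read a pipe-delimited table, keeping every column as a string."""
    return pd.read_csv(source, sep="|", dtype=str)


def merge_on_report_key(device, text, master, key="MDR_REPORT_KEY"):
    """Join the 2015 device and narrative rows to their master records.

    Left joins starting from the 2015 device table keep only report keys
    that appear in the 2015 data.
    """
    merged = device.merge(text, on=key, how="left")
    return merged.merge(master, on=key, how="left")


master = read_pipe_delimited(io.StringIO(MASTER_TXT))
device = read_pipe_delimited(io.StringIO(DEVICE_2015_TXT))
text = read_pipe_delimited(io.StringIO(TEXT_2015_TXT))

merged = merge_on_report_key(device, text, master)
print(merged)
```

Starting the joins from the 2015 device table means master records without 2015 device data (report key `2` in this toy data) are left out of the result.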

In [5]:
import os
from os.path import exists
from zipfile import ZipFile
import pandas as pd

# Identify the data directory, working directory, and data files
data_directory = './data'
working_directory = './2015'
data_files = ['mdrfoithru2021.zip', 'patientthru2021.zip', 'foitextchange.zip',
              'device2015.zip', 'foitext2015.zip']


# Create the working directory if needed
try:
    os.makedirs(working_directory, exist_ok=True)
except OSError as error:
    print(f"Error creating {working_directory}: {error}")

# Unzip the data files into the working directory
for filename in data_files:
    # Use a name other than `zip` so the built-in function is not shadowed
    with ZipFile(f"{data_directory}/{filename}", "r") as archive:
        archive.extractall(working_directory)
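
The extraction loop raises an error at the first archive that is not in the data directory. One possible safeguard, sketched below, is to list any missing archives before unzipping. The helper `find_missing` and the dummy archive contents are hypothetical and not part of the original notebook. The demonstration runs in a temporary directory so it does not touch `./data`.

```python
import os
import tempfile
from zipfile import ZipFile


def find_missing(directory, filenames):
    """Return the filenames that are not present in directory."""
    return [name for name in filenames
            if not os.path.exists(os.path.join(directory, name))]


# Demonstration with a temporary directory and one small dummy archive.
with tempfile.TemporaryDirectory() as demo_dir:
    with ZipFile(os.path.join(demo_dir, "device2015.zip"), "w") as archive:
        archive.writestr("DEVICE2015.txt", "MDR_REPORT_KEY|BRAND_NAME\n")
    missing = find_missing(demo_dir, ["device2015.zip", "foitext2015.zip"])

print(missing)
```

In the notebook, the same check could be run as `find_missing(data_directory, data_files)` before the extraction loop.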
