In [None]:
import os
import io
import zipfile

import xml.etree.ElementTree as ET

In [None]:
raw_path = "../data/raw"
interim_path = "../data/interim"
export_path = "apple_health_export"

archive_name = "export.zip"
data_name = "export.xml"

data = os.path.join(interim_path, export_path, data_name)

First, unzip the archive exported by the Health app.

In [None]:
# only unzip if the extracted export does not exist yet
if not os.path.exists(data):
    # ZipFile accepts a path directly, so the archive need not be read into memory first;
    # the with-statement also closes the archive once extraction is done
    with zipfile.ZipFile(os.path.join(raw_path, archive_name)) as zipped_export:
        zipped_export.extractall(interim_path)

### DOCTYPE of `export.xml`

**HealthKit Export Version:** 11

Checking the `DOCTYPE` definition in the file `export.xml` reveals the following data structure.

#### Data Structure

- HealthData
    - (1) ExportDate
    - (1) Me
    - (any number of the following)
        - Record
        - Correlation
        - Workout
        - ActivitySummary
        - ClinicalRecord
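
One way to check this structure against an actual export is to count the direct children of the root element by tag. The following is a minimal sketch, not part of the original notebook: `count_child_tags` is a hypothetical helper, and the inline sample is hand-written with made-up values so the snippet runs without the real file.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_child_tags(source):
    """Count the direct children of the root element by tag name."""
    root = ET.parse(source).getroot()
    return Counter(child.tag for child in root)


# Hand-written stand-in for export.xml (all values are made up).
sample = io.BytesIO(
    b'<HealthData locale="en_US">'
    b'<ExportDate value="2020-01-01 00:00:00 +0000"/>'
    b'<Me/>'
    b'<Record type="A"/><Record type="B"/>'
    b'<Workout/>'
    b'</HealthData>'
)

tag_counts = count_child_tags(sample)
print(tag_counts)
```

Passing the path stored in `data` instead of the sample should give the counts for the real export, assuming the file has already been extracted.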

#### Element Description

**ExportDate:**

Time of export.

**Me:**

- Date of Birth
- Sex
- Blood Type
- Fitzpatrick Skin Type

**Record*:**

- type
- unit
- value
- sourceName
- sourceVersion
- device
- creationDate
- startDate
- endDate

**Workout*:**

- workoutActivityType
- duration
- durationUnit
- totalDistance
- totalDistanceUnit
- totalEnergyBurned
- totalEnergyBurnedUnit
- sourceName
- sourceVersion
- device
- creationDate
- startDate
- endDate

**Correlation*:**

- type
- sourceName
- sourceVersion
- device
- creationDate
- startDate
- endDate

**ActivitySummary:**

- dateComponents
- activeEnergyBurned
- activeEnergyBurnedGoal
- activeEnergyBurnedUnit
- appleMoveMinutes
- appleMoveMinutesGoal
- appleExerciseTime
- appleExerciseTimeGoal
- appleStandHours
- appleStandHoursGoal

**ClinicalRecord:**

- type
- identifier
- sourceName
- sourceURL
- fhirVersion
- receivedDate
- resourceFilePath

\* May be of a more specific type that introduces additional fields.
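
Since the fields listed above are stored as XML attributes, each element can be read as a plain dictionary. The sketch below streams elements with `ET.iterparse` instead of loading the whole tree, which may help if the export is large. It is only an illustration: `iter_attributes` is a hypothetical helper, and the sample document is hand-written with made-up values.

```python
import io
import xml.etree.ElementTree as ET


def iter_attributes(source, tag):
    """Yield the attribute dict of every element named `tag`, one at a time."""
    for _, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield dict(element.attrib)  # copy before the element is cleared
            element.clear()  # drop the element's content once it has been read


# Hand-written stand-in for export.xml (all values are made up).
sample = io.BytesIO(
    b'<HealthData>'
    b'<Record type="HKQuantityTypeIdentifierStepCount" unit="count" value="12"/>'
    b'<Workout workoutActivityType="HKWorkoutActivityTypeRunning" duration="30"/>'
    b'<Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min" value="61"/>'
    b'</HealthData>'
)

records = list(iter_attributes(sample, "Record"))
for record in records:
    print(record)
```

The same generator could be called with `"Workout"` or `"ActivitySummary"` as the tag, since it only relies on the element name.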

### Check some Entries

Parse and analyze the `export.xml` file.

In [None]:
tree = ET.parse(data)
print(tree)
tree.getroot()[:10]

Attributes of the root element (`HealthData`):

In [None]:
root_element = tree.getroot()

for attribute in root_element.keys():
    print(f"{attribute + ':': <5} {root_element.get(attribute)}")

Attributes of the `ExportDate` element:

In [None]:
export_date_element = root_element[0]

for attribute in export_date_element.keys():
    print(f"{attribute + ':': <7}{export_date_element.get(attribute)}")

Attributes of the `Me` element:

In [None]:
me_element = root_element[1]

for attribute in me_element.keys():
    print(f"{attribute + ':': <51}{me_element.get(attribute)}")

Attributes of the `Record` element:

In [None]:
record_element = root_element[3]

for attribute in record_element.keys():
    print(f"{attribute + ':':<15} {record_element.get(attribute)}")
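
The cells above pick elements by position and hard-code the column width for each one. As an alternative sketch, elements could be looked up by tag with `find`, and the padding derived from the longest attribute name. `print_attributes` is a hypothetical helper, and the sample document is hand-written with made-up values.

```python
import xml.etree.ElementTree as ET


def print_attributes(element):
    """Print each attribute of `element`, padding names to the longest one."""
    width = max((len(name) for name in element.keys()), default=0) + 1
    lines = [f"{name + ':':<{width}} {element.get(name)}" for name in element.keys()]
    print("\n".join(lines))
    return lines


# Hand-written stand-in for export.xml (all values are made up).
root = ET.fromstring(
    '<HealthData><ExportDate value="2020-01-01"/><Me/>'
    '<Record type="StepCount" unit="count" value="12"/></HealthData>'
)

# find() returns the first direct child with the given tag, or None if there is none.
first_record = root.find("Record")
lines = print_attributes(first_record)
```

Looking up by tag avoids relying on `ExportDate` and `Me` always occupying the first two positions, at the cost of having to handle a `None` result when a tag is absent.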