
## Sleep Data

The main data for this project is my sleep data. I collected it over a period of 1 month by wearing an Apple Watch Ultra 2 to bed and tracking my sleep with Apple's native “Health” app. Here is the process I followed to retrieve this data:

**1) Retrieving the data from the app in the native XML format:**

The iOS Health app only allows exporting all of its data at once, as a single XML file. This includes everything the app tracks, such as steps, heart rate and various other health metrics. The challenge was first to filter this down to just the sleep data, and then to convert it into a format that is more familiar and easier to use in Python, such as a CSV file.

I downloaded the XML file by opening the settings of the Health app on my phone and tapping the “Export All Health Data” button. The app produced a folder containing an “export.xml” file, which we will use from here on.


**2) Converting all data to CSV:**

Now that we have the data in XML format, we will first convert it to CSV and save it.


In [None]:
import xml.etree.ElementTree as ET
import pandas as pd

# create element tree object
tree = ET.parse('raw_sleep_data.xml')

# for every health record, extract the attributes into a dictionary (columns). Then create a list (rows).
root = tree.getroot()
record_list = [x.attrib for x in root.iter('Record')]

# create DataFrame from a list (rows) of dictionaries (columns)
data = pd.DataFrame(record_list)

# proper type to dates
for col in ['creationDate', 'startDate', 'endDate']:
    data[col] = pd.to_datetime(data[col])

# save into CSV as this is a universally compatible data format
data.to_csv("converted_data/all_health_data_converted.csv", index=False)
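
A full Health export can be large, because it contains every record the app has ever stored. As an optional alternative to parsing the whole tree at once, the sketch below streams the file with `ET.iterparse` and keeps only the records of one type. The helper name `iter_records` and the small in-memory sample are my own illustrative assumptions, not part of the original workflow.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def iter_records(source, record_type=None):
    """Yield the attribute dict of each <Record>, optionally keeping one type only.

    Hypothetical helper for illustration; `source` is a file path or file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record":
            if record_type is None or elem.get("type") == record_type:
                yield dict(elem.attrib)  # copy before clearing the element
            elem.clear()  # drop the element's attributes and children once read


# Tiny stand-in for the real export; names and values are illustrative only.
sample_xml = b"""<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" sourceName="Example iPhone"
          startDate="2025-03-03 10:00:00 +0300" endDate="2025-03-03 10:05:00 +0300"
          value="120"/>
  <Record type="HKCategoryTypeIdentifierSleepAnalysis" sourceName="Example Watch"
          startDate="2025-03-03 00:23:41 +0300" endDate="2025-03-03 00:34:11 +0300"
          value="HKCategoryValueSleepAnalysisAsleepCore"/>
</HealthData>"""

sleep_only = pd.DataFrame(
    list(iter_records(io.BytesIO(sample_xml), "HKCategoryTypeIdentifierSleepAnalysis"))
)
print(sleep_only[["type", "sourceName", "value"]])
```

With the real file, the `io.BytesIO(...)` argument would be replaced by the path to the exported XML.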

**3) Extracting sleep data:**

Now that all the data is in a convenient format, we extract just the sleep-related rows. This is easy because they all share the same type, "HKCategoryTypeIdentifierSleepAnalysis". We also filter by source so that only the Apple Watch records are kept and not the iPhone ones, since the Apple Watch is what tracked the sleep.
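Source names in the export can contain characters that look like ordinary spaces but are not, such as the non-breaking space `\xa0`, and an exact string comparison then silently matches nothing. A minimal sketch of how one might inspect and normalize the names before filtering is shown here; the frame `data_example` and the device names in it are made up for illustration.

```python
import pandas as pd

# Made-up rows; "\xa0" is a non-breaking space that prints like a normal space.
data_example = pd.DataFrame({
    "type": ["HKCategoryTypeIdentifierSleepAnalysis"] * 3,
    "sourceName": ["Example\xa0Watch", "Example\xa0Watch", "Example iPhone"],
})

is_sleep = data_example["type"] == "HKCategoryTypeIdentifierSleepAnalysis"

# repr() makes hidden characters such as \xa0 visible.
for name in data_example.loc[is_sleep, "sourceName"].unique():
    print(repr(name))

# Optional: replace non-breaking spaces so that a plain-space string matches.
normalized = data_example["sourceName"].str.replace("\xa0", " ", regex=False)
watch_rows = data_example[is_sleep & (normalized == "Example Watch")]
print(len(watch_rows))
```

Matching on the raw name with `\xa0` written explicitly works just as well; normalizing is only a convenience if the name is typed by hand.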

In [17]:
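# keep only sleep records written by the Apple Watch
# (the watch's source name contains a non-breaking space, written here as \xa0)
sleep_data = data[
    (data['type'] == "HKCategoryTypeIdentifierSleepAnalysis") &
    (data['sourceName'] == "Imran’s Apple\xa0Watch")
]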

sleep_data.to_csv("converted_data/converted_and_filtered_sleep_data.csv", index=False)

sleep_data


| | type | sourceName | sourceVersion | unit | creationDate | startDate | endDate | value | device |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 285867 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.1 | | 2025-03-03 05:00:36+03:00 | 2025-03-03 00:23:41+03:00 | 2025-03-03 00:34:11+03:00 | HKCategoryValueSleepAnalysisAsleepCore | |
| 285868 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.1 | | 2025-03-03 05:00:36+03:00 | 2025-03-03 00:34:11+03:00 | 2025-03-03 00:35:41+03:00 | HKCategoryValueSleepAnalysisAwake | |
| 285869 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.1 | | 2025-03-03 05:00:36+03:00 | 2025-03-03 00:35:41+03:00 | 2025-03-03 00:45:41+03:00 | HKCategoryValueSleepAnalysisAsleepCore | |
| 285870 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.1 | | 2025-03-03 05:00:36+03:00 | 2025-03-03 00:45:41+03:00 | 2025-03-03 01:24:41+03:00 | HKCategoryValueSleepAnalysisAsleepDeep | |
| 285871 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.1 | | 2025-03-03 05:00:36+03:00 | 2025-03-03 01:24:41+03:00 | 2025-03-03 01:25:11+03:00 | HKCategoryValueSleepAnalysisAsleepCore | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 286590 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.3.1 | | 2025-04-22 05:00:41+03:00 | 2025-04-22 03:43:31+03:00 | 2025-04-22 04:21:31+03:00 | HKCategoryValueSleepAnalysisAsleepCore | |
| 286591 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.3.1 | | 2025-04-22 05:00:41+03:00 | 2025-04-22 04:21:31+03:00 | 2025-04-22 04:51:31+03:00 | HKCategoryValueSleepAnalysisAsleepREM | |
| 286592 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.3.1 | | 2025-04-22 05:00:41+03:00 | 2025-04-22 04:51:31+03:00 | 2025-04-22 04:58:01+03:00 | HKCategoryValueSleepAnalysisAsleepCore | |
| 286593 | HKCategoryTypeIdentifierSleepAnalysis | Imran’s Apple Watch | 11.3.1 | | 2025-04-22 05:00:41+03:00 | 2025-04-22 04:58:01+03:00 | 2025-04-22 04:58:31+03:00 | HKCategoryValueSleepAnalysisAwake | |


Here we have the data we are looking for. I saved it as "converted_and_filtered_sleep_data.csv". We will further refine it and extract what is needed in "2_data_processing", and then move on to the analysis.
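
To make the structure of this data concrete: each row is one interval spent in a single sleep stage, so the time in a stage is `endDate - startDate`. The sketch below applies this to the first three rows of the output above. It is only an illustration of how the rows can be read, not the processing done in "2_data_processing", and the column names `duration_min` and `stage` are my own.

```python
import pandas as pd

# First three rows of the filtered output, copied by hand for illustration.
example = pd.DataFrame({
    "startDate": pd.to_datetime([
        "2025-03-03 00:23:41+03:00",
        "2025-03-03 00:34:11+03:00",
        "2025-03-03 00:35:41+03:00",
    ]),
    "endDate": pd.to_datetime([
        "2025-03-03 00:34:11+03:00",
        "2025-03-03 00:35:41+03:00",
        "2025-03-03 00:45:41+03:00",
    ]),
    "value": [
        "HKCategoryValueSleepAnalysisAsleepCore",
        "HKCategoryValueSleepAnalysisAwake",
        "HKCategoryValueSleepAnalysisAsleepCore",
    ],
})

# Length of each interval in minutes.
example["duration_min"] = (
    (example["endDate"] - example["startDate"]).dt.total_seconds() / 60
)

# Shorten the stage label by dropping the common HealthKit prefix.
example["stage"] = example["value"].str.replace(
    "HKCategoryValueSleepAnalysis", "", regex=False
)

# Total minutes per stage across the example rows.
minutes_per_stage = example.groupby("stage")["duration_min"].sum()
print(minutes_per_stage)
```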


## Habit/Activity Tracking

The second data source is my record of daily habits.





**Manually tracked habits/activities**

These habits were recorded manually in a Google Sheet with the following structure:

<img src="images/spreadsheet_screenshot.png" alt="Habits Spreadsheet Structure" width="800">

<a href="https://docs.google.com/spreadsheets/d/16NDO3o1wig3mOyuFSbZaC7fZIsFrOZZ75Y2NS1fizow/edit?usp=sharing">Click to open the Google Sheet</a>

This Google Sheet is saved as *"converted_habits_data.csv"* in the converted_data folder. We will apply the appropriate transformations to this data in *"2_data_processing"* and then continue with the analysis.