# Problem Set 4: Wearables

The problems in this problem set pertain to the following paper:

Li, X., Dunn, J., Salins, D., Zhou, G., Zhou, W., Rose, S. M. S. F., ... & Sonecha, R. (2017). [Digital health: tracking physiomes and activity using wearable biosensors reveals useful health-related information.](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2001402) PLoS biology, 15(1), e2001402.


Please read this paper before starting this problem set, so you have context.

Interestingly, all of this data has been made publicly available. We will be working with a small subset in order to gain experience working with wearable data. Our goal will be to detect evidence of jetlag in Participant 1 as they travel. [Jetlag](https://en.wikipedia.org/wiki/Jet_lag) is a condition resulting from disrupted circadian rhythm while travelling.

## Part 1: About Time!

Wearable data typically takes the form of a time series. One major factor to consider is how to represent time. We can use a human-readable format (e.g., **Thursday, March 1, 2018 1:42:59 PM** or **3/1/18 13:42:59**) or a timestamp (e.g., **1519940579**). We also need to think about time zones: as Participant 1 travels, their time zone changes, which in turn changes their local time. We often represent the time zone by storing a GMT offset. Sometimes it makes sense to work with local time and other times we’ll want to work with GMT. [This](https://www.epochconverter.com) is a useful resource that explains and converts between different representations of time. If you scroll down, it also gives code examples in a variety of languages (including Python and R).
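The conversions described above can be sketched with Python's standard `datetime` module. This is only an illustration: it uses the example timestamp from the text and a fixed GMT-08:00 offset, and the variable names are made up for this sketch.

```python
from datetime import datetime, timedelta, timezone

ts = 1519940579  # example timestamp from the text above

# Interpret the timestamp in GMT/UTC.
gmt_time = datetime.fromtimestamp(ts, tz=timezone.utc)

# View the same instant at a fixed offset of GMT-08:00.
offset_minus_8 = timezone(timedelta(hours=-8))
local_time = gmt_time.astimezone(offset_minus_8)

print(gmt_time.isoformat())    # 2018-03-01T21:42:59+00:00
print(local_time.isoformat())  # 2018-03-01T13:42:59-08:00

# Both objects describe the same instant, so they map back to the same timestamp.
assert int(local_time.timestamp()) == ts
```

Storing the timestamp together with the GMT offset, as the data files here do, lets us recover either view without ambiguity.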

### 1.1

**What is GMT? For the time Thursday, March 1, 2018 1:42:59 PM GMT-08:00, what does GMT-08:00 mean?**

YOUR WRITTEN ANSWER HERE

GMT stands for "Greenwich Mean Time" and is the mean solar time at the Royal Observatory in Greenwich, London, reckoned from midnight. It is often treated as synonymous with Coordinated Universal Time (UTC), which is the primary time standard by which the world regulates clocks and time.

GMT-08:00 in this case means that the given local time is 8 hours (28,800 seconds) behind GMT, so 1:42:59 PM local time corresponds to 9:42:59 PM GMT.

### 1.2

**What is a timestamp, and what does it represent? What date and time is `1519940579`, in human readable format?**

YOUR WRITTEN ANSWER HERE
A timestamp (Unix time) represents a moment in time as a single number: the number of seconds elapsed since midnight GMT on January 1, 1970.


Thursday, March 1, 2018 1:42:59 PM GMT-08:00

In [3]:
# YOUR CODE HERE
import time

# Convert the timestamp to a GMT (UTC) time struct, then format it as text
time.asctime(time.gmtime(1519940579))

'Thu Mar  1 21:42:59 2018'

## Part 2: Time-Consuming Analysis

**Participant 1’s sleep data was collected using a Basis Watch and is stored in `sleep_to_03-31-16.csv`. Notice that this file includes local start and end times (`local_start_time`, `local_end_time`) as well as timestamps (`start_timestamp`, `end_timestamp`) and GMT start and end times (`start_time_iso`, `end_time_iso`).**

### 2.1

**Let’s figure out how many hours Participant 1 sleeps per day. Make a histogram of the total number of hours slept each day. Use GMT start time to determine what day sleep occurs on and `actual_minutes` to determine sleep duration. Include a line on the plot showing average sleep per day.**
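As a rough sketch of the aggregation step only (no plotting), sleep records can be grouped by the GMT date of `start_time_iso` and their `actual_minutes` summed. The rows below are made up for illustration, and the exact string format of `start_time_iso` in the file is an assumption; adjust the parsing to match the real data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up rows for illustration only; not taken from sleep_to_03-31-16.csv.
rows = [
    {"start_time_iso": "2014-03-23T06:10:00", "actual_minutes": 400},
    {"start_time_iso": "2014-03-23T21:30:00", "actual_minutes": 50},
    {"start_time_iso": "2014-03-24T05:45:00", "actual_minutes": 390},
]

# Sum the minutes of every sleep record that starts on the same GMT date.
minutes_by_day = defaultdict(int)
for row in rows:
    day = datetime.fromisoformat(row["start_time_iso"]).date()
    minutes_by_day[day] += row["actual_minutes"]

# Convert to hours and compute the average across days.
daily_hours = {day: minutes / 60 for day, minutes in sorted(minutes_by_day.items())}
average_hours = mean(list(daily_hours.values()))

print(daily_hours)
print(average_hours)  # 7.0
```

The values of `daily_hours` are what the histogram would be drawn from, and `average_hours` gives the position of the average line.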

In [None]:
# YOUR CODE HERE


### 2.2

**Report basic features of the distribution of daily sleep for Participant 1: mean, median, standard deviation, minimum, and maximum. Comment on whether these values seem reasonable. Do the data look reliable, or do you think the observations are noisy or error-prone?**

In [None]:
# YOUR CODE HERE

YOUR WRITTEN ANSWER HERE

## Part 3: Time Flies

Participant 1’s travel information is in `activities.csv`. This file contains information on a variety of activities, including (somewhat bizarrely) `table_tennis`. Note that start and end times are given in local time but a GMT offset is included.

We would like to extract a list of flights taken by Participant 1. However, as with a lot of wearable data, our labels are imperfect. Some flights are labeled `airplane` in the `Activity` column and others are labeled `transport`. However, `transport` is also used for car rides, train rides, etc. We will define a flight as an activity that is either (labeled `airplane`) OR (labeled `transport` AND has an average speed over 100 miles/hour). You can calculate speed from `Duration` (given in seconds) and `Distance` (given in miles).
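The flight definition above can be sketched as a small predicate. The rows are made up for illustration, the helper names `speed_mph` and `is_flight` are hypothetical, and treating a zero-duration record as having speed 0 is an assumption of this sketch rather than something the assignment specifies.

```python
# Made-up rows for illustration only; not taken from activities.csv.
activities = [
    {"Activity": "airplane", "Duration": 7200, "Distance": 900.0},
    {"Activity": "transport", "Duration": 3600, "Distance": 450.0},  # 450 mph
    {"Activity": "transport", "Duration": 1800, "Distance": 20.0},   # 40 mph
    {"Activity": "table_tennis", "Duration": 1200, "Distance": 0.0},
]

def speed_mph(row):
    """Average speed in miles/hour from Duration (seconds) and Distance (miles)."""
    hours = row["Duration"] / 3600
    # Assumption: a record with zero duration is given speed 0 instead of raising an error.
    return row["Distance"] / hours if hours > 0 else 0.0

def is_flight(row):
    """Labeled airplane, OR labeled transport with average speed over 100 mph."""
    if row["Activity"] == "airplane":
        return True
    return row["Activity"] == "transport" and speed_mph(row) > 100

flights = [row for row in activities if is_flight(row)]
print(len(flights))  # 2
```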

### 3.1

**How many flights did Participant 1 take? Plot a histogram of the duration (in hours) of these flights. Is there anything unexpected about the distribution of durations? Give an explanation for what could give rise to these observations, and propose a way to post-process them (for purposes of this assignment, we'll be leaving these data as-is).**

In [None]:
# YOUR CODE HERE

YOUR WRITTEN ANSWER HERE

### 3.2

**Now we know when Participant 1 travelled and when they slept. Let’s put them together. We want to compare Participant 1’s sleep after travelling to their usual sleep. To do this, we’ll want to use GMT for both our flight and sleep times. Using the flight dates from question 3.1, generate a set of dates within 3 days of a flight. That is, if they travelled on 3/23/14, then you should include 3/23/14, 3/24/14, and 3/25/14 as "after-flight" dates.**

**Re-make the histogram of sleep duration from (2.1), stratifying by whether the date is an "after-flight" day. Then use a t-test to compare sleep "after-flight" to sleep not "after-flight" and report your results.**
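One possible sketch of the date-window and comparison logic is shown here. The sleep values are made up for illustration, and the helper names are hypothetical. The question asks only for "a t-test"; this sketch computes Welch's t statistic by hand as one reasonable choice and does not produce a p-value, which a library routine such as `scipy.stats.ttest_ind` would report.

```python
from datetime import date, timedelta
from math import sqrt
from statistics import mean, variance

def after_flight_dates(flight_dates, window_days=3):
    """Each flight date plus the days that follow, window_days dates per flight."""
    return {d + timedelta(days=k) for d in flight_dates for k in range(window_days)}

def welch_t(a, b):
    """Welch's t statistic for two independent samples (statistic only, no p-value)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

flight_dates = [date(2014, 3, 23)]  # example flight date from the text
window = after_flight_dates(flight_dates)

# Made-up daily sleep hours for illustration only.
daily_hours = {
    date(2014, 3, 21): 7.5, date(2014, 3, 22): 7.0,
    date(2014, 3, 23): 5.0, date(2014, 3, 24): 6.0, date(2014, 3, 25): 6.5,
    date(2014, 3, 26): 8.0,
}

# Split the days into "after-flight" and all other days, then compare.
after = [hours for day, hours in daily_hours.items() if day in window]
usual = [hours for day, hours in daily_hours.items() if day not in window]
t_stat = welch_t(after, usual)
print(sorted(window), t_stat)
```

Using a set for the window means overlapping windows from flights on nearby dates are counted only once.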

In [None]:
# YOUR CODE HERE

### 3.3

**What can we conclude about airplane flights and sleep for Participant 1? Is there an association between the two, and if so, how large is the effect?**

In [None]:
# YOUR CODE HERE

YOUR WRITTEN ANSWER HERE