```{css, echo=FALSE}
pre, code {white-space:pre !important; overflow-x:auto}
```

# Study GPS
Claire Punturieri

Last updated: 9/28/2023

## GPS procedure

Participants' current location was continuously tracked via GPS and WiFi positioning technology
on their individual cell phones and made available to the A-CHESS program for the study duration
(a total of three months of mobile monitoring).

At monthly intervals, during follow-up visits <!--is this true-->, participants were asked to
identify whether frequently visited locations (i.e., >XX times per XX) were helpful or
harmful to their recovery and what emotion was associated with the location (i.e., positive,
negative, or mixed).

Moves was used to track location data from the start of the study until MM/DD/YYYY. Because of
gaps in data collection, FollowMee was then used to capture participant geolocation data
(MM/DD/YYYY until the end of the study).


## Data files



| Data type    | Level      | Location                                              |
|--------------|------------|-------------------------------------------------------|
| Raw GPS      | Individual | studydata/risk/data_raw/\*\*/\*\*_GPS.gpx             |
| Raw GPS      | Individual | studydata/risk/data_raw/\*\*/\*\*_Locations.xlsx      |
| Raw GPS      | Group      | studydata/risk/data_processed/shared/gps.csv          |
| Raw GPS      | Group      | studydata/risk/data_processed/shared/locations.csv    |
| Enriched GPS | Group      | studydata/risk/data_processed/gps/gps_enriched.csv.xz |


## Data processing

<!--Scripts are, to the best of my ability, in procedural order-->

### Data cleaning and concatenating


| Name | Path | Function | Input | Output |
|------|------|----------|-------|--------|
| cln_gps.Rmd | shared/scripts_cln/ | Creates the analysis GPS dataset and conducts EDA | Participants' raw GPS data files (studydata/risk/data_raw/\*\*/\*\*_GPS.gpx) | gps.csv, gps.rds |
| cln_locations.Rmd | shared/scripts_cln/ | Opens the raw Excel files of frequent locations in the individual subject folders, merges them, and conducts EDA | Individual location files (studydata/risk/data_raw/\*\*/\*\*_Locations.xlsx) | locations.csv |
| mak_study_dates.Rmd | gps/mak/ | Creates a list of important study dates | visit_dates.csv, ema_morning.csv, ema_later.csv | study_dates.csv |

### Create dataframes

**mak_study_dates.Rmd (gps/mak/mak_study_dates.Rmd)**

This script updates code written by Kendra to make study dates for the meta study. John adapted it for the GPS study.

*Input*: visit_dates.csv, ema_morning.csv, ema_later.csv

*Output*: study_dates.csv

**mak_labels_for_windows.Rmd**

Creates lapse labels at three window durations: 1 hour, 1 day, and 1 week.

*Input:* lapses.csv, study_dates.csv

*Output:* labels_1hour.csv(.xz), labels_1day.csv(.xz), labels_1week.csv(.xz)
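
The windowing idea can be sketched as follows. This is a minimal illustration in Python, not the logic of *mak_labels_for_windows.Rmd* itself (the project scripts are written in R). It assumes consecutive, non-overlapping windows that start at a participant's study start, and it labels a window as a lapse window if any lapse time falls inside it. The function name `label_windows` and the example dates are hypothetical.

```python
from datetime import datetime, timedelta


def label_windows(study_start, study_end, lapse_times, window):
    """Label consecutive windows of length `window` between study_start and study_end.

    Returns a list of (window_start, window_end, has_lapse) tuples, where has_lapse
    is True if any lapse time falls in the half-open interval [window_start, window_end).
    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    labels = []
    window_start = study_start
    while window_start + window <= study_end:
        window_end = window_start + window
        has_lapse = any(window_start <= t < window_end for t in lapse_times)
        labels.append((window_start, window_end, has_lapse))
        window_start = window_end
    return labels


# Hypothetical data for one participant (one week on study, two lapses)
study_start = datetime(2020, 1, 1, 0, 0)
study_end = datetime(2020, 1, 8, 0, 0)
lapse_times = [datetime(2020, 1, 2, 14, 30), datetime(2020, 1, 5, 9, 0)]

labels_1hour = label_windows(study_start, study_end, lapse_times, timedelta(hours=1))
labels_1day = label_windows(study_start, study_end, lapse_times, timedelta(days=1))
labels_1week = label_windows(study_start, study_end, lapse_times, timedelta(weeks=1))
```

The same lapse can be counted at every duration: a single lapse marks one 1-hour window, one 1-day window, and one 1-week window. Whether the actual script uses rolling or non-overlapping windows is not stated here.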


**mak_gps_enriched.Rmd (gps/mak/mak_gps_enriched.Rmd)**

This script aggregates GPS files for all subjects and then matches each geolocation to its nearest context.
Information on how valid lapses were determined for inclusion in the final sample can be found in the
EMA and lapse cleaning scripts and in *mak_study_dates.Rmd*.

*Input:* study_dates.csv, locations.csv, gps.csv

*Output:* gps_enriched.csv(.xz)
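
Matching a geolocation to its nearest context could look something like the sketch below. It is written in Python for illustration only (the project scripts are in R) and assumes great-circle (haversine) distance as the measure of "nearest"; the distance measure actually used in *mak_gps_enriched.Rmd* is not documented here. The field names (`lat`, `lon`, `context`, `dist_m`) and the coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def enrich_point(point, locations):
    """Attach the nearest known location (and the distance to it) to one GPS point."""
    def distance_to(loc):
        return haversine_m(point["lat"], point["lon"], loc["lat"], loc["lon"])

    nearest = min(locations, key=distance_to)
    return {**point, "context": nearest["context"], "dist_m": distance_to(nearest)}


# Hypothetical rows standing in for locations.csv and gps.csv (field names are assumptions)
locations = [
    {"context": "home", "lat": 43.0731, "lon": -89.4012},
    {"context": "bar", "lat": 43.0747, "lon": -89.3842},
]
gps_points = [
    {"lat": 43.0732, "lon": -89.4010},
    {"lat": 43.0745, "lon": -89.3845},
]

enriched = [enrich_point(p, locations) for p in gps_points]
for row in enriched:
    print(row["context"], round(row["dist_m"]))
```

Keeping the distance alongside the matched context makes it possible to apply a cutoff later, so that points far from every known location are not treated as being at one.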


**mak_samples.Rmd (gps/mak/mak_samples.Rmd)**

This script checks the GPS data for the N = 151 participants, finalizes an overall sample,
and identifies an eyeball sample.

*Input*: study_dates.csv, locations.csv, gps_enriched.csv.xz

*Output*: N/A


### Modeling scripts
<!--Not sure of order of these quite yet-->

**mak_training_metrics.Rmd (gps/mak/mak_training_metrics.Rmd)**

This script aggregates all results/metrics for a batch or batches of jobs
that train all model configurations for a specific outcome/label window.

*Input:* jobs.csv, output, results

*Output*: N/A

**mak_rset.Rmd (gps/mak/mak_rset.Rmd)**

This script makes and saves an rset object that includes the train/test splits
defined by GA Tech for use across our labs. The rset is saved in the GPS study
data folder on the server for use with this study.

*Input:* labels_05.csv, training_ids.csv

*Output:* rset.rds
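
The idea of a predefined, participant-level split can be sketched as below. This is a Python illustration only: the real output is an R rset object, and the sketch assumes that training_ids.csv lists the participants assigned to training and that label rows carry a participant ID column. The column name `subid`, the helper `split_by_ids`, and the example rows are all hypothetical.

```python
def split_by_ids(rows, training_ids, id_key="subid"):
    """Partition rows into (train, test) so that no participant appears in both.

    Assumption: any participant not listed in training_ids belongs to the test set.
    """
    training_ids = set(training_ids)
    train = [row for row in rows if row[id_key] in training_ids]
    test = [row for row in rows if row[id_key] not in training_ids]
    return train, test


# Hypothetical rows standing in for labels_05.csv and training_ids.csv
labels = [
    {"subid": 1, "label": "no_lapse"},
    {"subid": 1, "label": "lapse"},
    {"subid": 2, "label": "no_lapse"},
    {"subid": 3, "label": "lapse"},
]
training_ids = [1, 3]

train, test = split_by_ids(labels, training_ids)
print(len(train), len(test))
```

Splitting on participant ID rather than on individual rows keeps all of a participant's observations on one side of the split, which is what allows the same splits to be reused across labs.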

<!--
Scripts I am not sure how to categorize:

**mak_features_combined.Rmd (gps/mak/mak_features_combined.Rmd)**
Aggregates all CHTC features and checks for missing jobs and other EDA.

*Input:* returned CHTC files (features, error, out)

*Output:* features_WINDOW.csv
-->

<!--
Commenting this out for now for sake of cleanliness/organization
Claire's additions for FYP brainstorming

## Preprocessing

(Saeb et al., 2015) preprocessing procedures:
“The first procedure determined whether each GPS location data sample came from a stationary state (eg, working in an office) or a transition state (eg, walking on the street)” – threshold speed set to 1km/h
 
“The second procedure was clustering. We applied clustering only to the data samples in the stationary state. The goal was to identify the places where participants spent most of their time, such as home, workplaces, parks, etc.”

## Feature engineering

### Features relating to location
- N of locations

- Entropy and normalized entropy (n.e. = invariant to # of clusters)
    - High entropy: time is spread more evenly across a larger number of locations; low entropy:
    time is concentrated in fewer, more consistent locations (De Angel et al., 2022)

- Location Variance
    - How varied a participant’s locations are (De Angel et al., 2022)
    
- Zip code/area delineations of income (DC map for ex.)
    *This has always been interesting to me, but I need to think more about whether
    it would actually add any value to the existing SES data that we have.*

### Features relating to movement
- Distance
    - For example, total distance covered or maximum distance between two locations
    in a given time frame (Canzian and Musolesi, 2015)
    
- Radius of gyration
    - "Used to quantify the coverage area and is defined as the deviation
    from the centroid of places visited in the interval" (Canzian and Musolesi, 2015)
    
- Average Moving Speed

- Mobility Radius

### Features relating to time
- Homestay
    - Amount of time spent at home (De Angel et al., 2022)

- Time at Location

- Transition Time
    - Percentage of time during which a participant was in a non-stationary state (Saeb et al., 2015)

### Features relating to circadian rhythms
- GPS useful for capturing temporal information from location data (Saeb et al., 2015)
    *This sounds really cool in theory, not sure it would actually be that valuable if,
    for example, someone is a homebody.*
    
- Diurnal movements indexes (Lomb-Scargle periodogram; Fraccaro et al., 2019) https://iopscience.iop.org/article/10.3847/1538-4365/aab766 -- apparently useful
    for characterizing periodicity in unevenly sampled data
    
- Sleep regularity index (SRI) https://github.com/mengelhard/sri
    *Chronodisruption as an antecedent to lapsing?*
 

## Analytic techniques
- Geographic information systems (GIS)
    https://mgimond.github.io/Spatial/introGIS.html
- Standard deviational ellipse
- Minimum convex polygon
- Kernel density estimation
- Point pattern analysis
    https://gistbok.ucgis.org/bok-topics/point-pattern-analysis -->