Josh Barback  
`barback@fas.harvard.edu`  
Onnela Lab, Harvard T. H. Chan School of Public Health  

# Managing raw Beiwe data

This notebook provides an overview of some features of the `beiwetools.manage` subpackage.  These modules provide tools for handling user identifier files and for creating registries of raw Beiwe data.  Before reviewing this notebook, you may wish to look at `configread_example.ipynb`.

Code is provided for two example tasks:
1. Create a registry for a single directory of raw data,
2. Create a registry for multiple directories.

In addition to creating registries, these two examples demonstrate how to:  
* Manage user names and object names,
* Review numerical summaries of raw user data,
* Plot summaries of data collection,
* Save and reload a raw data registry.

We'll use the publicly available Beiwe data set found here:  [https://zenodo.org/record/1188879#.XcDUyHWYW02](https://zenodo.org/record/1188879#.XcDUyHWYW02)

Begin by downloading and extracting `data.zip`.
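If you prefer to extract the archive from Python, the standard-library `zipfile` module is enough. This is a minimal sketch: `extract_archive` is a hypothetical helper, and the destination folder is whatever you then use as the parent of `data_dir` below.

```python
import zipfile
from pathlib import Path

def extract_archive(zip_path, dest_dir):
    """Hypothetical helper: extract every member of zip_path into dest_dir
    and return the sorted names of the top-level entries in the archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # Archive member names always use '/' as the separator.
        top_level = {name.split('/')[0] for name in zf.namelist()}
    return sorted(top_level)
```

For example, `extract_archive('data.zip', '/home/josh/Desktop/Beiwe_test_data')` (paths are placeholders) should report `['data']` if the archive's top-level folder is `data`, as the `data_dir` path below suggests.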

In [1]:
import os
from beiwetools.manage import BeiweProject
from beiwetools.configread import BeiweConfig

In [2]:
# Report logging messages:
import logging
logging.basicConfig(level=logging.INFO)
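INFO messages can become verbose for larger projects. The messages in this notebook come from loggers such as `beiwetools.manage.classes`, and Python loggers form a dot-separated hierarchy, so raising the level on the `beiwetools` parent logger should quiet them while leaving other loggers alone. A sketch, assuming beiwetools does not set explicit levels on its own module loggers:

```python
import logging

# Show INFO messages from every logger, as in the cell above.
logging.basicConfig(level=logging.INFO)

# Child loggers with no level of their own inherit the level of their
# nearest configured ancestor, so this hides INFO records from
# 'beiwetools.manage.classes' and any other 'beiwetools.*' logger.
logging.getLogger('beiwetools').setLevel(logging.WARNING)
```

Setting the parent level back to `logging.INFO` restores the messages.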

In [3]:
# Set the path to the folder containing the raw data directories:
data_dir = '/home/josh/Desktop/Beiwe_test_data/data' # change as needed

# Sample configuration files are located in examples/configuration_files:
examples_dir = os.getcwd() # change as needed
config_dir = os.path.join(examples_dir, 'configuration_files')

# Choose a directory for test output:
test_directory = os.path.join(examples_dir, 'test') # change as needed

# Define some study names:
study_names = ['GPS_Test', 'iOS_Test_1', 'iOS_Test_2', 'Test_Study', 'Hi_Sample']

# Get paths to raw data directories and configuration files:
raw_dirs = sorted([os.path.join(data_dir, d) for d in os.listdir(data_dir)])
temp = sorted([os.path.join(config_dir, f) for f in os.listdir(config_dir)])
# Reorder the sorted configuration paths so they line up with study_names:
config_paths = [temp[0], temp[2], temp[3], temp[4], temp[1]]

# Match study names with configuration files and raw data directories:
config = dict(zip(study_names, config_paths))
raw    = dict(zip(study_names, raw_dirs))

# Make sure everything lines up:
for k in study_names:
    r, c = os.path.basename(raw[k]), os.path.basename(config[k])
    print('%s: \n   Raw Data Directory: %s \n   Configuration File: %s \n' % (k, r, c))

GPS_Test: 
   Raw Data Directory: onnela_lab_gps_testing 
   Configuration File: HSPH_Onnela_Lab_GPS_Testing_surveys_and_settings.json 

iOS_Test_1: 
   Raw Data Directory: onnela_lab_ios_test1 
   Configuration File: HSPH_Onnela_Lab_iOS_Test_Study_1_surveys_and_settings.json 

iOS_Test_2: 
   Raw Data Directory: onnela_lab_ios_test2 
   Configuration File: HSPH_Onnela_Lab_iOS_Test_Study_2_surveys_and_settings.json 

Test_Study: 
   Raw Data Directory: onnela_lab_test1 
   Configuration File: Test_Study_#1_surveys_and_settings.json 

Hi_Sample: 
   Raw Data Directory: passive_data_high_sampling 
   Configuration File: HSPH_Onnela_Lab_Passive_Data_High_Sampling_surveys_and_settings.json 
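The hard-coded reordering in `config_paths` relies on how `sorted()` orders strings. Python compares strings by code point, so uppercase letters sort before lowercase ones and `Passive` lands ahead of `iOS`. The sketch below reproduces that ordering with the file names printed above; `config_order` and `raw_names` are illustrative names, not part of the notebook.

```python
# Configuration file names, listed in the same order as study_names.
config_files = [
    'HSPH_Onnela_Lab_GPS_Testing_surveys_and_settings.json',
    'HSPH_Onnela_Lab_iOS_Test_Study_1_surveys_and_settings.json',
    'HSPH_Onnela_Lab_iOS_Test_Study_2_surveys_and_settings.json',
    'Test_Study_#1_surveys_and_settings.json',
    'HSPH_Onnela_Lab_Passive_Data_High_Sampling_surveys_and_settings.json',
]

# 'P' (code point 80) sorts before 'i' (105), so the Passive file is temp[1].
temp = sorted(config_files)

# Move temp[1] to the end to recover the study_names order.
config_order = [temp[0], temp[2], temp[3], temp[4], temp[1]]

# The raw data directory names are all lowercase, so plain sorting
# already matches study_names.
raw_names = ['onnela_lab_gps_testing', 'onnela_lab_ios_test1',
             'onnela_lab_ios_test2', 'onnela_lab_test1',
             'passive_data_high_sampling']
```

Positional matching breaks if files are added or renamed, which is why printing the pairs, as the cell above does, is worth keeping.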



### 1.  Create a registry for a directory of raw data

In [4]:
# First we'll look at data from users in iOS_Test_2.
# Load user data records:
raw_dir = raw['iOS_Test_2']
p = BeiweProject.create(raw_dir)

INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID kiu5hvmv.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID sxvpopdz.
INFO:beiwetools.manage.classes:Updated user name assignments.
INFO:root:Finished generating study records for 2 of 2 users.
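`BeiweProject.create` builds its registry internally. The sketch below is not its implementation; it only illustrates the kind of bookkeeping a raw data registry involves, assuming the usual Beiwe layout in which each user ID folder holds one subfolder per data stream (for example `gps` or `survey_answers/<survey id>`). `scan_raw_directory` is a hypothetical helper.

```python
import os
from collections import defaultdict

def scan_raw_directory(raw_dir):
    """Hypothetical helper: summarize a raw data directory assumed to be
    laid out as <raw_dir>/<user_id>/<data_stream>/.../<file>.

    Returns {user_id: {data_stream: (file_count, total_bytes)}}.
    Data streams that contain no files are omitted."""
    summary = {}
    for user_id in sorted(os.listdir(raw_dir)):
        user_dir = os.path.join(raw_dir, user_id)
        if not os.path.isdir(user_dir):
            continue  # skip stray files next to the user folders
        streams = defaultdict(lambda: [0, 0])
        for stream in sorted(os.listdir(user_dir)):
            stream_dir = os.path.join(user_dir, stream)
            if not os.path.isdir(stream_dir):
                continue
            # Walk nested folders, e.g. one folder per survey identifier.
            for root, _, files in os.walk(stream_dir):
                for name in files:
                    streams[stream][0] += 1
                    streams[stream][1] += os.path.getsize(os.path.join(root, name))
        summary[user_id] = {k: tuple(v) for k, v in streams.items()}
    return summary
```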


In [5]:
# Review the project summary, and make sure there aren't any warning flags:
p.summary.print()



----------------------------------------------------------------------
Overview
----------------------------------------------------------------------
    Unique Beiwe Users: 2
    Study Name(s): Not found
    Raw Data Directories: 1
    First Observation: 2016-06-07 18:00:00 UTC
    Last  Observation: 2016-06-10 12:00:00 UTC
    Project Duration: 2.8 days

----------------------------------------------------------------------
Device Summary
----------------------------------------------------------------------
    iPhone Users: 2
    Android Users: 0

----------------------------------------------------------------------
----------------------------------------------------------------------
    ignored_users: 0
    no_registry: 0
    without_data: 0
    no_identifiers: 0
    irregular_directories: 0
    multiple_devices: 0
    unknown_os: 0
    unnamed_objects: 0

----------------------------------------------------------------------
Registry Summary
--------------------------------
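The project duration in the overview matches the time between the first and last observations: 2016-06-07 18:00 to 2016-06-10 12:00 is 2 days and 18 hours, or 2.75 days, displayed as 2.8. The sketch below reproduces the displayed values; it is not necessarily how beiwetools computes them. One detail: Python's `'.1f'` formatting rounds exact ties to even (0.25 becomes `'0.2'`), while the user summary below shows 0.3 days for a 6-hour period, so this sketch rounds ties up with `decimal`.

```python
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP

def duration_days(first, last):
    """Days between two timestamps printed as 'YYYY-MM-DD HH:MM:SS UTC',
    rounded to one decimal place with ties rounded up."""
    fmt = '%Y-%m-%d %H:%M:%S'
    t0 = datetime.strptime(first.replace(' UTC', ''), fmt).replace(tzinfo=timezone.utc)
    t1 = datetime.strptime(last.replace(' UTC', ''), fmt).replace(tzinfo=timezone.utc)
    days = (t1 - t0).total_seconds() / 86400
    # Convert via repr() so the Decimal holds the shortest decimal form.
    return Decimal(repr(days)).quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)
```

The same function gives 383.8 for the multi-directory project in section 2.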

In [6]:
# To attach a configuration file to the project:
config_path = config['iOS_Test_2']
p.update_configuration(config_path)

# Object identifiers are then replaced with name assignments from the configuration.
# For example, here is the updated survey data summary:
lines = p.summary.to_string().split('\n')
for i in lines[-15:]:
    print(i)

INFO:beiwetools.manage.classes:Finished reading study names.
INFO:beiwetools.manage.classes:Finished reading object names.
INFO:beiwetools.manage.classes:Updated user name assignments.
INFO:beiwetools.manage.classes:Updated project configuration files.



----------------------------------------------------------------------
Survey Data
----------------------------------------------------------------------

    survey_answers:
                                    Files     Storage
        Survey 5                        3      1.4 kB

    survey_timings:
                                    Files     Storage
        Survey 2                        5      2.1 kB
        Survey 3                        2      4.8 kB
        Survey 5                        8      5.8 kB
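Conceptually, attaching a configuration swaps raw object identifiers for readable names wherever an assignment exists. A minimal sketch of that lookup with made-up identifiers (real ones are the long hexadecimal strings stored in the configuration file); `apply_names` is hypothetical, and leaving unmatched identifiers unchanged is an assumption that presumably corresponds to the `unnamed_objects` flag in the project summary.

```python
# Hypothetical name assignments read from a configuration file.
name_assignments = {'survey_id_a': 'Survey 2', 'survey_id_b': 'Survey 5'}

# Hypothetical file counts keyed by raw identifier.
file_counts = {'survey_id_a': 5, 'survey_id_b': 8, 'survey_id_c': 1}

def apply_names(counts, names):
    """Replace identifiers with assigned names; identifiers without an
    assignment are kept as they are."""
    return {names.get(k, k): v for k, v in counts.items()}

named = apply_names(file_counts, name_assignments)
```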



In [7]:
# We can also review summaries of each user's data records.  
# For example, here is a summary of the first user's raw data:
i = p.ids[0]
p.data[i].summary.print()



----------------------------------------------------------------------
Identifiers
----------------------------------------------------------------------
    Beiwe User ID: kiu5hvmv
    User Name: Not found

----------------------------------------------------------------------
Raw Data Summary
----------------------------------------------------------------------
    First Observation: 2016-06-07 18:00:00 UTC
    Last  Observation: 2016-06-08 00:00:00 UTC
    Observation Period: 0.3 days
    Raw Files: 24
    Storage: 13.2 MB

----------------------------------------------------------------------
Device Records
----------------------------------------------------------------------
    Number of Phones: 1
    Phone OS: iOS

----------------------------------------------------------------------
Registry Issues
----------------------------------------------------------------------
    Irregular Directories: 0
    Unregistered Files: 0

--------------------------------------------------
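Storage figures such as 13.2 MB, 1.4 kB, and 195 Bytes are human-readable byte counts. The notebook doesn't say whether beiwetools uses decimal (1 kB = 1000 bytes) or binary (1024) units; this hypothetical `format_storage` sketch assumes decimal units and mimics the displayed style.

```python
def format_storage(n_bytes):
    """Format a byte count using decimal units (1 kB = 1000 bytes)."""
    if n_bytes < 1000:
        return f'{n_bytes} Bytes'
    value = float(n_bytes)
    for unit in ('kB', 'MB', 'GB'):
        value /= 1000
        # Compare the rounded value so 999.96 kB is shown as 1.0 MB.
        if round(value, 1) < 1000:
            return f'{value:.1f} {unit}'
    return f'{value / 1000:.1f} TB'
```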

In [9]:
# To export a project:
path = p.export('iOS Test 2', test_directory, track_time = False)

In [10]:
# We can then reload the project:
q = BeiweProject.load(path)

# And verify that the projects are the same:
p == q

INFO:beiwetools.manage.classes:Loaded raw data registry for Beiwe user ID kiu5hvmv.
INFO:beiwetools.manage.classes:Loaded raw data registry for Beiwe user ID sxvpopdz.


True
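`p == q` evaluating to `True` indicates that the export captured everything needed to rebuild the registry. The same round-trip idea can be sketched with a dataclass and JSON; `UserRecord`, `export_records`, and `load_records` are hypothetical and do not reflect the beiwetools export format.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class UserRecord:
    """Hypothetical stand-in for one user's registry entry."""
    user_id: str
    os: str
    files: dict = field(default_factory=dict)  # data stream -> file count

def export_records(records, path):
    with open(path, 'w') as f:
        json.dump([asdict(r) for r in records], f, indent=2)

def load_records(path):
    with open(path) as f:
        return [UserRecord(**d) for d in json.load(f)]
```

Dataclasses generate `__eq__` from their fields, so comparing the loaded list with the original checks every field of every record.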

### 2.  Create a registry for multiple directories

In [11]:
# In some cases we may wish to merge data from Beiwe users enrolled in multiple studies.
# First let's make a dictionary attaching user IDs to configurations:
configurations = {}
for n in study_names:
    ids = os.listdir(raw[n])
    configurations.update(zip(ids, [config[n]]*len(ids)))
    
# And then create a project with these configuration assignments:
r = BeiweProject.create(raw_dirs, user_ids = 'all', configuration = configurations)

INFO:beiwetools.manage.classes:Finished reading study names.
INFO:beiwetools.manage.classes:Finished reading object names.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID 6b38vskd.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID efy3yeum.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID kiu5hvmv.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID lljhljce.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID sxvpopdz.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID tcqrulfj.
INFO:beiwetools.manage.classes:Updated user name assignments.
INFO:root:Finished generating study records for 6 of 6 users.
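The cell above assigns configurations with `os.listdir`, which returns every entry in a directory, including stray files such as `.DS_Store`. A slightly more defensive variant keeps only subdirectories and uses `dict.fromkeys` to give each user ID its study's configuration path. `assign_configurations` is a hypothetical helper; if a user ID appeared in two raw data directories, the later study would win.

```python
import os

def assign_configurations(raw_dirs_by_study, config_by_study):
    """Map each user ID folder in a study's raw data directory to that
    study's configuration file path."""
    assignments = {}
    for study, raw_dir in raw_dirs_by_study.items():
        user_ids = [d for d in os.listdir(raw_dir)
                    if os.path.isdir(os.path.join(raw_dir, d))]
        # Every user in this study shares the same configuration file.
        assignments.update(dict.fromkeys(user_ids, config_by_study[study]))
    return assignments
```

With the notebook's variables, `assign_configurations(raw, config)` would build the same kind of mapping as `configurations`.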


In [12]:
# The logs indicate an unknown "dummy" survey type.  Let's verify that these are inactive (deleted) surveys:
# Use cp rather than p, so the project p from section 1 isn't overwritten:
for cp in config_paths:
    temp = BeiweConfig(cp)
    for sid in temp.surveys:
        s = temp.surveys[sid]
        if s.type == 'dummy':
            print('Dummy survey %s from %s has been deleted: %s' % (s.identifier, temp.name, str(s.deleted)))



Dummy survey 58b77ae646b9fc10707b0004 from Test Study #1 has been deleted: True
Dummy survey 57ffc4ca1206f77c3dfcdfb3 from Test Study #1 has been deleted: True
Dummy survey 57ff944e1206f77c3dfcc869 from Test Study #1 has been deleted: True
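Instead of reading printed lines, the check can be made to fail loudly. The sketch uses hypothetical survey records with the same `identifier`, `type`, and `deleted` attributes used in the cell above; `undeleted_dummies` is a hypothetical helper.

```python
from collections import namedtuple

# Hypothetical stand-in for the survey objects in BeiweConfig(...).surveys.
Survey = namedtuple('Survey', ['identifier', 'type', 'deleted'])

def undeleted_dummies(surveys):
    """Return identifiers of 'dummy' surveys that are not marked deleted."""
    return [s.identifier for s in surveys if s.type == 'dummy' and not s.deleted]
```

With real configurations, something like `assert not undeleted_dummies(temp.surveys.values())` inside the loop would stop the notebook if an active dummy survey turned up.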


In [13]:
# As before, review the project summary and warning flags:
r.summary.print()



----------------------------------------------------------------------
Overview
----------------------------------------------------------------------
    Unique Beiwe Users: 6
    Study Name(s):
        HSPH Onnela Lab GPS Testing
        HSPH Onnela Lab Passive Data High Sampling
        HSPH Onnela Lab iOS Test Study 1
        HSPH Onnela Lab iOS Test Study 2
        Test Study #1
    Raw Data Directories: 5
    First Observation: 2016-01-26 19:00:00 UTC
    Last  Observation: 2017-02-13 13:00:00 UTC
    Project Duration: 383.8 days

----------------------------------------------------------------------
Device Summary
----------------------------------------------------------------------
    iPhone Users: 4
    Android Users: 2

----------------------------------------------------------------------
----------------------------------------------------------------------
    ignored_users: 0
    no_registry: 0
    without_data: 0
    no_identifiers: 0
    irregular_directories: 0
   

In [14]:
# From the survey data summary, we see that survey names were prefixed with study names.
# This guarantees that all names are unique, but the resulting names are hard to read.
# Long object names will be cumbersome in plots, survey scores, and documentation.

# We can tidy this up. A quick solution is to assign shorter study names
# and export a renamed copy of each configuration file to the test directory:
new_paths = {}
for n in study_names:
    temp = BeiweConfig(config[n])
    temp.update_names({temp.name: n})
    path = temp.export(test_directory, track_time = False)
    new_paths[config[n]] = path

# Then point each user at the renamed configuration file:
for i in configurations:
    configurations[i] = new_paths[configurations[i]]

# And finally update the project with the new configuration paths:
r.update_configuration(configurations)

# Here is what the new names look like:
lines = r.summary.to_string().split('\n')
for i in lines[-27:]:
    print(i)

INFO:beiwetools.manage.classes:Finished reading study names.
INFO:beiwetools.manage.classes:Finished reading object names.
INFO:beiwetools.manage.classes:Updated user name assignments.
INFO:beiwetools.manage.classes:Updated project configuration files.



----------------------------------------------------------------------
Survey Data
----------------------------------------------------------------------

    survey_answers:
                                    Files     Storage
        GPS_Test - Survey 3             1   195 Bytes
        Hi_Sample - Survey 1           30      2.1 kB
        Test_Study - Survey 3           7      3.2 kB
        Test_Study - Survey 5           8      4.6 kB
        iOS_Test_1 - Survey 5           2      1.5 kB
        iOS_Test_2 - Survey 5           3      1.4 kB

    survey_timings:
                                    Files     Storage
        GPS_Test - Survey 3             1   472 Bytes
        Hi_Sample - Survey 1           32      8.9 kB
        Test_Study - Survey 3          13     14.8 kB
        Test_Study - Survey 5         140     43.6 kB
        Test_Study - Survey 6         108     33.0 kB
        iOS_Test_1 - Survey 4           3     10.5 kB
        iOS_Test_1 - Survey 5           6      
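The merged names appear to follow the pattern `<study name> - <object name>`, which keeps identically named objects from different studies (such as `Survey 5`) distinct. A minimal sketch of that prefixing with hypothetical identifiers; `prefix_names` is not a beiwetools function.

```python
def prefix_names(names_by_study):
    """Combine per-study {object_id: name} mappings into one mapping,
    prefixing each name with its study name."""
    combined = {}
    for study, names in names_by_study.items():
        for object_id, name in names.items():
            combined[object_id] = f'{study} - {name}'
    return combined

merged = prefix_names({
    'iOS_Test_1': {'id_1': 'Survey 5'},
    'iOS_Test_2': {'id_2': 'Survey 5'},
})
```

This also shows why shortening the study names pays off: the prefix appears in every object name.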

In [86]:
# If desired, uncomment the following lines to delete the test output directory:
# import shutil
# shutil.rmtree(test_directory)
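`shutil.rmtree` raises `FileNotFoundError` if the directory no longer exists, for example when the cleanup cell is run twice. A small guard avoids that; `remove_test_output` is a hypothetical helper.

```python
import os
import shutil

def remove_test_output(path):
    """Delete path and everything under it if it is a directory.
    Return True if something was removed, False otherwise."""
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False
```

Calling `remove_test_output(test_directory)` is then safe to repeat.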