Josh Barback  
`barback@fas.harvard.edu`  
Onnela Lab, Harvard T. H. Chan School of Public Health  

# Managing raw Beiwe data

This notebook provides an overview of some features of the `beiwetools.manage` subpackage.  These modules provide tools for handling user identifier files and for creating registries of raw Beiwe data.  Before reviewing this notebook, you may wish to look at `configread_example.ipynb`.

Code is provided for two example tasks:
1. Create a registry for a single directory of raw data,
2. Create a registry for multiple directories.

In addition to creating registries, these two examples demonstrate how to:  
* Manage user names and object names,
* Review numerical summaries of raw user data,
* Plot summaries of data collection,
* Save and reload a raw data registry.

We'll use the publicly available Beiwe data set found here:  [https://zenodo.org/record/1188879#.XcDUyHWYW02](https://zenodo.org/record/1188879#.XcDUyHWYW02)

Begin by downloading and extracting `data.zip`.

In [1]:
import os
from beiwetools.manage import BeiweProject
from beiwetools.configread import BeiweConfig


In [2]:
# Report logging messages:
import logging
logging.basicConfig(level=logging.INFO)

In [3]:
# Set the path to the folder containing the raw data directories:
data_dir = '/home/josh/Desktop/Beiwe_test_data/data' # change as needed

# Sample configuration files are located in examples/configuration_files:
examples_dir = os.getcwd() # change as needed
config_dir = os.path.join(examples_dir, 'configuration_files')

# Choose a directory for test output:
test_directory = os.path.join(examples_dir, 'test') # change as needed

# Define some study names:
study_names = ['GPS', 'iOS1', 'iOS2', 'Test', 'HiSamp']

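# Get paths to raw data directories and configuration files.
# Sorting makes the order reproducible, since os.listdir() returns entries in arbitrary order:
raw_dirs = sorted([os.path.join(data_dir,   d) for d in os.listdir(data_dir)  ])
temp =     sorted([os.path.join(config_dir, f) for f in os.listdir(config_dir)])
# Reorder the sorted configuration files so that they follow the order of study_names:
config_paths = [temp[0], temp[2], temp[3], temp[4], temp[1]]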

# Match study names with configuration files and raw data directories:
config = dict(zip(study_names, config_paths))
raw    = dict(zip(study_names, raw_dirs))

# Make sure everything lines up:
for k in study_names:
    r, c = os.path.basename(raw[k]), os.path.basename(config[k])
    print('%s: \n\tRaw Data Directory: %s \n\tConfiguration File: %s \n' % (k, r, c))

GPS: 
	Raw Data Directory: onnela_lab_gps_testing 
	Configuration File: HSPH_Onnela_Lab_GPS_Testing_surveys_and_settings.json 

iOS1: 
	Raw Data Directory: onnela_lab_ios_test1 
	Configuration File: HSPH_Onnela_Lab_iOS_Test_Study_1_surveys_and_settings.json 

iOS2: 
	Raw Data Directory: onnela_lab_ios_test2 
	Configuration File: HSPH_Onnela_Lab_iOS_Test_Study_2_surveys_and_settings.json 

Test: 
	Raw Data Directory: onnela_lab_test1 
	Configuration File: Test_Study_#1_surveys_and_settings.json 

HiSamp: 
	Raw Data Directory: passive_data_high_sampling 
	Configuration File: HSPH_Onnela_Lab_Passive_Data_High_Sampling_surveys_and_settings.json 



### 1.  Create a registry for a directory of raw data

In [4]:
# First we'll look at data from users in the second iOS test study.
# Create a project from user data records:
raw_dir = raw['iOS2']
p = BeiweProject.create(raw_dir)

INFO:beiwetools.manage.classes:Loaded configuration files.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID kiu5hvmv.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID sxvpopdz.
INFO:beiwetools.manage.classes:Updated user name assignments.
INFO:root:Finished generating study records for 2 of 2 users.


In [5]:
# Review the project summary, and check how many user or object identifiers have been flagged with warnings:
p.summary.print()

# Note:  See docstrings for a description of each flag.  Any flagged identifiers are stored in p.flags.



----------------------------------------------------------------------
Overview
----------------------------------------------------------------------
    Unique Beiwe Users: 2
    Study Name(s): Not found
    Raw Data Directories: 1
    First Observation: 2016-06-07 18:00:00 UTC
    Last  Observation: 2016-06-10 12:00:00 UTC
    Project Duration: 2.8 days

----------------------------------------------------------------------
Device Summary
----------------------------------------------------------------------
    iPhone Users: 2
    Android Users: 0

----------------------------------------------------------------------
Registry Summary
----------------------------------------------------------------------
    Raw Files: 164
    Storage: 96.6 MB
    Irregular Directories: 0
    Unregistered Files: 0

----------------------------------------------------------------------
Passive Data
----------------------------------------------------------------------

                      Files 
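
As a rough check on the "Project Duration" line, the elapsed time between the first and last observation can be computed directly from the two timestamps. This is only a sketch using the standard library. The helper name `duration_in_days` is hypothetical, and how `beiwetools` rounds the value, or whether it counts the final hour of data, is not shown in this notebook.

```python
from datetime import datetime, timedelta

def duration_in_days(first, last, fmt='%Y-%m-%d %H:%M:%S'):
    """Elapsed time between two timestamp strings, in fractional days."""
    delta = datetime.strptime(last, fmt) - datetime.strptime(first, fmt)
    return delta / timedelta(days=1)

# Timestamps copied from the Overview above, without the 'UTC' suffix:
project_days = duration_in_days('2016-06-07 18:00:00', '2016-06-10 12:00:00')
```

Here `project_days` is 2.75, which is in line with the 2.8 days reported in the summary.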

In [6]:
# To attach a configuration file to the project:
config_path = config['iOS2']
p.update_configurations(config_path)

# Object identifiers are then replaced with name assignments from the configuration.
# For example, here is the updated survey data summary:
lines = p.summary.to_string().split('\n')
for i in lines[38:52]: print(i)

INFO:beiwetools.manage.classes:Loaded configuration files.
INFO:beiwetools.manage.classes:Finished reading study names.
INFO:beiwetools.manage.classes:Finished reading object names.
INFO:beiwetools.manage.classes:Updated user name assignments.


----------------------------------------------------------------------
Survey Data
----------------------------------------------------------------------

    survey_answers:
                                    Files     Storage
        Survey 5                        3      1.4 kB

    survey_timings:
                                    Files     Storage
        Survey 2                        5      2.1 kB
        Survey 3                        2      4.8 kB
        Survey 5                        8      5.8 kB
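
The slice `lines[38:52]` depends on the exact length of the summary, so it will point at different lines if the summary grows or shrinks. A sketch of an alternative is to locate a section by its title. The helper `extract_section` and the toy text are hypothetical. The sketch only assumes the layout visible in the output above, where each title sits between two lines of dashes.

```python
def is_rule(line):
    """True for a non-empty line made only of dashes."""
    return len(line) > 0 and set(line) == {'-'}

def extract_section(text, title):
    """Return the lines of the section whose title sits between two dash rules."""
    lines = text.split('\n')
    start = None
    for i in range(1, len(lines) - 1):
        if lines[i].strip() == title and is_rule(lines[i - 1]) and is_rule(lines[i + 1]):
            start = i - 1
            break
    if start is None:
        raise KeyError(title)
    end = len(lines)
    # The section ends where the next "rule, title, rule" header begins.
    for j in range(start + 3, len(lines) - 2):
        if is_rule(lines[j]) and is_rule(lines[j + 2]):
            end = j
            break
    return lines[start:end]

rule = '-' * 70
toy_summary = '\n'.join([
    rule, 'Overview', rule, '    Unique Beiwe Users: 2', '',
    rule, 'Survey Data', rule, '', '    survey_answers:', '',
    rule, 'Flagged Identifiers', rule, '    ignored_users: 0',
])
survey_lines = extract_section(toy_summary, 'Survey Data')
```

With a real project, the same call could be applied to the string returned by `p.summary.to_string()`.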



In [7]:
# Individual user data records are stored in UserData objects.  These are found in p.data.
# We can review a summary of each user's data records.  
# For example, here is a summary of the first user's raw data:
i = p.ids[0]
p.data[i].summary.print()



----------------------------------------------------------------------
Identifiers
----------------------------------------------------------------------
    Beiwe User ID: kiu5hvmv
    User Name: Not found

----------------------------------------------------------------------
Raw Data Summary
----------------------------------------------------------------------
    First Observation: 2016-06-07 18:00:00 UTC
    Last  Observation: 2016-06-08 00:00:00 UTC
    Observation Period: 0.3 days
    Raw Files: 24
    Storage: 13.2 MB

----------------------------------------------------------------------
Device Records
----------------------------------------------------------------------
    Number of Phones: 1
    Phone OS: iOS

----------------------------------------------------------------------
Registry Issues
----------------------------------------------------------------------
    Irregular Directories: 0
    Unregistered Files: 0

--------------------------------------------------

In [8]:
# To export a project:
path = p.export('iOS Test 2', test_directory, track_time = False)

# Note:  Set track_time = True to isolate the export in a folder labeled with the local datetime.
# If track_time = False, any export with the same name will be overwritten.



In [9]:
# In the future, we may wish to reload the project:
q = BeiweProject.load(path)

# Loading from export creates an identical BeiweProject object:
p == q

INFO:beiwetools.manage.classes:Loaded raw data registry for Beiwe user ID kiu5hvmv.
INFO:beiwetools.manage.classes:Loaded raw data registry for Beiwe user ID sxvpopdz.
INFO:beiwetools.manage.classes:Loaded configuration files.


True
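
To illustrate the effect of `track_time`, here is a sketch of how an export location might be chosen. The function `export_folder`, the folder layout, and the timestamp format are all assumptions made for illustration. The actual behavior of `BeiweProject.export` may differ.

```python
import os
from datetime import datetime

def export_folder(name, parent, track_time=True, now=None):
    """Choose an export folder, optionally labeled with the local datetime."""
    if not track_time:
        # Same name, same folder: a later export replaces an earlier one.
        return os.path.join(parent, name)
    now = now or datetime.now()
    stamp = now.strftime('%Y-%m-%d_%H-%M-%S')
    return os.path.join(parent, name + '_' + stamp)

fixed = export_folder('iOS Test 2', 'test', track_time=False)
stamped = export_folder('iOS Test 2', 'test', now=datetime(2020, 1, 2, 3, 4, 5))
```

The trade-off is the usual one: a fixed folder keeps paths predictable for reloading, while a time-stamped folder preserves every export.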

### 2.  Create a registry for multiple directories

In [20]:
# In some cases we may wish to merge data from Beiwe users enrolled in multiple studies.
# First let's make a dictionary attaching user IDs to configurations:
configurations = {}
for n in study_names:
    ids = os.listdir(raw[n])
    configurations.update(zip(ids, [config[n]]*len(ids)))

# And then create a project with these configuration assignments:
r = BeiweProject.create(raw_dirs, user_ids = 'all', configuration = configurations)

INFO:beiwetools.manage.classes:Loaded configuration files.
INFO:beiwetools.manage.classes:Finished reading study names.
INFO:beiwetools.manage.classes:Finished reading object names.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID 6b38vskd.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID efy3yeum.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID kiu5hvmv.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID lljhljce.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID sxvpopdz.
INFO:beiwetools.manage.classes:Created raw data registry for Beiwe user ID tcqrulfj.
INFO:beiwetools.manage.classes:Updated user name assignments.
INFO:root:Finished generating study records for 6 of 6 users.
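
The `configurations` dictionary built above maps each Beiwe user ID to the path of one configuration file. The toy version below shows the resulting shape without touching the file system. The file names and the user ID `user_a` are placeholders, and `dict.fromkeys` is used as an equivalent of the `zip` expression in the cell.

```python
# Toy stand-ins for os.listdir(raw[n]) and config[n]; values are placeholders.
toy_users = {'iOS2': ['kiu5hvmv', 'sxvpopdz'], 'Test': ['user_a']}
toy_config = {'iOS2': 'ios2.json', 'Test': 'test1.json'}

toy_configurations = {}
for study, ids in toy_users.items():
    # Every user ID found in a study's directory gets that study's configuration.
    toy_configurations.update(dict.fromkeys(ids, toy_config[study]))
```

Because the result is an ordinary dictionary keyed by user ID, a user ID that appears in more than one study directory keeps only the configuration assigned last.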


In [21]:
# There are some warnings.
# First, the logs indicate an unknown "dummy" survey type.  Let's verify that these are inactive (deleted) surveys.
# We can check all the BeiweConfig objects associated with the project:
for c_path in r.configurations:
    c = r.configurations[c_path]
    for sid in c.surveys:
        s = c.surveys[sid]
        if s.type == 'dummy':
            print('Dummy survey %s from %s has been deleted: %s' % (s.identifier, c.name, str(s.deleted)))

Dummy survey 58b77ae646b9fc10707b0004 from Test Study #1 has been deleted: True
Dummy survey 57ffc4ca1206f77c3dfcdfb3 from Test Study #1 has been deleted: True
Dummy survey 57ff944e1206f77c3dfcc869 from Test Study #1 has been deleted: True


In [22]:
# As before, review the project summary and flagged identifiers:
r.summary.print()



----------------------------------------------------------------------
Overview
----------------------------------------------------------------------
    Unique Beiwe Users: 6
    Study Name(s):
        HSPH Onnela Lab GPS Testing
        HSPH Onnela Lab Passive Data High Sampling
        HSPH Onnela Lab iOS Test Study 1
        HSPH Onnela Lab iOS Test Study 2
        Test Study #1
    Raw Data Directories: 5
    First Observation: 2016-01-26 19:00:00 UTC
    Last  Observation: 2017-02-13 13:00:00 UTC
    Project Duration: 383.8 days

----------------------------------------------------------------------
Device Summary
----------------------------------------------------------------------
    iPhone Users: 4
    Android Users: 2

----------------------------------------------------------------------
Registry Summary
----------------------------------------------------------------------
    Raw Files: 18039
    Storage: 4.9 GB
    Irregular Directories: 0
    Unregistered Files: 0



In [23]:
# The logging warning indicated that survey names were prefixed with study names.
# This was done to ensure that all names are unique, e.g. several studies may have "Survey 1".
# However, long object names will be cumbersome for plots, survey scoring, and documentation.
# Ideally, this should be addressed before creating a BeiweProject.
# To do this, first assign some shorter names:
new_paths = {}
for n in study_names:
    temp = BeiweConfig(config[n])
    old_names = list(temp.name_assignments.values())
    new_names = [s.replace(temp.name, 'n').replace('Survey ', 'S') for s in old_names]
    temp.update_names(dict(zip(old_names, new_names)))
    new_paths[config[n]] = temp.export(test_directory, track_time = False)

# Then update the configuration assignments:    
for i in configurations:
    configurations[i] = new_paths[configurations[i]]

# And finally update the project with the new configuration paths:
r.update_configurations(configurations)

# Here is what the new names look like:
lines = r.summary.to_string().split('\n')
for i in lines[38:60]: print(i)

INFO:beiwetools.manage.classes:Loaded configuration files.
INFO:beiwetools.manage.classes:Finished reading study names.
INFO:beiwetools.manage.classes:Finished reading object names.
INFO:beiwetools.manage.classes:Updated user name assignments.


        S1                             30      2.1 kB

    survey_timings:
                                    Files     Storage
        55db4c0597013e3fb50376a7      108     33.0 kB
        5613cfd497013e703b725e62        1   472 Bytes
        5751ca931206f715072b96c5        6      2.0 kB
        5751cc171206f715072b96c6        3     10.5 kB
        5756e64e1206f73b274d4e54        8      5.8 kB
        57587d961206f706f213fb8a        2      4.8 kB
        57587e411206f706f213fbb8        5      2.1 kB
        575f0ee81206f707453870f7      140     43.6 kB
        57ffc4ca1206f77c3dfcdfb3       13     14.8 kB
        S1                             32      8.9 kB

----------------------------------------------------------------------
Flagged Identifiers
----------------------------------------------------------------------
    ignored_users: 0
    no_registry: 0
    without_data: 0
    no_identifiers: 0
    irregular_directories: 0
    multiple_devices: 0
    unknown_os: 0
    unnamed_obj
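
The renaming step in the cell above is plain string replacement, so it can be tried out on individual names first. In this sketch, `shorten` is a hypothetical helper, and the prefixed form `'Test Study #1 Survey 1'` is an assumed example of a name that contains its study name. The exact prefix format used by `beiwetools` is not shown in this notebook.

```python
def shorten(name, study_name, study_label):
    """Swap a long study name for a short label and abbreviate 'Survey '."""
    return name.replace(study_name, study_label).replace('Survey ', 'S')

# A name without a study prefix only has 'Survey ' abbreviated:
short_plain = shorten('Survey 1', 'Test Study #1', 'n')

# An assumed prefixed name, using the short study label 'Test':
short_prefixed = shorten('Test Study #1 Survey 1', 'Test Study #1', 'Test')
```

Note that the cell above passes the literal string `'n'` as the replacement, so an assumed prefixed name would become `'n S1'`. Passing the loop variable `n` instead would insert the short study label.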

In [30]:
r.lookup['object_name']

# Here is what the new names look like:
lines = r.summary.to_string().split('\n')
for i in lines[45:72]: print(i)


----------------------------------------------------------------------
Survey Data
----------------------------------------------------------------------

    survey_answers:
                                    Files     Storage
        5613cfd497013e703b725e62        1   195 Bytes
        5751ca931206f715072b96c5        2      1.5 kB
        5756e64e1206f73b274d4e54        3      1.4 kB
        575f0ee81206f707453870f7        8      4.6 kB
        57ffc4ca1206f77c3dfcdfb3        7      3.2 kB
        S1                             30      2.1 kB

    survey_timings:
                                    Files     Storage
        55db4c0597013e3fb50376a7      108     33.0 kB
        5613cfd497013e703b725e62        1   472 Bytes
        5751ca931206f715072b96c5        6      2.0 kB
        5751cc171206f715072b96c6        3     10.5 kB
        5756e64e1206f73b274d4e54        8      5.8 kB
        57587d961206f706f213fb8a        2      4.8 kB
        57587e411206f706f213fbb8        5      

In [None]:
# Use BeiweProject objects to get dictionaries of filepath registries and passive data settings:

r.assemble('gps')

r.settings('')



In [86]:
# If desired, uncomment the following lines and delete the test output directory:
# import shutil
# shutil.rmtree(test_directory)