<h1>Empatica Watch Session Fetcher</h1>

<p>The purpose of this notebook is very similar to that of the DataFetcher notebook: we'll be scraping data from Empatica's website. The only difference is that this notebook gathers only a user-specified number of the most recent sessions a user has recorded, instead of every session they've ever logged.</p>

<p>A successful implementation of this notebook will allow a user to say "I want my most recent 5 sessions each saved to their own folder for further analysis down the line".</p>

<p>Let's get right into it in a similar vein to the DataFetcher notebook.</p>

<h3>Pertinent Library Importing</h3>

<p>We'll be importing essentially the same libraries as we did for the overall DataFetcher notebook.</p>

In [1]:
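# This notebook targets Python 2 (print statements, raw_input, urllib2, StringIO).
import os
import json
import getpass
import subprocess
from datetime import datetime
import requests
import zipfile
import StringIO  # Python 2 only; io.BytesIO is the Python 3 equivalent for binary data
from shutil import copyfile
from bs4 import BeautifulSoup
from urllib2 import urlopen
from requests.auth import HTTPBasicAuth
import ssl
# Disables HTTPS certificate verification for urllib2/httplib calls (insecure).
# This does not affect requests, which verifies certificates by default.
ssl._create_default_https_context = ssl._create_unverified_context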

<h3>Folder Structure Understanding</h3>

<p>We're going to copy the scripts in our current directory into each session folder, much as we did in the overall data script: every session gets its own folder, and each script is placed in every one of those folders.</p>

In [2]:
!ls

SessionFetcher.ipynb addHeaders.sh


<p>So there is one main script that we will copy into each session directory (addHeaders.sh). This is very likely to change as we add MATLAB signal analysis scripts to each session. In the future we'll let users choose, from within these Python notebooks, which MATLAB analysis scripts get added.</p>
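<p>The copy step described above isn't shown in the cells below, so here is a minimal sketch of it, assuming the session folders already exist. The helper name copy_scripts_into_sessions is hypothetical. It uses shutil.copy rather than the imported copyfile, because shutil.copy also copies permission bits, so an executable script such as addHeaders.sh stays executable in each session folder.</p>

```python
import os
import shutil


def copy_scripts_into_sessions(script_paths, session_dirs):
    """Copy every analysis script into every session directory (hypothetical helper).

    shutil.copy (unlike shutil.copyfile) also copies the permission bits,
    so executable shell scripts remain executable after copying.
    """
    copied = []
    for session_dir in session_dirs:
        for script in script_paths:
            destination = os.path.join(session_dir, os.path.basename(script))
            shutil.copy(script, destination)
            copied.append(destination)
    return copied
```

<p>For example, copy_scripts_into_sessions(['addHeaders.sh'], session_folders) would place one copy of addHeaders.sh in each folder listed in session_folders.</p>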

In [3]:
# Count the non-hidden subdirectories in the current directory (one per downloaded session).
folder_nums = int(subprocess.check_output('find ./* -maxdepth 0 -type d | wc -l', shell=True))
print folder_nums

0


<p>As expected: we haven't downloaded any sessions yet, so there are no folders in our current directory.</p>
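<p>The same count can be taken without shelling out to find and wc. Below is a minimal sketch; the function name is hypothetical. It mirrors the shell command: the glob ./* skips dot-prefixed names, -maxdepth 0 means no recursion, and find -type d does not count symlinks to directories.</p>

```python
import os


def count_visible_subdirectories(path='.'):
    """Count non-hidden, non-symlink subdirectories directly inside `path`.

    Intended to match `find ./* -maxdepth 0 -type d | wc -l` run from `path`.
    """
    count = 0
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if name.startswith('.'):
            continue  # the shell glob * does not match hidden entries
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            count += 1
    return count
```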

<h3>Empatica Site Exploration</h3>

In [4]:
# Base URL to navigate the data on their site. Every URL will be an extension of this.
base_url = 'https://www.empatica.com/connect/'

# Start an HTTP session. It keeps cookies between requests, which is what lets us stay "logged in".
s = requests.Session()

<p>So now we have an HTTP session for Empatica's site, but we haven't logged in with it yet. That hardly does us any good, so we'll supply the login credentials for the account linked to an individual patient's watch.</p>

In [5]:
# Collect the login credentials. requests sends this dict as a form-encoded POST body.
data = {'username':raw_input('\nWhat is your email: '), 'password':getpass.getpass('What is your password: ')}


What is your email: torment10@aim.com
What is your password: ········


In [6]:
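# Did the login work? A 200 only means the request succeeded;
# it doesn't by itself prove the credentials were accepted.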
login_response = s.post(base_url+'authenticate.php', data)
print login_response.status_code

200


In [7]:
# Request the session list for user 19466 (the user ID is hard-coded). The response body is JSON, not HTML.
sessions_response_html = s.get(base_url+'connect.php/users/19466/sessions?from=0&to=999999999999')
print sessions_response_html

<Response [200]>
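<p>The user ID 19466 is hard-coded in the URL above. A small, hypothetical helper like the one below builds the same URL for any user ID, assuming the endpoint is shaped the same way for other accounts. It keeps the notebook's from/to values as defaults and handles both Python 2 and 3 imports.</p>

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, matching this notebook


def sessions_url(base_url, user_id, start=0, end=999999999999):
    """Build the session-list URL used above for an arbitrary user ID (hypothetical helper).

    A list of pairs (rather than a dict) keeps the query parameters in a fixed order.
    """
    query = urlencode([('from', start), ('to', end)])
    return '{}connect.php/users/{}/sessions?{}'.format(base_url, user_id, query)
```

<p>Calling sessions_url(base_url, 19466) reproduces the URL requested in the cell above.</p>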


In [8]:
# We don't just want a response status, we want data! The response body is JSON text.
sessions_json = sessions_response_html.text

In [9]:
# Parse the JSON text into Python objects: a list with one dict per session.
parsed = json.loads(sessions_json)

In [10]:
# Pretty-print the parsed data so it's easier to read. json.dumps returns a string, so we keep indexing `parsed`.
sessions_list = json.dumps(parsed, indent=4, sort_keys=True)
print sessions_list

[
    {
        "device": "E4 2.1", 
        "device_id": "fc9618", 
        "duration": "10567", 
        "exit_code": "0", 
        "id": "548689", 
        "label": "5034", 
        "start_time": "1537260782", 
        "status": "0"
    }, 
    {
        "device": "E4 2.1", 
        "device_id": "fc9618", 
        "duration": "7590", 
        "exit_code": "0", 
        "id": "548769", 
        "label": "5034", 
        "start_time": "1537271996", 
        "status": "0"
    }, 
    {
        "device": "E4 2.1", 
        "device_id": "fc9618", 
        "duration": "3561", 
        "exit_code": "0", 
        "id": "548878", 
        "label": "5034", 
        "start_time": "1537284101", 
        "status": "0"
    }, 
    {
        "device": "E4 2.1", 
        "device_id": "fc9618", 
        "duration": "17427", 
        "exit_code": "0", 
        "id": "549911", 
        "label": "5034", 
        "start_time": "1537431643", 
        "status": "0"
    }, 
    {
        "device": "E4 2.1"
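<p>The two json calls above go in opposite directions, which is easy to mix up. This small sketch uses one record copied from the output above: json.loads turns JSON text into Python lists and dicts that can be indexed, while json.dumps turns Python objects back into a string that is only useful for display or saving. Note that every field, including start_time and duration, arrives as a string.</p>

```python
import json

# One session record copied from the session list printed above.
raw_text = ('[{"device": "E4 2.1", "device_id": "fc9618", "duration": "10567", '
            '"exit_code": "0", "id": "548689", "label": "5034", '
            '"start_time": "1537260782", "status": "0"}]')

# json.loads: JSON text -> Python objects (here, a list of dicts).
sessions = json.loads(raw_text)
print(sessions[0]['id'])  # prints 548689

# Numeric-looking fields are strings and must be converted before arithmetic.
duration_seconds = int(sessions[0]['duration'])

# json.dumps: Python objects -> JSON text, a plain string.
pretty = json.dumps(sessions, indent=4, sort_keys=True)
print(type(pretty).__name__)  # prints str
```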

<h3>Local Saving</h3>

In [11]:
# Get a variable for the number of sessions that we see from the website.
num_sessions = len(parsed)
print num_sessions

42


<p>So we have 42 sessions overall. That isn't so many that downloading them all would take too long, but suppose we had 400 sessions and had just run a lab protocol that spanned only 5 of them. Downloading the 395 extra sessions wouldn't do us any good and would waste time and storage. So we'll ask the user how many sessions their lab protocol spanned.</p>

In [12]:
lab_length = int(raw_input('How many Empatica sessions did your lab protocol span?: '))

How many Empatica sessions did your lab protocol span?: 10


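<p>So we've got our user input. We already have the full set of sessions for the user in parsed, so let's take the subset representing the last 'lab_length' E4 sessions. This assumes the API returns sessions oldest first, which matches the ascending start times printed above.</p>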

In [13]:
last_lab_sessions = parsed[-lab_length:]
id_list = [session['id'] for session in last_lab_sessions]
print len(last_lab_sessions)

10
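<p>The slice parsed[-lab_length:] has two quiet assumptions: that the sessions are already in chronological order, and that lab_length is at least 1 (parsed[-0:] returns the whole list, not an empty one). A minimal, more defensive sketch with a hypothetical helper name sorts by start_time and validates the count first.</p>

```python
def most_recent_sessions(sessions, n):
    """Return the n most recent sessions, oldest first (hypothetical helper).

    Sorting by start_time removes any dependence on the API's ordering.
    The range check matters because sessions[-0:] is the whole list.
    """
    if not 1 <= n <= len(sessions):
        raise ValueError('n must be between 1 and {}, got {}'.format(len(sessions), n))
    ordered = sorted(sessions, key=lambda session: int(session['start_time']))
    return ordered[-n:]
```

<p>In this notebook it would be used as last_lab_sessions = most_recent_sessions(parsed, lab_length).</p>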


<p>Let's see the dates for each of these sessions and make sure they're appropriately close.</p>

In [14]:
# UTC start time of each session, formatted for use as a folder name.
# Note: these names contain colons, which Windows does not allow in file names.
date_list_datetime = [datetime.utcfromtimestamp(int(session['start_time'])).strftime('%Y-%m-%d-%H:%M:%S')
                      for session in last_lab_sessions]
print date_list_datetime
print len(date_list_datetime)

['2018-10-25-07:02:36', '2018-10-25-07:08:27', '2018-10-25-07:13:06', '2018-10-25-07:17:49', '2018-10-25-07:22:19', '2018-10-25-07:28:31', '2018-10-25-07:35:40', '2018-10-25-07:44:30', '2018-10-25-07:48:28', '2018-10-25-07:52:32']
10


<p>Awesome, we've got 10 dates that are all within an hour of each other. Nice efficient lab session.</p>
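<p>The "within an hour" check can be made explicit rather than eyeballed. This is a Python 3 sketch (the notebook itself is Python 2) with hypothetical function names: it formats start times with a timezone-aware datetime, since datetime.utcfromtimestamp is deprecated as of Python 3.12, and measures the spread between the earliest and latest start times in seconds.</p>

```python
from datetime import datetime, timezone


def session_start_strings(sessions, fmt='%Y-%m-%d-%H:%M:%S'):
    """Format each session's Unix start_time as a UTC string."""
    return [
        datetime.fromtimestamp(int(session['start_time']), tz=timezone.utc).strftime(fmt)
        for session in sessions
    ]


def start_time_span_seconds(sessions):
    """Seconds between the earliest and latest session start times."""
    starts = [int(session['start_time']) for session in sessions]
    return max(starts) - min(starts)
```

<p>For example, start_time_span_seconds(last_lab_sessions) < 3600 would confirm that every session started within the same hour.</p>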

<p>Now let's get into the actual logic to save these sessions to our computer.</p>
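<p>The heart of the download loop below is unpacking a zip archive held in memory and writing the session's metadata beside it. Isolating that step as a function (the name save_session and its folder_name parameter are hypothetical) lets it be tested without a network connection. This sketch uses io.BytesIO, which works on both Python 2 and 3, where StringIO.StringIO is Python 2 only.</p>

```python
import io
import json
import os
import zipfile


def save_session(zip_bytes, session, parent_dir, folder_name):
    """Unpack one downloaded session zip into parent_dir/folder_name (hypothetical helper).

    zip_bytes is the raw response body (download.content in the loop);
    session is that session's metadata dict, saved as json_info.json.
    """
    session_dir = os.path.join(parent_dir, folder_name)
    if not os.path.isdir(session_dir):
        os.makedirs(session_dir)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(session_dir)
    with open(os.path.join(session_dir, 'json_info.json'), 'w') as outfile:
        json.dump(session, outfile)
    return session_dir
```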

In [15]:
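downloads_url = base_url + 'download.php'

for i, id_number in enumerate(id_list):
    print "Downloading Session: ", id_number
    # requests builds the ?id=<session id> query string from params
    download = s.get(downloads_url, params={'id': id_number})
    download.raise_for_status()  # stop on HTTP errors (4xx/5xx) instead of trying to unzip them

    # The response body is a zip archive; read it from memory without saving it first
    z = zipfile.ZipFile(StringIO.StringIO(download.content))

    # Name each session's folder after its UTC start time
    timestamp = date_list_datetime[i]
    my_dir = os.path.join(os.getcwd(), timestamp)

    if not os.path.isdir(my_dir):
        os.makedirs(my_dir)
    z.extractall(my_dir)

    # Save the session's metadata alongside its data files
    with open(os.path.join(my_dir, 'json_info.json'), 'w') as outfile:
        json.dump(last_lab_sessions[i], outfile)

    print "Download complete to folder", timestamp, "\n"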

Downloading Session:  568011
Download complete to folder 2018-10-25-07:02:36 

Downloading Session:  568017
Download complete to folder 2018-10-25-07:08:27 

Downloading Session:  568022
Download complete to folder 2018-10-25-07:13:06 

Downloading Session:  568027
Download complete to folder 2018-10-25-07:17:49 

Downloading Session:  568029
Download complete to folder 2018-10-25-07:22:19 

Downloading Session:  568033
Download complete to folder 2018-10-25-07:28:31 

Downloading Session:  568042
Download complete to folder 2018-10-25-07:35:40 

Downloading Session:  568051
Download complete to folder 2018-10-25-07:44:30 

Downloading Session:  568053
Download complete to folder 2018-10-25-07:48:28 

Downloading Session:  568054
Download complete to folder 2018-10-25-07:52:32 



<p>Absolutely gorgeous: 10 folders created, each named after its session's start time. We can group these into an overall lab-session folder manually with Unix commands, or do it in Python if that becomes a burden. Ideally the workflow would put all the date folders into an overall folder immediately. For now, let's run the Unix commands right here in the notebook with IPython's ! shell escape, yay Python.</p>

In [16]:
!mkdir Oct25LabSession

In [17]:
!ls

[34m2018-10-25-07:02:36[m[m  [34m2018-10-25-07:28:31[m[m  [34mOct25LabSession[m[m
[34m2018-10-25-07:08:27[m[m  [34m2018-10-25-07:35:40[m[m  SessionFetcher.ipynb
[34m2018-10-25-07:13:06[m[m  [34m2018-10-25-07:44:30[m[m  [31maddHeaders.sh[m[m
[34m2018-10-25-07:17:49[m[m  [34m2018-10-25-07:48:28[m[m
[34m2018-10-25-07:22:19[m[m  [34m2018-10-25-07:52:32[m[m


In [18]:
!mv 2018* Oct25LabSession/

In [19]:
!ls Oct25LabSession/

[34m2018-10-25-07:02:36[m[m [34m2018-10-25-07:17:49[m[m [34m2018-10-25-07:35:40[m[m [34m2018-10-25-07:52:32[m[m
[34m2018-10-25-07:08:27[m[m [34m2018-10-25-07:22:19[m[m [34m2018-10-25-07:44:30[m[m
[34m2018-10-25-07:13:06[m[m [34m2018-10-25-07:28:31[m[m [34m2018-10-25-07:48:28[m[m


In [20]:
!ls

[34mOct25LabSession[m[m      SessionFetcher.ipynb [31maddHeaders.sh[m[m


<p>Gorgeous.</p>
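<p>If the manual mkdir and mv step becomes a burden, the same grouping can be done in Python. This is a minimal sketch with a hypothetical function name; it creates the group folder if needed and moves every directory matching a glob pattern into it, skipping plain files and the group folder itself.</p>

```python
import glob
import os
import shutil


def group_session_folders(parent_dir, pattern, group_name):
    """Move folders in parent_dir matching pattern into parent_dir/group_name.

    Roughly equivalent to `mkdir group_name` followed by `mv pattern group_name/`.
    """
    group_dir = os.path.join(parent_dir, group_name)
    if not os.path.isdir(group_dir):
        os.makedirs(group_dir)
    moved = []
    for path in sorted(glob.glob(os.path.join(parent_dir, pattern))):
        if os.path.isdir(path) and os.path.abspath(path) != os.path.abspath(group_dir):
            shutil.move(path, group_dir)  # moves the folder inside group_dir
            moved.append(os.path.basename(path))
    return moved
```

<p>Here, group_session_folders('.', '2018*', 'Oct25LabSession') would reproduce cells 16 and 18.</p>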

<h3>Conclusion</h3>

<p>That's basically all we want. We've got all of our data downloaded and ready to analyze!</p>

<p>The next step for our signal processing happens inside the Oct25LabSession folder we've created. We need to do some data preparation and feature engineering for our signals in that directory, so I'll see you in the First BVP Lab Analysis Notebook!</p>