## Developing a recipe of your own 

So far, we have covered how to configure dcarte and use it to download the raw datasets collected by the UK DRI. We have also learned that dcarte has three derived domains that extend its capabilities by adding cleaning logic on top of the raw data.

In this final tutorial, we will create a new domain and a new derived dataset.

As always, we start by loading some key libraries.

In [1]:
import dcarte 
import pandas as pd 
import seaborn as sns 
import matplotlib.pyplot as plt
import numpy as np 


This notebook deconstructs the motion dataset in the base domain. We will walk through the steps used to create it, as an example that I hope you can build on in your own research.

In [2]:
dcarte.__version__

'0.3.33'

Here is a screenshot of the output from reloading the entire motion dataset, which shows the different parts that make up this simple building-block dataset.  
**Please don't run this during the tutorial: reconstructing the dataset completely takes around 60 minutes.**
![](imgs/figure-06.png)

After the initial download, we have up-to-date versions of the parent datasets needed to reconstruct the motion dataset. As a result, dcarte loads the dataset from the local store, which is very fast.
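
The "build once, then read the stored copy" behaviour described here can be illustrated in plain pandas. This is a minimal sketch, not dcarte's implementation: the `load_or_build` helper, the gzip-compressed pickle format and the cache file name are all assumptions for illustration, since the notebook does not show how dcarte stores its datasets.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_build(name, build, store):
    """Return DataFrame `name` from `store`, calling `build()` only if it is missing.

    Hypothetical helper illustrating a local-store cache; not part of dcarte.
    """
    path = Path(store) / f"{name}.pkl.gz"
    if path.exists():
        # fast path: read the previously stored copy
        return pd.read_pickle(path)
    # slow path: rebuild from the parent data, then store the result
    df = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_pickle(path)  # gzip compression is inferred from the .gz suffix
    return df


calls = []


def build_toy_motion():
    calls.append(1)  # record each (expensive) rebuild
    return pd.DataFrame({"patient_id": ["p1"], "location_name": ["Kitchen"]})


with tempfile.TemporaryDirectory() as store:
    first = load_or_build("motion", build_toy_motion, store)   # builds and stores
    second = load_or_build("motion", build_toy_motion, store)  # reads the stored copy

print(len(calls))  # 1
```

Only the first call pays the rebuild cost; later calls are a file read, which is why a stored dataset loads in seconds.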

In [3]:
motion = dcarte.load('motion', 'base')

Finished Loading motion in:                    2.2 seconds   


As the screenshot shows, the motion dataset is built from several parent datasets, all of which are up to date because we just reloaded them. Let's load each of them using dcarte.

In [4]:
activity = dcarte.load('activity', 'raw')
entryway = dcarte.load('entryway', 'base')
bed_occupancy = dcarte.load('bed_occupancy', 'base')

Finished Loading activity in:                  2.6 seconds   
Finished Loading entryway in:                  0.1 seconds   
Finished Loading bed_occupancy in:             0.1 seconds   


In [5]:
bed_occupancy.head(10)

|   | patient_id | start_date | location_name |
|---|------------|------------|---------------|
| 0 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 04:20:00 | Bed_out |
| 1 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 04:56:00 | Bed_in |
| 2 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 04:58:00 | Bed_out |
| 3 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 05:00:00 | Bed_in |
| 4 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 07:10:00 | Bed_out |
| 5 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 07:20:00 | Bed_in |
| 6 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 07:48:00 | Bed_out |
| 7 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 14:36:00 | Bed_in |
| 8 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 16:47:00 | Bed_out |
| 9 | 2GN1PHeHwRzNYQ7q4Nvg7g | 2021-05-29 16:57:00 | Bed_in |


In [6]:
entryway.head(10)

|   | patient_id | location_name | start_date | end_date | source | sink | transition | dur |
|---|------------|---------------|------------|----------|--------|------|------------|-----|
| 0 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 13:33:44 | 2021-05-14 13:34:56 | opened | closed | opened>closed | 72.0 |
| 1 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 13:34:56 | 2021-05-14 13:50:30 | opened | closed | opened>closed | 934.0 |
| 2 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 13:50:30 | 2021-05-14 13:50:32 | opened | closed | opened>closed | 2.0 |
| 3 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 13:50:32 | 2021-05-14 14:39:52 | opened | closed | opened>closed | 2960.0 |
| 4 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 14:41:35 | 2021-05-14 14:41:39 | opened | closed | opened>closed | 4.0 |
| 5 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 15:17:30 | 2021-05-14 15:17:38 | opened | closed | opened>closed | 8.0 |
| 6 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 15:18:52 | 2021-05-14 15:19:02 | opened | closed | opened>closed | 10.0 |
| 7 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 15:24:54 | 2021-05-14 15:24:59 | opened | closed | opened>closed | 5.0 |
| 8 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 15:39:56 | 2021-05-14 15:40:03 | opened | closed | opened>closed | 7.0 |
| 9 | 2GN1PHeHwRzNYQ7q4Nvg7g | front door | 2021-05-14 16:08:34 | 2021-05-14 16:08:50 | opened | closed | opened>closed | 16.0 |


In [7]:
activity.head(10)

|   | start_date | patient_id | home_id | location_id | location_name | source |
|---|------------|------------|---------|-------------|---------------|--------|
| 0 | 2020-12-09 14:53:17 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | PNPiyDzPr9baA5KeXBso6p | bedroom1 | raw_activity_pir |
| 1 | 2020-12-09 14:53:18 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | 2gdBq1z4fMLeNrc9C9JQXJ | kitchen | raw_activity_pir |
| 2 | 2020-12-09 14:53:18 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | TwGbQd3ozRP3DhtjUKzwSZ | bathroom1 | raw_activity_pir |
| 3 | 2020-12-09 14:53:19 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | PoSbtob4YKWLw2bwk1mEGy | lounge | raw_activity_pir |
| 4 | 2020-12-09 14:53:36 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | J4kNh4xK52vf6NhXA5Lvhi | hallway | raw_activity_pir |
| 5 | 2020-12-09 14:54:11 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | TwGbQd3ozRP3DhtjUKzwSZ | bathroom1 | raw_activity_pir |
| 6 | 2020-12-09 14:54:12 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | 2gdBq1z4fMLeNrc9C9JQXJ | kitchen | raw_activity_pir |
| 7 | 2020-12-09 14:54:13 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | PNPiyDzPr9baA5KeXBso6p | bedroom1 | raw_activity_pir |
| 8 | 2020-12-09 14:55:18 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | PNPiyDzPr9baA5KeXBso6p | bedroom1 | raw_activity_pir |
| 9 | 2020-12-09 14:59:53 | Dr1tnvKDk2bHG8YS21nwiR | 8dmPt4UuEdAC7p9XpWpMXQ | PNPiyDzPr9baA5KeXBso6p | bedroom1 | raw_activity_pir |


In [8]:
from dcarte.utils import localize_time
activity = localize_time(activity,['start_date'])
activity.head(10)

Europe/London


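
The cell above converts `start_date` with `localize_time`, and its output mentions `Europe/London`. The function's internals are not shown in this notebook. Assuming it treats the stored timestamps as UTC and converts them to the home's local wall-clock time, a rough pandas equivalent is sketched below; `to_local_naive` is a hypothetical name, not part of dcarte.

```python
import pandas as pd


def to_local_naive(df, columns, tz="Europe/London"):
    """Hypothetical stand-in for localize_time: UTC -> local wall-clock time.

    Assumes the given columns hold naive timestamps recorded in UTC.
    """
    out = df.copy()
    for col in columns:
        out[col] = (
            pd.to_datetime(out[col])
            .dt.tz_localize("UTC")   # declare the stored times as UTC
            .dt.tz_convert(tz)       # shift to local time (handles GMT/BST)
            .dt.tz_localize(None)    # drop tz info so it compares with naive data
        )
    return out


# toy timestamps: one in winter (GMT) and one in summer (BST)
toy = pd.DataFrame({"start_date": ["2020-12-09 14:53:17", "2021-05-14 13:34:46"]})
local = to_local_naive(toy, ["start_date"])
print(local)
```

Under these assumptions the December timestamp is unchanged, while the May timestamp moves forward one hour because of British Summer Time.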

In [None]:
# keep only the columns shared by all three parent datasets, then stack them
fact = ['patient_id', 'location_name', 'start_date']
motion = pd.concat([activity[fact], bed_occupancy[fact], entryway[fact]])

In [None]:
# order events chronologically within each participant
motion = motion.sort_values(['patient_id', 'start_date'])
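
To see what the concatenation and sort do, here is a toy version with three small frames standing in for `activity`, `bed_occupancy` and `entryway`; the rows are made up for illustration. Selecting `fact` drops the columns the sources do not share (and aligns their order), and sorting by `patient_id` then `start_date` interleaves the three sources into one timeline per participant.

```python
import pandas as pd

fact = ["patient_id", "location_name", "start_date"]

# toy stand-ins for the three parent datasets; extra columns are dropped via fact
activity = pd.DataFrame({
    "patient_id": ["p1", "p1"],
    "location_name": ["kitchen", "hallway"],
    "start_date": pd.to_datetime(["2021-05-29 04:10", "2021-05-29 04:30"]),
    "source": ["raw_activity_pir"] * 2,
})
bed_occupancy = pd.DataFrame({
    "patient_id": ["p1"],
    "start_date": pd.to_datetime(["2021-05-29 04:20"]),
    "location_name": ["Bed_out"],
})
entryway = pd.DataFrame({
    "patient_id": ["p1"],
    "location_name": ["front door"],
    "start_date": pd.to_datetime(["2021-05-29 04:40"]),
    "dur": [72.0],
})

motion = pd.concat([activity[fact], bed_occupancy[fact], entryway[fact]])
motion = motion.sort_values(["patient_id", "start_date"])
print(motion["location_name"].tolist())
# ['kitchen', 'Bed_out', 'hallway', 'front door']
```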

In [None]:
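# harmonise room labels into a small set of canonical locations;
# .replace (unlike .map) keeps labels that are missing from the mapping,
# e.g. 'Bed_in'/'Bed_out' from bed_occupancy
mapping = {'bathroom1': 'Bathroom',
           'WC1': 'Bathroom',
           'kitchen': 'Kitchen',
           'hallway': 'Hallway',
           'corridor1': 'Hallway',
           'dining room': 'Lounge',
           'living room': 'Lounge',
           'lounge': 'Lounge',
           'bedroom1': 'Bedroom',
           'front door': 'Front door',
           'back door': 'Back door'}
motion['location_name'] = motion['location_name'].replace(mapping)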

In [None]:
# drop rooms that fall outside the core set of locations
motion = motion[~motion['location_name'].isin(['office', 'conservatory',
                                               'study', 'cellar'])]
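
One subtlety in the mapping step: `Series.map` with a dict returns NaN for every label that is not a key, whereas `Series.replace` leaves such labels unchanged. With `map`, labels such as `Bed_in`, `Bed_out` or `office` would all become NaN, so the exclusion list above would never match anything and the bed events would lose their labels. The toy series below shows the difference.

```python
import pandas as pd

mapping = {"bathroom1": "Bathroom", "kitchen": "Kitchen", "front door": "Front door"}
labels = pd.Series(["kitchen", "Bed_in", "office", "front door"])
excluded = ["office", "conservatory", "study", "cellar"]

mapped = labels.map(mapping)        # unmapped labels -> NaN
replaced = labels.replace(mapping)  # unmapped labels kept as-is

# with map, 'office' is already NaN, so the exclusion filter removes nothing
print(mapped[~mapped.isin(excluded)].tolist())
# with replace, 'Bed_in' survives and 'office' is filtered out as intended
print(replaced[~replaced.isin(excluded)].tolist())
```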

In [None]:
motion.head(10)

|   | patient_id | location_name | start_date |
|---|------------|---------------|------------|
| 360442 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bathroom | 2021-05-14 13:34:46 |
| 360443 | 2GN1PHeHwRzNYQ7q4Nvg7g | Kitchen | 2021-05-14 13:34:51 |
| 360444 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bedroom | 2021-05-14 13:34:52 |
| 360447 | 2GN1PHeHwRzNYQ7q4Nvg7g | Hallway | 2021-05-14 13:34:55 |
| 360448 | 2GN1PHeHwRzNYQ7q4Nvg7g | Lounge | 2021-05-14 13:34:55 |
| 0 | 2GN1PHeHwRzNYQ7q4Nvg7g | Front door | 2021-05-14 13:34:56 |
| 360460 | 2GN1PHeHwRzNYQ7q4Nvg7g | Lounge | 2021-05-14 13:35:37 |
| 360479 | 2GN1PHeHwRzNYQ7q4Nvg7g | Lounge | 2021-05-14 13:36:49 |
| 360480 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bedroom | 2021-05-14 13:36:49 |
| 360481 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bathroom | 2021-05-14 13:36:50 |


In [None]:
# replace the duplicated indices inherited from the concatenated frames with 0..n-1
motion = motion.reset_index(drop=True)
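
The reset is needed because `pd.concat` keeps each input's original index, so the combined frame contains repeated labels (note the `0` among the 360000-range indices in the `motion.head(10)` output before the reset). A small check on toy frames:

```python
import pandas as pd

a = pd.DataFrame({"location_name": ["Kitchen", "Lounge"]})  # index 0, 1
b = pd.DataFrame({"location_name": ["Front door"]})         # index 0

combined = pd.concat([a, b])
print(combined.index.tolist(), combined.index.is_unique)  # [0, 1, 0] False

combined = combined.reset_index(drop=True)  # drop=True discards the old labels
print(combined.index.tolist(), combined.index.is_unique)  # [0, 1, 2] True
```

Resetting after the sort, rather than passing `ignore_index=True` to `pd.concat`, also makes the new index follow the sorted order.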

In [None]:
motion.head(10)

|   | patient_id | location_name | start_date |
|---|------------|---------------|------------|
| 0 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bathroom | 2021-05-14 13:34:46 |
| 1 | 2GN1PHeHwRzNYQ7q4Nvg7g | Kitchen | 2021-05-14 13:34:51 |
| 2 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bedroom | 2021-05-14 13:34:52 |
| 3 | 2GN1PHeHwRzNYQ7q4Nvg7g | Hallway | 2021-05-14 13:34:55 |
| 4 | 2GN1PHeHwRzNYQ7q4Nvg7g | Lounge | 2021-05-14 13:34:55 |
| 5 | 2GN1PHeHwRzNYQ7q4Nvg7g | Front door | 2021-05-14 13:34:56 |
| 6 | 2GN1PHeHwRzNYQ7q4Nvg7g | Lounge | 2021-05-14 13:35:37 |
| 7 | 2GN1PHeHwRzNYQ7q4Nvg7g | Lounge | 2021-05-14 13:36:49 |
| 8 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bedroom | 2021-05-14 13:36:49 |
| 9 | 2GN1PHeHwRzNYQ7q4Nvg7g | Bathroom | 2021-05-14 13:36:50 |


In [None]:
from dcarte.local import LocalDataset
from dcarte.config import get_config
cfg = get_config()


In [None]:
cfg

{'compression': 'GZIP',
 'data_folder': '/Users/eyalsoreq/dcarte/data',
 'domains': [{'dataset': 'vital_signs', 'domain': 'raw'},
  {'dataset': 'blood_pressure', 'domain': 'raw'},
  {'dataset': 'environmental', 'domain': 'raw'},
  {'dataset': 'activity', 'domain': 'raw'},
  {'dataset': 'door', 'domain': 'raw'},
  {'dataset': 'sleep_event', 'domain': 'raw'},
  {'dataset': 'sleep_mat', 'domain': 'raw'},
  {'dataset': 'behavioural', 'domain': 'raw'},
  {'dataset': 'appliances', 'domain': 'raw'},
  {'dataset': 'encounter', 'domain': 'raw'},
  {'dataset': 'procedure', 'domain': 'raw'},
  {'dataset': 'issue', 'domain': 'raw'},
  {'dataset': 'observation_notes', 'domain': 'raw'},
  {'dataset': 'device_types', 'domain': 'lookup'},
  {'dataset': 'patients', 'domain': 'lookup'},
  {'dataset': 'homes', 'domain': 'lookup'},
  {'dataset': 'activity_dailies', 'domain': 'profile'},
  {'dataset': 'activity_weeklies', 'domain': 'profile'},
  {'dataset': 'sleep_dailies', 'domain': 'profile'},
  {'datase

In [None]:
domain = 'base'
module = 'base'
# parent datasets the new recipe depends on, as [dataset, domain] pairs
parent_datasets = [['activity', 'raw'],
                   ['entryway', 'base'],
                   ['bed_occupancy', 'base']]
p_datasets = {d[0]: dcarte.load(*d) for d in parent_datasets}
# register 'motion_new' in the base domain; pipeline lists the recipe
# function(s) that build it, found in the module at module_path
_ = LocalDataset(dataset_name='motion_new',
                 datasets=p_datasets,
                 pipeline=['process_motion'],
                 domain=domain,
                 module=module,
                 module_path=f'{cfg["data_folder"]}/recipes/{domain}/{module}.py',
                 dependencies=parent_datasets)


Finished Loading activity in:                  0.6 seconds   
Finished Loading entryway in:                  0.1 seconds   
Finished Loading bed_occupancy in:             0.1 seconds   
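
The notebook does not show the `process_motion` function that the pipeline refers to. As a rough sketch only, the steps walked through above could be collected into a single function. We assume here that it receives the parent datasets as a dict of DataFrames keyed by dataset name (as in `p_datasets`) and returns the motion DataFrame; dcarte's actual calling convention for pipeline functions may differ, and `build_motion` is a hypothetical name. The sketch uses `replace` rather than `map` so that labels outside the mapping, such as `Bed_in`/`Bed_out`, keep their values.

```python
import pandas as pd

FACT = ["patient_id", "location_name", "start_date"]
MAPPING = {"bathroom1": "Bathroom", "WC1": "Bathroom", "kitchen": "Kitchen",
           "hallway": "Hallway", "corridor1": "Hallway",
           "dining room": "Lounge", "living room": "Lounge", "lounge": "Lounge",
           "bedroom1": "Bedroom", "front door": "Front door",
           "back door": "Back door"}
EXCLUDED = ["office", "conservatory", "study", "cellar"]


def build_motion(datasets):
    """Hypothetical recipe: combine parent datasets into one motion timeline.

    `datasets` is assumed to map dataset names to DataFrames, like p_datasets.
    Time-zone localisation of activity (localize_time above) is omitted here.
    """
    frames = [datasets[name][FACT]
              for name in ("activity", "bed_occupancy", "entryway")]
    motion = pd.concat(frames).sort_values(["patient_id", "start_date"])
    motion["location_name"] = motion["location_name"].replace(MAPPING)
    motion = motion[~motion["location_name"].isin(EXCLUDED)]
    return motion.reset_index(drop=True)


# usage on made-up toy data
toy = {
    "activity": pd.DataFrame({
        "patient_id": ["p1", "p1"],
        "location_name": ["kitchen", "office"],
        "start_date": pd.to_datetime(["2021-05-29 04:10", "2021-05-29 04:15"])}),
    "bed_occupancy": pd.DataFrame({
        "patient_id": ["p1"], "location_name": ["Bed_out"],
        "start_date": pd.to_datetime(["2021-05-29 04:20"])}),
    "entryway": pd.DataFrame({
        "patient_id": ["p1"], "location_name": ["front door"],
        "start_date": pd.to_datetime(["2021-05-29 04:40"])}),
}
result = build_motion(toy)
print(result)
```

Keeping the recipe a pure function of its inputs makes it easy to test on small frames like these before registering it with `LocalDataset`.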


In [None]:
dcarte.domains()

|   | RAW | BASE | LEGACY | PROFILE | SLEEP_STUDY | LOOKUP | BED_HABITS |
|---|-----|------|--------|---------|-------------|--------|------------|
| 0 | Activity | Bed_Occupancy | Device_Type | Activity_Dailies | Diurnal | Device_Types | Bed_Occupancy |
| 1 | Appliances | Doors | Doors | Activity_Weeklies | Nocturnal | Homes | |
| 2 | Behavioural | Entryway | Entryway | Light | Withings_Nights | Patients | |
| 3 | Blood_Pressure | Habitat | Flags | Physiology_Dailies | Withings_Tidy | | |
| 4 | Door | Kitchen | Light | Physiology_Weeklies | | | |
| 5 | Encounter | Motion | Motion | Sleep_Dailies | | | |
| 6 | Environmental | Motion_New | Observation | Sleep_Weeklies | | | |
| 7 | Issue | Physiology | Physiology | Temperature | | | |
| 8 | Observation_Notes | Sleep | Temperature | | | | |
| 9 | Procedure | Transitions | Wellbeing | | | | |
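
The config printed earlier stores the dataset registry as a list of `{'dataset': ..., 'domain': ...}` entries, and `dcarte.domains()` presents it with one column per domain; the newly registered `Motion_New` now appears under BASE. The sketch below rebuilds a table of that shape from a few such entries with pandas. It illustrates the layout only and is not dcarte's implementation; the title-casing and sorting rules are assumptions.

```python
import pandas as pd

# a few registry entries in the same shape as cfg['domains']
entries = [
    {"dataset": "activity", "domain": "raw"},
    {"dataset": "door", "domain": "raw"},
    {"dataset": "motion", "domain": "base"},
    {"dataset": "motion_new", "domain": "base"},
    {"dataset": "homes", "domain": "lookup"},
]

reg = pd.DataFrame(entries)
# one column per domain; dataset names title-cased and sorted within each column
columns = {
    domain.upper(): sorted(name.title() for name in group["dataset"])
    for domain, group in reg.groupby("domain")
}
# columns of unequal length are padded with NaN when aligned on the index
table = pd.DataFrame({k: pd.Series(v) for k, v in columns.items()})
print(table.fillna(""))
```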


In [None]:
dcarte.load('Homes','lookup')

Processing homes :*

Downloading homes: 100%|██████████| 1/1 [00:02<00:00,  2.57s/it]

Finished Loading Homes in:                    15.4 seconds   





|   | home_id | patient_id | n_known_occupants | source |
|---|---------|------------|-------------------|--------|
| 0 | 8dmPt4UuEdAC7p9XpWpMXQ | Dr1tnvKDk2bHG8YS21nwiR | 1 | homes |
| 1 | 3urzngo8P3h1Qpj4LJ1L6F | BLDnFyF4xzynLB7BhRd9XL | 1 | homes |
| 2 | G33KwgsR6sA9PnsrM2xgtg | JYN9EVX3wyv76VbubFPpUB | 1 | homes |
| 3 | XHLVPner73QuYyJ7Q5UBrs | H4pbjYo1fKg2KmgyZarpy2 | 1 | homes |
| 4 | Q2XVykEm73scBgKQgwVcwu | 5xSoeEijWNFxswzMEAih71 | 1 | homes |
| ... | ... | ... | ... | ... |
| 144 | Xssm3gdFMthPYSo13QBbxi | RxCHTWbYDmuRtYBfVxGtfP | 1 | homes |
| 145 | VXoxkPHAXmXXLfs4QAZbnZ | LkREF5vWKVcNDP5EkPdJu2 | 1 | homes |
| 146 | TUrxezEXJntW4YxUesbpwE | 9MKzv1CrANBHCfwTJ3pKDS | 1 | homes |
| 147 | VyySRztD7uqjWjM3gBfsvS | KNdFgnVbDHDVXaSNeCoWGG | 1 | homes |
