# Welcome to the Healthcare Hackathon

This weekend you will work on critical healthcare problems using your data science skills. This notebook helps you get up and running quickly with the data. Before we get to the data, configure your git username and email.

In [1]:
!git config --global user.name Francisco
!git config --global user.email francisconfqsimoes@gmail.com

Next, set up a git branch. This is the branch you will use during the hackathon.

In [2]:
# Fill in the name of your team as the branch name
!git checkout -b mosquitopie_prod
!git push --set-upstream origin $(git rev-parse --abbrev-ref HEAD)

M	introduction.ipynb
Switched to a new branch 'mosquitopie_prod'
Total 0 (delta 0), reused 0 (delta 0)
To https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/cp-hackathon-repository
 * [new branch]      mosquitopie_prod -> mosquitopie_prod
Branch mosquitopie_prod set up to track remote branch mosquitopie_prod from origin.


Now you have a branch to work with. To commit your changes, run the following snippet: `!git commit -a -m "Add a descriptive message here for your commits"`. Because the upstream branch is already set, you can then run `!git push` to push your commits.

# Getting Started with the data

Several data sets are available this weekend, and each one can be downloaded from S3. We created a separate folder for each subject in the hackathon. The following folders are available: CarePay, MCF, MomCare, NWH, RedCross, SafeCare, SatelliteImages. You can use the following code snippet to list the data in the S3 bucket and download it:

In [11]:
import boto3
import pandas as pd
from sagemaker import get_execution_role

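def list_data(folder):
    """
    List all available data sets in the folder
    """
    s3 = boto3.resource('s3')
    bucket = 'cp-hackathon-data'
    hackathon_data = s3.Bucket(bucket)
    # Collect the key of every object stored under the folder prefix
    return [obj.key for obj in hackathon_data.objects.filter(Prefix=folder)]


def get_data(key):
    """
    Download the CSV file stored under key and return it as a DataFrame
    """
    # The role is looked up but not passed to pandas,
    # which reads with the notebook's own credentials
    role = get_execution_role()
    bucket = 'cp-hackathon-data'
    data_location = 's3://{}/{}'.format(bucket, key)
    # pandas needs the s3fs package to read s3:// paths
    return pd.read_csv(data_location)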

With the `get_data` function you can download the relevant data set from S3. If you prefer R, you can achieve something similar with the following snippet:

```
library(reticulate)
sagemaker <- import('sagemaker')
session <- sagemaker$Session()
role_arn <- sagemaker$get_execution_role()
folder <- <fill in your folder here>
session$list_s3_files('cp-hackathon-data', folder)
session$read_s3_file('cp-hackathon-data', <some key in the bucket here>)
```

For more information on the `Session` methods used in the R snippet, check out [this link to the SageMaker Python SDK documentation](https://sagemaker.readthedocs.io/en/stable/session.html#sagemaker.session.Session.read_s3_file).

In [12]:
list_data('RedCross')

['RedCross/',
 'RedCross/ovitrap_data.csv',
 'RedCross/ovitrap_data_monthly_province.csv',
 'RedCross/weather_data_monthly_province.csv']
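The listing above includes the folder placeholder `RedCross/` next to the actual files. If you want to loop over the files only, a small filter helps. The sketch below is a suggestion and not part of the hackathon setup; `csv_keys` is a hypothetical helper name, and the key list is copied from the output above so the cell runs without S3 access.

```python
def csv_keys(keys):
    """Keep only keys that point at CSV files, skipping folder placeholders."""
    return [key for key in keys if key.lower().endswith('.csv')]


# Keys copied from the list_data('RedCross') output above
redcross_keys = [
    'RedCross/',
    'RedCross/ovitrap_data.csv',
    'RedCross/ovitrap_data_monthly_province.csv',
    'RedCross/weather_data_monthly_province.csv',
]

redcross_csvs = csv_keys(redcross_keys)
print(redcross_csvs)
```

In the notebook you could combine this with the functions defined earlier, for example `{key: get_data(key) for key in csv_keys(list_data('RedCross'))}`, to load every CSV in a folder into a dictionary of DataFrames.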

In [20]:
data_ovitrap_per_school = get_data('RedCross/ovitrap_data.csv')

Couldn't call 'get_role' to get Role ARN from role name cp-sagemaker-hackathon-role to get Role path.


In [21]:
data_ovitrap_per_province = get_data('RedCross/ovitrap_data_monthly_province.csv')

Couldn't call 'get_role' to get Role ARN from role name cp-sagemaker-hackathon-role to get Role path.


In [15]:
data_weather = get_data('RedCross/weather_data_monthly_province.csv')

Couldn't call 'get_role' to get Role ARN from role name cp-sagemaker-hackathon-role to get Role path.


# Happy Hacking!

In [23]:
data_ovitrap_per_school[:5]

Unnamed: 0.1,Unnamed: 0,date,id,latitude,longitude,name,value
0,0,2012-06-19,3,14.5044,121.056,Kapitan Jose Cardones Memorial School,17.31
1,1,2012-07-03,3,14.5044,121.056,Kapitan Jose Cardones Memorial School,15.52
2,2,2012-07-17,3,14.5044,121.056,Kapitan Jose Cardones Memorial School,30.0
3,3,2012-07-24,3,14.5044,121.056,Kapitan Jose Cardones Memorial School,22.64
4,4,2012-09-11,3,14.5044,121.056,Kapitan Jose Cardones Memorial School,28.13
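The preview shows `Unnamed: ...` columns that look like row indices saved into the CSV, and `date` is read as plain text. A minimal cleanup sketch follows, assuming those columns carry no information you need. The `tidy` helper is a hypothetical name, and the two sample rows are copied from the output above so the cell runs without S3 access.

```python
import io

import pandas as pd

# Two rows copied from the preview above, standing in for the real file
sample_csv = io.StringIO(
    "Unnamed: 0.1,Unnamed: 0,date,id,latitude,longitude,name,value\n"
    "0,0,2012-06-19,3,14.5044,121.056,Kapitan Jose Cardones Memorial School,17.31\n"
    "1,1,2012-07-03,3,14.5044,121.056,Kapitan Jose Cardones Memorial School,15.52\n"
)
sample = pd.read_csv(sample_csv)


def tidy(df):
    """Drop leftover 'Unnamed: ...' columns and parse the date column."""
    unnamed = [col for col in df.columns if col.startswith('Unnamed')]
    out = df.drop(columns=unnamed)  # drop returns a new DataFrame
    out['date'] = pd.to_datetime(out['date'])
    return out


tidy_sample = tidy(sample)
print(tidy_sample.dtypes)
```

On the real data the same call would be `tidy(data_ovitrap_per_school)`. Parsing the dates makes it easier to resample or group the readings by month later on.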


In [34]:
data_ovitrap_per_province[10:15] 

Unnamed: 0,adm_level,date,count_ovi,mean_ovi,error_ovi,error_relative_ovi
1000,Ifugao,2015-05-01,4.0,17.0,0.0,0.0
1001,Ifugao,2015-06-01,0.0,,,
1002,Ifugao,2015-07-01,32.0,17.283437,4.999864,0.289286
1003,Ifugao,2015-08-01,43.0,25.156279,4.983643,0.198107
1004,Ifugao,2015-09-01,44.0,27.520682,4.536097,0.164825


In [35]:
data_ovitrap_per_province[1000:1005]

Unnamed: 0,adm_level,date,count_ovi,mean_ovi,error_ovi,error_relative_ovi
1000,Ifugao,2015-05-01,4.0,17.0,0.0,0.0
1001,Ifugao,2015-06-01,0.0,,,
1002,Ifugao,2015-07-01,32.0,17.283437,4.999864,0.289286
1003,Ifugao,2015-08-01,43.0,25.156279,4.983643,0.198107
1004,Ifugao,2015-09-01,44.0,27.520682,4.536097,0.164825
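In the rows above, the month with `count_ovi` equal to 0 has no `mean_ovi` or error values. Before modelling it is worth checking whether that pattern holds and deciding how to treat those months. The sketch below uses three rows copied from the output above; whether the pattern holds across the whole file is an assumption you should verify on the real DataFrame.

```python
import io

import pandas as pd

# Three rows copied from the preview above (index column left out)
province_csv = io.StringIO(
    "adm_level,date,count_ovi,mean_ovi,error_ovi,error_relative_ovi\n"
    "Ifugao,2015-05-01,4.0,17.0,0.0,0.0\n"
    "Ifugao,2015-06-01,0.0,,,\n"
    "Ifugao,2015-07-01,32.0,17.283437,4.999864,0.289286\n"
)
province_sample = pd.read_csv(province_csv)

# Empty CSV fields are read as NaN
missing = province_sample['mean_ovi'].isna()
no_readings = province_sample['count_ovi'] == 0

# True when every missing mean coincides with a month without readings
print((missing == no_readings).all())

# One option: keep only the months that have a mean value
with_readings = province_sample.dropna(subset=['mean_ovi'])
print(len(with_readings))
```

Dropping the rows is only one option. Depending on your model you may prefer to keep them and impute the missing values.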


In [25]:
data_weather[:5]

Unnamed: 0,adm_level,date,JAXA_GPM_L3_GSMaP_v6_operational_hourlyPrecipRateGC,MODIS_006_MOD11A1_LST_Day_1km,MODIS_006_MOD11A1_LST_Night_1km,MODIS_006_MYD13A1_EVI,NASA_FLDAS_NOAH01_C_GL_M_V001_Qair_f_tavg,NASA_FLDAS_NOAH01_C_GL_M_V001_Rainf_f_tavg,NASA_FLDAS_NOAH01_C_GL_M_V001_SoilMoi00_10cm_tavg,NASA_FLDAS_NOAH01_C_GL_M_V001_SoilTemp00_10cm_tavg,NASA_FLDAS_NOAH01_C_GL_M_V001_Tair_f_tavg,NASA_FLDAS_NOAH01_C_GL_M_V001_Wind_f_tavg
0,Abra,2012-06-01,0.430867,15043.554,14639.858,4180.815,0.016763,0.000172,0.425508,296.31796,297.09457,4.97715
1,Abra,2012-07-01,0.373154,15020.108,14632.604,3662.6284,0.016362,0.000186,0.428822,295.84708,296.72488,3.478785
2,Abra,2012-08-01,0.620665,14917.032,14486.046,3183.575,0.016541,0.000245,0.43218,295.6657,296.15265,6.328744
3,Abra,2012-09-01,0.10662,14980.0205,14696.892,4785.534,0.015967,9.9e-05,0.419719,296.26144,296.81122,3.009687
4,Abra,2012-10-01,0.053113,14986.928,14666.716,4770.0083,0.014127,3.9e-05,0.390687,295.36737,295.53925,4.924677
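The weather columns keep the long names of their source products, which makes later code hard to read. The sketch below renames two of them and converts the air temperature to degrees Celsius. The short names are hypothetical, and the conversion assumes the `Tair_f_tavg` values are in Kelvin, which values near 296 to 297 suggest but which you should confirm against the data documentation. The two rows are copied from the output above.

```python
import io

import pandas as pd

# Two rows and two weather columns copied from the preview above
weather_csv = io.StringIO(
    "adm_level,date,"
    "NASA_FLDAS_NOAH01_C_GL_M_V001_Tair_f_tavg,"
    "NASA_FLDAS_NOAH01_C_GL_M_V001_Wind_f_tavg\n"
    "Abra,2012-06-01,297.09457,4.97715\n"
    "Abra,2012-07-01,296.72488,3.478785\n"
)
weather_sample = pd.read_csv(weather_csv)

# Hypothetical short names; pick whatever suits your team
short_names = {
    'NASA_FLDAS_NOAH01_C_GL_M_V001_Tair_f_tavg': 'air_temp_k',
    'NASA_FLDAS_NOAH01_C_GL_M_V001_Wind_f_tavg': 'wind',
}
weather_short = weather_sample.rename(columns=short_names)

# Assumption: the temperature is in Kelvin
weather_short['air_temp_c'] = weather_short['air_temp_k'] - 273.15
print(weather_short)
```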


In [37]:
print('province shape: {}, weather shape: {}'.format(data_ovitrap_per_province.shape, data_weather.shape))

province shape: (8256, 6), weather shape: (6873, 12)
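Both monthly tables carry `adm_level` and `date` columns, so one possible next step is to join them on those two keys. The sketch below shows the idea on made-up toy rows (the values are illustrative, not taken from the data sets). Since the two tables have different row counts, a left join from the ovitrap table will likely leave some weather values missing.

```python
import io

import pandas as pd

# Toy rows with made-up values, shaped like the two monthly tables
ovi = pd.read_csv(io.StringIO(
    "adm_level,date,mean_ovi\n"
    "Abra,2012-06-01,20.0\n"
    "Abra,2012-07-01,25.0\n"
    "Ifugao,2012-06-01,17.0\n"
))
weather = pd.read_csv(io.StringIO(
    "adm_level,date,precip\n"
    "Abra,2012-06-01,0.43\n"
    "Abra,2012-07-01,0.37\n"
))

# Keep every ovitrap row; weather columns are NaN where no match exists
merged = ovi.merge(weather, on=['adm_level', 'date'], how='left')
print(merged)
print('rows without weather:', merged['precip'].isna().sum())
```

With the real data the call would be `data_ovitrap_per_province.merge(data_weather, on=['adm_level', 'date'], how='left')`. Use `how='inner'` instead if you only want province and month combinations that appear in both tables.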
