<h1>HIMSS Demo - HealtheDatalab</h1>
<h2>Structured Machine Learning using TensorFlow</h2>

This notebook illustrates:
<ol>
<li>Prerequisites: installing dependencies and setting up the project</li>
<li>Preparing the data (Bunsen library/HealtheDatalab library)</li>
<li>Generating labels</li>
<li>Generating features (TF SequenceExample generation)</li>
<li>Training and evaluating the machine learning model</li>
<li>Demonstrating length of stay (LOS) prediction using the model</li>
</ol>
<hr />


<h2> Prerequisites </h2>
Install or upgrade the required packages.
Execute this cell only once, then restart the kernel.


In [None]:
#CODE(WIP)

<h2> Preparation of data </h2>
This cell loads FHIR bundles generated from raw synthetic data and extracts the encounter and patient resources.

In [None]:
from pyspark.sql import SparkSession
from bunsen.stu3.bundles import load_from_directory, extract_entry, write_to_database

# Enable Hive support for our session so we can save resources as Hive tables
spark = SparkSession.builder \
                    .config('hive.exec.dynamic.partition.mode', 'nonstrict') \
                    .enableHiveSupport() \
                    .getOrCreate()

# Load and cache the bundles so we don't reload them every time.
bundles = load_from_directory(spark, 'gs://bunsen/data/bundles').cache()

# Get the encounters and patients
encounters = extract_entry(spark, bundles, 'encounter')
patients = extract_entry(spark, bundles, 'patient')

# TBD: transform the JSON bundles to TFRecord format.

<h2> Label generation </h2>
Input: FHIR bundles
Output: Labels

In [None]:
from absl import app
from absl import flags
import apache_beam as beam
from proto.stu3 import google_extensions_pb2
from proto.stu3 import resources_pb2
from py.google.fhir.labels import encounter
from py.google.fhir.labels import label

@beam.typehints.with_input_types(resources_pb2.Bundle)
@beam.typehints.with_output_types(google_extensions_pb2.EventLabel)
class LengthOfStayRangeLabelAt24HoursFn(beam.DoFn):
  """Converts a Bundle into length of stay range labels at 24 hours.

  Cohort: inpatient encounters longer than 24 hours.
  Trigger point: 24 hours after admission.
  Label: multi-label for length of stay ranges; see label.py for details.
  """

  def process(self, bundle):
    """Iterate through bundle and yield label.

    Args:
      bundle: input stu3.Bundle proto
    Yields:
      stu3.EventLabel proto.
    """
    patient = encounter.GetPatient(bundle)
    if patient is not None:
      # Cohort: inpatient encounter > 24 hours.
      for enc in encounter.Inpatient24HrEncounters(bundle):
        for one_label in label.LengthOfStayRangeAt24Hours(patient, enc):
          yield one_label


# Pipeline option classes used to configure the Beam job below.
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import StandardOptions


options = PipelineOptions()

# Google Cloud settings. With the DirectRunner selected below, the pipeline
# runs locally; these settings are mainly used by the Dataflow runner.
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'grand-magpie-222719'
google_cloud_options.job_name = 'job1'
google_cloud_options.staging_location = 'gs://de-testbunsen/staging'
google_cloud_options.temp_location = 'gs://de-testbunsen/temp'
options.view_as(StandardOptions).runner = 'DirectRunner'

p = beam.Pipeline(options=options)

# Read serialized Bundle protos from the TFRecord file.
bundles = p | 'read' >> beam.io.ReadFromTFRecord(
    'gs://de-testbunsen/data/test_bundle.tfrecord-00000-of-00001',
    coder=beam.coders.ProtoCoder(resources_pb2.Bundle))

# Convert each bundle into zero or more length of stay labels.
labels = bundles | 'BundleToLabel' >> beam.ParDo(
    LengthOfStayRangeLabelAt24HoursFn())

# Write the labels out as TFRecords of EventLabel protos.
_ = labels | beam.io.WriteToTFRecord(
    'gs://de-testbunsen/data',
    coder=beam.coders.ProtoCoder(google_extensions_pb2.EventLabel))

# Run the pipeline and block until it completes.
p.run().wait_until_finish()
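
The labeling logic itself lives in `label.py`, which this notebook does not show. As a rough, library-free illustration of the idea (a cohort of stays longer than 24 hours, with one range label per stay), the sketch below buckets a stay by its total length. The function name `los_range_label`, the label names, and the bucket boundaries (3, 7, and 14 days) are assumptions made for illustration; they are not taken from `label.py`.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical bucket boundaries in days; the real ranges are defined in label.py.
ASSUMED_RANGES = [
    (3, "less_or_equal_3"),
    (7, "3_7"),
    (14, "7_14"),
]
ASSUMED_LAST_RANGE = "above_14"


def los_range_label(admit: datetime, discharge: datetime) -> Optional[str]:
    """Return an illustrative length of stay range label, or None if outside the cohort."""
    stay = discharge - admit
    # Cohort: only stays longer than 24 hours get a label.
    if stay <= timedelta(hours=24):
        return None
    days = stay / timedelta(days=1)
    for upper_bound, name in ASSUMED_RANGES:
        if days <= upper_bound:
            return name
    return ASSUMED_LAST_RANGE


admit = datetime(2018, 1, 1, 8, 0)
print(los_range_label(admit, admit + timedelta(hours=12)))  # None: under 24 hours
print(los_range_label(admit, admit + timedelta(days=5)))    # 3_7
```

In the actual pipeline the label is emitted at the trigger point (24 hours after admission), so a model trained on it only sees data available up to that time.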

<h2> Features generation </h2>
Input: FHIR bundles (TFRecord) and labels (TFRecord), each stored in a GS bucket
Output: TF SequenceExample (seqex) records. Each one has (1) context: patient information, and (2) time series data, such as encounters.
Status: runs on the direct runner, which is time-consuming for 1000+ records.

In [None]:
#CODE(WIP)
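
Until the feature generation code is in place, the two-part layout described above can be pictured with a small, library-free sketch. In TensorFlow the corresponding proto is `tf.train.SequenceExample`, which has `context` and `feature_lists` fields. The class `SequenceExampleSketch`, the function `build_sketch`, and the sample patient and events below are hypothetical stand-ins, not part of the HealtheDatalab or FHIR libraries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SequenceExampleSketch:
    """Plain-Python stand-in for the two parts of a sequence example."""
    # Per-patient values that do not change over time.
    context: Dict[str, str] = field(default_factory=dict)
    # Per-feature value lists, ordered by event time.
    feature_lists: Dict[str, List[str]] = field(default_factory=dict)


def build_sketch(patient: Dict[str, str],
                 events: List[Tuple[int, str, str]]) -> SequenceExampleSketch:
    """Group (timestamp, feature_name, value) events into time-ordered lists."""
    example = SequenceExampleSketch(context=dict(patient))
    for _, name, value in sorted(events, key=lambda e: e[0]):
        example.feature_lists.setdefault(name, []).append(value)
    return example


# Hypothetical patient and events, for illustration only.
seqex = build_sketch(
    {"patient_id": "p1", "gender": "female"},
    [(20, "encounter_class", "inpatient"), (10, "encounter_class", "emergency")],
)
print(seqex.feature_lists)  # {'encounter_class': ['emergency', 'inpatient']}
```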

<h2> Train and Evaluate Machine Learning Model </h2>
Input: Training and Evaluation Dataset
Output: Model

In [None]:
# Progress: the basic building blocks work.
# Still need to stitch in the output of the previous cell.


<h2> Demonstrate prediction (los) using the model </h2>
Input: New Data set
Output: Length Of Stay Prediction

In [None]:
# Use a third dataset that the model has not seen.
# Use the trained model from the previous cell.


## FINALLY - WHEN YOU ARE READY - deploy to CMLE - BOOM!