# Bed Occupancy Forecasting

In this notebook, we use Azure AutoML to forecast the daily bed occupancy rate for each city.

$*****$ For demonstration purposes only. Please customize as per your enterprise security and compliance needs. License agreement: https://github.com/microsoft/Azure-Analytics-and-AI-Engagement/blob/main/HealthCare/License.md $*****$ 

## Legal Notices 

This presentation, demonstration, and demonstration model are for informational purposes only. Microsoft makes no warranties, express or implied, in this presentation, demonstration, and demonstration model. Nothing in this presentation, demonstration, or demonstration model modifies any of the terms and conditions of Microsoft’s written and signed agreements. This is not an offer and applicable terms and the information provided is subject to revision and may be changed at any time by Microsoft.

This presentation, demonstration, and/or demonstration model do not give you or your organization any license to any patents, trademarks, copyrights, or other intellectual property covering the subject matter in this presentation, demonstration, and demonstration model.

The information contained in this presentation, demonstration and demonstration model represent the current view of Microsoft on the issues discussed as of the date of presentation and/or demonstration, and the duration of your access to the demonstration model. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of presentation and/or demonstration and for the duration of your access to the demonstration model.

No Microsoft technology, nor any of its component technologies, including the demonstration model, is intended or made available: (1) as a medical device; (2) for the diagnosis of disease or other conditions, or in the cure, mitigation, treatment or prevention of a disease or other conditions; or (3) as a substitute for the professional clinical advice, opinion, or judgment of a treating healthcare professional. Partners or customers are responsible for ensuring the regulatory compliance of any solution they build using Microsoft technologies.

© 2020 Microsoft Corporation. All rights reserved

In [1]:
import os
import time
import pickle

import numpy as np
import pandas as pd

import azureml.core
from azureml.core import Dataset, Datastore, Experiment, Workspace
from azureml.data import DataType
from azureml.data.datapath import DataPath
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.compute import AmlCompute

import GlobalVariables

## Setting up the workspace

In [2]:
ws = Workspace.from_config()
ws

Workspace.create(name='mlw-healthcare-dev', subscription_id='6f6a71d2-83bb-42b0-9912-2e243ef214c4', resource_group='rg-healthcare-dev')

## Create new datastore for Datasets

In [3]:
blob_datastore_name=GlobalVariables.BED_OCCUPANCY_DATASTORE_NAME # Name of the datastore in workspace
container_name=GlobalVariables.GLOBAL_CONTAINER_NAME
account_name=GlobalVariables.STORAGE_ACCOUNT_NAME
account_key=GlobalVariables.STORAGE_ACCOUNT_KEY # Storage account access key

blob_datastore = Datastore.register_azure_blob_container(workspace=ws, 
                                                         datastore_name=blob_datastore_name, 
                                                         container_name=container_name, 
                                                         account_name=account_name,
                                                         account_key=account_key)

dstore = Datastore.get(ws, datastore_name=blob_datastore_name)
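The cell above reads the storage settings, including the account key, from a `GlobalVariables` module. As a minimal sketch of one alternative, the settings could be read from environment variables and validated before any call to Azure is made. The environment variable names and the helper `load_storage_settings` are assumptions made for illustration; they are not part of the original notebook.

```python
import os

# Hypothetical environment variable names; adjust them to your own deployment.
REQUIRED_SETTINGS = {
    "datastore_name": "BED_OCCUPANCY_DATASTORE_NAME",
    "container_name": "GLOBAL_CONTAINER_NAME",
    "account_name": "STORAGE_ACCOUNT_NAME",
    "account_key": "STORAGE_ACCOUNT_KEY",
}


def load_storage_settings(env=None):
    """Return the datastore settings read from `env` (defaults to os.environ).

    Raises KeyError naming every missing variable, so a misconfigured
    environment fails early with a clear message.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_SETTINGS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in REQUIRED_SETTINGS.items()}


# Example with an in-memory mapping instead of the real environment.
example_env = {
    "BED_OCCUPANCY_DATASTORE_NAME": "example_store",
    "GLOBAL_CONTAINER_NAME": "example-container",
    "STORAGE_ACCOUNT_NAME": "exampleaccount",
    "STORAGE_ACCOUNT_KEY": "not-a-real-key",
}
settings = load_storage_settings(example_env)
print(settings["datastore_name"])
```

The dictionary keys match the keyword arguments used in the registration call above, so the result could be passed as `Datastore.register_azure_blob_container(workspace=ws, **settings)`.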

## Convert to Pandas DataFrame to do data preparation

In [4]:
filepath1 = GlobalVariables.BED_OCCUPANCY_INPUT_FILE_PATH
print(filepath1)


# Set the path to the storage account containing the file
datastore_path = [DataPath(dstore, filepath1)]
patientdataset = Dataset.Tabular.from_delimited_files(path=datastore_path)
patientdataset.take(5).to_pandas_dataframe()

/bedoccupancyv4.csv


Unnamed: 0,total_patients,Date,hospital_id
0,3,2015-12-16,2
1,1,2015-12-16,9
2,2,2015-12-16,1
3,2,2015-12-16,21
4,4,2015-12-17,9


In [5]:
file_path2 = GlobalVariables.TOTAL_BEDS_FILE_PATH

# Set the path to the storage account containing the file
datastore_path = [DataPath(dstore, file_path2)]
beddataset = Dataset.Tabular.from_delimited_files(path=datastore_path)
beddataset.take(10).to_pandas_dataframe()

Unnamed: 0,hospital__id,city,total_beds
0,1,Los Angeles,931
1,2,Chicago,931
2,21,Miami,438
3,9,Honolulu,303
4,27,Anchorage,250


In [6]:
patient=patientdataset.to_pandas_dataframe()
bed=beddataset.to_pandas_dataframe()
bed.columns = bed.columns.str.replace('__','_')

In [7]:
bed_occupancy_dataset=patient.merge(bed, on='hospital_id', how='left')
bed_occupancy_dataset

Unnamed: 0,total_patients,Date,hospital_id,city,total_beds
0,3,2015-12-16,2,Chicago,931
1,1,2015-12-16,9,Honolulu,303
2,2,2015-12-16,1,Los Angeles,931
3,2,2015-12-16,21,Miami,438
4,4,2015-12-17,9,Honolulu,303
...,...,...,...,...,...
9021,146,2020-11-28,9,Honolulu,303
9022,133,2020-11-28,27,Anchorage,250
9023,227,2020-11-28,21,Miami,438
9024,466,2020-11-28,2,Chicago,931
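A left join keeps every patient row even when no bed record exists for its hospital, in which case `city` and `total_beds` are filled with missing values. The sketch below uses toy data (hospital 99 is invented for illustration) to show how the `validate` and `indicator` parameters of `DataFrame.merge` can surface such rows and guard against duplicate hospital records.

```python
import pandas as pd

# Toy data in the shape of the notebook's frames; hospital 99 has no bed record.
patient = pd.DataFrame({
    "hospital_id": [2, 9, 99],
    "total_patients": [3, 1, 5],
})
bed = pd.DataFrame({
    "hospital_id": [1, 2, 9],
    "city": ["Los Angeles", "Chicago", "Honolulu"],
    "total_beds": [931, 931, 303],
})

merged = patient.merge(
    bed,
    on="hospital_id",
    how="left",
    validate="many_to_one",  # raises if a hospital appears twice in `bed`
    indicator=True,          # adds a `_merge` column describing each row's origin
)

# Hospitals present in the patient data but absent from the bed data.
unmatched = merged.loc[merged["_merge"] == "left_only", "hospital_id"].tolist()
print(unmatched)
```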


In [8]:
bed_occupancy_dataset['occupancy_rate'] = (np.divide(bed_occupancy_dataset['total_patients'], bed_occupancy_dataset['total_beds']))*100
bed_occupancy_dataset.head()

Unnamed: 0,total_patients,Date,hospital_id,city,total_beds,occupancy_rate
0,3,2015-12-16,2,Chicago,931,0.322234
1,1,2015-12-16,9,Honolulu,303,0.330033
2,2,2015-12-16,1,Los Angeles,931,0.214823
3,2,2015-12-16,21,Miami,438,0.456621
4,4,2015-12-17,9,Honolulu,303,1.320132
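The occupancy rate is the number of patients divided by the number of beds, expressed as a percentage; for the first row, 3 / 931 × 100 ≈ 0.322. As a small sketch on toy data, the same calculation can be written so that a zero bed count yields a missing value rather than infinity. The zero-capacity row is an assumption added for illustration and does not appear in the source data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "total_patients": [3, 1, 4, 7],
    "total_beds": [931, 303, 303, 0],  # last row: hypothetical zero capacity
})

# Treat a zero bed count as missing so the division does not produce inf.
beds = toy["total_beds"].replace(0, np.nan)
toy["occupancy_rate"] = toy["total_patients"] / beds * 100

print(toy)
```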


## Data Preparation for AutoML

In [9]:
df = bed_occupancy_dataset[['Date', 'occupancy_rate','city']]
df

Unnamed: 0,Date,occupancy_rate,city
0,2015-12-16,0.322234,Chicago
1,2015-12-16,0.330033,Honolulu
2,2015-12-16,0.214823,Los Angeles
3,2015-12-16,0.456621,Miami
4,2015-12-17,1.320132,Honolulu
...,...,...,...
9021,2020-11-28,48.184818,Honolulu
9022,2020-11-28,53.200000,Anchorage
9023,2020-11-28,51.826484,Miami
9024,2020-11-28,50.053706,Chicago


In [10]:
timeseries_df = df[['Date', 'occupancy_rate','city']]

### Split train and test datasets

In [11]:
date_cutoff = "2020-09-30"  # ISO format, matching the Date column

In [12]:
train_df = timeseries_df[timeseries_df['Date'] <= date_cutoff]
train_df

Unnamed: 0,Date,occupancy_rate,city
0,2015-12-16,0.322234,Chicago
1,2015-12-16,0.330033,Honolulu
2,2015-12-16,0.214823,Los Angeles
3,2015-12-16,0.456621,Miami
4,2015-12-17,1.320132,Honolulu
...,...,...,...
8727,2020-09-30,1.396348,Chicago
8728,2020-09-30,3.424658,Miami
8729,2020-09-30,3.630363,Honolulu
8730,2020-09-30,2.000000,Anchorage


In [13]:
# Note: `>=` keeps the cutoff date in both train_df and test_df; use `>` for disjoint splits
test_df = timeseries_df[timeseries_df['Date'] >= date_cutoff]
test_df

Unnamed: 0,Date,occupancy_rate,city
8727,2020-09-30,1.396348,Chicago
8728,2020-09-30,3.424658,Miami
8729,2020-09-30,3.630363,Honolulu
8730,2020-09-30,2.000000,Anchorage
8731,2020-09-30,0.859291,Los Angeles
...,...,...,...
9021,2020-11-28,48.184818,Honolulu
9022,2020-11-28,53.200000,Anchorage
9023,2020-11-28,51.826484,Miami
9024,2020-11-28,50.053706,Chicago
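Both filters above include the cutoff date, so the rows dated 2020-09-30 appear in the training frame and in the test frame. A minimal sketch of a disjoint split on toy data, assuming the `Date` column holds datetime values:

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-29", "2020-09-30", "2020-09-30", "2020-10-01"]),
    "occupancy_rate": [1.2, 1.4, 3.4, 2.0],
})

cutoff = pd.Timestamp("2020-09-30")
train_part = toy[toy["Date"] <= cutoff]  # up to and including the cutoff
test_part = toy[toy["Date"] > cutoff]    # strictly after the cutoff

# No row index should appear in both parts.
overlap = set(train_part.index) & set(test_part.index)
print(len(train_part), len(test_part), overlap)
```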


In [14]:
train_df_dict = {}
test_df_dict = {}

In [15]:
cities=train_df.city.unique()
print(cities)

['Chicago' 'Honolulu' 'Los Angeles' 'Miami' 'Anchorage']


In [16]:
local_data_folder = 'bed_occupancy_data/'
if not os.path.exists(local_data_folder):
    os.mkdir(local_data_folder)

In [17]:
def process_city(city):
    """Return the city name with spaces replaced by underscores (safe for file and experiment names)."""
    return city.replace(" ", "_")

## Upload the train and test datasets to data store

In [18]:
for city in cities:
    city2 = process_city(city)
    train_file = f'{local_data_folder}occupancy_train_{city2}.csv'
    test_file = f'{local_data_folder}occupancy_test_{city2}.csv'
    # Training data: historical occupancy rate for this city
    train_df_dict[city2] = train_df[train_df['city'] == city][['Date', 'occupancy_rate']]
    # Scoring data: 92 future dates starting on 2020-10-01
    test_df_dict[city2] = pd.date_range('2020-10-01', periods=92, freq='D').to_frame(index=False, name="Date")
    train_df_dict[city2].to_csv(train_file, index=False)
    test_df_dict[city2].to_csv(test_file, index=False)
    dstore.upload_files(
        files=[train_file, test_file],
        relative_root=local_data_folder,
        target_path='/',
        overwrite=True,
        show_progress=True
    )

Uploading an estimated of 2 files
Uploading bed_occupancy_data/occupancy_train_Chicago.csv
Uploaded bed_occupancy_data/occupancy_train_Chicago.csv, 1 files out of an estimated total of 2
Uploading bed_occupancy_data/occupancy_test_Chicago.csv
Uploaded bed_occupancy_data/occupancy_test_Chicago.csv, 2 files out of an estimated total of 2
Uploaded 2 files
Uploading an estimated of 2 files
Uploading bed_occupancy_data/occupancy_train_Honolulu.csv
Uploaded bed_occupancy_data/occupancy_train_Honolulu.csv, 1 files out of an estimated total of 2
Uploading bed_occupancy_data/occupancy_test_Honolulu.csv
Uploaded bed_occupancy_data/occupancy_test_Honolulu.csv, 2 files out of an estimated total of 2
Uploaded 2 files
Uploading an estimated of 2 files
Uploading bed_occupancy_data/occupancy_train_Los_Angeles.csv
Uploaded bed_occupancy_data/occupancy_train_Los_Angeles.csv, 1 files out of an estimated total of 2
Uploading bed_occupancy_data/occupancy_test_Los_Angeles.csv
Uploaded bed_occupancy_data/occ
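The test files contain only a `Date` column with 92 consecutive days starting on 2020-10-01. As a quick check, the following sketch rebuilds that frame and prints the first and last dates it covers.

```python
import pandas as pd

# Same call as in the upload loop: 92 daily dates starting on 2020-10-01.
future = pd.date_range("2020-10-01", periods=92, freq="D").to_frame(index=False, name="Date")

first_day = future["Date"].iloc[0]
last_day = future["Date"].iloc[-1]
print(first_day.date(), last_day.date(), len(future))
```

The range spans October (31 days), November (30 days), and December (31 days) of 2020, which adds up to the 92 periods requested.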

## Build the models in Azure AutoML

In [19]:
data_types = {
    'occupancy_rate': DataType.to_float(),
    'Date': DataType.to_datetime("%Y-%m-%d")
}

print(len(data_types))


# Load training data from Storage Blob as a TabularDataset

2


In [20]:
y_variable = "occupancy_rate"

In [22]:
compute = AmlCompute(ws, "health-cluster")

In [23]:
local_runs={}

In [24]:
for city in cities:
    city2 = process_city(city)
    filepath = f'/occupancy_train_{city2}.csv'
    print(filepath)
    datastore_path = [DataPath(dstore, filepath)]
    traindataset = Dataset.Tabular.from_delimited_files(path=datastore_path, set_column_types=data_types)
    experiment_name = f'Bed_Occupancyv3_{city2}'
    print(experiment_name)
    print(traindataset)
    experiment = Experiment(ws, experiment_name)

    # One forecasting experiment per city, capped at 15 minutes each
    automl_config = AutoMLConfig(task='forecasting',
                                 debug_log='automl_errors.log',
                                 iteration_timeout_minutes=15,
                                 n_cross_validations=3,
                                 experiment_timeout_minutes=15,
                                 label_column_name=y_variable,
                                 time_column_name='Date',
                                 enable_early_stopping=True,
                                 compute_target=compute,
                                 training_data=traindataset,
                                 model_explainability=True)
    # Despite the variable name, the run executes on the remote compute target
    local_run = experiment.submit(automl_config, show_output=False)
    local_runs[city2] = local_run

/occupancy_train_Chicago.csv
Bed_Occupancyv3_Chicago
TabularDataset
{
  "source": [
    "('total_occupancy_prediction_store', 'occupancy_train_Chicago.csv')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes",
    "SetColumnTypes"
  ]
}
Running on remote.
/occupancy_train_Honolulu.csv
Bed_Occupancyv3_Honolulu
TabularDataset
{
  "source": [
    "('total_occupancy_prediction_store', 'occupancy_train_Honolulu.csv')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes",
    "SetColumnTypes"
  ]
}
Running on remote.
/occupancy_train_Los_Angeles.csv
Bed_Occupancyv3_Los_Angeles
TabularDataset
{
  "source": [
    "('total_occupancy_prediction_store', 'occupancy_train_Los_Angeles.csv')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes",
    "SetColumnTypes"
  ]
}
Running on remote.
/occupancy_train_Miami.csv
Bed_Occupancy
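The configuration above sets `n_cross_validations=3`. For time series, validation folds are normally built so that each validation window lies later in time than the data used for training, an approach often called rolling-origin cross-validation. The sketch below illustrates that general idea on row indices only. It is not AutoML's implementation, and the function name and the `horizon` parameter are assumptions made for illustration.

```python
def rolling_origin_folds(n_rows, n_folds, horizon):
    """Return (train_indices, validation_indices) pairs, oldest fold first.

    Each fold trains on all rows before its origin and validates on the next
    `horizon` rows, so validation data is always later than training data.
    """
    folds = []
    for k in range(n_folds, 0, -1):
        origin = n_rows - k * horizon
        if origin <= 0:
            raise ValueError("Not enough rows for the requested folds")
        train_idx = list(range(origin))
        valid_idx = list(range(origin, origin + horizon))
        folds.append((train_idx, valid_idx))
    return folds


# Ten time-ordered rows, three folds, two rows per validation window.
folds = rolling_origin_folds(n_rows=10, n_folds=3, horizon=2)
for train_idx, valid_idx in folds:
    print(len(train_idx), valid_idx)
```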

In [30]:
print(local_runs)

Run(Experiment: Bed_Occupancyv3_Anchorage,
Id: AutoML_f939ef2f-35fb-4191-9260-260ece9031a6,
Type: automl,
Status: NotStarted)



In [6]:
ws = Workspace.from_config()
blob_datastore_name=GlobalVariables.GLOBAL_DATASTORE_NAME
dstore = Datastore.get(ws, datastore_name=blob_datastore_name)
#ws_ds = ws.get_default_datastore()

print('Workspace Name: ' + ws.name, 
      'Resource Group: ' + ws.resource_group,
      'Default Storage Account Name: ' + dstore.account_name,
      'AzureML Core Version: ' + azureml.core.VERSION,
      sep = '\n')

Workspace Name: mlw-healthcare-dev
Resource Group: rg-healthcare-dev
Default Storage Account Name: sthealthcaredev001
AzureML Core Version: 1.19.0


In [32]:
list_models = {}

In [None]:
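for city in cities:
    city2 = process_city(city)
    print(city2)
    # get_output() returns (best_run, fitted_model); keep only the fitted model
    _, list_models[city2] = local_runs[city2].get_output()

# Inspect the training data of one city
# (`filepath` still holds the last value assigned in the training loop)
datastore_path = [DataPath(dstore, filepath)]
traindataset = Dataset.Tabular.from_delimited_files(path=datastore_path, set_column_types=data_types)
traindataset.to_pandas_dataframe().info()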

Chicago
Honolulu
Los_Angeles
Miami
Anchorage


## Generate predictions for each city

In [None]:
test={}

In [None]:
for city in cities:
    city2 = process_city(city)
    filepath = f'/occupancy_test_{city2}.csv'
    print(filepath)
    datastore_path = [DataPath(dstore, filepath)]
    testdataset = Dataset.Tabular.from_delimited_files(path=datastore_path, set_column_types=data_types)
    test_df = testdataset.to_pandas_dataframe()
    # Score the future dates with the model trained for this city
    predictions = list_models[city2].predict(test_df)
    test[city2] = test_df
    test[city2]['occupancy_rate'] = predictions
    test[city2]['forecasted'] = True  # mark these rows as model output
    test[city2]['city'] = city
    test[city2] = test[city2][['Date', 'city', 'occupancy_rate', 'forecasted']]
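The loop above follows one pattern per city: read the future dates, call `predict`, then attach the predictions and metadata columns. The sketch below isolates that pattern with a stand-in model so it can be run without a workspace. `ConstantModel` and `build_forecast_frame` are hypothetical names introduced for illustration; the real models come from `get_output()`.

```python
import pandas as pd


class ConstantModel:
    """Hypothetical stand-in for a fitted model; returns one value per row."""

    def __init__(self, value):
        self.value = value

    def predict(self, frame):
        return [self.value] * len(frame)


def build_forecast_frame(model, dates, city):
    """Attach predictions and metadata to a copy of the input dates."""
    out = dates.copy()  # leave the caller's frame untouched
    out["occupancy_rate"] = model.predict(dates)
    out["forecasted"] = True
    out["city"] = city
    return out[["Date", "city", "occupancy_rate", "forecasted"]]


dates = pd.date_range("2020-10-01", periods=3, freq="D").to_frame(index=False, name="Date")
forecast = build_forecast_frame(ConstantModel(50.0), dates, "Los Angeles")
print(forecast)
```

Working on a copy keeps the scored input unchanged, which makes it easier to rerun a single city without reloading its test file.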

## Upload predictions to storage account

In [None]:
test_df_list = [test[process_city(city)] for city in cities]  # One forecast frame per city
master_test_df = pd.concat(test_df_list)
master_test_df.sort_values('Date', inplace=True)

In [None]:
master_test_df

In [None]:
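train_df = train_df[['Date', 'city', 'occupancy_rate']].copy()  # copy avoids SettingWithCopyWarning
train_df['forecasted'] = False  # boolean, consistent with the True flag on forecast rows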
train_df

In [None]:
master_df = pd.concat([train_df,master_test_df])

In [None]:
master_df
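The combined frame relies on the `forecasted` column to separate history from model output, so the flag should have the same type in both inputs. A string such as `'False'` is truthy in Python, which makes the column unreliable for filtering once it is concatenated with boolean values. The sketch below shows the difference on toy data.

```python
import pandas as pd

history = pd.DataFrame({"occupancy_rate": [1.4, 3.4]})
future = pd.DataFrame({"occupancy_rate": [50.1]})
future["forecasted"] = True

# String flag: the combined column is no longer boolean and every value is truthy.
as_string = pd.concat([history.assign(forecasted="False"), future], ignore_index=True)
# Boolean flag: the combined column stays boolean and can be used as a mask.
as_bool = pd.concat([history.assign(forecasted=False), future], ignore_index=True)

print(as_string["forecasted"].dtype, as_bool["forecasted"].dtype)
print([bool(value) for value in as_string["forecasted"]])
print(len(as_bool[as_bool["forecasted"]]))
```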

In [None]:
master_df.to_csv(local_data_folder+'bed_occupancy_forecastedv7.csv',index=False)

In [None]:

local_files = [local_data_folder + 'bed_occupancy_forecastedv7.csv']
print(local_files)

dstore.upload_files(
    files = local_files,
    relative_root = local_data_folder,
    target_path = '/',
    overwrite=True,
    show_progress=True
)