---

_Proprietary and Confidential. Do not distribute without permission._

---

# Restart Partners  

---
_To:_ Jun Amora (Mayor's Office of the City of New York) 
_From:_ Bharat Shyam, Rich Tong (Restart US)
_Re:_ Analysis for NYC PPE needs  
_Date:_ 20 May 2020  

--- 

New York City needs a 90-day stockpile for its healthcare workers, first responders and congregate care facilities. Estimating that stockpile is difficult, given the variability of the infection and the uncertainty in the degree of economic recovery and social mobility. We are therefore providing a second projection to augment yours; our figures are within 30%-50% of your bottom-up estimate. Given that, we are happy to:

- _Refine healthcare estimates_. All models depend heavily on estimates of the population involved and on usage data.
- _Add non-healthcare estimates_. For instance, this model can also project needs outside healthcare, such as small businesses, vertical industries and vulnerable populations.
- _Extend to long-term modeling_. We are extending the model to include test equipment, disinfectant wipes and liquid disinfectants, and are happy to add items you need. We will also be integrating epidemiological and economic models and would love to partner with you on that.

Given the uncertainties involved, we hope this helps you make the right estimates. The rest of this document covers:

1. Disclaimer. This is not a definitive estimate; you should use other sources and information to make your decisions.
2. Data Sources. We include the model source data, how the model is constructed, and the results. Feel free to use and modify this data as appropriate; it also serves as documentation for all the assumptions made.
3. Model. How the calculation is done, with its assumptions and resulting projections.
4. Outputs. The conclusions we can draw from the projection.

## Disclaimer
It must be noted that the Restart Partners ("Restart") Equipment Model (the "Model") is made available for public use free of charge. Determining equipment needs for each jurisdiction, entity or other party (each a "User") is a complex and multifaceted decision process. Restart does not have the authority or ability to assign empirical risk levels or make definitive use decisions for any User. Rather, the Model provides one approach to making recommendations that can help Users make decisions about their potential equipment uses by allowing them to calculate their potential requirements. Users are strongly encouraged to consider other sources of information and expressly disclaim any cause of action they may have against Restart arising from or relating to the Model or its analysis. Implementing the equipment levels projected by the Model will not eliminate the risk of COVID-19 cases being linked to activities in an economy or workplace. In this context, it is important to note that this equipment alone will not eliminate the risk of infection. All Users should remain informed about and abide by any decisions made by local public health and government authorities regarding specific mitigation efforts, including equipment in the Model, as the situation is dynamic.

# Model Data

Because we do not have New York City-specific data, we used various open data sources to fill in the five major assumptions in the model:

1. Usage by Population. This gives the item usage per person per day, organized as a series of protection levels: currently four levels for civilians and two levels for healthcare workers.
2. Usage per Patient. This is how epidemiological models work: given a number of patients, calculate how much they will need. The model currently uses the [WHO Surge Essential Supplies Forecasting Tool v1.2](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/covid-19-critical-items) and estimates use for the entire US population with 1,000 cases, fast transmission and slow response. This is a very pessimistic scenario, which makes sense when calculating surge estimates.

In [None]:
# Get libraries
import pandas as pd
import numpy as np

# Strategy for the Model

The next steps are a little complicated, but we are converting the entire model into a series of vectorized operations. 

## Daily Usage of Equipment by Protection levels D[l, n]

First we need the inputs: the list of protection levels l by the number of items we are tracking n. This is an l x n matrix where each entry gives, for protection level l, the number of items used per capita per day. So the first data set is the daily usage matrix D with l rows and n columns, or D.shape = (l, n).

In Python this is `usage_pd`, since it is a _p_andas _D_ataFrame. Right now it is a test matrix constructed in the Jupyter notebook, but longer term it should be pulled from a database with suitable annotations.
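A minimal sketch of D as a pandas DataFrame, assuming protection levels become the row index and items become the columns. The two item columns reuse the notebook's test values; the index name `Protection Level` is illustrative only.

```python
import pandas as pd

# D[l, n]: rows are protection levels, columns are items,
# values are items per person per day (test values from the notebook)
levels = ['WA0', 'WA1', 'WA2', 'WA3', 'WA4', 'WA5', 'WA6']
usage_pd = pd.DataFrame(
    {'N95Surgical': [0, 0, 0, 0, 0, 0, 1.18],
     'N95': [0, 0, 0, 0, 0.05, 0.10, 0]},
    index=pd.Index(levels, name='Protection Level'),
)

l, n = usage_pd.shape      # l protection levels x n items
print(usage_pd.shape)      # (7, 2)
D = usage_pd.to_numpy()    # plain NumPy array for the matrix algebra
```

Keeping the level names in the index (rather than as a string column) means every column is numeric, so the frame can go straight into matrix operations.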

## Sub-Population Count vector P[p, 1]

In the original model, we had p sub-populations. In the simplest model, there were just two populations: non-employees of healthcare companies and employees. With SOC and other codes, there are close to a thousand. So the populations are a vector that includes a description and a population count.

For example, P = [ 7400, 435 ], which is about right for Seattle, is a [ p, 1 ] column vector.
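A sketch of P built from a small table of descriptions and counts (the description labels are illustrative). Selecting with a list of columns, `pop_pd[['population']]`, keeps P two-dimensional with shape (p, 1), which matters for the broadcasting step later.

```python
import pandas as pd

# Sub-population table: a description plus a count (example values from the text)
pop_pd = pd.DataFrame({'pop_name': ['Non-healthcare', 'Healthcare employees'],
                       'population': [7400, 435]})

# P[p, 1]: double brackets keep a 2-D column instead of a 1-D Series
P = pop_pd[['population']].to_numpy()
print(P.shape)  # (2, 1)
```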

## Sub-Population Usage of Protection Levels U[p, l]

In its simplest form, you can think of this as a one-hot matrix: for each sub-population, what level of protection does it need?

For example, with seven protection levels (0 through 6), if non-employees get level 1 and healthcare employees get level 6, the matrix looks like:

     0 1 0 0 0 0 0
     0 0 0 0 0 0 1

Although data entry is complicated, this lets you give any population fractions of protection. For instance, if a healthcare employer typically had 50% of its workers as office workers at level 1, 25% customer-facing at level 2, 10% taking care of non-COVID patients at level 5 and 15% in direct contact at level 6, its row would look like:

    0 0.5 0.25 0 0 0.10 0.15

This gives the modeler great flexibility with employers or any other population.
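To check the example rows numerically, here is a sketch that stacks the one-hot non-employee row and the fractional healthcare-employer row into a single U and confirms that each row sums to 1, so every member of a sub-population is assigned to some level. Combining the two examples into one matrix is only for illustration.

```python
import numpy as np

# U[p, l]: row 0 is the one-hot non-employee row (level 1),
# row 1 is the fractional healthcare-employer row
U = np.array([
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0.5, 0.25, 0, 0, 0.10, 0.15],
])

# Each row should account for the whole sub-population
assert np.allclose(U.sum(axis=1), 1.0)
print(U.shape)  # (2, 7)
```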

The usage matrix U has p rows and l columns:

    U.shape == ( p, l )

## Required Equipment per capita per sub-population R[p, n]

With this, a single operation gets you the actual equipment required per person per day for each sub-population. Note that Python 3.5 added the `@` operator for [matrix multiply](https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-465).

    U x D = [ p, l ] x [ l, n ] = R[p, n]
    # Python 3.5+ matrix multiply operator (the "at" sign)
    R = U @ D
    # Equivalent for 2-D arrays; np.dot behaves differently for higher-dimensional tensors
    R = U.dot(D)

In Python Numpy speak, we are doing a matrix [multiply](https://www.tutorialexample.com/understand-numpy-np-multiply-np-dot-and-operation-a-beginner-guide-numpy-tutorial/)
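A small worked example of R = U @ D, using the two test items (N95Surgical, N95) across seven levels. The level assignments (population 0 at level 4, population 1 at level 6) are hypothetical, chosen so that both items show up in the result.

```python
import numpy as np

# D[l, n]: 7 protection levels x 2 items (N95Surgical, N95), from the test data
D = np.array([[0, 0], [0, 0], [0, 0], [0, 0],
              [0, 0.05], [0, 0.10], [1.18, 0]])

# U[p, l]: hypothetical one-hot rows, population 0 at level 4, population 1 at level 6
U = np.zeros((2, 7))
U[0, 4] = 1.0
U[1, 6] = 1.0

R = U @ D                              # R[p, n]: items per person per day
assert np.allclose(R, np.dot(U, D))   # same result for 2-D arrays
print(R)
```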

## Total required equipment for a sub-population T[p, n]

Now that we have the per-capita requirements, we scale each row by the size of its sub-population:

    R[p, n] x P[p, 1] = T[p, n]
    # In Python, broadcasting stretches P across the n columns
    T = R * P
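A sketch of the row scaling with hypothetical per-capita values. Note the shape of P: as a (p, 1) column it scales rows, but a flat (p,) array would broadcast across columns instead, which here silently gives a wrong answer because p == n.

```python
import numpy as np

# R[p, n]: hypothetical per-capita daily items for 2 sub-populations x 2 items
R = np.array([[0.0, 0.05],
              [1.18, 0.0]])
# P[p, 1]: population counts as a column vector
P = np.array([[7400], [435]])

T = R * P   # broadcasting stretches P across the n columns
print(T)

# Pitfall: a 1-D P scales columns, not rows, and no error is raised
assert not np.allclose(R * P.ravel(), T)
```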

## Merging populations with an essential index by population E[p, e]

Many times the sub-populations are going to be too numerous to understand. For instance, there are 800 jobs classified by SOC and 350 employer classes by NAICS-6, so for convenience we define essential levels. You can think of this as assigning each population to the period in which it starts. Essential (whose meaning has changed since version 1.x) can be thought of as the start period. So Essential 0 (like DEFCON 1) is the most important, and so forth.

This lets you stage the start-up. For the example sub-populations of non-healthcare employed and healthcare employed, E might be a simple matrix across seven start periods (0 through 6); more formally, E has p rows and e columns:

    0 0 0 0 0 0 1
    1 0 0 0 0 0 0

But this system also allows a staged restart. For example, half of the healthcare employees could come back in period 0 and the other half in the next period, while the non-healthcare population is split across the last two periods:

    0 0 0 0 0 0.5 0.5
    0.5 0.5 0 0 0 0 0
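The staged matrix as a NumPy array, with a check that each sub-population's fractions sum to 1 so nobody is dropped or double-counted across start periods.

```python
import numpy as np

# E[p, e]: rows are sub-populations, columns are start periods 0..6
# Row 0: non-healthcare, split across periods 5 and 6
# Row 1: healthcare employees, split across periods 0 and 1
E = np.array([
    [0, 0, 0, 0, 0, 0.5, 0.5],
    [0.5, 0.5, 0, 0, 0, 0, 0],
])

assert np.allclose(E.sum(axis=1), 1.0)  # everyone starts in some period
print(E.shape)  # (2, 7)
```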

## Requirements by essential index R[e, n]

In some sense this is compression: the number of essential levels e is much smaller than the number of populations p, or more succinctly e << p. We get there by multiplying T by the transpose of E:

    E[p, e].T x T[p, n] = e x p * p x n = R[e, n]
    # In python this looks like
    R = E.T @ T
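A sketch of the compression step with hypothetical totals T and the staged E. Because each row of E sums to 1, the column totals are preserved: summing R_e over essential levels gives back the totals of T. The name `R_e` is used here only to avoid clashing with the per-capita R[p, n].

```python
import numpy as np

# T[p, n]: hypothetical total daily items for 2 sub-populations x 2 items
T = np.array([[0.0, 370.0],
              [513.3, 0.0]])
# E[p, e]: staged start across 7 periods
E = np.array([[0, 0, 0, 0, 0, 0.5, 0.5],
              [0.5, 0.5, 0, 0, 0, 0, 0]])

R_e = E.T @ T   # (e, p) @ (p, n) -> (e, n)
print(R_e.shape)  # (7, 2)

# Nothing is lost in the compression when rows of E sum to 1
assert np.allclose(R_e.sum(axis=0), T.sum(axis=0))
```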

## The easy parts: Required Cost per day RC[e, n] and Stockpile S[e, n]

That was the hard part. With the requirements reduced to essential levels and the equipment needed for each, only simple element-wise multiplies remain, where C is a [1, n] row vector of unit costs:

    Gross cost for the equipment = RC[e, n] = R[e, n] * C[1, n]
    Stockpile needed for d Days = S[e, n] = R[e, n] * d

You may not want to stockpile for every essential level, so just select the rows you want; for instance, S[0] gives the stockpile needs for the most essential level, 0.
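A sketch of the cost and stockpile step, assuming hypothetical daily requirements for three essential levels and hypothetical unit costs C (not model data). Both are plain broadcasting: C as a [1, n] row stretches down the rows, and d is a scalar.

```python
import numpy as np

# R_e[e, n]: hypothetical daily items for 3 essential levels x 2 items
R_e = np.array([[256.65, 0.0],
                [256.65, 0.0],
                [0.0, 185.0]])
# C[1, n]: hypothetical unit cost per item, in dollars
C = np.array([[1.0, 0.5]])
days = 90  # d, the stockpile horizon

RC = R_e * C      # gross cost per day by essential level
S = R_e * days    # stockpile for d days by essential level
print(S[0])       # stockpile for essential level 0 only
```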

## Handling disinfection and reuse by essential level RE[e, n]

This is a little tricky, but basically there is a matrix that describes which items each class can disinfect, because disinfection is easier for some populations than for others. It is organized by the major essential levels.

For simplicity, we don't model reuse against every sub-population, though we could. So the reuse-by-essential-level matrix RE has e rows and n columns.

In [None]:

# Eventually we will do this from a database import, but for now, let's use
# the data that is normally in the Excel sheet and just recreate it here
# https://colab.research.google.com/drive/1Bcx54NQePYt88RWWmODrRA1pxz-2tnNW?authuser=5#scrollTo=1xwe8g08yRbG

# Style follows PEP 8: https://www.python.org/dev/peps/pep-0008/
# Column headings from the Excel sheet (only two items are filled in below)
usage_names = ['Protection Name',
               'Source of Data',
               'N95 Surgical Respirator',
               'N95 Mask',
               'ASTM 3 Surgical Mask',
               'ASTM 1-2 Surgical Mask',
               'Face Shield',
               'Gown',
               'Gloves',
               'Shoe Covers',
               'Test Kits',
               ]
print(usage_names)

# For simplicity, build D as a dictionary: one entry per protection level,
# values are items per person per day
usage_data = {'Source': ['WA0', 'WA1', 'WA2', 'WA3', 'WA4', 'WA5', 'WA6'],
              'N95Surgical': [0, 0, 0, 0, 0, 0, 1.18],
              'N95': [0, 0, 0, 0, 0.05, 0.10, 0],
              }
print(usage_data)

# Index by protection level so that only the numeric item columns remain
usage_pd = pd.DataFrame(usage_data).set_index('Source')
# Use these counts to catch matrix/vector shape bugs: D is l levels x n items
level_count = usage_pd.shape[0]
item_count = usage_pd.shape[1]
print('usage_pd shape is', usage_pd.shape,
      'protection level count is', level_count,
      'item count is', item_count)

print(usage_pd)

## Get the population data

Start with the simplest assumption: two populations, one at protection level `WA6` and one at `WA2`.

In [None]:
# This is a dummy test case; later we will extract first from a
# spreadsheet and then eventually from a reliable data store
# that has revision control
population_data = { 'pop_name' : ['Total less healthcare employees', 'Employees of healthcare companies'],
                    'population' : [7179.6, 735.2]}

print(population_data)

pop_pd = pd.DataFrame(population_data)
print(pop_pd)

# Estimate usage by population with a matrix multiply

Now we have the population vector and the usage (needs) matrix, so we need a matrix multiply of population by needs.

In [None]:
# Now we need U[p, l], a pop_type x protection-level matrix; each coefficient is the share of that population at that level
pop_usage_matrix = np.zeros((pop_pd.shape[0], usage_pd.shape[0]))
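One way the stub cell above might be completed, as a self-contained sketch. It assumes the non-healthcare population maps one-hot to `WA2` and healthcare employees to `WA6`, following the simplest assumption stated earlier. Using labeled DataFrames means `U @ usage_pd` aligns U's columns with D's index by name, and `mul(..., axis=0)` scales each row by its population.

```python
import pandas as pd

# D[l, n]: test usage data, indexed by protection level
usage_pd = pd.DataFrame(
    {'N95Surgical': [0, 0, 0, 0, 0, 0, 1.18],
     'N95': [0, 0, 0, 0, 0.05, 0.10, 0]},
    index=['WA0', 'WA1', 'WA2', 'WA3', 'WA4', 'WA5', 'WA6'])

# P: the dummy population data, indexed by population name
pop = pd.Series([7179.6, 735.2],
                index=['Total less healthcare employees',
                       'Employees of healthcare companies'],
                name='population')

# U[p, l]: assumed one-hot mapping of populations to protection levels
U = pd.DataFrame(0.0, index=pop.index, columns=usage_pd.index)
U.loc['Total less healthcare employees', 'WA2'] = 1.0
U.loc['Employees of healthcare companies', 'WA6'] = 1.0

R = U @ usage_pd          # R[p, n]: per-capita daily items
T = R.mul(pop, axis=0)    # T[p, n]: scale each row by its population
print(T)
```

With this test data only healthcare employees use N95 surgical respirators, and the non-healthcare row is all zeros because `WA2` has no usage for these two items.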

# Attachment: Test Code

Used to test various features of the notebook.

## Test of cloning an external Repo

Note that this does a complete clone in the virtual machine, so make sure you have enough space. You also need to re-clone whenever you close a notebook instance, which can be slow with lots of data.

However, it does allow you to check out particular branches and have a reliable dataset.

In [None]:
# Clone the entire repo.
!git clone https://github.com/jakevdp/PythonDataScienceHandbook.git cloned-repo
%cd cloned-repo
!ls

## Test of copying a single file from a repo

This is one way to get small datasets: point to the raw file and use `!curl` to bring it into the machine.

In [None]:
# Fetch a single <1MB file using the raw GitHub URL.
!curl --remote-name \
     -H 'Accept: application/vnd.github.v3.raw' \
     --location https://api.github.com/repos/jakevdp/PythonDataScienceHandbook/contents/notebooks/data/california_cities.csv

## Test of connecting to Google Drive

Use this when we don't need a repo but are just loading a static file. We normally want everything from a repo or reliable storage, but this is good for quick analysis. In most cases, you should check the file into a repo and use the GitHub raw extract instead, so you get version control.

Note that this requires authentication every time you start the notebook, so the raw extract works better, particularly if it is a public repo.

In [None]:
from google.colab import drive
drive.mount('/gdrive')

In [None]:
with open('/gdrive/My Drive/foo.txt', 'w') as f:
  f.write('Hello Google Drive!')
!cat '/gdrive/My Drive/foo.txt'

## Connecting two cells together for summaries with Cross-output Communications

This is the best method for connecting the longer analysis to a cell that just has the executive summary data. _This does not appear to be working. Need to debug_

In [None]:
%%javascript
const listenerChannel = new BroadcastChannel('channel');
listenerChannel.onmessage = (msg) => {
  const div = document.createElement('div');
  div.textContent = msg.data;
  document.body.appendChild(div);
};

In [None]:
%%javascript
const senderChannel = new BroadcastChannel('channel');
senderChannel.postMessage('Hello world!');

## Creating forms for entry

This is going to be used to parameterize models. This sets global variables that can be used in cells farther down.

In [None]:
#@title Example form fields
#@markdown Forms support many types of fields.

no_type_checking = ''  #@param
string_type = 'example'  #@param {type: "string"}
slider_value = 142  #@param {type: "slider", min: 100, max: 200}
number = 102  #@param {type: "number"}
date = '2010-11-05'  #@param {type: "date"}
pick_me = "monday"  #@param ['monday', 'tuesday', 'wednesday', 'thursday']
select_or_input = "apples" #@param ["apples", "bananas", "oranges"] {allow-input: true}
#@markdown ---


## Displaying pandas DataFrames, using Vega datasets as an example

This uses the extension `google.colab.data_table`, and there is a default dataset package called `vega_datasets` from which you can extract data. It is not clear where the data is or how to use it. Google-fu does not help, although the [source code](https://github.com/googlecolab/colabtools/blob/master/google/colab/data_table.py) tells us that `vega_datasets` has airport data in it.

But the hint is in the name: Vega is a visualization package, and [Vega Datasets access from Python](https://github.com/jakevdp/vega_datasets) provides a standard set of data for visualization testing. The core datasets are kept on [github.io](https://vega.github.io/vega-datasets/).

In [None]:
%load_ext google.colab.data_table

In [None]:
from vega_datasets import data
data.cars()

In [None]:
%unload_ext google.colab.data_table

In [None]:
data.stocks()

## Github Rendering of Jupyter

This is pretty cool: [GitHub](https://help.github.com/en/github/managing-files-in-a-repository/working-with-jupyter-notebook-files-on-github) actually renders Jupyter notebooks as static HTML when you browse them. That means just clicking on an `.ipynb` file will give you something reasonable. It is not interactive, and nothing is running behind it, but it does mean that documents produced by us are easily readable.