In [1]:
import pandas as pd
import numpy as np

### Estimating the number of cases in a region of interest

The most interesting quantity for the mission of Opendemic is the number of cases in a Region Of Interest (ROI). Here, this region is a disk of fixed radius centered at the user's coordinates. If data with high spatial resolution become available, the disk may quickly become a poor approximation. For this reason, the analysis presented in the following paragraphs works for a general definition of ROI, which only needs to be Lebesgue measurable, namely we need to be able to compute its surface.

In this example we consider a user located in New York City (the *Region*) and use the data loaded from the URL in the next cell.

In [2]:
url = 'https://github.com/beoutbreakprepared/nCoV2019/blob/master/dataset_archive/covid-19.data.2020-03-31T012540.csv?raw=true'
df = pd.read_csv(url)
nyc_cases = df[df.city == 'New York City']
nyc_cases


Unnamed: 0,ID,age,sex,city,province,country,wuhan(0)_not_wuhan(1),latitude,longitude,geo_resolution,...,date_death_or_discharge,notes_for_discussion,location,admin3,admin2,admin1,country_new,admin_id,data_moderator_initials,travel_history_binary
4677,000-1-13723,39,female,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
10734,000-1-18851,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
10735,000-1-18852,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
10736,000-1-18853,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
11542,000-1-19574,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
38019,000-1-455,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
38032,000-1-456,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
38043,000-1-457,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,
38055,000-1-458,,,New York City,New York,United States,1.0,40.661,-73.944,point,...,,,,,,New York,United States,33,,


### Prior knowledge on the data
We assume that the surface and the population density of NYC are known.

In [3]:
# Data from https://en.wikipedia.org/wiki/New_York_City
region_surface = 783.84 * 1e6 # square meters
region_density = 10715 * 1e-6 # people per square meter

Now, we want to know the rate of infected people in the area.
The estimate can be as sophisticated as we want, but the first guess, as simple as it is wrong, is the number of positive tests in the region divided by the total number of tests performed in the region.
This estimate ignores all asymptomatic, untested infected cases and assumes that all symptomatic cases have been tested.
I will use the data available at [this link](https://www.vox.com/2020/3/26/21193848/coronavirus-us-cases-deaths-tests-by-state), which are reported as provided by *COVID Tracking Project, Census Bureau* and as updated on March 30.
I don't know how reliable they are, but they are still useful for our proof of concept.

In [4]:
positive_test = 59513
total_test = 172360
empirical_infected_rate = positive_test / total_test

In [5]:
def correct_infected_rate(eir, **kwargs):
    """
    Function that computes the corrected rate of infected people.
    
    For the moment, this function estimates the infected rate as the
    empirical infected rate (eir). More sophisticated definitions
    can be employed by changing this function. See for example the
    paper about the Diamond Princess or about the Italian isolated town.
    
    Args:
        eir: float empirical infected rate
    Returns:
        float Corrected infected rate
    """
    return eir

To retrieve the number of *cases around me* (CAM), we need the number of *people around me* (PAM) and the *infected rate* (IR). The CAM is then computed as follows.
$$
CAM = PAM \times IR
$$

The PAM estimate can be obtained by multiplying the population density (PD) in the region by the surface of the ROI.

$$
PAM = PD \times \text{surface}(ROI)
$$

In this way, the CAM can be directly computed from the available quantities.
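
As a quick sanity check, substituting the second formula into the first gives a single expression. The units are consistent as long as PD is expressed in people per square meter and the surface in square meters, which is the convention assumed here; IR is dimensionless.

$$
CAM = PD \times \text{surface}(ROI) \times IR
\qquad
\left[\frac{\text{people}}{m^2} \times m^2 \times 1 = \text{people}\right]
$$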

### Definition of ROI
As we said, the simplest region of interest that we can consider is the disk centered at the user's coordinates with a fixed radius, which in this example will be equal to $1$ kilometer.

In [6]:
def surface_roi(r=1000):
    """
    Function that returns the surface of the considered ROI.
    
    For the moment, the considered ROI is a disk of given radius.
    
    Args:
        r: float Radius of the considered disk in meters. (Default: 1000 m)
    Returns:
        float Surface of the ROI.
    """
    return np.pi * r * r
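
Since the analysis only requires that the surface of the ROI can be computed, other shapes could be swapped in for the disk. The following is a minimal sketch, not part of the original analysis, of a polygonal ROI whose surface is obtained with the shoelace formula. The function name `surface_polygon_roi` and the example square are hypothetical, and the vertices are assumed to be planar coordinates in meters (not latitude and longitude).

```python
import numpy as np

def surface_polygon_roi(vertices):
    """
    Function that returns the surface of a simple polygonal ROI.

    Uses the shoelace formula. Hypothetical helper, shown only as a sketch.

    Args:
        vertices: sequence of (x, y) pairs in meters, listed in order
            around the boundary (assumed planar coordinates).
    Returns:
        float Surface of the polygon in square meters.
    """
    pts = np.asarray(vertices, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Shoelace formula: half the absolute value of the sum of cross products
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Example: a 2 km x 2 km square centered at the user
square = [(-1000, -1000), (1000, -1000), (1000, 1000), (-1000, 1000)]
square_surface = surface_polygon_roi(square)
```

The returned value can be used wherever `surface_roi()` is used, because only the surface of the ROI enters the PAM formula.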

In [7]:
pam = region_density * surface_roi()
cam = pam * correct_infected_rate(empirical_infected_rate)
print('Cases around me: ', cam)

Cases around me:  11622.977735553215
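
To see how strongly the result depends on the chosen radius, the computation can be wrapped in a small helper. This is only a sketch under the same assumptions as above (uniform density and a disk ROI that lies entirely inside the region); the helper `cases_around_me` and the variable `max_radius` are hypothetical names introduced for illustration.

```python
import numpy as np

# Values copied from the cells above
region_surface = 783.84 * 1e6   # square meters
region_density = 10715 * 1e-6   # people per square meter
infected_rate = 59513 / 172360  # empirical infected rate

def cases_around_me(radius, density=region_density, ir=infected_rate):
    """CAM for a disk ROI of the given radius in meters (hypothetical helper)."""
    pam = density * np.pi * radius * radius
    return pam * ir

# Radius at which the disk has the same surface as the whole region;
# a larger disk cannot fit inside the region, so the estimate stops being meaningful.
max_radius = np.sqrt(region_surface / np.pi)

for radius in (250, 500, 1000, 2000):
    print(radius, 'm ->', cases_around_me(radius))
```

Because the surface of a disk grows with the square of the radius, doubling the radius multiplies the estimate by four.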
