# see19 Guide

**A dataset and interface for visualizing and analyzing the epidemiology of Coronavirus Disease 2019 aka SARS-CoV-2 aka COVID19 aka C19**

Find it on [GitHub](https://github.com/ryanskene/see19)

# 4. Visualizing Regional Impacts

`CaseStudy` has a `comp_chart` attribute, which is an instance of the `CompChart2D` class and provides categorical time-series charts comparing various regions on a single `comp_category`.

The `CompChart2D` object uses `Bokeh` for chart creation.

Charts are available in **multi-line** and **bar** format with optional overlay of a second factor on a separate y-axis.

## 4.1 Daily Fatalities Comparison - Italy

We will illustrate with an example, focusing on only the top 5 most impacted regions in Italy.

In [1]:
# required to display Bokeh charts in Jupyter Notebooks
from bokeh.io import output_notebook, show
output_notebook()

In [2]:
from see19 import CaseStudy, get_baseframe
baseframe = get_baseframe()

In [3]:
itaregions = list(baseframe[baseframe['country'] == 'Italy'] \
    .sort_values(by='deaths', ascending=False).region_name.unique())[:5]

casestudy = CaseStudy(baseframe, regions=itaregions, start_hurdle=3, start_factor='deaths')
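The region selection above sorts daily rows by `deaths` and keeps the first five distinct region names. A minimal sketch of that ranking logic in plain Python follows; the rows and the `top_regions` helper are hypothetical stand-ins for illustration and are not part of `see19`.

```python
# Toy daily rows standing in for the baseframe (hypothetical values).
rows = [
    {"country": "Italy", "region_name": "Lombardia", "deaths": 120},
    {"country": "Italy", "region_name": "Veneto", "deaths": 30},
    {"country": "Italy", "region_name": "Lombardia", "deaths": 150},
    {"country": "Italy", "region_name": "Piemonte", "deaths": 45},
    {"country": "Spain", "region_name": "Madrid", "deaths": 200},
    {"country": "Italy", "region_name": "Veneto", "deaths": 35},
]


def top_regions(rows, country, n):
    """Return the n regions of `country` with the highest death counts."""
    in_country = [r for r in rows if r["country"] == country]
    ranked = sorted(in_country, key=lambda r: r["deaths"], reverse=True)
    # dict.fromkeys drops repeats while keeping first-seen order,
    # so each region is ranked by its largest row.
    unique_names = list(dict.fromkeys(r["region_name"] for r in ranked))
    return unique_names[:n]


top_two = top_regions(rows, "Italy", 2)
print(top_two)  # ['Lombardia', 'Piemonte']
```

Because the counts are cumulative, ranking by the largest row per region amounts to ranking by the latest total.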

When `CaseStudy` is instantiated, `comp_chart` is also instantiated with its own attributes.

In [4]:
print(casestudy.comp_chart)

<see19.charts.CompChart2D object at 0x124ef5a90>


In particular, all the various `comp_categories` and `factors` are automatically given labels via the `labels` attribute. There are dozens of labels; some are shown below for illustration.

In [8]:
for k, v in casestudy.comp_chart.labels.items():
    print('{}: {}'.format(k, v))
    if k == 'temp':
        break

max_days: Days Since 3 Deaths Until Max Fatality Rate
deaths: Days Since 3 Deaths
deaths_new_dma_per_1M: Daily Deaths per 1M (3DMA)
deaths_per_1M: Cumulative Deaths per 1M
deaths_new_dma_per_1M_lognat: Daily Deaths per 1M (3DMA)
(Natural Log)
deaths_new_dma_per_person_per_land_KM2: Daily Deaths / Person / Land KM² (3DMA)
deaths_new_dma_per_person_per_city_KM2: Daily Deaths / Person / City KM² (3DMA)
deaths_new_dma_per_person_per_city_KM2_lognat: Daily Deaths / Person / Land KM² (3DMA)
(Natural Log)
deaths_per_person_per_land_KM2: Total Deaths / Person / Land KM² (3DMA)
deaths_per_person_per_city_KM2: Total Deaths / Person / City KM² (3DMA)
cases_new_dma_per_1M: Daily Cases per 1M (3DMA)
cases_new_dma_per_1M_lognat: Daily Cases per 1M (3DMA)
(Natural Log)
cases_new_dma_per_person_per_city_KM2: Total Cases / Person / City KM² (3DMA)
cases_new_dma_per_person_per_land_KM2: Total Cases / Person / Land KM² (3DMA)
cases_new_dma_per_person_per_city_KM2_lognat: Total Cases / Person / City KM² (
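If only a quick look at the first few labels is wanted, `itertools.islice` is an alternative to breaking out of the loop on a known key. The sketch below uses a small dictionary copied from the output above in place of `casestudy.comp_chart.labels`; the `preview` helper is hypothetical.

```python
from itertools import islice

# A few entries copied from the output above; the real dictionary is longer.
labels = {
    "max_days": "Days Since 3 Deaths Until Max Fatality Rate",
    "deaths": "Days Since 3 Deaths",
    "deaths_new_dma_per_1M": "Daily Deaths per 1M (3DMA)",
    "deaths_per_1M": "Cumulative Deaths per 1M",
}


def preview(mapping, n):
    """Return the first n (key, value) pairs of a dict in insertion order."""
    return list(islice(mapping.items(), n))


first_two = preview(labels, 2)
for key, label in first_two:
    print(f"{key}: {label}")
```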

#### make

Charts are rendered with the `make` method. 

The basic `comp_chart` is structured as the `comp_category` on the y-axis with the number of days from the `start_hurdle` on the x-axis.

`comp_category` is the major keyword, which defaults to `deaths_new_dma_per_1M`.
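To make the x-axis concrete, here is a minimal sketch of how a "days since `start_hurdle`" alignment could be computed for one region. It assumes the hurdle is the first date on which the cumulative count reaches the threshold; the `days_since_hurdle` helper and the sample values are illustrative, not `see19` internals.

```python
from datetime import date

# (date, cumulative deaths) for a single region; illustrative values.
series = [
    (date(2020, 3, 2), 2),
    (date(2020, 3, 3), 3),
    (date(2020, 3, 4), 6),
]


def days_since_hurdle(series, hurdle):
    """Drop rows before the hurdle is reached and number the rest from day 0."""
    start = next((d for d, value in series if value >= hurdle), None)
    if start is None:
        return []  # the region never reached the hurdle
    return [((d - start).days, value) for d, value in series if d >= start]


aligned = days_since_hurdle(series, hurdle=3)
print(aligned)  # [(0, 3), (1, 6)]
```

Aligning every region this way lets outbreaks that began on different calendar dates share one x-axis.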

`make` accepts many optional kwargs. Below we see that `height` and `width` can be customized and that the positioning of the line labels can be adjusted using `label_offsets`.

In [21]:
kwargs = {
    'width': 725, 'height': 500,
    'label_offsets': {
        'Liguria': {'x_offset': -170, 'y_offset': -5},
        'Veneto': {'x_offset': 0, 'y_offset': -20},
        'Piemonte': {'x_offset': -20, 'y_offset': -25},
        'Emilia-Romagna': {'x_offset': -5, 'y_offset': -22},
        'Lombardia': {'x_offset': -10, 'y_offset': 10},
    },    
}
p = casestudy.comp_chart.make(comp_type='multiline', **kwargs)
show(p)

After `make` is called, many other attributes are available on the `casestudy.comp_chart` instance, including a dataframe tailor-made to the chart.

In [22]:
casestudy.comp_chart.df_comp.head(2)

| | region_id | country_id | region_name | country_code | country | date | cases | deaths | population | land_KM2 | ... | growth_deaths_new_dma_per_1M | growth_deaths_new_dma_per_person_per_land_KM2 | growth_deaths_new_dma_per_person_per_city_KM2 | growth_cases_per_1M | growth_cases_per_person_per_land_KM2 | growth_cases_per_person_per_city_KM2 | growth_deaths_per_1M | growth_deaths_per_person_per_land_KM2 | growth_deaths_per_person_per_city_KM2 | days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36017 | 34 | 110 | Veneto | ITA | Italy | 2020-03-03 00:00:00+00:00 | 307.0 | 3.0 | 4821683.0 | 14681.071027 | ... | NaN | NaN | NaN | 1.124542 | 1.124542 | 1.124542 | 1.5 | 1.5 | 1.5 | 0 days |
| 36018 | 34 | 110 | Veneto | ITA | Italy | 2020-03-04 00:00:00+00:00 | 360.0 | 6.0 | 4821683.0 | 14681.071027 | ... | 4.0 | 4.0 | 4.0 | 1.172638 | 1.172638 | 1.172638 | 2.0 | 2.0 | 2.0 | 1 days |
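The `growth_*` values in the two Veneto rows are consistent with a simple day-over-day ratio: 360 / 307 ≈ 1.172638 for cases and 6 / 3 = 2.0 for deaths. That reading is inferred from the displayed numbers rather than from a documented definition, so treat the sketch below as an assumption. It also shows why the per-1M variant carries the same growth value: dividing by a constant population cancels out of the ratio.

```python
def growth(values):
    """Ratio of each value to the one before it; None where undefined."""
    ratios = [None]  # the first row has no previous value
    for prev, curr in zip(values, values[1:]):
        ratios.append(curr / prev if prev else None)
    return ratios


# Values taken from the two Veneto rows above.
cases = [307.0, 360.0]
deaths = [3.0, 6.0]
population = 4821683.0

# Scaling by a constant population does not change the ratio.
cases_per_1M = [c * 1_000_000 / population for c in cases]

print(round(growth(cases)[1], 6))         # 1.172638
print(round(growth(cases_per_1M)[1], 6))  # 1.172638
print(growth(deaths)[1])                  # 2.0
```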


As mentioned, `make` accepts additional keywords. Here we change `comp_category` and `palette_base`, increase the axis label size, and narrow the `regions` further.

We also add a custom title.

In [24]:
from bokeh.palettes import Category20b
kwargs['regions'] = ['Lombardia', 'Piemonte']
kwargs['comp_category'] = 'deaths_per_person_per_land_KM2'
kwargs['palette_base'] = Category20b[20]
kwargs['x_fontsize'] = 14
kwargs['y_fontsize'] = 14
kwargs['label_offsets'] = {
    'Lombardia': {'x_offset': -10, 'y_offset': 0},
}
kwargs['title'] = 'Comparison of Cumulative Fatalities adjusted for Density as of May 3'
p = casestudy.comp_chart.make(comp_type='multiline', **kwargs)
show(p)

## 4.2 Daily Cases Comparison - 10 Most Impacted Regions

Now we'll look at new cases in the 10 most impacted regions globally.

In [27]:
regions = list(baseframe[baseframe['region_name'] != 'Hubei'] \
    .sort_values(by='cases', ascending=False).region_name.unique())[:7]
casestudy = CaseStudy(baseframe, regions=regions, start_hurdle=3, start_factor='deaths', count_dma=7, lognat=True)
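`count_dma` sets the moving-average window behind categories such as `cases_new_dma_per_1M`. As a rough sketch of what such a category represents, the code below derives daily new counts from a cumulative series, smooths them with a trailing mean, and scales to a per-million rate. It assumes a simple trailing average; the exact smoothing `see19` applies may differ, and all names and numbers here are hypothetical.

```python
def daily_new(cumulative):
    """Convert cumulative counts to daily new counts."""
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


def trailing_mean(values, window):
    """Mean over the last `window` values; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def per_million(values, population):
    """Scale counts to a rate per one million people, passing None through."""
    return [None if v is None else v * 1_000_000 / population for v in values]


cumulative_cases = [100, 130, 190, 220, 300]  # hypothetical region
population = 2_000_000                        # hypothetical population

new_cases = daily_new(cumulative_cases)        # [30, 60, 30, 80]
smoothed = trailing_mean(new_cases, window=3)  # a 3-day window, for brevity
rate = per_million(smoothed, population)
print(rate[2])  # 20.0
```

A longer window such as 7 smooths out weekly reporting patterns at the cost of reacting more slowly to real changes.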

In [28]:
kwargs = {
    'title': 'Comparison of Population Adjusted Daily Cases in Top 10 Most Impacted Regions Excluding Hubei',
    'width': 925,
    'palette_base': Category20b[20]
}
p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_1M', comp_type='multiline', **kwargs)
show(p)

The above is a cluttered mess to the point where I can't even be bothered to fix the labels. 

There are two clear regions separating themselves and the remaining regions are so far behind that it makes it difficult to read the chart. This is where `lognat` can come in handy.

In [33]:
kwargs = {
    'title': 'Comparison of Population Adjusted Daily Cases in Top 10 Most Impacted Regions Excluding Hubei',
    'width': 925,
    'palette_base': Category20b[20],
    'label_offsets': {
        'New Jersey': {'x_offset': -5, 'y_offset': 5},
        'New York': {'x_offset': 20, 'y_offset': -20},
        'France': {'x_offset': 10, 'y_offset': 10},
        'Germany': {'x_offset': 0, 'y_offset': -20},
    }, 
}
p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_1M_lognat', comp_type='multiline', **kwargs)
show(p)
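To see why the natural log helps, consider two hypothetical regions whose daily rates differ by a factor of 100. On a linear axis the smaller one is flattened against zero; after taking logs, the gap shrinks to a few units. The region names and values below are made up for illustration.

```python
import math

# Hypothetical daily cases per 1M for a hard-hit and a mildly hit region.
daily_per_1M = {"Region A": 400.0, "Region B": 4.0}

# math.log is the natural logarithm; inputs must be strictly positive.
logged = {name: math.log(value) for name, value in daily_per_1M.items()}

linear_ratio = daily_per_1M["Region A"] / daily_per_1M["Region B"]
log_gap = logged["Region A"] - logged["Region B"]

print(linear_ratio)       # 100.0
print(round(log_gap, 3))  # 4.605
```

The difference of logs equals the log of the ratio, so equal vertical distances on the log chart correspond to equal multiplicative differences between regions.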

## 4.3 Factor Comparison: Oxford Government Tracker

**Oxford Stringency Index from day of 1st Death**

The comparison chart can be used to compare any time-dynamic variable supplied through the `factors` parameter.

The chart below compares the Oxford Stringency Index for each selected region.

In [34]:
regions = ['Germany', 'Spain', 'Taiwan']

casestudy = CaseStudy(
    baseframe, count_categories='cases_new_per_1M', regions=regions, 
    start_hurdle=10, start_factor='cases', factors=['strindex']
)
kwargs = {
    'width': 825, 'height': 500,
    'palette_base': Category20b[20],
    'label_offsets': {
        'Taiwan': {'x_offset': 10, 'y_offset': 5},
    },
}
p = casestudy.comp_chart.make(comp_category='strindex', comp_type='multiline', **kwargs)
show(p)

## 4.4 MultiBar Comparison

Staggered bar charts are available by selecting `v-bar` for the `comp_type` parameter. `v-bar` is also the default value. The chart staggers the outcomes for individual regions on each day around the midpoint.

Bar charts can provide clearer point-to-point comparisons between two or three regions. If you are interested in more than three regions, the 4D Bar chart is recommended.
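Staggering around the midpoint is the usual grouped-bar layout: each region's bar is shifted left or right so that the group stays centred on the day. The sketch below shows that arithmetic in isolation. It is a generic illustration, not the `see19` implementation, and `staggered_positions` and `group_width` are hypothetical names.

```python
def staggered_positions(day, n_regions, group_width=0.8):
    """Return x positions for n_regions bars centred on `day`."""
    bar_width = group_width / n_regions
    # Offsets are symmetric around zero, e.g. -0.5 and +0.5 bar widths for two bars.
    return [day + (i - (n_regions - 1) / 2) * bar_width for i in range(n_regions)]


two = staggered_positions(day=5, n_regions=2)
three = staggered_positions(day=5, n_regions=3)

print([round(x, 2) for x in two])    # [4.8, 5.2]
print([round(x, 2) for x in three])  # [4.73, 5.0, 5.27]
```

With a fixed group width, each added region makes every bar narrower, which is one reason the layout reads best with two or three regions.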

Options for tinkering with the color scheme are also shown.

**NOTE:** Despite its low score on the Oxford Stringency Index, Taiwan ***doesn't even register on the page*** and so has been removed from the comparison.

In [35]:
from bokeh.palettes import Spectral
regions = ['Germany', 'Spain']

casestudy = CaseStudy(baseframe, regions=regions, factors=['strindex'], count_dma=14, start_hurdle=10, start_factor='cases', lognat=True)

kwargs = {
    'width': 825, 'height': 500,
    'palette_base': Spectral[11],
    'palette_flip': True,
    'palette_shift': 3,
}

p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_1M', **kwargs)
show(p)

Via the `overlay` parameter, an optional second factor can be added to the right y-axis for understanding related variables.

Here we compare based on ***density***.

We can see below that, despite having a lower `stringency index` overall and seemingly reacting more slowly to the onset of the outbreak, Germany has managed to maintain materially lower case counts.

The impacts of testing should be investigated.

In [36]:
regions = ['Germany', 'Spain']

kwargs = {
    'width': 825, 'height': 500,
    'legend_location': 'top_left',
    'palette_base': Spectral[11],
    'palette_flip': True,
    'palette_shift': 3,
    'overlay': 'strindex',  
}

p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_person_per_land_KM2', **kwargs)
show(p)

### Saving Files

All chart instances in `see19` have a `save_file` option. Set that option to `True` and provide a `filename`, and the file will be saved to your location of choice.

# Next Section

Click on this link to go to the next notebook: [5. Visualizing Factors in 4D](https://ryanskene.github.io/see19/guide/5.%20See19%20-%20Visualizing%20Factors%20in%204D.html)