# see19 Guide

**A dataset and interface for visualizing and analyzing the epidemiology of Coronavirus Disease 2019 aka SARS-CoV-2 aka COVID19 aka C19**

Find it on [GitHub](https://github.com/ryanskene/see19)

Current with version 0.3.0.

# 4. Visualizing Regional Impacts

4.1 [Daily Fatalities Comparison - Italy](#section4.1)  
4.2 [Daily Fatalities Comparison - 10 Most Impacted Regions](#section4.2)  
4.3 [Factor Comparison: Oxford Government Tracker](#section4.3)  
4.4 [MultiBar Comparison](#section4.4)

`CaseStudy` has a `comp_chart` attribute, which is an instance of the `CompChart2D` class and provides categorical time-series charts comparing various regions on a single `comp_category`.

`CompChart2D` uses `Bokeh` for chart creation.

Charts are available in **multi-line** and **bar** format with optional overlay of a second factor on a separate y-axis.

<h2><a id='section4.1'>4.1 Daily Fatalities Comparison - Italy</a></h2>

We will illustrate with an example, focusing on only the 3 most impacted regions in Italy.

In [1]:
from bokeh.io import output_notebook, show
output_notebook()

In [2]:
from see19 import CaseStudy, get_baseframe
baseframe = get_baseframe()

[*********************100%*************************] Downloading ... COMPLETE

In [3]:
itaregions = baseframe[baseframe['country'] == 'Italy'] \
    .sort_values(by='deaths', ascending=False).region_name.unique().tolist()[:3]

casestudy = CaseStudy(baseframe, regions=itaregions, start_hurdle=3, start_factor='deaths')

When `CaseStudy` is instantiated, `comp_chart` is also instantiated with its own attributes.

In [4]:
print(casestudy.comp_chart)

<see19.charts.CompChart2D object at 0x124381650>


In particular, all the various `comp_categories` and `factors` are automatically given labels via the `labels` attribute. There are many labels, but some are shown below for illustration purposes.

In [5]:
for k, v in casestudy.comp_chart.labels.items():
    print('{}: {}'.format(k, v))
    if k == 'temp':
        break

cases_dma: Cumulative Cases (3DMA)
cases_new: Daily Cases
cases_new_dma: Daily Cases (3DMA)
deaths_dma: Cumulative Deaths (3DMA)
deaths_new: Daily Deaths
deaths_new_dma: Daily Deaths (3DMA)
tests_dma: Cumulative Tests (3DMA)
tests_new: Daily Tests
tests_new_dma: Daily Tests (3DMA)
cases: Cumulative Cases
deaths: Cumulative Deaths
tests: Cumulative Tests
cases_dma_per_1M: Cumulative Cases per 1M (3DMA)
cases_dma_per_person_per_land_KM2: Cumulative Cases / Person / Land KM² (3DMA)
cases_dma_per_person_per_city_KM2: Cumulative Cases / Person / City KM² (3DMA)
cases_new_per_1M: Daily Cases per 1M
cases_new_per_person_per_land_KM2: Daily Cases / Person / Land KM²
cases_new_per_person_per_city_KM2: Daily Cases / Person / City KM²
cases_new_dma_per_1M: Daily Cases per 1M (3DMA)
cases_new_dma_per_person_per_land_KM2: Daily Cases / Person / Land KM² (3DMA)
cases_new_dma_per_person_per_city_KM2: Daily Cases / Person / City KM² (3DMA)
deaths_dma_per_1M: Cumulative Deaths per 1M (3DMA)
deaths_dma_
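
Many of the labels above carry a `(3DMA)` suffix. The sketch below assumes this denotes a trailing 3-day moving average; that reading is an assumption, not something the guide states, and `moving_average` is a hypothetical helper rather than part of `see19`. The input numbers are made up for illustration.

```python
def moving_average(values, window=3):
    """Trailing moving average (hypothetical helper, not the see19 implementation).

    Positions that do not yet have a full window are returned as None.
    """
    averaged = []
    for i in range(len(values)):
        if i + 1 < window:
            averaged.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averaged.append(sum(chunk) / window)
    return averaged


# Hypothetical daily counts, for illustration only
daily_deaths = [3, 6, 9, 12, 6]
smoothed = moving_average(daily_deaths, window=3)
print(smoothed)  # [None, None, 6.0, 9.0, 9.0]
```

A longer window (for example the `count_dma=7` used later in this section) would smooth more aggressively at the cost of lagging the raw series.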

### make()

Charts are rendered with the `make` method. 

The basic `comp_chart` is structured as the `comp_category` on the y-axis with the number of days from the `start_hurdle` on the x-axis.
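
To make the x-axis concrete, here is a minimal sketch of one way "days from the `start_hurdle`" could be computed. It assumes day 0 is the first date on which the `start_factor` meets or exceeds the `start_hurdle`; the function `days_from_hurdle` and the sample rows are hypothetical and are not taken from `see19`.

```python
from datetime import date


def days_from_hurdle(rows, start_factor, start_hurdle):
    """Align one region's rows to the first date the hurdle is met.

    rows must be sorted by date. Returns (days_since_start, row) pairs,
    dropping every row before the hurdle date. Hypothetical helper.
    """
    start = None
    aligned = []
    for row in rows:
        if start is None and row[start_factor] >= start_hurdle:
            start = row['date']
        if start is not None:
            aligned.append(((row['date'] - start).days, row))
    return aligned


# Hypothetical cumulative deaths for a single region
rows = [
    {'date': date(2020, 2, 22), 'deaths': 1},
    {'date': date(2020, 2, 23), 'deaths': 2},
    {'date': date(2020, 2, 24), 'deaths': 4},
    {'date': date(2020, 2, 25), 'deaths': 7},
]
aligned = days_from_hurdle(rows, start_factor='deaths', start_hurdle=3)
day_counts = [(days, row['deaths']) for days, row in aligned]
print(day_counts)  # [(0, 4), (1, 7)]
```

Aligning every region this way lets outbreaks that began on different calendar dates be compared at the same stage.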

`comp_category` is the main keyword argument; it defaults to `deaths_new_dma_per_1M`.

`make` accepts many optional kwargs. Below we see that `height` and `width` can be customized and that the positioning of the line labels can be adjusted using `label_offsets`.

In [6]:
kwargs = {
    'width': 725, 'height': 500,
    'label_offsets': {
        'Piemonte': {'x_offset': -175, 'y_offset': 10},
        'Emilia-Romagna': {'x_offset': -150, 'y_offset': -5},
        'Lombardia': {'x_offset': -275, 'y_offset': 165},
    },    
}
p = casestudy.comp_chart.make(comp_type='multiline', **kwargs)
show(p)

After `make` is called, many other attributes are available on the `casestudy.comp_chart` instance, including a dataframe tailor-made to the chart.

In [7]:
casestudy.comp_chart.df_comp.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,growth_cases_per_1M,growth_cases_per_person_per_land_KM2,growth_cases_per_person_per_city_KM2,growth_deaths_per_1M,growth_deaths_per_person_per_land_KM2,growth_deaths_per_person_per_city_KM2,growth_tests_per_1M,growth_tests_per_person_per_land_KM2,growth_tests_per_person_per_city_KM2,days
23374,36,110,LOM,Lombardia,ITA,Italy,2020-02-27,403.0,14.0,,...,1.562016,1.562016,1.562016,1.555556,1.555556,1.555556,,,,3 days
23375,36,110,LOM,Lombardia,ITA,Italy,2020-02-28,531.0,17.0,,...,1.317618,1.317618,1.317618,1.214286,1.214286,1.214286,,,,4 days
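
As a quick sanity check on the `growth_*` columns, the sketch below recomputes the second row's values from the `cases` and `deaths` figures in the two rows shown above. It suggests, though the guide does not state it, that growth is the ratio of one day's count to the previous day's count. That would also explain why the per-1M and density-adjusted growth columns are identical: a constant divisor cancels in a ratio.

```python
# Values copied from the two Lombardia rows printed above
cases = [403.0, 531.0]    # 2020-02-27, 2020-02-28
deaths = [14.0, 17.0]     # 2020-02-27, 2020-02-28

# Assumed definition: today's count divided by yesterday's count
growth_cases = cases[1] / cases[0]
growth_deaths = deaths[1] / deaths[0]

print(round(growth_cases, 6))   # 1.317618, matches growth_cases_per_1M on 2020-02-28
print(round(growth_deaths, 6))  # 1.214286, matches growth_deaths_per_1M on 2020-02-28
```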


As mentioned, `make` can accept additional keywords. Here we change `comp_category` and `palette_base`, increase the axis label size, and narrow the `regions` further.

We also add a custom title.

In [8]:
from bokeh.palettes import Category20b
kwargs['regions'] = ['Lombardia', 'Piemonte']
kwargs['comp_category'] = 'deaths_per_person_per_land_KM2'
kwargs['palette_base'] = Category20b[20]
kwargs['x_fontsize'] = 14
kwargs['y_fontsize'] = 14
kwargs['label_offsets'] = {
    'Lombardia': {'x_offset': -10, 'y_offset': 0},
}
kwargs['title'] = 'Comparison of Cumulative Fatalities adjusted for Density as of May 3'
p = casestudy.comp_chart.make(comp_type='multiline', **kwargs)
show(p)

<h2><a id='section4.2'>4.2 Daily Fatalities Comparison - 10 Most Impacted Regions</a></h2>

Now we'll look at new cases in the 10 most impacted regions globally.

In [9]:
regions = list(baseframe[~(baseframe['region_name'] == 'Hubei')] \
    .sort_values(by='cases', ascending=False).region_name.unique())[:7]
casestudy = CaseStudy(baseframe, regions=regions, start_hurdle=3, start_factor='deaths', count_dma=7, lognat=True)

In [10]:
kwargs = {
    'title': 'Comparison of Population Adjusted Daily Cases in Top 10 Most Impacted Regions Excluding Hubei',
    'width': 925,
    'palette_base': Category20b[20]
}
p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_1M', comp_type='multiline', **kwargs)
show(p)

The above is a cluttered mess to the point where I can't even be bothered to fix the labels. 

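Two regions clearly separate themselves from the rest, and the remaining regions sit so far below them that the chart is difficult to read. This is where `lognat` can come in handy: because `lognat=True` was passed to `CaseStudy` above, the same category can be charted with a `_lognat` suffix.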
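
To see why a log scale helps, here is a minimal sketch with hypothetical numbers. It assumes `lognat` applies a natural logarithm to the chosen category, which the name suggests but the guide does not spell out. The region names and values are invented for illustration.

```python
import math

# Hypothetical daily cases per 1M: two outliers dwarf the rest
daily_cases_per_1M = {
    'Region A': 450.0,
    'Region B': 380.0,
    'Region C': 12.0,
    'Region D': 4.0,
}

# Natural log of each value (only defined for values above zero)
logged = {name: math.log(value) for name, value in daily_cases_per_1M.items()}

linear_range = max(daily_cases_per_1M.values()) - min(daily_cases_per_1M.values())
log_range = max(logged.values()) - min(logged.values())

for name, value in logged.items():
    print('{}: {:.2f}'.format(name, value))
print('linear range: {:.1f}, log range: {:.2f}'.format(linear_range, log_range))
```

On the linear scale the two small regions are squeezed into the bottom few percent of the axis; after the transform, all four occupy a comparable share of the vertical space.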

In [11]:
kwargs = {
    'title': 'Comparison of Population Adjusted Daily Cases in Top 10 Most Impacted Regions Excluding Hubei',
    'width': 925,
    'palette_base': Category20b[20],
    'label_offsets': {
        'New Jersey': {'x_offset': -5, 'y_offset': 5},
        'New York': {'x_offset': 20, 'y_offset': -20},
        'France': {'x_offset': 10, 'y_offset': 10},
        'Germany': {'x_offset': 0, 'y_offset': -20},
    }, 
}
p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_1M_lognat', comp_type='multiline', **kwargs)
show(p)

<h2><a id='section4.3'>4.3 Factor Comparison: Oxford Government Tracker</a></h2>

**Oxford Stringency Index from day of 1st Death**

The comparison chart can be used to compare any time-dynamic variable provided via the `factors` attribute.

The chart below compares the Oxford Stringency Index for each selected region.

In [12]:
regions = ['Germany', 'Spain', 'Taiwan']

casestudy = CaseStudy(
    baseframe, count_categories='cases_new_per_1M', regions=regions, 
    start_hurdle=10, start_factor='cases', factors=['strindex']
)
kwargs = {
    'width': 825, 'height': 500,
    'palette_base': Category20b[20],
    'label_offsets': {
        'Taiwan': {'x_offset': 10, 'y_offset': 5},
    },
}
p = casestudy.comp_chart.make(comp_category='strindex', comp_type='multiline', **kwargs)
show(p)

<h2><a id='section4.4'>4.4 MultiBar Comparison</a></h2>

Staggered bar charts are available by selecting `v-bar` for the `comp_type` parameter; `v-bar` is also the default value. The chart staggers the outcomes for individual regions on each day around the midpoint.

Bar charts can provide clearer point-to-point comparisons between two or three regions. If you are interested in more than three regions, the 4D Bar chart is recommended.
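
The staggering around the midpoint can be pictured with a short sketch. This is only an illustration of the general idea, not the `CompChart2D` implementation; `stagger_offsets` and the `group_width` value are hypothetical.

```python
def stagger_offsets(n_regions, group_width=0.8):
    """Return (offsets, bar_width) for bars sharing one day on the x-axis.

    Offsets are centered on 0, the day's midpoint, so the group of bars
    straddles the tick. Hypothetical helper for illustration.
    """
    bar_width = group_width / n_regions
    offsets = [(i - (n_regions - 1) / 2) * bar_width for i in range(n_regions)]
    return offsets, bar_width


two_offsets, two_width = stagger_offsets(2)
three_offsets, three_width = stagger_offsets(3)
print(two_offsets, two_width)
print(three_offsets, three_width)
```

Each additional region narrows every bar, which is one reason the comparison becomes harder to read beyond two or three regions.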

Options for tinkering with the color scheme are also shown.

**NOTE:** Despite its low score on the Oxford Stringency Index, Taiwan's case counts ***don't even register on the page***, so it has been removed from the comparison.

In [13]:
from bokeh.palettes import Spectral
regions = ['Germany', 'Spain']

casestudy = CaseStudy(baseframe, regions=regions, factors='strindex', count_dma=14, start_hurdle=10, start_factor='cases', lognat=True)

kwargs = {
    'width': 825, 'height': 500,
    'palette_base': Spectral[11],
    'palette_flip': True,
    'palette_shift': 2,
    'legend': True,
}

p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_1M', **kwargs)
show(p)

Via the `overlay` parameter, an optional second factor can be added to the right y-axis for understanding related variables.

Here we compare based on ***density***.

We can see from the chart below that, despite having a lower `stringency index` overall and seemingly reacting more slowly to the onset of the outbreak, Germany has managed to maintain materially lower case counts.

The impacts of testing should be investigated.

In [14]:
regions = ['Germany', 'Spain']

kwargs = {
    'width': 825, 'height': 500,
    'legend_location': 'top_left',
    'palette_base': Spectral[11],
    'palette_flip': True,
    'palette_shift': 2,
    'overlay': 'strindex',  
    'legend': True,
}

p = casestudy.comp_chart.make(comp_category='cases_new_dma_per_person_per_land_KM2', **kwargs)
show(p)

### Saving Files

All chart instances in `see19` have a `save_file` option. Simply set that option to `True` and provide a `filename`, and the file will be saved to your location of choice.

# Next Section

Click on this link to go to the next notebook: [5. Visualizing Factors in 4D](https://ryanskene.github.io/see19/guide/5.%20See19%20-%20Visualizing%20Factors%20in%204D.html)