# see19 Guide

**A dataset and interface for visualizing and analyzing the epidemiology of Coronavirus Disease 2019 aka SARS-CoV-2 aka COVID19 aka C19**

Find it on [GitHub](https://github.com/ryanskene/see19)

Current with version 0.3.0.

# 3. The CaseStudy Interface

3.1 [Basics](#section3.1)  
3.2 [Filtering](#section3.2)  
3.3 [Available Factors](#section3.3)  
3.4 [Additional Flags](#section3.4)

See19 visualization and data analysis are carried out through the `CaseStudy` class.

`CaseStudy` can be imported directly from the `see19` module:

In [1]:
import pandas as pd

In [None]:
from see19 import CaseStudy, get_baseframe
baseframe = get_baseframe()
casestudy = CaseStudy(baseframe)

<h2><a id='section3.1'>3.1 Basics</a></h2>

The original baseframe can be accessed via the `baseframe` attribute

In [121]:
casestudy.baseframe.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,genito,childbirth,perinatal,congenital,other,external,visitors,travel_year,gdp,gdp_year
9362,282,110,ABR,Abruzzo,ITA,Italy,2020-01-01,,,,...,442.0,1.0,16.0,19.0,384.0,2059,181458.0,2017.0,45608600000.0,2016.0
9363,282,110,ABR,Abruzzo,ITA,Italy,2020-01-02,,,,...,442.0,1.0,16.0,19.0,384.0,2059,181458.0,2017.0,45608600000.0,2016.0


`CaseStudy` automatically computes different adjustments including:

1. Daily new cases, fatalities, and tests
2. Daily Moving Average (DMA) for new and cumulative cases, fatalities, and tests
3. Population and density adjustments for new and cumulative cases, fatalities, and tests
4. Daily growth or change in items 1 through 3 above

These adjustments are referred to as `count_categories`.
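
As a rough illustration of the four kinds of adjustment listed above, the sketch below recomputes a few values by hand from three P.A. Trento rows printed elsewhere in this guide. It is not see19's implementation. The 3-day window and the "divide by density" reading of the `_per_person_per_land_KM2` suffix are assumptions inferred from the printed outputs, and the helper names are hypothetical.

```python
# Cumulative case counts for 2020-03-12 to 2020-03-14 (P.A. Trento rows in this guide).
cases = [107.0, 163.0, 206.0]
land_dens = 175.310262  # people per land KM2 for the same region

WINDOW = 3  # assumed moving-average window, inferred from the printed outputs


def daily_new(cumulative):
    """Difference between each day's cumulative count and the previous day's."""
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


def trailing_mean(values, window):
    """Mean of the last `window` values, for each position with a full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def growth(values):
    """Ratio of each value to the one before it."""
    return [curr / prev for prev, curr in zip(values, values[1:])]


cases_new = daily_new(cases)              # new cases on 03-13 and 03-14
cases_dma = trailing_mean(cases, WINDOW)  # one full window, ending 03-14
cases_growth = growth(cases)              # day-over-day ratio
# Assumed meaning of the density adjustment: count divided by people per KM2.
cases_new_per_density = [n / land_dens for n in cases_new]
```

Under these assumptions the results line up with the printed frames: `cases_dma[0]` is about 158.67, `cases_growth[0]` is about 1.5234, and `cases_new_per_density[0]` is about 0.3194.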

The amended dataframe can be accessed via the `df` attribute:

In [5]:
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,growth_cases_per_1M,growth_cases_per_person_per_land_KM2,growth_cases_per_person_per_city_KM2,growth_deaths_per_1M,growth_deaths_per_person_per_land_KM2,growth_deaths_per_person_per_city_KM2,growth_tests_per_1M,growth_tests_per_person_per_land_KM2,growth_tests_per_person_per_city_KM2,days
71,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-12,107.0,1.0,,...,1.38961,1.38961,1.38961,,,,,,,0 days
72,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-13,163.0,2.0,,...,1.523364,1.523364,1.523364,2.0,2.0,2.0,,,,1 days


For ease of selection, `CaseStudy` has a number of class attributes with different groupings of count categories: `BASECOUNT_CATS`, `PER_CATS`, `LOGNAT_CATS`, `ALL_COUNT_CATS`, `DMA_COUNT_CATS`, `PER_COUNT_CATS`.

`DMA_COUNT_CATS` is shown as an example:

In [6]:
CaseStudy.DMA_COUNT_CATS

['cases_dma',
 'cases_new_dma',
 'deaths_dma',
 'deaths_new_dma',
 'tests_dma',
 'tests_new_dma',
 'cases_dma_per_1M',
 'cases_dma_per_person_per_land_KM2',
 'cases_dma_per_person_per_city_KM2',
 'cases_new_dma_per_1M',
 'cases_new_dma_per_person_per_land_KM2',
 'cases_new_dma_per_person_per_city_KM2',
 'deaths_dma_per_1M',
 'deaths_dma_per_person_per_land_KM2',
 'deaths_dma_per_person_per_city_KM2',
 'deaths_new_dma_per_1M',
 'deaths_new_dma_per_person_per_land_KM2',
 'deaths_new_dma_per_person_per_city_KM2',
 'tests_dma_per_1M',
 'tests_dma_per_person_per_land_KM2',
 'tests_dma_per_person_per_city_KM2',
 'tests_new_dma_per_1M',
 'tests_new_dma_per_person_per_land_KM2',
 'tests_new_dma_per_person_per_city_KM2',
 'cases_dma_lognat',
 'cases_new_dma_lognat',
 'deaths_dma_lognat',
 'deaths_new_dma_lognat',
 'tests_dma_lognat',
 'tests_new_dma_lognat',
 'cases_dma_per_1M_lognat',
 'cases_dma_per_person_per_land_KM2_lognat',
 'cases_dma_per_person_per_city_KM2_lognat',
 'cases_new_dma_per_

By providing `lognat=True`, `CaseStudy` will also take the natural log of each of items 1 through 3 above:

In [7]:
casestudy = CaseStudy(baseframe, lognat=True)
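
To make the `_lognat` suffix concrete, here is a minimal sketch that takes the natural log of a short series of counts. It only illustrates the transformation. How see19 treats zero or missing values is not shown in this guide, so mapping them to `None` is an assumption.

```python
import math

cases = [107.0, 163.0, 206.0]  # cumulative cases from rows shown in this guide

# Natural log of each count. Zero or missing values have no finite log,
# so this sketch maps them to None (an assumption, not see19 behavior).
cases_lognat = [math.log(c) if c and c > 0 else None for c in cases]

# exp() undoes the transformation, recovering the original counts.
recovered = [math.exp(v) for v in cases_lognat]
```

On a log scale, a series that grows by a constant ratio each day appears as a straight line, which is one common reason to work with logged counts.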

In [8]:
casestudy.LOGNAT_CATS

['cases_dma_lognat',
 'cases_new_lognat',
 'cases_new_dma_lognat',
 'deaths_dma_lognat',
 'deaths_new_lognat',
 'deaths_new_dma_lognat',
 'tests_dma_lognat',
 'tests_new_lognat',
 'tests_new_dma_lognat',
 'cases_lognat',
 'deaths_lognat',
 'tests_lognat',
 'cases_dma_per_1M_lognat',
 'cases_dma_per_person_per_land_KM2_lognat',
 'cases_dma_per_person_per_city_KM2_lognat',
 'cases_new_per_1M_lognat',
 'cases_new_per_person_per_land_KM2_lognat',
 'cases_new_per_person_per_city_KM2_lognat',
 'cases_new_dma_per_1M_lognat',
 'cases_new_dma_per_person_per_land_KM2_lognat',
 'cases_new_dma_per_person_per_city_KM2_lognat',
 'deaths_dma_per_1M_lognat',
 'deaths_dma_per_person_per_land_KM2_lognat',
 'deaths_dma_per_person_per_city_KM2_lognat',
 'deaths_new_per_1M_lognat',
 'deaths_new_per_person_per_land_KM2_lognat',
 'deaths_new_per_person_per_city_KM2_lognat',
 'deaths_new_dma_per_1M_lognat',
 'deaths_new_dma_per_person_per_land_KM2_lognat',
 'deaths_new_dma_per_person_per_city_KM2_lognat',
 't

In [11]:
'In total, there are {} different `count_categories` to choose from.'.format(len(CaseStudy.ALL_COUNT_CATS))

'In total, there are 96 different `count_categories` to choose from.'

<h2><a id='section3.2'>3.2 Filtering</a></h2>

Thankfully, `casestudy.df` can be limited to specific count categories via the `count_categories` parameter, which accepts a single category name or a list of names:

In [55]:
casestudy = CaseStudy(baseframe, count_categories='tests_new_dma_per_person_per_land_KM2')
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,population,land_KM2,land_dens,city_KM2,city_dens,tests_new_dma_per_person_per_land_KM2,days
71,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-12,107.0,1.0,,515201.0,2938.79544,175.310262,2938.79544,175.310262,,0 days
72,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-13,163.0,2.0,,515201.0,2938.79544,175.310262,2938.79544,175.310262,,1 days


In [56]:
casestudy = CaseStudy(baseframe, count_categories=['deaths_new_dma_per_person_per_land_KM2', 'growth_cases_new_per_1M'])
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,population,land_KM2,land_dens,city_KM2,city_dens,deaths_new_dma_per_person_per_land_KM2,growth_cases_new_per_1M,days
71,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-12,107.0,1.0,,515201.0,2938.79544,175.310262,2938.79544,175.310262,0.001901,1.2,0 days
72,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-13,163.0,2.0,,515201.0,2938.79544,175.310262,2938.79544,175.310262,0.003803,1.866667,1 days


`CaseStudy` can further filter `baseframe` as follows:
    
* `regions` to limit the frame to certain regions
* `countries` to limit the frame to certain countries
* `excluded_regions` to exclude certain regions
* `excluded_countries` to exclude certain countries

Specific regions can be included or excluded by providing the `region_name`, `region_code`, or `region_id`.
Specific countries can be included or excluded by providing the `country`, `country_code`, or `country_id`.

Each of the four parameters accepts a single region or country as a `str` object, or several via common iterables.
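
The matching rules described above can be pictured with a small stand-alone sketch. It is not see19's code: the functions `as_list`, `matches`, and `select` are hypothetical helpers, and the records are a few regions copied from outputs in this guide.

```python
from collections.abc import Iterable

# A few region records taken from the outputs in this guide.
REGIONS = [
    {"region_id": 32, "region_code": "TRE", "region_name": "P.A. Trento"},
    {"region_id": 64, "region_code": "FL", "region_name": "Florida"},
    {"region_id": 75, "region_code": "NY", "region_name": "New York"},
]


def as_list(value):
    """Wrap a single identifier in a list; pass other iterables through."""
    if isinstance(value, (str, int)):
        return [value]
    if isinstance(value, Iterable):
        return list(value)
    raise TypeError(f"unsupported identifier container: {type(value)!r}")


def matches(record, identifiers):
    """True if any identifier equals the record's id, code, or name."""
    keys = (record["region_id"], record["region_code"], record["region_name"])
    return any(ident in keys for ident in identifiers)


def select(records, include=None, exclude=None):
    """Keep records matching `include` (if given) and not matching `exclude`."""
    include = as_list(include) if include is not None else None
    exclude = as_list(exclude) if exclude is not None else []
    return [
        r for r in records
        if (include is None or matches(r, include)) and not matches(r, exclude)
    ]


picked = select(REGIONS, include=["New York", "FL", 32])  # name, code, and id mixed
without_ny = select(REGIONS, exclude="NY")                # single string
```

Normalizing a lone string to a one-element list first keeps the matching logic identical for single and multiple identifiers.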

Below we select three regions, identifying one by name, one by code, and one by id:

In [57]:
regions = ['New York', 'FL', 32]
casestudy = CaseStudy(
    baseframe, regions=regions, count_categories=CaseStudy.BASECOUNT_CATS, 
)

In [58]:
casestudy.df.head(3)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,cases_dma,cases_new,cases_new_dma,deaths_dma,deaths_new,deaths_new_dma,tests_dma,tests_new,tests_new_dma,days
71,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-12,107.0,1.0,,...,78.666667,30.0,24.666667,0.333333,1.0,0.333333,,,,0 days
72,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-13,163.0,2.0,,...,115.666667,56.0,37.0,1.0,1.0,0.666667,,,,1 days
73,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-14,206.0,2.0,,...,158.666667,43.0,43.0,1.666667,0.0,0.666667,,,,2 days


We can see that all three regions are indeed in the object by grouping:

In [59]:
pd.concat([df_group.iloc[:1] for region_id, df_group in casestudy.df.groupby('region_id')]).head(3)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,cases_dma,cases_new,cases_new_dma,deaths_dma,deaths_new,deaths_new_dma,tests_dma,tests_new,tests_new_dma,days
71,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-12,107.0,1.0,,...,78.666667,30.0,24.666667,0.333333,1.0,0.333333,,,,0 days
4444,64,236,FL,Florida,USA,United States of America (the),2020-03-06,7.0,2.0,64.0,...,4.666667,3.0,1.333333,0.666667,2.0,0.666667,43.333333,24.0,,0 days
5962,75,236,NY,New York,USA,United States of America (the),2020-03-14,615.0,2.0,3303.0,...,453.666667,194.0,132.333333,0.666667,2.0,0.666667,2270.333333,103.0,998.333333,0 days


The region and country filters are important mechanisms for isolating data.

Here, we focus on US regions only, but exclude some of the most impacted ones:

In [60]:
countries = ['USA']
excluded_regions = ['NY', 'NJ']
casestudy = CaseStudy(
    baseframe, countries=countries, excluded_regions=excluded_regions, count_categories=CaseStudy.BASECOUNT_CATS, 
)

Below we can see that the dataset contains various US states and that New York and New Jersey are *not* included.

In [61]:
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,cases_dma,cases_new,cases_new_dma,deaths_dma,deaths_new,deaths_new_dma,tests_dma,tests_new,tests_new_dma,days
1896,44,236,AL,Alabama,USA,United States of America (the),2020-03-25,439.0,1.0,2812.0,...,292.333333,197.0,94.0,0.333333,1.0,0.333333,2321.666667,491.0,403.333333,0 days
1897,44,236,AL,Alabama,USA,United States of America (the),2020-03-26,531.0,1.0,4099.0,...,404.0,92.0,111.666667,0.666667,0.0,0.333333,3077.333333,1287.0,755.666667,1 days


In [62]:
pd.concat([df_group.iloc[:1] for region_id, df_group in casestudy.df.groupby('region_id')]).head(3)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,cases_dma,cases_new,cases_new_dma,deaths_dma,deaths_new,deaths_new_dma,tests_dma,tests_new,tests_new_dma,days
1896,44,236,AL,Alabama,USA,United States of America (the),2020-03-25,439.0,1.0,2812.0,...,292.333333,197.0,94.0,0.333333,1.0,0.333333,2321.666667,491.0,403.333333,0 days
2066,48,236,WY,Wyoming,USA,United States of America (the),2020-04-13,275.0,1.0,5964.0,...,268.666667,5.0,7.333333,0.333333,1.0,0.333333,5627.333333,505.0,325.0,0 days
2198,49,236,AK,Alaska,USA,United States of America (the),2020-03-25,43.0,1.0,1691.0,...,40.333333,1.0,3.666667,0.333333,1.0,0.333333,1227.0,669.0,241.0,0 days


In [63]:
# excluded_regions holds region codes ('NY', 'NJ'), so compare against region_code
casestudy.df[casestudy.df.region_code.isin(excluded_regions)]

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,cases_dma,cases_new,cases_new_dma,deaths_dma,deaths_new,deaths_new_dma,tests_dma,tests_new,tests_new_dma,days


### Limiting data via different start and tail hurdles

Parameters exist that allow you to filter the dataset such that regions and days appear only if they meet certain criteria.

`start_factor` and `start_hurdle` provide the ability to effectively *crop* the beginning of a region's period of data.

`tail_factor` and `tail_hurdle` do the same for the end of a region's period.

`start_factor` and `tail_factor` accept almost any factor in the dataset, from the count_categories to dates.

The `hurdle` is the level the region must reach to be included. For instance, if a `start_factor` of `cases_new_per_1M` is selected and a `start_hurdle` of `1.0`, then each region's first row in `casestudy.df` will be the day that the region met or exceeded **1.0 new cases per 1M people**.

These options are a convenient way to compare regions that have been impacted in similar ways or, perhaps, to fairly compare regions that were impacted at different times.

By default, `start_factor` and `start_hurdle` crop each region's data so that it begins on the day the region records at least one cumulative fatality.

**NOTE**: a `days` column is added to `casestudy.df`. It counts the days elapsed between the first date in the frame and the date of each row. When a `start_factor` is provided, the first date is the date on which the `start_hurdle` is first met. When `start_factor` is not provided, it is the first date in the dataset.
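
The cropping and day-counting behavior described above can be sketched in plain Python. This is an illustration under assumptions, not see19's implementation: `crop_at_start_hurdle` and `add_days` are hypothetical helpers, and the rows before 2020-03-16 are made-up values.

```python
from datetime import date

# Hypothetical daily rows for one region: (date, new deaths).
# Only the last two rows mirror the Sweden output in this guide.
rows = [
    (date(2020, 3, 13), 0.0),
    (date(2020, 3, 14), 1.0),
    (date(2020, 3, 15), 2.0),
    (date(2020, 3, 16), 3.0),
    (date(2020, 3, 17), 1.0),
]


def crop_at_start_hurdle(rows, hurdle):
    """Drop rows before the first day whose value meets or exceeds `hurdle`."""
    for i, (_, value) in enumerate(rows):
        if value >= hurdle:
            return rows[i:]
    return []  # the region never reached the hurdle


def add_days(rows):
    """Append the number of days elapsed since the first remaining row."""
    if not rows:
        return []
    first = rows[0][0]
    return [(d, v, (d - first).days) for d, v in rows]


cropped = add_days(crop_at_start_hurdle(rows, hurdle=3))
```

Note that rows after the first qualifying day are kept even when they fall back below the hurdle, which matches the Sweden output where the second row has `deaths_new` of 1.0.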

Examples are shown below.

In [68]:
casestudy = CaseStudy(
    baseframe, regions=['Spain'], count_categories=CaseStudy.BASECOUNT_CATS, 
    start_factor='cases', start_hurdle=3
)
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,cases_dma,cases_new,cases_new_dma,deaths_dma,deaths_new,deaths_new_dma,tests_dma,tests_new,tests_new_dma,days
37382,491,209,ESP,Spain,ESP,Spain,2020-02-25,6.0,0.0,,...,3.333333,4.0,1.333333,0.0,0.0,0.0,,,,0 days
37383,491,209,ESP,Spain,ESP,Spain,2020-02-26,13.0,0.0,,...,7.0,7.0,3.666667,0.0,0.0,0.0,,,,1 days


In [69]:
casestudy = CaseStudy(
    baseframe, countries=['Sweden'], 
    count_categories='deaths_new', start_factor='deaths_new', start_hurdle=3
)
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,population,land_KM2,land_dens,city_KM2,city_dens,deaths_new,days
38006,495,214,SWE,Sweden,SWE,Sweden,2020-03-16,1103.0,6.0,15629.323415,9415570.0,415314.854224,22.67092,2150.411192,4378.497486,3.0,0 days
38007,495,214,SWE,Sweden,SWE,Sweden,2020-03-17,1190.0,7.0,17074.543751,9415570.0,415314.854224,22.67092,2150.411192,4378.497486,1.0,1 days


To see the earliest dates in the dataframe, prior to any deaths being recorded, set `start_factor` to `''`.

In [80]:
casestudy = CaseStudy(
    baseframe, regions='RJ', count_categories='tests_new_dma', 
    factors=['temp', 'strindex'], start_factor=''
)
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,population,land_KM2,land_dens,city_KM2,city_dens,tests_new_dma,temp,strindex,days
45324,557,31,RJ,Rio De Janeiro,BRA,Brazil,2020-01-01,,,,15962668.0,42269.311478,377.642016,2203.766328,7243.357792,,20.984674,0.0,0 days
45325,557,31,RJ,Rio De Janeiro,BRA,Brazil,2020-01-02,,,,15962668.0,42269.311478,377.642016,2203.766328,7243.357792,,21.225153,0.0,1 days


<h2><a id='section3.3'>3.3 Available Factors</a></h2>

The remaining columns in the `baseframe` can be included in a `CaseStudy` instance on an ***opt-in*** basis via the `factors` parameter:

In [122]:
casestudy = CaseStudy(baseframe, count_categories='cases_new_per_person_per_land_KM2', factors=['no2', 'strindex'])
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,population,land_KM2,land_dens,city_KM2,city_dens,cases_new_per_person_per_land_KM2,no2,strindex,days
71,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-12,107.0,1.0,,515201.0,2938.79544,175.310262,2938.79544,175.310262,0.171125,,85.19,0 days
72,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-13,163.0,2.0,,515201.0,2938.79544,175.310262,2938.79544,175.310262,0.319434,,85.19,1 days


For convenience, a number of factor groupings can be accessed via `CaseStudy` attributes:

* `GMOBIS`, `AMOBIS`, `CAUSES`, `MAJOR_CAUSES`, `POLLUTS`, `TEMP_MSMTS`, `MSMTS`
    * various groupings for factor data
    * `GMOBIS` refer to Google Mobility data.
    * `AMOBIS` refer to Apple Mobility data.
* `STRINDEX_CATS`, `CONTAIN_CATS`, `ECON_CATS`, `HEALTH_CATS`
    * groupings for the Oxford Stringency Index

In [123]:
print (CaseStudy.MSMTS)
print (CaseStudy.MAJOR_CAUSES)

['uvb', 'rhum', 'temp', 'dewpoint']
['circul', 'infectious', 'respir', 'endo']


Demographic population age groupings can be accessed via the `see19` module:
* `ALL_RANGES` - all the possible demographic age ranges
* `RANGES` - a dictionary of various groupings of age ranges

In [112]:
from see19 import RANGES
RANGES.keys()

dict_keys(['UNDERS', 'OVERS', 'SCHOOL_GOERS', 'Y_MILLS', 'MILLS', 'MID', 'MID_PLUS'])

In [113]:
overs = RANGES['OVERS']['ranges']

casestudy = CaseStudy(baseframe, regions='Lombardia', count_categories='deaths_new_per_person_per_land_KM2', factors=overs)
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,A70PLUSB,A75PLUSB,A80PLUSB,A85PLUSB,A65PLUSB_%,A70PLUSB_%,A75PLUSB_%,A80PLUSB_%,A85PLUSB_%,days
658,36,110,LOM,Lombardia,ITA,Italy,2020-02-24,172.0,6.0,,...,1490749.0,963768.0,0.0,0.0,0.208224,0.154784,0.100068,0.0,0.0,0 days
659,36,110,LOM,Lombardia,ITA,Italy,2020-02-25,240.0,9.0,,...,1490749.0,963768.0,0.0,0.0,0.208224,0.154784,0.100068,0.0,0.0,1 days


In [114]:
casestudy = CaseStudy(baseframe, regions='LOM', count_categories='deaths_new_per_person_per_land_KM2', factors=CaseStudy.MAJOR_CAUSES)
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,deaths_new_per_person_per_land_KM2,circul,infectious,respir,endo,circul_%,infectious_%,respir_%,endo_%,days
658,36,110,LOM,Lombardia,ITA,Italy,2020-02-24,172.0,6.0,,...,,74695,4630,20185,6566.0,0.007756,0.000481,0.002096,0.000682,0 days
659,36,110,LOM,Lombardia,ITA,Italy,2020-02-25,240.0,9.0,,...,0.00507,74695,4630,20185,6566.0,0.007756,0.000481,0.002096,0.000682,1 days


Some factors are available only at the country level, even for countries that have subregion data.

By setting `country_level=True`, `CaseStudy` will aggregate most data among the subregions up to the country level to allow for proper comparison across the broad range of countries.

The **Oxford Stringency Index** and its derivatives are one such data group available only at the country level.

In [115]:
casestudy = CaseStudy(baseframe, 
    count_categories='deaths_new_per_person_per_land_KM2', 
    factors='strindex',
    country_level=True,
)
casestudy.df.tail(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,population,land_KM2,land_dens,city_KM2,city_dens,deaths_new_per_person_per_land_KM2,strindex,days
149,id_for_USA,236,,name_for_USA,USA,United States of America (the),2020-05-29,1734368.0,102001.0,15935858.0,307692971.0,9087502.0,33.858916,710152.024025,433.277609,35.352579,72.69,90 days
150,id_for_USA,236,,name_for_USA,USA,United States of America (the),2020-05-30,1756599.0,102913.0,16327422.0,307692971.0,9087502.0,33.858916,710152.024025,433.277609,26.935298,72.69,91 days


Above you can see that all US states have been aggregated into a single region with a placeholder `region_id` of `id_for_USA` and a `region_name` of `name_for_USA`.
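
The roll-up to country level can be pictured as summing additive counts across all regions that share a country and a date. The sketch below is an assumption-laden illustration, not see19's code: `to_country_level` is a hypothetical helper and the input rows are illustrative values.

```python
from collections import defaultdict

# Illustrative regional rows: (country_code, region_code, date, cases, deaths).
regional = [
    ("USA", "NY", "2020-03-14", 615.0, 2.0),
    ("USA", "FL", "2020-03-14", 100.0, 3.0),
    ("USA", "NY", "2020-03-15", 732.0, 6.0),
    ("USA", "FL", "2020-03-15", 136.0, 4.0),
]


def to_country_level(rows):
    """Sum additive counts over all regions sharing a country and date."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for country, _region, day, cases, deaths in rows:
        totals[(country, day)][0] += cases
        totals[(country, day)][1] += deaths
    return {key: tuple(vals) for key, vals in sorted(totals.items())}


country_rows = to_country_level(regional)
```

Only additive quantities can be summed this way. Ratios such as densities would have to be recomputed from the aggregated population and area rather than added together.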

With respect to the `STRINDEX_CATS` subgroups, if all the required categories are provided, `CaseStudy` will sum the individual category values. 

For example, if `CONTAIN_CATS` are provided, the aggregate of the eight categories will be included in the `c_sum` column.

Note that if all five `h` indicators are provided, `CaseStudy` will also tabulate a `key3_sum`, which aggregates the scores on the `h1`, `h2`, and `h3` indicators.

In [116]:
casestudy = CaseStudy(baseframe, 
    count_categories='deaths_new_per_person_per_land_KM2', 
    factors=CaseStudy.CONTAIN_CATS,
    country_level=True,
)
casestudy.df.tail(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,c1,c2,c3,c4,c5,c6,c7,c8,c_sum,days
149,id_for_USA,236,,name_for_USA,USA,United States of America (the),2020-05-29,1734368.0,102001.0,15935858.0,...,3.0,3.0,2.0,4.0,1.0,2.0,2.0,3.0,20.0,90 days
150,id_for_USA,236,,name_for_USA,USA,United States of America (the),2020-05-30,1756599.0,102913.0,16327422.0,...,3.0,3.0,2.0,4.0,1.0,2.0,2.0,3.0,20.0,91 days
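
The `c_sum` figure in the output above can be reproduced by hand. The sketch below sums the eight containment scores from the last row and returns nothing when a required indicator is missing, which is one plausible reading of "if all the required categories are provided". The helper `category_sum` is hypothetical.

```python
# Containment indicator scores from the last row of the output above.
contain_scores = {
    "c1": 3.0, "c2": 3.0, "c3": 2.0, "c4": 4.0,
    "c5": 1.0, "c6": 2.0, "c7": 2.0, "c8": 3.0,
}

# Column names as they appear in the output; assumed to be the full group.
CONTAIN_KEYS = [f"c{i}" for i in range(1, 9)]


def category_sum(row, keys):
    """Sum the indicators only when every required key is present."""
    if not all(k in row for k in keys):
        return None
    return sum(row[k] for k in keys)


c_sum = category_sum(contain_scores, CONTAIN_KEYS)
partial = category_sum({"c1": 3.0, "c2": 3.0}, CONTAIN_KEYS)  # incomplete group
```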


Additional computations can be added for each factor via the `factor_dmas` parameter.

The parameter is a dictionary of the form `str(factor_name): int(dma)`.

When provided, `CaseStudy` will automatically add `_dma`, `_growth`, and `_growth_dma` computations for each factor listed in the dictionary:

In [117]:
casestudy = CaseStudy(baseframe, count_categories='deaths_new_dma_per_1M', 
    factors=['temp', 'c1', 'strindex'], 
    factor_dmas={'temp': 7, 'c1': 14},
    country_level=True,
)
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,temp,c1,strindex,temp_dma,temp_growth,temp_growth_dma,c1_dma,c1_growth,c1_growth_dma,days
10802,293,1,AFG,Afghanistan,AFG,Afghanistan,2020-03-22,40.0,1.0,,...,10.778741,3.0,36.11,9.634337,1.067747,1.0138,1.928571,1.0,,0 days
10803,293,1,AFG,Afghanistan,AFG,Afghanistan,2020-03-23,40.0,1.0,,...,8.560785,3.0,36.11,9.475166,0.794229,0.992667,2.142857,1.0,,1 days
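
A minimal sketch of what the three derived columns might hold, assuming `_dma` is a trailing mean, `_growth` is the ratio of each value to the previous one, and `_growth_dma` is the trailing mean of that ratio. These definitions are inferred from the printed values, not confirmed by the source. The first three temperatures are hypothetical and the helper names are illustrative.

```python
# Daily temperature readings. Only the last two values come from the
# Afghanistan rows printed above; the first three are hypothetical.
temp = [9.2, 9.9, 10.4, 10.778741, 8.560785]
WINDOW = 3  # shorter than the 7-day window used above, to keep the sketch small


def trailing_mean(values, window):
    """Mean of the last `window` values; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i - window + 1 : i + 1]) / window)
    return out


def growth(values):
    """Ratio of each value to the previous one; None for the first value."""
    return [None] + [curr / prev for prev, curr in zip(values, values[1:])]


temp_dma = trailing_mean(temp, WINDOW)
temp_growth = growth(temp)
# Moving average of the growth series, skipping its leading None.
temp_growth_dma = [None] + trailing_mean(temp_growth[1:], WINDOW)
```

Under the ratio assumption, the last element of `temp_growth` is about 0.7942, in line with the `temp_growth` value printed for 2020-03-23.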


To provide a single dma for all the factors submitted, build the dictionary ahead of time:

In [118]:
factor_dmas = {msmt: 14 for msmt in CaseStudy.MSMTS}
casestudy = CaseStudy(
    baseframe, count_categories='tests_new_per_1M', 
    factors=CaseStudy.MSMTS, factor_dmas=factor_dmas
)
casestudy.df.head(2)

Unnamed: 0,region_id,country_id,region_code,region_name,country_code,country,date,cases,deaths,tests,...,rhum_dma,rhum_growth,rhum_growth_dma,temp_dma,temp_growth,temp_growth_dma,dewpoint_dma,dewpoint_growth,dewpoint_growth_dma,days
71,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-12,107.0,1.0,,...,90.887667,1.050915,1.014481,4.082184,0.959184,1.238369,-1.975261,1.896068,-0.823534,0 days
72,32,110,TRE,P.A. Trento,ITA,Italy,2020-03-13,163.0,2.0,,...,91.989446,0.995192,1.014527,4.513664,1.053689,1.218875,-0.780131,1.026207,-0.81909,1 days


Other factors are adjusted to population. These factors are appended with `_%` and can be seen via the `pop_cats` attribute.

These are typically time-static factors.

In [119]:
casestudy = CaseStudy(baseframe, count_categories='deaths_new_dma_per_1M', factors=['visitors', 'gdp', 'A65PLUSB' ])
casestudy.pop_cats

['A65PLUSB', 'visitors', 'gdp']

In [120]:
casestudy.df[['region_name', 'date', 'visitors_%', 'gdp_%', 'A65PLUSB_%']].head(2)

Unnamed: 0,region_name,date,visitors_%,gdp_%,A65PLUSB_%
71,P.A. Trento,2020-03-12,19.864474,54504.746691,0.203018
72,P.A. Trento,2020-03-13,19.864474,54504.746691,0.203018
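
The printed values are consistent with each `_%` column being the raw factor divided by the region's population (for instance, `gdp_%` reads like GDP per person). That reading is an inference, not something this guide states outright. One quick check uses the Lombardia age-group figures printed earlier in this section: if the share is count over population, both age groups should imply the same population.

```python
# Values printed for Lombardia earlier in this section.
counts = {"A70PLUSB": 1490749.0, "A75PLUSB": 963768.0}
shares = {"A70PLUSB_%": 0.154784, "A75PLUSB_%": 0.100068}

# If share = count / population, then population = count / share.
implied_pop = {
    name: counts[name] / shares[f"{name}_%"]
    for name in counts
}

# Relative gap between the two implied populations; anything beyond
# rounding noise would contradict the assumed definition.
a, b = implied_pop["A70PLUSB"], implied_pop["A75PLUSB"]
relative_gap = abs(a - b) / a
```

Both age groups imply a population of roughly 9.63 million, differing only by the rounding of the printed shares.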


<h2><a id='section3.4'>3.4 Additional Flags</a></h2>

There are several additional flags and methods that are touched on only briefly here. You are encouraged to read the analysis pages to see them in action.

* `world_averages`: when set to `True`, averages each date in the dataset across all the regions, to provide a ***per_region*** statistic for each factor

* `favor_earlier`: when set to `True`, scales the selected factors so that their values favor earlier dates over later ones. A new column is added with the `_earlier` suffix. This is helpful when attempting to study the impacts of early moves toward, say, social distancing. Factors are selected by passing a list to the `factors_to_favor_earlier` parameter.
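
As a loose illustration of the `world_averages` idea, the sketch below averages a value across whichever regions report on each date. It is only one possible reading of the description above. The helper `average_by_date` is hypothetical, the numbers are made up, and see19's handling of missing regions or weighting is not covered here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (region_code, date, new deaths per 1M).
rows = [
    ("NY", "2020-03-14", 0.1), ("NY", "2020-03-15", 0.3),
    ("FL", "2020-03-14", 0.3), ("FL", "2020-03-15", 0.5),
    ("TRE", "2020-03-14", 0.5),
]


def average_by_date(rows):
    """Mean of the value across whichever regions report on each date."""
    by_date = defaultdict(list)
    for _region, day, value in rows:
        by_date[day].append(value)
    return {day: mean(vals) for day, vals in sorted(by_date.items())}


per_region_avg = average_by_date(rows)
```

This is an unweighted mean, so a small region counts as much as a large one, which fits the "per region" framing of the flag.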

# Next Section

Click on this link to go to the next notebook: [4. Visualizing Regional Impacts](https://ryanskene.github.io/see19/guide/4.%20See19%20-%20Visualizing%20Regional%20Impacts.html)