**Cleaning data**

We work with two data sets.

In the first one, to estimate economic activity, we use the **State Quarterly Indicator of Economic Activity** (ITAEE, after its Spanish name *Indicador Trimestral de la Actividad Económica Estatal*), produced by INEGI. With it, we develop the generalized diffusion index for the Mexican economy from state-level coincident economic indexes.
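As a point of reference, a basic diffusion index measures the share of series (here, states) whose indicator rose from one period to the next, with unchanged series commonly counted as half. The sketch below computes that basic version with pandas on made-up numbers. It is not the generalized index developed in this notebook, and the state names and values are placeholders.

```python
import pandas as pd

# Hypothetical coincident-index values for three states over three quarters
# (placeholder numbers, not ITAEE data)
states = pd.DataFrame(
    {"State A": [100.0, 101.5, 101.5],
     "State B": [100.0, 99.0, 100.2],
     "State C": [100.0, 100.8, 99.7]},
    index=["Q1", "Q2", "Q3"],
)

def basic_diffusion_index(df):
    """Percent of series that rose vs. the previous period; unchanged counts as half."""
    change = df.diff().iloc[1:]              # period-over-period change, first row dropped
    rising = (change > 0).sum(axis=1)        # number of series that went up
    unchanged = (change == 0).sum(axis=1)    # number of series with no change
    return 100 * (rising + 0.5 * unchanged) / df.shape[1]

di = basic_diffusion_index(states)
print(di)
```

A value above 50 means more states expanded than contracted in that quarter.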

In the second one, we use a database listing the **measures that the Government of Mexico City** and the Mayor's Offices have implemented to address the COVID-19 health emergency.

**Exploring ITAEE data**

In [62]:
```python
# Dependencies and setup
import pandas as pd

# File to load (change this path to match your local copy)
file_to_load = "Resources/conjunto_de_datos_itaee_itaee_0202020_t4.csv"

# Read the ITAEE file and store it in a pandas DataFrame
itaee_data = pd.read_csv(file_to_load)
itaee_data.head()
```


```
,Descriptores,2003|T1,2003|T2,2003|T3,2003|T4,2003|Anual,2004|T1,2004|T2,2004|T3,2004|T4,...,2019|T1,2019|T2,2019|T3,2019|T4,2019|Anual,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>,2020|Anual
0,Índice de volumen físico base 2013=100|Total,77.503956,83.41688,76.811752,81.440841,79.793357,79.9842,86.779487,80.300091,86.178942,...,112.92374,115.252267,116.813835,124.816904,117.451687,111.270259,96.733689,103.686066,114.045403,106.433854
1,Índice de volumen físico base 2013=100|Activid...,108.140544,129.946779,136.743335,131.581676,126.603083,92.034139,145.707737,107.641831,110.949912,...,66.215335,118.774439,113.766654,114.620901,103.344332,68.024463,121.476781,120.27748,106.84962,104.157086
2,Índice de volumen físico base 2013=100|Activid...,95.729909,97.763263,101.331206,101.862423,99.1717,96.155207,99.010923,107.112578,107.918355,...,99.814865,97.246303,102.025838,107.609435,101.67411,98.542354,70.65284,84.817679,98.313456,88.081583
3,Índice de volumen físico base 2013=100|Activid...,111.214383,119.095754,112.281942,132.428817,118.755224,108.393536,117.238413,123.1682,120.259932,...,103.405632,106.515984,109.689338,128.401767,112.00318,111.892557,90.493663,110.0044,153.099161,116.372445
4,Índice de volumen físico base 2013=100|Activid...,51.356571,53.17637,52.987197,54.621116,53.035314,57.755654,57.532249,57.619424,58.920519,...,107.018435,106.859964,106.964871,105.611481,106.613688,100.980324,92.667713,92.684218,97.840866,96.04328
```


In [63]:
```python
# Inspect all columns
itaee_data.columns
```

```
Index(['Descriptores', '2003|T1', '2003|T2', '2003|T3', '2003|T4',
       '2003|Anual', '2004|T1', '2004|T2', '2004|T3', '2004|T4', '2004|Anual',
       '2005|T1', '2005|T2', '2005|T3', '2005|T4', '2005|Anual', '2006|T1',
       '2006|T2', '2006|T3', '2006|T4', '2006|Anual', '2007|T1', '2007|T2',
       '2007|T3', '2007|T4', '2007|Anual', '2008|T1', '2008|T2', '2008|T3',
       '2008|T4', '2008|Anual', '2009|T1', '2009|T2', '2009|T3', '2009|T4',
       '2009|Anual', '2010|T1', '2010|T2', '2010|T3', '2010|T4', '2010|Anual',
       '2011|T1', '2011|T2', '2011|T3', '2011|T4', '2011|Anual', '2012|T1',
       '2012|T2', '2012|T3', '2012|T4', '2012|Anual', '2013|T1', '2013|T2',
       '2013|T3', '2013|T4', '2013|Anual', '2014|T1', '2014|T2', '2014|T3',
       '2014|T4', '2014|Anual', '2015|T1', '2015|T2', '2015|T3', '2015|T4',
       '2015|Anual', '2016|T1', '2016|T2', '2016|T3', '2016|T4', '2016|Anual',
       '2017|T1', '2017|T2', '2017|T3', '2017|T4', '2017|Anual', '2018|T1',
       '2018
```

In [36]:
```python
itaee_data.dtypes
```

```
Descriptores     object
2003|T1         float64
2003|T2         float64
2003|T3         float64
2003|T4         float64
                 ...   
2020|T1<R>      float64
2020|T2<R>      float64
2020|T3<R>      float64
2020|T4<P>      float64
2020|Anual      float64
Length: 91, dtype: object
```
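Several 2020 columns carry suffixes such as `<R>` and `<P>`, which in INEGI releases most likely flag revised and preliminary figures (an assumption worth checking against the file's metadata). Keeping the flags inside the labels makes every selection by name fragile. One option, sketched below on a toy frame that reuses some of the labels above, is to strip the flags while recording which columns had them; the helper `split_flag` is hypothetical.

```python
import re

import pandas as pd

# Trailing flag such as "<R>" or "<P>"; the letter is captured
FLAG = re.compile(r"<([A-Z])>$")

def split_flag(label):
    """Return (label without flag, flag letter or None)."""
    match = FLAG.search(label)
    if match is None:
        return label, None
    return label[:match.start()], match.group(1)

# Toy frame reusing some column labels shown above (values are placeholders)
df = pd.DataFrame([[0.0] * 5],
                  columns=["Descriptores", "2019|T4", "2020|T1<R>", "2020|T4<P>", "2020|Anual"])

pairs = [split_flag(c) for c in df.columns]
column_flags = {clean: flag for clean, flag in pairs if flag is not None}
df.columns = [clean for clean, _ in pairs]

print(list(df.columns))   # labels without the <R>/<P> flags
print(column_flags)       # which cleaned labels were flagged, and how
```

On the real data the same two lines (`pairs = ...` and `df.columns = ...`) would be applied to `itaee_data`, after which the later column lists could use `'2020|T1'` instead of `'2020|T1<R>'`.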

In [56]:
```python
# Keep only the descriptor column and the 2018-2020 columns
reduced_itaee = itaee_data.loc[:, ['Descriptores','2018|T1','2018|T2', '2018|T3', '2018|T4', '2018|Anual', 
                                   '2019|T1', '2019|T2', '2019|T3', '2019|T4', '2019|Anual',
                                   '2020|T1<R>', '2020|T2<R>', '2020|T3<R>', '2020|T4<P>', '2020|Anual']]
reduced_itaee.head()
```

```
,Descriptores,2018|T1,2018|T2,2018|T3,2018|T4,2018|Anual,2019|T1,2019|T2,2019|T3,2019|T4,2019|Anual,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>,2020|Anual
0,Índice de volumen físico base 2013=100|Total,110.250927,117.324349,116.516508,123.961167,117.013238,112.92374,115.252267,116.813835,124.816904,117.451687,111.270259,96.733689,103.686066,114.045403,106.433854
1,Índice de volumen físico base 2013=100|Activid...,60.9624,115.786488,110.580173,113.251287,100.145087,66.215335,118.774439,113.766654,114.620901,103.344332,68.024463,121.476781,120.27748,106.84962,104.157086
2,Índice de volumen físico base 2013=100|Activid...,103.673852,105.636262,102.085147,104.836266,104.057882,99.814865,97.246303,102.025838,107.609435,101.67411,98.542354,70.65284,84.817679,98.313456,88.081583
3,Índice de volumen físico base 2013=100|Activid...,120.240812,120.981544,112.864082,119.615397,118.425459,103.405632,106.515984,109.689338,128.401767,112.00318,111.892557,90.493663,110.0044,153.099161,116.372445
4,Índice de volumen físico base 2013=100|Activid...,107.535656,110.93097,111.526827,109.019763,109.753304,107.018435,106.859964,106.964871,105.611481,106.613688,100.980324,92.667713,92.684218,97.840866,96.04328
```
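Typing sixteen column names by hand is error-prone. Because every period column starts with the year followed by `|`, the same selection can be built from the labels themselves. A sketch on a small stand-in frame with some of the labels above; the helper name `columns_for_years` is hypothetical.

```python
import pandas as pd

def columns_for_years(columns, years, keep=("Descriptores",)):
    """Keep the `keep` columns plus every column whose 'YYYY|' prefix is in `years`."""
    wanted = {str(y) for y in years}
    return [c for c in columns if c in keep or c.split("|", 1)[0] in wanted]

# Stand-in frame with a subset of the ITAEE labels (values are placeholders)
cols = ["Descriptores", "2017|T4", "2017|Anual", "2018|T1", "2018|Anual",
        "2020|T4<P>", "2020|Anual"]
frame = pd.DataFrame([["x"] + [0.0] * 6], columns=cols)

selected = columns_for_years(frame.columns, range(2018, 2021))
reduced = frame.loc[:, selected]
print(selected)
```

With the real data this would read `itaee_data.loc[:, columns_for_years(itaee_data.columns, range(2018, 2021))]`. Because the prefix test ignores everything after the year, the `<R>`/`<P>` flags do not need special handling.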


**First category: "Índice de volumen físico base 2013=100" (physical volume index, base 2013 = 100)**

In [68]:
```python
# Keep only the rows that contain 'Índice de volumen físico base 2013=100'
index_volume = reduced_itaee.loc[reduced_itaee['Descriptores'].str.contains('Índice de volumen físico base 2013=100')]
index_volume.head()
```

```
,Descriptores,2018|T1,2018|T2,2018|T3,2018|T4,2018|Anual,2019|T1,2019|T2,2019|T3,2019|T4,2019|Anual,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>,2020|Anual
0,Índice de volumen físico base 2013=100|Total,110.250927,117.324349,116.516508,123.961167,117.013238,112.92374,115.252267,116.813835,124.816904,117.451687,111.270259,96.733689,103.686066,114.045403,106.433854
1,Índice de volumen físico base 2013=100|Activid...,60.9624,115.786488,110.580173,113.251287,100.145087,66.215335,118.774439,113.766654,114.620901,103.344332,68.024463,121.476781,120.27748,106.84962,104.157086
2,Índice de volumen físico base 2013=100|Activid...,103.673852,105.636262,102.085147,104.836266,104.057882,99.814865,97.246303,102.025838,107.609435,101.67411,98.542354,70.65284,84.817679,98.313456,88.081583
3,Índice de volumen físico base 2013=100|Activid...,120.240812,120.981544,112.864082,119.615397,118.425459,103.405632,106.515984,109.689338,128.401767,112.00318,111.892557,90.493663,110.0044,153.099161,116.372445
4,Índice de volumen físico base 2013=100|Activid...,107.535656,110.93097,111.526827,109.019763,109.753304,107.018435,106.859964,106.964871,105.611481,106.613688,100.980324,92.667713,92.684218,97.840866,96.04328
```
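Each `Descriptores` value packs several levels into one string separated by `|`: the category, then the sector, and sometimes a sub-sector. Splitting it into separate columns turns filtering by category or sector into an exact comparison instead of a substring search. A minimal sketch with `Series.str.split(expand=True)`, on descriptor strings copied from the outputs in this section; the new column names `category`, `sector`, and `subsector` are our own.

```python
import pandas as pd

# Descriptor strings as they appear in the ITAEE outputs
desc = pd.DataFrame({"Descriptores": [
    "Índice de volumen físico base 2013=100|Total",
    "Variación acumulada|Actividades primarias",
    "Variación acumulada|Actividades terciarias|Resto",
]})

# Split on the literal "|" into at most three levels; missing levels are left empty
parts = desc["Descriptores"].str.split("|", n=2, expand=True)
parts.columns = ["category", "sector", "subsector"]
desc = desc.join(parts)

print(desc[["category", "sector", "subsector"]])
```

After the split, a selection such as `desc[desc["category"] == "Variación acumulada"]` no longer depends on how the rest of the string is written.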


In [65]:
```python
# Keep only the rows for the three main economic sectors (primary, secondary, tertiary)
sectors = ['Índice de volumen físico base 2013=100|Actividades primarias',
           'Índice de volumen físico base 2013=100|Actividades secundarias',
           'Índice de volumen físico base 2013=100|Actividades terciarias']
index_vol_act = index_volume.loc[index_volume['Descriptores'].isin(sectors)]
index_vol_act.head()
```

```
,Descriptores,2018|T1,2018|T2,2018|T3,2018|T4,2018|Anual,2019|T1,2019|T2,2019|T3,2019|T4,2019|Anual,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>,2020|Anual
1,Índice de volumen físico base 2013=100|Activid...,60.9624,115.786488,110.580173,113.251287,100.145087,66.215335,118.774439,113.766654,114.620901,103.344332,68.024463,121.476781,120.27748,106.84962,104.157086
2,Índice de volumen físico base 2013=100|Activid...,103.673852,105.636262,102.085147,104.836266,104.057882,99.814865,97.246303,102.025838,107.609435,101.67411,98.542354,70.65284,84.817679,98.313456,88.081583
7,Índice de volumen físico base 2013=100|Activid...,111.077067,118.749271,118.277898,126.296844,118.60027,114.54446,117.444459,118.617228,126.918691,119.38121,112.842821,99.899122,105.976779,115.965896,108.671155
```
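The row labels 1, 2, and 7 above are positions from the original file and say nothing about the sector. A sketch of replacing them with readable sector names, taking the text after the last `|` and translating it with a small mapping; the English labels in `SECTOR_EN` are our own translation, and the frame holds only two quarters copied from the output above.

```python
import pandas as pd

# Two quarters for the three sector rows shown above (row labels 1, 2, 7 as in the output)
sectors = pd.DataFrame(
    {"Descriptores": ["Índice de volumen físico base 2013=100|Actividades primarias",
                      "Índice de volumen físico base 2013=100|Actividades secundarias",
                      "Índice de volumen físico base 2013=100|Actividades terciarias"],
     "2018|T1": [60.9624, 103.673852, 111.077067],
     "2018|T2": [115.786488, 105.636262, 118.749271]},
    index=[1, 2, 7],
)

# Hypothetical English labels for the sector names (our own translation)
SECTOR_EN = {"Actividades primarias": "Primary",
             "Actividades secundarias": "Secondary",
             "Actividades terciarias": "Tertiary"}

# Take the text after the last "|" and translate it
labels = sectors["Descriptores"].str.split("|").str[-1].map(SECTOR_EN)
by_sector = sectors.drop(columns="Descriptores").set_index(labels).rename_axis("sector")
print(by_sector)
```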


**Quarterly index: "Índice de volumen físico base 2013=100"**

In [66]:
```python
# Keep only the quarterly columns
trim_index_vol_act = index_vol_act.loc[:, ['Descriptores','2018|T1','2018|T2', '2018|T3', '2018|T4', 
                                   '2019|T1', '2019|T2', '2019|T3', '2019|T4',
                                   '2020|T1<R>', '2020|T2<R>', '2020|T3<R>', '2020|T4<P>']]
trim_index_vol_act.head()
```

```
,Descriptores,2018|T1,2018|T2,2018|T3,2018|T4,2019|T1,2019|T2,2019|T3,2019|T4,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>
1,Índice de volumen físico base 2013=100|Activid...,60.9624,115.786488,110.580173,113.251287,66.215335,118.774439,113.766654,114.620901,68.024463,121.476781,120.27748,106.84962
2,Índice de volumen físico base 2013=100|Activid...,103.673852,105.636262,102.085147,104.836266,99.814865,97.246303,102.025838,107.609435,98.542354,70.65284,84.817679,98.313456
7,Índice de volumen físico base 2013=100|Activid...,111.077067,118.749271,118.277898,126.296844,114.54446,117.444459,118.617228,126.918691,112.842821,99.899122,105.976779,115.965896
```
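For time-series work such as plotting or computing growth rates, a long layout with one row per sector and quarter is often easier to handle than the wide table above. A sketch with `DataFrame.melt`, parsing labels like `2020|T2<R>` into a pandas quarterly `Period`; the helper `to_period` is hypothetical and discards the `<R>`/`<P>` flags, and the frame holds a few values copied from the output above.

```python
import re

import pandas as pd

# Wide quarterly table with a subset of the columns above
wide = pd.DataFrame({
    "Descriptores": ["Índice de volumen físico base 2013=100|Actividades primarias",
                     "Índice de volumen físico base 2013=100|Actividades secundarias"],
    "2019|T4": [114.620901, 107.609435],
    "2020|T1<R>": [68.024463, 98.542354],
    "2020|T2<R>": [121.476781, 70.65284],
})

LABEL = re.compile(r"^(\d{4})\|T([1-4])")   # year and quarter; any trailing flag is ignored

def to_period(label):
    """Convert a label such as '2020|T2<R>' into pandas Period('2020Q2')."""
    year, quarter = LABEL.match(label).groups()
    return pd.Period(f"{year}Q{quarter}", freq="Q")

# One row per (descriptor, quarter) pair
tidy = wide.melt(id_vars="Descriptores", var_name="label", value_name="index_value")
tidy["quarter"] = tidy["label"].map(to_period)
tidy = tidy.drop(columns="label").sort_values(["Descriptores", "quarter"])
print(tidy)
```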


**Annual index: "Índice de volumen físico base 2013=100"**

In [67]:
```python
# Keep only the annual columns
anual_index_vol_act = index_vol_act.loc[:, ['Descriptores','2018|Anual','2019|Anual','2020|Anual']]
anual_index_vol_act.head()
```

```
,Descriptores,2018|Anual,2019|Anual,2020|Anual
1,Índice de volumen físico base 2013=100|Activid...,100.145087,103.344332,104.157086
2,Índice de volumen físico base 2013=100|Activid...,104.057882,101.67411,88.081583
7,Índice de volumen físico base 2013=100|Activid...,118.60027,119.38121,108.671155
```
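Judging from the numbers, each `Anual` index appears to be the simple average of the four quarterly values. For *Actividades primarias* in 2018, (60.9624 + 115.786488 + 110.580173 + 113.251287) / 4 = 100.145087. A check like the one below, using 2018 values copied from the quarterly and annual outputs above, is a cheap way to confirm that the two tables stay consistent after cleaning.

```python
import numpy as np
import pandas as pd

# 2018 quarterly and annual values for the three sectors, copied from the outputs above
quarters = pd.DataFrame(
    {"2018|T1": [60.9624, 103.673852, 111.077067],
     "2018|T2": [115.786488, 105.636262, 118.749271],
     "2018|T3": [110.580173, 102.085147, 118.277898],
     "2018|T4": [113.251287, 104.836266, 126.296844]},
    index=["primarias", "secundarias", "terciarias"],
)
annual = pd.Series([100.145087, 104.057882, 118.60027], index=quarters.index)

# Compare the row-wise mean of the quarters with the published annual value
quarterly_mean = quarters.mean(axis=1)
consistent = np.isclose(quarterly_mean, annual, atol=1e-5)
print(pd.DataFrame({"mean_of_quarters": quarterly_mean, "Anual": annual, "match": consistent}))
```

The small tolerance absorbs the rounding to six decimals in the published file.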


**Second category: "Variación acumulada" (cumulative variation)**

In [69]:
```python
# Keep only the rows that contain 'Variación acumulada'
variation_percentage = reduced_itaee.loc[reduced_itaee['Descriptores'].str.contains('Variación acumulada')]
variation_percentage
```

```
,Descriptores,2018|T1,2018|T2,2018|T3,2018|T4,2018|Anual,2019|T1,2019|T2,2019|T3,2019|T4,2019|Anual,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>,2020|Anual
20,Variación acumulada|Total,-0.436616,1.472273,2.592941,2.702365,2.702365,2.424299,0.26397,0.260994,0.3747,0.3747,-1.464246,-8.84057,-9.652408,-9.380736,-9.380736
21,Variación acumulada|Actividades primarias,-15.149552,-4.364366,-0.842782,4.750912,4.750912,8.61668,4.662483,3.977101,3.19461,3.19461,2.732188,2.438767,3.689392,0.786452,0.786452
22,Variación acumulada|Actividades secundarias,-1.24697,1.057025,0.199356,0.068205,0.068205,-3.722237,-5.852056,-3.952615,-2.290813,-2.290813,-1.274871,-14.140774,-15.070575,-13.368721,-13.368721
23,Variación acumulada|Actividades secundarias|21...,-11.089007,-7.835837,-6.323641,-4.815112,-4.815112,-14.001219,-12.975887,-9.73646,-5.423056,-5.423056,8.207411,-3.589624,-2.259101,3.901019,3.901019
24,Variación acumulada|Actividades secundarias|22...,4.820544,4.623102,4.805195,4.424368,4.424368,-0.480977,-2.100196,-2.772838,-2.860612,-2.860612,-5.642122,-9.458815,-10.756347,-9.914681,-9.914681
25,Variación acumulada|Actividades secundarias|23...,-2.009347,-0.736489,-2.714467,-2.701003,-2.701003,-7.713753,-9.536409,-6.942739,-4.102601,-4.102601,-1.105641,-18.533455,-19.837466,-16.262139,-16.262139
26,Variación acumulada|Actividades secundarias|31...,-1.140821,2.129154,2.006335,1.789877,1.789877,-0.895677,-3.445459,-1.908182,-0.93679,-0.93679,-1.020499,-11.513373,-12.20853,-11.657999,-11.657999
27,Variación acumulada|Actividades terciarias,-0.338849,1.520957,2.862246,2.991345,2.991345,3.12161,0.940963,0.718725,0.658464,0.658464,-1.485571,-8.296507,-9.094942,-8.971307,-8.971307
28,Variación acumulada|Actividades terciarias|43-...,1.530788,2.476234,4.133631,4.094004,4.094004,2.970678,-0.482785,-1.155926,-0.825454,-0.825454,-7.393862,-21.261992,-19.225818,-15.851542,-15.851542
29,Variación acumulada|Actividades terciarias|Resto,-0.720951,1.314968,2.584541,2.742302,2.742302,3.153156,1.251488,1.134382,0.998025,0.998025,-0.252877,-5.517119,-6.89954,-7.425342,-7.425342
```
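The `Variación acumulada` rows can be reproduced from the volume index: they match the year-to-date percentage change, that is, the sum of this year's quarters so far divided by the sum of the same quarters a year earlier, minus one, times 100. This is inferred from the numbers shown here rather than taken from INEGI's documentation. A sketch for the *Total* row in 2019, with index values copied from the 2018-2020 volume-index output earlier in this section:

```python
import numpy as np

# Quarterly "Índice de volumen físico base 2013=100|Total" values, T1..T4
total_2018 = np.array([110.250927, 117.324349, 116.516508, 123.961167])
total_2019 = np.array([112.92374, 115.252267, 116.813835, 124.816904])

def cumulative_variation(current, previous):
    """Year-to-date % change: running sum of this year's quarters vs. last year's."""
    return (np.cumsum(current) / np.cumsum(previous) - 1) * 100

cum_var_2019 = cumulative_variation(total_2019, total_2018)
print(cum_var_2019.round(6))
```

The result agrees, up to rounding, with the `Variación acumulada|Total` values for 2019|T1 through 2019|T4 above. This also explains why each year's T4 value equals its `Anual` value: after four quarters the running sums cover the whole year.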


In [71]:
```python
# Keep only the rows for the three main economic sectors (primary, secondary, tertiary)
sectors = ['Variación acumulada|Actividades primarias',
           'Variación acumulada|Actividades secundarias',
           'Variación acumulada|Actividades terciarias']
index_acumvar_act = variation_percentage.loc[variation_percentage['Descriptores'].isin(sectors)]
index_acumvar_act.head()
```

```
,Descriptores,2018|T1,2018|T2,2018|T3,2018|T4,2018|Anual,2019|T1,2019|T2,2019|T3,2019|T4,2019|Anual,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>,2020|Anual
21,Variación acumulada|Actividades primarias,-15.149552,-4.364366,-0.842782,4.750912,4.750912,8.61668,4.662483,3.977101,3.19461,3.19461,2.732188,2.438767,3.689392,0.786452,0.786452
22,Variación acumulada|Actividades secundarias,-1.24697,1.057025,0.199356,0.068205,0.068205,-3.722237,-5.852056,-3.952615,-2.290813,-2.290813,-1.274871,-14.140774,-15.070575,-13.368721,-13.368721
27,Variación acumulada|Actividades terciarias,-0.338849,1.520957,2.862246,2.991345,2.991345,3.12161,0.940963,0.718725,0.658464,0.658464,-1.485571,-8.296507,-9.094942,-8.971307,-8.971307
```


**Quarterly variation: "Variación acumulada"**

In [72]:
```python
# Keep only the quarterly columns
trim_index_acumvar_act = index_acumvar_act.loc[:, ['Descriptores','2018|T1','2018|T2', '2018|T3', '2018|T4', 
                                   '2019|T1', '2019|T2', '2019|T3', '2019|T4',
                                   '2020|T1<R>', '2020|T2<R>', '2020|T3<R>', '2020|T4<P>']]
trim_index_acumvar_act.head()
```

```
,Descriptores,2018|T1,2018|T2,2018|T3,2018|T4,2019|T1,2019|T2,2019|T3,2019|T4,2020|T1<R>,2020|T2<R>,2020|T3<R>,2020|T4<P>
21,Variación acumulada|Actividades primarias,-15.149552,-4.364366,-0.842782,4.750912,8.61668,4.662483,3.977101,3.19461,2.732188,2.438767,3.689392,0.786452
22,Variación acumulada|Actividades secundarias,-1.24697,1.057025,0.199356,0.068205,-3.722237,-5.852056,-3.952615,-2.290813,-1.274871,-14.140774,-15.070575,-13.368721
27,Variación acumulada|Actividades terciarias,-0.338849,1.520957,2.862246,2.991345,3.12161,0.940963,0.718725,0.658464,-1.485571,-8.296507,-9.094942,-8.971307
```


**Annual variation: "Variación acumulada"**

In [73]:
```python
# Keep only the annual columns
anual_index_acumvar_act = index_acumvar_act.loc[:, ['Descriptores','2018|Anual','2019|Anual','2020|Anual']]
anual_index_acumvar_act.head()
```

```
,Descriptores,2018|Anual,2019|Anual,2020|Anual
21,Variación acumulada|Actividades primarias,4.750912,3.19461,0.786452
22,Variación acumulada|Actividades secundarias,0.068205,-2.290813,-13.368721
27,Variación acumulada|Actividades terciarias,2.991345,0.658464,-8.971307
```


*As background for the second data set: to decide when to restrict economic activity in the capital, the Mexican government set up a system that takes into account 10 measures of hospitalizations, infections, and deaths.*

*Risk levels were labeled with the colors of a traffic light: green meant the numbers were low, orange denoted higher risk and a few restrictions, and red signaled a widespread outbreak that called for shutting down all nonessential businesses.*

*The calculation assigns points to each indicator according to its severity. When the points add up to more than 31, the state (or the capital city) gets a red light, which prompts a lockdown.*
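The red-light rule described above amounts to summing per-indicator points and comparing the total with a threshold. A minimal sketch: the indicator names and point values are invented for illustration, and only the "more than 31 points means red" rule comes from the text, so the function does not try to distinguish the other colors.

```python
# Hypothetical points for the 10 indicators (invented values, not official data)
points = {
    "indicator_1": 4, "indicator_2": 3, "indicator_3": 5, "indicator_4": 2,
    "indicator_5": 4, "indicator_6": 3, "indicator_7": 6, "indicator_8": 2,
    "indicator_9": 1, "indicator_10": 3,
}

RED_THRESHOLD = 31  # a total strictly above this gives a red light

def is_red(indicator_points, threshold=RED_THRESHOLD):
    """Return (total points, whether the total exceeds the red threshold)."""
    total = sum(indicator_points.values())
    return total, total > threshold

total, red = is_red(points)
print(total, "red light" if red else "not red")
```

Note that a total of exactly 31 does not trigger a red light under the "more than 31" wording.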