<a><img src="https://miro.medium.com/max/2600/1*UgtAVUCzrnAVEpz-if-0gA.png" width="800" align="center"></a>

<h1 align="center"><font size="5">Data Visualization and Analysis of COVID-19 in Mexico</font></h1>

<p align="justify" style="text-align:justify"><font size="3">The purpose of this notebook is to give an overview of the current COVID-19 situation in Mexico, covering the most affected Mexican states and the hardest-hit countries worldwide. It can help people take useful social and health measures to slow or stop the spread of COVID-19 at the national or local level.</font></p>

**Keywords:** COVID-19, Data Visualization, Data Analysis

# Introduction

<p align="justify" style="text-align:justify"><font size="3">In late December 2019, an outbreak of a novel coronavirus disease (COVID-19; previously known as 2019-nCoV) was reported in Wuhan, China; it has subsequently affected 210 countries worldwide. In general, COVID-19 is an acute disease that resolves on its own, but it can also be deadly, with a 2% case fatality rate [1]. As of August 10, 2020, about 480 k cases had been confirmed in Mexico, with 52,298 deaths and 322 k recovered.</font></p>
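
As a rough check of the figures quoted above, the ratios can be worked out directly. This is only a sketch: the inputs are the rounded values from the paragraph ("about 480 k" and "322 k"), so the results are approximate, and the variable names are illustrative.

```python
# Figures quoted in the introduction (Mexico, as of August 10, 2020).
# "about 480 k" and "322 k" are rounded, so the results are approximate.
confirmed = 480_000
deaths = 52_298
recovered = 322_000

# Crude case fatality rate: share of confirmed cases that ended in death.
case_fatality_rate = 100 * deaths / confirmed

# Share of confirmed cases reported as recovered.
recovery_rate = 100 * recovered / confirmed

# Cases that are neither recovered nor fatal.
active = confirmed - deaths - recovered

print(f"Case fatality rate: {case_fatality_rate:.1f}%")
print(f"Recovery rate:      {recovery_rate:.1f}%")
print(f"Active cases:       {active:,}")
```

A ratio of deaths to confirmed cases depends on how many infections are actually detected, so it should be read as a crude indicator rather than as the fatality risk of the disease itself.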


<p align="justify" style="text-align:justify"><font size="3">Severe disease onset might result in death due to massive alveolar damage and progressive respiratory failure. COVID-19 poses a high risk to patients on chronic hemodialysis because of their immunosuppressed state, advanced age, and the coexistence of significant comorbidities, in particular cardiovascular disease and diabetes mellitus. It is also notable that cardiovascular risk factors are highly common among the Mexican population and are increasing at alarming rates. Cardiovascular diseases are the leading cause of death in Mexico [2].</font></p>

# Data Processing

<p align="justify" style="text-align:justify"><font size="3">Data processing occurs when data is collected and translated into usable information. Data almost never comes in a form that is ready for use. It is important for data processing to be done correctly so as not to negatively affect the data output [2].</font></p>

# Data Collection

<p align="justify" style="text-align:justify"><font size="3">Collecting data is the first step in data processing. Data is pulled from the available sources. It is important that these sources are trustworthy and well-built so that the data collected (and later used as information) is of the highest possible quality [2].</font></p>

<p align="justify" style="text-align:justify"><font size="3">Novel Coronavirus (COVID-19) epidemiological data has been available since 22 January 2020. The data loaded from the Johns Hopkins CSSE data repository comes from various sources, including the World Health Organization (WHO) and DXY.cn. The data is updated twice daily.</font></p>

[Johns Hopkins CSSE data repository](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series)

<p align="justify" style="text-align:justify"><font size="3">Fields available in the data include Province/State, Country/Region, Last Update, Confirmed, Suspected, Recovered, and Deaths.</font></p>

<p align="justify" style="text-align:justify"><font size="3">On 23 March, 2020, a new data structure was released. The current resources for the latest time-series data are:</font></p>

- time_series_covid19_confirmed_global.csv
- time_series_covid19_deaths_global.csv
- time_series_covid19_recovered_global.csv
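
Before working with the real files, it can help to see the wide layout they use: four identifying columns followed by one column per date. The sketch below parses a tiny made-up sample in that layout instead of downloading anything; the rows and values are hypothetical, and the layout check is only one possible sanity test.

```python
import io
import pandas as pd

# Tiny made-up sample in the same wide layout as the time-series files:
# four identifying columns followed by one column per date.
sample_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20\n"
    ",Country X,10.0,-20.0,0,0,1\n"
    "State A,Country Y,30.0,40.0,4,6,9\n"
)

sample = pd.read_csv(sample_csv)

# Fail early if the identifying columns are not where we expect them.
id_columns = ["Province/State", "Country/Region", "Lat", "Long"]
missing = [c for c in id_columns if c not in sample.columns]
assert not missing, f"Unexpected layout, missing columns: {missing}"

# Everything after the identifying columns is a date.
date_columns = sample.columns[len(id_columns):]
print(list(date_columns))
```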

### Import the modules 

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import altair as alt
import ipywidgets as widgets
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import display, HTML
from ipywidgets import interact, interactive, fixed, interact_manual

print('Modules are imported.')

Modules are imported.


### Import the data 

In [2]:
df_confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
df_recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
df_death = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
df_country = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')

# Data Preparation

<p align="justify" style="text-align:justify"><font size="3">Once the data is collected, it then enters the data preparation stage. Data preparation, often referred to as “pre-processing,” is the stage at which raw data is cleaned up and organized for the following stage of data processing. During preparation, raw data is diligently checked for any errors. The purpose of this step is to eliminate bad data (redundant, incomplete, or incorrect data).</font></p>
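
The three kinds of bad data named above (redundant, incomplete, incorrect) can each be handled with a short pandas step. The following is a minimal sketch on hypothetical records, not the cleaning actually applied to the CSSE files in this notebook.

```python
import pandas as pd

# Hypothetical raw records with one duplicate row, one missing value,
# and one impossible (negative) count.
raw = pd.DataFrame({
    "country": ["A", "A", "B", "C", "D"],
    "confirmed": [10, 10, 25, None, -3],
})

# Redundant data: drop exact duplicate rows.
clean = raw.drop_duplicates()

# Incomplete data: drop rows with a missing count.
clean = clean.dropna(subset=["confirmed"])

# Incorrect data: counts cannot be negative.
clean = clean[clean["confirmed"] >= 0].reset_index(drop=True)

print(clean)
```

Whether to drop or repair such rows depends on the analysis; dropping is simply the shortest option to show.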

In [3]:
df_death['Country/Region'] = df_death['Country/Region'].replace('Mainland China', 'China')
df_confirmed['Country/Region'] = df_confirmed['Country/Region'].replace('Mainland China', 'China')
df_recovered['Country/Region'] = df_recovered['Country/Region'].replace('Mainland China', 'China')

In [4]:
df_confirmed.columns = map(str.lower, df_confirmed.columns)
df_recovered.columns = map(str.lower, df_recovered.columns)
df_death.columns = map(str.lower, df_death.columns)
df_country.columns = map(str.lower, df_country.columns)

In [5]:
df_confirmed = df_confirmed.rename(columns={'province/state': 'state', 'country/region': 'country'})
df_recovered = df_recovered.rename(columns={'province/state': 'state', 'country/region': 'country'})
df_death = df_death.rename(columns={'province/state': 'state', 'country/region': 'country'})
df_country = df_country.rename(columns={'country_region': 'country'})

# Data Input

<p align="justify" style="text-align:justify"><font size="3">Data input is the first stage in which raw data begins to take the form of usable information.</font></p>

In [6]:
sorted_country = df_country.sort_values('confirmed', ascending=False).head(10)
sorted_recovered = df_country.sort_values('recovered', ascending=False).head(10)

In [7]:
confirmed_total = int(df_country['confirmed'].sum())
deaths_total = int(df_country['deaths'].sum())
recovered_total = int(df_country['recovered'].sum())
active_total = int(df_country['active'].sum())

<p align="justify" style="text-align:justify"><font size="3">As the coronavirus has reached more than 170 countries and the WHO has declared COVID-19 a global pandemic, the following lists show the ten countries most affected by COVID-19.</font></p>

### Sort the countries with the most confirmed cases

In [8]:
sorted_confirmed = df_country.sort_values('confirmed', ascending=False).head(10)
sorted_confirmed.head(10)

Unnamed: 0,country,last_update,lat,long_,confirmed,deaths,recovered,active,incident_rate,people_tested,people_hospitalized,mortality_rate,uid,iso3
178,US,2021-05-06 21:20:45,40.0,-100.0,32598405.0,580012.0,,,9894.306848,,,1.779265,840,USA
79,India,2021-05-06 21:20:45,20.593684,78.96288,21077410.0,230168.0,17280844.0,3566398.0,1527.343698,,,1.092013,356,IND
23,Brazil,2021-05-06 21:20:45,-14.235,-51.9253,14930183.0,414399.0,13269684.0,1246100.0,7024.004757,,,2.775579,76,BRA
62,France,2021-05-06 21:20:45,46.2276,2.2137,5789282.0,106011.0,368366.0,5314905.0,8869.266909,,,1.83116,250,FRA
177,Turkey,2021-05-06 21:20:45,38.9637,35.2433,4977982.0,42187.0,4626799.0,308996.0,5902.344165,,,0.847472,792,TUR
142,Russia,2021-05-06 21:20:45,61.524,105.3188,4799872.0,110366.0,4421329.0,268177.0,3289.060034,,,2.299353,643,RUS
182,United Kingdom,2021-05-06 21:20:45,55.0,-3.0,4444257.0,127843.0,14897.0,4301517.0,6546.646935,,,2.876589,826,GBR
85,Italy,2021-05-06 21:20:45,41.8719,12.5674,4082198.0,122263.0,3557133.0,402802.0,6751.694639,,,2.995029,380,ITA
162,Spain,2021-05-06 21:20:45,40.463667,-3.74922,3559222.0,78726.0,150376.0,3330120.0,7612.530252,,,2.211888,724,ESP
66,Germany,2021-05-06 21:20:45,51.165691,10.451526,3491076.0,84239.0,3121130.0,285707.0,4166.760111,,,2.412981,276,DEU


### Data table with the top hit confirmed cases

In [9]:
df = sorted_confirmed.head(10)

In [10]:
data_table = pd.DataFrame()
data_table["Country"] = df['country']
data_table["Confirmed"] = df['confirmed']
data_table["Deaths"] = df['deaths']
data_table["Recovered"] = df['recovered']

In [11]:
#hide_input
(data_table.style.set_properties(**{'text-align': 'right'}).background_gradient(cmap='Reds').hide_index()).set_caption(
    'Top 10 countries most affected by coronavirus')

Country,Confirmed,Deaths,Recovered
US,32598405.0,580012.0,
India,21077410.0,230168.0,17280844.0
Brazil,14930183.0,414399.0,13269684.0
France,5789282.0,106011.0,368366.0
Turkey,4977982.0,42187.0,4626799.0
Russia,4799872.0,110366.0,4421329.0
United Kingdom,4444257.0,127843.0,14897.0
Italy,4082198.0,122263.0,3557133.0
Spain,3559222.0,78726.0,150376.0
Germany,3491076.0,84239.0,3121130.0


# Data Exploration

<p align="justify" style="text-align:justify"><font size="3">The output/interpretation stage is the stage at which data is finally usable to non-data scientists. It is translated, readable, and often in the form of graphs, videos, images, plain text, etc.</font></p>

In [12]:
df_sorted_country = df_country.sort_values('confirmed', ascending= False)

### Bubble chart with the top 10 worst hit countries - confirmed

In [13]:
def covid_bubble_chart(n):
    df = df_sorted_country.head(n)
    fig = px.scatter(df, x="country", y="confirmed", size="confirmed", color="country",
               hover_name="country", size_max=90)
    fig.update_layout(
    title=f"The top {n} worst hit countries - Confirmed",
    xaxis_title="Countries",
    yaxis_title="Confirmed Cases",
    width = 700
    )
    fig.show();
    
    #fig.write_image("images/fig1.png")
fig = go.FigureWidget( layout=go.Layout() )
interact(covid_bubble_chart, n=10)

ipywLayout = widgets.Layout(border='solid 2px green')
ipywLayout.display='none'
widgets.VBox([fig], layout=ipywLayout)


### Treemap with the top 10 worst hit countries - confirmed

In [14]:
fig = px.treemap(sorted_confirmed, 
                 path=["country"], 
                 values="confirmed", height=700,
                 title='The top 10 worst affected countries - Confirmed Cases',
                 color_discrete_sequence = px.colors.qualitative.Prism)

fig.data[0].textinfo = 'label+text+value'
#fig.write_image("images/fig2.png")

fig.show()


### Bar chart with the top 10 worst hit countries - confirmed

In [16]:
fig = px.bar(sorted_confirmed,
    x = "country",
    y = "confirmed",
    title= "The Top 10 worst affected countries - Confirmed", 
    color_discrete_sequence=["orange"], 
    height=600,
    width=1000
)

#fig.write_image("images/fig3.png")
fig.show()

### Sort the countries with the most deaths

In [15]:
sorted_deaths = df_country.sort_values('deaths', ascending=False).head(10)
sorted_deaths.head(10)

Unnamed: 0,country,last_update,lat,long_,confirmed,deaths,recovered,active,incident_rate,people_tested,people_hospitalized,mortality_rate,uid,iso3
178,US,2021-05-06 21:20:45,40.0,-100.0,32598405.0,580012.0,,,9894.306848,,,1.779265,840,USA
23,Brazil,2021-05-06 21:20:45,-14.235,-51.9253,14930183.0,414399.0,13269684.0,1246100.0,7024.004757,,,2.775579,76,BRA
79,India,2021-05-06 21:20:45,20.593684,78.96288,21077410.0,230168.0,17280844.0,3566398.0,1527.343698,,,1.092013,356,IND
114,Mexico,2021-05-06 21:20:45,23.6345,-102.5528,2355985.0,218007.0,1877347.0,260631.0,1843.60502,,,9.253327,484,MEX
182,United Kingdom,2021-05-06 21:20:45,55.0,-3.0,4444257.0,127843.0,14897.0,4301517.0,6546.646935,,,2.876589,826,GBR
85,Italy,2021-05-06 21:20:45,41.8719,12.5674,4082198.0,122263.0,3557133.0,402802.0,6751.694639,,,2.995029,380,ITA
142,Russia,2021-05-06 21:20:45,61.524,105.3188,4799872.0,110366.0,4421329.0,268177.0,3289.060034,,,2.299353,643,RUS
62,France,2021-05-06 21:20:45,46.2276,2.2137,5789282.0,106011.0,368366.0,5314905.0,8869.266909,,,1.83116,250,FRA
66,Germany,2021-05-06 21:20:45,51.165691,10.451526,3491076.0,84239.0,3121130.0,285707.0,4166.760111,,,2.412981,276,DEU
162,Spain,2021-05-06 21:20:45,40.463667,-3.74922,3559222.0,78726.0,150376.0,3330120.0,7612.530252,,,2.211888,724,ESP


### Bar chart with the top 10 worst hit countries - Death Cases

In [16]:
figura = px.bar(
    sorted_deaths,
    x = "country",
    y = "deaths",
    title= "The top 10 worst hit countries - Death cases", # chart title
    color_discrete_sequence=["red"], 
    height=600,
    width=1000
)


#figura.write_image("images/fig4.png")
figura.show()

### Sort the countries with the most recovered cases

In [17]:
sorted_recovered = df_country.sort_values('recovered', ascending=False).head(10)
sorted_recovered.head(10)

Unnamed: 0,country,last_update,lat,long_,confirmed,deaths,recovered,active,incident_rate,people_tested,people_hospitalized,mortality_rate,uid,iso3
79,India,2021-05-06 21:20:45,20.593684,78.96288,21077410.0,230168.0,17280844.0,3566398.0,1527.343698,,,1.092013,356,IND
23,Brazil,2021-05-06 21:20:45,-14.235,-51.9253,14930183.0,414399.0,13269684.0,1246100.0,7024.004757,,,2.775579,76,BRA
177,Turkey,2021-05-06 21:20:45,38.9637,35.2433,4977982.0,42187.0,4626799.0,308996.0,5902.344165,,,0.847472,792,TUR
142,Russia,2021-05-06 21:20:45,61.524,105.3188,4799872.0,110366.0,4421329.0,268177.0,3289.060034,,,2.299353,643,RUS
85,Italy,2021-05-06 21:20:45,41.8719,12.5674,4082198.0,122263.0,3557133.0,402802.0,6751.694639,,,2.995029,380,ITA
66,Germany,2021-05-06 21:20:45,51.165691,10.451526,3491076.0,84239.0,3121130.0,285707.0,4166.760111,,,2.412981,276,DEU
37,Colombia,2021-05-06 21:20:45,4.5709,-74.2973,2934611.0,76015.0,2754940.0,103656.0,5767.38339,,,2.590292,170,COL
6,Argentina,2021-05-06 21:20:45,-38.4161,-63.6167,3071496.0,65865.0,2734465.0,271166.0,6795.980076,,,2.144395,32,ARG
138,Poland,2021-05-06 21:20:45,51.9194,19.1451,2818378.0,68993.0,2546751.0,202634.0,7446.844968,,,2.447968,616,POL
81,Iran,2021-05-06 21:20:45,32.427908,53.688046,2610018.0,73906.0,2057692.0,478420.0,3107.424976,,,2.831628,364,IRN


### Bar chart with the top 10 countries - recovered

In [18]:
figura = px.bar(
    sorted_recovered,
    x = "country",
    y = "recovered",
    title= "Top 10 countries with the most recovered cases", # chart title
    color_discrete_sequence=["green"], 
    height=600,
    width=1000
)

#figura.write_image("images/fig5.png")
figura.show()

# Time Series COVID-19 Mexico

<p align="justify" style="text-align:justify"><font size="3">Time-series data with the counts of confirmed cases, deaths, and recoveries across countries is also provided. Each of the three measures comes in its own file, and the files need to be processed before visualization. Country coordinates are included as well, so the series can be drawn on geographic plots such as choropleth maps.</font></p>
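
One common processing step is to collapse all rows of a country (one per province or state) into a single date-indexed series. The sketch below assumes the lowercase column names produced in the preparation step and uses a made-up frame; `country_series` is a hypothetical helper, not a function used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical wide-format frame using the lowercase column names
# from the preparation step; the values are made up.
wide = pd.DataFrame({
    "state": ["North", "South", None],
    "country": ["X", "X", "Y"],
    "lat": [0.0, 1.0, 2.0],
    "long": [0.0, 1.0, 2.0],
    "1/22/20": [1, 2, 5],
    "1/23/20": [3, 4, 8],
})

def country_series(df, country):
    """Sum all rows of one country into a single date-indexed series."""
    rows = df[df["country"] == country]
    # The first four columns identify the row; the rest are dates.
    totals = rows.iloc[:, 4:].sum(axis=0)
    totals.index = pd.to_datetime(totals.index, format="%m/%d/%y")
    return totals

print(country_series(wide, "X"))
```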

### Confirmed cases and deaths time series in Mexico

In [19]:
def confirmedDeathsCases(country):
    labels = ['confirmed', 'deaths']
    colors = ['blue', 'red']
    mode_size = [6, 8]
    line_size = [4, 5]
    
    df_list = [df_confirmed, df_death]
    
    
    fig = go.Figure();
    
    for i, df in enumerate(df_list):
        # Use the same column slice for x and y so both have the same length
        x_data = np.array(list(df.iloc[:, 20:].columns))
        if country == 'World' or country == 'world':
            y_data = np.sum(np.asarray(df.iloc[:, 20:]), axis=0)
            
        else:    
            y_data = np.sum(np.asarray(df[df['country'] == country].iloc[:, 20:]), axis=0)
            
        fig.add_trace(go.Scatter(x=x_data, y=y_data, mode='lines+markers',
        name=labels[i],
        line=dict(color=colors[i], width=line_size[i]),
        connectgaps=True,
        text = "Total " + str(labels[i]) +": "+ str(y_data[-1])
        ));
    
    fig.update_layout(
        title="COVID 19 cases of " + country,
        xaxis_title='Date',
        yaxis_title='Number of cases',
        margin=dict(l=20, r=20, t=40, b=20),
        paper_bgcolor="lightgrey",
        width = 800,
        
    );
    
    fig.update_yaxes(type="linear")
    fig.show();
    
    #fig.write_image("images/fig6.png")
    
    


In [20]:
interact(confirmedDeathsCases, country='Mexico')
ipywLayout = widgets.Layout(border='solid 2px green')


### Confirmed and recovered cases time series in Mexico

In [22]:
def confirmedRecoveredCases(country):
    labels = ['confirmed', 'recovered']
    colors = ['blue', 'green']
    mode_size = [6, 8]
    line_size = [4, 5]
    
    df_list = [df_confirmed, df_recovered]
    
    
    fig = go.Figure();
    
    for i, df in enumerate(df_list):
        # Use the same column slice for x and y so both have the same length
        x_data = np.array(list(df.iloc[:, 20:].columns))
        if country == 'World' or country == 'world':
            y_data = np.sum(np.asarray(df.iloc[:, 20:]), axis=0)
            
        else:    
            y_data = np.sum(np.asarray(df[df['country'] == country].iloc[:, 20:]), axis=0)
            
        fig.add_trace(go.Scatter(x=x_data, y=y_data, mode='lines+markers',
        name=labels[i],
        line=dict(color=colors[i], width=line_size[i]),
        connectgaps=True,
        text = "Total " + str(labels[i]) +": "+ str(y_data[-1])
        ));
    
    fig.update_layout(
        title="COVID 19 cases of " + country,
        xaxis_title='Date',
        yaxis_title='Number of cases',
        margin=dict(l=20, r=20, t=40, b=20),
        paper_bgcolor="lightgrey",
        width = 800,
        
    );
    
    fig.update_yaxes(type="linear")
    fig.show();
    
    #fig.write_image("images/fig7.png")
    
    


In [23]:
interact(confirmedRecoveredCases, country='Mexico')
ipywLayout = widgets.Layout(border='solid 2px green')


### Import the data 

In [24]:
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
recovered_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
country_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')

# Data Preparation

In [25]:
dates = confirmed_df.columns[4:]

confirmed_df_long = confirmed_df.melt(
    id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
    value_vars=dates, 
    var_name='Date', 
    value_name='Confirmed'
)

deaths_df_long = deaths_df.melt(
    id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
    value_vars=dates, 
    var_name='Date', 
    value_name='Deaths'
)

recovered_df_long = recovered_df.melt(
    id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
    value_vars=dates, 
    var_name='Date', 
    value_name='Recovered'
)
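
To make the effect of `melt` concrete, here is a minimal sketch on a made-up two-row frame with the same identifying columns. Each date column becomes its own row, so two rows with two dates turn into four rows.

```python
import pandas as pd

# Hypothetical wide frame: one row per region, one column per date.
wide = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["X", "Y"],
    "Lat": [0.0, 1.0],
    "Long": [0.0, 1.0],
    "1/22/20": [1, 5],
    "1/23/20": [3, 8],
})

# Every column not listed in id_vars is unpivoted into (Date, Confirmed) pairs.
long = wide.melt(
    id_vars=["Province/State", "Country/Region", "Lat", "Long"],
    var_name="Date",
    value_name="Confirmed",
)

print(long[["Country/Region", "Date", "Confirmed"]])
```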

In [26]:
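# Canada is excluded from the recovered table before merging; its recovered rows
# are assumed not to line up with the per-province rows of the other two tables.
recovered_df_long = recovered_df_long[recovered_df_long['Country/Region']!='Canada']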

full_table = confirmed_df_long.merge(
  right=deaths_df_long, 
  how='left',
  on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long']
)

full_table = full_table.merge(
  right=recovered_df_long, 
  how='left',
  on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long']
)

In [27]:
full_table['Date'] = pd.to_datetime(full_table['Date'])

In [28]:
full_table['Recovered'] = full_table['Recovered'].fillna(0)

In [29]:
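# Cruise ship records are separated from the country data.
# na=False treats missing province names as "no match".
ship_rows = (
    full_table['Province/State'].str.contains('Grand Princess', na=False)
    | full_table['Province/State'].str.contains('Diamond Princess', na=False)
    | full_table['Country/Region'].str.contains('Diamond Princess', na=False)
    | full_table['Country/Region'].str.contains('MS Zaandam', na=False)
)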
full_ship = full_table[ship_rows]

In [30]:
full_table = full_table[~(ship_rows)]

In [31]:
full_table['Active'] = full_table['Confirmed'] - full_table['Deaths'] - full_table['Recovered']
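
The order of the last steps matters: after a left merge, rows without a match have a missing `Recovered` value, and any arithmetic on a missing value yields a missing result. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical merged rows; the second row has no recovered count,
# as happens after a left merge with no matching row.
table = pd.DataFrame({
    "Confirmed": [100, 50],
    "Deaths": [5, 2],
    "Recovered": [60, None],
})

# Without this fill, the second row's Active value would be NaN.
table["Recovered"] = table["Recovered"].fillna(0)
table["Active"] = table["Confirmed"] - table["Deaths"] - table["Recovered"]

print(table)
```

Filling with 0 treats "not reported" as "none recovered", which overstates active cases wherever recoveries simply were not recorded.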

In [32]:
full_grouped = full_table.groupby(['Date', 'Country/Region'])[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()

In [33]:
# Day-over-day differences of the cumulative counts
temp = full_grouped.groupby(['Country/Region', 'Date', ])[['Confirmed', 'Deaths', 'Recovered']]
temp = temp.sum().diff().reset_index()
# The first row of each country has no previous day within that country
mask = temp['Country/Region'] != temp['Country/Region'].shift(1)
temp.loc[mask, 'Confirmed'] = np.nan
temp.loc[mask, 'Deaths'] = np.nan
temp.loc[mask, 'Recovered'] = np.nan

temp.columns = ['Country/Region', 'Date', 'New cases', 'New deaths', 'New recovered']

full_grouped = pd.merge(full_grouped, temp, on=['Country/Region', 'Date'])

full_grouped = full_grouped.fillna(0)

cols = ['New cases', 'New deaths', 'New recovered']
full_grouped[cols] = full_grouped[cols].astype('int')

# Negative differences (downward corrections in the source) are set to 0
full_grouped['New cases'] = full_grouped['New cases'].apply(lambda x: 0 if x<0 else x)

In [34]:
full_grouped.to_csv('COVID-19-time-series-clean-complete.csv', index=False)
full_clean_data = pd.read_csv('COVID-19-time-series-clean-complete.csv', parse_dates=['Date'])
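
The "New cases" column in the saved table is a day-over-day difference of cumulative counts, computed separately for each country. An alternative way to express the same idea is a grouped `diff`; the sketch below uses made-up data and is not the code that produced the file above.

```python
import pandas as pd

# Hypothetical cumulative counts for two countries over three days.
cumulative = pd.DataFrame({
    "Country/Region": ["X", "X", "X", "Y", "Y", "Y"],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"] * 2),
    "Confirmed": [1, 4, 4, 10, 9, 15],
})

cumulative = cumulative.sort_values(["Country/Region", "Date"])

# diff() within each country, so the first day of one country is never
# compared with the last day of the previous one.
new_cases = cumulative.groupby("Country/Region")["Confirmed"].diff()

# The first day has no previous value; downward corrections become 0.
cumulative["New cases"] = new_cases.fillna(0).clip(lower=0).astype(int)

print(cumulative)
```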

# Data Exploration

### Select time series of Mexico

In [35]:
countries = ['Mexico']

selected_data = full_clean_data[full_clean_data['Country/Region'].isin(countries)]

### Confirmed cases time series in Mexico

In [36]:
interval = alt.selection_interval()

circle = alt.Chart(selected_data).mark_circle().encode(
    x='monthdate(Date):O',
    y='Country/Region',
    color=alt.condition(interval, 'Country/Region', alt.value('lightgray')),
    size=alt.Size('Confirmed:Q',
        scale=alt.Scale(range=[0, 3000]),
        legend=alt.Legend(title='Confirmed Cases')
    )
    
).properties(
    width=1000,
    height=300,
    selection=interval
)

In [37]:
bars = alt.Chart(selected_data).mark_bar().encode(
    y='Country/Region',
    color='Country/Region',
    x='sum(Confirmed):Q'
).properties(
    width=1000
).transform_filter(
    interval
)

circle

### Confirmed cases time series in Mexico with linked bar chart

In [38]:
circle & bars

In [39]:
full_grouped = pd.read_csv('COVID-19-time-series-clean-complete.csv', parse_dates=['Date'])
mexico = full_grouped[full_grouped['Country/Region'] == 'Mexico']

In [40]:
base = alt.Chart(mexico).mark_bar().encode(
    x='monthdate(Date):O',
).properties(
    width=700
)

## colors of the charts
red = alt.value('#f54242')
green = alt.value('#32CD32')

### Total confirmed cases time series in Mexico

In [41]:
base.encode(y='Confirmed').properties(title='Total confirmed')

### Total death cases time series in Mexico

In [42]:
base.encode(y='Deaths',color=red).properties(title='Total Deaths') 

### Total recovered cases time series in Mexico

In [43]:
base.encode(y='Recovered',color=green).properties(title='Total Recovered')

# Data Visualization and Analysis of Covid-19 in Mexico by State

In [44]:
dataset = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/datasetCovid19Mexico.csv')

### Data table of Covid-19 in Mexico

In [45]:
data_mex = pd.DataFrame()
data_mex["State"] = dataset['Ubicación']
data_mex["Confirmed"] = dataset['Confirmados']
data_mex["Recovered"] = dataset['Personas recuperadas']
data_mex["Deaths"] = dataset['Muertes']

### Data table with the confirmed, recovered, and death cases

In [46]:
#hide_input
(data_mex.style.set_properties(**{'text-align': 'right'}).background_gradient(cmap='Reds').hide_index()).set_caption(
    'Confirmed, deaths and recovered cases')

State,Confirmed,Recovered,Deaths
Aguascalientes,21065,6676,1735
Baja California,42237,17030,6820
Baja California Sur,23541,10118,958
Campeche,7980,5421,1008
Chiapas,9350,6229,1358
Chihuahua,40845,9953,4921
Ciudad de México,482472,123000,22601
Coahuila de Zaragoza,60487,25536,5224
Colima,9507,4916,931
Durango,28876,8779,1873


### Sort the states with the most confirmed cases

In [47]:
sorted_confirmed = dataset.sort_values('Confirmados', ascending=False).head(10)
sorted_confirmed.index = np.arange(1,len(sorted_confirmed)+1)
sorted_confirmed.head(10)

Unnamed: 0,Ubicación,Confirmados,Personas recuperadas,Muertes,lon,lat,ciudad
1,Ciudad de México,482472,123000,22601,-99.1269,19.4978,ciudad de méxico
2,Jalisco,192396,25008,24494,-103.39182,20.66682,Guadalajara
3,Estado de Hidalgo,107803,11619,7389,-98.7459,20.1153,Pachuca de Soto
4,Nuevo León,107137,39428,6740,-100.31847,25.67507,Monterrey
5,Guerrero,70362,18256,8571,-99.50578,17.5506,Chilpancingo de los Bravo
6,Sonora,63062,32444,5290,-110.97732,29.1026,Hermosillo
7,Puebla,61988,28432,7118,-97.55506,19.77097,Puebla de Zaragoza
8,Coahuila de Zaragoza,60487,25536,5224,-101.25,25.4333,Saltillo
9,Tabasco,52936,29715,3511,-92.93028,17.98689,Villahermosa
10,Veracruz,51213,30060,7213,-96.91589,19.53124,Xalapa-Enríquez


### Bar chart with the top 10 worst hit states - confirmed

In [48]:
fig = px.bar(sorted_confirmed,
    x = "Ubicación",
    y = "Confirmados",
    title= "The Top 10 worst affected states - Confirmed", # chart title
    color_discrete_sequence=["orange"], 
    height=600,
    width=1000
)


#fig.write_image("images/fig8.png")
fig.show()

In [49]:
figura = px.bar(sorted_confirmed, 
             x="Confirmados", y="Ubicación", 
                title='The Top 10 worst affected states - Confirmed', 
                text='Confirmados', orientation='h', 
                color_discrete_sequence=["orange"], 
             width=800, height=700, range_x = [0, max(sorted_confirmed['Confirmados'])])

# fig.update_traces(marker_color=dth, opacity=0.6, textposition='outside')

#figura.write_image("images/fig9.png")
figura.show()

### Treemap with the top 10 worst hit states in Mexico - confirmed

In [50]:
figura = px.treemap(sorted_confirmed, 
                 path=["Ubicación"], 
                 values="Confirmados", height=800,
                 title='The Top 10 worst affected states - Confirmed',
                 color_discrete_sequence = px.colors.qualitative.Prism)
figura.data[0].textinfo = 'label+text+value'

#figura.write_image("images/fig10.png")
figura.show()

### Sort the states with the most cases of death

In [51]:
sorted_death = dataset.sort_values('Muertes', ascending=False).head(10)
sorted_death.index = np.arange(1,len(sorted_death)+1)
sorted_death.head(10)

Unnamed: 0,Ubicación,Confirmados,Personas recuperadas,Muertes,lon,lat,ciudad
1,Jalisco,192396,25008,24494,-103.39182,20.66682,Guadalajara
2,Ciudad de México,482472,123000,22601,-99.1269,19.4978,ciudad de méxico
3,Guerrero,70362,18256,8571,-99.50578,17.5506,Chilpancingo de los Bravo
4,Estado de Hidalgo,107803,11619,7389,-98.7459,20.1153,Pachuca de Soto
5,Veracruz,51213,30060,7213,-96.91589,19.53124,Xalapa-Enríquez
6,Puebla,61988,28432,7118,-97.55506,19.77097,Puebla de Zaragoza
7,Baja California,42237,17030,6820,-115.452263,32.624538,Mexicali
8,Nuevo León,107137,39428,6740,-100.31847,25.67507,Monterrey
9,Sonora,63062,32444,5290,-110.97732,29.1026,Hermosillo
10,Coahuila de Zaragoza,60487,25536,5224,-101.25,25.4333,Saltillo


### Bar chart with the top 10 worst hit states - deaths

In [52]:
figura = px.bar(
    sorted_death,
    x = "Ubicación",
    y = "Muertes",
    title= "The top 10 worst hit states - Deaths", # chart title
    color_discrete_sequence=["red"], 
    height=600,
    width=1000
)

#figura.write_image("images/fig11.png")
figura.show()

In [53]:
figura = px.bar(sorted_death, 
             x="Muertes", y="Ubicación", 
                title='The top 10 worst hit states - Deaths', text='Muertes', orientation='h', 
                color_discrete_sequence=["red"], 
             width=800, height=700, range_x = [0, max(sorted_death['Muertes'])])


#figura.write_image("images/fig12.png")
figura.show()

### Treemap with the top 10 worst hit states in Mexico - deaths

In [54]:
figura = px.treemap(sorted_death, 
                 path=["Ubicación"], 
                 values="Muertes", height=800,
                 title='The top 10 worst hit states in Mexico - Deaths',
                 color_discrete_sequence = px.colors.qualitative.Prism)
figura.data[0].textinfo = 'label+text+value'

#figura.write_image("images/fig13.png")
figura.show()

### Sort the states with the most recovered cases

In [55]:
sorted_recovered = dataset.sort_values('Personas recuperadas', ascending=False).head(10)
sorted_recovered.index = np.arange(1,len(sorted_recovered)+1)
sorted_recovered.head(10)

Unnamed: 0,Ubicación,Confirmados,Personas recuperadas,Muertes,lon,lat,ciudad
1,Ciudad de México,482472,123000,22601,-99.1269,19.4978,ciudad de méxico
2,Estado de México,31599,78038,3126,-99.65324,19.28786,Toluca de Lerdo
3,Guanajuato,31921,39608,4629,-101.258,21.0181,Guanajuato
4,Nuevo León,107137,39428,6740,-100.31847,25.67507,Monterrey
5,Sonora,63062,32444,5290,-110.97732,29.1026,Hermosillo
6,Veracruz,51213,30060,7213,-96.91589,19.53124,Xalapa-Enríquez
7,Tabasco,52936,29715,3511,-92.93028,17.98689,Villahermosa
8,Puebla,61988,28432,7118,-97.55506,19.77097,Puebla de Zaragoza
9,Tamaulipas,48283,27481,3922,-99.14599,23.74174,Ciudad Victoria
10,Coahuila de Zaragoza,60487,25536,5224,-101.25,25.4333,Saltillo


### Bar chart with the top 10 states with the most recovered cases

In [56]:
figura = px.bar(
    sorted_recovered,
    x = "Ubicación",
    y = "Personas recuperadas",
    title= "Top 10 states in Mexico with the most recovered cases", # chart title
    color_discrete_sequence=["green"], 
    height=600,
    width=1000
)

#figura.write_image("images/fig14.png")
figura.show()

In [57]:
figura = px.bar(sorted_recovered, 
             x="Personas recuperadas", y="Ubicación", 
                title='Top 10 states in Mexico with the most recovered cases', text='Personas recuperadas', orientation='h', 
                color_discrete_sequence=["green"], 
             width=800, height=700, range_x = [0, max(sorted_recovered['Personas recuperadas'])])

# fig.update_traces(marker_color=dth, opacity=0.6, textposition='outside')

#figura.write_image("images/fig15.png")
figura.show()

### Treemap with the top 10 states in Mexico with the most recovered cases

In [58]:
figura = px.treemap(sorted_recovered, 
                 path=["Ubicación"], 
                 values="Personas recuperadas", height=800,
                 title='Top 10 States in Mexico with most recovered cases',
                 color_discrete_sequence = px.colors.qualitative.Prism)
figura.data[0].textinfo = 'label+text+value'

#figura.write_image("images/fig16.png")
figura.show()

# Time Series of Covid-19 in Mexico by State

### Import the data

In [59]:
df_Defunciones = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_deaths_Mexico.csv')
df_confirmados = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_confirmed_Mexico.csv')
poblacion = df_confirmados[df_confirmados.columns[1]]

In [60]:
df_confirmados = df_confirmados.iloc[:32]
poblacion = poblacion[:-1]

### Total confirmed cases time series in Mexico by state

In [61]:
# Convert the DataFrame to a list of rows
total_confirmed = df_confirmados.values.tolist()
lista_mexico = []     # time series for each state
lista_confirmed = []  # total confirmed cases for each state
for i in range(len(df_confirmados)):
    lista = total_confirmed[i]
    # Drop the three leading metadata columns (cve_ent, poblacion, nombre)
    del lista[:3]
    lista_mexico.append(lista)
    lista_confirmed.append(sum(lista))
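
The loop above drops the three leading metadata columns and then sums each row. The same result can be sketched with pandas' row-wise `sum`, assuming the first three columns are `cve_ent`, `poblacion`, and `nombre` as in the dataset above. The toy frame `toy` and its values are made up for illustration only.

```python
import pandas as pd

# Hypothetical miniature of the confirmed-cases table: three metadata
# columns followed by one column per date (all values are made up).
toy = pd.DataFrame({
    "cve_ent": [1, 2],
    "poblacion": [1000, 2000],
    "nombre": ["A", "B"],
    "01-03-2020": [1, 0],
    "02-03-2020": [2, 5],
    "03-03-2020": [0, 3],
})

# Select only the date columns (everything after the first three).
daily = toy.iloc[:, 3:]

# One list per state, playing the role of lista_mexico.
series_per_state = daily.values.tolist()

# Sum across dates for each state, playing the role of lista_confirmed.
total_per_state = daily.sum(axis=1).tolist()
```

This avoids mutating the row lists in place and keeps the column selection in one visible step.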

### Total deaths time series in Mexico by state

In [62]:
# Convert the DataFrame to a list of rows
total_Defunciones = df_Defunciones.values.tolist()
lista_deathstates = []  # time series for each state
lista_Defunciones = []  # total deaths for each state
for i in range(len(total_Defunciones)):
    lista = total_Defunciones[i]
    # Drop the three leading metadata columns (cve_ent, poblacion, nombre)
    del lista[:3]
    lista_deathstates.append(lista)
    lista_Defunciones.append(sum(lista))

### Drop the Mexico national total

In [63]:
# The last entry is the national total, not a state
lista_Defunciones = lista_Defunciones[:-1]

### Data table with the confirmed, recovered, and death cases by state in Mexico

In [64]:
data_table = pd.DataFrame()
data_table["Region"] = df_confirmados[df_confirmados.columns[2]]
data_table["Confirmed cases"] = lista_confirmed
data_table["Confirmed cases per 100,000 people"] = np.round(100000*(data_table["Confirmed cases"] / poblacion), decimals=1).values
data_table["Confirmed deaths"] = lista_Defunciones
data_table["Confirmed deaths per 100,000 people"] = np.round(100000*(data_table["Confirmed deaths"] / poblacion), decimals=1).values

### Display the table

In [65]:
#hide_input
(data_table.style.set_properties(**{'text-align': 'right'}).background_gradient(cmap='Reds').hide_index()).set_caption(
    'Statistics by region: Confirmed cases and confirmed deaths')

Region,Confirmed cases,"Confirmed cases per 100,000 people",Confirmed deaths,"Confirmed deaths per 100,000 people"
AGUASCALIENTES,22046,1536.7,1858,129.5
BAJA CALIFORNIA,43466,1195.8,7095,195.2
BAJA CALIFORNIA SUR,24847,3087.7,1091,135.6
CAMPECHE,8177,817.2,1028,102.7
CHIAPAS,9689,169.1,1389,24.2
CHIHUAHUA,41758,1098.5,5029,132.3
DISTRITO FEDERAL,514415,5703.9,24563,272.4
COAHUILA,62688,1947.6,5451,169.4
COLIMA,9990,1272.4,993,126.5
DURANGO,29852,1597.2,1971,105.5
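
The per-capita columns in the table follow rate = 100,000 × count / population, rounded to one decimal. Below is a small sketch of that arithmetic in plain Python. The helper name `per_100k` and the sample inputs are hypothetical and are not taken from the dataset.

```python
def per_100k(count, population):
    """Return the number of events per 100,000 inhabitants, rounded to 1 decimal."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(100_000 * count / population, 1)


# Made-up example: 250 cases in a region of 2,000,000 people.
rate = per_100k(250, 2_000_000)
```

Normalizing by population makes states of very different sizes comparable, which the raw counts alone do not allow.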


In [66]:
df_confirmados = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_confirmed_Mexico.csv')

In [67]:
states = df_confirmados[df_confirmados.columns[2]]

In [69]:
# Keep the first 31 rows (the summary table above used the first 32)
df_confirmados = df_confirmados.iloc[:31]

### Drop unused columns

In [70]:
del df_confirmados['cve_ent']
del df_confirmados['poblacion']

In [71]:
df_confirmados.reset_index(drop=True, inplace=True)

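In [72]:
# Save the cleaned table so it can be reloaded with the state name as index
df_confirmados.to_csv('df_confirmados.csv', index=False)

In [73]:
# After dropping cve_ent and poblacion, the state names are in 'nombre'
states = df_confirmados['nombre']

In [74]:
# Reload with the first column (nombre) as the index
df_confirmados = pd.read_csv('df_confirmados.csv', index_col=0)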

In [75]:
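# Reshape from wide to long: one column per (date, state) pair holding
# the date, the state name, the case count, and the state's row position.
# The frame is transposed in a later cell so each pair becomes a row.
data = pd.DataFrame()
i = 0
for date in df_confirmados.keys():
    for n, nombre in enumerate(df_confirmados.index):
        data[i] = date, nombre, df_confirmados[date].loc[nombre], n
        i += 1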

data.head(4)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,6407,6408,6409,6410,6411,6412,6413,6414,6415,6416
0,12-01-2020,12-01-2020,12-01-2020,12-01-2020,12-01-2020,12-01-2020,12-01-2020,12-01-2020,12-01-2020,12-01-2020,...,05-08-2020,05-08-2020,05-08-2020,05-08-2020,05-08-2020,05-08-2020,05-08-2020,05-08-2020,05-08-2020,05-08-2020
1,AGUASCALIENTES,BAJA CALIFORNIA,BAJA CALIFORNIA SUR,CAMPECHE,CHIAPAS,CHIHUAHUA,DISTRITO FEDERAL,COAHUILA,COLIMA,DURANGO,...,QUERETARO,QUINTANA ROO,SAN LUIS POTOSI,SINALOA,SONORA,TABASCO,TAMAULIPAS,TLAXCALA,VERACRUZ,YUCATAN
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,5,0,0,0
3,0,1,2,3,4,5,6,7,8,9,...,21,22,23,24,25,26,27,28,29,30


In [76]:
#hide
df_confirmados = df_confirmados.reset_index()
regiones = df_confirmados['nombre'].values
data = data.T
data = data.rename(columns={0: "date", 1: "region", 2: "casos", 3: "codigo region"})
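
The nested loop and transpose above build the long table one (date, state) pair at a time. The same reshaping can be sketched with `DataFrame.melt`, assuming a wide frame indexed by `nombre` with one column per date. The toy frame `wide` and its values are invented, and the output column names mirror those used above.

```python
import pandas as pd

# Hypothetical wide table: one row per state, one column per date.
wide = pd.DataFrame(
    {"01-03-2020": [0, 1], "02-03-2020": [2, 3]},
    index=pd.Index(["A", "B"], name="nombre"),
)

# Move the state name back into a regular column.
tmp = wide.reset_index()

# Row position of each state, used later to order the bar segments.
tmp["codigo region"] = range(len(tmp))

# Unpivot the date columns into (date, casos) pairs, one row per pair.
long_df = tmp.melt(
    id_vars=["nombre", "codigo region"],
    var_name="date",
    value_name="casos",
)

# Match the column names and order used in the notebook.
long_df = long_df.rename(columns={"nombre": "region"})
long_df = long_df[["date", "region", "casos", "codigo region"]]
```

Because `melt` keeps the numeric dtype of the date columns, `casos` stays an integer column in this sketch.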

In [77]:
# Lift Altair's default limit on the number of rows a chart dataset may have
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [78]:
#hide
data["casos"] = data["casos"].astype(int)

## Chart with the total confirmed cases by state

In [79]:
#hide_input
input_dropdown = alt.binding_select(options=data['region'].unique())
selection1 = alt.selection_single(fields=['region'], bind=input_dropdown, name=' ')
selection2 = alt.selection_multi(fields=['region'], on='mouseover')
color = alt.condition(selection1 | selection2,
                    alt.Color('region:N', scale=alt.Scale(scheme='tableau20'), legend=None),
                    alt.value('lightgray'))

chart = alt.Chart(data).mark_bar().encode(
    x=alt.X('date:O', axis=alt.Axis(title='Date')),
    y=alt.Y('casos', axis=alt.Axis(title='Confirmed cases')),
    color=color,
    tooltip=['date', 'region', 'casos'],
    order=alt.Order(
        # Sort the segments of the bars by this field
        'codigo region',
        sort='descending'
    )
).properties(
    title='COVID-19 in Mexico: Total confirmed cases by State'
).add_selection(
    selection1, selection2
).transform_filter(
    selection1
)

legend = alt.Chart(data).mark_point().encode(
    y=alt.Y('region:N', axis=alt.Axis(orient='right'), sort=regiones),
    color=color
).add_selection(
    selection1, selection2
)

chart.properties(width=900, height=700) | legend
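
The `date` values in this table are `dd-mm-yyyy` strings, and strings in that layout do not sort chronologically. Whether a given chart is affected depends on how its axis is sorted, but parsing the dates first removes the ambiguity. Here is a minimal sketch with the standard library, assuming the `dd-mm-yyyy` format seen in the column headers. The sample list `labels` is illustrative.

```python
from datetime import datetime

# Sample date labels in the dd-mm-yyyy layout used by the dataset's columns.
labels = ["17-03-2020", "01-08-2020", "05-08-2020", "18-03-2020"]

# Plain string sorting compares the day first, so August lands before March.
alphabetical = sorted(labels)

# Parsing with an explicit format gives a true chronological order.
chronological = sorted(labels, key=lambda s: datetime.strptime(s, "%d-%m-%Y"))
```

With pandas, the equivalent parsing step would be `pd.to_datetime(data["date"], format="%d-%m-%Y")`, after which the column can be treated as a temporal field.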

### Import the time series of deaths

In [80]:
df_Defunciones = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_deaths_Mexico.csv')

In [81]:
df_Defunciones = df_Defunciones.iloc[:31]

In [82]:
del df_Defunciones['cve_ent']
del df_Defunciones['poblacion']
df_Defunciones.head(5)

Unnamed: 0,nombre,17-03-2020,18-03-2020,19-03-2020,20-03-2020,21-03-2020,22-03-2020,23-03-2020,24-03-2020,25-03-2020,...,27-07-2020,28-07-2020,29-07-2020,30-07-2020,31-07-2020,01-08-2020,02-08-2020,03-08-2020,04-08-2020,05-08-2020
0,AGUASCALIENTES,0,0,0,0,0,0,0,0,0,...,2,3,2,1,2,0,2,1,1,0
1,BAJA CALIFORNIA,0,0,0,0,0,0,0,0,0,...,13,15,12,21,14,8,13,14,5,0
2,BAJA CALIFORNIA SUR,0,0,0,0,0,0,0,0,0,...,5,6,8,7,2,0,1,2,0,0
3,CAMPECHE,0,0,0,0,0,0,0,0,0,...,5,5,11,3,7,8,3,2,0,0
4,CHIAPAS,0,0,0,0,0,0,0,0,0,...,6,2,6,4,6,6,4,7,3,1


In [83]:
del df_Defunciones['17-03-2020']
df_Defunciones.head(5)

Unnamed: 0,nombre,18-03-2020,19-03-2020,20-03-2020,21-03-2020,22-03-2020,23-03-2020,24-03-2020,25-03-2020,26-03-2020,...,27-07-2020,28-07-2020,29-07-2020,30-07-2020,31-07-2020,01-08-2020,02-08-2020,03-08-2020,04-08-2020,05-08-2020
0,AGUASCALIENTES,0,0,0,0,0,0,0,0,0,...,2,3,2,1,2,0,2,1,1,0
1,BAJA CALIFORNIA,0,0,0,0,0,0,0,0,0,...,13,15,12,21,14,8,13,14,5,0
2,BAJA CALIFORNIA SUR,0,0,0,0,0,0,0,0,0,...,5,6,8,7,2,0,1,2,0,0
3,CAMPECHE,0,0,0,0,0,0,0,0,0,...,5,5,11,3,7,8,3,2,0,0
4,CHIAPAS,0,0,0,0,0,0,0,0,0,...,6,2,6,4,6,6,4,7,3,1


In [84]:
df_Defunciones.to_csv('df_Defunciones.csv', index=False)

In [85]:
df_Defunciones = pd.read_csv('df_Defunciones.csv', index_col=0)

In [86]:
df_Defunciones.head(5)

nombre,18-03-2020,19-03-2020,20-03-2020,21-03-2020,22-03-2020,23-03-2020,24-03-2020,25-03-2020,26-03-2020,27-03-2020,...,27-07-2020,28-07-2020,29-07-2020,30-07-2020,31-07-2020,01-08-2020,02-08-2020,03-08-2020,04-08-2020,05-08-2020
AGUASCALIENTES,0,0,0,0,0,0,0,0,0,0,...,2,3,2,1,2,0,2,1,1,0
BAJA CALIFORNIA,0,0,0,0,0,0,0,0,0,0,...,13,15,12,21,14,8,13,14,5,0
BAJA CALIFORNIA SUR,0,0,0,0,0,0,0,0,0,0,...,5,6,8,7,2,0,1,2,0,0
CAMPECHE,0,0,0,0,0,0,0,0,0,0,...,5,5,11,3,7,8,3,2,0,0
CHIAPAS,0,0,0,0,0,0,0,0,0,0,...,6,2,6,4,6,6,4,7,3,1


In [87]:
#hide
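# Reshape from wide to long, as done for the confirmed cases:
# one column per (date, state) pair, transposed in the next cell.
new_data = pd.DataFrame()
i = 0
for date in df_Defunciones.keys():
    for n, nombre in enumerate(df_Defunciones.index):
        new_data[i] = date, nombre, df_Defunciones[date].loc[nombre], n
        i += 1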

In [88]:
#hide
data = df_Defunciones.reset_index()
regiones = data['nombre'].values
new_data = new_data.T
new_data = new_data.rename(columns={0: "date", 1: "region", 2: "fallecidos", 3: "codigo region"})

In [89]:
#hide
new_data["fallecidos"] = new_data["fallecidos"].astype(int)

## Chart with the total deaths by state

In [90]:
#hide_input

input_dropdown = alt.binding_select(options=new_data['region'].unique())
selection1 = alt.selection_single(fields=['region'], bind=input_dropdown, name=' ')
selection2 = alt.selection_multi(fields=['region'], on='mouseover')

color = alt.condition(selection1 | selection2,
                    alt.Color('region:N', scale=alt.Scale(scheme='tableau20'), legend=None),
                    alt.value('lightgray'))

bars = alt.Chart(new_data).mark_bar().encode(
    x=alt.X('date:O', axis=alt.Axis(title='Date')),
    y=alt.Y('fallecidos', axis=alt.Axis(title='Confirmed deaths')),
    color=color,
    tooltip=['date', 'region', 'fallecidos'],
    order=alt.Order(
        # Sort the segments of the bars by this field
        'codigo region',
        sort='descending'
    )
).properties(
    title='COVID-19 in Mexico: Total confirmed deaths by State'
).add_selection(
    selection1, selection2
).transform_filter(
    selection1
)

legend = alt.Chart(new_data).mark_point().encode(
    y=alt.Y('region:N', axis=alt.Axis(orient='right'), sort=regiones),
    color=color
).add_selection(
    selection1, selection2
)

bars.properties(width=800, height=600) | legend

In [91]:
display(HTML("<div style = 'background-color: #504e4e; padding: 30px '>" +
             "<span style='color: #FF8000; font-size:30px;'></span>" +
             "<span style='color: #FF0000; font-size:30px;margin-left:20px;'></span>"+
             "<span style='color: #66CC00; font-size:30px; margin-left:20px;'></span>"+
             "</div>")
       )