# Covid Vaccination Progress - Mexico Specialization
 
# Introduction 

(For better visualization watch the notebook on a desktop computer)

The last year was particularly difficult for many people, as the global pandemic took a great deal from all of us. 2021 looks to be a special year: now that covid vaccines are available, the spread should start to slow worldwide. It is a very hopeful thought for many people, including me.

I developed this project over the previous weeks as part of a certificate I wanted to earn. When I came across this dataset, I knew I wanted to do an EDA (exploratory data analysis) on it, because it holds very useful information that many people would want to see in a convenient and comprehensible form.
That was the primary motivation for this project, along with being able to analyze and understand how the world is progressing with global vaccination.

In particular, I wanted to analyze this data for my home country, Mexico, so that we Mexicans can understand how our government is handling this global pandemic and how we compare to neighboring nations.

# Technologies/Libraries Used

For this project I mainly used the following resources:

1. Numpy
2. Pandas
3. Plotly

Numpy provides fast numerical processing, while pandas is a very useful tool for processing and managing datasets. Finally, Plotly is an incredible tool for data visualization.

Everything in this notebook was programmed in Python.

In [None]:
!pip install cufflinks
!pip install plotly --upgrade
!pip install chart_studio
!pip install opendatasets

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import os
import cufflinks as cf
import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime
%matplotlib inline

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot 
init_notebook_mode(connected = True)
cf.go_offline()

In [None]:
df_covid = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")

In [None]:
df_population = pd.read_csv("../input/population-by-country-2020/population_by_country_2020.csv")
df_population = df_population[["Country (or dependency)","Population (2020)"]]
# Use the same name for Czechia as the vaccination dataset
df_population['Country (or dependency)'] = df_population['Country (or dependency)'].replace(['Czech Republic (Czechia)'],'Czechia')

countries = df_covid["country"].unique()

countriesPop = df_population["Country (or dependency)"].unique()
countriesPop.sort()

# Countries that have population data but no vaccination data
temp = list(set(countriesPop)-(set(countries)))
df_population = df_population.groupby(["Country (or dependency)"])[["Population (2020)"]].max()
df_population.drop(labels=temp ,inplace=True)

In [None]:
df_covid["date"] = pd.to_datetime(df_covid.date)
dates = df_covid["date"].unique()
countries = df_covid["country"].unique()

In [None]:
min_date = min(dates)
max_date = max(dates)

In [None]:
def enhanceData(str1,current_data):
    dictionary_list = []
    dictionary_data = dict()
    for date in dates:
        date = pd.to_datetime(date)
        
        for country in countries:
            dictionary_data = {'date':date,
                                'country': country}
            if (date,country) not in current_data:
                
                if date == min_date:
                    dictionary_data[str1]= 0.0
                else:
                    dictionary_data[str1]= np.nan
            else:
                # NaN never compares equal to itself, so test for it with pd.isna
                if date == min_date and pd.isna(current_data.get((date,country))):
                    dictionary_data[str1]= 0.0
                else:
                    dictionary_data[str1]= current_data.get((date,country))
                
            dictionary_list.append(dictionary_data)
    
    df_final = pd.DataFrame.from_dict(dictionary_list)
    return df_final

In [None]:
currentData = dict(zip(zip(df_covid.date, df_covid.country), df_covid.total_vaccinations))
df_totalVaccinations = enhanceData('total_vaccinations',currentData)
currentData = dict(zip(zip(df_covid.date, df_covid.country), df_covid.daily_vaccinations))
df_dailyVaccinations = enhanceData('daily_vaccinations',currentData)

In [None]:
df_totalVaccinations = df_totalVaccinations.sort_values(["country","date"] , ascending=[True, True])
df_dailyVaccinations = df_dailyVaccinations.sort_values(["country","date"] , ascending=[True, True])
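
The cells above build one row for every date and country pair, so that every country has an entry on every date. As a point of comparison, here is a minimal sketch of the same idea using pandas reindexing instead of explicit loops. It is not the method used in this notebook; the `records` table, its values, and the names `full_index` and `grid` are made up for illustration.

```python
import pandas as pd

# Toy records (made-up values): each country reports only on some dates.
records = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-02"]),
    "country": ["A", "A", "B"],
    "total_vaccinations": [10.0, 30.0, 5.0],
})

# Every (country, date) combination, whether it was reported or not.
full_index = pd.MultiIndex.from_product(
    [sorted(records["country"].unique()), records["date"].sort_values().unique()],
    names=["country", "date"],
)

# Reindexing inserts NaN for the combinations that have no report.
grid = (
    records.set_index(["country", "date"])["total_vaccinations"]
    .reindex(full_index)
    .reset_index()
)

# A country with no report on the first date starts at zero.
first_date = grid["date"].min()
missing_first = (grid["date"] == first_date) & grid["total_vaccinations"].isna()
grid.loc[missing_first, "total_vaccinations"] = 0.0

print(grid)
```

The result is already sorted by country and date, which is the order the later forward fill relies on.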

# Data Visualization

In the next part of the notebook, we will go through all of the data visualizations along with some explanation of the data shown. In each plot, you can zoom in by selecting part of the graph, and hover over a point to display its specific data.

## Total Vaccinations in the World 

In the following chart, we can see how the total number of vaccinations in the world has changed over time. The cell below displays the current number of vaccinations that have been administered.
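
The world total on a given date only makes sense if each country's last reported cumulative value is carried forward to the days on which it did not report. A minimal sketch of that step, using a made-up `grid` table (countries `A` and `B` and all values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy cumulative totals with reporting gaps (made-up values).
grid = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "total_vaccinations": [10.0, np.nan, 30.0, 0.0, 5.0, np.nan],
})

# Forward fill inside each country so one country's value never leaks into another.
grid = grid.sort_values(["country", "date"])
grid["total_vaccinations"] = grid.groupby("country")["total_vaccinations"].ffill()

# Sum over countries to get the world total per date.
world_total = grid.groupby("date")["total_vaccinations"].sum()
print(world_total)
```

Grouping by country before filling matters: a plain forward fill over the whole column would copy the last value of one country into the first missing rows of the next one.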

In [None]:
df_totalVaccinations = df_totalVaccinations.reset_index(drop=True)
# Carry each country's last reported total forward (fill within a country, never across countries)
df_totalVaccinations["total_vaccinations"] = df_totalVaccinations.groupby("country")["total_vaccinations"].ffill()
df_groupedVaccinations = df_totalVaccinations.groupby("date")[["total_vaccinations"]].sum()

In [None]:
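# Select the single value for the latest date instead of a one-element Series
currentVaccinesTotal = df_groupedVaccinations.loc[max_date, "total_vaccinations"]
print(f'{int(currentVaccinesTotal):,}', 'vaccine doses administered in the whole world')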

As the number shows, this is still rather low given the number of people in the world.

In [None]:
# Build the line chart trace by trace with graph_objects
fig = go.Figure()
fig.add_trace(go.Scatter( x = df_groupedVaccinations.index, y = df_groupedVaccinations.total_vaccinations,
                        mode="lines+markers",
                        line = dict(color = "LightSkyBlue")))
fig.update_layout(title='Total Vaccinations in the World',
                 xaxis_title = 'Date', yaxis_title='Number of vaccinations')

## Animation of Vaccines Progression

In the following figure, we can observe how vaccines have progressed in various countries around the world.
Some things to look at in the following animation are:

* Darker means more vaccines have been applied in that country
* Not all countries appear colored at the end of the animation as not all countries have started their vaccinations or they don't have that data publicly available
* The log2 operation was applied to the data in this animation for better visualization: the current difference between the top vaccinating countries, like the United States, and countries with few vaccinations is so large that otherwise the animation would barely appear to change.

**You can also move freely between the dates and hover over each country to examine its individual information on that date.**
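
To illustrate why the log2 transform helps, here is a small sketch with made-up totals (the labels `W` to `Z` and their values are hypothetical). Unlike the cell below, which silences the divide warning and lets zeros become `-inf`, this sketch masks zeros so they become `NaN`; that is a swapped-in variation, not the notebook's approach.

```python
import numpy as np
import pandas as pd

# Made-up totals spanning several orders of magnitude.
totals = pd.Series([0.0, 1_000.0, 1_000_000.0, 50_000_000.0],
                   index=["W", "X", "Y", "Z"])

# Mask zeros first: log2(0) is -inf, while a masked value simply stays NaN.
log_totals = np.log2(totals.where(totals > 0))

print(log_totals)
print("raw ratio Z/X:", totals["Z"] / totals["X"])
print("log ratio Z/X:", log_totals["Z"] / log_totals["X"])
```

On the raw scale `Z` is 50,000 times larger than `X`, while on the log2 scale it is less than three times larger, so both ends of the range remain distinguishable on one color scale.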

In [None]:
np.seterr(divide = 'ignore')
df_totalVaccinations["log"] = np.log2(df_totalVaccinations["total_vaccinations"])
df_totalVaccinations["number_vaccines"] = pd.to_numeric(df_totalVaccinations["total_vaccinations"], downcast = 'integer')
df_totalVaccinations["number_vaccines"] = df_totalVaccinations["number_vaccines"].apply(lambda x: f'{x:,}')
date_string = df_totalVaccinations.date.astype(str)

In [None]:
fig = px.choropleth(df_totalVaccinations,locations='country',
                    locationmode= 'country names',
                    hover_name='country',
                    hover_data=['number_vaccines'],
                    color='log',
                    color_continuous_scale=px.colors.sequential.Darkmint,
                    animation_frame=date_string,
                    projection="natural earth")

fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 100
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] =15
fig.update_layout(coloraxis_showscale=False)
fig.update_geos(visible=True, resolution=110)
fig.show()

## Total Vaccinations per Country

In the following figure, we can see the total number of vaccinations per country. For simplicity and readability, only the top 25 countries are shown.

This figure needs to be distinguished from the next ones, as they are easy to confuse. It shows the total number of vaccine shots that each country has given. That is not the number of people vaccinated, since most current vaccines require two shots of the same vaccine, and some people have received only one dose and are waiting for the second.

As in the previous figures and in the ones that follow, the darker the color, the larger the number of vaccinations applied.

In [None]:
df_totalByCountry = df_totalVaccinations.query("date == @max_date")
df_totalByCountry = df_totalByCountry.sort_values('total_vaccinations',ascending=False)
df_totalByCountry = df_totalByCountry.head(25)
topVaccinatedCountries = df_totalByCountry['country'].unique()

In [None]:
fig = px.bar(df_totalByCountry, x='country', y='total_vaccinations', text= 'total_vaccinations',
            color = 'total_vaccinations',
            color_continuous_scale=px.colors.sequential.Blugrn)
fig.update_traces(texttemplate='%{text:.2s}',textposition='outside')
fig.update_layout(uniformtext_minsize=5)
fig.update_layout(xaxis_tickangle=45)
fig.update_layout(coloraxis_showscale=False)
fig.update_layout(title='Total Vaccinations per Country',
                 xaxis_title = 'Country', yaxis_title='Number of vaccinations')

## People Vaccinated at least once

The next figure presents the number of people who have received at least one shot of a vaccine. As we can see, the number of people vaccinated per country is lower than the number of vaccines applied, for the reason mentioned earlier: many vaccines require two shots.
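
Because the people-vaccinated column is cumulative, the largest value reported by a country is also its most recent figure, even when some days are missing. A minimal sketch with made-up reports (countries `A` and `B` are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy cumulative reports with gaps (made-up values).
reports = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "people_vaccinated": [100.0, np.nan, 250.0, 40.0, np.nan],
})

# max() skips NaN, so each country keeps its highest reported value.
latest = reports.groupby("country")["people_vaccinated"].max()
print(latest)
```

This shortcut assumes the column never decreases; it would not be appropriate for a daily, non-cumulative column.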

In [None]:
df_peopleVaccinated = df_covid.groupby(["country"])[["people_vaccinated","people_fully_vaccinated"]].max()
df_peopleVaccinated = df_peopleVaccinated.sort_values("people_vaccinated",ascending = False)
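# England is already counted within the United Kingdom, so drop it to avoid listing it twice
df_peopleVaccinated = df_peopleVaccinated.drop(index='England', errors='ignore')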
df_peopleVaccinatedOnce = df_peopleVaccinated.head(25)

In [None]:
fig = px.bar(df_peopleVaccinatedOnce, x = df_peopleVaccinatedOnce.index, y='people_vaccinated' , text='people_vaccinated',
             color = 'people_vaccinated',
            color_continuous_scale=px.colors.sequential.Redor)
fig.update_traces(texttemplate='%{text:.2s}',textposition='outside')
fig.update_layout(uniformtext_minsize=5)
fig.update_layout(xaxis_tickangle=45)
fig.update_layout(margin=dict(autoexpand=True))
fig.update_layout(title="People Vaccinated at least once")
fig.update_layout(xaxis_title = 'Country',yaxis_title='Number of People Vaccinated')
fig.update_layout(coloraxis_showscale=False)
fig.show()

## People Fully Vaccinated

Here we present the number of people who are fully vaccinated, meaning they have received all the doses required by the vaccine they were given.
It is interesting to see that the number of people fully vaccinated is much lower than the number of people with at least one dose. This suggests that many people have received one dose but are still waiting for the second.

In [None]:
df_peopleVaccinated = df_peopleVaccinated.sort_values("people_fully_vaccinated",ascending = False)
df_peopleFullyVaccinated = df_peopleVaccinated.head(25)

In [None]:
fig = px.bar(df_peopleFullyVaccinated, x = df_peopleFullyVaccinated.index, y='people_fully_vaccinated' , 
             text='people_fully_vaccinated',
             color = 'people_fully_vaccinated',
             color_continuous_scale=px.colors.sequential.Purp)
fig.update_traces(texttemplate='%{text:.2s}',textposition='outside')
fig.update_layout(uniformtext_minsize=5)
fig.update_layout(xaxis_tickangle=45)
fig.update_layout(margin=dict(autoexpand=True))
fig.update_layout(title="People Fully Vaccinated")
fig.update_layout(xaxis_title = 'Country',yaxis_title='Fully Vaccinated people')
fig.update_layout(coloraxis_showscale=False)
fig.show()

## Percentage of Population Vaccinated

The next figure is very interesting, as it presents the percentage of each country's population that is fully vaccinated. It only displays the top 25 countries.
One interesting fact is that in the last two graphs the United States was at the top of the list by a wide margin, but because its population is quite large, the percentage of its population that is fully vaccinated is still quite low.

We can see that Israel is leading the chart with an impressive percentage.
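
The percentage is simply the number of fully vaccinated people divided by the population, times 100, with both tables aligned on the country name. A minimal sketch with made-up numbers (countries `A`, `B`, `C` and the column name `population` are hypothetical):

```python
import pandas as pd

# Made-up inputs; country "C" has no population entry.
fully_vaccinated = pd.Series(
    {"A": 500_000.0, "B": 2_000_000.0, "C": 100.0},
    name="people_fully_vaccinated",
)
population = pd.Series({"A": 1_000_000, "B": 40_000_000}, name="population")

# concat with axis=1 aligns the two Series on their country index.
table = pd.concat([fully_vaccinated, population], axis=1)
table["percentage"] = table["people_fully_vaccinated"] / table["population"] * 100.0

# Countries without a population value get NaN and are sorted last.
table = table.sort_values("percentage", ascending=False)
print(table)
```

Country `B` has four times as many fully vaccinated people as `A` but a much smaller percentage, which is the same effect seen with large countries in the chart.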

In [None]:
frames = [df_peopleVaccinated,df_population]
df_peopleFullyVaccinated2 = pd.concat(frames, axis=1)


df_peopleFullyVaccinated2["percentage"] = df_peopleFullyVaccinated2["people_fully_vaccinated"]/df_peopleFullyVaccinated2["Population (2020)"]*100.0
df_peopleFullyVaccinated2.sort_values(["percentage"],ascending = False, inplace = True)
df_peopleFullyVaccinated3 = df_peopleFullyVaccinated2.head(25)

In [None]:
fig = px.bar(df_peopleFullyVaccinated3, x=df_peopleFullyVaccinated3.index, y='percentage', text= 'percentage',
             color = 'percentage',
             color_continuous_scale=px.colors.sequential.Teal)
fig.update_traces(texttemplate='%{text:.1f}%',textposition='outside')
fig.update_layout(uniformtext_minsize=5)
fig.update_layout(xaxis_tickangle=30)
fig.update_layout(margin=dict(autoexpand=True))
fig.update_layout(coloraxis_showscale=False)
fig.update_layout(title="Percentage of Population Fully Vaccinated")
fig.update_layout(xaxis_title = 'Country',yaxis_title='Percentage')

## Scheme Vaccinations
   
The next figure is a little unusual but interesting. It groups countries by the combination of vaccines each one uses, which we call a vaccine scheme. For example, if Mexico mainly uses the Oxford/AstraZeneca, Pfizer/BioNTech, and Sputnik V vaccines, that combination is a vaccine scheme, and any other country using the same three vaccines falls under the same scheme.

The graph therefore presents the number of vaccines that have been applied under each vaccination scheme.
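
As a small illustration of the grouping, here is a sketch in which countries sharing exactly the same vaccine combination are summed together. The countries `A` to `D` and their totals are made up.

```python
import pandas as pd

# One row per country: its vaccine combination and total doses (made-up values).
per_country = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "vaccines": [
        "Pfizer/BioNTech",
        "Oxford/AstraZeneca, Pfizer/BioNTech",
        "Pfizer/BioNTech",
        "Oxford/AstraZeneca, Pfizer/BioNTech",
    ],
    "total_vaccinations": [100.0, 250.0, 50.0, 25.0],
})

# Countries with the identical combination string fall into the same scheme.
by_scheme = (
    per_country.groupby("vaccines")["total_vaccinations"]
    .sum()
    .sort_values(ascending=False)
)
print(by_scheme)
```

Note that the grouping compares the combination as a whole string, so a scheme that contains Pfizer/BioNTech alongside another vaccine is counted separately from Pfizer/BioNTech alone.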

In [None]:
df_vaccineTypes = df_covid.groupby(["country"])[["total_vaccinations" , "vaccines"]].max()
df_totalVaccineTypes = df_vaccineTypes.groupby(["vaccines"])[["total_vaccinations"]].sum()
df_totalVaccineTypes = df_totalVaccineTypes.sort_values(["total_vaccinations"],ascending=False)

In [None]:
fig = px.bar(df_totalVaccineTypes, x=df_totalVaccineTypes.index, y='total_vaccinations', text= 'total_vaccinations',
            color = 'total_vaccinations',
            color_continuous_scale=px.colors.sequential.Burg)
fig.update_traces(texttemplate='%{text:.2s}',textposition='outside')
fig.update_layout(uniformtext_minsize=5)
fig.update_layout(xaxis_tickangle=70)
fig.update_layout(margin=dict(autoexpand=True))
fig.update_layout(coloraxis_showscale=False)
fig.update_layout(title="Scheme Vaccinations")
fig.update_layout(xaxis_title = 'Vaccine Scheme',yaxis_title='Total Vaccinations')

# Mexico vs the World 

Now we are going to analyze in more detail how Mexico is doing with vaccine application, comparing it to other countries and to its own progress over time. We Mexicans need to keep track of how our government is doing.

In [None]:
def formatCountry(df,country):
    df_country = df[df["country"] == country]
    df_country = df_country.reset_index(drop=True)
    df_country.loc[0,["people_fully_vaccinated","daily_vaccinations_raw","daily_vaccinations"]] = 0.0
    columns = ["total_vaccinations","people_vaccinated","people_fully_vaccinated","daily_vaccinations_raw" , "daily_vaccinations" ]
    df_country[columns] = df_country[columns].ffill()
    return df_country

In [None]:
df_mexico = formatCountry(df_covid,"Mexico")

In [None]:
df_dailyVaccinations["order"] = df_dailyVaccinations["country"].map({"Mexico": 1,"United States":2}).fillna(3)
df_dailyVaccinations.sort_values(["order","country"],ascending=False, inplace=True)
df_dailyVaccinations["daily_vaccinations"] = df_dailyVaccinations["daily_vaccinations"].replace(np.nan,0)

In [None]:
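def plotOneCountry(df,country, latam = False):
    # Copy into a list: the global is a numpy array, which has no append method
    selectedCountries = list(topVaccinatedCountries)
    if country not in selectedCountries:
        selectedCountries.append(country)
    if latam == False:
        df = df.loc[df['country'].isin(selectedCountries)]
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df["order"] = df["country"].map({country: 1,"United States":2}).fillna(3)
    df.sort_values(["order","country"],ascending=False, inplace=True)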
    fig = px.line_3d(df, x = 'date', y = 'country', z= 'daily_vaccinations',
                    color='country',hover_name ='country', hover_data=["daily_vaccinations"])
    if latam == False:
        fig.update_traces(line = dict(color='lightgrey'))
    fig.update_traces(patch={"line":{"color":"red", "width":3.5}}, 
                      selector={"legendgroup":country})
    fig.update_traces(patch={"line":{"color":"blue", "width":3.5}}, 
                      selector={"legendgroup":"United States"})
    fig.update_layout(title="Daily Vaccinations {} vs The World".format(country),
                     xaxis_title = 'Date',yaxis_title='Number of Daily Vaccinations')
    if latam == True:
        fig.update_layout(title="Daily Vaccinations {} vs Latam".format(country),
                          xaxis_title = 'Date',yaxis_title='Number of Daily Vaccinations')
    fig.update_layout(
    width=800,
    height=700,
    autosize=False,
    scene=dict(
        camera=dict(
            up=dict(
                x=0,
                y=0,
                z=1
            ),
            eye=dict(
                x=-1,
                y=-1.5,
                z=1,
            )
        ),
        aspectratio = dict( x=1, y=1, z=0.7 ),
        
    ),
)
    fig.update_layout(showlegend=False)                                                                           
    fig.show()

In [None]:
def plotDaily(df,country):
    fig = px.bar(df, x ='date' , y='daily_vaccinations',
             title='Daily Vaccinations in {}'.format(country),
             text='daily_vaccinations',
             hover_name = 'country', hover_data = ['daily_vaccinations'],
             color = 'daily_vaccinations',
             color_continuous_scale=px.colors.sequential.Teal)
    fig.update_traces(texttemplate='%{text:.2s}',textposition='outside')
    fig.update_layout(uniformtext_minsize=5)
    fig.update_layout(coloraxis_showscale=False)
    fig.update_layout(hovermode='x')
    fig.update_layout(xaxis_title = 'Date',yaxis_title='Number of Daily Vaccinations')
    fig.show()

## Daily Vaccinations 

The next figure presents the daily vaccinations per country. The United States is shown as the blue line for reference, while Mexico is the red line. One point to keep in mind: a country sometimes drops to 0 vaccines for a day, which can mean either that it did not report its vaccinations that day or that it applied none.

The figure is a 3D line graph showing the top 25 countries by total vaccinations. You can rotate it to inspect each country and its daily vaccinations. The x-axis represents the date, the y-axis the country, and the z-axis the daily number of vaccinations.

In [None]:
plotOneCountry(df_dailyVaccinations,"Mexico")

## Daily Vaccinations Mexico

In the figure below we can see the daily vaccinations for Mexico alone. It started with about 50k vaccines, then stopped from around Jan 29 until Feb 16, when it spiked again. While it may seem that a lot of vaccines are being applied, the percentage of people vaccinated in Mexico is very low. As the graphs above show, Mexico is not even in the top 25 countries.

In the next couple of graphs, we will see how Mexico compares with other countries in Latin America.

In [None]:
plotDaily(df_mexico,"Mexico")

## Daily Vaccinations against LATAM

As we can see in the 3D plot below, Mexico is in roughly third place, behind Brazil and Chile. Several countries in Latin America either have no data or have not started their vaccination programs. This could point to the uneven distribution of vaccines around the world, especially to countries with a low GDP.

In [None]:
latamCountries = ['Argentina','Barbados','Bolivia','Brazil','Chile','Colombia','Costa Rica','Dominican Republic', 'Ecuador','El Salvador','Guatemala', 'Guyana' , 'Mexico','Panama', 'Paraguay', 'Peru' , 'Uruguay','Venezuela']

In [None]:
df_latam = df_dailyVaccinations.loc[df_dailyVaccinations['country'].isin(latamCountries)]
plotOneCountry(df_latam,"Mexico", True)

## Percentage of Fully Vaccinated Population in LATAM

The graph below is astonishing: only Chile and Uruguay have surpassed 10% of their populations fully vaccinated. It is shocking. Most Latam countries are even below 5% fully vaccinated.

Could this show the inequality in vaccine distribution?
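
Some countries in the list may have no row at all in the percentage table. Here is a minimal sketch of selecting a fixed list of countries while keeping the missing ones as zero. The table, the countries `A` to `C`, and the values are made up.

```python
import pandas as pd

# Made-up percentage table that lacks country "C".
percentages = pd.DataFrame({"percentage": [12.5, 3.0]}, index=["A", "B"])
wanted = ["A", "B", "C"]

# reindex keeps every requested label and fills absent ones with NaN,
# whereas .loc with a list containing a missing label raises a KeyError.
subset = percentages.reindex(wanted).fillna(0.0)
print(subset)
```

Treating a missing country as 0% is a presentation choice for the chart; it does not distinguish "no data reported" from "no one vaccinated".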

In [None]:
# reindex keeps every LATAM country, even one missing from the table (its values become NaN)
df_fullyLatam = df_peopleFullyVaccinated2.reindex(latamCountries)
df_fullyLatam['percentage'] = df_fullyLatam['percentage'].fillna(0)

In [None]:
df_fullyLatam = df_fullyLatam.sort_values(['percentage'],ascending=False)
fig = px.bar(df_fullyLatam, x=df_fullyLatam.index, y='percentage', text= 'percentage',
             color = 'percentage',
             color_continuous_scale=px.colors.sequential.Purp)
fig.update_traces(texttemplate='%{text:.2f}%',textposition='outside')
fig.update_layout(uniformtext_minsize=10)
fig.update_layout(xaxis_tickangle=30)
fig.update_layout(margin=dict(autoexpand=True))
fig.update_layout(coloraxis_showscale=False)
fig.update_layout(title="Percentage of Population Fully Vaccinated in LATAM")
fig.update_layout(xaxis_title = 'Country',yaxis_title='Percentage')

# Conclusion

There is a lot going on in the world right now, and after the unfortunate events of last year, everyone is truly hoping that it can end soon. This is my first data visualization project, and I hope it was useful to someone. You can check in daily, as it updates automatically. Let's hope that the rate of vaccination increases and we can go back to the life we had.

If you want to see the code behind the notebook, you can select the "Copy and Edit" option at the top right of the notebook, or click on the individual "Code" and "Output" buttons that appear throughout the notebook.

If you liked the project, I would really appreciate an upvote for this notebook, and feel free to leave a comment.

And don't forget to wear a mask! 

# References

A lot of resources helped me develop this project in one way or another. Here is a list of them:
* Jovian: Data Analysis with Python: Zero to Pandas Course
* Derek Banas [Plotly Tutorial](https://www.youtube.com/watch?v=GGL6U0k8WYA&t=1545s)
* Plotly [Documentation](https://plotly.com/python/)
* Pandas [Documentation](https://pandas.pydata.org/docs/)
* [Covid-19 EDA: Man vs Disease](https://www.kaggle.com/pawanbhandarkar/covid-19-eda-man-vs-disease) from Pawan Bhandarkar Notebook 
* [COVID-19 Vaccination Progress](https://www.kaggle.com/gpreda/covid-19-vaccination-progress) from Gabriel Preda
* And a lot of Stack Overflow

If you have any questions about the notebook, personal recommendations, errors, bugs, or suggestions, you can e-mail me at "miguelizondov@gmail.com".