<h1 style="font-weight: 700;font-family: 'Poppins'; text-align: center; font-size: 37px;">
    NATIONAL COVID-19 IMMUNISATION PROGRESS
</h1>

<img src="https://c.files.bbci.co.uk/53A9/production/_115371412_gettyimages-1265248637.jpg" width="400px" />

<br />

<h1 style="
    font-weight: 700;
    font-family: 'Poppins';
    font-size: 23px;
           text-align: center;">
    <span style="color: #3CB64B;">#LINDUNG DIRI, </span><span>LINDUNG SEMUA.</span>
</h1>

<div style="text-align: center; font-size:40px;"><span>&#183; </span><span>&#183; </span><span>&#183; </span></div>

<p style="text-align:center; font-size:18px;">Bored of looking at the images below?</p>

<div style="display:flex;">
<img src="https://scontent.fkul8-1.fna.fbcdn.net/v/t1.6435-9/165277391_126509419440613_769630491692162842_n.png?_nc_cat=110&ccb=1-3&_nc_sid=730e14&_nc_ohc=HbtP1RuC9J4AX8uKpBA&_nc_ht=scontent.fkul8-1.fna&oh=3fb79e89c66403e78d229f557fa03609&oe=608C14F2" width="300px" /> 
<img src="https://scontent.fkul8-1.fna.fbcdn.net/v/t1.6435-9/165593227_126508386107383_8475892755212741435_n.png?_nc_cat=109&ccb=1-3&_nc_sid=730e14&_nc_ohc=uiRq4H8cSg0AX_F5Umt&_nc_ht=scontent.fkul8-1.fna&oh=c54370e5ff47594b042477365494a57f&oe=608CE878" width="300px" />
</div>

<br />
<p style="text-align:center;">In this notebook, we will learn to visualize daily vaccination and registration data with Python. Instead of just looking at the images JKJAV posts daily, why not make them yourself in Python?</p>

<div style="text-align: center; font-size:40px;"><span>&#183; </span><span>&#183; </span><span>&#183; </span></div>

<h1 style="background-color: #3CB64B;text-align:center; font-family: 'Poppins'">Preparation & Data Source</h1>

First, let's import all the libraries we will be using: `json` to read our GeoJSON file, which outlines the map of Malaysia (more on that later); `pandas` to read the CSV files; `plotly` to plot our interactive maps; and `geopandas` to plot the GeoJSON file.

In [None]:
import json
import pandas as pd
import plotly.express as px
import geopandas as gpd

Let's read the GeoJSON file and load it into `states`. The cell after that shows that we can also read the same file with `geopandas` and plot it directly.

In [None]:
with open('../input/malaysia-vaccination-progress/Malaysia.geojson') as file:
    states = json.load(file)

In [None]:
gpd.read_file('../input/malaysia-vaccination-progress/Malaysia.geojson').plot()
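Before plotting, it can help to peek inside the GeoJSON dict. The choropleths later in this notebook match each row to a polygon through the `short` key in every feature's `properties`. Here is a minimal sketch on a made-up two-feature GeoJSON: the state names and coordinates are invented, and it only assumes that `Malaysia.geojson` follows the standard `FeatureCollection` layout.

```python
import json

# Made-up two-feature GeoJSON for illustration only. The real
# Malaysia.geojson is assumed to use the same FeatureCollection layout,
# with a "short" entry under each feature's "properties".
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"short": "Johor"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"short": "Kedah"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}}
  ]
}
""")

# Collect the identifier each feature exposes under properties.short.
short_names = [f["properties"]["short"] for f in sample_geojson["features"]]
print(short_names)
```

Running the same list comprehension on `states` should list the identifiers that the `state` column of your DataFrames has to match.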

Read all three CSV files.

- `registered` is the daily total of registered citizens by state.
- `population` is the population aged 18 and above by state.
- `vaccinated` is the number of vaccinated citizens by state. The dataset contains both 1st dose and 2nd dose data.

You can preview a sample of each DataFrame with `head()`.

In [None]:
registered = pd.read_csv('../input/malaysia-vaccination-progress/Registered.csv')
population = pd.read_csv('../input/malaysia-vaccination-progress/Malaysia_Population_18yo.csv')
vaccinated = pd.read_csv('../input/malaysia-vaccination-progress/Vaccination.csv')

### Registration Data Preparation

In [None]:
registered.head()

As you can see above, the data is stored with one column per state, which is not good for map plotting. The ideal plotting data has the columns `date`, `state` and `number`. Not a big deal, you can do that with `pandas`. Here we use the `melt()` function, which unpivots a DataFrame from wide to long format. Then, we sort the 'melted' DataFrame by date and then by state with `sort_values()`. Lastly, we reset the index and drop the old index column with `reset_index(drop=True)`.
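To see the wide-to-long reshaping in isolation, here is a small sketch on a toy DataFrame. The dates, state names and numbers are made up and are not taken from `Registered.csv`.

```python
import pandas as pd

# Toy wide-format data: one column per state (values are invented).
wide = pd.DataFrame({
    "date": ["2021-04-01", "2021-04-02"],
    "Johor": [100, 120],
    "Kedah": [50, 65],
})

# Unpivot to one row per (date, state), then sort and renumber the index.
tidy = (
    wide.melt(id_vars="date", var_name="state", value_name="registered")
        .sort_values(by=["date", "state"])
        .reset_index(drop=True)
)
print(tidy)
```

When `value_vars` is left out, `melt()` uses every column that is not listed in `id_vars`, which is the same set the list comprehension in the next cell builds by hand.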

In [None]:
registered=pd.melt(registered,id_vars='date',value_vars=[i for i in list(registered.columns) if i !='date'],
        var_name='state',value_name='registered').sort_values(by=['date','state']).reset_index(drop=True)

In [None]:
registered.head(20)

Perfect! Next, we want the registered number as a percentage of the population. How do we do that?
1. We attach each state's population to every row. We use the `join()` function to join the `registered` DataFrame with the `population` DataFrame on the `state` column.
2. We divide the number registered by the population. The result is a fraction between 0 and 1, which we format as a percentage when plotting.


In [None]:
registered=registered.join(population.set_index('state'),on='state')
registered['percentage']=registered['registered']/registered['population']

In [None]:
registered.head()
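One thing worth checking after a join like this: a state name that appears in one table but not in the other gets `NaN` for `population`, and so for `percentage`, without any error. The sketch below shows one way to spot such rows. The frames are toy data, and "Atlantis" is a deliberately fake state.

```python
import pandas as pd

# Toy frames (values are invented); "Atlantis" has no population row.
registered_demo = pd.DataFrame({
    "date": ["2021-04-01"] * 3,
    "state": ["Johor", "Kedah", "Atlantis"],
    "registered": [100, 50, 10],
})
population_demo = pd.DataFrame({
    "state": ["Johor", "Kedah"],
    "population": [1000, 500],
})

# Same join-then-divide pattern as in the cell above.
joined = registered_demo.join(population_demo.set_index("state"), on="state")
joined["percentage"] = joined["registered"] / joined["population"]

# States that found no match in the population table end up with NaN.
unmatched = joined.loc[joined["population"].isna(), "state"].unique().tolist()
print(unmatched)
```

An empty list from the same check on the real `registered` DataFrame would mean every state found its population.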

### Vaccination Data Preparation

Let's take a look into our vaccinated DataFrame. 

In [None]:
vaccinated.head()

It's huge!! Let's break it down into 2 smaller DataFrames, called `vaccinated1` and `vaccinated2`, for recipients of 1 dose and 2 doses respectively. To do so, let's select the `date` column plus every column that has `dose1` in its name, as shown below.

In [None]:
# .copy() makes an independent DataFrame, so later changes never affect `vaccinated`
vaccinated1=vaccinated[[i for i in list(vaccinated.columns) if 'dose1' in i or i=='date']].copy()

In [None]:
vaccinated1.head()

Now, let's remove the annoying `dose1_` prefix in front of every state name by splitting each string with `split`. Every column after `date` is a state column, so we apply the split to all of them. Let's see how it works.
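One thing to watch with `split('_')[1]`: it keeps only the piece between the first and second underscore. Whether any column in `Vaccination.csv` contains a second underscore is not something this notebook shows, so the column names below are made up purely to illustrate the difference. Passing `maxsplit=1` keeps everything after the first underscore.

```python
# Hypothetical column names, not copied from Vaccination.csv.
columns = ["date", "dose1_Johor", "dose1_Negeri_Sembilan"]

# Plain split: keeps only the piece right after the first underscore.
naive = [c.split("_")[1] for c in columns[1:]]

# split with maxsplit=1: keeps everything after the first underscore.
safe = [c.split("_", 1)[1] for c in columns[1:]]

print(naive)
print(safe)
```

If none of the state columns contains a second underscore, both versions give the same result.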

In [None]:
state = [i.split('_')[1] for i in vaccinated1.columns[1:]]

In [None]:
state

Nicely done! Let's insert `date` at the first position of the `state` list, and then rename the columns of our DataFrame.

In [None]:
state.insert(0,'date')

In [None]:
vaccinated1.columns=state

In [None]:
vaccinated1.head()

Perfect! Let's melt our DataFrame like before!

In [None]:
vaccinated1=pd.melt(vaccinated1,id_vars='date',value_vars=[i for i in list(vaccinated1.columns) if i != 'date'],
        var_name='state',value_name='dose1').sort_values(by=['date','state']).reset_index(drop=True)

In [None]:
vaccinated1.head(20)

Let's repeat for dose 2!

In [None]:
# .copy() makes an independent DataFrame, so later changes never affect `vaccinated`
vaccinated2=vaccinated[[i for i in list(vaccinated.columns) if 'dose2' in i or i=='date']].copy()
vaccinated2.columns=state
vaccinated2=pd.melt(vaccinated2,id_vars='date',value_vars=[i for i in list(vaccinated2.columns) if i != 'date'],
        var_name='state',value_name='dose2').sort_values(by=['date','state']).reset_index(drop=True)

In [None]:
vaccinated2.tail(20)
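Since the dose 1 and dose 2 steps are identical apart from the prefix, they could be folded into one small helper. This is only a sketch: `tidy_dose` is a hypothetical name, and it assumes the columns are named `<dose>_<state>` as in the cells above. The demo frame uses invented numbers.

```python
import pandas as pd

def tidy_dose(vaccinated, dose):
    """Return a long-format frame with the columns date, state and `dose`."""
    prefix = dose + "_"
    # Keep the date column plus every column that starts with the prefix.
    cols = ["date"] + [c for c in vaccinated.columns if c.startswith(prefix)]
    # Strip the prefix so that only the state name remains.
    subset = vaccinated[cols].rename(
        columns=lambda c: c[len(prefix):] if c.startswith(prefix) else c
    )
    return (
        subset.melt(id_vars="date", var_name="state", value_name=dose)
              .sort_values(by=["date", "state"])
              .reset_index(drop=True)
    )

# Toy data with invented values, only to show the call.
demo = pd.DataFrame({
    "date": ["2021-04-01", "2021-04-02"],
    "dose1_Johor": [10, 20],
    "dose1_Kedah": [5, 8],
    "dose2_Johor": [1, 2],
    "dose2_Kedah": [0, 1],
})
dose1_long = tidy_dose(demo, "dose1")
dose2_long = tidy_dose(demo, "dose2")
print(dose2_long)
```

With the real data, `tidy_dose(vaccinated, 'dose1')` and `tidy_dose(vaccinated, 'dose2')` should give the same frames as `vaccinated1` and `vaccinated2`, provided the column naming assumption holds.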

<h1 style="background-color: #3CB64B;text-align:center; font-family: 'Poppins'">Data Visualization</h1>

Let's go!!

### Visualization 1: Visualize Population by state!

In [None]:
fig = px.choropleth(population, geojson=states, locations='state', 
                    featureidkey="properties.short",
                    color = 'population',
                    color_continuous_scale="Viridis",
                    center = {"lat": 6, "lon": 100.7129},
                    title = "Malaysia Population by State (Aged 18 and above)")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

### Visualization 2: Visualize Latest Registered Data by state!

In [None]:
fig = px.choropleth(registered.tail(16), geojson=states, locations='state', 
                    featureidkey="properties.short",
                    color = 'percentage',
                    range_color=(0, 0.5),
                    color_continuous_scale="Viridis",
                    center = {"lat": 6, "lon": 100.7129},
                    title = "Total Registration on {} (by Percentage)".format(registered.tail(16).iloc[0]['date']))
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

Is there something else we can do? I want to see the number and percentage too! 

In [None]:
fig = px.choropleth(registered.tail(16), geojson=states, locations='state', 
                    featureidkey="properties.short",
                    color = 'percentage',
                    range_color=(0, 0.5),
                    color_continuous_scale="Viridis",
                    center = {"lat": 6, "lon": 100.7129},
                    hover_name="state", 
                    hover_data={
                        'state': False,
                        'date': False,
                        'population': False,
                        'registered': True,
                        'percentage': ':.2%'
                    },
                    title = "Total Registration on {} (by Percentage)".format(registered.tail(16).iloc[0]['date']))
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

Note:
1. As you may have noticed, we pass `registered.tail(16)` as our data. The DataFrame is sorted by date, so the last 16 rows (one per state) hold the latest data available.
2. `featureidkey="properties.short"` is used to find the corresponding state polygon in our GeoJSON file. In our GeoJSON file, there's a special key called `short` in the `properties` field.
3. `.format()` is used to format a string. `"String is {}".format(x)` will inject the variable `x` into the `{}`.
4. `hover_data` takes a dict whose keys are column names of the `registered` DataFrame. `True` means the column is shown in the tooltip, and `False` means it is hidden. `':.2%'` is a d3-format string that means 'show as a percentage with 2 decimal places'.
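Notes 1, 3 and 4 can be tried out on their own. The sketch below uses a toy frame with invented numbers and assumes the dates are ISO-formatted strings, so that the largest string is also the latest day. It selects the latest rows by date instead of counting rows, builds the title with `.format()`, and previews the `.2%` format using Python's own format spec, which d3-format is modelled on.

```python
import pandas as pd

# Toy long-format data (values are invented).
demo = pd.DataFrame({
    "date": ["2021-04-01", "2021-04-01", "2021-04-02", "2021-04-02"],
    "state": ["Johor", "Kedah", "Johor", "Kedah"],
    "percentage": [0.10, 0.08, 0.1234, 0.09],
})

# Note 1: pick the latest day by value rather than by row count.
# Assumes ISO dates (YYYY-MM-DD), where string order equals date order.
latest_date = demo["date"].max()
latest = demo[demo["date"] == latest_date]

# Note 3: inject the date into the title string.
title = "Total Registration on {} (by Percentage)".format(latest_date)

# Note 4: the same ".2%" spec in Python: multiply by 100, keep 2 decimals.
label = format(float(latest["percentage"].iloc[0]), ".2%")

print(title)
print(label)
```

Filtering on the date keeps working if the number of states in the file ever changes, whereas `tail(16)` relies on there being exactly 16 rows per day.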

Let's ANIMATE!

In [None]:
import plotly.express as px

fig = px.choropleth(registered, geojson=states, locations='state', 
                    featureidkey="properties.short",
                    color='percentage',
                    animation_frame='date',
                    color_continuous_scale="Viridis",
                    center = {"lat": 6, "lon": 100.7129},
                    hover_name="state", 
                    hover_data={
                        'state': False,
                        'date': False,
                        'population': False,
                        'registered': True,
                        'percentage': ':.2%'
                    },
                    title = "Total Registration by State")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

In [None]:
import plotly.io as pio
pio.write_html(fig, file='registration.html', auto_open=True,include_plotlyjs="cdn")

In [None]:
!curl --upload-file ./registration.html https://transfer.sh/registration.html

### Visualization 3: Visualize Latest Vaccination Data by state!

DIY!