# Geospatial Data Analysis for Environment studies 
## Lecture 12: Geospatial Analyses on the Cloud: Kaggle
#### *Name*: Siraphop SAISA-ARD (Mag)  
#### *Student ID*: 21M51964


### 1. **The selected country**: Vietnam
The following block imports the libraries and the data. Note that the daily infection data uses the name "Viet Nam", whereas the vaccination data uses "Vietnam".

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

selected_country_df1 = 'Viet Nam'
selected_country_df2 = 'Vietnam'

df1 = pd.read_csv("../input/covid19-global-dataset/worldometer_coronavirus_daily_data.csv", header=0)
df1 = df1[df1['country'] == selected_country_df1]
df2 = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv", header=0)
df2 = df2[df2['country'] == selected_country_df2]
df1.index = pd.to_datetime(df1['date'])
df2.index = pd.to_datetime(df2['date'])

df_pop = pd.read_csv("../input/population-by-country-2020/population_by_country_2020.csv", header=0)
pop_viet_df = df_pop[df_pop["Country (or dependency)"]=='Vietnam']
pop_viet = float(pop_viet_df["Population (2020)"].iloc[0])  # population as a scalar, used for per-million scaling
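
Because the two sources spell the country differently, an alternative to keeping two name variables is to normalize the names before filtering. The following is only a minimal sketch on made-up rows: the `COUNTRY_ALIASES` mapping and the `normalize_country` helper are hypothetical names introduced here, not part of either dataset.

```python
import pandas as pd

# Hypothetical alias table: maps a source-specific spelling to one canonical name.
COUNTRY_ALIASES = {"Viet Nam": "Vietnam"}


def normalize_country(names: pd.Series) -> pd.Series:
    """Return country names with known aliases replaced by the canonical spelling."""
    return names.replace(COUNTRY_ALIASES)


# Toy frames standing in for the two CSV files (values are illustrative only).
cases = pd.DataFrame({"country": ["Viet Nam", "Japan"], "daily_new_cases": [5, 7]})
vaccines = pd.DataFrame({"country": ["Vietnam", "Japan"], "daily_vaccinations": [100, 200]})

cases["country"] = normalize_country(cases["country"])
vaccines["country"] = normalize_country(vaccines["country"])

# After normalization, a single name selects the country in both frames.
selected = "Vietnam"
cases_sel = cases[cases["country"] == selected]
vaccines_sel = vaccines[vaccines["country"] == selected]
print(len(cases_sel), len(vaccines_sel))
```

With real data, the alias table would need one entry per country whose spelling differs between the sources.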

In [None]:
# For debugging
print(df1.columns)
print(df2.columns)
print(np.unique(df1['country']))
print(np.unique(df2['country']))

# Analyses
### 2. Construct a time-series and colored tables
#### 2.1) Time-series
First, both series are plotted on the same figure with separate y axes. Labeling both y axes and combining the legends takes extra work in pyplot, so plotly is used for the later plots.

In [None]:
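fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df1.index, df1['daily_new_cases'], label='Daily Cases')
ax.set_ylabel("Daily New Cases")  # left y axis
ax1 = ax.twinx()  # second y axis sharing the same x axis
ax1.plot(df2.index, df2['daily_vaccinations'], 'r', label='Daily Vaccination')
ax1.set_ylabel("Daily Vaccination")  # right y axis
# Merge the legend entries of both axes into a single legend
lines, labels = ax.get_legend_handles_labels()
lines1, labels1 = ax1.get_legend_handles_labels()
ax1.legend(lines + lines1, labels + labels1)
plt.show()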

# Improved Visualization
From here on, plotly (via `iplot`) is used for plotting instead.

In [None]:
from plotly.offline import iplot
import plotly.graph_objs as go
import plotly.express as px

daily_new_cases = go.Scatter(x=df1.index, y=df1['daily_new_cases'], yaxis='y1', name='Daily confirmed cases')
daily_vaccinations = go.Scatter(x=df2.index, y=df2['daily_vaccinations'], yaxis='y2', name='Daily Vaccinations')

layout_obj = go.Layout(title='COVID vs VACCINES', xaxis=dict(title='Date'), yaxis=dict(title='Daily cases'), yaxis2=dict(title='Daily vaccination',side='right',overlaying='y'))
fig = go.Figure(data=[daily_new_cases,daily_vaccinations],layout=layout_obj)
iplot(fig)

#### 2.2) Colored tables
In addition to the required daily COVID-19 cases and daily vaccinations, the table includes the cumulative total cases, the cumulative total cases per million people, and the daily vaccinations per million people.

In [None]:
daily_new_cases = df1["daily_new_cases"]
cumulative_total_cases = df1["cumulative_total_cases"]
permil_total_cases = cumulative_total_cases.copy()*1000000./pop_viet
permil_total_cases.rename('permillion_total_cases',inplace=True)
daily_vaccinations = df2["daily_vaccinations"]
daily_vaccinations_per_million = df2["daily_vaccinations_per_million"]
people_fully_vaccinated_per_hundred = df2["people_fully_vaccinated_per_hundred"]
# Combine all series column-wise; the shared date index aligns the rows
df_both = pd.concat([daily_new_cases,cumulative_total_cases,permil_total_cases,daily_vaccinations,daily_vaccinations_per_million,people_fully_vaccinated_per_hundred],axis=1)
style_object = df_both.style.background_gradient(cmap='jet').highlight_max('daily_new_cases').set_caption('Daily cases')
display(style_object)
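
The per-million scaling used for the table can be checked on small numbers. This is a minimal sketch, not the notebook's own code: the `per_million` helper, the toy series, and the population value are hypothetical and chosen only to make the arithmetic easy to verify.

```python
import pandas as pd


def per_million(counts: pd.Series, population: float) -> pd.Series:
    """Scale raw counts to counts per one million inhabitants."""
    return counts * 1_000_000.0 / population


# Illustrative numbers only, not taken from the datasets.
toy_cumulative = pd.Series([50, 100, 200], name="cumulative_total_cases")
toy_population = 10_000_000  # hypothetical population

toy_per_million = per_million(toy_cumulative, toy_population).rename("permillion_total_cases")
print(toy_per_million.tolist())
```

Passing the population as a scalar keeps the division independent of the shape of the population lookup.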


### 3. What is the global ranking of the selected country in the number of cases and vaccinations per number of people
#### Number of cases in Vietnam: 166th highest of 220
#### Number of vaccinations per hundred in Vietnam: 165th highest of 211

In [None]:
# Total cases per country, sorted from highest to lowest
df1 = pd.read_csv('../input/covid19-global-dataset/worldometer_coronavirus_daily_data.csv',header=0)
df1 = df1.groupby('country')['daily_new_cases'].sum().sort_values(ascending=False)
print(df1.to_string())
print(len(df1))
# Highest reported fully-vaccinated share per country, sorted from highest to lowest
df2 = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv", header=0)
df2 = df2.groupby('country')['people_fully_vaccinated_per_hundred'].max().sort_values(ascending=False)
print(df2.to_string())
print(len(df2))
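
The rankings above were read off the printed lists. As a hedged alternative, a rank can be computed directly with `Series.rank`. The sketch below uses made-up totals and placeholder country labels, so its output says nothing about the real ranking.

```python
import pandas as pd

# Toy totals per country (illustrative values, not from the datasets).
totals = pd.Series({"A": 900, "B": 300, "C": 600, "D": 100}, name="daily_new_cases")

# rank(ascending=False) assigns 1 to the largest value;
# method="min" gives tied countries the best shared rank.
ranks = totals.rank(ascending=False, method="min").astype(int)

country = "B"  # hypothetical country label
print(f"{country}: rank {ranks[country]} of {len(ranks)}")
```

With real data, countries without any reported value would need to be dropped (for example with `dropna()`) before converting the ranks to integers.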

### 4. Inspect the daily vaccinations and COVID-19 cases of the country
<p>According to the time-series plot (and the table), the vaccination campaign started on March 8th, 2021, while infections have been recorded since February 15th, 2020. The number of cases in Vietnam is comparatively low. The graph shows 4 waves in total: April 2020, August 2020, February 2021, and May 2021. The highest number of daily infections is 190, which occurred around 1 month after the peak of vaccinations. The percentage of vaccinated people is 0.03; note, however, that the value for Gibraltar appears to be invalid because it exceeds 100.</p>
<p>Paradoxically, the biggest wave of daily infections came just after the vaccination campaign was initiated, so it is suspected that the number of cases has been underreported. Moreover, the number of daily vaccinations shows a decreasing trend where an increasing one would be expected, so further investigation is needed.</p>
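
The dates discussed above (campaign start, peaks, and the gap between them) can also be extracted programmatically rather than read from the plot. The following is a minimal sketch on toy series. All values and dates are illustrative, and the variable names are assumptions introduced here.

```python
import pandas as pd

dates = pd.date_range("2021-01-01", periods=6, freq="D")
# Toy series (illustrative values); NaN marks days before any vaccination was recorded.
toy_cases = pd.Series([1, 4, 2, 9, 3, 1], index=dates)
toy_vacc = pd.Series([float("nan"), float("nan"), 30, 10, 20, 5], index=dates)

campaign_start = toy_vacc.first_valid_index()  # first date with a recorded value
vacc_peak = toy_vacc.idxmax()                  # date of the highest daily vaccinations
cases_peak = toy_cases.idxmax()                # date of the highest daily cases
lag_days = (cases_peak - vacc_peak).days       # positive if the case peak comes later

print(campaign_start.date(), vacc_peak.date(), cases_peak.date(), lag_days)
```

Applied to the notebook's data, the same calls would be made on `df2['daily_vaccinations']` and `df1['daily_new_cases']`, both of which are indexed by date.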

### 5. News reviewing
<p> Even though it is unprofessional to cite Wikipedia in an academic work, the article about COVID in Vietnam is well compiled with citations at https://en.wikipedia.org/wiki/Timeline_of_the_COVID-19_pandemic_in_Vietnam, although it is unusually descriptive and detailed. The first case was confirmed in January 2020, and the government accordingly imposed a lockdown in the district, so the infection did not spread further from this cluster. A case was then confirmed in March 2020 in Hanoi, and the number continuously increased through people who arrived from Europe, causing the 1st wave in April 2020. The second wave started in July 2020 with a case whose source of infection is unknown. Consequently, hundreds of cases branching from this case were reported across the country. The government enacted a similar strategy and was able to contain the infection, according to the source. The 3rd wave began in January 2021 and was related to a single migrant worker. The government tried the same strategy of local lockdowns, but the infection had already spread too far, and it took some time for the situation to be brought under control. Then, on the following day, the government started the vaccination campaign. Finally, the 4th wave, the highest of all time, started at the end of April. The investigation led to a patient who had just returned from abroad and a group of Chinese experts who had visited many places before testing positive. Another reason is the 4-day holiday for Reunification Day and International Workers' Day, during which people traveled. </p>
<p> The information regarding the number of daily vaccinations is consistent with the data from another source, which uses cumulative data and shows a similar decline in the administration rate. However, since the data regarding the drop in the vaccination rate is relatively new, related information is not available yet. </p>
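
To compare a cumulative series from another source with the daily numbers used here, the cumulative values can be differenced. This is a minimal sketch under the assumption that the other source reports one cumulative value per day. The numbers below are invented for illustration.

```python
import pandas as pd

dates = pd.date_range("2021-03-08", periods=5, freq="D")
# Toy cumulative doses (illustrative values only).
toy_cumulative_doses = pd.Series([100, 250, 370, 450, 500], index=dates)

# Differencing a cumulative series recovers the per-day increments.
toy_daily = toy_cumulative_doses.diff()
# The first element has no predecessor, so diff() yields NaN there.
is_declining = toy_daily.dropna().is_monotonic_decreasing

print(toy_daily.dropna().tolist(), is_declining)
```

Real daily data is noisier than this toy series, so a rolling mean would probably be needed before judging whether the rate is declining.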