## Course: CSI-703 Scientific and Statistical Visualization<br>Instructor: Holly Russo, PhD

### Author: Samiul Islam


#### It contains:
- 

In [None]:
from IPython.display import IFrame
from IPython.display import Image
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px
import geopandas as gpd
import pickle
from nltk.corpus import stopwords
import string
from nltk.stem.wordnet import WordNetLemmatizer
from wordcloud import WordCloud
import folium
from folium.plugins import StripePattern
import warnings
warnings.filterwarnings("ignore")

### Assignment 1: Visualization Experience Questionnaire

In [None]:
IFrame("Assignment_Files/Assignment1/Assignment 1_ISLAM_SAMIUL_SKILLS.pdf", width=900, height=400)

### Assignment 2: Python or R: "Hello !"

In [None]:
n = "Samiul Islam"
g = "G01201813"
msg = "My name is "+n+" and my G number is "+g
print(msg)

# web app using streamlit
import streamlit as st
name = st.text_input('Name', '')
g_num = st.text_input('G-Number', '')
st.write('Hello', name, '-', g_num)

### Assignment 3: What does that even mean?

This is a snapshot (included in the next cell) of a visualization framework that I did as a part of another project. The actual framework is built as a web app with the help of Java & JavaScript, and the different figures attached here are actually different stages of the same web app.

In [None]:

Image("Assignment_Files/Assignment3/Assignment 3_Snapshot of Covid 19 Data Visualization Framework.png")

### Assignment 4: I just can't see it.

I submitted the same figure/snapshot (see the previous cell) from the assignment 3 here.

### Assignment 5: Tell me your secrets.

All the questions and related answers of this assignment are discussed below:

In [None]:
# Take input from the excel file and load it to a dataframe
df = pd.read_excel('Assignment_Files/Data/forestfires.xlsx')
# Print the head to see what the data looks like
print(df.head())

Question 5.1: Is there any particular season (or months) where a forest fire is more devastating?

N.B.: Can this even be answered from the given dataset?

In [None]:
# Aggregate the data on monthly basis
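# numeric_only=True keeps the sum to numeric columns (otherwise newer pandas also tries to sum the 'day' strings)
df_sum = df.groupby('month').sum(numeric_only=True)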
# subset the dataframe by keeping only the 'burnt area'
# month is not an attribute here but index
df_area = df_sum[['area']]

# Print it to see what it looks like; the grouped data is not ordered by month (Jan to Dec)
print('\nIncorrectly ordered:')
print(df_area)
# Re-indexing the dataframe to fix the order of the months
df_area = df_area.reindex(index = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct',
       'nov', 'dec'])
# Printing out to see if the order has been fixed
print('\nCorrectly ordered:')
print(df_area)

In [None]:
# Plot the aggregated area for each month to see how devastating each month was.
# Set the size of the plot for a high-resolution plot and setup the fonts for ticks, legend, and title.
df_area.plot.bar(figsize=(15, 10))
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.legend(fontsize=20)
plt.xlabel('Month',fontsize=20)
plt.ylabel('Burnt Area (HA)',fontsize=20)
plt.title('Burnt Forest Area by Month',fontsize=35)

Answer 5.1: The plot above suggests that the situation worsens during August and September; the fire season seems to start in July and end in October.
However, this plot alone can be misleading. We sum all records to calculate the burnt area of each month without considering how balanced the data is. For example, we may have more samples for August and September than for any other month, so their monthly totals become very high compared to the others.

In [None]:
# Calculate how many samples we have for each month
dataFreqByMonth = (df.groupby('month').size())
# Re-index to fix the order of the months
dataFreqByMonth=dataFreqByMonth.reindex(index = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct',
       'nov', 'dec'])
# Print out to see how balanced the data is
dataFreqByMonth

Answer 5.1 Cont.: We can tell from the above statistics that the data acquisition is not well distributed among months. I would not conclude without having more info regarding how the acquisition is happening. If the acquisition is triggered by some set of input parameters that successfully detects the chances of fire, then this imbalanced data is representative. If this is not the case, then further processing is needed (i.e., we can plot the area per record by month).
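
The "area per record by month" idea mentioned above can be sketched as follows. This is a minimal illustration on a hypothetical toy dataframe (`toy` and its values are made up, not taken from the forest-fire data); with the real data, `df` would take its place.

```python
import pandas as pd

# Hypothetical toy records; the notebook's forest-fire `df` would be used instead.
toy = pd.DataFrame({
    "month": ["aug", "aug", "aug", "sep", "sep", "dec"],
    "area":  [0.0, 10.0, 20.0, 4.0, 6.0, 12.0],
})
month_order = ["aug", "sep", "dec"]

# Total burnt area, number of records, and mean burnt area per record for each month.
per_month = (
    toy.groupby("month")["area"]
       .agg(total="sum", records="count", mean_per_record="mean")
       .reindex(month_order)
)
print(per_month)
```

Comparing `total` with `mean_per_record` shows whether a month looks severe only because it has more records.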

In [None]:
sns.set(rc = {'figure.figsize':(15,8)})
sns.set_theme(style="whitegrid")
ax = sns.violinplot(x="month", y="area", data=df, gridsize=500, width=1.2, order=['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct',
       'nov', 'dec'])

Answer 5.1 Cont.: The violin plot above is fancier and requires less effort but contains more information than the barplot above. It automatically groups the data by months with simple commands and shows how the data points are distributed each month.

Question 5.2: Let's assume 'area' is our target.
How correlated is each weather/dryness-related index with 'area'? Also, how correlated are they with each other?

List of the weather/dryness related features:
- The Fine Fuel Moisture Code (FFMC) represents the fuel moisture of forest litter fuels under the shade of a forest canopy
- The Duff Moisture Code (DMC) represents fuel moisture of decomposed organic material underneath the litter
- The Drought Code (DC), much like the Keetch-Byrum Drought Index, represents drying deep into the soil
- The Initial Spread Index (ISI) is a numeric rating of the expected rate of fire spread
- Temperature
- RH: Relative Humidity
- Wind
- Rain

In [None]:
# Generate a correlation matrix
df_corr = df[['FFMC', 'DMC', 'DC', 'ISI', 'temp', 'RH', 'wind', 'rain', 'area']]

corr = df_corr.corr()
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
corr[mask] = np.nan
(corr
 .style
 .background_gradient(cmap='PuOr_r', axis=None, vmin=-1, vmax=1)
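 .highlight_null(color='#f1f1f1')  # pandas >= 1.5; older versions use null_color=
 .format(precision=3))             # replaces set_precision, which was removed in pandas 2.0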

In [None]:
#Generate pair plot

sns.set_context( rc={"font_scale":30})

g = sns.pairplot(df_corr, 
    kind='reg',
    diag_kind="hist",
    corner=True,
    )

Answer 5.2: There is almost no correlation between the area and any other variable, as both the correlation matrix and the pair plot show. I included a trendline in the pair plot, and if we focus on the 'area' row, we see completely flat lines for each pair.

There is some correlation between other pairs of variables, such as DMC & DC, FFMC & ISI, and RH & temp (negative correlation).
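
To read the same conclusion off as numbers, the correlations with the target can be ranked by absolute value. A minimal sketch on hypothetical values (the `toy` frame is made up; in the notebook, `df_corr` would take its place):

```python
import pandas as pd

# Hypothetical values chosen only to illustrate the ranking step.
toy = pd.DataFrame({
    "temp": [10.0, 15.0, 20.0, 25.0, 30.0],
    "RH":   [80.0, 70.0, 55.0, 45.0, 30.0],
    "area": [0.0, 3.0, 1.0, 0.0, 2.0],
})

corr = toy.corr()

# Correlation of every other variable with the target, strongest (by magnitude) first.
with_area = corr["area"].drop("area").sort_values(key=abs, ascending=False)
print(with_area)

# Pairwise correlation between two predictors, e.g. temperature and humidity.
print(corr.loc["temp", "RH"])
```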

Question 5.3: Do they (the weather/dryness variables) show any relation between them if we group them by month?

In [None]:
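# Convert the month column from string to int.
# The result is assigned back to the column because newer pandas versions may not apply a chained in-place replace to df.
month_to_int = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6, "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}
df["month"] = df["month"].replace(month_to_int).astype(int)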
df

In [None]:
# Draw a parallel coordinates plot color-coded by month to see if the variables show any relation
fig = px.parallel_coordinates(df, color="month",
                             color_continuous_scale=px.colors.diverging.Tealrose,
                             color_continuous_midpoint=6)
fig.show()

Answer 5.3: Although I couldn't find a way to assign distinct colors to each of the months, we can tell that the variables show some relations when we group them by month. For example, the teal-colored lines are basically the first quarter of the year, and they show high values for FFMC but low values for DMC and DC. On the other hand, the last quarter of the year shows high values for DC while showing similar characteristics for the other two variables.
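
One possible workaround for the distinct-colors problem is a stepped (piecewise-constant) color scale, where each month gets its own flat band. The sketch below only builds the scale; the helper name `stepped_colorscale` is hypothetical, and the Plotly call in the trailing comment has not been run against this dataset.

```python
def stepped_colorscale(colors):
    """Build a piecewise-constant colorscale: one flat band per color."""
    n = len(colors)
    scale = []
    for i, color in enumerate(colors):
        scale.append([i / n, color])        # start of this color's band
        scale.append([(i + 1) / n, color])  # end of this color's band
    return scale


# Twelve colors, one per month (same names as the pie-chart palette in this notebook).
month_colors = ['lightcoral', 'lightskyblue', 'yellow', 'yellowgreen', 'grey', 'pink',
                'blue', 'darkgreen', 'cyan', 'magenta', 'violet', 'gold']
scale = stepped_colorscale(month_colors)

# Hypothetical usage with the notebook's dataframe (not executed here):
# fig = px.parallel_coordinates(df, color="month",
#                               color_continuous_scale=scale,
#                               range_color=(0.5, 12.5))
```

With `range_color=(0.5, 12.5)`, month `m` maps to the position `(m - 0.5) / 12`, which falls in the middle of the m-th band.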

### Assignment 6: Data munging is fun!

All the questions and related answers of this assignment are discussed below:

In [None]:
# Print out the shape of the dataframe to see how many rows and columns we have
print(df.shape)

Answer 6.1: Here, we have 13 columns/variables and a total of 517 rows/observations in the data.

Question 6.2: What are the different data types in the dataset (e.g., string, Boolean, integer, floating, date/time, categorical, etc.)?

In [None]:
# Print out the list of variables and associated data types
print(df.dtypes)

Answer 6.2: We have three different data types here in this dataframe: Integer (int64), Float (float64), and Object. X, Y, and RH variables are integer variables, month and day are object types, and the rest of the variables are float types.

Question 6.3: What variables would you rename to make your visualization look better?

Answer 6.3: For this particular dataset, I would keep the variable names as they are since they are abbreviated well. For example, the 'DC' variable represents the DC (Drought Code) index from the FWI (Fire Weather Index) system. All other variables are likewise short, concise versions of their definitions.

Question 6.4: Describe any missing values. Using the rule of thumb in Data Visualization Made Simple, would you remove those rows or columns?

In [None]:
# Check if there are any missing values in the dataset
df.isna().sum()

Answer 6.4: If we follow the above code snippet, we can see that we do not have any missing values in the dataset; hence, we do not have to remove observations or variables here. However, if we had seen some missing values in the data, it would need to be adjusted accordingly. For example, let's assume for almost all of these records, we found that the value of RH is missing. In that case, we would have removed this variable instead of eliminating the observations. On the other hand, if we had seen temp has null for three observations in the dataset, we could have removed only those three observations. There are different techniques for estimating a missing value so that we do not need to remove those observations.
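
The two cases described above (a variable that is missing almost everywhere versus a few incomplete observations) can be sketched as follows. The 50% column threshold, the helper name `drop_missing`, and the toy values are all assumptions made for illustration; they are not taken from the book or from the dataset.

```python
import numpy as np
import pandas as pd


def drop_missing(frame, col_threshold=0.5):
    """Drop columns that are mostly missing, then drop rows with any remaining gaps."""
    missing_share = frame.isna().mean()  # fraction of missing values per column
    keep_cols = missing_share[missing_share <= col_threshold].index
    return frame[keep_cols].dropna()


# Hypothetical data: RH is missing almost everywhere, temp is missing once.
toy = pd.DataFrame({
    "temp": [18.0, np.nan, 21.5, 25.1, 19.3],
    "RH":   [np.nan, np.nan, np.nan, np.nan, 40.0],
    "area": [0.0, 1.2, 0.0, 5.4, 0.3],
})

cleaned = drop_missing(toy)
print(cleaned)

# Alternative to removing the row: estimate the missing value, here with the median.
imputed_temp = toy["temp"].fillna(toy["temp"].median())
```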

Question 6.5: What other cleaning / prep steps would you do, based on the advice in Data Visualization Made Simple?

In [None]:
# For example, let's say we want to work on a subset of this forest fire dataset where we only consider the observations with some burnt area.
# That is, we only want to consider the observations where the 'area' value is greater than 0.0.
# From the .head() output above, it seems that the value of rain is mostly zero.
# But we cannot tell that with certainty by only looking at the first five observations.
# So, let's see how many observations have rain (a positive value) out of these 517 rows.
print('Observations with rain:',len(df[df["rain"] > 0]))

In [None]:
# Preparing a new dataframe by keeping observations with burnt area only
df_burnt = df[df["area"] > 0]

print('Observations with rain after dropping no-burnt area(s):',len(df_burnt[df_burnt["rain"] > 0]))

# Since there are only 8 records with positive values for rain, I am dropping rain for the further analysis while selecting observations with burnt areas only
#df_burnt = df[df["area"] > 0][['X','Y','month','day','FFMC', 'DMC', 'DC', 'ISI', 'temp', 'RH', 'wind', 'area']]
print('\n',df_burnt)

Answer 6.5: Here, I am assuming that my further analysis will only require observations of burnt area. So, I have excluded those observations that do not have positive values in the burnt area variable and got 270 observations out of those 517. Also, I have seen only eight records with rain in the whole dataset, and after filtering for the burnt areas, there remain only 2. So, I have then decided to drop the 'rain' column.
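
The filter-then-drop step described above can be written in a couple of lines. A minimal sketch on a hypothetical toy frame (the values are made up; in the notebook, `df` would take the place of `toy`):

```python
import pandas as pd

# Hypothetical records standing in for the forest-fire data.
toy = pd.DataFrame({
    "temp": [18.0, 22.0, 25.0, 30.0],
    "rain": [0.0, 0.2, 0.0, 0.0],
    "area": [0.0, 0.0, 3.5, 10.1],
})

# Keep only observations with some burnt area.
burnt = toy[toy["area"] > 0]

# Share of the remaining observations that recorded any rain.
rainy_share = (burnt["rain"] > 0).mean()

# Drop the nearly constant 'rain' column for the further analysis.
burnt_no_rain = burnt.drop(columns=["rain"])
print(rainy_share, list(burnt_no_rain.columns))
```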

### Assignment 7: Write your own evaluation.

In [None]:
IFrame("Assignment_Files/Assignment7/Assignment 7_Islam Samiul Evaluation Criteria for Reviewing Graphics.pdf", width=900, height=600)

### Assignment 8: Visualizing correlation, comparisons, and trends.

All the questions and related answers of this assignment are discussed below:

Question 8.1: Do the weather/dryness variables show any relation between them if we group them by month? Below is the list of those variables.

List of the weather/dryness related features:
- The Fine Fuel Moisture Code (FFMC) represents the fuel moisture of forest litter fuels under the shade of a forest canopy
- The Duff Moisture Code (DMC) represents fuel moisture of decomposed organic material underneath the litter
- The Drought Code (DC), much like the Keetch-Byrum Drought Index, represents drying deep into the soil
- The Initial Spread Index (ISI) is a numeric rating of the expected rate of fire spread
- Temperature
- RH: Relative Humidity
- Wind
- Rain

In [None]:
# Draw a parallel coordinates plot color-coded by month to see if the variables show any relation
fig = px.parallel_coordinates(df, color="month",
                             color_continuous_scale=px.colors.diverging.Tealrose,
                             color_continuous_midpoint=6,
                              title="Relations between weather/dryness variables while grouping them by month")
fig.show()

Answer 8.1: Although I couldn't find a way to assign distinct colors to each of the months, we can tell that the variables show some relations when we group them by month. For example, the teal-colored lines are basically the first quarter of the year, and they show high values for FFMC but low values for DMC and DC. On the other hand, the last quarter of the year shows high values for DC while showing similar characteristics for the other two variables.

### Assignment 9: Visualizing distributions and part-to-whole

All the questions and related answers of this assignment are discussed below:

Question 9.1: How is the area of forest fire distributed over months?

In [None]:
df_pie_plot = df_sum[['wind', 'area']]
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
df_pie_plot = df_pie_plot.loc[months]
x = df_pie_plot.index.tolist()
y = df_pie_plot['area'].tolist()
percent = [100*ey / sum(y) for ey in y]
colors = ['lightcoral','lightskyblue','yellow','yellowgreen','grey','pink','blue','darkgreen','cyan','magenta','violet','gold']
patches, texts = plt.pie(y, colors = colors, counterclock=False, startangle=-270, radius=1.5)
labels = ['{0} - {1:1.2f} %'.format(i,j) for i,j in zip(x, percent)]
plt.title('Part-to-whole\nDistribution of\nForest Fire\n\n', loc='center',fontsize=35)

plt.legend(patches, labels, title = 'Month', loc='center', bbox_to_anchor=(-0.1, 1.),
           fontsize=18)

In [None]:
# Plot the aggregated area for each month to see how devastating each month was.
# Set the size of the plot for a high-resolution plot and setup the fonts for ticks, legend, and title.
df_bar_plot = df_area.loc[months]
x = df_bar_plot.index.tolist()
y = df_bar_plot['area'].tolist()
x_pos = np.arange(len(x))
print(x_pos)
plt.figure(figsize=(10, 10), dpi=500)

plt.bar(x_pos, y, color=['lightcoral','lightskyblue','yellow','yellowgreen','grey','pink','blue','darkgreen','cyan','magenta','violet','gold'])
plt.xticks(x_pos, x)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
#plt.legend(fontsize=18)
plt.xlabel('\nMonth',fontsize=18)
plt.ylabel('Burnt Area (HA)\n',fontsize=18)
plt.title('Burnt Forest Area by Month\n',fontsize=35)
plt.legend('',frameon=False)

Answer 9.1: The bar and pie plots above represent the distribution of the affected area over the months, and each conveys different key information about that distribution. The bar plot tells us the actual burnt area by month (including the unit), while the pie chart shows the corresponding percentages visually. In addition, the same color palette is used in both charts so that readers can compare them easily.

I could have avoided these two charts and instead made a single one (either of the two) with all the information from both charts (actual burnt area, percentage, and color palette to identify the corresponding months). However, I thought that route could overwhelm readers and undermine the actual goal of conveying the information.

### Assignment 10: Visualizing geospatial data

All the questions and related answers of this assignment are discussed below:

In [None]:
# import relevant data
data=pd.read_csv("Assignment_Files/Data/zipcode_wise_tweet_count_IN.csv")

In [None]:
# load the selected subset of the US map (shapefile) containing IN only.
# I am using a backup of the partial shapefile.
# If you want to learn more about how I selected the subset and made the backup, you may
# want to go over this file: 'Assignments/Assignment5_6_8_9_10_11_12/ISLAM_SAMIUL.ipynb' in
# my GitHub repo.
with open('Assignment_Files/Data/us_map.pickle', 'rb') as handle:
    us_map = pickle.load(handle)

In [None]:
# print the head of the shapefile and of the data to see what they look like
print(us_map.head())
print(data.head())

In [None]:
# merge them based on ZCTA5CE10 which is zip-code
map_data = us_map.merge(data, on="ZCTA5CE10")

In [None]:
# print the newly created map_data to see what it looks like
map_data.head()

Question 10.1: Given that we have access to a dataset that contains the number of tweets grouped by zip codes made from Indiana during 2014, we would like to know how they are geographically distributed. What are the key takeaways from the distribution?

In [None]:
fig, ax = plt.subplots(1, figsize=(100, 100))
fig.patch.set_facecolor('white')
plt.xticks(rotation=90)
plt.yticks(fontsize=50)
map_data.plot(column="t_count", cmap="Reds", linewidth=1, ax=ax, edgecolor="0")
plt.title('\n\nDistribution of Tweets Made from Indiana during 2014\nMap is Sub-divided into Zip Codes', fontdict = {'fontsize' : 150})
bar_info = plt.cm.ScalarMappable(cmap="Reds", norm=plt.Normalize(vmin=0, vmax= max(map_data['t_count'])))
bar_info._A = []
cbar = fig.colorbar(bar_info)
cbar.ax.tick_params(labelsize=100)
cbar.ax.set_ylabel('\n# of tweets\n', rotation=90, fontsize = 100)
ax.axis("off")

Answer 10.1: The map tells us about the distribution, and the associated color bar on the right helps decode the distribution. As stated in the title, the map is sub-divided by zip codes, and the data is from 2014, representing tweet counts from Indiana.

In my opinion, the key takeaways are:
- Areas that contribute most are geographically smaller (possibly indicating highly populated areas).
- Although some areas contribute 50,000 or more tweets, most of the areas’ contribution is around 10k-15k.
- More active users reside in the center than in the bordering areas.
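
The second takeaway above can also be checked numerically instead of being read off the color bar. A minimal sketch, assuming hypothetical per-zip-code counts in place of the real `t_count` column (the numbers below are made up):

```python
import pandas as pd

# Hypothetical tweet counts per zip code; the real values live in map_data['t_count'].
counts = pd.Series([800, 9000, 11000, 12500, 14000, 16000, 52000], name="t_count")

# Which zip codes fall into the 10k-15k band (both ends inclusive)?
in_band = counts.between(10_000, 15_000)

summary = {
    "median": counts.median(),
    "share_10k_15k": in_band.mean(),  # fraction of zip codes inside the band
    "max": counts.max(),
}
print(summary)
```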

About data source: Data related to tweets is a subset of the data collected from Twitter that contains tweets from the United States of America for 2014. I use United States Census Bureau’s shapefiles to plot the geographical maps (available here: https://www2.census.gov/geo/tiger/TIGER2020/ZCTA5/).

### Assignment 11: Visualizing concepts and qualitative data

All the questions and related answers of this assignment are discussed below:

In [None]:
# read tweets
# IL has more than 20 million tweets. Working with all of them would take
# a lot of time. To reduce that, I am using the first 20,000 of those tweets here.
# You may want to go over this file: 'Assignments/Assignment5_6_8_9_10_11_12/ISLAM_SAMIUL.ipynb' in
# my GitHub repo to use a larger set.
df_tweets_il = pd.read_csv('Assignment_Files/Data/tweets_il_20k.csv')

# print the head of the tweets to see what they look like
print(df_tweets_il.head())

In [None]:
# clean the data
stop = set(stopwords.words('english'))
exclude = set(string.punctuation)
lemma = WordNetLemmatizer()

def clean(text):
    stop_free = ' '.join([word for word in str(text).lower().split() if word not in stop])
    punc_free = ''.join(ch for ch in stop_free if ch not in exclude)
    normalized = ' '.join([lemma.lemmatize(word) for word in punc_free.split()])
    return normalized

df_tweets_il['text_clean']=df_tweets_il['text'].apply(clean)

In [None]:
# print out the head of the data after cleaning
df_tweets_il.head()

In [None]:
# aggregate the cleaned tweeted text
agg_tweets_il = df_tweets_il['text_clean'].str.cat(sep=' ')

Question 11.1: Given that we have access to tweets, can you show a high-level representation of what people talk about the most?

In [None]:
# generate wordcloud
wordcloud = WordCloud(width=1920, height=1080).generate(agg_tweets_il)
plt.figure(figsize=(12, 9), dpi=1200).patch.set_facecolor('xkcd:white')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.title('Wordcloud of the tweets made from IL during 2014') # Instead of selecting 20 mil tweets,
                                                        # I am using first 20,000 of those to reduce runtime.
plt.show()

Answer 11.1: This question can be answered in many different ways depending on the level of detail we want. For a high-level understanding, however, a word cloud is one of the simplest ways to portray what we need to see. The figure shows that people talk more about ‘amp,’ ‘love,’ ‘time’ etc. and less about ‘eat,’ ‘music’, ‘wish’ etc. Even though this tells us the frequency of topics, we would need sentiment analysis to understand people's attitudes toward these topics (whether they are thinking positively or negatively).
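
The word frequencies behind a word cloud can also be listed directly. The sketch below additionally assumes that ‘amp’ is residue of the HTML entity `&amp;` in the raw tweet text; that is a guess, since the raw tweets are not shown here. The sample tweets and the helper `top_words` are hypothetical, and no stop-word removal is done in this simplified version.

```python
import html
import string
from collections import Counter

# Hypothetical sample tweets, not taken from the dataset.
tweets = ["Love this &amp; that", "time to eat &amp; love life", "love the music"]


def top_words(texts, n=3, unescape=True):
    """Count words after lower-casing and stripping punctuation."""
    counts = Counter()
    for text in texts:
        if unescape:
            text = html.unescape(text)  # turns '&amp;' back into '&'
        cleaned = "".join(ch for ch in text.lower() if ch not in string.punctuation)
        counts.update(cleaned.split())
    return counts.most_common(n)


print(top_words(tweets, n=2, unescape=False))  # 'amp' shows up as a word
print(top_words(tweets, n=2))                  # 'amp' is gone after unescaping
```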

### Assignment 12: What does that even mean? Take #2

All the questions and related answers of this assignment are discussed below:

Before discussing the improvements I made on the initial figure, I would like to talk about the initial figure briefly; what I was trying to visualize and how I was planning to do so. I had access to worldwide covid data for a period of two and a half months. We had two attributes: daily confirmed cases and daily deaths.

My initial plan was to build an interactive visualization framework where the user would be able to see how the data is geographically distributed, along with a visual ratio of confirmed cases, deaths, and recovered. The source data had no information about how many people recovered each day, so I took a portion of the difference between active cases and deaths to synthesize that variable. This process was not accurate and in no way representative of actual data; I did it only to design the framework, so that once I had the data, the framework would be able to render it too. In the initial design, I thought about a slider/date selector so that the user could change the date to see the situation on each day. The snapshot below (the same snapshot that I submitted for assignments 3 and 4) shows how my visualization framework worked. The four maps in the snapshot are four different stages; I put them together to show how the user was able to interact with them. I represented each country's data with a circle that had an optional feature of composing all the attributes or variables.

The 'date selector' feature was not functional in my initial design. I used Java and JavaScript to build that design.

In [None]:
Image("Assignment_Files/Assignment3/Assignment 3_Snapshot of Covid 19 Data Visualization Framework.png")

I revamped the initial design and came up with a slightly different but significantly improved version of the framework to visualize that same data. I discarded the 'recovered' variable I synthesized in the new design and only focused on the 'daily confirmed cases' and 'daily deaths.' Now, I have two attributes/variables, and thus I decided to draw two different figures to visualize them. If I had access to the 'recovered' variable, I would have done another similar plot.

In the new design, I decided not to use that circle to represent the attributes' values; instead, I used a gradient color scale to color code the countries on the entire map. I am listing the improvements of the new design below:

- The new design allows the user to zoom in/out on the map and drag it to reposition, which was not possible in the initial design.
- Although the initial design was compact since it was able to display the impacts of multiple variables in the same plot, it was cluttered. When the value increases, the circle's radius also increases, obstructing the user from seeing the underlying geo-areas.
- I was not thinking about the color-blind community before (what would happen if someone could not distinguish between red, green, and yellow). In the new design, since I used gradient colors, they would at least be able to get the idea of relative comparison.
- In the initial design, I handled countries that were doing really well (zero confirmed cases or deaths) and countries that do not share data equally by not drawing a circle for both of them. This approach could mislead the audience. Now, I have explicitly handled the countries we do not have any reporting for and added a check box to highlight them if needed.
- It was so naive, but I did not include a title on the map describing what the users saw. I have added that here too.
- 'Date selector' feature was not functional back then; I fixed it too. (It is not available on the html I am invoking below; these are available on the streamlit version that I prepared for presentation).
- In the initial design, the geo-areas were not labeled. There was no way of knowing which area represented which country. I have used the open street map in the new design to resolve that.
- The new design also has a feature to switch between dark and light mode, which should be helpful for people who work in both dark and light environments.
- Now, the user can hover over a country to see its name along with the daily confirmed cases and deaths.
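
The distinction drawn in the list above between countries with zero cases and countries with no reporting can be sketched with a left merge, which keeps every country and leaves `NaN` where no report exists. The country names and counts below are hypothetical, and whether the notebook's `datewise_data` already contains such `NaN` rows is not shown here.

```python
import pandas as pd

# Hypothetical inputs: three countries on the map, reports for only two of them.
countries = pd.DataFrame({"name": ["A", "B", "C"]})
reports = pd.DataFrame({"name": ["A", "B"], "countConfirmed": [120, 0]})

# A left merge keeps all countries; missing reports become NaN instead of disappearing.
merged = countries.merge(reports, on="name", how="left")

merged["status"] = "reported"
merged.loc[merged["countConfirmed"] == 0, "status"] = "zero cases"
merged.loc[merged["countConfirmed"].isna(), "status"] = "unreported"
print(merged)
```

Rows marked `unreported` are the ones that would receive the cross-hatched pattern, while `zero cases` rows stay on the regular color scale.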

This new framework has been designed in Python with the help of the folium library. Later, to add more functionality, I used Streamlit to wrap all of this. You may want to go over the code below and the two attached maps (HTML pages) to get an idea of the new interactive framework.

In [None]:
world_map = gpd.read_file("Assignment_Files/Data/Shapefile_World/world-administrative-boundaries.shp")
world_map

In [None]:
# Here, I am using a compact version of the COVID-19 dataset. To understand how this version has been prepared from
# the raw data, you may want to go over this file: 'Assignments/Assignment5_6_8_9_10_11_12/ISLAM_SAMIUL.ipynb' in
# my GitHub repo.
with open('Assignment_Files/Data/datewise_data.pickle', 'rb') as handle:
    datewise_data = pickle.load(handle)
datewise_data

In [None]:
# Map of confirmed cases for March 29, 2020
d = '3/29/2020'
merged_data = world_map.merge(datewise_data[d], on="name")
my_map = folium.Map(location=[42, 0], zoom_start=2.5)

folium.Choropleth(
    geo_data = merged_data,
    name = 'COVID 19 Spread',
    data = merged_data,
    columns = ['name', 'countConfirmed'],
    key_on = 'feature.properties.name',
    fill_color = 'Oranges',
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name = '# of Patients',
    smooth_factor=0,
    highlight = True,
    line_color = "#000000",
    show=True,
    overlay=True,
    nan_fill_color = "White"
).add_to(my_map)

################
# Here we add cross-hatching (crossing lines) to display the Null values.
nans = merged_data[merged_data["countConfirmed"].isnull()]['name'].values
gdf_nans = merged_data[merged_data['name'].isin(nans)]
sp = StripePattern(angle=45, color='black', space_color='black', line_color = "black", weight = 2, space_weight = 2, line_weight = 0, space_opacity = 0.75, line_opacity = 0.75)
sp.add_to(my_map)
folium.features.GeoJson(name="Unreported",data=gdf_nans, style_function=lambda x :{'fillPattern': sp},show=True).add_to(my_map)


#Hover
# Add hover functionality.
style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}
highlight_function = lambda x: {'fillColor': '#000000', 
                                'color':'#000000', 
                                'fillOpacity': 0.50, 
                                'weight': 0.1}
NIL = folium.features.GeoJson(
    data = merged_data,
    style_function=style_function, 
    control=False,
    highlight_function=highlight_function, 
    tooltip=folium.features.GeoJsonTooltip(
        fields=['name','countConfirmed', 'countDeaths'],
        aliases=['Country','Confirmed Cases', 'Deaths'],
        style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
    )
)
my_map.add_child(NIL)
my_map.keep_in_front(NIL)
#sample_map2

loc = 'Distribution of COVID-19 Daily Confirmed Cases '+'('+d+')'
title_html = '''
             <h3 align="center" style="font-size:32px"><b>{}</b></h3>
             '''.format(loc)
my_map.get_root().html.add_child(folium.Element(title_html))



# Add dark and light mode. 
folium.TileLayer('cartodbdark_matter',name="dark mode",control=True).add_to(my_map)
folium.TileLayer('cartodbpositron',name="light mode",control=True).add_to(my_map)




# We add a layer controller. 
folium.LayerControl(collapsed=False).add_to(my_map)
################


#my_map.save('confirmed.html')
my_map

In [None]:
# Map of daily deaths for March 29, 2020
d = '3/29/2020'
merged_data = world_map.merge(datewise_data[d], on="name")
my_map = folium.Map(location=[42, 0], zoom_start=2.5)

folium.Choropleth(
    geo_data = merged_data,
    name = 'COVID 19 Spread',
    data = merged_data,
    columns = ['name', 'countDeaths'],
    key_on = 'feature.properties.name',
    fill_color = 'Oranges',
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name = '# of Deaths',
    smooth_factor=0,
    highlight = True,
    line_color = "#000000",
    show=True,
    overlay=True,
    nan_fill_color = "White"
).add_to(my_map)

################
# Here we add cross-hatching (crossing lines) to display the Null values.
nans = merged_data[merged_data["countDeaths"].isnull()]['name'].values
gdf_nans = merged_data[merged_data['name'].isin(nans)]
sp = StripePattern(angle=45, color='black', space_color='black', line_color = "black", weight = 2, space_weight = 2, line_weight = 0, space_opacity = 0.75, line_opacity = 0.75)
sp.add_to(my_map)
folium.features.GeoJson(name="Unreported",data=gdf_nans, style_function=lambda x :{'fillPattern': sp},show=True).add_to(my_map)


#Hover
# Add hover functionality.
style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}
highlight_function = lambda x: {'fillColor': '#000000', 
                                'color':'#000000', 
                                'fillOpacity': 0.50, 
                                'weight': 0.1}
NIL = folium.features.GeoJson(
    data = merged_data,
    style_function=style_function, 
    control=False,
    highlight_function=highlight_function, 
    tooltip=folium.features.GeoJsonTooltip(
        fields=['name','countConfirmed', 'countDeaths'],
        aliases=['Country','Confirmed Cases', 'Deaths'],
        style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
    )
)
my_map.add_child(NIL)
my_map.keep_in_front(NIL)
#sample_map2

loc = 'Distribution of COVID-19 Daily Deaths '+'('+d+')'
title_html = '''
             <h3 align="center" style="font-size:32px"><b>{}</b></h3>
             '''.format(loc)
my_map.get_root().html.add_child(folium.Element(title_html))



# Add dark and light mode. 
folium.TileLayer('cartodbdark_matter',name="dark mode",control=True).add_to(my_map)
folium.TileLayer('cartodbpositron',name="light mode",control=True).add_to(my_map)




# We add a layer controller. 
folium.LayerControl(collapsed=False).add_to(my_map)
################


#my_map.save('deaths.html')
my_map

I have prepared a more complete version of these two maps with better features. The implementation of that can be found in the 'FinalProject' folder of my repository. I have also deployed it using streamlit cloud. You may want to see it here: https://share.streamlit.io/samiul-gmu/csi703_spring2022_samiul/main/Final_Project_Web_App_Deployable.py