# COVID19 analysis in R & viz in Leaflet!
This tutorial walks through visualizing COVID19 data. Using only open-source components, the steps below will get you up and running with web mapping.

## 1. Import the packages

In [1]:
# R Packages
library(magrittr) # for pipe operations like %>% and %<>%
library(lubridate) # for date operations
library(tidyverse) # collection of R packages for data science, including dplyr and tidyr for data processing and ggplot2 for graphics
library(gridExtra) # for arranging multiple grid-based plots on a page
library(kableExtra) # works together with kable() from knitr to build complex HTML or LaTeX tables
library(reshape2) # restructures and aggregates data
library(formattable) # for formatting numerical values

“package ‘lubridate’ was built under R version 3.6.2”

Attaching package: ‘lubridate’


The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union


── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  3.0.0     ✔ dplyr   0.8.5
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

“package ‘tibble’ was built under R version 3.6.2”
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ lubridate::date()        masks base::date()
✖ tidyr::extract()         masks magrittr::extract()

## 2. Load the COVID dataset
Our data comes from JHU's GitHub repository (Source: [Johns Hopkins CSSEGISandData](https://github.com/CSSEGISandData/COVID-19)).

Let's first read it into a data frame and then inspect it a little.

In [2]:
# read in the daily report for 04-30-2020 (the latest at the time of writing)
data <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/04-30-2020.csv")
# inspect the dimensions of the dataset
dim(data)
# drop the columns we don't need and give the rest friendlier names
data <- data %>%
  select(-c(FIPS, Admin2, Province_State)) %>%
  rename(Location = Combined_Key,
         Country = Country_Region,
         Latitude = Lat,
         Longitude = Long_)
# show the first few rows
data %>% head()

,Country,Last_Update,Latitude,Longitude,Confirmed,Deaths,Recovered,Active,Location
,<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<int>,<int>,<fct>
1,US,2020-05-01 02:32:28,34.22333,-82.46171,31,0,0,31,"Abbeville, South Carolina, US"
2,US,2020-05-01 02:32:28,30.29506,-92.4142,130,10,0,120,"Acadia, Louisiana, US"
3,US,2020-05-01 02:32:28,37.76707,-75.63235,264,4,0,260,"Accomack, Virginia, US"
4,US,2020-05-01 02:32:28,43.45266,-116.24155,671,16,0,655,"Ada, Idaho, US"
5,US,2020-05-01 02:32:28,41.33076,-94.47106,1,0,0,1,"Adair, Iowa, US"
6,US,2020-05-01 02:32:28,37.1046,-85.2813,81,10,0,71,"Adair, Kentucky, US"
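
The file name in the URL above is just the report date in `mm-dd-yyyy` form, so the URL for another day can be built from a date. A minimal sketch, assuming the repository keeps that naming pattern; `daily_report_url()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: build the daily-report URL for a given date.
# Assumes the file naming pattern seen in the URL above (mm-dd-yyyy.csv).
daily_report_url <- function(date) {
  base <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
  paste0(base, format(as.Date(date), "%m-%d-%Y"), ".csv")
}

url <- daily_report_url("2020-04-30")
url
```

The result can be passed straight to `read.csv()` in place of the hard-coded string.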


In [3]:
# drops entries with missing values
data <- na.omit(data)
dim(data)
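
Before dropping rows, it can help to see where the missing values are. A small sketch on a made-up data frame (the `toy` values are invented for illustration and are not taken from the dataset):

```r
# Toy data with one row that lacks coordinates (invented values)
toy <- data.frame(
  Location  = c("A", "B", "C"),
  Latitude  = c(10.5, NA, 42.0),
  Longitude = c(-3.2, NA, 12.1),
  Confirmed = c(5L, 7L, 9L)
)

# count missing values per column
na_counts <- colSums(is.na(toy))
na_counts

# na.omit() drops every row that has at least one NA
clean <- na.omit(toy)
dim(clean)
```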

In [4]:
# view the sub-regions of a country, i.e. rows whose Location ends with the country name (e.g. United Kingdom)
data %>% filter(grepl(".+(United Kingdom)$", Location))

Country,Last_Update,Latitude,Longitude,Confirmed,Deaths,Recovered,Active,Location
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<int>,<int>,<fct>
United Kingdom,2020-05-01 02:32:28,18.2206,-63.0686,3,0,3,0,"Anguilla, United Kingdom"
United Kingdom,2020-05-01 02:32:28,32.3078,-64.7505,114,6,48,60,"Bermuda, United Kingdom"
United Kingdom,2020-05-01 02:32:28,18.4207,-64.64,6,1,3,2,"British Virgin Islands, United Kingdom"
United Kingdom,2020-05-01 02:32:28,19.3133,-81.2546,73,1,10,62,"Cayman Islands, United Kingdom"
United Kingdom,2020-05-01 02:32:28,49.3723,-2.3644,537,40,386,111,"Channel Islands, United Kingdom"
United Kingdom,2020-05-01 02:32:28,-51.7963,-59.5236,13,0,11,2,"Falkland Islands (Malvinas), United Kingdom"
United Kingdom,2020-05-01 02:32:28,36.1408,-5.3536,144,0,131,13,"Gibraltar, United Kingdom"
United Kingdom,2020-05-01 02:32:28,54.2361,-4.5481,315,21,260,34,"Isle of Man, United Kingdom"
United Kingdom,2020-05-01 02:32:28,16.7425,-62.18737,11,1,2,8,"Montserrat, United Kingdom"
United Kingdom,2020-05-01 02:32:28,21.694,-71.7979,12,1,5,6,"Turks and Caicos Islands, United Kingdom"
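
Note that the pattern `.+(United Kingdom)$` requires at least one character before the country name, which is why only sub-regions show up here and not the row whose Location is just `United Kingdom`. A quick check on a few sample strings:

```r
locations <- c("Bermuda, United Kingdom", "Gibraltar, United Kingdom",
               "United Kingdom", "Spain")

# `.+` needs at least one character before the country name
sub_regions <- grepl(".+(United Kingdom)$", locations)

# without `.+`, the bare country row matches as well
with_country <- grepl("United Kingdom$", locations)

data.frame(locations, sub_regions, with_country)
```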


In [5]:
# extract the coordinate data
lonlat <- data %>% select(Location,Country,Latitude,Longitude)
lonlat %>% head()

,Location,Country,Latitude,Longitude
,<fct>,<fct>,<dbl>,<dbl>
1,"Abbeville, South Carolina, US",US,34.22333,-82.46171
2,"Acadia, Louisiana, US",US,30.29506,-92.4142
3,"Accomack, Virginia, US",US,37.76707,-75.63235
4,"Ada, Idaho, US",US,43.45266,-116.24155
5,"Adair, Iowa, US",US,41.33076,-94.47106
6,"Adair, Kentucky, US",US,37.1046,-85.2813


## 3. Data Manipulation
The raw data comes in a wide format that I found awkward to work with here, so a little reshaping is needed. Moving `Confirmed`, `Deaths`, `Recovered`, and `Active` from columns into rows (wide to long) makes it easier to aggregate or filter later on.

We also merge the coordinate data back into the main data frame.

In [6]:
# reshape from wide to long by calling `melt()`, then merge the coordinates back in
data <- data %>%
        select(-c(Last_Update, Latitude, Longitude)) %>%
        melt(measure.vars = c('Confirmed', 'Deaths', 'Recovered', 'Active')) %>%
        merge(lonlat, by = c("Location", "Country")) %>%
        rename(type = variable, total = value)

# filters out zeros and shows the first few rows
data <- data %>% filter(total>0)
data %>% head

,Location,Country,type,total,Latitude,Longitude
,<fct>,<fct>,<fct>,<int>,<dbl>,<dbl>
1,"Abbeville, South Carolina, US",US,Confirmed,31,34.22333,-82.46171
2,"Abbeville, South Carolina, US",US,Active,31,34.22333,-82.46171
3,"Acadia, Louisiana, US",US,Active,120,30.29506,-92.4142
4,"Acadia, Louisiana, US",US,Deaths,10,30.29506,-92.4142
5,"Acadia, Louisiana, US",US,Confirmed,130,30.29506,-92.4142
6,"Accomack, Virginia, US",US,Active,260,37.76707,-75.63235
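
To see the reshaping step in isolation, here is a toy sketch of the same two calls, `melt()` followed by `merge()`. The data frames `wide` and `coords` are invented for illustration:

```r
library(reshape2)

# Toy wide-format data (invented values)
wide <- data.frame(
  Location  = c("A", "B"),
  Confirmed = c(10L, 5L),
  Deaths    = c(1L, 0L)
)
coords <- data.frame(
  Location  = c("A", "B"),
  Latitude  = c(10.5, 42.0),
  Longitude = c(-3.2, 12.1)
)

# wide -> long: one row per Location and measure
long <- melt(wide, id.vars = "Location", measure.vars = c("Confirmed", "Deaths"))

# attach the coordinates again, matching rows on Location
long <- merge(long, coords, by = "Location")
long
```

Each location now appears once per measure, with the measure name in `variable` and the count in `value`.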


In [7]:
# view confirmed cases by Location in the US, ordered by total number of cases
data %>% filter(Country=="US",type=="Confirmed")%>% arrange(-total) %>% head()

,Location,Country,type,total,Latitude,Longitude
,<fct>,<fct>,<fct>,<int>,<dbl>,<dbl>
1,"New York City, New York, US",US,Confirmed,167478,40.76727,-73.97153
2,"Cook, Illinois, US",US,Confirmed,36513,41.84145,-87.81659
3,"Nassau, New York, US",US,Confirmed,35854,40.74067,-73.58942
4,"Suffolk, New York, US",US,Confirmed,33664,40.8832,-72.80122
5,"Westchester, New York, US",US,Confirmed,28970,41.16278,-73.75742
6,"Los Angeles, California, US",US,Confirmed,23220,34.30828,-118.22824


In [8]:
# view the 10 locations with the most confirmed cases worldwide
data %>% filter(type=="Confirmed") %>% arrange(-total) %>% head(10)

,Location,Country,type,total,Latitude,Longitude
,<fct>,<fct>,<fct>,<int>,<dbl>,<dbl>
1,Spain,Spain,Confirmed,213435,40.46367,-3.74922
2,Italy,Italy,Confirmed,205463,41.87194,12.56738
3,United Kingdom,United Kingdom,Confirmed,171253,55.3781,-3.436
4,"New York City, New York, US",US,Confirmed,167478,40.76727,-73.97153
5,France,France,Confirmed,165764,46.2276,2.2137
6,Germany,Germany,Confirmed,163009,51.16569,10.45153
7,Turkey,Turkey,Confirmed,120204,38.9637,35.2433
8,Russia,Russia,Confirmed,106498,61.52401,105.31876
9,Iran,Iran,Confirmed,94640,32.42791,53.68805
10,Brazil,Brazil,Confirmed,87187,-14.235,-51.9253


In [9]:
# view the top 3 locations (NOT the top 3 countries) with the most confirmed cases
data %>% filter(type=="Confirmed") %>% arrange(-total) %>% head(3)
# store the names of those 3 locations
top3 <- data %>% filter(type=="Confirmed") %>% arrange(-total) %>% head(3)%>% select(Location) %>% as.list() %>% unlist()


,Location,Country,type,total,Latitude,Longitude
,<fct>,<fct>,<fct>,<int>,<dbl>,<dbl>
1,Spain,Spain,Confirmed,213435,40.46367,-3.74922
2,Italy,Italy,Confirmed,205463,41.87194,12.56738
3,United Kingdom,United Kingdom,Confirmed,171253,55.3781,-3.436
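
This ranking is by Location, so a country that is reported region by region (such as the US, listed by county) is split across many rows. To rank whole countries instead, the totals can be summed per `Country` first. A sketch on invented numbers, assuming the long format built earlier (columns `Country`, `type`, `total`):

```r
library(dplyr)

# Toy long-format data (invented values)
toy <- data.frame(
  Location = c("X1, A", "X2, A", "B", "X1, A"),
  Country  = c("A", "A", "B", "A"),
  type     = c("Confirmed", "Confirmed", "Confirmed", "Deaths"),
  total    = c(60L, 70L, 100L, 3L)
)

# sum confirmed cases per country, largest first
by_country <- toy %>%
  filter(type == "Confirmed") %>%
  group_by(Country) %>%
  summarise(total = sum(total)) %>%
  arrange(desc(total))
by_country
```

In the toy data, country A has no single location as large as B, yet it comes out on top once its two locations are added together.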


## 4. Data Visualization
Now we are going to use the [Leaflet](https://leafletjs.com/) API from R, via the `leaflet` package, to make an interactive map of the data we have in hand.

Enough chatter. Let’s go nuts with Leaflet!

In [10]:
library(leaflet)
# build the popup text shown when a circle is clicked
group_data <- data %>% mutate(popup = paste(Location, "<br>", type, " : ", comma(total, digits = 0)))
# one overlay group and one colour per case type
groups <- as.character(unique(group_data$type))
groupColors <- colorFactor(palette = c("blue", "red", "green4", "lightblue"), domain = group_data$type)

# plot the data
map <- leaflet(group_data) %>%
  # Base groups
  addTiles(group = "OSM") %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Dark Matter (default)") %>%
  addProviderTiles(providers$Stamen.TonerLite, group = "Toner Lite") %>%
  # Overlay groups
  addCircles(lng = ~Longitude, lat = ~Latitude, weight = 2,color=~groupColors(type),
             radius = ~log(total) * 50000, popup = ~popup, group=~type) %>%
  addLayersControl(
    baseGroups = c("Dark Matter (default)", "OSM", "Toner Lite"),
    overlayGroups = groups,
    options = layersControlOptions(collapsed = FALSE)) %>%
  # hide every overlay except Deaths when the map first loads
  hideGroup(c("Active", "Recovered", "Confirmed"))
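
The circle radius is based on `log(total)` rather than `total`, presumably so that the largest outbreaks don't swamp the map. A quick calculation using two confirmed counts from the tables earlier (167478 for New York City and 31 for Abbeville) shows how much the log compresses the range:

```r
totals <- c(nyc = 167478, abbeville = 31)

# radius in metres, computed the same way as in the addCircles() call
radius <- log(totals) * 50000

# the counts differ by a factor of several thousand...
count_ratio <- totals[["nyc"]] / totals[["abbeville"]]
# ...but the radii differ by less than a factor of four
radius_ratio <- radius[["nyc"]] / radius[["abbeville"]]

c(count_ratio = count_ratio, radius_ratio = radius_ratio)
```

One side effect worth knowing: a location with `total` equal to 1 gets a radius of zero, since `log(1)` is 0.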


In [11]:
# center the map and set the zoom level
map <- map %>% setView(lng = 34, lat = 27, zoom = 1.5)
map

In [12]:
# exports a leaflet object to a self-contained HTML file named <output_label>.html
leaflet_to_HTML <- function(leaflet_object,output_label,title){
    # remove the default padding around the widget
    leaflet_object$sizingPolicy$padding <- "0"
    htmlwidgets::saveWidget(
        leaflet_object, 
        paste0(output_label,'.html'), 
        libdir = "lib",
        title = title,
        selfcontained = TRUE
    )    
}

# label the output file with yesterday's date
output_label <- paste0("covidmap-", Sys.Date() - 1)
title <- 'COVID19 Daily Report'

leaflet_to_HTML(map,output_label,title)
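
As written, the output file is labelled with yesterday's date while the data URL is fixed to 04-30-2020, so the label and the data can drift apart. One option is to derive both from a single date. A sketch, where `report_date` and `csv_name` are hypothetical variables that do not appear in the code above:

```r
# Hypothetical: one date drives both the data file name and the output label
report_date <- as.Date("2020-04-30")

# file name as used at the end of the data URL (mm-dd-yyyy.csv)
csv_name <- paste0(format(report_date, "%m-%d-%Y"), ".csv")

# label for the exported HTML file (yyyy-mm-dd)
output_label <- paste0("covidmap-", report_date)

c(csv_name, output_label)
```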