# Exploring data on COVID-19

> In this post, we will cover the basic syntax of the Julia language and explore data related to COVID-19. This is a summary of the lecture "Introduction to Computational Thinking with Julia, with applications to modeling the COVID-19 Pandemic" from MIT.

- toc: true 
- badges: true
- comments: true
- author: Chanseok Kang
- categories: [Julia, MIT]
- image: images/US_data.png

## Why Julia?

- Julia: Developed at MIT by Prof. Alan Edelman's group
- Released in 2012.
- Current release: 1.4

- Free, open-source software
- Developed by a worldwide community on [GitHub](https://github.com/JuliaLang/julia)
- Over 3000 registered packages in a wide range of domains

## Julia

- Modern, powerful language
- Interactive but also high-performance (fast); these two goals were previously considered mutually exclusive
- Syntax similar to Python / MATLAB / R
- But carefully designed for high-performance computational science and engineering applications
- By design, most of Julia is written in Julia itself
- Hence it is much easier to examine and modify algorithms

![us_data](image/US_data.png)

In [1]:
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

In [2]:
url

"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

In [3]:
typeof(url)

String
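`typeof` works on any value, not just strings. The sketch below uses a few arbitrary literals (not taken from the lecture) and prints the type Julia assigns to each. On a 64-bit machine `Int` is `Int64`, so the integer-based types print with `Int64`.

```julia
# Print each sample value next to its type; `Int` is Int64 on 64-bit systems.
samples = (42, 3.14, 1 + 2im, "COVID-19", 1:10)

for x in samples
    println(repr(x), " :: ", typeof(x))
end
```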

In [4]:
*

* (generic function with 357 methods)

In [5]:
(1 + 2im) * (3 + im)

1 + 7im

In [6]:
@which (1 + 2im) * (3 + im)
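`*` has hundreds of methods because Julia chooses which method to run from the types of *all* its arguments (multiple dispatch), and `@which` reports the method that was chosen. A minimal sketch of the same idea, using a hypothetical function `describe_product` with one method for integers and one for complex numbers:

```julia
# `describe_product` is a hypothetical example function with two methods.
describe_product(a::Integer, b::Integer) = "integer product: $(a * b)"
describe_product(a::Complex, b::Complex) = "complex product: $(a * b)"

println(describe_product(2, 3))             # integer product: 6
println(describe_product(1 + 2im, 3 + im))  # complex product: 1 + 7im
println(length(methods(describe_product)))  # 2
```

Calling `describe_product(2, 3.0)` would raise a `MethodError`, since neither method accepts an `(Int64, Float64)` pair.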

## Grab the data

In [7]:
download(url, "covid_data.csv")

"covid_data.csv"

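Re-running the notebook downloads the file again every time. A minimal sketch that reuses an existing local copy instead, assuming a hypothetical helper named `fetch_once`:

```julia
# `fetch_once` is a hypothetical helper, not part of Base.
# It downloads `url` to `path` only when `path` does not exist yet.
function fetch_once(url::AbstractString, path::AbstractString)
    if isfile(path)
        println("Using existing file: ", path)
    else
        download(url, path)  # same Base `download` call as in the cell above
    end
    return path
end
```

With the notebook's `url`, `fetch_once(url, "covid_data.csv")` downloads on the first run and skips the network afterwards. The trade-off is that a cached file will not pick up newer data published to the repository.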
In [8]:
readdir

readdir (generic function with 2 methods)

In [9]:
readdir()

234-element Array{String,1}:
 ".ipynb_checkpoints"
 "2020-05-21-Software-Engineering-Practices-Pt-1.ipynb"
 "2020-05-22-01-Decorator.ipynb"
 "2020-05-22-02-More-On-Decorators.ipynb"
 "2020-05-23-01-Read-clean-and-validate.ipynb"
 "2020-05-23-02-Distributions.ipynb"
 "2020-05-24-01-Relationships.ipynb"
 "2020-05-24-02-Multivariate-Thinking.ipynb"
 "2020-05-25-01-Preparing-the-data-for-analysis.ipynb"
 "2020-05-25-02-Exploring-the-relationship-between-gender-and-policing.ipynb"
 "2020-05-25-03-Visual-exploratory-data-analysis.ipynb"
 "2020-05-25-04-Software-Engineering-Practices-Pt-2.ipynb"
 "2020-05-26-01-Analyzing-the-effect-of-weather-on-policing.ipynb"
 ⋮
 "2020-12-28-01-Exploring-data-on-COVID-19.ipynb"
 "README.md"
 "checkpoints"
 "covid_data.csv"
 "dataset"
 "html"
 "image"
 "models"
 "my_icons"
 "spark-warehouse"
 "utils"
 "video"

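`readdir()` returns the entries of the current directory as a `Vector{String}`, so ordinary array functions apply to it. For example, a sketch with a hypothetical helper `csv_files` that keeps only the CSV files:

```julia
# `csv_files` is a hypothetical helper; `dir` defaults to the current directory.
csv_files(dir::AbstractString = pwd()) =
    filter(name -> endswith(name, ".csv"), readdir(dir))
```

In the directory listed above, the result of `csv_files()` would include `"covid_data.csv"`.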
## Install packages

In [10]:
using Pkg   # Pkg: Julia's built-in package manager
Pkg.add("CSV")   # calls the `add` function from the Pkg module, which installs a package

[32m[1m   Updating[22m[39m registry at `C:\Users\kcsgo\.julia\registries\General`


[?25l

[32m[1m   Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`




[32m[1m  Resolving[22m[39m package versions...
[32m[1m   Updating[22m[39m `C:\Users\kcsgo\.julia\environments\v1.4\Project.toml`
[90m [no changes][39m
[32m[1m   Updating[22m[39m `C:\Users\kcsgo\.julia\environments\v1.4\Manifest.toml`
[90m [no changes][39m


In [11]:
Pkg.add("DataFrames")

[32m[1m  Resolving[22m[39m package versions...
[32m[1m   Updating[22m[39m `C:\Users\kcsgo\.julia\environments\v1.4\Project.toml`
[90m [no changes][39m
[32m[1m   Updating[22m[39m `C:\Users\kcsgo\.julia\environments\v1.4\Manifest.toml`
[90m [no changes][39m


## Load a package

A package only needs to be installed once, but it must be loaded with `using` in every new Julia session:

In [12]:
using CSV, DataFrames

In [13]:
CSV.read("./covid_data.csv", DataFrame)

|    | Province/State | Country/Region | Lat | Long | 1/22/20 |
|----|----------------|----------------|-----|------|---------|
|    | String? | String | Float64? | Float64? | Int64 |
| 1 | missing | Afghanistan | 33.9391 | 67.71 | 0 |
| 2 | missing | Albania | 41.1533 | 20.1683 | 0 |
| 3 | missing | Algeria | 28.0339 | 1.6596 | 0 |
| 4 | missing | Andorra | 42.5063 | 1.5218 | 0 |
| 5 | missing | Angola | -11.2027 | 17.8739 | 0 |
| 6 | missing | Antigua and Barbuda | 17.0608 | -61.7964 | 0 |
| 7 | missing | Argentina | -38.4161 | -63.6167 | 0 |
| 8 | missing | Armenia | 40.0691 | 45.0382 | 0 |
| 9 | Australian Capital Territory | Australia | -35.4735 | 149.012 | 0 |
| 10 | New South Wales | Australia | -33.8688 | 151.209 | 0 |
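The second header row gives each column's element type. A `?` suffix, as in `String?`, means `Union{Missing, String}`: empty cells (here, countries without a province) are read as `missing`. A small self-contained sketch that parses an inline excerpt of the same layout through an `IOBuffer` instead of the downloaded file (the two rows are copied from the values displayed above):

```julia
using CSV, DataFrames

# Two rows in the CSSE layout; the first has an empty Province/State field.
csv_text = """
Province/State,Country/Region,Lat,Long,1/22/20
,Afghanistan,33.9391,67.71,0
New South Wales,Australia,-33.8688,151.209,0
"""

df = CSV.read(IOBuffer(csv_text), DataFrame)

println(ismissing(df[1, "Province/State"]))  # true: the empty cell became `missing`
println(eltype(df.Lat))                      # Float64
```

Depending on the CSV.jl version, short text columns may be read with a compact string type (such as `String15`) rather than `String`; the values still compare equal to ordinary strings.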


In [14]:
data = CSV.read("covid_data.csv", DataFrame)
data

|    | Province/State | Country/Region | Lat | Long | 1/22/20 |
|----|----------------|----------------|-----|------|---------|
|    | String? | String | Float64? | Float64? | Int64 |
| 1 | missing | Afghanistan | 33.9391 | 67.71 | 0 |
| 2 | missing | Albania | 41.1533 | 20.1683 | 0 |
| 3 | missing | Algeria | 28.0339 | 1.6596 | 0 |
| 4 | missing | Andorra | 42.5063 | 1.5218 | 0 |
| 5 | missing | Angola | -11.2027 | 17.8739 | 0 |
| 6 | missing | Antigua and Barbuda | 17.0608 | -61.7964 | 0 |
| 7 | missing | Argentina | -38.4161 | -63.6167 | 0 |
| 8 | missing | Armenia | 40.0691 | 45.0382 | 0 |
| 9 | Australian Capital Territory | Australia | -35.4735 | 149.012 | 0 |
| 10 | New South Wales | Australia | -33.8688 | 151.209 | 0 |


In [15]:
typeof(data)

DataFrame
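Beyond `typeof`, a few standard DataFrames.jl functions summarize a table's shape. A sketch on a small frame built from the first two rows shown above (not the full dataset):

```julia
using DataFrames

# A two-row frame with the first four columns of the CSSE file.
df = DataFrame("Province/State" => [missing, missing],
               "Country/Region" => ["Afghanistan", "Albania"],
               "Lat"            => [33.9391, 41.1533],
               "Long"           => [67.71, 20.1683])

println(size(df))                 # (2, 4): (number of rows, number of columns)
println(nrow(df), " ", ncol(df))  # 2 4
println(names(df))                # the column names
println(first(df, 1))             # only the first row
```

On the downloaded `data`, `size(data)` gives one row per country or province and one column per metadata field or date.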

We can rename columns with `rename`, which returns a new data frame and leaves `data` unchanged.

In [16]:
data_2 = rename(data, 1 => "province", 2 => "country")
data_2

|    | province | country | Lat | Long | 1/22/20 |
|----|----------|---------|-----|------|---------|
|    | String? | String | Float64? | Float64? | Int64 |
| 1 | missing | Afghanistan | 33.9391 | 67.71 | 0 |
| 2 | missing | Albania | 41.1533 | 20.1683 | 0 |
| 3 | missing | Algeria | 28.0339 | 1.6596 | 0 |
| 4 | missing | Andorra | 42.5063 | 1.5218 | 0 |
| 5 | missing | Angola | -11.2027 | 17.8739 | 0 |
| 6 | missing | Antigua and Barbuda | 17.0608 | -61.7964 | 0 |
| 7 | missing | Argentina | -38.4161 | -63.6167 | 0 |
| 8 | missing | Armenia | 40.0691 | 45.0382 | 0 |
| 9 | Australian Capital Territory | Australia | -35.4735 | 149.012 | 0 |
| 10 | New South Wales | Australia | -33.8688 | 151.209 | 0 |
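One practical reason to rename: a name such as `Country/Region` is not a valid Julia identifier, so it can only be reached by indexing with a string. After renaming, plain dot syntax works. A sketch on a two-row toy frame:

```julia
using DataFrames

# Toy frame with the original CSSE column names.
df = DataFrame("Province/State" => [missing, "New South Wales"],
               "Country/Region" => ["Afghanistan", "Australia"])

# `Country/Region` is not a valid identifier, so index it with a string.
before = df[!, "Country/Region"]

# `rename` returns a renamed copy; the new names work with dot syntax.
df2 = rename(df, 1 => "province", 2 => "country")
after = df2.country

println(before == after)   # true
```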


In [17]:
# By convention, a trailing `!` means the function *modifies* its argument in place
rename!(data, 1 => "province", 2 => "country")

|    | province | country | Lat | Long | 1/22/20 |
|----|----------|---------|-----|------|---------|
|    | String? | String | Float64? | Float64? | Int64 |
| 1 | missing | Afghanistan | 33.9391 | 67.71 | 0 |
| 2 | missing | Albania | 41.1533 | 20.1683 | 0 |
| 3 | missing | Algeria | 28.0339 | 1.6596 | 0 |
| 4 | missing | Andorra | 42.5063 | 1.5218 | 0 |
| 5 | missing | Angola | -11.2027 | 17.8739 | 0 |
| 6 | missing | Antigua and Barbuda | 17.0608 | -61.7964 | 0 |
| 7 | missing | Argentina | -38.4161 | -63.6167 | 0 |
| 8 | missing | Armenia | 40.0691 | 45.0382 | 0 |
| 9 | Australian Capital Territory | Australia | -35.4735 | 149.012 | 0 |
| 10 | New South Wales | Australia | -33.8688 | 151.209 | 0 |
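The trailing `!` is a naming convention, not special syntax, and Base follows it too. For example, `sort` returns a sorted copy while `sort!` sorts its argument in place:

```julia
v = [3, 1, 2]

w = sort(v)   # returns a new sorted vector; `v` is still [3, 1, 2] here
sort!(v)      # sorts `v` itself in place

println(w)    # [1, 2, 3]
println(v)    # [1, 2, 3]
```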


In [18]:
data

|    | province | country | Lat | Long | 1/22/20 |
|----|----------|---------|-----|------|---------|
|    | String? | String | Float64? | Float64? | Int64 |
| 1 | missing | Afghanistan | 33.9391 | 67.71 | 0 |
| 2 | missing | Albania | 41.1533 | 20.1683 | 0 |
| 3 | missing | Algeria | 28.0339 | 1.6596 | 0 |
| 4 | missing | Andorra | 42.5063 | 1.5218 | 0 |
| 5 | missing | Angola | -11.2027 | 17.8739 | 0 |
| 6 | missing | Antigua and Barbuda | 17.0608 | -61.7964 | 0 |
| 7 | missing | Argentina | -38.4161 | -63.6167 | 0 |
| 8 | missing | Armenia | 40.0691 | 45.0382 | 0 |
| 9 | Australian Capital Territory | Australia | -35.4735 | 149.012 | 0 |
| 10 | New South Wales | Australia | -33.8688 | 151.209 | 0 |
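With short column names, selecting one country becomes a single indexing expression. A sketch using a made-up three-row frame in the renamed layout (the `1/23/20` counts are invented for illustration, not taken from the dataset), which sums Australia's per-province rows into one country-level time series:

```julia
using DataFrames

# Toy data in the renamed layout; the 1/23/20 values are made up.
df = DataFrame("province" => [missing, "New South Wales", "Victoria"],
               "country"  => ["Afghanistan", "Australia", "Australia"],
               "1/22/20"  => [0, 0, 0],
               "1/23/20"  => [0, 1, 2])

# A Boolean mask keeps every row whose country is Australia (one row per province).
australia = df[df.country .== "Australia", :]

# Sum each date column (columns 3 onward) over the provinces.
totals = [sum(australia[!, col]) for col in 3:ncol(australia)]
println(totals)   # [0, 3]
```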


In [19]:
?rename

search: rename rename! propertynames



```
rename(df::AbstractDataFrame, vals::AbstractVector{Symbol};
       makeunique::Bool=false)
rename(df::AbstractDataFrame, vals::AbstractVector{<:AbstractString};
       makeunique::Bool=false)
rename(df::AbstractDataFrame, (from => to)::Pair...)
rename(df::AbstractDataFrame, d::AbstractDict)
rename(df::AbstractDataFrame, d::AbstractVector{<:Pair})
rename(f::Function, df::AbstractDataFrame)
```

Create a new data frame that is a copy of `df` with changed column names. Each name is changed at most once. Permutation of names is allowed.

# Arguments

  * `df` : the `AbstractDataFrame`; if it is a `SubDataFrame` then renaming is only allowed if it was created using `:` as a column selector.
  * `d` : an `AbstractDict` or an `AbstractVector` of `Pair`s that maps the original names or column numbers to new names
  * `f` : a function which for each column takes the old name as a `String` and returns the new name that gets converted to a `Symbol`
  * `vals` : new column names as a vector of `Symbol`s or `AbstractString`s of the same length as the number of columns in `df`
  * `makeunique` : if `false` (the default), an error will be raised if duplicate names are found; if `true`, duplicate names will be suffixed with `_i` (`i` starting at 1 for the first duplicate).

If pairs are passed to `rename` (as positional arguments or in a dictionary or a vector) then:

  * `from` value can be a `Symbol`, an `AbstractString` or an `Integer`;
  * `to` value can be a `Symbol` or an `AbstractString`.

Mixing symbols and strings in `to` and `from` is not allowed.

See also: `rename!`

# Examples

```julia
julia> df = DataFrame(i = 1, x = 2, y = 3)
1×3 DataFrame
 Row │ i      x      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> rename(df, :i => :A, :x => :X)
1×3 DataFrame
 Row │ A      X      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> rename(df, :x => :y, :y => :x)
1×3 DataFrame
 Row │ i      y      x
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> rename(df, [1 => :A, 2 => :X])
1×3 DataFrame
 Row │ A      X      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> rename(df, Dict("i" => "A", "x" => "X"))
1×3 DataFrame
 Row │ A      X      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> rename(uppercase, df)
1×3 DataFrame
 Row │ I      X      Y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3
```


## Interact.jl: Simple interactive visualizations

In [22]:
Pkg.add("Interact")

[32m[1m  Resolving[22m[39m package versions...
[32m[1m  Installed[22m[39m FixedPointNumbers ───── v0.8.4
[32m[1m  Installed[22m[39m CSSUtil ─────────────── v0.1.1
[32m[1m  Installed[22m[39m FunctionalCollections ─ v0.5.0
[32m[1m  Installed[22m[39m Interact ────────────── v0.10.3
[32m[1m  Installed[22m[39m InteractBase ────────── v0.10.5
[32m[1m  Installed[22m[39m WebSockets ──────────── v1.5.7
[32m[1m  Installed[22m[39m HTTP ────────────────── v0.9.2
[32m[1m  Installed[22m[39m Measures ────────────── v0.3.1
[32m[1m  Installed[22m[39m Widgets ─────────────── v0.6.2
[32m[1m  Installed[22m[39m JSExpr ──────────────── v0.5.2
[32m[1m  Installed[22m[39m ColorTypes ──────────── v0.10.9
[32m[1m  Installed[22m[39m IniFile ─────────────── v0.5.0
[32m[1m  Installed[22m[39m Observables ─────────── v0.3.2
[32m[1m  Installed[22m[39m URIs ────────────────── v1.1.0
[32m[1m  Installed[22m[39m WebIO ───────────────── v0.8.15
[32m[1m  Inst

In [23]:
using Interact

┌ Info: Precompiling Interact [c601a237-2ae4-5e1e-952c-7a85b0c7eef1]
└ @ Base loading.jl:1260


In [24]:
for i in 1:10
    @show i
end

i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10


In [25]:
typeof(1:10)

UnitRange{Int64}

In [26]:
collect(1:10)

10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
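`1:10` is a `UnitRange`, which stores only its start and stop and computes elements on demand; `collect` materializes them into an `Array`. A short sketch of how the two relate:

```julia
r = 1:10          # a UnitRange: only the endpoints are stored
v = collect(r)    # a Vector{Int} holding all ten elements

println(length(r), " ", sum(r))  # 10 55
println(r == v)                  # true: same elements, so they compare equal
println(collect(1:2:9))          # [1, 3, 5, 7, 9]: start 1, step 2, stop 9
```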

In [27]:
for i in 1:10
    println("i = ", i)
end

i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
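`@show i` prints both the expression and its value, while `println` prints exactly the arguments it is given. When a loop needs an index alongside each element, `enumerate` supplies both. A sketch over a few country names from the table above:

```julia
countries = ["Afghanistan", "Albania", "Algeria"]

labels = String[]
for (i, country) in enumerate(countries)
    push!(labels, "$i: $country")   # `$` interpolates values into a string
end

foreach(println, labels)   # prints each label on its own line
```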
