## Imports
* lets-plot: data visualization
* krangl: data w{rangl}ing

In [2]:
%use lets-plot, krangl

---
## Load up the COVID-19 data

This data was made available by the **Indiana State Department of Health** and is accurate as of 16 November 2020. The data set that I used can be downloaded [here](https://hub.mph.in.gov/dataset/covid-19-county-statistics).

In [3]:
val countyDf = DataFrame.readCSV("covid_report_county.csv")

Here is a quick look at the data.

In [4]:
countyDf.head()

LOCATION_ID,COVID_COUNT,COVID_DEATHS,COVID_TEST,COUNTY_NAME
18001,1351,15,7455,Adams
18003,14545,268,100255,Allen
18005,2445,61,23836,Bartholomew
18007,249,1,2601,Benton
18009,402,10,3330,Blackford


In [8]:
println("Total Number of COVID Tests completed in Indiana: ${countyDf.cols[3].sum(removeNA=true)}")

Total Number of COVID Tests completed in Indiana: 1952202
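
The total above is computed from a positional column index (`cols[3]`), which silently points at a different column if the CSV layout ever changes. As a minimal sketch in plain Kotlin (no krangl), the same sum can be written against a named field. The `CountyRow` class, `sampleCounties`, and `totalTests` are hypothetical names introduced here for illustration, and the list holds only the five rows shown by `head()`, not the full data set.

```kotlin
// One row of the county-level report; field names mirror the CSV header.
data class CountyRow(
    val locationId: Int,
    val covidCount: Int,
    val covidDeaths: Int,
    val covidTest: Int,
    val countyName: String
)

// Only the five rows printed by countyDf.head() above, not the full file.
val sampleCounties = listOf(
    CountyRow(18001, 1351, 15, 7455, "Adams"),
    CountyRow(18003, 14545, 268, 100255, "Allen"),
    CountyRow(18005, 2445, 61, 23836, "Bartholomew"),
    CountyRow(18007, 249, 1, 2601, "Benton"),
    CountyRow(18009, 402, 10, 3330, "Blackford")
)

// Sum the tests by naming the field instead of relying on column position.
fun totalTests(rows: List<CountyRow>): Int = rows.sumOf { it.covidTest }
```

Evaluating `totalTests(sampleCounties)` in a cell gives 137477 for these five counties, which is only a small part of the statewide total printed above.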


---
## Total COVID-19 Cases per County in Indiana

In [9]:
val p = lets_plot(countyDf.toMap())

In [12]:
val casesLayer = geom_bar(stat=Stat.identity, fill=0xFF6666, sampling=sampling_none) {
    x = "COUNTY_NAME"
    y = "COVID_COUNT"
}

In [13]:
p + casesLayer + ggtitle("Total COVID-19 Cases per County in Indiana") + ggsize(1000, 300)

---
## Total COVID-19 Deaths per County in Indiana

In [11]:
val deathLayer = geom_bar(stat=Stat.identity, sampling=sampling_none) {
    x = "COUNTY_NAME"
    y = "COVID_DEATHS"
}

In [423]:
p + deathLayer + ggtitle("Total COVID-19 Deaths per County in Indiana") + ggsize(1000, 300)

---
Next, I analyze a much smaller dataset that includes gender and age range data as well as location. To load this dataset properly, I need to tell the DataFrame which column types to expect. Otherwise, it assumes that one of these columns is supposed to hold binary data and fails to load the dataset.

In [14]:
val df = DataFrame.readCSV("covid_report.csv", colTypes =
        mapOf("LOCATION_ID" to ColType.Int,
              "GENDER" to ColType.String,
              "AGEGRP" to ColType.String,
              "DATE" to ColType.String,
              "COVID_COUNT" to ColType.Int,
              "COUNTY_NAME" to ColType.String
             ))

Here is the format of this dataset.

In [15]:
df.head()

LOCATION_ID,GENDER,AGEGRP,DATE,COVID_COUNT,COUNTY_NAME
18001,F,0-19,YYYY-00-DD 00:00:SS,1,Adams
18001,F,0-19,YYYY-00-DD 00:00:SS,1,Adams
18001,F,0-19,YYYY-00-DD 00:00:SS,1,Adams
18001,F,0-19,YYYY-00-DD 00:00:SS,1,Adams
18001,F,0-19,YYYY-00-DD 00:00:SS,1,Adams


Below, I separate and sum up the male and female cases. Then I create a bar chart to compare the instances of COVID-19 by gender.

In [30]:
val femaleDf = df.filter{it["GENDER"] eq "F"}
println("Female cases: ${femaleDf.cols[4].sum()}")

Female cases: 24641


In [31]:
val maleDf = df.filter{it["GENDER"] eq "M"}
println("Male cases: ${maleDf.cols[4].sum()}")

Male cases: 23314
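
The two filter-and-sum cells above can also be viewed as a single group-and-sum step. The sketch below shows that idea in plain Kotlin rather than through the krangl API. `CaseRow`, `sampleCases`, and `casesByGender` are hypothetical names, and the rows are made up for illustration; they are not taken from `covid_report.csv`.

```kotlin
// A reduced row holding only the fields needed for the per-gender totals.
data class CaseRow(val gender: String, val ageGroup: String, val covidCount: Int)

// Made-up rows for illustration only; not real data from covid_report.csv.
val sampleCases = listOf(
    CaseRow("F", "0-19", 1),
    CaseRow("M", "0-19", 2),
    CaseRow("F", "20-29", 3),
    CaseRow("M", "20-29", 1),
    CaseRow("F", "30-39", 2)
)

// Group the rows by gender and add up covidCount within each group.
fun casesByGender(rows: List<CaseRow>): Map<String, Int> =
    rows.groupingBy { it.gender }
        .fold(0) { total, row -> total + row.covidCount }
```

For the made-up rows, `casesByGender(sampleCases)` maps `"F"` to 6 and `"M"` to 3. Grouping once avoids keeping a separate filtered DataFrame per category, which matters more as the number of categories grows.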


In [32]:
val p = lets_plot(
    mapOf(
        "Gender" to listOf("M", "F"), 
        "Cases" to listOf(maleDf.cols[4].sum(), femaleDf.cols[4].sum())
    ))

In [33]:
val genderLayer = geom_bar(stat=Stat.identity) {
    x = "Gender"
    y = "Cases"
    fill = "Gender"
}

In [34]:
p + genderLayer + ggtitle("COVID-19 cases evaluated by Gender")

---
## COVID-19 cases evaluated by Gender and Age Group

In [26]:
val p = lets_plot(df.toMap())

In [27]:
val layer = geom_bar{
    x = "AGEGRP"
    fill = "GENDER"
}

In [28]:
p + layer + ggtitle("COVID-19 cases evaluated by Gender and Age Group")
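
Because `geom_bar` is used here without `stat=Stat.identity`, the bar heights come from counting rows per age group and gender rather than from summing `COVID_COUNT`. The two agree only if every row has a `COVID_COUNT` of 1. The plain-Kotlin sketch below contrasts the two aggregations; `AgeGenderRow`, `sampleRows`, `rowCounts`, and `caseSums` are hypothetical names, and the rows are made up for illustration.

```kotlin
// A reduced row with the two grouping keys and the case count.
data class AgeGenderRow(val ageGroup: String, val gender: String, val covidCount: Int)

// Made-up rows for illustration only; not real data from covid_report.csv.
val sampleRows = listOf(
    AgeGenderRow("0-19", "F", 1),
    AgeGenderRow("0-19", "F", 2),
    AgeGenderRow("0-19", "M", 1),
    AgeGenderRow("20-29", "F", 3)
)

// Row counts per (age group, gender): what a count statistic produces.
fun rowCounts(rows: List<AgeGenderRow>): Map<Pair<String, String>, Int> =
    rows.groupingBy { it.ageGroup to it.gender }.eachCount()

// Summed covidCount per (age group, gender): the number of cases.
fun caseSums(rows: List<AgeGenderRow>): Map<Pair<String, String>, Int> =
    rows.groupingBy { it.ageGroup to it.gender }
        .fold(0) { total, row -> total + row.covidCount }
```

With these rows, the `("0-19", "F")` group has a row count of 2 but a case sum of 3, so it is worth checking whether `COVID_COUNT` ever exceeds 1 before reading the chart as case totals.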