# Relationship Between Alcohol Consumption & Prevalence of COVID-19 Worldwide

Arwa Hararwala, Josiah Guy, & Jess Strait

## Project Idea

This project develops a scatterplot that displays, by country, the prevalence of COVID-19 and the percentage of energy intake (kcal) from alcoholic beverages. The plot shows whether countries that get a larger share of their energy from alcohol also have higher rates of COVID-19 prevalence. The topic is relevant to our community because drinking is a common part of college social life and COVID-19 is still a large part of our lives. Visual analysis techniques encode continent by color and let users filter interactively to distinguish continents. Users can use the plot to see where countries and their continents stand in alcohol consumption, COVID-19 prevalence, and the relationship between the two.

### Data Sources (Acknowledgement)

<cite> Maria Ren (2020) - "COVID-19 Healthy Diet Dataset Original Data". Published online at Kaggle.com. Retrieved from: 'https://www.kaggle.com/mariaren/covid19-healthy-diet-dataset' [Online Resource]</cite>

<cite>Max Roser, Hannah Ritchie, Esteban Ortiz-Ospina and Joe Hasell (2020) - "Coronavirus Pandemic (COVID-19)". Published online at OurWorldInData.org. Retrieved from: 'https://github.com/owid/covid-19-data/tree/master/public/data' [Online Resource]</cite>
    

### Programming Sources (Acknowledgement)

<cite> Bostock, M., Ogievetsky, V., & Heer, J. (2011). D³ data-driven documents. *IEEE transactions on visualization and computer graphics*, 17(12), 2301-2309.</cite>

<cite> J. D. Hunter, "Matplotlib: A 2D Graphics Environment", Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95, 2007. https://matplotlib.org/2.0.2/examples/api/barchart_demo.html [Online Resource]</cite>

<cite> Jeff Reback, Wes McKinney, jbrockmendel, Joris Van den Bossche, Tom Augspurger, Phillip Cloud, … h-vetinari. (2021, March 2). pandas-dev/pandas: Pandas 1.2.3 (Version v1.2.3). Zenodo. http://doi.org/10.5281/zenodo.4572994 [Online Resource]</cite>

<cite> McKinney, Data structures for statistical computing in python, Proceedings of the 9th Python in Science Conference, Volume 445, 2010. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html [Online Resource]</cite>

http://bl.ocks.org/aaizemberg/78bd3dade9593896a59d

## Prototype

![picture](https://drive.google.com/uc?id=1TzZ6LzhlWpAKIJjbYEUlIg6Jgo0pOjqO)

#### Write-up

The goal of this graph is to show whether continents with a higher percentage of COVID-19 cases relative to their population also get a larger percentage of their dietary energy from alcohol. We look at continents here to give a less granular view of our research question: do regions of the world consume more alcohol when more people around them are getting COVID-19?

Our initial idea was to create a side-by-side bar chart. This let us easily identify the categorical variables in our visualization. Also, since both of our statistics are percentages, they could be graphed on the same y-axis. Both represent parts of a whole, though the whole differs: population for cases and total energy intake for alcohol.

After further discussion and in-class evaluations, we realized that a side-by-side bar chart would not be as effective as a scatterplot for our data. A scatterplot lets us see the correlation between alcohol consumption and COVID-19 cases per country.

![picture](https://drive.google.com/uc?id=1wdGeQ4luCmf6Hmagajsfj48eszq6VAFP)

#### Write-up

This is another prototype that our group created to see how each country compares in terms of alcohol consumption and COVID-19 prevalence. We realize now that we should have made a scatterplot that shows the correlation between alcohol intake and COVID-19 prevalence, with an interactive feature that lets the viewer hover over a point to find the country it represents.

While these prototypes are not exactly the format we will use for our final scatterplot, they were a great starting point for seeing how countries relate to alcohol and COVID-19. They gave us a point of reflection on the different ways to show the variables of interest in a clearer and more effective graphical display.

## Final Graph

describe describe describe describe describe describe describe describe describe describe 


###  Final Write-up

final final final final final final final final final final final final 

### Code

##### Data Cleaning Steps

In [None]:
%%html
<table>
    <tr style="background-color:white">
        <td colspan="2" style="text-align: center;"><h1>TITLE</h1></td>
    </tr>
    <tr style="background-color:white">
        <td><div id="final"></div></td><td><div id="input"></div></td>
    </tr>
    <tr style="background-color:white">
        <td colspan="2"><div id="legend"></div></td>
    </tr>
</table>



In [None]:
#import libraries 
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# read in csv file 
url = '/content/drive/MyDrive/DS 330 - group folder/Food_Supply_kcal_Data.csv'
alcohol_consumed = pd.read_csv(url)
 
url2 = '/content/drive/MyDrive/DS 330 - group folder/owid-covid-data.csv'
covid_data = pd.read_csv(url2)
 
 
alcohol_consumed = alcohol_consumed.fillna(0) # replace all NAs with 0s
covid_data = covid_data.fillna(0) # replace all NAs with 0s
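
Filling missing values with 0 keeps every row usable in later arithmetic, but it also means a missing measurement is treated as a true zero when values are averaged. The sketch below uses made-up numbers (not taken from either dataset) to show the difference; whether that trade-off is acceptable here is a judgment call.

```python
import pandas as pd

# Hypothetical alcohol percentages for three countries; one value is missing.
alcohol = pd.Series([2.0, None, 4.0], name="alcohol")

# pandas skips missing values by default, so only 2.0 and 4.0 are averaged.
mean_skipping_missing = alcohol.mean()

# After fillna(0) the missing entry counts as a real 0 in the average.
mean_after_fill = alcohol.fillna(0).mean()

print(mean_skipping_missing)  # 3.0
print(mean_after_fill)        # 2.0
```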

In [None]:
###### Commented out rows below could be used in order to sort by continents ######
#from datetime import datetime, timedelta
 
 
 
covid_data['date'] = pd.to_datetime(covid_data['date']) #Convert to datetime
max_date = covid_data['date'].max() #Constantly update this viz, where it's always pulling from the latest date.
 
 
 
latest_covid = covid_data[covid_data['date'] == max_date] #Create latest_covid, a table mapped to latest covid-19 data
 
latest_covid = pd.merge(latest_covid, alcohol_consumed, left_on = "location", right_on = "Country", how='left')
 
 
 
latest_covid = latest_covid.rename(columns = {"Alcoholic Beverages":"alcohol", "Country":"country"})
 
 
covid_country = latest_covid.groupby(['country', 'continent']).agg({'total_cases':'sum', 'alcohol':'mean', 'population':'sum'}).reset_index() #Group_by for Country (used in Graph #6)
#covid_continent = latest_covid.groupby(['continent']).agg({'total_cases':'sum', 'alcohol':'mean', 'population':'sum'}).reset_index() #Group_by to look specifically at continents
#covid_continent['covid_percent'] = covid_continent.apply(lambda row: (row.total_cases / row.population)*100, axis = 1)
covid_country['covid_percent'] = covid_country['total_cases'] / covid_country['population'] * 100 #Percent of each country's population with a confirmed case
 
covid_country.head()

Unnamed: 0,country,continent,total_cases,alcohol,population,covid_percent
0,Afghanistan,Asia,56069.0,0.0,38928341.0,0.144031
1,Albania,Europe,120022.0,0.912,2877800.0,4.170616
2,Algeria,Africa,115970.0,0.0896,43851043.0,0.264463
3,Angola,Africa,21642.0,1.9388,32866268.0,0.065849
4,Antigua and Barbuda,North America,1011.0,2.3041,97928.0,1.032391
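
The left merge above keeps every COVID-19 row even when its `location` has no identically spelled `Country` in the diet table. Those rows end up with a missing `country` and then drop out of the `groupby`. A minimal sketch of one way to list such rows, assuming toy tables whose names and numbers are illustrative only (they are not taken from the real files):

```python
import pandas as pd

# Toy stand-ins for the two tables; names and values are illustrative only.
covid = pd.DataFrame({
    "location": ["Albania", "United States", "World"],
    "total_cases": [120022.0, 30000000.0, 125000000.0],
})
diet = pd.DataFrame({
    "Country": ["Albania", "United States of America"],
    "Alcoholic Beverages": [0.912, 1.5],
})

# indicator=True adds a "_merge" column saying where each row was found.
checked = pd.merge(covid, diet, left_on="location", right_on="Country",
                   how="left", indicator=True)

# Rows marked "left_only" had no matching name in the diet table.
unmatched = checked.loc[checked["_merge"] == "left_only", "location"].tolist()
print(unmatched)  # ['United States', 'World']
```

Running the same check on the real `latest_covid` merge would show which locations, if any, are silently excluded from the plot.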


In [None]:
dataForD3 = covid_country.to_dict(orient='records') # one dict per country row

In [None]:
print (dataForD3[0])

{'country': 'Afghanistan', 'continent': 'Asia', 'total_cases': 56069.0, 'alcohol': 0.0, 'population': 38928341.0, 'covid_percent': 0.14403131127524801}
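
These records are later pasted into a JavaScript string. Python's default text form of a list of dicts happens to read as a JavaScript literal for plain numbers and strings, but values such as `None`, `True`, or `float('nan')` would not. One possible alternative is to serialize with the standard `json` module first. The record below is copied from the printed row above; `records`, `records_json`, and `js_snippet` are names introduced only for this sketch.

```python
import json

# One record, copied from the printed output above.
records = [{
    "country": "Afghanistan",
    "continent": "Asia",
    "total_cases": 56069.0,
    "alcohol": 0.0,
    "population": 38928341.0,
    "covid_percent": 0.14403131127524801,
}]

# json.dumps emits double-quoted keys and strings, which JavaScript accepts
# as an array literal. allow_nan=False raises ValueError instead of writing
# NaN, which is not valid JSON.
records_json = json.dumps(records, allow_nan=False)

# The JSON text can be embedded in a script template in place of the raw list.
js_snippet = "let data = {}".format(records_json)
print(js_snippet[:40])
```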


In [None]:
# Enabling D3 in Google Colab
from IPython.display import HTML, display
 
def load_d3_in_cell_output():
  display(HTML("<script src='https://d3js.org/d3.v6.min.js'></script>"))
  
get_ipython().events.register('pre_run_cell', load_d3_in_cell_output)

In [None]:
covid_country.to_csv('final_frame.csv', index = False, header=True)
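
The scatterplot is meant to reveal whether the two variables move together, and that can also be checked numerically. A minimal sketch, assuming only the five rows printed by `covid_country.head()` above (far too few to draw conclusions; `sample` and `pearson_r` are names introduced here):

```python
import pandas as pd

# The five rows shown by covid_country.head() above.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Angola",
                "Antigua and Barbuda"],
    "alcohol": [0.0, 0.912, 0.0896, 1.9388, 2.3041],
    "covid_percent": [0.144031, 4.170616, 0.264463, 0.065849, 1.032391],
})

# Pearson correlation coefficient between the two plotted variables.
# Values near +1 or -1 indicate a strong linear relationship; values
# near 0 indicate little linear relationship.
pearson_r = sample["alcohol"].corr(sample["covid_percent"])
print(round(pearson_r, 3))
```

The same call on the full `covid_country` frame would give a single number to compare against what the plot shows visually.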

##### Graphing in D3

# Scatterplot
# Color: continent
# X/Y: Alcohol and COVID cases
# Slider/dropdown to filter by continent

In [None]:
import IPython
js_code = \
'''
  let data = {}
  d3.select("div#final").selectAll("*").remove()
 
  const width = 700
  const height = 600
  const margin = 60 
 
  data = data.map(d=> ({{country:d.country, continent:d.continent, total_cases:d.total_cases, alcohol:d.alcohol, population:d.population, covid_percent:d.covid_percent }}))
  //console.log(data)

  const y = d3.scaleLinear().range([height-margin, margin]).domain(d3.extent(data, (d,i) => d.covid_percent))
  const x = d3.scaleLinear().range([margin, width-margin]).domain(d3.extent(data, (d,i)=>d.alcohol))
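  // Explicit domain so each continent always maps to the color shown in the legend
  const colorscale = d3.scaleOrdinal().domain(["Africa", "South America", "Asia", "Europe", "North America", "Oceania"]).range(["#3366cc", "#dc3912", "#ff9900", "#109618", "#990099", "#0099c6"])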
 
  const svg = d3.select("div#output-area").append("svg")
    .attr("width", width + 200) // extra 200px so the legend box starting at x=675 fits inside the SVG
    .attr("height", height)
            
  const xAxis = d3.axisBottom().scale(x)
            
            
  svg.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(0," + (height-margin) + ")")
    .call(xAxis)
            
  const yAxis = d3.axisLeft().scale(y)
            
  svg.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(" + margin + ",0)")
    .call(yAxis)
            
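  const circles = svg.selectAll("circle")
    .data(data)
    .join("circle")
    .attr("cx", d => x(d.alcohol))
    .attr("cy", d => y(d.covid_percent))
    .attr("r", 5)
    .style("fill", d => colorscale(d.continent))

  // Tooltip is appended to the circle selection, since a transition has no append()
  // The backslash is doubled so Python passes the escape through to JavaScript
  circles.append("title")
    .text(d => "Country: " + d.country + "\\n" + "Population: " + d.population)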
            
  svg.append("text")
    .attr("transform", "rotate(-90,10,"+(height/2)+")")
    .attr("x", 15)
    .attr("y", height/2)
    .style("text-anchor", "middle")
    .text("% of Population Diagnosed With COVID-19")
            
  svg.append("text")
    .attr("x", (width / 2))             
    .attr("y", 40)
    .attr("text-anchor", "middle")  
    .style("font-size", "20px") 
    .style("font-weight", "bold")
    .text("COVID-19 Cases Per Country as a Function of Alcohol Consumption (kCal)")
            
  svg.append("text")
    .attr("x", width/2)
    .attr("y", height-5)
    .style("text-anchor", "middle")
    .text("% of Country's kCal Intake from Alcohol")
             
  // Legend box
  svg.append("rect")
    .attr("x", 675)
    .attr("y", 220)
    .attr("width", 200)
    .attr("height", 300)
    .style("fill", "none")
    .style("stroke", "black")
    .style("stroke-width", 2) 
            
  // Legend title
  svg.append("text")
    .attr("x", 770)
    .attr("y", 250)
    .style("text-anchor", "middle")
    .style("font-weight", "bold")
    .text("Continent")

  // One label and one color swatch per continent, spaced 40px apart
  const legendEntries = [
    ["Africa", "#3366cc"],
    ["Asia", "#ff9900"],
    ["Europe", "#109618"],
    ["North America", "#990099"],
    ["Oceania", "#0099c6"],
    ["South America", "#dc3912"]
  ]

  legendEntries.forEach(([name, color], i) => {{
    svg.append("text")
      .attr("x", 730)
      .attr("y", 300 + 40 * i)
      .style("text-anchor", "middle")
      .text(name)

    svg.append("circle")
      .attr("cx", 790)
      .attr("cy", 295 + 40 * i)
      .attr("r", 8)
      .style("fill", color)
  }})

'''.format(dataForD3)

display(IPython.display.Javascript(js_code))


<IPython.core.display.Javascript object>

In [None]:
import IPython
js_code = \
'''
  let data = {}
  d3.select("div#final").selectAll("*").remove()
 
  const width = 700
  const height = 600
  const margin = 60 
 
  data = data.map(d=> ({{country:d.country, continent:d.continent, total_cases:d.total_cases, alcohol:d.alcohol, population:d.population, covid_percent:d.covid_percent }}))

  const y = d3.scaleLinear().range([height-margin, margin]).domain(d3.extent(data, (d,i) => d.covid_percent))
  const x = d3.scaleLinear().range([margin, width-margin]).domain(d3.extent(data, (d,i)=>d.alcohol))
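  // Explicit domain so each continent always maps to the same color
  const colorscale = d3.scaleOrdinal().domain(["Africa", "South America", "Asia", "Europe", "North America", "Oceania"]).range(["#3366cc", "#dc3912", "#ff9900", "#109618", "#990099", "#0099c6"])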


  const svg = d3.select('div#output-area').append('svg')
    .attr("width", width)
    .attr("height", height)

  const xAxis = d3.axisBottom().scale(x)

  svg.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(0," + (height-margin) + ")")
    .call(xAxis)

  const yAxis = d3.axisLeft().scale(y)
            
  svg.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(" + margin + ",0)")
    .call(yAxis)

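  const circles = svg.selectAll("circle")
    .data(data)
    .join("circle")
    .attr("cx", d => x(d.alcohol))
    .attr("cy", d => y(d.covid_percent))
    .attr("r", 0)

  // Tooltip uses the country field; the mapped records have no location field
  circles.append("title")
    .text(d => d.country)

  // Grow each point from radius 0 to 5 over one second
  circles.transition()
    .duration(1000)
    .attr("r", 5)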

'''.format(dataForD3)
display(IPython.display.Javascript(js_code))


<IPython.core.display.Javascript object>

## Legend