![logo](https://drive.google.com/uc?id=1VrvlBTHH4D7xsrNp74wtLBamMZygG8Sy)

<a id="top"></a>
## Table of Contents
1. [Introduction](#introduction)
2. [Task Overview](#task_overview)
3. [Geographical Data Visualization Tool](#task_visualization)
4. [Worldwide COVID-19 Cases](#worldwide)
5. [Italy COVID-19 Cases](#italy)
6. [Italy Demographics](#demographics)
7. [Italy Meteorology](#meteorology)
8. [Italy Mobility](#mobility)
9. [Defining Workflow](#workflow)
10. [Our Code](#task_notebooks)
11. [Next Goals (June deadline)](#task_next)
12. [Team calls](#task_calls)
13. [Credits](#task_credits)


# 1. Introduction <a id="introduction"></a>





This notebook was created through the collaborative effort of CoronaWhy.org, a multi-disciplinary global effort of volunteers.
We present this notebook to explore the geographical dimension of the findings of our teammates who addressed the questions from the following tasks:
- **task-vt: What do we know about vaccines and therapeutics?** ([task](https://www.kaggle.com/dataset/08dd9ead3afd4f61ef246bfd6aee098765a19d9f6dbf514f0142965748be859b/tasks?taskId=561) | [submission](https://www.kaggle.com/benjpjones/drug-treatment-extraction-taskvt)) 
- **task-risk: What do we know about COVID-19 risk factors?** ([task](https://www.kaggle.com/dataset/08dd9ead3afd4f61ef246bfd6aee098765a19d9f6dbf514f0142965748be859b/tasks?taskId=558) | [submission](https://www.kaggle.com/arturkiulian/coronawhy-org-task-risk-factors))   


- Visit our [website](https://www.coronawhy.org) to learn more.
- Read our [story](https://medium.com/@arturkiulian/im-an-ai-researcher-and-here-s-how-i-fight-corona-1e0aa8f3e714).
- Visit our [main notebook](https://www.kaggle.com/arturkiulian/coronawhy-org-global-collaboration-join-slack) for historical context on how this community started.

[Back to top](#top)


# 2. Task Overview <a id="task_overview"></a>




Our work in `task-geo` has been split in two different yet related lines of progression:

1. The extraction of geolocations on the original CORD-19 dataset.

2. The gathering of geopolitical and demographic data related to the spread of COVID-19.

The goal of the first line of work is to provide our teammates with additional insights about their findings, like the location of clinical trials or reports and claims.
The goal of the second line of work is to provide researchers with resources to confirm, debate or extend claims and findings made in the papers contained in the CORD-19 dataset.


At present, the extraction of geolocations from the CORD-19 dataset is at a preliminary stage. We have achieved some initial results, but additional work is required to extract the insights that would be useful to the research community at large. The gathering of geopolitical and demographic data is at a more developed stage. We provide very granular datasets on various aspects related to the spread of the virus and the resulting mortality in the population. The aspects in focus were selected in collaboration with `task-risk` and `task-vt` via in-depth analysis of the CORD-19 dataset. These datasets have been brought into a standardized format to make them easy to use and to combine, both for researchers and for the public at large.



**DISCLAIMER AND RISKS**


We have strived to provide information that is as accurate and up to date as possible. We rely on multiple external providers, and our data is only as accurate as theirs. Our goal is not to draw any conclusions from the data, but only to present it in a way that is as accessible as possible.

[Back to top](#top)

# 3. Geographical Data Visualization Tool <a Id="task_visualization"></a>

## Overview


A tool (built with Carto) for exploring geospatial relationships between COVID-19 prevalence (by cases, recoveries and/or fatalities) and geospatial properties (climate and demographic factors), at different geospatial resolutions (country, region, etc.) and temporal resolutions (dates).

## General layout
![carto.png](attachment:carto.png)

The climate and demographic visualizations share some common controls. In the center of the screen is a map that can be zoomed in and out using the plus and minus buttons at the bottom left of the screen.

On the **left side** of the map are **layers** that correspond to COVID-19 prevalence. Select one (and only one) from each display group to populate the map with the corresponding data (see the sections below for more details).

Each **“display group”** populates the map with a different type of data. In this visualization, Group A corresponds to the type of COVID case statistic (deaths, recovered, active). Group B corresponds to the specific metric being examined (climatic or demographic, depending on the visualization). Group C is an option that groups the data by region within the country.

### Date selection

Along the **bottom** of the screen there is a **date picker**. Selecting a single date populates the map with the number of COVID cases **<u>up until that date</u>**. For example, if April 12 is selected, the COVID statistics show the cases accumulated up to April 12, not the new cases, recoveries or deaths reported on April 12 alone. The visualization also has a built-in slider for selecting a date range, but it does not work correctly yet; date ranges will be supported in a future version. For climate data, the Group B values for the selected date are also displayed; this does not apply to demographic data.
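
The difference between the cumulative count shown on the map and the count for a single day can be illustrated with a minimal pandas sketch. The series name and the values below are hypothetical and are not taken from our datasets.

```python
import pandas as pd

# Hypothetical cumulative case counts for one region (illustrative values only).
cumulative = pd.Series(
    [100, 130, 175, 175, 240],
    index=pd.to_datetime(
        ["2020-04-08", "2020-04-09", "2020-04-10", "2020-04-11", "2020-04-12"]
    ),
    name="cumulative_cases",
)

selected = pd.Timestamp("2020-04-12")

# What the map shows for the selected date: the running total up to that date.
shown_on_selected_date = cumulative.loc[selected]

# What the map does not show: cases newly reported on that date alone.
# The first day has no previous value, so its own total is used as its daily count.
daily_new = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)
new_on_selected_date = daily_new.loc[selected]

print(shown_on_selected_date, new_on_selected_date)
```

Summing the daily values up to a date gives back the cumulative value for that date, which is why the two views carry the same information in different forms.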

### Climate Metrics

Display group B includes a variety of layers that represent climate, such as average temperature, minimum temperature and relative humidity. A color spectrum marks each region according to the value of the selected climate metric. Currently, comprehensive climate data is available for visualization only for Italy; no other country can be analyzed with this tool yet. The climate data for a region can be viewed by hovering the mouse over one of the colored dots in the map view; the climate summary for the selected layer is displayed above the cursor.

Sources:
- [Corona Data Scraper](https://coronadatascraper.com/#home)
- [NASA Langley Research Center (LaRC) POWER project](https://power.larc.nasa.gov/)


WHAT IT DOES:

This chart displays COVID-19 information for Italy’s municipal regions and the corresponding climate conditions for a given **date**.
This chart gives the following INFORMATION about the region:

- Number of COVID Cases
- Region Name
- Number of Active Cases
- Number of Deaths
- Number of Tests Conducted

ONE of the following CLIMATE conditions of that region is also displayed:

- Average Temperature
- Minimum Temperature
- Maximum Temperature
- Relative Humidity
- Pressure
 
 


### Demographic Metrics

This visualization shows various population demographic data, from regional age and gender distributions to population density.
The options on the left work best when only one selection is made from each group:

**Group A** - Age, Gender

**Group B** - COVID-19 cases: Total, Active, Recovered and Deaths

**Group C** - Regional population density (average number of people per square kilometer).

<u>Group A</u>
Age and gender/sex distributions:

<img src="attachment:age_points.png" align="left" style="margin-right:10px"> 9 age distribution brackets: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+

The dots are ordered from the top left, the way you would read a book. If a region had a uniform distribution of ages, all the dots would be white, so this example from Emilia-Romagna shows a lower proportion of people under 29 and an over-representation of 40-59 year olds.

<img src="attachment:genders.png" align="left" style="margin-right:10px"/>Gender/sex distribution, where 0.5 would be an equal distribution. It is displayed with male on the left and female on the right; an equal distribution would make both white, though the deviation is only small. On average, most countries have slightly more women than men, except places like China, whose long-term policies have affected the distribution.
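
As a rough sketch of the quantities behind these two displays, the snippet below computes each age bracket's share of the population, its deviation from a uniform share, and the male fraction. It assumes that the dot colours encode deviation from a uniform distribution, which is our reading of the description above and not a statement of how the Carto layers are implemented. All counts are hypothetical.

```python
import pandas as pd

# Hypothetical population counts per age bracket for one region (illustrative values only).
population_by_age = pd.Series(
    {
        "0-9": 380_000, "10-19": 400_000, "20-29": 420_000,
        "30-39": 500_000, "40-49": 680_000, "50-59": 700_000,
        "60-69": 540_000, "70-79": 460_000, "80+": 380_000,
    }
)

# Share of the total population in each bracket.
share = population_by_age / population_by_age.sum()

# With 9 brackets, a uniform distribution would put 1/9 of the population in each.
uniform_share = 1 / len(population_by_age)

# Positive values: over-represented bracket; negative values: under-represented.
deviation = share - uniform_share

# Hypothetical male and female counts; a fraction of 0.5 means an equal split.
male, female = 2_170_000, 2_290_000
male_fraction = male / (male + female)

print(deviation.round(3))
print(round(male_fraction, 3))
```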


<u>Group B</u>
These options show regional counts for COVID-19: cumulative tested cases, number of active cases, number of recovered patients and number of deaths.
Each count is displayed as a circle in the centre of a region, with the size of the circle showing the relative count.

<u>Group C</u>
This option only has one choice that colours the regions to display the population density measured in people per square kilometer.
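
Population density is the population of a region divided by its area. A minimal sketch with hypothetical region names, column names and values:

```python
import pandas as pd

# Hypothetical regions; populations and areas are illustrative values only.
regions = pd.DataFrame(
    {
        "region": ["Region A", "Region B"],
        "population": [1_200_000, 300_000],
        "area_km2": [4_000.0, 6_000.0],
    }
)

# People per square kilometer.
regions["density_per_km2"] = regions["population"] / regions["area_km2"]

print(regions)
```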
 
Sources:
- [Population by different regional levels](http://demo.istat.it/pop2019)
- [Italian communes to province mapping](https://www.istat.it/it/archivio/6789#Elencodeicodiciedelledenominazionidelleunitterritoriali-0)
- [Cartographic data](https://www.istat.it/it/archivio/222527)


### Errata

In the climate visualization, the humidity data is missing some data points in specific Italian regions. In the database, if there is a missing data point, the database records the humidity value as -999 instead of NaN. We will fix this soon!
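
Anyone working with the underlying data in the meantime can convert the placeholder to a proper missing value before computing statistics. The sketch below shows one way to do this with pandas; the column names and sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the column names and values are illustrative only.
climate = pd.DataFrame(
    {
        "region": ["Lombardia", "Lazio", "Sicilia"],
        "relative_humidity": [71.2, -999.0, 64.5],
    }
)

MISSING_SENTINEL = -999  # placeholder value described in the errata above

# Replace the placeholder with NaN so it is skipped by aggregations such as mean().
climate["relative_humidity"] = climate["relative_humidity"].replace(
    MISSING_SENTINEL, np.nan
)

mean_humidity = climate["relative_humidity"].mean()
print(mean_humidity)
```

Without this step, a single -999 entry would pull the average far below any physically meaningful humidity value.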


[Back to top](#top)

In [None]:
from IPython.display import display, HTML, IFrame
display(HTML(data="""
<style> div#notebook-container { width: 95%; } div#menubar-container { width: 85%; } div#maintoolbar-container { width: 99%; } </style>
"""))

# 4. Worldwide COVID-19 Cases <a id="worldwide"></a>

Displays worldwide COVID-19 cases, active cases, recoveries and deaths, along with the population of the selected country.

In [None]:
IFrame(src='https://juancalvo.carto.com/builder/b1d52c5c-61de-4a22-9b3e-86d2f2ca7348/embed', width='100%', height=800)

[Back to top](#top)

# 5. Italy COVID-19 Cases <a id="italy"></a>
Displays Italy's COVID-19 cases, active cases, recoveries, deaths and tests at the region level.

In [None]:
IFrame(src='https://juancalvo.carto.com/builder/2aad3040-19cb-4615-8940-b635cecb9817/embed', width='100%', height=800)

[Back to top](#top)

# 6. Italy Demographics <a id="demographics"></a>

Displays Italy's COVID-19 cases alongside demographics such as age distribution, sex distribution and population density.

In [None]:
IFrame(src='https://juancalvo.carto.com/builder/b1b0b61e-acdc-4cc1-b47f-93c4a97b664d/embed', width='100%', height=800)

[Back to top](#top)

# 7. Italy Meteorology <a id="meteorology"></a>

Displays Italy's COVID-19 cases alongside detailed meteorological information such as minimum, maximum and average temperature, humidity and atmospheric pressure.

**NOTE**: Don't select more than one layer from each group at the same time.

In [None]:
IFrame(src='https://juancalvo.carto.com/builder/9ec4eeeb-1ea2-4ef6-8ee5-3e923ffa944e/embed', width='100%', height=800)

[Back to top](#top)


# 8. Italy Mobility <a id="mobility"></a>

Displays Italy's COVID-19 cases alongside mobility variations across different categories.

In [None]:
IFrame(src='https://juancalvo.carto.com/builder/fe3dad66-b710-4880-b7e3-2d612d2fce6d/embed', width='100%', height=800)

[Back to top](#top)


# 9. Defining Workflow <a id="workflow"></a>

The current workflow for demographic data extraction is as follows.

**1. Data Extraction**
Data is extracted from external sources. All sources are publicly available and cited in our work.

**2. Formatting**
The extracted data is brought to a common format, making it easy to use and to combine.

**3. Data Elaboration**
Cleaning steps, aggregations, and other operations are applied to the data if needed.
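
The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names, the column naming convention and the in-memory CSV standing in for an external source are all hypothetical, and the actual code in our repository may be organized differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for an external source (illustrative values only).
RAW_CSV = """Region,Date,Total Cases
Lombardia,2020-04-11,100
Lombardia,2020-04-12,120
Lazio,2020-04-12,40
Lazio,2020-04-12,40
"""


def extract(source):
    """Step 1: read raw data from a source (here an in-memory CSV)."""
    return pd.read_csv(source)


def format_data(raw):
    """Step 2: bring the data to a common format.

    Assumed convention: lowercase snake_case column names and parsed dates.
    """
    formatted = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    formatted["date"] = pd.to_datetime(formatted["date"])
    return formatted


def elaborate(formatted):
    """Step 3: clean and aggregate (drop duplicate rows, keep the latest row per region)."""
    cleaned = formatted.drop_duplicates()
    return cleaned.sort_values("date").groupby("region", as_index=False).last()


result = elaborate(format_data(extract(io.StringIO(RAW_CSV))))
print(result)
```

Keeping each step in its own function makes it possible to swap the source in step 1 without touching the formatting or elaboration logic.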

The code for this whole process is open source on our [GitHub](#task_notebooks). In the near future, an automated extraction pipeline will be set up to generate up-to-date datasets in a public repository.

[Back to top](#top)

# 10. Our Code <a id="task_notebooks"></a>

All the work from task-geo can be found in our GitHub repository:
#### https://github.com/CoronaWhy/task-geo

[Back to top](#top)

# 11. Next Goals (June deadline) <a id="task_next"></a>

* Automation and auditability for the complete data gathering process.
* Easy access for non-technical researchers for the data we provide.
* Automation and improvement of the paper geolocation process.
* Improvement of outputs.

[Back to top](#top)


# 12. Team calls <a id="task_calls"></a>
We operate under radical transparency, and all of our meetings and calls are recorded. Feel free to review our historical progress and how we reached this stage:

https://trello.com/c/XHahB7GU

[Back to top](#top)

# 13. Credits <a id="task_credits"></a>

Our Task-Geo team:

In alphabetical order:

* **Alex Walther** - Data Scientist
* **Andy Gabel** - Project Management Consultant
* **Ansun Sujoe** - Data Scientist
* **Brian Vacha** - Data Scientist
* **Carles Sala** - Data Scientist
* **Carlos Gomez** - Data Scientist
* **Cassie Gabel** - Inter-team communications, scientific consultant
* **Daniel Robert-Nicoud** - Project Lead
* **Hyberson Pereira** - Project Manager
* **Igor Kiulian** - Data Scientist
* **Imran Ahmed** - Lead sub-Team NLP
* **Ishan Sharma** - Data Scientist
* **Jane Razumovskaya** - Lead sub-Team Demographic Data
* **Juan Calvo** - Data Visualization
* **Kevin Li** - Data Visualization
* **Krishna Sheth** - FullStack Developer Cloud
* **Manuel Alvarez Campo** - Technical Lead
* **Marie Bjerede** - (Former) Project Manager
* **Mike Honey** - Data Visualization Consultant
* **Nicholas Webb** - Consultant (Virology)
* **Oussama Naji** - Data Scientist
* **Roshan Grewal** - Data Scientist
* **Sammer Puran** - Data Scientist
* **Shashi Raj** - Data Scientist
* **Wendy Mak** - Data Scientist

[Back to top](#top)