# Demographic & Geographical Impact on Covid-19

#### Data Source
##### The dataset analyzed in this project was publicly retrieved from Kaggle.
##### Link: https://www.kaggle.com/code/aestheteaman01/demographics-observation-for-pandemic-escalation/data
#### Group Members
##### Abdullah Aljarallah, Arushi Pathik, Ashley Mercado, Jay Chaudhary, Linting(Linsey) Wang, Vibhas Goel

### Content

#### 1. Executive Summary
#### 2. Problem Statement
#### 3. Dataset Description & Analysis
#### 4. Conclusion
#### 5. Dashboard
#### 6. References

### 1. Executive Summary

This project examines the demographic and geographical factors that had a major impact on the spread of COVID-19. Weather (temperature), age, and medical resources are central to the analysis, and our goal is to identify the parameters that most affected the spread. To do so, we used relevant datasets from Kaggle covering January to April 2020, and chose six countries on different continents: India, Mexico, Italy, Argentina, South Korea, and Germany.

First, we looked at the worldwide COVID-19 overview and combined temperature with case numbers. Then, we analyzed the relationship between case numbers and several possible influencing factors, such as age, travel policy, available medical resources, wind speed, and precipitation. We also compared mental health and anxiety under different lockdown policies to determine which strategy is more beneficial for society.


### 2. Problem Statement

The world is still tackling the COVID-19 pandemic. Over the past two years, the pandemic has affected every industry, with healthcare at the forefront. It prompts us to examine the world’s healthcare systems and ask the ultimate question: “Are we pandemic ready?”
The pandemic has also left deep marks on the public’s mental health and anxiety levels, contributing to PTSD and even depression. This project looks closely at how demographic and geographical parameters affected the pandemic’s spread across the globe. Because COVID-19 affected public health in every respect, and because mental health, anxiety, and depression are so widely discussed today, we believed the pandemic’s impact on overall health was a crucial topic to tackle.

In this project, we use a publicly available Kaggle dataset that records COVID-19 cases across the world alongside geographical and demographic factors, and we examine how these factors relate to the public’s health.


### 3. Dataset Description & Analysis

#### 3.1 Dataset Description

The files contain details on the COVID-19 cases and fatalities in numerous nations between January and March 2020. The dataset contains the details listed below:

- Daily reported and cumulative case counts for all countries, including confirmed cases, fatalities, and recoveries.
- Each day's average, minimum, and maximum temperature in each country.
- Population, population density, and median age for each country.
- Each country's gender ratio and the percentage of its population over 65.
- Hospital beds and available beds per 1,000 people.
- Travel volumes for each country: inbound, outbound, and domestic.


#### 3.2 Dataset Analysis

**The top 5 countries with maximum Covid cases as of June 6th 2020**

In [5]:
%%bigquery
SELECT sum(confirmed) as TotalCases, Country_Region as Country

 FROM `ba775-fal22-a11.CountryCovid.Covid_19_dataset` 
 group by Country_Region
 order by sum(confirmed) desc
 limit 5



| | TotalCases | Country |
|---|---|---|
| 0 | 67630796.0 | US |
| 1 | 13438736.0 | Spain |
| 2 | 13287673.0 | Italy |
| 3 | 10790367.0 | UK |
| 4 | 10143091.0 | Germany |


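From this initial analysis, the United States (US) has the highest number of cases globally over the period between January and June 2020. Note that TotalCases sums the `confirmed` column over every observation date. Because `confirmed` appears to be a running total (the worldwide figure for a single day in late May is about 6 million), these sums are much larger than the actual case counts and are best read as a ranking rather than as absolute numbers.

The US is the only non-European country in the top 5. The other four (Spain, Italy, the UK, and Germany) reflect Europe's emergence as an epicenter of the outbreak, and the high case counts in Europe and the US persist throughout the dataset.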

**Cases number in US, India and New Zealand**

In [11]:
%%bigquery
SELECT cast(sum(confirmed) as int) as TotalCases, Country_Region as Country

 FROM `ba775-fal22-a11.CountryCovid.Covid_19_dataset` 
    
    where Country_Region in ('India','New Zealand','US')
 group by Country_Region
 order by sum(confirmed) desc



| | TotalCases | Country |
|---|---|---|
| 0 | 67630796 | US |
| 1 | 3744809 | India |
| 2 | 90431 | New Zealand |


We chose these three countries because their lockdowns differed in stringency. The US did not impose many travel restrictions, which is consistent with its high number of cases in the early days of COVID-19. India and New Zealand imposed stricter lockdown guidelines and recorded fewer cases.

**Progression of Covid by Date**

In [12]:
%%bigquery
SELECT  ObservationDate
,sum(confirmed) as TotalCases,
sum(sum(confirmed)) over (order by ObservationDate) Cumulative_Total_Cases

 FROM `ba775-fal22-a11.CountryCovid.Covid_19_dataset` 
 group by ObservationDate
 order by ObservationDate



| | ObservationDate | TotalCases | Cumulative_Total_Cases |
|---|---|---|---|
| 0 | 2020-01-22 | 555.0 | 555.0 |
| 1 | 2020-01-23 | 653.0 | 1208.0 |
| 2 | 2020-01-24 | 941.0 | 2149.0 |
| 3 | 2020-01-25 | 1438.0 | 3587.0 |
| 4 | 2020-01-26 | 2118.0 | 5705.0 |
| ... | ... | ... | ... |
| 127 | 2020-05-28 | 5808946.0 | 200627379.0 |
| 128 | 2020-05-29 | 5924275.0 | 206551654.0 |
| 129 | 2020-05-30 | 6059017.0 | 212610671.0 |
| 130 | 2020-05-31 | 6166946.0 | 218777617.0 |


Looking at the spread of the virus by date, we see a clear, sudden spike in the number of COVID-19 cases worldwide.

We will explore the spread of cases across the world further through visualization in the next phase of this project.

In [4]:
%%bigquery
with CovidProgression as
(SELECT  ObservationDate
,sum(confirmed) as Daily_Total_Cases,
sum(sum(confirmed)) over (order by ObservationDate) Cumulative_Total_Cases
 FROM `ba775-fal22-a11.CountryCovid.Covid_19_dataset` 
 group by ObservationDate
 order by ObservationDate )

select * from CovidProgression where Daily_Total_Cases >200000




| | ObservationDate | Daily_Total_Cases | Cumulative_Total_Cases |
|---|---|---|---|
| 0 | 2020-03-18 | 214915.0 | 4041239.0 |
| 1 | 2020-03-19 | 242713.0 | 4283952.0 |
| 2 | 2020-03-20 | 272167.0 | 4556119.0 |
| 3 | 2020-03-21 | 304549.0 | 4860668.0 |
| 4 | 2020-03-22 | 337122.0 | 5197790.0 |
| ... | ... | ... | ... |
| 71 | 2020-05-28 | 5808946.0 | 200627379.0 |
| 72 | 2020-05-29 | 5924275.0 | 206551654.0 |
| 73 | 2020-05-30 | 6059017.0 | 212610671.0 |
| 74 | 2020-05-31 | 6166946.0 | 218777617.0 |


The Daily_Total_Cases column is the sum of confirmed cases reported across all countries on each date; it shows how quickly COVID-19 spread worldwide over time. The Cumulative_Total_Cases column is the running sum of those daily totals from 2020-01-22 up to each date, which gives a sense of the overall trajectory.
The daily total first crossed the 200,000 mark on 2020-03-18, 56 days after the first date in the dataset (2020-01-22). After that point, the case count grew rapidly.
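The relationship between the two columns can be checked by hand. The sketch below uses plain Python rather than the BigQuery window function from the query; it rebuilds the running total from the first five values in the progression table and counts the days between 2020-01-22 and 2020-03-18. The variable names are illustrative only and do not come from the notebook.

```python
from datetime import date
from itertools import accumulate

# First five per-date totals, copied from the progression table above.
daily_totals = [555, 653, 941, 1438, 2118]

# Equivalent of SUM(SUM(confirmed)) OVER (ORDER BY ObservationDate):
# each entry is the sum of all per-date totals up to and including that date.
running_totals = list(accumulate(daily_totals))
print(running_totals)

# Days between the first date in the dataset and the first date on which
# the per-date total exceeded 200,000.
days_elapsed = (date(2020, 3, 18) - date(2020, 1, 22)).days
print(days_elapsed)
```

The running totals printed here can be compared against the Cumulative_Total_Cases column for the same five dates.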

**Covid Recoveries by Date**

In [2]:
%%bigquery
SELECT  ObservationDate
,sum(Recovered) as TotalRecoveries,
sum(sum(Recovered)) over (order by ObservationDate) Cumulative_Total_Recoveries
 FROM `ba775-fal22-a11.CountryCovid.Covid_19_dataset` 
 group by ObservationDate
 order by ObservationDate 
Limit 10



| | ObservationDate | TotalRecoveries | Cumulative_Total_Recoveries |
|---|---|---|---|
| 0 | 2020-01-22 | 28.0 | 28.0 |
| 1 | 2020-01-23 | 30.0 | 58.0 |
| 2 | 2020-01-24 | 36.0 | 94.0 |
| 3 | 2020-01-25 | 39.0 | 133.0 |
| 4 | 2020-01-26 | 52.0 | 185.0 |
| 5 | 2020-01-27 | 61.0 | 246.0 |
| 6 | 2020-01-28 | 107.0 | 353.0 |
| 7 | 2020-01-29 | 126.0 | 479.0 |
| 8 | 2020-01-30 | 143.0 | 622.0 |
| 9 | 2020-01-31 | 222.0 | 844.0 |


The table above shows the first ten days of COVID-19 recoveries, together with a running cumulative sum by date. The final observation in the analyzed dataset was roughly 2.7 million recoveries.

**Covid Deaths by Date**

In [3]:
%%bigquery
SELECT  ObservationDate
,sum(Deaths) as TotalDeaths,
sum(sum(Deaths)) over (order by ObservationDate) Cumulative_Total_Deaths
 FROM `ba775-fal22-a11.CountryCovid.Covid_19_dataset` 
 group by ObservationDate
 order by ObservationDate 
Limit 10



| | ObservationDate | TotalDeaths | Cumulative_Total_Deaths |
|---|---|---|---|
| 0 | 2020-01-22 | 17.0 | 17.0 |
| 1 | 2020-01-23 | 18.0 | 35.0 |
| 2 | 2020-01-24 | 26.0 | 61.0 |
| 3 | 2020-01-25 | 42.0 | 103.0 |
| 4 | 2020-01-26 | 56.0 | 159.0 |
| 5 | 2020-01-27 | 82.0 | 241.0 |
| 6 | 2020-01-28 | 131.0 | 372.0 |
| 7 | 2020-01-29 | 133.0 | 505.0 |
| 8 | 2020-01-30 | 171.0 | 676.0 |
| 9 | 2020-01-31 | 213.0 | 889.0 |


The table above shows the first ten days of reported deaths, together with a running cumulative sum by date. Over the full period we analyzed, the final number of deaths was 375,559.


In [7]:
from IPython.display import HTML
HTML('<div style="width:100%;height:0px;position:relative;padding-bottom:68.702%;"><iframe src="https://streamable.com/e/uo2ir4" frameborder="0" width="100%" height="100%" allowfullscreen style="width:100%;height:100%;position:absolute;left:0px;top:0px;overflow:hidden;"></iframe></div>')

The video above is a snippet of the moving map we created to show the daily spread of cases over time. It depicts how cases first appeared in China, spread to Europe, and finally surged in the US during the initial stage of the pandemic.

#### Analysing 6 countries for covid progression: India, Mexico, Argentina, South Korea, Italy, and Germany

**Age**

In [34]:
%%bigquery
SELECT  Country,sum(Daily_cases) AS sum_of_daily_cases, Age__65_ as PercentAbove65, AVG(Median_Age) AS median_age FROM `ba775-fal22-a11.CountryCovid.6countries`
group by Age__65_,Country
order by Age__65_,sum(Daily_cases) desc




| | Country | sum_of_daily_cases | PercentAbove65 | median_age |
|---|---|---|---|---|
| 0 | India | 1251 | 6.39 | 28.0 |
| 1 | Mexico | 993 | 7.26 | 29.0 |
| 2 | Argentina | 820 | 11.79 | 32.0 |
| 3 | South Korea | 9660 | 14.55 | 44.0 |
| 4 | Italy | 101739 | 21.69 | 47.0 |
| 5 | Germany | 66885 | 22.36 | 46.0 |


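Across these six countries, the number of COVID-19 cases generally rises with median age and with the share of the population aged 65 and over, although not strictly: Argentina is older than India and Mexico but recorded fewer cases, and Germany has a slightly larger share over 65 than Italy but fewer cases. We also found an almost perfect correlation (99.18%) between the percentage of the population aged 65+ and the percentage of positive cases in the same age group. These findings are consistent with the elderly being the most vulnerable group, though the table tracks cases rather than deaths and so does not measure mortality directly.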

<img src="https://i.ibb.co/yyHkzLd/AgevsCas.png" alt="AgevsCas" border="0" width=300 />
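As a rough cross-check on the trend in the age table, the following sketch computes a Pearson correlation coefficient between PercentAbove65 and sum_of_daily_cases for the six countries. This is not the same quantity as the 99.18% figure quoted above, which compares population share with the share of positive cases within the 65+ group using data not shown in the table. With only six data points, the result should be treated as indicative at best. The `pearson` helper is written here for illustration and is not part of the notebook.

```python
import math

# (PercentAbove65, sum_of_daily_cases) pairs copied from the age table above.
rows = {
    "India": (6.39, 1251),
    "Mexico": (7.26, 993),
    "Argentina": (11.79, 820),
    "South Korea": (14.55, 9660),
    "Italy": (21.69, 101739),
    "Germany": (22.36, 66885),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


share_65 = [values[0] for values in rows.values()]
cases = [values[1] for values in rows.values()]
r = pearson(share_65, cases)
print(f"Pearson r (share aged 65+ vs. total cases): {r:.3f}")
```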

**Temperature**

In [4]:
%%bigquery
SELECT 
CASE
WHEN Temperature >=-40 AND Temperature <-30 THEN '-40 to -30' 
WHEN Temperature >=-30 AND Temperature <-20 THEN '-30 to -20'
WHEN Temperature >=-20 AND Temperature <-10 THEN '-20 to -10' 
WHEN Temperature >=-10 AND Temperature <0 THEN '-10 to 0'
WHEN Temperature >=0 AND Temperature <10 THEN '0 to 10'
WHEN Temperature >=10 AND Temperature <20 THEN '10 to 20'
WHEN Temperature >=20 AND Temperature <30 THEN '20 to 30'
WHEN Temperature >=30 AND Temperature <40 THEN '30 to 40'
--WHEN Temperature >=80 AND Temperature <90 THEN '80~89'
--WHEN Temperature >=90 AND Temperature <100 THEN '90~99'
END AS temp_group,
sum(Daily_cases) AS sum_of_daily_cases


FROM `ba775-fal22-a11.Weather_Data.Temperature` 
WHERE Temperature IS NOT NULL
GROUP BY temp_group 
ORDER BY sum(Daily_cases) desc



| | temp_group | sum_of_daily_cases |
|---|---|---|
| 0 | 0 to 10 | 406759 |
| 1 | 10 to 20 | 277264 |
| 2 | 20 to 30 | 46114 |
| 3 | -10 to 0 | 44109 |
| 4 | 30 to 40 | 4610 |
| 5 | -20 to -10 | 2424 |
| 6 | -30 to -20 | 303 |
| 7 | -40 to -30 | 0 |


After grouping daily cases by temperature, we see that the vast majority, about 88%, occurred in locations where the temperature ranged from 0 to 20 degrees Celsius. Outside that range, in both colder and warmer locations, daily cases drop off sharply.

<img src="https://i.postimg.cc/vZxjZs3h/Tempvs-Cases.png" alt="Tempvs-Cases"  height=400 border="0" />
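The share of cases in the 0 to 20 degree range can be reproduced from the temperature table. The sketch below does that arithmetic in Python and also includes a small `temp_group` helper that mirrors the 10-degree bands of the CASE expression in the query. Both the helper and the variable names are illustrative assumptions, not code from the notebook.

```python
# sum_of_daily_cases per temperature band (degrees Celsius), from the table above.
cases_by_band = {
    "0 to 10": 406759,
    "10 to 20": 277264,
    "20 to 30": 46114,
    "-10 to 0": 44109,
    "30 to 40": 4610,
    "-20 to -10": 2424,
    "-30 to -20": 303,
    "-40 to -30": 0,
}


def temp_group(temperature):
    """Return the 10-degree band label for a temperature.

    Mirrors the CASE expression in the query: bands cover -40 <= t < 40,
    and anything outside that range (or missing) maps to None.
    """
    if temperature is None or not -40 <= temperature < 40:
        return None
    lower = int(temperature // 10) * 10  # floor to the band's lower bound
    return f"{lower} to {lower + 10}"


total = sum(cases_by_band.values())
mild = cases_by_band["0 to 10"] + cases_by_band["10 to 20"]
share_mild = mild / total
print(f"Share of daily cases at 0 to 20 degrees: {share_mild:.1%}")
```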


**Travel Policy**

In [4]:
%%bigquery
##this is showing how travel affected the 6 countries' total cases
SELECT country, In_travels_mill__ AS in_travel, Out_Travels__mill__ AS out_travel, Domestic_Travels__mill__ AS domestic_travel, sum(Daily_cases) AS total_cases, In_travels_mill__/AVG(population) * 1000000 AS in_travel_per_million
 FROM `ba775-fal22-a11.CountryCovid.6countries` 
 GROUP BY country, In_travels_mill__, Out_Travels__mill__, Domestic_Travels__mill__
 ORDER BY total_cases DESC;



| | country | in_travel | out_travel | domestic_travel | total_cases | in_travel_per_million |
|---|---|---|---|---|---|---|
| 0 | Italy | 93.0 | 61.195 | | 101739 | 1.538161 |
| 1 | Germany | 38881.0 | | | 66885 | 464.062672 |
| 2 | South Korea | 15.347 | 28.696 | 31.1153 | 9660 | 0.299342 |
| 3 | India | 17423.0 | | | 1251 | 12.625322 |
| 4 | Mexico | 41.313 | 86.28 | Not Reported | 993 | 0.320423 |
| 5 | Argentina | 6.942 | 18.41 | 89.92 | 820 | 0.153598 |


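For this table, we analyzed inbound, outbound, and domestic travel for each of the six countries. When normalized per million people, Germany appears to have the most inbound travel, and it also has the second-highest number of COVID-19 cases among the six countries.
This is consistent with heavy travel helping the virus spread and worsening the situation for the German public, but the evidence is limited. Italy has the most cases with far less inbound travel per million, and the raw in_travel values for Germany (38,881) and India (17,423) are orders of magnitude larger than the others, which suggests they may be recorded in different units. We therefore treat the link between travel volume and spread as suggestive rather than conclusive.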
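Because the raw in_travel figures for Germany and India are in the tens of thousands while every other country's figure is below 100, the ranking may be sensitive to the units of the source column. The sketch below tests that sensitivity under a purely hypothetical assumption, namely that those two values are recorded in thousands rather than millions. The dataset does not state this; the `maybe_in_thousands` set and both helper functions are illustrative only.

```python
# in_travel_per_million values copied from the travel table above.
in_travel_per_million = {
    "Italy": 1.538161,
    "Germany": 464.062672,
    "South Korea": 0.299342,
    "India": 12.625322,
    "Mexico": 0.320423,
    "Argentina": 0.153598,
}

# HYPOTHETICAL: countries whose raw in_travel figure might be in thousands.
# This is an assumption made for illustration, not something the dataset states.
maybe_in_thousands = {"Germany", "India"}


def rescale(values, suspects, factor=1000):
    """Divide the suspect countries' values by `factor`; leave the rest unchanged."""
    return {
        country: value / factor if country in suspects else value
        for country, value in values.items()
    }


def ranking(values):
    """Country names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


print("As reported:     ", ranking(in_travel_per_million))
print("Under assumption:", ranking(rescale(in_travel_per_million, maybe_in_thousands)))
```

If the two rankings differ at the top, the conclusion about which country had the most inbound travel depends on confirming the units in the source data.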

**Medical Resources**

In [3]:
%%bigquery
##compared among Confirmed_Cases_1000, hospital_beds_1000 and Available_Beds_1000
SELECT  country,Confirmed_Cases_1000, hospital_beds_1000, Available_Beds_1000,ROUND( hospital_beds_1000/Confirmed_Cases_1000,2) AS bed_per_case
 FROM `ba775-fal22-a11.CountryCovid.6countries` 
 GROUP BY Date, country, State,Confirmed_Cases_1000, hospital_beds_1000, Available_Beds_1000
 ORDER BY Date DESC, Available_Beds_1000 DESC
 LIMIT 6;



| | country | Confirmed_Cases_1000 | hospital_beds_1000 | Available_Beds_1000 | bed_per_case |
|---|---|---|---|---|---|
| 0 | Germany | 0.798303 | 8.3 | 2.075 | 10.4 |
| 1 | Argentina | 0.018 | 5.0 | 1.85 | 277.78 |
| 2 | Italy | 1.682698 | 3.4 | 0.85 | 2.02 |
| 3 | Mexico | 0.007702 | 1.5 | 0.375 | 194.76 |
| 4 | India | 0.000906 | 0.7 | 0.175 | 772.2 |
| 5 | South Korea | 0.188437 | 11.5 | 0.125 | 61.03 |


India has the highest bed_per_case ratio, Argentina is second, Mexico is third, and Italy has the lowest.<br>
However, every one of these countries has more hospital beds than confirmed cases per 1,000 people (bed_per_case is above 1 in each row).<br>
In terms of available beds per 1,000 people, Germany placed first, with more capacity than South Korea, India, Mexico, and Italy combined. Available beds are directly related to healthcare conditions in each country and to how quickly healthcare professionals can respond to a sudden spike in COVID-19 cases across age groups and regions.
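The ratios and the comparison above can be re-derived from the values displayed in the medical resources table. The sketch below repeats the query's `hospital_beds_1000 / Confirmed_Cases_1000` calculation in Python and compares Germany's available beds with the combined figure for the four countries named. Because the displayed inputs are rounded, a recomputed ratio can differ slightly from the bed_per_case column. Variable names are illustrative.

```python
# (Confirmed_Cases_1000, hospital_beds_1000, Available_Beds_1000) per country,
# copied from the table above.
resources = {
    "Germany": (0.798303, 8.3, 2.075),
    "Argentina": (0.018, 5.0, 1.85),
    "Italy": (1.682698, 3.4, 0.85),
    "Mexico": (0.007702, 1.5, 0.375),
    "India": (0.000906, 0.7, 0.175),
    "South Korea": (0.188437, 11.5, 0.125),
}

# Same ratio as ROUND(hospital_beds_1000 / Confirmed_Cases_1000, 2) in the query.
bed_per_case = {
    country: round(beds / cases, 2)
    for country, (cases, beds, _available) in resources.items()
}
by_ratio = sorted(bed_per_case, key=bed_per_case.get, reverse=True)
print(by_ratio)

# Germany's available beds per 1,000 versus the other four countries combined.
others = ("South Korea", "India", "Mexico", "Italy")
others_available = sum(resources[country][2] for country in others)
germany_available = resources["Germany"][2]
print(germany_available, others_available)
```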

### 4. Conclusion



The following are the conclusions we drew after analyzing the dataset:

1. As the median age of a region increases, the spread of the virus also increases significantly. There is a positive correlation between median age and the number of cases, which suggests that people in older age groups were more prone to infection.

2. Life expectancy also seemed to affect the spread of the virus. Males seem more prone to COVID-19 than females, which may be related to the healthcare facilities provided by each country. It is also possible that males were out of their homes more often than females and were therefore infected more often.

3. With temperature, we saw that regions with extreme temperatures tend not to have a high number of cases. One likely explanation is that few people live in regions with extremely high or low temperatures, so there is less opportunity for the virus to spread.

4. While countries implemented different strategies to combat the spread of the virus, the number of confirmed cases was lower in countries with stricter lockdowns. However, given the poor mental health outcomes associated with social and economic restrictions, we suggest that a more tolerant approach is the better strategy.

Through this project, we analyzed the role of demographic and geographical factors in the spread of the pandemic. More importantly, we conclude that COVID-19 has severely affected the healthcare industry because it has increased the burden of physical and mental health conditions, including rising levels of fatigue, anxiety, and depression.
We aim to emphasize the unstable health conditions brought on by the pandemic, especially among the younger generation, who spent most of their time locked indoors.


### 5. Dashboard


<img src="https://i.postimg.cc/HWBvdx4X/Screen-Shot-2022-10-18-at-7-46-34-PM.png" alt="Tempvs-Cases"  height=400 border="0" />

https://public.tableau.com/app/profile/jay.chaudhary/viz/ba775-a11-fall22/Dashboard1

The above is our dashboard, which shows the various visuals and graphs we used to analyze our dataset.

### 6. References

https://www.kaggle.com/datasets/aestheteaman01/covcsd-covid19-countries-statistical-dataset

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data 

https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv

https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00516

https://www.statista.com/statistics/271315/age-distribution-in-india/

https://tradingeconomics.com/mexico/population-ages-65-and-above-percent-of-total-wb-data.html

https://www.globaldata.com/data-insights/macroeconomic/argentina-population-distribution-in-by-age/

https://knoema.com/atlas/Republic-of-Korea/Population-aged-65-years-and-above

https://www.statista.com/statistics/785104/elderly-population-in-italy/

https://www.statista.com/statistics/454349/population-by-age-group-germany/