In [1]:
# this code to will import all the things i need for this notebook

import os
import re
import math
import string
import random
import statistics
from datetime import datetime

import numpy as np
import pandas as pd

# for the notebook rendering 
from IPython.display import display, HTML
from IPython.display import Markdown as MD

# Graphs and Charts
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
# use to export plotly graphs
import plotly.io as pio 

#misc
from scipy.stats import spearmanr, kendalltau
import pycountry


# pandas Settings/Options
pd.set_option("display.max_rows", None) 
pd.set_option("display.max_columns", None)
pd.set_option('display.width', 9000)
pd.set_option('max_colwidth', 400)
pd.set_option('display.float_format', '{:.5f}'.format)

# heatmap colors 
hm_Spectral_r = sns.color_palette('Spectral_r', as_cmap=True)
hm_coolwarm = sns.color_palette('coolwarm', as_cmap=True)
hm_viridis_r = sns.color_palette('viridis_r', as_cmap=True)
hm_coolwarm_r = sns.color_palette('coolwarm_r', as_cmap=True)

## directories 
DIR = os.getcwd()
print(f'{DIR=}')

DataDIR = os.path.join(DIR,'data')
OutDIR = os.path.join(DIR,'docs')

if not os.path.exists(DataDIR):
    print('***DATA FOLDER IS MISSING***')

if not os.path.exists(OutDIR):
    os.makedirs(OutDIR)


# my own little library of helping functions
import Helping_Functions as HF

DIR='c:\\Users\\JGarza\\GitHub\\Excess_Mortality_And_Vaccines_In_Europe'


# Excess Mortality And Vaccines In Europe

**Author:** Justin Garza

**Date:** See below  
  
**Description:**  
This notebook explores excess mortality across Europe, analyzing statistical trends and investigating potential causes through data visualization and interpretation.  

**Content Warning:**    
If you find discussions of death or its underlying factors distressing, please proceed with caution or consider whether this content is right for you.  

In [2]:

current_date = datetime.now().strftime('%Y-%m-%d')
version = datetime.now().strftime('%Y%m%d.%H%M')
display(MD(f"**Date:** {current_date}"))
display(MD(f"**version:** {version}"))

**Date:** 2025-05-17

**version:** 20250517.1753

## Prerequisites
1. Logical Fallacies
2. Scientific Method


### Logical Fallacies
Logical fallacies are errors in reasoning that weaken arguments. They can be categorized into **formal** (structural errors) and **informal** (content errors).


| **Type**                 | **Fallacy**                                | **Description** |
|--------------------------|--------------------------------------------|-----------------|
| Formal                   | Affirming the Consequent                   | Assuming that if *P → Q* and *Q is true*, then *P must be true*. |
|                          | Denying the Antecedent                     | Assuming that if *P → Q* and *P is false*, then *Q must be false*. |
|                          | Non-Sequitur                               | The conclusion does not logically follow from the premises. |
| Informal – Relevance     | **Ad Hominem**                             | Attacking the person instead of the argument. |
|                          | **Straw Man**                              | Misrepresenting an argument to make it easier to attack. |
|                          | **Red Herring**                            | Diverting attention with an irrelevant point. |
|                          | **Appeal to Authority**                    | Claiming something is true because an authority said so. |
|                          | **Appeal to Emotion**                      | Using emotions instead of logic to argue a point. |
| Informal – Causation & Presumption | **Post Hoc Ergo Propter Hoc**    | Assuming that correlation implies causation. |
|                          | Slippery Slope                             | Claiming one action will lead to extreme consequences. |
|                          | False Dilemma                              | Presenting only two options when more exist. |
|                          | Begging the Question                       | Using circular reasoning. |
|                          | False Equivalence                          | Treating two things as equal when they are not. |
|                          | Hasty Generalization                       | Drawing a conclusion from insufficient evidence. |
|                          | No True Scotsman                           | Excluding counterexamples by redefining a group. |

Logical fallacies can make arguments misleading or invalid. Identifying them helps improve critical thinking and debate skills.

#### About Appeal to Authority
**DOCTORS USED TO PRESCRIBE CIGARETTES**


<img src="./docs/camels-fresh-01-2015.webp"
     onerror="this.onerror=null; this.src='./camels-fresh-01-2015.webp';"
     height="200"
     alt="Camel ad" />


##### And More ... 
* Bloodletting  
* Lobotomies  
* Radium and Mercury Treatments  
* Thalidomide for Morning Sickness  
* Cocaine and Heroin as Medicine  
* X-Ray Shoe Fitting  
* Forceps and Twilight Sleep in Childbirth  
* Tapeworm Diet Pills  
* Electroshock Therapy (Overuse)  

**Therefore** Doctors need to provide something more than just saying they are an authority on a subject.


### Post Hoc Ergo Propter Hoc (Correlation vs. Causation)

* Flipping a switch and the light turning on shows **causation**, not just **correlation**, because:

  1. **Mechanistic Understanding** 
      * We know the switch turns on the light (by completes a circuit).
  2. **Temporal Order** 
      * The switch is flipped *before* the light turns on.
  3. **Location Relevance** 
      * The light is in the same place as the switch.
  4. **Alternative Explanations** 
      * Other causes (like a power surge) are less likely, and supporting evidence should be collected for them.


* When later discussing vaccines:

  1. **Mechanistic Understanding** 
      * We're taught **vaccines save lives and are safe**.
  2. **Temporal Order** 
      * Vaccines are given first, then deaths should decrease.
  3. **Location Relevance** 
      * Countries with more vaccines should show different outcomes.
  4. **Alternative Explanations** 
      * Other causes may exist, however supporting evidence should be collected on them and should be consistant with real world data.
      > **ideally** the scale of the cause should match the effect, *this is a rule of thumb and therefore not always true*.

### Scientific Method 
The **scientific method** is a systematic approach to investigating natural phenomena, acquiring knowledge, and testing hypotheses. It consists of the following key steps:

1. **Observation**  
   - Identify a problem or phenomenon that needs explanation.
   - Gather initial data through direct observation or research.

2. **Hypothesis**  
   - Propose a testable and falsifiable explanation (a hypothesis).
   - Example: "If plants receive more sunlight, then they will grow taller."

3. **Experimentation**  
   - Design and conduct controlled experiments to test the hypothesis.
   - Include independent and dependent variables, control groups, and repeatable procedures.

4. **Conclusion**  
   - Determine whether the data supports or refutes the hypothesis.
   - Modify or refine the hypothesis if necessary.

5. **Replication**  
   - Repeat experiments to verify results.
   - Publish findings for scrutiny by the scientific community.




#### Quick Rant!: Peer Review is Flawed

1. Imagine a mechanic fixes my car and writes a paper about it.
2. Other mechanics review and approve the paper.
3. But when I try to start the car—it still won’t run.
4. The mechanic protests, “But my paper was peer-reviewed!”
5. It doesn’t matter how many experts approved the theory—what matters is whether the result actually worked in practice.

Peer reviewed by experts can be helpful and may catch some flaws,
but ultimately, what matters most is whether the product actually does what it’s supposed to do.

#### The scientific method ensures
* objectivity
* reliability
* accuracy

It is an iterative process, meaning that conclusions can lead to new questions and further investigations.

## Observations

There were two sides when it comes to the vaccines 
* The covid-19 vaccines were bad, and cause side effects (including death)
* The covid-19 vaccines were good and saved lives, and is safe.


### News Articles & Headlines
The One side of this can easily be seen in the news headlines, using logical fallacies

#### Ad Hominem Attacks
- [CDC Warns of 'Pandemic of the Unvaccinated'](https://www.cnn.com/videos/health/2021/07/21/delta-variant-coronavirus-vaccines-cohen-newday-vpx.cnn)
- [Covid: French uproar as Macron vows to 'piss off' unvaccinated](https://www.bbc.com/news/world-europe-59873833?utm_source=chatgpt.com)
- [Don Lemon Unloads on Unvaxxed: We Have to ‘Do Things For The Greater Good Of Society, Not For Idiots’](https://www.mediaite.com/tv/don-lemon-unloads-on-unvaxxed-we-have-to-do-things-for-the-greater-good-of-society-not-for-idiots/?utm_source=chatgpt.com)
- [People Who Skip Vaccinations 'Incredibly Selfish' Experts Say](https://www.yahoo.com/lifestyle/people-who-skip-vaccinations-incredibly-selfish-108914416747.html?utm_source=chatgpt.com)
- "If you're willing to walk among us unvaccinated, you are an enemy." - Gene Simmons, co-lead singer and co-founder of KISS
- plague rats
- selfish
- anti-science
- ignorant
- irresponsible.

#### Appeals to Authority 
- [Pope Francis urges people to get vaccinated against Covid-19](https://www.vaticannews.va/en/pope/news/2021-08/pope-francis-appeal-covid-19-vaccines-act-of-love.html)
- [Former Presidents Obama, Bush and Clinton volunteer to get coronavirus vaccine publicly to prove it’s safe](https://www.cnn.com/2020/12/02/politics/obama-vaccine/index.html)
- [FDA Approves First COVID-19 Vaccine](https://www.fda.gov/news-events/press-announcements/fda-approves-first-covid-19-vaccine?utm_source=chatgpt.com)

#### Appeal to Emotions
- [Getting Vaccinated to Help Protect Yourself, Your Family and Your Community](https://www.aha.org/news/perspective/2023-09-29-getting-vaccinated-help-protect-yourself-your-family-and-your-community?utm_source=chatgpt.com)
- [COVID-19 Vaccines Protect the Family, Too](https://www.nih.gov/covid-19-vaccines-protect-family-too?utm_source=chatgpt.com)
- [Concern about loved ones might motivate people to mask up, get vaccine](https://news.umich.edu/concern-about-loved-ones-might-motivate-people-to-mask-up-get-vaccine/)

### Denial of Aid 
One side of this topic also had the power to denie aid and services.

* D.J. Ferguson  
    * Service Denied: Heart transplant at Brigham and Women’s Hospital (2022).  
    * Reason: Refused COVID-19 vaccine, a hospital requirement.
    * [Link](https://www.kpbs.org/news/national/2022/01/26/patient-who-refused-covid-vaccine-was-denied-a-heart-transplant)
* Leilani Lutali  
    * Service Denied: Kidney transplant at UCHealth (2021).  
    * Reason: Opposed vaccine due to religious beliefs; hospital mandated it.
    * [Link](https://www.couriermail.com.au/lifestyle/health/us-hospital-denies-unvaccinated-woman-lifesaving-kidney-transplant/news-story/8cc30ab5dccbd70951621cf1ceb04004)
* Adaline Deal  
    * Service Denied: Heart transplant list at Cincinnati Children’s Hospital (2025).  
    * Reason: Parents refused COVID-19 and flu vaccines on religious grounds.
    * [Link](https://www.foxnews.com/health/young-girl-heart-conditions-denied-being-added-transplant-list-over-vaccination-status-family-says)
* Jennifer Bridges  
    * Service Denied: Employment at Houston Methodist Hospital (2021).  
    * Reason: Refused vaccine mandate; fired.
    * [Link](https://www.khou.com/video/news/local/jennifer-bridges-says-she-expects-to-be-fired-for-not-getting-the-covid-vaccine/285-d1d45450-8dbf-4e23-b75c-47a71f17ff54)
* Northwell Health Employees (1,400 individuals)  
    * Service Denied: Employment (2021).  
    * Reason: Refused vaccine mandate at New York healthcare provider.
    * [Link](https://ny1.com/nyc/all-boroughs/coronavirus/2021/10/04/northwell-health-fires-1-400-unvaccinated-employees?utm_source=chatgpt.com)
* General Cases of Unemployment Benefit Denials  
    * Service Denied: Unemployment benefits (2021-2025).  
    * Reason: Fired or quit over vaccine mandates; often deemed “misconduct.”
    * [Link](https://abc11.com/covid-vaccine-unemployment-mandate-19/10940880/?utm_source=chatgpt.com)

## Hypothesis

Given the time elapsed since the COVID-19 pandemic, can we assess the long-term effectiveness of COVID-19 vaccines through available data ?

**Note:** 
This might not be a binary result (even though there are pretty binary sides), 

| Value     | Effect                      | Description/Correlation-Suggestion                                   |
|-----------|-----------------------------|----------------------------------------------------------------------|
| **1.0**   | Positive effect             | The vaccine could have a `possitive effect` in saving lives and getting society back to normal.          |
| **0.5**   | Slightly positive effect    | The Vaccine could have a `slightly positive effect`                  |
| **0.0**   | Null/No effect              | The Vaccine could have a `null/no effect`                            |
| **-0.5**  | Slightly negative effect    | The Vaccine could have a `slightly negative effect`                  |
| **-1.0**  | Negative effect             | The Vaccine could have a `negative effect`                           |


... and of course we might dissagree on the results, and/or further research might need to be done.  

### Spearman's Rank Correlation

Spearman's Rank Correlation Coefficient (ρ or rₛ) is a non-parametric statistical measure used to assess the strength and direction of a monotonic relationship between two variables.

* **Type:** Non-parametric (used for ordinal or non-linear data)
* **Range:** -1 to +1
  * +1: Perfect positive correlation
  * -1: Perfect negative correlation
  * 0:  No correlation
* **Use Case:** 
    * Ideal for assessing relationships where the data is not normally distributed or the relationship is not linear.

#### **Formula:**

If there are no tied ranks:

$$
\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
$$

* $d_i$: difference between the ranks of each pair
* $n$: number of data pairs

In [3]:


# 0 - 99
A = [i for i in range(0, 32)]
B = A

spearman_corr, _ = spearmanr(A,B)
display(MD(f"{A=}"))
display(MD(f"{B=}"))
display(MD(f"* Spearman's Rank Correlation A x B: {spearman_corr:.5f}"))


A=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

B=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

* Spearman's Rank Correlation A x B: 1.00000

In [4]:
# flipping B
B = B[::-1]

spearman_corr, _ = spearmanr(A,B)
display(MD(f"{A=}"))
display(MD(f"{B=}"))
display(MD(f"* Spearman's Rank Correlation A x B: {spearman_corr:.5f}"))


A=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

B=[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

* Spearman's Rank Correlation A x B: -1.00000

In [5]:
# randomizing B
random.shuffle(B)
spearman_corr, _ = spearmanr(A,B)
display(MD(f"{A=}"))
display(MD(f"{B=}"))
display(MD(f"* Spearman's Rank Correlation A x B: {spearman_corr:.5f}"))




A=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

B=[1, 18, 25, 6, 23, 11, 8, 7, 3, 22, 15, 21, 17, 20, 26, 14, 16, 29, 5, 9, 2, 31, 28, 19, 24, 4, 10, 0, 12, 13, 27, 30]

* Spearman's Rank Correlation A x B: 0.15616

In [6]:
# randomizing B ... a few times

random_spearmanr = []
for i in range(25):
    random.shuffle(B)
    spearman_corr, _ = spearmanr(A,B)
    random_spearmanr.append(spearman_corr)

minimum = min(random_spearmanr)
maximum = max(random_spearmanr)
average = sum(random_spearmanr) / len(random_spearmanr)
std_dev = statistics.stdev(random_spearmanr)

display(MD(f"25 random Spearman's Rank Correlation A x B:"))
display(MD(f"{minimum=:.5f}"))
display(MD(f"{maximum=:.5f}"))
display(MD(f"{average=:.5f}"))
display(MD(f"{std_dev=:.5f}"))

25 random Spearman's Rank Correlation A x B:

minimum=-0.34861

maximum=0.41679

average=-0.00717

std_dev=0.18100

## Pre-Experiment: Smoking and Lung Cancer by State
lets see if there is a correlation between Smoking and Lung Cancer.

### Sources:
* [CigaretteSmoking Data](https://www.cdc.gov/mmwr/preview/mmwrhtml/mm5943a2.htm)
    * **Source**: https://www.cdc.gov/mmwr/preview/mmwrhtml/mm5943a2.htm
* [DeathByLungCancer Data](	https://wonder.cdc.gov/controller/datarequest/D76)	
    * **Source**: https://wonder.cdc.gov/controller/datarequest/D76
    * see: "./docs/Underlying Cause of Death, 1999-2020 Request Form.pdf" to see how I queried this info


In [7]:
smoking_df = pd.read_excel(os.path.join(DataDIR,"SmokingAndLungCancer.xlsx"),sheet_name = "CigaretteSmoking")
smoking_df = smoking_df.sort_values(by='Total%', ascending=False)

deaths_df = pd.read_excel(os.path.join(DataDIR,"SmokingAndLungCancer.xlsx"),sheet_name = "LungCancerDeaths")
deaths_df = deaths_df.sort_values(by='Crude Rate Per 100,000', ascending=False)

temp = pd.merge(smoking_df,deaths_df, how='left',on='State')

temp = temp.dropna()

styled_df = temp.style.background_gradient(cmap=hm_coolwarm, axis=0, subset=['Total%']) \
                      .background_gradient(cmap=hm_coolwarm, axis=0, subset=['Crude Rate Per 100,000']) 

display(MD(f"### Smoking and Lung Cancer Deaths"))
display(MD(f"* States with the highest percentage of smokers and lung cancer deaths"))
display(MD(f"* Total% = percentage of smokers in the state"))
display(MD(f"* Crude Rate Per 100,000 = lung cancer deaths per 100,000 people in the state"))
display(styled_df)

### Smoking and Lung Cancer Deaths

* States with the highest percentage of smokers and lung cancer deaths

* Total% = percentage of smokers in the state

* Crude Rate Per 100,000 = lung cancer deaths per 100,000 people in the state

Unnamed: 0,State,Total%,(95% CI†),Men%,(95% CI),Women%,(95% CI).1,Deaths,Population,"Crude Rate Per 100,000"
0,West Virginia,25.6,(23.9--27.2),27.7,(25.0--30.4),23.6,(21.7--25.6),1563.0,1847775.0,84.6
1,Kentucky,25.6,(23.9--27.3),27.1,(24.1--30.0),24.2,(22.3--26.1),3294.0,4317074.0,76.3
2,Oklahoma,25.5,(24.1--26.9),27.1,(24.7--29.4),24.0,(22.4--25.5),2444.0,3717572.0,65.7
4,Mississippi,23.3,(22.0--24.6),27.2,(25.0--29.4),19.8,(18.4--21.3),1933.0,2958774.0,65.3
5,Indiana,23.1,(21.7--24.5),24.9,(22.6--27.1),21.5,(19.8--23.1),4070.0,6459325.0,63.0
6,Missouri,23.1,(21.2--25.0),24.3,(21.3--27.4),21.9,(19.6--24.3),3846.0,5961088.0,64.5
7,Alabama,22.5,(20.9--24.1),25.7,(22.9--28.5),19.7,(17.9--21.4),3292.0,4757938.0,69.2
8,Louisiana,22.1,(20.7--23.5),25.1,(22.7--27.4),19.3,(17.9--20.8),2704.0,4491648.0,60.2
9,Nevada,22.0,(19.5--24.5),22.7,(18.8--26.7),21.3,(18.3--24.3),1330.0,2684665.0,49.5
10,Tennessee,22.0,(20.1--23.9),24.6,(21.4--27.9),19.6,(17.7--21.6),4391.0,6306019.0,69.6


### Spearman's Rank Correlation for Smoking x Lung Cancer Death.


In [8]:
temp = temp.sort_values(by='Total%', ascending=False)
temp['rank'] = smoking_df['Total%'].rank()

temp2 = temp.copy()
temp2 = temp2.sort_values(by='Crude Rate Per 100,000', ascending=False)

display(MD(
'''### Output Interpretation

| Value     | Effect                      | Description/Correlation-Suggestion                         |
|-----------|-----------------------------|------------------------------------------------------------|
| **1.0**   | Positive effect             | Suggests more Smoking leads to more Lung Cancer Deaths     |
| **0.5**   | Slightly positive effect    | Slightly more Smoking leads to more Lung Cancer Deaths     |
| **0.0**   | Null/No effect              | Suggests there is no correlation with Smoking and Lung Cancer Deaths |
| **-0.5**  | Slightly negative effect    | Slightly more Smoking leads to fewer Lung Cancer Deaths    |
| **-1.0**  | Negative effect             | Suggests more Smoking leads to fewer Lung Cancer Deaths    |

'''
))

# Calculate Spearman's Rank Correlation
spearman_corr, _ = spearmanr(temp['rank'], temp2['rank'])
print(HF.spearmansrank_bar(spearman_corr))




### Output Interpretation

| Value     | Effect                      | Description/Correlation-Suggestion                         |
|-----------|-----------------------------|------------------------------------------------------------|
| **1.0**   | Positive effect             | Suggests more Smoking leads to more Lung Cancer Deaths     |
| **0.5**   | Slightly positive effect    | Slightly more Smoking leads to more Lung Cancer Deaths     |
| **0.0**   | Null/No effect              | Suggests there is no correlation with Smoking and Lung Cancer Deaths |
| **-0.5**  | Slightly negative effect    | Slightly more Smoking leads to fewer Lung Cancer Deaths    |
| **-1.0**  | Negative effect             | Suggests more Smoking leads to fewer Lung Cancer Deaths    |





Spearman's Rank: 0.613
                                                                                                0.613
________________________________________________________________________________________________*_______________________
-1                            -0.5                          0                             0.5                           1



### Smoking and Lung Cancer – Conclusion

A Spearman's Rank Correlation of **0.613** does not, by itself, prove causation. However, a moderate positive correlation—combined with well-documented evidence such as:

* Issues Breathing 
    * Smoker's cough  
    * Shortness of breath and reduced exercise tolerance  
    * Wheezing and chest tightness  
    * Frequent respiratory infections (e.g., pneumonia, bronchitis) 
    * Shortness of breath and reduced exercise tolerance  
    * Chronic bronchitis  
    * Wheezing and chest tightness  
    * Frequent respiratory infections (e.g., pneumonia, bronchitis)  
    * Increased phlegm and mucus production  
    * Chronic obstructive pulmonary disease (COPD)  
    * Decreased lung function on spirometry tests  
    * Exacerbation of asthma symptoms  
    * Higher rates of emphysema  
* Blackened lungs observed in autopsies  
* Decades of other epidemiological studies 

...strongly supports the conclusion that smoking is a significant risk factor for lung cancer.  
Correlation alone is not proof, but when it aligns with biological mechanisms and consistent real-world observations,  
the case for causation becomes compelling.  



## Other Cause-Effect Relationships with Spearman Ranking

<div class='chatgpt chatgpt-prompt'>
list of some cause and effect relationships (i.e. Hardwork and Success) and Estimate the Spearman's Rank Correlation
</div>
<div class='chatgpt chatgpt-quote'>
Here are some examples of cause-and-effect relationships along with estimated Spearman's Rank Correlation values:

| **Cause**               | **Effect**            | **Estimated Spearman's Rank Correlation** |
|--------------------------|-----------------------|-------------------------------------------|
| Hardwork                | Success              | **0.8** (Strong positive correlation)     |
| Exercise                | Physical Fitness     | **0.7** (Moderate to strong positive correlation) |
| Education               | Income               | **0.6** (Moderate positive correlation)   |
| Smoking                 | Lung Cancer          | **0.5** (Moderate positive correlation)   |
| Screen Time             | Sleep Quality        | **-0.4** (Moderate negative correlation)  |
| Social Media Usage      | Mental Health Issues | **-0.5** (Moderate negative correlation)  |
| Rainfall                | Crop Yield           | **0.6** (Moderate positive correlation)   |
| Pollution               | Respiratory Diseases | **0.7** (Moderate to strong positive correlation) |
| Study Time              | Exam Scores          | **0.8** (Strong positive correlation)     |
| Alcohol Consumption     | Reaction Time        | **-0.6** (Moderate negative correlation)  |

These are estimated values and may vary depending on the dataset and context.
</div>

## Data Importing and Methodology

below we will be importing and cleaning the data for 
* Vaccines Data 
* Population Data 
* Deaths Data (grouped by Age)
* Causes of Death Data 

### Vaccine Data 

#### Getting the Data
1. Go to [https://www.ecdc.europa.eu/](https://www.ecdc.europa.eu/en/publications-data/data-covid-19-vaccination-eu-eea)
2. Click the `Download in CSV`.


In [9]:
# import Vaccine Data 
vd = pd.read_csv(os.path.join(DataDIR,'data.csv'))
vd = vd[vd['TargetGroup'] == 'ALL']

# create a year column
vd['year'] = pd.to_numeric(vd['YearWeekISO'].str[0:4])

# renaming columns for shorter names
vd = vd.rename(columns={'ReportingCountry':'abbr'})
vd = vd.rename(columns={'Vaccine':'vacc'})
vd = vd.rename(columns={'FirstDose':'dose1'})
vd = vd.rename(columns={'SecondDose':'dose2'})
vd = vd.rename(columns={'DoseAdditional1':'dose3'})
vd = vd.rename(columns={'DoseAdditional2':'dose4'})
vd = vd.rename(columns={'DoseAdditional3':'dose5'})
vd = vd.rename(columns={'DoseAdditional4':'dose6'})
vd = vd.rename(columns={'DoseAdditional5':'dose7'})
vd = vd.rename(columns={'UnknownDose':'doseUNK'})

vd = vd.rename(columns={'Population':'population'})


doseCol = ['dose1','dose2','dose3','dose4','dose5','dose6','dose7','doseUNK']
# calculate the sum of all the Doses
vd = pd.pivot_table(
    data = vd,
    values = doseCol,
    index = ['abbr','year','population','vacc'],
    aggfunc="sum"
    )
vd = vd.reset_index()

# gets get the total doses given 
vd['total_dose'] = vd[doseCol].sum(axis=1)
doseCol.append('total_dose')

# lets make a new record for all vaccines
temp = pd.pivot_table(
    data = vd,
    values = doseCol,
    index = ['abbr','year','population'],
    aggfunc="sum"
    )
temp = temp.reset_index()
temp['vacc'] = 'All'

# and add it to all the vaccine data
vd = pd.concat([vd,temp])

# gets get the total dose1 (first dose) given 
vd['total_dose1'] = vd['dose1']
# gets get the total dose1 (first dose) given 
vd['total_dose2'] = vd['dose2']

# dose1 ... cumulative sum, dose1/pop. , dose1/pop 
vd['td1_sum'] = vd.groupby(['abbr','vacc'])['total_dose1'].cumsum()
vd['dose1_pop_ratio'] = vd['td1_sum']/vd['population']

# dose2 ... cumulative sum, dose1/pop. , dose1/pop 
vd['td2_sum'] = vd.groupby(['abbr','vacc'])['total_dose2'].cumsum()
vd['dose2_pop_ratio'] = vd['td2_sum']/vd['population']

# total_dose ... cumulative sum, total/pop. , total/pop
vd['td_sum'] = vd.groupby(['abbr','vacc'])['total_dose'].cumsum()
vd['dose_pop_ratio'] = vd['td_sum']/vd['population']


vd['name'] = vd.abbr.apply(HF.abbr_to_name)

# save to out folder
vd.to_csv(r'out\vacc_data.csv',index=False)

vd[(vd['abbr']=='FI') ].to_csv(r'out\vacc_data_FI.csv',index=False)

# Check some of the Data 
display(vd[(vd['abbr']=='FI') & (vd['vacc']=='All')])
display(vd[(vd['abbr']=='NO') & (vd['vacc']=='All')])
display(vd[(vd['abbr']=='IE') & (vd['vacc']=='All')])
display(vd[(vd['abbr']=='RO') & (vd['vacc']=='All')])

display(vd[(vd['abbr']=='SI') & (vd['vacc']=='All')])

Unnamed: 0,abbr,year,population,vacc,dose1,dose2,dose3,dose4,dose5,dose6,dose7,doseUNK,total_dose,total_dose1,total_dose2,td1_sum,dose1_pop_ratio,td2_sum,dose2_pop_ratio,td_sum,dose_pop_ratio,name
40,FI,2020,5525292,All,17151,0,0,0,0,0,0,0,17151,17151,0,17151,0.0031,0,0.0,17151,0.0031,Finland
41,FI,2021,5533793,All,12062376,11587069,3835791,2706,0,0,0,0,27487942,12062376,11587069,12079527,2.18287,11587069,2.09387,27505093,4.97039,Finland
42,FI,2022,5548241,All,153691,359625,5333235,3641874,1426157,0,0,0,10914582,153691,359625,12233218,2.20488,11946694,2.15324,38419675,6.92466,Finland
43,FI,2023,5548241,All,3285,4192,21628,143580,209551,0,0,0,382236,3285,4192,12236503,2.20547,11950886,2.154,38801911,6.99355,Finland


Unnamed: 0,abbr,year,population,vacc,dose1,dose2,dose3,dose4,dose5,dose6,dose7,doseUNK,total_dose,total_dose1,total_dose2,td1_sum,dose1_pop_ratio,td2_sum,dose2_pop_ratio,td_sum,dose_pop_ratio,name
91,NO,2020,5367580,All,2399,0,0,0,0,0,0,0,2399,2399,0,2399,0.00045,0,0.0,2399,0.00045,Norway
92,NO,2021,5391369,All,3964299,3818779,1564641,1960,10,0,0,0,9349689,3964299,3818779,3966698,0.73575,3818779,0.70831,9352088,1.73464,Norway
93,NO,2022,5425270,All,30934,86929,1448074,861635,12897,0,0,0,2440469,30934,86929,3997632,0.73685,3905708,0.71991,11792557,2.17364,Norway
94,NO,2023,5425270,All,493,578,7719,51031,151590,0,0,0,211411,493,578,3998125,0.73694,3906286,0.72002,12003968,2.2126,Norway


Unnamed: 0,abbr,year,population,vacc,dose1,dose2,dose3,dose4,dose5,dose6,dose7,doseUNK,total_dose,total_dose1,total_dose2,td1_sum,dose1_pop_ratio,td2_sum,dose2_pop_ratio,td_sum,dose_pop_ratio,name
56,IE,2020,4964440,All,4363,32,1,0,0,0,0,1,4397,4363,32,4363,0.00088,32,1e-05,4397,0.00089,Ireland
57,IE,2021,5006324,All,3681248,3404281,2168443,218,4,0,0,124398,9378592,3681248,3404281,3685611,0.73619,3404313,0.68,9382989,1.87423,Ireland
58,IE,2022,5060004,All,21121,30082,783067,1107140,352295,423,0,10584,2304712,21121,30082,3706732,0.73256,3434395,0.67873,11687701,2.30982,Ireland
59,IE,2023,5060004,All,1388,1043,12792,156316,130357,181962,0,439,484297,1388,1043,3708120,0.73283,3435438,0.67894,12171998,2.40553,Ireland


Unnamed: 0,abbr,year,population,vacc,dose1,dose2,dose3,dose4,dose5,dose6,dose7,doseUNK,total_dose,total_dose1,total_dose2,td1_sum,dose1_pop_ratio,td2_sum,dose2_pop_ratio,td_sum,dose_pop_ratio,name
103,RO,2021,19201662,All,7734378,5677069,1140530,0,0,0,0,0,14551977,7734378,5677069,7734378,0.4028,5677069,0.29566,14551977,0.75785,Romania
104,RO,2022,19042455,All,181348,184394,611197,25319,0,0,0,0,1002258,181348,184394,7915726,0.41569,5861463,0.30781,15554235,0.81682,Romania
105,RO,2023,19042455,All,2219,1678,2714,6420,0,0,0,0,13031,2219,1678,7917945,0.4158,5863141,0.3079,15567266,0.8175,Romania


Unnamed: 0,abbr,year,population,vacc,dose1,dose2,dose3,dose4,dose5,dose6,dose7,doseUNK,total_dose,total_dose1,total_dose2,td1_sum,dose1_pop_ratio,td2_sum,dose2_pop_ratio,td_sum,dose_pop_ratio,name
110,SI,2020,2095861,All,9320,1,0,0,0,0,0,0,9321,9320,1,9320,0.00445,1,0.0,9321,0.00445,Slovenia
111,SI,2021,2108977,All,1154571,982990,451187,163,0,0,0,0,2588911,1154571,982990,1163891,0.55187,982991,0.4661,2598232,1.23199,Slovenia
112,SI,2022,2107180,All,9761,27935,193272,73756,0,0,0,0,304724,9761,27935,1173652,0.55698,1010926,0.47975,2902956,1.37765,Slovenia
113,SI,2023,2107180,All,161,259,481,3267,0,0,0,0,4168,161,259,1173813,0.55705,1011185,0.47988,2907124,1.37963,Slovenia


In [10]:
display(vd[(vd['abbr']=='FI') & (vd['vacc']=='COM')])

Unnamed: 0,abbr,year,population,vacc,dose1,dose2,dose3,dose4,dose5,dose6,dose7,doseUNK,total_dose,total_dose1,total_dose2,td1_sum,dose1_pop_ratio,td2_sum,dose2_pop_ratio,td_sum,dose_pop_ratio,name
239,FI,2020,5525292,COM,17106,0,0,0,0,0,0,0,17106,17106,0,17106,0.0031,0,0.0,17106,0.0031,Finland
248,FI,2021,5533793,COM,9623064,9627417,3139564,1959,0,0,0,0,22392004,9623064,9627417,9640170,1.74205,9627417,1.73975,22409110,4.0495,Finland
257,FI,2022,5548241,COM,125738,293165,3490003,2364683,41403,0,0,0,6314992,125738,293165,9765908,1.76018,9920582,1.78806,28724102,5.17715,Finland
266,FI,2023,5548241,COM,257,330,111,225,237,0,0,0,1160,257,330,9766165,1.76023,9920912,1.78812,28725262,5.17736,Finland


### Population Data 

#### Getting the Data
1. go to [Europa.eu - Database](https://ec.europa.eu/eurostat/databrowser/explore/all/popul)
2. choose 
    * Population and social conditions 
    * Demography, population stock and balance 
    * Population (national level)
    * Population on 1 January by age and sex
3. Click the little table 󰓫
4. customize the data 
    * Customize your dataset -> Time -> From - to 
        * From: 2015
        * To: [Current or Max]
    * Customize your dataset -> Age class
        * All -> Uncheck all
        * Check "[TOTAL] Total"
5. Click `download` (as a spreadsheet) and place the file in the `.\data` folder


<img src="./docs/chrome_8rsdjTGnV1.png"
     onerror="this.onerror=null; this.src='./chrome_8rsdjTGnV1.png';"
     style="height: 200px"
     alt="" />

In [11]:
# getting the data 
pop = pd.read_excel(os.path.join(DataDIR,"demo_pjan__custom_16646654_spreadsheet.xlsx"),sheet_name = "Sheet 1")

# remove the headers
pop = pop.iloc[7::]

# drop the bad columns
for c in pop.columns:
    if pd.isnull(pop.at[8,c]):
        pop = pop.drop(columns=[c])

# rename time columns
for c in pop.columns:
    name = pop.at[8,c]
    pop = pop.rename(columns={c: name})

# rename the first two columns
pop = pop.rename(columns={'TIME':'name'})
pop['abbr'] = pop['name'].apply(HF.name_to_abbr)


# drop, replace, reset index,
pop = pop.drop([7,8,9])
pop = pop.replace(to_replace=':', value=None)
pop = pop.reset_index(drop=True)

# # peak at the data 
display(pop.head(5))
display(pop.tail(5))


  warn("Workbook contains no default style, apply openpyxl's default")


Unnamed: 0,name,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,abbr
0,European Union - 27 countries (from 2020),442911027,443987823,444655529,445287011,446135629,447015600.0,445872542.0,445972024.0,447695350.0,449306184.0,Unknown
1,European Union - 28 countries (2013-2020),507764420,509366867,510499671,511560587,512782741,,,,,,Unknown
2,European Union - 27 countries (2007-2013),503583505,505235752,506420902,507546246,508814065,,,,,,Unknown
3,Euro area – 20 countries (from 2023),342243873,343429507,344126499,344794283,345697844,346625682.0,346699769.0,347104468.0,348557873.0,350174019.0,Unknown
4,Euro area - 19 countries (2015-2022),338062958,339298392,340047730,340779942,341729168,342692171.0,342806743.0,343242163.0,344706979.0,346312052.0,Unknown


Unnamed: 0,name,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,abbr
65,bp,"break in time series, provisional",,,,,,,,,,Unknown
66,b,break in time series,,,,,,,,,,Unknown
67,e,estimated,,,,,,,,,,Unknown
68,ep,"estimated, provisional",,,,,,,,,,Unknown
69,p,provisional,,,,,,,,,,Unknown


In [12]:
# This code processes the population data and puts it into a long format

temp = pop.melt(id_vars=['name','abbr'], var_name='year', value_name='population')
temp['year'] = pd.to_numeric(temp['year'])
pop = temp

In [13]:
# lets remove some data we don't need

# this is a combination of 27 countries

pop = pop[~pop['name'].str.contains("European Union", na=False)]
pop = pop[~pop['name'].str.contains("Euro area", na=False)]
pop = pop[pop['name']!= 'not available']
pop = pop[pop['name']!= 'Special value']
pop = pop[pop['name']!= 'None']
pop = pop[pop['name']!= 'Observation flags:']
pop = pop[pop['name']!= 'p']
pop = pop[pop['name']!= 'Nan']
pop = pop[pop['name']!= 'd']
pop = pop[pop['name']!= 'b']
pop = pop[pop['name']!= 'bp']
pop = pop[pop['name']!= 'e']
pop = pop[pop['name']!= 'p']

pop = pop[~pop['name'].isna()]

# removed due to lack of reporting data 
pop = pop[pop['abbr']!= 'Unknown']


In [14]:
# remove countries with no population data

none_counts = pop[pop['population'].isna()].groupby('abbr').size()
abbrs_with_4_or_more_none = none_counts[none_counts >= 4].index.tolist()
pop = pop[~pop['abbr'].isin(abbrs_with_4_or_more_none)]

###################################################################################

# Fill missing population values by forward and backward filling within each 'abbr' group
# Interpolate missing population values within each 'abbr' group, then forward/backward fill any remaining gaps
pop['population'] = (
    pop.groupby('abbr')['population']
    .transform(lambda x: x.infer_objects(copy=False)
                        .interpolate(method='linear', limit_direction='both')
                        .ffill()
                        .bfill())
)

###################################################################################

pop = pop[['abbr','year','population']]

###################################################################################

# sort values
pop = pop.sort_values(by=['abbr','year'], ascending=[True,True])

###################################################################################

# save to out folder
pop.to_csv(r'out\pop_data.csv',index=False)


In [15]:
# # peak at the data 
display(pop.head(20))
# display(pop.tail(25))

Unnamed: 0,abbr,year,population
47,AL,2015,2885796.0
117,AL,2016,2875592.0
187,AL,2017,2876591.0
257,AL,2018,2870324.0
327,AL,2019,2862427.0
397,AL,2020,2845955.0
467,AL,2021,2829741.0
537,AL,2022,2793592.0
607,AL,2023,2761785.0
677,AL,2024,2761785.0


### Deaths Data (grouped by Age)

#### Getting the Data
1. go to [Europa.eu - Database](https://ec.europa.eu/eurostat/databrowser/explore/all/popul)
2. choose 
    * Population and social conditions 
    * Demography, population stock and balance 
    * Deaths by week – special data collection
    * Deaths by week, sex and 20-year age group
3. Click the little table 󰓫
4. customize the data 
    * Customize your dataset -> Time -> From - to 
        * From: 2015-W01
        * To: [Current or Max]
    * Move the `Age Class` under `Geopolitical entity (reporting)`
5. Click `download` (as a spreadsheet) and place the file in the `.\data` folder


<img src="./docs/chrome_rZTvtL3J2L.png"
     onerror="this.onerror=null; this.src='./chrome_rZTvtL3J2L.png';"
     style="height: 200px"
     alt="" />


In [16]:
# getting the data 
ddw = pd.read_excel(
    os.path.join(DataDIR, "demo_r_mwk_20__custom_16646490_spreadsheet.xlsx"),
    sheet_name="Sheet 1",
    engine="openpyxl"
)

# remove the headers
ddw = ddw.iloc[7::]

# drop the bad columns
for c in ddw.columns:
    if pd.isnull(ddw.at[7,c]):
        ddw = ddw.drop(columns=[c])

# rename time columns
for c in ddw.columns:
    name = ddw.at[7,c]
    ddw = ddw.rename(columns={c: name})

# make the duplicate column names unique
ddw = HF.df_column_uniquify(ddw)

# # rename the first two columns
ddw = ddw.rename(columns={'TIME': 'abbr'})
ddw = ddw.rename(columns={'TIME_1':'name'})
ddw = ddw.rename(columns={'TIME_2':'agegrp'})
ddw = ddw.rename(columns={'TIME_3':'agegrp_desc'})
           
# drop, replace, reset index,
ddw = ddw.drop([7,8,9])
ddw = ddw.replace(to_replace=':', value=None)
ddw = ddw.reset_index(drop=True)

# # peak at the data 
# display(dd.head(5))
# display(dd.tail(5))

  warn("Workbook contains no default style, apply openpyxl's default")


In [17]:
# This code processes the raw death data (dd) by restructuring it into a long-form dataframe. 
# Each row in the new dataframe represents a single country's deaths for a specific year and week, 
# along with additional metadata such as country abbreviations and derived values.

temp = ddw.melt(id_vars=['name','abbr','agegrp','agegrp_desc'],var_name='year-week',value_name='deaths')
temp['year'] = pd.to_numeric(temp['year-week'].str[0:4])
temp['week'] = pd.to_numeric(temp['year-week'].str[6:8])
temp['year.week'] = temp['year'] + temp['week']/100
temp['year.p'] = temp['year'] + (temp['week']/53.001)
ddw = temp

In [18]:
# lets remove some data we don't need

# this is a combination of 27 countries
ddw = ddw[ddw['abbr']!= 'EU27_2020'] 
ddw = ddw[ddw['abbr']!= 'not available']
ddw = ddw[ddw['abbr']!= 'Special value']
ddw = ddw[ddw['abbr']!= 'None']
ddw = ddw[ddw['abbr']!= 'Observation flags:']
ddw = ddw[ddw['abbr']!= 'p']

ddw = ddw[~ddw['name'].isna()]
ddw = ddw[~ddw['agegrp'].isna()]

# we don't need these columns 
ddw.drop(columns=['agegrp_desc'], inplace=True)
ddw.drop(columns=['year-week'], inplace=True)

# converting columns 
ddw['deaths'] = pd.to_numeric(ddw['deaths'])

In [19]:
# there are quite a few NAN (not a number) values
# here we get rid of them 

# Get counts of NA values for each 'abbr' group
na_counts = ddw[ddw.deaths.isna()].groupby(['abbr','agegrp']).size()

# Filter for 'abbr' groups with more than 12 NA values
filtered_abbrs = na_counts[na_counts > 12*5].index

for abbr, agegrp in filtered_abbrs:
    print(f'removing -- {abbr} {HF.abbr_to_name(abbr)} {agegrp} NACount={na_counts[(abbr, agegrp)]}')
    ddw = ddw[~((ddw.abbr == abbr) & (ddw.agegrp == agegrp))]


removing -- AD Andorra TOTAL NACount=277
removing -- AD Andorra Y20-39 NACount=277
removing -- AD Andorra Y40-59 NACount=277
removing -- AD Andorra Y60-79 NACount=277
removing -- AD Andorra Y_GE80 NACount=277
removing -- AD Andorra Y_LT20 NACount=277
removing -- AL Albania TOTAL NACount=187
removing -- AL Albania Y20-39 NACount=187
removing -- AL Albania Y40-59 NACount=187
removing -- AL Albania Y60-79 NACount=187
removing -- AL Albania Y_GE80 NACount=187
removing -- AL Albania Y_LT20 NACount=187
removing -- DE Germany Y20-39 NACount=68
removing -- DE Germany Y_LT20 NACount=68
removing -- GE Georgia TOTAL NACount=224
removing -- GE Georgia Y20-39 NACount=224
removing -- GE Georgia Y40-59 NACount=224
removing -- GE Georgia Y60-79 NACount=224
removing -- GE Georgia Y_GE80 NACount=224
removing -- GE Georgia Y_LT20 NACount=224
removing -- IE Ireland Y20-39 NACount=538
removing -- IE Ireland Y40-59 NACount=538
removing -- IE Ireland Y60-79 NACount=538
removing -- IE Ireland Y_GE80 NACount=5

In [20]:
# we are going to create a new age group, everyone less than 80
# LT80 is less than 80 years old

# exclude where the age is greater than 80
temp = ddw.copy()
temp = temp[temp['agegrp'] != 'TOTAL']
temp = temp[temp['agegrp'] != 'Y_GE80']

temp = pd.pivot_table(
    temp, 
    values='deaths', 
    index=['name', 'abbr','year','week','year.week','year.p'], 
    aggfunc='sum'
)
temp = temp.reset_index()

temp['agegrp'] = 'LT80'

ddw = pd.concat([ddw,temp])

In [21]:
# save to out folder
ddw.to_csv(r'out\death_data_weekly.csv',index=False)

In [22]:
# peak at the data
display(ddw.head(10))

Unnamed: 0,name,abbr,agegrp,deaths,year,week,year.week,year.p
5,Belgium,BE,TOTAL,2461.0,2015,1,2015.01,2015.01887
6,Belgium,BE,Y_LT20,19.0,2015,1,2015.01,2015.01887
7,Belgium,BE,Y20-39,34.0,2015,1,2015.01,2015.01887
8,Belgium,BE,Y40-59,191.0,2015,1,2015.01,2015.01887
9,Belgium,BE,Y60-79,756.0,2015,1,2015.01,2015.01887
10,Belgium,BE,Y_GE80,1461.0,2015,1,2015.01,2015.01887
11,Bulgaria,BG,TOTAL,2501.0,2015,1,2015.01,2015.01887
12,Bulgaria,BG,Y_LT20,16.0,2015,1,2015.01,2015.01887
13,Bulgaria,BG,Y20-39,39.0,2015,1,2015.01,2015.01887
14,Bulgaria,BG,Y40-59,294.0,2015,1,2015.01,2015.01887


In [23]:
dd = pd.pivot_table(
    ddw,
    values='deaths',
    index=['name', 'abbr','agegrp','year'], 
    aggfunc='sum'
)
dd = dd.reset_index()

# save to out folder
dd.to_csv(r'out\death_data_year.csv',index=False)

In [24]:
# here we are calculating deaths per 100,000 people

ddn = dd.copy()
# Merge population data into ddn based on country and year
ddn = ddn.merge(pop, on=['abbr', 'year'], how='left')


ddn['deaths_pp'] = 0.0
ddn['deaths_pp'] = (ddn['deaths'] / ddn['population']) 

ddn['deaths_p1ht'] = 0.0
ddn['deaths_p1ht'] = (ddn['deaths'] / ddn['population']) * 100000

#########################################

# lets calculate a normalized deaths, based on the deaths per 100,000 people

ddn['deaths_norm'] = np.nan

## baseline years are 2015,2016,2017 before the pandemic 
blyears = ddn[ddn.year.isin([2015,2016,2017])]
grouped = blyears.groupby(['abbr','agegrp'])

# Compute baseline mean
temp = grouped['deaths_p1ht'].agg(
    baseline='mean',
).reset_index()

# Merge baseline means with the original DataFrame
ddn = ddn.merge(temp, on=['abbr', 'agegrp'], how='left')

# Normalize deaths column
ddn['deaths_norm'] = ddn['deaths_p1ht'] / ddn['baseline']

# Drop the intermediate baseline column if not needed
ddn.drop(columns=['baseline'], inplace=True)

# save to out folder
ddn.to_csv(r'out\death_data_norm.csv',index=False)

In [25]:
### peak at the data
display(ddn[(ddn.agegrp == 'TOTAL') & (ddn.abbr == 'NL')].head(30))
display(ddn[(ddn.agegrp == 'TOTAL') & (ddn.abbr == 'IS')].head(30))
display(ddn[(ddn.agegrp == 'TOTAL') & (ddn.abbr == 'RO')].head(30))

Unnamed: 0,name,abbr,agegrp,year,deaths,population,deaths_pp,deaths_p1ht,deaths_norm
1628,Netherlands,NL,TOTAL,2015,149732.0,16900726.0,0.00886,885.95011,1.00848
1629,Netherlands,NL,TOTAL,2016,148209.0,16979120.0,0.00873,872.88976,0.99362
1630,Netherlands,NL,TOTAL,2017,149745.0,17081507.0,0.00877,876.64982,0.9979
1631,Netherlands,NL,TOTAL,2018,152907.0,17181084.0,0.0089,889.97295,1.01306
1632,Netherlands,NL,TOTAL,2019,151483.0,17282163.0,0.00877,876.52801,0.99776
1633,Netherlands,NL,TOTAL,2020,171277.0,17407585.0,0.00984,983.92166,1.12001
1634,Netherlands,NL,TOTAL,2021,170184.0,17475415.0,0.00974,973.84812,1.10854
1635,Netherlands,NL,TOTAL,2022,169159.0,17590672.0,0.00962,961.64035,1.09464
1636,Netherlands,NL,TOTAL,2023,168815.0,17811291.0,0.00948,947.79766,1.07889
1637,Netherlands,NL,TOTAL,2024,171108.0,17942942.0,0.00954,953.62288,1.08552


Unnamed: 0,name,abbr,agegrp,year,deaths,population,deaths_pp,deaths_p1ht,deaths_norm
1067,Iceland,IS,TOTAL,2015,2223.0,329100.0,0.00675,675.47858,1.0007
1068,Iceland,IS,TOTAL,2016,2295.0,332529.0,0.0069,690.16537,1.02246
1069,Iceland,IS,TOTAL,2017,2231.0,338349.0,0.00659,659.37833,0.97685
1070,Iceland,IS,TOTAL,2018,2250.0,348450.0,0.00646,645.71675,0.95661
1071,Iceland,IS,TOTAL,2019,2263.0,356991.0,0.00634,633.90954,0.93911
1072,Iceland,IS,TOTAL,2020,2345.0,364134.0,0.00644,643.99369,0.95405
1073,Iceland,IS,TOTAL,2021,2338.0,368792.0,0.00634,633.96169,0.93919
1074,Iceland,IS,TOTAL,2022,2690.0,376248.0,0.00715,714.95397,1.05918
1075,Iceland,IS,TOTAL,2023,2565.0,387758.0,0.00661,661.49506,0.97998
1076,Iceland,IS,TOTAL,2024,2594.0,383567.0,0.00676,676.28341,1.00189


Unnamed: 0,name,abbr,agegrp,year,deaths,population,deaths_pp,deaths_p1ht,deaths_norm
1936,Romania,RO,TOTAL,2015,267209.0,19870647.0,0.01345,1344.74232,1.01484
1937,Romania,RO,TOTAL,2016,257033.0,19760585.0,0.01301,1300.73578,0.98163
1938,Romania,RO,TOTAL,2017,261217.0,19643949.0,0.0133,1329.75808,1.00353
1939,Romania,RO,TOTAL,2018,263222.0,19533481.0,0.01348,1347.54271,1.01695
1940,Romania,RO,TOTAL,2019,260235.0,19414458.0,0.0134,1340.41857,1.01158
1941,Romania,RO,TOTAL,2020,301427.0,19328838.0,0.01559,1559.46778,1.17689
1942,Romania,RO,TOTAL,2021,334154.0,19201662.0,0.0174,1740.23478,1.31331
1943,Romania,RO,TOTAL,2022,268798.0,19042455.0,0.01412,1411.57219,1.06527
1944,Romania,RO,TOTAL,2023,240379.0,19054548.0,0.01262,1261.53084,0.95204
1945,Romania,RO,TOTAL,2024,219347.0,19067576.0,0.0115,1150.36647,0.86815


### Causes of Death Data

#### Getting the Data
1. go to [Europa.eu - Database](https://ec.europa.eu/eurostat/databrowser/explore/all/popul)
2. choose 
    * Population and social conditions 
    * Health
    * Causes of death
    * General mortality
    * Causes of death - deaths by country of residence and occurrence
3. Click the little table 󰓫
4. customize the data 
    * Customize your dataset -> Time -> From - to 
        * From: 2015
        * To: [Current or Max]
    * Customize your dataset -> `International Statistical Classification of Diseases and Related Health Problems (ICD-10 2010)`
        * All
            * UnCheck All
        * Level 1 
            * All Checked
            * > Level 1 is basic classification of the cause of death 
    * Move the ``International Statistical Classification...` under `Geopolitical entity (reporting)`
5. Click `download` (as a spreadsheet) and place the file in the `.\data` folder


<img src="./docs/chrome_Gmc2itqQye.png"
     onerror="this.onerror=null; this.src='./chrome_Gmc2itqQye.png';"
     style="height: 200px"
     alt="" />

In [26]:
# getting the data 

# Level 1
cod = pd.read_excel(os.path.join(DataDIR,"hlth_cd_aro__custom_16646568_spreadsheet.xlsx"),sheet_name = "Sheet 1")

# remove the headers
cod = cod.iloc[9::]

# drop the bad columns
for c in cod.columns:
    if pd.isnull(cod.at[9,c]):
        cod = cod.drop(columns=[c])

# rename time columns
for c in cod.columns:
    name = cod.at[9,c]
    cod = cod.rename(columns={c: name})

# make the duplicate column names unique
cod = HF.df_column_uniquify(cod)

# # rename the first two columns
cod = cod.rename(columns={'TIME': 'name'})
cod = cod.rename(columns={'TIME_1':'cod'})
           
# drop, replace, reset index,
cod = cod.drop([9,10])
cod = cod.replace(to_replace=':', value=None)
cod = cod.reset_index(drop=True)

cod['abbr'] = cod['name'].apply(HF.name_to_abbr)

  warn("Workbook contains no default style, apply openpyxl's default")


In [27]:
# This code processes the raw cod data (cod) by restructuring it into a long-form dataframe. 
# along with additional metadata such as country abbreviations and derived values.

temp = cod.melt(id_vars=['name','abbr','cod'],var_name='year',value_name='deaths')
temp['year'] = pd.to_numeric(temp['year'])
cod = temp



In [28]:
# lets remove some data we don't need

# this is a combination of 27 countries

cod = cod[~cod['name'].str.contains("European Union", na=False)]
cod = cod[cod['name']!= 'not available']
cod = cod[cod['name']!= 'Special value']
cod = cod[cod['name']!= 'None']
cod = cod[cod['name']!= 'Observation flags:']
cod = cod[cod['name']!= 'p']
cod = cod[cod['name']!= 'Nan']
cod = cod[cod['name']!= 'd']
cod = cod[cod['name']!= 'b']
cod = cod[cod['name']!= 'bp']
cod = cod[cod['name']!= 'e']
cod = cod[cod['name']!= 'p']

cod = cod[~cod['name'].isna()]

# removed due to lack of reporting data 
cod = cod[cod['abbr']!= 'Unknown']

In [29]:
# there are quite a few NAN (not a number) values
# here we get rid of them 

# Get counts of NA values for each 'abbr' group
na_counts = cod[cod.deaths.isna()].groupby(['abbr','cod']).size()

# Filter for 'abbr' groups with more than 12 NA values
filtered_abbrs = na_counts[na_counts > 3].index

for abbr, cod_ in filtered_abbrs:
    print(f'removing -- {abbr} {HF.abbr_to_name(abbr)} {cod_} NACount={na_counts[(abbr, cod_)]}')
    # Corrected filtering condition: remove rows where abbr and agegrp match separately
    cod = cod[~((cod.abbr == abbr) & (cod.cod == cod_))]  # Use bitwise AND & inside the negation
    # print(len(dd))

removing -- GE Georgia Certain conditions originating in the perinatal period (P00-P96) NACount=8
removing -- GE Georgia Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99) NACount=8
removing -- GE Georgia Diseases of the circulatory system (I00-I99) NACount=8
removing -- GE Georgia Diseases of the digestive system (K00-K93) NACount=8
removing -- GE Georgia Diseases of the genitourinary system (N00-N99) NACount=8
removing -- GE Georgia Diseases of the musculoskeletal system and connective tissue (M00-M99) NACount=8
removing -- GE Georgia Diseases of the respiratory system (J00-J99) NACount=8
removing -- GE Georgia Diseases of the skin and subcutaneous tissue (L00-L99) NACount=8
removing -- GE Georgia Endocrine, nutritional and metabolic diseases (E00-E90) NACount=8
removing -- GE Georgia Malignant neoplasms (C00-C97) NACount=8
removing -- GE Georgia Mental and behavioural disorders (F00-F99) NACount=8
removing -- GE Georgia Pregnancy, childbirth and the puerp

In [30]:
# if a cause of death is null, we'll fill it with 0

# cod['deaths'].fillna(0, inplace=True) #deprecated
cod['deaths'] = pd.to_numeric(cod['deaths'], errors='coerce').fillna(0).astype(int)

In [31]:
# save to out folder
cod.to_csv(r'out\cod_data.csv',index=False)

In [32]:
# here we are calculating deaths per 100,000 people

codn = cod.copy()
# Merge population data into ddn based on country and year
codn = codn.merge(pop, on=['abbr', 'year'], how='left')

codn['deaths_pp'] = 0.0
codn['deaths_pp'] = (codn['deaths'] / codn['population'])

codn['deaths_p1ht'] = 0.0
codn['deaths_p1ht'] = (codn['deaths'] / codn['population']) * 100000

#########################################

# lets calculate a normalized deaths, based on the deaths per 100,000 people

codn['deaths_norm'] = np.nan

## baseline years are 2015,2016,2017 before the pandemic 
blyears = codn[codn.year.isin([2015,2016,2017])]
grouped = blyears.groupby(['abbr','cod'])

# Compute baseline mean
temp = grouped['deaths_p1ht'].agg(
    baseline='mean',
).reset_index()

# Merge baseline means with the original DataFrame
codn = codn.merge(temp, on=['abbr', 'cod'], how='left')

# Normalize deaths column
codn['deaths_norm'] = codn['deaths_p1ht'] / codn['baseline']

# Drop the intermediate baseline column if not needed
codn.drop(columns=['baseline'], inplace=True)

# save to out folder
codn.to_csv(r'out\cod_data_norm.csv',index=False)

In [33]:
# peak at the data

display(codn[(codn.abbr == 'FI') & (codn.cod == 'Mental and behavioural disorders (F00-F99)')].head(20))
display(codn[(codn.abbr == 'NO') & (codn.cod == 'Mental and behavioural disorders (F00-F99)')].head(20))
display(codn[(codn.abbr == 'IE') & (codn.cod == 'Mental and behavioural disorders (F00-F99)')].head(20))
display(codn[(codn.abbr == 'RO') & (codn.cod == 'Mental and behavioural disorders (F00-F99)')].head(20))

Unnamed: 0,name,abbr,cod,year,deaths,population,deaths_pp,deaths_p1ht,deaths_norm
327,Finland,FI,Mental and behavioural disorders (F00-F99),2015,2353,5471753.0,0.00043,43.00267,0.99386
743,Finland,FI,Mental and behavioural disorders (F00-F99),2016,2411,5487308.0,0.00044,43.93776,1.01547
1159,Finland,FI,Mental and behavioural disorders (F00-F99),2017,2359,5503297.0,0.00043,42.86521,0.99068
1575,Finland,FI,Mental and behavioural disorders (F00-F99),2018,2653,5513130.0,0.00048,48.12148,1.11216
1991,Finland,FI,Mental and behavioural disorders (F00-F99),2019,2625,5517919.0,0.00048,47.57228,1.09947
2407,Finland,FI,Mental and behavioural disorders (F00-F99),2020,2707,5525292.0,0.00049,48.99289,1.1323
2823,Finland,FI,Mental and behavioural disorders (F00-F99),2021,3077,5533793.0,0.00056,55.60381,1.28509
3239,Finland,FI,Mental and behavioural disorders (F00-F99),2022,3351,5548241.0,0.0006,60.39752,1.39588
3655,Finland,FI,Mental and behavioural disorders (F00-F99),2023,0,5563970.0,0.0,0.0,0.0


Unnamed: 0,name,abbr,cod,year,deaths,population,deaths_pp,deaths_p1ht,deaths_norm
379,Norway,NO,Mental and behavioural disorders (F00-F99),2015,2627,5165802.0,0.00051,50.85367,0.93664
795,Norway,NO,Mental and behavioural disorders (F00-F99),2016,2794,5213985.0,0.00054,53.58665,0.98698
1211,Norway,NO,Mental and behavioural disorders (F00-F99),2017,3073,5258317.0,0.00058,58.44075,1.07638
1627,Norway,NO,Mental and behavioural disorders (F00-F99),2018,3179,5295619.0,0.0006,60.03075,1.10567
2043,Norway,NO,Mental and behavioural disorders (F00-F99),2019,3108,5328212.0,0.00058,58.33101,1.07436
2459,Norway,NO,Mental and behavioural disorders (F00-F99),2020,3061,5367580.0,0.00057,57.02756,1.05035
2875,Norway,NO,Mental and behavioural disorders (F00-F99),2021,3143,5391369.0,0.00058,58.29688,1.07373
3291,Norway,NO,Mental and behavioural disorders (F00-F99),2022,3353,5425270.0,0.00062,61.80338,1.13832
3707,Norway,NO,Mental and behavioural disorders (F00-F99),2023,0,5488984.0,0.0,0.0,0.0


Unnamed: 0,name,abbr,cod,year,deaths,population,deaths_pp,deaths_p1ht,deaths_norm
80,Ireland,IE,Mental and behavioural disorders (F00-F99),2015,1557,4677627.0,0.00033,33.28611,0.94686
496,Ireland,IE,Mental and behavioural disorders (F00-F99),2016,1679,4726286.0,0.00036,35.52472,1.01054
912,Ireland,IE,Mental and behavioural disorders (F00-F99),2017,1759,4799157.0,0.00037,36.65227,1.04261
1328,Ireland,IE,Mental and behavioural disorders (F00-F99),2018,1946,4855733.0,0.0004,40.07634,1.14001
1744,Ireland,IE,Mental and behavioural disorders (F00-F99),2019,1953,4940311.0,0.0004,39.53192,1.12452
2160,Ireland,IE,Mental and behavioural disorders (F00-F99),2020,1791,5012600.0,0.00036,35.72996,1.01637
2576,Ireland,IE,Mental and behavioural disorders (F00-F99),2021,1899,5066893.0,0.00037,37.47859,1.06611
2992,Ireland,IE,Mental and behavioural disorders (F00-F99),2022,2078,5154277.0,0.0004,40.31603,1.14683
3408,Ireland,IE,Mental and behavioural disorders (F00-F99),2023,0,5271395.0,0.0,0.0,0.0


Unnamed: 0,name,abbr,cod,year,deaths,population,deaths_pp,deaths_p1ht,deaths_norm
288,Romania,RO,Mental and behavioural disorders (F00-F99),2015,320,19870647.0,2e-05,1.61042,0.92664
704,Romania,RO,Mental and behavioural disorders (F00-F99),2016,367,19760585.0,2e-05,1.85723,1.06866
1120,Romania,RO,Mental and behavioural disorders (F00-F99),2017,343,19643949.0,2e-05,1.74608,1.0047
1536,Romania,RO,Mental and behavioural disorders (F00-F99),2018,344,19533481.0,2e-05,1.76108,1.01333
1952,Romania,RO,Mental and behavioural disorders (F00-F99),2019,347,19414458.0,2e-05,1.78733,1.02843
2368,Romania,RO,Mental and behavioural disorders (F00-F99),2020,382,19328838.0,2e-05,1.97632,1.13718
2784,Romania,RO,Mental and behavioural disorders (F00-F99),2021,454,19201662.0,2e-05,2.36438,1.36047
3200,Romania,RO,Mental and behavioural disorders (F00-F99),2022,408,19042455.0,2e-05,2.14258,1.23285
3616,Romania,RO,Mental and behavioural disorders (F00-F99),2023,0,19054548.0,0.0,0.0,0.0


## Graphs and Charts

In [38]:


vaclist = 	 ['AZ', 'COM', 'COMBA.1', 'COMBA.4-5', 'JANSS', 'MOD', 'MODBA.1', 'NVXD', 'SGSK', 'UNK', 'VLA', 'MODBA.4-5', 'COMXBB', 'COMBIV', 'MODBIV', 'BECNBG', 'SPU', 'SIN', 'BHACOV', 'All']
vaclist = ['All']
cod_list = 	 ['Malignant neoplasms (C00-C97)', 'Endocrine, nutritional and metabolic diseases (E00-E90)', 'Mental and behavioural disorders (F00-F99)', 'Diseases of the circulatory system (I00-I99)', 'Diseases of the respiratory system (J00-J99)', 'Diseases of the digestive system (K00-K93)', 'Diseases of the skin and subcutaneous tissue (L00-L99)', 'Diseases of the musculoskeletal system and connective tissue (M00-M99)', 'Diseases of the genitourinary system (N00-N99)', 'Pregnancy, childbirth and the puerperium (O00-O99)', 'Certain conditions originating in the perinatal period (P00-P96)', 'Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99)', 'Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)']

# cod_list = ['Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99)']

for vac_ in vaclist:
    for cod_ in cod_list:
        for y in range(2020,2023):


            temp0 = vd.copy()[(vd.vacc == vac_) & (vd.year == 2023)].sort_values(by='dose2_pop_ratio', ascending=False)
            
            temp1 = codn.copy()[(codn.cod == cod_) & (codn.year == y)].sort_values(by='deaths_pp', ascending=False)

            temp3 = pd.merge(temp0,temp1, how='left',on=['abbr'])

            temp3['rank'] = temp3['dose2_pop_ratio'].rank()

            temp4 = temp3.copy()
            temp4 = temp4.sort_values(by='deaths_pp', ascending=False)

            try:
                # Calculate Spearman's Rank Correlation
                print(f'*** {cod_} x {vac_} {y} ***')
                spearman_corr, _ = spearmanr(temp3['rank'], temp4['rank'])
                print(HF.spearmansrank_bar(spearman_corr))
            except Exception as e:
                # display(temp3)
                pass


            


*** Malignant neoplasms (C00-C97) x All 2020 ***


Spearman's Rank: -0.204
                                               -0.204
_______________________________________________*________________________________________________________________________
-1                            -0.5                          0                             0.5                           1

*** Malignant neoplasms (C00-C97) x All 2021 ***


Spearman's Rank: -0.178
                                                 -0.178
_________________________________________________*______________________________________________________________________
-1                            -0.5                          0                             0.5                           1

*** Malignant neoplasms (C00-C97) x All 2022 ***


Spearman's Rank: -0.115
                                                     -0.115
_____________________________________________________*_______________________________________________________________

In [35]:
styled_df = temp4[['abbr','dose2_pop_ratio','deaths_pp']].style.background_gradient(cmap=hm_coolwarm, axis=0, subset=['dose2_pop_ratio']) \
                      .background_gradient(cmap=hm_coolwarm, axis=0, subset=['deaths_pp']) 

styled_df = temp4[['abbr','dose_pop_ratio','deaths_pp']].style.background_gradient(cmap=hm_coolwarm, axis=0, subset=['dose_pop_ratio']) \
                      .background_gradient(cmap=hm_coolwarm, axis=0, subset=['deaths_pp']) 

display(styled_df)

Unnamed: 0,abbr,dose_pop_ratio,deaths_pp
1,FR,4.584656,0.001022
6,PL,2.86261,0.000993
16,EL,2.032484,0.000833
3,PT,5.294853,0.000718
7,DK,2.40274,0.000708
11,BE,2.411964,0.000601
18,NL,2.195485,0.00051
17,LI,1.815763,0.000486
10,DE,2.211555,0.000473
2,IT,4.765223,0.000434


In [36]:
temp0 = vd.copy()[(vd.vacc == 'All') & (vd.year == 2022)].sort_values(by='dose2_pop_ratio', ascending=False)
temp1 = ddn.copy()[(ddn.agegrp == 'TOTAL') & (ddn.year == 2022)].sort_values(by='deaths_pp', ascending=False)

temp3 = pd.merge(temp0,temp1, how='left',on=['abbr'])
temp3['rank'] = temp3['dose2_pop_ratio'].rank()

temp4 = temp3.copy()
temp4 = temp4.sort_values(by='deaths_pp', ascending=False)

# Calculate Spearman's Rank Correlation
spearman_corr, _ = spearmanr(temp3['rank'], temp4['rank'])
print(HF.spearmansrank_bar(spearman_corr))




Spearman's Rank: -0.250
                                            -0.250
____________________________________________*___________________________________________________________________________
-1                            -0.5                          0                             0.5                           1



In [37]:
styled_df = temp4[['abbr','dose_pop_ratio','deaths_pp']] \
                        .style.background_gradient(cmap=hm_coolwarm, axis=0, subset=['dose_pop_ratio']) \
                      .background_gradient(cmap=hm_coolwarm, axis=0, subset=['deaths_pp'])  
display(styled_df)

Unnamed: 0,abbr,dose_pop_ratio,deaths_pp
29,BG,0.66499,0.018242
25,LV,1.499612,0.016142
5,LT,3.101619,0.015103
24,HR,1.386677,0.0147
21,HU,1.615884,0.014147
28,RO,0.816819,0.014116
16,EL,2.023421,0.013413
23,EE,1.573763,0.012916
10,DE,2.204055,0.012786
6,PL,2.855098,0.012119


In [40]:
vaclist = 	 ['AZ', 'COM', 'COMBA.1', 'COMBA.4-5', 'JANSS', 'MOD', 'MODBA.1', 'NVXD', 'SGSK', 'UNK', 'VLA', 'MODBA.4-5', 'COMXBB', 'COMBIV', 'MODBIV', 'BECNBG', 'SPU', 'SIN', 'BHACOV', 'All']
vaclist = ['All']
cod_list = 	 ['Malignant neoplasms (C00-C97)', 'Endocrine, nutritional and metabolic diseases (E00-E90)', 'Mental and behavioural disorders (F00-F99)', 'Diseases of the circulatory system (I00-I99)', 'Diseases of the respiratory system (J00-J99)', 'Diseases of the digestive system (K00-K93)', 'Diseases of the skin and subcutaneous tissue (L00-L99)', 'Diseases of the musculoskeletal system and connective tissue (M00-M99)', 'Diseases of the genitourinary system (N00-N99)', 'Pregnancy, childbirth and the puerperium (O00-O99)', 'Certain conditions originating in the perinatal period (P00-P96)', 'Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99)', 'Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)']

# cod_list = ['Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99)']

for vac_ in vaclist:
    for cod_ in cod_list:
        for y in range(2020,2024):


            temp0 = vd.copy()[(vd.vacc == vac_) & (vd.year == 2023)].sort_values(by='dose_pop_ratio', ascending=False)
            
            temp1 = codn.copy()[(codn.cod == cod_) & (codn.year == y)].sort_values(by='deaths_pp', ascending=False)

            temp3 = pd.merge(temp0,temp1, how='left',on=['abbr'])

            temp3['rank'] = temp3['dose_pop_ratio'].rank()

            temp4 = temp3.copy()
            temp4 = temp4.sort_values(by='deaths_pp', ascending=False)

            try:
                # Calculate Spearman's Rank Correlation
                print(f'*** {cod_} x {vac_} {y} ***')
                spearman_corr, _ = spearmanr(temp3['rank'], temp4['rank'])
                print(HF.spearmansrank_bar(spearman_corr))
            except Exception as e:
                # display(temp3)
                pass

*** Malignant neoplasms (C00-C97) x All 2020 ***


Spearman's Rank: -0.220
                                              -0.220
______________________________________________*_________________________________________________________________________
-1                            -0.5                          0                             0.5                           1

*** Malignant neoplasms (C00-C97) x All 2021 ***


Spearman's Rank: -0.201
                                               -0.201
_______________________________________________*________________________________________________________________________
-1                            -0.5                          0                             0.5                           1

*** Malignant neoplasms (C00-C97) x All 2022 ***


Spearman's Rank: -0.134
                                                   -0.134
___________________________________________________*____________________________________________________________________
-

## Discussion

## Conclusion

## Further Research

# Style and Misc Stuff
this object contains HTML,JS, and CSS

<script>

(() => {
  const style = document.createElement('style');
  style.textContent = `
    .jp-Cell {
      //transition: all 0.6s ease-in;
      //transition: all 0.6s cubic-bezier(0.34, 1.56, 0.64, 1); //bounce
      transition: all 0.6s cubic-bezier(0.55, 0, 1, 0.45);
      transition: all 0.6s cubic-bezier(0.37, 0, 0.63, 1); //sine

      opacity: 1;
      transform: scale(1.0);

      //overflow: hidden;  
      // height: auto;  
      // display:none;
    }

    .jp-Cell.reveal {
      opacity: 1;
      visibility: visible;
      filter: blur(0px);

      transform: scale(1);
      
      // height: auto;  
      // display:block;
    }

    .jp-Cell.hide {
      opacity: 0;
      visibility: hidden;
      filter: blur(10px);

      //transform: scale(0.0);
      // height: 0px;  
    }
  `;
  document.head.appendChild(style);

  const cells = Array.from(document.querySelectorAll('.jp-Cell'));
  let currentIndex = -1;

  // Initially hide all cells
  cells.forEach(cell => {
    cell.classList.remove('reveal');
    cell.classList.add('hide');
  });

  const scrollToCell = (cell) => {
    // Delay scroll slightly to ensure layout updates
    setTimeout(() => {
      const rect = cell.getBoundingClientRect();
      const offsetTop = window.scrollY + rect.top - 50;
      window.scrollTo({
        top: offsetTop,
        behavior: 'smooth'
      });
    }, 10); // just 1 frame is enough
  };

  window.addEventListener('keydown', (e) => {
    const key = e.key;

    if (key === '=' || key === 'ArrowDown') {
      if (currentIndex + 1 < cells.length) {
        currentIndex++;
        const cell = cells[currentIndex];
        cell.classList.remove('hide');
        cell.classList.add('reveal');
        scrollToCell(cell);
      }
    } else if (key === '-' || key === 'ArrowUp') {
      if (currentIndex >= 0) {
        const cell = cells[currentIndex];
        cell.classList.remove('reveal');
        cell.classList.add('hide');
        currentIndex--;
        if (currentIndex >= 0) {
          scrollToCell(cells[currentIndex]);
        }
      }
    }
  });
})();


</script>

<style>

body{
  background: #212121 !important;
}

.jp-Cell{
  /* border-width: 0px; */
  /* border-style: solid; */
  /* border-color: #AAAAAA; */
  /* border-color:rgb(0, 217, 255); */
  border-radius: 8px;
  padding: 12px;

  margin-bottom: 2rem !important;
  background: #303030 !important;

  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
}

.highlight{
  background: #171717 !important;
}

table{
  margin-left: 0px !important;
  margin-right: 0px !important;
}

.chatgpt{
  background-color: #171717;
  color: #AAAAAA;
  border-left: 4px solid #AAAAAA;
  padding: 1em 1.5em;
  margin: 2em 0;
  border-radius: 8px;
  position: relative;
}

.chatgpt-prompt {
  font-family: "Segoe UI", sans-serif;
  font-weight: 500;
  white-space: pre-wrap;
}

.chatgpt-prompt::before {
  content: "🧠 chatGPT Prompt";
  position: absolute;
  top: -1.0em;
  left: 0.8em;
  background-color: #171717;
  color: #AAAAAA;
  font-size: 0.75em;
  padding: 0 0.4em;
  border-radius: 4px;
}


.chatgpt-quote {
  font-style: italic;
  position: relative;
}

.chatgpt-quote::before {
  content: "💬 chatGPT Quote";
  position: absolute;
  top: -1.0em;
  left: 0.8em;
  background-color: #171717;
  color: #AAAAAA;
  font-size: 0.75em;
  padding: 0 0.4em;
  border-radius: 4px;
}



</style>