# Fastpages Notebook Blog Post
> A tutorial of fastpages for Jupyter notebooks.

- toc: true 
- badges: true
- comments: true
- categories: [jupyter]
- image: images/chart-preview.png

```python
import pandas as pd
from fastdata.integrations import *
from fastdata.core import *
import plotly.express as px
from IPython.display import HTML
```

## Goal

The goal of this analysis is to help you understand what the deal is with the AstraZeneca vaccine, specifically the risks reported by the media and the conclusions of the latest European Medicines Agency (EMA) report.

This analysis is based on EMA data published [here](https://www.ema.europa.eu/en/documents/prac-recommendation/signal-assessment-report-embolic-thrombotic-events-smq-covid-19-vaccine-chadox1-s-recombinant-covid_en.pdf).

The analysis does not aim to challenge the underlying data. Instead, it tries to make part of the report easier to understand without having to read the full 50-page EMA report.

**DISCLAIMER:** The author is not an expert in the field and is applying some general statistical thinking to the problem. Therefore, this analysis may contain errors, omissions or otherwise inaccurate information.

## Methodology

### Introduction

The EMA report performs observed-to-expected (OE) analyses to try to understand the potential risks of the vaccine. We will focus on the "EMA analysis of EudraVigilance data" (section 3.1.5 in the report).

**Observed-to-expected analysis:**

The logic of this analysis is to compare how many cases of a condition have been observed (observed) with how many would usually occur (expected). From these two numbers you can calculate an Observed to Expected Ratio, defined as the number of observed cases divided by the number of expected cases.
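The ratio itself is a single division, as the minimal sketch below shows. The function name `oe_ratio` is our own and does not come from the EMA report. A ratio above 1 means more cases were observed than expected, and a ratio below 1 means fewer.

```python
def oe_ratio(observed: int, expected: float) -> float:
    """Observed-to-expected ratio: observed cases / expected cases."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


# Hypothetical example: 6 observed cases where 3 were expected
print(oe_ratio(6, 3.0))  # 2.0 -> twice as many cases as expected
```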

The statistical uncertainty is mostly driven by the number of observed cases, which is usually small because these are rare events. To account for the uncertainty in the total number of cases observed over the risk period of interest, a 95% exact Poisson confidence interval (95% CI) is commonly used (more on this later).
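Below is a sketch of the exact (Garwood) Poisson interval in plain Python. It assumes the usual definition:

- The lower bound is the rate at which observing at least k cases has probability α/2.
- The upper bound is the rate at which observing at most k cases has probability α/2.

Dividing both bounds by the expected count turns them into a CI for the OE ratio. This step treats the expected count as a fixed number. The function names are hypothetical, and the EMA report may use a slightly different variant, so this is not guaranteed to reproduce its numbers exactly.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summing the probability mass terms."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) computed from P(X = i - 1)
        total += term
    return total


def _bisect(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of a monotone function f on [lo, hi], where f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Garwood) confidence interval for a Poisson mean given k observed events."""
    hi = k + 10 * math.sqrt(k + 1) + 10  # generous upper search limit
    if k == 0:
        lower = 0.0
    else:
        # Lower bound: rate at which P(X >= k) = alpha / 2
        lower = _bisect(lambda lam: 1 - poisson_cdf(k - 1, lam) - alpha / 2, 0.0, hi)
    # Upper bound: rate at which P(X <= k) = alpha / 2
    upper = _bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, hi)
    return lower, upper


def oe_with_ci(observed: int, expected: float, alpha: float = 0.05) -> tuple[float, float, float]:
    """OE ratio and its CI, treating the expected count as fixed."""
    lower, upper = exact_poisson_ci(observed, alpha)
    return observed / expected, lower / expected, upper / expected


print(oe_with_ci(4, 1.99))
```

Take the 30-49 row of the DIC table (4 observed, 1.99 expected). This gives an OE of about 2.01 with a CI of roughly 0.55 to 5.15, close to the 2.02 (0.54 - 5.16) in the table. The small differences are plausibly due to rounding of the expected count. If SciPy is available, the same bounds can be computed as `chi2.ppf(alpha / 2, 2 * k) / 2` and `chi2.ppf(1 - alpha / 2, 2 * k + 2) / 2`, with `chi2` from `scipy.stats`.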

**EudraVigilance database:**

EudraVigilance is a database of information on suspected adverse reactions to medicines that have been authorised, or are being studied in clinical trials, in the European Economic Area (EEA).
[Source](https://www.ema.europa.eu/en/human-regulatory/research-development/pharmacovigilance/eudravigilance)

This section of the report performs an OE analysis on three types of conditions present in the EudraVigilance database:
- Disseminated intravascular coagulation
- Cerebral Venous Sinus Thrombosis
- Embolic and thrombotic events

### Data sources

A key input for the analysis is the incidence rate of each condition, which is needed to determine the expected number of cases. It also helps if the data is stratified by groups (such as age), so that we can analyze not just the population as a whole but also individual subgroups.
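The sketch below shows how an incidence rate becomes an expected number of cases. It rests on two assumptions:

- The rate is expressed per 100,000 person-years, as in the FISABIO column of the DIC table.
- Every vaccinated person contributes the full 14-day risk window used in the EMA figures.

`expected_cases` and the one-million-person population are hypothetical illustration values, not EMA inputs.

```python
DAYS_PER_YEAR = 365.25


def expected_cases(ir_per_100k_py: float, n_vaccinated: float, window_days: float = 14.0) -> float:
    """Expected background cases in a risk window.

    Assumes each vaccinated person contributes `window_days` of follow-up,
    so total person-years = n_vaccinated * window_days / DAYS_PER_YEAR.
    """
    person_years = n_vaccinated * window_days / DAYS_PER_YEAR
    return ir_per_100k_py / 100_000 * person_years


# Hypothetical: 1,000,000 people vaccinated, IR of 0.6 per 100,000 person-years
print(round(expected_cases(0.6, 1_000_000), 2))  # 0.23
```

Real analyses may account for people entering follow-up at different times, so treat this as a back-of-the-envelope version of the calculation.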

The databases used for the main analysis of the three events investigated are:
- Coagulation disorder (this was used to compare with the SMQ Embolic and thrombotic events): ARS from Italy
- Disseminated intravascular coagulation: FISABIO from Spain
- Cerebral venous sinus thrombosis: ARS from Italy

## Analysis of potential side-effects

### Disseminated intravascular coagulation (DIC)

Disseminated intravascular coagulation (DIC) is a rare but serious condition that causes abnormal blood clotting throughout the body’s blood vessels. It is caused by another disease or condition, such as an infection or injury, that makes the body’s normal blood clotting process become overactive.

[Source: US NIH](https://www.nhlbi.nih.gov/health-topics/disseminated-intravascular-coagulation)

For those of us who are not medical experts, this diagram shows a thrombus (blood clot) that has blocked a blood vessel valve.

![Diagram of a blood clot blocking a blood vessel](https://upload.wikimedia.org/wikipedia/commons/c/c5/Blood_clot_diagram.png)

#### Load and clean the data

```python
# Load the "DIC" sheet of the Google Sheet into a DataFrame
dic = gsheet_to_df(
    url="https://docs.google.com/spreadsheets/d/11yJ8GbArmcazWG8UdsSWD2gY2VIVD7l_zaPOdiF2ePY",
    start_row=2,
    sheet="DIC")
```

```python
# Keep only the EEA+UK figures by dropping the EEA-only columns
dic = dic.drop(
    columns=["EEA Expected 14d", "EEA Observed 14d From EV", "EEA OE 14d with 95% c.i."])
```

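```python
# Lower bound of the 95% CI: the first number of the "x - y" pair
# (raw string so that "\d" is not treated as a string escape)
dic["oe_ci_interval_min"] = dic["EEA+UK  OE 14d with 95% c.i."].fdt.clean_text_column(
    mode="custom",
    keep_unmatched=False,
    regex=r"(\d+?[,.]\d+) - \d+?[,.]\d+")
```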

```python
# Upper bound of the 95% CI: the second number of the "x - y" pair
dic["oe_ci_interval_max"] = dic["EEA+UK  OE 14d with 95% c.i."].fdt.clean_text_column(
    mode="custom",
    keep_unmatched=False,
    regex=r"\d+?[,.]\d+ - (\d+?[,.]\d+)")
```

```python
# Point estimate of the OE ratio: the text before the opening parenthesis
dic["oe"] = dic["EEA+UK  OE 14d with 95% c.i."].fdt.clean_text_column(
    mode="before_character",
    keep_unmatched=False,
    character="(")
```
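The three `clean_text_column` calls above appear to split a single text cell of the form `OE (lower - upper)` into three numbers. The snippet below mimics that with Python's standard `re` module, as an illustration rather than fastdata's actual implementation. Some parts of it are assumptions:

- The sample value `"2,02 (0,54 - 5,16)"` is a guess at the sheet's formatting. The `[,.]` in the patterns suggests decimal commas may appear.
- `to_float` is a hypothetical helper.

```python
import re

# Assumed format of one cell of "EEA+UK  OE 14d with 95% c.i."
raw = "2,02 (0,54 - 5,16)"

# Same patterns as above: capture the lower or the upper bound of "x - y"
lower_text = re.search(r"(\d+?[,.]\d+) - \d+?[,.]\d+", raw).group(1)
upper_text = re.search(r"\d+?[,.]\d+ - (\d+?[,.]\d+)", raw).group(1)
# Point estimate: everything before the opening parenthesis
oe_text = raw.split("(")[0].strip()


def to_float(text: str) -> float:
    """Parse a number that may use a decimal comma."""
    return float(text.replace(",", "."))


print(to_float(oe_text), to_float(lower_text), to_float(upper_text))  # 2.02 0.54 5.16
```

Both patterns require a decimal separator, so a bound written as a bare integer such as `0` would not match them.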

```python
# The raw text column is no longer needed once it has been split
dic = dic.drop(
    columns=["EEA+UK  OE 14d with 95% c.i."])
```

```python
# Convert the numeric columns from text to floats
numeric_cols = ["IR per 100,000 Person years From FISABIO", "EEA+UK Expected 14d",
                "EEA+UK  Observed 14d From EV", "oe_ci_interval_min", "oe_ci_interval_max", "oe"]
dic = dic.astype(dtype={col: "float64" for col in numeric_cols})
```

```python
dic
```

| | Age group | IR per 100,000 Person years From FISABIO | EEA+UK Expected 14d | EEA+UK Observed 14d From EV | oe_ci_interval_min | oe_ci_interval_max | oe |
|---|---|---|---|---|---|---|---|
| 0 | 20-29 | 0.6 | 0.04 | 1.0 | 0.3 | 129.41 | 23.26 |
| 1 | 30-49 | 1.09 | 1.99 | 4.0 | 0.54 | 5.16 | 2.02 |
| 2 | 50-59 | 3.07 | 4.38 | 1.0 | 0.0 | 1.27 | 0.23 |
| 3 | 60-69 | 4.67 | 9.24 | 1.0 | 0.0 | 0.6 | 0.11 |
| 4 | 70-79 | 8.37 | 11.3 | 0.0 | 0.0 | 0.32 | 0.0 |
| 5 | 80+ | 11.66 | 5.37 | 0.0 | 0.0 | 0.68 | 0.0 |
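One common way to read this table is to look at where each 95% interval sits relative to 1:

- An interval entirely above 1 points to more cases than expected.
- An interval entirely below 1 points to fewer cases than expected.
- An interval that straddles 1 is inconclusive.

The sketch below copies the interval bounds from the table into a small DataFrame and flags the first two cases. The `ci_above_1` and `ci_below_1` column names are hypothetical. This is a general statistical reading, not the EMA's assessment method. The same two comparisons could be run directly on `dic`.

```python
import pandas as pd

# Interval bounds copied from the DIC table above (EEA+UK, 14-day window)
oe_table = pd.DataFrame({
    "age_group": ["20-29", "30-49", "50-59", "60-69", "70-79", "80+"],
    "oe_ci_interval_min": [0.3, 0.54, 0.0, 0.0, 0.0, 0.0],
    "oe_ci_interval_max": [129.41, 5.16, 1.27, 0.6, 0.32, 0.68],
})

# Whole interval above 1: more cases observed than expected
oe_table["ci_above_1"] = oe_table["oe_ci_interval_min"] > 1
# Whole interval below 1: fewer cases observed than expected
oe_table["ci_below_1"] = oe_table["oe_ci_interval_max"] < 1

print(oe_table[["age_group", "ci_above_1", "ci_below_1"]])
```

With these values, no age group has an interval entirely above 1, while the 60-69, 70-79 and 80+ groups have intervals entirely below 1. The very wide 20-29 interval reflects an expected count of only 0.04 cases.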
