# **KPI 1**
This KPI:
- compares the number of people who received intensive care with the number of people who died
- checks whether there is any correlation between the two

In [1]:
import pandas as pd
import plotly.express as px

df = pd.read_excel("./Data/Folkhalsomyndigheten_Covid19.xlsx", sheet_name="Totalt antal per åldersgrupp")
df

Unnamed: 0,Åldersgrupp,Totalt_antal_fall,Totalt_antal_intensivvårdade,Totalt_antal_avlidna
0,Ålder_0_9,138071,109,17
1,Ålder_10_19,355823,101,9
2,Ålder_20_29,418506,285,41
3,Ålder_30_39,493443,492,71
4,Ålder_40_49,474702,997,172
5,Ålder_50_59,378468,1932,523
6,Ålder_60_69,180079,2595,1422
7,Ålder_70_79,87096,2394,4654
8,Ålder_80_89,58170,612,8326
9,Ålder_90_plus,26677,21,5420


### Removing the "Uppgift saknas" (information missing) row, as its data was hard to interpret

In [2]:
df.drop([10], inplace=True)
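
Dropping by the index label `10` only works while the missing-data row keeps that position. A minimal sketch of a value-based filter follows, assuming the label "Uppgift saknas" is stored in the `Åldersgrupp` column (the output above only shows rows 0 to 9, so this is an assumption). The counts in the last row are placeholders, not real figures.

```python
import pandas as pd

# Two real rows from the table above plus a placeholder "Uppgift saknas" row.
# The zeros in the last row are illustrative placeholders, not actual data.
sample = pd.DataFrame(
    {
        "Åldersgrupp": ["Ålder_0_9", "Ålder_10_19", "Uppgift saknas"],
        "Totalt_antal_fall": [138071, 355823, 0],
    }
)

# Keep every row whose age group is known, regardless of its index position.
cleaned = sample[sample["Åldersgrupp"] != "Uppgift saknas"].reset_index(drop=True)
print(cleaned)
```

Filtering on the value keeps the cell safe to re-run and does not depend on the row order in the Excel sheet.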

In [3]:
# percentage of infected people in each age group who received intensive care
df["Procent_intensivvårdade"] = (df["Totalt_antal_intensivvårdade"]/df["Totalt_antal_fall"])*100

# percentage of infected people in each age group who died
df["Procent_avlidna"] = (df["Totalt_antal_avlidna"]/df["Totalt_antal_fall"])*100
df

Unnamed: 0,Åldersgrupp,Totalt_antal_fall,Totalt_antal_intensivvårdade,Totalt_antal_avlidna,Procent_intensivvårdade,Procent_avlidna
0,Ålder_0_9,138071,109,17,0.078945,0.012313
1,Ålder_10_19,355823,101,9,0.028385,0.002529
2,Ålder_20_29,418506,285,41,0.068099,0.009797
3,Ålder_30_39,493443,492,71,0.099708,0.014389
4,Ålder_40_49,474702,997,172,0.210027,0.036233
5,Ålder_50_59,378468,1932,523,0.510479,0.138189
6,Ålder_60_69,180079,2595,1422,1.441034,0.789653
7,Ålder_70_79,87096,2394,4654,2.748691,5.343529
8,Ålder_80_89,58170,612,8326,1.052089,14.31322
9,Ålder_90_plus,26677,21,5420,0.078719,20.317127


In [4]:
fig = px.bar(
    df,
    x="Åldersgrupp",
    y="Procent_intensivvårdade",
    title="Procent av fall som behövde intensivvård",
    log_y=True, # log scale on the y-axis to make the difference between the groups more visible
)
fig.show()
fig.write_html("./Visualiseringar/Procent intensivvårdade.html")

In [5]:
fig = px.bar(
    df,
    x="Åldersgrupp",
    y="Procent_avlidna",
    title="Procent av fall som avled",
    log_y=True,
)
fig.show()
fig.write_html("./Visualiseringar/Procent avlidna.html")

In [6]:
fig = px.histogram(
    df,
    x="Åldersgrupp",
    y=["Procent_intensivvårdade", "Procent_avlidna"],
    title="Procent av fall som avled och behövde intensivvård",
    barmode="group",
    log_y=True, 
)
fig.show()
fig.write_html("./Visualiseringar/Procent avlidna och intensivvårdade.html")

### Conclusion:

- The graph above compares, per age group, the percentage of cases that received intensive care with the percentage of cases that died.
- What is noticeable is that above the 60 to 69 age group (from 70 to 79 onwards), the percentage of people who died is higher than the percentage of people who needed intensive care.
- This could be due to pre-existing conditions among older people, which make them more vulnerable to the virus.
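
The correlation mentioned at the start of this KPI is not computed in the cells above. The sketch below shows one possible way to quantify it, using the figures from the age-group table. Using Pearson and a rank-based (Spearman-style) coefficient is an assumption about what "correlation" should mean here, and with only ten age groups the result should be read with caution.

```python
import pandas as pd

# Figures copied from the age-group table above.
age = pd.DataFrame(
    {
        "Åldersgrupp": [
            "Ålder_0_9", "Ålder_10_19", "Ålder_20_29", "Ålder_30_39", "Ålder_40_49",
            "Ålder_50_59", "Ålder_60_69", "Ålder_70_79", "Ålder_80_89", "Ålder_90_plus",
        ],
        "Totalt_antal_fall": [
            138071, 355823, 418506, 493443, 474702, 378468, 180079, 87096, 58170, 26677,
        ],
        "Totalt_antal_intensivvårdade": [109, 101, 285, 492, 997, 1932, 2595, 2394, 612, 21],
        "Totalt_antal_avlidna": [17, 9, 41, 71, 172, 523, 1422, 4654, 8326, 5420],
    }
)
age["Procent_intensivvårdade"] = age["Totalt_antal_intensivvårdade"] / age["Totalt_antal_fall"] * 100
age["Procent_avlidna"] = age["Totalt_antal_avlidna"] / age["Totalt_antal_fall"] * 100

# Pearson correlation between the two percentages (linear relationship).
pearson_r = age["Procent_intensivvårdade"].corr(age["Procent_avlidna"])

# Rank-based correlation: Pearson applied to the ranks, which is Spearman's coefficient.
spearman_r = age["Procent_intensivvårdade"].rank().corr(age["Procent_avlidna"].rank())

# Age groups where the share of deaths exceeds the share of intensive care.
more_deaths_than_icu = age.loc[
    age["Procent_avlidna"] > age["Procent_intensivvårdade"], "Åldersgrupp"
].tolist()

print(f"Pearson r: {pearson_r:.3f}, rank correlation: {spearman_r:.3f}")
print("Deaths exceed intensive care in:", more_deaths_than_icu)
```

The last list makes the crossover point in the conclusion explicit instead of reading it off the log-scaled bar chart.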

End of KPI 1

# **KPI 2**
In this analysis, we check which regions were hit hardest:
- which regions had the most COVID-19 cases
- the mortality ratio (deaths relative to cases)

In [7]:
df = pd.read_excel("./Data/Folkhalsomyndigheten_Covid19.xlsx", sheet_name="Totalt antal per region")
df

Unnamed: 0,Region,Totalt_antal_fall,Fall_per_100000_inv,Totalt_antal_intensivvårdade,Totalt_antal_avlidna
0,Blekinge,30829,19371.449951,85,184
1,Dalarna,75091,26098.780273,260,544
2,Gotland,11874,19776.671875,46,82
3,Gävleborg,74803,26020.503418,332,754
4,Halland,108822,32349.047119,229,518
5,Jämtland Härjedalen,34347,26197.373535,95,197
6,Jönköping,89662,24583.161133,414,756
7,Kalmar,62810,25537.878418,135,385
8,Kronoberg,51460,25460.141602,162,410
9,Norrbotten,50755,20327.326904,267,459


### Checking which regions have the most cases
Then comparing to see whether the regions with the most cases also have the most deaths

In [8]:
df.sort_values(by="Totalt_antal_fall", ascending=False, inplace=True)
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,Region,Totalt_antal_fall,Fall_per_100000_inv,Totalt_antal_intensivvårdade,Totalt_antal_avlidna
0,Stockholm,606611,25391.749023,2645,5626
1,Västra Götaland,444278,25636.465332,1670,3342
2,Skåne,346727,24998.67041,819,2587
3,Halland,108822,32349.047119,229,518
4,Örebro,98929,32379.776855,276,525
5,Östergötland,98793,21158.707031,440,890
6,Uppsala,93746,24194.380859,462,722
7,Jönköping,89662,24583.161133,414,756
8,Västmanland,78438,28326.63623,156,534
9,Dalarna,75091,26098.780273,260,544


In [9]:
df_top10_fall = df.head(10)
df_top10_fall

Unnamed: 0,Region,Totalt_antal_fall,Fall_per_100000_inv,Totalt_antal_intensivvårdade,Totalt_antal_avlidna
0,Stockholm,606611,25391.749023,2645,5626
1,Västra Götaland,444278,25636.465332,1670,3342
2,Skåne,346727,24998.67041,819,2587
3,Halland,108822,32349.047119,229,518
4,Örebro,98929,32379.776855,276,525
5,Östergötland,98793,21158.707031,440,890
6,Uppsala,93746,24194.380859,462,722
7,Jönköping,89662,24583.161133,414,756
8,Västmanland,78438,28326.63623,156,534
9,Dalarna,75091,26098.780273,260,544


In [10]:
fig = px.bar(
    df,
    x="Region",
    y=["Totalt_antal_fall", "Totalt_antal_avlidna"],
    title="Antal fall och avlidna per region", 
    labels={"value": "Antal"},
    barmode="group",
    log_y=True,
)
fig.show()
fig.write_html("./Visualiseringar/Antal fall och avlidna per region.html")

### As we observe in this graph, the number of deaths is not proportional to the number of cases.
To be more certain, we calculate the ratio between the number of people who died and the number of people who were infected.

In [11]:
df["Andel avlidna"] = df["Totalt_antal_avlidna"]/df["Totalt_antal_fall"]
df.sort_values(by="Andel avlidna", ascending=False, inplace=True)
df.reset_index(drop=True, inplace=True)

In [12]:
df_top10_avlidna = df.head(10)
df_top10_avlidna

Unnamed: 0,Region,Totalt_antal_fall,Fall_per_100000_inv,Totalt_antal_intensivvårdade,Totalt_antal_avlidna,Andel avlidna
0,Västernorrland,56224,22976.456543,217,678,0.012059
1,Gävleborg,74803,26020.503418,332,754,0.01008
2,Sörmland,67918,22704.283447,446,662,0.009747
3,Stockholm,606611,25391.749023,2645,5626,0.009274
4,Norrbotten,50755,20327.326904,267,459,0.009043
5,Östergötland,98793,21158.707031,440,890,0.009009
6,Jönköping,89662,24583.161133,414,756,0.008432
7,Kronoberg,51460,25460.141602,162,410,0.007967
8,Uppsala,93746,24194.380859,462,722,0.007702
9,Västra Götaland,444278,25636.465332,1670,3342,0.007522


### Merging df_top10_fall and df_top10_avlidna to check for similarities

In [13]:
df_top10 = pd.merge(df_top10_fall, df_top10_avlidna, on="Region")
df_top10

Unnamed: 0,Region,Totalt_antal_fall_x,Fall_per_100000_inv_x,Totalt_antal_intensivvårdade_x,Totalt_antal_avlidna_x,Totalt_antal_fall_y,Fall_per_100000_inv_y,Totalt_antal_intensivvårdade_y,Totalt_antal_avlidna_y,Andel avlidna
0,Stockholm,606611,25391.749023,2645,5626,606611,25391.749023,2645,5626,0.009274
1,Västra Götaland,444278,25636.465332,1670,3342,444278,25636.465332,1670,3342,0.007522
2,Östergötland,98793,21158.707031,440,890,98793,21158.707031,440,890,0.009009
3,Uppsala,93746,24194.380859,462,722,93746,24194.380859,462,722,0.007702
4,Jönköping,89662,24583.161133,414,756,89662,24583.161133,414,756,0.008432


### Conclusion:
- As we can see, the regions that had the most cases don't necessarily have the highest death rate: only five regions appear in both top-10 lists

End of KPI 2

# **KPI 3**
Here we investigate whether the restrictions applied by the Swedish government had any effect on the number of intensive care cases

In [14]:
df = pd.read_excel("./Data/Folkhalsomyndigheten_Covid19.xlsx", sheet_name="Antal intensivvårdade per dag")
df.head()

Unnamed: 0,Datum_vårdstart,Antal_intensivvårdade
0,2020-03-06,1
1,2020-03-07,1
2,2020-03-08,1
3,2020-03-09,0
4,2020-03-10,2


In [15]:
# check for missing values
df.isnull().sum()

Datum_vårdstart          0
Antal_intensivvårdade    0
dtype: int64

In [31]:
fig = px.bar(
    df,
    x="Datum_vårdstart",
    y="Antal_intensivvårdade",
    title="Antal intensivvårdade per dag",
    labels={"Antal_intensivvårdade": "Antal"},
    color="Antal_intensivvårdade",
)
fig.show()
fig.write_html("./Visualiseringar/Antal intensivvårdade per dag.html")

### Conclusion:
- As we can clearly see, the number of intensive care admissions peaked on three occasions
- The restrictions were always heavily applied at the beginning of every summer
- After April 2020 and April 2021, the number of admissions decreased significantly
- The high numbers at the end of December 2020 are probably due to the Christmas holidays
- So we can conclude, to a certain extent, that the restrictions were effective
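
Daily counts are noisy, so peaks and declines can be easier to judge on a smoothed series. The sketch below shows one possible approach, a trailing rolling mean. The helper `add_rolling_mean` and the column name `Glidande_medel` are hypothetical, and the data is limited to the first five rows shown above, which is why the example uses a window of 3 rather than the default 7 days.

```python
import pandas as pd

# First five rows of the daily intensive-care sheet, as shown above.
daily = pd.DataFrame(
    {
        "Datum_vårdstart": pd.to_datetime(
            ["2020-03-06", "2020-03-07", "2020-03-08", "2020-03-09", "2020-03-10"]
        ),
        "Antal_intensivvårdade": [1, 1, 1, 0, 2],
    }
)


def add_rolling_mean(frame, window=7):
    """Return a copy of frame with a trailing rolling mean of the daily counts.

    The first window - 1 rows are NaN because the window is not yet full.
    """
    result = frame.sort_values("Datum_vårdstart").copy()
    result["Glidande_medel"] = result["Antal_intensivvårdade"].rolling(window).mean()
    return result


# Window of 3 only because this sample has five rows.
smoothed = add_rolling_mean(daily, window=3)
print(smoothed)
```

On the full sheet, the smoothed column could be plotted with `px.line` alongside the daily bars; that usage is a suggestion, not part of the original notebook.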

End of KPI 3