# Covid and Influenza - sick and death cases comparison

**Part 2 - Analysis** 

The first part was done in a separate file: _"Flu_Covid_Cleaning.ipynb"_

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
url = "https://raw.githubusercontent.com/mborycki/Covid_Influenza_Comparison/main/Covid_and_Influenza.csv"

In [None]:
df = pd.read_csv(url)

In [None]:
df.head()

In [None]:
df.info()

We can see that we have 9359 records and 8 columns, with no NULL values.
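
The record count and the absence of NULLs can also be checked programmatically instead of being read off `df.info()`. A minimal sketch, using a tiny made-up frame called `sample` in place of the real `df` (the values are illustrative only):

```python
import pandas as pd

# Tiny made-up stand-in for df; the real data comes from the CSV loaded above
sample = pd.DataFrame({
    "Country": ["Poland", "Poland", "Spain"],
    "Year": [2020, 2020, 2020],
    "Week": [1, 2, 1],
    "Detected_FluCases": [10, 12, 7],
})

n_rows, n_cols = sample.shape            # (records, columns)
nulls_per_column = sample.isna().sum()   # number of NULLs in each column

print(n_rows, n_cols)
print(nulls_per_column)
```

Running the same two calls on `df` should reproduce the numbers reported by `df.info()`.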

----
**Column Description:**
* <b>Country</b>: Country name
* <b>Year</b>: Calendar year
* <b>Week</b>: Week number within a year
* <b>Confirmed</b>: Covid case counts, including confirmed and probable cases.
* <b>Deaths</b>: Covid death counts, including confirmed and probable deaths.
* <b>Recovered</b>: Recovered cases are estimates based on local media reports, and state and local reporting when available, and therefore may be substantially lower than the true number (Covid).
* <b>Detected_FluCases</b>: Counts of detected influenza cases.
* <b>Quarter</b>: Quarter number within a year (1-4 quarters).

# Analysis / Visualisation

Now, we are ready to check the outputs we have.

---
Objectives: what I am interested in seeing:
1. How influenza fluctuated at a weekly level (for Europe and Poland)
1. How Covid fluctuated at a weekly level (for Europe and Poland)
1. Top 10 countries with Flu / Covid
1. When we had the biggest increase in influenza cases (which period)
1. When we had the biggest increase in Covid cases (Europe and Poland)
1. Finally, how many flu cases we had once Covid arrived
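
For the "Top 10 countries" objective, one possible approach is to aggregate per country and keep the largest totals. The sketch below is only an illustration: `top_countries` is a hypothetical helper, `sample` is made-up data, and the `how` parameter is an assumption that lets weekly counts be summed while running totals (if that is what the Covid columns hold) use `'max'` instead.

```python
import pandas as pd

def top_countries(table, column, n=10, how="sum"):
    """Hypothetical helper: the n countries with the highest aggregated `column`."""
    return table.groupby("Country")[column].agg(how).nlargest(n)

# Made-up data for illustration only
sample = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C"],
    "Detected_FluCases": [5, 7, 20, 1, 3],
})

top2 = top_countries(sample, "Detected_FluCases", n=2)
print(top2)
```

With the real data this could be called as `top_countries(df, 'Detected_FluCases')`.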

In [None]:
def WeeklyFluChart(table, where):
    """
    THE CHART SHOWS WEEKLY FLU CASE DETECTIONS, ONE LINE PER YEAR

    table: dataframe with influenza cases (DataFrame)
    where: region or country we are interested in. Required for the chart title (String/Object)
    """
    years = sorted(table.Year.unique())
    color_per_year = ['green', 'blue', 'yellow', 'orange', 'purple', 'red']

    # Pass figsize directly: rcParams set after plt.subplots() only affect later figures
    fig, ax1 = plt.subplots(figsize=(25,15))
    plt.xticks(fontsize=16, rotation=45)
    plt.grid(color='grey', linestyle = '--', linewidth = 0.5)

    for number, year in enumerate(years):
        # Cycle through the palette so more than six years does not raise an IndexError
        color = color_per_year[number % len(color_per_year)]

        # Sum cases per week; x and y come from the same series, so they stay aligned
        weekly = table[table.Year==year].groupby('Week')['Detected_FluCases'].sum().sort_index()
        plt.plot(weekly.index, weekly.values, color=color)

    # Titles, labels and legend only need to be set once, after all lines are drawn
    plt.title(f"Influenza Cases per year in {where}", fontsize=28)
    plt.xlabel('Weeks', fontsize=24)
    plt.ylabel("Detected Flu Cases", color='black', fontsize=24)
    plt.tick_params(axis='y', labelcolor='black', labelsize=16)
    plt.legend(years, fontsize=16)

### #1. How influenza fluctuated at a weekly level (for Europe and Poland)

In [None]:
WeeklyFluChart(df, 'Europe')

As we can see above, in 2021 the number of flu cases is zero or close to zero. Let's see it in a table.

In [None]:
df.groupby(['Year'])['Detected_FluCases'].sum().reset_index().sort_values(['Year'])

In [None]:
df[df.Year==2019].groupby(['Quarter', 'Year'])['Detected_FluCases'].sum().reset_index().sort_values(['Year','Quarter'])

In [None]:
df[df.Year==2018].groupby(['Quarter', 'Year'])['Detected_FluCases'].sum().reset_index().sort_values(['Year','Quarter'])

The highest influenza detection was in 2019 and 2018. In 2021 the number was extremely low. Of course 2021 is not over yet, but the biggest increase in detections occurs between the 5th and 10th week of a year, and in 2021 we cannot see any growth in that period.
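
The period with the biggest increase can also be located numerically rather than read off the chart. A minimal sketch, assuming "increase" means the change in weekly detections compared with the previous week of the same year; `sample` is made-up data standing in for `df`:

```python
import pandas as pd

# Made-up weekly flu counts for a single year, for illustration only
sample = pd.DataFrame({
    "Year": [2018] * 5,
    "Week": [4, 5, 6, 7, 8],
    "Detected_FluCases": [100, 150, 400, 420, 300],
})

# Total detections per (Year, Week), in chronological order
weekly = sample.groupby(["Year", "Week"])["Detected_FluCases"].sum().sort_index()

# Change versus the previous week, computed separately within each year
increase = weekly.groupby(level="Year").diff()

# Index label (Year, Week) of the largest week-over-week increase
peak_year, peak_week = increase.idxmax()
print(peak_year, peak_week, increase.max())
```

The first week of each year has no previous week to compare with, so its value is `NaN` and is skipped by `idxmax`.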

---
How does it look for Poland?

In [None]:
WeeklyFluChart(df[df.Country=='Poland'],'Poland')

The chart looks similar to the one for Europe. However, the biggest growth was in 2016, and we can see only a small increase in flu detection after the 50th week (Europe has a larger number of cases at that time).
What Europe and Poland have in common is almost no influenza detection in 2021.

### #2. How Covid fluctuated at a weekly level (for Europe and Poland)
We probably know that Covid arrived in 2020, so we do not need all of the years we had in the flu table. But let's check it first.

In [None]:
df[df.Confirmed>0].Year.unique()

In [None]:
df[df.Year>=2020].groupby(['Week', 'Year'])['Confirmed'].sum().reset_index().sort_values(['Year','Week'])

From now on I want to see some Covid outputs on charts and compare them to flu, so I decided to remove weeks from my dataframe and keep only years and quarters.

In [None]:
df_q = df[['Country','Year','Quarter','Confirmed','Deaths','Recovered','Detected_FluCases']]\
.groupby(['Country','Year','Quarter'])\
.agg({'Confirmed':'max','Deaths':'max','Recovered':'max','Detected_FluCases':'sum'})\
.reset_index().sort_values(['Country','Year','Quarter'])
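
The aggregation above takes `max` for the Covid columns and `sum` for flu. That choice makes sense if the Covid columns are running totals while flu detections are new cases per week; this is an assumption, as the data description does not say so explicitly. A small sketch with made-up numbers shows the difference:

```python
import pandas as pd

# Three made-up weeks in one quarter, for illustration only
weekly = pd.DataFrame({
    "Country": ["Poland"] * 3,
    "Year": [2020] * 3,
    "Quarter": [4] * 3,
    "Confirmed": [100, 250, 400],      # assumed running total
    "Detected_FluCases": [3, 5, 2],    # new detections each week
})

quarterly = (
    weekly.groupby(["Country", "Year", "Quarter"])
    .agg({"Confirmed": "max", "Detected_FluCases": "sum"})
    .reset_index()
)
print(quarterly)
```

Summing a running total would count the same cases several times (750 instead of 400 here), which is why `max` is used for those columns under this assumption.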

In [None]:
def CovidChart(table,where):
    """
    THE CHART SHOWS CONFIRMED COVID CASES AT A QUARTERLY LEVEL

    table: dataframe with covid cases (DataFrame)
    where: region or country we are interested in. Required for the chart title (String/Object)
    """
    # One row per (Year, Quarter), in chronological order
    tbl = table[table.Year>=2020].groupby(['Year','Quarter'])\
    .agg({'Confirmed':'sum', 'Deaths':'sum', 'Recovered':'sum','Detected_FluCases':'sum'})\
    .sort_index().reset_index()

    # Create a new column for Year and Quarters
    tbl['YearQuarter'] = tbl.Year.astype(str)+'-Q'+tbl.Quarter.astype(str)

    # Pass figsize directly: rcParams set after plt.subplots() only affect later figures
    fig, ax1 = plt.subplots(figsize=(25,15))
    plt.xticks(fontsize=14, rotation=45)
    plt.grid(color='grey', linestyle = '--', linewidth = 0.5)

    # tbl already holds one sorted row per quarter, so no second aggregation is needed
    x = tbl['YearQuarter']
    y = tbl['Confirmed']
    plt.title(f"Covid19 Cases per quarter in {where}", fontsize=28)
    plt.xlabel('Quarters', fontsize=18)
    plt.ylabel("Detected Covid19 Cases", color='black', fontsize=18)
    plt.bar(x, y, color='grey')
    plt.tick_params(axis='y', labelcolor='black', labelsize=16)

    # Write the value above each bar
    for xx,yy in zip(x,y):
        label = f'{yy:,}'
        plt.annotate(label, (xx,yy), textcoords="offset points", xytext=(0,10), ha='center', fontsize=16)

In [None]:
CovidChart(df_q,'Europe')

We can see above that there was a big increase in Covid cases in Q4 2020, and Q1 2021 was not much better. However, a big improvement is visible in Q2 2021.

In [None]:
CovidChart(df_q[df_q.Country=='Poland'],'Poland')

In Poland we have a trend quite similar to the one in Europe.<br>
Now we can create one chart containing both influenza and Covid cases. We need to keep in mind that it only shows the trend, because the Covid pandemic involves a much bigger number of cases.

In [None]:
# Scale the case counts so the chart values are easier to read
def CovidCasesDevider(table_name):
    case_columns = ['Confirmed','Deaths','Recovered','Detected_FluCases']

    # Work on a copy so the caller's dataframe is not modified
    table_name = table_name.copy()

    # Number of cases divided by 1000 and rounded to 2 decimal places
    table_name[case_columns] = (table_name[case_columns] / 1000).round(2)

    # Weeks/years are not needed in the chart - a year-quarter label is enough
    # (the column name 'YearQuater' is kept as-is because later cells use it)
    table_name['YearQuater'] = table_name.Year.astype(str)+'-Q'+table_name.Quarter.astype(str)
    table_name = table_name.drop(["Year","Quarter"],axis=1)

    return table_name

In [None]:
# Same scaling for a single country; the logic is identical, so reuse CovidCasesDevider
def CovidCasesDevider_Country(table_name):
    return CovidCasesDevider(table_name)

---
For the charts I do not need weeks; quarters are enough.

In [None]:
# for all countries - TOTAL
chart_1 = CovidCasesDevider(df_q.groupby(['Year','Quarter'])\
.agg({'Confirmed':'sum', 'Deaths':'sum', 'Recovered':'sum','Detected_FluCases':'sum'})\
.sort_values(['Year','Quarter']).reset_index())

In [None]:
chart_1.head(2)

In [None]:
# for chosen country
chosen_country = 'Poland'

chart_2 = CovidCasesDevider(df_q[df_q.Country==chosen_country].groupby(['Year','Quarter']).\
agg({'Confirmed':'sum', 'Deaths':'sum', 'Recovered':'sum','Detected_FluCases':'sum'}).\
sort_values(['Year','Quarter']).reset_index())

In [None]:
chart_2.head(2)

In [None]:
def VirusComparison(table,where):
    """
    THE CHART SHOWS A QUARTERLY COMPARISON OF DETECTED COVID AND FLU CASES
    
    table: dataframe with detected cases (DataFrame)
    where: country we are interested in. Required for a chart title (String/Object)
    """
    # Chart Creation
    # Pass figsize directly: rcParams set after plt.subplots() only affect later figures
    fig, ax1 = plt.subplots(figsize=(25,15))
    plt.xticks(fontsize=16, rotation=45)
    plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)
    
    # Data Aggregation
    chart = CovidCasesDevider(table.groupby(['Year','Quarter'])\
    .agg({'Confirmed':'sum', 'Deaths':'sum', 'Recovered':'sum','Detected_FluCases':'sum'})\
    .sort_values(['Year','Quarter']).reset_index())
    
    x = chart.YearQuater.unique()
    y = chart['Confirmed']
    z = chart['Detected_FluCases']

    ##############
    # Covid19:
    ax1.set_title(f"Covid and Flu Cases Comparison in {where} ('000)", fontsize=28)
    color = 'tab:red'
    ax1.set_xlabel('Periods', fontsize=24)
    ax1.set_ylabel("Confirmed Covid19 Cases ('000)", color=color, fontsize=24)
    ax1.plot(x, y, color=color)
    ax1.tick_params(axis='y', labelcolor=color, labelsize=16) 

    for xx,yy in zip(x,y):
        label = "{:.0f}".format(yy)
        ax1.annotate(label, (xx,yy), textcoords="offset points", xytext=(0,10), ha='right', fontsize=20)  # annotate the Covid axis explicitly
    
    ##############
    # Influenza:
    ax2 = ax1.twinx()  
    color = 'tab:blue'
    ax2.set_ylabel("Detected Influenza Cases ('000)", color=color, fontsize=24)  # we already handled the x-label with ax1
    ax2.plot(x, z, color=color)
    ax2.tick_params(axis='y', labelcolor=color, labelsize=16)


    for xx,zz in zip(x,z):
        label = "{:.1f}".format(zz)
        ax2.annotate(label,(xx,zz),textcoords="offset points",xytext=(0,10),ha='right', fontsize=20)  # annotate the flu axis explicitly


    fig.tight_layout()  
    plt.show()

In [None]:
VirusComparison(df_q,'Europe')

A relationship between Covid and flu is visible: once the pandemic came to Europe, influenza almost disappeared.
Please keep in mind that the volumes in the chart are divided by 1000.<br>
So, the highest number of flu cases was in Q1 2018: **184,900 cases**<br>
On the other hand, the highest number of Covid19 cases was in Q2 2021: **52,285,000 cases**
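
Since the chart values are in thousands, the absolute figures come from multiplying by 1000. A minimal sketch of that lookup: the frame `chart` below is a made-up stand-in for the scaled table, in which only the two peak values (184.9 and 52285) are taken from the text above and every other number is illustrative.

```python
import pandas as pd

# Stand-in for the scaled quarterly table; values are in thousands
chart = pd.DataFrame({
    "YearQuater": ["2018-Q1", "2018-Q2", "2021-Q2"],
    "Confirmed": [0.0, 0.0, 52285.0],
    "Detected_FluCases": [184.9, 20.0, 0.1],
})

# Row with the highest flu value, converted back to absolute cases
flu_peak = chart.loc[chart["Detected_FluCases"].idxmax()]
flu_peak_cases = int(round(flu_peak["Detected_FluCases"] * 1000))

# Same for confirmed Covid cases
covid_peak = chart.loc[chart["Confirmed"].idxmax()]
covid_peak_cases = int(round(covid_peak["Confirmed"] * 1000))

print(flu_peak["YearQuater"], f"{flu_peak_cases:,}")
print(covid_peak["YearQuater"], f"{covid_peak_cases:,}")
```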

In [None]:
chosen_country = 'Poland' 


VirusComparison(df_q[df_q.Country==chosen_country],chosen_country)

For Poland the chart shows a similar trend.

# Conclusion:

Based on the data I have prepared, we can say that the Covid-19 pandemic affected influenza detection.
However, flu was never (at least since 2016) even close to the scale of the Covid19 pandemic across Europe.

The scale of the coronavirus pandemic is extremely large, and there can be no doubt that it has affected us.
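
One way to put a number on the drop in flu detection is to compare each year's total with the pre-2020 average. This is only a sketch: `sample` holds made-up totals, and using the mean of the years before 2020 as the baseline is an assumption, not something computed in this notebook.

```python
import pandas as pd

# Made-up yearly flu totals, for illustration only
sample = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "Detected_FluCases": [1000, 900, 400, 10],
})

yearly = sample.groupby("Year")["Detected_FluCases"].sum()

# Baseline: average yearly total before Covid arrived
pre_covid_mean = yearly[yearly.index < 2020].mean()

# Each year's total as a share of that baseline
relative = (yearly / pre_covid_mean).round(2)
print(relative)
```

On the real data, `sample` would be replaced by `df`; note that a partial year such as 2021 is not directly comparable with full years.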