# Objectives

In my whole life I have never met a person who has or has had cholera. I find it interesting to investigate this disease, because it might be somewhere around me without my knowing it.

In this notebook I want to build:
- a distribution plot for my country (Ukraine) to find out how many cases there have been.
- map plots using plotly to see which regions are the most active.

In [None]:
import pandas as pd
import plotly.graph_objects as go

## Checking data

In this chapter I'll load the data, review the dataset and fix possible errors, which is needed to compute correct statistics and build correct visuals.

In [None]:
cholera = pd.read_csv('../input/cholera-dataset/data.csv')

In [None]:
cholera.head()

In [None]:
# Making column names shorter
cholera.rename(columns = {'Number of reported cases of cholera': 'Cases', 
                          'Number of reported deaths from cholera': 'Deaths', 
                          'Cholera case fatality rate': 'Fatality rate', 
                          'WHO Region': 'Region'}, inplace = True)

In [None]:
cholera.info()

In [None]:
# Checking if 'Fatality rate' can be calculated based on 'Cases' and 'Deaths'
cholera[(cholera['Fatality rate'].isnull()) & (~cholera['Cases'].isnull()) & (~cholera['Deaths'].isnull())]

There are missing values in the 'Cases', 'Deaths' and 'Fatality rate' columns, but I'm not able to find or calculate these data.
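
To put numbers on the gaps mentioned above, one option is to count the missing values per column and list the rows where the rate could in principle be recomputed. The sketch below runs on a small made-up frame (`toy` is hypothetical, not the real dataset) that only borrows the renamed column names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cholera frame (values are made up for illustration)
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Cases': ['10', np.nan, '3', '7'],
    'Deaths': ['1', np.nan, np.nan, '0'],
    'Fatality rate': ['10.0', np.nan, np.nan, np.nan],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Rows where the rate is missing although both inputs are present,
# i.e. rows where it could in principle be recomputed
recoverable = toy[toy['Fatality rate'].isna()
                  & toy['Cases'].notna()
                  & toy['Deaths'].notna()]
print(missing_per_column)
print(recoverable)
```

If `recoverable` is empty on the real data, as the check above suggests, there is nothing to fill in from the other two columns.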

Let's change some data types.

In [None]:
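# Checking for non-numerical values in the Cases, Deaths and Fatality rate columns
# (regex=False treats '.' as a literal dot rather than as "any character")
cholera[(~cholera['Cases'].fillna('0').str.replace(' ', '', regex=False).str.isnumeric()) | (~cholera['Deaths'].fillna('0').str.replace('.', '', regex=False).str.isnumeric()) | (~cholera['Fatality rate'].fillna('0').str.replace('.', '', regex=False).str.isnumeric())]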

- There is no information, either in the dataset or on the Internet, that anybody died of cholera in Germany in 2016, so I'm going to replace the 'Unknown' values with '0'.
- For Iraq I'll keep only '3' as the number of cases, because only 3 cases were reported to [WHO](https://www.who.int/gho/epidemic_diseases/cholera/cases/en/)
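
The kind of cleanup described above can be sketched on a toy Series. The values below are made up, and the assumption is that spaces in the case counts act as thousands separators. The point of the sketch is that `Series.replace` with a dict matches whole cell values, whereas `.str.replace` also rewrites substrings inside other numbers.

```python
import numpy as np
import pandas as pd

# Made-up raw strings that mimic the problems described above
raw_cases = pd.Series(['1 234', '3 5', '23 500', np.nan])

# Series.replace with a dict matches whole cell values only,
# so '23 500' is left alone while the lone '3 5' entry becomes '3'
fixed = raw_cases.replace({'3 5': '3'})

# Drop the spaces used as thousands separators, then convert to float
cases = fixed.str.replace(' ', '', regex=False).astype('float')
print(cases.tolist())
```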

In [None]:
# Fixing data and changing types to floats.
# Series.replace with a dict matches whole cell values only, so other numbers
# that merely contain '3 5' or '0 0' (e.g. '23 500') are left untouched.
cholera['Cases'] = cholera['Cases'].replace({'3 5': '3'}).str.replace(' ', '', regex=False).astype('float')
cholera['Deaths'] = cholera['Deaths'].replace({'Unknown': '0', '0 0': '0'}).astype('float')
cholera['Fatality rate'] = cholera['Fatality rate'].replace({'Unknown': '0', '0.0 0.0': '0'}).astype('float')

In [None]:
cholera.describe()

How is it possible that the fatality rate is more than 100%? Let's check where the problem is:

In [None]:
cholera[cholera['Fatality rate'] > 100]

[This WHO report](https://www.who.int/csr/resources/publications/surveillance/en/cholera.pdf) says that there were no deaths from cholera in Europe in 1998, so I'm going to correct the data for Italy.
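
A case fatality rate is deaths divided by cases, times 100, so a value above 100% means that more deaths than cases were recorded. A minimal sketch of that sanity check, on made-up numbers rather than the real dataset:

```python
import pandas as pd

# Toy data: country B has more deaths than cases, which cannot be right
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Cases': [200.0, 4.0, 50.0],
    'Deaths': [10.0, 5.0, 0.0],
})

# Case fatality rate in percent: deaths per 100 reported cases
toy['Recomputed rate'] = toy['Deaths'] / toy['Cases'] * 100

# More deaths than cases is impossible, so these rows need a manual check
suspicious = toy[toy['Deaths'] > toy['Cases']]
print(suspicious)
```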

In [None]:
# Row 1094 is the Italy record returned by the filter above
cholera.loc[1094, 'Deaths'] = 0
cholera.loc[1094, 'Fatality rate'] = 0

# Numbers

In this chapter I'd like to find numbers such as:
- the total number of cases and deaths;
- the top 10 countries by number of cholera cases and deaths;
- the top 10 years with the biggest outbreaks;
- statistics for recent years.
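
All of these figures follow the same pattern: group, sum, then take the largest values. A small sketch on made-up data (the `toy` frame is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'C', 'C'],
    'Year': [2000, 2001, 2000, 2000, 2001],
    'Cases': [5.0, 7.0, 30.0, 1.0, float('nan')],
})

# Total cases per country; sum() skips missing values by default
cases_per_country = toy.groupby('Country')['Cases'].sum()

# nlargest(n) is shorthand for sort_values(ascending=False).head(n)
top_2 = cases_per_country.nlargest(2)
print(top_2)
```

Because `sum()` skips missing values, a country with unreported years gets a total that is only a lower bound.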

In [None]:
# Total number of Cases
cholera['Cases'].sum()

In [None]:
# Top 10 countries by number of cases
cholera.groupby(['Country'])['Cases'].sum().sort_values(ascending = False).head(10)

In [None]:
# Total number of deaths
cholera['Deaths'].sum()

In [None]:
# Top 10 countries by number of deaths
cholera.groupby(['Country'])['Deaths'].sum().sort_values(ascending = False).head(10)

In [None]:
# 10 years with the biggest outbreaks
cholera.groupby(['Year'])['Cases'].sum().sort_values(ascending = False).head(10)

In [None]:
# Cases per year and region from 2011 onwards
cholera[cholera['Year'] > 2010].groupby(['Year', 'Region'])['Cases'].sum().sort_index(ascending = [False, True])
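
If the window should follow the data instead of a hard-coded year, the cut-off can be derived from the latest year present. The sketch below assumes a hypothetical window size `n_years` and uses made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2016],
    'Region': ['X', 'X', 'Y', 'X', 'Y', 'X', 'X', 'Y'],
    'Cases': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

n_years = 5  # hypothetical window size
latest = toy['Year'].max()

# Keep the n_years most recent calendar years (here 2012 to 2016)
recent = toy[toy['Year'] > latest - n_years]

# Newest year first, regions in alphabetical order within each year
summary = (recent.groupby(['Year', 'Region'])['Cases']
                 .sum()
                 .sort_index(ascending=[False, True]))
print(summary)
```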

# Visualisation

In this chapter I'm going to create map plots that visualise the cholera situation in the world. Here you will find:
- a map with the total number of cholera cases per country.
- a map with the total number of cholera deaths per country.
- an animation showing how cholera spread over time.
- a bar plot with the top 15 fatality rates (average rate per country).
- statistics for Ukraine.

Let's take a look at the total number of cholera cases (since 1949) in every country.

In [None]:
import plotly.graph_objects as go

fig = go.Figure(data=go.Choropleth(    
    locations = cholera.groupby('Country')['Cases'].sum().index,
    locationmode = "country names",
    z = cholera.groupby('Country')['Cases'].sum(),
    #text = cholera.groupby('Country')['Cases'].sum().index,
    colorscale = 'Reds_r',
    autocolorscale=False,
    reversescale=True,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = 'Number of cases',
))

fig.update_layout(
    title_text='Cholera Cases',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='natural earth'
    )
)

fig.show()

And the total number of cholera deaths in every country.

In [None]:
import plotly.graph_objects as go

fig = go.Figure(data=go.Choropleth(    
    locations = cholera.groupby('Country')['Deaths'].sum().index,
    locationmode = "country names",
    z = cholera.groupby('Country')['Deaths'].sum(),
    colorscale = 'Hot',
    autocolorscale=False,
    reversescale=True,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = 'Number of deaths',
))

fig.update_layout(
    title_text='Cholera Deaths',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='natural earth'
    )
)

fig.show()

Below you can see an animation that shows how cholera has spread since 1949.

In [None]:
import plotly.express as px

# Drop rows with missing values once and reuse the result
cases_complete = cholera.dropna()
fig = px.scatter_geo(cases_complete, 
                     locations = "Country", locationmode = "country names", 
                     color="Cases", color_continuous_scale = "Reds",
                     hover_name="Country", 
                     size = "Cases", size_max = 50,
                     animation_frame = "Year",
                     category_orders = {"Year": list(range(1949,2017))},
                     projection="natural earth"                    
                    )
fig.update_geos(
    showframe=False,
    showcoastlines=False,
    showcountries=True, 
    countrycolor="White")
fig.show()
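
One thing to keep in mind about the animation: `dropna()` without arguments removes a row when any column is missing, so a year with a known case count but an unknown number of deaths disappears from the map. Whether that matters depends on how many such rows the real data has. The sketch below shows the difference on made-up data:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Year': [2000, 2000, 2001],
    'Cases': [10.0, np.nan, 4.0],
    'Deaths': [1.0, 2.0, np.nan],
})

# dropna() removes a row if ANY column is missing
all_complete = toy.dropna()

# dropna(subset=...) only requires the columns the plot actually uses
cases_known = toy.dropna(subset=['Cases'])
print(len(all_complete), len(cases_known))
```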

Let's find out which countries have the highest average fatality rate.

In [None]:
# Average fatality rate per country, top 15, sorted ascending for the horizontal bars
top_rates = cholera.groupby(['Country'])['Fatality rate'].mean().sort_values(ascending = False).head(15).sort_values()

fig = go.Figure(data=[
     go.Bar(name='Fatality rate', x = top_rates, y = top_rates.index,
            orientation='h', marker_color='indianred')
])

fig.update_layout(title_text='Cholera case fatality rate', height = 500)
fig.show()
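
Note that the chart above averages the yearly rates, so a year with only a handful of cases weighs as much as a year with thousands. An alternative is the pooled rate, total deaths over total cases. The sketch below compares the two on made-up numbers. It illustrates the difference and is not a claim about which countries would change rank in the real data.

```python
import pandas as pd

toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'Cases': [2.0, 998.0, 100.0, 100.0],
    'Deaths': [1.0, 9.0, 5.0, 5.0],
})
toy['Fatality rate'] = toy['Deaths'] / toy['Cases'] * 100

# Unweighted mean of the yearly rates (what the bar chart above uses)
mean_rate = toy.groupby('Country')['Fatality rate'].mean()

# Pooled rate: total deaths over total cases, in percent
totals = toy.groupby('Country')[['Cases', 'Deaths']].sum()
pooled_rate = totals['Deaths'] / totals['Cases'] * 100
print(mean_rate, pooled_rate, sep='\n')
```

For country A the single tiny outbreak (1 death in 2 cases) pulls the mean rate above 25%, while the pooled rate stays at 1%.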

As the last step, I'm going to check the situation in my country, Ukraine.

In [None]:
# Select the rows for Ukraine once and reuse them for both bar traces
ukraine = cholera[cholera['Country'] == 'Ukraine']
fig = go.Figure(data=[
    go.Bar(name='Deaths', x = ukraine['Year'], y = ukraine['Deaths'], marker_color='lightslategray'), 
    go.Bar(name='Cases', x = ukraine['Year'], y = ukraine['Cases'], marker_color='indianred')
])
fig.update_layout(barmode='stack', title_text='Cholera in Ukraine', height = 400, width = 600)
fig.show()

Thank you for reviewing. I hope you found some interesting insights!

Please rate this notebook and leave comments with your opinion.