# Tuberculosis - Data Visualization

Tuberculosis (TB) is caused by the bacterium Mycobacterium tuberculosis and most often affects the lungs. It spreads through the air when people with lung TB cough, sneeze or spit. A person needs to inhale only a few germs to become infected.

Every year, 10 million people fall ill with TB. Although the disease is preventable and curable, 1.5 million people die from it each year, making it the world’s top infectious killer.

TB is the leading cause of death among people with HIV and a major contributor to antimicrobial resistance.

# WHO TB burden estimates
**This includes WHO-generated estimates of TB mortality, incidence (including disaggregation by age and sex and incidence of TB/HIV), case fatality ratio, treatment coverage (previously called case detection rate), proportion of TB cases that have rifampicin-resistant TB (RR-TB, which includes cases with multidrug-resistant TB, MDR-TB), RR/MDR-TB among notified pulmonary TB cases and latent TB infection among children aged under 5.**
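
Two of the indicators listed above are simple ratios, and a rough sketch may make them concrete. The function names are hypothetical, and the formulas are simplified point-estimate versions (deaths divided by incident cases, and notified cases divided by estimated incidence); published estimates also carry uncertainty intervals that this sketch ignores. The 10 million and 1.5 million figures are the ones quoted in the introduction, while the 70 and 100 passed to the coverage function are toy numbers, not data.

```python
def case_fatality_ratio(deaths, incident_cases):
    """Approximate case fatality ratio: deaths / incident cases in the same period."""
    if incident_cases <= 0:
        raise ValueError("incident_cases must be positive")
    return deaths / incident_cases


def treatment_coverage(notified_cases, estimated_incidence):
    """Approximate treatment coverage: notified cases / estimated incident cases."""
    if estimated_incidence <= 0:
        raise ValueError("estimated_incidence must be positive")
    return notified_cases / estimated_incidence


# Figures from the introduction: 1.5 million deaths, 10 million new cases per year
cfr = case_fatality_ratio(1.5e6, 10e6)
print(f"Approximate global case fatality ratio: {cfr:.0%}")

# Toy numbers for illustration only
coverage = treatment_coverage(70, 100)
print(f"Toy treatment coverage: {coverage:.0%}")
```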





In [None]:
from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

In [None]:

import numpy as np 
import pandas as pd 

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



In [None]:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
%matplotlib inline

The data used here is provided by the WHO.

In [None]:
df = pd.read_csv(r"../input/TB_burden_age_sex_2020-06-14.csv")
df.head()

In [None]:
df.info()

In [None]:
# This function is used a few times to display bar values in bar plots
def show_values_on_bars(axs, *args, **kwargs):
    """
    Function based on Sharon Soussan's answer to:
    https://stackoverflow.com/questions/43214978/seaborn-barplot-displaying-values
    Writes each bar's height as a text label on the plot.
    Pass height=<y> to set the y position of the labels (default: 10).
    """
    # y position shared by all labels
    _y = kwargs.get('height', 10)

    def _show_on_single_plot(ax):
        for p in ax.patches:
            # Center the label horizontally on the bar
            _x = p.get_x() + p.get_width() / 2
            value = f'{p.get_height()}'
            ax.text(_x, _y, value, fontsize=14, ha="center")

    if isinstance(axs, np.ndarray):
        # np.ndenumerate accepts only the array itself
        for idx, ax in np.ndenumerate(axs):
            _show_on_single_plot(ax)
    else:
        _show_on_single_plot(axs)
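
For comparison, matplotlib 3.4 and later provide `Axes.bar_label`, which places a label at the end of each bar rather than at a fixed height. The sketch below is an alternative to the helper above, not a drop-in replacement; the category names and counts are toy values, not figures from the WHO file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt

# Toy data for illustration only
categories = ["group A", "group B"]
counts = [3, 5]

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(categories, counts, color="#4287f5", edgecolor="black", alpha=0.5)

# bar_label returns the text annotations it creates, one per bar
labels = ax.bar_label(bars, fontsize=14, padding=2)
label_texts = [label.get_text() for label in labels]
print(label_texts)
```

For a seaborn bar or count plot, the bar containers are typically available as `ax.containers`, so the same call can be applied to each of them.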

In [None]:
# Setting the style for my plots
sns.set_style(style='ticks')

# Visualization based on sex

In [None]:
df['sex'].value_counts()

In [None]:
sex_palette = sns.color_palette(["#4287f5", "#bd3c3c"])

fig = plt.figure(figsize=(16,8), constrained_layout=True)
gs = gridspec.GridSpec(nrows=3, ncols=4, figure=fig)


# Male/Female Totals (number of records per sex category)
ax1 = fig.add_subplot(gs[0, 0:2])
sns.countplot(x='sex', data=df, palette=sex_palette, edgecolor=sns.color_palette(["#000"]), alpha=0.5, ax=ax1)
show_values_on_bars(ax1)
# countplot's y-axis is the number of rows per category
ax1.set_ylabel('count')





# Visualization based on age group

In [None]:

df['age_group'].value_counts()



In [None]:
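# Labels missing from the dict become NaN, so compare the keys with the value_counts() output above
df['sexB'] = df['sex'].map({'male': 1,'female': 0})
# Country names have no natural order; these integer codes only make the column numeric
df['countryNum'] = df['country'].astype('category').cat.codes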
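
Dictionary mapping only works when every label in the column appears as a key; anything else becomes NaN. A label-agnostic sketch, assuming nothing about which values the real columns hold, is to let pandas assign integer codes. The toy frame and the `_code` column names below are hypothetical.

```python
import pandas as pd

# Toy frame for illustration only; these are not rows from the WHO file
toy = pd.DataFrame({
    "sex": ["m", "f", "f", "m"],
    "country": ["Chad", "Peru", "Chad", "Fiji"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# A dict whose keys match no label maps every row to NaN
unmatched = toy["country"].map({"S": 0, "C": 1, "Q": 2})
print(unmatched.isna().all())

# Categorical codes: one integer per distinct label, assigned in sorted label order
toy["sex_code"] = toy["sex"].astype("category").cat.codes
toy["country_code"] = toy["country"].astype("category").cat.codes

# Correlate only the numeric columns
corr = toy.select_dtypes(include="number").corr()
print(corr.round(2))
```

Integer codes for a nominal column such as country impose an arbitrary order, so any correlation involving them should be read with caution.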

In [None]:
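# Restrict to numeric columns; correlation is not defined for text columns
sns.heatmap(df.select_dtypes(include='number').corr(), cmap='Pastel1')
plt.title('Correlation', fontsize=24)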

In [None]:
df['country'].value_counts()