In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import warnings
import glob
import os

In [None]:
warnings.filterwarnings('ignore')

Read all the CSV files

In [None]:
for data_file in glob.glob('../input/who-worldhealth-statistics-2020-complete/*'):
    df_name = os.path.basename(data_file).split('.')[0] + '_df'
    globals()[df_name] = pd.read_csv(data_file)
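
Writing into `globals()` works in a notebook, but a plain dictionary keyed by file name is easier to inspect and iterate over. A minimal sketch of that alternative, assuming a directory of CSV files; `load_csvs` is a hypothetical helper and not part of this notebook:

```python
import glob
import os

import pandas as pd


def load_csvs(directory):
    """Read every CSV in `directory` into a dict keyed by file name without extension.

    Hypothetical helper for illustration only.
    """
    frames = {}
    for path in sorted(glob.glob(os.path.join(directory, '*.csv'))):
        stem = os.path.splitext(os.path.basename(path))[0]
        frames[stem] = pd.read_csv(path)
    return frames
```

With this approach a file such as `30-70cancerChdEtc.csv` would be reachable as `frames['30-70cancerChdEtc']`, even though that name is not a valid Python identifier.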

Keeping only the rows for India

In [None]:
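# Snapshot globals() so the dict is not mutated while we iterate over it.
all_vars = globals().copy()
for var in all_vars:
    if '_df' in var:
        if '30' not in var:
            print('#' * 100)
            print(var + ':')
            print(all_vars[var].columns)
            # .copy() so later column assignments do not act on a slice of the full frame.
            globals()[var] = all_vars[var][all_vars[var].Location == 'India'].copy()
        else:
            # '30-70cancerChdEtc_df' is not a valid identifier, so load this file under an explicit name.
            cancer_df = pd.read_csv("../input/who-worldhealth-statistics-2020-complete/30-70cancerChdEtc.csv")
            cancer_df = cancer_df[cancer_df.Location == 'India']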

Let's go alphabetically (by CSV file name).

## Death Rate

In [None]:
cancer_df = cancer_df.rename(columns={'Dim1': 'Sex', 'First Tooltip': 'Mortality %'});
px.line(data_frame=cancer_df, x='Period', y='Mortality %', color='Sex', title=cancer_df.Indicator.iloc[0], height=600, width=600)

The death rate from these diseases has decreased for both sexes. This may be due to improvements in the country's overall healthcare.

## Life Expectancy

In [None]:
HALElifeExpectancyAtBirth_df = HALElifeExpectancyAtBirth_df.rename(columns={'Dim1': 'Sex', 'First Tooltip': 'Healthy life expectancy'});
px.line(data_frame=HALElifeExpectancyAtBirth_df, x='Period', y='Healthy life expectancy', color='Sex', title=HALElifeExpectancyAtBirth_df.Indicator.iloc[0], height=600, width=600)

Again, arguably due to improving healthcare, healthy life expectancy increased by 13% from 2000 to 2019.

## Teen Birth Rate

In [None]:
adolescentBirthRate_df = adolescentBirthRate_df.rename(columns={'First Tooltip': 'ABR'});
px.bar(data_frame=adolescentBirthRate_df, x='Period', y='ABR', title=adolescentBirthRate_df.Indicator.iloc[0], height=600, width=600)

<b>Adolescent Birth Rate (here: ABR)</b> is the number of births per year to girls aged 15-19, per 1,000 girls in that age group.

Although the overall trend is downward, there are two sudden drops: from 2000 to 2001 (-38%) and from 2014 to 2015 (-59%). I don't know what caused them.
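
Year-over-year drops like these can be located with `Series.pct_change`. A small sketch on made-up numbers (the values below are illustrative and not taken from the dataset; the -25% threshold is an arbitrary assumption):

```python
import pandas as pd

# Made-up ABR values indexed by year, for illustration only.
abr = pd.Series([50.0, 31.0, 30.0], index=[2000, 2001, 2002])

# Percentage change relative to the previous year.
yoy = abr.pct_change() * 100

# Keep only the years where the rate fell by more than 25%.
drops = yoy[yoy < -25]
print(drops)
```

On the real data, `abr` would be `adolescentBirthRate_df.set_index('Period')['ABR']` sorted by year.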

## Air Pollution

In [None]:
airPollutionDeathRate_df = airPollutionDeathRate_df.rename(columns={'First Tooltip': 'Death Rate'})

In [None]:
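# Strip the bracketed range after the number; regex=True is required on newer pandas versions.
airPollutionDeathRate_df['APDR'] = airPollutionDeathRate_df['Death Rate'].str.replace(r'\s\[.*\]', '', regex=True).astype(float)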
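
The pattern `\s\[.*\]` removes a whitespace character followed by a bracketed range, leaving only the leading number. A minimal sketch on made-up strings (the exact format of the real `First Tooltip` values is an assumption here):

```python
import pandas as pd

# Made-up values in an assumed "estimate [low-high]" format.
raw = pd.Series(['64.11 [55.7-72.3]', '12.5 [10.0-15.0]'])

# Drop the space plus everything inside the square brackets, then convert to float.
clean = raw.str.replace(r'\s\[.*\]', '', regex=True).astype(float)
print(clean.tolist())
```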

In [None]:
px.bar(data_frame=airPollutionDeathRate_df[airPollutionDeathRate_df.Dim2 != 'Total'], x='Dim1', y='APDR', color='Dim2', title=airPollutionDeathRate_df.Indicator.iloc[0], height=600, width=600)

There's something wrong here: each category (disease) is divided into two parts. When I investigated the dataframe, I found two APDR entries for rows whose plotted columns are otherwise identical, which is weird. If you know what I'm missing here, please let me know.
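
One way to investigate is to look at the rows that share the plotted keys and list which other columns differ between them. A sketch, assuming the cause is a column the plot does not use (for example `Indicator`); `differing_columns` is a hypothetical helper and the cause has not been verified against the file:

```python
import pandas as pd


def differing_columns(df, key_cols):
    """Return the non-key columns that vary within groups sharing the same key values."""
    # Rows whose key combination appears more than once.
    dupes = df[df.duplicated(subset=key_cols, keep=False)]
    # For each key group, count distinct values per remaining column.
    varying = dupes.groupby(key_cols).nunique().gt(1).any()
    return varying[varying].index.tolist()
```

Calling `differing_columns(airPollutionDeathRate_df, ['Dim1', 'Dim2'])` would then name the columns that distinguish the apparently duplicated rows.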

Anyway, the two conditions with the highest death rates attributable to air pollution are

1. Chronic obstructive pulmonary disease
2. Ischaemic heart disease

while lung cancers are comparatively rare.

## Alcohol Consumption

In [None]:
alcoholSubstanceAbuse_df = alcoholSubstanceAbuse_df.rename(columns={'Dim1': 'Sex', 'First Tooltip': 'Consumption'})

In [None]:
px.area(data_frame=alcoholSubstanceAbuse_df, x='Period', y='Consumption', color='Sex',
        title=alcoholSubstanceAbuse_df.Indicator.iloc[0],# barmode='group',
          width=600, height=600)

Alcohol was (and still is) a taboo topic in most middle-class households, but it is slowly being normalized. Alcohol consumption has boomed with the rise of western/pop culture in the country. Its popularity has risen so much that alcohol <a href="https://www.ft.com/content/a63328a9-42a9-43ee-9d51-97a77ac45c2e"> was chosen to generate funds</a> during the nationwide COVID-19 lockdown.

## Sanitation

In [None]:
atLeastBasicSanitizationServices_df = atLeastBasicSanitizationServices_df.rename(columns={'Dim1': 'Area', 'First Tooltip': '% Population'})

In [None]:
px.bar(data_frame=atLeastBasicSanitizationServices_df[atLeastBasicSanitizationServices_df.Area != 'Total'],
       x='Period', y='% Population', color='Area',
       height=600, width=600,
       title=atLeastBasicSanitizationServices_df.Indicator.iloc[0],
       barmode='group')

Okay. 50% of the urban population had basic sanitation in 2000, rising to 72% in 2017: a 44% increase, which is impressive. But only 3.7% of the rural population had it in 2000, and that became 53% in 2017, a more than fourteen-fold jump (an increase of over 1,300%)!
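
Percentage increase is (new - old) / old × 100. A short sketch using the rounded figures quoted above; `percent_increase` is a hypothetical helper:

```python
def percent_increase(old, new):
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100


# Rounded coverage figures as quoted in the text (2000 -> 2017).
urban = percent_increase(50, 72)
rural = percent_increase(3.7, 53)
print(f'urban: {urban:.0f}%, rural: {rural:.0f}%')
```

With these rounded inputs the rural figure comes out near 1,330%. The result is very sensitive to the small 2000 baseline, so the unrounded values in the dataframe would shift it noticeably.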

## Drinking Water

In [None]:
basicDrinkingWaterServices_df = basicDrinkingWaterServices_df.rename(columns={'First Tooltip': '% Population'})

In [None]:
px.line(basicDrinkingWaterServices_df,
        x='Period', y='% Population',
        title=basicDrinkingWaterServices_df.Indicator.iloc[0],
        height=600,
        width=600)

We see a steady, roughly linear increase. If the trend continues, the entire nation will have at least basic drinking water services by the year 2027, as shown by simple linear regression:

In [None]:
from sklearn.linear_model import LinearRegression

# Fit coverage (% population) as a linear function of the year.
model = LinearRegression()
model.fit(basicDrinkingWaterServices_df['Period'].values.reshape(-1, 1), basicDrinkingWaterServices_df['% Population'].values)

# Extrapolate from 2017 through 2027.
x_hat = np.arange(2017, 2028).reshape(-1, 1)
y_hat = model.predict(x_hat)

plt.rcParams['figure.figsize'] = 8, 8
plt.plot(basicDrinkingWaterServices_df.Period.values, basicDrinkingWaterServices_df['% Population'].values, label='Data')
plt.plot(x_hat.reshape(-1), y_hat.reshape(-1), label='Forecast')
# Mark the last forecast point (2027).
plt.scatter(x_hat.reshape(-1)[-1], y_hat.reshape(-1)[-1], marker='x', color='k')
# Pass the label positionally: newer Matplotlib versions no longer accept the `s` keyword.
plt.annotate('2027, 100% forecast', xy=(2024, 99.5))
plt.xlabel('Period')
plt.ylabel('% Population')
plt.legend();
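
The year at which a fitted line reaches 100% can also be solved for directly: if coverage = slope × year + intercept, then year = (100 - intercept) / slope. A minimal sketch with made-up coverage numbers and `np.polyfit` in place of scikit-learn (both choices are assumptions for illustration):

```python
import numpy as np

# Made-up coverage figures, for illustration only.
years = np.array([2000, 2005, 2010, 2015])
coverage = np.array([80.0, 84.0, 88.0, 92.0])

# Degree-1 polynomial fit returns (slope, intercept).
slope, intercept = np.polyfit(years, coverage, deg=1)

# Solve slope * year + intercept = 100 for the year.
year_full = (100 - intercept) / slope
print(round(year_full))
```

A straight line will keep rising past 100%, so the extrapolation is only meaningful up to the crossing point, and only if growth really stays linear.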

## Basic Handwash

In [None]:
basicHandWashing_df = basicHandWashing_df.rename(columns={'Dim1': 'Area', 'First Tooltip': '% Population'})

In [None]:
px.bar(data_frame=basicHandWashing_df[basicHandWashing_df.Area != 'Total'],
       x='Period', y='% Population', color='Area',
       height=600, width=600,
       title=basicHandWashing_df.Indicator.iloc[0],
       barmode='group')

Hmm. No change at all. The data may be corrupt. Moving on.
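
Whether a series really is flat can be checked by counting distinct values per group. A small sketch on made-up numbers that mimic the shape of this table (the values are not from the dataset):

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    'Area': ['Rural', 'Rural', 'Urban', 'Urban'],
    'Period': [2016, 2017, 2016, 2017],
    '% Population': [49.0, 49.0, 80.0, 80.0],
})

# Number of distinct values per area; 1 means the series never changes.
distinct = df.groupby('Area')['% Population'].nunique()
flat_areas = distinct[distinct == 1].index.tolist()
print(flat_areas)
```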

## Professional Childbirth

In [None]:
birthAttendedBySkilledPersonal_df = birthAttendedBySkilledPersonal_df.rename(columns={'First Tooltip': '% Population'})

In [None]:
px.bar(data_frame=birthAttendedBySkilledPersonal_df,
       x='Period', y='% Population',
       height=600, width=600,
       title=birthAttendedBySkilledPersonal_df.Indicator.iloc[0])

There's an increasing trend, but the entries for 2014 and 2016 are exactly the same, which is unlikely. Good for India though, as skilled healthcare workers are better able to contain any danger to the mother's or child's life.
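
Repeated consecutive values like this can be flagged with `Series.diff`. A sketch on made-up numbers (not the real figures):

```python
import pandas as pd

# Made-up percentages indexed by survey year, for illustration only.
attended = pd.Series([47.0, 52.0, 81.0, 81.0], index=[2006, 2008, 2014, 2016])

# A difference of exactly zero means the value repeats the previous entry.
repeated = attended[attended.diff() == 0]
print(repeated.index.tolist())
```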

## Fuel and Tech availability

In [None]:
cleanFuelAndTech_df = cleanFuelAndTech_df.rename(columns={'First Tooltip': '% Population'})

In [None]:
px.line(cleanFuelAndTech_df,
        x='Period', y='% Population',
        title=cleanFuelAndTech_df.Indicator.iloc[0],
        height=600,
        width=600)

## Crude Suicide Rate

In [None]:
crudeSuicideRates_df = crudeSuicideRates_df.rename(columns={'First Tooltip': 'CSR', 'Dim1': 'Sex'})

In [None]:
px.line(crudeSuicideRates_df,
        x='Period', y='CSR', color='Sex',
        title=crudeSuicideRates_df.Indicator.iloc[0],
        height=600,
        width=600)

The data has either been vastly under-reported or is false. This cannot be true.

## Dentists availability

In [None]:
dentists_df = dentists_df.rename(columns={'First Tooltip': '# Dentists per 10000 people'})

In [None]:
px.line(dentists_df,
        x='Period', y='# Dentists per 10000 people',
        title=dentists_df.Indicator.iloc[0],
        height=600,
        width=600)

We see a faster-than-linear (quadratic, cubic, or whatever degree it is) increase in the number of dentists per 10,000 people through the years, starting from 1991. Good news for Indians' oral health, I guess.

## Relationship violence

In [None]:
eliminateViolenceAgainstWomen_df = eliminateViolenceAgainstWomen_df.rename(columns={'Dim2': 'Age Group', 'First Tooltip': 'Proportion'})

In [None]:
px.scatter(eliminateViolenceAgainstWomen_df, y='Age Group', x='Proportion', title=eliminateViolenceAgainstWomen_df.Indicator.iloc[0])

Physical/sexual violence by a partner most often affects women aged 20-24, followed by those in the 30-34 age bracket.

## Malaria Situation

In [None]:
incedenceOfMalaria_df = incedenceOfMalaria_df.rename(columns={'First Tooltip': 'Incidence per 1000'})

In [None]:
px.bar(incedenceOfMalaria_df, x='Period', y='Incidence per 1000', title=incedenceOfMalaria_df.Indicator.iloc[0])

Malaria is going down!

## Tuberculosis Situation

In [None]:
incedenceOfTuberculosis_df = incedenceOfTuberculosis_df.rename(columns={'First Tooltip': 'Incidence per 1000'})

In [None]:
# Strip the bracketed range after the number; regex=True is required on newer pandas versions.
incedenceOfTuberculosis_df['Incidence per 1000'] = incedenceOfTuberculosis_df['Incidence per 1000'].str.replace(r'\s\[.*\]', '', regex=True).astype(float)

In [None]:
px.bar(incedenceOfTuberculosis_df, x='Period', y='Incidence per 1000', title=incedenceOfTuberculosis_df.Indicator.iloc[0])

Tuberculosis is also going down!

## Infant Mortality

In [None]:
infantMortalityRate_df = infantMortalityRate_df.rename(columns={'First Tooltip': 'Infant Mortality Rate', 'Dim1': 'Sex'})

In [None]:
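# Strip the bracketed range after the number; regex=True is required on newer pandas versions.
infantMortalityRate_df['Infant Mortality Rate'] = infantMortalityRate_df['Infant Mortality Rate'].str.replace(r'\s\[.*\]', '', regex=True).astype(float)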

In [None]:
px.bar(infantMortalityRate_df, x='Period', y='Infant Mortality Rate', color='Sex', title=infantMortalityRate_df.Indicator.iloc[0], barmode='group')

The infant mortality rate used to be very high, and that's why Indians didn't stop reproducing. It got way better with time, but no one told us. We are 1,384,660,352 now (source: https://en.wikipedia.org/wiki/Demographics_of_India).

## Life Expectancy at Birth

In [None]:
lifeExpectancyAtBirth_df = lifeExpectancyAtBirth_df.rename(columns={'First Tooltip': 'Life Expectancy', 'Dim1': 'Sex'})

In [None]:
px.line(lifeExpectancyAtBirth_df, x='Period', y='Life Expectancy', color='Sex',
           title=lifeExpectancyAtBirth_df.Indicator.iloc[0],
           height=600, width=600)

Life expectancy for females has always been higher than for males, and the gap is widening.
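
The gap itself can be computed by pivoting the long table so that each sex becomes a column. A minimal sketch on made-up values (the numbers and the `Female`/`Male` labels are assumptions about the data):

```python
import pandas as pd

# Made-up life expectancy values, for illustration only.
df = pd.DataFrame({
    'Period': [2000, 2000, 2019, 2019],
    'Sex': ['Female', 'Male', 'Female', 'Male'],
    'Life Expectancy': [63.0, 61.0, 72.0, 69.0],
})

# One row per year, one column per sex.
wide = df.pivot(index='Period', columns='Sex', values='Life Expectancy')

# Female minus male life expectancy, per year.
gap = wide['Female'] - wide['Male']
print(gap)
```

A rising `gap` over the years is what "the gap is widening" means in numbers.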

## Doctors

In [None]:
medicalDoctors_df = medicalDoctors_df.rename(columns={'First Tooltip': '# Doctors per 10000 people'})

In [None]:
px.area(medicalDoctors_df, x='Period', y='# Doctors per 10000 people', title=medicalDoctors_df.Indicator.iloc[0])

The "becoming a doctor" culture was apparently very prominent in the 90s, as being a doctor was a very respectable and highly paid profession. Ayurvedic doctors have been respected across all walks of society since ancient times. Also, a doctor would never be out of work because, as we saw earlier, the country had so many healthcare problems.

I'm pretty sure that the number of Indian doctors is still high, but many leave the country to settle abroad.

In [None]:
nursingAndMidwife_df = nursingAndMidwife_df.rename(columns={'First Tooltip': 'Midwifery per 10000 people'})

In [None]:
px.line(nursingAndMidwife_df, x='Period', y='Midwifery per 10000 people', title=nursingAndMidwife_df.Indicator.iloc[0])

This is a very weird trend. Why the sudden drop in 2006? I did some digging and found that in 2006 the government discovered that a lot of private institutions were handing out nurse and midwife titles to people without the skills for the job. In response, the government made sure that nursing and midwifery jobs went only to people with the necessary diploma, certificate, or degree.

In [None]:
safelySanitization_df = safelySanitization_df.rename(columns={'First Tooltip': '% Population'})

In [None]:
px.line(safelySanitization_df, x='Period', y='% Population', title=safelySanitization_df.Indicator.iloc[0])

As we saw earlier, people are getting better healthcare and are moving to increasingly sophisticated sanitation services.

In [None]:
under5MortalityRate_df = under5MortalityRate_df.rename(columns={'First Tooltip': 'UFMR per 1000 childbirth', 'Dim1': 'Sex'})

In [None]:
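# Strip the bracketed range after the number; regex=True is required on newer pandas versions.
under5MortalityRate_df['UFMR per 1000 childbirth'] = under5MortalityRate_df['UFMR per 1000 childbirth'].str.replace(r'\s\[.*\]', '', regex=True).astype(float)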

In [None]:
px.line(under5MortalityRate_df, x='Period', y='UFMR per 1000 childbirth', color='Sex', title=under5MortalityRate_df.Indicator.iloc[0])

Again, the most likely explanation is better healthcare. You get the point, right?

## FUTURE WORK

- Exploring the other data files - <b>Done</b>
- Maybe some modeling?
- Comparing India with some other country/countries with similar present economic/social/political conditions