<a href="https://colab.research.google.com/github/namita0210/longevity-prediction/blob/main/longevity_standalone_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:

# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES
# TO THE CORRECT LOCATION (/kaggle/input) IN YOUR NOTEBOOK,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.

import os
import sys
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from urllib.parse import unquote, urlparse
from urllib.error import HTTPError
from zipfile import ZipFile
import tarfile
import shutil

CHUNK_SIZE = 40960
DATA_SOURCE_MAPPING = 'child-and-infant-mortality:https%3A%2F%2Fstorage.googleapis.com%2Fkaggle-data-sets%2F2424497%2F4099677%2Fbundle%2Farchive.zip%3FX-Goog-Algorithm%3DGOOG4-RSA-SHA256%26X-Goog-Credential%3Dgcp-kaggle-com%2540kaggle-161607.iam.gserviceaccount.com%252F20240208%252Fauto%252Fstorage%252Fgoog4_request%26X-Goog-Date%3D20240208T073303Z%26X-Goog-Expires%3D259200%26X-Goog-SignedHeaders%3Dhost%26X-Goog-Signature%3D103203b498c089b9760a5c066c340235c1d8ee4a13dffbd313d3288b3e08a702f4092bb81ed17225e666e9346571fa9eaf3ec71a2957ec8704c3643f4f15ab5ba28f27c5471de9f2e4f0450fd8a179bb9beba2ba1dfaac86c771d644f9425e276fbe7294a76ff6462938b110acb0ad2dec6b117b0057d7f1a2d02bf94e8da32cc2a7982e9bbac7f94df1507c1d67653a4c1615efd9d2238564aafff4a5ba5b1a89f04c664d6bc04ad3714080d262de9ad174de3609c69b1ff9fdb5a138ca1b5570f8be3a950e315166f0f329e4eabfaac35edcd52742d27dc9748c64c92acedfa368f1bb80f89a717cfdbfd42ba9c8ecdca860c8b3f6035159270d9c6edc2bb2'

KAGGLE_INPUT_PATH='/kaggle/input'
KAGGLE_WORKING_PATH='/kaggle/working'
KAGGLE_SYMLINK='kaggle'

!umount /kaggle/input/ 2> /dev/null
shutil.rmtree('/kaggle/input', ignore_errors=True)
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

try:
  os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
  pass
try:
  os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
  pass

for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')
    download_url = unquote(download_url_encoded)
    filename = urlparse(download_url).path
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)
    try:
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0
            data = fileres.read(CHUNK_SIZE)
            while len(data) > 0:
                dl += len(data)
                tfile.write(data)
                done = int(50 * dl / int(total_length))
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50-done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)
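            tfile.flush()  # push buffered bytes to disk before the archive is re-opened by name
            if filename.endswith('.zip'):
              with ZipFile(tfile) as zfile:
                zfile.extractall(destination_path)
            else:
              # use a new name so the `tarfile` module is not shadowed on later loop iterations
              with tarfile.open(tfile.name) as tar_archive:
                tar_archive.extractall(destination_path)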
            print(f'\nDownloaded and uncompressed: {directory}')
    except HTTPError as e:
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError as e:
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')
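The signed URL in `DATA_SOURCE_MAPPING` carries `X-Goog-Date=20240208T073303Z` and `X-Goog-Expires=259200` (72 hours), so on later runs the cell above will likely print the "likely expired" message and the CSVs will be missing. A minimal sketch of a check to run before the analysis. It assumes the files are extracted directly under `/kaggle/input/child-and-infant-mortality/`, as the `read_csv` calls in this notebook expect. `missing_files` is a hypothetical helper name.

```python
import os

# CSV files read later in this notebook (names copied from the read_csv calls)
EXPECTED_FILES = [
    "causes-of-death-in-children.csv",
    "pneumonia-death-rates-in-children-under-5.csv",
    "pneumonia-and-lower-respiratory-diseases-deaths.csv",
    "pneumonia-risk-factors.csv",
    "pneumonia-careseeking.csv",
]

def missing_files(base_dir, names=EXPECTED_FILES):
    """Return the entries of `names` that are not regular files inside `base_dir`."""
    return [name for name in names if not os.path.isfile(os.path.join(base_dir, name))]

missing = missing_files("/kaggle/input/child-and-infant-mortality")
if missing:
    print("Missing data files (the signed URL may have expired):", missing)
```

If anything is reported missing, the dataset has to be attached again, for example by downloading it from Kaggle, before the remaining cells will run.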


In [None]:
#importing libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

# Deaths in children due to Lower Respiratory Diseases

Lower respiratory infections include bronchitis, bronchiolitis and pneumonia.

Pneumonia typically presents as a combination of respiratory symptoms, including 'cough and fast or difficult breathing due to a chest-related problem'.

Lower respiratory infections are the **biggest** cause of infant and child deaths around the world.

- Infant mortality rate is the number of deaths of children under 1 year old in a given year per 1,000 live births in that year.
- Child mortality rate is the probability of dying between exact age one and age five, expressed as the number of deaths of children from exact age one to less than age five during a given period per 1,000 children surviving to age 12 months at the beginning of the period.
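Both definitions reduce to simple ratios scaled to 1,000. A minimal sketch with a made-up cohort (not taken from the dataset); it treats the child mortality rate as a plain cohort ratio, which simplifies the period-based probability in the definition above. The function names are illustrative.

```python
def infant_mortality_rate(deaths_under_1, live_births):
    """Deaths of infants under age 1 in a year per 1,000 live births in that year."""
    return 1000 * deaths_under_1 / live_births

def child_mortality_rate(deaths_age_1_to_4, survivors_to_age_1):
    """Deaths between exact ages 1 and 5 per 1,000 children alive at their first birthday."""
    return 1000 * deaths_age_1_to_4 / survivors_to_age_1

# Hypothetical cohort: 10,000 live births, 300 die before age 1,
# and 120 of the remaining 9,700 one-year-olds die before age 5.
print(infant_mortality_rate(300, 10_000))          # 30.0
print(round(child_mortality_rate(120, 9_700), 2))  # 12.37
```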

### Summary:
Region-wise, **World Bank Lower Middle Income** countries (those with a GNI per capita between USD 1,086 and USD 4,255, such as many South Asian countries) have the highest number of deaths due to lower respiratory infections.

**Nigeria** had the highest number of deaths due to lower respiratory infections in 2019, followed closely by India; the two totals are nearly equal.

**India accounts for 19.12% of the total deaths in 2019** and 30.5% of the deaths in World Bank Lower Middle Income countries, which is alarming; the country should treat this as a priority.

**Sub-Saharan Africa** has the highest under-5 death rate from lower respiratory infections, with 209.08 deaths per 100,000 children under 5 in 2019.

The highest under-5 death rate is observed in **Chad, with 412.58 deaths per 100,000 children under 5**, followed by Burkina Faso, a country in West Africa.

Among children aged 5-14, deaths due to lower respiratory infections are highest in India, and the gap between India and Pakistan is large (5,746 deaths). This may be due to the difference in the population of children aged 5-14 in the two countries.

**The main risk factor for lower respiratory infections in 2019 is child wasting.** Wasting is often referred to as acute malnutrition. It is a sign that a child has experienced short periods of undernutrition, resulting in significant loss of muscle and fat tissue, so their weight is very low for their height.

**Cuba has the highest percentage of children under 5 with pneumonia symptoms taken to a health provider** (2014), followed by Guyana.
India, with one of the highest death counts, has 76% of such children taken for care. This suggests India is addressing the issue, but with a growing population the country should try to raise this share in the coming years.

## Causes of death in children around the globe

In [None]:
data1=pd.read_csv("/kaggle/input/child-and-infant-mortality/causes-of-death-in-children.csv")
data1.columns=["Country","Code","Year","Malaria","HIV/AIDS","Meningitis","Nutritional Deficiencies","Other neonatal disorders","Whooping Cough","Lower respiratory infections","Congenital birth","Measles","Neonatal sepsis and other neonatal infections","Neonatal encephalopathy due to birth asphyxia and trauma","Drowning","Tuberculosis","Neonatal preterm birth","Diarrheal diseases","Neoplasms","Syphilis"]
data1.head()

In [None]:
data1_reasons=data1[(data1["Country"]=="World")]

In [None]:
columns = list(data1.columns[3:])  # every cause-of-death column (skips Country, Code, Year)
data1_reasons.head()

In [None]:
fig = px.line(data1_reasons, x="Year", y=columns, title='Child deaths by cause (World)')
fig.show()

In [None]:
rows_drop=["World","Western Pacific Region","African Region (WHO)","OECD Countries","East Asia and Pacific (WB)","Eastern Mediterranean region (WHO)","Europe & Central Asia (WB)", "European Region (WHO)","G20","Latin America & Caribbean","World Bank Lower Middle Income","World Bank Upper Middle Income","World Bank High Income","World Bank Low Income","Western Pacific Region (WHO)","East Asia & Pacific (WB)","Middle East & North Africa (WB)","North America (WB)","Region of the Americas (WHO)","South Asia (WB)","South-East Asia Region (WHO)","Sub-Saharan Africa (WB)","Eastern Mediterranean Region (WHO)","Latin America & Caribbean (WB)"]
# keep only individual countries: drop every aggregate row in one vectorised step
data1 = data1[~data1["Country"].isin(rows_drop)]
data1.head()
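The filter above can be packaged as a reusable helper for the other datasets below. A sketch; `drop_aggregates` is a hypothetical name and the toy rows are made up. The code-based alternative at the end rests on an assumption not verified for these files: that regional aggregates have an empty `Code` and that `World` uses the code `OWID_WRL`. Checking `data1.loc[data1['Code'].isna(), 'Country'].unique()` first would confirm it.

```python
import pandas as pd

def drop_aggregates(df, aggregate_names, entity_col="Country"):
    """Return only the rows whose entity is not in `aggregate_names` (one boolean mask)."""
    return df[~df[entity_col].isin(aggregate_names)]

# Toy frame in the same shape as data1 (values are made up)
toy = pd.DataFrame({
    "Country": ["India", "World", "Nigeria", "G20"],
    "Code": ["IND", "OWID_WRL", "NGA", None],
    "Year": [2019, 2019, 2019, 2019],
})
print(drop_aggregates(toy, ["World", "G20"])["Country"].tolist())  # ['India', 'Nigeria']

# Alternative ASSUMING aggregates have no code and World is coded OWID_WRL
countries_only = toy[toy["Code"].notna() & (toy["Code"] != "OWID_WRL")]
print(countries_only["Country"].tolist())  # ['India', 'Nigeria']
```

The code-based version avoids maintaining the long `rows_drop` list by hand, but only holds if the assumption about `Code` holds for every file.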

In [None]:
pd.set_option('display.max_rows', 200)  # show up to 200 rows when displaying a DataFrame
#data1.Country.unique()

In [None]:
data1_2019=data1[(data1['Year'] == 2019)]
data1_2019.head()

In [None]:
# total 2019 deaths per cause across all countries (drop the non-numeric columns first)
data1_2019=data1_2019.drop(["Year","Country","Code"],axis=1).sum(axis=0)

In [None]:
Sum = data1_2019.tolist()  # 2019 totals, in the same order as data1_2019.index

In [None]:
fig = px.bar(x=data1_2019.index, y=Sum, title="Causes of death in children, 2019")
fig.update_layout(
    xaxis_title="Diseases",
    yaxis_title="Deaths"
)

fig.show()

**Observations:**
- The highest number of child deaths in 2019 was due to lower respiratory infections, followed by neonatal preterm birth.
- Lower respiratory infections make up 15% of the total deaths in 2019.
- Lower respiratory infections include bronchitis, bronchiolitis and pneumonia.
- Preterm birth is defined as a live birth before 37 weeks of pregnancy are completed.
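The 15% figure can be computed from `data1_2019` instead of being read off the chart. A minimal sketch; `cause_shares` is a hypothetical helper and the totals below are made up. Note that the share is relative to the sum of the listed causes, which is not necessarily the same as all-cause child deaths.

```python
import pandas as pd

def cause_shares(totals):
    """Percentage share of each cause in a Series of death totals, largest first."""
    return (100 * totals / totals.sum()).sort_values(ascending=False).round(2)

# Made-up totals; in the notebook this would be data1_2019 (numeric columns only)
toy_totals = pd.Series({
    "Lower respiratory infections": 150,
    "Neonatal preterm birth": 100,
    "Malaria": 750,
})
print(cause_shares(toy_totals))
```

In the notebook, `cause_shares(data1_2019)` should reproduce the ordering of the bar chart, provided the Series contains only numeric totals.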

## Region-wise deaths in children due to Lower Respiratory Diseases

In [None]:
data2 = pd.read_csv("/kaggle/input/child-and-infant-mortality/causes-of-death-in-children.csv")
data2.columns=["Country","Code","Year","Malaria","HIV/AIDS","Meningitis","Nutritional Deficiencies","Other neonatal disorders","Whooping Cough","Lower respiratory infections","Congenital birth","Measles","Neonatal sepsis and other neonatal infections","Neonatal encephalopathy due to birth asphyxia and trauma","Drowning","Tuberculosis","Neonatal preterm birth","Diarrheal diseases","Neoplasms","Syphilis"]
data2=data2.loc[data2['Country'].isin(rows_drop)]

In [None]:
data2.head()

In [None]:
fig = px.line(data2, x="Year", y="Lower respiratory infections", color='Country',title="Region wise deaths in children due to Lower respiratory diseases: Trend analysis")
fig.show()

In [None]:
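# World Bank Lower Middle Income share of world child deaths from LRI in 2019 (thousands, read off the chart)
420.181/671.928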

In [None]:
data2_region=data2[(data2['Year'] == 2019) & (data2["Country"]!="World")]

In [None]:
fig = px.bar(data2_region, x="Country", y=data2_region["Lower respiratory infections"], title="Region wise deaths due to Lower Respiratory Diseases")
fig.show()

**Observations:**
- Region-wise, World Bank Lower Middle Income countries (those with a GNI per capita between USD 1,086 and USD 4,255, such as many South Asian countries) have the highest number of deaths due to lower respiratory infections.
- Although World Bank Lower Middle Income countries saw a steady decrease in deaths from 1990 to 2019, they still had the highest number of child deaths in 2019, with 420.181k deaths, about 62.5% of the world total.
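The regional share can also be derived from `data2`, which still contains the `World` row. A sketch with a hypothetical helper `share_of_world`. The toy values reuse the 2019 figures quoted above (671.928k world, 420.181k Lower Middle Income), plus one invented 2018 row to show the year filter, so the printed result should match the hand calculation.

```python
import pandas as pd

def share_of_world(df, entity, column="Lower respiratory infections", year=2019, entity_col="Country"):
    """Percentage of the 'World' value of `column` in `year` contributed by `entity`."""
    rows = df[df["Year"] == year].set_index(entity_col)[column]
    return 100 * rows[entity] / rows["World"]

toy = pd.DataFrame({
    "Country": ["World", "World Bank Lower Middle Income", "World"],
    "Year": [2019, 2019, 2018],
    "Lower respiratory infections": [671928, 420181, 700000],
})
print(round(share_of_world(toy, "World Bank Lower Middle Income"), 2))  # 62.53
```

In the notebook the call would be `share_of_world(data2, "World Bank Lower Middle Income")`.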

## Country-wise deaths in children due to Lower Respiratory Diseases

In [None]:
fig = px.line(data1, x="Year", y="Lower respiratory infections", color='Country',title="Country wise deaths in children due to Lower Respiratory Diseases")
fig.show()

In [None]:
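# 2019 rows sorted from most to fewest LRI deaths (the sorted order also carries into the bar chart)
data1_country=data1[(data1['Year'] == 2019)].sort_values(by="Lower respiratory infections", ascending=False)
data1_country.head(10)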

In [None]:
fig = px.bar(data1_country, x="Country", y=data1_country["Lower respiratory infections"], title="Country wise deaths due to Lower Respiratory Diseases")
fig.show()

In [None]:
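# Nigeria's 2019 deaths (thousands, read off the charts) as a share of the world total
# and of the African region total; print both, since a cell only displays its last expression
print(129.44/671.928)
print(129.44/347.237)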

**Observations:**
- Nigeria had the highest number of deaths due to lower respiratory infections in 2019, followed closely by India; the two totals are nearly equal.
- Over 1990-2019, China saw a steady drop in deaths from 1990 to 2003. Nigeria showed no improvement from 1990 to 2019, which is surprising given the development of healthcare in all parts of the world over these years.
- India accounts for 19.12% of the total deaths in 2019 and 30.5% of the deaths in World Bank Lower Middle Income countries, which is alarming; the country should treat this as a priority.
- Nigeria accounts for 19.26% of the total deaths in 2019 and 37.28% of the deaths in the African region.
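A sketch for producing the country ranking and shares in one table, rather than dividing hand-copied numbers. `top_countries` is a hypothetical helper and the toy data is invented. For shares of the world total, passing the `World` value from `data2` as `total` is likely more faithful than summing over countries, since the remaining country rows may not add up exactly to the world figure.

```python
import pandas as pd

def top_countries(df, column, n=5, total=None):
    """Top-n rows by `column`, with each row's percentage of `total` (defaults to the column sum)."""
    total = df[column].sum() if total is None else total
    top = df.nlargest(n, column)[["Country", column]].copy()
    top["share_%"] = (100 * top[column] / total).round(2)
    return top

# Invented values, same column layout as data1_country
toy = pd.DataFrame({"Country": ["A", "B", "C"], "Lower respiratory infections": [20, 50, 30]})
print(top_countries(toy, "Lower respiratory infections", n=2))
```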

## Child mortality due to pneumonia

In [None]:
data3=pd.read_csv("/kaggle/input/child-and-infant-mortality/pneumonia-death-rates-in-children-under-5.csv")
data3.head()

In [None]:
data3_2=pd.read_csv("/kaggle/input/child-and-infant-mortality/pneumonia-death-rates-in-children-under-5.csv")
rows_drop=["Western Pacific Region","African Region (WHO)","OECD Countries","East Asia and Pacific (WB)","Eastern Mediterranean region (WHO)","Europe & Central Asia (WB)", "European Region (WHO)","G20","Latin America & Caribbean","World Bank Lower Middle Income","World Bank Upper Middle Income","World Bank High Income","World Bank Low Income","Western Pacific Region (WHO)","East Asia & Pacific (WB)","Middle East & North Africa (WB)","North America (WB)","Region of the Americas (WHO)","South Asia (WB)","South-East Asia Region (WHO)","Sub-Saharan Africa (WB)","Eastern Mediterranean Region (WHO)","Latin America & Caribbean (WB)"]
data3_region=data3_2.loc[data3_2['Entity'].isin(rows_drop)]

In [None]:
data3_region=data3_region[(data3_region['Year'] == 2019)]
data3_region

In [None]:
fig = px.bar(data3_region, x="Entity", y="Deaths - Lower respiratory infections - Sex: Both - Age: Under 5 (Rate)", color='Entity', title='Under-5 death rate from lower respiratory infections (per 100,000), region-wise')
fig.show()

**Observations:**
- Sub-Saharan Africa has the highest under-5 death rate from lower respiratory infections, with 209.08 deaths per 100,000 children under 5 in 2019.

In [None]:
rows_drop=["World","Western Pacific Region","African Region (WHO)","OECD Countries","East Asia and Pacific (WB)","Eastern Mediterranean region (WHO)","Europe & Central Asia (WB)", "European Region (WHO)","G20","Latin America & Caribbean","World Bank Lower Middle Income","World Bank Upper Middle Income","World Bank High Income","World Bank Low Income","Western Pacific Region (WHO)","East Asia & Pacific (WB)","Middle East & North Africa (WB)","North America (WB)","Region of the Americas (WHO)","South Asia (WB)","South-East Asia Region (WHO)","Sub-Saharan Africa (WB)","Eastern Mediterranean Region (WHO)","Latin America & Caribbean (WB)"]
# keep only individual countries: drop every aggregate row in one vectorised step
data3 = data3[~data3["Entity"].isin(rows_drop)]

In [None]:
data3_country=data3[(data3['Year'] == 2019)]

In [None]:
fig = px.bar(data3_country, x="Entity", y="Deaths - Lower respiratory infections - Sex: Both - Age: Under 5 (Rate)", color='Entity', title='Under-5 death rate from lower respiratory infections (per 100,000), country-wise')
fig.show()

**Observations:**
- The highest under-5 death rate is observed in Chad, with 412.58 deaths per 100,000 children under 5, followed by Burkina Faso, a country in West Africa.
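The `(Rate)` column is, assuming the usual Global Burden of Disease convention behind these exports, an annual number of deaths per 100,000 children under 5. That is a different measure from the per-1,000-live-births mortality rates defined at the top of the notebook, which are cumulative over ages 0-5. A small sketch of the unit conversion; the function name is illustrative.

```python
def per_100k_to_per_1k(rate_per_100k):
    """Convert an annual rate per 100,000 people into the same rate per 1,000 people."""
    return rate_per_100k / 100

# 412.58 is the 2019 value reported above for Chad
print(per_100k_to_per_1k(412.58))
```

Under that assumption, Chad's figure corresponds to roughly 4.1 deaths from lower respiratory infections per 1,000 children under 5 each year.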

In [None]:
data4=pd.read_csv("/kaggle/input/child-and-infant-mortality/pneumonia-and-lower-respiratory-diseases-deaths.csv")
data4.columns=["Country","Code", "Year","Under 5","Age: 50-69 years","Age: 15-49 years","Age: 5-14 years","Age: 70+ years"]
data4.drop(["Age: 50-69 years","Age: 15-49 years","Age: 70+ years"],axis=1,inplace=True)
rows_drop=["World","Western Pacific Region","African Region (WHO)","OECD Countries","East Asia and Pacific (WB)","Eastern Mediterranean region (WHO)","Europe & Central Asia (WB)", "European Region (WHO)","G20","Latin America & Caribbean","World Bank Lower Middle Income","World Bank Upper Middle Income","World Bank High Income","World Bank Low Income","Western Pacific Region (WHO)","East Asia & Pacific (WB)","Middle East & North Africa (WB)","North America (WB)","Region of the Americas (WHO)","South Asia (WB)","South-East Asia Region (WHO)","Sub-Saharan Africa (WB)","Eastern Mediterranean Region (WHO)","Latin America & Caribbean (WB)"]
# keep only individual countries: drop every aggregate row in one vectorised step
data4 = data4[~data4["Country"].isin(rows_drop)]
data4.head()

In [None]:
data4.Year.unique()

In [None]:
data4_2=pd.read_csv("/kaggle/input/child-and-infant-mortality/pneumonia-and-lower-respiratory-diseases-deaths.csv")
data4_2.columns=["Country","Code", "Year","Under 5","Age: 50-69 years","Age: 15-49 years","Age: 5-14 years","Age: 70+ years"]
data4_2.drop(["Age: 50-69 years","Age: 15-49 years","Age: 70+ years"],axis=1,inplace=True)
rows_drop=["Western Pacific Region","African Region (WHO)","OECD Countries","East Asia and Pacific (WB)","Eastern Mediterranean region (WHO)","Europe & Central Asia (WB)", "European Region (WHO)","G20","Latin America & Caribbean","World Bank Lower Middle Income","World Bank Upper Middle Income","World Bank High Income","World Bank Low Income","Western Pacific Region (WHO)","East Asia & Pacific (WB)","Middle East & North Africa (WB)","North America (WB)","Region of the Americas (WHO)","South Asia (WB)","South-East Asia Region (WHO)","Sub-Saharan Africa (WB)","Eastern Mediterranean Region (WHO)","Latin America & Caribbean (WB)"]
data4_2=data4_2.loc[data4_2['Country'].isin(rows_drop)]

In [None]:
data4_2=data4_2[(data4_2['Year'] == 2019)]

In [None]:
fig = px.bar(data4_2, x="Country", y="Age: 5-14 years", color='Country', title='Deaths at ages 5-14 due to lower respiratory infections, region-wise')
fig.show()

**Observations:**
- World Bank Lower Middle Income countries (those with a GNI per capita between USD 1,086 and USD 4,255, such as many South Asian countries) had the highest number of deaths due to lower respiratory infections in the 5-14 age group in 2019, with 24.981k deaths.
- The gap between World Bank Lower Middle Income countries and the other regions is large: about 7k more deaths than the second-highest region, Sub-Saharan Africa.
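The 7k gap can be computed directly from `data4_2`, which holds only the regional rows for 2019. A sketch; `gap_to_runner_up` is a hypothetical helper, and the region names and counts in the toy frame are invented.

```python
import pandas as pd

def gap_to_runner_up(df, column, label_col="Country"):
    """Return (leader, runner_up, difference) for the two largest values of `column`."""
    top2 = df.nlargest(2, column)
    (leader, first), (runner_up, second) = top2[[label_col, column]].itertuples(index=False, name=None)
    return leader, runner_up, first - second

toy = pd.DataFrame({
    "Country": ["Region A", "Region B", "Region C"],
    "Age: 5-14 years": [24981, 17900, 5000],
})
print(gap_to_runner_up(toy, "Age: 5-14 years"))
```

In the notebook, `gap_to_runner_up(data4_2, "Age: 5-14 years")` would return the two leading regions and the difference between them.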

In [None]:
data4_country=data4[(data4['Year'] == 2019)]

In [None]:
fig = px.bar(data4_country, x="Country", y="Age: 5-14 years", title='Deaths at ages 5-14 due to lower respiratory infections, country-wise')
fig.show()

In [None]:
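# India's share of 5-14 LRI deaths in World Bank Lower Middle Income countries, 2019 (thousands, read off the charts)
8.946/24.98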

**Observations:**
- Among children aged 5-14, deaths due to lower respiratory infections are highest in India, and the gap between India and Pakistan is large (5,746 deaths). This may be due to the difference in the population of children aged 5-14 in the two countries.
- India accounts for 35.81% of the 5-14 deaths in World Bank Lower Middle Income countries in 2019.
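The population explanation above can be tested by turning counts into rates. This dataset has no population column, so the sketch assumes a separate table with a hypothetical `Population 5-14` column (for example from a UN population source). The country names and figures in the toy example are invented; `deaths_per_100k` is a hypothetical helper.

```python
import pandas as pd

def deaths_per_100k(deaths, population, on="Country",
                    deaths_col="Age: 5-14 years", pop_col="Population 5-14"):
    """Merge a population table onto death counts and add a rate per 100,000 people."""
    # validate raises if either table has duplicate countries
    merged = deaths.merge(population, on=on, how="inner", validate="one_to_one")
    merged["per_100k"] = 100_000 * merged[deaths_col] / merged[pop_col]
    return merged

toy_deaths = pd.DataFrame({"Country": ["Country X", "Country Y"], "Age: 5-14 years": [500, 100]})
toy_pop = pd.DataFrame({"Country": ["Country X", "Country Y"], "Population 5-14": [1_000_000, 100_000]})
print(deaths_per_100k(toy_deaths, toy_pop))
```

In the toy data, the country with fewer deaths has the higher rate, which is exactly the kind of reversal a population-adjusted comparison of India and Pakistan could reveal.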

## Lower Respiratory Diseases Risk Factors

In [None]:
data5=pd.read_csv("/kaggle/input/child-and-infant-mortality/pneumonia-risk-factors.csv")
data5.columns=["Country","Code","Year","Risk: Child stunting", "Risk: Child wasting","Risk: Low birth weight","Risk: No access to handwashing facility","Risk: Secondhand smoke","Risk: Child underweight", "Risk: Household air pollution from solid fuels","Risk: Non-exclusive breastfeeding","Risk: Short gestation"]
rows_drop=["World","Western Pacific Region","African Region (WHO)","OECD Countries","East Asia and Pacific (WB)","Eastern Mediterranean region (WHO)","Europe & Central Asia (WB)", "European Region (WHO)","G20","Latin America & Caribbean","World Bank Lower Middle Income","World Bank Upper Middle Income","World Bank High Income","World Bank Low Income","Western Pacific Region (WHO)","East Asia & Pacific (WB)","Middle East & North Africa (WB)","North America (WB)","Region of the Americas (WHO)","South Asia (WB)","South-East Asia Region (WHO)","Sub-Saharan Africa (WB)","Eastern Mediterranean Region (WHO)","Latin America & Caribbean (WB)"]
# keep only individual countries: drop every aggregate row in one vectorised step
data5 = data5[~data5["Country"].isin(rows_drop)]
data5.head()

In [None]:
#data4.Year.unique()

In [None]:
data5_2019=data5[(data5['Year'] == 2019)]

In [None]:
columns=["Risk: Child stunting", "Risk: Child wasting","Risk: Low birth weight","Risk: No access to handwashing facility","Risk: Secondhand smoke","Risk: Child underweight", "Risk: Household air pollution from solid fuels","Risk: Non-exclusive breastfeeding","Risk: Short gestation"]
# total attributable deaths per risk factor across all countries in 2019
l = data5_2019[columns].sum().to_dict()

In [None]:
l

In [None]:
# Get the Keys and store them in a list
labels = list(l.keys())

# Get the Values and store them in a list
values = list(l.values())

In [None]:
import matplotlib.pyplot as plt

plt.pie(values, labels=labels)
plt.title("Risk Factors of Lower Respiratory Diseases")
plt.show()

**Observations:**
- The main risk factor for lower respiratory infections in 2019 is child wasting. Wasting is often referred to as acute malnutrition. It is a sign that a child has experienced short periods of undernutrition, resulting in significant loss of muscle and fat tissue, so their weight is very low for their height.
- Low birth weight follows child wasting as the second main risk factor in 2019.
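A pie chart suggests that the slices add up to a whole. However, deaths attributed to different risk factors can overlap: the same death may be linked to both child wasting and low birth weight, as is usual for risk-attributable death estimates. The totals in `l` are therefore safer to read as a ranking. A minimal sketch that sorts them for a bar chart; `rank_risk_factors` is a hypothetical helper and the values are invented.

```python
import pandas as pd

def rank_risk_factors(totals):
    """Sort a {risk factor: attributable deaths} mapping into a descending Series."""
    return pd.Series(totals, dtype="float64").sort_values(ascending=False)

toy = {
    "Risk: Secondhand smoke": 50.0,
    "Risk: Child wasting": 300.0,
    "Risk: Low birth weight": 200.0,
}
ranked = rank_risk_factors(toy)
print(ranked.index.tolist())  # ['Risk: Child wasting', 'Risk: Low birth weight', 'Risk: Secondhand smoke']
```

In the notebook, `ranked = rank_risk_factors(l)` followed by `px.bar(x=ranked.index, y=ranked.values)` would show the same ordering without implying the factors are parts of one total.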

## Percentage of children with pneumonia symptoms taken to a health provider in 2014

In [None]:
data6=pd.read_csv("/kaggle/input/child-and-infant-mortality/pneumonia-careseeking.csv")
data6.head()

In [None]:
data6_2014=data6[(data6['Year'] == 2014)]

In [None]:
#fig = px.line(data6, x="Year", y="Percentage of children under 5 with symptoms of pneumonia taken for care to a health provider", color='Entity', title='Percentage of children under 5 with symptoms of pneumonia taken for care to a health provider')
#fig.show()

In [None]:
data6_2014.sort_values(by="Percentage of children under 5 with symptoms of pneumonia taken for care to a health provider")

In [None]:
fig = px.bar(data6_2014, x='Code', y='Percentage of children under 5 with symptoms of pneumonia taken for care to a health provider')
fig.show()

**Observations:**
- Cuba has the highest percentage of children under 5 with pneumonia symptoms taken to a health provider, followed by Guyana.
- India, with one of the highest death counts, has 76% of such children taken for care. This suggests India is addressing the issue, but with a growing population the country should try to raise this share in the coming years.
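The leaders can be pulled out as a table instead of being read off the bar chart. A sketch; it assumes `data6` keeps the original `Entity` column, as `data3` does. `care_seeking_ranking` is a hypothetical helper and the toy rows are invented. Care-seeking figures of this kind usually come from household surveys, so a single year may cover only the countries surveyed that year.

```python
import pandas as pd

CARE_COL = "Percentage of children under 5 with symptoms of pneumonia taken for care to a health provider"

def care_seeking_ranking(df, year=2014, n=5):
    """Top-n entities by care-seeking percentage in `year`, highest first."""
    year_rows = df[df["Year"] == year]
    return year_rows.nlargest(n, CARE_COL)[["Entity", CARE_COL]].reset_index(drop=True)

toy = pd.DataFrame({
    "Entity": ["A", "B", "C", "D"],
    "Year": [2014, 2014, 2014, 2015],
    CARE_COL: [90, 95, 70, 99],
})
print(care_seeking_ranking(toy, n=2))
```

Entity D is excluded because its only value is from 2015, which illustrates why a single-year comparison can leave countries out.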