# Quality of Life - Exploratory Analysis


Exploratory Data Analysis

Kaggle Dataset Link: https://www.kaggle.com/valchovalev/qualityoflife2021dataset

## Dependencies or Libraries

- Import the libraries used throughout: numpy, pandas, matplotlib and seaborn.
- `%matplotlib inline` is a magic command that renders each figure inline in the notebook (instead of displaying a dump of the figure object).

In [None]:
# List every file available under /kaggle/input to locate the dataset CSV
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [None]:
# Load the dataset into a pandas DataFrame (df) and display it to inspect its contents

df = pd.read_csv('/kaggle/input/qualityoflife2021dataset/QualityOfLife.csv')
df

In [None]:
df.shape

In [None]:
# Checking for null values in the columns

df.isna().sum()

In [None]:
# Checking for data types

df.dtypes

In [None]:
# Cardinality / distinct count for all columns in pandas dataframe

df.apply(pd.Series.nunique)

In [None]:
df['Purchasing Power Index'].max()

## Converting to Percentages

* To make the indices comparable in the analysis that follows, each one is converted to a percentage.
* The highest value in each column is treated as 100%, and every other value in that column is scaled proportionally.
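The scaling rule described above can be sketched on a tiny made-up table. The values and the helper name `scale_to_percent_of_max` are assumptions for illustration only; they are not part of the dataset or of the notebook's own code.

```python
import pandas as pd

# Toy values, made up for illustration; not taken from the dataset.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Safety Index": [40.0, 60.0, 80.0],
    "Pollution Index": [25.0, 50.0, 100.0],
})

def scale_to_percent_of_max(frame, columns):
    """Return a copy in which each listed column is rescaled so that its maximum is 100."""
    scaled = frame.copy()
    # .max() gives one maximum per column; the division aligns on column names.
    scaled[columns] = scaled[columns] * 100 / scaled[columns].max()
    return scaled

toy_pct = scale_to_percent_of_max(toy, ["Safety Index", "Pollution Index"])
print(toy_pct)
```

Because each column is divided by its maximum only, the smallest value in a column does not map to 0%; it keeps its proportion relative to the largest value.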

In [None]:
# Sort by Quality of Life Index (sort_values returns a new DataFrame, so df is unchanged)
df2 = df.sort_values(by='Quality of Life Index', ascending=True)

# Rescale each index so that its column maximum becomes 100%
index_columns = [
    'Quality of Life Index',
    'Purchasing Power Index',
    'Safety Index',
    'Health Care Index',
    'Cost of Living Index',
    'Property Price to Income Ratio',
    'Traffic Commute Time Index',
    'Pollution Index',
    'Climate Index',
]
for column in index_columns:
    df2[column] = (df2[column] * 100) / df2[column].max()
df2

In [None]:
plt.figure(figsize = (10, 20))
sns.barplot(x = 'Quality of Life Index', y = 'Country', data = df2, orient = 'h')
plt.xlabel("Quality of Life Index (%)")
plt.ylabel("Country");

Switzerland appears to have the highest Quality of Life Index, and Nigeria the lowest.

In [None]:
filtered_results = df2[(df2['Country']) == 'Switzerland']
filtered_results1 = filtered_results.T
filtered_results2 = filtered_results1.reset_index(drop=False, inplace = False)
filtered_results3 = filtered_results2.iloc[2:]
filtered_results3.columns = ['Indices', 'Percentage']
plt.figure(figsize = (10, 5))
sns.barplot(x = 'Percentage', y = 'Indices', data = filtered_results3, orient = 'h')
plt.xlabel("Percentage")
plt.ylabel("Switzerland");

## Function to plot bar graph of a specified country

In [None]:
def plotstats(country_name):
    """Plot all indices of one country (as % of each column's maximum) as horizontal bars."""
    # Select the country's row and transpose it so that each column becomes a row
    filtered_results = df2[df2['Country'] == country_name]
    filtered_results1 = filtered_results.T
    filtered_results2 = filtered_results1.reset_index(drop=False)
    # Drop the first two rows so that only the index values remain
    filtered_results3 = filtered_results2.iloc[2:]
    filtered_results3.columns = ['Indices', 'Percentage']
    plt.figure(figsize = (10, 5))
    sns.barplot(x = 'Percentage', y = 'Indices', data = filtered_results3, orient = 'h')
    plt.xlabel('Indices in Percentage')
    plt.ylabel(f'{country_name} Indices');

plotstats('Switzerland')

In [None]:
plotstats('Nigeria')

In [None]:
plotstats('India')
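The reshaping step inside `plotstats` can also be separated from the plotting, which makes it easier to check on its own. The following is a minimal sketch on made-up data: the helper name `country_indices` is hypothetical, and the leading column name `Rank` is an assumption (this notebook only confirms that a `Country` column exists). It uses `melt` instead of transpose plus `iloc[2:]`, so it is an alternative to the function above rather than a copy of it.

```python
import pandas as pd

# Toy frame, made up for illustration; "Rank" is an assumed column name.
toy = pd.DataFrame({
    "Rank": [1, 2],
    "Country": ["A", "B"],
    "Safety Index": [100.0, 50.0],
    "Pollution Index": [20.0, 100.0],
})

def country_indices(frame, country_name, id_columns=("Rank", "Country")):
    """Return one row per index for the given country, in long format."""
    row = frame[frame["Country"] == country_name]
    if row.empty:
        # Fail early instead of producing an empty plot later.
        raise ValueError(f"No row found for country {country_name!r}")
    # Drop the identifier columns by name, then turn the remaining columns into rows.
    return row.drop(columns=list(id_columns)).melt(
        var_name="Indices", value_name="Percentage"
    )

result = country_indices(toy, "A")
print(result)
```

Dropping the identifier columns by name avoids relying on their position, and the explicit check turns a misspelled country name into an error instead of an empty chart.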

**Note:** It is also interesting to observe that less developed countries generally have higher pollution indices and lower health care and safety indices.

## Correlation between the variables

In [None]:
plt.figure(figsize = (10, 10))
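# Correlate numeric columns only; 'Country' holds strings and cannot be correlated
sns.heatmap(df2.select_dtypes(include='number').corr(method='pearson'), annot=True, cmap='coolwarm');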
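As a reminder of what each cell of the heatmap contains, the Pearson coefficient of two columns is their covariance divided by the product of their standard deviations. The sketch below computes it by hand on made-up values and compares it with the pandas result. The toy numbers and the helper name `pearson` are assumptions for illustration, not values from the dataset.

```python
import numpy as np
import pandas as pd

# Toy values, made up for illustration; not taken from the dataset.
toy = pd.DataFrame({
    "Safety Index": [1.0, 2.0, 3.0, 4.0],
    "Pollution Index": [8.0, 6.0, 4.0, 3.0],
})

def pearson(x, y):
    """Pearson correlation: centered cross-product over the product of centered norms."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return numerator / denominator

manual = pearson(toy["Safety Index"], toy["Pollution Index"])
from_pandas = toy.corr(method="pearson").loc["Safety Index", "Pollution Index"]
print(manual, from_pandas)
```

The coefficient always lies between -1 and 1; a negative value, as in this toy example, means one column tends to fall as the other rises.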

## Future scope

A natural next step would be to build a machine learning model that predicts the Quality of Life Index for countries where it has not been measured but the other indices have.