# Total population supplied by water supply industry

http://unstats.un.org/

The Environment Statistics Database contains selected water and waste statistics by country. Statistics on water and waste are based on official statistics supplied by national statistical offices and/or ministries of environment (or equivalent institutions) in countries in response to the biennial UNSD/UNEP Questionnaire on Environment Statistics. They were complemented by data on EU and OECD member and partner countries from Eurostat and OECD. With the following two exceptions, every country’s data are sourced from UNSD.
All data are sourced from Eurostat for the following 32 countries: Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, and the United Kingdom of Great Britain and Northern Ireland.
All data are sourced from OECD for the following nine countries: Australia, Canada, Chile, Israel, Japan, Mexico, New Zealand, Republic of Korea, and the United States of America.
The preferred source between Eurostat and OECD is chosen to maximize data availability for users and coverage across UN member states.

Data are often sparse for some variables. The statistics selected here are those of relatively good quality and geographic coverage. The online database currently covers the years 1990 to 2015. For information on definitions, data quality and other important metadata, please check UNSD Environmental Indicator tables.

# Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings

# Options and Settings

In [2]:
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.autolayout'] = True
plt.rcParams['font.size'] = 12
path = Path.cwd()                                          # get current working directory
warnings.simplefilter('ignore')

# Import Data

In [3]:
df = pd.read_csv('Total population supplied by water supply industry.csv')
df

Unnamed: 0,Country or Area,Year,Value,Value Footnotes,Unit
0,Albania,2015,82.0,,%
1,Albania,2014,81.0,,%
2,Algeria,2015,98.0,,%
3,Algeria,2014,98.0,,%
4,Algeria,2013,98.0,,%
...,...,...,...,...,...
896,16,This is data for all 10 districts and it data ...,,,
897,17,Population of Trinidad & tobago taken at 1.3 mil.,,,
898,18,The level of provision of apartments (houses) ...,,,
899,19,Data refer to percentage of households supplie...,,,


# Head and Tail

In [4]:
df = df[:880]                                   # keep the first 880 rows (index 0-879); the rest are footnotes
df

Unnamed: 0,Country or Area,Year,Value,Value Footnotes,Unit
0,Albania,2015,82.0,,%
1,Albania,2014,81.0,,%
2,Algeria,2015,98.0,,%
3,Algeria,2014,98.0,,%
4,Algeria,2013,98.0,,%
...,...,...,...,...,...
875,Yemen,2003,15.9,20,%
876,Yemen,2002,14.8,20,%
877,Yemen,2001,14.4,20,%
878,Yemen,2000,14.3,20,%
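
The slice above relies on knowing that the footnotes start at row 880. As a rough alternative, the sketch below separates data rows from footnote rows by checking whether `Year` parses as a number. The miniature `raw` frame and its footnote text are made up for illustration, and the approach assumes that footnote rows never carry a numeric value in the `Year` column.

```python
import pandas as pd

# Hypothetical miniature of the raw file: two data rows followed by two footnote rows
raw = pd.DataFrame({
    'Country or Area': ['Albania', 'Yemen', '16', '17'],
    'Year': ['2015', '2000', 'Illustrative footnote text A', 'Illustrative footnote text B'],
    'Value': [82.0, 14.3, None, None],
    'Unit': ['%', '%', None, None],
})

# Rows whose Year converts to a number are treated as data; the rest as footnotes
is_data = pd.to_numeric(raw['Year'], errors='coerce').notna()
data = raw[is_data].copy()
footnotes = raw[~is_data]

print(len(data), 'data rows;', len(footnotes), 'footnote rows')
```

This avoids a hard-coded row count, so it should keep working if the file is refreshed with more years, provided the assumption about the `Year` column holds.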


In [5]:
df1 = df[['Country or Area', 'Year', 'Value', 'Unit']].copy()   # copy so later assignments do not act on a view of df

In [6]:
df1['Value'] = df1['Value'].round(2)            # vectorized rounding to two decimal places
df1['Value']

0      82.0
1      81.0
2      98.0
3      98.0
4      98.0
       ... 
875    15.9
876    14.8
877    14.4
878    14.3
879    14.0
Name: Value, Length: 880, dtype: float64

In [7]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 880 entries, 0 to 879
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Country or Area  880 non-null    object 
 1   Year             880 non-null    object 
 2   Value            880 non-null    float64
 3   Unit             880 non-null    object 
dtypes: float64(1), object(3)
memory usage: 27.6+ KB
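
The `info()` output reports `Year` as `object`, most likely because the footnote rows in the raw file forced the column to be read as text. A minimal sketch of converting such a column to integers, using a made-up `sample` frame rather than `df1` itself:

```python
import pandas as pd

# Hypothetical sample with years stored as strings, as in the info() output
sample = pd.DataFrame({'Year': ['2015', '2014', '2013'],
                       'Value': [82.0, 81.0, 98.0]})

# Convert the text years to integers so they sort and plot numerically
sample['Year'] = pd.to_numeric(sample['Year']).astype(int)
sample = sample.sort_values('Year').reset_index(drop=True)

print(sample.dtypes)
```

With an integer year column, time-ordered operations such as sorting or line plots behave as expected instead of relying on string ordering.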


In [8]:
df1.columns

Index(['Country or Area', 'Year', 'Value', 'Unit'], dtype='object')

In [9]:
df1.rename(columns={'Country or Area': 'country_or_area', 'Value': 'pct_sup'}, inplace=True)

df1.columns = [col.lower() for col in df1.columns]
df1.head() 

Unnamed: 0,country_or_area,year,pct_sup,unit
0,Albania,2015,82.0,%
1,Albania,2014,81.0,%
2,Algeria,2015,98.0,%
3,Algeria,2014,98.0,%
4,Algeria,2013,98.0,%


In [10]:
df1.describe(include='object')

Unnamed: 0,country_or_area,year,unit
count,880,880,880
unique,93,24,1
top,Singapore,2012,%
freq,22,60,880


In [11]:
df1.describe(include='number')

Unnamed: 0,pct_sup
count,880.0
mean,83.426955
std,19.90249
min,14.0
25%,74.7975
50%,90.0
75%,99.4
max,100.0


# Descriptive Statistics

In [12]:
min_pop = df1['pct_sup'].min()
max_pop = df1['pct_sup'].max()

# minimum and maximum percentage of population supplied, across all countries and years
print('Minimum and maximum population supplied by water supply industry are {:,} and {:,} respectively'.format(min_pop, max_pop))

Minimum and maximum population supplied by water supply industry are 14.0 and 100.0 respectively
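
The minimum and maximum alone do not say which country and year they belong to. One way to recover that, sketched on a small `sample` frame built from three rows of the data shown earlier (the names `sample`, `lowest` and `highest` are illustrative):

```python
import pandas as pd

# Three rows taken from the tables above, used as a stand-in for df1
sample = pd.DataFrame({
    'country_or_area': ['Albania', 'Algeria', 'Yemen'],
    'year': ['2015', '2015', '2000'],
    'pct_sup': [82.0, 98.0, 14.3],
})

# idxmin / idxmax return the index label of the extreme value,
# which .loc then turns into the full row
lowest = sample.loc[sample['pct_sup'].idxmin()]
highest = sample.loc[sample['pct_sup'].idxmax()]

print(f"Lowest:  {lowest['country_or_area']} ({lowest['year']}) at {lowest['pct_sup']}%")
print(f"Highest: {highest['country_or_area']} ({highest['year']}) at {highest['pct_sup']}%")
```

Note that `idxmin` and `idxmax` return only the first match, so ties (for example several countries at 100%) would need a boolean filter instead.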


# Dataframe Grouping

In [13]:
cnt_area_grp = df1.groupby(['country_or_area'])
cnt_area_grp.head()

Unnamed: 0,country_or_area,year,pct_sup,unit
0,Albania,2015,82.0,%
1,Albania,2014,81.0,%
2,Algeria,2015,98.0,%
3,Algeria,2014,98.0,%
4,Algeria,2013,98.0,%
...,...,...,...,...
865,Yemen,2013,18.6,%
866,Yemen,2012,18.5,%
867,Yemen,2011,18.6,%
868,Yemen,2010,18.6,%
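
Calling `head()` on the grouped object only shows the first five rows of each group. To get one summary row per country, an aggregation is usually more informative. A small sketch, again on a made-up `sample` frame that reuses a few rows from above:

```python
import pandas as pd

# A few rows from the tables above, used as a stand-in for df1
sample = pd.DataFrame({
    'country_or_area': ['Albania', 'Albania', 'Algeria', 'Algeria', 'Algeria'],
    'year': [2015, 2014, 2015, 2014, 2013],
    'pct_sup': [82.0, 81.0, 98.0, 98.0, 98.0],
})

# One row per country: mean, min, max and number of reported years
summary = (sample.groupby('country_or_area')['pct_sup']
                 .agg(['mean', 'min', 'max', 'count']))

print(summary)
```

The `count` column is worth keeping, since the number of reported years differs between countries and a mean over two years is not directly comparable to one over twenty.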


https://www.unwater.org/publication_categories/world-water-development-report/
https://www.wri.org/insights/17-countries-home-one-quarter-worlds-population-face-extremely-high-water-stress

# National Water Stress Rankings 

Extremely High Baseline Water Stress

cols = ['Lebanon', 'Jordan', 'Kuwait', 'Saudi Arabia', 'Bahrain', 'Botswana']

In [14]:
cnt_area_grp.get_group('Saudi Arabia')

Unnamed: 0,country_or_area,year,pct_sup,unit
661,Saudi Arabia,2013,98.0,%
662,Saudi Arabia,2012,98.0,%
663,Saudi Arabia,2011,97.0,%
664,Saudi Arabia,2010,96.0,%
665,Saudi Arabia,2009,96.0,%
666,Saudi Arabia,2008,96.0,%
667,Saudi Arabia,2007,96.0,%
668,Saudi Arabia,2006,96.0,%
669,Saudi Arabia,2005,96.0,%
670,Saudi Arabia,2004,96.0,%
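
Looking up each country with `get_group` works one at a time, but it raises a `KeyError` if a country is absent from the data. A hedged sketch of selecting every listed country in one step and reporting which ones are missing; the `sample` frame is a stand-in for `df1`, so which countries are actually present in the full dataset is not shown here:

```python
import pandas as pd

# Stand-in for df1, using a few rows from the tables above
sample = pd.DataFrame({
    'country_or_area': ['Saudi Arabia', 'Saudi Arabia', 'Yemen'],
    'year': [2013, 2012, 2013],
    'pct_sup': [98.0, 98.0, 18.6],
})

cols = ['Lebanon', 'Jordan', 'Kuwait', 'Saudi Arabia', 'Bahrain', 'Botswana']

# Split the list into countries that appear in the data and those that do not
available = set(sample['country_or_area'])
present = [c for c in cols if c in available]
missing = [c for c in cols if c not in available]

# Select all rows for the listed countries in a single filter
stressed = sample[sample['country_or_area'].isin(cols)]

print('Present:', present)
print('Missing:', missing)
```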
