# Chennai's quest to quench its thirst

### Chennai & its water sources

Chennai, also known as Madras, is the capital of the Indian state of Tamil Nadu. Located on the Coromandel Coast off the Bay of Bengal, it is the biggest cultural, economic and educational centre of south India. With a population close to 9 million, it is the 36th largest urban area by population in the world.

Chennai is entirely dependent on ground water resources to meet its water needs. These resources are replenished by rain water, and the city's average rainfall is 1,276 mm.

Following are the major sources of water supply for Chennai city.


1. Four major reservoirs in Red Hills, Cholavaram, Poondi and Chembarambakkam
2. Cauvery water from Veeranam lake
3. Desalination plants at Nemelli and Minjur
4. Aquifers in Neyveli, Minjur and Panchetty
5. Tamaraipakkam, Poondi and Minjur Agriculture wells
6. CMWSSB Borewells
7. Retteri lake

The list above is also roughly in descending order of contribution to the city's overall fresh water requirements. In addition, people use borewells and private tankers for their water needs.

Chennai is facing an acute water shortage because of deficient rainfall over the past three years (and we had one of the worst floods in history the year before that!). As a result, the water in these sources is depleting along with the groundwater level. This [video](https://www.youtube.com/watch?v=iaG7kRcSxwA&feature=youtu.be) gives an idea of the current state.

### Content
This dataset has details about the water availability in the following four main reservoirs over the last 15 years.
All the measurements are in mcft (million cubic feet).

 1. Poondi
 2. Cholavaram
 3. Redhills
 4. Chembarambakkam
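
Since mcft is not an everyday unit, a small conversion helper can make the numbers easier to relate to. The sketch below is not part of the original notebook; the function name `mcft_to_million_litres` is an illustrative choice, and the only fact it relies on is the exact definition 1 foot = 0.3048 m.

```python
# 1 foot is defined as exactly 0.3048 m, so 1 cubic foot = 0.3048**3 cubic metres.
CUBIC_METRES_PER_CUBIC_FOOT = 0.3048 ** 3
LITRES_PER_CUBIC_METRE = 1000.0


def mcft_to_million_litres(mcft: float) -> float:
    """Convert a volume in million cubic feet (mcft) to million litres.

    Both units carry the same "million" prefix, so only the
    cubic-foot-to-litre factor is applied.
    """
    return mcft * CUBIC_METRES_PER_CUBIC_FOOT * LITRES_PER_CUBIC_METRE


one_mcft = mcft_to_million_litres(1.0)
print(f"1 mcft is roughly {one_mcft:.2f} million litres")
```

In other words, one mcft is a little over 28 million litres.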


In this notebook, let us explore the data of different water resources available.

**Source**: To find dataset, please click [here](https://www.kaggle.com/sudalairajkumar/chennai-water-management)

### To visualize plots in this notebook please click [here]()

## Import libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
%matplotlib inline

import plotly.offline as py
import plotly.express as px  # Plotly Express ships inside the plotly package (plotly >= 4)
from plotly.subplots import make_subplots
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go

## Read the data

Firstly, we have data about the water availability in four major reservoirs that supply water to Chennai. This data spans from 2004 to 2019. All the measurements are in mcft (million cubic feet). Let us look at the top few lines.

In [2]:
df = pd.read_csv('../data/chennai_reservoir_levels.csv', parse_dates=['Date'], dayfirst=True)
df.head()

| | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM |
|---|---|---|---|---|---|
| 0 | 2004-01-01 | 3.9 | 0.0 | 268.0 | 0.0 |
| 1 | 2004-01-02 | 3.9 | 0.0 | 268.0 | 0.0 |
| 2 | 2004-01-03 | 3.9 | 0.0 | 267.0 | 0.0 |
| 3 | 2004-01-04 | 3.9 | 0.0 | 267.0 | 0.0 |
| 4 | 2004-01-05 | 3.8 | 0.0 | 267.0 | 0.0 |


In [3]:
df.dtypes

Date               datetime64[ns]
POONDI                    float64
CHOLAVARAM                float64
REDHILLS                  float64
CHEMBARAMBAKKAM           float64
dtype: object

In [4]:
df.isnull().sum()

Date               0
POONDI             0
CHOLAVARAM         0
REDHILLS           0
CHEMBARAMBAKKAM    0
dtype: int64

## Find out and compare the water levels of the 4 major reservoirs over a period of time



In [5]:
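# One title per subplot, in row-major order to match the traces added below
fig = make_subplots(rows=2, cols=2,
                    subplot_titles=['Poondi Reservoir (in mcft)',
                                    'Redhills Reservoir (in mcft)',
                                    'Chembarambakkam Reservoir (in mcft)',
                                    'Cholavaram Reservoir (in mcft)'])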

fig.add_trace(go.Scatter(x=df.Date, y=df.POONDI, name='Poondi'), row=1, col=1)
fig.add_trace(go.Scatter(x=df.Date, y=df.REDHILLS, name='Redhills'), row=1, col=2)
fig.add_trace(go.Scatter(x=df.Date, y=df.CHEMBARAMBAKKAM, name='Chembarambakkam'), row=2, col=1)
fig.add_trace(go.Scatter(x=df.Date, y=df.CHOLAVARAM, name='Cholavaram'), row=2, col=2)

fig.update_layout(title_text=f"Water availability of Chennai's four major water reservoirs ({df.Date.dt.year.min()} - {df.Date.dt.year.max()})")
fig.show()

Plotly Express is easier to use here, but to draw subplots (facets) with it we first need to convert the dataframe to tidy form, where each row represents one observation.

In [6]:
df_tidy = df.melt(id_vars='Date', var_name='Reservoir', value_name='WaterLevel')
df_tidy.head()

| | Date | Reservoir | WaterLevel |
|---|---|---|---|
| 0 | 2004-01-01 | POONDI | 3.9 |
| 1 | 2004-01-02 | POONDI | 3.9 |
| 2 | 2004-01-03 | POONDI | 3.9 |
| 3 | 2004-01-04 | POONDI | 3.9 |
| 4 | 2004-01-05 | POONDI | 3.8 |


In [7]:
fig = px.line(df_tidy,
       x='Date',
       y='WaterLevel',
       facet_col='Reservoir',
       facet_col_wrap=1,
       color='Reservoir',
       height=1100, width=800,
       title=f"Water availability of Chennai's four major water reservoirs ({df.Date.dt.year.min()} - {df.Date.dt.year.max()})"
       )

fig.update_yaxes(matches=None)
fig.show()

**Inference:**

* We can clearly see that every year has a depletion phase and a replenishment phase (mainly from October to December).
* There was a severe water scarcity phase in 2004.
* There was another bad phase during 2014-15, but water was still available in two reservoirs (Redhills and Chembarambakkam), which saved the situation.
* Coming to recent times, the data shows that there is no water available in any of the four major reservoirs.
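
One way to check the yearly depletion and replenishment cycle numerically is to average the levels by calendar month across all years. The following is a minimal sketch, not part of the original notebook: `monthly_mean_levels` is a hypothetical helper, and the `toy` frame is synthetic data that only mimics the column layout of the reservoir file.

```python
import numpy as np
import pandas as pd


def monthly_mean_levels(levels: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Average every remaining column by calendar month (1-12) across all years."""
    month = levels[date_col].dt.month.rename("Month")
    return levels.drop(columns=date_col).groupby(month).mean()


# Synthetic stand-in (NOT real data): low levels until September, high from October.
dates = pd.date_range("2004-01-01", "2005-12-31", freq="D")
toy = pd.DataFrame({
    "Date": dates,
    "POONDI": np.where(dates.month >= 10, 100.0, 20.0),
})

climatology = monthly_mean_levels(toy)
print(climatology)
```

Applied to the real `df`, a rise in the October to December rows would support the replenishment pattern described above.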

## Combine the major water reservoirs to get a better picture and note down your observations

In [8]:
df['Total'] = df.drop(columns='Date').sum(axis=1)
df.head()

| | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM | Total |
|---|---|---|---|---|---|---|
| 0 | 2004-01-01 | 3.9 | 0.0 | 268.0 | 0.0 | 271.9 |
| 1 | 2004-01-02 | 3.9 | 0.0 | 268.0 | 0.0 | 271.9 |
| 2 | 2004-01-03 | 3.9 | 0.0 | 267.0 | 0.0 | 270.9 |
| 3 | 2004-01-04 | 3.9 | 0.0 | 267.0 | 0.0 | 270.9 |
| 4 | 2004-01-05 | 3.8 | 0.0 | 267.0 | 0.0 | 270.8 |


In [9]:
px.line(df,
       x='Date',
       y='Total',
       title='Total water availability from all four reservoirs (in mcft)'
       )

**Inference:**

* The water availability trended slightly upward from 2004 to 2012, but the trend goes down after that.
* 2013-2015 had the least water availability after 2004.
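
The long-term trend is easier to read once the seasonal swings are smoothed out, for example with a time-based rolling mean. This is a sketch under assumptions rather than the notebook's own method: `rolling_mean_by_time` is a hypothetical helper, and the demonstration uses a tiny synthetic frame with a 3-day window so the result can be checked by hand. For the real data a window such as `"365D"` would be the natural choice.

```python
import pandas as pd


def rolling_mean_by_time(frame: pd.DataFrame,
                         value_col: str = "Total",
                         date_col: str = "Date",
                         window: str = "365D") -> pd.Series:
    """Rolling mean of value_col over a trailing time window ending at each date."""
    series = frame.set_index(date_col)[value_col].sort_index()
    return series.rolling(window).mean()


# Synthetic stand-in (NOT real data): ten days with values 0.0 ... 9.0.
toy = pd.DataFrame({
    "Date": pd.date_range("2004-01-01", periods=10, freq="D"),
    "Total": [float(v) for v in range(10)],
})

smoothed = rolling_mean_by_time(toy, window="3D")
print(smoothed.tail(3))
```

The smoothed series can be plotted with `px.line` in the same way as the raw totals.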

## Rainfall Levels in Reservoir Regions

Now there are two clear facts:

* There is no water in any of the major reservoirs.
* Water in the reservoirs depends on rain for replenishment.

Next, we can look at the rainfall data in these reservoir regions to identify the rainy months. Let us take the total monthly rainfall in these reservoir regions and plot it.

* Read the data

* Combine the rainfall data for major reservoirs

* Plot the rainfall data

* Note down your observation

**Note** - Hover over the graph for a more detailed view.

In [10]:
rain_df = pd.read_csv('../data/chennai_reservoir_rainfall.csv', parse_dates=['Date'], dayfirst=True)
rain_df.head()

| | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM |
|---|---|---|---|---|---|
| 0 | 2004-01-01 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2004-01-02 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2004-01-03 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2004-01-04 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2004-01-05 | 0.0 | 0.0 | 0.0 | 0.0 |


In [11]:
rain_df.dtypes

Date               datetime64[ns]
POONDI                    float64
CHOLAVARAM                float64
REDHILLS                  float64
CHEMBARAMBAKKAM           float64
dtype: object

In [12]:
fig = px.line(rain_df.melt(id_vars='Date', var_name='Reservoir', value_name='Rainfall'),
       x='Date',
       y='Rainfall',
       facet_col='Reservoir',
       facet_col_wrap=2,
       color='Reservoir',
       title='Daily rainfall in Chennai - In all four reservoirs'
       )

fig.update_yaxes(matches=None)
fig.show()

In [13]:
# Truncate each date to the first day of its month
rain_df['YearMonth'] = rain_df.Date.dt.to_period('M').dt.to_timestamp()
rain_df.head()

| | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM | YearMonth |
|---|---|---|---|---|---|---|
| 0 | 2004-01-01 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 |
| 1 | 2004-01-02 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 |
| 2 | 2004-01-03 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 |
| 3 | 2004-01-04 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 |
| 4 | 2004-01-05 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 |


In [14]:
rain_df['Total'] = rain_df.drop(columns=['Date', 'YearMonth']).sum(axis=1)
rain_df.head()

| | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM | YearMonth | Total |
|---|---|---|---|---|---|---|---|
| 0 | 2004-01-01 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 |
| 1 | 2004-01-02 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 |
| 2 | 2004-01-03 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 |
| 3 | 2004-01-04 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 |
| 4 | 2004-01-05 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 |


In [15]:
rain_df_monthly = rain_df.groupby('YearMonth')['Total'].sum().reset_index()
rain_df_monthly.head()

| | YearMonth | Total |
|---|---|---|
| 0 | 2004-01-01 | 111.0 |
| 1 | 2004-02-01 | 0.0 |
| 2 | 2004-03-01 | 0.0 |
| 3 | 2004-04-01 | 26.0 |
| 4 | 2004-05-01 | 906.0 |


In [16]:
def get_season(dt: pd.Timestamp) -> str:
    """
    Helper function to map a timestamp's month to a season.
    
    Args:
        dt (pd.Timestamp): A timestamp
    
    Returns:
        season (str): A name of the season
    """
    if 1<=dt.month<=2:
        season = 'Winter'
    elif 3<=dt.month<=5:
        season = 'Summer'
    elif 6<=dt.month<=9:
        season = 'Monsoon'
    else:
        season = 'Post-Monsoon'
    
    return season

In [18]:
rain_df_monthly['Season'] = rain_df_monthly['YearMonth'].apply(get_season)
rain_df_monthly.head()

| | YearMonth | Total | Season |
|---|---|---|---|
| 0 | 2004-01-01 | 111.0 | Winter |
| 1 | 2004-02-01 | 0.0 | Winter |
| 2 | 2004-03-01 | 0.0 | Summer |
| 3 | 2004-04-01 | 26.0 | Summer |
| 4 | 2004-05-01 | 906.0 | Summer |


In [19]:
px.bar(rain_df_monthly,
      x='YearMonth',
      y='Total',
      color = 'Season',
      title='Monthly rainfall in all four reservoir regions (per season) - in mm'
      )

**Inferences:**

* The city gets some rain in June, July, August and September from the south-west monsoon.
* Most of the rainfall happens during October and November every year, from the north-east monsoon.
* In the initial years, rain from the north-east monsoon was much higher than from the south-west monsoon. In the last few years the two look similar (a reduction in rain from the north-east monsoon).
* In 2016, the city seems to have received its highest rainfall in the monsoon season, which could have caused a flood.
* We got some good rains in August and September 2019, but the reservoir levels are yet to go up.
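
The comparison between the two monsoons can be made explicit by totalling rainfall per year and season and taking the ratio. This is a rough sketch, not the notebook's own analysis: `seasonal_totals` is a hypothetical helper, the `toy` frame is synthetic, and it assumes the season labels used above, with `Monsoon` (June to September) standing for the south-west monsoon and `Post-Monsoon` (October to December) for the north-east monsoon.

```python
import pandas as pd


def seasonal_totals(monthly: pd.DataFrame) -> pd.DataFrame:
    """Pivot monthly totals into one row per year and one column per season."""
    with_year = monthly.assign(Year=monthly["YearMonth"].dt.year)
    return with_year.pivot_table(index="Year", columns="Season",
                                 values="Total", aggfunc="sum", fill_value=0)


# Synthetic stand-in (NOT real data) with the same columns as rain_df_monthly.
toy = pd.DataFrame({
    "YearMonth": pd.to_datetime(["2004-06-01", "2004-11-01",
                                 "2018-07-01", "2018-11-01"]),
    "Total": [100.0, 300.0, 150.0, 150.0],
    "Season": ["Monsoon", "Post-Monsoon", "Monsoon", "Post-Monsoon"],
})

by_season = seasonal_totals(toy)
# Ratio above 1 means the north-east monsoon dominated that year.
ratio = by_season["Post-Monsoon"] / by_season["Monsoon"]
print(ratio)
```

On the real `rain_df_monthly`, a ratio drifting towards 1 in later years would be consistent with the third observation above.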

## Plot the yearly rainfall data and note your observations

In [20]:
rain_df['Year'] = pd.to_datetime(rain_df.Date.dt.year.astype(str), format='%Y')
rain_df.head()

| | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM | YearMonth | Total | Year |
|---|---|---|---|---|---|---|---|---|
| 0 | 2004-01-01 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 | 2004-01-01 |
| 1 | 2004-01-02 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 | 2004-01-01 |
| 2 | 2004-01-03 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 | 2004-01-01 |
| 3 | 2004-01-04 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 | 2004-01-01 |
| 4 | 2004-01-05 | 0.0 | 0.0 | 0.0 | 0.0 | 2004-01-01 | 0.0 | 2004-01-01 |


In [21]:
px.bar(rain_df.groupby('Year')['Total'].sum().reset_index(),
       x='Year',
       y='Total',
       title='Total yearly rainfall in all four reservoir regions - in mm'
      )

**Inferences:**

* The amount of rainfall in 2018 is the lowest of all the years from 2004.
* The highest rainfall years are 2005 and 2015.
* We are getting some good rains so far in 2019. Hopefully this continues.

## Water shortage estimation

Since all the data is available in the public domain, let us do some analysis and see whether we can estimate this water shortage ahead of time so as to plan for it.

First, let us take a simple step and compare the total water level at the beginning of summer (let us take February 1st of every year). There will not be any replenishment until the next monsoon, so the amount of water stored in the four reservoirs is itself a clear indicator of how long the water can last through summer and whether backup plans are needed.

In [22]:
px.bar(df[(df.Date.dt.month == 2) & (df.Date.dt.day == 1)],  # rows for February 1st of each year
       x='Date',
       y='Total',
       title='Availability of total reservoir water (4 major ones) at the beginning of summer'
      )

**Inferences:**

* This clearly indicates that there is not enough water in the reservoirs at the beginning of summer 2019 to cope with the needs of the city. In fact, this is the second worst level after 2004 (it is also important to note that the city has grown a lot bigger from 2004 to 2019).

* The city had just 1000 mcft of water at the beginning of the summer, which is much worse than the 2017 level of 1500 mcft. So just by looking at the very low water level, the water scarcity could have been forecast without even computing the daily consumption.
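
If a daily consumption figure were available, the storage levels could be turned into a rough "days of supply" estimate. The sketch below only illustrates the arithmetic: `days_of_supply` is a hypothetical helper, and the daily draw of 10 mcft is an assumed placeholder, not a figure from the dataset. It also ignores evaporation, unusable dead storage and any inflow.

```python
def days_of_supply(storage_mcft: float, daily_draw_mcft: float) -> float:
    """Days the stored water would last at a constant daily draw."""
    if daily_draw_mcft <= 0:
        raise ValueError("daily_draw_mcft must be positive")
    return storage_mcft / daily_draw_mcft


# ASSUMPTION: placeholder daily draw, chosen only to make the example concrete.
ASSUMED_DAILY_DRAW_MCFT = 10.0

# Storage figures quoted in the inferences above (approximate).
for year, storage in [(2017, 1500.0), (2019, 1000.0)]:
    days = days_of_supply(storage, ASSUMED_DAILY_DRAW_MCFT)
    print(f"{year}: about {days:.0f} days at the assumed draw")
```

Whatever the true draw is, the 2019 storage lasts two thirds as long as the 2017 storage, which is the point the chart makes visually.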

## Conclusion

* The water scarcity of 2004 brought in Veeranam lake as a new source of water supply for the city.

* Hopefully, the current scarcity (July 2019) will bring additional sources of water for the ailing city. The city has grown a lot in the last 15 years and so needs additional water resources to meet its needs.

* The city needs to devise better scarcity control methods by estimating the needs ahead of time.
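
As one possible starting point for such an early warning, the February 1st storage of each year could be compared against a low quantile of the historical values. This is an illustrative sketch only: `flag_low_storage_years` and the 25% cut-off are assumptions, and the `toy` series is synthetic apart from the approximate 2017 and 2019 figures mentioned earlier.

```python
import pandas as pd


def flag_low_storage_years(feb1_storage: pd.Series, quantile: float = 0.25) -> pd.Series:
    """Mark years whose storage falls below the given quantile of all years."""
    threshold = feb1_storage.quantile(quantile)
    return feb1_storage < threshold


# Mostly synthetic values in mcft, indexed by year (NOT the real series).
toy = pd.Series(
    [2500.0, 1800.0, 6000.0, 1500.0, 4500.0, 1000.0],
    index=[2014, 2015, 2016, 2017, 2018, 2019],
    name="Total",
)

flags = flag_low_storage_years(toy)
low_years = flags[flags].index.tolist()
print(low_years)
```

A production version would compute the threshold only from years before the one being assessed, so that the warning is genuinely ahead of time.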

## Extra Activity

#### Can you think of a similar large-scale urban problem with a real-time effect that you would like to analyze and solve with the help of data? Note it down, and break down the possible ways and steps to solve it.

* Delhi air pollution 
    - High in winter or monsoon?
    - Impact of agricultural activities? 
    - Compare with other cities like Beijing, Singapore, etc.?
    
* US accidents dataset
    - https://www.kaggle.com/sobhanmoosavi/us-accidents

* COVID-19 dataset
    - https://www.kaggle.com/sudalairajkumar/covid19-in-india?select=StatewiseTestingDetails.csv
