# Chennai's quest to quench its thirst

### Chennai & its water sources

Chennai, also known as Madras, is the capital of the Indian state of Tamil Nadu. Located on the Coromandel Coast off the Bay of Bengal, it is the biggest cultural, economic and educational centre of south India. With a population close to 9 million, it is the 36th largest urban area in the world by population.

Chennai is entirely dependent on ground water resources to meet its water needs. These resources are replenished by rain water, and the city's average annual rainfall is 1,276 mm.

Following are the major sources of water supply for Chennai city.

1. Four major reservoirs in Red Hills, Cholavaram, Poondi and Chembarambakkam
2. Cauvery water from Veeranam lake
3. Desalination plants at Nemmeli and Minjur
4. Aquifers in Neyveli, Minjur and Panchetty
5. Agricultural wells at Tamaraipakkam, Poondi and Minjur
6. CMWSSB borewells
7. Retteri lake

The list above is also roughly in descending order of contribution to the city's overall fresh water requirements. In addition, people use borewells and private tankers for their water needs.

Chennai is facing an acute water shortage due to deficient rainfall over the past three years (and we had one of the worst floods in history the year before that!). As a result, the water in these sources is depleting, along with the groundwater level. This [video](https://www.youtube.com/watch?v=iaG7kRcSxwA&feature=youtu.be) gives an idea of the current state.

### Content
This dataset has details about the water availability in the four main reservoirs over the last 15 years. All the measurements are in mcft (million cubic feet).

- Poondi
- Cholavaram
- Redhills
- Chembarambakkam
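
Since mcft is not an everyday unit, a small conversion helper can make the numbers easier to relate to. The sketch below is not part of the dataset; the function name `mcft_to_million_litres` is made up for illustration, and the only fact it relies on is that one foot is exactly 0.3048 m.

```python
# Conversion factors (the international foot is defined as exactly 0.3048 m).
CUBIC_FEET_TO_CUBIC_METRES = 0.3048 ** 3
LITRES_PER_CUBIC_METRE = 1000


def mcft_to_million_litres(mcft: float) -> float:
    """Convert a volume in million cubic feet (mcft) to million litres."""
    cubic_metres = mcft * 1_000_000 * CUBIC_FEET_TO_CUBIC_METRES
    return cubic_metres * LITRES_PER_CUBIC_METRE / 1_000_000


# 1 mcft is roughly 28.32 million litres.
print(round(mcft_to_million_litres(1), 2))  # 28.32
```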


In this notebook, let us explore the data of different water resources available.

In [None]:
import glob
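# glob returns files in arbitrary order; sort so that csv_list[0] and csv_list[1] are stable across runs
csv_list = sorted(glob.glob('/kaggle/input/**/*'))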
csv_list

In [None]:
!ls /kaggle/input/**/*

## Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

## Read the data

Firstly, we have data about the water availability in four major reservoirs that supply water to Chennai. This data spans from 2004 to 2019. All the measurements are in mcft (million cubic feet). Let us look at the top few lines.

In [None]:
df = pd.read_csv(csv_list[0], 
                 parse_dates=['Date'], 
                 dayfirst=True
                )
#df['Date'] = pd.to_datetime(df.Date, format='%d-%m-%Y')
df.head()
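
To see why the day-first handling matters, here is a minimal sketch on two made-up date strings. It assumes the file stores dates as `DD-MM-YYYY`, which is what the commented-out `format='%d-%m-%Y'` line suggests; the strings themselves are hypothetical.

```python
import pandas as pd

# Hypothetical raw strings in DD-MM-YYYY form.
raw = pd.Series(["01-03-2019", "02-03-2019"])

# Day-first parsing: both dates fall in March.
day_first = pd.to_datetime(raw, format="%d-%m-%Y")
print(day_first.dt.month.tolist())  # [3, 3]

# Month-first parsing of the same strings silently gives January and February.
month_first = pd.to_datetime(raw, format="%m-%d-%Y")
print(month_first.dt.month.tolist())  # [1, 2]
```

Passing an explicit `format` is a little stricter than `dayfirst=True`, because a string that does not match raises an error instead of being guessed.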

In [None]:
df.dtypes

In [None]:
df.isna().sum()

## Find out and compare the water levels of the 4 major reservoirs over a period of time



In [None]:
px.line(df.melt(id_vars='Date', var_name='Reservoir', value_name='Water_Level'),
       x='Date',
       y='Water_Level',
       color='Reservoir',
       facet_col='Reservoir',
       facet_col_wrap=2,
       title=f"Water availability in Chennai's four major reservoirs {df.Date.dt.year.min()}-{df.Date.dt.year.max()}")

## Insights
- Water availability follows a seasonal pattern of increase and decrease
- Low water availability around 2004 and 2015
- Interestingly high availability during Sep-Dec 2020, possibly due to lockdown and low industrial use?
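
One way to check the seasonal pattern numerically is to average the storage by calendar month. The sketch below uses synthetic data (a made-up cosine curve in a hypothetical `demo` frame), not the reservoir file, so the numbers only illustrate the technique.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: high around the turn of the year, low mid-year.
dates = pd.date_range("2010-01-01", "2012-12-31", freq="D")
day_index = dates.dayofyear.to_numpy() - 1
level = 2000 + 1500 * np.cos(2 * np.pi * day_index / 365.25)
demo = pd.DataFrame({"Date": dates, "total": level})

# Average storage for each calendar month across all years.
monthly_mean = demo.groupby(demo.Date.dt.month).total.mean()
monthly_mean.index.name = "Month"
print(monthly_mean.round(0))
```

Applied to the real `df`, the same `groupby` on `df.Date.dt.month` would show which months typically hold the most and the least water.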


## Combine the major water reservoirs to get a better picture and note down your observations

In [None]:
df.POONDI + df.CHEMBARAMBAKKAM + df.CHOLAVARAM + df.REDHILLS

In [None]:
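# Sum the four reservoirs by name so that re-running this cell does not add 'total' into itself
df['total'] = df[['POONDI', 'CHOLAVARAM', 'REDHILLS', 'CHEMBARAMBAKKAM']].sum(axis=1)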
df.head()

In [None]:
px.line(df,
       x='Date',
       y='total',
       title=f'water availability in Chennai {df.Date.dt.year.min()}-{df.Date.dt.year.max()}')

## Rainfall Levels in Reservoir Regions

Now there are two clear facts:

- There is no water in any of the major reservoirs
- Reservoirs depend on rain for their replenishment.

### Next we can look at the rainfall data in these reservoir regions to analyze the rainfall months. Let us take the total monthly rainfall in these reservoir regions and plot the same.

* Read the data

* Combine the rainfall data for major reservoirs

* Plot the rainfall data

* Note down your observation

Note: hover over the graph to see the values in more detail.

In [None]:
rain_df = pd.read_csv(csv_list[1],parse_dates=['Date'], dayfirst=True)
rain_df.head()

In [None]:
rain_df.dtypes

In [None]:
rain_df.isna().sum()

## Daily rainfall

In [None]:
px.line(rain_df.melt(id_vars='Date', var_name='Reservoir', value_name='Rainfall'),
       x='Date',
       y='Rainfall',
       color='Reservoir',
       facet_col='Reservoir',
       facet_col_wrap=2,
       title=f"Daily rainfall in Chennai's four major reservoirs {rain_df.Date.dt.year.min()}-{rain_df.Date.dt.year.max()}")

## Monthly rainfall

In [None]:
## get daily total 
rain_df['total'] = rain_df.drop(columns='Date').sum(axis=1)

rain_df.head()

In [None]:
## add a month-start timestamp for each row; it is used below to group by month

rain_df['YearMonth'] = rain_df.Date.dt.to_period('M').dt.to_timestamp()
rain_df.head()

In [None]:
rain_df.YearMonth.value_counts()

In [None]:
monthly = rain_df.groupby([rain_df.Date.dt.year,rain_df.Date.dt.month]).total.sum()
monthly.index = monthly.index.set_names('Month',level=1).set_names('Year',level=0)
monthly.reset_index()

In [None]:
rain_df.Date.dt.isocalendar().week

In [None]:
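# Pair ISO week numbers with the ISO year: early-January days can belong to the last ISO week of the previous year
iso = rain_df.Date.dt.isocalendar()
weekly = rain_df.groupby([iso.year, iso.week]).total.sum()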
weekly.index = weekly.index.set_names('Week',level=1).set_names('Year',level=0)
weekly.reset_index()
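
A note on ISO weeks: the week numbers returned by `isocalendar()` are defined relative to the ISO year, which can differ from the calendar year for a few days around New Year. The small sketch below, on three hand-picked dates, shows the difference; it is an illustration and does not use the rainfall data.

```python
import pandas as pd

# Three dates around a year boundary.
dates = pd.Series(pd.to_datetime(["2015-12-31", "2016-01-01", "2016-01-04"]))
iso = dates.dt.isocalendar()

comparison = pd.DataFrame({
    "date": dates,
    "calendar_year": dates.dt.year,
    "iso_year": iso.year,
    "iso_week": iso.week,
})
# 1 January 2016 has calendar year 2016 but belongs to ISO week 53 of ISO year 2015.
print(comparison)
```

Grouping by calendar year together with ISO week would therefore mix a few early-January days into the last week of the same calendar year, so the ISO year is the safer companion for the ISO week.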

In [None]:
## TODO: use multiple columns on the x-axis
px.bar(weekly.reset_index(),
       x='Week',
       y='total',
       facet_col='Year',
       facet_col_wrap=4,
       title=f'Weekly rainfall in Chennai {rain_df.Date.dt.year.min()}-{rain_df.Date.dt.year.max()}',
       height=1500)

In [None]:
monthly_df = rain_df.groupby('YearMonth').total.sum().reset_index()
monthly_df.head()

In [None]:
px.bar(monthly_df,
       x='YearMonth',
       y='total',
       title=f'Monthly rainfall in Chennai {rain_df.Date.dt.year.min()}-{rain_df.Date.dt.year.max()}')

In [None]:
month_to_season = {
    1: 'winter', 2: 'winter',
    3: 'summer', 4: 'summer', 5: 'summer',
    6: 'monsoon', 7: 'monsoon', 8: 'monsoon', 9: 'monsoon',
    10: 'post-monsoon', 11: 'post-monsoon',
    12: 'winter',
}

In [None]:
monthly_df['season'] = monthly_df.YearMonth.dt.month.map(month_to_season)
monthly_df.head()

In [None]:
px.bar(monthly_df,
       x='YearMonth',
       y='total',
       color='season',
       title=f'Monthly rainfall in Chennai by season {rain_df.Date.dt.year.min()}-{rain_df.Date.dt.year.max()}')

## Insight
- Chennai receives most of its rainfall during the post-monsoon season

## Plot the yearly rainfall data and note your observations

In [None]:
rain_df

In [None]:
px.bar(monthly_df.groupby(monthly_df.YearMonth.dt.year).total.sum().reset_index(),
       x='YearMonth',
       y='total',
       labels={'YearMonth': 'Year'},  # the grouped column holds years, so label it accordingly
       title='Yearly rainfall in Chennai')

In [None]:
monthly_df.groupby([monthly_df.YearMonth.dt.year, 'season']).total.sum().reset_index()

In [None]:
px.bar(monthly_df.groupby([monthly_df.YearMonth.dt.year, 'season']).total.sum().reset_index(),
       x='YearMonth',
       y='total',
       color='season',
       labels={'YearMonth': 'Year'},  # the grouped column holds years, so label it accordingly
       title='Yearly rainfall by season in Chennai')

## Insight
- Chennai receives most of its rainfall during the post-monsoon season
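
To put a number on this insight, each season's share of the yearly total can be computed directly. The sketch below uses made-up monthly totals for a single year in a hypothetical `demo` frame, so the printed percentages are illustrative only; the season mapping is the same one used above.

```python
import pandas as pd

# Hypothetical monthly rainfall totals for one year, for illustration only.
demo = pd.DataFrame({
    "YearMonth": pd.date_range("2018-01-01", periods=12, freq="MS"),
    "total": [20, 5, 5, 15, 50, 60, 100, 130, 140, 280, 370, 150],
})
month_to_season = {1: "winter", 2: "winter", 3: "summer", 4: "summer",
                   5: "summer", 6: "monsoon", 7: "monsoon", 8: "monsoon",
                   9: "monsoon", 10: "post-monsoon", 11: "post-monsoon",
                   12: "winter"}
demo["season"] = demo.YearMonth.dt.month.map(month_to_season)

# Total per (year, season), then divide by the total of the same year.
year = demo.YearMonth.dt.year.rename("Year")
season_total = demo.groupby([year, "season"]).total.sum()
share = season_total / season_total.groupby(level="Year").transform("sum")
print((share * 100).round(1))
```

Running the same steps on `monthly_df` would give the share per season for every year in the data.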

## Water shortage estimation

Since all the data is available in the public domain, we want to analyse it and see whether we can estimate this water shortage ahead of time, so that the city can plan for it.

As a simple first step, let us compare the total water stored at the beginning of summer (March 1st of every year). There will not be any replenishment until the next monsoon, so the amount of water stored in the four reservoirs on that date is a clear indicator of how long the supply can be managed through summer and whether backup plans are needed.

In [None]:
avg_water_avail = df.loc[ (df.Date.dt.month==3) & (df.Date.dt.day==1) ].total.mean()
avg_water_avail

In [None]:
fig = px.bar(df.loc[ (df.Date.dt.month==3) & (df.Date.dt.day==1) ],
       x='Date',
       y='total',
       title='Total availability of water at the beginning of summer',
)
fig.add_hline(y=avg_water_avail,
              annotation_text=f"Avg. water availability {avg_water_avail:0.2f}", 
              #annotation_position="bottom right",
              annotation_font_color="red",
              annotation_font_size=20,
              line_color="green",
              line_dash='dot')
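
The comparison against the average can also be turned into a simple early-warning flag. The sketch below is one possible approach, not an established method: the `march1` values are hypothetical, and the 0.5 threshold is an arbitrary assumption chosen only to show the mechanics.

```python
import pandas as pd

# Hypothetical 1 March storage totals (mcft), for illustration only.
march1 = pd.DataFrame({
    "Date": pd.to_datetime(["2016-03-01", "2017-03-01", "2018-03-01", "2019-03-01"]),
    "total": [9000.0, 1500.0, 5000.0, 500.0],
})

# Express each year's storage as a fraction of the long-run 1 March average.
long_run_mean = march1.total.mean()
march1["ratio_to_mean"] = march1.total / long_run_mean

# Assumed threshold: flag years that start summer with less than half the average.
LOW_STORAGE_RATIO = 0.5
march1["low_storage"] = march1.ratio_to_mean < LOW_STORAGE_RATIO
print(march1)
```

With the real data, the same columns could be added to `df.loc[(df.Date.dt.month==3) & (df.Date.dt.day==1)]`, and the threshold tuned against years known to have had shortages.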

## Activity

### Can you think of a similar large-scale urban problem with a real-time effect that you would like to analyze and solve with the help of data? Note it down, and break down the possible ways and steps to solve it.