In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

In [None]:
res_level = pd.read_csv("../input/chennai_reservoir_levels.csv")
res_rfall = pd.read_csv("../input/chennai_reservoir_rainfall.csv")

In [None]:
res_level.head()

In [None]:
res_rfall.head()

In [None]:
res_level.describe()

#### Assumption:

Water levels close to or exactly zero mean that the water has dropped so low that it no longer reaches even the gates of the water reservoir.

The following gives some background on the units used to measure water levels and rainfall.

### Water levels : mcft to tmc unit conversion

1000 mcft = 1 tmc.

#### What are mcft and tmc?
mcft = Million Cubic Feet <br>
tmc = Thousand Million Cubic Feet

<b>tmc</b> is a larger unit than <b>mcft</b>. Since the reservoir level data set has values as low as 0.9, it is not wise to convert these values into tmc, as that would produce very small numbers.
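A minimal sketch of the conversion, using only the ratio stated above (1000 mcft = 1 tmc). The helper name `mcft_to_tmc` is hypothetical and not part of the original notebook; it works on plain floats and, because it is a single division, element-wise on a pandas Series too.

```python
# Hypothetical helper: 1 tmc (thousand million cubic feet) = 1000 mcft (million cubic feet).
MCFT_PER_TMC = 1000.0

def mcft_to_tmc(mcft):
    """Convert a volume in mcft to tmc."""
    return mcft / MCFT_PER_TMC

print(mcft_to_tmc(0.9))   # a low reservoir reading becomes a very small tmc value
print(mcft_to_tmc(1000))  # exactly 1 tmc
```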

### mcft to litres unit conversion:

1 mcft = 28316846.59 litres

A <b>litre</b> is a much smaller unit than an <b>mcft</b>. We can convert mcft to litres during our data analysis if required.
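The factor above can be derived from the exact definition of the foot (1 ft = 0.3048 m) and 1 m³ = 1000 litres. A minimal sketch; the helper name `mcft_to_litres` is hypothetical, not part of the original notebook.

```python
# 1 ft is exactly 0.3048 m, so 1 ft^3 = 0.3048**3 m^3, and 1 m^3 = 1000 litres.
LITRES_PER_CUBIC_FOOT = 0.3048 ** 3 * 1000
# 1 mcft = one million cubic feet.
LITRES_PER_MCFT = 1_000_000 * LITRES_PER_CUBIC_FOOT

def mcft_to_litres(mcft):
    """Convert a volume in million cubic feet (mcft) to litres."""
    return mcft * LITRES_PER_MCFT

print(round(mcft_to_litres(1), 2))  # 28316846.59, matching the figure quoted above
```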

### Rainfall : mm (millimeters)

Rainfall is measured as the height (in millimetres) of water that would collect on a flat surface over a given period. Since 1 mm of water over 1 m² is 0.001 m³, 1 mm of rainfall is equivalent to 1 litre per square metre.
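The mm-to-litres equivalence can be turned into a volume for any flat area. A minimal sketch, assuming every drop falling on the area is collected (an upper bound; in practice some rain drains away or soaks into the ground). The function name and the 10 km² area are hypothetical, chosen only for illustration.

```python
def rainfall_mm_to_litres(rain_mm, area_m2):
    """Volume of rain (litres) falling on a flat area.

    1 mm over 1 m^2 = 0.001 m * 1 m^2 = 0.001 m^3 = 1 litre,
    so litres = millimetres * square metres. Assumes all rain is collected.
    """
    return rain_mm * area_m2

# Hypothetical example: 25 mm of rain over a 10 km^2 (10,000,000 m^2) area.
print(rainfall_mm_to_litres(25, 10_000_000))  # 250000000 litres
```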

In [None]:
res_rfall.describe()

#### Assumption:

Rainfall close to or exactly zero does not necessarily mean that it did not rain at all; it may have rained without being recorded.

In [None]:
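# Lower-case the column names in both data sets and add a suffix ("_level" / "_rfall")
# so the columns can still be told apart after the two data sets are merged.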

res_level.columns = res_level.columns.str.lower() + "_level"
res_rfall.columns = res_rfall.columns.str.lower() + "_rfall"

In [None]:
# Join water level and rainfall data sets to form a single dataset.

res_master_df = res_level.merge(res_rfall, left_on = "date_level", right_on = "date_rfall")
res_master_df = res_master_df.drop(columns = ["date_rfall"], errors = "ignore")
res_master_df = res_master_df.rename(index = str, columns = {"date_level":"recorded_date"})
res_master_df.head()
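`merge` defaults to an inner join, so any date present in only one of the two files is silently dropped. A minimal sketch of a check using `how = "outer"` and `indicator = True`, which adds a `_merge` column saying where each row came from. The two toy frames below are hypothetical stand-ins for `res_level` and `res_rfall` after renaming; on the real data you would pass those frames instead.

```python
import pandas as pd

# Hypothetical toy frames with the renamed column layout.
levels = pd.DataFrame({"date_level": ["01-01-2004", "02-01-2004", "03-01-2004"],
                       "poondi_level": [10.0, 20.0, 30.0]})
rainfall = pd.DataFrame({"date_rfall": ["01-01-2004", "02-01-2004"],
                         "poondi_rfall": [0.0, 1.0]})

# Outer join keeps every date; _merge is "both", "left_only" or "right_only".
check = levels.merge(rainfall, left_on = "date_level", right_on = "date_rfall",
                     how = "outer", indicator = True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["date_level", "date_rfall", "_merge"]])
```

An empty `unmatched` frame means the inner join above lost no dates.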

In [None]:
# Convert object to date data type.

res_master_df["recorded_date"] = pd.to_datetime(res_master_df["recorded_date"], format = "%d-%m-%Y")

In [None]:
# Extract year and month from recorded_date field.

res_master_df["year"] = res_master_df["recorded_date"].dt.year
res_master_df["month"] = res_master_df["recorded_date"].dt.month

In [None]:
res_master_df.head()

Based on the "month" value parsed from the recorded_date field, divide each year into 4 basic seasons (as per Wikipedia) to analyse the water levels and rainfall across seasons per year:<br>
a. <b>Winter</b> : Spans from December to February.<br>
b. <b>Summer</b> : Spans from March to May.<br>
c. <b>Monsoon</b> : Spans from June to September.<br>
d. <b>Autumn</b> : Spans from October to November.<br>

In [None]:
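# Pass a string default: np.select's default is the integer 0, which some NumPy
# versions refuse to combine with string choices.
res_master_df["season"] = np.select([res_master_df["month"].isin([12, 1, 2]), 
                                     res_master_df["month"].isin([3, 4, 5]),
                                     res_master_df["month"].isin([6, 7, 8, 9]),
                                     res_master_df["month"].isin([10, 11])], 
                                    ["winter", "summer", "monsoon", "autumn"],
                                    default = "unknown")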
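An alternative way to express the same month-to-season split is a dictionary plus `Series.map`. A minimal sketch; the name `MONTH_TO_SEASON` is hypothetical. Any month missing from the dictionary would come out as NaN, which makes gaps in the mapping easy to spot.

```python
import pandas as pd

# Month number -> season, following the split listed above.
MONTH_TO_SEASON = {12: "winter", 1: "winter", 2: "winter",
                   3: "summer", 4: "summer", 5: "summer",
                   6: "monsoon", 7: "monsoon", 8: "monsoon", 9: "monsoon",
                   10: "autumn", 11: "autumn"}

months = pd.Series([1, 4, 7, 10, 12])
seasons = months.map(MONTH_TO_SEASON)
print(seasons.tolist())
```

On the notebook's data this would read `res_master_df["season"] = res_master_df["month"].map(MONTH_TO_SEASON)`.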

#### Assumption:

For this problem, the average daily water level per season per year is not a physically meaningful quantity in itself, but it is a convenient way to compare water levels across the different seasons of a year.

In [None]:
# Group the data by year and season on each of the water level columns and merge all these data frames 
# into a single data frame for further analysis.

# Use the string "mean": recent pandas versions warn when np.mean is passed to agg.
df_1 = res_master_df.groupby(["year", "season"])[["poondi_level", "poondi_rfall"]].mean().round(2).reset_index()
df_2 = res_master_df.groupby(["year", "season"])[["cholavaram_level", "cholavaram_rfall"]].mean().round(2).reset_index()
df_3 = res_master_df.groupby(["year", "season"])[["redhills_level", "redhills_rfall"]].mean().round(2).reset_index()
df_4 = res_master_df.groupby(["year", "season"])[["chembarambakkam_level", "chembarambakkam_rfall"]].mean().round(2).reset_index()

# All four frames have the same (year, season) keys, so the inner joins keep every row.
df_1 = df_1.merge(df_2, on = ["year", "season"])
df_1 = df_1.merge(df_3, on = ["year", "season"])
df_1 = df_1.merge(df_4, on = ["year", "season"])

year_season_group_df = df_1

del (df_1, df_2, df_3, df_4)
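Because all four groupings use the same keys, a single `groupby` over every `_level` and `_rfall` column should produce the same values in one step (only the column order may differ from the merged version). A minimal sketch on a hypothetical toy frame with the same column layout as `res_master_df`; on the real data, build `value_cols` from `res_master_df.columns` instead.

```python
import pandas as pd

# Hypothetical toy frame with the res_master_df column layout.
toy = pd.DataFrame({
    "year": [2004, 2004, 2004, 2005],
    "season": ["winter", "winter", "summer", "winter"],
    "poondi_level": [10.0, 20.0, 30.0, 40.0],
    "poondi_rfall": [0.0, 2.0, 4.0, 6.0],
    "redhills_level": [1.0, 3.0, 5.0, 7.0],
})

# Every water level and rainfall column, whatever the reservoir.
value_cols = [c for c in toy.columns if c.endswith(("_level", "_rfall"))]
grouped = (toy.groupby(["year", "season"])[value_cols]
              .mean()
              .round(2)
              .reset_index())
print(grouped)
```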

### Analysis of Water levels for each reservoir

Let's analyse the water levels for the 4 reservoirs across seasons for each of the years. For each year, we compute each season's average daily water level as a proportion (%) of the sum of that year's seasonal averages, and compare these proportions.

#### Function name : plot_water_level

A user-defined function that plots a stacked bar chart for one reservoir: one bar per year, split into the proportion (%) that each season contributes to the sum of that year's seasonal average water levels.

In [None]:
def plot_water_level (ip_df, ip_reservoir):
    
    fig = plt.figure(figsize = (12, 9))
    
    df = ip_df[['year', 'season', ip_reservoir + '_level']]
    df_1 = df.groupby('year')[ip_reservoir + '_level'].sum()
    df_1 = df_1.reset_index()
    df = df.merge(df_1, right_on = 'year', left_on = 'year')
    df = df.rename(index = str, columns = {ip_reservoir + '_level_x':'water_level', ip_reservoir + '_level_y':'total_water_level'})
    df['perc'] = round(df['water_level'] * 100 / df['total_water_level'], 2)
    df['water_level'] = df['perc'].copy()
    df = df.drop(columns = ['total_water_level', 'perc'], axis = 1)

    # One row per year (as strings, so the x-axis is categorical) with zeroed values.
    years = df["year"].unique().astype(str).tolist()
    def_water_level_df = pd.DataFrame({'year':years, 'water_level':[0.0] * len(years)})

    # Running top of each year's stacked bar. Floats, because the percentages added to it are floats.
    water_level_df = pd.DataFrame({'year':years, 'bar_bottom':[0.0] * len(years)})

    plot_legend = {"color":[], "plot":[]}

    for season in (['winter', 'summer', 'monsoon', 'autumn']):
        water_level_rec = df.loc[df["season"] == season, ['year', 'water_level']].copy()
        water_level_rec['year'] = water_level_rec['year'].astype(str)

        water_level_rec = def_water_level_df.merge(water_level_rec, 
                                                   how = 'left', 
                                                   right_on = 'year', 
                                                   left_on = 'year')

        water_level_rec = water_level_rec.drop(columns = 'water_level_x', axis = 1)
        water_level_rec = water_level_rec.rename(index = str, columns = {'water_level_y':'water_level'})
        water_level_rec['water_level'] = water_level_rec['water_level'].fillna(0)

        x = water_level_rec['year'].tolist()
        y = water_level_rec['water_level'].tolist()    
        bar_bottom = water_level_df['bar_bottom']

        tmp_plot = plt.bar(x, y, bottom = bar_bottom)

        # Raise each year's bar bottom by this season's share. water_level_rec comes from a left
        # merge on def_water_level_df, so its rows are in the same year order as water_level_df.
        water_level_df['bar_bottom'] = water_level_df['bar_bottom'] + water_level_rec['water_level'].to_numpy()

        for i in (np.arange(len(x))):

            if (y[i] == 0):
                continue

            y_coord = round(water_level_df.loc[i, 'bar_bottom'] - y[i] / 2)

            x_coord = x[i]

            plt.text(x_coord, y_coord, str(y[i]) + '%', ha = 'center', fontsize = 8, color = 'white')

        plot_legend["color"].append(season.title())
        plot_legend["plot"].append(tmp_plot[0])

    plt.xlabel("Year")
    plt.ylabel('Water level (Average per year per season - Proportionately)')
    fig.suptitle("Reservoir : " + ip_reservoir.title() + "\nAverage water level Year-wise Season-wise")
    plt.legend(plot_legend["plot"], plot_legend["color"])
    plt.show()
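The core bookkeeping in the function is stacking: each season's segment starts where the previous seasons' segments end, and each label sits at the segment's vertical centre. A minimal NumPy sketch of that arithmetic on hypothetical percentages for two years (rows are seasons in stacking order, columns are years); unlike the function, it does not round the label positions.

```python
import numpy as np

# Rows = winter, summer, monsoon, autumn; columns = two hypothetical years.
# Missing seasons are 0, as after fillna(0) in the function.
heights = np.array([[30.0, 50.0],
                    [10.0, 50.0],
                    [20.0,  0.0],
                    [40.0,  0.0]])

# Top of each segment = cumulative sum of the seasons stacked so far.
tops = np.cumsum(heights, axis = 0)
# Bottom of each segment = its top minus its own height.
bottoms = tops - heights
# Label position: vertical centre of each segment. The function computes the same
# value as the updated bar_bottom minus half the segment height.
label_y = bottoms + heights / 2
print(bottoms)
print(label_y)
```

Each row k would then be drawn with `plt.bar(years, heights[k], bottom = bottoms[k])`.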

#### Analysis of Water level for "Poondi"

In [None]:
plot_water_level (year_season_group_df, 'poondi')

Observations on water levels in <b>Poondi</b> water reservoir during the 2004-2019 period:
- The water levels are on the higher side during the winter season in almost all years except 2012, 2014, 2018 and 2019.
- Surprisingly, water levels in the summer season of 2012, 2018 and 2019 are higher than in the winter season of the same year. In the remaining years, water levels clearly drop in the summer season, as expected.
- Not every monsoon is followed by an increase in water level in the autumn season compared to the summer season.<br>
<i><b>For e.g.</b> in 2008, 2009, 2012, 2013, 2014, 2016 and 2018, the average daily water level in autumn is lower than in summer, when it should be the other way around.</i>
- This might be due to heavy evaporation during the summer season, when more water evaporates than is added by rainfall, compared with the other seasons of the year.
- Water level is pretty high in year 2019's winter season.
- We do not have autumn season's data for year 2019.

In [None]:
year_season_group_df.loc[year_season_group_df['year']==2019, ]

#### Analysis of Water level for "Cholavaram"

In [None]:
plot_water_level (year_season_group_df, 'cholavaram')

Observations on water levels in <b>Cholavaram</b> water reservoir during the 2004-2019 period:
- Except for 2004, 2005 and 2017, water levels are high during the winter season in every year.
- In the summer season we see a clear dip in water levels across all the years.
- Not every monsoon is followed by an increase in water level in the autumn season compared to the summer season.<br>
<i><b>For e.g.</b> in 2006, 2008, 2009, 2011, 2016 and 2018, the average daily water level in autumn is lower than in summer, when it should be the other way around.</i>
- This might be due to heavy evaporation during the summer season, when more water evaporates than is added by rainfall, compared with the other seasons of the year.
- Water level is pretty high in year 2019's winter season.
- We do not have autumn season's data for year 2019.

#### Analysis of Water level for "Redhills"

In [None]:
plot_water_level (year_season_group_df, 'redhills')

Observations on water levels in <b>Redhills</b> water reservoir during the 2004-2019 period:
- Barring 2009, 2014, 2016 & 2018, we see high water levels in the winter season across all years.
- We can clearly see a drop in water levels in the summer season in almost all years except 2009, 2014, 2016 & 2018.
- Not every monsoon is followed by an increase in water level in the autumn season compared to the summer season.<br>
<i><b>For e.g.</b> in 2006, 2008, 2009, 2011, 2016 and 2018, the average daily water level in autumn is lower than in summer, when it should be the other way around.</i>
- This might be due to heavy evaporation during the summer season, when more water evaporates than is added by rainfall, compared with the other seasons of the year.
- Water level is pretty high in year 2019's winter season.
- We do not have autumn season's data for year 2019.

#### Analysis of Water level for "Chembarambakkam"

In [None]:
plot_water_level (year_season_group_df, 'chembarambakkam')

Observations on water levels in <b>Chembarambakkam</b> water reservoir during the 2004-2019 period:
- We see high water levels in the winter season in almost all years.
- Years 2009, 2013, 2014, 2016 & 2018 are exceptions: in these years the summer water level is higher than in the winter season of the same year. Otherwise, as usual, we see a steep decrease in water level in the summer season.
- Not every monsoon is followed by an increase in water level in the autumn season compared to the summer season.
- This might be due to heavy evaporation during the summer season, when more water evaporates than is added by rainfall, compared with the other seasons of the year.
- Water level is pretty high in year 2019's winter season, even compared to the other 3 reservoirs.
- We do not have autumn season's data for year 2019.

### Analysis of Rainfall around each reservoir

Let's analyse the rainfall around the 4 water reservoirs across seasons for each of the years. For each year, we compute each season's average daily rainfall as a proportion (%) of the sum of that year's seasonal averages, and compare these proportions.

#### Function name : plot_rain_fall

A user-defined function that plots a stacked bar chart for one reservoir: one bar per year, split into the proportion (%) that each season contributes to the sum of that year's seasonal average rainfall around the reservoir.

In [None]:
def plot_rain_fall (ip_df, ip_reservoir):
    
    fig = plt.figure(figsize = (12, 9))
    
    df = ip_df[['year', 'season', ip_reservoir + '_rfall']]
    df_1 = df.groupby('year')[ip_reservoir + '_rfall'].sum()
    df_1 = df_1.reset_index()
    df = df.merge(df_1, right_on = 'year', left_on = 'year')
    df = df.rename(index = str, columns = {ip_reservoir + '_rfall_x':'rain_fall', ip_reservoir + '_rfall_y':'total_rain_fall'})
    df['perc'] = round(df['rain_fall'] * 100 / df['total_rain_fall'], 2)
    df['rain_fall'] = df['perc'].copy()
    df = df.drop(columns = ['total_rain_fall', 'perc'], axis = 1)

    # One row per year (as strings, so the x-axis is categorical) with zeroed values.
    years = df["year"].unique().astype(str).tolist()
    def_rain_fall_df = pd.DataFrame({'year':years, 'rain_fall':[0.0] * len(years)})

    # Running top of each year's stacked bar. Floats, because the percentages added to it are floats.
    rain_fall_df = pd.DataFrame({'year':years, 'bar_bottom':[0.0] * len(years)})

    plot_legend = {"color":[], "plot":[]}

    for season in (['winter', 'summer', 'monsoon', 'autumn']):
        rain_fall_rec = df.loc[df["season"] == season, ['year', 'rain_fall']].copy()
        rain_fall_rec['year'] = rain_fall_rec['year'].astype(str)

        rain_fall_rec = def_rain_fall_df.merge(rain_fall_rec, 
                                               how = 'left', 
                                               right_on = 'year', 
                                               left_on = 'year')

        rain_fall_rec = rain_fall_rec.drop(columns = 'rain_fall_x', axis = 1)
        rain_fall_rec = rain_fall_rec.rename(index = str, columns = {'rain_fall_y':'rain_fall'})
        rain_fall_rec['rain_fall'] = rain_fall_rec['rain_fall'].fillna(0)

        x = rain_fall_rec['year'].tolist()
        y = rain_fall_rec['rain_fall'].tolist()    
        bar_bottom = rain_fall_df['bar_bottom']

        tmp_plot = plt.bar(x, y, bottom = bar_bottom)

        # Raise each year's bar bottom by this season's share. rain_fall_rec comes from a left
        # merge on def_rain_fall_df, so its rows are in the same year order as rain_fall_df.
        rain_fall_df['bar_bottom'] = rain_fall_df['bar_bottom'] + rain_fall_rec['rain_fall'].to_numpy()

        for i in (np.arange(len(x))):

            if (y[i] == 0):
                continue

            y_coord = round(rain_fall_df.loc[i, 'bar_bottom'] - y[i] / 2)

            x_coord = x[i]

            plt.text(x_coord, y_coord, str(y[i]) + '%', ha = 'center', fontsize = 8, color = 'white')

        plot_legend["color"].append(season.title())
        plot_legend["plot"].append(tmp_plot[0])

    plt.xlabel("Year")
    plt.ylabel('Rainfall (Average per year per season - Proportionately)')
    fig.suptitle("Reservoir : " + ip_reservoir.title() + "\nAverage rain fall Year-wise Season-wise")
    plt.legend(plot_legend["plot"], plot_legend["color"])
    plt.show()
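`plot_water_level` and `plot_rain_fall` differ only in the column suffix and labels, so one generic function could serve both. A minimal sketch using pandas' built-in stacked bar plot and Matplotlib's `Axes.bar_label` (available in Matplotlib 3.4 and later); the name `plot_season_share` and its `suffix` parameter are hypothetical, not part of the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

SEASONS = ["winter", "summer", "monsoon", "autumn"]

def plot_season_share(ip_df, ip_reservoir, suffix):
    """Stacked bar chart of each season's share (%) of a year's summed seasonal averages.

    suffix is "_level" or "_rfall", matching the renamed columns.
    Returns the Axes; call plt.show() afterwards to display it.
    """
    column = ip_reservoir + suffix
    # Season average divided by the sum of that year's seasonal averages.
    totals = ip_df.groupby("year")[column].transform("sum")
    share = (ip_df[column] * 100 / totals).round(2)
    table = (ip_df.assign(share = share)
                  .pivot(index = "year", columns = "season", values = "share")
                  .reindex(columns = SEASONS)   # fixed stacking order
                  .fillna(0))                   # seasons with no data
    ax = table.rename(columns = str.title).plot(kind = "bar", stacked = True, figsize = (12, 9))
    # One BarContainer per season; centre a label in every non-empty segment.
    for container, season in zip(ax.containers, SEASONS):
        labels = [f"{v}%" if v else "" for v in table[season]]
        ax.bar_label(container, labels = labels, label_type = "center",
                     fontsize = 8, color = "white")
    ax.set_xlabel("Year")
    ax.set_ylabel("Share of the year's seasonal averages (%)")
    ax.set_title("Reservoir : " + ip_reservoir.title())
    return ax
```

Usage would be, for example, `plot_season_share(year_season_group_df, 'poondi', '_rfall')` followed by `plt.show()`.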

#### Analysis of rainfall for "Poondi"

In [None]:
plot_rain_fall (year_season_group_df, 'poondi')

Observations on average rainfall around <b>Poondi</b> water reservoir in the period 2004-2019 across different seasons of each of these years:<br>
- Years 2007, 2010 and 2016 saw high rainfall in the monsoon season compared to other years in the same season.
- In almost all years (except a few), rainfall increases in the autumn season.
- In the winter season, a few years saw good rainfall.<br>
<i><b>For e.g.</b> 2007, 2009-2012, 2015 and 2016</i>
- In the summer season, rainfall is low in all years except 2004, 2006, 2008 & 2016, which saw a considerable amount of rainfall.
- Since the 2019 data is incomplete (the later seasons are missing), summer shows a sharp rise in its share of 2019 rainfall. The share is computed only over the seasons available, so this may not hold once the missing data is included.

#### Analysis of rainfall for "Cholavaram"

In [None]:
plot_rain_fall (year_season_group_df, 'cholavaram')

Observations on average rainfall around <b>Cholavaram</b> water reservoir in the period 2004-2019 across different seasons of each of these years:<br>
- In every year from 2004 to 2018, the monsoon season saw a good amount of rainfall.
- In almost all years (except a few), rainfall increases in the autumn season.
- In the winter season, we see good rainfall in almost all years except a few.<br>
<i><b>For e.g.</b> 2004, 2006, 2008, 2013 & 2017.</i>
- In the summer season, rainfall is low in almost all years except 2004, 2016 & 2019, which saw a considerable amount of rainfall.
- Since the 2019 data is incomplete (the later seasons are missing), summer shows a sharp rise in its share of 2019 rainfall. The share is computed only over the seasons available, so this may not hold once the missing data is included.

#### Analysis of rainfall for "Redhills"

In [None]:
plot_rain_fall (year_season_group_df, 'redhills')

Observations on average rainfall around <b>Redhills</b> water reservoir in the period 2004-2019 across different seasons of each of these years:<br>
- In every year, the monsoon season's share of rainfall is high, close to 20% or above.
- In almost all years (except a few), rainfall increases in the autumn season.
- In the winter season, we see good rainfall in almost all years except a few.<br>
<i><b>For e.g.</b> 2004, 2006, 2008, 2013, 2017 and 2018</i>
- In the summer season, rainfall is low in almost all years except 2004, 2010, 2016 & 2019, which saw a considerable amount of rainfall.
- Since the 2019 data is incomplete (the later seasons are missing), summer shows a sharp rise in its share of 2019 rainfall. The share is computed only over the seasons available, so this may not hold once the missing data is included.

#### Analysis of rainfall for "Chembarambakkam"

In [None]:
plot_rain_fall (year_season_group_df, 'chembarambakkam')

Observations on average rainfall around <b>Chembarambakkam</b> water reservoir in the period 2004-2019 across different seasons of each of these years:<br>
- Years 2007, 2010 and 2016 saw high rainfall in the monsoon season compared to other years in the same season.
- In almost all years (except a few), rainfall increases in the autumn season.
- In the winter season, we see good rainfall in almost all years except a few.<br>
<i><b>For e.g.</b> 2004, 2006, 2008, 2013, 2014, 2017 and 2018</i>
- In the summer season, rainfall is low in almost all years except 2014 & 2016, which saw a considerable amount of rainfall.
- We do not have data for year 2019 for this water reservoir.

In [None]:
# Highest seasonal average water level recorded for each reservoir (as per data).

year_season_group_df[['poondi_level', 'cholavaram_level', 'redhills_level', 'chembarambakkam_level']].max()
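`max()` gives each reservoir's peak value but not when it happened. A minimal sketch using `idxmax` to find the reservoir with the single highest value and the year and season of each reservoir's peak. The toy frame is a hypothetical stand-in for `year_season_group_df`; note these are peaks of seasonal averages, not reservoir capacities.

```python
import pandas as pd

# Hypothetical stand-in for year_season_group_df.
toy = pd.DataFrame({
    "year": [2004, 2004, 2005],
    "season": ["winter", "summer", "winter"],
    "poondi_level": [100.0, 300.0, 200.0],
    "redhills_level": [250.0, 50.0, 150.0],
})
level_cols = ["poondi_level", "redhills_level"]

peaks = toy[level_cols].max()        # highest seasonal average per reservoir
top_reservoir = peaks.idxmax()       # column holding the overall highest value
# Row label of each reservoir's peak, then look up its year and season.
peak_rows = toy.loc[toy[level_cols].idxmax().tolist(), ["year", "season"]]
peak_rows.index = level_cols
print(top_reservoir)
print(peak_rows)
```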

#### A few basic points to remember when analysing this kind of data:

- Rain is the main source of increase in water level, but not the only one. Water level also rises when water seeps in through the bottom of a water reservoir from underground water beds.

- Increase in water level depends on:<br>
a. Amount of rain in and around the water reservoir, and in the neighboring state from where the water flows in.<br>
b. Rise in groundwater, which sometimes moves from the earth's water beds into the water reservoir.

- Rate of evaporation of water depends on:<br>
a. Surface area of the water reservoir exposed to sunlight.<br>
b. Depth of the reservoir: a deeper reservoir holds more water for the same exposed surface area, so a smaller share of it evaporates.
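The dependence on surface area and depth can be made concrete with an idealised model, assumed here only for illustration: a reservoir with vertical walls, so that the stored volume is $V = A \cdot h$ for surface area $A$ and depth $h$. If evaporation removes a layer of depth $E$ per day from the exposed surface, the daily loss and the fraction of stored water lost are:

$$
\Delta V_{\text{evap}} = E \cdot A, \qquad \frac{\Delta V_{\text{evap}}}{V} = \frac{E \cdot A}{A \cdot h} = \frac{E}{h}
$$

So a larger exposed surface loses more water in absolute terms, while a deeper reservoir loses a smaller fraction of what it stores. With a hypothetical $E = 5$ mm/day, a reservoir 5 m (5000 mm) deep would lose $5 / 5000 = 0.1\%$ of its volume per day. Real reservoirs are basin-shaped, so this is only a rough guide.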

#### Overall observations on the given data set

1. Though we see a good amount of rain in the monsoon, we do not see a rise in water level in either the monsoon or the autumn season. Also note that a good amount of rain does not indicate how much water is collected in, or flows into, the water reservoir: a heavy downpour in and around a reservoir may or may not get collected in it. Some of the rain goes into drains, some may collect in nearby ponds, and some soaks into the ground.

2. Water levels are quite high in the winter and autumn seasons, when temperatures are not as high as in summer.

3. As per the data, Poondi recorded the highest seasonal average water level among the four reservoirs, which suggests (though the data does not directly show it) that it may have the highest capacity.