**About the analysis**:

This notebook explores the features of the Aquifers provided in the dataset. There are four datasets under the Aquifer waterbody category, and we are going to explore each of them in this notebook: Auser, Petrignano, Doganella and Luco. Note that the EDA has been performed solely on the raw data provided. There are a lot of missing values in the dataset, but the intention of this notebook is to look at the data as it is, without any imputation.

Since we have a lot of wonderful notebooks performing the EDA on daily time series data, I thought it would be insightful to add an analysis on monthly aggregation. 

*Findings.*


1. [Rainfall](#Rainfall): June, July and August are the drier months 
2. [Rainfall_Vs_Depth_to_groundwater](#Rainfall_Vs_Depth_to_groundwater): We get to see the effects of rainfall on depth to groundwater and hydrology after a lag period.  
3. [Temperature_Vs_Depth_to_groundwater](#Temperature_Vs_Depth_to_groundwater): Increase in mean temperature affects the depth to groundwater.
4. [Hydrometry_Vs_Rain](#Hydrometry_Vs_Rain): The mean hydrometry, i.e. the level of groundwater, is lowest during the drier months.


**What is an Aquifer?**

*According to Wikipedia:*

*An aquifer is an underground layer of water-bearing permeable rock, rock fractures or unconsolidated materials (gravel, sand, or silt). Groundwater can be extracted using a water well. The study of water flow in aquifers and the characterization of aquifers is called hydrogeology.*

Some more details about Aquifers in this link : https://www.usgs.gov/special-topic/water-science-school/science/aquifers-and-groundwater?qt-science_center_objects=0#qt-science_center_objects

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from functools import reduce

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

**AQUIFER DOGANELLA:**

Let's begin by exploring the Doganella Aquifer. This dataset has around 16 years of data, from 2004 to 2020, and has 9 different Pozzo (water wells). As the Wikipedia definition above notes, water wells are used to extract water from aquifers. The depth and volume for these wells are provided along with the daily rainfall and temperature of two regions, Monteporzio and Velletri.

In [None]:
# Read the Aquifer_Doganella dataset
aquifer_doganella = pd.read_csv('/kaggle/input/acea-water-prediction/Aquifer_Doganella.csv')

# Check the info
aquifer_doganella.info()
aquifer_doganella.head()

# If you have a date column then convert it to that format
aquifer_doganella['Date'] = pd.to_datetime(aquifer_doganella['Date'])
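
A note on date parsing: without a `format`, `pd.to_datetime` has to guess the layout, and a string such as `02/01/2004` is read month-first by default. The sketch below assumes, and this is not confirmed anywhere in this notebook, that the `Date` column is stored day-first as `DD/MM/YYYY`. If that holds, an explicit format avoids silently swapping day and month. The sample strings are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values, assuming a day-first DD/MM/YYYY layout.
sample = pd.Series(['02/01/2004', '03/01/2004', '13/01/2004'])

# An explicit format removes the ambiguity between day and month.
parsed = pd.to_datetime(sample, format='%d/%m/%Y')
print(parsed.dt.month.tolist())  # every row falls in January

# Default parsing reads a lone ambiguous string as month-first.
ambiguous = pd.to_datetime('02/01/2004')
print(ambiguous.month)
```

If the files turn out to use a different layout, only the `format` string needs to change.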

In [None]:
# Small helper functions to aggregate the data and visualize the data for each region

def grp_by_month_sum(df, date_col, feature_col):
    df_sum = df.groupby(df[date_col].dt.strftime('%b'))[feature_col].sum().reset_index()
    return df_sum

def grp_by_month_mean(df, date_col, feature_col):
    df_avg = df.groupby(df[date_col].dt.strftime('%b'))[feature_col].mean().reset_index()
    return df_avg


def grp_by_yr_sum(df, date_col, feature_col):
    df_sum = df.groupby(df[date_col].dt.strftime('%Y'))[feature_col].sum().reset_index()
    return df_sum

def grp_by_yr_mean(df, date_col, feature_col):
    df_avg = df.groupby(df[date_col].dt.strftime('%Y'))[feature_col].mean().reset_index()
    return df_avg


# visualize the time series
def check_ts(df, date_col):
    df.set_index(date_col).plot(figsize=(10,5))
    plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
    plt.xticks(rotation = 90)
    plt.grid()
    plt.show()
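
The cells that follow repeat the same three steps: group by the month abbreviation, convert `Date` to an ordered categorical, then sort. A possible shortcut, shown here only as a sketch, is to group on the month number so the rows come out in calendar order directly. The helper name `monthly_agg_ordered` and the demo column are hypothetical and not part of the original notebook.

```python
import calendar
import numpy as np
import pandas as pd

def monthly_agg_ordered(df, date_col, feature_cols, how='mean'):
    """Aggregate by calendar month and return rows in Jan..Dec order."""
    # Grouping on the month number (1..12) sorts chronologically by default.
    out = df.groupby(df[date_col].dt.month)[feature_cols].agg(how)
    # Replace month numbers with abbreviations for plotting.
    out.index = [calendar.month_abbr[m] for m in out.index]
    out.index.name = date_col
    return out.reset_index()

# Synthetic daily data, for illustration only.
dates = pd.date_range('2019-01-01', '2020-12-31', freq='D')
demo = pd.DataFrame({'Date': dates, 'Rainfall_demo': np.arange(len(dates)) % 7})
monthly = monthly_agg_ordered(demo, 'Date', ['Rainfall_demo'])
print(monthly['Date'].tolist())
```

The result can be passed to `check_ts` in the same way as the sorted frames used in the rest of the notebook.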
    

**Depth to groundwater:**

Depth to groundwater is our target for the Aquifer waterbodies: we have to predict the groundwater level (in meters from the ground floor) for all 9 Pozzo.

The next plot shows the monthly mean depth to groundwater, expressed in **meters from the ground floor**. The mean depth is around 30 meters below the ground floor for Pozzo 1, while for all the other wells the average depth stays between 90 and 110 meters.

In [None]:
month = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
cols =['Depth_to_Groundwater_Pozzo_1', 'Depth_to_Groundwater_Pozzo_2','Depth_to_Groundwater_Pozzo_3','Depth_to_Groundwater_Pozzo_4',
       'Depth_to_Groundwater_Pozzo_5','Depth_to_Groundwater_Pozzo_6','Depth_to_Groundwater_Pozzo_7','Depth_to_Groundwater_Pozzo_8',
       'Depth_to_Groundwater_Pozzo_9'] 

mean_monthly_depth = aquifer_doganella.groupby(aquifer_doganella['Date'].dt.strftime('%b'))[cols].mean().reset_index()
mean_monthly_depth['Date'] = pd.Categorical(mean_monthly_depth['Date'], categories = month, ordered = True)
mean_monthly_depth.sort_values(by= 'Date', inplace = True)
check_ts(mean_monthly_depth, 'Date')

To understand the depth of each well, let's visualize the distribution of the data in a violin plot. In the plots below we can see how the depth varies for each well. The distributions differ from one well to another, and they are multimodal, with more than one peak.

In [None]:
import plotly.express as px
fig = px.violin(aquifer_doganella, y=cols)
fig.show()

**Analyzing the monthly rainfall (in mm) in the regions of Monteporzio and Velletri.** <a id='Rainfall'></a>

From the two line plots below we can see that July and August are the months with the lowest rainfall and November has the highest rainfall in these two regions.

In [None]:
# Monteporzio and Velletri monthly total rainfall

monthly_monteporzio_rainfall = grp_by_month_sum(aquifer_doganella, 'Date', 'Rainfall_Monteporzio')
monthly_velletri_rainfall = grp_by_month_sum(aquifer_doganella, 'Date', 'Rainfall_Velletri')

combine_ts_1 = pd.merge(monthly_monteporzio_rainfall, monthly_velletri_rainfall, on = 'Date')

combine_ts_1['Date'] = pd.Categorical(combine_ts_1['Date'], categories = month, ordered = True)
combine_ts_1.sort_values(by= 'Date', inplace = True)
#print(combine_ts_1)
check_ts(combine_ts_1, 'Date')

In [None]:
# Monteporzio and Velletri monthly mean rainfall

monthly_monteporzio_rainfall = grp_by_month_mean(aquifer_doganella, 'Date', 'Rainfall_Monteporzio')
monthly_velletri_rainfall = grp_by_month_mean(aquifer_doganella, 'Date', 'Rainfall_Velletri')

combine_ts_2 = pd.merge(monthly_monteporzio_rainfall, monthly_velletri_rainfall, on = 'Date')

combine_ts_2['Date'] = pd.Categorical(combine_ts_2['Date'], categories = month, ordered = True)
combine_ts_2.sort_values(by= 'Date', inplace = True)
#print(combine_ts_2)
check_ts(combine_ts_2, 'Date')


When it comes to yearly rainfall, we can see that from 2015 to 2018 these two regions received very little rainfall.

In [None]:
# Monteporzio and Velletri yearly total rainfall

yearly_monteporzio_rainfall = grp_by_yr_sum(aquifer_doganella, 'Date', 'Rainfall_Monteporzio')
yearly_velletri_rainfall = grp_by_yr_sum(aquifer_doganella, 'Date', 'Rainfall_Velletri')

combine_ts_3 = pd.merge(yearly_monteporzio_rainfall, yearly_velletri_rainfall, on = 'Date')
check_ts(combine_ts_3, 'Date')
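
One caveat when reading yearly totals on raw data with gaps: by default a pandas `sum` skips missing values, so a year with no readings at all appears as a total of 0 and not as missing. Whether this affects the years discussed above is not checked here. The sketch below uses a small synthetic frame (the column `Rainfall_demo` is hypothetical) to show the difference and how `min_count=1` keeps such a year as `NaN`.

```python
import numpy as np
import pandas as pd

# Synthetic example: 2015 has no recorded rainfall at all (all NaN).
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2014-01-01', '2014-06-01', '2015-01-01', '2015-06-01']),
    'Rainfall_demo': [10.0, 5.0, np.nan, np.nan],
})
year = demo['Date'].dt.year

# Default sum skips NaN, so an all-missing year shows up as 0.
default_total = demo.groupby(year)['Rainfall_demo'].sum()

# min_count=1 keeps the total as NaN unless at least one value is present.
strict_total = demo.groupby(year)['Rainfall_demo'].sum(min_count=1)

# Number of observed readings per year gives context for each total.
observed = demo.groupby(year)['Rainfall_demo'].count()
print(pd.DataFrame({'default': default_total, 'strict': strict_total, 'n_obs': observed}))
```

Plotting the observation count next to the totals is a simple way to tell a dry year from a sparsely recorded one.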

In [None]:
# Monteporzio and Velletri yearly mean rainfall

yearly_monteporzio_rainfall = grp_by_yr_mean(aquifer_doganella, 'Date', 'Rainfall_Monteporzio')
yearly_velletri_rainfall = grp_by_yr_mean(aquifer_doganella, 'Date', 'Rainfall_Velletri')

combine_ts_4 = pd.merge(yearly_monteporzio_rainfall, yearly_velletri_rainfall, on = 'Date')
check_ts(combine_ts_4, 'Date')

From the temperature plot we can see that June, July and August have the highest average temperatures (in **degrees C**) compared to the other months and, as the rainfall plots above show, these three months have the lowest rainfall as well.

In [None]:
# Monteporzio and Velletri monthly mean temperature

monthly_monteporzio_temperature = grp_by_month_mean(aquifer_doganella, 'Date', 'Temperature_Monteporzio')
monthly_velletri_temperature = grp_by_month_mean(aquifer_doganella, 'Date', 'Temperature_Velletri')

combine_ts_5 = pd.merge(monthly_monteporzio_temperature, monthly_velletri_temperature, on = 'Date')

combine_ts_5['Date'] = pd.Categorical(combine_ts_5['Date'], categories = month, ordered = True)
combine_ts_5.sort_values(by= 'Date', inplace = True)
#print(combine_ts_5)
check_ts(combine_ts_5, 'Date')

In terms of yearly average temperature, Velletri has higher mean temperatures than Monteporzio. We can also see that the mean temperature in Monteporzio was lowest in 2018, whereas in Velletri it was the opposite.

In [None]:
# Monteporzio and Velletri yearly mean temperature

yearly_monteporzio_temperature = grp_by_yr_mean(aquifer_doganella, 'Date', 'Temperature_Monteporzio')
yearly_velletri_temperature = grp_by_yr_mean(aquifer_doganella, 'Date', 'Temperature_Velletri')

combine_ts_6 = pd.merge(yearly_monteporzio_temperature, yearly_velletri_temperature, on = 'Date')
check_ts(combine_ts_6, 'Date')

**Volume:**

Monthly mean volume of water in **cubic meters** taken from the drinking water treatment plant:

Here the volumes of water taken from Pozzo 5 and 6 are combined, so their volume is higher than the volume taken from the other wells, which mostly ranges between 1000 and 4000 cubic meters on average every month. One thing to note is that the mean monthly volume is lowest for Pozzo 1, even though its depth to groundwater is only around 30 meters from ground level.

In [None]:
# Monthly mean volume of water in cubic meters taken from the drinking water treatment plant

cols =['Volume_Pozzo_1', 'Volume_Pozzo_2','Volume_Pozzo_3','Volume_Pozzo_4',
       'Volume_Pozzo_5+6', 'Volume_Pozzo_7', 'Volume_Pozzo_8', 'Volume_Pozzo_9']

mean_monthly_vol = aquifer_doganella.groupby(aquifer_doganella['Date'].dt.strftime('%b'))[cols].mean().reset_index()
mean_monthly_vol['Date'] = pd.Categorical(mean_monthly_vol['Date'], categories = month, ordered = True)
mean_monthly_vol.sort_values(by= 'Date', inplace = True)
check_ts(mean_monthly_vol, 'Date')

The distribution of the volume of water taken from the drinking water treatment plant differs from well to well, just like the depth.

In [None]:
import plotly.express as px
fig = px.violin(aquifer_doganella, y=cols)
fig.show()

In [None]:
aquifer_doganella.columns

**How does depth to ground water and rainfall vary?** <a id='Rainfall_Vs_Depth_to_groundwater'></a>

Here we find that rainfall is high during Nov, Dec, Jan and Feb, and the depth to groundwater level is high in the following 3 months: March, April and May. This may be due to a lag effect.
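
The lag idea above can be probed roughly by correlating a shifted rainfall series with the depth series and looking for the shift with the strongest correlation. The sketch below is not run on the Doganella data: it builds a synthetic monthly series in which the response is deliberately delayed by 3 months, so the series, the coefficient and the noise level are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range('2004-01-01', periods=120, freq='MS')

# Synthetic monthly rainfall and a response that follows it 3 months later.
rain = pd.Series(rng.gamma(2.0, 30.0, size=len(idx)), index=idx)
depth = -0.05 * rain.shift(3) + rng.normal(0, 0.1, size=len(idx))

# Correlate depth with rainfall shifted by 0..6 months (NaN pairs are dropped).
lag_corr = {lag: rain.shift(lag).corr(depth) for lag in range(0, 7)}

# The lag with the largest absolute correlation is the candidate delay.
best_lag = max(lag_corr, key=lambda k: abs(lag_corr[k]))
print(pd.Series(lag_corr).round(2))
print('strongest lag (months):', best_lag)
```

On real data the same loop could be applied to monthly totals of rainfall and monthly means of depth, keeping in mind that missing values reduce the number of pairs behind each correlation.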

In [None]:
# Merge the monthly mean rainfall, temperature, depth and volume tables on the month
df_to_merge_2 = [combine_ts_2, combine_ts_5, mean_monthly_depth, mean_monthly_vol]

consolidated_monthly_mean_df = reduce(lambda left, right: pd.merge(left, right, on=['Date'], how = 'outer'), df_to_merge_2)

# Average the two rainfall stations so that mean_rain is a mean, like the other mean_* columns
consolidated_monthly_mean_df['mean_rain'] = consolidated_monthly_mean_df[['Rainfall_Monteporzio', 'Rainfall_Velletri']].mean(axis =1)


consolidated_monthly_mean_df['mean_depth'] = consolidated_monthly_mean_df[['Depth_to_Groundwater_Pozzo_1', 'Depth_to_Groundwater_Pozzo_2',
       'Depth_to_Groundwater_Pozzo_3', 'Depth_to_Groundwater_Pozzo_4',
       'Depth_to_Groundwater_Pozzo_5', 'Depth_to_Groundwater_Pozzo_6',
       'Depth_to_Groundwater_Pozzo_7', 'Depth_to_Groundwater_Pozzo_8',
       'Depth_to_Groundwater_Pozzo_9']].mean(axis =1)

consolidated_monthly_mean_df['mean_volume'] = consolidated_monthly_mean_df[['Volume_Pozzo_1', 'Volume_Pozzo_2',
       'Volume_Pozzo_3', 'Volume_Pozzo_4', 'Volume_Pozzo_5+6',
       'Volume_Pozzo_7', 'Volume_Pozzo_8', 'Volume_Pozzo_9']].mean(axis =1)


consolidated_monthly_mean_df['mean_temp'] = consolidated_monthly_mean_df[['Temperature_Monteporzio', 'Temperature_Velletri']].mean(axis =1)


plt.figure(figsize = (12, 8))
ax = plt.axes()
sc = sns.scatterplot(x = consolidated_monthly_mean_df['mean_rain'], 
                y = consolidated_monthly_mean_df['mean_depth'], 
                hue = consolidated_monthly_mean_df['Date'])
sc.legend(loc='center left', bbox_to_anchor=(1.25, 0.5), ncol=1)
ax.set_title('Doganella- How depth varies with rain')

for l1 in range(0,consolidated_monthly_mean_df.shape[0]):
     sc.text(consolidated_monthly_mean_df['mean_rain'][l1]+0.2, 
             consolidated_monthly_mean_df['mean_depth'][l1], 
             consolidated_monthly_mean_df['Date'][l1], 
             horizontalalignment='left', 
             size='medium', color='black', 
             weight='semibold')

# seaborn annotation reference: https://python-graph-gallery.com/46-add-text-annotation-on-scatterplot/
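
The annotation loop above looks up each value by index label (`[l1]`), which works only while the frame has a default 0..n-1 index. Since the same loop is repeated for every scatter plot, it could be wrapped in a small helper that walks the columns by position. This is a sketch only; the helper name `label_points` and the demo frame are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

def label_points(ax, df, x_col, y_col, label_col, dx=0.2):
    """Write the value of label_col next to each (x, y) point on ax."""
    # zip walks the columns by position, so the DataFrame index does not matter.
    for x, y, label in zip(df[x_col], df[y_col], df[label_col]):
        ax.text(x + dx, y, str(label), horizontalalignment='left',
                size='medium', color='black', weight='semibold')

# Tiny made-up frame with a non-default index, for illustration only.
demo = pd.DataFrame({'mean_rain': [1.0, 2.0, 3.0],
                     'mean_depth': [-30.0, -31.0, -29.5],
                     'Date': ['Jan', 'Feb', 'Mar']}, index=[7, 3, 9])
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(demo['mean_rain'], demo['mean_depth'])
label_points(ax, demo, 'mean_rain', 'mean_depth', 'Date')
```

With such a helper, each scatter cell would reduce to the plot call plus one `label_points(...)` line.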

**How does depth to ground water and temperature vary?**

Here we find that rainfall is high during Nov, Dec, Jan and Feb, but the depth is lowest for those months.

In [None]:
plt.figure(figsize = (12, 8))
ax = plt.axes()
sc = sns.scatterplot(x = consolidated_monthly_mean_df['mean_temp'], 
                y = consolidated_monthly_mean_df['mean_depth'], 
                hue = consolidated_monthly_mean_df['Date'])
sc.legend(loc='center left', bbox_to_anchor=(1.25, 0.5), ncol=1)
ax.set_title('Doganella- How depth varies with temperature')

for l1 in range(0,consolidated_monthly_mean_df.shape[0]):
     sc.text(consolidated_monthly_mean_df['mean_temp'][l1]+0.2, 
             consolidated_monthly_mean_df['mean_depth'][l1], 
             consolidated_monthly_mean_df['Date'][l1], 
             horizontalalignment='left', 
             size='medium', color='black', 
             weight='semibold')

**How does depth to ground water and volume vary?**

Let's compare the volume to the depth to ground water features. We know that for pozzo1 the depth to ground water is around 30-40 meters from the ground level. Let's check how depth varies with mean volume in cubic meters each month. 

From the rainfall plot above we know that the mean rainfall from Nov to Mar is high compared to the other months. From the plot below we see that the volume of water taken from the drinking water treatment plant is higher during the first 6 months of the year, when the depth to groundwater is smaller, while the recorded volume drops during the later part of the year, when the depth is greater. All these values are consolidated mean values for each month.

In [None]:
plt.figure(figsize = (12, 8))
ax = plt.axes()
sc = sns.scatterplot(x = consolidated_monthly_mean_df['mean_volume'], 
                y = consolidated_monthly_mean_df['mean_depth'], 
                hue = consolidated_monthly_mean_df['Date'])
sc.legend(loc='center left', bbox_to_anchor=(1.25, 0.5), ncol=1)
ax.set_title('Doganella- How depth varies with volume')

for l1 in range(0,consolidated_monthly_mean_df.shape[0]):
     sc.text(consolidated_monthly_mean_df['mean_volume'][l1]+0.2, 
             consolidated_monthly_mean_df['mean_depth'][l1], 
             consolidated_monthly_mean_df['Date'][l1], 
             horizontalalignment='left', 
             size='medium', color='black', 
             weight='semibold')

**AQUIFER AUSER**:

The next dataset in the Aquifer category is the Auser waterbody. We will follow almost the same steps as above to get some insights from the Auser dataset.


In [None]:
# Read the Aquifer_Auser dataset
aquifer_auser = pd.read_csv('/kaggle/input/acea-water-prediction/Aquifer_Auser.csv')

# Check the info
aquifer_auser.info()
aquifer_auser.head()

# If you have a date column then convert it to that format
aquifer_auser['Date'] = pd.to_datetime(aquifer_auser['Date'])

**Total Monthly Rainfall:**


The data provided covers 10 places. The total monthly rainfall follows the same trend in each of these places, with small variations in between.

In [None]:
cols = ['Rainfall_Gallicano', 'Rainfall_Pontetetto','Rainfall_Monte_Serra','Rainfall_Orentano', 
        'Rainfall_Borgo_a_Mozzano','Rainfall_Piaggione','Rainfall_Calavorno','Rainfall_Croce_Arcana',
        'Rainfall_Tereglio_Coreglia_Antelminelli','Rainfall_Fabbriche_di_Vallico']

monthly_rainfall_auser = aquifer_auser.groupby(aquifer_auser['Date'].dt.strftime('%b'))[cols].sum().reset_index()
monthly_rainfall_auser['Date'] = pd.Categorical(monthly_rainfall_auser['Date'], categories = month, ordered = True)
monthly_rainfall_auser.sort_values(by= 'Date', inplace = True)
check_ts(monthly_rainfall_auser, 'Date')

**Mean Monthly Rainfall:**

The mean monthly rainfall follows the same trend in these places.

In [None]:
monthly_rainfall_auser = aquifer_auser.groupby(aquifer_auser['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_rainfall_auser['Date'] = pd.Categorical(monthly_rainfall_auser['Date'], categories = month, ordered = True)
monthly_rainfall_auser.sort_values(by= 'Date', inplace = True)
check_ts(monthly_rainfall_auser, 'Date')

**Depth to Groundwater:**

This category of waterbody has two subsystems viz. North and South. The levels of the NORTH sector are represented by the values of the SAL, PAG, CoS and DIEC wells, while the levels of the SOUTH sector by the LT2 well. 
The depth to groundwater is much smaller for Auser than for Doganella.

In [None]:
cols = ['Depth_to_Groundwater_SAL', 'Depth_to_Groundwater_PAG',
       'Depth_to_Groundwater_CoS', 'Depth_to_Groundwater_DIEC']

monthly_mean_depth = aquifer_auser.groupby(aquifer_auser['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_mean_depth['Date'] = pd.Categorical(monthly_mean_depth['Date'], categories = month, ordered = True)
monthly_mean_depth.sort_values(by= 'Date', inplace = True)
check_ts(monthly_mean_depth, 'Date')

**Temperature:**


June, July and August are the summer months.

In [None]:
cols = ['Temperature_Orentano', 'Temperature_Monte_Serra',
       'Temperature_Ponte_a_Moriano', 'Temperature_Lucca_Orto_Botanico']

monthly_mean_temp = aquifer_auser.groupby(aquifer_auser['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_mean_temp['Date'] = pd.Categorical(monthly_mean_temp['Date'], categories = month, ordered = True)
monthly_mean_temp.sort_values(by= 'Date', inplace = True)
check_ts(monthly_mean_temp, 'Date')

**Volume:**


The mean volume (in cubic meters) of drinking water taken from the water treatment plants remains steady throughout the year for all but one treatment plant, Volume_POL.

In [None]:
cols = ['Volume_POL', 'Volume_CC1', 'Volume_CC2', 'Volume_CSA', 'Volume_CSAL']

monthly_mean_vol = aquifer_auser.groupby(aquifer_auser['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_mean_vol['Date'] = pd.Categorical(monthly_mean_vol['Date'], categories = month, ordered = True)
monthly_mean_vol.sort_values(by= 'Date', inplace = True)
check_ts(monthly_mean_vol, 'Date')

**Hydrometry**:


This is the ground water level measured in meters. 

In [None]:
cols = ['Hydrometry_Monte_S_Quirico', 'Hydrometry_Piaggione']

monthly_mean_hyd = aquifer_auser.groupby(aquifer_auser['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_mean_hyd['Date'] = pd.Categorical(monthly_mean_hyd['Date'], categories = month, ordered = True)
monthly_mean_hyd.sort_values(by= 'Date', inplace = True)
check_ts(monthly_mean_hyd, 'Date')

**How does depth vary with rain?**

In [None]:
df_to_merge_3 = [monthly_rainfall_auser, monthly_mean_depth, monthly_mean_temp, monthly_mean_vol, monthly_mean_hyd] 

consolidated_monthly_mean_df = reduce(lambda left, right: pd.merge(left, right, on=['Date'], how = 'outer'), df_to_merge_3)
consolidated_monthly_mean_df.columns
# Average across the rainfall stations so that mean_rain is a mean, like the other mean_* columns
consolidated_monthly_mean_df['mean_rain'] = consolidated_monthly_mean_df[['Rainfall_Gallicano', 'Rainfall_Pontetetto',
       'Rainfall_Monte_Serra', 'Rainfall_Orentano', 'Rainfall_Borgo_a_Mozzano','Rainfall_Piaggione', 'Rainfall_Calavorno', 
       'Rainfall_Croce_Arcana','Rainfall_Tereglio_Coreglia_Antelminelli','Rainfall_Fabbriche_di_Vallico']].mean(axis =1)


consolidated_monthly_mean_df['mean_depth'] = consolidated_monthly_mean_df[['Depth_to_Groundwater_SAL',
                                                                           'Depth_to_Groundwater_PAG', 
                                                                           'Depth_to_Groundwater_CoS',
                                                                           'Depth_to_Groundwater_DIEC']].mean(axis =1)

consolidated_monthly_mean_df['mean_volume'] = consolidated_monthly_mean_df[['Volume_CC1','Volume_CC2', 'Volume_CSA', 'Volume_CSAL']].mean(axis =1)


consolidated_monthly_mean_df['mean_temp'] = consolidated_monthly_mean_df[['Temperature_Orentano',
       'Temperature_Monte_Serra', 'Temperature_Ponte_a_Moriano',
       'Temperature_Lucca_Orto_Botanico']].mean(axis =1)


consolidated_monthly_mean_df['mean_hydrometry'] = consolidated_monthly_mean_df[['Hydrometry_Monte_S_Quirico',
       'Hydrometry_Piaggione']].mean(axis =1)


plt.figure(figsize = (12, 8))
ax = plt.axes()
sc = sns.scatterplot(x = consolidated_monthly_mean_df['mean_rain'], 
                y = consolidated_monthly_mean_df['mean_depth'], 
                hue = consolidated_monthly_mean_df['Date'])
sc.legend(loc='center left', bbox_to_anchor=(1.25, 0.5), ncol=1)
ax.set_title('Auser: How depth varies with rain')

for l1 in range(0,consolidated_monthly_mean_df.shape[0]):
     sc.text(consolidated_monthly_mean_df['mean_rain'][l1]+0.2, 
             consolidated_monthly_mean_df['mean_depth'][l1], 
             consolidated_monthly_mean_df['Date'][l1], 
             horizontalalignment='left', 
             size='medium', color='black', 
             weight='semibold')



**How does mean depth vary with mean temperature?** <a id='Temperature_Vs_Depth_to_groundwater'></a>

In [None]:
plt.figure(figsize = (12, 8))
ax = plt.axes()
sc = sns.scatterplot(x = consolidated_monthly_mean_df['mean_temp'], 
                y = consolidated_monthly_mean_df['mean_depth'], 
                hue = consolidated_monthly_mean_df['Date'])
sc.legend(loc='center left', bbox_to_anchor=(1.25, 0.5), ncol=1)
ax.set_title('Auser: How depth varies with temperature')

for l1 in range(0,consolidated_monthly_mean_df.shape[0]):
     sc.text(consolidated_monthly_mean_df['mean_temp'][l1]+0.2, 
             consolidated_monthly_mean_df['mean_depth'][l1], 
             consolidated_monthly_mean_df['Date'][l1], 
             horizontalalignment='left', 
             size='medium', color='black', 
             weight='semibold')

**How does hydrometry vary with rain?** <a id='Hydrometry_Vs_Rain'></a>

In [None]:
plt.figure(figsize = (12, 8))
ax = plt.axes()
sc = sns.scatterplot(x = consolidated_monthly_mean_df['mean_rain'], 
                y = consolidated_monthly_mean_df['mean_hydrometry'], 
                hue = consolidated_monthly_mean_df['Date'])
sc.legend(loc='center left', bbox_to_anchor=(1.25, 0.5), ncol=1)
ax.set_title('Auser: How hydrometry varies with rain')

for l1 in range(0,consolidated_monthly_mean_df.shape[0]):
     sc.text(consolidated_monthly_mean_df['mean_rain'][l1]+0.2, 
             consolidated_monthly_mean_df['mean_hydrometry'][l1], 
             consolidated_monthly_mean_df['Date'][l1], 
             horizontalalignment='left', 
             size='medium', color='black', 
             weight='semibold')

**AQUIFER LUCO:**


This aquifer is not fed by rivers or lakes but by meteoric infiltration at the extremes of the impermeable sedimentary layers.

In [None]:
# Read the Luco data
aquifer_luco = pd.read_csv('/kaggle/input/acea-water-prediction/Aquifer_Luco.csv')

# Check the info
aquifer_luco.info()
aquifer_luco.head()

# If you have a date column then convert it to that format
aquifer_luco['Date'] = pd.to_datetime(aquifer_luco['Date'])

**Rainfall:**


When we check the mean rainfall data for Luco, we find that data points are missing for most of the regions; only three regions, Simignano, Montalcinello and Sovicille, have a considerable amount of data.

Another thing to note here is that the monthly mean rainfall is much less than the Doganella and Auser data. 
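
The choice of columns with enough data can be made explicit by computing the share of non-missing values per column. The sketch below uses a small synthetic frame; the column names and the 50% threshold are assumptions for illustration and are not taken from the Luco file.

```python
import numpy as np
import pandas as pd

# Synthetic frame standing in for a dataset with sparse columns.
demo = pd.DataFrame({
    'Rainfall_A': [1.0, 0.0, 2.0, 3.0],
    'Rainfall_B': [np.nan, np.nan, np.nan, 4.0],
    'Rainfall_C': [0.5, np.nan, 1.5, 2.5],
})

# Share of non-missing values per column, highest first.
coverage = demo.notna().mean().sort_values(ascending=False)

# Keep columns above a hypothetical threshold of 50% coverage.
min_coverage = 0.5
keep = coverage[coverage >= min_coverage].index.tolist()
print(coverage)
print(keep)
```

The same two lines applied to the rainfall columns of a real frame would give a ranked list from which the well-covered regions can be picked.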

In [None]:
cols = ['Rainfall_Simignano', 'Rainfall_Montalcinello','Rainfall_Sovicille']

monthly_rainfall_luco = aquifer_luco.groupby(aquifer_luco['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_rainfall_luco['Date'] = pd.Categorical(monthly_rainfall_luco['Date'], categories = month, ordered = True)
monthly_rainfall_luco.sort_values(by= 'Date', inplace = True)
check_ts(monthly_rainfall_luco, 'Date')

**Temperature:**


The temperatures of Mensano and Pentolina are between 0 and 5 degrees C throughout the year. (Isn't that too low?)
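
One way to investigate such low monthly means is to check how many readings are exactly 0, since a large share of zeros would pull the mean down. That zeros might stand in for missing readings is only a guess and is not confirmed by anything in this notebook. The sketch below uses made-up columns to show the check and how much the mean changes when zeros are set aside, as a diagnostic only.

```python
import numpy as np
import pandas as pd

# Made-up temperature columns: the first is dominated by exact zeros.
demo = pd.DataFrame({
    'Temperature_demo_1': [12.0, 0.0, 0.0, 0.0, 25.0, 0.0],
    'Temperature_demo_2': [11.5, 14.0, np.nan, 18.0, 24.0, 9.0],
})

# Fraction of readings that are exactly 0 in each column.
zero_share = demo.eq(0).mean()

# Mean with zeros kept versus mean with zeros treated as missing.
mean_raw = demo.mean()
mean_no_zero = demo.replace(0, np.nan).mean()
print(pd.DataFrame({'zero_share': zero_share, 'mean_raw': mean_raw,
                    'mean_no_zero': mean_no_zero}))
```

A column where the two means differ sharply deserves a closer look at its raw daily values before drawing conclusions from the monthly plot.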

In [None]:
cols = ['Temperature_Siena_Poggio_al_Vento',
       'Temperature_Mensano', 'Temperature_Pentolina',
       'Temperature_Monteroni_Arbia_Biena']

monthly_temperature_luco = aquifer_luco.groupby(aquifer_luco['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_temperature_luco['Date'] = pd.Categorical(monthly_temperature_luco['Date'], categories = month, ordered = True)
monthly_temperature_luco.sort_values(by= 'Date', inplace = True)
check_ts(monthly_temperature_luco, 'Date')

**Depth to Ground Water:**

Here we have fewer data points: around 1000-3000 out of around 7000.

In [None]:
cols = ['Depth_to_Groundwater_Podere_Casetta',
       'Depth_to_Groundwater_Pozzo_1', 'Depth_to_Groundwater_Pozzo_3',
       'Depth_to_Groundwater_Pozzo_4']

monthly_depth_luco = aquifer_luco.groupby(aquifer_luco['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_depth_luco['Date'] = pd.Categorical(monthly_depth_luco['Date'], categories = month, ordered = True)
monthly_depth_luco.sort_values(by= 'Date', inplace = True)
check_ts(monthly_depth_luco, 'Date')

**Volume:**


We have fewer data points to analyze here as well.


In [None]:
cols = ['Volume_Pozzo_1', 'Volume_Pozzo_3','Volume_Pozzo_4']

monthly_volume_luco = aquifer_luco.groupby(aquifer_luco['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_volume_luco['Date'] = pd.Categorical(monthly_volume_luco['Date'], categories = month, ordered = True)
monthly_volume_luco.sort_values(by= 'Date', inplace = True)
check_ts(monthly_volume_luco, 'Date')

**AQUIFER PETRIGNANO:**


This aquifer is fed by three underground aquifers separated by low permeability septa.



In [None]:
# Read the aquifer_petrignano data
aquifer_petrignano = pd.read_csv('/kaggle/input/acea-water-prediction/Aquifer_Petrignano.csv')

# Check the info
aquifer_petrignano.info()
aquifer_petrignano.head()

# If you have a date column then convert it to that format
aquifer_petrignano['Date'] = pd.to_datetime(aquifer_petrignano['Date'])

**Rainfall:**



In [None]:
monthly_rainfall_petrignano = aquifer_petrignano.groupby(aquifer_petrignano['Date'].dt.strftime('%b'))['Rainfall_Bastia_Umbra'].mean().reset_index()
monthly_rainfall_petrignano['Date'] = pd.Categorical(monthly_rainfall_petrignano['Date'], categories = month, ordered = True)
monthly_rainfall_petrignano.sort_values(by= 'Date', inplace = True)
check_ts(monthly_rainfall_petrignano, 'Date')

**Temperature:**



In [None]:
cols = ['Temperature_Bastia_Umbra', 'Temperature_Petrignano']
monthly_temp_petrignano = aquifer_petrignano.groupby(aquifer_petrignano['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_temp_petrignano['Date'] = pd.Categorical(monthly_temp_petrignano['Date'], categories = month, ordered = True)
monthly_temp_petrignano.sort_values(by= 'Date', inplace = True)
check_ts(monthly_temp_petrignano, 'Date')

**Depth to Groundwater:**



In [None]:
cols = ['Depth_to_Groundwater_P24', 'Depth_to_Groundwater_P25']
monthly_depth_petrignano = aquifer_petrignano.groupby(aquifer_petrignano['Date'].dt.strftime('%b'))[cols].mean().reset_index()
monthly_depth_petrignano['Date'] = pd.Categorical(monthly_depth_petrignano['Date'], categories = month, ordered = True)
monthly_depth_petrignano.sort_values(by= 'Date', inplace = True)
check_ts(monthly_depth_petrignano, 'Date')

**Volume:**



In [None]:
monthly_vol_petrignano = aquifer_petrignano.groupby(aquifer_petrignano['Date'].dt.strftime('%b'))['Volume_C10_Petrignano'].mean().reset_index()
monthly_vol_petrignano['Date'] = pd.Categorical(monthly_vol_petrignano['Date'], categories = month, ordered = True)
monthly_vol_petrignano.sort_values(by= 'Date', inplace = True)
check_ts(monthly_vol_petrignano, 'Date')

**Hydrometry:**

In [None]:
monthly_hyd_petrignano = aquifer_petrignano.groupby(aquifer_petrignano['Date'].dt.strftime('%b'))['Hydrometry_Fiume_Chiascio_Petrignano'].mean().reset_index()
monthly_hyd_petrignano['Date'] = pd.Categorical(monthly_hyd_petrignano['Date'], categories = month, ordered = True)
monthly_hyd_petrignano.sort_values(by= 'Date', inplace = True)
check_ts(monthly_hyd_petrignano, 'Date')