# Challenge

Can you build a story to predict the amount of water in each unique waterbody? The challenge is to determine how features influence the water availability of each presented waterbody. Put simply, a better understanding of volumes makes it possible to ensure water availability for each time interval of the year.

The time interval is a day or a month, depending on the measurements available for each waterbody. Models should capture volumes for each waterbody (for instance, a model working on a monthly interval is expected to produce a forecast over the month).
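
To make the idea of a monthly interval concrete, here is a minimal sketch of turning daily measurements into monthly ones. The data is synthetic and the two column names are placeholders; summing rainfall while averaging groundwater depth is an assumption about what a sensible monthly aggregate looks like, not something the challenge prescribes.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for one waterbody (values are made up).
dates = pd.date_range("2020-01-01", "2020-03-31", freq="D")
daily = pd.DataFrame({
    "Date": dates,
    "Rainfall": np.ones(len(dates)),  # 1 mm every day
    "Depth_to_Groundwater": np.linspace(-5.0, -6.0, len(dates)),
})

# Aggregate to a monthly interval: rainfall is summed, levels are averaged.
monthly = (
    daily.set_index("Date")
         .resample("MS")  # "MS" labels each bin with the first day of the month
         .agg({"Rainfall": "sum", "Depth_to_Groundwater": "mean"})
)
print(monthly)
```

A model trained on `monthly` would then produce one forecast per month rather than one per day.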

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, label_binarize, normalize, MinMaxScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import confusion_matrix, f1_score

In [None]:
aquafier_auser = pd.read_csv('.//Downloads//Aquifer_Auser.csv')

In [None]:
aquafier_doganella = pd.read_csv('.//Downloads//Aquifer_Doganella.csv')

In [None]:
aquafier_luco = pd.read_csv('.//Downloads//Aquifer_Luco.csv')

In [None]:
aquafier_petrignano = pd.read_csv('.//Downloads//Aquifer_Petrignano.csv')

In [None]:
lake_bilancino = pd.read_csv('.//Downloads//Lake_Bilancino.csv')

In [None]:
river_arno = pd.read_csv('.//Downloads//River_Arno.csv')

In [None]:
water_spring_amiata = pd.read_csv('.//Downloads//Water_Spring_Amiata.csv')

In [None]:
water_spring_lupa = pd.read_csv('.//Downloads//Water_Spring_Lupa.csv')

In [None]:
water_spring_madonnna_di_canneto = pd.read_csv('.//Downloads//Water_Spring_Madonna_di_Canneto.csv')

# Aquifers

The first water bodies I am working on are the aquifers.

In [None]:
aquafier_auser.head()

In [None]:
aquafier_doganella.head()

In [None]:
aquafier_luco.head()

In [None]:
aquafier_petrignano.head()

In [None]:
print('aquafier_auser :', aquafier_auser.shape)
print('aquafier_doganella :', aquafier_doganella.shape)
print('aquafier_luco :', aquafier_luco.shape)
print('aquafier_petrignano :', aquafier_petrignano.shape)

## Aquifer_Auser

In [None]:
aquafier_auser.isnull().sum()

In [None]:
aquafier_auser.info()

In [None]:
aquafier_auser.describe()

There are five classes of parameters in the aquafier_auser dataset:
    1. Rainfall
    2. Depth to groundwater
    3. Temperature
    4. Volume
    5. Hydrometry

As a first step we focus on the rainfall columns, which record the rain falling at different locations.
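
Since each parameter class corresponds to a column-name prefix, the columns can be grouped programmatically. The sketch below uses a small hand-written list of column names taken from this notebook instead of the real DataFrame, and `classes` and `groups` are illustrative names.

```python
import pandas as pd

# A few column names from the Auser dataset, written out by hand for illustration.
columns = pd.Index([
    "Date",
    "Rainfall_Gallicano", "Rainfall_Orentano",
    "Depth_to_Groundwater_LT2", "Depth_to_Groundwater_SAL",
    "Temperature_Orentano",
    "Volume_POL",
    "Hydrometry_Piaggione",
])

# One prefix per parameter class.
classes = ["Rainfall", "Depth_to_Groundwater", "Temperature", "Volume", "Hydrometry"]

# Map each class to the columns whose name starts with its prefix.
groups = {c: [col for col in columns if col.startswith(c)] for c in classes}

for name, cols in groups.items():
    print(name, "->", cols)
```

With the real data, `columns` would simply be `aquafier_auser.columns`.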

In [None]:
aquafier_auser.columns

# Rainfall

In [None]:
aquafier_auser['Date'] = pd.to_datetime(aquafier_auser['Date'])
aquafier_auser['Day'] = aquafier_auser['Date'].dt.day
aquafier_auser['Month'] = aquafier_auser['Date'].dt.month
aquafier_auser['Year'] = aquafier_auser['Date'].dt.year
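
The cell above lets pandas infer the date format. If the CSV stores dates day-first (an assumption here, since the raw values are not shown), passing an explicit `format` avoids ambiguous strings such as `05/03/2006` being read as May 3rd. A minimal sketch on made-up sample values:

```python
import pandas as pd

# Made-up sample; the dd/mm/YYYY layout is an assumption about the file.
raw = pd.DataFrame({"Date": ["05/03/2006", "14/03/2006"]})

# An explicit format removes any day/month ambiguity.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")
raw["Day"] = raw["Date"].dt.day
raw["Month"] = raw["Date"].dt.month
raw["Year"] = raw["Date"].dt.year
print(raw)
```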

In [None]:
aquafier_auser_rain = [col for col in aquafier_auser.columns if col.startswith('Rain')]
aquafier_auser_rain.append('Day')
aquafier_auser_rain.append('Month')
aquafier_auser_rain.append('Year')
aquafier_auser_rain.append('Date')
a = aquafier_auser[aquafier_auser_rain]

In [None]:
a.head()

In [None]:
a.isnull().sum()/len(a)

In [None]:
a_feat = [col for col in a.columns if col.startswith('Rain')]
for col in a_feat:
    plt.figure(figsize=(30,5), dpi= 300)
    # Pass x and y by keyword; recent seaborn versions reject positional data arguments.
    sns.lineplot(x=a['Date'], y=a[col])
    plt.title(col)
    plt.show()

# Conclusion

The data was gathered from 1998 to 2020, but rainfall values are only present from 2006 onward.
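
The start of each rainfall record can also be read off directly instead of from the plots. This is a small sketch on synthetic data; `Rainfall_A` and `Rainfall_B` are hypothetical column names.

```python
import numpy as np
import pandas as pd

# Synthetic frame: the first rows have no rainfall measurements.
dates = pd.date_range("2005-12-30", periods=6, freq="D")
df = pd.DataFrame({
    "Date": dates,
    "Rainfall_A": [np.nan, np.nan, np.nan, 0.0, 2.4, 0.0],
    "Rainfall_B": [np.nan, np.nan, 1.2, 0.0, np.nan, 3.0],
})

rain_cols = [c for c in df.columns if c.startswith("Rain")]
indexed = df.set_index("Date")

# first_valid_index() returns the index label of the first non-null value.
first_valid = pd.Series({c: indexed[c].first_valid_index() for c in rain_cols})
print(first_valid)
```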

### Filling the null values in the aquafier_auser columns whose labels start with 'Rain'

#### Most of these columns contain null values, and most of their medians are zero, so I use the mean instead of the median to fill the nulls.
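
The reasoning about medians and means can be checked on a toy example. The sketch below uses synthetic, mostly dry rainfall columns with hypothetical names and fills every rainfall column in one step; it illustrates the idea and does not replace the cells that follow.

```python
import numpy as np
import pandas as pd

# Synthetic rainfall: mostly dry days, a few wet days, some missing values.
df = pd.DataFrame({
    "Rainfall_A": [0.0, 0.0, 0.0, 6.0, np.nan, 0.0, 4.0, np.nan],
    "Rainfall_B": [0.0, np.nan, 0.0, 0.0, 10.0, 0.0, np.nan, 0.0],
})
rain_cols = [c for c in df.columns if c.startswith("Rain")]

medians = df[rain_cols].median()  # zero for both columns: most days are dry
means = df[rain_cols].mean()      # pulled above zero by the wet days

# fillna with a Series fills each column with the value stored under its label.
filled = df.copy()
filled[rain_cols] = filled[rain_cols].fillna(means)

print(medians, means, filled.isnull().sum(), sep="\n\n")
```

One consequence of this choice is that every filled day carries the same small positive rainfall, which is worth keeping in mind when reading the histograms.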

In [None]:
aquafier_auser.Rainfall_Borgo_a_Mozzano = aquafier_auser['Rainfall_Borgo_a_Mozzano'].fillna(aquafier_auser.Rainfall_Borgo_a_Mozzano.mean())

In [None]:
aquafier_auser.Rainfall_Calavorno = aquafier_auser['Rainfall_Calavorno'].fillna(aquafier_auser.Rainfall_Calavorno.mean())
aquafier_auser.Rainfall_Croce_Arcana = aquafier_auser['Rainfall_Croce_Arcana'].fillna(aquafier_auser.Rainfall_Croce_Arcana.mean())
aquafier_auser.Rainfall_Fabbriche_di_Vallico = aquafier_auser['Rainfall_Fabbriche_di_Vallico'].fillna(aquafier_auser.Rainfall_Fabbriche_di_Vallico.mean())
aquafier_auser.Rainfall_Gallicano = aquafier_auser['Rainfall_Gallicano'].fillna(aquafier_auser.Rainfall_Gallicano.mean())
aquafier_auser.Rainfall_Monte_Serra = aquafier_auser['Rainfall_Monte_Serra'].fillna(aquafier_auser.Rainfall_Monte_Serra.mean())
aquafier_auser.Rainfall_Orentano = aquafier_auser['Rainfall_Orentano'].fillna(aquafier_auser.Rainfall_Orentano.mean())
aquafier_auser.Rainfall_Piaggione = aquafier_auser['Rainfall_Piaggione'].fillna(aquafier_auser.Rainfall_Piaggione.mean())
aquafier_auser.Rainfall_Pontetetto = aquafier_auser['Rainfall_Pontetetto'].fillna(aquafier_auser.Rainfall_Pontetetto.mean())
aquafier_auser.Rainfall_Tereglio_Coreglia_Antelminelli = aquafier_auser['Rainfall_Tereglio_Coreglia_Antelminelli'].fillna(aquafier_auser.Rainfall_Tereglio_Coreglia_Antelminelli.mean())

In [None]:
aquafier_auser.isnull().sum()

In [None]:
aquafier_auser.head()

# Depth_To_Groundwater

In [None]:
aquafier_auser.Depth_to_Groundwater_CoS = aquafier_auser['Depth_to_Groundwater_CoS'].fillna(aquafier_auser.Depth_to_Groundwater_CoS.mean())
aquafier_auser.Depth_to_Groundwater_DIEC = aquafier_auser['Depth_to_Groundwater_DIEC'].fillna(aquafier_auser.Depth_to_Groundwater_DIEC.mean())
aquafier_auser.Depth_to_Groundwater_LT2 = aquafier_auser['Depth_to_Groundwater_LT2'].fillna(aquafier_auser.Depth_to_Groundwater_LT2.mean())
aquafier_auser.Depth_to_Groundwater_PAG = aquafier_auser['Depth_to_Groundwater_PAG'].fillna(aquafier_auser.Depth_to_Groundwater_PAG.mean())
aquafier_auser.Depth_to_Groundwater_SAL = aquafier_auser['Depth_to_Groundwater_SAL'].fillna(aquafier_auser.Depth_to_Groundwater_SAL.mean())

In [None]:
aquafier_auser.head()

In [None]:
aquafier_auser.loc[ : ,'Depth_to_Groundwater_LT2':'Depth_to_Groundwater_DIEC'].head()

In [None]:
aquafier_auser.isnull().sum()

# Volume

In [None]:
aquafier_auser.Volume_POL = aquafier_auser['Volume_POL'].fillna(aquafier_auser.Volume_POL.mean())
aquafier_auser.Volume_CC1 = aquafier_auser['Volume_CC1'].fillna(aquafier_auser.Volume_CC1.mean())
aquafier_auser.Volume_CC2 = aquafier_auser['Volume_CC2'].fillna(aquafier_auser.Volume_CC2.mean())
aquafier_auser.Volume_CSA = aquafier_auser['Volume_CSA'].fillna(aquafier_auser.Volume_CSA.mean())
aquafier_auser.Volume_CSAL = aquafier_auser['Volume_CSAL'].fillna(aquafier_auser.Volume_CSAL.mean())

In [None]:
aquafier_auser.head()

In [None]:
aquafier_auser.isnull().sum()

# Hydrometry

In [None]:
aquafier_auser.Hydrometry_Monte_S_Quirico = aquafier_auser['Hydrometry_Monte_S_Quirico'].fillna(aquafier_auser.Hydrometry_Monte_S_Quirico.mean())
aquafier_auser.Hydrometry_Piaggione = aquafier_auser['Hydrometry_Piaggione'].fillna(aquafier_auser.Hydrometry_Piaggione.mean())

In [None]:
aquafier_auser.head()

In [None]:
aquafier_auser.isnull().sum()

### Applying the univariate & bivariate analysis
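
The per-column histograms in this section can also be drawn in a single grid. The following is a sketch on synthetic data; `plot_histograms` is a hypothetical helper, and the `Agg` backend line is only there so the example runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for a few numeric columns (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Rainfall_Gallicano": rng.gamma(1.0, 5.0, 200),
    "Temperature_Orentano": rng.normal(15.0, 6.0, 200),
    "Volume_POL": rng.normal(-9000.0, 500.0, 200),
})

def plot_histograms(frame, ncols=3):
    """Draw one histogram per numeric column on a shared grid of axes."""
    cols = frame.select_dtypes("number").columns
    nrows = -(-len(cols) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 3 * nrows), squeeze=False)
    for ax, col in zip(axes.ravel(), cols):
        ax.hist(frame[col].dropna(), rwidth=0.9)
        ax.set_title(col)
    # Hide any unused axes in the last row.
    for ax in axes.ravel()[len(cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig

fig = plot_histograms(df)
```

In the notebook, calling `plot_histograms(aquafier_auser)` would cover every numeric column at once.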

In [None]:
plt.hist(aquafier_auser.Rainfall_Borgo_a_Mozzano, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Calavorno, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Croce_Arcana, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Fabbriche_di_Vallico, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Gallicano, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Monte_Serra, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Orentano, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Piaggione, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Pontetetto, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Rainfall_Tereglio_Coreglia_Antelminelli, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Depth_to_Groundwater_CoS, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Depth_to_Groundwater_DIEC, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Depth_to_Groundwater_LT2, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Depth_to_Groundwater_PAG, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Depth_to_Groundwater_SAL, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Temperature_Orentano, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Temperature_Lucca_Orto_Botanico, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Temperature_Monte_Serra, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Temperature_Ponte_a_Moriano, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Volume_CC1, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Volume_CC2, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Volume_CSA, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Volume_CSAL, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Volume_POL, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Hydrometry_Monte_S_Quirico, rwidth= 0.9)
plt.show()

In [None]:
plt.hist(aquafier_auser.Hydrometry_Piaggione, rwidth= 0.9)
plt.show()

In [None]:
aquafier_auser = aquafier_auser.drop(['Day', 'Month', 'Year'], axis= 1)

In [None]:
aquafier_auser.head()

In [None]:
from sklearn.tree import DecisionTreeRegressor

In [None]:
dtree = DecisionTreeRegressor(min_samples_split= 200)

# The target features to be predicted in Aquifer_Auser

1. Depth_to_Groundwater_SAL

2. Depth_to_Groundwater_CoS

3. Depth_to_Groundwater_LT2

"Information about the Auser aquifer. This water body consists of two subsystems, that we call NORTH and SOUTH, where the former partly influences the behaviour of the latter.
The levels of the NORTH sector are represented by the values of the SAL, PAG, CoS and DIEC wells, while the levels of the SOUTH sector by the LT2 well."
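
As a rough illustration of how the three targets could be separated from the predictors and fed to the `DecisionTreeRegressor` defined earlier, here is a sketch on synthetic data. The linear relationship between rainfall and depth, the 80/20 chronological split and the `random_state` are all assumptions made for the example; only the column names and `min_samples_split=200` come from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: depths depend weakly on rainfall plus noise (made-up relationship).
rng = np.random.default_rng(42)
n = 1000
rain = rng.gamma(1.0, 4.0, n)
df = pd.DataFrame({
    "Rainfall_Gallicano": rain,
    "Temperature_Orentano": rng.normal(15.0, 5.0, n),
    "Depth_to_Groundwater_SAL": -5.0 + 0.05 * rain + rng.normal(0, 0.1, n),
    "Depth_to_Groundwater_CoS": -6.0 + 0.04 * rain + rng.normal(0, 0.1, n),
    "Depth_to_Groundwater_LT2": -12.0 + 0.02 * rain + rng.normal(0, 0.1, n),
})

targets = ["Depth_to_Groundwater_SAL", "Depth_to_Groundwater_CoS", "Depth_to_Groundwater_LT2"]
features = [c for c in df.columns if c not in targets]

# Chronological split: the last 20% of rows are held out, with no shuffling.
split = int(len(df) * 0.8)
X_train, X_test = df[features].iloc[:split], df[features].iloc[split:]
y_train, y_test = df[targets].iloc[:split], df[targets].iloc[split:]

# A single tree can predict all three targets at once (multi-output regression).
dtree = DecisionTreeRegressor(min_samples_split=200, random_state=0)
dtree.fit(X_train, y_train)
pred = dtree.predict(X_test)

# One mean absolute error per target column.
mae = mean_absolute_error(y_test, pred, multioutput="raw_values")
print(dict(zip(targets, mae)))
```

With the real dataset, the remaining wells (PAG and DIEC) would also appear among the features unless they are dropped explicitly, which is a modelling decision to make deliberately.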

In [None]:
plt.figure(figsize= (15,15))
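# Restrict to numeric columns so the datetime 'Date' column does not break corr().
sns.heatmap(aquafier_auser.select_dtypes('number').corr(method= 'pearson'), annot= True)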
plt.show()