<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:black; border:0; color:cyan' role="tab" aria-controls="home"><center>Acea Smart Water Analytics</center></h2>


![acea](https://images.pexels.com/photos/1231622/pexels-photo-1231622.jpeg?cs=srgb&dl=pexels-sourav-mishra-1231622.jpg&fm=jpg)

<a id="1"></a>
<b>In this competition we focus only on the water sector, to help Acea Group preserve precious waterbodies. A water supply company must forecast the water level in each waterbody (water spring, lake, river, or aquifer) in order to handle daily consumption. During fall and winter waterbodies are refilled, but during spring and summer they start to drain. To help preserve the health of these waterbodies, it is important to predict the most efficient water availability, in terms of level and water flow, for each day of the year.

The task is to build a story that predicts the amount of water in each unique waterbody. The challenge is to determine how the features influence the water availability of each presented waterbody. Put simply, with a better understanding of volumes, Acea will be able to ensure water availability for each time interval of the year.

The time interval is a day or a month, depending on the available measures for each waterbody. Models should capture volumes for each waterbody (for instance, a model working on a monthly interval is expected to produce a forecast over the month). The desired outcome is a notebook that can generate four mathematical models, one for each category of waterbody (aquifers, water springs, river, lake), that might be applicable to each single waterbody.</b>

![acea](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F6195295%2Fcca952eecc1e49c54317daf97ca2cca7%2FAcea-Input.png?generation=1606932492951317&alt=media)

<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:black; border:0; color:cyan' role="tab" aria-controls="home"><center>Table of Contents</center></h2>

    
    
- [Problem Statement](#1)
- [Import Libraries](#2)
- [Reading Data](#3)
- [Exploratory Data Analysis (EDA)](#4)


<a id="2"></a>
# Import Libraries

In [None]:
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

from pprint import pprint

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly
import plotly.graph_objs as go
import plotly.offline as pyo
from plotly import tools
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.subplots import make_subplots
import cufflinks as cf

# Enable offline Plotly rendering inside the notebook
init_notebook_mode(connected=True)
cf.set_config_file(theme='henanigans')

<a id="3"></a>
# Reading Data

In [None]:
# Load each waterbody CSV, using the Date column as the index
Aquifer_Auser = pd.read_csv("../input/acea-water-prediction/Aquifer_Auser.csv", index_col='Date')
Aquifer_Doganella = pd.read_csv("../input/acea-water-prediction/Aquifer_Doganella.csv", index_col='Date')
Aquifer_Luco = pd.read_csv("../input/acea-water-prediction/Aquifer_Luco.csv", index_col='Date')
Aquifer_Petrignano = pd.read_csv("../input/acea-water-prediction/Aquifer_Petrignano.csv", index_col='Date')
Lake_Bilancino = pd.read_csv("../input/acea-water-prediction/Lake_Bilancino.csv", index_col='Date')
River_Arno = pd.read_csv("../input/acea-water-prediction/River_Arno.csv", index_col='Date')
Water_Spring_Amiata = pd.read_csv("../input/acea-water-prediction/Water_Spring_Amiata.csv", index_col='Date')
Water_Spring_Lupa = pd.read_csv("../input/acea-water-prediction/Water_Spring_Lupa.csv", index_col='Date')
Water_Spring_Madonna_di_Canneto = pd.read_csv("../input/acea-water-prediction/Water_Spring_Madonna_di_Canneto.csv", index_col='Date')
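
The nine `read_csv` calls above could also be written as a loop that collects the frames in a dictionary. The sketch below is one possible way to do that; the helper name `load_waterbodies` and the `date_format` default are assumptions for illustration (the day/month/year format in particular should be checked against the actual files), and the demo runs on a throwaway file with made-up values rather than the competition data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_waterbodies(data_dir, names, date_format="%d/%m/%Y"):
    """Load one CSV per waterbody into a dict keyed by waterbody name."""
    frames = {}
    for name in names:
        df = pd.read_csv(Path(data_dir) / f"{name}.csv", index_col="Date")
        # Assumption: dates are stored as day/month/year strings.
        df.index = pd.to_datetime(df.index, format=date_format)
        frames[name] = df
    return frames


# Tiny demonstration on a temporary file (the values are made up).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Demo_Waterbody.csv").write_text(
        "Date,Rainfall,Depth\n01/02/2020,1.5,-30.1\n02/02/2020,,-30.4\n"
    )
    demo = load_waterbodies(tmp, ["Demo_Waterbody"])

print(demo["Demo_Waterbody"].index[0])
```

For the real data, the call would look something like `load_waterbodies("../input/acea-water-prediction", ["Aquifer_Auser", "Lake_Bilancino"])`. A datetime index also makes later resampling by day or month straightforward.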

# Missing Value function

In [None]:
# Function to check the count and percentage of null values in each column of a dataset
def calnullpercentage(df):
    missing_num = df.isna().sum().sort_values(ascending=False)
    missing_perc = (df.isna().sum() / len(df) * 100).sort_values(ascending=False)
    missing = pd.concat([missing_num, missing_perc], keys=['Total', 'Percentage'], axis=1)
    # Keep only the columns that actually contain missing values
    missing = missing[missing['Percentage'] > 0]
    return missing

In [None]:
# Aquifer_Auser null values percentage
calnullpercentage(Aquifer_Auser).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# Aquifer_Doganella null values percentage
calnullpercentage(Aquifer_Doganella).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# Aquifer_Luco null values percentage
calnullpercentage(Aquifer_Luco).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# Aquifer_Petrignano null values percentage
calnullpercentage(Aquifer_Petrignano).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# Lake_Bilancino null values percentage
calnullpercentage(Lake_Bilancino).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# River_Arno null values percentage
calnullpercentage(River_Arno).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# Water_Spring_Amiata null values percentage
calnullpercentage(Water_Spring_Amiata).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# Water_Spring_Lupa null values percentage
calnullpercentage(Water_Spring_Lupa).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')

In [None]:
# Water_Spring_Madonna_di_Canneto null values percentage
calnullpercentage(Water_Spring_Madonna_di_Canneto).style.format({"Total": "{:20,.0f}", 
                          "Percentage": "{:20,.0f}%"})\
                 .background_gradient(cmap='Blues')
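
The per-dataset checks above can also be gathered into a single table, which makes it easier to compare missingness across waterbodies. This is a minimal sketch under the assumption that the frames are held in a dictionary; `missing_summary` is a hypothetical helper, not part of the original notebook, and the example frames are made up.

```python
import pandas as pd


def missing_summary(frames):
    """Return one row per (dataset, column) that contains missing values."""
    rows = []
    for name, df in frames.items():
        counts = df.isna().sum()
        for column, total in counts[counts > 0].items():
            rows.append({
                "Dataset": name,
                "Column": column,
                "Total": int(total),
                "Percentage": 100 * total / len(df),
            })
    summary = pd.DataFrame(rows, columns=["Dataset", "Column", "Total", "Percentage"])
    # Worst-affected columns first
    return summary.sort_values("Percentage", ascending=False, ignore_index=True)


# Made-up frames standing in for the real datasets.
example = {
    "A": pd.DataFrame({"x": [1.0, None, None, None], "y": [1, 2, 3, 4]}),
    "B": pd.DataFrame({"z": [None, 2.0]}),
}
print(missing_summary(example))
```

With the real data, the dictionary could be built by hand, for example `{"Aquifer_Auser": Aquifer_Auser, "River_Arno": River_Arno}`.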

<a id="4"></a>
# Exploratory Data Analysis (EDA)

In [None]:
# Line plot of a single column; the dataset name in the title is inferred from the column layout
def scatter_plots(df,col,color):
    if len(df.columns)==26:
        title='Aquifer_Auser'
    elif df.columns[0]=='Rainfall_Monteporzio':
        title='Aquifer_Doganella'
    elif df.columns[0]=='Rainfall_Simignano':
        title='Aquifer_Luco'
    elif len(df.columns)==7:
        title='Aquifer_Petrignano'
    elif len(df.columns)==8:
        title='Lake_Bilancino'
    elif len(df.columns)==16:
        title='River_Arno'
    elif len(df.columns)==15:
        title='Water_Spring_Amiata'    
    elif len(df.columns)==2:
        title='Water_Spring_Lupa'
    else:
        title='Water_Spring_Madonna_di_Canneto'      
    fig = go.Figure(
        go.Scatter(
            x=df.index,
            y=df[col],
            mode='lines+markers',
            name='Lines+markers',
            line=dict(color=color, width=1.5)),
        layout=go.Layout(
            # Title font set via title.font; the top-level `titlefont` key is deprecated
            title={'text': title + '  ' + col, 'y': 0.9, 'x': 0.5,
                   'xanchor': 'center', 'yanchor': 'top',
                   'font': {'family': 'Balto', 'size': 18}},
            hovermode='x',
            plot_bgcolor='black',
            xaxis={'showgrid': False},
            yaxis={'showgrid': True}))
    iplot(fig)
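
Inferring the title from the number of columns works for the nine files as loaded here, but it would mislabel a frame as soon as a column is added or dropped. A possible alternative, sketched below, is to pass the title explicitly. `line_figure_spec` is a hypothetical helper that returns a plain figure dictionary with the same styling as above; it avoids importing Plotly so that it can be inspected on its own, and the demo frame is made up.

```python
import pandas as pd


def line_figure_spec(df, col, title, color="cyan"):
    """Build a Plotly-style figure dict for one column, with an explicit title."""
    trace = {
        "type": "scatter",
        "x": list(df.index),
        "y": df[col].tolist(),
        "mode": "lines+markers",
        "name": col,
        "line": {"color": color, "width": 1.5},
    }
    layout = {
        "title": {"text": f"{title}  {col}", "x": 0.5, "y": 0.9,
                  "xanchor": "center", "yanchor": "top",
                  "font": {"family": "Balto", "size": 18}},
        "hovermode": "x",
        "plot_bgcolor": "black",
        "xaxis": {"showgrid": False},
        "yaxis": {"showgrid": True},
    }
    return {"data": [trace], "layout": layout}


# Made-up frame standing in for a real waterbody dataset.
demo = pd.DataFrame(
    {"Depth": [-30.1, -30.4, -30.2]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
spec = line_figure_spec(demo, "Depth", title="Demo_Waterbody", color="red")
print(spec["layout"]["title"]["text"])
```

In the notebook, the dictionary would then be handed to Plotly, for example `iplot(go.Figure(spec))`.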

In [None]:
color=['red','#00FFFF','#f1c40f','#5F9EA0','#6495ED','#006400','#00FFFF','#FFD700','#32CD32','#87CEFA','#FFFFF0',
      '#FF00FF','#FF7F50','#008080','#FFFF00','#9ACD32','#FFC0CB','#DA70D6','#FFFFE0','#FFB6C1','#E6E6FA','#DAA520',
      '#696969','#F5FFFA','#FFE4B5','#FFF5EE']
# Visualising Aquifer_Auser
for i,col in enumerate(Aquifer_Auser.columns):
    scatter_plots(Aquifer_Auser,col,color[i])

In [None]:
# Visualising Aquifer_Doganella
for i,col in enumerate(Aquifer_Doganella.columns):
    scatter_plots(Aquifer_Doganella,col,color[i])

In [None]:
# Visualising Aquifer_Luco
for i,col in enumerate(Aquifer_Luco.columns):
    scatter_plots(Aquifer_Luco,col,color[i])

In [None]:
# Visualising Aquifer_Petrignano
for i,col in enumerate(Aquifer_Petrignano.columns):
    scatter_plots(Aquifer_Petrignano,col,color[i])

In [None]:
# Visualising Lake_Bilancino
for i,col in enumerate(Lake_Bilancino.columns):
    scatter_plots(Lake_Bilancino,col,color[i])

In [None]:
# Visualising River_Arno
for i,col in enumerate(River_Arno.columns):
    scatter_plots(River_Arno,col,color[i])

In [None]:
# Visualising Water_Spring_Amiata
for i,col in enumerate(Water_Spring_Amiata.columns):
    scatter_plots(Water_Spring_Amiata,col,color[i])

In [None]:
# Visualising Water_Spring_Lupa
for i,col in enumerate(Water_Spring_Lupa.columns):
    scatter_plots(Water_Spring_Lupa,col,color[i])

In [None]:
# Visualising Water_Spring_Madonna_di_Canneto
for i,col in enumerate(Water_Spring_Madonna_di_Canneto.columns):
    scatter_plots(Water_Spring_Madonna_di_Canneto,col,color[i])
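
The loops above index the colour list with `color[i]`, which raises an `IndexError` if a dataset ever has more columns than the list has colours. One way to guard against that, shown here as a small sketch with a hypothetical helper name, is to cycle through the palette instead.

```python
from itertools import cycle


def pair_columns_with_colors(columns, palette):
    """Pair every column with a colour, restarting the palette when it runs out."""
    return list(zip(columns, cycle(palette)))


pairs = pair_columns_with_colors(["a", "b", "c"], ["red", "#00FFFF"])
print(pairs)  # [('a', 'red'), ('b', '#00FFFF'), ('c', 'red')]
```

A plotting loop could then read `for col, c in pair_columns_with_colors(df.columns, color): scatter_plots(df, col, c)`.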

In [None]:
# Aquifer_Auser Correlation
# Correlation matrix
plt.subplots(figsize = (22,18))
mask = np.zeros_like(Aquifer_Auser.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(Aquifer_Auser.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)


In [None]:
# Aquifer_Doganella Correlation
# Correlation matrix
plt.subplots(figsize = (18,16))
mask = np.zeros_like(Aquifer_Doganella.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(Aquifer_Doganella.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)


In [None]:
# Aquifer_Luco Correlation
# Correlation matrix
plt.subplots(figsize = (20,18))
mask = np.zeros_like(Aquifer_Luco.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(Aquifer_Luco.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)


In [None]:
# Aquifer_Petrignano Correlation
# Correlation matrix
plt.subplots(figsize = (10,8))
mask = np.zeros_like(Aquifer_Petrignano.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(Aquifer_Petrignano.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)


In [None]:
# Lake_Bilancino Correlation
# Correlation matrix
plt.subplots(figsize = (10,8))
mask = np.zeros_like(Lake_Bilancino.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(Lake_Bilancino.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)


In [None]:
# River_Arno Correlation
# Correlation matrix
plt.subplots(figsize = (16,14))
mask = np.zeros_like(River_Arno.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(River_Arno.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)


In [None]:
# Water_Spring_Amiata Correlation
# Correlation matrix
plt.subplots(figsize = (14,12))
mask = np.zeros_like(Water_Spring_Amiata.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(Water_Spring_Amiata.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)


In [None]:
# Water_Spring_Madonna_di_Canneto Correlation
# Correlation matrix
plt.subplots(figsize = (6,4))
mask = np.zeros_like(Water_Spring_Madonna_di_Canneto.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
#Plotting heatmap
sns.heatmap(Water_Spring_Madonna_di_Canneto.corr(), cmap=sns.diverging_palette(20, 220, n=200), mask = mask, annot=True, center = 0)
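
Each heatmap cell above repeats the same three steps: compute the correlation matrix, mask the diagonal and upper triangle, and plot. A rough sketch of how the first two steps might be factored out follows; `lower_triangle_corr` is a hypothetical helper and the demo frame is made up. It returns the correlation matrix with the masked cells set to `NaN`, and seaborn's `heatmap` leaves cells with missing values blank, so the result can be plotted without a separate `mask` argument.

```python
import numpy as np
import pandas as pd


def lower_triangle_corr(df):
    """Correlation matrix with the diagonal and upper triangle blanked out."""
    corr = df.corr()
    # True on and above the diagonal, matching the masks built in the cells above
    upper = np.triu(np.ones(corr.shape, dtype=bool))
    return corr.mask(upper)


# Made-up frame: b rises with a, c falls as a rises.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})
print(lower_triangle_corr(demo))
```

In the notebook this would be used along the lines of `sns.heatmap(lower_triangle_corr(River_Arno), annot=True, center=0)`.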


