<br>

<br>

# 🌊 **ACEA SMART WATER ANALYTICS** 🌊


**ANALYZING THE RELATIONSHIP OF PRECIPITATION TO BILANCINO LAKE'S WATER LEVEL**

**TIME SERIES**

<br>

# **INDEX**

- **STEP 1: PROBLEM DEFINITION**
- **STEP 2: DATA COLLECTION**
- **STEP 3: DATA EXPLORATION & CLEANING**
- **STEP 4: FEATURE ENGINEERING & SELECTION**
- **STEP 5: MODEL SELECTION & IMPLEMENTATION**
- **STEP 6: MODEL EVALUATION & INTERPRETATION**
- **STEP 7: VISUALIZATION & INSIGHTS**
- **STEP 8: CONCLUSION & RECOMMENDATIONS**

<br>

<br>

---

<br>


# **STEP 1: PROBLEM DEFINITION**

<br>



**Understanding the Problem:**

<div style="text-align: justify;">

Water resource management is crucial for **environmental sustainability**, **agriculture**, and **urban planning**. The **Lake Bilancino** dataset provides historical data on lake levels and several meteorological variables. The primary goal is to analyze the relationship between precipitation and the lake's water level over time.

</div>


**Research Question:**

How does precipitation impact the water level of **Lake Bilancino**?

**Hypothesis:**

<div style="text-align: justify;">

An increase in precipitation will lead to a **rise** in the lake's water level, while periods of low precipitation may correspond to a **drop** in water levels.
</div>

**Key Variables:**

- **Dependent Variable:** `Lake_Level` (water level of the lake).
- **Independent Variable:** Precipitation (rainfall over time, recorded per station in the `Rainfall_*` columns).
- **Other Potential Influencing Factors:** Temperature, humidity, drainage volumes, and seasonal patterns.

**Scope of the Analysis:**
<div style="text-align: justify;">

This project will focus on identifying correlations and trends between precipitation and lake levels using time-series data. The insights gained could be valuable for:
</div>

- Water conservation strategies.
- Predicting potential droughts or overflows.
- Enhancing decision-making in water resource management.

<br>

---

<br>

<br>

# **STEP 2: DATA COLLECTION**

- 2.1. Library Importing
- 2.2. Data Collection

<br>

**2.1. LIBRARY IMPORTING**

In [5]:
import os
import pandas as pd
import zipfile


<br>

**2.2. DATA COLLECTION**

In [8]:
zip_path = "data/acea-water-prediction.zip"

extract_to = os.getcwd()

# Extract only the Lake Bilancino file from the archive
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    if "Lake_Bilancino.csv" in zip_ref.namelist():
        zip_ref.extract("Lake_Bilancino.csv", extract_to)
        print("'Lake_Bilancino.csv' extracted")
    else:
        print("The file 'Lake_Bilancino.csv' was not found in the ZIP.")


'Lake_Bilancino.csv' extracted


In [10]:
pd.options.display.max_columns = None  # show every column instead of truncating wide frames
df = pd.read_csv("Lake_Bilancino.csv")
df.head()

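| | Date | Rainfall_S_Piero | Rainfall_Mangona | Rainfall_S_Agata | Rainfall_Cavallina | Rainfall_Le_Croci | Temperature_Le_Croci | Lake_Level | Flow_Rate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 03/06/2002 | NaN | NaN | NaN | NaN | NaN | NaN | 249.43 | 0.31 |
| 1 | 04/06/2002 | NaN | NaN | NaN | NaN | NaN | NaN | 249.43 | 0.31 |
| 2 | 05/06/2002 | NaN | NaN | NaN | NaN | NaN | NaN | 249.43 | 0.31 |
| 3 | 06/06/2002 | NaN | NaN | NaN | NaN | NaN | NaN | 249.43 | 0.31 |
| 4 | 07/06/2002 | NaN | NaN | NaN | NaN | NaN | NaN | 249.44 | 0.31 |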
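
As an alternative to extracting the CSV to disk, the file can be read straight from the archive. The following is a minimal sketch: the helper name `read_csv_from_zip` is hypothetical, and the in-memory archive with a single row only stands in for the real ZIP so that the example runs on its own.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_source, member):
    """Load one CSV member of a ZIP archive into a DataFrame without extracting it."""
    with zipfile.ZipFile(zip_source, "r") as archive:
        if member not in archive.namelist():
            raise FileNotFoundError(f"{member!r} was not found in the ZIP archive.")
        with archive.open(member) as handle:
            return pd.read_csv(handle)


# Tiny in-memory archive standing in for the real download (one illustrative row).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "Lake_Bilancino.csv",
        "Date,Lake_Level,Flow_Rate\n03/06/2002,249.43,0.31\n",
    )
buffer.seek(0)

demo = read_csv_from_zip(buffer, "Lake_Bilancino.csv")
print(demo)
```

With the real archive, the same helper could be called as `read_csv_from_zip(zip_path, "Lake_Bilancino.csv")`, which avoids leaving an extracted copy in the working directory.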


<br>

---

<br>

<br>

# **STEP 3: DATA EXPLORATION & CLEANING**


- 3.1. Exploration: Understanding the Features
- 3.2. Identifying Null Values in Each Feature
- 3.3. Eliminating Duplicates
- 3.4. Eliminating Irrelevant Information

<br>

<br>

**3.1. EXPLORATION: UNDERSTANDING THE FEATURES**

The **Lake Bilancino dataset** contains hydrological and meteorological data relevant for time-series analysis.

**Key Features of the Dataset:**
- **`Date`**: Timestamp of recorded observations.
- **`Rainfall_S_Piero`**: Precipitation measured at the San Piero station.
- **`Rainfall_Mangona`**: Precipitation recorded at the Mangona station.
- **`Rainfall_S_Agata`**: Precipitation at the Sant'Agata station.
- **`Rainfall_Cavallina`**: Precipitation at the Cavallina station.
- **`Rainfall_Le_Croci`**: Precipitation at the Le Croci station.
- **`Temperature_Le_Croci`**: Temperature recorded at the Le Croci station.
- **`Lake_Level`**: Bilancino Lake level, measured in meters.
- **`Flow_Rate`**: Lake outflow rate, measured in cubic meters per second.

**Feature Description:** 

- **Precipitation (Rainfall):**
These variables represent the amount of **rainfall recorded at different stations** around **Lake Bilancino**. Rainfall **directly impacts** the lake level and outflow rate, as rainwater contributes to the **total volume** of the lake.

- **Temperature (Temperature_Le_Croci):**
Temperature can affect **water evaporation** from the lake, influencing its level. Additionally, extreme temperatures may **impact water demand** and the **lake's ecosystem.**

- **Lake Level (Lake_Level):**
Indicates the **height of the water** in **Lake Bilancino**. This is a **key indicator** for **water resource management**, as **extremely high or low levels** can have **environmental and operational consequences**.

- **Flow Rate (Flow_Rate):**
Measures the **amount of water leaving the lake per unit of time**. This is crucial for understanding **lake dynamics** and planning **water distribution** to surrounding areas.
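
Because rainfall is recorded at five separate stations, a single precipitation series has to be derived before it can be compared with `Lake_Level`. One simple option, sketched below, is an unweighted mean across the `Rainfall_*` columns. The column name `Rainfall_Mean` and the sample values are assumptions made for illustration; a station-weighted average may be more appropriate in practice.

```python
import numpy as np
import pandas as pd

# Illustrative values only; the column names match the dataset.
sample = pd.DataFrame(
    {
        "Rainfall_S_Piero": [0.0, 2.0, np.nan],
        "Rainfall_Mangona": [0.0, 4.0, np.nan],
        "Rainfall_S_Agata": [0.0, 6.0, np.nan],
        "Rainfall_Cavallina": [0.0, 8.0, np.nan],
        "Rainfall_Le_Croci": [0.0, 10.0, np.nan],
        "Lake_Level": [249.43, 249.44, 249.45],
    }
)

# Select every station column by its shared prefix.
rain_cols = [col for col in sample.columns if col.startswith("Rainfall_")]

# Row-wise mean; rows where all stations are missing remain NaN.
sample["Rainfall_Mean"] = sample[rain_cols].mean(axis=1)
print(sample[["Rainfall_Mean", "Lake_Level"]])
```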




**Exploratory Analysis of the Features:**

When analyzing these features, it is important to **assess correlations** between them. For example:

- Investigating **how precipitation variations impact lake levels and flow rates.**
- Identifying **seasonal trends** and long-term patterns in the data.

Additionally, handling missing values and detecting anomalies is crucial to ensure data accuracy for further analysis. Data visualization techniques, such as time-series line plots and scatter plots, can help better understand the dynamics of Lake Bilancino.

This exploratory feature analysis will provide a solid foundation for modeling and predicting the lake’s behavior, which is essential for efficient water resource management in the region.
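
One way to assess how precipitation relates to the lake level is to correlate the daily change in level with rainfall shifted by a few days, since a lake may respond to rain with a delay. The sketch below uses synthetic data in which the response is deliberately delayed by two days, so it illustrates the technique and says nothing about the real lake. The lag range of 0 to 7 days is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2004-01-01", periods=365, freq="D")

# Synthetic series (not real data): the level responds to rain two days later.
rain = pd.Series(rng.gamma(shape=0.5, scale=8.0, size=len(dates)), index=dates)
noise = rng.normal(0.0, 0.005, len(dates))
level_change = 0.01 * rain.shift(2).fillna(0.0) - 0.03 + noise
lake_level = 250.0 + level_change.cumsum()

# Correlate the daily change in level with rainfall shifted by 0..7 days.
daily_change = lake_level.diff()
lag_corr = pd.Series(
    {lag: daily_change.corr(rain.shift(lag)) for lag in range(8)},
    name="correlation",
)
best_lag = int(lag_corr.idxmax())

print(lag_corr.round(2))
print("Strongest lag (days):", best_lag)
```

Working with the daily change rather than the raw level removes the slow drift of the series, which could otherwise mask or inflate the correlation with rainfall.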

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6603 entries, 0 to 6602
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Date                  6603 non-null   object 
 1   Rainfall_S_Piero      6026 non-null   float64
 2   Rainfall_Mangona      6026 non-null   float64
 3   Rainfall_S_Agata      6026 non-null   float64
 4   Rainfall_Cavallina    6026 non-null   float64
 5   Rainfall_Le_Croci     6026 non-null   float64
 6   Temperature_Le_Croci  6025 non-null   float64
 7   Lake_Level            6603 non-null   float64
 8   Flow_Rate             6582 non-null   float64
dtypes: float64(8), object(1)
memory usage: 464.4+ KB
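
The summary shows `Date` stored as `object` (plain strings), so it needs to be converted before any time-series operation. The sketch below assumes the dates are in day/month/year order, which is consistent with the consecutive daily rows 03/06/2002, 04/06/2002, 05/06/2002 in the preview. It uses a three-row excerpt of that preview so that it runs on its own.

```python
import io

import pandas as pd

# Excerpt of the first rows shown by df.head(), limited to the non-null columns.
raw = io.StringIO(
    "Date,Lake_Level,Flow_Rate\n"
    "03/06/2002,249.43,0.31\n"
    "04/06/2002,249.43,0.31\n"
    "05/06/2002,249.43,0.31\n"
)
sample = pd.read_csv(raw)

# Assumption: dates are day/month/year, so the format is given explicitly.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y")

# A sorted DatetimeIndex enables resampling, rolling windows, and date slicing.
sample = sample.set_index("Date").sort_index()
print(sample.index)
```

Passing an explicit `format` avoids the silent month/day swap that automatic parsing can introduce for ambiguous dates such as 03/06/2002.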
