<a href="https://colab.research.google.com/github/leoalfonso/M11-and-M49/blob/main/05_Pandas_Load.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div>
<table style="width: 100%">
	<tr>
		<td>
		<table style="width: 100%">
			<tr>
                <td ><center><font size="5"><b>Modules 11 and 49</b></font></center></td>
			</tr>
			<tr>
                <td><center><font size="14">Notebook 5</font></center></td>
			</tr>
			<tr>
                <td><center><font size="6"><b>Pandas</b></font></center></td>
			</tr>
		</table>
		</td>
		<td><center><img src='https://ihe-delft-ihe-website-production.s3.eu-central-1.amazonaws.com/s3fs-public/styles/792w/public/2022-11/IHE-DELFT-INSTITUTE_UNESCO_RGB.png?itok=-GnfBc2x'></img></center></td>
	</tr>
</table>
</div>

# 🐍 Period 5: Introduction to Pandas

Welcome to Pandas! This library is the cornerstone of data analysis in Python, and it is built directly on top of NumPy.

In engineering, our data (from sensors, models, or field measurements) is rarely a simple list. It often has multiple columns, dates, and missing values. Pandas provides a powerful object called the **DataFrame** to manage this.

* **`Series`**: A 1D labeled array (like a single column in a spreadsheet).
* **`DataFrame`**: A 2D labeled data structure with columns of potentially different types (like a full spreadsheet or a database table).
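
To make the two structures concrete, here is a minimal sketch that builds one of each by hand. The variable names `rain` and `table` are illustrative only, and the values are the first three rows of the sample precipitation file used in this notebook.

```python
import pandas as pd

# A Series: one labeled column of values (rainfall depths in mm).
rain = pd.Series([0.0, 5.2, 12.1], name="Precipitation_mm")

# A DataFrame: several columns, each of which can hold a different data type.
table = pd.DataFrame(
    {
        "Date": ["2023-01-01", "2023-01-02", "2023-01-03"],
        "Precipitation_mm": [0.0, 5.2, 12.1],
        "Station_ID": ["S1", "S1", "S1"],
    }
)

print(rain)
print(table)

# Selecting a single column of a DataFrame gives back a Series.
print(type(table["Precipitation_mm"]))
```

In practice you will rarely type data in like this; the rest of the notebook loads it from a file instead.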

Our main goal is to load a time series of precipitation data and inspect its structure.

## 1. Import Pandas

As with NumPy, we first import the library. The standard convention is to import it with the alias `pd`.

In [None]:
# Task 1.1: Import the pandas library
import pandas as pd

## 2. Preparing and Uploading Your Data File

In a real-world engineering project, your data will usually come from external files. One of the most common formats is CSV (Comma-Separated Values).

We will use a simple precipitation dataset for this exercise.

**Task 2.1:**
1.  Open a plain text editor (like Notepad, TextEdit, or VS Code) on your computer.
2.  Copy the text below, paste it into the editor, and save it as `precipitation_data.csv`.
3.  Run the code cell below, which will prompt you to upload this file to the Colab environment.

---
**File Content for `precipitation_data.csv`:**

Date,Precipitation_mm,Station_ID
2023-01-01,0.0,S1
2023-01-02,5.2,S1
2023-01-03,12.1,S1
2023-01-04,0.5,S1
2023-01-05,0.0,S1
2023-01-06,0.0,S1
2023-01-07,8.4,S1
2023-01-08,1.2,S1
2023-01-09,0.0,S1
2023-01-10,3.3,S1
2023-01-11,6.0,S1
2023-01-12,1.5,S1

---
**Task 2.2:** Run the code cell below to upload your file.

In [None]:
# Task 2.2: Run this cell to upload your 'precipitation_data.csv' file
from google.colab import files

print("Please upload the 'precipitation_data.csv' file you just created.")
uploaded = files.upload()

# Verify the upload
for fn, data in uploaded.items():
    print(f"User uploaded file '{fn}' with length {len(data)} bytes")

Please upload the 'precipitation_data.csv' file you just created.
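
If you are working outside Colab, or the upload widget is not available, the same file can be written directly from Python. This is only a sketch of an alternative route, not part of the original task. It assumes the current working directory is writable, and the names `csv_text` and `csv_path` are illustrative.

```python
from pathlib import Path

# The same header and twelve rows listed as the file content for Task 2.1.
csv_text = """Date,Precipitation_mm,Station_ID
2023-01-01,0.0,S1
2023-01-02,5.2,S1
2023-01-03,12.1,S1
2023-01-04,0.5,S1
2023-01-05,0.0,S1
2023-01-06,0.0,S1
2023-01-07,8.4,S1
2023-01-08,1.2,S1
2023-01-09,0.0,S1
2023-01-10,3.3,S1
2023-01-11,6.0,S1
2023-01-12,1.5,S1
"""

# Write the text to a file in the current working directory.
csv_path = Path("precipitation_data.csv")
csv_path.write_text(csv_text)

print(f"Wrote {csv_path} ({csv_path.stat().st_size} bytes)")
```

Either route leaves a file named `precipitation_data.csv` next to the notebook, which is all the following sections need.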


## 3. Reading the CSV into a DataFrame

Now that the file is in our Colab environment, we can use the Pandas function `pd.read_csv()` to load it.

**Task 3.1:**
* Use `pd.read_csv()` with the filename `'precipitation_data.csv'` to load the data.
* Store the result in a variable named `df` (the standard name for a DataFrame).
* Print the `df` variable to see the output.
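
Before you try it on the real file, the sketch below shows how `pd.read_csv()` behaves on a three-row excerpt held in memory. It uses `io.StringIO` from the standard library as a stand-in for a file, and `sample_csv` and `sample_df` are illustrative names, so the `df` variable for the task is left for you to create.

```python
import io

import pandas as pd

# A three-row excerpt of the sample file, held in memory for illustration.
sample_csv = io.StringIO(
    "Date,Precipitation_mm,Station_ID\n"
    "2023-01-01,0.0,S1\n"
    "2023-01-02,5.2,S1\n"
    "2023-01-03,12.1,S1\n"
)

# read_csv treats the first line as column names and infers each column's type.
sample_df = pd.read_csv(sample_csv)

print(sample_df)
print(sample_df.columns.tolist())
```

Passing a filename string instead of the `StringIO` object reads from disk in the same way.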

In [None]:
# Task 3.1: Read the CSV file into a DataFrame named 'df'

# [WRITE YOUR CODE BELOW THIS LINE]


# df = pd.read_csv(...)

# Print the DataFrame to see the result
# print(df)

## 4. Inspecting the DataFrame

We have data, but we don't know its structure. We need to perform an initial inspection.

**Task 4.1: `.head()`**
* Use `df.head()` to view the first 5 rows. This is a quick way to check that the data loaded correctly.

**Task 4.2: `.info()`**
* Use `df.info()`. This method is *essential* and provides:
    * The number of rows (entries) and columns.
    * The data type (e.g., `object`, `int64`, `float64`) of each column.
    * The number of non-null (not missing) values.

**Task 4.3: `.describe()`**
* Use `df.describe()`. By default, this gives a quick statistical summary (count, mean, standard deviation, minimum, quartiles, maximum) of the *numeric* columns only, so it will skip 'Date' and 'Station_ID'.
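
The three inspection methods can be tried on a small hand-made table first. This is a minimal sketch: `demo`, `first_rows`, and `summary` are illustrative names, and the six rows are copied from the start of the sample file.

```python
import pandas as pd

# Six rows copied from the start of the sample precipitation file.
demo = pd.DataFrame(
    {
        "Date": [
            "2023-01-01", "2023-01-02", "2023-01-03",
            "2023-01-04", "2023-01-05", "2023-01-06",
        ],
        "Precipitation_mm": [0.0, 5.2, 12.1, 0.5, 0.0, 0.0],
        "Station_ID": ["S1"] * 6,
    }
)

first_rows = demo.head()   # the first 5 of the 6 rows
demo.info()                # prints row count, column dtypes, and non-null counts
summary = demo.describe()  # statistics for the numeric column only

print(first_rows)
print(summary)
```

Note that `.info()` prints its report directly, while `.head()` and `.describe()` return new DataFrames that you can store or display.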

In [None]:
# Task 4.1: View the first 5 rows

# [WRITE YOUR CODE BELOW THIS LINE]


# df.head()

In [None]:
# Task 4.2: Get the technical summary of the DataFrame

# [WRITE YOUR CODE BELOW THIS LINE]



# df.info()

In [None]:
# Task 4.3: Get the statistical summary of numeric columns

# [WRITE YOUR CODE BELOW THIS LINE]



# df.describe()

## 5. Handling Time Series Data (Crucial Step)

From the `df.info()` output, you probably noticed that the `Date` column is listed as `object`, which means pandas is storing the dates as plain text strings. Time-based operations such as selecting a date range or resampling do not work on text, so we *must* convert the column to a proper `datetime` type.

**Task 5.1: Convert to Datetime**
* Use the `pd.to_datetime()` function on the `df['Date']` column.
* Overwrite the old `df['Date']` column with this new, converted version.

**Task 5.2: Set the Datetime Index**
* For time series analysis, we almost always set the `Date` column as the **Index** of the DataFrame. This "promotes" it from a data column to the primary row label.
* Use the `df.set_index()` method, passing it the column name `'Date'`.
* Re-run `df.head()` to see the new structure.
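
The two steps can be sketched on a small stand-alone table as follows. The names `demo` and `subset` are illustrative, and the four rows are copied from the sample file; the final lines show one thing the datetime index makes possible, selecting rows by date label.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
        "Precipitation_mm": [0.0, 5.2, 12.1, 0.5],
    }
)

# Step 1: parse the date strings into datetime values.
demo["Date"] = pd.to_datetime(demo["Date"])

# Step 2: promote 'Date' to the index. set_index returns a new DataFrame,
# so the result is assigned back to the same variable.
demo = demo.set_index("Date")

print(type(demo.index))
print(demo.head())

# With a DatetimeIndex, a range of dates can be selected by label
# (both end dates are included).
subset = demo.loc["2023-01-02":"2023-01-03"]
print(subset)
```

After `set_index`, `Date` is no longer a regular column, so `demo["Date"]` would raise a `KeyError`; the dates are reached through `demo.index` instead.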

In [None]:
# Task 5.1: Convert the 'Date' column from string to datetime
# We will use the 'df' from the previous step.

print(f"Data type of 'Date' BEFORE conversion: {df['Date'].dtype}")

# [WRITE YOUR CODE BELOW THIS LINE]




# df['Date'] = pd.to_datetime(...)

# print(f"Data type of 'Date' AFTER conversion: {df['Date'].dtype}")
# df.info() # Run .info() again to confirm the change

In [None]:
# Task 5.2: Set the 'Date' column as the DataFrame's Index
# This makes time-based slicing and plotting much easier.

# [WRITE YOUR CODE BELOW THIS LINE]




# df = df.set_index(...) # Note: We re-assign 'df' to save the change

# Run .head() again to see the new structure (Date is now the index)
# df.head()

**End of Notebook 05**