## Introduction to Python packages
We will use the following Python packages in this course:
  - [pathlib](#pathlib)
  - [tqdm](#tqdm)
  - [shapely](#shapely)
  - [pandas](#pandas)
  - [geopandas](#geopandas)
  - [networkx](#networkx)
  

<a id='pathlib'></a>
### Introduction to the `pathlib` package

The `pathlib` module in Python is a user-friendly library for working with file system paths. It is part of the standard library, making it available in every Python installation.

Here's a table summarizing some of the most commonly used functionalities of the `pathlib` module:

| Functionality               | Description                                                                 |
|-----------------------------|-----------------------------------------------------------------------------|
| `Path("/path/to/directory")`| Create a `Path` object representing a directory or file path.               |
| `.exists()`                 | Check if the path exists in the file system.                                |
| `.mkdir()`                  | Create a new directory at the specified path.                               |
| `.iterdir()`                | Iterate through the contents (files and directories) of a directory.        |
| `.parent`                   | Get the parent directory of the current path.                               |
| `/`                         | Join paths together, creating a new `Path` object.                          |
| `.is_dir()`                 | Check if the path represents a directory.                                   |
| `.is_file()`                | Check if the path represents a file.                                        |
| `.resolve()`                | Get the absolute path by resolving any symbolic links.                      |

Now, let's explore these functionalities with code examples below.

As a data scientist specializing in water management, you deal with a large number of data files, simulations, and reports related to water resources. To streamline your data management tasks, you decide to use Python's `pathlib` module.

In your water management project directory, you have the following:

- `sensor_data`: This directory contains raw data collected from various water sensors.
- `simulations`: This directory stores the results of hydrological simulations.
- `reports`: This directory holds reports and analysis documents.
- `final_report.pdf`: The final comprehensive water management report.

In [1]:
# Import the pathlib module
from pathlib import Path

In [2]:
# Create a Path object for your water management project directory
project_directory = Path("/path/to/water_management_project")

In [3]:
# Check if the project directory exists
if project_directory.exists():
    print(f"The water management project directory {project_directory} exists.")
else:
    print(f"The water management project directory {project_directory} does not exist.")

The water management project directory \path\to\water_management_project does not exist.


In [None]:
# Create a Path object for sensor data
sensor_data_directory = project_directory / "sensor_data"

In [None]:
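# List all sensor data files (iterdir() raises FileNotFoundError if the directory is missing)
print("\nList of Sensor Data Files:")
if sensor_data_directory.is_dir():
    for data_file in sensor_data_directory.iterdir():
        print(data_file)
else:
    print(f"{sensor_data_directory} does not exist yet.")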

In [None]:
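# Create a new data file (touch() needs the parent directory to exist, so create it first)
sensor_data_directory.mkdir(parents=True, exist_ok=True)
new_data_file = sensor_data_directory / "sensor_data_2023.csv"
new_data_file.touch()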

In [None]:
# Create a Path object for simulation results
simulations_directory = project_directory / "simulations"

In [None]:
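# List all simulation results (iterdir() raises FileNotFoundError if the directory is missing)
print("\nList of Simulation Results:")
if simulations_directory.is_dir():
    for result_file in simulations_directory.iterdir():
        print(result_file)
else:
    print(f"{simulations_directory} does not exist yet.")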

In [None]:
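# Create a new simulation result file (touch() needs the parent directory to exist, so create it first)
simulations_directory.mkdir(parents=True, exist_ok=True)
new_result_file = simulations_directory / "hydro_simulation_v2.csv"
new_result_file.touch()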

In [None]:
# Get the path to the final comprehensive report
final_report = project_directory / "final_report.pdf"

In [None]:
# Check if the final report exists
if final_report.exists():
    print(f"\nThe final comprehensive report {final_report} exists.")
else:
    print(f"\nThe final comprehensive report {final_report} does not exist.")

In [None]:
# Resolve the final report path to an absolute path
resolved_final_report = final_report.resolve()
print(f"\nThe resolved path to the final report is {resolved_final_report}")
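
The table above also lists `.mkdir()`, `.parent`, `.is_dir()`, and `.is_file()`, which the walkthrough does not exercise. The following is a minimal sketch of how they fit together. It assumes a temporary directory (via the standard `tempfile` module) stands in for the project root, so the folder and file names are placeholders rather than real project data.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing on the real file system is touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "water_management_project"  # placeholder project root
    sensor_dir = project / "sensor_data"

    # parents=True creates missing ancestors; exist_ok=True makes the call repeatable.
    sensor_dir.mkdir(parents=True, exist_ok=True)

    # Create an empty file inside the new directory.
    data_file = sensor_dir / "sensor_data_2023.csv"
    data_file.touch()

    checks = {
        "sensor_dir_is_dir": sensor_dir.is_dir(),
        "data_file_is_file": data_file.is_file(),
        "parent_matches": data_file.parent == sensor_dir,
    }
    listing = sorted(p.name for p in sensor_dir.iterdir())

print(checks)
print(listing)
```

Creating the directory before calling `touch()` matters: `touch()` does not create missing parent directories.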

<a id='tqdm'></a>
### Introduction to the `tqdm` package
The `tqdm` package in Python is a handy library for adding progress bars to your loops and iterable-based tasks. It provides a visual way to monitor the progress of operations, making it easier to estimate completion times and keep track of long-running tasks.

Here are some of the key features and functionalities of the `tqdm` package:

- **Progress Bars**: Easily add progress bars to loops and iterators to track the progress of your code.
- **Customization**: Customize the appearance and format of the progress bars, including text descriptions and styling.
- **Iterable Support**: Works seamlessly with various Python iterable objects, such as lists, ranges, and more.
- **Speed Estimation**: `tqdm` can estimate the time remaining for a task based on the current progress.

Now, let's dive into some code examples to see how `tqdm` can improve your Python coding experience.


In [None]:
# Import the tqdm package for progress bars
from tqdm import tqdm
import time
import random

# Scenario: Analyzing Water Level Data

# You're a data scientist working on water management, and you have received a large dataset
# containing water level measurements from various monitoring stations. Your task is to load
# and analyze this data efficiently.

# Simulate loading the water level data (for the sake of the example)
data = [random.uniform(0.0, 10.0) for _ in range(10000)]

print("Step 1: Loading Water Level Data")
# Use tqdm to visualize the progress of data loading
for value in tqdm(data, total=len(data), desc="Data Loading", unit="data points"):
    time.sleep(0.0001)  # Simulate data loading time

print("Data loading complete.")

In [None]:
# Step 2: Customize the Appearance of the Progress Bar

print("\nStep 2: Analyzing Water Level Data")
# Customize the appearance of the progress bar with a custom bar_format string
bar_format = "{desc}: {percentage:3.0f}%|{bar}| {n_fmt}/{total_fmt} [{elapsed}] ETA: {remaining}"
for value in tqdm(data, total=len(data), desc="Analyzing", bar_format=bar_format, unit="data points"):
    time.sleep(0.0001)  # Simulate data analysis time

print("Data analysis complete.")
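
Wrapping an iterable is not the only way to drive a progress bar. When work happens in chunks, the bar can be created with an explicit `total` and advanced manually with `update()`. The sketch below assumes a hypothetical helper `process_in_chunks` and made-up readings. The optional `stream` argument is only there so the bar output can be redirected, for example into an `io.StringIO` buffer.

```python
import io
from tqdm import tqdm


def process_in_chunks(values, chunk_size=250, stream=None):
    """Sum values chunk by chunk, advancing the bar once per chunk."""
    total = 0.0
    # file=None lets tqdm fall back to its default output stream (stderr).
    with tqdm(total=len(values), desc="Chunked analysis", unit="pt", file=stream) as progress:
        for start in range(0, len(values), chunk_size):
            chunk = values[start:start + chunk_size]
            total += sum(chunk)
            progress.update(len(chunk))  # advance by the number of items handled
        items_done = progress.n  # counter maintained by tqdm
    return total, items_done


readings = [0.5] * 1000  # made-up water level readings
total, items_done = process_in_chunks(readings, stream=io.StringIO())
print(total, items_done)
```

Advancing by `len(chunk)` rather than by 1 keeps the counter in units of data points, so the percentage and ETA stay meaningful even when the last chunk is shorter than the others.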

<a id='shapely'></a>
### Introduction to the `shapely` package

The `shapely` package in Python is a powerful library for geometric operations and manipulation of planar geometric objects. It is commonly used for tasks related to spatial analysis, such as working with points, lines, polygons, and performing various geometric operations.

Here are some of the key features and functionalities of the `shapely` package:

- **Geometric Objects**: `shapely` provides classes for creating and manipulating various geometric objects, including `Point`, `LineString`, `Polygon`, `MultiPoint`, `MultiLineString`, and `MultiPolygon`.

- **Geometric Operations**: You can perform a wide range of geometric operations, such as intersections, unions, differences, buffering, and more, on these objects.

- **Predicates**: `shapely` offers predicates to check relationships between geometric objects, such as containment, touch, overlap, and more.

- **Geometric Analysis**: You can calculate properties of geometric objects, such as area, length, perimeter, and centroids.

- **Geometry Validation**: The package provides tools to validate and fix invalid geometries, ensuring they conform to the specifications.

- **Integration**: It is often used in combination with other spatial libraries and tools, such as `geopandas` and GIS software, for advanced spatial analysis.

Here's a table summarizing some of the most commonly used functionalities of the `shapely` module:

| Functionality                           | Description                                                         |
|----------------------------------------|---------------------------------------------------------------------|
| `Point(x, y)`                          | Create a point with coordinates `(x, y)`.                          |
| `LineString(coords)`                    | Create a line string from a list of coordinates.                   |
| `Polygon(shell, holes=None)`            | Create a polygon from a shell (exterior) and optional holes.       |
| `MultiPoint(points)`                   | Create a collection of points.                                      |
| `MultiLineString(lines)`               | Create a collection of line strings.                                |
| `MultiPolygon(polygons)`               | Create a collection of polygons.                                    |
| `object.geom_type`                     | Get the type of a geometric object (e.g., 'Point', 'Polygon').     |
| `object.area`                          | Calculate the area of a polygon.                                   |
| `object.length`                        | Calculate the length of a line string or perimeter of a polygon.   |
| `object.bounds`                        | Get the bounding box coordinates of an object `(minx, miny, maxx, maxy)`. |
| `object.centroid`                      | Get the centroid of a polygon.                                      |
| `object.buffer(distance)`              | Create a buffer (a polygon covering all points within `distance` of the object). |
| `object.intersection(other)`           | Calculate the intersection of two geometric objects.               |
| `object.union(other)`                  | Calculate the union of two geometric objects.                      |
| `object.difference(other)`             | Calculate the difference between two geometric objects.             |
| `object.touches(other)`                | Check if an object touches another object.                          |
| `object.contains(other)`               | Check if an object contains another object.                         |
| `object.within(other)`                 | Check if an object is within another object.                        |
| `object.is_valid`                      | Check if an object is a valid geometry.                             |
| `object.is_empty`                      | Check if an object is empty.                                        |
| `object.equals(other)`                 | Check if two objects are equal.                                     |
| `object.distance(other)`               | Calculate the distance between two objects.                         |
| `object.representative_point()`        | Get a representative point for a geometry.                          |
| `object.simplify(tolerance=0.1)`       | Simplify a geometry by removing unnecessary points.                 |
| `object.wkt`                           | Get the Well-Known Text (WKT) representation of a geometry.         |
| `shapely.geometry.mapping(object)`     | Convert a geometry to a GeoJSON-like dictionary.                    |


Now, let's explore these functionalities with code examples to understand how `shapely` can be a valuable tool for working with geometric data in Python.

In [6]:
# Import the Shapely package
from shapely.geometry import Point, LineString, Polygon

# Scenario: Analyzing Water Monitoring Stations

# You're a data scientist in water management, and you need to analyze the locations
# of water monitoring stations and perform some spatial operations.

# Simulate the locations of water monitoring stations (for the sake of the example)
station_A = Point(5, 10)    # Station A at coordinates (5, 10)
station_B = Point(8, 8)     # Station B at coordinates (8, 8)
station_C = Point(12, 6)    # Station C at coordinates (12, 6)

# Create a LineString representing a river
river = LineString([(2, 3), (4, 7), (6, 12), (10, 9), (15, 5)])

# Create a Polygon representing a water reservoir
reservoir = Polygon([(4, 2), (4, 6), (8, 6), (8, 2)])

In [5]:
# Functionality 1: Check if Stations are within the Reservoir

print("Functionality 1: Check if Stations are within the Reservoir")
if station_A.within(reservoir):
    print("Station A is within the reservoir.")
else:
    print("Station A is outside the reservoir.")

if station_B.within(reservoir):
    print("Station B is within the reservoir.")
else:
    print("Station B is outside the reservoir.")

if station_C.within(reservoir):
    print("Station C is within the reservoir.")
else:
    print("Station C is outside the reservoir.")

In [None]:
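# Functionality 2: Find the Intersection between the River and Reservoir

print("\nFunctionality 2: Find the Intersection between the River and Reservoir")
river_reservoir_intersection = river.intersection(reservoir)
if river_reservoir_intersection.is_empty:
    print("The river does not intersect the reservoir.")
else:
    print(f"Intersection geometry: {river_reservoir_intersection}")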
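
With the coordinates used in this scenario, no river segment enters the reservoir rectangle, so `intersection` returns an empty geometry. To see a non-empty result, the sketch below adds a hypothetical `canal` line (not part of the original scenario) that crosses the reservoir from west to east.

```python
from shapely.geometry import LineString, Polygon

# Same reservoir and river coordinates as in the scenario.
reservoir = Polygon([(4, 2), (4, 6), (8, 6), (8, 2)])
river = LineString([(2, 3), (4, 7), (6, 12), (10, 9), (15, 5)])

# Hypothetical canal, introduced only for illustration.
canal = LineString([(2, 4), (10, 4)])

river_part = river.intersection(reservoir)
canal_part = canal.intersection(reservoir)

print(river_part.is_empty)                      # True when the geometries do not meet
print(canal_part.geom_type, canal_part.length)  # the piece of the canal inside the reservoir
```

Checking `is_empty` before using the result is a reasonable habit, since the type of the returned geometry depends on how the inputs overlap.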

In [None]:
# Functionality 3: Calculate the Length of the River and Perimeter of the Reservoir

print("\nFunctionality 3: Calculate the Length of the River and Perimeter of the Reservoir")
river_length = river.length
reservoir_perimeter = reservoir.length

print(f"Length of the river: {river_length:.2f} units")
print(f"Perimeter of the reservoir: {reservoir_perimeter:.2f} units")

In [None]:
# Functionality 4: Create a Buffer Zone around Station B

print("\nFunctionality 4: Create a Buffer Zone around Station B")
buffer_zone = station_B.buffer(2)  # Create a buffer zone with a radius of 2 units around Station B

In [None]:
# Functionality 5: Check if the Buffer Zone Intersects with the River

print("\nFunctionality 5: Check if the Buffer Zone Intersects with the River")
if buffer_zone.intersects(river):
    print("The buffer zone around Station B intersects with the river.")
else:
    print("The buffer zone around Station B does not intersect with the river.")
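
One caveat: `buffer()` approximates the circle around a point with a polygon, so its area is slightly smaller than that of a true circle. For these coordinates, the distance from Station B to the river equals the 2-unit buffer radius (up to floating-point rounding), which makes the `intersects` result above a borderline case. A proximity check based on `distance()` avoids the approximation. The sketch below assumes a hypothetical threshold `max_distance` of 3 units.

```python
import math
from shapely.geometry import LineString, Point

# Same station and river coordinates as in the scenario.
station_B = Point(8, 8)
river = LineString([(2, 3), (4, 7), (6, 12), (10, 9), (15, 5)])

# The buffer is a polygon inscribed in the circle of radius 2.
buffer_zone = station_B.buffer(2)
circle_area = math.pi * 2 ** 2
print(f"Buffer area: {buffer_zone.area:.3f}, exact circle area: {circle_area:.3f}")

# Exact shortest distance between the point and the line.
distance_to_river = station_B.distance(river)
max_distance = 3  # hypothetical threshold, chosen only for illustration
is_near_river = distance_to_river <= max_distance
print(f"Distance to river: {distance_to_river:.3f}, near river: {is_near_river}")
```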

In [None]:
# Functionality 6: Calculate the Area of the Reservoir

print("\nFunctionality 6: Calculate the Area of the Reservoir")
reservoir_area = reservoir.area
print(f"Area of the reservoir: {reservoir_area:.2f} square units")
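
Several properties from the summary table are not used in the walkthrough. The following sketch, which reuses the reservoir and Station B coordinates from the scenario, shows `bounds`, `centroid`, `distance`, the `wkt` property, and the `mapping` helper from `shapely.geometry`, which returns a GeoJSON-like dictionary.

```python
from shapely.geometry import Point, Polygon, mapping

# Same reservoir and station coordinates as in the scenario.
reservoir = Polygon([(4, 2), (4, 6), (8, 6), (8, 2)])
station_B = Point(8, 8)

bounds = reservoir.bounds            # (minx, miny, maxx, maxy)
centroid = reservoir.centroid        # a Point geometry
gap = station_B.distance(reservoir)  # shortest distance from the station to the reservoir

wkt_text = station_B.wkt             # Well-Known Text representation
geojson_like = mapping(station_B)    # GeoJSON-like dictionary

print(bounds)
print(centroid.x, centroid.y)
print(gap)
print(wkt_text)
print(geojson_like)
```

The dictionary returned by `mapping` can be passed to `json.dumps` when a GeoJSON string is needed.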