# Network and Geospatial Analyses
## Introduction to Python packages
`Author: Stijn Overmeen (stijn.overmeen@nelen-schuurmans.nl)`

This notebook introduces the Python packages used in the 'Network and Geospatial Analyses' pizza course. The purpose is to help you get familiar with these packages. You don't need to memorize everything.

We introduce the following packages:
  - [pathlib](#pathlib)
  - [tqdm](#tqdm)
  - [shapely](#shapely)
  - [geopandas](#geopandas)
  - [networkx](#networkx)
   
#### Disclaimer: This course material is intended solely for internal use within Nelen & Schuurmans and is provided exclusively for educational purposes. All rights, including copyright, pertaining to this material are owned or licensed by Nelen & Schuurmans.  

<a id='pathlib'></a>
### Introduction to the `pathlib` package

The `pathlib` module in Python is a user-friendly library for working with file system paths. It is part of the standard library, making it available in every Python installation.

Here's a table summarizing some of the most commonly used functionalities of the `pathlib` module:

| Functionality               | Description                                                                 |
|-----------------------------|-----------------------------------------------------------------------------|
| `Path("/path/to/directory")`| Create a `Path` object representing a directory or file path.               |
| `.exists()`                 | Check if the path exists in the file system.                                |
| `.mkdir()`                  | Create a new directory at the specified path.                               |
| `.iterdir()`                | Iterate through the contents (files and directories) of a directory.        |
| `.parent`                   | Get the parent directory of the current path.                               |
| `/`                         | Join paths together, creating a new `Path` object.                          |
| `.is_dir()`                 | Check if the path represents a directory.                                   |
| `.is_file()`                | Check if the path represents a file.                                        |
| `.resolve()`                | Return the absolute path, resolving symbolic links and `..` components.     |
| `.cwd()`                    | Get the path to the current working directory.                              |

Now, let's explore these functionalities with code examples below.

As a data scientist specializing in water management, you deal with a large number of data files and reports related to water resources. To streamline your data management tasks, you decide to use Python's `pathlib` module.

In your water management project directory, you have the following:

- `sensor_data`: This directory contains raw data collected from various water sensors.
- `final_report.pdf`: The final comprehensive water management report.

In [None]:
# Import the pathlib module
from pathlib import Path

In [None]:
# Create a Path object for your water management project directory
project_directory = Path.cwd() / "material" / "intro"

In [None]:
# Check if the project directory exists
if project_directory.exists():
    print(f"The water management project directory {project_directory} exists.")
else:
    print(f"The water management project directory {project_directory} does not exist.")

In [None]:
# Create a Path object for sensor data
sensor_data_directory = project_directory / "sensor_data"

In [None]:
# List all sensor data files
print("\nList of Sensor Data Files:")
for data_file in sensor_data_directory.iterdir():
    print(data_file)

In [None]:
# Create a new data file
new_data_file = sensor_data_directory / "sensor_data_2023.csv"
new_data_file.touch()

In [None]:
# List all sensor data files again
print("\nList of Sensor Data Files:")
for data_file in sensor_data_directory.iterdir():
    print(data_file)

In [None]:
# Get the path to the final comprehensive report
final_report = project_directory / "final_report.pdf"

In [None]:
# Check if the final report exists
if final_report.exists():
    print(f"\nThe final comprehensive report {final_report} exists.")
else:
    print(f"\nThe final comprehensive report {final_report} does not exist.")
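
The walkthrough above uses only part of the table. As a minimal sketch, the remaining entries (`.mkdir()`, `.parent`, `.is_dir()`, `.is_file()`, `.resolve()`) can be tried in a throwaway directory, so nothing in the course material is touched. The folder and file names (`demo_project`, `sensor_data`, `readme.txt`) are made up for illustration.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory that is removed automatically afterwards
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "demo_project"   # hypothetical project folder
    data_dir = root / "sensor_data"     # hypothetical sub-folder

    # parents=True also creates 'demo_project'; exist_ok=True avoids an error on re-runs
    data_dir.mkdir(parents=True, exist_ok=True)

    readme = root / "readme.txt"        # hypothetical file
    readme.write_text("Water management demo")

    # Collect the results while the temporary directory still exists
    dir_checks = (data_dir.is_dir(), data_dir.is_file())
    file_checks = (readme.is_dir(), readme.is_file())
    parent_matches = data_dir.parent == root
    resolved_is_absolute = readme.resolve().is_absolute()

print("sensor_data is (dir, file):", dir_checks)
print("readme.txt is (dir, file):", file_checks)
print("Parent of sensor_data is demo_project:", parent_matches)
print("Resolved path is absolute:", resolved_is_absolute)
```

Using `exist_ok=True` is a design choice for notebooks: cells are often re-run, and without it the second run would fail because the directory already exists.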

<a id='tqdm'></a>
### Introduction to the `tqdm` package
The `tqdm` package in Python is a handy library for adding progress bars to your loops and iterable-based tasks. It provides a visual way to monitor the progress of operations, making it easier to estimate completion times and keep track of long-running tasks.

Here are some of the key features and functionalities of the `tqdm` package:

- **Progress Bars**: Easily add progress bars to loops and iterators to track the progress of your code.
- **Customization**: Customize the appearance and format of the progress bars, including text descriptions and styling.
- **Iterable Support**: Works seamlessly with various Python iterable objects, such as lists, ranges, and more.
- **Speed Estimation**: `tqdm` can estimate the time remaining for a task based on the current progress.

Now, let's dive into some code examples to see how `tqdm` can improve your Python coding experience.


In [None]:
# Import the tqdm package for progress bars
from tqdm import tqdm
import time
import random

You're a data scientist working on water management, and you have received a large dataset containing water level measurements from various monitoring stations. Your task is to load and analyze this data efficiently.

In [None]:
# Simulate loading the water level data (for the sake of the example)
data = [random.uniform(0.0, 10.0) for _ in range(5000)]

print("Step 1: Loading Water Level Data")
# Use tqdm to visualize the progress of data loading
for value in tqdm(data, total=len(data), desc="Data Loading", unit="data points"):
    time.sleep(0.0001)  # Simulate data loading time
        
print("Data loading complete.")

In [None]:
# Step 2: Customize the Appearance of the Progress Bar

print("\nStep 2: Analyzing Water Level Data")
# Customize the appearance of the progress bar
for value in tqdm(data, total=len(data), desc="Analyzing", bar_format="{desc}: {percentage:3.0f}%|{bar}| {n_fmt}/{total_fmt} [{elapsed}] ETA: {remaining}", unit="data points"):
    time.sleep(0.0001)  # Simulate data analysis time

print("Data analysis complete.")
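
When work is done in batches rather than one item per loop step, the bar can also be advanced by hand. The sketch below uses `tqdm(total=...)` together with `update()`. The batch size of 500 and the variable names are assumptions made for this example.

```python
import random
from tqdm import tqdm

# Simulated water level data, as in the cells above
data = [random.uniform(0.0, 10.0) for _ in range(5000)]
batch_size = 500  # hypothetical batch size, chosen for illustration

running_total = 0.0
with tqdm(total=len(data), desc="Batch processing", unit="pt") as progress:
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        running_total += sum(batch)
        # Advance the bar by the number of items handled in this batch
        progress.update(len(batch))
    processed = progress.n  # number of items the bar has counted

mean_level = running_total / len(data)
print(f"Processed {processed} values, mean water level {mean_level:.2f}")
```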

<a id='shapely'></a>
### Introducing the `shapely` package

The `shapely` package in Python is a powerful library for geometric operations and manipulation of planar geometric objects. It is commonly used for tasks related to spatial analysis, such as working with points, lines, polygons, and performing various geometric operations.

Here are some of the key features and functionalities of the `shapely` package:

- **Geometric Objects**: `shapely` provides classes for creating and manipulating various geometric objects, including `Point`, `LineString`, `Polygon`, `MultiPoint`, `MultiLineString`, and `MultiPolygon`.

- **Geometric Operations**: You can perform a wide range of geometric operations, such as intersections, unions, differences, buffering, and more, on these objects.

- **Predicates**: `shapely` offers predicates to check relationships between geometric objects, such as containment, touch, overlap, and more.

- **Geometric Analysis**: You can calculate properties of geometric objects, such as area, length, perimeter, and centroids.

- **Geometry Validation**: The package provides tools to validate and fix invalid geometries, ensuring they conform to the specifications.

- **Integration**: It is often used in combination with other spatial libraries and tools, such as `geopandas` and GIS software, for advanced spatial analysis.

Here's a table summarizing some of the most commonly used functionalities of the `shapely` module:

| Functionality                           | Description                                                         |
|----------------------------------------|---------------------------------------------------------------------|
| `Point(x, y)`                          | Create a point with coordinates `(x, y)`.                          |
| `LineString(coords)`                    | Create a line string from a list of coordinates.                   |
| `Polygon(shell, holes=None)`            | Create a polygon from a shell (exterior) and optional holes.       |
| `MultiPoint(points)`                   | Create a collection of points.                                      |
| `MultiLineString(lines)`               | Create a collection of line strings.                                |
| `MultiPolygon(polygons)`               | Create a collection of polygons.                                    |
| `object.geom_type`                     | Get the type of a geometric object (e.g., 'Point', 'Polygon').     |
| `object.area`                          | Calculate the area of a polygon.                                   |
| `object.length`                        | Calculate the length of a line string or perimeter of a polygon.   |
| `object.bounds`                        | Get the bounding box coordinates of an object `(minx, miny, maxx, maxy)`. |
| `object.centroid`                      | Get the centroid of a polygon.                                      |
| `object.buffer(distance)`              | Create a buffer (a polygon covering everything within `distance` of the object). |
| `object.intersection(other)`           | Calculate the intersection of two geometric objects.               |
| `object.union(other)`                  | Calculate the union of two geometric objects.                      |
| `object.touches(other)`                | Check if an object touches another object.                          |
| `object.contains(other)`               | Check if an object contains another object.                         |
| `object.within(other)`                 | Check if an object is within another object.                        |
| `object.is_valid`                      | Check if an object is a valid geometry.                             |
| `object.distance(other)`               | Calculate the distance between two objects.                         |
| `object.simplify(tolerance=0.1)`       | Simplify a geometry by removing unnecessary points.                 |


Now, let's explore these functionalities with code examples to understand how `shapely` can be a valuable tool for working with geometric data in Python.

In [None]:
# Import the Shapely geometry classes
# (importing from shapely.geometry works in both Shapely 1.x and 2.x)
from shapely.geometry import Point, LineString, Polygon

You're a data scientist in water management, and you need to analyze the locations of water monitoring stations and perform some spatial operations.

In [None]:
# Simulate the locations of water monitoring stations (for the sake of the example)
station_A = Point(5, 12)
station_B = Point(5, 5)
station_C = Point(12, 7)

# Create a LineString representing a river
river = LineString([(2, 3), (4, 7), (5, 12), (10, 10), (15, 3)])

# Create a Polygon representing a water reservoir
reservoir = Polygon([(4, 4), (4, 8), (8, 8), (8, 4)])

In [None]:
# Plot the situation using matplotlib 
# (assumed knowledge: if you are unfamiliar with matplotlib, please do your own additional research)
import matplotlib.pyplot as plt

# Create a figure and axis
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the river
river_x, river_y = river.xy
ax.plot(river_x, river_y, label='River', color='blue', linestyle='-', linewidth=2)

# Plot the water reservoir
reservoir_x, reservoir_y = reservoir.exterior.xy
ax.fill(reservoir_x, reservoir_y, label='Reservoir', color='lightblue', alpha=0.6)

# Plot the water monitoring stations
station_markers = ['o', 's', '^']
stations = [station_A, station_B, station_C]
station_labels = ['Station A', 'Station B', 'Station C']
for i, station in enumerate(stations):
    x, y = station.xy
    ax.plot(x, y, marker=station_markers[i], markersize=10, label=station_labels[i])

# Set axis limits
ax.set_xlim(0, 16)
ax.set_ylim(0, 14)

# Add labels and legend
ax.set_xlabel('X Coordinate')
ax.set_ylabel('Y Coordinate')
ax.set_title('Water Monitoring Stations, River, and Reservoir')
ax.legend()

# Display the plot
plt.grid()
plt.show()

In [None]:
# Functionality 1: Check if Stations are within the Reservoir
station_lookup = {"A": station_A, "B": station_B, "C": station_C}
for name, station in station_lookup.items():
    position = "within" if station.within(reservoir) else "outside"
    print(f"Station {name} is {position} the reservoir.")

In [None]:
# Functionality 2: Find the Intersection between the River and Reservoir
river_reservoir_intersection = river.intersection(reservoir)

if river_reservoir_intersection.is_empty:
    print("No Intersection Found")
else:
    print(f"Intersection found at {list(river_reservoir_intersection.coords)}")
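
`.coords` is only available on single-part geometries such as a `Point` or `LineString`. If a line enters a polygon more than once, the intersection is a multi-part geometry and reading `.coords` fails. A minimal sketch of a more defensive helper follows; the function name `intersection_parts` and the zigzag `canal` are hypothetical and not part of the course material.

```python
from shapely.geometry import LineString, Polygon

def intersection_parts(line, area):
    """Return one coordinate list per piece of `line` that lies inside `area`."""
    result = line.intersection(area)
    if result.is_empty:
        return []
    # Multi-part results expose their pieces through `.geoms`;
    # a single-part result is wrapped in a list so both cases look the same
    parts = getattr(result, "geoms", [result])
    return [list(part.coords) for part in parts]

reservoir = Polygon([(4, 4), (4, 8), (8, 8), (8, 4)])
river = LineString([(2, 3), (4, 7), (5, 12), (10, 10), (15, 3)])
# Hypothetical canal that enters and leaves the reservoir twice
canal = LineString([(3, 5), (5, 5), (5, 9), (7, 9), (7, 5), (9, 5)])

river_parts = intersection_parts(river, reservoir)
canal_parts = intersection_parts(canal, reservoir)

print(len(river_parts), "piece(s) of the river inside the reservoir:", river_parts)
print(len(canal_parts), "piece(s) of the canal inside the reservoir:", canal_parts)
```

The helper assumes the pieces are points or lines, which is what a line-polygon intersection produces. Intersecting two polygons would need `part.exterior.coords` instead.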

In [None]:
# Functionality 3: Calculate the Length of the River and Perimeter of the Reservoir
river_length = river.length
reservoir_perimeter = reservoir.length

print(f"Length of the river: {river_length:.2f} units")
print(f"Perimeter of the reservoir: {reservoir_perimeter:.2f} units")

In [None]:
# Functionality 4: Create a Buffer Zone around Station C
buffer_zone = station_C.buffer(1)  # Create a buffer zone with a radius of 1 unit around Station C

In [None]:
# Functionality 5: Check if the Buffer Zone Intersects with the River
if buffer_zone.intersects(river):
    print("The buffer zone around Station C intersects with the river.")
else:
    print("The buffer zone around Station C does not intersect with the river.")
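
The buffer test can be cross-checked with `distance`: a buffer of radius 1 around a point reaches the river when the point lies within 1 unit of it (apart from the small error introduced by approximating the circle with a polygon). A short sketch, with the geometries copied from the cells above:

```python
from shapely.geometry import Point, LineString

station_C = Point(12, 7)
river = LineString([(2, 3), (4, 7), (5, 12), (10, 10), (15, 3)])

# Shortest distance from the station to any point on the river
distance_to_river = station_C.distance(river)
buffer_radius = 1

print(f"Distance from Station C to the river: {distance_to_river:.3f} units")
print("Within buffer radius:", distance_to_river <= buffer_radius)
```

Comparing distances directly avoids building the buffer polygon at all, which can matter when the same check runs for many stations.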

In [None]:
# Functionality 6: Calculate the Area of the Reservoir
reservoir_area = reservoir.area
print(f"Area of the reservoir: {reservoir_area:.2f} square units")
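
Several rows of the table (`bounds`, `centroid`, `contains`, `touches`, `distance`, `union`, `is_valid`) are not used in the functionalities above. The following sketch tries them on the same reservoir. The `extension` basin and the `bow_tie` polygon are invented for this example.

```python
from shapely.geometry import Point, Polygon

reservoir = Polygon([(4, 4), (4, 8), (8, 8), (8, 4)])
station_B = Point(5, 5)
station_C = Point(12, 7)

# Hypothetical second basin that shares the reservoir's eastern edge
extension = Polygon([(8, 4), (8, 8), (10, 8), (10, 4)])

print("Bounds:", reservoir.bounds)  # (minx, miny, maxx, maxy)
print("Centroid:", reservoir.centroid)
print("Contains Station B:", reservoir.contains(station_B))
print("Touches extension:", reservoir.touches(extension))
print("Distance to Station C:", reservoir.distance(station_C))

# Union merges both basins into a single polygon
combined = reservoir.union(extension)
print("Combined area:", combined.area)

# A self-intersecting 'bow-tie' ring is a typical invalid geometry
bow_tie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
print("Bow-tie is valid:", bow_tie.is_valid)
```

Note the difference between `touches` and `intersects`: two polygons that share only an edge touch each other, but their interiors do not overlap.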

<a id='geopandas'></a>
### Introducing the `geopandas` package
`geopandas` is an open-source Python library that simplifies working with geospatial data. It extends `pandas` with a geometry-aware `GeoDataFrame`, so you can read, write, visualize, and analyze geospatial data with familiar DataFrame tools. Geometric operations are delegated to `shapely`, on which `geopandas` is built.

Here are some key features and functionalities of the `geopandas` package:

- **GeoDataFrame**: A tabular data structure that includes a geometry column for storing geometric shapes (points, lines, polygons, etc.). It allows you to handle both attribute data and geometries in a single data structure.

- **Read and Write**: `geopandas` can read various geospatial file formats like Shapefiles, GeoJSON, and more. It also allows you to write GeoDataFrames to these formats.

- **Spatial Operations**: Perform various spatial operations like intersections, unions, differences, and buffering on GeoDataFrames.

- **Visualization**: Visualize geospatial data directly from a GeoDataFrame using Matplotlib or other plotting libraries.

- **Integration**: Easily integrate with other geospatial libraries, databases, and tools for advanced spatial analysis and visualization.

Here's a table summarizing some of the most commonly used functionalities of the `geopandas` module:

| Functionality                           | Description                                                         |
|-----------------------------------------|---------------------------------------------------------------------|
| `gpd.GeoDataFrame(df, geometry=col)`    | Create a GeoDataFrame from a regular DataFrame and a geometry column.|
| `gpd.read_file(filename)`               | Read geospatial data from a file (supports various formats).        |
| `gdf.to_file(filename, driver='format')`| Write GeoDataFrame to a file (supports various formats).             |
| `gdf.plot(column='column_name')`        | Plot GeoDataFrame, optionally specifying a column for coloring.      |
| `gdf.geometry`                          | Access the geometry column of a GeoDataFrame.                        |
| `gdf.set_geometry(col_name)`            | Set the geometry column of a GeoDataFrame.                           |
| `gdf.cx[xmin:xmax, ymin:ymax]`           | Filter GeoDataFrame by bounding box coordinates.                     |
| `gdf.simplify(tolerance=0.001)`         | Simplify geometries to reduce the number of vertices.                |
| `gdf.buffer(distance)`                  | Create buffer zones around geometries.                                |
| `gdf.intersection(other)`               | Calculate the element-wise intersection with a GeoSeries or a single geometry. |
| `gdf.to_crs(epsg=code)`                 | Reproject geometries to a different coordinate reference system.    |
| `gdf.total_bounds`                      | Get the bounding box of the entire GeoDataFrame.                     |


Now, let's explore these functionalities with code examples to understand how `geopandas` can be a valuable tool for working with geospatial data in Python.

In [None]:
# Load packages
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

In [None]:
# Create GeoDataFrames for water sources and cities
water_sources = gpd.GeoDataFrame({
    'name': ['Lake A', 'Pond B', 'River C'],
    'type': ['Lake', 'Lake', 'River'],
    'geometry': [Polygon([(8, 0), (8.2, 0.2), (11, 2), (12, 0)]),
                 Polygon([(7, 4), (7, 6), (5, 6), (5, 4)]),
                 Polygon([(0.3, 0.3), (0, 0.3), (5, 5), (5.3, 5), (0.3, 0.3)])]
})

cities = gpd.GeoDataFrame({
    'name': ['City 1', 'City 2'],
    'geometry': [Polygon([(1, 1), (1, 1.5), (1.5, 1.5), (1.5, 1)]),
                 Polygon([(6, 0), (6, 2.5), (7.5, 2.5), (7.5, 0)])]
})

In [None]:
# Dissolve Operation: Combine water sources of the same type
# (aggfunc joins the names of the merged features into one comma-separated string)
dissolved_water_sources = water_sources.dissolve(by='type', aggfunc=lambda x: ', '.join(x))
dissolved_water_sources

In [None]:
# Plot water sources and cities
fig, ax = plt.subplots(figsize=(10, 8))
dissolved_water_sources.plot(ax=ax, color='lightblue')
cities.plot(ax=ax, color='green')
plt.show()

In [None]:
# Area Calculation: Calculate the total area of water bodies
total_water_area = dissolved_water_sources.area.sum()
print("Total water area:", total_water_area, "square units")

In [None]:
# Geometric Manipulations: Translate cities to a new location
cities_translated = cities.copy()
cities_translated['geometry'] = cities.geometry.translate(xoff=1.5, yoff=0)

In [None]:
# Plot water sources and translated cities
fig, ax = plt.subplots(figsize=(10, 8))
dissolved_water_sources.plot(ax=ax, color='lightblue', alpha=0.8)
cities_translated.plot(ax=ax, color='orange', label='Translated Cities')
plt.show()

In [None]:
# Buffer Analysis: Create a protected zone of 0.5 units around water sources
protected_zones = water_sources.copy()
protected_zones['geometry'] = water_sources.buffer(0.5)

In [None]:
# Plot protected zones and translated cities
fig, ax = plt.subplots(figsize=(10, 8))
protected_zones.plot(ax=ax, color='red', alpha=0.2, label='Protected Zones')
cities_translated.plot(ax=ax, color='orange', label='Translated Cities')
plt.show()

In [None]:
# Saving GeoDataFrame
protected_zones.to_file('protected_zones.geojson', driver='GeoJSON')
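
Two rows of the table, `total_bounds` and the `cx` indexer, are not used in the cells above. As a small sketch under the same toy data, they can be combined to inspect the extent of a layer and to select the features inside a window. The window coordinates are an arbitrary area of interest chosen for this example.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Same water sources as in the cells above
water_sources = gpd.GeoDataFrame({
    'name': ['Lake A', 'Pond B', 'River C'],
    'type': ['Lake', 'Lake', 'River'],
    'geometry': [Polygon([(8, 0), (8.2, 0.2), (11, 2), (12, 0)]),
                 Polygon([(7, 4), (7, 6), (5, 6), (5, 4)]),
                 Polygon([(0.3, 0.3), (0, 0.3), (5, 5), (5.3, 5), (0.3, 0.3)])]
})

# Bounding box of all geometries together: [minx, miny, maxx, maxy]
bounds = water_sources.total_bounds
print("Total bounds:", bounds)

# Keep only features that intersect the window 9 <= x <= 13, -1 <= y <= 3
# (hypothetical area of interest)
in_window = water_sources.cx[9:13, -1:3]
print("Features in window:", list(in_window['name']))
```

`cx` selects every feature that intersects the window, not only those that lie completely inside it.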

<a id='networkx'></a>
### Introducing the `networkx` package
`networkx` is a Python library for the creation, manipulation, and study of complex networks or graphs. It provides a set of tools for analyzing the structure and dynamics of networks. With `networkx`, you can represent and analyze relationships between entities, model real-world networks, and explore their properties.

Here are some of the key features and functionalities of the `networkx` package:

- **Graph Representation**: Create and manipulate graphs, which consist of nodes (vertices) and edges (connections between nodes).

- **Various Graph Types**: Support for multiple graph types, including directed, undirected, and weighted graphs.

- **Node and Edge Attributes**: Attach attributes to nodes and edges, allowing you to store additional information about network components.

- **Graph Algorithms**: Ready-made algorithms for network analysis, such as shortest path calculation, centrality measures, clustering, and community detection.

- **Visualization**: Visualize graphs using Matplotlib or other visualization libraries, creating clear and informative representations of complex networks.


![graph_simple.png](attachment:graph_simple.png)

Here's a table summarizing some of the most commonly used functionalities of the `networkx` module:

| Functionality                                | Description                                                       |
|----------------------------------------------|-------------------------------------------------------------------|
| `nx.Graph()`                                 | Create an undirected graph.                                       |
| `nx.DiGraph()`                               | Create a directed graph.                                         |
| `G.add_node(node, **attrs)`                  | Add a node with optional keyword attributes (e.g. `pos=(0, 0)`). |
| `G.add_edge(node1, node2, weight=None)`      | Add an edge between nodes with an optional weight.               |
| `G.nodes()`                                  | Get a view of the nodes in the graph.                             |
| `G.edges()`                                  | Get a view of the edges in the graph.                             |
| `nx.connected_components(G)`                 | Find connected components in the graph.                           |
| `nx.draw(G, pos=None, **options)`            | Visualize the graph using Matplotlib.                             |
| `nx.shortest_path(G, source, target)`         | Find the shortest path between two nodes.                         |
| `nx.dijkstra_path(G, source, target)`         | Find the shortest path using Dijkstra's algorithm.                |
| `nx.average_shortest_path_length(G)`         | Calculate the average shortest path length in the graph.          |


Now, let's explore these functionalities with code examples to understand how `networkx` can be a valuable tool for working with networks and graphs in Python.

In [None]:
import networkx as nx

# Create a graph object
water_network = nx.Graph()

# Add nodes (representing reservoirs and water consumers)
water_network.add_node("Reservoir A", pos=(0, 0))
water_network.add_node("Reservoir B", pos=(2000, 1000))
water_network.add_node("Consumer 1", pos=(1000, 1000))
water_network.add_node("Consumer 2", pos=(0, 2000))
water_network.add_node("Consumer 3", pos=(1500, 500))
water_network.add_node("Consumer 4", pos=(3000, 2000))

In [None]:
# To add the edges (connecting lines) between nodes, we first define a function that computes distances
import math

def calculate_distance(node1, node2):
    """Return the straight-line distance between two nodes, truncated to a whole number."""
    pos1 = water_network.nodes[node1]['pos']
    pos2 = water_network.nodes[node2]['pos']
    # int() drops the decimals, so 1414.2 becomes 1414
    return int(math.sqrt((pos1[0] - pos2[0]) ** 2 + (pos1[1] - pos2[1]) ** 2))

In [None]:
# Add the connections
water_network.add_edge("Reservoir A", "Consumer 1", weight=calculate_distance("Reservoir A", "Consumer 1"))
water_network.add_edge("Reservoir A", "Consumer 2", weight=calculate_distance("Reservoir A", "Consumer 2"))
water_network.add_edge("Reservoir A", "Consumer 3", weight=calculate_distance("Reservoir A", "Consumer 3"))
water_network.add_edge("Reservoir B", "Consumer 1", weight=calculate_distance("Reservoir B", "Consumer 1"))
water_network.add_edge("Reservoir B", "Consumer 2", weight=calculate_distance("Reservoir B", "Consumer 2"))
water_network.add_edge("Reservoir B", "Consumer 3", weight=calculate_distance("Reservoir B", "Consumer 3"))
water_network.add_edge("Reservoir B", "Consumer 4", weight=calculate_distance("Reservoir B", "Consumer 4"))

In [None]:
# Draw the network
pos = nx.get_node_attributes(water_network, 'pos')
labels = nx.get_edge_attributes(water_network, 'weight')
nx.draw(water_network, pos, with_labels=True, node_size=700, node_color='skyblue', font_size=10, font_weight='bold')
nx.draw_networkx_edge_labels(water_network, pos, edge_labels=labels)

plt.title("Water Management Network")
plt.show()

In [None]:
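# Calculate path lengths by hand for the three candidate routes from Reservoir A to Consumer 4
path1 = 2000 + 2236 + 1414  # via Consumer 2 and Reservoir B
path2 = 1414 + 1000 + 1414  # via Consumer 1 and Reservoir B
path3 = 1581 + 707 + 1414   # via Consumer 3 and Reservoir B
print("Shortest path length should be:", min(path1, path2, path3))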
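
Enumerating routes by hand works for six nodes but not for a real network. As a rough sketch of what a shortest-path search does, here is a compact Dijkstra implementation using only the standard library. It is an illustration, not the code `networkx` runs internally. The function name `dijkstra` and the `pipes` dictionary are made up for this example, and the pipe lengths are the truncated distances used in the hand calculation.

```python
import heapq

# Pipe lengths between nodes (undirected), as used in the hand calculation
pipes = {
    ("Reservoir A", "Consumer 1"): 1414,
    ("Reservoir A", "Consumer 2"): 2000,
    ("Reservoir A", "Consumer 3"): 1581,
    ("Reservoir B", "Consumer 1"): 1000,
    ("Reservoir B", "Consumer 2"): 2236,
    ("Reservoir B", "Consumer 3"): 707,
    ("Reservoir B", "Consumer 4"): 1414,
}

# Build an adjacency list that can be walked in both directions
neighbours = {}
for (a, b), pipe_length in pipes.items():
    neighbours.setdefault(a, []).append((b, pipe_length))
    neighbours.setdefault(b, []).append((a, pipe_length))

def dijkstra(source, target):
    """Return (total_length, route) of the shortest route from source to target."""
    queue = [(0, source, [source])]  # (distance so far, current node, route taken)
    visited = set()
    while queue:
        dist, node, route = heapq.heappop(queue)  # always expand the closest node first
        if node == target:
            return dist, route
        if node in visited:
            continue
        visited.add(node)
        for nxt, pipe_length in neighbours[node]:
            if nxt not in visited:
                heapq.heappush(queue, (dist + pipe_length, nxt, route + [nxt]))
    return float("inf"), []  # target cannot be reached

total_length, route = dijkstra("Reservoir A", "Consumer 4")
print(total_length, route)
```

The search returns a length of 3702 via Consumer 3 and Reservoir B, which agrees with `path3` in the hand calculation.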

In [None]:
# Calculate shortest path from Reservoir A to Consumer 4 based on pipe length
shortest_path = nx.shortest_path(water_network, source="Reservoir A", target="Consumer 4", weight="weight")
print("Shortest path from Reservoir A to Consumer 4 based on pipe length:", shortest_path)

In [None]:
# Calculate the total length of that shortest path
shortest_path_length = nx.shortest_path_length(water_network, source="Reservoir A", target="Consumer 4", weight="weight")
print("Length of the shortest path from Reservoir A to Consumer 4:", shortest_path_length)
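
The remaining rows of the table (`nx.DiGraph()`, `nx.connected_components`, `nx.dijkstra_path`, `nx.average_shortest_path_length`) can be tried on the same network. The sketch below rebuilds the graph from a plain edge list so that it runs on its own. The directed variant, in which water only flows from a reservoir to a consumer, is an assumption added for illustration.

```python
import networkx as nx

# (from, to, pipe length), same values as in the network above
edges = [
    ("Reservoir A", "Consumer 1", 1414),
    ("Reservoir A", "Consumer 2", 2000),
    ("Reservoir A", "Consumer 3", 1581),
    ("Reservoir B", "Consumer 1", 1000),
    ("Reservoir B", "Consumer 2", 2236),
    ("Reservoir B", "Consumer 3", 707),
    ("Reservoir B", "Consumer 4", 1414),
]

G = nx.Graph()
G.add_weighted_edges_from(edges)  # stores each length under the 'weight' attribute

# Dijkstra's algorithm, requested explicitly
path = nx.dijkstra_path(G, "Reservoir A", "Consumer 4", weight="weight")
path_length = nx.dijkstra_path_length(G, "Reservoir A", "Consumer 4", weight="weight")
print("Dijkstra path:", path, "with length", path_length)

# Groups of nodes that are connected to each other
components = list(nx.connected_components(G))
print("Number of connected components:", len(components))

# Average of the weighted shortest path lengths over all node pairs
average_length = nx.average_shortest_path_length(G, weight="weight")
print(f"Average shortest path length: {average_length:.0f}")

# Directed variant (assumption): each edge points from the first node to the second
D = nx.DiGraph()
D.add_weighted_edges_from(edges)
reachable = nx.has_path(D, "Reservoir A", "Consumer 4")
print("Consumer 4 reachable from Reservoir A in the directed graph:", reachable)
```

In the directed graph Consumer 4 can no longer be reached from Reservoir A, because every route in the undirected network passes through a consumer and back into Reservoir B.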