# Module 3, Lesson 1: Introduction to Vector Data
## Understanding Where Things Are and What They Affect 🗺️

### Welcome to Spatial Analysis!
This lesson introduces **vector (spatial) data analysis**, the foundation for understanding **where** things are in your hydrologic and hydraulic (H&H) models. By the end, you'll be able to load, align, map, and query watersheds, infrastructure, and the relationships between them.

### What You'll Accomplish Today:
✅ Understand vector data as "smart drawings" with attributes  
✅ Load and visualize watershed boundaries and building footprints  
✅ Master coordinate systems (CRS) for accurate analysis  
✅ Perform spatial operations (clip, intersect, buffer)  
✅ Answer engineering questions: "What's at risk?" "What's in this watershed?"  
✅ Create professional maps with labels and legends  

### Lesson Structure:
1. **Mental Models** - How to think about spatial data (points, lines, polygons)
2. **Workspace Setup** - Installing and importing libraries
3. **GeoJSON vs Shapefiles** - Two formats, same purpose
4. **Reading Vector Data** - Your first GeoDataFrames
5. **Coordinate Systems** - Getting everything aligned
6. **Visualization** - Making meaningful maps
7. **Spatial Operations** - Engineering questions answered
8. **Geometry Calculations** - Quantifying risk
9. **Interactive Mapping** - Web-based visualization
10. **Practice and Wrap-Up** - Exercises and your new spatial toolkit

---

## 💡 Use Your AI Assistant as You Learn

As you work through this notebook, **actively use your AI assistant** to ask questions about any part of the code that feels unclear. Learning Python and geospatial analysis is much easier when you pause, ask “why,” and explore how each line works.

You don’t need to understand everything on the first pass. Use the AI to:
- Break down complex code cells line by line  
- Explain unfamiliar functions or libraries  
- Clarify *why* a specific approach was used  
- Explore alternative ways to write or optimize the code  

Think of your AI assistant as a patient study partner that’s always available.

### Sample Prompts You Can Try
- *“Can you explain what this code cell is doing, line by line?”*
- *“Why do we reproject the data to EPSG:4326 before using Folium?”*
- *“What does this function return, and how is it used later in the notebook?”*
- *“Can you rewrite this code in a simpler way and explain the differences?”*
- *“What would happen if I removed or changed this line of code?”*

The more questions you ask, the faster and deeper your understanding will grow. Don’t hesitate to experiment, explore, and stay curious.

## Part 1: Mental Models - Spatial Data for H&H Engineers

### Vector Data Answers Three Critical Questions:

1. **WHERE are things?** → Location on Earth
2. **WHAT belongs to what?** → Spatial relationships  
3. **WHAT intersects/overlaps?** → Risk and impact zones

### Think of Vector Data as "Smart Drawings"

**Traditional CAD/Drawing:**
- Lines and shapes
- Visual only
- No real-world location

**Vector Data (GIS):**
- Lines and shapes WITH attributes
- Each shape knows its real-world location
- Each shape has a data table attached
- Shapes can interact (clip, buffer, intersect)
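
To make the "smart drawing" idea concrete, here is a minimal sketch of one building stored as a GeoJSON feature, using only Python's standard library. The coordinates and the attribute names (`building_id`, `use`, `floors`) are invented for illustration and are not taken from the lesson's datasets.

```python
import json

# One hypothetical building: a shape (geometry) plus a data-table row (properties)
building = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # GeoJSON stores positions as [longitude, latitude]; the ring ends on its first point
        "coordinates": [[
            [-104.820, 41.140],
            [-104.819, 41.140],
            [-104.819, 41.141],
            [-104.820, 41.141],
            [-104.820, 41.140],
        ]],
    },
    "properties": {"building_id": "B-001", "use": "residential", "floors": 2},
}

# The "drawing" part: where the shape is
ring = building["geometry"]["coordinates"][0]
print(f"Shape type: {building['geometry']['type']} with {len(ring) - 1} corners")

# The "smart" part: what the shape is
print(f"Attributes: {building['properties']}")

# GeoJSON is plain text, so the whole feature round-trips through json
as_text = json.dumps(building)
restored = json.loads(as_text)
```

A CAD drawing would stop at the ring of coordinates; the `properties` block is what lets you later ask questions such as "which residential buildings are in this watershed?"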

### The Three Vector Types

| Type | H&H Examples | What It Represents |
|------|-------------|--------------------|
| **Points** | Gauge stations, outfalls, wells | Specific locations |
| **Lines** | Streams, pipes, channels | Linear features |
| **Polygons** | Watersheds, buildings, flood zones | Areas/regions |
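
The three types in the table can be sketched with Shapely, which the lesson imports later. The coordinates below are made up and assumed to be in meters in a projected CRS.

```python
from shapely.geometry import LineString, Point, Polygon

# Hypothetical features (coordinates in meters)
gauge = Point(500, 500)                                          # a gauge station
stream = LineString([(0, 0), (300, 400), (600, 400)])            # a stream centerline
basin = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])    # a drainage area

for name, geom in [("gauge", gauge), ("stream", stream), ("basin", basin)]:
    print(f"{name:>6}: {geom.geom_type:<10} length={geom.length:8.1f} m  area={geom.area:11.1f} m^2")
```

Points have neither length nor area, lines have length only, and polygons have both (for a polygon, `length` is the perimeter).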

### Today's Engineering Scenario

**Question:** "If this watershed floods, which buildings are at risk?"

To answer this, we need:
- Watershed boundaries (polygons)
- Building footprints (polygons)
- A way to find what's inside what (spatial operations)

This is **screening-level analysis** - the type of fast, defensible analysis engineers do early in projects.

### Key Mindset:
**Spatial operations are just filters with geography.** Instead of filtering by date or value, you're filtering by location.
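
The "filters with geography" idea can be shown without any GIS library at all. This sketch uses invented gauge records and a rectangular study area; real spatial filters test against arbitrary polygons, but the pattern is the same.

```python
# Hypothetical gauge records: attributes plus a location (x, y in meters)
gauges = [
    {"name": "G1", "peak_flow_cfs": 120, "x": 200, "y": 300},
    {"name": "G2", "peak_flow_cfs": 950, "x": 1500, "y": 400},
    {"name": "G3", "peak_flow_cfs": 640, "x": 700, "y": 800},
]

# Attribute filter: keep rows by value
high_flow = [g["name"] for g in gauges if g["peak_flow_cfs"] > 500]

# Spatial filter: keep rows by location (inside a rectangular study area)
xmin, ymin, xmax, ymax = 0, 0, 1000, 1000
in_study_area = [
    g["name"] for g in gauges
    if xmin <= g["x"] <= xmax and ymin <= g["y"] <= ymax
]

print("High flow:", high_flow)
print("In study area:", in_study_area)
```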

## Part 2: Setting Up Our Spatial Workspace

### Installing and Importing Libraries

For spatial work, we need specialized libraries:
- **GeoPandas**: Pandas for spatial data
- **Shapely**: Geometric operations
- **Folium**: Interactive maps

In [None]:
# Install spatial libraries (only needed once per session)
!pip install -q geopandas folium

print("✅ Spatial libraries installed!")

In [None]:
# Import our libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import folium
from shapely.geometry import Point, Polygon
import warnings
warnings.filterwarnings('ignore')

# Set default plot size
plt.rcParams['figure.figsize'] = (12, 8)

print("Libraries imported successfully!")
print(f"GeoPandas version: {gpd.__version__}")

### Understanding Our Data Files

We'll work with two Wyoming datasets:

1. **Building Footprints** (GeoJSON format)
   - File: `Wyoming_building_footprint_Selected.geojson`
   - Contains: Building polygons
   - Use case: Infrastructure at risk

2. **Watershed Boundaries** (Shapefile in ZIP)
   - File: `NHD__Watershed_Boundaries_HUC_12_Selected.zip`
   - Contains: HUC-12 watershed polygons
   - Use case: Drainage area analysis

## Part 3: GeoJSON vs Shapefiles - Two Formats, Same Purpose

### Why Different Formats?

**GeoJSON:**
- Single text file
- Human-readable
- Web-friendly
- Good for smaller datasets

**Shapefile:**
- Multiple files (.shp, .dbf, .shx, .prj)
- Industry standard
- More efficient for large data
- Must be zipped for sharing

### Why Shapefiles Need ZIP Files

A shapefile isn't one file - it's a family:
- `.shp` - The geometry (shapes)
- `.dbf` - The attributes (data table)
- `.shx` - Positional index of the geometry records (for fast lookup in the `.shp`)
- `.prj` - Coordinate system information

**All must travel together**, hence the ZIP!
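
Before handing a zipped shapefile to GeoPandas, it can help to confirm the family is complete. The helper below is a hypothetical convenience function (not part of GeoPandas), written with the standard library only. It treats `.shp`, `.shx`, and `.dbf` as required and `.prj` as strongly recommended.

```python
import io
import zipfile
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf"}   # mandatory shapefile components
RECOMMENDED = {".prj"}                # without it, the CRS is unknown


def check_shapefile_zip(zip_source):
    """Return (missing_required, missing_recommended) extensions for a zipped shapefile."""
    with zipfile.ZipFile(zip_source) as zf:
        found = {PurePosixPath(name).suffix.lower() for name in zf.namelist()}
    return sorted(REQUIRED - found), sorted(RECOMMENDED - found)


# Build a small in-memory archive to demonstrate (placeholder bytes, not a real shapefile)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    for name in ["watersheds.shp", "watersheds.dbf", "watersheds.shx"]:
        zf.writestr(name, b"")
demo.seek(0)

missing_required, missing_recommended = check_shapefile_zip(demo)
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```

In the notebook you could pass the uploaded file name instead of the in-memory archive, for example `check_shapefile_zip('NHD__Watershed_Boundaries_HUC_12_Selected.zip')`.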

### Loading Data into Colab

In [None]:
# Upload files to Colab
from google.colab import files

print("📤 Please upload these files:")
print("1. Wyoming_building_footprint_Selected.geojson")
print("2. NHD__Watershed_Boundaries_HUC_12_Selected.zip")
print("\nClick 'Choose Files' below and select both files...")

uploaded = files.upload()

print(f"\n✅ Uploaded {len(uploaded)} files:")
for filename in uploaded:
    print(f"   - {filename}")

## Part 4: Reading Vector Data - Your First Spatial DataFrames

GeoPandas makes reading spatial data as easy as pandas!

In [None]:
# Read GeoJSON file (single file format)
print("Reading building footprints (GeoJSON)...")
buildings = gpd.read_file('Wyoming_building_footprint_Selected.geojson')

print(f"✅ Loaded {len(buildings)} building footprints")
print(f"Columns: {list(buildings.columns)}")
buildings.head()

In [None]:
# Read Shapefile from ZIP (multiple files in archive)
print("Reading watershed boundaries (Shapefile in ZIP)...")

# GeoPandas can read directly from ZIP!
watersheds = gpd.read_file('zip://NHD__Watershed_Boundaries_HUC_12_Selected.zip')

print(f"✅ Loaded {len(watersheds)} HUC-12 watersheds")
print(f"Columns: {list(watersheds.columns)}")
watersheds.head()

### Understanding the Data Structure

A GeoDataFrame = DataFrame + Geometry Column

Each row represents one feature (building or watershed) with:
- **Attributes**: Regular columns (ID, name, area, etc.)
- **Geometry**: Special column with the shape
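
As a small illustration of "DataFrame + Geometry Column", the sketch below builds a GeoDataFrame by hand. The watershed names and square shapes are invented; the CRS is set to the UTM zone used later in this lesson.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical watersheds: two rectangular polygons, coordinates in meters
toy_watersheds = gpd.GeoDataFrame(
    {
        "name": ["Upper Creek", "Lower Creek"],                           # attribute column
        "geometry": [box(0, 0, 1000, 1000), box(1000, 0, 3000, 1000)],    # geometry column
    },
    crs="EPSG:32613",
)

# Regular pandas operations work on the attribute columns...
print(toy_watersheds["name"].str.upper().tolist())

# ...and the geometry column adds spatial properties such as area
toy_watersheds["area_km2"] = toy_watersheds.geometry.area / 1_000_000
print(toy_watersheds[["name", "area_km2"]])
```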

In [None]:
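# Explore the geometry types
# (geom_type is the current attribute name; the older .type alias is deprecated)
print("Building geometry types:")
print(buildings.geometry.geom_type.value_counts())

print("\nWatershed geometry types:")
print(watersheds.geometry.geom_type.value_counts())

# Only report "polygons" if that is what the data actually contains
polygon_types = {'Polygon', 'MultiPolygon'}
if set(buildings.geometry.geom_type) <= polygon_types and set(watersheds.geometry.geom_type) <= polygon_types:
    print("\n💡 Both are polygons - perfect for area calculations!")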

## Part 5: Coordinate Reference Systems (CRS) - Getting on the Same Page

### Why CRS Matters

Imagine two engineers:
- One using meters from Greenwich
- One using feet from a local monument

They need a common reference system to work together. That's CRS!

### Common CRS Types

| CRS | EPSG Code | Units | Use Case |
|-----|-----------|-------|----------|
| WGS84 | 4326 | Degrees | GPS, web maps |
| Web Mercator | 3857 | Meters | Google/Bing maps |
| NAD83 UTM | Various | Meters | US engineering |
| State Plane | Various | Feet/Meters | Local projects |
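
UTM codes are listed as "Various" because the zone depends on longitude. The sketch below shows the standard zone arithmetic for WGS 84 / UTM codes (the 326xx/327xx family that includes the EPSG:32613 used in this lesson). The function name is hypothetical, the city coordinates are approximate, and the special zone exceptions near Norway and Svalbard are ignored.

```python
import math


def utm_epsg_for(lon_deg, lat_deg):
    """Return the WGS 84 / UTM EPSG code for a longitude/latitude in degrees."""
    zone = int(math.floor((lon_deg + 180) / 6)) + 1   # zones are 6 degrees wide
    zone = min(zone, 60)                              # longitude +180 belongs to zone 60
    base = 32600 if lat_deg >= 0 else 32700           # 326xx = northern, 327xx = southern hemisphere
    return base + zone


# Approximate city coordinates, used only as illustrations
cheyenne_epsg = utm_epsg_for(-104.8, 41.1)
jackson_epsg = utm_epsg_for(-110.8, 43.5)
print("Cheyenne, WY:", cheyenne_epsg)
print("Jackson, WY: ", jackson_epsg)
```

The two Wyoming cities land in different zones, which is why a single UTM zone is a choice made per project area rather than per state.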

### Check and Align CRS

In [None]:
# Check current CRS
print("Current Coordinate Reference Systems:")
print(f"Buildings CRS: {buildings.crs}")
print(f"Watersheds CRS: {watersheds.crs}")

# Check if they match
if buildings.crs == watersheds.crs:
    print("\n✅ Great! Both datasets use the same CRS")
else:
    print("\n⚠️ Different CRS detected - we need to align them!")

In [None]:
# Convert both to a common projected CRS
# EPSG:32613 - WGS 84 / UTM zone 13N (covers 108°W to 102°W, i.e. eastern Wyoming;
# data west of 108°W belongs in zone 12N, EPSG:32612)
# This gives us measurements in meters

target_crs = 'EPSG:32613'

print(f"Converting to {target_crs} (UTM Zone 13N - meters)...")

buildings_utm = buildings.to_crs(target_crs)
watersheds_utm = watersheds.to_crs(target_crs)

print("✅ Both datasets now in the same coordinate system!")
print(f"   Units: meters")
print(f"   Good for: area calculations, distance measurements")
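
Why bother projecting to meters? In a degree-based CRS such as EPSG:4326, one degree of longitude covers less ground the farther you are from the equator, so areas and distances computed in degrees are not physically meaningful. The sketch below uses a simple spherical-Earth approximation (about 111.32 km per degree at the equator), which is an assumption made for illustration rather than a geodetic calculation.

```python
import math

# Spherical-Earth approximation: one degree of latitude is about 111.32 km everywhere,
# while one degree of longitude shrinks with the cosine of the latitude.
KM_PER_DEGREE = 111.32


def degree_size_km(lat_deg):
    """Approximate (east-west, north-south) ground size of a 1-degree step at a latitude."""
    east_west = KM_PER_DEGREE * math.cos(math.radians(lat_deg))
    return east_west, KM_PER_DEGREE


for lat in (0, 41, 45):
    ew, ns = degree_size_km(lat)
    print(f"Latitude {lat:>2} deg: 1 deg lon = {ew:6.1f} km, 1 deg lat = {ns:6.1f} km")
```

At Wyoming latitudes (roughly 41 to 45 degrees north) a "square" of one degree by one degree is noticeably narrower than it is tall, which is exactly the distortion a projected CRS removes.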

## Part 6: Visualizing Vector Data - Making Maps That Communicate

### Basic Visualization

Let's start with a simple map showing our data:

#### Plotting Watersheds and Building Footprints

The following code cell creates a simple map to visualize watershed boundaries and building locations together.

First, a Matplotlib figure and axis are created with a larger figure size to make the map easier to read. The watershed polygons are plotted first using a light blue fill and dark blue outlines so they form the background context of the map. Plotting larger features first helps keep them from covering smaller details.

Next, building footprints are plotted on top of the watersheds in red. Partial transparency keeps dense clusters of buildings readable without overwhelming the map. Because the footprints are polygons, point-style options such as marker size do not change how they are drawn.

The map is then labeled with a title and axis labels showing UTM coordinates (easting and northing). A light grid is added to help with spatial reference.

Finally, a small text box is placed in the upper-left corner of the map that reports the total number of buildings and watersheds shown. The layout is adjusted to avoid overlapping elements, and the figure is displayed.


In [None]:
# Create a basic map
fig, ax = plt.subplots(figsize=(14, 10))

# Plot watersheds first (larger features)
watersheds_utm.plot(ax=ax,
                    color='lightblue',
                    edgecolor='darkblue',
                    linewidth=1.5,
                    alpha=0.5)

# Plot buildings on top
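buildings_utm.plot(ax=ax,
                  color='red',
                  edgecolor='red',   # outline keeps small footprints visible at this scale
                  linewidth=0.5,     # (markersize only applies to point layers, so it is not used)
                  alpha=0.6)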

plt.title('Wyoming Watersheds and Building Footprints', fontsize=16, fontweight='bold')
plt.xlabel('Easting (m)', fontsize=12)
plt.ylabel('Northing (m)', fontsize=12)
plt.grid(True, alpha=0.3)

# Add a text note
plt.text(0.02, 0.98, f'Total Buildings: {len(buildings_utm):,}\nTotal Watersheds: {len(watersheds_utm)}',
         transform=ax.transAxes, fontsize=10, verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

plt.tight_layout()
plt.show()

### Adding Labels to Watersheds

For engineering communication, we often need to label features:

In [None]:
# Explore watershed attributes to find label field
print("Watershed attributes available for labeling:")
for col in watersheds_utm.columns:
    if col != 'geometry':
        print(f"  - {col}: {watersheds_utm[col].iloc[0]}")

In [None]:
# Create a map with watershed labels
fig, ax = plt.subplots(figsize=(16, 12))

# Plot watersheds
watersheds_utm.plot(ax=ax,
                    color='lightblue',
                    edgecolor='darkblue',
                    linewidth=2,
                    alpha=0.5)

# Add labels at watershed centroids
for idx, row in watersheds_utm.iterrows():
    # Get the centroid (center point) of each watershed
    centroid = row.geometry.centroid

    # Add HUC12 code as label
    if 'huc12' in row.index:
        label = row['huc12']
    elif 'HUC12' in row.index:
        label = row['HUC12']
    else:
        label = f"WS-{idx+1}"

    ax.annotate(label,
                xy=(centroid.x, centroid.y),
                ha='center',
                fontsize=8,
                fontweight='bold',
                color='darkblue',
                bbox=dict(boxstyle='round,pad=0.3',
                         facecolor='white',
                         alpha=0.8))

plt.title('HUC-12 Watersheds with Labels', fontsize=16, fontweight='bold')
plt.xlabel('Easting (m)', fontsize=12)
plt.ylabel('Northing (m)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
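
One caveat about centroid labels: for a strongly curved or U-shaped polygon, the centroid can fall outside the shape, so the label lands in a neighboring watershed. Shapely's `representative_point()` returns a point guaranteed to lie within the geometry. The U-shaped polygon below is a made-up example.

```python
from shapely.geometry import Polygon

# A hypothetical U-shaped polygon, similar to a watershed that wraps around a ridge
u_shape = Polygon([(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)])

centroid = u_shape.centroid
label_point = u_shape.representative_point()

print(f"Centroid inside polygon?             {u_shape.contains(centroid)}")
print(f"Representative point inside polygon? {u_shape.contains(label_point)}")
```

If labels in your map appear outside their watershed, swapping `row.geometry.centroid` for `row.geometry.representative_point()` in the labeling loop is one possible fix.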

## Part 7: Spatial Operations - Answering Engineering Questions

### The Engineering Question

**"If HUC-12 watershed 101900090108 floods, which buildings are at risk?"**

To answer this, we need to:
1. Find the specific watershed
2. Clip buildings to that watershed
3. Count and analyze affected buildings

### Step 1: Find Our Target Watershed

In [None]:
# Define our target HUC-12
target_huc = '101900090108'

# Find the watershed
# Prefer a column named exactly HUC12 (any case); otherwise use the first column containing "huc"
huc_candidates = [col for col in watersheds_utm.columns if 'huc' in col.lower()]
huc_column = next((col for col in huc_candidates if col.lower() == 'huc12'),
                  huc_candidates[0] if huc_candidates else None)

if huc_column:
    print(f"Found HUC column: {huc_column}")
    # Compare as strings so the match also works if the codes were read as numbers
    target_watershed = watersheds_utm[watersheds_utm[huc_column].astype(str) == target_huc]

    if len(target_watershed) > 0:
        print(f"✅ Found watershed {target_huc}")
    else:
        print(f"⚠️ Watershed {target_huc} not found. Using first watershed as example.")
        target_watershed = watersheds_utm.iloc[[0]]
else:
    print("⚠️ No HUC column found. Using first watershed as example.")
    target_watershed = watersheds_utm.iloc[[0]]

# Display watershed info
print(f"\nTarget watershed info:")
for col in target_watershed.columns:
    if col != 'geometry':
        print(f"  {col}: {target_watershed[col].values[0]}")

### Step 2: Clip Buildings to Watershed

This is the spatial equivalent of filtering. `gpd.clip` keeps only the building geometry inside the watershed and trims any footprint that straddles the boundary:

In [None]:
# Perform spatial clip - buildings within watershed
print(f"Total buildings before clip: {len(buildings_utm):,}")

# Clip operation
buildings_in_watershed = gpd.clip(buildings_utm, target_watershed)

print(f"Buildings within watershed: {len(buildings_in_watershed):,}")
print(f"\n🎯 {len(buildings_in_watershed):,} buildings potentially at risk!")
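
Clipping is not the only way to ask "what is in this watershed?", and the choice affects both counts and areas. The sketch below compares `gpd.clip` with the `intersects` and `within` predicates on three made-up square buildings: one inside, one straddling the boundary, and one outside. All names and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

crs = "EPSG:32613"
watershed = gpd.GeoDataFrame({"name": ["toy watershed"]}, geometry=[box(0, 0, 10, 10)], crs=crs)
buildings = gpd.GeoDataFrame(
    {"label": ["inside", "straddling", "outside"]},
    geometry=[box(1, 1, 2, 2), box(9, 4, 11, 6), box(20, 20, 21, 21)],
    crs=crs,
)
boundary = watershed.geometry.iloc[0]

clipped = gpd.clip(buildings, watershed)               # trims the straddling footprint at the boundary
touching = buildings[buildings.intersects(boundary)]   # keeps whole footprints that touch the watershed
inside = buildings[buildings.within(boundary)]         # keeps only footprints fully inside

print(f"clip:       {len(clipped)} buildings, total area {clipped.area.sum():.1f} m^2")
print(f"intersects: {len(touching)} buildings, total area {touching.area.sum():.1f} m^2")
print(f"within:     {len(inside)} buildings, total area {inside.area.sum():.1f} m^2")
```

For a screening-level count, `clip` and `intersects` agree on the number of buildings, but only `intersects` preserves the full footprint area of a building that crosses the watershed divide.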

### Step 3: Visualize the Results

In [None]:
# Create a detailed map of the analysis
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 8))

# Left map: Overview
watersheds_utm.plot(ax=ax1, color='lightgray', edgecolor='gray', alpha=0.5)
target_watershed.plot(ax=ax1, color='lightblue', edgecolor='darkblue', linewidth=3)
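# Footprints are polygons, so they are styled with outlines (markersize only affects point layers)
buildings_utm.plot(ax=ax1, color='gray', edgecolor='gray', linewidth=0.3, alpha=0.3)
buildings_in_watershed.plot(ax=ax1, color='red', edgecolor='red', linewidth=0.5)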

ax1.set_title('Overview: Target Watershed in Context', fontsize=14, fontweight='bold')
ax1.set_xlabel('Easting (m)')
ax1.set_ylabel('Northing (m)')
ax1.grid(True, alpha=0.3)

# Right map: Zoomed to target watershed
target_watershed.plot(ax=ax2, color='lightblue', edgecolor='darkblue', linewidth=2, alpha=0.5)
buildings_in_watershed.plot(ax=ax2, color='red', edgecolor='darkred', linewidth=0.5)

# Zoom to watershed bounds
minx, miny, maxx, maxy = target_watershed.total_bounds
ax2.set_xlim(minx - 1000, maxx + 1000)
ax2.set_ylim(miny - 1000, maxy + 1000)

ax2.set_title(f'Buildings at Risk: {len(buildings_in_watershed):,} structures',
              fontsize=14, fontweight='bold')
ax2.set_xlabel('Easting (m)')
ax2.set_ylabel('Northing (m)')
ax2.grid(True, alpha=0.3)

plt.suptitle('Flood Risk Analysis: Buildings in Watershed', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

## Part 8: Geometry Calculations - Quantifying Risk 📊

### Calculate Areas and Statistics

In [None]:
# Calculate watershed area
watershed_area_m2 = target_watershed.geometry.area.values[0]
watershed_area_km2 = watershed_area_m2 / 1_000_000
watershed_area_mi2 = watershed_area_km2 * 0.386102

print("WATERSHED STATISTICS")
print("="*40)
print(f"Area: {watershed_area_km2:.2f} km² ({watershed_area_mi2:.2f} mi²)")
print(f"Buildings at risk: {len(buildings_in_watershed):,}")
print(f"Building density: {len(buildings_in_watershed)/watershed_area_km2:.1f} buildings/km²")

# Calculate building footprint statistics
if len(buildings_in_watershed) > 0:
    building_areas = buildings_in_watershed.geometry.area

    print("\nBUILDING FOOTPRINT STATISTICS")
    print("="*40)
    print(f"Total footprint area: {building_areas.sum()/10000:.2f} hectares")
    print(f"Average building size: {building_areas.mean():.1f} m²")
    print(f"Largest building: {building_areas.max():.1f} m²")
    print(f"Smallest building: {building_areas.min():.1f} m²")
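
The `.area` values above are planar areas in the units of the CRS, which is why the data had to be in meters first. As a sketch of the idea (not GeoPandas' actual implementation), the shoelace formula below computes the area of a made-up rectangular drainage area from its corner coordinates and then applies the same unit conversions used in the cell above.

```python
def shoelace_area(vertices):
    """Planar area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Hypothetical 1,000 m x 2,000 m rectangle in UTM-style coordinates (meters)
rectangle = [
    (500_000, 4_550_000),
    (501_000, 4_550_000),
    (501_000, 4_552_000),
    (500_000, 4_552_000),
]

area_m2 = shoelace_area(rectangle)
area_km2 = area_m2 / 1_000_000      # 1 km^2 = 1,000,000 m^2
area_ha = area_m2 / 10_000          # 1 hectare = 10,000 m^2
area_mi2 = area_km2 * 0.386102      # same km^2 to mi^2 factor as the cell above

print(f"{area_m2:,.0f} m^2 = {area_km2:.2f} km^2 = {area_ha:.0f} ha = {area_mi2:.3f} mi^2")
```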

## Part 9: Interactive Mapping - Web-Based Visualization

For presentations and reports, interactive maps are powerful.

The following code cell creates an interactive web map showing a target watershed and the buildings located within it.

Because web maps require latitude and longitude coordinates, both the watershed and building datasets are first reprojected to WGS84 (EPSG:4326). The centroid of the watershed is then calculated and used to center the map.

A Folium map is created using OpenStreetMap tiles and an initial zoom level appropriate for neighborhood-scale viewing. The watershed boundary is added as a GeoJSON layer with a light blue fill and darker outline to clearly define its extent. A tooltip is included so users can identify the watershed when hovering.

Building footprints are added on top of the watershed in red. To keep the map responsive, buildings are only plotted if the total count is below a set threshold. Each building is styled with partial transparency so overlapping footprints remain visible.

Finally, a title is added to the map using custom HTML, a summary message is printed to confirm how many buildings were included, and the interactive map object is displayed.


In [None]:
# Create an interactive map with Folium
# First, convert to lat/lon for web mapping
target_watershed_wgs = target_watershed.to_crs('EPSG:4326')
buildings_in_watershed_wgs = buildings_in_watershed.to_crs('EPSG:4326')

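# Get center point for map
# Compute the centroid in the projected CRS (centroids computed in degrees are distorted),
# then convert that single point to lat/lon
center_point = target_watershed.geometry.centroid.to_crs('EPSG:4326').iloc[0]
center_lat = center_point.y
center_lon = center_point.x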

# Create base map
m = folium.Map(location=[center_lat, center_lon],
               zoom_start=13,
               tiles='OpenStreetMap')

# Add watershed
folium.GeoJson(
    target_watershed_wgs.geometry.values[0],
    style_function=lambda x: {
        'fillColor': 'lightblue',
        'color': 'darkblue',
        'weight': 3,
        'fillOpacity': 0.4
    },
    tooltip="Target Watershed"
).add_to(m)

# Add buildings (if not too many)
max_buildings = 1000  # Limit for performance
buildings_added = 0
if 0 < len(buildings_in_watershed_wgs) < max_buildings:
    # One GeoJson layer holding every footprint is lighter than one layer per building
    folium.GeoJson(
        buildings_in_watershed_wgs.geometry,
        style_function=lambda x: {
            'fillColor': 'red',
            'color': 'darkred',
            'weight': 1,
            'fillOpacity': 0.7
        }
    ).add_to(m)
    buildings_added = len(buildings_in_watershed_wgs)

# Add title
title_html = '''
             <h3 align="center" style="font-size:16px"><b>Buildings at Risk in Target Watershed</b></h3>
             '''
m.get_root().html.add_child(folium.Element(title_html))

# Report what was actually drawn, not just what was found
print(f"Interactive map created with {buildings_added} of {len(buildings_in_watershed)} buildings drawn")
m
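
A common stumbling block with web maps is coordinate order: GeoJSON positions are `[longitude, latitude]`, while Folium's `location` argument expects `[latitude, longitude]`. The helper below is a hypothetical convenience function that makes the swap explicit; the sample point is approximate.

```python
# A GeoJSON-style position is [longitude, latitude] (x, y order)
geojson_position = [-104.82, 41.14]   # hypothetical point near Cheyenne, WY


def to_folium_location(position):
    """Convert a GeoJSON [lon, lat] position to the [lat, lon] order used by Folium's `location`."""
    lon, lat = position[0], position[1]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"Not a valid lon/lat pair: {position}")
    return [lat, lon]


folium_location = to_folium_location(geojson_position)
print(folium_location)
```

If a Folium map opens over the ocean or near Antarctica instead of Wyoming, swapped latitude and longitude is the first thing to check.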

## Practice Exercises 🎯

Now it's your turn! Complete these exercises to reinforce your learning.

### Exercise 1: Watershed Area Ranking
Calculate the area of all watersheds and identify the three largest.

In [None]:
# EXERCISE 1: Find the three largest watersheds
# Your code here:

# Step 1: Calculate area for all watersheds

# Step 2: Sort by area

# Step 3: Display top 3


### Exercise 2: Multi-Watershed Analysis
Select 3 watersheds and count buildings in each.

In [None]:
# EXERCISE 2: Analyze buildings in multiple watersheds
# Your code here:

# Step 1: Select 3 watersheds

# Step 2: For each watershed, clip buildings and count

# Step 3: Create a summary table


### Exercise 3: Buffer Analysis
Create a 500m buffer around buildings and see which watersheds they intersect.

In [None]:
# EXERCISE 3: Buffer analysis
# Your code here:

# Step 1: Select a few buildings

# Step 2: Create 500m buffers

# Step 3: Find intersecting watersheds


### Exercise 4: Professional Map
Create a publication-ready map with legend, scale bar, and north arrow.

In [None]:
# EXERCISE 4: Create a professional map
# Your code here:

# Include:
# - Title
# - Legend
# - Scale information
# - Color scheme
# - Labels


## 🎉 What You Can Now Do!

Congratulations! You've completed Module 3, Lesson 1 and gained powerful spatial analysis skills.

### ✅ You Can Now:

**Data Management:**
- Load GeoJSON and Shapefiles
- Understand vector data structures
- Handle different coordinate systems

**Spatial Analysis:**
- Clip features by boundaries
- Calculate areas and building densities
- Practice buffers and intersections in the exercises

**Visualization:**
- Create static maps with matplotlib
- Add labels and annotations
- Build interactive web maps
- Design professional cartography

**Engineering Applications:**
- Identify infrastructure at risk
- Calculate watershed statistics
- Perform screening-level assessments
- Communicate results visually

### 🚀 You're Ready For:
- Module 4: Raster data (DEMs, precipitation grids)
- Combining vector and raster analysis
- Watershed delineation
- Flood mapping

### 💡 Key Takeaways:

1. **Vector data = Smart drawings with data tables**
2. **CRS alignment is critical for accurate analysis**
3. **Spatial operations are geographic filters**
4. **Start simple, build complexity**
5. **Visualization communicates results**

### 📚 Your Spatial Toolkit:

```python
# Essential spatial operations
gpd.read_file()          # Load data
gdf.to_crs()            # Align coordinates
gpd.clip()              # Clip by boundary
gdf.buffer()            # Create buffers
gdf.overlay()           # Intersect/union
gdf.plot()              # Visualize
gdf.geometry.area       # Calculate area
```
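
The toolkit lists buffer and intersection operations that the main analysis in this lesson does not exercise. Here is a minimal sketch on single Shapely geometries with made-up coordinates in meters; GeoPandas applies the same operations to every row of a GeoDataFrame.

```python
from shapely.geometry import LineString, Point

# Hypothetical features in a projected CRS (units: meters)
outfall = Point(0, 0)
stream = LineString([(80, -200), (80, 200)])   # runs north-south, 80 m east of the outfall

zone = outfall.buffer(100)                     # 100 m buffer: a polygon approximating a circle
reach_in_zone = stream.intersection(zone)      # the part of the stream inside the buffer

print(f"Buffer area: {zone.area:,.0f} m^2")
print(f"Stream intersects buffer? {stream.intersects(zone)}")
print(f"Stream length inside buffer: {reach_in_zone.length:.1f} m")
```

Because the buffer is a many-sided polygon rather than a true circle, its area and the clipped stream length come out slightly below the exact geometric values (about 31,416 m² and 120 m here).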

### 🌉 Bridge to Module 4:

We now know **where** watersheds and buildings are. Next, we'll bring in raster data like DEMs and precipitation grids to understand **how** water moves across them.

The combination of vector (Module 3) + raster (Module 4) = Complete spatial analysis toolkit!

**Keep exploring, keep mapping!** 🗺️🐍