# 12 Clipping

In this lesson we will learn how to clip different geometries.
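
As a first intuition for what clipping does, here is a minimal sketch on toy data, assuming `geopandas` and `shapely` are installed. The points, their labels, and the unit-square mask are made up for illustration and are not part of the lesson's datasets. `gpd.clip` keeps only the parts of the input geometries that fall inside the mask.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy points (not from the lesson's datasets)
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2, 2), Point(0.2, 0.9)],
)

# Polygon used as the clipping mask: the unit square
mask = box(0, 0, 1, 1)

# Keep only the points that fall inside the mask
clipped = gpd.clip(points, mask)
print(clipped)
```

Point `b` lies outside the square, so it is dropped from the result. The same call also accepts lines and polygons, which are cut at the mask boundary.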

## About the data

We will use three datasets in this lesson. 

The first dataset is a [TIGER shapefile of the US states from the United States Census Bureau](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2022.html#list-tab-790442341). Follow these steps to download the shapefile with the United States' states:

You can check the [metadata for all the TIGER shapefiles here](https://www.census.gov/programs-surveys/geography/technical-documentation/complete-technical-documentation/tiger-geo-line.html). 

The second dataset we'll use is [Natural Earth's simple medium scale populated places dataset](https://www.naturalearthdata.com/downloads/50m-cultural-vectors/). We can obtain this dataset by downloading the shapefile (choose the one that says "simple (less columns)").

The third dataset we'll use is [Natural Earth's road dataset](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/roads/). We can obtain this dataset by downloading the shapefile.

We will combine these datasets to create the following map of infrastructure in Alaska:

## Import data

Let's start by loading our libraries and then importing the datasets we will use.


In [2]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd

from shapely.geometry import box  # To create polygon bounding box


pd.set_option("display.max_columns", None)  # Show all columns when displaying a data frame

# -------------------------------------------------------
# Import states polygons
states = gpd.read_file(os.path.join('data', 
                                    'tl_2022_us_state', 
                                    'tl_2022_us_state.shp'))

# Import Natural Earth populated places points
places = gpd.read_file(os.path.join('data',
                                    'ne_50m_populated_places_simple',
                                    'ne_50m_populated_places_simple.shp')
                                    )

# Import roads lines
roads = gpd.read_file(os.path.join('data',
                                   'ne_10m_roads',
                                   'ne_10m_roads.shp')
                                   )
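
The import cell above assumes the shapefiles sit in a `data` folder relative to the current working directory. As a hedged sketch, one way to check this before calling `gpd.read_file` is to test each path with the standard library. The helper name `find_missing` is hypothetical and introduced here only for illustration.

```python
import os

# Paths copied from the import cell above
shapefile_paths = {
    "states": os.path.join("data", "tl_2022_us_state", "tl_2022_us_state.shp"),
    "places": os.path.join("data",
                           "ne_50m_populated_places_simple",
                           "ne_50m_populated_places_simple.shp"),
    "roads": os.path.join("data", "ne_10m_roads", "ne_10m_roads.shp"),
}


def find_missing(paths):
    """Return the names whose file does not exist on disk (hypothetical helper)."""
    return [name for name, path in paths.items() if not os.path.isfile(path)]


missing = find_missing(shapefile_paths)
if missing:
    print(f"Missing shapefiles (check your working directory): {missing}")
```

If any name is reported as missing, it is worth confirming the working directory with `os.getcwd()` and that the downloaded folders were unzipped into `data`.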


## Check-in
Use a for loop to iterate over the three `GeoDataFrame`s we imported and change their column names to lowercase.
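
One possible approach is sketched below. To keep the example self-contained, it uses empty `pandas` data frames as stand-ins for the three `GeoDataFrame`s, and the column names are placeholders rather than the full set found in the real files. The same loop should work on the imported `states`, `places`, and `roads`, since `GeoDataFrame` extends the `pandas` `DataFrame`.

```python
import pandas as pd

# Stand-ins for the three GeoDataFrames (placeholder column names)
states = pd.DataFrame(columns=["STATEFP", "NAME", "geometry"])
places = pd.DataFrame(columns=["NAME", "Pop_Max", "geometry"])
roads = pd.DataFrame(columns=["TYPE", "Length_KM", "geometry"])

# Assigning to .columns modifies each data frame in place,
# so the change persists after the loop ends
for df in [states, places, roads]:
    df.columns = df.columns.str.lower()

print(states.columns.tolist())
```

Note that reassigning the loop variable itself (for example `df = df.rename(...)`) would not update the original objects, which is why the sketch assigns to `df.columns` instead.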

In [None]:
pwr_source_plants = power_plants[power_plants['primsource'] == 'wind']

pwr_source_plant_state = pwr_source_plants[pwr_source_plants['state'] == 'Texas']

pwr_source_plants_

In [3]:
def plot_top3_states(pwr_source, power_plants, states):

    # Subset power plant info by power source
    pwr_source_plants = power_plants[power_plants['primsource'] == pwr_source]

    # Assumption: the top states are the three with the most plants of this source
    top_states = pwr_source_plants['state'].value_counts().head(3).index

    fig, ax = plt.subplots(figsize = (12,6), nrows = 1, ncols = 3)

    for axis, state in zip(ax, top_states):
        axis.set_title(state)
        axis.axis('off')

        # Extract the state boundary and plot its outline
        state_boundary = states[states['name'] == state]
        state_boundary.plot(ax = axis,
                            color = 'none',
                            edgecolor = 'black')

        # Subset the power plants located in this state
        pwr_source_plants_state = pwr_source_plants[pwr_source_plants['state'] == state]

        # Plot power plant info
        pwr_source_plants_state.plot(ax = axis,
                                     markersize = 5,
                                     alpha = 0.5)

    # Add one title for the whole figure, after all panels are drawn
    fig.suptitle(f"Top 3 US States by number of {pwr_source}-powered electric plants")
    plt.show()


plot_top3_states('wind', power_plants, states)