## **Map Utilities**
This notebook provides a tutorial on using the utility classes and functions in this package to aid with mapping.

### **Set-Up**

In [1]:
# Packages used by this tutorial
import geopandas # manipulating geographic data
import numpy # creating arrays
import pygris # easily acquiring shapefiles from the US Census
import matplotlib.pyplot # visualization

In [2]:
# Downloading the state-level dataset from pygris
states = pygris.states(cb=True, year=2022, cache=False).to_crs(3857)
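The `.to_crs(3857)` call reprojects the states into EPSG:3857 (Web Mercator), whose coordinates are in meters. This is why the geometries printed later in this notebook have values in the millions. As a rough illustration only (this helper is hypothetical and not part of this package), the spherical Mercator forward projection used by EPSG:3857 can be sketched as:

```python
import math

# Radius of the sphere used by EPSG:3857 (the WGS 84 semi-major axis), in meters
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project a longitude/latitude pair in degrees to Web Mercator x/y in meters."""
    lon_rad = math.radians(lon_deg)
    lat_rad = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon_rad
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat_rad / 2))
    return x, y


# A point near New Mexico's western border (about -109.05 degrees longitude)
x, y = to_web_mercator(-109.05, 31.33)
print(round(x), round(y))
```

The x value comes out at roughly -12.14 million meters, the same magnitude as the New Mexico coordinates shown in the tables further down.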

### **USA**

The `USA` class within `utils` is intended to help users (a) quickly isolate the subsets of states they want to include in their maps, and (b) enrich their data with additional characteristics (such as state abbreviations, regional/divisional groupings, and the like).

In [3]:
# Importing the main package
from matplotlib_map_utils.utils import USA

In [4]:
# Creating a usa object
usa = USA() # this will load the data from ./utils/usa.json

The `USA` class stores all [states](https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States#States) and [territories](https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States#Territories) of the USA as a list of dictionaries.

The included states and territories are based upon [this Wikipedia page](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code) listing all the available FIPS codes.

Each state and territory has the following attributes available for it:

* `fips`: A two-character `string` representing [the FIPS code](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code#FIPS_state_codes). *Note that both FIPS 5-1 and FIPS 5-2 codes are included, but 5-1 Territory codes are marked as "invalid" (ex. FIPS code 66 is preferred over code 14 for Guam).*

* `name`: A `string` representing the name of the state or territory, with proper capitalization and punctuation. Generally follows the name provided in the FIPS code table (above), with some minor modifications (for example, `Washington, D.C.` is used instead of `District of Columbia`).

* `abbr`: A two-character `string` representing the [proper abbreviation](https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations) (or "alpha code") for the state or territory. *Note that all states have abbreviations, but not all territories do*.

* `valid`: A `boolean` variable representing whether the given entry is *valid* according to **FIPS 5-2**. "Invalid" entries (for territories such as Guam, American Samoa, and the like) are retained for backwards compatibility with older datasets, but should be superseded by the "valid" entries for these territories (usually with a higher FIPS value).

* `state`: A `boolean` variable representing if the given entry is a *state*, per [this list](https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States#States). Note that `Washington, D.C.` is *not* a state.

* `contiguous`: A `boolean` variable representing if the given entry is part of the *contiguous United States*, also referred to as the "lower 48" or *CONUS*, per [this list](https://en.wikipedia.org/wiki/Contiguous_United_States). Note that `Washington, D.C.` *is* included in this list.

* `territory`: A `boolean` variable representing if the given entry is a *territory*, per [this list](https://en.wikipedia.org/wiki/Territories_of_the_United_States). Note that `Washington, D.C.` is *not* included in this list.

* `region`: For states and Washington, D.C., this will be their [Census designated *region*](https://en.wikipedia.org/wiki/List_of_regions_of_the_United_States#Census_Bureau%E2%80%93designated_regions_and_divisions) (`Northeast`, `Midwest`, `South`, or `West`). For territories, this will be either `Inhabited Territory`, `Uninhabited Territory`, or `Sovereign State` (for Palau, Micronesia, and Marshall Islands).

* `division`: For states and Washington, D.C., this will be their [Census designated *division*](https://en.wikipedia.org/wiki/List_of_regions_of_the_United_States#Census_Bureau%E2%80%93designated_regions_and_divisions) (such as `New England`, `West North Central`, `Mountain`, or `Pacific`). For territories, this will be one of `Commonwealth`, `Compact of Free Association`, `Incorporated and Unorganized`, `Unincorporated and Unorganized`, or `Unincorporated and Organized`, per [this list](https://en.wikipedia.org/wiki/Territories_of_the_United_States).

* `omb`: For states and Washington, D.C., this will be their [OMB administrative region](https://en.wikipedia.org/wiki/List_of_regions_of_the_United_States#Agency_administrative_regions) (such as `Region I` or `Region IX`). For territories, this will have the same value as `region`.

* `bea`: For states and Washington, D.C., this will be their [Bureau of Economic Analysis region](https://en.wikipedia.org/wiki/List_of_regions_of_the_United_States#Bureau_of_Economic_Analysis_regions) (such as `Great Lakes` or `Far West`). For territories, this will have the same value as `region`.

* `alias`: This field is only filled in if an entry has a common second name, such as `District of Columbia` for `Washington, D.C.`, or `Virgin Islands of the U.S.` for `U.S. Virgin Islands`. For most entries it is left blank (`None`).
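To make the record layout above concrete, here is a minimal sketch of one entry written as a `typing.TypedDict`. The class name `Jurisdiction` and the exact type hints (for example, `Optional[str]` for `abbr` and `alias`) are assumptions inferred from the attribute descriptions, not types exported by `matplotlib_map_utils`.

```python
from typing import Optional, TypedDict


class Jurisdiction(TypedDict):
    """Hypothetical type describing one entry in the USA list-of-dicts."""
    fips: str             # two-character FIPS code, e.g. "01"
    name: str             # display name, e.g. "Washington, D.C."
    abbr: Optional[str]   # alpha code; assumed None for territories without one
    valid: bool           # True if valid under FIPS 5-2
    state: bool
    contiguous: bool
    territory: bool
    region: str
    division: str
    omb: str
    bea: str
    alias: Optional[str]  # alternate common name, otherwise None


# Alabama's entry (as printed by usa.jurisdictions[0]), typed against the sketch above
alabama: Jurisdiction = {
    "fips": "01", "name": "Alabama", "abbr": "AL", "valid": True,
    "state": True, "contiguous": True, "territory": False,
    "region": "South", "division": "East South Central",
    "omb": "Region IV", "bea": "Southeast", "alias": None,
}
```

A type like this lets a static checker such as mypy flag misspelled keys when you write your own helpers around the list-of-dicts.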

In [5]:
# Looking at a single example
usa.jurisdictions[0]

{'fips': '01',
 'name': 'Alabama',
 'abbr': 'AL',
 'valid': True,
 'state': True,
 'contiguous': True,
 'territory': False,
 'region': 'South',
 'division': 'East South Central',
 'omb': 'Region IV',
 'bea': 'Southeast',
 'alias': None}

#### **Filtering**

All entries are accessible through attributes such as `usa.jurisdictions` (all *valid* entries), `usa.states` (all states), and `usa.territories` (all territories), so users can iterate over the list-of-dicts as desired. For convenience, the `USA` class also provides a `filter()` function, which makes it easy to apply layered filters.

The arguments for `filter()` mirror the properties available for each state/territory, except for *alias* (see below), and each one accepts either a single value or a list of values.

The final argument of `filter()` is `to_return`, which tells the function what value you want to return:

* `fips` (default), `name`, or `abbr` will return *just that field* for each returned entry.

* `object` or `dict` will return the full dictionary for every entry that passes the filter.

Some notes:

* Filters are applied "top-to-bottom" in the order the attributes are listed above, and are combined as "and" filters (an entry must pass every filter to be returned).

* If only a single entry would be returned, it is removed from the list and returned as a single value.

* Note that the `name` filter compares against both the `name` and `alias` fields.
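The rules in these notes (AND-combined filters, list values meaning "any of", `name` also matching `alias`, and a lone match being unwrapped from its list) can be illustrated with a small standalone sketch. This is not the package's implementation: `filter_records` and the sample records are hypothetical, and only a few fields are included to keep the example short.

```python
from typing import Any

# Hypothetical sample records mirroring a few fields of the real entries
RECORDS = [
    {"fips": "01", "name": "Alabama", "abbr": "AL", "contiguous": True,
     "division": "East South Central", "alias": None},
    {"fips": "02", "name": "Alaska", "abbr": "AK", "contiguous": False,
     "division": "Pacific", "alias": None},
    {"fips": "06", "name": "California", "abbr": "CA", "contiguous": True,
     "division": "Pacific", "alias": None},
    {"fips": "11", "name": "Washington, D.C.", "abbr": "DC", "contiguous": True,
     "division": "South Atlantic", "alias": "District of Columbia"},
    {"fips": "41", "name": "Oregon", "abbr": "OR", "contiguous": True,
     "division": "Pacific", "alias": None},
]


def filter_records(records, to_return="fips", **criteria) -> Any:
    """Keep records matching every criterion (AND); a list value means 'any of'."""
    result = list(records)
    for field, wanted in criteria.items():
        allowed = wanted if isinstance(wanted, list) else [wanted]
        if field == "name":
            # The name filter also checks the alias field
            result = [r for r in result if r["name"] in allowed or r["alias"] in allowed]
        else:
            result = [r for r in result if r[field] in allowed]
    if to_return not in ("object", "dict"):
        # Project just the requested field
        result = [r[to_return] for r in result]
    # A single match is unwrapped from its list
    return result[0] if len(result) == 1 else result


print(filter_records(RECORDS, division="Pacific", contiguous=True, to_return="abbr"))  # prints ['CA', 'OR']
print(filter_records(RECORDS, name="District of Columbia", to_return="fips"))  # prints 11
```

Because every filter is an "and" condition, the order in which they are applied does not change the result, only how much work each step does.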

In [6]:
# Filtering based on a list of FIPS codes
usa.filter(fips=["01","02","10","11"], to_return="name")

['Alabama', 'Alaska', 'Delaware', 'Washington, D.C.']

In [7]:
# Filtering for Pacific contiguous states
usa.filter(division="Pacific", contiguous=True, to_return="abbr")

['CA', 'OR', 'WA']

In [8]:
# If only a single value is going to be returned, it will not be returned as a list
usa.filter(abbr="CA", to_return="name")

'California'

In [9]:
# If no entries are returned, a warning will show
usa.filter(territory=True, state=True)



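If you want to detect or silence the empty-result warning programmatically, Python's `warnings.catch_warnings` can record it. The sketch below assumes the warning is emitted through the standard `warnings` module, which this notebook does not confirm; `filter_or_warn` and its sample data are hypothetical stand-ins for `usa.filter()`.

```python
import warnings

# Hypothetical stand-in data: two entries with their state/territory flags
RECORDS = [
    {"fips": "01", "state": True, "territory": False},
    {"fips": "66", "state": False, "territory": True},
]


def filter_or_warn(records, **criteria):
    """Return matching FIPS codes, warning (rather than raising) when none match."""
    matches = [r["fips"] for r in records
               if all(r[field] == value for field, value in criteria.items())]
    if not matches:
        warnings.warn("No entries matched the given filters", UserWarning)
    return matches


# Record warnings instead of printing them, then inspect what was caught
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = filter_or_warn(RECORDS, territory=True, state=True)

print(result, len(caught))  # prints [] 1
```

To suppress the message instead, use `warnings.simplefilter("ignore")` inside the `catch_warnings()` block.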
`filter()` is intended to be easy to use whether you are filtering on a single dimension or on several. However, each property also has its own standalone filter function, following the form `filter_EXAMPLE()`: `filter_valid()`, `filter_fips()`, `filter_region()`, and so on.

Each of these standalone functions accepts the same three arguments:

* The `value` you want to filter by

* (Optional) The list-of-dicts you want to filter (if omitted, all valid states and territories are filtered)

* `to_return`, which accepts the same arguments that `filter()` does

Using this, you can build your own processing pipeline to filter the jurisdictions as you prefer.

In [10]:
# Getting all valid entries (states, D.C., and territories)
valid = usa.filter_valid(True, to_return="object")
# Narrowing those down to the contiguous entries
contiguous = usa.filter_contiguous(True, valid, "object")
# Keeping only the Southern entries, returning their names
south = usa.filter_region("South", contiguous, "name")
south

['Alabama',
 'Arkansas',
 'Delaware',
 'Washington, D.C.',
 'Florida',
 'Georgia',
 'Kentucky',
 'Louisiana',
 'Maryland',
 'Mississippi',
 'North Carolina',
 'Oklahoma',
 'South Carolina',
 'Tennessee',
 'Texas',
 'Virginia',
 'West Virginia']
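Because each standalone filter takes a list-of-dicts and can hand one back, the three-step chain above amounts to a fold over `(field, value)` pairs. The sketch below shows that idea with `functools.reduce`; `filter_field`, `run_pipeline`, and the sample records are hypothetical, and the real `filter_*` functions may differ in details such as unwrapping a single result.

```python
from functools import reduce

# Hypothetical sample records with only the fields this pipeline touches
RECORDS = [
    {"name": "Alabama", "valid": True, "contiguous": True, "region": "South"},
    {"name": "Alaska", "valid": True, "contiguous": False, "region": "West"},
    {"name": "Texas", "valid": True, "contiguous": True, "region": "South"},
    {"name": "Guam", "valid": False, "contiguous": False, "region": "Inhabited Territory"},
]


def filter_field(records, field, value):
    """Keep only the records whose `field` equals `value` (always returns a list)."""
    return [r for r in records if r[field] == value]


def run_pipeline(records, steps, to_return="name"):
    """Apply each (field, value) step in order, then project a single field."""
    filtered = reduce(lambda acc, step: filter_field(acc, *step), steps, records)
    return [r[to_return] for r in filtered]


steps = [("valid", True), ("contiguous", True), ("region", "South")]
print(run_pipeline(RECORDS, steps))  # prints ['Alabama', 'Texas']
```

Keeping intermediate results as lists (rather than unwrapping single matches) means every step can safely feed the next one.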

#### **Pandas**

The original impetus for this utility class was to help filter and enrich DataFrames and GeoDataFrames with additional data for each state/territory, which can be quite useful for plotting. The example below shows how this works.

In [11]:
# Let's say you had an incomplete GeoDataFrame that contained only the FIPS code (STATEFP)
gdf = states[["STATEFP","geometry"]].copy()
gdf.head()

|   | STATEFP | geometry |
|---|---|---|
| 0 | 35 | POLYGON ((-12139410.193 3695244.928, -12139373... |
| 1 | 46 | POLYGON ((-11583670.271 5621144.949, -11582880... |
| 2 | 06 | MULTIPOLYGON (((-13202983.219 3958997.376, -13... |
| 3 | 21 | MULTIPOLYGON (((-9952591.879 4373541.504, -995... |
| 4 | 01 | MULTIPOLYGON (((-9802056.754 3568885.376, -980... |


In [12]:
# Now we want to add the name of each state
# Using .apply() on a single column is straightforward: each value is passed to the function
gdf["NAME"] = gdf["STATEFP"].apply(lambda x: usa.filter_fips(x, to_return="name"))
gdf.head()

|   | STATEFP | geometry | NAME |
|---|---|---|---|
| 0 | 35 | POLYGON ((-12139410.193 3695244.928, -12139373... | New Mexico |
| 1 | 46 | POLYGON ((-11583670.271 5621144.949, -11582880... | South Dakota |
| 2 | 06 | MULTIPOLYGON (((-13202983.219 3958997.376, -13... | California |
| 3 | 21 | MULTIPOLYGON (((-9952591.879 4373541.504, -995... | Kentucky |
| 4 | 01 | MULTIPOLYGON (((-9802056.754 3568885.376, -980... | Alabama |


In [13]:
# Now we want to add their BEA region
# When using .apply() on an entire DataFrame, specify the axis: axis=1 passes each row to the function (axis=0 would pass each column)
gdf["BEA_REGION"] = gdf.apply(lambda x: usa.filter_fips(x["STATEFP"], to_return="object")["bea"], axis=1)
gdf.head()

|   | STATEFP | geometry | NAME | BEA_REGION |
|---|---|---|---|---|
| 0 | 35 | POLYGON ((-12139410.193 3695244.928, -12139373... | New Mexico | Southwest |
| 1 | 46 | POLYGON ((-11583670.271 5621144.949, -11582880... | South Dakota | Plains |
| 2 | 06 | MULTIPOLYGON (((-13202983.219 3958997.376, -13... | California | Far West |
| 3 | 21 | MULTIPOLYGON (((-9952591.879 4373541.504, -995... | Kentucky | Southeast |
| 4 | 01 | MULTIPOLYGON (((-9802056.754 3568885.376, -980... | Alabama | Southeast |
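Calling `filter_fips()` once per row searches the jurisdiction list every time. One alternative, sketched below under the assumption that FIPS codes are unique among the records you use, is to build a `fips -> record` dictionary once and pass per-field lookups to `Series.map`. The sample records and DataFrame are hypothetical stand-ins; with the package itself you would presumably build the lookup from `usa.jurisdictions`.

```python
import pandas as pd

# Hypothetical stand-ins for usa.jurisdictions and the GeoDataFrame's STATEFP column
records = [
    {"fips": "01", "name": "Alabama", "bea": "Southeast"},
    {"fips": "06", "name": "California", "bea": "Far West"},
    {"fips": "35", "name": "New Mexico", "bea": "Southwest"},
]
df = pd.DataFrame({"STATEFP": ["35", "06", "01"]})

# Build the lookup once, keyed by FIPS code
by_fips = {r["fips"]: r for r in records}

# Series.map with a dict looks up each code; codes missing from the dict become NaN
df["NAME"] = df["STATEFP"].map({k: v["name"] for k, v in by_fips.items()})
df["BEA_REGION"] = df["STATEFP"].map({k: v["bea"] for k, v in by_fips.items()})
print(df)
```

Note that the keys must match the column's exact string values, so codes such as `"06"` need their leading zero.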
