# Land Use Categorization for TMD23
This is intended to be a demo implementing the new definitions of land use categorizations for the Trip Generation step of the TDM23 (travel demand model in development). This notebook starts with categorization of Central Business Districts (CBDs).


Started March 30th, 2021 - Margaret Atkinson

#### Requirements:
This is running as a ipynb in Jupyter Notebooks in an Anaconda environment that contains the following:
- Python 3.7
- numpy
- pandas
- openmatrix
- matplotlib
- descartes
- ipympl
- geopandas
- nb_conda
- nb_conda_kernels
- folium
- branca
- jenkspy (INSTALL FIRST)

For more information on how this environment was set up, see: https://github.com/bkrepp-ctps/mde-prototype-python

#### Just a note about environments:

This script seems to require pyproj > 2.2. The environment base_py_37_branca seems to have this.

Hypothesis: In order for pyproj to deal with CRS (coordinate reference systems) in the geodataframes - we need to be using the most recent version of pyproj. See: https://geopandas.org/docs/user_guide/projections.html#upgrading-to-geopandas-0-7-with-pyproj-2-2-and-proj-6

This changes how we specify CRSs. Also note that going from pivot table back to geodataframe (even though the pivot table has geometry) we need to specify the CRS - which should be the same CRS that the data came out of.

Using a version of pyproj that uses the {'init': 'ESPG4326'} format is giving me inconsistent results - sometimes maintaining the CRS and sometimes not. This causes the buffers to be larger than desired and causes problems for spatial join.


#### Links about Decay Functions
- https://aceso.readthedocs.io/_/downloads/en/latest/pdf/
- https://geographicdata.science/book/notebooks/04_spatial_weights.html
- https://github.com/pysal/pysal


## Definitions:
### CBD
CBD: A TAZ is marked as a CBD if it is within 0.25 miles of two heavy rail (subway) stops from two different lines.

Methodology Plan: Buffer each heavy rail stop and dissolve by line to get one buffer per heavy rail line. Count the number of heavy rail line buffers intersect with each TAZ. If number is >1, the TAZ is a CBD.

### Urban

### Suburban

### Rural

In [1]:
#IMPORT LIBRARIES
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd

In [27]:
#FOR CBD - some of this can be reused
#Import Data
#import TAZ shapefile
#(https://geopandas.org/docs/user_guide/io.html)
taz = "G:\Data_Resources\DataStore\Ben_Marg_Proto\candidate_CTPS_TAZ_STATE_2019.shp"
    #load into a geodataframe
taz = gpd.read_file(taz)
    #filter to just be TAZ ID and geometry
taz = taz[['id','geometry']]
    #rename ID to TAZ_ID
taz = taz.rename(columns={"id":"TAZ_ID"})

#import all stops (must have line column)
hr_stops = "G:\Regional_Modeling\Projects\Landuse Type_Project\To Margrate\Model_Stops_CRS.shp"
hr_stops = gpd.read_file(hr_stops)

#transform TAZ and stop layers' CRS to CTPS Standard - "EPSG:26986" (Massachusetts State Plane)
taz = taz.to_crs("EPSG:26986")
hr_stops = hr_stops.to_crs("EPSG:26986")

#filter all stops to just heavy rail stops
hr_stops = hr_stops[hr_stops['MODE'].isin([5,6,7,8])] 
#mode 4 is Green, 5 is Red, Mattapan is 6 (Ashmont to Mattapan), Blue is 8, Orange is 7
    #DO I NEED TO TREAT MATTAPAN AS 5???
    #if so: (two lines doing identical things)
    #https://stackoverflow.com/questions/34499584/use-of-loc-to-update-a-dataframe-python-pandas
hr_stops['MODE'] = hr_stops['MODE'].replace(6,5)
#hr_stops.loc[hr_stops['MODE'] == 6, 'MODE'] = 5


#filter
hr_stops = hr_stops[['ID', 'MODE', 'STOP_NAME', 'geometry']]
hr_stops.crs

<Projected CRS: EPSG:26986>
Name: NAD83 / Massachusetts Mainland
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: United States (USA) - Massachusetts onshore - counties of Barnstable; Berkshire; Bristol; Essex; Franklin; Hampden; Hampshire; Middlesex; Norfolk; Plymouth; Suffolk; Worcester.
- bounds: (-73.5, 41.46, -69.86, 42.89)
Coordinate Operation:
- name: SPCS83 Massachusetts Mainland zone (meters)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

## CBD

In [28]:
#BUFFER CELL

#convert the 0.25 mile buffer distance into meters (CRS is in meters)
buf25 = 0.5*1609.34

#make a copy of hr_stops to buffer
hr_buf = hr_stops

#Buffer the heavy rail stops
#(https://geopandas.org/docs/user_guide/geometric_manipulations.html)
#(https://gis.stackexchange.com/questions/253224/geopandas-buffer-using-geodataframe-while-maintaining-the-dataframe)
    #keep attribute data by replacing the geometry column
hr_buf['geometry']= hr_buf.buffer(buf25)

#group by line and dissolve
#(https://geopandas.org/docs/user_guide/aggregation_with_dissolve.html)
    #.reset_index() is because the by=field will turn the field into the index and have it not be a col
hr_buf_dis = hr_buf.dissolve(by='MODE').reset_index()

#only get what we need
hr_buf_dis = hr_buf_dis[['MODE', 'geometry']]

In [29]:
#make sure everything is kosher
#hr_buf_dis.crs="EPSG:26986"
#hr_buf_dis = gpd.GeoDataFrame(hr_buf_dis)

#hr_buf_dis.plot()
#hr_buf_dis.crs="EPSG:26986"
hr_buf_dis.crs 

<Projected CRS: EPSG:26986>
Name: NAD83 / Massachusetts Mainland
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: United States (USA) - Massachusetts onshore - counties of Barnstable; Berkshire; Bristol; Essex; Franklin; Hampden; Hampshire; Middlesex; Norfolk; Plymouth; Suffolk; Worcester.
- bounds: (-73.5, 41.46, -69.86, 42.89)
Coordinate Operation:
- name: SPCS83 Massachusetts Mainland zone (meters)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

In [30]:
#INTERSECT AND COUNT CELL

#do a spatial join between TAZ and buffers
#(https://geopandas.org/docs/user_guide/mergingdata.html)
hr_sj = gpd.sjoin(taz,hr_buf_dis,how ="left", op="intersects")

hr_sj.crs


#make pivot table counting for each TAZ ID how many Line_IDs
    #note each line should only have one polygon so a taz should not be able to have multiple intersections with the same LineID
    ##keep geometry, count mode
#(https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html)
hr_pv = pd.pivot_table(hr_sj, index = "TAZ_ID", aggfunc={'MODE':'count', 'geometry': 'first'}).reset_index()
hr_pv = gpd.GeoDataFrame(hr_pv)
hr_pv.crs="EPSG:26986"

hr_pv.crs

<Projected CRS: EPSG:26986>
Name: NAD83 / Massachusetts Mainland
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: United States (USA) - Massachusetts onshore - counties of Barnstable; Berkshire; Bristol; Essex; Franklin; Hampden; Hampshire; Middlesex; Norfolk; Plymouth; Suffolk; Worcester.
- bounds: (-73.5, 41.46, -69.86, 42.89)
Coordinate Operation:
- name: SPCS83 Massachusetts Mainland zone (meters)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

In [31]:
#rename Line_ID count field
hr_pv = hr_pv.rename(columns={"MODE":"MODE_COUNT"})

#make CBD flag field (conditional - 1 where CBD, 0 where not)
#(https://cmdlinetips.com/2019/05/how-to-create-a-column-using-condition-on-another-column-in-pandas/)
hr_pv['CBD_Flag'] = np.where(hr_pv.MODE_COUNT >1, 1, 0)

#filter to just have CBD TAZs
#(https://www.geeksforgeeks.org/ways-to-filter-pandas-dataframe-by-column-values/)
CBD_TAZ = hr_pv[hr_pv['CBD_Flag'] == 1]

CBD_TAZ

Unnamed: 0,TAZ_ID,MODE_COUNT,geometry,CBD_Flag
0,1,3,"POLYGON ((235848.827 901768.399, 235835.856 90...",1
1,2,3,"POLYGON ((235983.175 901640.457, 235978.313 90...",1
2,3,3,"POLYGON ((235967.030 901591.944, 235967.790 90...",1
3,4,2,"POLYGON ((236117.919 901965.053, 236042.500 90...",1
4,5,2,"POLYGON ((236782.571 902122.456, 236733.885 90...",1
...,...,...,...,...
145,146,2,"POLYGON ((236789.530 899785.961, 236830.814 89...",1
146,147,2,"POLYGON ((236727.250 899247.574, 236821.090 89...",1
624,625,2,"POLYGON ((235365.474 902027.539, 235274.972 90...",1
625,626,2,"POLYGON ((235150.211 902165.195, 235100.600 90...",1


In [32]:
#show a map of the result symbolized on CBD flag field.
%matplotlib
#hr_pv = gpd.GeoDataFrame(hr_pv)
hr_pv.plot(column = 'MODE_COUNT',figsize=(10.0,8.0), legend=True)

Using matplotlib backend: Qt5Agg


<AxesSubplot:>

## Exports for Testing

In [24]:
#output as shapefile
# Output path
outfp = r"C:\Users\matkinson\Downloads\TAZ_CBD5.shp"

CBD_TAZ = gpd.GeoDataFrame(CBD_TAZ)
# Save to disk
CBD_TAZ.to_file(outfp)

In [25]:
outfp = r"C:\Users\matkinson\Downloads\test5.shp"

hr_sj = gpd.GeoDataFrame(hr_sj)
# Save to disk
hr_sj.to_file(outfp)

  """


In [26]:
outfp = r"C:\Users\matkinson\Downloads\buf5.shp"

#hr_buf_dis = gpd.GeoDataFrame(hr_buf_dis)
# Save to disk
hr_buf_dis.to_file(outfp)