MakeSimBzoneEstDataset Module

December 16, 2018

This script combines data from the US Census, the EPA Smart Location Database (SLD), and the National Transit Database (NTD) to prepare a block group dataset for estimating the models that synthesize Bzones. These models apply to VisionEval models where the Bzone level of geography is synthesized rather than explicitly defined. The script has 4 functions, which load the 3 datasets and combine them into the final model estimation dataset. A function is called only if the dataset it creates is not already present in the package. Because the default package contains all of the datasets, no processing is done by default. Users who wish to change the processing or the data included in the datasets need to remove the datasets from the 'data' directory and rebuild the package.
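The "build only if missing" behavior described above can be sketched as follows. This is an illustrative Python sketch, not the package's R code; the function name load_or_build, the JSON file format, and the demo builder are all hypothetical and chosen only to show the pattern.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(path, build):
    """Return the dataset saved at `path`; call `build` only if the file is missing."""
    path = Path(path)
    if path.exists():
        # Dataset already present: skip all processing.
        return json.loads(path.read_text())
    dataset = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dataset))
    return dataset


# Hypothetical builder that records each time it is actually run.
calls = []


def build_demo():
    calls.append("built")
    return {"rows": 3}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "demo.json"
    first = load_or_build(target, build_demo)   # file missing, so it is built
    second = load_or_build(target, build_demo)  # file present, so it is only read
```

Deleting the saved file is what triggers a rebuild in this sketch, which mirrors removing a dataset from the 'data' directory before rebuilding the package.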

Housing and Household Income Dataset

US Census data on the proportions of multifamily and single-family households and the median household income in each block group are downloaded using the US Census data API and the tidycensus package. Note that you need to get a Census API key and register it with your R setup if you have not already done so. You can request a key at https://api.census.gov/data/key_signup.html. Then execute the following code in the R console: census_api_key("INSERT CENSUS API KEY HERE", install = TRUE).

The getCensusHousingInc function iterates through each state (and the District of Columbia) and downloads the numbers of buildings by type and the median household income for each block group. Buildings are combined into 2 types (single-family, multifamily). The single-family type is defined as detached one-family homes and mobile homes. The multifamily type includes all other types (which are attached dwellings) except for the boat, RV, van, etc. category, which is ignored.

The year 2015 American Community Survey data are downloaded. This is the earliest year for which these data are available at the block group level through the Census API. Although these data are from 5 years after the date of the SLD data, inaccuracies caused by this date mismatch should be small for the purpose of estimating the SimBzone models for the following reasons:

  • Land use and population shifts are relatively slow;
  • The building data are aggregated into 2 types (single-family, multifamily);
  • The building proportions are used in estimating the models; and,
  • The income data are normalized by median income of the region.
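The aggregation into two building types and the income normalization can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the getCensusHousingInc code: the category keys are hypothetical labels rather than Census variable names, the counts and incomes are made up, and the regional median income is simply passed in because the text does not say how it is derived.

```python
# Hypothetical category labels; the real Census variable names are not given in the text.
SINGLE_FAMILY = {"one_unit_detached", "mobile_home"}
IGNORED = {"boat_rv_van"}


def housing_proportions(counts):
    """Collapse building-type counts into single-family and multifamily proportions."""
    single = sum(n for kind, n in counts.items() if kind in SINGLE_FAMILY)
    multi = sum(
        n for kind, n in counts.items()
        if kind not in SINGLE_FAMILY and kind not in IGNORED
    )
    total = single + multi
    if total == 0:
        return None  # assumption: no proportions are defined for an empty block group
    return {"single_family": single / total, "multifamily": multi / total}


def normalized_income(block_group_median, region_median):
    """Express block group median income relative to the regional median."""
    return block_group_median / region_median


# Made-up counts for one block group.
example_counts = {
    "one_unit_detached": 60,
    "one_unit_attached": 10,
    "two_to_four_units": 10,
    "five_or_more_units": 15,
    "mobile_home": 5,
    "boat_rv_van": 2,  # ignored in both types
}
proportions = housing_proportions(example_counts)
relative_income = normalized_income(45000, 60000)
```

Working with proportions and relative incomes rather than raw counts and dollars is what makes the 5-year date mismatch less consequential.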

Smart Location Database

The US Environmental Protection Agency's (EPA) Smart Location Database is the principal source of data used for developing the method for synthesizing Bzones. The SLD includes a large number of land use and transportation measures at the Census block group level for the year 2010. The large majority of these are so-called 5D measures, which have been found to have significant relationships to personal travel: density, diversity, design, distance to transit, and access to destinations. Several 5D measures are used in estimating the multimodal travel models that will be incorporated into VisionEval in the future. These measures are calculated by modules in the VELandUse package, and they need to be calculated by modules in the VESimLandUse package as well.

The SLD data used for estimating SimBzone models include some additional data items that were added to the dataset for estimating the multimodal travel models. These are the amount of population within 5 miles of each block group and the amount of employment within 2 miles of each block group. The population and employment totals were computed based on straight-line distances between block group centroids. These values are used to compute a destination accessibility measure that is the harmonic mean of the two values. This measure is used instead of the accessibility measures in the SLD, which do not adequately distinguish accessibility in smaller urbanized areas. The harmonic mean of population within 5 miles and employment within 2 miles has been found to be useful in distinguishing area types in urbanized areas of different sizes. The processSLD function carries out these steps.
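The destination accessibility calculation can be sketched as follows. This is an illustrative Python sketch, not the processSLD code. It assumes that straight-line distance is approximated with the haversine formula and that each block group's own population and employment count toward its totals; neither detail is stated in the text. The coordinates and counts are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative values; 0 when both are 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def accessibility(block_groups, pop_radius=5.0, emp_radius=2.0):
    """Harmonic mean of population within 5 miles and employment within 2 miles."""
    scores = []
    for origin in block_groups:
        pop = emp = 0.0
        for other in block_groups:  # assumption: the origin itself is included
            dist = haversine_miles(origin["lat"], origin["lon"], other["lat"], other["lon"])
            if dist <= pop_radius:
                pop += other["pop"]
            if dist <= emp_radius:
                emp += other["emp"]
        scores.append(harmonic_mean(pop, emp))
    return scores


# Made-up centroids roughly 1.0 and 3.5 miles north of the first one.
demo = [
    {"lat": 45.0000, "lon": -122.0, "pop": 1000, "emp": 500},
    {"lat": 45.0145, "lon": -122.0, "pop": 2000, "emp": 100},
    {"lat": 45.0500, "lon": -122.0, "pop": 3000, "emp": 400},
]
scores = accessibility(demo)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a block group scores highly only when both nearby population and nearby employment are large.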

National Transit Database (NTD)

The processTransitData function reads in 2010 transit service, agency, and urbanized area files downloaded from the National Transit Database website. These downloaded files are included in the inst/extdata directory of this package. The function sums the annual vehicle miles and vehicle revenue miles for fixed-route transit by urbanized area. The urbanized area names are checked against the names in the Smart Location Database and made consistent so that the transit data can be joined with the SLD dataset.
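The aggregation and name reconciliation can be sketched as below. This is an illustrative Python sketch, not the processTransitData code: the field names, the mode label, the name-correction table, and the sample rows are all hypothetical, and only revenue miles are summed to keep the example short.

```python
from collections import defaultdict


def sum_revenue_miles(service_rows, name_fixes):
    """Sum fixed-route vehicle revenue miles by urbanized area after fixing names."""
    totals = defaultdict(float)
    for row in service_rows:
        if row["mode_type"] != "fixed_route":
            continue  # skip service that is not fixed-route
        # Map the transit-data name to the SLD spelling when a correction exists.
        name = name_fixes.get(row["uza_name"], row["uza_name"])
        totals[name] += row["vehicle_revenue_miles"]
    return dict(totals)


# Made-up rows and a made-up correction table.
rows = [
    {"uza_name": "Example City, ST", "mode_type": "fixed_route", "vehicle_revenue_miles": 1200.0},
    {"uza_name": "Example City ST", "mode_type": "fixed_route", "vehicle_revenue_miles": 800.0},
    {"uza_name": "Example City, ST", "mode_type": "demand_response", "vehicle_revenue_miles": 300.0},
    {"uza_name": "Sample Town, ST", "mode_type": "fixed_route", "vehicle_revenue_miles": 500.0},
]
fixes = {"Example City ST": "Example City, ST"}
totals = sum_revenue_miles(rows, fixes)
```

Reconciling names before summing matters because two spellings of the same urbanized area would otherwise produce two partial totals, and neither would join cleanly to the SLD records.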

Making the SimBzone Estimation Dataset

The SimBzone model estimation dataset combines the Census, SLD, and NTD datasets along with the latitudes and longitudes of the block group centroids (used for map documentation of models). The variables are limited to those needed to model Bzone attributes used in the multimodal travel model and other VisionEval modules. The dataset is cleaned to remove block groups that have no activity (i.e., no households or jobs) or no land area. Comments in the script provide specifics about the cleaning. In addition, some of the 5D measures are recalculated. The density measure (referred to as D1D in the SLD), a measure of households and jobs per acre, is calculated using the land area (which excludes water bodies) recorded in the SLD rather than the unprotected land area for the following reasons:

  • The activity density of block groups needs to "add up" to the overall density of the urban area they are a part of;
  • The unprotected land area of the large majority of block groups is equal to the land area; and,
  • A significant number of block groups had no recorded unprotected land area.
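The cleaning rule and the density recalculation can be sketched as follows. This is an illustrative Python sketch with hypothetical field names and made-up values, not the script's R code; the script's comments remain the authority on the actual cleaning rules.

```python
def clean_and_add_density(block_groups):
    """Drop block groups with no activity or no land area, then add activity density."""
    kept = []
    for bg in block_groups:
        activity = bg["households"] + bg["jobs"]
        if activity == 0 or bg["land_acres"] <= 0:
            continue  # no households or jobs, or no land area
        kept.append({**bg, "density": activity / bg["land_acres"]})
    return kept


# Made-up block groups: the second has no activity, the third has no land area.
demo = [
    {"households": 100, "jobs": 50, "land_acres": 50.0},
    {"households": 0, "jobs": 0, "land_acres": 10.0},
    {"households": 10, "jobs": 0, "land_acres": 0.0},
    {"households": 300, "jobs": 100, "land_acres": 100.0},
]
cleaned = clean_and_add_density(demo)

# "Adding up": the area-weighted mean of block group densities equals the overall density.
total_acres = sum(bg["land_acres"] for bg in cleaned)
weighted_density = sum(bg["density"] * bg["land_acres"] for bg in cleaned) / total_acres
overall_density = sum(bg["households"] + bg["jobs"] for bg in cleaned) / total_acres
```

The final comparison illustrates the first reason in the list: the identity holds only when every block group's density uses the same land area definition as the urban area total.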

The diversity measure (referred to as D2A_JPHH), the ratio of jobs to housing, is recalculated to ensure that it is consistent with the numbers of jobs and households recorded in the SLD. In addition, the entropy measure is recalculated to be consistent with the entropy measure used in the multimodal travel model. This measure is calculated in the same way as the SLD entropy measures but with 3 employment sectors (retail, service, other) rather than 5.
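The two recalculated diversity measures can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the script's R code. It assumes the entropy is a Shannon entropy over the three employment sector shares, normalized by the natural log of the number of sectors, and that a block group with no households gets no jobs-to-housing ratio. The text does not confirm either detail, including whether households enter the entropy calculation.

```python
import math


def jobs_per_household(jobs, households):
    """Ratio of jobs to households; None when there are no households (assumption)."""
    return jobs / households if households > 0 else None


def employment_entropy(retail, service, other):
    """Normalized entropy of employment across 3 sectors, ranging from 0 to 1."""
    total = retail + service + other
    if total == 0:
        return 0.0  # assumption: no employment means no mixing
    # Sectors with zero jobs contribute nothing to the entropy sum.
    shares = [n / total for n in (retail, service, other) if n > 0]
    return sum(-p * math.log(p) for p in shares) / math.log(3)


ratio = jobs_per_household(150, 100)
balanced = employment_entropy(40, 40, 40)    # evenly mixed sectors
single = employment_entropy(120, 0, 0)       # all jobs in one sector
mixed = employment_entropy(60, 30, 30)       # somewhere in between
```

Dividing by the log of the number of sectors keeps the measure on a 0 to 1 scale, so the 3-sector and 5-sector versions share a range even though their values are not directly comparable.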

User Inputs

This module has no user input requirements.

Datasets Used by the Module

This module uses no datasets that are in the datastore.

Datasets Produced by the Module

This module produces no datasets to store in the datastore.