Skip to content
Switch branches/tags
Go to file
Cannot retrieve contributors at this time
title: "Introduction to nycgeo"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Vignette Title}
```{r setup, include = FALSE}
collapse = TRUE,
comment = "#>",
out.width = "100%",
cache = TRUE,
cache.path = "../cache/nycgeo/"
## Basic Usage
```{r load-packages, include = FALSE, echo = FALSE}
The most basic usage of `nycgeo` is to get boundaries in the `sf` format. Use `nyc_boundaries()` to get your desired geography. To make best use of the package, you should also load the `sf` package when using `nycgeo`. For these examples, I'll also load `tidyverse` as this will allow us to take advantage of pretty `tibble` printing and will come in handy when we want to manipulate and map the spatial data later.
```{r basic-usage}
nyc_boundaries(geography = "tract")
## Filter by geography
If you don't need census tracts for the entire city, you can use the `filter_by` and `region` arguments of `nyc_boundaries()` to specify the area you are interested in. For example, the following code returns only census tracts in Brooklyn and Queens.
```{r filter-by}
bk_qn_tracts <- nyc_boundaries(
geography = "tract",
filter_by = "borough",
region = c("brooklyn", "queens")
ggplot(bk_qn_tracts) +
geom_sf() +
Note, you can select multiple regions by passing a character vector to the `region` argument, but you can only choose a single geography to `filter_by`. Additionally, you can only filter by a geography that is *larger than or equal to* the boundaries you request. For example, it is not possible to filter PUMAs by NTAs because NTAs are smaller than PUMAs.
## Adding American Community Survey Data
`nycgeo` includes selected estimates from the American Community Survey as datasets. You can access these datasets directly or have them appended to the spatial data. To print a `tibble` of ACS data, simply call the data you want.
```{r print-acs-data}
To add census estimates to an `sf` object, use `add_acs_data = TRUE` to an `nyc_boundaries()`call. For example, here we get all NTAs in Manhattan with ACS data appended. One convenience of having the ACS data joined to the `sf` object is that you can very simply make a choropleth map. Here we do it with `ggplot2`, but you could use [`tmap`](, [`leaflet`]( or any other spatial package that works with `sf` objects.
```{r add-acs-data}
mn_ntas <- nyc_boundaries(
geography = "nta",
filter_by = "borough",
region = "manhattan",
add_acs_data = TRUE
ggplot(mn_ntas) +
geom_sf(aes(fill = pop_ba_above_pct_est)) +
name = "Bachelor's or above",
labels = scales::percent_format(),
option = "magma"
) +
theme_void() +
theme(panel.grid = element_line(color = "transparent")) +
labs(title = "Which neighborhoods in Manhattan are most educated?")
## Joining with other data
One use case of `nycgeo()` is if you have non-spatial data that relates to census tracts, NTAs, or other geographies and need to join that data with spatial boundaries to plot or otherwise analyze. This non-spatial data may be coded in a variety of ways and might not have names or IDs that match your spatial data. The `sf` data provided in `nycgeo` seeks to have a variety of geographic metadata that will match whatever labels your non-spatial data has.
In this example, we have non-spatial data from the [NYC Neighborhood Health Atlas]( at the NTA-level from which we would like to make a choropleth map. To do this, we import the .csv file and then join it to the spatial NTA object matching on NTA IDs. Then, we can map it as in the above example.
```{r join-health, message = FALSE}
nta_health <- read_csv("") %>%
select(NTA_Code, BlackCarbon)
nyc_boundaries(geography = "nta") %>%
left_join(nta_health, by = c("nta_id" = "NTA_Code")) %>%
ggplot() +
geom_sf(aes(fill = BlackCarbon)) +
scale_fill_viridis_c(name = "Black carbon (absorbance units)", option = "inferno") +
theme_void() +
theme(panel.grid = element_line(color = "transparent")) +
labs(title = "Which neighborhoods have high levels of black carbon pollution?")
## Finding which districts a set of points lies within
[Point-in-polygon operations]( are common tasks for spatial analysis. Given a set of points we want to find out which polygon contains each point. A real-world application of this would be counting the number of schools in each community district.
We start with a (non-spatial) data frame of all schools in New York, but with columns for latitude and longitude. Then we use those latitudes and longitudes to convert the data frame to an sf object. From there, we can use the `nyc_point_poly()` function to find which community district (CD) each point (school) is in and then count by CD to get the total number of schools in each CD.
```{r pip-schools, message = FALSE}
nyc_schools <- read_csv("")
schools_sf <- nyc_schools %>%
coords = c("longitude", "latitude"),
crs = 4326,
stringsAsFactors = FALSE
nyc_point_poly(schools_sf, "cd") %>%
st_set_geometry(NULL) %>%
count(cd_name, borough_cd_id)
## Available datasets
|Geography |Definition |Spatial data |Census data |Filter by|
|`"borough"`|borough (county) |`nycgeo::borough_sf` |`nycgeo::borough_acs_data`|`"borough"`|
|`"puma"` |public use microdata area|`nycgeo::puma_sf`|`nycgeo::puma_acs_data`|`"borough"`,`"puma"`|
|`"cd"` |community district|`nycgeo::cd_sf` |not currently available |`"borough"`, `"cd"`|
|`"nta"` |neighborhood tabulation area|`nycgeo::nta_sf`|`nycgeo::nta_acs_data`|`"borough"`,`"puma"`, `"nta"`|
|`"tract"` |census tract |`nycgeo::tract_sf` |`nycgeo::tract_acs_data` |`"borough"`,`"puma"`, `"nta"`|
|`"block"` |census block |`nycgeo::block_sf` |`nycgeo::block_census_data`|`"borough"`,`"puma"`, `"nta"`|
|`"council"`|city council district|`nycgeo::council_sf`|not currently available|none |
|`"police"` |police precinct |`nycgeo::police_sf` |not currently available |none |
|`"school"` |school district |`nycgeo::school_sf` |not currently available |none |
|`"cong"` |u.s. congressional district|`nycgeo::cong_sf` |not currently available|none |