dyadicdist

License: GPL (>= 3)

Introduction

The purpose of dyadicdist is to provide quick and easy calculation of dyadic distances between geo-referenced points. The main contribution of dyadicdist is that the output is stored as a long, dyadic tibble as opposed to a wide matrix.

This is still a development version. Please don’t hesitate to let me know of any errors and/or deficiencies you might come across.

Quick example

A simple example illustrates the purpose of dyadicdist and its four main functions: ddist(), ddist_sf(), ddist_xy(), and ddist_xy_sf():

library(tidyverse)
library(dyadicdist)

df <- tibble::tribble(
  ~city_name, ~idvar, ~latitude, ~longitude,
  "copenhagen", 5, 55.68, 12.58,
  "stockholm", 2, 59.33, 18.07,
  "oslo", 51, 59.91, 10.75
)

ddist(data = df,
      id = "idvar")
#> # A tibble: 9 × 7
#>   distance distance_units city_name_1 idvar_1 city_name_2 idvar_2 match_id
#>      <dbl> <chr>          <chr>         <dbl> <chr>         <dbl> <chr>   
#> 1       0  m              copenhagen        5 copenhagen        5 5_5     
#> 2  521455. m              copenhagen        5 stockholm         2 5_2     
#> 3  482648. m              copenhagen        5 oslo             51 5_51    
#> 4  521455. m              stockholm         2 copenhagen        5 2_5     
#> 5       0  m              stockholm         2 stockholm         2 2_2     
#> 6  416439. m              stockholm         2 oslo             51 2_51    
#> 7  482648. m              oslo             51 copenhagen        5 51_5    
#> 8  416439. m              oslo             51 stockholm         2 51_2    
#> 9       0  m              oslo             51 oslo             51 51_51
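
To make the contrast with a wide matrix concrete, the base R sketch below reshapes a small distance matrix into the same long layout. It is only an illustration and not how dyadicdist works internally: the distances are copied from the output above, and the object names (wide, long) are made up for the example.

```r
# Wide layout: one row and one column per point (values in metres,
# copied from the ddist() output above)
ids <- c(5, 2, 51)
wide <- matrix(
  c(     0, 521455, 482648,
    521455,      0, 416439,
    482648, 416439,      0),
  nrow = 3, byrow = TRUE,
  dimnames = list(as.character(ids), as.character(ids))
)

# Long layout: one row per dyad (i, j), read row by row from the matrix
long <- data.frame(
  idvar_1  = rep(ids, each = length(ids)),
  idvar_2  = rep(ids, times = length(ids)),
  distance = as.vector(t(wide))
)

# A dyad key built the same way as the match_id column shown above
long$match_id <- paste(long$idvar_1, long$idvar_2, sep = "_")

print(long)
```

The long layout keeps identifiers and distances in ordinary columns, so it can be filtered or joined to other data like any other data frame.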

Installation

At the moment, dyadicdist is under review at CRAN and is thus not yet available.

You can install the development version from GitHub with:

# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("jvieroe/dyadicdist")

Usage

Below, I describe some of the key features of dyadicdist. Let’s use some data on the 100 largest US cities as a working example:

library(dyadicdist)
library(tidyverse)
library(magrittr)

cities <- dyadicdist::cities

ddist()

ddist() takes as input a data.frame or a tibble and returns a tibble with dyadic distances for any combination of points i and j (see more below).

Beyond the data argument, it requires the latitude and longitude variables as well as a unique id variable (which can be numeric, integer, factor, or character).

ddist(cities,
      id = "id") %>% 
  head(5)
#> # A tibble: 5 × 11
#>   distance distance_units city_1      state_1 country_1  id_1 city_2     state_2
#>      <dbl> <chr>          <chr>       <chr>   <chr>     <int> <chr>      <chr>  
#> 1       0  m              Schenectady NY      USA         275 Schenecta… NY     
#> 2   31869. m              Schenectady NY      USA         275 Saratoga … NY     
#> 3  204716. m              Schenectady NY      USA         275 Rye        NY     
#> 4  133700. m              Schenectady NY      USA         275 Rome       NY     
#> 5   24559. m              Schenectady NY      USA         275 Rensselaer NY     
#> # … with 3 more variables: country_2 <chr>, id_2 <int>, match_id <chr>

By default, the latitude and longitude variables are assumed to be named "latitude" and "longitude", respectively, and don’t need to be specified manually. If your data uses different variable names, specify them in the ddist() call:

cities %>%
  rename(lat = latitude,
         lon = longitude) %>% 
  ddist(.,
        id = "id",
        latitude = "lat",
        longitude = "lon") %>%
  head(5)
#> # A tibble: 5 × 11
#>   distance distance_units city_1      state_1 country_1  id_1 city_2     state_2
#>      <dbl> <chr>          <chr>       <chr>   <chr>     <int> <chr>      <chr>  
#> 1       0  m              Schenectady NY      USA         275 Schenecta… NY     
#> 2   31869. m              Schenectady NY      USA         275 Saratoga … NY     
#> 3  204716. m              Schenectady NY      USA         275 Rye        NY     
#> 4  133700. m              Schenectady NY      USA         275 Rome       NY     
#> 5   24559. m              Schenectady NY      USA         275 Rensselaer NY     
#> # … with 3 more variables: country_2 <chr>, id_2 <int>, match_id <chr>

ddist_sf(): spatial input data

To measure dyadic distances with an object of class sf use ddist_sf():

library(sf)

cities %>%
  st_as_sf(.,
           coords = c("longitude", "latitude"),
           crs = 4326) %>%
  ddist_sf(.,
           id = "id") %>%
  head(5)
#> # A tibble: 5 × 11
#>   distance distance_units city_1      state_1 country_1  id_1 city_2     state_2
#>      <dbl> <chr>          <chr>       <chr>   <chr>     <int> <chr>      <chr>  
#> 1       0  m              Schenectady NY      USA         275 Schenecta… NY     
#> 2   31869. m              Schenectady NY      USA         275 Saratoga … NY     
#> 3  204716. m              Schenectady NY      USA         275 Rye        NY     
#> 4  133700. m              Schenectady NY      USA         275 Rome       NY     
#> 5   24559. m              Schenectady NY      USA         275 Rensselaer NY     
#> # … with 3 more variables: country_2 <chr>, id_2 <int>, match_id <chr>

With the exception of crs, longitude, and latitude (all of which are inherently provided in an object of class sf), ddist_sf() takes the same optional arguments as ddist().

Output specification for ddist() and ddist_sf()

By default, ddist() and ddist_sf() return the full list of dyadic distances between any points i and j, including j = i. In total, this amounts to nrow(data) * nrow(data) dyads and includes by default:

  • dyads between any observation and itself, i.e. dyads of type (i,i) (see example above)
  • duplicated dyads, i.e. both (i,j) and (j,i)

Both of these inclusions are optional, however.

  • Exclude (i,i) dyads (the diagonal in a distance matrix) by specifying diagonal = FALSE
  • Exclude duplicated dyads by specifying duplicates = FALSE
  • Exclude both by specifying diagonal = FALSE and duplicates = FALSE
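
The effect of these options on the number of dyads can be sketched in base R without the package. The filtering below is an assumption made for illustration: in particular, which member of a duplicated pair (i,j)/(j,i) the package keeps is not documented here, so the sketch simply keeps the one where i comes first in the input.

```r
ids <- c(5, 2, 51)
n <- length(ids)

# All ordered dyads, indexed by row position in the input
dyads <- expand.grid(j = seq_len(n), i = seq_len(n))
dyads$id_1 <- ids[dyads$i]
dyads$id_2 <- ids[dyads$j]

full          <- dyads                           # default: everything
no_diagonal   <- dyads[dyads$i != dyads$j, ]     # like diagonal = FALSE
no_duplicates <- dyads[dyads$i <= dyads$j, ]     # like duplicates = FALSE (assumed rule)
neither       <- dyads[dyads$i <  dyads$j, ]     # both set to FALSE

counts <- c(
  full          = nrow(full),           # n * n
  no_diagonal   = nrow(no_diagonal),    # n * (n - 1)
  no_duplicates = nrow(no_duplicates),  # n * (n + 1) / 2
  neither       = nrow(neither)         # n * (n - 1) / 2
)
print(counts)
```

For the 100 cities used in this section, the same formulas give 10,000 dyads by default and 4,950 with both options set to FALSE.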

ddist_xy() and ddist_xy_sf(): dual data inputs

ddist() and ddist_sf() take as input a single data.frame or tibble and return dyads and dyadic distances between each pair of observations.

The ddist_xy*() functions perform the same underlying task but take two data inputs, x and y. For each input you need to specify an id variable (passed together through the ids argument, as in the example below) as well as longitude/latitude variables (both defaulting to "longitude" and "latitude"):

fl <- cities %>%
  filter(state == "FL")

ca <- cities %>% 
  filter(state == "CA") %>% 
  rename(id_var = id)

ddist_xy(x = fl,
         y = ca,
         ids = c("id", "id_var")) %>% 
  head(5)
#> # A tibble: 5 × 11
#>   distance distance_units city_1        state_1 country_1    id city_2   state_2
#>      <dbl> <chr>          <chr>         <chr>   <chr>     <int> <chr>    <chr>  
#> 1 3639194. m              Madeira Beach FL      USA         224 South L… CA     
#> 2 3552612. m              Madeira Beach FL      USA         224 Carpint… CA     
#> 3 3522633. m              Madeira Beach FL      USA         224 Port Hu… CA     
#> 4 3338749. m              Madeira Beach FL      USA         224 Vista    CA     
#> 5 3823367. m              Madeira Beach FL      USA         224 San Mat… CA     
#> # … with 3 more variables: country_2 <chr>, id_var <int>, match_id <chr>
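
The pairing behind a two-input call can be sketched as a cross join in base R: every row of x is matched with every row of y, giving nrow(x) * nrow(y) dyads. This is a sketch of the idea rather than the package's implementation, and the toy data frames and their values are made up for the example.

```r
# Hypothetical toy inputs with different id column names
x <- data.frame(id = c(1, 2),
                city = c("a", "b"))
y <- data.frame(id_var = c(10, 20, 30),
                city = c("c", "d", "e"))

# by = NULL requests the Cartesian product; columns present in both
# inputs get the suffixes, columns unique to one input keep their name
pairs <- merge(x, y, by = NULL, suffixes = c("_1", "_2"))

print(pairs)
print(nrow(pairs) == nrow(x) * nrow(y))
```

This mirrors the column naming in the output above, where the shared columns appear as city_1 and city_2 while the two id columns keep their original names.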

As with ddist() and ddist_sf(), ddist_xy() has a counterpart for spatial objects of class sf, ddist_xy_sf():

fl <- cities %>%
  filter(state == "FL") %>% 
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326)

ca <- cities %>% 
  filter(state == "CA") %>% 
  rename(id_var = id) %>% 
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326)

ddist_xy_sf(x = fl,
            y = ca,
            ids = c("id", "id_var")) %>% 
  head(5)
#> # A tibble: 5 × 11
#>   distance distance_units city_1        state_1 country_1    id city_2   state_2
#>      <dbl> <chr>          <chr>         <chr>   <chr>     <int> <chr>    <chr>  
#> 1 3639194. m              Madeira Beach FL      USA         224 South L… CA     
#> 2 3552612. m              Madeira Beach FL      USA         224 Carpint… CA     
#> 3 3522633. m              Madeira Beach FL      USA         224 Port Hu… CA     
#> 4 3338749. m              Madeira Beach FL      USA         224 Vista    CA     
#> 5 3823367. m              Madeira Beach FL      USA         224 San Mat… CA     
#> # … with 3 more variables: country_2 <chr>, id_var <int>, match_id <chr>

CRS transformations

Raw coordinates

By default, ddist() and ddist_xy() assume unprojected coordinates in basic latitude/longitude format (EPSG code 4326) when converting the raw data provided in the data argument to a spatial feature. This is the CRS commonly supplied when converting latitude/longitude data to spatial features in the sf package (see sf::st_as_sf()). You can apply a different CRS by providing a valid EPSG code of type numeric with the crs argument.
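
As a rough sanity check on what a distance over unprojected latitude/longitude coordinates means, the base R sketch below computes a great-circle distance with the haversine formula. This is not the method dyadicdist uses (the package relies on sf for the actual calculation); the function name haversine_m and the mean Earth radius are assumptions made for the example.

```r
# Great-circle distance in metres between two latitude/longitude points,
# treating the Earth as a sphere (assumed mean radius in metres)
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# Copenhagen to Stockholm, coordinates from the quick example
cph_sto <- haversine_m(55.68, 12.58, 59.33, 18.07)
print(cph_sto)
```

The result should land within about one percent of the 521455 m reported in the quick example; small differences are expected because a sphere only approximates the Earth's shape.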

Transformations

All ddist*() functions allow you to transform the CRS before calculating dyadic distances using the crs_transform and new_crs arguments:

ddist(cities,
      id = "id",
      crs_transform = TRUE,
      new_crs = 3359)
#> # A tibble: 10,000 × 11
#>    distance distance_units city_1      state_1 country_1  id_1 city_2    state_2
#>       <dbl> <chr>          <chr>       <chr>   <chr>     <int> <chr>     <chr>  
#>  1       0  US_survey_foot Schenectady NY      USA         275 Schenect… NY     
#>  2  105468. US_survey_foot Schenectady NY      USA         275 Saratoga… NY     
#>  3  675517. US_survey_foot Schenectady NY      USA         275 Rye       NY     
#>  4  443781. US_survey_foot Schenectady NY      USA         275 Rome      NY     
#>  5   81318. US_survey_foot Schenectady NY      USA         275 Renssela… NY     
#>  6  706757. US_survey_foot Schenectady NY      USA         275 Plattsbu… NY     
#>  7  558267. US_survey_foot Schenectady NY      USA         275 Peekskill NY     
#>  8  478389. US_survey_foot Schenectady NY      USA         275 Oneida    NY     
#>  9  694798. US_survey_foot Schenectady NY      USA         275 New Roch… NY     
#> 10  696411. US_survey_foot Schenectady NY      USA         275 Mount Ve… NY     
#> # … with 9,990 more rows, and 3 more variables: country_2 <chr>, id_2 <int>,
#> #   match_id <chr>

For a list of supported CRS transformations, see rgdal::make_EPSG() (note that rgdal has since been retired from CRAN).

Note that the choice of CRS may impact your results considerably. For more information on choosing an appropriate CRS, see here, here, here, and here
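
A quick way to see this is to compare the two outputs shown in this section. The base R sketch below converts the first few transformed distances from US survey feet to metres and sets them against the default (EPSG 4326) distances for the same dyads. The figures are copied from the printed, rounded output above; the variable names are made up for the example.

```r
# One US survey foot is defined as 1200/3937 metres
us_survey_foot_m <- 1200 / 3937

# Schenectady to Saratoga, Rye, Rome, and Rensselaer (rows 2 to 5 above)
default_m      <- c(31869, 204716, 133700, 24559)   # default CRS, metres
transformed_ft <- c(105468, 675517, 443781, 81318)  # new_crs = 3359, feet

transformed_m <- transformed_ft * us_survey_foot_m
ratio <- transformed_m / default_m

print(round(transformed_m))
print(round(ratio, 4))
```

For these four dyads the transformed distances come out longer than the default ones by an amount on the order of one percent, which is the kind of discrepancy to keep in mind when picking a CRS for your study area.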

Citation

If you use dyadicdist for a publication, feel free to cite the package accordingly:

Vierø, Jeppe (2022). dyadicdist: Compute Dyadic Distances. R package version 0.3.1

The BibTeX entry for the (current version of the) package is:

@Manual{
  title = {dyadicdist: Compute Dyadic Distances},
  author = {Jeppe Vierø},
  year = {2022},
  note = {R package version 0.3.1},
  url = {https://github.com/jvieroe/dyadicdist},
}

Acknowledgements

  • The R Core Team for developing and maintaining the language
  • The authors of the amazing sf package. sf has greatly reduced barriers to entry for anyone working with spatial data in R and those who wish to do so
  • LatLong.net for the dyadicdist::cities data