vignettes/inequality_and_poverty.Rmd

---
title: "Calculating accessibility inequality and poverty"
output: rmarkdown::html_vignette
bibliography: '`r system.file("REFERENCES.bib", package = "accessibility")`'
vignette: >
  %\VignetteIndexEntry{Calculating accessibility inequality and poverty} 
  %\VignetteEngine{knitr::rmarkdown} 
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
Sys.setenv(OMP_THREAD_LIMIT = 2)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

**accessibility** not only includes functions to calculate accessibility levels
from travel time matrices and land use datasets, but also to calculate
accessibility inequality and poverty from an accessibility distribution. This
vignette briefly presents these functions.

# Demonstration on sample data

We first need to calculate the accessibility levels that we are going to use in
this demonstration. To do so, we use the `cumulative_cutoff()` function and the
sample data included in the package.

```{r}
library(accessibility)

data_dir <- system.file("extdata", package = "accessibility")
travel_matrix <- readRDS(file.path(data_dir, "travel_matrix.rds"))
land_use_data <- readRDS(file.path(data_dir, "land_use_data.rds"))

access <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  opportunity = "jobs",
  travel_cost = "travel_time",
  cutoff = 30
)
head(access)
```

The functions we’ll be demonstrating in this section take much of the same
information as input, including:

- the accessibility distribution;
- a sociodemographic dataset containing information such as the distribution of
  people in the study area and their income;
- the name of the column in the accessibility dataset with the opportunities
  whose accessibility should be considered in the calculation; and
- the name of the column in the sociodemographic dataset with the number of
  people in each cell, used to weigh accessibility levels when calculating
  inequality/poverty.

## Palma Ratio

`palma_ratio()` calculates the Palma Ratio of a given accessibility
distribution. Originally defined as the income share of the wealthiest 10% of a
population divided by the income share of the poorest 40%, this measure has been
adapted in transport planning as the average accessibility of the wealthiest 10%
divided by the average accessibility of the poorest 40%. Palma Ratio values
higher than 1 indicate a scenario in which the wealthiest population has higher
accessibility levels than the poorest, whereas values lower than 1 indicate the
opposite situation.

This function includes an additional `income` parameter, used to list the column
in the sociodemographic dataset with the income variable that should be used to
classify the population in socioeconomic groups. Please note that this variable
should describe income per capita (e.g. mean income per capita, household income
per capita, etc), instead of the total amount of income in each cell.

```{r}
palma <- palma_ratio(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population",
  income = "income_per_capita"
)
palma
```

## Gini Index

`gini_index()` calculates the Gini Index of a given accessibility distribution.
Probably the most frequently used inequality measure in transport planning, this
index estimates how much a distribution deviates from a hypothetical situation
in which everyone has the exact same accessibility conditions. Gini Index values
range from 0 to 1. A value of 0 indicates a scenario of perfect
equality, in which everyone has the same accessibility levels, whereas a value
of 1 indicates a scenario of perfect inequality, in which the accessibility
levels in a study area are all concentrated into a single cell.

```{r}
gini <- gini_index(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population"
)
gini
```

## Concentration Index

`concentration_index()` calculates the Concentration Index (CI) of a given
accessibility distribution. This measures estimates the extent to which
accessibility inequalities are systematically associated with individuals'
socioeconomic levels. CI values can theoretically vary between -1 and 1 (when
all accessibility is concentrated in the most or in the least disadvantaged
cell, respectively). Negative values indicate that inequalities favor the poor,
while positive values indicate a pro-rich bias.

Just like `palma_ratio()`, this function includes an `income` parameter to
indicate which variable from the sociodemographic dataset should be used to rank
the population from the least to the most privileged groups. Unlike the Palma
Ratio function, however, any variable that can be used to describe one's
socioeconomic status, such as education level, for example, can be passed to
this argument, as long as it can be numerically ordered (in which higher values
denote higher socioeconomic status).

`concentration_index()` also includes a `type` parameter, used to indicate which
Concentration Index to calculate. This parameter currently supports two values,
`"standard"` and `"corrected"`, which respectively identify the standard
relative CI and the corrected CI, proposed by @erreygers2009correcting.

```{r}
ci <- concentration_index(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population",
  income = "income_per_capita",
  type = "corrected"
)
ci
```

## Theil T

`theil_t()` calculates the Theil T Index of a given accessibility distribution.
Values range from 0 (when all individuals have exactly the same accessibility
levels) to the natural log of *n*, in which *n* is the number of individuals in
the accessibility dataset.

If the individuals can be classified into mutually exclusive and completely
exhaustive groups (i.e. into groups that do not overlap and cover the entire
population), the index can be decomposed into a between- and a within-groups
inequality component. The function includes a `socioeconomic_groups` parameter
to indicate which variable from the sociodemographic dataset should be used
identify the socioeconomic groups used to calculate these components.

Please note that the output `theil_t()` varies based on the value of
`socioeconomic_groups`. If `NULL` (the default), the between- and within-groups
components are not calculated, and the function returns a data frame containing
only the total aggregate inequality for the returned area. If
`socioeconomic_groups` is not `NULL`, however, the function returns a list
containing three dataframes: one summarizing the total inequality and the
between- and within-groups components, one listing the contribution of each
group to the between-groups component and another listing the contribution of
each group to the within-groups component. Both behaviors are shown below.

```{r}
theil_without_groups <- theil_t(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population"
)
theil_without_groups

# some cells are classified as in the decile NA because their income per capita
# is NaN, as they don't have any population. we filter these cells from our
# accessibility data, otherwise the output would include NA values (note that
# subsetting the data like this doesn't affect the assumption that groups are
# completely exhaustive, because cells with NA income decile don't have any
# population)

na_decile_ids <- land_use_data[is.na(land_use_data$income_decile), ]$id
no_na_access <- access[! access$id %in% na_decile_ids, ]
sociodem_data <- land_use_data[! land_use_data$id %in% na_decile_ids, ]

theil_with_groups <- theil_t(
  no_na_access,
  sociodemographic_data = sociodem_data,
  opportunity = "jobs",
  population = "population",
  socioeconomic_groups = "income_decile"
)
theil_with_groups
```


## Foster-Greer-Thorbecke (FGT) poverty measures

`fgt_poverty()` calculates the FGT metrics, a family of poverty measures
originally proposed by @foster1984class that capture the extent and severity of
poverty within an accessibility distribution. The FGT family is composed of
three measures that differ based on the $\alpha$ parameter used to calculate
them (either 0, 1 or 2) and which also changes their interpretation:

- with $\alpha = 0$ (FGT0) the measure captures the extent of poverty as a
  simple headcount - i.e. the proportion of people below the poverty line;
- with $\alpha = 1$ (FGT1) the measure, also know as the "poverty gap index",
  captures the severity of poverty as the average percentage distance between
  the poverty line and the accessibility of individuals below the poverty line;
- with $\alpha = 2$ (FGT2) the measure simultaneously captures the extent and
  the severity of poverty by calculating the number of people below the poverty
  line weighted by the size of the accessibility shortfall relative to the
  poverty line.

This function includes an additional `poverty_line` parameter, used to define
the poverty line below which individuals are considered to be in accessibility
poverty.

```{r}
poverty <- fgt_poverty(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population",
  poverty_line = 95368
)
poverty
```

# References