terra and raster extract output differs for categorical values #776

Closed
tomroh opened this issue Aug 23, 2022 · 6 comments

@tomroh

tomroh commented Aug 23, 2022

For the raster package:

x <- raster::raster(<my-file>)
y <- sf::st_buffer( sf::st_as_sfc(list(sf::st_point(c(<Longitude>, <Latitude>))), crs = 4326),  dist = 10)
raster::extract(x, sf::as_Spatial(y))

Returns the integer codes of the categorical values in the dataset.

z <- terra::rast(<my-file>)
terra::extract(z, terra::vect(y))

Returns factor values, but the levels are not the same as the integer codes stored in the raster dataset.

Per https://stackoverflow.com/questions/69533813/terra-extracting-incorrect-value-from-categorical-raster, this can be resolved by:

levels(z) <- NULL
terra::extract(z, terra::vect(y))

I could not find this anywhere in the documentation. Can an argument be added that turns off the "factoring" of the values? Or can the returned factor include the correct integer values?

@rhijmans
Member

What would be a good use-case where you would want the "code" (integer level) rather than the actual "value" (factor label)?

@tomroh
Author

tomroh commented Aug 24, 2022

The "code" can be a key in a larger database, where it serves as the link in a relational data model. Extracting the "values" (labels) instead and trying to join them to another table would break the model.
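
The join described above can be sketched in base R as follows. This is only an illustration: the `extracted` and `land_cover` tables, their column names, and the `tax_class` column are hypothetical and do not come from the original dataset.

```r
# Hypothetical result of an extraction that returned raw integer codes
extracted <- data.frame(site = c("a", "b", "c"), id = c(12L, 11L, 13L))

# Hypothetical table in the relational model, keyed by the same integer code
land_cover <- data.frame(
  id        = 11:13,
  cover     = c("forest", "water", "urban"),
  tax_class = c("T1", "T2", "T3")
)

# The integer code is the join key; labels are looked up, never joined on
joined <- merge(extracted, land_cover, by = "id")
joined
```

If the extraction returned only the labels, the join would depend on the label strings matching exactly in both tables, which is the fragility the comment refers to.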

@rhijmans
Member

rhijmans commented Aug 25, 2022

Example data

library(terra)
set.seed(0)
r <- rast(nrows=10, ncols=10)
values(r) <- sample(3, ncell(r), replace=TRUE) + 10
levels(r) <- data.frame(id=11:13, cover=c("forest", "water", "urban"), letters=letters[1:3], value=10:12)

Default

extract(r, cbind(0,0))
#  cover
#1 water

Solution 1: Set the active category to 0

activeCat(r) <- 0
extract(r, cbind(0,0))
#  id
#1 12

Solution 2: Remove the levels

activeCat(r) <- 1
levels(r) <- NULL
extract(r, cbind(0,0))
#    cover
#1      12

I could also add an argument "raw=TRUE" to extract, as you suggest. That used to be there (as factors=TRUE), but the large number of options and their interactions were making things a bit complicated. I would prefer not to go back there, but it may be preferable so that you can have a simple workflow that you know will work whether there are factors or not.

Your idea of having the "raw values" as the R factor indices as well is good, in principle, but I am not sure if that is possible. I think they have to be from 1 to n. That would require adding, perhaps very many, empty levels if the raw values start at, say, 10,000.
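
The constraint on factor indices can be seen in a small base R sketch. The code values below are made up for illustration; only the behaviour of `factor()` is the point.

```r
# Made-up raw codes that start at 10,000 and are not consecutive
codes <- c(10000L, 10002L, 10000L, 10005L)

f <- factor(codes)
as.integer(f)             # internal indices always run 1..n: 1 2 1 3
as.integer(levels(f))[f]  # original codes, recovered through the labels

# Forcing the indices to equal the codes requires one level per integer
# up to the largest code, almost all of them empty
f_padded <- factor(codes, levels = seq_len(max(codes)))
nlevels(f_padded)         # 10005
as.integer(f_padded)      # 10000 10002 10000 10005
```

The padded factor does carry the raw codes as its indices, but at the cost of 10,005 levels for three distinct values.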

@rhijmans
Member

I have added the argument raw=FALSE. When it is set to TRUE, a matrix with the raw values is returned.

library(terra)
#terra 1.6.10
set.seed(0)
r <- rast(nrows=10, ncols=10)
values(r) <- sample(3, ncell(r), replace=TRUE) + 10
levels(r) <- data.frame(id=11:13, cover=c("forest", "water", "urban"), letters=letters[1:3], value=10:12)
 
extract(r, cbind(0,0))
#  cover
#1 water

extract(r, cbind(0,0), raw=TRUE)
#     cover
#[1,]    12
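
One possible workflow built on `raw=TRUE` is to extract the codes and attach the labels afterwards from the raster's own category table, so that both are available. This is a sketch rather than a recommended pattern: it assumes `cats(r)[[1]]` returns the category table of the first layer with the column names set via `levels<-`, and it uses a simplified two-column version of the example table above.

```r
library(terra)

set.seed(0)
r <- rast(nrows = 10, ncols = 10)
values(r) <- sample(3, ncell(r), replace = TRUE) + 10
levels(r) <- data.frame(id = 11:13, cover = c("forest", "water", "urban"))

xy <- cbind(0, 0)

# Raw integer codes, as a plain vector taken from the returned matrix
codes <- extract(r, xy, raw = TRUE)[, 1]

# Category table of the first layer (assumed to keep the "id" and "cover" names)
lookup <- cats(r)[[1]]

# Keep the code as the key and add the label next to it
result <- data.frame(id = codes, cover = lookup$cover[match(codes, lookup$id)])
result
```

Keeping the code as the key and the label as an ordinary column avoids relying on factor indices at all.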

@tomroh
Author

tomroh commented Aug 25, 2022

Thanks!

@tomroh
Author

tomroh commented Aug 25, 2022

Your idea of having the "raw values" as the R factor indices as well is good, in principle, but I am not sure if that is possible. I think they have to be from 1 to n. That would require adding, perhaps very many, empty levels if the raw values start at, say, 10,000.

Yes, I thought this might be the case with using a factor. That would add too much overhead for large values. A formal data type that behaves as integer-coded strings and allows non-consecutive integer levels does not currently exist in R. It would be useful in a variety of applications.
