as.data.frame defaults to na.rm = T for spatraster with many layers #792

ECarnell · 2022-09-05T10:48:51Z

The argument na.rm now defaults to TRUE in the as.data.frame function.

While this makes sense for single-layer rasters, it's really unhelpful for multi-band rasters.

This is also a change from the previous raster package, where the default for na.rm was FALSE.

library(terra)
r <- rast(ext = ext(c(xmin = 0, xmax = 2, ymin = 0, ymax = 2)),  res = 1)
r1 <- setValues(r, c(1, rep(NA,3)))
r2 <- setValues(r, c(rep(NA,3),1))

## no rows
c(r1,r2) |> as.data.frame(xy = T) 
#[1] x     y     lyr.1 lyr.1
#<0 rows> (or 0-length row.names)

# all results 
c(r1,r2) |> as.data.frame(xy = T, na.rm = F) 
#    x   y lyr.1 lyr.1
#1 0.5 1.5     1    NA
#2 1.5 1.5    NA    NA
#3 0.5 0.5    NA    NA
#4 1.5 0.5    NA     1


rhijmans · 2022-09-05T16:34:14Z

With trepidation I changed the default to FALSE. That may cause pain in existing scripts so I am a bit on the fence about this.
I have also added a third option na.rm=NA to only remove records (cells) where all layers are NA.

c(r1,r2) |> as.data.frame(xy = T, na.rm=NA) 
#    x   y lyr.1 lyr.1
#1 0.5 1.5     1    NA
#4 1.5 0.5    NA     1

That could also be a sensible default, as it would not change the behavior for a single-layer SpatRaster.
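To make the three settings concrete, here is a minimal base R sketch that mimics them on an ordinary data.frame. The helper `drop_na_rows` and the `cell_values` data frame are hypothetical illustrations, not terra's implementation; the layer columns are named lyr.1 and lyr.2 only to keep them distinct.

```r
# Hypothetical helper (not part of terra): mimic the three na.rm modes
# discussed in this thread on a plain data.frame.
drop_na_rows <- function(df, cols, na.rm = NA) {
  missing <- is.na(df[, cols, drop = FALSE])   # logical matrix, one column per layer
  if (isTRUE(na.rm)) {
    keep <- rowSums(missing) == 0              # drop rows with an NA in any layer
  } else if (isFALSE(na.rm)) {
    keep <- rep(TRUE, nrow(df))                # keep every row
  } else {
    keep <- rowSums(missing) < length(cols)    # drop rows only if all layers are NA
  }
  df[keep, , drop = FALSE]
}

# Same cell layout as the two-layer example above.
cell_values <- data.frame(
  x     = c(0.5, 1.5, 0.5, 1.5),
  y     = c(1.5, 1.5, 0.5, 0.5),
  lyr.1 = c(1, NA, NA, NA),
  lyr.2 = c(NA, NA, NA, 1)
)
layers <- c("lyr.1", "lyr.2")

nrow(drop_na_rows(cell_values, layers, na.rm = TRUE))    # 0 rows
nrow(drop_na_rows(cell_values, layers, na.rm = FALSE))   # 4 rows
rownames(drop_na_rows(cell_values, layers, na.rm = NA))  # "1" "4"
```

Only the layer columns are checked for NA; the x and y columns never contain missing values, so they do not affect which rows are kept.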

kadyb · 2023-02-07T12:07:26Z

What is the status of this issue? In 57ab669 you changed na.rm=FALSE to na.rm=NA. Personally, I would vote to set this argument to FALSE by default.

kadyb · 2023-02-12T16:51:22Z

Or if missing values are to be removed by default, perhaps there should be a message that some observations have been removed from the data frame?

rhijmans · 2023-02-13T01:14:20Z

I will check how many revdeps break with that (but of course there is other code that will break that is not in dependent packages). But whether they do or not, why would you want records with only NAs? I suppose that would only be in the rare case that you would want to manipulate the data.frame and then recreate a SpatRaster from that. Is that necessary?

rhijmans · 2023-02-13T01:24:22Z

To better express myself: I would hope and think that going from a SpatRaster to a data.frame and back would be rare. Also, it would be easy to detect the problem when creating the new SpatRaster as the number of rows in the data.frame won't match ncell of the SpatRaster.
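The row-count check described here could be sketched as a small guard. This is an assumption-laden illustration in base R: `assert_one_row_per_cell` is a hypothetical helper, and `n_cells` stands in for the value that `ncell()` would return for the SpatRaster.

```r
# Hypothetical guard (not a terra function): stop early when a data.frame
# no longer has exactly one row per raster cell.
assert_one_row_per_cell <- function(df, n_cells) {
  if (nrow(df) != n_cells) {
    stop(sprintf("data.frame has %s rows but the raster has %s cells; were NA cells dropped?",
                 nrow(df), n_cells), call. = FALSE)
  }
  invisible(TRUE)
}

full    <- data.frame(value = c(1, NA, NA, 2))        # one row per cell
trimmed <- full[!is.na(full$value), , drop = FALSE]   # NA cells removed

assert_one_row_per_cell(full, n_cells = 4)            # passes silently

# The trimmed data.frame triggers the error; capture its message for display.
result <- tryCatch(assert_one_row_per_cell(trimmed, n_cells = 4),
                   error = function(e) conditionMessage(e))
print(result)
```

With terra loaded, the same guard would be called as `assert_one_row_per_cell(df, ncell(r))` before rebuilding the raster.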

kadyb · 2023-02-13T08:03:21Z

> I suppose that would only be on the rare case that you would want to manipulate the data.frame and then recreate a SpatRaster from that. Is that necessary?

Yes, exactly. I have some workflows where I convert the raster to a data frame and then use kmeans clustering or convert to the xgboost::xgb.DMatrix class for machine learning. Later I convert the results back to a SpatRaster, but I need to know where the missing values were. BTW: in the values() function the default argument is na.rm = FALSE.

Here is an example:

library("terra")
r = rast(system.file("ex/elev.tif", package = "terra"))
df = as.data.frame(r, na.rm = FALSE)
mdl = kmeans(na.omit(df), centers = 3)
output = rep(NA, ncell(r))
output[complete.cases(df)] = mdl$cluster
clustering = rast(r, vals = output)

### With `as.data.frame(r, na.rm = NA)` this gives the warning below,
### but after updating terra it was not obvious why the code stopped working:
# Warning message:
# In output[complete.cases(data)] = mdl$cluster :
#   number of items to replace is not a multiple of replacement length
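One way to make this round trip independent of the na.rm default is to carry the cell numbers along instead of relying on row positions. The following is a sketch under the assumption that terra is installed and that `as.data.frame()` for a SpatRaster accepts `cells = TRUE`, which adds a `cell` column; the variable names are illustrative.

```r
library(terra)

r <- rast(system.file("ex/elev.tif", package = "terra"))

# Keep the cell number of every row, and drop NA cells explicitly.
df <- as.data.frame(r, cells = TRUE, na.rm = TRUE)

# Cluster on the value column(s) only, not on the cell index.
mdl <- kmeans(df[, names(r), drop = FALSE], centers = 3)

# Write the cluster labels back to the cells they came from;
# cells that were NA in the input stay NA.
output <- rep(NA_integer_, ncell(r))
output[df$cell] <- mdl$cluster
clustering <- rast(r, vals = output)
```

Because the labels are placed by cell number, the result does not depend on how many rows were removed during the conversion.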

Hongwuliang · 2024-03-28T14:04:33Z

The predict function of the terra package cannot be combined with models trained with XGBoost. The raster data is extensive, and cannot be converted to a data frame, which is causing me a great deal of distress.

rhijmans · 2024-03-28T16:38:32Z

@Hongwuliang: I think it can. See: https://stackoverflow.com/a/74330713/635245

Please create a minimal, self-contained, reproducible example and use that for a question on stackoverflow.com. This GitHub site is not for answering this type of "how do I do this?" question.

rhijmans closed this as completed in b5d7fd4 Sep 5, 2022
