Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpatRaster breaks when saved as an R object #549

Closed
mirt001 opened this issue Feb 21, 2022 · 9 comments
Closed

SpatRaster breaks when saved as an R object #549

mirt001 opened this issue Feb 21, 2022 · 9 comments

Comments

@mirt001
Copy link

mirt001 commented Feb 21, 2022

SpatRaster objects do not seem to work as regular R objects, when it comes to being saved using save, or saveRDS, or serialize. Moreover no error or warning is produced when saving SpatRaster R objects. This means that this issue will cause data loss without warning when encountered.

The code below can be used to reproduce the issue.

> saveRDS(rast(nrows=108, ncols=21, xmin=0, xmax=10),'SpatRaster.RDS')
> r <- readRDS('./SpatRaster.RDS')
> r
class       : SpatRaster 
Error in x@ptr$nrow() : external pointer is not valid

Ideally, SpatRasters would behave like any other R object.
If the ideal solution is not preferred, alternatively serialize could be overloaded for class SpatRaster to give a clear error. Moreover, the documentation (?rast ?SpatRaster ?writeRaster) can be improved to warn users about SpatRasters limitation.

@Rapsodia86
Copy link

This is not an issue. In the package description, at the very beginning it says:
"(..) These classes hold a C++ pointer to the data and they cannot be directly saved to a ".Rds" file or
used in cluster computing. They cannot be recovered from a saved R session either. See wrap or
writeRaster to work around that limitation."
https://cran.r-project.org/web/packages/terra/terra.pdf

@mirt001
Copy link
Author

mirt001 commented Feb 21, 2022

Thank you for pointing out the relevant part of the documentation. I missed the paragraph you referred me to, and indeed, something along those lines is what I was expecting.

Yet I am a bit confused of why you consider it to not be an issue. It is good that SpatRaster deviating from how R objects behave is partially documented, but, in my opinion, functions that should work on R objects, should preferably work on R objects, unless they shouldn't for some specific reason, in which case, they shouldn't. What currently happens is that r <- rast(nrows=108, ncols=21, xmin=0, xmax=10); save(r,file='SpatRaster.RData') works perfectly, but it yields an unexpected result, which includes data loss. I believe this is fixable by overloading serialize for SpatRaster class and probably terra classes in general. Do you see a technical reason why this would not be possible, or not a good idea?

Moreover, I see only benefits from mentioning this aspect in parts of the documentation that are more read than the introduction, e.g. ?SpatRaster , ?rast and possibly ?writeRaster. Obviously, some users, like me, missed the introduction.

Of course, now I am thinking that this could be an issue for CRAN as well, as this could have been checked there, but it would be a completely different issue.

@rhijmans
Copy link
Member

Thank you for your suggestions. I have now added serialize and saveRDS methods for SpatRaster and SpatVector

library(terra)
#terra 1.5.23
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
saveRDS(v, "test.rds")
readRDS("test.rds")
#[1] "This is a PackedSpatVector object. Use 'terra::vect()' to unpack it"
 
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f)
saveRDS(r, "test.rds")
#Error: [only in-memory SpatRasters can be save to RDS] 
 
saveRDS(r*1, "test.rds")

readRDS("test.rds")
[1] "This is a PackedSpatRaster object. Use 'terra::rast()' to unpack it"
 
readRDS("test.rds") |> rast()
#class       : SpatRaster 
#dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
#resolution  : 0.008333333, 0.008333333  (x, y)
#extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#source      : memory 
#name        : elevation 
#min value   :       141 
#max value   :       547 

So this won't work for SpatRasters that point to a file or files. Further refinement could be to then read all values into memory for smaller SpatRasters.

save still does not work, but that does not bother me too much (asking you to save a sessions is the first thing I would remove from R if I were the boss).

@mirt001
Copy link
Author

mirt001 commented Feb 22, 2022

From my point of view, this wasn't a request, but a suggestion that may or may not be implemented some time in the future. Nevertheless, your promptness in tackling this issue is greatly appreciated. Moreover I learned quite a bit from your reply alone.

  1. I didn't know that the sole issue is whether the Spat* object is in memory or not, meaning that an in-memory Spat* object works with saveRDS. I assumed it must be more than this to it.
  2. I didn't know that it's as easy as r*1 to move it to memory, although it makes perfect sense.
  3. I didn't know |> is a thing in R, apart from its magrittr version.

I understand your feelings about save. I feel similarly, although perhaps not as strong. I find it useful in very particular situations, but at the same time, I understand its potentially destructive effects and how ultimately it can always be replaced by saveRDS with just some minimal wrapping code (initially I thought that's what save is). I also understand how save cannot be overloaded, as it works on symbols, characters, or lists, so not on R objects generally, for which a method specific to SpatRaster could be written. Midthrough writing this I was confused of why wouldn't overloading serialize be enough to prevent the usage of save on terra classes, but I just checked the sources and although both saveRDS and save depend on R_serialize, they don't directly depend on R's base::serialize. So not much, beside documentation, can be done about it when it comes to terra objects, which is perhaps one more reason why save shouldn't exist (or at least not in its current form). I am aware that you (@rhijmans and @Rapsodia86) know all of this, but I think it may still be useful that I am documenting as much as possible of this issue here for future me, and also perhaps future users/contributors.

Ultimately, I still think that the documentation for ?terra::SpatRaster could include a mention that only in-memory objects can be saveRDS, with the example you provided (r*1) for moving small rasters to memory.

@rhijmans
Copy link
Member

I didn't know that the sole issue is whether the Spat* object is in memory or not, meaning that an in-memory Spat* object works with saveRDS. I assumed it must be more than this to it.

The fundamental issue is that Spat* objects are S4 (R) objects that hold a reference object (a C++ object). serialize/saveRDS do not work correctly with reference objects. Your suggestion to overload serialize/saveRDS was very useful. These overloaded functions call wrap to create an R representation (list) of the data, without the reference object. That is why after readRDS vect or rast need to be called. Perhaps a nuisance, but there is a clear messages that this needs to be done if you forget. So no big surprises.

Doing this with a raster that points to a file is a different problem. There is no guarantee that the file will exist in the future, let alone on another computer. There are specialized file formats for raster data such as GeoTiff that can, and should be used instead. Or you can just store the filename. Also, right now, wrap may not store all metadata, but that can be fixed; it is just a rather inefficient format that should be used with some caution. So my sense for the documentation is to first and foremost advice against the use of saveRDS.

I didn't know that it's as easy as r*1 to move it to memory, although it makes perfect sense.

There is an internal method terra:::readAll() that does this as well (in-place). I have not exposed it because it should not normally help to use it. saveRDS now does this automatically if the raster is not too large.

@brownag
Copy link
Contributor

brownag commented Feb 22, 2022

There is an internal method terra:::readAll() that does this as well (in-place). I have not exposed it because it should not normally help to use it. saveRDS now does this automatically if the raster is not too large.

Sorry, a bit off-topic but I was wondering about terra:::readAll(). I recently was porting over some code that used raster::readAll()... and found I needed to do be able to do something e.g. r*1 to "force" raster values be read into memory by terra in lieu of readAll().

I have a possible use case where exporting readAll would "help":

Say you have an R function that provides access to a web coverage service that downloads a tif file to a temporary location for some user-specified extent. After the download is complete, the result is read into memory and the temporary file deleted so the user doesn't have to deal with the temp file/function has no side effects in the file system. To be able to unlink the temporary file, and have the user still be able to use the SpatRaster, values need to be in memory. For my example case there are API constraints on extent and resolution that generally prevent requesting very large results--so it can be "assumed" the result will fit in memory.

Any suggestions for this?

@kadyb
Copy link
Contributor

kadyb commented Feb 22, 2022

@brownag, this is related: #507

@brownag
Copy link
Contributor

brownag commented Feb 22, 2022

@brownag, this is related: #507

@kadyb Thanks! I recall reading that now. I had done the conversion for the case in question back in December, so about a month before the set.values(<SpatRaster>, <missing>) was added. Incidentally back in December my way of forcing the SpatRaster into memory was values(x) <- values(x)

@rhijmans
Copy link
Member

Thanks, I also forgot about set.values(<SpatRaster>, <missing>)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants