Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot convert an object of type Type{Union{Missing, Float64}} to an object of type DataType #37

Closed
evetion opened this issue May 6, 2019 · 7 comments
Labels

Comments

@evetion
Copy link

evetion commented May 6, 2019

Thanks for the package!

I'm in the process of adding interpolation methods to https://github.com/evetion/GeoRasters.jl using GeoStats (GeoStatsDevTools). I hope to just provide an interpolate(::GeoArray, ::Solver) method.

However, I run into trouble with Array{Union{Missing, Float64}} support. The following code works flawlessly:

z = Array{Float64}(rand(10, 10))
z[2,2] = NaN
z[3,3] = NaN

problemdata = RegularGridData(Dict(:z=>z), (0.,0.), (1.,1.))
problemdomain = PointSet([1.5 2.5; 1.5 2.5])  # center coords of two missing values
problem = EstimationProblem(problemdata, problemdomain, :z)
solve(problem, Kriging())

While the same code, using Missing support:

z = Array{Union{Missing, Float64}}(rand(10, 10))
z[2,2] = missing
z[3,3] = missing

problemdata = RegularGridData(Dict(:z=>z), (0.,0.), (1.,1.))
problemdomain = PointSet([1.5 2.5; 1.5 2.5])  # center coords of two missing values
problem = EstimationProblem(problemdata, problemdomain, :z)

fails with:

ERROR: MethodError: Cannot convert an object of type Type{Union{Missing, Float64}} to an object of type DataType

@juliohm
Copy link
Member

juliohm commented May 6, 2019

Thank you @evetion for reporting the issue. It is always great to see people playing with missing values 👍

I've submitted a patch release with a fix, and it was already merged. Missing values are supported, but for some reason I forgot to update the problem definitions to retrieve the underlying data types: JuliaEarth/GeoStatsBase.jl@2d55fec

I would like to comment on your use case however because I think there is a hidden assumption about some of the solvers I didn't clarify well in the docs. At the moment, some solvers (not all) assume that the data object is "inside" the domain object (interpolation). In your case the coordinates of the estimation domain lie outside the convex hull of the spatial data (extrapolation), and I think solvers like Kriging won't work. I have to rethink the API at some point in the future to handle completely disconnected data and domain.

As an alternative solution, you could try formulate your estimation problem as a "gap-filling" problem where you copy all the data with gaps to the domain, and let Kriging fill the gaps. By default, the EstimationProblem object uses a SimpleMapper algorithm to find the nearest point in the domain to any data point. You can use a CopyMapper instead if you are interested in Kriging for example:

using GeoStats

data = RegularGridData(...)
domain = PointSet(...) + PointSet(coordinates(data))
problem = EstimationProblem(data, domain, :z, mapper=CopyMapper(...))

The + (or union) operation between point sets is not implemented yet, but you could work on the coordinate matrix directly by hcat-ing the coordinates of the domain with those of the data. The CopyMapper constructor here let's you choose which data values are copied from data to domain. The origin and destination vectors are vectors of integer locations. Both data and domain are indexed from 1:npoints(data) and 1:npoints(domain).

I will try to find the time to work on these features, please let me know if something is not clear. Also feel free to contribute :) The union operation between various types of domains would be very handy.

@juliohm juliohm closed this as completed May 6, 2019
@juliohm
Copy link
Member

juliohm commented May 6, 2019

Adding a note to my previous comment. You should be fine with the current problem setup if you are only using global Kriging. That is, you are not using the options neighborhood, or maxneighbors, etc. In global Kriging, you build the linear system once using all the raster data, and then apply the model to the new locations. If you are however using some sort of local Kriging, then the problem setup is problematic.

@evetion
Copy link
Author

evetion commented May 7, 2019

Thanks for fixing this bug so quickly!

With regards to your comments, I have two questions:

  • In my understanding of the example, the domain is inside the convex hull of data. The RegularGridData(rand(10,10)) has a bounding box from (0,0) to (10,10). In the case of center coordinates, the bounding box is (0.5, 0.5) to (9.5, 9.5). I request an estimation for (1.5, 1.5) and (2.5, 2.5).
  • In your CopyMapper example, coupled with your last comment, I'm unclear how I would apply local Kriging. The Mappers are used to find relevant data points, right? So how does adding all data coordinates (of already known values) to the domain help the algorithm?

It seems I still have some fundamental misunderstanding of GeoStats, I will try to dive into source code.

@juliohm
Copy link
Member

juliohm commented May 7, 2019

Sorry for the confusion! When you typed RegularGridData(Dict(:z=>z), (0.,0.), (1.,1.)) I was thinking about the other constructor, and interpreted (1.,1.) as the actual size (upper left) of the grid as opposed to spacing. Too many constructors, I always forget which is which.

Everything is correct as shown in the plot below. Please ignore the fact that there is an offset between RegularGridData and PointSet (corner vs. center) in the plot. This is a plotting issue that I hope to fix using Makie.jl in the future.

using GeoStats
using Plots

z = Array{Float64}(rand(10, 10))
z[2,2] = NaN
z[3,3] = NaN

problemdata = RegularGridData(Dict(:z=>z), (0.,0.), (1.,1.))
problemdomain = PointSet([1.5 2.5; 1.5 2.5])

plot(problemdata)
plot!(problemdomain)

Screenshot from 2019-05-07 08-36-49

@juliohm
Copy link
Member

juliohm commented May 8, 2019

@evetion I just want to make sure you are using the CopyMapper in this case. Notice that the SimpleMapper will just try to find the nearest data point to the two points you are trying to estimate and copy the values. So at the time of estimation, there will be no point left to estimate. If you look the example in the README of the project you will understand the difference. In that example, there are only a few locations with data and Kriging fills the rest. In your case you have data everywhere except two locations. You could define a domain that is also a regular grid of the same size of the image, and then CopyMapper the values that have data already.

Please let me know if you need help in the use case.

@evetion
Copy link
Author

evetion commented May 8, 2019

Ok, after some reading the source in the different repositories, the Mapper is quite important and the domain is different from what I thought. I assumed the domain is (only) a definition of the unknowns, but it's the complete coordinate space. The spatial data is actually not aligned/the same with/as domain space, hence the abstraction via the (hidden) Mapper. I got the architecture wrong.

So my example above should be:

using GeoStats

z = Array{Float64}(rand(10, 10))
z[2,2] = NaN
z[3,3] = NaN

problemdata = RegularGridData(Dict(:z=>z), (0.,0.), (1.,1.))
problemdomain = RegularGrid(size(z), (0.,0.), (1.,1.))

# mapper = Dict(x=>x for x in LinearIndices(z)[.!isnan.(z)])  # copymapper behaviour

problem = EstimationProblem(problemdata, problemdomain, :z, mapper=CopyMapper())
@assert length(datamap(problem, :z)) == 98  # 10 x 10 - 2  locations for which we have data
solution = solve(problem, Kriging())

I would advise to document this default Mapper behavior better. Its API is currently only documented in the Developers Guide, but the mapping abstraction itself could be explained on some higher level architecture page. Note also that this default keyword SimpleMapper is missing from the EstimationProblem documentation here.

Thanks for all the advice!

@juliohm
Copy link
Member

juliohm commented May 8, 2019

You've got it right! You can also pass the list of indices you wish to copy in the CopyMapper constructor. The first list of indices refers to the data, and the second list of indices refers to the domain. The only requirement is that these two lists of indices have the same length.

I am actually thinking about a major rewrite of these components in the future to allow for more general spatial problems. I didn't find the time yet, but I will try to look into it for the next paper that I am writing.

Sorry for the confusion again. If you have a good documentation in mind, please feel free to submit a PR. I'd be happy to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants