heterogeneous feature collection #49

Open
visr opened this issue May 22, 2020 · 8 comments

Member

visr commented May 22, 2020

I understand that the way to have geometries with attributes behave like tables in this package is to create a StructArray for the collection.

Do you have any idea what we could do when we want to represent different geometry types? Most commonly a dataset is all of the same type, but there are exceptions, and it would be nice to be able to represent them as well.

This understandably fails:

using GeometryBasics, StructArrays

point = meta(Point(3, 1), city="Abuja", rainfall=1221.2)
polygon = meta(Polygon(Point{2, Int}[(3, 1), (4, 4), (2, 4), (1, 2), (3, 1)]), city="Borongan", rainfall=4114.8)

StructArray([point, polygon])  # ArgumentError: type does not have a definite number of fields

If I put it into, for instance, a TypedTables.Table, it works fine and stores geometry as a Vector{Any}.

using GeometryBasics, TypedTables
Table(
    geometry=[Point(3, 1), Polygon(Point{2, Int}[(3, 1), (4, 4), (2, 4), (1, 2), (3, 1)])],
    city=["Abuja", "Borongan"],
    rainfall=[1221.2, 4114.0],
)

I know this is more of a basic StructArrays vs TypedTables question, and understand that with a Vector{Any} things will be slower. But it would be nice to have a "GeometryBasics" table solution for this as well.

EDIT: concrete example here: https://github.com/visr/GeoJSONTables.jl/blob/4104e66a638814d77ef98af1d205450549361519/test/basics.jl#L97-L101

@visr visr changed the title heterogeneous geometry collection heterogeneous feature collection May 22, 2020
Contributor

piever commented Jun 29, 2020

Came here from JuliaArrays/StructArrays.jl#135 (cc: @Sov-trotter )
I'm slowly wrapping my head around this use case, and I realize that what I said there may be a bit misleading: you do not want to do the "struct of arrays / array of structs" transform on the geometries themselves, but just store an array of custom structs (possibly heterogeneous).

In that case, you can use StructArray just like any other table, so the following would work:

StructArray(
    geometry=[Point(3, 1), Polygon(Point{2, Int}[(3, 1), (4, 4), (2, 4), (1, 2), (3, 1)])],
    city=["Abuja", "Borongan"],
    rainfall=[1221.2, 4114.0],
)

GeometryBasics uses custom types (MetaPoint, MetaPolygon), so I guess one should change to a Meta{T} type so that in the heterogeneous case the overall array can have eltype Meta{Any}. We'd need to check that collect_structarray (which already implements widening as needed) widens correctly to Meta{Any}. Otherwise, I think I could finally add the option to do "custom widening" upon collection: I've wanted that for a while and it should be relatively straightforward.

Contributor

Sov-trotter commented Jun 29, 2020

Yeah! The above method works. We have done a similar working implementation here.
There's one problem with this approach. We actually construct the geometries initially (in the Base.read() methods in Shapefile.jl), then have to break them down using the meta() (metadata only) and MetaFree() (geometries only) methods and put the pieces into a StructArray, whereas what we want is to put intact meta-geometries (metadata + geometries) into the StructArray as a vector.

@visr pointed out that this method of breaking down the "meta-geometries" (and so not being able to iterate over them) deviates from the basic GeometryBasics idea, and that it might create problems in the future when we try to put the data into Makie (plotting) or perform spatial operations on it.

So we might keep it as a last resort in case no other generalization works.

Contributor

piever commented Jun 29, 2020

The only tricky thing is that widening over a custom type is a bit ill-defined, as in general it's impossible to know how the parameters should change. The first step is definitely changing MetaPoint into Meta{Point} to have a Meta{Any} type.

Then we can see to what extent the widening of StructArrays works. One possible solution would be to allow custom widening. Alternatively, one could do a flattening of the structure (into a named tuple with geometry and meta data) on the fly while iterating. Then, once all the relevant vectors are created, one can easily transform the columns into a StructArray{Meta{T}} (with essentially no runtime cost).

Contributor

Sov-trotter commented Jul 1, 2020

@piever suggested that automatic widening for custom types is tricky, while nesting/unnesting on the fly is much easier.

using GeometryBasics, StructArrays

function maketable(iter)
    unnested_iter = Base.Generator(iter) do geom_meta
        geom = getfield(geom_meta, :main) # well, the public accessor for this
        metadata = getfield(geom_meta, :meta)
        (; geometry=geom, metadata...) # I think the GeometryBasics name for this field is `:position`
    end
    soa = fieldarrays(StructArray(unnested_iter))
    return meta(soa.geometry; Base.tail(soa)...)
end

point1 = meta(Point(2, 1), city="Delhi", rainfall=121.1)
point2 = meta(Point(2, 1), city="Delhi", rainfall=120)

maketable([point1, point2])

The above example is pretty effective when it comes to heterogeneity in features/geometry, even when the metadata types are inconsistent.
However, this method doesn't work if soa.geometry widens to Vector{Any}. For that, a small refactor in GeometryBasics is needed, where MetaPoint, MetaPolygon, etc. become Meta{Point}, Meta{Polygon}, etc., so that it can return a Meta{Any}.

But I am unsure whether it is worthwhile to change the @meta_type definition only for the sake of heterogeneous geometries.
Again thanks to @piever: when I mentioned this concern to him, he instantly came up with a solution. How about having a meta-geometry type that holds metadata together with a geometry of type Any, viz. AnyMeta (we can obviously name it better xD)? This way we preserve the original homogeneous types whilst introducing a heterogeneous one. This can easily be done by declaring an AnyMeta using the @meta_type macro.
Here's a working example:

point1 = meta(Point(2, 1), city="Delhi", rainfall=121.1)

polygon2 = PolygonMeta(Point{2, Int}[(5, 1), (3, 3), (4, 8), (1, 2), (5, 1)], city="Delhi", rainfall=44)

sa = maketable([point1, polygon2])
2-element AnyMeta{Any,Array{Any,1},(:city, :rainfall),Tuple{Array{String,1},Array{Real,1}}}:
 [2, 1]
 Polygon{2,Int64,Point.....}     

sa.any
2-element Array{Any,1}:
 [2, 1]
 Polygon{......}

sa.rainfall
2-element Array{Real,1}:
 121.1
  44

What do you think @visr, @SimonDanisch ?

Member Author

visr commented Jul 1, 2020

> The first step is definitely changing MetaPoint into Meta{Point} to have a Meta{Any} type.

This sounds like a good move to me. It would be breaking, though I guess we could temporarily define const MetaPoint = Meta{Point}. But if we are going to make changes to Meta it would be good to tackle #48 as well.

EDIT: probably the name Meta is a bad idea though, since that is already a defined module.
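
As a rough illustration of the alias idea, here is a minimal sketch in plain Julia. `Point2` and `GeoMeta` are hypothetical stand-ins invented for this example (they are not GeometryBasics types), and `GeoMeta` is used instead of `Meta` only to sidestep the name clash mentioned above.

```julia
# Hypothetical stand-ins for illustration only, not GeometryBasics types.
struct Point2
    x::Int
    y::Int
end

# One generic wrapper, parameterized on the geometry type.
struct GeoMeta{Geom, Names, Types}
    main::Geom
    meta::NamedTuple{Names, Types}
end

# The old name is kept as an alias of the partially parameterized type.
const PointMeta = GeoMeta{Point2}

# Code that dispatches on the old name keeps working.
describe(::PointMeta) = "point with metadata"
describe(::GeoMeta) = "some geometry with metadata"

abuja = GeoMeta(Point2(3, 1), (city = "Abuja", rainfall = 1221.2))
other = GeoMeta("not a point", (city = "Borongan", rainfall = 4114.8))

is_point_meta = abuja isa PointMeta
```

In this sketch the alias only covers type checks and dispatch. Calling `PointMeta(...)` as a constructor would still need its own method, since Julia does not generate constructors for partially parameterized types.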

Contributor

Sov-trotter commented Jul 6, 2020

Now that things are getting a bit clearer, we have come up with a different approach for handling meta and are slowly working towards it, while also experimenting with StructArrays along the way. What we currently aim to do is put geometry and metadata separately in a Feature struct, and create an iterable StructArray of Feature structs.
Something like this works well.

using StructArrays, GeometryBasics

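# Note: the second type parameter is named `NamedTuple` and shadows Base.NamedTuple
# inside this definition, so `properties` accepts a value of any type.
struct Feature{Geom, NamedTuple}
    geometry::Geom
    properties::NamedTuple
end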

p1 = Point(2, 1)
p2 = Point(3, 2)

sa = StructArray([Feature(Point(1, 0), (city = "Delhi", rainfall = 121)),
                Feature(MultiPoint([p1, p2]), (city = "Goa", rainfall = 1211.1)),
                Feature(Point(1.0, 2.2), (city = "Mumbai", rainfall = 1300))])

But here we leave the NamedTuple untyped, which hampers speed quite a bit in the case of homogeneous types.
It would be nice if @piever and others could advise: given

struct Feature{Geom, Names, Types}
    geometry::Geom
    properties::NamedTuple{Names, Types}
end

is there a way to have a StructArray of type StructArray{Feature{Any, String, Float64}}?
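
To make the question concrete, here is a minimal sketch in plain Julia (no StructArrays involved) of an element type whose geometry parameter is `Any` while the property names and types stay fixed. The alias `CityFeature` is hypothetical, and plain tuples and vectors stand in for real geometries. With this struct definition, the three parameters would be written as `Any`, the tuple of names, and the tuple type of the values.

```julia
struct Feature{Geom, Names, Types}
    geometry::Geom
    properties::NamedTuple{Names, Types}
end

# Hypothetical alias: any geometry, but fixed property names and types.
const CityFeature = Feature{Any, (:city, :rainfall), Tuple{String, Float64}}

# Plain tuples and vectors stand in for real geometries here (an assumption
# made only to keep the sketch free of package dependencies).
features = CityFeature[
    CityFeature((1, 0), (city = "Delhi", rainfall = 121.0)),
    CityFeature([(2, 1), (3, 2)], (city = "Goa", rainfall = 1211.1)),
]

# The element type is concrete even though the geometry field is `Any`,
# so the `properties` field stays fully typed.
eltype_is_concrete = isconcretetype(eltype(features))
rainfall_type = typeof(features[2].properties.rainfall)
```

Whether StructArrays can collect into such an element type automatically is a separate question. The sketch only shows that the type itself can be written down and filled by hand.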

Member Author

visr commented Jul 6, 2020

I'd like to add that Feature here is nothing more than the typed Meta approach suggested, but being worked out outside GeometryBasics for now, in visr/GeoJSONTables.jl#3.

For construction, in most cases it'd be easiest to construct the StructArray from vectors:

StructArray(
    geometry=[Point(3, 1), Polygon(Point{2, Int}[(3, 1), (4, 4), (2, 4), (1, 2), (3, 1)])],
    city=["Abuja", "Borongan"],
    rainfall=[1221.2, 4114.0],
)

> Then, once all the relevant vectors are created, one can easily transform the columns into a StructArray{Meta{T}} (with essentially no runtime cost).

How can we do this, given the StructArray defined above?

Contributor

piever commented Jul 6, 2020

I also feel that the Meta approach is a bit extreme, and something like your Feature could be safer.
Normally, if you want to create a new StructArray with the same columns but different eltype, you would just do:
StructArray{NewElType}(fieldarrays(oldstructarray)), which shares the columns, so no runtime cost. With Feature, it slightly depends whether you want to store things nested or not nested.
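
A small sketch of that eltype swap, assuming a hypothetical flat struct `CityRain` that exists only for this illustration. The columns are fetched by property and passed as a tuple purely to keep the example short.

```julia
using StructArrays

# Hypothetical element type, used only for this illustration.
struct CityRain
    city::String
    rainfall::Float64
end

# A plain table-like StructArray; its elements are NamedTuples.
table = StructArray(city=["Abuja", "Borongan"], rainfall=[1221.2, 4114.0])

# Re-wrap the same column vectors under a new element type.
# Columns are passed positionally, in the field order of CityRain.
cities = StructArray{CityRain}((table.city, table.rainfall))

# Elements are built on access; the columns themselves are not copied.
first_city = cities[1]
shares_columns = cities.rainfall === table.rainfall
```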

Nested approach

struct Feature{Geom, NamedTuple}
    geometry::Geom
    properties::NamedTuple
end
sa = StructArray(
    geometry=[Point(3, 1), Polygon(Point{2, Int}[(3, 1), (4, 4), (2, 4), (1, 2), (3, 1)])],
    city=["Abuja", "Borongan"],
    rainfall=[1221.2, 4114.0],
)
geom = sa.geometry
metadata = StructArray(Base.tail(fieldarrays(sa)))
type = Feature{eltype(geom), eltype(metadata)}
feature_vec = StructArray{type}((geom, metadata))

I call this "nested" because the second column of feature_vec is itself a StructArray.

Non-nested approach

This is a bit trickier, because you want the layout of the StructArray to be unnested, unlike the layout of Feature. For that, you would need to follow the example here, overloading getproperty, createinstance, and staticschema.

3 participants