How to remove duplicate geometries? #669

I'm trying to remove duplicate geometries (points, in this case).
There are several ways to do this. My first idea was to use `dplyr::distinct()`, but it does not seem to work on `geometry` columns.

Some examples below:
```
# Create example dataset: one polygon whose closed ring repeats its first vertex
library(sf)
library(dplyr)
d <- structure(list(layer = 274.146911621094, geometry = structure(list(
    `1` = structure(list(structure(c(-3162000, -3150000, -3150000, 
    -3162000, -3162000, 3162000, 3162000, 3150000, 3150000, 3162000
    ), .Dim = c(5L, 2L))), class = c("XY", "POLYGON", "sfg"))), .Names = "1", class = c("sfc_POLYGON", 
"sfc"), precision = 0, bbox = structure(c(-3162000, 3150000, 
-3150000, 3162000), .Names = c("xmin", "ymin", "xmax", "ymax"
), class = "bbox"), crs = structure(list(epsg = NA_integer_, 
    proj4string = "+proj=lcc +lat_1=30 +lat_2=65 +lat_0=48 +lon_0=9.75 +x_0=-6000 +y_0=-6000 +a=6371229 +b=6371229 +units=m +no_defs"), .Names = c("epsg", 
"proj4string"), class = "crs"), n_empty = 0L)), .Names = c("layer", 
"geometry"), row.names = 1L, class = c("sf", "data.frame"), sf_column = "geometry", agr = structure(NA_integer_, .Names = "layer", .Label = c("constant", 
"aggregate", "identity"), class = "factor"))
dpoint <- st_cast(d, "POINT")

# Now let's try to eliminate the duplicate point: 4 different ways come to mind
dpoint %>% distinct(geometry)        # does nothing   <---- would be my preferred solution
st_intersection(dpoint)              # works, but adds columns
st_cast(st_union(dpoint), "POINT")   # works, but returns only the geometries (sfc)
# works, but rebuilds the object from the coordinates only, so attributes are dropped
st_as_sf(as.data.frame(st_coordinates(dpoint)) %>% distinct(X, Y), coords = 1:2, crs = st_crs(dpoint))
```
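For context, the duplicate in this example comes from the polygon ring being closed: its first vertex is repeated as the last one, so casting to `POINT` gives five points, two of which coincide. A minimal base-R sketch on the raw coordinate matrix makes that visible. It does not need sf, and `xy`, `dup` and `xy_unique` are names chosen here for illustration only:

```r
# Vertices of the closed ring from the example above (X column, then Y column)
xy <- matrix(c(-3162000, -3150000, -3150000, -3162000, -3162000,
                3162000,  3162000,  3150000,  3150000,  3162000),
             ncol = 2, dimnames = list(NULL, c("X", "Y")))

dup <- duplicated(xy)                  # row-wise: TRUE for a row seen earlier
which(dup)                             # 5, the closing vertex repeats row 1
xy_unique <- xy[!dup, , drop = FALSE]  # keep the first occurrence of each row
nrow(xy_unique)                        # 4 distinct vertices remain
```

This compares coordinates for exact equality, which is fine here but may not be what is wanted for points that differ only by floating-point noise.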
Now for some performance testing on a larger dataset, which I do not attach (10k points, most of which are duplicates):

```
library(microbenchmark)
mb <- microbenchmark(times=10,
st_intersection(dpoint),
st_cast(st_union(dpoint), "POINT"),
st_as_sf(as.data.frame(st_coordinates(dpoint)) %>% distinct(X,Y), coords=1:2, crs=st_crs(dpoint)) 
)
```
Result:
```
Unit: milliseconds
                                                                                                        expr
                                                                                     st_intersection(dpoint)
                                                                          st_cast(st_union(dpoint), "POINT")
 st_as_sf(as.data.frame(st_coordinates(dpoint)) %>% distinct(X,      Y), coords = 1:2, crs = st_crs(dpoint))
        min         lq        mean     median          uq         max neval cld
 9683.35427 9777.54914 10036.55727 9975.24093 10205.31676 10667.48132    10   b
  106.32318  108.49353   115.39371  110.64525   111.65030   143.65393    10  a 
   37.90596   38.33749    38.86942   38.61979    39.36003    40.67645    10  a
```
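Since the 10k-point dataset is not attached, a comparable input could be simulated roughly as follows. This is only a sketch under assumptions: the pool size of 500, the seed, and the names `pool`, `sim` and `n_unique` are made up here, and the coordinate ranges are simply borrowed from the bounding box of the example polygon. The timings above were not produced with this data.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical pool of 500 candidate locations inside the example bounding box
pool <- data.frame(X = runif(500, -3162000, -3150000),
                   Y = runif(500,  3150000,  3162000))

# Draw 10k rows with replacement, so most rows duplicate an earlier one
sim <- pool[sample(nrow(pool), 10000, replace = TRUE), ]
rownames(sim) <- NULL

n_unique <- nrow(unique(sim))  # number of distinct coordinate pairs (at most 500)

# Optional: turn the table into an sf point layer if sf is installed
if (requireNamespace("sf", quietly = TRUE)) {
  sim_sf <- sf::st_as_sf(sim, coords = c("X", "Y"))
}
```

Replacing `dpoint` with `sim_sf` in the `microbenchmark()` call would then allow the comparison to be rerun, though absolute timings will differ from those reported here.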
And on a much larger dataset (1.4M points), for the two fastest methods:
```
mb <- microbenchmark(times=10,
st_cast(st_union(dpoint), "POINT"),
st_as_sf(as.data.frame(st_coordinates(dpoint)) %>% distinct(X,Y), coords=1:2, crs=st_crs(dpoint)) 
)
```
Result:
```
Unit: seconds
                                                                                                        expr
                                                                          st_cast(st_union(dpoint), "POINT")
 st_as_sf(as.data.frame(st_coordinates(dpoint)) %>% distinct(X,      Y), coords = 1:2, crs = st_crs(dpoint))
      min        lq     mean    median        uq       max neval cld
 14.26685 14.531474 15.70820 15.675282 16.283850 18.349245    10   b
  5.06904  5.166566  5.98637  5.647217  6.737356  7.637938    10  a
```

Is there a faster, more elegant method? Could `dplyr::distinct(geometry)` be made to work?
