Convert list columns to character strings before passing to GDAL? #2142

Closed
robinlovelace-ate opened this issue Apr 4, 2023 · 9 comments

Comments

@robinlovelace-ate

robinlovelace-ate commented Apr 4, 2023

Describe the bug

This is a bit of an edge case I think but I have data with the following structure:

head(mm_data)
Simple feature collection with 6 features and 37 fields
Geometry type: LINESTRING
Dimension:     XYZ
Bounding box:  xmin: 434106 ymin: 1123972 xmax: 454074.9 ymax: 1203116
z_range:       zmin: 4.4 zmax: 67.8
Projected CRS: OSGB36 / British National Grid
                gml_id                            identifier    beginLifespanVersion localId
1 osgb4000000003210905 http://data.os.uk/id/4000000003210905 2021-01-20T00:00:00.000   4e+15
2 osgb4000000003218329 http://data.os.uk/id/4000000003218329 2021-04-21T00:00:00.000   4e+15
3 osgb4000000003218851 http://data.os.uk/id/4000000003218851 2021-04-21T00:00:00.000   4e+15
4 osgb4000000003219012 http://data.os.uk/id/4000000003219012 2021-04-21T00:00:00.000   4e+15
5 osgb4000000003220023 http://data.os.uk/id/4000000003220023 2021-02-16T00:00:00.000   4e+15
6 osgb4000000003220495 http://data.os.uk/id/4000000003220495 2021-02-17T00:00:00.000   4e+15
           namespace fictitious validFrom     reasonForChange    roadClassification
1 http://data.os.uk/      FALSE      <NA> Modified Attributes               Unknown
2 http://data.os.uk/      FALSE      <NA> Modified Attributes Classified Unnumbered
3 http://data.os.uk/      FALSE      <NA> Modified Attributes Classified Unnumbered
4 http://data.os.uk/      FALSE      <NA> Modified Attributes Classified Unnumbered
5 http://data.os.uk/      FALSE      <NA> Modified Attributes          Unclassified
6 http://data.os.uk/      FALSE      <NA> Modified Attributes          Unclassified
                routeHierarchy          formOfWay trunkRoad primaryRoute operationalState
1 Restricted Local Access Road Single Carriageway     FALSE        FALSE             Open
2                   Minor Road Single Carriageway     FALSE        FALSE             Open
3                   Minor Road Single Carriageway     FALSE        FALSE             Open
4                   Minor Road Single Carriageway     FALSE        FALSE             Open
5                   Local Road Single Carriageway     FALSE        FALSE             Open
6                   Minor Road Single Carriageway     FALSE        FALSE             Open
              provenance length length_uom matchStatus startGradeSeparation endGradeSeparation
1 OS Rural And OS Height  46.77          m    No Match                    0                  0
2 OS Rural And OS Height 909.12          m     Matched                    0                  0
3 OS Rural And OS Height 206.13          m     Matched                    0                  0
4 OS Rural And OS Height 308.57          m     Matched                    0                  0
5 OS Urban And OS Height 118.39          m     Matched                    0                  0
6 OS Urban And OS Height  80.36          m     Matched                    0                  0
  inDirection inDirection_uom inOppositeDirection inOppositeDirection_uom averageWidth
1         0.9               m                 0.1                       m           NA
2         1.0               m                39.7                       m           NA
3         0.0               m                 1.3                       m           NA
4         1.5               m                 3.7                       m           NA
5         0.0               m                 5.4                       m          6.0
6         1.2               m                 0.0                       m          6.3
  averageWidth_uom minimumWidth minimumWidth_uom          confidenceLevel cycleFacility
1             <NA>           NA             <NA>                     <NA>          <NA>
2             <NA>           NA             <NA>                     <NA>          <NA>
3             <NA>           NA             <NA>                     <NA>          <NA>
4             <NA>           NA             <NA>                     <NA>          <NA>
5                m          5.4                m OS Urban And Full Extent          <NA>
6                m          5.5                m OS Urban And Full Extent          <NA>
  wholeLink roadStructure alternateIdentifier|ThematicIdentifier|identifier
1      <NA>          <NA>                                                  
2      <NA>          <NA>                                9010_4350841173099
3      <NA>          <NA>                                9010_4433091123798
4      <NA>          <NA>                                9010_4392871146659
5      <NA>          <NA>                                9010_4471001141870
6      <NA>          <NA>                                9010_4465511140663
                        identifierScheme        roadName alternateName
1                                                                     
2 NSG Elementary Street Unit ID (ESU ID)                              
3 NSG Elementary Street Unit ID (ESU ID)                              
4 NSG Elementary Street Unit ID (ESU ID)                              
5 NSG Elementary Street Unit ID (ESU ID) Garthspool Road              
6 NSG Elementary Street Unit ID (ESU ID) Kantersted Road              
  roadClassificationNumber             centrelineGeometry
1                     <NA> LINESTRING Z (454046.1 1203...
2                     <NA> LINESTRING Z (434106 117268...
3                     <NA> LINESTRING Z (443504.1 1124...
4                     <NA> LINESTRING Z (438475.8 1143...
5                     <NA> LINESTRING Z (447118.5 1141...
6                     <NA> LINESTRING Z (446550.8 1140...

When I try to write it to a .gpkg file, it errors as follows:

+   sf::write_sf(mm_data, "C:/Users/RLOVELAC/data/os/mm_data_RoadLink.gpkg")
Error in clean_columns(as.data.frame(obj), factorsAsCharacter) : 
  list columns are only allowed with raw vector contents
Timing stopped at: 9.46 0.08 9.59

To Reproduce

Sign up for Ordnance Survey license and follow instructions in Active Travel England's first in-house developed R package: https://github.com/acteng/mastermapr

I think this will fix it, removing the list columns:

coltypes = sapply(mm_data, class)
cols_remove = which(coltypes == "list")
mm_data_chars = mm_data[-cols_remove]

Paste the output of your `sessionInfo()` and `sf::sf_extSoftVersion()`

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mastermapr_0.0.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10        pillar_1.8.1       compiler_4.2.2     R.utils_2.12.2    
 [5] R.methodsS3_1.8.2  class_7.3-20       tools_4.2.2        digest_0.6.31     
 [9] R.cache_0.16.0     evaluate_0.20      lifecycle_1.0.3    tibble_3.2.0      
[13] pkgconfig_2.0.3    rlang_1.1.0        reprex_2.0.2       DBI_1.1.3         
[17] cli_3.6.0          rstudioapi_0.14    yaml_2.3.7         parallel_4.2.2    
[21] xfun_0.37          fastmap_1.1.1      e1071_1.7-13       styler_1.9.1      
[25] withr_2.5.0        dplyr_1.1.0        knitr_1.42         fs_1.6.1          
[29] generics_0.1.3     vctrs_0.6.0        classInt_0.4-9     grid_4.2.2        
[33] tidyselect_1.2.0   glue_1.6.2         sf_1.0-11          R6_2.5.1          
[37] processx_3.8.0     fansi_1.0.4        pbapply_1.7-0      rmarkdown_2.20    
[41] clipr_0.8.0        callr_3.7.3        purrr_1.0.1        magrittr_2.0.3    
[45] ps_1.7.2           htmltools_0.5.4    units_0.8-1        collapse_1.9.3    
[49] utf8_1.2.3         KernSmooth_2.23-20 proxy_0.4-27       R.oo_1.25.0 

sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
       "3.9.3"        "3.5.2"        "8.2.1"         "true"         "true"        "8.2.1" 

@edzer
Member

edzer commented Apr 4, 2023

It's not clear to me what the issue is here: you get a clear error message, and use it to continue. What is also not clear from your issue is which column(s) is/are list columns, and why - can they non-trivially be converted into non-list columns, and if yes how? Also, how did they arise, did st_read do this, or some other software? It seems that st_read() can read GDAL Types of OFT*List, but cannot write such data back with st_write(). That could in principle be implemented, with some effort.
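As a minimal sketch of how the first question (which columns are list columns) could be answered, the following uses a toy object in place of `mm_data`. The object `x` and its columns `a` and `b` are hypothetical and are not taken from the OS data. The geometry column is also a list, so it is excluded by checking for class `sfc`.

```r
library(sf)

# Toy object standing in for mm_data (hypothetical; not the OS data)
x <- st_sf(geom = st_sfc(st_point(0:1), st_point(1:0)))
x$a <- list(1, 2:3)     # list column; the second feature holds two values
x$b <- c("u", "v")      # ordinary character column

# The geometry column is also a list (class "sfc"), so exclude it
is_list_col <- vapply(
  x, function(col) is.list(col) && !inherits(col, "sfc"), logical(1)
)
list_cols <- names(x)[is_list_col]
list_cols

# Number of values each feature carries in every list column
n_values <- lapply(
  setNames(list_cols, list_cols), function(nm) lengths(x[[nm]])
)
n_values
```

If every element of a list column has length 1, the column can probably be flattened with `unlist()`; other lengths would need a decision about how to combine or drop values.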

@robinlovelace-ate
Author

> did st_read do this, or some other software?

st_read() did it but it got 'fastbinded' thanks to this function lifted from @kadyb's Gist 🙏 https://github.com/acteng/mastermapr/blob/main/R/mm_read.R#L70

robinlovelace-ate added a commit to acteng/mastermapr that referenced this issue Apr 4, 2023
@robinlovelace-ate
Author

robinlovelace-ate commented Apr 4, 2023

Update: st_read() also creates the unwritable, error-generating object without the 'fastbind' step, as per the test in the commit above.

@edzer
Member

edzer commented Apr 4, 2023

Looks like the underlying problem is badly written GML, which is read as a bad data.frame; collapse::unlist2d() seems to do the trick but messes up the geometry list column.

@edzer
Member

edzer commented Apr 4, 2023

> x = st_sf(geom = st_sfc(st_point(0:1), st_point(1:0)))
> x$a = list(1, 2) # bad data.frame: should have c(1, 2) here
> x
Simple feature collection with 2 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 0 ymin: 0 xmax: 1 ymax: 1
CRS:           NA
         geom a
1 POINT (0 1) 1
2 POINT (1 0) 2
> st_write(x, "/tmp/x.gpkg")
writing: substituting ENGCRS["Undefined Cartesian SRS with unknown unit"] for missing CRS
Error in clean_columns(as.data.frame(obj), factorsAsCharacter) : 
  list columns are only allowed with raw vector contents

You probably don't want this to be written to an OFTIntegerList field in the GPKG, as it only propagates the problem.

@robinlovelace-ate
Author

robinlovelace-ate commented Apr 4, 2023

> Looks like the underlying problem is badly written GML, which is read as a bad data.frame; collapse::unlist2d() seems to do the trick but messes up the geometry list column.

My understanding was that the list columns are non-geometry attribute columns like roadName. Would the geometries be incorrect if unlist2d() messes with them?

The work-around I came up with was

coltypes = sapply(mm_data, class)
cols_remove = which(coltypes == "list")
mm_data_chars = mm_data[-cols_remove]

That fixed the problem documented in the original post.

I'm not sure what is in the list columns. I can investigate if that would be of use and interest.
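One caveat with the `which()`-based workaround: when no column has class "list", `which()` returns `integer(0)`, and a negative empty index selects nothing rather than everything. The sketch below shows this on a plain data.frame (the object `df` is a hypothetical stand-in). It has not been checked against the OS data, but the same pitfall would be expected with an sf object. Logical indexing avoids the problem.

```r
# Plain data.frame with no list columns (hypothetical stand-in)
df <- data.frame(id = 1:2, name = c("u", "v"))

coltypes <- sapply(df, class)
cols_remove <- which(coltypes == "list")  # integer(0): nothing to remove
ncol(df[-cols_remove])                    # 0: every column was dropped

# Logical indexing keeps everything when there is nothing to remove.
# The inherits() check protects an sf geometry column, which is also a list.
is_list_col <- vapply(
  df, function(col) is.list(col) && !inherits(col, "sfc"), logical(1)
)
df_chars <- df[!is_list_col]
ncol(df_chars)                            # 2
```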

@edzer
Member

edzer commented Apr 4, 2023

If removing the list columns is good enough, then it solves your problem. Another simple approach would be

> unl = function(x) { if(inherits(x, "sfc")) { x } else { unlist(x) } }
> data.frame(lapply(x, unl))
     geometry a
1 POINT (0 1) 1
2 POINT (1 0) 2
> data.frame(lapply(x, unl)) |> sapply(class)
$geometry
[1] "sfc_POINT" "sfc"      

$a
[1] "numeric"

but of course I don't have your data, so can only say it worked for my example.
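If the list elements do not all have length 1, `unlist()` would change the number of rows. In line with the issue title, one option is to collapse each element into a single character string before writing. This is only a sketch: the helper `list_to_chr()`, the separator, and the toy values are assumptions made for illustration, and none of this is part of sf.

```r
library(sf)

# Toy object with a multi-valued attribute (hypothetical values)
x <- st_sf(geom = st_sfc(st_point(0:1), st_point(1:0)))
x$a <- list(c("A970", "B9073"), character(0))

# Hypothetical helper: one string per feature, NA for empty elements
list_to_chr <- function(col, sep = "; ") {
  vapply(col, function(el) {
    if (length(el) == 0) NA_character_ else paste(el, collapse = sep)
  }, character(1))
}

# Convert every non-geometry list column in place
is_list_col <- vapply(
  x, function(col) is.list(col) && !inherits(col, "sfc"), logical(1)
)
for (nm in names(x)[is_list_col]) {
  x[[nm]] <- list_to_chr(x[[nm]])
}

# The object can now be written; delete_dsn overwrites an existing file
out <- file.path(tempdir(), "x.gpkg")
st_write(x, out, delete_dsn = TRUE, quiet = TRUE)
```

Collapsing to a string loses the original element types, so whether this is acceptable depends on how the column is used downstream.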

@robinlovelace-ate
Author

Thanks for the additional reprex, Edzer, and apologies for not bringing one. I will try to create a minimal example dataset based on the byzantine inputs I have from OS. If that code can pick up on edge cases like this, I'm thinking it could go into sf somewhere: would it be in scope for st_write(), st_as_sf(), or elsewhere?

@edzer
Member

edzer commented Apr 12, 2023

OK, please reopen if you have one.

@edzer edzer closed this as completed Apr 12, 2023