st_write loses data when writing to ESRI Shapefile #464
Comments
The 10-character limit on field names is a known feature of shapefiles and dates way back (10 is generous; MS-DOS allowed a maximum of 8 in file names). Migrate to GPKG or another more modern format, or manually shorten and disambiguate field names before writing, for example using base::abbreviate().
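As a rough illustration of the manual route, here is a minimal base R sketch. The column names are invented for the example. With `strict = TRUE`, `abbreviate()` honours `minlength` but no longer guarantees unique results, so the sketch checks uniqueness explicitly.

```r
# Hypothetical column names, made up for this example.
nms <- c("id", "population_density", "population_total")

# strict = TRUE keeps every result within minlength characters,
# at the price of possibly returning duplicates.
short <- abbreviate(nms, minlength = 10, strict = TRUE, named = FALSE)

# Fail early if the shortened names are too long or collide.
stopifnot(all(nchar(short, type = "bytes") <= 10),
          anyDuplicated(short) == 0)
short
```

The shortened vector could then be assigned to the attribute columns before calling st_write(); names that already fit, such as "id", are left unchanged.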
Can we check and abbreviate column names before attempting st_write() with the "ESRI Shapefile" driver? The driver will do it anyway, causing data loss. We could warn users that the field names have been abbreviated to comply with the driver's limitations.
sp uses base::abbreviate() to handle this issue automatically, and I see no reason why sf can't do the same thing. If not, there needs to be a clear warning in st_write() that long column names mean data loss when the output format is shp.
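To make the "clear warning" idea concrete, here is a sketch of a pre-write check that a user could run today. `find_long_field_names()` is a hypothetical helper, not part of sf, and the data frame is invented. For an sf object the geometry column would need to be excluded through the `skip` argument, which is also an assumption of this sketch.

```r
# Hypothetical helper (not an sf function): report column names longer
# than the 10-character DBF limit before the data is written.
find_long_field_names <- function(x, max_len = 10L, skip = character()) {
  nms <- setdiff(names(x), skip)
  too_long <- nms[nchar(nms, type = "bytes") > max_len]
  if (length(too_long) > 0L) {
    warning("field names longer than ", max_len, " bytes: ",
            paste(too_long, collapse = ", "), call. = FALSE)
  }
  invisible(too_long)
}

# Made-up example data; a plain data.frame stands in for the attribute table.
df <- data.frame(id = 1:2, mean_annual_rainfall = c(812.5, 640.1))
long <- suppressWarnings(find_long_field_names(df))
long
```

The check only reports; whether to rename, abort, or switch format is left to the caller, which sidesteps the disagreement in this thread about trimming names automatically.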
Not sp, rgdal::writeOGR(). Shapefiles were the only option then; maptools::writeSpatial() did this through foreign::write.dbf(), whose helpfile says:
Why help people to (ab)use shapefiles when we want them to migrate?
Ah, so it is. Well, I'd love to migrate, but I'm stuck in a very ESRI-centric workplace with change-averse colleagues, so moving on is a bit of a fraught process. Without broader institutional support, I'm just That Coworker. The other issue for now is gpkg's slow disk write speed, which can be very inconvenient.
Unfortunately, I second @obrl-soil's comment, from a non-academic setting. Although I could push gpkg through internally (as I already do for small data sets), the disk write speed is a big inconvenience for large data sets. That doesn't mean I am in favour of trimming names automatically, btw.
This is related to this thread? The foreign approach is the same as OGR/shapelib: truncate, risking non-unique field names. In rgdal::writeOGR(), base::abbreviate() is used. Does anyone know how encoding affects the length constraint? It is bytes, not characters, isn't it? On UTF-8:
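The bytes-versus-characters distinction can be illustrated in base R. This is only a sketch with an invented field name, and it assumes (as the question above suspects) that the limit is counted in bytes; it does not show what any driver actually does.

```r
# Invented name "größenmaße", written with \u escapes so that the string
# is UTF-8 encoded regardless of the session locale.
nm <- "gr\u00f6\u00dfenma\u00dfe"

n_chars <- nchar(nm, type = "chars")  # characters as a reader counts them
n_bytes <- nchar(nm, type = "bytes")  # bytes in the UTF-8 encoding

c(chars = n_chars, bytes = n_bytes)
```

The name is 10 characters long but takes 13 bytes in UTF-8, because each of the three non-ASCII letters needs two bytes. Under a byte-based limit it would be too long even though it looks like it fits.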
@rsbivand yes, that is what I see too. I really need to find some time to investigate...
When I write a shp using
st_write(shp1, "myshp.shp", driver="ESRI Shapefile")
if the column names in the attribute table are too long for ESRI, the output .dbf shortens these column names BUT ALSO deletes any data in those columns. I have tested this with integer and numeric data: columns write fine if col names are < 10 characters but end up blank if col names are > 10 characters.