BUG: Exporting a GeoDataframe with column 'fid' to GeoPackage will delete that column #1035

joehuanguf · 2019-07-02T21:05:59Z

Steps:

Create a GeoDataframe with a column 'fid'
Export as a GeoPackage using to_file()
Read the exported GeoPackage

Bug

The 'fid' column should exist, but is missing.

Hunches

A scan of both the GeoPandas and the Fiona source code reveal no reason this bug should occur, so I suspect that it's actually an issue with GDAL.

The text was updated successfully, but these errors were encountered:

kuanb · 2019-07-02T21:09:19Z

Example script to recreate:

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

columns = ['fid', 'gid', 'id']
data = [(1, 1.0, '1')]

df = pd.DataFrame(data, columns=columns)

p = Point(1,2)
gdf = gpd.GeoDataFrame(df, geometry=[p])

print(gdf, '\n')


gdf.to_file('ohm/test', driver='GPKG', limit=0)
gdf2 = gpd.read_file('ohm/test')

print(gdf2)

Output will be as follows:

   fid  gid id     geometry
0    1  1.0  1  POINT (1 2) 

   gid id     geometry
0  1.0  1  POINT (1 2)

fid is trimmed at some point during the export/import operation.

I noticed in ogr2ogr documentation the flag for preserving fid. I wonder if this is related to the need to pass through specific parameters/flags to fiona?

Link: https://gdal.org/programs/ogr2ogr.html#cmdoption-ogr2ogr-preserve-fid

kuanb · 2019-07-02T21:44:24Z

Update:

This is expected behavior for the driver and occurs on write. GPKG uses "fid" as the index. If present, it is assigned as the index "id." A more clear example would be to use a number other than 1 for the "fid" value in the above example. Let's say we use "12" instead.

The resulting output would have the "fid" column missing from the columns but the index value for the first row would be "12" instead of "0."

Reading in the file with fiona we can see:

with fiona.open('ohm/test') as f:
    feats = list(f)
    print(feats)

# [{'type': 'Feature', 'id': '12', 'properties': OrderedDict([('gid', 1.0), ('id', '1')]), 'geometry': {'type': 'Point', 'coordinates': (1.0, 2.0)}}]

In the above example we can see the "12" value being assigned to the feature id. But it is not preserved and carried through to the index in GeoPandas:

gpd.read_file('foo/test')

Returns...

	gid	id	geometry
0	1.0	1	POINT (1 2)

kuanb · 2019-07-03T18:53:40Z

I took some additional notes on what is happening and captured it here: http://kuanbutts.com/2019/07/02/gpkg-write-from-geopandas/

Ultimately, I am not sure if this is a decision GeoPandas should be making. I can see the argument that this is something that a user might want to specifically handle themselves.

The only item I think might be valid in terms of making a modification to GeoPandas would be that, when reading in a GPKG, create an additional column fid and use id values present to create values for fid.

jorisvandenbossche · 2019-07-06T14:33:57Z

@kuanb thanks for the exploration, very useful!

There is also some related discussion to FIDs in fiona in Toblerity/Fiona#327. Additional problem in handling this consistently is that it probably also depends on the driver.
And some discussion about id's specific for GeoJSON:#390

Also not sure what is best here. We could set the ids as the index or as an 'fid' column, it's not that hard code-wise if we would decide that is best (although it is somewhat backwards incompatible). But I would prefer to not do too many driver-specific things to geopandas.
Eg roundtrip to Shapefile or GeoJSON do preserve the fid column, and their id's become simple integers starting at 0. But in that light, it is certainly confusing that GeoPackage behaves differently

stdmn · 2019-10-24T21:06:04Z

Any update on this? Would be great to have the option of explicity indexing by the GeoJSON feature IDs instead of just re-sequencing based on read order.

I'm specifically interested in this because I'm splitting the data between vector tiles and a Postgres Database, so if I can't specify the index manually, there's a chance that features could end up with different indices which would make it impossible to share information between the tiles and the DB.

theroggy · 2020-06-02T14:34:24Z

Also interested in this. Mainly for the part about reading the fid values properly from an existing geopackage rather than writing the fid column.

I need to do some checks on the geometries in a geopackage file and print out which rows need to be manually checked, and the fid would be the logical thing to print out for users to find the rows to be checked... but not possible due to the current behaviour of just ditching the fid column.

At first sight, I think the safest solution would be to add an fid column rather than change the pandas index for (backwards)compatibility reasons...

jorisvandenbossche · 2020-06-12T13:13:34Z

I am still a bit hesitant to always convert the id to an "fid" column, when reading from a GeoPackage file: 1) it introduces GPKG specific code in geopandas and 2) it adds an additional column for all users, also when you actually don't use/need it.

But I certainly understand that, if you want access to the FID of the GeoPackage file, the current situation is quite annoying..

So some option I can think of:

Simply always include a "fid" column on GPKG file reading (but only for GPKG, I think?)
Have an option to indicate to preserve the id when reading (then this could be done for all formats, I think, and we don't necessarily need to make it specific to GPK?)
Warn users when they write a GeoDataFrame with an "fid" column to GPKG that their column will be lost when reading back (this would be an option if we decide to not change reading, to avoid the roundtrip surprise, but of course in itself doesn't solve the use case of accessing GPKG's FID)

Thoughts? Somebody interested in doing a PR for one of those?

theroggy · 2022-03-23T16:21:16Z

I forgot about this back then... but a new case where a solution for this would have made life a lot easier arrived...

So, in response to @jorisvandenbossche: I think your second option sounds great. It is safe backwards-compatibility wise, and as it is an index, it sounds reasonable that it would end up as the index of the resulting GeoDataFrame as well.

I also noticed pyogrio.read_dataframe supports it already (on the master branch, not yet in the released latest version), via a parameter called "fid_as_index=False":
https://pyogrio.readthedocs.io/en/latest/api.html#geopandas-integration

MaxDragonheart · 2022-10-22T08:58:35Z

Any solution? In my project fid is very important because contains unique values that I use for a lot of processes.

MaxDragonheart · 2022-10-22T10:46:46Z

This is my workaround, hope it can be usefull https://gist.github.com/MaxDragonheart/46445a150aac9d528dadd2ec877203a5

jorisvandenbossche mentioned this issue Jul 16, 2019

ENH: to_file should write index in certain cases #1059

Merged

martinfleis added the bug label Dec 30, 2022

martinfleis added this to the 1.0 milestone Dec 30, 2022

theroggy mentioned this issue Feb 24, 2023

DOC: Clarify how to access "fid" from geopackage #2794

Open

3 tasks

jorisvandenbossche modified the milestones: 1.0, 1.1 Jun 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Exporting a GeoDataframe with column 'fid' to GeoPackage will delete that column #1035

BUG: Exporting a GeoDataframe with column 'fid' to GeoPackage will delete that column #1035

joehuanguf commented Jul 2, 2019 •

edited

Loading

kuanb commented Jul 2, 2019

kuanb commented Jul 2, 2019 •

edited

Loading

kuanb commented Jul 3, 2019

jorisvandenbossche commented Jul 6, 2019

stdmn commented Oct 24, 2019

theroggy commented Jun 2, 2020 •

edited

Loading

jorisvandenbossche commented Jun 12, 2020

theroggy commented Mar 23, 2022

MaxDragonheart commented Oct 22, 2022

MaxDragonheart commented Oct 22, 2022

BUG: Exporting a GeoDataframe with column 'fid' to GeoPackage will delete that column #1035

BUG: Exporting a GeoDataframe with column 'fid' to GeoPackage will delete that column #1035

Comments

joehuanguf commented Jul 2, 2019 • edited Loading

kuanb commented Jul 2, 2019

kuanb commented Jul 2, 2019 • edited Loading

kuanb commented Jul 3, 2019

jorisvandenbossche commented Jul 6, 2019

stdmn commented Oct 24, 2019

theroggy commented Jun 2, 2020 • edited Loading

jorisvandenbossche commented Jun 12, 2020

theroggy commented Mar 23, 2022

MaxDragonheart commented Oct 22, 2022

MaxDragonheart commented Oct 22, 2022

joehuanguf commented Jul 2, 2019 •

edited

Loading

kuanb commented Jul 2, 2019 •

edited

Loading

theroggy commented Jun 2, 2020 •

edited

Loading