Pull request 449 (associated with issue 446) breaks applications in the field and needs to be revisited #498

KRyden · 2019-08-21T23:25:58Z

Pull request #449's description is vague but is associated with a larger discussion in #446. There are several issues with this pull request - which has already been integrated into the working Geopackage document.

Issue#1 - the pull request removes the AUTOINCREMENT. AUTOINCREMENT is used in fielded software to allow SQLite to manage the unique row ID. The following code already in use in the field will no longer work without AUTOINCREMENT:

CREATE TABLE example1 (
id INTEGER PRIMARY KEY,
tag TEXT NOT NULL,
geom POINT);

And then insert like so:

INSERT INTO example1 (tag,geom)
VALUES (‘Row 1’,ST_GeomFromText(‘POINT (1 1)’,4326));

In this example, ID is autogenerated by SQLite - without this, the application would have to manage the rowid.

Issue#2: AUTOINCREMENT is still referenced in several tables - was the intent to remove it everywhere (which would be bad as discussed above)?

Issue#3: Requirement 150 probably does not meet modular specification requirements:

Requirement 150
A feature table or view SHALL have a column that uniquely identifies the row. For a feature table, the column SHOULD be an integer primary key. If there is no primary key column, the first column SHALL be of type INTEGER and SHALL contain unique values for each row.

The "SHOULD" would prevent Requirement 150 from being a testable requirement, and really doesn't add anything. this needs to be reviewed.

Issue#4: it's not clear that quorum was present for the adoption of this change. The wiki page shows 11 voting members, but only five present - 6 would be needed for quorum (50%, but odd number so +1). https://portal.opengeospatial.org/wiki/GEOPACKAGEswg/06-May-2019 - it's quite possible somebody else was there or came in late and was not logged as present - this is merely an observation.

Bottom line, issue#1 removal of AUTOINCREMENT breaks existing software.

jyutzler · 2019-08-22T01:55:32Z

After a thorough investigation, I do not believe any action is required here.

Your understanding of AUTOINCREMENT is incorrect. Most of the time, you do not need it. You can try this on your own - create a table without AUTOINCREMENT, add a row with a null primary key, delete the row, and add another. SQLite will manage the keys for you and assign the primary key value 1 to the second row (1 greater than the highest key already in use). The only time you need AUTOINCREMENT is if you are doing a lot of inserting and deleting and you want to prevent a primary key value from being reused.
I don't think believe we need to take any action on AUTOINCREMENT. It does not appear in any of the normative table definitions in Annex C, just the informative ones. Even in v1.0.1, the DDL for a features table was always informative. The ATS (http://www.geopackage.org/spec121/index.html#_vector_features_user_data_tables et. al.) and ETS don't test for it.
It is unclear to me how Requirement 150 would violate the modular specification. Technically you can have whatever type you want as a primary key and it can be in any position in the table. We are just adding another path that would allow views to function. Supporting views in GeoPackage has been part of the intent from the beginning (the words "or view" appear all over the 1.0.1 version) and multiple OGC members have directly asked me how to do it. Since primary keys don't exist for views in SQLite, it wasn't possible to implement a view with a strict interpretation of the requirements of v1.0, 1.1, or 1.2. PR 449 was designed to fix this as best as we could. Of course if you try to update a view it won't work, but IMHO that is working as designed.
This was a clerical error - I just failed to put a trailing * after Kevin and Tracey's names to make them bold to indicate their attendance. Tracey seconded a motion and that is proof she was in fact on the call. I have corrected the meeting notes.

jerstlouis · 2019-11-12T16:13:52Z

From https://www.sqlite.org/autoinc.html :

the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.

jyutzler · 2019-11-12T17:34:17Z

In today's SWG meeting, @KRyden requested more time to review the issue.

jyutzler · 2020-05-13T18:23:29Z

#517 should not be closed until this is completely adjudicated

jyutzler · 2020-06-10T12:19:52Z

Reopening this issue because of feedback we received during open comment.

The performance difference from adding AUTOINCREMENT is generally too small to notice.
We prefer that the autoincrement keyword still be required as it it ensures that the id column is a rowid alias (highly desirable for the RTREE extension), which makes it a form of error correction. Also, it still seems like an unnecessary behavior change.
I noticed that the NOT NULL keywords were removed from the id column declaration. While it is probably not necessary to have NOT NULL on the integer primary key column (since it is implied by primary key), but it is worth noting that SQLite only reports such a column as being NOT NULL if it is explicitly declared not NULL.

jerstlouis · 2020-06-10T12:44:21Z

re: 1, The performance is significant on a large enough number of records, and the size overhead is also significant over a larger file.
re: 2, I believe as soon as a column is an INTEGER PRIMARY KEY, then that column becomes an alias for the row ID (regardless of AUTOINCREMENT or not).
re: 3, What exactly is meant by reporting it as declared NOT NULL? If it's a primary key, it cannot be null, so it is quite redundant and unnecessary to say NOT NULL on an integer primary key.

bosborn · 2020-06-10T13:12:30Z

re: 1, The performance is significant on a large enough number of records, and the size overhead is also significant over a larger file.

The size overhead is not significant, 4096 bytes for the added sqlite_sequence table. K6a was added to allow people to omit the keyword for performance at their own risk

re: 2, I believe as soon as a column is an INTEGER PRIMARY KEY, then that column becomes an alias for the row ID (regardless of AUTOINCREMENT or not).

I agree that point 2 appears to be false.

re: 3, What exactly is meant by reporting it as declared NOT NULL? If it's a primary key, it cannot be null, so it is quite redundant and unnecessary to say NOT NULL on an integer primary key.

PRAGMA table_info(name) reports a 0 in the notnull column, a 1 in the pk column.

Suggestion for point 4, directly from the documentation:

"If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that changes the automatic ROWID assignment algorithm to prevent the reuse of ROWIDs over the lifetime of the database. In other words, the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows."

PeterAronson · 2020-06-10T17:47:55Z

Hi, I'm the technical lead on Keith's team for SQLite.

There's some additional information that may explain better what we're getting on about, so let me expand on what's above.

Is certainly a point on where reasonably people can differ, particularly about the question of what is a significant difference in performance. We haven't found it to be an issue, but that doesn't mean no one else will.
There are times when declaring a column INTEGER PRIMARY KEY does not make it an internal rowid alias: if the DESC keyword is also specified, or if the table is a WITHOUT ROWID table. So, consider the following examples:

sqlite> create table no_rowid_alias1 (id integer primary key desc);
sqlite> create table no_rowid_alias_autoinc1 (id integer primary key desc autoincrement);
Error: AUTOINCREMENT is only allowed on an INTEGER PRIMARY KEY
sqlite> create table no_rowid_alias2 (id integer primary key) without rowid;
sqlite> create table no_rowid_alias_autoincrement2 (id integer primary key autoincrement) without rowid;
Error: AUTOINCREMENT not allowed on WITHOUT ROWID tables
sqlite>

Notice that neither of the cases where the INTEGER PRIMARY KEY column isn't an internal rowid alias allows you to add the AUTOINCREMENT keyword. That's what I meant by AUTOINCREMENT guaranteeing correctness in that case.

Is just a SQLite quirk I noticed, and not something I thought was a serious issue. But if you don't declare an INTEGER PRIMARY KEY column as NOT NULL, it behaves just the same, but when you ask SQLite to describe the column, it doesn't report it as NOT NULL:

sqlite> create table id_not_null (id integer primary key not null);
sqlite> create table id_without_not_null (id integer primary key);
sqlite> pragma table_info(id_not_null);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           id          integer     1                       1
sqlite> pragma table_info(id_without_not_null);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           id          integer     0                       1
sqlite>

Notice the difference between the returned value for the notnull field between the first and second cases. It's not a big deal or even a reason necessarily not to make the change, but it is a difference, and I thought it worth pointing out as long as I was discussing the changes.

Hopefully this clarifies what we've been going on about.

fjlopez · 2020-06-11T16:51:17Z

Regarding to this quirk:

Is just a SQLite quirk I noticed, and not something I thought was a serious issue. But if you don't declare an INTEGER PRIMARY KEY column as NOT NULL, it behaves just the same, but when you ask SQLite to describe the column, it doesn't report it as NOT NULL:

It is not a quirk, it is just bad documentation in the SQLite website. The description of the pragma table_info at https://www.sqlite.org/pragma.html#pragma_table_info is deceitful.

This pragma returns one row for each column in the named table. Columns in the result set include the column name, data type, whether or not the column can be NULL, and the default value for the column.

In the source code is more clear:

notnull: True if 'NOT NULL' is part of column declaration

https://github.com/sqlite/sqlite/blob/b5aaee5e31c081b3ba5beeb563419f2d66be2656/src/pragma.c#L1135

The behaviour is as expected, the documentation is not. It should be:

This pragma returns one row for each column in the named table. Columns in the result set include the column name, data type, whether or not 'NOT NULL' is part of column declaration, and the default value for the column.

jyutzler · 2020-06-11T17:24:38Z

Regarding 2, I put language into 1a28a78 indicating that the rowid alias SHALL be present. I'm not sure how to test this for this, though.

On 3, we run into a testability issue. If we test for notnull, we would mark GeoPackages that don't set that flag as invalid even though there is no damage done, no risk to interoperability.

PeterAronson · 2020-06-11T17:27:53Z

(Oops! Sorry about the reply.)

@fjlopez
Actually, I was considering the behavior to be a bit of a quirk, but thinking about it, you may be correct. Here's a survey of SQLite's behavior in these cases:

create table pk_is_internal_rowid_alias (id integer primary key);
pragma table_info (pk_is_internal_rowid_alias);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           id          integer     0                       1
insert into pk_is_internal_rowid_alias values (NULL);
select * from pk_is_internal_rowid_alias;
id
----------
1

create table pk_is_internal_rowid_alias_nn (id integer not null primary key);
pragma table_info (pk_is_internal_rowid_alias_nn);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           id          integer     1                       1
insert into pk_is_internal_rowid_alias_nn values (NULL);
select * from pk_is_internal_rowid_alias_nn;
id
----------
1

create table pk_is_not_internal_rowid_alias (id integer primary key desc);
pragma table_info (pk_is_not_internal_rowid_alias);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           id          integer     0                       1
insert into pk_is_not_internal_rowid_alias values (NULL);
select * from pk_is_not_internal_rowid_alias;
id
----------


create table pk_is_not_internal_rowid_alias_nn (id integer not null primary key desc);
pragma table_info (pk_is_not_internal_rowid_alias_nn);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           id          integer     1                       1
insert into pk_is_not_internal_rowid_alias_nn values (NULL);
Error: near line 18: NOT NULL constraint failed: pk_is_not_internal_rowid_alias_nn.id

create table pk_is_not_internal_rowid_alias_wr (id integer primary key) without rowid;
pragma table_info (pk_is_not_internal_rowid_alias_wr);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           id          integer     1                       1
insert into pk_is_not_internal_rowid_alias_wr values (NULL);
Error: near line 22: NOT NULL constraint failed: pk_is_not_internal_rowid_alias_wr.id
sqlite>

It seems a bit inconsistent in places.

PeterAronson · 2020-06-11T17:38:43Z

@jyutzler This section of the SQLite documentation gives you some background: https://www.sqlite.org/lang_createtable.html#rowid.

I think the following criteria should work:

The table declaration in the sql column of the sqlite_master table does not end with without rowid (case-insensitive).
The id column id is the sole member of the primary key as indicated by table_info pragma.
The id column is declared integer.
The string primary key desc (case insensitive) does not occur in the sql column of sqlite_master row for the table.

Both 3. and 4. could be replaced by requiring the string <id_column> integer primary key (case-insensitive), where <id_column> is replaced by the actual name of the id column, be present in the sql column of sqlite_master entry for the table declaration.

PeterAronson · 2020-06-11T17:45:33Z

@jyutzler BTW, one possible minor issue with dealing with the alternate approach above, is SQLite allows 4 different ways of delimiting identifiers (5 if you count no delimiter):

sqlite> select sql from sqlite_master;
sql
-----------------------------------------------------------
CREATE TABLE t1 (id integer primary key)
CREATE TABLE t2 ("id" integer primary key)
CREATE TABLE t3 ([id] integer primary key)
CREATE TABLE t4 (`id` integer primary key)
CREATE TABLE t5 ('id' integer primary key)

So you would have to test for any of those.

fjlopez · 2020-06-11T17:46:51Z

I think that a query for the rowid column can also help for the without rowid.

sqlite> create table id_pk(id integer primary key);
sqlite> select rowid from id_pk;
sqlite> create table id_pk_without_rowid(id integer primary key) without rowid;
sqlite> select rowid from id_pk_without_rowid;
Error: no such column: rowid

fjlopez · 2020-06-11T17:53:25Z

pragma index_list(table) and then pragma index_xinfo(index_name) can reveal a desc for the primary column in such table. This can be an alternative to 4.

PeterAronson · 2020-06-11T17:58:43Z

@fjlopez Unless, of course, they create a column named rowid.

sqlite> create table gotcha (rowid integer primary key) without rowid;
sqlite> select rowid from gotcha;
sqlite>

However, using _rowid_ is probably safe enough -- who would name a column that (unless they were actively trying to be malicious).

The pragma approach looks like a good alternative. If I were writing from a C/C++ that is what I would probably use.

fjlopez · 2020-06-11T18:03:04Z

The pragma approach also works in the Xerial JDBC driver and since SQLITE 3.16 it is possible to use pragma functions:

sqlite> select * from pragma_table_info('id_pk');
0|id|integer|0||1

jyutzler · 2020-07-07T19:22:55Z

In light of #446 (comment), is there any further action required in this ticket?

PeterAronson · 2020-07-07T19:39:18Z

@jyutzler I do not believe there is any further action required for this ticket. The Pull Request you referenced handles the issues very cleanly. Thanks you!

jyutzler mentioned this issue May 20, 2020

Multiple clarifications #528

Merged

jyutzler closed this as completed in #528 Jun 4, 2020

jyutzler reopened this Jun 10, 2020

jyutzler pinned this issue Jun 10, 2020

jyutzler closed this as completed Sep 23, 2020

jyutzler added this to the 1.3.0 milestone Sep 23, 2020

jyutzler unpinned this issue Sep 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pull request 449 (associated with issue 446) breaks applications in the field and needs to be revisited #498

Pull request 449 (associated with issue 446) breaks applications in the field and needs to be revisited #498

KRyden commented Aug 21, 2019 •

edited by jyutzler

Loading

jyutzler commented Aug 22, 2019 •

edited

Loading

jerstlouis commented Nov 12, 2019

jyutzler commented Nov 12, 2019

jyutzler commented May 13, 2020

jyutzler commented Jun 10, 2020

jerstlouis commented Jun 10, 2020

bosborn commented Jun 10, 2020

PeterAronson commented Jun 10, 2020

fjlopez commented Jun 11, 2020 •

edited

Loading

jyutzler commented Jun 11, 2020

PeterAronson commented Jun 11, 2020

PeterAronson commented Jun 11, 2020 •

edited

Loading

PeterAronson commented Jun 11, 2020

fjlopez commented Jun 11, 2020

fjlopez commented Jun 11, 2020 •

edited

Loading

PeterAronson commented Jun 11, 2020

fjlopez commented Jun 11, 2020

jyutzler commented Jul 7, 2020

PeterAronson commented Jul 7, 2020

Pull request 449 (associated with issue 446) breaks applications in the field and needs to be revisited #498

Pull request 449 (associated with issue 446) breaks applications in the field and needs to be revisited #498

Comments

KRyden commented Aug 21, 2019 • edited by jyutzler Loading

jyutzler commented Aug 22, 2019 • edited Loading

jerstlouis commented Nov 12, 2019

jyutzler commented Nov 12, 2019

jyutzler commented May 13, 2020

jyutzler commented Jun 10, 2020

jerstlouis commented Jun 10, 2020

bosborn commented Jun 10, 2020

PeterAronson commented Jun 10, 2020

fjlopez commented Jun 11, 2020 • edited Loading

jyutzler commented Jun 11, 2020

PeterAronson commented Jun 11, 2020

PeterAronson commented Jun 11, 2020 • edited Loading

PeterAronson commented Jun 11, 2020

fjlopez commented Jun 11, 2020

fjlopez commented Jun 11, 2020 • edited Loading

PeterAronson commented Jun 11, 2020

fjlopez commented Jun 11, 2020

jyutzler commented Jul 7, 2020

PeterAronson commented Jul 7, 2020

KRyden commented Aug 21, 2019 •

edited by jyutzler

Loading

jyutzler commented Aug 22, 2019 •

edited

Loading

fjlopez commented Jun 11, 2020 •

edited

Loading

PeterAronson commented Jun 11, 2020 •

edited

Loading

fjlopez commented Jun 11, 2020 •

edited

Loading