Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure insert API has good tests for rowid and compound primark key tables #1873

Open
simonw opened this issue Oct 30, 2022 · 11 comments
Open

Comments

@simonw
Copy link
Owner

simonw commented Oct 30, 2022

Following:

I need to design and implement various edge-cases or primary keys:

  • Table without an auto-incrementing primary key
  • Table with compound primary keys
  • Table with just a rowid
@simonw simonw mentioned this issue Oct 30, 2022
17 tasks
@simonw simonw added this to the Datasette 1.0 milestone Oct 30, 2022
@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

Relevant TODO:

# Validate columns of each row
columns = await db.table_columns(table_name)
# TODO: There are cases where pks are OK, if not using auto-incrementing pk
pks = await db.primary_keys(table_name)
allowed_columns = set(columns) - set(pks)

@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

If a table has an auto-incrementing primary key, should you be allowed to insert records with an explicit key into it?

I'm torn on this one. It's something you can do with direct database access, but it's something I very rarely want to do.

I'm inclined to disallow it and say that if you want that you can get it using a writable canned query instead.

Likewise, I'm not going to provide a way to set the rowid explicitly on a freshly inserted row.

@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

I checked and SQLite itself does allow you to set the rowid on that kind of table - it then increments from whatever you inserted:

% sqlite3 /tmp/t.db
SQLite version 3.39.4 2022-09-07 20:51:41
Enter ".help" for usage hints.
sqlite> create table docs (title text);
sqlite> insert into docs (title) values ('one');
sqlite> select rowid, title from docs;
1|one
sqlite> insert into docs (rowid, title) values (3, 'three');
sqlite> select rowid, title from docs;
1|one
3|three
sqlite> insert into docs (title) values ('another');
sqlite> select rowid, title from docs;
1|one
3|three
4|another

@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

Some notes on what Datasette does already

https://latest.datasette.io/fixtures/tags.json?_shape=array returns:

[
  {
    "tag": "canine"
  },
  {
    "tag": "feline"
  }
]

That table is defined like this:

CREATE TABLE tags (
    tag TEXT PRIMARY KEY
);

Here's a rowid table with no explicit primary key: https://latest.datasette.io/fixtures/binary_data

https://latest.datasette.io/fixtures/binary_data.json?_shape=array

[
  {
    "rowid": 1,
    "data": {
      "$base64": true,
      "encoded": "FRwCx60F/g=="
    }
  },
  {
    "rowid": 2,
    "data": {
      "$base64": true,
      "encoded": "FRwDx60F/g=="
    }
  },
  {
    "rowid": 3,
    "data": null
  }
]
CREATE TABLE binary_data (
    data BLOB
);

https://latest.datasette.io/fixtures/simple_primary_key has a text primary key:

https://latest.datasette.io/fixtures/simple_primary_key.json?_shape=array

[
  {
    "id": "1",
    "content": "hello"
  },
  {
    "id": "2",
    "content": "world"
  },
  {
    "id": "3",
    "content": ""
  },
  {
    "id": "4",
    "content": "RENDER_CELL_DEMO"
  },
  {
    "id": "5",
    "content": "RENDER_CELL_ASYNC"
  }
]
CREATE TABLE simple_primary_key (
  id varchar(30) primary key,
  content text
);

https://latest.datasette.io/fixtures/compound_primary_key is a compound primary key.

https://latest.datasette.io/fixtures/compound_primary_key.json?_shape=array

[
  {
    "pk1": "a",
    "pk2": "b",
    "content": "c"
  },
  {
    "pk1": "a/b",
    "pk2": ".c-d",
    "content": "c"
  }
]
CREATE TABLE compound_primary_key (
  pk1 varchar(30),
  pk2 varchar(30),
  content text,
  PRIMARY KEY (pk1, pk2)
);

@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

Actually, for simplicity I'm going to say that you can always set the primary key, even for auto-incrementing primary key columns... but you cannot set it on pure rowid columns.

@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

One last case to consider: WITHOUT ROWID tables.

https://www.sqlite.org/withoutrowid.html

By default, every row in SQLite has a special column, usually called the "rowid", that uniquely identifies that row within the table. However if the phrase "WITHOUT ROWID" is added to the end of a CREATE TABLE statement, then the special "rowid" column is omitted. There are sometimes space and performance advantages to omitting the rowid.

...

Every WITHOUT ROWID table must have a PRIMARY KEY. An error is raised if a CREATE TABLE statement with the WITHOUT ROWID clause lacks a PRIMARY KEY.

@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

So maybe they're not actually worth worrying about separately, because they are guaranteed to have a primary key set.

@simonw
Copy link
Owner Author

simonw commented Oct 30, 2022

I think the key feature I need here is going to be the equivalent of ignore=True and replace=True for dealing with primary key collisions, see https://sqlite-utils.datasette.io/en/stable/reference.html#sqlite_utils.db.Table.insert

@simonw
Copy link
Owner Author

simonw commented Nov 1, 2022

Design decision:

{
  "rows": [{"id": 1, "title": "The title"}],
  "ignore": true
}

Or "replace": true.

@simonw
Copy link
Owner Author

simonw commented Nov 1, 2022

It's a bit surprising that you can send "ignore": true, "return_rows": true and the returned "inserted" key will list rows that were NOT inserted (since they were ignored).

Three options:

  1. Ignore that and document it
  2. Fix it so "inserted" only returns rows that were actually inserted (bit tricky)
  3. Change the name of "inserted" to something else

I'm picking 3 - I'm going to change it to be called "rows" instead.

simonw added a commit that referenced this issue Nov 1, 2022
Also removed the rule that you cannot include primary keys in the rows you insert.

And added validation that catches invalid parameters in the incoming JSON.

And renamed "inserted" to "rows" in the returned JSON for return_rows: true
@simonw
Copy link
Owner Author

simonw commented Nov 1, 2022

I forgot to document ignore and replace. Also I need to add tests that cover:

  • Forgetting to include a primary key on a non-autoincrement table
  • Compound primary keys
  • Rowid only tables with and without rowid specified

I think my validation logic here will get caught out by the fact that rowid does not show up as a valid column name:

# Validate columns of each row
columns = set(await db.table_columns(table_name))
for i, row in enumerate(rows):
invalid_columns = set(row.keys()) - columns
if invalid_columns:
errors.append(
"Row {} has invalid columns: {}".format(
i, ", ".join(sorted(invalid_columns))
)
)

simonw added a commit that referenced this issue Nov 1, 2022
Also improved API explorer to show HTTP status of response, refs #1871
@simonw simonw modified the milestones: Datasette 1.0, Datasette 1.0a0 Nov 15, 2022
@simonw simonw closed this as completed Dec 1, 2022
@simonw simonw reopened this Dec 1, 2022
@simonw simonw changed the title Decide what to do about tables without primary keys / with compound primary keys Ensure insert API has good tests for rowid and compound primark key tables Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant