Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug when first record contains fewer columns than subsequent records #145

Closed
simonwiles opened this issue Aug 30, 2020 · 2 comments
Closed
Labels
bug Something isn't working

Comments

@simonwiles
Copy link
Contributor

simonwiles commented Aug 30, 2020

insert_all() selects the maximum batch size based on the number of fields in the first record. If the first record has fewer fields than subsequent records (and alter=True is passed), this can result in SQL statements with more than the maximum permitted number of host parameters. This situation is perhaps unlikely to occur, but could happen if the first record had, say, 10 columns, such that batch_size (based on SQLITE_MAX_VARIABLE_NUMBER = 999) would be 99. If the next 98 rows had 11 columns, the resulting SQL statement for the first batch would have 10 * 1 + 11 * 98 = 1088 host parameters (and subsequent batches, if the data were consistent from thereon out, would have 99 * 11 = 1089).

I suspect that this bug is masked somewhat by the fact that while:

SQLITE_MAX_VARIABLE_NUMBER ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.

it is common that it is increased at compile time. Debian-based systems, for example, seem to ship with a version of sqlite compiled with SQLITE_MAX_VARIABLE_NUMBER set to 250,000, and I believe this is the case for homebrew installations too.

A test for this issue might look like this:

def test_columns_not_in_first_record_should_not_cause_batch_to_be_too_large(fresh_db):
    # sqlite on homebrew and Debian/Ubuntu etc. is typically compiled with
    #  SQLITE_MAX_VARIABLE_NUMBER set to 250,000, so we need to exceed this value to
    #  trigger the error on these systems.
    THRESHOLD = 250000
    extra_columns = 1 + (THRESHOLD - 1) // 99
    records = [
        {"c0": "first record"},  # one column in first record -> batch_size = 100
        # fill out the batch with 99 records with enough columns to exceed THRESHOLD
        *[
            dict([("c{}".format(i), j) for i in range(extra_columns)])
            for j in range(99)
        ]
    ]
    try:
        fresh_db["too_many_columns"].insert_all(records, alter=True)
    except sqlite3.OperationalError:
        raise

The best solution, I think, is simply to process all the records when determining columns, column types, and the batch size. In my tests this doesn't seem to be particularly costly at all, and cuts out a lot of complications (including obviating my implementation of #139 at #142). I'll raise a PR for your consideration.

@simonwiles
Copy link
Contributor Author

simonwiles commented Aug 30, 2020

Note: had to adjust the test above because trying to exhaust a SQLITE_MAX_VARIABLE_NUMBER of 250000 in 99 records requires 2526 columns, and trips the "Rows can have a maximum of {} columns".format(SQLITE_MAX_VARS) check even before it trips the default SQLITE_MAX_COLUMN value (2000).

simonwiles added a commit to simonwiles/sqlite-utils that referenced this issue Aug 30, 2020
@simonw simonw added the bug Something isn't working label Aug 31, 2020
simonwiles added a commit to simonwiles/sqlite-utils that referenced this issue Sep 6, 2020
simonwiles added a commit to simonwiles/sqlite-utils that referenced this issue Sep 7, 2020
simonwiles added a commit to simonwiles/sqlite-utils that referenced this issue Sep 8, 2020
simonwiles added a commit to simonwiles/sqlite-utils that referenced this issue Sep 8, 2020
simonw pushed a commit that referenced this issue Sep 8, 2020
…a columns

Refs #145.

* Extract build_insert_queries_and_params
* Extract insert_chunk so it can be called recursively

Thanks, @simonwiles
@simonw
Copy link
Owner

simonw commented Sep 8, 2020

Fixed in PR #146.

@simonw simonw closed this as completed Sep 8, 2020
simonw added a commit that referenced this issue Sep 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants