insert_all(..., alter=True) should work for new columns introduced after the first 100 records #139
Comments
So the reason for this is that the column list is only derived from the first batch of records (sketched below). With regard to the issue of ignoring columns, however, I made a fork and hacked a temporary fix, which is essentially the `db.py` change in the diff further down. It doesn't seem to affect performance enormously (but I've not tested it thoroughly), and it now does what I need (and would expect, tbh), but it now fails the existing test `test_insert_thousands_ignores_extra_columns_after_first_100` in `tests/test_create.py`. The existence of this test suggests the current behaviour is intentional. @simonw, is this something you'd be willing to look at a PR for? I assume you wouldn't want to change the default behaviour at this point, but perhaps an option could be provided, or at least a bit more of a warning in the docs. Are there oversights in the implementation that I've made? Would be grateful for your thoughts! Thanks!
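To illustrate the cause, here is a rough sketch of the relevant logic (my own reconstruction, not the actual sqlite-utils source): the column list is built once, from the first chunk, so keys that only appear later are dropped when rows are assembled.

```python
from itertools import islice


def chunks(iterable, size):
    # Yield successive lists of up to `size` items from `iterable`.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_all_sketch(records, batch_size=100):
    first = True
    all_columns = []
    for chunk in chunks(records, batch_size):
        if first:
            # The column list is derived from the first chunk only; keys that
            # first appear in later chunks never make it into all_columns.
            all_columns = sorted({column for record in chunk for column in record})
            first = False
        # Each row is built against all_columns, so values for columns the
        # first chunk didn't contain are silently discarded here.
        rows = [[record.get(column) for column in all_columns] for record in chunk]
        # ... build and execute an INSERT statement for `rows` ...
```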
This is deliberate behaviour, but I'm not at all attached to it - you're right in pointing out that it's actually pretty unexpected. I'd be happy to change this behaviour so that if you pass `alter=True`, columns introduced after the first 100 records are picked up and added as well.
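In usage terms, the proposed behaviour would look something like this (my own example, not code from the thread; it matches what eventually shipped in 2.16.1):

```python
import sqlite_utils

db = sqlite_utils.Database(memory=True)

records = [{"i": i, "word": "word_{}".format(i)} for i in range(100)]
# A column that only appears after the first batch of 100 records.
records.append({"i": 101, "extra": "Should trigger ALTER"})

# With alter=True the table is ALTERed to add the "extra" column instead of
# the value being silently dropped.
db["test"].insert_all(records, alter=True)

print(db["test"].columns_dict)
# {'i': <class 'int'>, 'word': <class 'str'>, 'extra': <class 'str'>}
```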
I'd be happy to accept a PR for this, provided it included updated unit tests that illustrate it working. I think this is a really good improvement.
Here's a suggested test update:

```diff
diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index a8791c3..12fa2f2 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -1074,6 +1074,13 @@ class Table(Queryable):
                 all_columns = list(sorted(all_columns))
                 if hash_id:
                     all_columns.insert(0, hash_id)
+                else:
+                    all_columns += [
+                        column
+                        for record in chunk
+                        for column in record
+                        if column not in all_columns
+                    ]
                 validate_column_names(all_columns)
                 first = False
                 # values is the list of insert data that is passed to the
diff --git a/tests/test_create.py b/tests/test_create.py
index a84eb8d..3a7fafc 100644
--- a/tests/test_create.py
+++ b/tests/test_create.py
@@ -707,13 +707,15 @@ def test_insert_thousands_using_generator(fresh_db):
     assert 10000 == fresh_db["test"].count
 
 
-def test_insert_thousands_ignores_extra_columns_after_first_100(fresh_db):
+def test_insert_thousands_adds_extra_columns_after_first_100(fresh_db):
+    # https://github.com/simonw/sqlite-utils/issues/139
     fresh_db["test"].insert_all(
         [{"i": i, "word": "word_{}".format(i)} for i in range(100)]
-        + [{"i": 101, "extra": "This extra column should cause an exception"}]
+        + [{"i": 101, "extra": "Should trigger ALTER"}],
+        alter=True,
     )
     rows = fresh_db.execute_returning_dicts("select * from test where i = 101")
-    assert [{"i": 101, "word": None}] == rows
+    assert [{"i": 101, "word": None, "extra": "Should trigger ALTER"}] == rows
 
 
 def test_insert_ignore(fresh_db):
```
That pull request should update this section of the documentation too: `sqlite-utils/docs/python-api.rst`, line 393 (at ea87c2b).
Thanks! And yeah, I had updating the docs on my list too :) Will try to get to it this afternoon (budgeting time is fraught with uncertainty at the moment!).
Exactly as suggested by @simonw at simonw#139.
…100 records

* Insert all columns for every chunk
* Update unit test to reflect new behaviour
* Test that exception is raised
* Update documentation

Closes #139. Thanks, Simon Wiles!
Thanks @simonwiles, this is now released in 2.16.1: https://sqlite-utils.readthedocs.io/en/stable/changelog.html
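One note on the "Test that exception is raised" item in the merged change above: with the new behaviour, leaving off `alter=True` should now surface the stray column as an error rather than dropping it. A minimal sketch of that, assuming sqlite-utils 2.16.1 or later (the exact error wording may differ):

```python
import sqlite3

import sqlite_utils

db = sqlite_utils.Database(memory=True)

records = [{"i": i, "word": "word_{}".format(i)} for i in range(100)]
records.append({"i": 101, "extra": "appears after the first 100 records"})

try:
    # Without alter=True the extra column is no longer silently dropped;
    # the INSERT fails because the table has no "extra" column.
    db["test"].insert_all(records)
except sqlite3.OperationalError as ex:
    print(ex)  # e.g. "table test has no column named extra"
```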
Is there a way to make `.insert_all()` work properly when new columns are introduced outside the first 100 records (with or without the `alter=True` argument)?

I'm using `.insert_all()` to bulk insert ~3-4k records at a time and it is common for records to need to introduce new columns. However, if new columns are introduced after the first 100 records, `sqlite_utils` doesn't even raise the `OperationalError: table ... has no column named ...` exception; it just silently drops the extra data and moves on.

It took me a while to find this little snippet in the documentation for `.insert_all()` (it's not mentioned under "Adding columns automatically on insert/update"):

I tried changing the `batch_size` argument to the total number of records, but it seems only to affect the number of rows that are committed at a time, and has no influence on this problem.

Is there a way around this that you would suggest? It seems like it should raise an exception at least.
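For concreteness, a minimal reproduction of the behaviour being described (my own reconstruction against a pre-2.16.1 version of sqlite-utils, not code taken from the issue):

```python
import sqlite_utils  # assumes a version prior to the 2.16.1 fix

db = sqlite_utils.Database(memory=True)

records = [{"i": i} for i in range(100)]
# This column first appears after the initial 100 records...
records.append({"i": 100, "extra": "silently dropped"})

db["example"].insert_all(records)

# ...and is neither added to the table nor reported as an error:
print(db["example"].columns_dict)  # {'i': <class 'int'>}
```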