@simonw simonw commented Nov 22, 2025

Prompt to Claude Code for web:

Experiment with modifying simonw/sqlite-utils from GitHub such that the insert_all and upsert_all methods can optionally be passed a Python iterator that yields lists instead of dicts

Currently those methods expect to iterate over dicts where the keys become DB column names and the values are the data for each row

This is inefficient for large amounts of data

The new feature will extend those methods such that they detect if the first value yielded by that iterator is a list or a dictionary

If a dictionary they work as they do now

If a list then they switch to a different mode. They check that the first value returned is a list of strings - representing column names. If not a list of strings they raise an error

Then every subsequent list yielded is treated as a row - all types are supported. These are efficiently inserted into the database table.
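The detection rule described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in the PR or the sqlite-utils API: `insert_rows` is a hypothetical helper, it uses `sqlite3` directly, and it creates untyped columns for simplicity.

```python
import itertools
import sqlite3


def insert_rows(conn, table, records):
    """Hypothetical helper: insert dicts, or a header list followed by row lists."""
    iterator = iter(records)
    try:
        first = next(iterator)  # peek at the first yielded value to pick a mode
    except StopIteration:
        return  # empty iterator: nothing to insert

    if isinstance(first, dict):
        # Dict mode: the keys of the first record define the columns.
        columns = list(first.keys())
        rows = (
            tuple(record.get(col) for col in columns)
            for record in itertools.chain([first], iterator)
        )
    elif isinstance(first, list):
        # List mode: the first list must be a non-empty list of column names.
        if not first or not all(isinstance(name, str) for name in first):
            raise ValueError("first list must contain only column name strings")
        columns = first
        # Every subsequent list is a row; values of any type pass straight through.
        rows = (tuple(row) for row in iterator)
    else:
        raise TypeError("records must yield dicts or lists")

    def quote(name):
        # Quote identifiers so unusual column or table names stay valid SQL.
        return '"{}"'.format(name.replace('"', '""'))

    column_sql = ", ".join(quote(col) for col in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS {} ({})".format(quote(table), column_sql)
    )
    conn.executemany(
        "INSERT INTO {} ({}) VALUES ({})".format(
            quote(table), column_sql, placeholders
        ),
        rows,
    )
```

In list mode the rows go to `executemany` without building a dict per row, which is where the prompt expects the savings to come from.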

The goal of this research is to identify how much of a performance benefit this system could provide. Run benchmarks of the original code, then after you have implemented the new feature run benchmarks of both the dictionary and the list usage. Design these benchmarks to cover several realistic scenarios. Generate performance charts and embed them in the final README.
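A benchmark of this kind could be structured roughly as follows. This is a hedged sketch, not the harness used for the report: `time_insert`, `dict_rows`, `list_rows`, and the column names are hypothetical, and it times plain `sqlite3` inserts rather than sqlite-utils itself.

```python
import sqlite3
import time

# Hypothetical column set; the real benchmarks varied the column count.
COLUMNS = ["c0", "c1", "c2", "c3", "c4"]


def dict_rows(n):
    """Build a dict per row, then flatten it, mimicking dict-based input."""
    for i in range(n):
        record = {col: i for col in COLUMNS}
        yield tuple(record[col] for col in COLUMNS)


def list_rows(n):
    """Yield plain lists, mimicking list-based input."""
    for i in range(n):
        yield [i] * len(COLUMNS)


def time_insert(make_rows, n):
    """Return (elapsed_seconds, row_count) for inserting n rows in memory."""
    conn = sqlite3.connect(":memory:")
    column_sql = ", ".join(COLUMNS)
    conn.execute("CREATE TABLE t ({})".format(column_sql))
    sql = "INSERT INTO t ({}) VALUES ({})".format(
        column_sql, ", ".join("?" for _ in COLUMNS)
    )
    start = time.perf_counter()
    with conn:  # commit once, after all rows are inserted
        conn.executemany(sql, make_rows(n))
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for label, factory in [("dict", dict_rows), ("list", list_rows)]:
        elapsed, count = time_insert(factory, 100_000)
        print("{}: {} rows in {:.3f}s".format(label, count, elapsed))
```

Timings from a sketch like this depend on the machine and on row shape, so repeated runs and several column counts would be needed before drawing conclusions.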

Run the existing test suite before you start work, then continue to use those tests to ensure you have not broken anything as you make changes. Add new, passing tests that cover the new functionality.

At the end of the project, submit your report as well as a diff against the latest sqlite-utils main branch.

Be sure to clone simonw/sqlite-utils to a tmp directory to ensure you do not accidentally try to commit and push THAT repo later on - I only want the report and the diff.


Final report: https://github.com/simonw/research/blob/main/sqlite-utils-iterator-support/README.md


Claude Code wrote this PR:

Implemented and benchmarked list-based iteration for insert_all/upsert_all methods.
This allows passing iterators that yield lists instead of dicts for better performance
with certain data patterns.

Key findings:

  • Up to 21.6% performance improvement for datasets with 5-10 columns
  • 100% backward compatible, all 1001 existing tests pass
  • 10 new tests added for list mode functionality
  • Performance varies by column count (better with fewer columns)

Deliverables:

  • Complete implementation (222-line diff)
  • Comprehensive test suite
  • Performance benchmarks across 6 scenarios
  • Visual analysis with 4 charts
  • Full technical report

@simonw simonw merged commit 5bc7378 into main Nov 22, 2025
@simonw simonw deleted the claude/sqlite-utils-iterator-support-01EW7y2w1SKDxAjYKAiwjbot branch November 22, 2025 01:14
simonw added a commit to simonw/sqlite-utils that referenced this pull request Nov 23, 2025