django-chunkator

Chunk large QuerySets into small chunks, and iterate over them without killing your RAM.

Tested with all the combinations of:

Python: 3.5, 3.6, 3.7, 3.8
Django: 2, 2.1, 2.2, 3.0, master

Note

Django 3.0 is incompatible with Python 3.5, see <https://docs.djangoproject.com/en/3.0/releases/3.0/#python-compatibility>

Usage

from chunkator import chunkator
for item in chunkator(LargeModel.objects.all(), 200):
    do_something(item)

This tool is intended to work on Django querysets.

Your model must define a pk field (this is done by default, but sometimes it can be overridden) and this pk has to be unique. django- chunkator has been tested with PostgreSQL and SQLite, using regular PKs and UUIDs as primary keys.

You can also use values():

from chunkator import chunkator
for item in chunkator(LargeModel.objects.values('pk', 'name'), 200):
    do_something(item)

Important

If you're using values() you have to add at least your "pk" field to the values, otherwise, the chunkator will throw a MissingPkFieldException.

Warning

This will not accelerate your process. Instead of having one BIG query, you'll have several small queries. This will save your RAM instead, because you'll not load a huge queryset result before looping on it.

If you want to manipulate the pages directly, you can use chunkator_page:

from chunkator import chunkator_page
queryset = LargeModel.objects.all().values('pk')
for page in chunkator_page(queryset, 200):
    launch_some_task([item['pk'] for item in page])

FAQ

How is django-chunkator different from Django's iterator?

If you have server-side cursors (using Postgres or Oracle & not setting DISABLE_SERVER_SIDE_CURSORS), then the main difference is that the cursor is in the hands of the application instead of the server. It really depends on your constraints, but sometimes server side cursors can put too much strains on your DB.

If you don't have server-side cursors, then chunkator will allow you to iterate over your queryset by batch, without relying on LIMIT/OFFSET. The problem with LIMIT/OFFSET is that computing a large offset (when you're at the end of your queryset) requires the DB to go through all the previous entries. With large tables this can be a huge issue.

Will django-chunkator preserve the ordering on my querysets?

No, it orders the queryset by pk. However you could do the same thing than chunkator with another field, given that it's unique and not nullable, see here for more details.

License

MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 120 Commits
.github		.github
chunkator		chunkator
demo		demo
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.rst		README.rst
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

django-chunkator

Usage

FAQ

License

About

Releases 9

Packages

Contributors 13

Languages

License

peopledoc/django-chunkator

Folders and files

Latest commit

History

Repository files navigation

django-chunkator

Usage

FAQ

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 9

Packages 0

Contributors 13

Languages

Packages