
chunking for json normalization #914

Merged
merged 1 commit into materialsproject:main on Feb 5, 2024

Conversation

@kbuma (Contributor) commented on Feb 5, 2024

Summary

Collections such as materials can end up with over 4 million columns if the whole collection is JSON-normalized at once. Attempting to process a collection like this caused a memory error.

The solution is to split the collection into chunks, normalize and filter each chunk, and then pull all the filtered chunks together (sketched below).
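
As a rough illustration of that flow, here is a minimal sketch assuming `pandas.json_normalize` is the normalizer; the helper name, `fields` parameter, and chunk size are hypothetical, not the PR's actual code:

```python
import pandas as pd

def normalize_in_chunks(docs: list[dict], fields: list[str], chunk_size: int = 1000) -> pd.DataFrame:
    """Normalize JSON documents chunk by chunk, keeping only `fields`.

    Hypothetical sketch of the chunked approach; not the PR's code.
    """
    filtered_chunks = []
    for start in range(0, len(docs), chunk_size):
        # Normalize only this chunk, then immediately drop every column
        # except the requested fields, so the multi-million-column frame
        # for the full collection never materializes.
        frame = pd.json_normalize(docs[start:start + chunk_size])
        filtered_chunks.append(frame.filter(items=fields))
    # Pull the filtered chunks back together into a single frame.
    return pd.concat(filtered_chunks, ignore_index=True)
```

Filtering per chunk keeps peak memory proportional to `chunk_size` times the number of kept fields rather than to the full collection's column count.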

Checklist

  • Google format doc strings added.
  • Code linted with ruff. (For guidance in fixing rule violations, see rule list)
  • Type annotations included. Check with mypy.
  • Tests added for new features/fixes.
  • I have run the tests locally and they passed.

@munrojm merged commit 445df9f into materialsproject:main on Feb 5, 2024
8 checks passed