[FIX] large database restore memory usage fix #17663
Changes from 2 commits
889f8e1
e880d45
d63a7cd
9c522b6
133552c
4f33764
    @@ -211,9 +211,13 @@ def dump_db(db_name, stream, backup_format='zip'):
                 return stdout
     
     def exp_restore(db_name, data, copy=False):
    +    def chunks(d, n=8192):

Do you still need this

Since we don't use

    +        for i in range(0, len(d), n):
    +            yield d[i:i+n]
         data_file = tempfile.NamedTemporaryFile(delete=False)
         try:
    -        data_file.write(data.decode('base64'))
    +        for chunk in chunks(data):
    +            data_file.write(chunk.decode('base64'))
             data_file.close()
             restore_db(db_name, data_file.name, copy=copy)
         finally:

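For readers on Python 3, where `'...'.decode('base64')` no longer exists, the chunked approach in the hunk above could be sketched roughly as follows. The helper name `write_base64_to_tempfile` is hypothetical and not part of this PR. The sketch assumes `data` is base64 with no embedded newlines or whitespace, so that any slice whose length is a multiple of 4 is itself valid base64; 8192 satisfies that.

```python
import base64
import os
import tempfile


def write_base64_to_tempfile(data, chunk_size=8192):
    """Decode base64 `data` into a temporary file, one slice at a time.

    Hypothetical helper for illustration only. Assumes `data` contains no
    line breaks, so every slice boundary falls on a 4-character group.
    """
    if chunk_size % 4:
        # Base64 encodes 3 bytes as 4 characters; a slice that splits a
        # group cannot be decoded on its own.
        raise ValueError("chunk_size must be a multiple of 4")
    with tempfile.NamedTemporaryFile(delete=False) as data_file:
        for i in range(0, len(data), chunk_size):
            data_file.write(base64.b64decode(data[i:i + chunk_size]))
    # The file is closed but not deleted; the caller must remove it.
    return data_file.name
```

Only one decoded chunk is held in memory at a time, which is the point of the change. The encoded `data` itself is still fully in memory.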
I think this is broken.
Also data could probably be a bytearray (explicitly mutable bytes) rather than rely on CPython optimisations.

Oops, my mistake.
Could you elaborate on data being a bytearray? As I see it, `dispatch_rpc()` needs `data` to be a base64 encoded string. Or maybe I'm missing something.

base64 is a bytes -> bytes conversion, so `data` is more or less a bytestring (if dispatch_rpc needs a native string, the base64 content should be converted correctly or it's going to blow up badly in Python 3).

Bytes are immutable, so concatenating immutable strings is fundamentally quadratic. This code implicitly relies on a CPython optimisation[0] to reclaim (amortised) linear behaviour. I would rather we avoid this issue and reliance by using bytearray. The bytearray can be converted into regular bytes at the end of the loop.

[0] it does not and cannot exist in PyPy
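
A minimal sketch of the bytearray suggestion, assuming the chunks arrive as an iterable of `bytes`. The function name `join_chunks` is made up for illustration and does not appear in the PR.

```python
def join_chunks(chunks):
    """Accumulate byte chunks without repeated immutable concatenation.

    Hypothetical helper: `chunks` is assumed to be any iterable of bytes.
    """
    buf = bytearray()
    for chunk in chunks:
        # bytearray is mutable, so += extends the buffer in place
        # instead of building a new bytes object on every iteration.
        buf += chunk
    # Convert to regular immutable bytes once, at the end of the loop.
    return bytes(buf)
```

With `result = result + chunk` on plain `bytes`, each step may copy everything accumulated so far. The bytearray version does not depend on an interpreter-specific optimisation to stay linear.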