
[FIX] large database restore memory usage fix #17663

Closed · wants to merge 6 commits
Changes from 2 commits
5 changes: 3 additions & 2 deletions addons/web/controllers/main.py
@@ -702,8 +702,9 @@ def backup(self, master_pwd, name, backup_format = 'zip'):
     @http.route('/web/database/restore', type='http', auth="none", methods=['POST'], csrf=False)
     def restore(self, master_pwd, backup_file, name, copy=False):
         try:
-            data = base64.b64encode(backup_file.read())
-            dispatch_rpc('db', 'restore', [master_pwd, name, data, str2bool(copy)])
+            data = ''
+            for chunk in iter(lambda: backup_file.read(8190), b''):
+                data += base64.b64encode(chunk) dispatch_rpc('db', 'restore', [master_pwd, name, data, str2bool(copy)])
Collaborator:
I think this is broken.

Also data could probably be a bytearray (explicitly mutable bytes) rather than relying on CPython optimisations.

Contributor Author:

Oops, my mistake.

Could you elaborate on data being a bytearray? As I see it, dispatch_rpc() needs data to be a base64 encoded string. Or maybe I'm missing something.

Collaborator · @xmo-odoo · Jun 21, 2017:

base64 is a bytes -> bytes conversion, so data is more or less a bytestring (if dispatch_rpc needs a native string, the base64 content should be converted correctly or it's going to blow up badly in Python 3).

Bytes are immutable, and concatenating immutable strings is fundamentally quadratic. This code implicitly relies on a CPython optimisation[0] to reclaim (amortised) linear behaviour. I would rather we avoid this issue and reliance, by using bytearray. The bytearray can be converted into regular bytes at the end of the loop.

[0] it does not, and cannot, exist in PyPy
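A minimal Python 3 sketch of what the reviewer is suggesting (the helper name `encode_stream` is illustrative, not from the PR, and the PR itself targets Python 2, where `backup_file.read()` returns `str` rather than `bytes`):

```python
import base64
import io

def encode_stream(backup_file, chunk_size=8190):
    # 8190 is a multiple of 3, so every non-final chunk encodes to
    # whole base64 groups with no '=' padding, and the encoded pieces
    # concatenate into one valid base64 payload.
    buf = bytearray()  # mutable: appending is amortised O(1) on any interpreter
    for chunk in iter(lambda: backup_file.read(chunk_size), b''):
        buf += base64.b64encode(chunk)
    return bytes(buf)

# Usage: same result as base64.b64encode(f.read()), without holding the
# raw file in memory or relying on CPython's concatenation optimisation.
payload = encode_stream(io.BytesIO(b'x' * 20000))
```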

             return http.local_redirect('/web/database/manager')
         except Exception, e:
             error = "Database restore error: %s" % str(e) or repr(e)
6 changes: 5 additions & 1 deletion odoo/service/db.py
@@ -211,9 +211,13 @@ def dump_db(db_name, stream, backup_format='zip'):
     return stdout
 
 def exp_restore(db_name, data, copy=False):
+    def chunks(d, n=8192):
Contributor:
Do you still need this chunking system now?

Contributor Author:

Since we don't use exp_restore anymore in the scope of this PR, this modification could be dropped. But IIRC exp_restore is still used in the RESTful API, and this chunking system addresses a critical memory-usage issue: the whole database used to be loaded into RAM by .decode('base64').

+        for i in range(0, len(d), n):
+            yield d[i:i+n]
     data_file = tempfile.NamedTemporaryFile(delete=False)
     try:
-        data_file.write(data.decode('base64'))
+        for chunk in chunks(data):
+            data_file.write(chunk.decode('base64'))
         data_file.close()
         restore_db(db_name, data_file.name, copy=copy)
     finally: