Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix goose memory consumption on large migrations #93

Merged
merged 2 commits into from
Mar 4, 2019

Conversation

hexdigest
Copy link
Contributor

Hi,

I found out that on large SQL migrations that either have multiple queries (>25k insert queries in my case) or one insert query with 25K lines of values divided by "," goose consumes about 200M of memory. The size of the .sql file is about 5M.

This PR fixes the issue without breaking to much of an existing code.

Copy link
Collaborator

@VojtechVitek VojtechVitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @hexdigest. Thanks for your contribution!

  1. How does this work in high level?
  2. Why do we need sync.Pool at all?
  3. Can't we use bytes.Buffer instead?

@hexdigest
Copy link
Contributor Author

Hi @VojtechVitek

Well, actually this PR fixes 2 problems:

  1. Default 64K buffer in bufio.Scanner, so you can't run migrations where the lines are too long, I get "token too long" error when tried to use result of pg_dump as a migration.
  2. If you simply use new bigger buffer in endsWithSemicolon func then on large migrations you have a chance to get OOM error before GC cleans everything, in my case container is killed by docker (64Mb limit is set), that's why I used sync.Pool

I can update PR using bytes.Buffer but it's not concurrent safe, so if there'll be parallel tests for endsWithSemicolon/getSQLStatements they might blink. What do you think?

@VojtechVitek
Copy link
Collaborator

@hexdigest hey, sorry for late answer. Yeah, if you could use bytes.Buffer, that'd be great. I don't think we need to worry about concurrency, tests might as well use two instances of goose. Goose migrations are supposed to run sequentially anyway.

How did you come up with number for scanBufSize btw, can you explain in the code as a comment?

LGTM, if we get rid of sync.Pool and use bytes.Buffer :)

Copy link
Collaborator

@VojtechVitek VojtechVitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I found some time today to test this PR against our CI test suite (~100 sample .sql migrations that we ran in production in the past) and against our pre-prod environments. No issues found. Going to merge this.

Thanks again for your contribution!

@VojtechVitek VojtechVitek merged commit 9292c39 into pressly:master Mar 4, 2019
@VojtechVitek VojtechVitek mentioned this pull request Mar 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants