fix goose memory consumption on large migrations #93

hexdigest · 2018-01-22T16:58:42Z

Hi,

I found that on large SQL migrations, either ones with many queries (over 25k INSERT statements in my case) or a single INSERT with 25k lines of comma-separated values, goose consumes about 200 MB of memory. The .sql file itself is only about 5 MB.

This PR fixes the issue without breaking too much of the existing code.

VojtechVitek

Hi @hexdigest. Thanks for your contribution!

How does this work at a high level?
Why do we need sync.Pool at all?
Can't we use bytes.Buffer instead?

hexdigest · 2018-01-29T15:00:26Z

Hi @VojtechVitek

Well, this PR actually fixes two problems:

1. Because of the default 64K buffer in bufio.Scanner, you can't run migrations whose lines are too long. I got a "token too long" error when I tried to use the output of pg_dump as a migration.
2. If you simply allocate a new, bigger buffer in the endsWithSemicolon func, then on large migrations you risk an OOM error before the GC cleans everything up. In my case the container was killed by Docker (a 64 MB limit is set). That's why I used sync.Pool.

I can update the PR to use bytes.Buffer, but it's not safe for concurrent use, so if there are ever parallel tests for endsWithSemicolon/getSQLStatements, they might become flaky. What do you think?

VojtechVitek · 2018-05-03T15:38:55Z

@hexdigest hey, sorry for the late answer. Yeah, if you could use bytes.Buffer, that'd be great. I don't think we need to worry about concurrency; tests might as well use two instances of goose. Goose migrations are supposed to run sequentially anyway.
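A rough sketch of the sequential alternative suggested here, under stated assumptions. `splitStatements` and `scanBuf` are hypothetical names and the 1 MiB size is assumed. A single package-level scan buffer replaces sync.Pool, and a bytes.Buffer accumulates lines until one ends with ";". The shared buffer is only safe because calls never overlap, which holds if migrations run one after another. goose's real parser also handles annotations and comments, which this sketch ignores.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanBuf is one buffer shared by every call (assumed size: 1 MiB).
// NOT safe for concurrent use; relies on sequential execution.
var scanBuf = make([]byte, 1<<20)

// splitStatements accumulates lines in a bytes.Buffer until a line ends with
// ";", then emits the accumulated text as one statement.
func splitStatements(sql string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(sql))
	scanner.Buffer(scanBuf, len(scanBuf))

	var stmt bytes.Buffer
	var stmts []string
	for scanner.Scan() {
		line := scanner.Text() // Text returns a copy, so reusing scanBuf is fine
		stmt.WriteString(line)
		stmt.WriteByte('\n')
		if strings.HasSuffix(strings.TrimSpace(line), ";") {
			stmts = append(stmts, stmt.String())
			stmt.Reset() // keep the backing array for the next statement
		}
	}
	return stmts, scanner.Err()
}

func main() {
	stmts, err := splitStatements("CREATE TABLE t (id int);\nINSERT INTO t VALUES\n  (1),\n  (2);\n")
	fmt.Println(len(stmts), err) // 2 <nil>
	fmt.Printf("%q\n", stmts[1]) // "INSERT INTO t VALUES\n  (1),\n  (2);\n"
}
```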

How did you come up with the number for scanBufSize, by the way? Can you explain it in the code as a comment?

LGTM, if we get rid of sync.Pool and use bytes.Buffer :)

VojtechVitek

LGTM

I found some time today to test this PR against our CI test suite (~100 sample .sql migrations that we have run in production in the past) and against our pre-prod environments. No issues found. Going to merge this.

Thanks again for your contribution!

Max Chechel added 2 commits January 22, 2018 19:51

fix goose memory consumption on large migrations

111f7d2

return buffer back to the pool

90415d4

VojtechVitek reviewed Jan 29, 2018


VojtechVitek approved these changes Mar 4, 2019


VojtechVitek merged commit 9292c39 into pressly:master Mar 4, 2019

VojtechVitek mentioned this pull request Mar 4, 2019

Support large lines #74

Closed
