
Litestream from a directory of sqlite files #1

Open · wants to merge 17 commits into main
Conversation

@ericvolp12 (Owner)

Litestream from a directory of sqlite files, only keeping recently modified DBs in active replication mode

This only supports S3 right now and still needs a lot of polishing, but it technically works.

@ericvolp12 (Owner Author)

For testing, build litestream and start it with:

$ export AWS_ACCESS_KEY_ID={key_id}
$ export AWS_SECRET_ACCESS_KEY={access_key_secret}
$ go build -ldflags "-s -w -X 'main.Version=${LITESTREAM_VERSION}' -extldflags '-static'" -tags osusergo,netgo,sqlite_omit_load_extension ./cmd/litestream
$ ./litestream replicate-dir dbs/ https://{s3_compatible_host}/{bucket}

Then start the test script, which generates 100 sqlite DBs and updates them every 5 seconds. It updates 10 DBs for 3 minutes and then moves on to another set of 10, simulating active/passive users, so you can watch litestream close the inactive replicas until the next time they become active.

$ go run test/test.go
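The rotation the script performs can be sketched roughly like this (a hypothetical helper, not the actual test/test.go; the period, set size, and function name are assumptions for illustration):

```go
package main

import "fmt"

// activeSet returns the indices of the DBs considered "active" after
// elapsedSec seconds: sets of setSize DBs rotate every periodSec seconds,
// cycling through total DBs (e.g. 10 active out of 100, switching every
// 3 minutes).
func activeSet(elapsedSec, periodSec, setSize, total int) []int {
	start := (elapsedSec / periodSec) * setSize % total
	out := make([]int, 0, setSize)
	for i := 0; i < setSize; i++ {
		out = append(out, (start+i)%total)
	}
	return out
}

func main() {
	fmt.Println(activeSet(0, 180, 10, 100))   // → [0 1 2 3 4 5 6 7 8 9]
	fmt.Println(activeSet(200, 180, 10, 100)) // → [10 11 12 13 14 15 16 17 18 19]
}
```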

@ericvolp12 (Owner Author)

Did some more reading into inotify, and this will definitely run into the max_user_watches limit. Directory watchers are cheap since you only need one per directory, but because we want to watch for modifications on each file in the directory, we need an inotify watch registered in the kernel for each SQLite file, which will hit the 8192 default limit pretty quickly. We can raise the limit a bunch if we want; just something to be aware of.
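As a quick sanity check, the current limit can be read from procfs at startup; a minimal sketch (the helper name and the 8192 fallback, which mirrors the kernel default mentioned above, are assumptions):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readLimit returns the integer value of an inotify sysctl file, or def
// if the file is unreadable (e.g. on non-Linux systems).
func readLimit(path string, def int) int {
	b, err := os.ReadFile(path)
	if err != nil {
		return def
	}
	n, err := strconv.Atoi(strings.TrimSpace(string(b)))
	if err != nil {
		return def
	}
	return n
}

func main() {
	watches := readLimit("/proc/sys/fs/inotify/max_user_watches", 8192)
	fmt.Printf("max_user_watches=%d\n", watches)
	if watches <= 8192 {
		fmt.Println("warning: limit may be too low for large DB directories")
	}
}
```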

@ericvolp12 (Owner Author) commented Oct 31, 2023

There may be a performance showstopper in terms of memory usage at certain scales.

When trying to track 10,000 active SQLite DBs, I OOM'd, blowing way past the 16 GB I had set aside for it:
(screenshot: heap profile)

^ the above heap dump didn't show the total resident memory for some reason

Also, litestream ended up with >10,000 threads active on my system, which was maybe not great.

When only 1,000 DBs are active at once, we peak around 6.7 GB of memory usage and it's a lot more manageable. CPU sits at >300% though (3 logical cores of an AMD Epyc 7302P), which is pretty intense, spiking to >700% CPU when shifting between active sets of DBs (2k DBs active during the shift):
(screenshot: memory/CPU usage)

CPU Profile with 1,000 changing DBs for 120 seconds capturing the "switchover":
(screenshot: CPU profile)

Approximately 40% of the CPU cycles are spent in malloc and GC.

@ericvolp12 (Owner Author)

I ran into some system limits which caused the following error:

{"time":"2023-11-01T16:00:40.421177913-07:00","level":"ERROR","msg":"watcher error","source":"watcher","error":"fsnotify: queue or buffer overflow"}

Digging into the fsnotify source, I found this snippet, so I've bumped two sysctl parameters to help deal with it:

$ sudo sysctl fs.inotify.max_queued_events=65536 # bump the inotify queue max size by 4x to 2^16
$ sudo sysctl fs.inotify.max_user_watches=524288 # bump the inotify max user watches to 2^19
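Note that `sysctl` settings applied this way don't survive a reboot; to make them persistent, they can go in a drop-in file (filename is a suggestion):

```
# /etc/sysctl.d/90-inotify.conf
fs.inotify.max_queued_events = 65536
fs.inotify.max_user_watches = 524288
```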

@ericvolp12 (Owner Author)

This most recent commit hasn't been tested yet because of the Cloudflare outages, but I'll run through it once things are stable.
