
Litestream from a directory of sqlite files #1

Open · wants to merge 17 commits into main
Conversation

@ericvolp12 (Owner)

Litestream from a directory of sqlite files, only keeping recently modified DBs in active replication mode

This only supports S3 right now and still needs a lot of polishing, but it technically works.

@ericvolp12 (Owner Author)

For testing, build litestream and start it with:

$ export AWS_ACCESS_KEY_ID={key_id}
$ export AWS_SECRET_ACCESS_KEY={access_key_secret}
$ go build -ldflags "-s -w -X 'main.Version=${LITESTREAM_VERSION}' -extldflags '-static'" -tags osusergo,netgo,sqlite_omit_load_extension ./cmd/litestream
$ ./litestream replicate-dir dbs/ https://{s3_compatible_host}/{bucket}

Then start the test script, which generates 100 sqlite DBs and updates them every 5 seconds. It updates 10 DBs for 3 minutes and then moves on to another set of 10, simulating active/passive users, so you can watch litestream close the inactive replicas until the next time they become active.

$ go run test/test.go
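The rotation the script performs can be sketched roughly like this (a hypothetical helper, not the actual test/test.go; the period, set size, and function name are assumptions for illustration):

```go
package main

import "fmt"

// activeSet returns the indices of the DBs considered "active" after
// elapsedSec seconds: sets of setSize DBs rotate every periodSec seconds,
// cycling through total DBs (e.g. 10 active out of 100, switching every
// 3 minutes).
func activeSet(elapsedSec, periodSec, setSize, total int) []int {
	start := (elapsedSec / periodSec) * setSize % total
	out := make([]int, 0, setSize)
	for i := 0; i < setSize; i++ {
		out = append(out, (start+i)%total)
	}
	return out
}

func main() {
	fmt.Println(activeSet(0, 180, 10, 100))   // → [0 1 2 3 4 5 6 7 8 9]
	fmt.Println(activeSet(200, 180, 10, 100)) // → [10 11 12 13 14 15 16 17 18 19]
}
```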

@ericvolp12 (Owner Author)

Did some more reading into inotify, and this will definitely run into the max_user_watches limit. Directory watchers are cheap since you only need one per directory, but because we want to watch for modifications on each file in the directory, we need an inotify watch registered in the kernel for each SQLite file, which will hit the 8192 default limit pretty quickly. We can raise the limit a bunch if we want; just something to be aware of.
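As a quick sanity check, the current limit can be read from procfs at startup; a minimal sketch (the helper name and the 8192 fallback, which mirrors the kernel default mentioned above, are assumptions):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readLimit returns the integer value of an inotify sysctl file, or def
// if the file is unreadable (e.g. on non-Linux systems).
func readLimit(path string, def int) int {
	b, err := os.ReadFile(path)
	if err != nil {
		return def
	}
	n, err := strconv.Atoi(strings.TrimSpace(string(b)))
	if err != nil {
		return def
	}
	return n
}

func main() {
	watches := readLimit("/proc/sys/fs/inotify/max_user_watches", 8192)
	fmt.Printf("max_user_watches=%d\n", watches)
	if watches <= 8192 {
		fmt.Println("warning: limit may be too low for large DB directories")
	}
}
```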

@ericvolp12 (Owner Author) commented Oct 31, 2023

There may be a performance showstopper in terms of memory usage at certain scales.

When trying to track 10,000 active SQLite DBs, I OOM'd, blowing way past the 16 GB I had set aside for it:
(screenshot: heap profile)

^ the above heap dump didn't show the total resident memory for some reason

Also, litestream ended up with >10,000 threads active on my system, which was maybe not great.

When only 1,000 DBs are active at once, we peak around 6.7 GB of memory usage and it's a lot more manageable. CPU sits at >300% though (3 logical cores of an AMD Epyc 7302P), which is pretty intense, spiking to >700% CPU when shifting between active sets of DBs (2k DBs active during the shift):
(screenshot: memory/CPU usage)

CPU Profile with 1,000 changing DBs for 120 seconds capturing the "switchover":
(screenshot: CPU profile)

Approximately 40% of the CPU cycles are spent in malloc and GC.

@ericvolp12 (Owner Author)

I ran into some system limits which caused the following error:

{"time":"2023-11-01T16:00:40.421177913-07:00","level":"ERROR","msg":"watcher error","source":"watcher","error":"fsnotify: queue or buffer overflow"}

Digging into the fsnotify source, I found this snippet, so I've bumped two sysctl parameters to help deal with it:

$ sudo sysctl fs.inotify.max_queued_events=65536 # bump the inotify queue max size by 4x to 2^16
$ sudo sysctl fs.inotify.max_user_watches=524288 # bump the inotify max user watches to 2^19
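Note that `sysctl` settings applied this way don't survive a reboot; to make them persistent, they can go in a drop-in file (filename is a suggestion):

```
# /etc/sysctl.d/90-inotify.conf
fs.inotify.max_queued_events = 65536
fs.inotify.max_user_watches = 524288
```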

@ericvolp12 (Owner Author)

This most recent commit hasn't been tested yet because of the Cloudflare outages, but I'll run through it once things are stable.
