Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbbackup: A minimal database backup plugin #40

Closed

Conversation

SimonVrouwe
Copy link
Contributor

@SimonVrouwe SimonVrouwe commented Jul 13, 2019

Here a simple plugin that keeps a synchronized backup of CL's database.

I heard rumors that similar (more advanced ) backup plugins are being developed. But I needed this now to sleep better, please comment if that is justified or not.

Some tests are included that can be run in c-lightnings pytest framework by linking this plugin directory into CL's /tests. It is not ideal, but couldn't find another solution.

Attempts to fix #2400

Edit: Last test_dbbackup_plugin_kill requires a logline in CL that's not there yet (will create PR)

@cdecker
Copy link
Contributor

cdecker commented Jul 18, 2019

Awesome work, I'll review this asap ^^

@fiatjaf
Copy link
Contributor

fiatjaf commented Jul 18, 2019

Just tested in production and it works. Now yes, someone more capable should review it.

@cdecker
Copy link
Contributor

cdecker commented Jul 18, 2019

Some tests are included that can be run in c-lightnings pytest framework by linking this plugin directory into CL's /tests. It is not ideal, but couldn't find another solution.

That's something that I have been wanting to look into. Basically the pylightning library should contain the testing framework that we use for c-lightning as well, so that you can just add that as a dependency in your requirements.txt file and, providing you have c-lightning installed somewhere where we can find it, it should be possible to just run against it.

I'm sorry you had to jump through these hoops to get testing working, but great work getting testing working 👍

@SimonVrouwe
Copy link
Contributor Author

SimonVrouwe commented Jul 22, 2019

Modified README.md (nothing else) and squashed everything into two commits.

The /tests directory inside the plugin directory appears harmless, because it doesn't contain any executable files. Not a python expert, but I assume that __pycache__ is also harmless.

As a side note, I run this plugin with Toshiba 16GB usb flash drive for the backup-file. Little concerned about flash (p/e cycles) lifetimes, iostat -d sdb shows an average ~150MB write per day over past 2 weeks for my not-very-busy node. Using a conservative estimate lifetime of 250 writes per cell, this should last for about 16GB * 250 / (0.150 * 365) = 73 year?

dbbackup/README.md Show resolved Hide resolved
dbbackup/README.md Outdated Show resolved Hide resolved
dbbackup/dbbackup.py Outdated Show resolved Hide resolved
'db-backup-file': backup})

# rename backup file will cause db_write failure and crash lightningd
os.rename(backup, backup + '_')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this cause a failure writing to the DB? I thought sqlite3 keeps an fd open to the db file, which is unaffected by moves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, but it works . The error returned by sqlite3: attempt to write a readonly database

Any other ideas how to emulate a disk-full scenario? (maybe loop device)

@fiatjaf
Copy link
Contributor

fiatjaf commented Jul 25, 2019

How can we use this to backup the db to another computer?

@fiatjaf
Copy link
Contributor

fiatjaf commented Jul 25, 2019

Since I've started to run this plugin I've been seeing hundreds of crashes per day:

+213.639880031 plugin-dbbackup.pyBROKEN: Failed to write to backup: near \"channels\": syntax error
+213.641676833 lightningd(13527):BROKEN: Plugin returned failed db_write: ??{"jsonrpc": "2.0", "id": 231, "result": false}??U.

@cdecker
Copy link
Contributor

cdecker commented Jul 26, 2019

How can we use this to backup the db to another computer?

You'd likely have to mount a remote disk over the network and write the backup file to that disk.

@cdecker
Copy link
Contributor

cdecker commented Jul 26, 2019

Since I've started to run this plugin I've been seeing hundreds of crashes per day:

+213.639880031 plugin-dbbackup.pyBROKEN: Failed to write to backup: near \"channels\": syntax error
+213.641676833 lightningd(13527):BROKEN: Plugin returned failed db_write: ??{"jsonrpc": "2.0", "id": 231, "result": false}??U.

@fiatjaf Can you add a print of the executed SQL command before it is being applied to the replica? Might be a difference in the sqlite3 version used by lightningd and the plugin itself.

@SimonVrouwe
Copy link
Contributor Author

Incorporated the suggestions and some fixups in the tests (some assert's where missing).

Since I've started to run this plugin I've been seeing hundreds of crashes per day

@fiatjaf On a write failure, the returned error now includes the SQL statement that was tried to be written. Maybe this helps in debugging?

@plugin.hook('db_write')
def db_write(plugin, writes):
if not plugin.initted:
plugin.sqlite_pre_init_cmds += writes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential (but unlikely) race condition where db_write is invoked before init, then C-lightning crashes before it is able to init the plugins. It may be safer to save these in a temporary file that you clear out once init is invoked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, though writes before the init call will all be migrations iirc, so these are not catastrophic.

Copy link

@ZmnSCPxj ZmnSCPxj Aug 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not keeping up with C-lightning upgrades is "not catastrophic"? Upgrades are a risk by themselves as they represent changed code, and the user might be running on some obscure combination of system components that happen to work on older releases of C-lightning but not on newer ones, so a new release (which is when we are likely to have migrations running) is at increased risk of crashes before init... shrug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZmnSCPxj Thanks, there is also a potential dead-lock when the plugin makes the rpc.stop() call from init while at the same time lightningd is waiting for a db_write hook to return. TBH I was totally unaware that plugin with (sync) hooks shouldn't make any (sync) RPC calls, at least not from the same threat that also handles the hook.

If I understand correctly, RPC calls can be send after jsonrpc_listen but are not be answered until the main io_loop_with_timers

To solve these and similar issues, I am exploring a sync init in plugins_config where lightningd waits for the init response of certain (perhaps all static?) plugins before continue startup, in a similar way as it does for getmanifest. And maybe shutdown CL when a plugin's init returns False? Will make a PR if that doesn't break too much.

os.rename(backup_copy, backup)
plugin.log("Existing db-backup-file OK and successfully synced")
else:
plugin.log("Existing db-backup-file differs from original database, i.e. applying"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better handling of this would be to just outright copy the original over the backup. Though it would not be safe to do so in the init as that is async. Instead, on the first db_write after init, that is when you do your overwriting the backup with the original. Though this is sensitive to the order in which C-Lightning writes to its own original copy (whether it does the write before db_write or after db_write). Might be more complicated than what you are willing to implement and I suppose the operator can manually do the copy in this edge case while C-Lightning is not running.

@cdecker
Copy link
Contributor

cdecker commented Aug 29, 2019

Just a heads up that with PR ElementsProject/lightning#2924 a couple of things will change:

  • The DB initialization will no longer be sent as a db_write call. This is limited to the PRAGMA foreign_keys = ON; query that enables foreign keys (and by consequence enforcement of cascade constraints on which we rely heavily). So make sure to enable that upon opening the destination DB, otherwise some queries could hit constraints like this plugin: dbbackup "Failed to write to backup: UNIQUE constraint failed: utxoset.txid, utxoset.outnum" ElementsProject/lightning#3007.
  • We will also no longer send the BEGIN TRANSACTION; and COMMIT; statements that initialize and commit a DB transaction. This is mainly due to them also being the first and last statement respectively, and were mostly just a waste of payload. You can check whether you need to wrap the statements in the db_write payload by checking whether the first and last statement match the above TX statements. A single call to db_write will always correspond to a single DB transaction, so that is safe.

This was done partially because it allows us to safe some data transferred, simplifies the flush logic (empty list of statements means we don't call db_write, whereas before we had to check whether we only have TX begin and commit statements), and unifies the logic for other DBs that are supported, that may or may not have SQL statements to initialize and complete DB transactions, or just C API calls with no SQL representation.

@SimonVrouwe
Copy link
Contributor Author

I am closing this PR, because I am not really happy about the design, mostly the use of sqlite_pre_init_cmds buffer, which feels like a hack. To make it really bullet-proof, I think we need something like ElementsProject/lightning#3013.

Or maybe a plugin is just not the right choice for this or even having a sync db_write hook for plugins was a mistake because it just doesn't fit in well enough.

@SimonVrouwe SimonVrouwe closed this Sep 2, 2019
@SimonVrouwe SimonVrouwe deleted the 2_july_plugin_dbbackup branch October 6, 2019 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backup and other feature requests
4 participants