Skip to content

Conversation

@snadrus
Copy link
Contributor

@snadrus snadrus commented Aug 20, 2024

Allow max of 1 db upgrader at a time.

@snadrus snadrus requested a review from magik6k August 20, 2024 16:29
@snadrus
Copy link
Contributor Author

snadrus commented Aug 20, 2024

Well no form of db locking is going to work:
github.com/filecoin-project/curio/harmony/harmonydb.(*DB).upgrade.func1.1
/home/circleci/curio/harmony/harmonydb/harmonydb.go:256
- ERROR: ACCESS EXCLUSIVE lock mode not supported yet (SQLSTATE 0A000)

@snadrus
Copy link
Contributor Author

snadrus commented Aug 20, 2024

Is this a known problem @magik6k or a nice-to-have?
The only window of risk is on upgrade when a new schema arrives if 2 machines get there about the same time (single digit milliseconds). I think today it'll crash the "second" process (which should have a restart), then everything should be fine. I could "sleep on error and retry" to avoid that crash. WDYT?

@magik6k
Copy link
Collaborator

magik6k commented Aug 21, 2024

Is this a known problem @magik6k or a nice-to-have?

A problem that we didn't have any case that manifested in actual trouble, but I do foresee it being an issue at some point given that schema changes in YB do tend to take a while and are likely to be a large source of edge-cases given what YB does with postgres.

The only window of risk is on upgrade when a new schema arrives if 2 machines get there about the same time (single digit milliseconds).

It's usually much longer, in my deployments most migrations take 2-10s, depending on what they do

@LexLuthr
Copy link
Contributor

https://yugabytedb.tips/suppress-the-error/ advisory locks are not supported. I found out yesterday the hard way after using them for libp2p table.

@snadrus
Copy link
Contributor Author

snadrus commented Aug 21, 2024

I'd say we shelve this. If it becomes a problem, maybe the locks will arrive then and this code will be a drop-in fix.

@snadrus
Copy link
Contributor Author

snadrus commented Aug 21, 2024

One alternative is to have an upgrade 'task' that gets scheduled, deduped, claimed by those only with that version, and ran that way, but we can't start the task engine with any other task. It sounds more difficult.

@magik6k
Copy link
Collaborator

magik6k commented Oct 30, 2024

I believe I see a good way to do this:

  • Add a separate base_upgrades table
  • When we don't see an entry in the base table for a given upgrade, write the entry to the base_upgrades table
    • If that write conflicts, wait a bit (~10s) and go to step one. Either someone else has applied the update, or it failed so manual intervention is needed anyways
    • If it didn't conflict, we're now responsible for applying the schema, so do that.

@snadrus
Copy link
Contributor Author

snadrus commented Dec 5, 2024 via email

@LexLuthr
Copy link
Contributor

Closing for now as this has not been a problem in last few months.

@LexLuthr LexLuthr closed this Apr 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants