Update pebble datastore #2018

Merged
merged 4 commits into from Jan 15, 2024

hsanjuan
Collaborator

@hsanjuan hsanjuan commented Jan 9, 2024

No description provided.

@hsanjuan hsanjuan added this to the Release v1.0.8 milestone Jan 9, 2024
@Mayeu

Mayeu commented Jan 10, 2024

Thank you for this.

It seems that pebble can't auto migrate the store:

ipfs-cluster-service[2226543]: error creating datastore: failed to open pebble database: pebble: database "/ipfs-data/cluster/pebble" written in format major version 1 which is no longer supported

(I did change the service.json file).

Also this test data should be updated with the new version.

@Mayeu

Mayeu commented Jan 10, 2024

I'll try to export the state of one of our nodes and reimport it into an up-to-date store to test the patch.

@hsanjuan
Collaborator Author

I have to revert to pebble v1.0.0 then, which should be newer than the latest release but older than what we had on master.

@hsanjuan
Collaborator Author

@Mayeu ok, this should be compatible with your existing DB. This is the latest pebble version we can use and we will have to stick with that. Anything newer has to go as a new pebble2 datastore backend, as there is no tooling to upgrade existing databases. Can you try this one?

v1.0.0 would be a downgrade, which would also force me to revert the 32-bit-build fixes etc.

@Mayeu

Mayeu commented Jan 10, 2024

I'll try that and report back. Thank you.

@Mayeu

Mayeu commented Jan 10, 2024

@hsanjuan: when I tried to export and reimport the state earlier (with ipfs-cluster-service state {export,import}), something prevented me from doing so. On both nodes the export stopped after a while (in the middle of a JSON line) and just hung there. On each node it consistently stopped on a specific CID (a different CID for each node, but relaunching the export hung at the same spot).

After around 30 minutes of being stuck, I assumed this was not going to work, so I stopped trying to export.

That said, exporting everything was possible with ipfs-cluster-ctl status --local, which works without issue.

@Mayeu

Mayeu commented Jan 10, 2024

@hsanjuan: the patched version starts as planned. But around 30 minutes after starting, it stopped committing things.

That said, I may have found something unexpected in the pebble configuration. In the pebble/OPTIONS-008892 file, the max_concurrent_compactions option was set to 0. I just launched a node where I overrode this to 5 (the default in the ipfs-cluster source code; I also changed it in the service.json file).

Here is the content of the pebble/OPTIONS-008892 file (after updating the concurrent compaction setting):

[Options]
  bytes_per_sync=1048576
  cache_size=1073741824
  cleaner=delete
  compaction_debt_concurrency=1073741824
  comparer=leveldb.BytewiseComparator
  disable_wal=false
  flush_delay_delete_range=0s
  flush_delay_range_key=0s
  flush_split_bytes=4194304
  format_major_version=10
  l0_compaction_concurrency=10
  l0_compaction_file_threshold=750
  l0_compaction_threshold=4
  l0_stop_writes_threshold=12
  lbase_max_bytes=134217728
  max_concurrent_compactions=5
  max_manifest_file_size=134217728
  max_open_files=1000
  mem_table_size=67108864
  mem_table_stop_writes_threshold=20
  min_deletion_rate=0
  merger=pebble.concatenate
  multilevel_compaction_heuristic=wamp(0.00, false)
  read_compaction_rate=16000
  read_sampling_multiplier=16
  strict_wal_tail=true
  table_cache_shards=12
  table_property_collectors=[]
  validate_on_ingest=false
  wal_dir=
  wal_bytes_per_sync=0
  max_writer_concurrency=0
  force_writer_parallelism=false
  secondary_cache_size_bytes=0
  create_on_shared=0
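
A minimal sketch of how this value could be checked before starting the daemon, assuming the OPTIONS file keeps the INI-like layout shown above. The function names are hypothetical and are not part of ipfs-cluster or pebble, and the "at least 1" threshold is an assumption based only on the lock-up reported in this thread.

```python
def parse_options(text):
    """Parse an INI-like OPTIONS file into {section: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, {})
            continue
        if current is None or "=" not in line:
            continue  # ignore anything that is not key=value inside a section
        key, _, value = line.partition("=")
        sections[current][key.strip()] = value.strip()
    return sections


def compaction_warning(text):
    """Return a warning string if max_concurrent_compactions looks wrong, else None."""
    options = parse_options(text).get("Options", {})
    raw = options.get("max_concurrent_compactions")
    if raw is None:
        return "max_concurrent_compactions is not set"
    if int(raw) < 1:
        # Assumption from this thread: a value of 0 coincided with the store locking up.
        return f"max_concurrent_compactions={raw}, expected at least 1"
    return None
```

Feeding it the contents of the pebble/OPTIONS-* file, for example `compaction_warning(open(path).read())`, returns `None` for the listing above, since the value there is 5.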

@hsanjuan
Collaborator Author

hmm did that fix anything? Do you have any idea why it was 0?

@Mayeu

Mayeu commented Jan 11, 2024

Sorry, I forgot to send a message yesterday evening. Changing that seems to have done the trick. As soon as I started the daemon with this configuration change, I saw compactions happening in the pebble stats and data moving through the cache layer.

The daemon has now been running for 20h without locking up. (Before, it was locking up around 1h in.)

I have no clue why it suddenly got set to 0. When I did the setup, I generated all the default configuration with ipfs-cluster-service init, and we never touched any file other than service.json. We don't have regular automatic snapshots of those files, but in the few I created around the time the cluster was created, this option was correctly set to 5.

@Mayeu

Mayeu commented Jan 11, 2024

@hsanjuan: about migrating the datastore, pebble's documentation states that:

To opt into new formats, a user may set FormatMajorVersion on the [Options](https://pkg.go.dev/github.com/cockroachdb/pebble#Options) supplied to [Open](https://pkg.go.dev/github.com/cockroachdb/pebble#Open),

And then there is an array describing which versions come with a migration.

So I tried to change that option in the service.json (with ipfs-cluster at commit 46fd8e5), setting it directly to 14, but that failed with a panic:

panic: expected a compaction of marked files in progress [recovered]
panic: expected a compaction of marked files in progress

But jumping from 1 to 7, then 7 to 10, and now 10 to 14 works. One of our nodes is now running the datastore v14 format.

So a migration tool could be created by looking at the current store version and reopening the DB as many times as needed, one format version step at a time, until it reaches the highest version supported by that version of pebble.

In our case, even blocking migrations were basically instant.
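
The stepping idea above can be sketched as a small planner. This is only an illustration under stated assumptions: the waypoints (7, 10, 14) are the ones reported in this thread, not an authoritative list from pebble, and `plan_format_steps` is a hypothetical name. It only computes the order of versions; applying each step would still mean restarting the store with that format version configured, as was done by hand here.

```python
def plan_format_steps(current, target, waypoints):
    """Return the format major versions to step through, in order.

    current   -- format version the store is at now
    target    -- format version to end up at
    waypoints -- intermediate versions to stop at (assumed, taken from this thread)
    """
    if target < current:
        # This sketch only plans upgrades.
        raise ValueError("target is older than the current format version")
    # Keep only the waypoints strictly between where we are and where we want to be.
    steps = [v for v in sorted(set(waypoints)) if current < v < target]
    if target > current:
        steps.append(target)
    return steps
```

With the values from this thread, `plan_format_steps(1, 14, (7, 10, 14))` gives `[7, 10, 14]`, and a store already at the target version gives an empty list.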

@hsanjuan
Collaborator Author

I thought it gave this error: #2018 (comment) when upgrading.

Anyways, is there a version where things work for you?

@Mayeu

Mayeu commented Jan 11, 2024

> I thought it gave this error: #2018 (comment) when upgrading.
>
> Anyways, is there a version where things work for you?

Ha no, this error was because the pebble version in ipfs-cluster did not support the version 1 datastore format anymore. That was with commit dc6ff93 (pebble v0.0.0-20240109173520-f93e739e51c8).

After you downgraded pebble to v0.0.0-20231218155426-48b54c29d8fe in commit 46fd8e5, the version 1 store loaded correctly, and I was able to run the datastore migration.

@hsanjuan
Collaborator Author

And the deadlock?

@Mayeu

Mayeu commented Jan 11, 2024

@hsanjuan: it seems to be gone since the upgrade to 46fd8e5 plus forcing max_concurrent_compactions to 5.

@hsanjuan
Collaborator Author

ok, so this PR probably solves the deadlock problem "as is". And then I somehow have to find a way to move FormatMajorVersion up to 13 in multiple steps, so that the datastore upgrades itself to a point where it is ready to take the v2-series (which starts at FormatMajorVersion 14). Right?

@Mayeu

Mayeu commented Jan 15, 2024

> ok, so this PR probably solves the deadlock problem "as is".

Yes

> And then I somehow have to find a way to move FormatMajorVersion up to 13 in multiple steps, so that the datastore upgrades itself to a point where it is ready to take the v2-series (which starts at FormatMajorVersion 14).

You won't be able to jump to 13 because it does not come with a migration; you'll have to target FormatMajorVersion 14. (Apparently only 6, 7, 10, and 14 have a migration.) And based on my test, you should be able to jump directly to 7.

Don't hesitate to poke me in the PR about the migration. I can help test it, as we still have a node using FormatMajorVersion 1.

@hsanjuan hsanjuan merged commit 8c9899b into master Jan 15, 2024
11 checks passed
@hsanjuan hsanjuan deleted the update-pebble2 branch January 15, 2024 14:26
@hsanjuan
Collaborator Author

I don't think the pebble version in this commit supports V14? Anyways, I'll check and probably add a specific command to upgrade.

@Mayeu

Mayeu commented Jan 15, 2024

> I don't think the pebble version in this commit supports V14? Anyways, I'll check and probably add a specific command to upgrade.

Hmm, I was sure I had upgraded one node up to V14, but after double-checking it seems to be on V10 only. So yeah, I'm no longer sure whether I tried with V14.
