Import export trie log #6363

Merged
merged 51 commits into from Jan 22, 2024
51 commits
16c0a49
Add x-trie-log subcommand for one-off backlog prune
siladu Nov 20, 2023
7dd4928
long -> int
siladu Nov 20, 2023
bf2b098
Removed banned method
gfukushima Dec 12, 2023
e67ae51
Preload process stream in parallel
gfukushima Dec 12, 2023
9b4e0c9
Drop unwanted trielogs and keep reatain layers only
gfukushima Dec 14, 2023
0b9fe83
Add output to user and cleanup refactor
gfukushima Dec 15, 2023
426848e
small tweak to display cf that had reference dropped by RocksDbSegmen…
gfukushima Dec 15, 2023
7401b59
spotless
gfukushima Dec 15, 2023
1b7fb72
Fix classes that changed package
gfukushima Dec 15, 2023
11e6b05
spotless
gfukushima Dec 15, 2023
f2d01e2
Code review
gfukushima Dec 15, 2023
04f1aaa
Only clear DB when we have the exact amount of trie logs we want in m…
gfukushima Dec 15, 2023
2f01c5a
Trielogs stream to and from file to avoid possibly OOM
gfukushima Dec 18, 2023
56e4c8e
Process trie logs in chunks to avoid OOM
gfukushima Dec 18, 2023
78561b0
save and read in batches to handle edge cases
gfukushima Dec 19, 2023
42c72cf
save and read files to/from database dir
gfukushima Dec 20, 2023
9961fc2
Merge branch 'main' into x-trie-log-subcommand-2
gfukushima Dec 20, 2023
9389540
add unit tests and PR review fixes
gfukushima Dec 21, 2023
e3d4fbc
Merge branch 'main' into x-trie-log-subcommand-2
gfukushima Dec 21, 2023
c7144fe
spdx
gfukushima Dec 21, 2023
20b0ba5
Fix unit tests directory creation and deletion
gfukushima Dec 21, 2023
586ab25
rename Xbonsai-trie-log-pruning-enabled to Xbonsai-limit-trie-logs-en…
gfukushima Jan 4, 2024
67e6f3d
Import and export trie log subcommands
gfukushima Jan 4, 2024
b9640e5
PR review
gfukushima Jan 4, 2024
3bc1878
spotless
gfukushima Jan 4, 2024
d47ddf5
fix path resolver and added unit tests
gfukushima Jan 7, 2024
999edb6
Merge branch 'main' into import-export-trie-log
gfukushima Jan 8, 2024
1699fe4
fix unit test
gfukushima Jan 8, 2024
e679cb3
Merge remote-tracking branch 'origin/import-export-trie-log' into imp…
gfukushima Jan 8, 2024
5d3b4f2
fix unit test
gfukushima Jan 8, 2024
f839b75
Merge branch 'main' into import-export-trie-log
gfukushima Jan 8, 2024
0caa4cf
Add import and export to list of subcommands under --x-trie-log
gfukushima Jan 8, 2024
2d5d31d
Merge remote-tracking branch 'origin/import-export-trie-log' into imp…
gfukushima Jan 8, 2024
37df23e
Remove static from setup method
gfukushima Jan 8, 2024
5ce1800
change option name and fix descriptions
gfukushima Jan 8, 2024
98423dc
Merge branch 'main' into import-export-trie-log
gfukushima Jan 8, 2024
cf3a5e6
Fix subcommands descriptions
gfukushima Jan 9, 2024
c759bba
Merge branch 'main' into import-export-trie-log
gfukushima Jan 9, 2024
5fb9413
Remove old flag and move commands const into Unstable
gfukushima Jan 12, 2024
087c54b
Allow list of block hashes to passed as well as a file to be generate…
gfukushima Jan 12, 2024
4b033a3
Merge remote-tracking branch 'origin/import-export-trie-log' into imp…
gfukushima Jan 12, 2024
3a89ac3
Allow list of block hashes to passed as well as a file to be generate…
gfukushima Jan 12, 2024
9be7d13
Fix broken test when replaced the old option
gfukushima Jan 12, 2024
75d1c3b
import and export using rlp
jframe Jan 19, 2024
5eb4cda
Merge branch 'main' into import-export-trie-log
jframe Jan 19, 2024
5425075
tests for exporting and importing multiple trielogs
jframe Jan 19, 2024
2de0b19
Merge branch 'import-export-trie-log' of github.com:gfukushima/besu i…
jframe Jan 19, 2024
248a776
Merge branch 'main' into import-export-trie-log
jframe Jan 19, 2024
55d653e
fix build
jframe Jan 19, 2024
c9d38f0
Merge branch 'main' into import-export-trie-log
jframe Jan 19, 2024
b9d7620
Merge branch 'main' into import-export-trie-log
jframe Jan 22, 2024
@@ -62,23 +62,28 @@ public class DataStorageOptions implements CLIOptions<DataStorageConfiguration>
private final DataStorageOptions.Unstable unstableOptions = new Unstable();

static class Unstable {
private static final String BONSAI_LIMIT_TRIE_LOGS_ENABLED =
"--Xbonsai-limit-trie-logs-enabled";
private static final String BONSAI_TRIE_LOGS_RETENTION_THRESHOLD =
"--Xbonsai-trie-logs-retention-threshold";
private static final String BONSAI_TRIE_LOG_PRUNING_LIMIT = "--Xbonsai-trie-logs-pruning-limit";

@CommandLine.Option(
hidden = true,
names = {"--Xbonsai-trie-log-pruning-enabled"},
names = {BONSAI_LIMIT_TRIE_LOGS_ENABLED},
description = "Enable trie log pruning. (default: ${DEFAULT-VALUE})")
private boolean bonsaiTrieLogPruningEnabled = DEFAULT_BONSAI_TRIE_LOG_PRUNING_ENABLED;

@CommandLine.Option(
hidden = true,
names = {"--Xbonsai-trie-log-retention-threshold"},
names = {BONSAI_TRIE_LOGS_RETENTION_THRESHOLD},
description =
"The number of blocks for which to retain trie logs. (default: ${DEFAULT-VALUE})")
private long bonsaiTrieLogRetentionThreshold = DEFAULT_BONSAI_TRIE_LOG_RETENTION_THRESHOLD;

@CommandLine.Option(
hidden = true,
names = {"--Xbonsai-trie-log-pruning-limit"},
names = {BONSAI_TRIE_LOG_PRUNING_LIMIT},
description =
"The max number of blocks to load and prune trie logs for at startup. (default: ${DEFAULT-VALUE})")
private int bonsaiTrieLogPruningLimit = DEFAULT_BONSAI_TRIE_LOG_PRUNING_LIMIT;
@@ -22,7 +22,11 @@
import org.hyperledger.besu.ethereum.chain.Blockchain;
import org.hyperledger.besu.ethereum.chain.MutableBlockchain;
import org.hyperledger.besu.ethereum.core.BlockHeader;
import org.hyperledger.besu.ethereum.rlp.BytesValueRLPInput;
import org.hyperledger.besu.ethereum.rlp.RLP;
import org.hyperledger.besu.ethereum.trie.bonsai.storage.BonsaiWorldStateKeyValueStorage;
import org.hyperledger.besu.ethereum.trie.bonsai.trielog.TrieLogFactoryImpl;
import org.hyperledger.besu.ethereum.trie.bonsai.trielog.TrieLogLayer;
import org.hyperledger.besu.ethereum.worldstate.DataStorageConfiguration;

import java.io.File;
@@ -32,13 +36,15 @@
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.tuweni.bytes.Bytes;
import org.apache.tuweni.bytes.Bytes32;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -97,16 +103,15 @@ private static void processTrieLogBatches(
final String batchFileNameBase) {

for (long batchNumber = 1; batchNumber <= numberOfBatches; batchNumber++) {

final String batchFileName = batchFileNameBase + "-" + batchNumber;
Review comment (Contributor):

It might make more sense to include the first and last block numbers in the filename. Otherwise it won't be clear what is actually in the files after an export.

Review comment (Contributor):

The batch filenames aren't used by the import/export subcommands; there the filename is taken from the command line args instead. This naming is only used for the prune subcommand.
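
As a rough illustration of the naming idea raised in this thread, the sketch below derives a prune batch file name that carries the batch's first and last block numbers, reusing the range arithmetic from processTrieLogBatches. The class name, the batchFileName helper, the file name base, and the BATCH_SIZE value are assumptions made for the example, not code from this PR.

```java
/** Hypothetical sketch: prune batch file names that carry their block range. */
final class BatchFileNameSketch {

  // Assumed value for illustration only; the real constant lives in the helper class.
  static final long BATCH_SIZE = 20_000;

  /** Builds "<base>-<firstBlock>-<lastBlock>" for the given 1-based batch number. */
  static String batchFileName(
      final String base,
      final long chainHeight,
      final long lastBlockNumberToRetainTrieLogsFor,
      final long batchNumber) {
    // Same arithmetic as the diff: batches walk backwards from the chain head.
    final long firstBlockOfBatch = chainHeight - ((batchNumber - 1) * BATCH_SIZE);
    // The last block is clamped so a batch never reaches below the retention floor.
    final long lastBlockOfBatch =
        Math.max(chainHeight - (batchNumber * BATCH_SIZE), lastBlockNumberToRetainTrieLogsFor);
    return base + "-" + firstBlockOfBatch + "-" + lastBlockOfBatch;
  }

  public static void main(final String[] args) {
    // prints "trieLogsToRetain-100000-80000"
    System.out.println(batchFileName("trieLogsToRetain", 100_000L, 70_000L, 1L));
    // prints "trieLogsToRetain-80000-70000"
    System.out.println(batchFileName("trieLogsToRetain", 100_000L, 70_000L, 2L));
  }
}
```

With a chain height of 100000 and a retention floor of 70000, the second batch is clamped at the floor, so its name ends in 80000-70000.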

final long firstBlockOfBatch = chainHeight - ((batchNumber - 1) * BATCH_SIZE);

final long lastBlockOfBatch =
Math.max(chainHeight - (batchNumber * BATCH_SIZE), lastBlockNumberToRetainTrieLogsFor);

final List<Hash> trieLogKeys =
getTrieLogKeysForBlocks(blockchain, firstBlockOfBatch, lastBlockOfBatch);

saveTrieLogBatches(batchFileNameBase, rootWorldStateStorage, batchNumber, trieLogKeys);
LOG.info("Saving trie logs to retain in file (batch {})...", batchNumber);
saveTrieLogBatches(batchFileName, rootWorldStateStorage, trieLogKeys);
}

LOG.info("Clear trie logs...");
@@ -118,15 +123,12 @@ private static void processTrieLogBatches(
}

private static void saveTrieLogBatches(
final String batchFileNameBase,
final String batchFileName,
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final long batchNumber,
final List<Hash> trieLogKeys) {

LOG.info("Saving trie logs to retain in file (batch {})...", batchNumber);

try {
saveTrieLogsInFile(trieLogKeys, rootWorldStateStorage, batchNumber, batchFileNameBase);
saveTrieLogsInFile(trieLogKeys, rootWorldStateStorage, batchFileName);
} catch (IOException e) {
LOG.error("Error saving trie logs to file: {}", e.getMessage());
throw new RuntimeException(e);
@@ -210,9 +212,8 @@ private static void recreateTrieLogs(
final String batchFileNameBase)
throws IOException {
// process in chunk to avoid OOM

IdentityHashMap<byte[], byte[]> trieLogsToRetain =
readTrieLogsFromFile(batchFileNameBase, batchNumber);
final String batchFileName = batchFileNameBase + "-" + batchNumber;
IdentityHashMap<byte[], byte[]> trieLogsToRetain = readTrieLogsFromFile(batchFileName);
final int chunkSize = ROCKSDB_MAX_INSERTS_PER_TRANSACTION;
List<byte[]> keys = new ArrayList<>(trieLogsToRetain.keySet());

@@ -265,11 +266,10 @@ private static void validatePruneConfiguration(final DataStorageConfiguration co
private static void saveTrieLogsInFile(
final List<Hash> trieLogsKeys,
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final long batchNumber,
final String batchFileNameBase)
final String batchFileName)
throws IOException {

File file = new File(batchFileNameBase + "-" + batchNumber);
File file = new File(batchFileName);
if (file.exists()) {
LOG.error("File already exists, skipping file creation");
return;
@@ -285,24 +285,67 @@ private static void saveTrieLogsInFile(
}

@SuppressWarnings("unchecked")
private static IdentityHashMap<byte[], byte[]> readTrieLogsFromFile(
final String batchFileNameBase, final long batchNumber) {
static IdentityHashMap<byte[], byte[]> readTrieLogsFromFile(final String batchFileName) {
Review comment (Contributor):

Non-blocking feedback:

I am all for code reuse, but if we are going to allow arbitrary import and export, the import files should be more readable and "createable".

The ObjectOutputStream seems fine for backup/recovery of a pruning process, but as part of a general import/export process this file format is too inscrutable IMO.

At least for import/export we should serialize/deserialize these as JSON maps, with the hash string as the key and the trie log itself as hex (or as a rich JSON object if we wanted to be super transparent). In addition to being a bit more introspectable, it would allow us to create and import our own handcrafted trie logs when debugging.
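
A minimal sketch of the JSON shape suggested above, assuming a flat object that maps each block hash (as a hex string) to the serialized trie log (also as hex). The class and method names are hypothetical, only the export direction is shown, and the code uses the JDK's HexFormat instead of any Besu or Tuweni type so that it runs on its own. It is not the format this PR ended up with; the later commits switch to RLP.

```java
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: renders trie logs as a JSON object of hex strings. */
final class TrieLogJsonSketch {

  private static final HexFormat HEX = HexFormat.of();

  /** Encodes blockHash -> trieLog pairs as {"0x<hash>":"0x<trieLog>", ...}. */
  static String toJson(final Map<byte[], byte[]> trieLogs) {
    // Hex strings contain no characters that need JSON escaping,
    // so plain concatenation is enough for this illustration.
    final StringJoiner json = new StringJoiner(",", "{", "}");
    trieLogs.forEach(
        (blockHash, trieLog) ->
            json.add(
                "\"0x" + HEX.formatHex(blockHash) + "\":\"0x" + HEX.formatHex(trieLog) + "\""));
    return json.toString();
  }

  public static void main(final String[] args) {
    // Toy two-byte "hash" and one-byte "trie log", purely for illustration.
    final Map<byte[], byte[]> trieLogs = new LinkedHashMap<>();
    trieLogs.put(new byte[] {(byte) 0xab, 0x01}, new byte[] {(byte) 0xc0});
    System.out.println(toJson(trieLogs)); // prints {"0xab01":"0xc0"}
  }
}
```

A file in this shape can be read and edited by hand, which is the debugging benefit the comment describes; the trade-off is roughly double the size of a binary encoding.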


IdentityHashMap<byte[], byte[]> trieLogs;
try (FileInputStream fis = new FileInputStream(batchFileNameBase + "-" + batchNumber);
try (FileInputStream fis = new FileInputStream(batchFileName);
ObjectInputStream ois = new ObjectInputStream(fis)) {

trieLogs = (IdentityHashMap<byte[], byte[]>) ois.readObject();

} catch (IOException | ClassNotFoundException e) {

LOG.error(e.getMessage());
throw new RuntimeException(e);
}

return trieLogs;
}

private static void saveTrieLogsAsRlpInFile(
final List<Hash> trieLogsKeys,
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final String batchFileName) {
File file = new File(batchFileName);
if (file.exists()) {
LOG.error("File already exists, skipping file creation");
return;
}

final IdentityHashMap<byte[], byte[]> trieLogs =
getTrieLogs(trieLogsKeys, rootWorldStateStorage);
final Bytes rlp =
RLP.encode(
o ->
o.writeList(
trieLogs.entrySet(), (val, out) -> out.writeRaw(Bytes.wrap(val.getValue()))));
try {
Files.write(file.toPath(), rlp.toArrayUnsafe());
} catch (IOException e) {
LOG.error(e.getMessage());
throw new RuntimeException(e);
}
}

static IdentityHashMap<byte[], byte[]> readTrieLogsAsRlpFromFile(final String batchFileName) {
try {
final Bytes file = Bytes.wrap(Files.readAllBytes(Path.of(batchFileName)));
final BytesValueRLPInput input = new BytesValueRLPInput(file, false);

input.enterList();
final IdentityHashMap<byte[], byte[]> trieLogs = new IdentityHashMap<>();
while (!input.isEndOfCurrentList()) {
final Bytes trieLogBytes = input.currentListAsBytes();
TrieLogLayer trieLogLayer =
TrieLogFactoryImpl.readFrom(new BytesValueRLPInput(Bytes.wrap(trieLogBytes), false));
trieLogs.put(trieLogLayer.getBlockHash().toArrayUnsafe(), trieLogBytes.toArrayUnsafe());
}
input.leaveList();

return trieLogs;
} catch (IOException e) {
throw new RuntimeException(e);
}
}

private static IdentityHashMap<byte[], byte[]> getTrieLogs(
final List<Hash> trieLogKeys, final BonsaiWorldStateKeyValueStorage rootWorldStateStorage) {
IdentityHashMap<byte[], byte[]> trieLogsToRetain = new IdentityHashMap<>();
@@ -357,5 +400,25 @@ static void printCount(final PrintWriter out, final TrieLogCount count) {
count.total, count.canonicalCount, count.forkCount, count.orphanCount);
}

static void importTrieLog(
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage, final Path trieLogFilePath) {

var trieLog = readTrieLogsAsRlpFromFile(trieLogFilePath.toString());

var updater = rootWorldStateStorage.updater();
trieLog.forEach((key, value) -> updater.getTrieLogStorageTransaction().put(key, value));
updater.getTrieLogStorageTransaction().commit();
}

static void exportTrieLog(
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final List<Hash> trieLogHash,
final Path directoryPath)
throws IOException {
final String trieLogFile = directoryPath.toString();

saveTrieLogsAsRlpInFile(trieLogHash, rootWorldStateStorage, trieLogFile);
}

record TrieLogCount(int total, int canonicalCount, int forkCount, int orphanCount) {}
}
@@ -19,16 +19,19 @@

import org.hyperledger.besu.cli.util.VersionProvider;
import org.hyperledger.besu.controller.BesuController;
import org.hyperledger.besu.datatypes.Hash;
import org.hyperledger.besu.ethereum.chain.MutableBlockchain;
import org.hyperledger.besu.ethereum.storage.StorageProvider;
import org.hyperledger.besu.ethereum.trie.bonsai.storage.BonsaiWorldStateKeyValueStorage;
import org.hyperledger.besu.ethereum.trie.bonsai.trielog.TrieLogPruner;
import org.hyperledger.besu.ethereum.worldstate.DataStorageConfiguration;
import org.hyperledger.besu.ethereum.worldstate.DataStorageFormat;

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;
@@ -43,7 +46,12 @@
description = "Manipulate trie logs",
mixinStandardHelpOptions = true,
versionProvider = VersionProvider.class,
subcommands = {TrieLogSubCommand.CountTrieLog.class, TrieLogSubCommand.PruneTrieLog.class})
subcommands = {
TrieLogSubCommand.CountTrieLog.class,
TrieLogSubCommand.PruneTrieLog.class,
TrieLogSubCommand.ExportTrieLog.class,
TrieLogSubCommand.ImportTrieLog.class
})
public class TrieLogSubCommand implements Runnable {

@SuppressWarnings("UnusedVariable")
@@ -123,6 +131,102 @@ public void run() {
}
}

@Command(
name = "export",
description = "This command exports the trie log of a determined block to a binary file",
mixinStandardHelpOptions = true,
versionProvider = VersionProvider.class)
static class ExportTrieLog implements Runnable {

@SuppressWarnings("unused")
@ParentCommand
private TrieLogSubCommand parentCommand;

@SuppressWarnings("unused")
@CommandLine.Spec
private CommandLine.Model.CommandSpec spec; // Picocli injects reference to command spec

@CommandLine.Option(
names = "--trie-log-block-hash",
description =
"Comma separated list of hashes from the blocks you want to export the trie logs of",
split = " {0,1}, {0,1}",
arity = "1..*")
private List<String> trieLogBlockHashList;

@CommandLine.Option(
names = "--trie-log-file-path",
description = "The file you want to export the trie logs to",
arity = "1..1")
private Path trieLogFilePath = null;

@Override
public void run() {
if (trieLogFilePath == null) {
trieLogFilePath =
Paths.get(
TrieLogSubCommand.parentCommand
.parentCommand
.dataDir()
.resolve("trie-logs.bin")
.toAbsolutePath()
.toString());
}

TrieLogContext context = getTrieLogContext();

final List<Hash> listOfBlockHashes =
trieLogBlockHashList.stream().map(Hash::fromHexString).toList();

try {
TrieLogHelper.exportTrieLog(
context.rootWorldStateStorage(), listOfBlockHashes, trieLogFilePath);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}

@Command(
name = "import",
description = "This command imports a trie log exported by another besu node",
mixinStandardHelpOptions = true,
versionProvider = VersionProvider.class)
static class ImportTrieLog implements Runnable {

@SuppressWarnings("unused")
@ParentCommand
private TrieLogSubCommand parentCommand;

@SuppressWarnings("unused")
@CommandLine.Spec
private CommandLine.Model.CommandSpec spec; // Picocli injects reference to command spec

@CommandLine.Option(
names = "--trie-log-file-path",
description = "The file you want to import the trie logs from",
arity = "1..1")
private Path trieLogFilePath = null;

@Override
public void run() {
if (trieLogFilePath == null) {
trieLogFilePath =
Paths.get(
TrieLogSubCommand.parentCommand
.parentCommand
.dataDir()
.resolve("trie-logs.bin")
.toAbsolutePath()
.toString());
}

TrieLogContext context = getTrieLogContext();

TrieLogHelper.importTrieLog(context.rootWorldStateStorage(), trieLogFilePath);
}
}

record TrieLogContext(
DataStorageConfiguration config,
BonsaiWorldStateKeyValueStorage rootWorldStateStorage,