[cluster-test] Log rotate libra.log #1585
testsuite/cluster-test/src/main.rs
now.second()
);
let suffix = &suffix;
info!("Fill use suffix {} for log rotation", suffix);
nit: typo in Fill
@bors-libra delegate+
✌️ @andll can now approve this pull request
instance
.run_cmd_tee_err(vec!["sudo", "rm", "-rf", "/data/libra/*db"])
.map_err(|e| info!("Failed to wipe {}: {:?}", instance, e))
.ok();
nit: Correct me if I'm wrong, `.ok()` here and below is a no-op, right? If so, can we remove it?
`ok()` here is to suppress a linter warning; otherwise it complains about an unused `Result`.
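A minimal sketch of the pattern discussed in this thread, using only the standard library. `Instance`, `run_cmd`, and `wipe_best_effort` are hypothetical stand-ins for illustration (the real code calls `run_cmd_tee_err` and logs with `info!`); the point is only that a bare `Result` statement triggers the `unused_must_use` lint, while converting it to an `Option` with `.ok()` does not.

```rust
use std::fmt;

// Hypothetical stand-in for the cluster-test instance type.
struct Instance {
    name: String,
}

impl fmt::Display for Instance {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)
    }
}

impl Instance {
    // Hypothetical stand-in for `run_cmd_tee_err`: fails on an empty command.
    fn run_cmd(&self, args: &[&str]) -> Result<(), String> {
        if args.is_empty() {
            Err("empty command".to_string())
        } else {
            Ok(())
        }
    }
}

// Best-effort wipe: record the failure, then deliberately drop the result.
// `log` stands in for the `info!` macro so the behavior can be observed.
fn wipe_best_effort(instance: &Instance, args: &[&str], log: &mut Vec<String>) {
    instance
        .run_cmd(args)
        // map_err runs the closure only on Err and yields Result<(), ()>.
        .map_err(|e| log.push(format!("Failed to wipe {}: {:?}", instance, e)))
        // Result is #[must_use]; Option is not, so `.ok()` silences the lint.
        .ok();
}
```

Writing `let _ = instance.run_cmd(args).map_err(...);` would be another way to discard the value; which form reads better is a matter of taste.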
This diff uses the log file created in diem#1584 and rotates it on every deploy.

Why do log rotation on deploy? Since we wipe the db here, we pretty much start from a "clean slate", so the logs can be invalidated at this point.
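As a rough illustration of rotating on deploy, the sketch below builds a timestamp suffix and the command that would rename the host log file. It is an assumption-laden sketch, not the PR's implementation: `rotation_suffix`, `rotate_cmd`, the suffix layout, and the use of `mv` are all hypothetical; only the host path `/data/libra/libra.log` comes from diem#1584.

```rust
// Hypothetical helper: formats timestamp fields into a sortable suffix,
// e.g. 20191105-090307. The real suffix layout in the PR may differ.
fn rotation_suffix(
    year: i32,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
) -> String {
    format!(
        "{:04}{:02}{:02}-{:02}{:02}{:02}",
        year, month, day, hour, minute, second
    )
}

// Hypothetical helper: builds the argv that renames the current log so the
// next deploy starts with a fresh file. Using `mv` is an assumption.
fn rotate_cmd(log_path: &str, suffix: &str) -> Vec<String> {
    vec![
        "sudo".to_string(),
        "mv".to_string(),
        log_path.to_string(),
        format!("{}.{}", log_path, suffix),
    ]
}
```

Zero-padding each field keeps rotated files in chronological order under a plain lexicographic `ls`.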
@bors-libra r=ankushagarwal
📌 Commit f445b8b has been approved by
☀️ Test successful - checks-circle_commit_workflow
* Use GIT_REVISION env if already exist setup env for cluster test Closes: diem#1572 Approved by: andll * [cluster-test] Bump experiment deadline Looks like in rare cases reboot can take longer then 10min, this is rare, but in order to not fail experiment it makes sense to bump this deadline Closes: diem#1579 Approved by: dmitri-perelman * [x] skip running cargo if we have no packages to run Currently if an empty iterator is passed to `run_on_packages_together`, we'll happily run the cargo command with no `--package` args. This patch fixes this behavior and instead checks to see if we have any package args and does an early return if the provided iterator is empty. Closes: diem#1578 Approved by: metajack * [docker] Move docker CMD into docker-run.sh Instead of having complicated command in docker CMD, this diff moves setting up environment into `docker-run.sh` file and sets CMD to this shell file. Closes: diem#1580 Approved by: opsguy * [enhancement] Follow-up changes in libra_channel Summary Implement Drop trait for Sender and Receiver Keep track of when the receiver is dropped. The Sender will log crit! whenever it tries to send to a Receiver which has been dropped. When a Sender gets dropped, we will log a crit! message as well Related to diem#1483 Closes: diem#1490 Approved by: ankushagarwal * [enhancement] Refactor libra_channel and remove MessageQueue trait Closes: diem#1573 Approved by: ankushagarwal * [terraform] Allow to redirect output to a file This diff introduces log_to_file variable, when set to true log output will be redirected to /opt/libra/data/libra.log (in container) and /data/libra/libra.log (on host) Closes: diem#1584 Approved by: opsguy * [cluster-test] Log rotate libra.log This diff uses log file created in diem#1584 and rotates this log on every deploy Why do log rotation on deploy? 
Since we wipe db here, we pretty much start from "clean plate", so logs can be invalidated at this point Closes: diem#1585 Approved by: ankushagarwal * [executor] add transaction info hashes to ProcessedVMOutput in TransactionData In order for SafetyRules to verify that the parent blocks state output is extended by the childs output, we need the new leaves (transaction info hashes) of the child block's output to compute the new tree and root. This commit adds these values to the output received by consensus. Closes: diem#1537 Approved by: davidiw * [types] Add AccumulatorExtensionProof This proof is used to demonstrate how one root hash is an extension of a previous. The expectation is that the prover will give a tree (frozen subtrees + the number of leaves) and new leaves. The tree computes to a known hash and the new leaves, when appended, produce the new hash allowing the verified to independently compute that hash with a proof of correctness via (unfortunately) repeated work. Also added a version tag as there's a bit of copypasta logic on calculating versions. Having it in the tree itself helps ensure a consistent (and accessible) definition. Closes: diem#1537 Approved by: davidiw * [consensus] verify linear history of ledger in safety rules - VoteProposal replaces version and state_id with a proof of extension from the previous parent. This extension can be used to compute the proposal's (block's) version and root hash (state_id). 
- Expose a getter for the transaction_info_hashes from ExecutedBlock that returns only Some types - Simplified the output of VoteProposal to only print whatever the block prints - Added a new error to detect bad accumulator extensions - Add a safety rules test to verify breakage of linear history - Utilized smoke tests to verify event processor is correct (it breaks if you don't pass the right execution state into safety rules) Closes: diem#1537 Approved by: davidiw * [consensus][reconfig] dispatch network messages based on epoch Closes: diem#1574 Approved by: davidiw * [network] Add health-checker network interface + Some types (`Ping2`, `Pong2`) are temporarily suffixed with "2". We'll remove this once the refactor is complete. Closes: diem#1581 Approved by: phlip9 * [vm] fix oom discovered by fuzzer An oom was discovered by fuzzing the interface used to deserialize a VM Value given a VM StructDef. The oom is caused by the deserialization logic reading a u32 from the input stream, which is intended to be the length of a variable length vector, and attempting to pre-allocate a vector of that length. The fix to this oom is to: 1) Stop attempting to pre-allocate the entire vector and instead allocate entries as needed. 2) push the checks earlier which determine if this is even a valid `NativeStructType`. This works in this particular case because the inputs provided by the fuzzer are not guaranteed to be a valid (StructDef, bytes) pairs and by the time we try to deserialize each element (of the potentially very long vector) another deserialization error is encountered. Closes: diem#1586 Approved by: dariorussi * [vm-types] impl DeserializeSeed for StructDef and friends Instead of implementing ad-hoc deserialize_* methods for StructDef and all of its container types, implement DeserializeSeed for these types which is the more idiomatic and Serde supported way for doing stateful deserialization. 
Closes: diem#1586 Approved by: dariorussi * [lcs] introduce from_bytes_seed Introduce the `from_bytes_seed` method which enables stateful deserialization from a `&[u8]` based on some initial state `seed`. This is the more idiomatic and recommended way to perform stateful deserialization using serde. Now that there is a way to do stateful deserialization, remove the public export of the Deserializer type defined in LCS. Closes: diem#1586 Approved by: dariorussi * [reconfig] refactor ValidatorChangeEventWithProof We change the ValidatorChangeEventWithProof struct to prepare for usage, it carries a list of LedgerInfo that has validator set for corresponding epoch and we implement the function to verify it given a known validator set. Closes: diem#1589 Approved by: dmitri-perelman * [full-node] Fixing flaky test_full_node_basic_flow Closes: diem#1591 Approved by: phoenix-antigravity * [crypto] assert private keys not cloneable Closes: diem#1484 Approved by: huitseeker * [x] add bench command Closes: diem#1593 Approved by: metajack * [crypto] modularize union macros Closes: diem#1512 Approved by: kchalkias * [language] add get and set for Vector I was reading some code that used the struct MyStruct<T> pattern and was temporarily confused--why do we allow this? I remembered that there was a good answer and that a motivating use-case is in Vector: this lets us have a single Vector implementation with a strict API for resource types and a more permissive API for unrestricted types. However, the current Vector API doesn't have any procedures that allow a more permissive API for unrestricted types. This PR fixes that by adding two such procedures: get and set. These examples demonstrate why the pattern makes sense + makes vector nicer to use for unrestricted types. 
Closes: diem#1601 Approved by: davidiw * [consensus][reconfig] adopt ValidatorChangeEventWithProof in consensus Closes: diem#1597 Approved by: aching * [Docker] look up git rev in build container Building the libra-metrics crate needs to look up the git rev it was built with. Currently we pass in the git rev as docker build arg when building inside the container. Putting these args before the rust build cmd also causes layer cache miss for the build layer. We can try look it up inside the container. The docker setup is updated such that the .git directory is copied inside the container. The rust build process can then look it up. Closes: diem#1603 Approved by: bmwill * [admission-control] Refactor AC Closes: diem#1566 Approved by: sunmilee * [metrics] Convert RPC bytes and DirectSend bytes metrics to histogram Closes: diem#1595 Approved by: bothra90 * [grafana] Update network dashboard to use new histogram metrics Now uses `irate(libra_network_direct_send_bytes_sum)` to get total outbound throughput for DirectSend Protocol. Closes: diem#1595 Approved by: bothra90 * [StateSynchronizer] Revert the flow that would allow sync to an arbitrary target. Summary: We were trying to come up with a simplified solution that would allow both the validators and the full nodes operate in a similar fashion by trying to state sync to the highest possible ledger info. Unfortunately this flow significantly complicates the logic in consensus due to the new reconfiguration design because it opens a possibility for an EventProcessor to accidentally state sync to another epoch. The basic paradigm is for the EventProcessor to operate in one epoch only, hence complicating that logic is not worth the effort. In addition to that we realized that the new flow did not really solve the liveness attack of non-reachable state sync. 
Instead, we are planning to update storage to support idempotent commits, which would mean that in future state sync should be able to just "give up" and return without changing the visible storage state. Testing: existing unit test coverage Closes: diem#1600 Approved by: zekun000 * [crypto] remove secret-service secret-service was the precursor to the SafetyRules TCB. Later in development, we realized the need to know what we are signing and not delegate that to another process. Thus the model used by secret-service has been superseded by that used by SafetyRules. As such, secret-service has been deprecated and is being removed. Closes: diem#1582 Approved by: mimoo * [language] Include other block informations in `LibraSystem.BlockMetadata` Closes: diem#1489 Approved by: sblackshear * [network] Add benchmarks for yamux-over-yamux multiplexing Closes: diem#1592 Approved by: phlip9 * [move] Beginning of a limited Move source language Move source language is an ergonomic language for writing Modules and Scripts that compile to Move bytecode. Closes: diem#1506 Approved by: tzakian * [consensus][reconfig] impl EpochRetrieval flow to help nodes advance epoch Closes: diem#1608 Approved by: dmitri-perelman * [Consensus] Clean up the pending messages of the previous epoch Summary: When we start a new epoch there might be still pending messages in the LibraChannel between networking and SMR event loop. In this diff we introduce a capability of cleaning up the LibraChannel from all the messages and use it in NetworkReceivers. (As a side effect this also gives a way to GC old keys from LibraChannel). Testing: Added a unit test for the LibraChannel. Closes: diem#1612 Approved by: dmitri-perelman * [x] add lint command Introduce a `lint` command to x. `lint` is the begining of a general purpose linting engine for libra which we can use to write custom lints which may not be covered by clippy and which we may want to run on non-rust files. 
Closes: diem#1609 Approved by: davidiw * [lint] ensure all text files have a newline at EOF Closes: diem#1609 Approved by: davidiw * [lint] ensure all text files have no trailing whitespace Closes: diem#1609 Approved by: davidiw * [lint] ensure all *.{rs,sh} files have a license header Closes: diem#1609 Approved by: davidiw * [ci] run 'cargo x lint' Closes: diem#1609 Approved by: davidiw * [Docker][cluster_test] look up git rev in container Closes: diem#1619 Approved by: andll * [cluster-test] Debug info for mint failures - Print which instance mint running on - Allow to disable mint retry with NO_MINT_RETRY env var - Prefix all tx emitter thread logs with instance name to understand where to look at Closes: diem#1617 Approved by: dmitri-perelman * [network] Simplify the property of Discovery * Add self "PeerId" so that we can avoid deserializing it again and again * Prefix "note" with an underscore as the property is not used for now Closes: diem#1412 Approved by: phlip9 * [network] Refactor out common `NetworkEvents` implementations + There was a lot of code duplication from implementing many wrappers around a `channel::Receiver<NetworkNotification>` that would just deserialize inbound rpc and direct-send messages. + Initially, we only had `mempool` and `consensus` interfaces, so the code duplication was not a problem. Today, however, we have more and so the code duplication is getting to be a problem. + This change adds a `NetworkEvents<T: prost::Message>`, which is a `Stream` that reads messages from a `channel::Receiver<NetworkNotification>` and deserializes them into `T: prost::Message`. + Next, we can try to refactor out some common parts for the `NetworkSender` interface. 
Closes: diem#1599 Approved by: phlip9 * [mempool] Change log level of raw bytes debug->trace Raw bytes sent pollutes log quite a bit even in debug mode, and unlikely to be used Closes: diem#1623 Approved by: phoenix-antigravity * [language] remove CreateAccount bytecode Remove the bytecode CreateAccount in favor of a native function. This is mostly deleting code and a decent change in our proptest for e2e tests. We have now enabled gas on create account. That was problematic because create account has different gas cost depending on whether the event counter in the account that sends the transaction had been created or not. So we need to track that operation. Closes: diem#1594 Approved by: sblackshear * [network] Add common `NetworkSender` network interface. + All network applications will use a wrapped `NetworkSender<TMessage>` where `TMessage` is their own protobuf message format. + `NetworkSender` handles the "low-level" message-based interface to network, so the upper network applications only see a nice async-await API. + Adds a `send_to_many` method optimized for sending the same message to many peers. Consensus currently uses something like this for their proposal broadcast, so we will explicitly pull this into the network interface. + Adds `dial_peer` and `disconnect_peer` methods to support forthcoming work to pull out `gossip-discovery` and `connectivity-manager` into network application modules. Closes: diem#1621 Approved by: bothra90 * [Storage] Add proptest for JellyfishMerkleTree::get_with_proof The existing tests probably do not have enough coverage. I wrote some code and they did not catch a bug in it. Only the executor tests were able to catch it. Add some more tests here to help catch bugs earlier. 
Closes: diem#1610 Approved by: lightmark * [ci] pin rustc to beta-2019-10-03 to avoid using 1.40 beta Closes: diem#1632 Approved by: dmitri-perelman * [language] On-chain definition of gas schedule The gas schedule within the VM is defined on-chain, and is read in once per-block. Normal transactions initiatied from the association account can add, or update the costs within this gas schedule. Closes: diem#1406 Approved by: dariorussi * [mempool] commit old transactions on transaction insertion Fixes diem#1625 When node does state sync, transactions that are committed through this sync are not removed from mempool. This means that even though transactions are committed, client can not use this node to submit new transactions, until they expire, which can take significant amount of time. We do state sync quite often, so this reproduce routinely during cluster test - we had to add retry on transaction submission because of this problem Closes: diem#1633 Approved by: andll * [monitoring] change public metric whitelist to be a constant Closes: diem#1571 Approved by: bmwill * [cluster-test] Increase deploy startup timeout Closes: diem#1636 Approved by: ankushagarwal * [x] skip whitespace lints for .exp files In some future patch a number of .exp files are going to be added to the repository. These files are used for testing the move source language and include the expected output from the move compiler. Since these files are the project of a third-party library its difficult to completely control their format and ensure that there are no whitespace violations. Due to this, lets just skip whitespace lints for all .exp files. 
Closes: diem#1634 Approved by: tnowacki * [x] simplify extension checking in license lint Closes: diem#1634 Approved by: tnowacki * [language] Implement block metadata transaction logic Closes: diem#1611 Approved by: dariorussi * [language] Deprecate TransactionPayload::Program Closes: diem#1626 Approved by: dariorussi * [cluster-test] Remove kinesis log tail This was replaced with debug_interface_log_tail Closes: diem#1641 Approved by: ankushagarwal * [cluster-test] Remove log prune Log prune was previously used to cleanup cloudwatch logs Since we don't longer use cloud watch this is not needed Closes: diem#1641 Approved by: ankushagarwal * [network][consensus] ConsensusNetworkSender now wraps `NetworkSender` + Refactored `chained_bft::NetworkSender` so it uses `ConsensusNetworkSender::send_to_many` instead of the previous `send_bytes` method, which no longer needs to exist. + Added `ValidatorVerifier::get_account_addresses_iter()`. We should be able to remove `get_ordered_account_addresses()` in a subsequent commit, since it does an unnecessary sort (`BTreeMap` already sorts on insertion). Closes: diem#1643 Approved by: bothra90 * [Storage][JellyfishMerkle] Extract some util functions in tests So we don't repeat the code. Closes: diem#1648 Approved by: lightmark * [Storage][JellyfishMerkle] Fix get_with_proof for edge cases If there exist two keys that only differ from the last nibble, the code would have a problem. Closes: diem#1648 Approved by: lightmark * [cluster-test] Use sync log Instead of using async drain, using sync. Mainly two reasons: - Some parts of cluster test use println! for better UX, but when async drain is used output of println! and log! macro is mixed in a bad way - if program uses log! 
macro and terminates quickly, part of output can disappear because async thread did not process log There is no intense log output in cluster test so sync log is not an issue Closes: diem#1649 Approved by: ankushagarwal * [cluster-test] Update genesis.blob on deploy This will fetch genesis.blob generated by circle during deploy We need this because genesis.blob is updated relatively frequently and it breaks cluster test every time Fixes diem#1224 Closes: diem#1651 Approved by: ankushagarwal * LedgerInfo commit information aggregated in a BlockInfo struct Summary: The fields of LedgerInfo that describe the committed status of the Ledger are in fact identical to the fields of the block metadata that Consensus is carrying around (the only field of BlockInfo that is currently not present in LedgerInfo is a round). Any update to the LedgerInfo would have to be mirrored in BlockInfo because TCB needs to verify & sign it. Hence, this change is aggregating the LedgerInfo fields in the BlockInfo. We had to move BlockInfo from consensus types to libra types as a result of that. Testing: this is supposed to be a noop, existing unit test coverage Ref diem#1604 Closes: diem#1629 Approved by: zekun000 * [language][Move] Added test framework - Added test framework for Move lang expected output tests - Added tests to check all of the stdlib files Closes: diem#1624 Approved by: vgao1996 * [cluster-test] Introduce some convenience utils ``` ./cluster-test --discovery <print list of nodes> ./cluster-test --pssh -- echo Hello world <execute commands on all nodes> ``` Closes: diem#1647 Approved by: ankushagarwal * [network][mempool] `MempoolNetworkSender` now wraps `NetworkSender` Closes: diem#1644 Approved by: phlip9 * [network][state-sync] `StateSynchronizerSender` now wraps `NetworkSender` + Rename `STATE_SYNCHRONIZER_MSG_PROTOCOL` to `STATE_SYNCHRONIZER_DIRECT_SEND_PROTOCOL` so it's consistent with other network application modules. 
Closes: diem#1644 Approved by: phlip9 * [network][ac] `AdmissionControlNetworkSender` now wraps `NetworkSender` Closes: diem#1644 Approved by: phlip9 * [network] `HealthCheckerNetworkSender` now wraps `NetworkSender` Closes: diem#1644 Approved by: phlip9 * [cluster-test] Use sudo when updating genesis.blob on deploy Closes: diem#1654 Approved by: ankushagarwal * [cluster-test] Reload faucet account on every tx emit job (1) Faucet account data can get stale and needs to be reloaded between jobs (2) Cluster might not be healthy on cluster test startup and it might not be possible to load faucet account at that time Closes: diem#1655 Approved by: ankushagarwal * [rust] migrate to stable toolchain Noew that 1.39.0 has stabilized, remove the rust-toolchain file and configure ci to use the stable toolchain. Closes: diem#1656 Approved by: metajack * [consensus][reconfig] help peers who sent old epoch messages As part of reconfiguration, honest nodes in new epoch need to help others still in old epoch. Otherwise we have a risk that one honest node join the new epoch and stop consensus, and remaining 2f are not able to make any progress. Closes: diem#1622 Approved by: dmitri-perelman * [CI] add conditional docker build to commit verify When we make changes to these docker files, we want to verify them in the commit work flow. Or we end up breaking nightly. This is a follow-up to 243f535. Here we added a conditional docker build job to the commit verify work flow. The docker build job will kick off iff the PR contains change in any file matches `*.Dockerfile`. Once triggered, the work flow will build each of the updated docker files. Closes: diem#1542 Approved by: huitseeker * [admission-control] Add service test into lib Closes: diem#1640 Approved by: phlip9 * add a rust-toolchain file back in This has many advantages: 1. It pushes our builds towards hermeticity, which having worked on devtools for many years I've come to believe is *a priori* good. 2. 
It would be possible to make CI hermetic while making dev workflows not so much, but I believe that making dev builds as close to CI as possible is *a priori* good. 3. It completely obviates the need for manually requesting that developers upgrade Rust versions for new features -- that can simply be managed through tooling, as it should be. Closes: diem#1662 Approved by: bmwill * [cluster-test] Log command used to fetch genesis.blob Closes: diem#1665 Approved by: dmitri-perelman * [easy] More idiomatic Option/Result patterns Closes: diem#1639 Approved by: zekun000 * [terraform] Increase monitoring instance disk volume rename the variable Closes: diem#1666 Approved by: ankushagarwal * [Proof] Introduce SparseMerkleRangeProof This proof intends to prove that a range of things exist in a sparse Merkle tree. Given that when restoring the state tree, we always go from left to right, so at any point in time a list of siblings on the right is sufficient to prove the everything on the left. The verification is a bit complex... The basic idea is that when we have the full list of leaves for a sparse Merkle tree, we can compute the common prefix length of each adjacent key pairs and find out which pair consists of the left child and right child of the same parent. Then we can compute their parent and reduce the problem. Note that we are just doing this in unit tests to test the `get_range_proof` method, the real verification will be a little bit different. Closes: diem#1577 Approved by: msmouse * [Crypto] Implement from_bit_iter for HashValue So we can easily transform a vector of boolean to a HashValue. Closes: diem#1577 Approved by: msmouse * Remove warning to cut types dependency on slog Closes: diem#1614 Approved by: bmwill * First version of the new borrow checker based on the abstraction of an acyclic labeled borrow graph. 
Closes: diem#1598 Approved by: tnowacki * [cluster-test] Add experiment to simulate multi-region environment and report result Summary This is the code which is used to run experiments with introducing network delays between nodes and simulation of multi-region With this we can specify a list of split sizes and delays and we will run the simulation for every combination of these two parameters Update NetworkDelay to be an Effect instead of Action Update flags for multi-region simulation because now we can run simulations with a variety of split sizes and delays instead of just one Complete rewrite of multi_region_network_simulation.rs to handle running simulations with a variety of configs Print a list of all metrics in a csv format at the end of the experiment Test Plan Tested this on my cluster Closes: diem#1652 Approved by: andll * [executor] idempotent commits Closes: diem#1627 Approved by: dmitri-perelman * [executor] refactor executor with synced_trees Closes: diem#1627 Approved by: dmitri-perelman * Make MIRAI happy Closes: diem#1664 Approved by: huitseeker * [consensus][restart] simplify the recovery flow given idempotent commits support We're able to greatly simplify the recovery process during restart thanks to the idempotent commit support. We could directly rely on the latest ledger info storages returns us and it's now guaranteed to exist in consensusdb due to the state sync failure handling, diem#1590 Also we don't need to continue sync upon restart. A side-effect of this pr is we now generate genesis virtually and never persist it into consensusdb. Closes: diem#1616 Approved by: dmitri-perelman * [consensus][reconfig] extend reconfiguration test for a few epochs Closes: diem#1616 Approved by: dmitri-perelman * [fuzzing] adding merkle tree proto fuzzing MOTIVATION: Adding two fuzzers. When receiving sparse and non-sparse merkle tree proofs, there is some involved proto decoding code that we can fuzz. 
Closes: diem#1232 Approved by: metajack * [storage] fix typo Closes: diem#1683 Approved by: wqfish * Make MIRAI happy Closes: diem#1678 Approved by: lightmark * [storage] EpochByVersionSchema Closes: diem#1635 Approved by: msmouse * [storage] `LedgerStore::get_epoch()` Closes: diem#1635 Approved by: msmouse * Add initial debugging instrumentation to tree_heap Closes: diem#1684 Approved by: cbarrettfb * [network] Add benchmark for transport with TCP_NODELAY set Closes: diem#1653 Approved by: ankushagarwal * [language] Remove unused ParseError::UnrecognizedToken The unused UnrecognizedToken error adds a type parameter for Token that propagates all over the place. Since this is not even used, remove it along with all those type parameters. We should definitely add more detailed error messages (probably only in the new move-lang compiler, not in the IR compiler) but whatever we do should not require the internal token types, since the diagnostics should describe issues in terms that are directly visible to end users. Closes: diem#1685 Approved by: tnowacki * [language] Clean up some remaining references to lalrpop for the IR compiler Closes: diem#1685 Approved by: tnowacki * [language][Move] Added expansion tests - Added tests for expansion pass. - Tried to cover +/- cases for each check Closes: diem#1660 Approved by: tzakian * Make MIRAI happy Closes: diem#1681 Approved by: huitseeker * make clippy happy * [consensus] removing the `terminate` in chained_bft_smr loop The consensus loop/select has a terminate that should not be reachable as the loop should run forever. This commit removes it. Closes: diem#1675 Approved by: zekun000 * [cluster-test] Update timeouts, print intermediate results Closes: diem#1689 Approved by: andll * [language] Refactor lexer to add a lookahead API There are some cases where the parser really needs to look ahead at the next token before deciding how to parse the current token. 
It was hard to support that with the original lexer that I hacked together from the lalrpop output, but now that the lexer is sane, it is not so hard. Add a new lookahead API and use it in the parser, replacing the current workarounds. Closes: diem#1688 Approved by: tnowacki * [easy][move] Sort errors by first Loc when displaying - Sort errors by the initial Loc, makes reading errors a lot easier in big lists Closes: diem#1693 Approved by: vgao1996 * [cluster-test] Test if file exists before log rotate When cluster test fails to setup cluster(for example, failed to cp genesis.blob), then there is no log file on host - previous was log rotate, and new one was not created. In this case attempt to log rotate it on next run produces a lot of noise. This checks if file exists before attempting to log rotate it Closes: diem#1692 Approved by: ankushagarwal * [vm] Clean up the code cache api. Closes: diem#1668 Approved by: dariorussi * [cluster-test] Bump liveness health check timeout 1m->2m We have intermittent failures with 1m timeout, this diff will double it Closes: diem#1697 Approved by: ankushagarwal * [CI] upgrade stretch to buster In CircleCI setup, bump rust:stretch to rust:buster for builders. This is to keep it consistent with Docker build. Closes: diem#1691 Approved by: bmwill * [consensus] enforce epoch consistency of messages We enforce every messages contain the information about the same epoch. (Fix a few missing verification too) Closes: diem#1694 Approved by: aching * [config] add a new config for safety rules Add a new config because this needs to be entirely managed (owned by Safety Rules) so that it can mutate it during run time. It is worth noting that currently ConsensusConfig is aware of this config, in the future, this would only be the case for testing. Safety Rules binary should load this file directly, whereas Safety Rules library would receive this as part of validator starting. 
Closes: diem#1615 Approved by: davidiw * [consensus] Add persistent storage for safety rules This introduces a persistent storage interface and two implementations for SafetyRules: - InMemory for (integration) testing purposes - OnDisk for ("production") testing purposes Eventually this same API should be able to be used by various Secrets Managers. Closes: diem#1615 Approved by: davidiw * [consensus] Move toward PersistentStorage interface for SafetyRules ConsensusState isn't really a store, SafetyRules needs a store. So this replaces the code to leverage a config backed storage unit and all the code mods that go with it. Note: because of the fact that consensus is multithreaded, PersistentStore must be both Send + Sync This commit also refactors the code within safety rules to take on a more library style approach as there are more features now within the code. Closes: diem#1615 Approved by: davidiw * [consensus] eliminate consensus state from consensus db SafetyRules has its own persistent storage, let's leverage it. Most of this code is just deleting the consensus state from consensus db The rest is setting up the appropriate tests so that the code leverages the new means for starting safety rules with a persistent backend This diff also solves some other issues that somehow overlapped with this work: - Epoch changes are somewhat handled in SafetyRules::update - Epochs begin at 1 Closes: diem#1615 Approved by: davidiw * [state sync] reduce error verbosity currently state sync prints full stack trace whenever routine error occurs it polluts log. This diff reduces verbosity Closes: diem#1707 Approved by: andll * [cluster-test] Emit transactions when running network delay simulations Summary This PR updates the multi_region_network_simulation experiment such that transactions are generated in the background. Without that, we would be simply processing empty blocks in the experiment. Test Plan Ran against my cluster. 
Closes: diem#1708 Approved by: andll
* [consensus] Update all consensus counters to new format Summary Stop using OP_COUNTERS Create a separate metric for each counter prefixed by "libra_consensus_" Use prometheus macros directly for creating counters Update all usages of consensus counters to new names fixup! [consensus] Update all consensus counters to new format Closes: diem#1663 Approved by: sherry-x
* [MIRAI] Add annotations to accumulator crate Closes: diem#1676 Approved by: wqfish
* [CI] docker build should check file exist Updated the docker builder to check if the docker file exists before triggering the build. Closes: diem#1712 Approved by: andll
* [language][move] Added naming tests - Added tests for naming pass. - Tried to cover +/- cases for each check Closes: diem#1699 Approved by: tzakian
* [Proof] Remove redundant assertion in `new` We should not have the assertion here. Clients would call `new` when constructing the proofs from protobuf objects, but the objects returned by servers are not guaranteed to be correct. So a bad server could cause the clients to crash. Since we will verify the proof later, it's probably redundant to do this particular check in `new`. Closes: diem#1715 Approved by: delegate
* [network] Set TCP_NODELAY for tcp connections Closes: diem#1686 Approved by: ankushagarwal
* [transaction-builder] key rotation builder takes bytes instead of address This builder API was misleading. It took an AccountAddress as input, but the rotate authentication key script expects bytes encoding the hash of a public key. Closes: diem#1711 Approved by: sblackshear
* [consensus] push RPC processing into event processor to make network thread non-blocking Closes: diem#1714 Approved by: andll
* [consensus] multi proposer election with SHA-3 **1. Multi proposer election** currently uses the siphash PRF, this is not super uber cool because the rest of the codebase uses SHA-3 and this adds a cryptographic dependency that we don't really need.
So I changed it to SHA-3. **2. there's a bias in the modular reduction** of the operation that computes a round's proposers. It is not worth fixing as the bias is negligible in practice; I've added a comment to indicate this. Closes: diem#1709 Approved by: dmitri-perelman
* [MIRAI] Add annotations to libra-wallet crate Closes: diem#1679 Approved by: huitseeker
* [crypto] Derive CryptoHasher Derive a CryptoHasher which ensures a distinct hasher string for every derivation. Add a few lines of documentation for the CryptoHasher derivation. [crypto] Add inline doc for CryptoHasher derive macro Closes: diem#1511 Approved by: davidiw
* [crypto] Remove superseded CryptoHasher implementations - in the process, eliminate the reuse of the Sparse Merkle Tree node hashers (!), - clean up some of the duplication in language, storage, .. Closes: diem#1511 Approved by: davidiw
* [crypto] Update genesis id after hashing change Closes: diem#1511 Approved by: davidiw
* [language] Update genesis blob after hashing change Closes: diem#1511 Approved by: davidiw
* [consensus] update safety rules documentation and add TODOs The documentation had fallen out of date as the code moved from within consensus, to its own file, to its own crate. In addition, there were race conditions due to concurrent work happening in the code base. Listing TODOs to better help track these. So ideally if a PR were to hit on a TODO it would either be addressed or updated to reflect the new world order. Closes: diem#1713 Approved by: zekun000
* [easy] Small rewrites to ?/if Recent versions of clippy will actually complain in the safety_rules case!
Remove match_bool from x.toml Closes: diem#1717 Approved by: aching
* [cluster-test] Fix metrics for multi-region-simulation Closes: diem#1719 Approved by: andll
* [MIRAI] Add annotations to libra-mempool crate Closes: diem#1682 Approved by: tzakian
* [state sync] use synced_state as known version Closes: diem#1721 Approved by: dmitri-perelman
* Update genesis.blob Used build_all.sh to regenerate correct version Closes: diem#1725 Approved by: huitseeker
* [Storage][JellyfishMerkle] Reorder a few code blocks I always find it hard to look for code in this file. Putting the `impl` blocks closer to the struct definition so things are structured more consistently. Closes: diem#1726 Approved by: lightmark
* [Storage][JellyfishMerkle] Use Self when possible Closes: diem#1726 Approved by: lightmark
* [CI] check genesis blob in commit verify Check the genesis blob consistency as part of commit verify. Closes: diem#1727 Approved by: andll
* [cluster-test] Make dockerfile for cluster test This diff introduces a single dockerfile for cluster test in docker/cluster-test/cluster-test.Dockerfile Since we want to support incremental build, this docker file is generated via generate.sh script Fixes diem#1669 Closes: diem#1705 Approved by: sausagee
* [cost-synthesis] Update creation of identifiers and fix bitrot Previously we could create invalid identifiers which would cause the generation to crash. We now make sure that all strings and identifiers conform to the identifier specification in types. Also, generated modules were previously not representing empty type parameter info. This fixes this as well. Closes: diem#1731 Approved by: shazqadeer
* update Cargo.log
* fix compiler error.
* fix gas_schedule and re generate genesis.
* merge pow.
* commit_info for LedgerInfo.
* add back some changes on libra
* fix star node bug.
* fix mint block bug.
This diff uses the log file created in #1584 and rotates it on every deploy.
Why rotate logs on deploy? Since we wipe the db here, we pretty much start from a clean slate, so the old logs can be invalidated at this point.
Depends on #1584, please do not use `r+`, use `delegate+` instead
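To make the rotation step concrete, here is a minimal sketch of how a timestamp suffix and a guarded rename command might be built. It is not the code from this PR: the helper names `rotation_suffix` and `rotate_log_cmd`, the suffix layout, and the exact shell command are assumptions for illustration. Only the `/data/libra/libra.log` host path (from #1584) and the idea of checking that the file exists before rotating come from the discussion.

```rust
/// Hypothetical helper: formats a sortable suffix such as `20191105_140203`
/// from calendar fields. The real code derives these from the current time.
fn rotation_suffix(year: i32, month: u32, day: u32, hour: u32, minute: u32, second: u32) -> String {
    format!(
        "{:04}{:02}{:02}_{:02}{:02}{:02}",
        year, month, day, hour, minute, second
    )
}

/// Hypothetical helper: builds a shell command that renames `log_path` to
/// `log_path.<suffix>`, but only when the file exists. If the file is
/// missing, the command does nothing and exits successfully.
fn rotate_log_cmd(log_path: &str, suffix: &str) -> String {
    format!(
        "if [ -f {0} ]; then sudo mv {0} {0}.{1}; fi",
        log_path, suffix
    )
}
```

The resulting string would be run through a shell on each instance right after the db wipe. Guarding with `[ -f ... ]` keeps a deploy that follows a failed setup from producing noise about a missing log file.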