[cluster-test] Log rotate libra.log #1585

Closed

Conversation

@andll andll commented Oct 31, 2019

This diff uses the log file created in #1584 and rotates it on every deploy.
Why rotate logs on deploy? Since we wipe the db here, we pretty much start from a "clean slate", so the old logs can be invalidated at this point.

Depends on #1584, please do not use r+, use delegate+ instead

now.second()
);
let suffix = &suffix;
info!("Fill use suffix {} for log rotation", suffix);
Contributor

nit: typo in Fill

@ankushagarwal
Contributor

@bors-libra delegate+

@bors-libra
Contributor

✌️ @andll can now approve this pull request

instance
.run_cmd_tee_err(vec!["sudo", "rm", "-rf", "/data/libra/*db"])
.map_err(|e| info!("Failed to wipe {}: {:?}", instance, e))
.ok();
Contributor

nit: Correct me if I'm wrong, .ok() here and below is a no-op right? If so, can we remove it?

Author

ok() here is to suppress linter warning - otherwise it complains about unused Result
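
For illustration, a minimal sketch of the pattern discussed in this thread. `run_cmd` is a hypothetical stand-in for `run_cmd_tee_err`, and the function names are assumptions made for the example. `Result` is marked `#[must_use]`, so dropping it in statement position triggers the `unused_must_use` warning, while converting it to an `Option` with `.ok()` lets it be dropped silently.

```rust
/// Hypothetical stand-in for a remote command that can fail.
fn run_cmd(fail: bool) -> Result<(), String> {
    if fail {
        Err("exit status 1".to_string())
    } else {
        Ok(())
    }
}

/// Best-effort call: log the error, then deliberately discard the outcome.
fn wipe_best_effort(fail: bool) {
    // Writing `run_cmd(fail).map_err(...);` on its own would warn about an
    // unused `Result`. `.ok()` turns it into an `Option<()>`, which can be
    // dropped without a warning.
    run_cmd(fail)
        .map_err(|e| eprintln!("Failed to wipe: {:?}", e))
        .ok();
}

/// Same chain, but returning the `Option` so the conversion is visible.
fn wipe_outcome(fail: bool) -> Option<()> {
    run_cmd(fail)
        .map_err(|e| eprintln!("Failed to wipe: {:?}", e))
        .ok()
}
```

So `.ok()` has no effect at run time here; it only tells the compiler that ignoring the result is intentional.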

@andll
Author

andll commented Oct 31, 2019

@bors-libra r=ankushagarwal

@bors-libra
Contributor

📌 Commit f445b8b has been approved by ankushagarwal

@bors-libra
Contributor

⌛ Testing commit f445b8b with merge e383a0b...

@bors-libra
Contributor

☀️ Test successful - checks-circle_commit_workflow
Approved by: ankushagarwal
Pushing e383a0b to master...

tiangong3624749 added a commit to starcoinorg/diem that referenced this pull request Nov 14, 2019
* Use GIT_REVISION env if already exist
setup env for cluster test

Closes: diem#1572
Approved by: andll

* [cluster-test] Bump experiment deadline

Looks like in rare cases a reboot can take longer than 10 min. This is rare, but to avoid failing the experiment it makes sense to bump this deadline.

Closes: diem#1579
Approved by: dmitri-perelman

* [x] skip running cargo if we have no packages to run

Currently if an empty iterator is passed to `run_on_packages_together`,
we'll happily run the cargo command with no `--package` args. This patch
fixes this behavior and instead checks to see if we have any package
args and does an early return if the provided iterator is empty.
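
A minimal sketch of the early-return check described above. `cargo_args_for` is a hypothetical helper written for this example, not the function from the patch; it only builds the argument list and reports `None` when there is nothing to run.

```rust
/// Builds the argument list for a cargo invocation, or returns `None`
/// when no packages were supplied so the caller can skip running cargo.
/// (Hypothetical helper, not the actual `run_on_packages_together`.)
fn cargo_args_for<'a, I>(subcommand: &str, packages: I) -> Option<Vec<String>>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut package_args = Vec::new();
    for package in packages {
        package_args.push("--package".to_string());
        package_args.push(package.to_string());
    }
    // Early return: without this check cargo would be invoked with no
    // `--package` filter at all.
    if package_args.is_empty() {
        return None;
    }
    let mut args = vec![subcommand.to_string()];
    args.extend(package_args);
    Some(args)
}
```

A caller would match on the result and only spawn the command in the `Some` case.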

Closes: diem#1578
Approved by: metajack

* [docker] Move docker CMD into docker-run.sh

Instead of having a complicated command in the docker CMD, this diff moves the environment setup into the `docker-run.sh` file and sets CMD to this shell file.

Closes: diem#1580
Approved by: opsguy

* [enhancement] Follow-up changes in libra_channel

Summary
- Implement Drop trait for Sender and Receiver
- Keep track of when the receiver is dropped. The Sender will log crit! whenever it tries to send to a Receiver which has been dropped.
- When a Sender gets dropped, we will log a crit! message as well

Related to diem#1483
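
The behaviour in this summary could be sketched as follows. This is not the libra_channel code: the types carry no message queue, `eprintln!` stands in for `crit!`, and returning an `Err` from `send` is an assumption added so the effect is observable.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// State shared by both halves of the (hypothetical) channel.
struct Shared {
    receiver_dropped: AtomicBool,
}

pub struct Sender {
    shared: Arc<Shared>,
}

pub struct Receiver {
    shared: Arc<Shared>,
}

pub fn new_pair() -> (Sender, Receiver) {
    let shared = Arc::new(Shared {
        receiver_dropped: AtomicBool::new(false),
    });
    (Sender { shared: shared.clone() }, Receiver { shared })
}

impl Sender {
    /// Queueing is omitted; only the dropped-receiver check is shown.
    pub fn send(&self, _msg: u64) -> Result<(), &'static str> {
        if self.shared.receiver_dropped.load(Ordering::Acquire) {
            eprintln!("CRIT: send to a Receiver that has been dropped");
            return Err("receiver dropped");
        }
        Ok(())
    }
}

impl Drop for Receiver {
    fn drop(&mut self) {
        // Record the drop so later sends can detect it.
        self.shared.receiver_dropped.store(true, Ordering::Release);
    }
}

impl Drop for Sender {
    fn drop(&mut self) {
        eprintln!("CRIT: Sender dropped");
    }
}
```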

Closes: diem#1490
Approved by: ankushagarwal

* [enhancement] Refactor libra_channel and remove MessageQueue trait

Closes: diem#1573
Approved by: ankushagarwal

* [terraform] Allow to redirect output to a file

This diff introduces a log_to_file variable. When set to true, log output will be redirected to /opt/libra/data/libra.log (in the container) and /data/libra/libra.log (on the host).

Closes: diem#1584
Approved by: opsguy

* [cluster-test] Log rotate libra.log

This diff uses the log file created in diem#1584 and rotates it on every deploy.
Why rotate logs on deploy? Since we wipe the db here, we pretty much start from a "clean slate", so the old logs can be invalidated at this point.

Closes: diem#1585
Approved by: ankushagarwal

* [executor] add transaction info hashes to ProcessedVMOutput in TransactionData

In order for SafetyRules to verify that the parent block's state output
is extended by the child's output, we need the new leaves (transaction
info hashes) of the child block's output to compute the new tree and
root. This commit adds these values to the output received by consensus.

Closes: diem#1537
Approved by: davidiw

* [types] Add AccumulatorExtensionProof
This proof is used to demonstrate how one root hash is an extension of a
previous. The expectation is that the prover will give a tree (frozen
subtrees + the number of leaves) and new leaves. The tree computes to a
known hash and the new leaves, when appended, produce the new hash,
allowing the verifier to independently compute that hash with a proof of
correctness via (unfortunately) repeated work.

Also added a version tag as there's a bit of copypasta logic on
calculating versions. Having it in the tree itself helps ensure a
consistent (and accessible) definition.

Closes: diem#1537
Approved by: davidiw

* [consensus] verify linear history of ledger in safety rules
- VoteProposal replaces version and state_id with a proof of extension
from the previous parent. This extension can be used to compute the
proposal's (block's) version and root hash (state_id).
- Expose a getter for the transaction_info_hashes from ExecutedBlock
that returns only Some types
- Simplified the output of VoteProposal to only print whatever the block
prints
- Added a new error to detect bad accumulator extensions
- Add a safety rules test to verify breakage of linear history
- Utilized smoke tests to verify event processor is correct (it
breaks if you don't pass the right execution state into safety rules)

Closes: diem#1537
Approved by: davidiw

* [consensus][reconfig] dispatch network messages based on epoch

Closes: diem#1574
Approved by: davidiw

* [network] Add health-checker network interface

+ Some types (`Ping2`, `Pong2`) are temporarily suffixed with "2". We'll
remove this once the refactor is complete.

Closes: diem#1581
Approved by: phlip9

* [vm] fix oom discovered by fuzzer

An oom was discovered by fuzzing the interface used to deserialize a VM
Value given a VM StructDef. The oom is caused by the deserialization
logic reading a u32 from the input stream, which is intended to be the
length of a variable length vector, and attempting to pre-allocate a
vector of that length.

The fix to this oom is to:
    1) Stop attempting to pre-allocate the entire vector and instead
       allocate entries as needed.
    2) push the checks earlier which determine if this is even a valid
       `NativeStructType`.

This works in this particular case because the inputs provided by the
fuzzer are not guaranteed to be valid (StructDef, bytes) pairs, and by
the time we try to deserialize each element (of the potentially very
long vector) another deserialization error is encountered.
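
A minimal sketch of the first part of the fix, under an assumed wire format (a little-endian u32 length followed by one byte per element). `read_vec` is a hypothetical function, not the VM deserializer; it shows why growing the vector on demand bounds memory by the actual input size instead of the claimed length.

```rust
/// Reads a little-endian u32 length prefix followed by that many bytes.
/// (Assumed format for illustration only.)
fn read_vec(input: &[u8]) -> Result<Vec<u8>, &'static str> {
    if input.len() < 4 {
        return Err("missing length prefix");
    }
    let len = u32::from_le_bytes([input[0], input[1], input[2], input[3]]) as usize;
    let mut rest = &input[4..];

    // The risky pattern is `Vec::with_capacity(len)`: `len` comes straight
    // from untrusted input and can request gigabytes up front.
    // Allocating as elements arrive means a bogus length fails with a
    // deserialization error long before memory runs out.
    let mut out = Vec::new();
    for _ in 0..len {
        let (first, tail) = rest.split_first().ok_or("unexpected end of input")?;
        out.push(*first);
        rest = tail;
    }
    Ok(out)
}
```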

Closes: diem#1586
Approved by: dariorussi

* [vm-types] impl DeserializeSeed for StructDef and friends

Instead of implementing ad-hoc deserialize_* methods for StructDef and
all of its container types, implement DeserializeSeed for these types
which is the more idiomatic and Serde supported way for doing stateful
deserialization.

Closes: diem#1586
Approved by: dariorussi

* [lcs] introduce from_bytes_seed

Introduce the `from_bytes_seed` method which enables stateful
deserialization from a `&[u8]` based on some initial state `seed`. This
is the more idiomatic and recommended way to perform stateful
deserialization using serde.

Now that there is a way to do stateful deserialization, remove the
public export of the Deserializer type defined in LCS.

Closes: diem#1586
Approved by: dariorussi

* [reconfig] refactor ValidatorChangeEventWithProof

We change the ValidatorChangeEventWithProof struct to prepare it for use.
It carries a list of LedgerInfos, each holding the validator set for the
corresponding epoch, and we implement a function to verify it given a
known validator set.

Closes: diem#1589
Approved by: dmitri-perelman

* [full-node] Fixing flaky test_full_node_basic_flow

Closes: diem#1591
Approved by: phoenix-antigravity

* [crypto] assert private keys not cloneable

Closes: diem#1484
Approved by: huitseeker

* [x] add bench command

Closes: diem#1593
Approved by: metajack

* [crypto] modularize union macros

Closes: diem#1512
Approved by: kchalkias

* [language] add get and set for Vector

I was reading some code that used the struct MyStruct<T> pattern and was temporarily confused--why do we allow this? I remembered that there was a good answer and that a motivating use-case is in Vector: this lets us have a single Vector implementation with a strict API for resource types and a more permissive API for unrestricted types.

However, the current Vector API doesn't have any procedures that allow a more permissive API for unrestricted types. This PR fixes that by adding two such procedures: get and set. These examples demonstrate why the pattern makes sense + makes vector nicer to use for unrestricted types.

Closes: diem#1601
Approved by: davidiw

* [consensus][reconfig] adopt ValidatorChangeEventWithProof in consensus

Closes: diem#1597
Approved by: aching

* [Docker] look up git rev in build container

Building the libra-metrics crate needs to look up the git rev it was
built with. Currently we pass in the git rev as a docker build arg
when building inside the container. Putting these args before the
rust build cmd also causes a layer cache miss for the build layer. We
can try to look it up inside the container instead.

The docker setup is updated such that the .git directory is copied
inside the container. The rust build process can then look it up.

Closes: diem#1603
Approved by: bmwill

* [admission-control] Refactor AC

Closes: diem#1566
Approved by: sunmilee

* [metrics] Convert RPC bytes and DirectSend bytes metrics to histogram

Closes: diem#1595
Approved by: bothra90

* [grafana] Update network dashboard to use new histogram metrics

Now uses `irate(libra_network_direct_send_bytes_sum)` to get total
outbound throughput for DirectSend Protocol.

Closes: diem#1595
Approved by: bothra90

* [StateSynchronizer] Revert the flow that would allow sync to an arbitrary target.

Summary:
We were trying to come up with a simplified solution that would allow both the validators and the full nodes
to operate in a similar fashion by trying to state sync to the highest possible ledger info.

Unfortunately this flow significantly complicates the logic in consensus due to the new reconfiguration design
because it opens a possibility for an EventProcessor to accidentally state sync to another epoch. The basic
paradigm is for the EventProcessor to operate in one epoch only, hence complicating that logic is not worth the
effort.

In addition to that we realized that the new flow did not really solve the liveness attack of non-reachable
state sync. Instead, we are planning to update storage to support idempotent commits, which would mean that
in future state sync should be able to just "give up" and return without changing the visible storage state.

Testing: existing unit test coverage

Closes: diem#1600
Approved by: zekun000

* [crypto] remove secret-service
secret-service was the precursor to the SafetyRules TCB. Later in
development, we realized the need to know what we are signing and not
delegate that to another process. Thus the model used by secret-service
has been superseded by that used by SafetyRules. As such, secret-service
has been deprecated and is being removed.

Closes: diem#1582
Approved by: mimoo

* [language] Include other block information in `LibraSystem.BlockMetadata`

Closes: diem#1489
Approved by: sblackshear

* [network] Add benchmarks for yamux-over-yamux multiplexing

Closes: diem#1592
Approved by: phlip9

* [move] Beginning of a limited Move source language

Move source language is an ergonomic language for writing Modules and Scripts that compile to Move bytecode.

Closes: diem#1506
Approved by: tzakian

* [consensus][reconfig] impl EpochRetrieval flow to help nodes advance epoch

Closes: diem#1608
Approved by: dmitri-perelman

* [Consensus] Clean up the pending messages of the previous epoch

Summary:
When we start a new epoch there might still be pending messages in the LibraChannel between
networking and SMR event loop.
In this diff we introduce a capability of cleaning up the LibraChannel from all the messages and
use it in NetworkReceivers. (As a side effect this also gives a way to GC old keys from LibraChannel).

Testing:
Added a unit test for the LibraChannel.

Closes: diem#1612
Approved by: dmitri-perelman

* [x] add lint command

Introduce a `lint` command to x. `lint` is the beginning of a general
purpose linting engine for libra which we can use to write custom lints
which may not be covered by clippy and which we may want to run on
non-rust files.

Closes: diem#1609
Approved by: davidiw

* [lint] ensure all text files have a newline at EOF

Closes: diem#1609
Approved by: davidiw

* [lint] ensure all text files have no trailing whitespace

Closes: diem#1609
Approved by: davidiw
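
The two whitespace lints named in the commit titles above could be checked roughly as follows. This is a sketch only: the function names are hypothetical, treating an empty file as passing is an assumption, and the real `x lint` implementation is not shown in this log.

```rust
/// True when the contents end with a newline. Empty files are treated
/// as passing (an assumption made for this sketch).
fn ends_with_newline(contents: &str) -> bool {
    contents.is_empty() || contents.ends_with('\n')
}

/// Returns the 1-based numbers of lines that end in a space or a tab.
fn lines_with_trailing_whitespace(contents: &str) -> Vec<usize> {
    contents
        .lines()
        .enumerate()
        .filter(|(_, line)| line.ends_with(' ') || line.ends_with('\t'))
        .map(|(index, _)| index + 1)
        .collect()
}
```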

* [lint] ensure all *.{rs,sh} files have a license header

Closes: diem#1609
Approved by: davidiw

* [ci] run 'cargo x lint'

Closes: diem#1609
Approved by: davidiw

* [Docker][cluster_test] look up git rev in  container

Closes: diem#1619
Approved by: andll

* [cluster-test] Debug info for mint failures

- Print which instance mint is running on
- Allow to disable mint retry with NO_MINT_RETRY env var
- Prefix all tx emitter thread logs with instance name to understand where to look at

Closes: diem#1617
Approved by: dmitri-perelman

* [network] Simplify the property of Discovery

* Add self "PeerId" so that we can avoid deserializing it again and again
* Prefix "note" with an underscore as the property is not used for now

Closes: diem#1412
Approved by: phlip9

* [network] Refactor out common `NetworkEvents` implementations

+ There was a lot of code duplication from implementing many wrappers
around a `channel::Receiver<NetworkNotification>` that would just
deserialize inbound rpc and direct-send messages.

+ Initially, we only had `mempool` and `consensus` interfaces, so the
code duplication was not a problem. Today, however, we have more and so
the code duplication is getting to be a problem.

+ This change adds a `NetworkEvents<T: prost::Message>`, which is a
`Stream` that reads messages from a `channel::Receiver<NetworkNotification>`
and deserializes them into `T: prost::Message`.

+ Next, we can try to refactor out some common parts for the
`NetworkSender` interface.

Closes: diem#1599
Approved by: phlip9

* [mempool] Change log level of raw bytes debug->trace

Raw bytes sent pollute the log quite a bit even in debug mode, and are unlikely to be used

Closes: diem#1623
Approved by: phoenix-antigravity

* [language] remove CreateAccount bytecode

Remove the bytecode CreateAccount in favor of a native function.
This is mostly deleting code and a decent change in our proptest for e2e tests.
We have now enabled gas on create account. That was problematic because create account
has different gas cost depending on whether the event counter in the account that
sends the transaction had been created or not. So we need to track that operation.

Closes: diem#1594
Approved by: sblackshear

* [network] Add common `NetworkSender` network interface.

+ All network applications will use a wrapped `NetworkSender<TMessage>`
where `TMessage` is their own protobuf message format.

+ `NetworkSender` handles the "low-level" message-based interface to
network, so the upper network applications only see a nice async-await
API.

+ Adds a `send_to_many` method optimized for sending the same message to
many peers. Consensus currently uses something like this for their
proposal broadcast, so we will explicitly pull this into the network
interface.

+ Adds `dial_peer` and `disconnect_peer` methods to support forthcoming
work to pull out `gossip-discovery` and `connectivity-manager` into
network application modules.

Closes: diem#1621
Approved by: bothra90

* [Storage] Add proptest for JellyfishMerkleTree::get_with_proof

The existing tests probably do not have enough coverage. I wrote some
code and they did not catch a bug in it. Only the executor tests were
able to catch it.

Add some more tests here to help catch bugs earlier.

Closes: diem#1610
Approved by: lightmark

* [ci] pin rustc to beta-2019-10-03 to avoid using 1.40 beta

Closes: diem#1632
Approved by: dmitri-perelman

* [language] On-chain definition of gas schedule

The gas schedule within the VM is defined on-chain, and is read in once
per-block. Normal transactions initiated from the association account
can add or update the costs within this gas schedule.

Closes: diem#1406
Approved by: dariorussi

* [mempool] commit old transactions on transaction insertion

Fixes diem#1625
When a node does state sync, transactions that are committed through this sync are not removed from mempool.
This means that even though the transactions are committed, a client cannot use this node to submit new transactions until they expire, which can take a significant amount of time.

We do state sync quite often, so this reproduces routinely during cluster test. We had to add a retry on transaction submission because of this problem.

Closes: diem#1633
Approved by: andll

* [monitoring] change public metric whitelist to be a constant

Closes: diem#1571
Approved by: bmwill

* [cluster-test] Increase deploy startup timeout

Closes: diem#1636
Approved by: ankushagarwal

* [x] skip whitespace lints for .exp files

In some future patch a number of .exp files are going to be added to the
repository. These files are used for testing the move source language
and include the expected output from the move compiler. Since these
files are the product of a third-party library it's difficult to
completely control their format and ensure that there are no whitespace
violations. Due to this, let's just skip whitespace lints for all .exp
files.

Closes: diem#1634
Approved by: tnowacki

* [x] simplify extension checking in license lint

Closes: diem#1634
Approved by: tnowacki

* [language] Implement block metadata transaction logic

Closes: diem#1611
Approved by: dariorussi

* [language] Deprecate TransactionPayload::Program

Closes: diem#1626
Approved by: dariorussi

* [cluster-test] Remove kinesis log tail

This was replaced with debug_interface_log_tail

Closes: diem#1641
Approved by: ankushagarwal

* [cluster-test] Remove log prune

Log prune was previously used to clean up CloudWatch logs.
Since we no longer use CloudWatch, this is not needed.

Closes: diem#1641
Approved by: ankushagarwal

* [network][consensus] ConsensusNetworkSender now wraps `NetworkSender`

+ Refactored `chained_bft::NetworkSender` so it uses
`ConsensusNetworkSender::send_to_many` instead of the previous
`send_bytes` method, which no longer needs to exist.

+ Added `ValidatorVerifier::get_account_addresses_iter()`. We should be
able to remove `get_ordered_account_addresses()` in a subsequent commit,
since it does an unnecessary sort (`BTreeMap` already sorts on insertion).
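
A small sketch of why the extra sort is redundant: iterating a `BTreeMap` already yields keys in ascending order. The key and value types here are placeholders chosen for the example, not the `ValidatorVerifier` types.

```rust
use std::collections::BTreeMap;

/// Returns the keys in iteration order, with no explicit sort.
/// (`[u8; 2]` stands in for an account address in this sketch.)
fn addresses_in_order(voting_power: &BTreeMap<[u8; 2], u64>) -> Vec<[u8; 2]> {
    voting_power.keys().copied().collect()
}
```

Sorting the returned vector again would leave it unchanged, which is the point made in the commit message.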

Closes: diem#1643
Approved by: bothra90

* [Storage][JellyfishMerkle] Extract some util functions in tests

So we don't repeat the code.

Closes: diem#1648
Approved by: lightmark

* [Storage][JellyfishMerkle] Fix get_with_proof for edge cases

If there exist two keys that differ only in the last nibble, the code
would have a problem.

Closes: diem#1648
Approved by: lightmark

* [cluster-test] Use sync log

Use a sync drain instead of an async drain.
There are two main reasons:

- Some parts of cluster test use println! for better UX, but when the async drain is used, the output of println! and the log! macro is interleaved badly
- If the program uses the log! macro and terminates quickly, part of the output can disappear because the async thread did not process the log

There is no intense log output in cluster test, so a sync log is not an issue

Closes: diem#1649
Approved by: ankushagarwal

* [cluster-test] Update genesis.blob on deploy

This will fetch genesis.blob generated by circle during deploy
We need this because genesis.blob is updated relatively frequently and it breaks cluster test every time

Fixes diem#1224

Closes: diem#1651
Approved by: ankushagarwal

* LedgerInfo commit information aggregated in a BlockInfo struct

Summary:
The fields of LedgerInfo that describe the committed status of the Ledger are in fact
identical to the fields of the block metadata that Consensus is carrying around (the only
field of BlockInfo that is currently not present in LedgerInfo is a round).
Any update to the LedgerInfo would have to be mirrored in BlockInfo because TCB needs to verify & sign it.
Hence, this change is aggregating the LedgerInfo fields in the BlockInfo.
We had to move BlockInfo from consensus types to libra types as a result of that.

Testing: this is supposed to be a noop, existing unit test coverage

Ref diem#1604

Closes: diem#1629
Approved by: zekun000

* [language][Move] Added test framework

- Added test framework for Move lang expected output tests
- Added tests to check all of the stdlib files

Closes: diem#1624
Approved by: vgao1996

* [cluster-test] Introduce some convenience utils

```
./cluster-test --discovery
<print list of nodes>
./cluster-test --pssh -- echo Hello world
<execute commands on all nodes>
```

Closes: diem#1647
Approved by: ankushagarwal

* [network][mempool] `MempoolNetworkSender` now wraps `NetworkSender`

Closes: diem#1644
Approved by: phlip9

* [network][state-sync] `StateSynchronizerSender` now wraps `NetworkSender`

+ Rename `STATE_SYNCHRONIZER_MSG_PROTOCOL` to
`STATE_SYNCHRONIZER_DIRECT_SEND_PROTOCOL` so it's consistent with other
network application modules.

Closes: diem#1644
Approved by: phlip9

* [network][ac] `AdmissionControlNetworkSender` now wraps `NetworkSender`

Closes: diem#1644
Approved by: phlip9

* [network] `HealthCheckerNetworkSender` now wraps `NetworkSender`

Closes: diem#1644
Approved by: phlip9

* [cluster-test] Use sudo when updating genesis.blob on deploy

Closes: diem#1654
Approved by: ankushagarwal

* [cluster-test] Reload faucet account on every tx emit job

(1) Faucet account data can get stale and needs to be reloaded between jobs
(2) Cluster might not be healthy on cluster test startup and it might not be possible to load faucet account at that time

Closes: diem#1655
Approved by: ankushagarwal

* [rust] migrate to stable toolchain

Now that 1.39.0 has stabilized, remove the rust-toolchain file and
configure ci to use the stable toolchain.

Closes: diem#1656
Approved by: metajack

* [consensus][reconfig] help peers who sent old epoch messages

As part of reconfiguration, honest nodes in the new epoch need to help others still in the old epoch.
Otherwise we risk that one honest node joins the new epoch and
stops consensus, and the remaining 2f are not able to make any progress.

Closes: diem#1622
Approved by: dmitri-perelman

* [CI] add conditional docker build to commit verify

When we make changes to these docker files, we want to verify them in
the commit workflow. Otherwise we end up breaking nightly. This is a
follow-up to 243f535.

Here we added a conditional docker build job to the commit verify
workflow. The docker build job will kick off iff the PR contains a change in
any file matching `*.Dockerfile`. Once triggered, the workflow will
build each of the updated docker files.

Closes: diem#1542
Approved by: huitseeker

* [admission-control] Add service test into lib

Closes: diem#1640
Approved by: phlip9

* add a rust-toolchain file back in

This has many advantages:

1. It pushes our builds towards hermeticity, which having worked on devtools for many years I've come to believe is *a priori* good.
2. It would be possible to make CI hermetic while making dev workflows not so much, but I believe that making dev builds as close to CI as possible is *a priori* good.
3. It completely obviates the need for manually requesting that developers upgrade Rust versions for new features -- that can simply be managed through tooling, as it should be.

Closes: diem#1662
Approved by: bmwill

* [cluster-test] Log command used to fetch genesis.blob

Closes: diem#1665
Approved by: dmitri-perelman

* [easy] More idiomatic Option/Result patterns

Closes: diem#1639
Approved by: zekun000

* [terraform] Increase monitoring instance disk volume
rename the variable

Closes: diem#1666
Approved by: ankushagarwal

* [Proof] Introduce SparseMerkleRangeProof

This proof intends to prove that a range of things exist in a sparse
Merkle tree. When restoring the state tree, we always go from
left to right, so at any point in time a list of siblings on the right
is sufficient to prove everything on the left.

The verification is a bit complex... The basic idea is that when we have
the full list of leaves for a sparse Merkle tree, we can compute the
common prefix length of each adjacent key pair and find out which pair
consists of the left child and right child of the same parent. Then we
can compute their parent and reduce the problem. Note that we are just
doing this in unit tests to test the `get_range_proof` method; the real
verification will be a little bit different.
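
The common-prefix step described above can be sketched like this. Both functions are hypothetical helpers, not code from the commit, and picking the adjacent pair with the longest shared prefix is one reading of "find out which pair consists of the left child and right child of the same parent".

```rust
/// Number of leading bits shared by two keys of equal length.
fn common_prefix_bits(a: &[u8], b: &[u8]) -> usize {
    let mut bits = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        let diff = x ^ y;
        if diff == 0 {
            bits += 8;
        } else {
            // The leading zeros of the XOR are the matching high-order bits.
            bits += diff.leading_zeros() as usize;
            break;
        }
    }
    bits
}

/// Index `i` such that sorted keys `i` and `i + 1` share the longest
/// prefix, or `None` when there are fewer than two keys.
fn deepest_adjacent_pair(sorted_keys: &[Vec<u8>]) -> Option<usize> {
    (0..sorted_keys.len().checked_sub(1)?)
        .max_by_key(|&i| common_prefix_bits(&sorted_keys[i], &sorted_keys[i + 1]))
}
```

Once that pair is merged into its parent, the same search can be repeated on the shorter list, which is the "reduce the problem" step.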

Closes: diem#1577
Approved by: msmouse

* [Crypto] Implement from_bit_iter for HashValue

So we can easily transform a vector of booleans to a HashValue.
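
A sketch of what such a conversion might look like, using a plain byte array instead of `HashValue`. The most-significant-bit-first order and the rejection of inputs with the wrong number of bits are assumptions; `bytes_from_bits` is a hypothetical name.

```rust
/// Packs exactly `N * 8` bits into `N` bytes, most significant bit first.
/// Returns `None` if the iterator yields too few or too many bits.
fn bytes_from_bits<const N: usize>(bits: impl IntoIterator<Item = bool>) -> Option<[u8; N]> {
    let mut out = [0u8; N];
    let mut count = 0usize;
    for bit in bits {
        if count >= N * 8 {
            return None; // more bits than fit
        }
        if bit {
            out[count / 8] |= 0x80u8 >> (count % 8);
        }
        count += 1;
    }
    if count == N * 8 {
        Some(out)
    } else {
        None
    }
}
```

For a 256-bit hash, `N` would be 32.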

Closes: diem#1577
Approved by: msmouse

* Remove warning to cut types dependency on slog

Closes: diem#1614
Approved by: bmwill

* First version of the new borrow checker based on the abstraction of an
acyclic labeled borrow graph.

Closes: diem#1598
Approved by: tnowacki

* [cluster-test] Add experiment to simulate multi-region environment and report result

Summary
This is the code used to run experiments that introduce network delays between nodes to simulate a multi-region deployment.
With this we can specify a list of split sizes and delays, and we will run the simulation for every combination of these two parameters.
- Update NetworkDelay to be an Effect instead of Action
- Update flags for multi-region simulation because now we can run simulations with a variety of split sizes and delays instead of just one
- Complete rewrite of multi_region_network_simulation.rs to handle running simulations with a variety of configs
- Print a list of all metrics in a csv format at the end of the experiment

Test Plan
Tested this on my cluster
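
Running "every combination" of split sizes and delays amounts to a cartesian product. A rough sketch, with hypothetical names (`SimulationConfig`, `all_configs`, `csv_row`) and a made-up single metric column:

```rust
use std::time::Duration;

/// One simulation run (hypothetical shape, for illustration).
#[derive(Debug, PartialEq)]
struct SimulationConfig {
    split_size: usize,
    delay: Duration,
}

/// Every (split size, delay) pair, split sizes in the outer loop.
fn all_configs(split_sizes: &[usize], delays_ms: &[u64]) -> Vec<SimulationConfig> {
    let mut configs = Vec::with_capacity(split_sizes.len() * delays_ms.len());
    for &split_size in split_sizes {
        for &delay_ms in delays_ms {
            configs.push(SimulationConfig {
                split_size,
                delay: Duration::from_millis(delay_ms),
            });
        }
    }
    configs
}

/// Formats one result as a csv row: split size, delay in ms, metric value.
fn csv_row(config: &SimulationConfig, metric: u64) -> String {
    format!("{},{},{}", config.split_size, config.delay.as_millis(), metric)
}
```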

Closes: diem#1652
Approved by: andll

* [executor] idempotent commits

Closes: diem#1627
Approved by: dmitri-perelman

* [executor] refactor executor with synced_trees

Closes: diem#1627
Approved by: dmitri-perelman

* Make MIRAI happy

Closes: diem#1664
Approved by: huitseeker

* [consensus][restart] simplify the recovery flow given idempotent commits support

We're able to greatly simplify the recovery process during restart thanks to the idempotent commit support.

We can rely directly on the latest ledger info that storage returns, which is now guaranteed to exist in consensusdb due to the state sync failure handling (diem#1590). Also, we don't need to continue sync upon restart.

A side-effect of this pr is we now generate genesis virtually and never persist it into consensusdb.

Closes: diem#1616
Approved by: dmitri-perelman

* [consensus][reconfig] extend reconfiguration test for a few epochs

Closes: diem#1616
Approved by: dmitri-perelman

* [fuzzing] adding merkle tree proto fuzzing

MOTIVATION:

Adding two fuzzers. When receiving sparse and non-sparse merkle tree proofs,
there is some involved proto decoding code that we can fuzz.

Closes: diem#1232
Approved by: metajack

* [storage] fix typo

Closes: diem#1683
Approved by: wqfish

* Make MIRAI happy

Closes: diem#1678
Approved by: lightmark

* [storage] EpochByVersionSchema

Closes: diem#1635
Approved by: msmouse

* [storage] `LedgerStore::get_epoch()`

Closes: diem#1635
Approved by: msmouse

* Add initial debugging instrumentation to tree_heap

Closes: diem#1684
Approved by: cbarrettfb

* [network] Add benchmark for transport with TCP_NODELAY set

Closes: diem#1653
Approved by: ankushagarwal

* [language] Remove unused ParseError::UnrecognizedToken

The unused UnrecognizedToken error adds a type parameter for
Token that propagates all over the place. Since this is not even
used, remove it along with all those type parameters. We should
definitely add more detailed error messages (probably only in the
new move-lang compiler, not in the IR compiler) but whatever we do
should not require the internal token types, since the diagnostics
should describe issues in terms that are directly visible to end users.

Closes: diem#1685
Approved by: tnowacki

* [language] Clean up some remaining references to lalrpop for the IR compiler

Closes: diem#1685
Approved by: tnowacki

* [language][Move] Added expansion tests

- Added tests for expansion pass.
- Tried to cover +/- cases for each check

Closes: diem#1660
Approved by: tzakian

* Make MIRAI happy

Closes: diem#1681
Approved by: huitseeker

* make clippy happy

* [consensus] removing the `terminate` in chained_bft_smr loop

The consensus loop/select has a terminate that should not be reachable as the loop should run forever.
This commit removes it.

Closes: diem#1675
Approved by: zekun000

* [cluster-test] Update timeouts, print intermediate results

Closes: diem#1689
Approved by: andll

* [language] Refactor lexer to add a lookahead API

There are some cases where the parser really needs to look ahead at
the next token before deciding how to parse the current token. It was
hard to support that with the original lexer that I hacked together
from the lalrpop output, but now that the lexer is sane, it is not
so hard. Add a new lookahead API and use it in the parser, replacing
the current workarounds.
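
To make the idea concrete, here is a toy lexer with a one-token lookahead. It is not the IR compiler's lexer: the token set, the eager tokenization, and the method names `peek`, `lookahead` and `advance` are all assumptions made for the sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Name(String),
    LParen,
    RParen,
    Eof,
}

struct Lexer {
    toks: Vec<Tok>,
    pos: usize,
}

impl Lexer {
    /// Tokenizes eagerly; a real lexer would likely scan lazily.
    fn new(src: &str) -> Self {
        let mut toks = Vec::new();
        let mut name = String::new();
        for c in src.chars() {
            if c.is_alphanumeric() || c == '_' {
                name.push(c);
                continue;
            }
            if !name.is_empty() {
                toks.push(Tok::Name(std::mem::take(&mut name)));
            }
            match c {
                '(' => toks.push(Tok::LParen),
                ')' => toks.push(Tok::RParen),
                _ => {} // whitespace and anything else is skipped here
            }
        }
        if !name.is_empty() {
            toks.push(Tok::Name(name));
        }
        Lexer { toks, pos: 0 }
    }

    fn tok_at(&self, index: usize) -> Tok {
        self.toks.get(index).cloned().unwrap_or(Tok::Eof)
    }

    /// The current token.
    fn peek(&self) -> Tok {
        self.tok_at(self.pos)
    }

    /// The token after the current one, without consuming anything.
    fn lookahead(&self) -> Tok {
        self.tok_at(self.pos + 1)
    }

    /// Consumes and returns the current token.
    fn advance(&mut self) -> Tok {
        let tok = self.peek();
        self.pos += 1;
        tok
    }
}

/// A name followed by `(` is treated as a call, otherwise as a variable.
fn is_call(lexer: &Lexer) -> bool {
    matches!(lexer.peek(), Tok::Name(_)) && lexer.lookahead() == Tok::LParen
}
```

The parser decision in `is_call` needs the next token but must not consume it, which is the kind of case the commit message describes.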

Closes: diem#1688
Approved by: tnowacki

* [easy][move] Sort errors by first Loc when displaying

- Sort errors by the initial Loc, makes reading errors a lot easier in big lists

Closes: diem#1693
Approved by: vgao1996

* [cluster-test] Test if file exists before log rotate

When cluster test fails to set up the cluster (for example, it failed to cp genesis.blob), there is no log file on the host: the previous one was rotated and a new one was not created.

In this case, an attempt to rotate it on the next run produces a lot of noise.

This diff checks whether the file exists before attempting to rotate it.
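
A local-filesystem sketch of the check. Cluster test runs commands on remote instances, so this is only an analogue; `rotate_if_exists` and the `<log>.<suffix>` naming are assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Renames `log` to `<log>.<suffix>` if the file exists.
/// Returns `Ok(true)` when a rotation happened and `Ok(false)` when
/// there was nothing to rotate.
fn rotate_if_exists(log: &Path, suffix: &str) -> io::Result<bool> {
    if !log.exists() {
        // Nothing to rotate: skip quietly instead of failing noisily.
        return Ok(false);
    }
    let mut rotated = log.as_os_str().to_owned();
    rotated.push(".");
    rotated.push(suffix);
    fs::rename(log, &rotated)?;
    Ok(true)
}
```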

Closes: diem#1692
Approved by: ankushagarwal

* [vm] Clean up the code cache api.

Closes: diem#1668
Approved by: dariorussi

* [cluster-test] Bump liveness health check timeout 1m->2m

We have intermittent failures with the 1m timeout; this diff doubles it.

Closes: diem#1697
Approved by: ankushagarwal

* [CI] upgrade stretch to buster

In CircleCI setup, bump rust:stretch to rust:buster for builders. This
is to keep it consistent with Docker build.

Closes: diem#1691
Approved by: bmwill

* [consensus] enforce epoch consistency of messages

We enforce that every message contains information about the same epoch.
(Also fixes a few missing verifications.)

Closes: diem#1694
Approved by: aching

* [config] add a new config for safety rules

Add a new config because this needs to be entirely managed (owned by
Safety Rules) so that it can mutate it during run time.

It is worth noting that currently ConsensusConfig is aware of this
config; in the future, this will only be the case for testing. The Safety
Rules binary should load this file directly, whereas the Safety Rules
library would receive it as part of validator startup.

Closes: diem#1615
Approved by: davidiw

* [consensus] Add persistent storage for safety rules

This introduces a persistent storage interface and two implementations
for SafetyRules:
- InMemory for (integration) testing purposes
- OnDisk for ("production") testing purposes
Eventually this same API should be able to be used by various Secrets
Managers.

Closes: diem#1615
Approved by: davidiw

* [consensus] Move toward PersistentStorage interface for SafetyRules

ConsensusState isn't really a store, SafetyRules needs a store. So this
replaces the code to leverage a config backed storage unit and all the
code mods that go with it.

Note: because consensus is multithreaded,
PersistentStorage must be both Send + Sync
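
To illustrate why the `Send + Sync` bound matters, here is a minimal sketch of a storage trait with an in-memory backend shared across threads. The method names (`get`, `set`), the `u64` value type, and the `write_from_thread` helper are assumptions for this example, not the real SafetyRules interface.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical storage interface. The `Send + Sync` supertraits let a
/// single store be shared by several threads behind an `Arc`.
pub trait PersistentStorage: Send + Sync {
    fn get(&self, key: &str) -> Option<u64>;
    fn set(&self, key: &str, value: u64);
}

/// In-memory backend, suitable only for tests.
#[derive(Default)]
pub struct InMemoryStorage {
    data: Mutex<HashMap<String, u64>>,
}

impl PersistentStorage for InMemoryStorage {
    fn get(&self, key: &str) -> Option<u64> {
        self.data.lock().unwrap().get(key).copied()
    }

    fn set(&self, key: &str, value: u64) {
        self.data.lock().unwrap().insert(key.to_string(), value);
    }
}

/// Write from a second thread, then read the value back on the caller's thread.
pub fn write_from_thread(
    store: Arc<dyn PersistentStorage>,
    key: &'static str,
    value: u64,
) -> Option<u64> {
    let writer = Arc::clone(&store);
    thread::spawn(move || writer.set(key, value))
        .join()
        .expect("writer thread panicked");
    store.get(key)
}
```

Without the supertrait bounds, `Arc<dyn PersistentStorage>` could not be moved into `thread::spawn`, and the compiler would reject `write_from_thread`.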

This commit also refactors the code within safety rules to take on a
more library style approach as there are more features now within the
code.

Closes: diem#1615
Approved by: davidiw

* [consensus] eliminate consensus state from consensus db

SafetyRules has its own persistent storage, let's leverage it.
Most of this code is just deleting the consensus state from consensus db
The rest is setting up the appropriate tests so that the code leverages
the new means for starting safety rules with a persistent backend

This diff also solves some other issues that somehow overlapped with
this work:
- Epoch changes are somewhat handled in SafetyRules::update
- Epochs begin at 1

Closes: diem#1615
Approved by: davidiw

* [state sync] reduce error verbosity

Currently state sync prints a full stack trace whenever a routine error occurs,
which pollutes the log. This diff reduces the verbosity.

Closes: diem#1707
Approved by: andll

* [cluster-test] Emit transactions when running network delay simulations

Summary
This PR updates the multi_region_network_simulation experiment such that transactions are generated in the background. Without that, we would be simply processing empty blocks in the experiment.

Test Plan
Ran against my cluster.

Closes: diem#1708
Approved by: andll

* [consensus] Update all consensus counters to new format

Summary
Stop using OP_COUNTERS
Create a separate metric for each counter prefixed by "libra_consensus_"
Use Prometheus macros directly for creating counters
Update all usages of consensus counters to new names

Closes: diem#1663
Approved by: sherry-x

* [MIRAI] Add annotations to accumulator crate

Closes: diem#1676
Approved by: wqfish

* [CI] docker build should check file exist

Updated the docker builder to check if the docker file exists before
triggering the build.

Closes: diem#1712
Approved by: andll

* [language][move] Added naming tests

- Added tests for naming pass.
- Tried to cover +/- cases for each check

Closes: diem#1699
Approved by: tzakian

* [Proof] Remove redundant assertion in `new`

We should not have the assertion here. Clients would call `new` when
constructing the proofs from protobuf objects, but the objects returned
by servers are not guaranteed to be correct. So a bad server could cause
the clients to crash.

Since we will verify the proof later, it's probably redundant to do this
particular check in `new`.
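
The general pattern can be sketched as follows: the constructor accepts untrusted data as-is, and a later verification step returns an error rather than panicking. The `Proof` struct, the `MAX_DEPTH` limit, and the `verify_shape` method are hypothetical; the commit does not say which assertion was removed.

```rust
/// Hypothetical proof type holding sibling hashes from leaf to root.
pub struct Proof {
    siblings: Vec<[u8; 32]>,
}

/// Hypothetical limit on proof depth, used only for this illustration.
pub const MAX_DEPTH: usize = 256;

impl Proof {
    /// Construct from untrusted data without asserting on it, so a
    /// malformed server response cannot crash the client here.
    pub fn new(siblings: Vec<[u8; 32]>) -> Self {
        Proof { siblings }
    }

    /// Reject malformed proofs with an error instead of a panic.
    pub fn verify_shape(&self) -> Result<(), String> {
        if self.siblings.len() > MAX_DEPTH {
            return Err(format!(
                "proof has {} siblings, max is {}",
                self.siblings.len(),
                MAX_DEPTH
            ));
        }
        Ok(())
    }
}
```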

Closes: diem#1715
Approved by: delegate

* [network] Set TCP_NODELAY for tcp connections

Closes: diem#1686
Approved by: ankushagarwal

* [transaction-builder] key rotation builder takes bytes instead of address

This builder API was misleading. It took an AccountAddress as input, but the rotate authentication key script expects bytes encoding the hash of a public key.

Closes: diem#1711
Approved by: sblackshear

* [consensus] push RPC processing into event processor to make network thread non-blocking

Closes: diem#1714
Approved by: andll

* [consensus] multi proposer election with SHA-3

**1. Multi proposer election** currently uses the SipHash PRF. This is not ideal because the rest of the codebase uses SHA-3 and this adds a cryptographic dependency that we don't really need, so I changed it to SHA-3.

**2. There's a bias in the modular reduction** of the operation that computes a round's proposers. It is not worth fixing as the bias is negligible in practice; I've added a comment to indicate this.
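
To see where the bias comes from, here is a small sketch that counts how many values in a range land on each residue. The `residue_counts` helper is hypothetical, and it uses a tiny 8-bit domain so the effect is visible.

```rust
/// Count how many values in `0..domain` reduce to each residue modulo `n`.
/// Hypothetical helper used only to illustrate modular-reduction bias.
pub fn residue_counts(domain: u64, n: u64) -> Vec<u64> {
    let mut counts = vec![0u64; n as usize];
    for v in 0..domain {
        counts[(v % n) as usize] += 1;
    }
    counts
}
```

For example, `residue_counts(256, 3)` yields `[86, 85, 85]`: residue 0 is slightly favoured because 256 is not a multiple of 3. The counts never differ by more than one, so for a hash output of k bits each count is roughly 2^k / n and the relative bias shrinks as k grows.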

Closes: diem#1709
Approved by: dmitri-perelman

* [MIRAI] Add annotations to libra-wallet crate

Closes: diem#1679
Approved by: huitseeker

* [crypto] Derive CryptoHasher

Derive a CryptoHasher which ensures a distinct hasher string for every derivation.

Add a few lines of documentation for the CryptoHasher derivation.

[crypto] Add inline doc for CryptoHasher derive macro

Closes: diem#1511
Approved by: davidiw

* [crypto] Remove superseded CryptoHasher implementations

- in the process, eliminate the reuse of the Sparse Merkle Tree node hashers (!),
- clean up some of the duplication in language, storage, ..

Closes: diem#1511
Approved by: davidiw

* [crypto] Update genesis id after hashing change

Closes: diem#1511
Approved by: davidiw

* [language] Update genesis blob after hashing change

Closes: diem#1511
Approved by: davidiw

* [consensus] update safety rules documentation and add TODOs

The documentation had fallen out of date as the code moved from within
consensus, to its own file, to its own crate.
In addition, there were race conditions due to concurrent work happening
in the code base. Listing TODOs to better help track these. So ideally
if a PR were to hit on a TODO it would either be addressed or updated
to reflect the new world order.

Closes: diem#1713
Approved by: zekun000

* [easy] Small rewrites to ?/if

Recent versions of clippy will actually complain in the safety_rules case!
Remove match_bool from x.toml

Closes: diem#1717
Approved by: aching

* [cluster-test] Fix metrics for multi-region-simulation

Closes: diem#1719
Approved by: andll

* [MIRAI] Add annotations to libra-mempool crate

Closes: diem#1682
Approved by: tzakian

* [state sync] use synced_state as known version

Closes: diem#1721
Approved by: dmitri-perelman

* Update genesis.blob

Used build_all.sh to regenerate correct version

Closes: diem#1725
Approved by: huitseeker

* [Storage][JellyfishMerkle] Reorder a few code blocks

I always find it hard to look for code in this file. Putting the `impl`
blocks closer to the struct definition so things are structured more
consistently.

Closes: diem#1726
Approved by: lightmark

* [Storage][JellyfishMerkle] Use Self when possible

Closes: diem#1726
Approved by: lightmark

* [CI] check genesis blob in commit verify

Check the genesis blob consistency as part of commit verify.

Closes: diem#1727
Approved by: andll

* [cluster-test] Make dockerfile for cluster test

This diff introduces a single Dockerfile for cluster test at docker/cluster-test/cluster-test.Dockerfile

Since we want to support incremental builds, this Dockerfile is generated via the generate.sh script

Fixes diem#1669

Closes: diem#1705
Approved by: sausagee

* [cost-synthesis] Update creation of identifiers and fix bitrot

Previously we could create invalid identifiers which would cause the
generation to crash. We now make sure that all strings and identifiers
conform to the identifier specification in types.

Also, generated modules were previously not representing empty type
parameter info. This fixes that as well.

Closes: diem#1731
Approved by: shazqadeer

* update Cargo.lock

* fix compiler error.

* fix gas_schedule and re generate genesis.

* merge pow.

* commit_info for LedgerInfo.

* add back some changes on libra

* fix star node bug.

* fix mint block bug.