Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gha][lbt] don't run LBT compat on cherry-pick and update slack log #5701

Merged
merged 1 commit into from
Sep 4, 2020

Conversation

rustielin
Copy link
Contributor

@rustielin rustielin commented Aug 20, 2020

Overview

  • Updates how we fetch PR details (number, ref) and reorders some initial steps in the LBT workflow
  • Fetch PR base ref for use in compat test. For now, only run compat test if the PR targets master
  • Slack message shows a dump of the cluster-test report to be more representative of the actual failure since tailing the logs usually doesn't show anything too meaningful
  • Bonus: new button in Slack to "Visit PR", so we don't have to hunt down the PR number from the job

Test

A bunch of canary on this PR and also failure cases on #5700

Note

EDIT:

This PR makes it such that compat test in LBT will only run if the base of the PR is master. To get compat to work for other branches, some fixes need to be made to the workflow to ensure that the right images exist to use in test.

@rustielin
Copy link
Contributor Author

/canary

bors-libra pushed a commit that referenced this pull request Sep 1, 2020
* Changes PR number and details calculation, and reorders initial
steps
* fetch base ref for use in compat test
* New button in Slack to Visit PR
* Slack message uses the report.json

Closes: #5701
@bors-libra
Copy link
Contributor

💔 Test Failed - Build images and run cluster test

@rustielin
Copy link
Contributor Author

/canary

bors-libra pushed a commit that referenced this pull request Sep 1, 2020
* Changes PR number and details calculation, and reorders initial
steps
* fetch base ref for use in compat test
* New button in Slack to Visit PR
* Slack message uses the report.json

Closes: #5701
@bors-libra
Copy link
Contributor

💔 Test Failed - Build images and run cluster test

@rustielin
Copy link
Contributor Author

/canary

bors-libra pushed a commit that referenced this pull request Sep 1, 2020
* Changes PR number and details calculation, and reorders initial
steps
* fetch base ref for use in compat test
* New button in Slack to Visit PR
* Slack message uses the report.json

Closes: #5701
@bors-libra
Copy link
Contributor

💔 Test Failed - Build images and run cluster test

@rustielin
Copy link
Contributor Author

/canary

bors-libra pushed a commit that referenced this pull request Sep 1, 2020
* Changes PR number and details calculation, and reorders initial
steps
* fetch base ref for use in compat test
* New button in Slack to Visit PR
* Slack message uses the report.json

Closes: #5701
@github-actions
Copy link

github-actions bot commented Sep 1, 2020

Cluster Test Result

Compatibility test results for land_7371a800 ==> land_f3fbdee8 (PR)
1. All instances running land_7371a800, generating some traffic on network
2. First validator land_7371a800 ==> land_f3fbdee8, to validate storage
3. First batch validators (14) land_7371a800 ==> land_f3fbdee8, to test consensus
4. Second batch validators (15) land_7371a800 ==> land_f3fbdee8, to upgrade rest of the validators
5. All full nodes (30) land_7371a800 ==> land_f3fbdee8, to finish the network upgrade
all up : 1000 TPS, 4528 ms latency, 5450 ms p99 latency, no expired txns
Logs: http://kibana.ct-1-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2020-09-01T18:52:01Z',to:'2020-09-01T19:13:17Z'))
Dashboard: http://grafana.ct-1-k8s-testnet.aws.hlw3truzy4ls.com/d/2XqUIhnWz/performance?from=1598986321000&to=1598987597000
Validator 1 logs: http://kibana.ct-1-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2020-09-01T18:52:01Z',to:'2020-09-01T19:13:17Z'))&_a=(columns:!(log),query:(language:kuery,query:'kubernetes.pod_name:"val-1"'),sort:!(!('@timestamp',desc)))

Repro cmd:

./scripts/cti --tag land_7371a800 --cluster-test-tag land_f3fbdee8 -E RUST_LOG=debug -E BATCH_SIZE=15 -E UPDATE_TO_TAG=land_f3fbdee8 --report report.json --suite land_blocking_compat

@bors-libra
Copy link
Contributor

☀️ Canary successful

@rustielin rustielin force-pushed the lbt-report-pr-num branch 3 times, most recently from 02d0d56 to 6797abb Compare September 1, 2020 21:06
bors-libra pushed a commit that referenced this pull request Sep 2, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on #5701

Closes: #5878
bors-libra pushed a commit that referenced this pull request Sep 2, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on #5701

Closes: #5879
rustielin added a commit to rustielin/libra that referenced this pull request Sep 2, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on diem#5701
bors-libra pushed a commit that referenced this pull request Sep 2, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on #5701

Closes: #5879
rustielin added a commit to rustielin/libra that referenced this pull request Sep 2, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on diem#5701
@andll
Copy link

andll commented Sep 2, 2020

I agree that there is value in running compat test on release branch and we can't assume that 'compat test runs elsewhere so there is no need to worry'.
We can probably land this and then work on activating compat test?
Leaving it to @sausagee to approve since I did not look in detail into the LBT code

bors-libra pushed a commit that referenced this pull request Sep 3, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on #5701
rustielin added a commit to rustielin/libra that referenced this pull request Sep 3, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on diem#5701
bors-libra pushed a commit that referenced this pull request Sep 3, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on #5701
@rustielin
Copy link
Contributor Author

@sausagee @andll agreed, thanks for the feedback! I've added the ability for LBT to build two sets of images if it is unable to find the right images to use for compat test (#5878 which is stacked on this PR) LBT + compat should now work for cherry-picking into release branches.

@sausagee
Copy link
Contributor

sausagee commented Sep 4, 2020

LBT + compat should now work for cherry-picking into release branches.

The PR title reads "[gha][lbt] don't run LBT on cherry-pick and update slack log #5701" ATM, update it to match the intent?

Copy link
Contributor

@sausagee sausagee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other the PR title.

@rustielin rustielin changed the title [gha][lbt] don't run LBT on cherry-pick and update slack log [gha][lbt] don't run LBT compat on cherry-pick and update slack log Sep 4, 2020
@rustielin
Copy link
Contributor Author

/land

* Changes PR number and details calculation, and reorders initial
steps
* fetch base ref for use in compat test
* New button in Slack to Visit PR
* Slack message uses the report.json

Closes: diem#5701
rustielin added a commit to rustielin/libra that referenced this pull request Sep 4, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on diem#5701
rustielin added a commit to rustielin/libra that referenced this pull request Sep 4, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on diem#5701
@github-actions
Copy link

github-actions bot commented Sep 4, 2020

Cluster Test Result

Compatibility test results for land_d2fcf992 ==> land_89a7d136 (PR)
1. All instances running land_d2fcf992, generating some traffic on network
2. First validator land_d2fcf992 ==> land_89a7d136, to validate storage
3. First batch validators (14) land_d2fcf992 ==> land_89a7d136, to test consensus
4. Second batch validators (15) land_d2fcf992 ==> land_89a7d136, to upgrade rest of the validators
5. All full nodes (30) land_d2fcf992 ==> land_89a7d136, to finish the network upgrade
all up : 995 TPS, 4551 ms latency, 5250 ms p99 latency, no expired txns
Logs: http://kibana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2020-09-04T17:00:16Z',to:'2020-09-04T17:20:35Z'))
Dashboard: http://grafana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/d/2XqUIhnWz/performance?from=1599238816000&to=1599240035000
Validator 1 logs: http://kibana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2020-09-04T17:00:16Z',to:'2020-09-04T17:20:35Z'))&_a=(columns:!(log),query:(language:kuery,query:'kubernetes.pod_name:"val-1"'),sort:!(!('@timestamp',desc)))

Repro cmd:

./scripts/cti --tag land_d2fcf992 --cluster-test-tag land_89a7d136 -E RUST_LOG=debug -E BATCH_SIZE=15 -E UPDATE_TO_TAG=land_89a7d136 --report report.json --suite land_blocking_compat

@bors-libra bors-libra closed this in 89a7d13 Sep 4, 2020
@bors-libra bors-libra merged commit 89a7d13 into diem:master Sep 4, 2020
rustielin added a commit to rustielin/libra that referenced this pull request Sep 4, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on diem#5701
rustielin added a commit to rustielin/libra that referenced this pull request Sep 4, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on diem#5701
bors-libra pushed a commit that referenced this pull request Sep 4, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on #5701

Closes: #5878
bors-libra pushed a commit that referenced this pull request Sep 4, 2020
`find-lbt-images.sh` now only looks for recent images until the first
breaking change has been found, since after that, compat is bound to
fail. If a recent image is not found, report to Slack, but also build
the expected image from `BASE_REF` and continue with the workflow.

Also allows land-blocking compat test to be run on non-master branches,
such as for verifying commits cherry-picked into release branches.
Appropriate changes have been made in scripts to account for variable
`BASE_REF`.

Depends on #5701

Closes: #5878
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants