
integration: Publish aggregate test report summary #2453

Merged · 1 commit merged into main from qasim/test-report on Feb 7, 2024

Conversation

@mqasimsarfraz (Member)

Currently, getting an overview of test failures in CI can be challenging. This change addresses that by extending the work we did for summaries, storing them on a GitHub Pages site.

It takes inspiration from ig-benchmarks and commits the reports on each change. Similarly, it uses a retry to avoid failures when multiple jobs push to git concurrently.

Testing Done

Reports: https://inspektor-gadget.github.io/ig-test-reports/
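
(For context, a minimal sketch of the push-with-retry idea described above, assuming the reports are committed to the default branch of the reports repository; the branch and remote names are assumptions, not taken from the PR.)

# Retry the push a few times; on contention, rebase onto the commits
# pushed by a concurrent run before trying again.
for attempt in 1 2 3; do
  git push origin main && break
  git pull --rebase origin main
  sleep 5
done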

@alban (Member) commented Feb 5, 2024

That looks great.

Will the web page display all previous tests or only the last few? If it displays everything, that would be a lot of columns over time, making it difficult to read.

@mauriciovasquezbernal (Member) left a comment

It looks really great. I don't have many comments; besides Alban's comment above I don't have anything to add. I think we should keep the last N runs, N == 100?, and that's it.

- test-integration-k8s-ig
- test-integration-non-k8s-ig
runs-on: ubuntu-latest
# Skip this job when running on a fork or a PR from a fork.


Why do we need to skip when running from a fork?

Member Author

The secrets.TEST_REPORTS_TOKEN won't be available for PRs from external forks.

Member

Can't you modify this to test that the secret is present?
Like what we already do here:


if: needs.check-secrets.outputs.aks == 'true'
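
(For reference, a hypothetical sketch of that pattern applied to this job; the job name and output wiring are illustrative, not taken from the actual workflow.)

# Expose whether the secret is set; secrets are empty on PRs from
# forks, so the output is 'false' there and dependent jobs can skip.
check-secrets:
  runs-on: ubuntu-latest
  outputs:
    test-reports: ${{ steps.check.outputs.test-reports }}
  steps:
    - id: check
      run: echo "test-reports=${{ secrets.TEST_REPORTS_TOKEN != '' }}" >> "$GITHUB_OUTPUT"

# The reports job would then guard itself with:
#   if: needs.check-secrets.outputs.test-reports == 'true'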

@mqasimsarfraz (Member Author)

If it displays everything, that would be a lot of columns over time, making it difficult to read.

💯

Will the web page display all previous tests or only the last few?

I currently limit it to the last 10 only. We can change this value in ig-test-reports later on as well.

@mauriciovasquezbernal (Member)

I currently limit it to the last 10 only. We can change this value in ig-test-reports later on as well.

Should we limit the size of the JSON somehow? I suppose it'll become too big and difficult to handle very soon.

@eiffel-fl (Member) left a comment

Hi!

I understand the need to store our CI results long term and in an easy-to-parse way.
But I am wondering whether this is our goal or that of the platform where we run the CI, i.e. GitHub. So, do they provide something like this?

Also, with regard to the code itself, you draw a parallel with the benchmarks, which have been really flaky recently...
If possible, I would like to avoid adding other flaky tests to our CI, as it just decreases the reliability over time.
I understand it is indeed complicated to guess whether something will be flaky, but the retry function does not really lead me to believe that this will not be flaky.
Moreover, I am wondering whether this should be written in JavaScript rather than bash, particularly given the number of lines.

Best regards.

local n=1
local max=10
local delay=15
while true; do
Member

I suppose you can use a for here.
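
(For illustration, a hypothetical for-based variant of the retry helper; max and delay are taken from the snippet above, and the wrapped command is a placeholder.)

function retry {
  local max=10
  local delay=15
  for ((n = 1; n <= max; n++)); do
    "$@" && return 0            # success: stop retrying
    if ((n < max)); then
      echo "Attempt $n/$max failed, retrying in ${delay}s..." >&2
      sleep "$delay"
    fi
  done
  return 1                      # all attempts failed
}

# Usage (hypothetical): retry git push origin main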

.github/workflows/inspektor-gadget.yml
- name: Store test reports
shell: bash {0}
run: |
function retry {
Member

Why do you need this whole retry function?
I am generally not a big fan of this type of code, as it often goes along with flakiness.

@mqasimsarfraz (Member Author)

I currently limit it to the last 10 only. We can change this value in ig-test-reports later on as well.

Should we limit the size of the JSON somehow? I suppose it'll become too big and difficult to handle very soon.

Great catch! I adjusted it to only store the last 100 results. I believe we should be fine with that, but if that turns out to be too much we can decrease it in the future.
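
(A minimal sketch of how such a cap could be applied, assuming the reports are stored as a JSON array in reports.json; the file name and schema are assumptions, not from the PR.)

# Keep only the last 100 entries of the array.
jq '.[-100:]' reports.json > reports.json.tmp && mv reports.json.tmp reports.json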

@mauriciovasquezbernal (Member) commented Feb 6, 2024

Also, with regard to the code itself, you draw a parallel with the benchmarks, which have been really flaky recently...
If possible, I would like to avoid adding other flaky tests to our CI, as it just decreases the reliability over time.
I understand it is indeed complicated to guess whether something will be flaky, but the retry function does not really lead me to believe that this will not be flaky.

I think the logic introduced in the retry function is a good enough effort to avoid this job failing when other instances of it run in parallel on different PRs. Perhaps we can also use a concurrency group (https://docs.github.com/en/actions/using-jobs/using-concurrency) to avoid multiple instances of this job running in parallel and reduce the chance of a collision?
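
(For reference, a hypothetical sketch of that alternative; the group name is illustrative.)

# At most one run of this group executes at a time; without
# cancel-in-progress, a new run waits, but only one run can be queued,
# so any previously pending run in the group gets canceled.
concurrency:
  group: publish-test-reports
  cancel-in-progress: false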

@mqasimsarfraz (Member Author)

I understand the need to store our CI results long term and in an easy-to-parse way. But I am wondering whether this is our goal or that of the platform where we run the CI, i.e. GitHub. So, do they provide something like this?

I would have really hoped for GitHub to have something for this, but I don't see any signs of that happening. I saw there are custom runners and perhaps other platforms offering this, but that sounds more complicated to me.

Also, with regard to the code itself, you draw a parallel with the benchmarks, which have been really flaky recently... If possible, I would like to avoid adding other flaky tests to our CI, as it just decreases the reliability over time. I understand it is indeed complicated to guess whether something will be flaky, but the retry function does not really lead me to believe that this will not be flaky.

The issue with benchmarks was recently fixed, so the idea was to see if we can handle things in a similar way to avoid issues in the first place.

Moreover, I am wondering whether this should be written in JavaScript rather than bash, particularly given the number of lines.

Which part are we talking about? Like, still store the data but have a bash script render it? I personally think a web page is a simpler approach.

@mauriciovasquezbernal (Member) left a comment

It's looking very good. What do you think about moving the script to a different file? I think that would make it easier to edit (syntax highlighting and other stuff will help too).

Also, I can approve it now if you want to merge it as it is. I saw the result and it's very nice; we can always improve it later on if needed. It's up to you how much you want to improve it before merging.

name: "test-reports"
- name: Store test reports
shell: bash {0}
run: |


Perhaps move this to something under https://github.com/inspektor-gadget/inspektor-gadget/tree/main/tools to avoid adding more lines to this file?

Member Author

Good idea! I will do it!

Member Author

Done.

@mqasimsarfraz (Member Author)

use a concurrency group (https://docs.github.com/en/actions/using-jobs/using-concurrency) to avoid multiple instances of this job running in parallel and reduce the chance of a collision?

I also thought about this, but I decided against it since it only allows a single job in the pending state:

Any previously pending job or workflow in the concurrency group will be canceled. To also cancel any currently running job or workflow in the same concurrency group, specify cancel-in-progress: true.

@mauriciovasquezbernal (Member)

So in that case we'd be missing the entry for the skipped job in the results?
I agree the current approach is better then; let's go for it.

Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
@mauriciovasquezbernal (Member) left a comment

LGTM. Thanks for working on this!


@eiffel-fl (Member) commented Feb 7, 2024

The issue with benchmarks was recently fixed (#2038 (comment)), so the idea was to see if we can handle things in a similar way to avoid issues in the first place.

You made me doubt, but there is still an issue with the benchmarks:
https://github.com/inspektor-gadget/inspektor-gadget/actions/runs/7810164205/job/21303171942?pr=2460
Maybe this is a different one, but it will not change the fact that I deem this to be flaky as hell.

@mqasimsarfraz (Member Author) commented Feb 7, 2024

The issue with benchmarks was recently fixed (#2038 (comment)), so the idea was to see if we can handle things in a similar way to avoid issues in the first place.

You made me doubt, but there is still an issue with the benchmarks: https://github.com/inspektor-gadget/inspektor-gadget/actions/runs/7810164205/job/21303171942?pr=2460 Maybe this is a different one, but it will not change the fact that I deem this to be flaky as hell.

The error does look different, but I see the point. Still, I don't think we should block this change on it. I can keep a close eye on the next few builds after we merge this change, and I would be happy to switch to other approaches (e.g. a concurrency group) if they are more stable, even with less coverage. Flaky CI is the last thing I want 😄

@mqasimsarfraz (Member Author)

I will merge this and keep a close eye on flakiness. Hopefully we will be able to identify failures faster!

@mqasimsarfraz mqasimsarfraz merged commit 5f06246 into main Feb 7, 2024
57 checks passed
@mqasimsarfraz mqasimsarfraz deleted the qasim/test-report branch February 7, 2024 10:31