
integration: Publish aggregate test report summary #2453

Merged · 1 commit merged into main from qasim/test-report on Feb 7, 2024

Conversation

@mqasimsarfraz (Member)

Currently, getting an overview of test failures in CI can be challenging. This change addresses that by extending the work we did for summaries, storing them on a GitHub Pages site.

It takes inspiration from ig-benchmarks and commits the reports on each change. Similarly, it uses a retry to avoid failures when multiple jobs push to git concurrently.

Testing Done

Reports: https://inspektor-gadget.github.io/ig-test-reports/
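
(For context, a minimal sketch of the push-with-retry idea described above, assuming the reports are committed to the default branch of the reports repository; the branch and remote names are assumptions, not taken from the PR.)

# Retry the push a few times; on contention, rebase onto the commits
# pushed by a concurrent run before trying again.
for attempt in 1 2 3; do
  git push origin main && break
  git pull --rebase origin main
  sleep 5
done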

@alban (Member) commented Feb 5, 2024

That looks great.

Will the web page display all previous tests or only the last few? If it displays everything, that would be a lot of columns over time, making it difficult to read.

@mauriciovasquezbernal (Member) left a comment

It looks really great. I don't have many comments; besides Alban's comment above I don't have anything to add. I think we should keep the last N runs, N == 100?, and that's it.

- test-integration-k8s-ig
- test-integration-non-k8s-ig
runs-on: ubuntu-latest
# Skip this job when running on a fork or a PR from a fork.


Why do we need to skip when running from a fork?

Member Author

The secrets.TEST_REPORTS_TOKEN won't be available for PRs from external forks.

Member

Can't you modify this to test that the secret is present?
Like what we already do here:


if: needs.check-secrets.outputs.aks == 'true'
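
(For reference, a hypothetical sketch of that pattern applied to this job; the job name and output wiring are illustrative, not taken from the actual workflow.)

# Expose whether the secret is set; secrets are empty on PRs from
# forks, so the output is 'false' there and dependent jobs can skip.
check-secrets:
  runs-on: ubuntu-latest
  outputs:
    test-reports: ${{ steps.check.outputs.test-reports }}
  steps:
    - id: check
      run: echo "test-reports=${{ secrets.TEST_REPORTS_TOKEN != '' }}" >> "$GITHUB_OUTPUT"

# The reports job would then guard itself with:
#   if: needs.check-secrets.outputs.test-reports == 'true'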

@mqasimsarfraz (Member Author)

If it displays everything, that would be a lot of columns over time, making it difficult to read.

💯

Will the web page display all previous tests or only the last few?

I currently limit it to the last 10 only. We can change this value in ig-test-reports later on as well.

@mauriciovasquezbernal (Member)

I currently limit it to the last 10 only. We can change this value in ig-test-reports later on as well.

Should we limit the size of the JSON somehow? I suppose it'll become too big and difficult to handle very soon.

@eiffel-fl (Member) left a comment

Hi!

I understand the need to store our CI results long term and in an easy-to-parse way.
But I am wondering whether this is our goal or that of the platform where we run the CI, i.e. GitHub. So, do they provide something like this?

Also, with regard to the code itself, you draw a parallel with the benchmarks, which have been really flaky recently...
If possible, I would like to avoid adding other flaky tests to our CI, as it just decreases the reliability over time.
I understand it is indeed complicated to guess whether something will be flaky, but the retry function does not really lead me to believe that this will not be flaky.
Moreover, I am wondering whether this should be written in JavaScript rather than bash, particularly given the number of lines.

Best regards.

local n=1
local max=10
local delay=15
while true; do
Member

I suppose you can use a for here.
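
(For illustration, a hypothetical for-based variant of the retry helper; max and delay are taken from the snippet above, and the wrapped command is a placeholder.)

function retry {
  local max=10
  local delay=15
  for ((n = 1; n <= max; n++)); do
    "$@" && return 0            # success: stop retrying
    if ((n < max)); then
      echo "Attempt $n/$max failed, retrying in ${delay}s..." >&2
      sleep "$delay"
    fi
  done
  return 1                      # all attempts failed
}

# Usage (hypothetical): retry git push origin main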

.github/workflows/inspektor-gadget.yml
- name: Store test reports
shell: bash {0}
run: |
function retry {
Member

Why do you need this whole retry function?
I am generally not a big fan of this type of code, as it often goes along with flakiness.

@mqasimsarfraz (Member Author)

I currently limit it to the last 10 only. We can change this value in ig-test-reports later on as well.

Should we limit the size of the JSON somehow? I suppose it'll become too big and difficult to handle very soon.

Great catch! I adjusted it to only store the last 100 results. I believe we should be fine with that, but if that turns out to be too much we can decrease it in the future.
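
(A minimal sketch of how such a cap could be applied, assuming the reports are stored as a JSON array in reports.json; the file name and schema are assumptions, not from the PR.)

# Keep only the last 100 entries of the array.
jq '.[-100:]' reports.json > reports.json.tmp && mv reports.json.tmp reports.json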

@mauriciovasquezbernal (Member) commented Feb 6, 2024

Also, with regard to the code itself, you draw a parallel with the benchmarks, which have been really flaky recently...
If possible, I would like to avoid adding other flaky tests to our CI, as it just decreases the reliability over time.
I understand it is indeed complicated to guess whether something will be flaky, but the retry function does not really lead me to believe that this will not be flaky.

I think the logic introduced in the retry function is a good enough effort to avoid this job failing when other instances of it run in parallel on different PRs. Perhaps we can also use a concurrency group (https://docs.github.com/en/actions/using-jobs/using-concurrency) to avoid multiple instances of this job running in parallel and reduce the chance of a collision?
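
(For reference, a hypothetical sketch of that alternative; the group name is illustrative.)

# At most one run of this group executes at a time; without
# cancel-in-progress, a new run waits, but only one run can be queued,
# so any previously pending run in the group gets canceled.
concurrency:
  group: publish-test-reports
  cancel-in-progress: false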

@mqasimsarfraz (Member Author)

I understand the need to store our CI results long term and in an easy-to-parse way. But I am wondering whether this is our goal or that of the platform where we run the CI, i.e. GitHub. So, do they provide something like this?

I would have really hoped for GitHub to have something for this, but I don't see any signs of that happening. I saw there are custom runners and perhaps other platforms offering this, but that sounds more complicated to me.

Also, with regard to the code itself, you draw a parallel with the benchmarks, which have been really flaky recently... If possible, I would like to avoid adding other flaky tests to our CI, as it just decreases the reliability over time. I understand it is indeed complicated to guess whether something will be flaky, but the retry function does not really lead me to believe that this will not be flaky.

The issue with benchmarks was recently fixed, so the idea was to see if we can handle things in a similar way to avoid issues in the first place.

Moreover, I am wondering whether this should be written in JavaScript rather than bash, particularly given the number of lines.

Which part are we talking about? Like, still store the data but have a bash script render it? I personally think a web page is a simpler approach.

@mauriciovasquezbernal (Member) left a comment

It's looking very good. What do you think about moving the script to a different file? I think that would make it easier to edit (syntax highlighting and other stuff will help too).

Also, I can approve it now if you want to merge it as it is. I saw the result and it's very nice; we can always improve it later on if needed. It's up to you how much you want to improve it before merging.

name: "test-reports"
- name: Store test reports
shell: bash {0}
run: |


Perhaps move this to something under https://github.com/inspektor-gadget/inspektor-gadget/tree/main/tools to avoid adding more lines to this file?

Member Author

Good idea! I will do it!

Member Author

Done.

@mqasimsarfraz (Member Author)

use a concurrency group (https://docs.github.com/en/actions/using-jobs/using-concurrency) to avoid multiple instances of this job running in parallel and reduce the chance of a collision?

I also thought about this, but I decided against it since it only allows a single job in the pending state:

Any previously pending job or workflow in the concurrency group will be canceled. To also cancel any currently running job or workflow in the same concurrency group, specify cancel-in-progress: true.

@mauriciovasquezbernal (Member)

So in that case we'd be missing the entry for the skipped job in the results?
I agree the current approach is better then; let's go for it.

Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
@mauriciovasquezbernal (Member) left a comment

LGTM. Thanks for working on this!


@eiffel-fl (Member) commented Feb 7, 2024

The issue with benchmarks was recently fixed (#2038 (comment)), so the idea was to see if we can handle things in a similar way to avoid issues in the first place.

You made me doubt, but there is still an issue with the benchmarks:
https://github.com/inspektor-gadget/inspektor-gadget/actions/runs/7810164205/job/21303171942?pr=2460
Maybe this is a different one, but it will not change the fact that I deem this to be flaky as hell.

@mqasimsarfraz (Member Author) commented Feb 7, 2024

The issue with benchmarks was recently fixed (#2038 (comment)), so the idea was to see if we can handle things in a similar way to avoid issues in the first place.

You made me doubt, but there is still an issue with the benchmarks: https://github.com/inspektor-gadget/inspektor-gadget/actions/runs/7810164205/job/21303171942?pr=2460 Maybe this is a different one, but it will not change the fact that I deem this to be flaky as hell.

The error does look different, but I see the point. Still, I don't think we should block this change on it. I can keep a close eye on the next few builds after we merge this change, and I would be happy to switch to other approaches (e.g. a concurrency group) if they are more stable, even with less coverage. Flaky CI is the last thing I want 😄

@mqasimsarfraz (Member Author)

I will merge this and keep a close eye on flakiness. Hopefully we will be able to identify failures faster!

@mqasimsarfraz mqasimsarfraz merged commit 5f06246 into main Feb 7, 2024
57 checks passed
@mqasimsarfraz mqasimsarfraz deleted the qasim/test-report branch February 7, 2024 10:31