
Reproduce ns-3 per-commit CI pipelines (push, MR) #141

Merged 1 commit into signetlabdei:develop on Nov 9, 2023

Conversation

@non-det-alle (Collaborator) commented Oct 22, 2023

This is a redo of this repo's GitHub workflows, meant to mimic the CI pipelines that ns-3 runs on push and MR on GitLab.
The jobs are written to be as close to 1:1 as possible with the code run on ns-3's GitLab, limiting compilation to this module (and its dependencies) where possible to reduce the time and storage used by the workflow. Jobs that do not involve the module (Cppyy, and the documentation for installation, manual, contributing and tutorial) are not reproduced.

See a complete run of the workflow here, and compare with an example on ns-3's GitLab here.

This workflow runs the following checks/jobs organized in stages:

  1. code-format:
    • run clang-format validation (version 14, 15 and 16)
    • check for leftover emacs lines
    • run spell-checking on the code
  2. compile lorawan module and dependencies:
    • build and run tests with g++ on default mode
    • build and run tests with g++ on optimized mode
    • build with g++ on debug mode
    • build with clang++ on debug mode
    • build with g++ on debug mode and --disable-asserts option
    • build with g++ on debug mode and --disable-precompiled-headers option
  3. run code linting with clang-tidy (version 16)
  4. documentation related:
    • check doxygen coverage (but do not build it, as in the ns-3 pipeline)
    • build modules docs in html and latexpdf

Notes:

  • the workflow will trigger on push or MR only on the develop branch. This allows development on other branches without running the workflow on each push
  • to speed up the jobs, we use a shared cache for jobs that build ns-3 with the same parameters (compiler and mode). So, compile jobs will take some time the first time they are run, but from the second run they will be much faster (see for instance the lengths here)
  • jobs that build will save the build-files as an artifact that can be consulted by developers afterwards (artifacts have 2-day lifetime)
  • tests mentioned above are run only if code changes are detected by ccache
  • currently the jobs for emacs-lines, spell-checking and doxygen coverage are allowed to fail because currently the code does not satisfy them
  • the whole workflow should not take much more than 1.5h in the worst case scenario (no build cache)
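
As a rough sketch of the points above (job names, cache keys, and ns3 flags here are illustrative assumptions, not the actual workflow file), the develop-only trigger and the shared per-(compiler, mode) ccache cache could look like:

```yaml
# Illustrative sketch only; names and keys are assumptions
on:
  push:
    branches: [develop]
  pull_request:
    branches: [develop]

jobs:
  build-gcc-default:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: .ccache
          # Unique key per run plus a prefix restore key: each run saves a
          # fresh (timestamped) cache while restoring the latest matching one
          key: ccache-gcc-default-${{ github.run_id }}
          restore-keys: ccache-gcc-default-
      - name: Configure and build only this module and its dependencies
        env:
          CCACHE_DIR: ${{ github.workspace }}/.ccache
        run: |
          ./ns3 configure --enable-modules=lorawan --enable-tests
          ./ns3 build
```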

Possible addition to the ns-3 pipeline:

  • job to deploy the module documentation on another repo
  • job to check the module test coverage

If needed I can also implement daily/weekly ns-3 pipelines, but for that I think using a GitLab ns-3 fork with the lorawan submodule is better (as previously proposed by @pagmatt here #138).

@pagmatt (Member) commented Oct 25, 2023

Thanks for your efforts!
I sent you an invitation for write access to the repo, so that you can create branches and open MRs directly from this repository and I can better track (and possibly push myself) changes.

In terms of the MR itself, I am not a big fan of introducing tests we do not pass. Therefore, I'd like to fix the ones that currently fail before merging this.
Edit: I took the liberty of fixing some of the failing tests myself; there seem to be issues with the cache system though (which look unrelated to my changes)

@non-det-alle (Collaborator, Author)

I agree with you that before merging we should first fix the code that makes the tests fail.
This can stay open until we have done that.

Shouldn't we have a separate MR for each problem to keep the history clean?
You could do separate single-commit MRs for emacs-lines and spelling errors, respectively.
Then I can review them, so we always have two people checking MRs (as they do for ns-3 on GitLab).

Yesterday I was also able to fix the problems that caused the doxygen documentation build to fail.
I'll open another MR with that too, and maybe then we can add a final workflow job to deploy the docs.

Unfortunately, the doxygen coverage check itself (distinct from building the docs) is bound to fail until we document all the code, which could be a long process.

(I'll keep this MR updated with changes and make sure it works)

@pagmatt (Member) commented Oct 25, 2023

As long as changes are rather small (such as removing the emacs lines), I'd keep everything in this MR.
I agree that bigger changes (such as documenting the module) are better suited to separate MRs.

@non-det-alle (Collaborator, Author)

But if we tie these corrections to the present MR, they will not be integrated until we are satisfied with the workflow pipeline tests. They are also a bit out of scope for this MR, which is already quite big.
Personally, I think that creating a single-commit MR even for small changes (as @albertogara was saying too) helps us keep better track of everything.
I know it can be tedious, but it'll come in handy in the long term. I can open these MRs myself by rebasing your commits on the current state of the repo.

@pagmatt (Member) commented Oct 25, 2023

I personally feel like this is a bit overkill, but feel free to go ahead if you truly deem it necessary

@non-det-alle (Collaborator, Author) commented Oct 25, 2023

On the topic of helping you pull and review the code, I'm sorry if I did not create a branch here for #144 and #145.
From now on I'll publish my branches here before creating an MR to the 'develop' branch.

@pagmatt (Member) commented Oct 25, 2023

On the topic of helping you pull and review the code, I'm sorry if I did not create a branch here for #144 and #145. From now on I'll publish my branches here before creating an MR to the 'develop' branch.

Thanks, and no worries :)

@non-det-alle (Collaborator, Author) commented Nov 2, 2023

there seem to be issues with the cache system though (which look unrelated to my changes)

After a bit of investigation, it seems that the problem is due to ccache and the -march=native flag of the optimized ns-3 compilation mode (see Equivalent build profiles table). The -march=native flag expands into a number of compilation parameters depending on the hardware architecture.

Subsequent runs of the build jobs may be executed by GitHub on runners with different underlying architectures. This is not supported by ccache, which only stores the -march=native flag and not its expansion, thus failing when the cache is reused on a different system/runner. Also, GitHub provides no control over the architecture used by runners.

A perfect solution would be to use self-hosted runners. For now, I propose we test the release mode instead of the optimized one, even though we lose the ability to test some compiler optimizations.

@pagmatt (Member) commented Nov 2, 2023

there seem to be issues with the cache system though (which look unrelated to my changes)

After a bit of investigation, it seems that the problem is due to ccache and the -march=native flag of the optimized ns-3 compilation mode (see Equivalent build profiles table). The -march=native flag expands into a number of compilation parameters depending on the hardware architecture.

Currently we are splitting building and running tests into two subsequent jobs that may be executed by GitHub on runners with different architectures. This is not supported by ccache, which only stores the -march=native flag and not its expansion, thus failing when the cache is reused on a different system/runner. Also, GitHub provides no control over the architecture used by runners.

A perfect solution would be to use self-hosted runners. For now, I propose we test the release mode instead of the optimized one, even though we lose the ability to test some compiler optimizations.

From my own personal experience, optimized builds usually lead to stricter compilation checks (especially with gcc). For this reason, I am against removing the corresponding builds from the pipeline. Instead, I suggest either switching to a self-hosted runner or removing ccache altogether (despite the inevitable performance degradation). Have you tried asking the ns-3 maintainers (Gabriel, specifically) whether they're aware of a more elegant solution for avoiding these false positives?

@non-det-alle (Collaborator, Author) commented Nov 2, 2023

(I made a correction to my previous comment to clarify that the issue is between subsequent optimized builds, not between the build & test stages)

My understanding is that on GitLab they are using self-hosted runners, while on the GitHub copy they only adopt the release build mode.

I don't know where we could deploy self-hosted runners, but I think we can avoid using ccache for optimized jobs without particular drawbacks (we already have a parallel job with --disable-precompiled-headers that does not use ccache, so the duration of the pipeline should not increase).

I will test that and push it here.
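
A minimal sketch of that idea (assuming ccache is disabled via its standard CCACHE_DISABLE environment variable; the job name and ns3 options are illustrative):

```yaml
build-gcc-optimized:
  runs-on: ubuntu-latest
  env:
    # -march=native expands to CPU-specific flags at compile time, so a
    # cache filled on a runner with a different CPU causes bad reuse;
    # disable ccache entirely for this job
    CCACHE_DISABLE: "1"
  steps:
    - uses: actions/checkout@v4
    - run: |
        ./ns3 configure -d optimized --enable-modules=lorawan --enable-tests
        ./ns3 build
```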

I also added the code analysis jobs from the ns-3 GitHub pipeline (CodeQL & CodeCov) at the end of our pipeline and a manual job to deploy the documentation. For these new ones we may need to add a couple of secrets to the repository if not present already. Namely:

@pagmatt (Member) commented Nov 2, 2023

I do not think we have computational resources available for deploying a self-hosted runner at my home uni, so yeah, let's go with disabling ccache for now and follow the ccache issue; hopefully they'll deploy a fix. Regarding the Code* jobs, I think we could skip them tbh, but I agree with the doc deployment job. Once you push, we can double-check which secrets need to be added; I can take care of that.

@non-det-alle (Collaborator, Author) commented Nov 2, 2023

CodeQL is a standard GitHub check that searches for known bug-prone pieces of code (see an example: https://github.com/non-det-alle/lorawan/security/code-scanning/1). The CodeCov one is really useful to keep track of test coverage (see https://app.codecov.io/gh/non-det-alle/lorawan/tree/github-ci-dev). I would keep them: they make good code analysis tools and cause no harm.

@codecov-commenter

Welcome to Codecov 🎉

Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment.

Thanks for integrating Codecov - We've got you covered ☂️

@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

@non-det-alle (Collaborator, Author)

I pushed the 2nd version of the CI pipeline with bug fixes and the discussed changes.

The first run went as expected. As you can see, to work around the problems introduced by runners having different underlying architectures, 'running tests' is now an optional step of the base per-commit compile job. Moreover, the optimized compilation does not use ccache, for the same reason.

It is safe to say that the CI pipeline has been thoroughly tested on my side, with almost 200 runs in the past 3 weeks. Apparently, there is no limit on the total amount of workflow run-hours for public repositories, while the cache is limited to 10 GB. You'll see that each run produces timestamped caches: that's the way to do it, as you cannot rewrite existing caches. Older caches will be deleted automatically once we reach the 10 GB limit.

Finally, it seems that CodeCov was already enabled by someone else in the past for this repo (@DvdMgr ?). For instance, you can see here the current code coverage relative to this merge.

Deploying the documentation is a manually-activated workflow included in this commit. I will push a small refactoring of the main CI pipeline and temporarily put the deploy-doc workflow as activated on pull_request such that we can test it in this context.

@pagmatt (Member) commented Nov 3, 2023

  • Why is 'running tests' an optional step of the base per-commit compile job now? Shouldn't it always work now that ccache is disabled?
  • I would set the doc deployment as an automatic job for each MR at least
  • Just a suggestion: avoid continuously rebasing your commits, as it's hard to keep track of the changes you introduce right now. We can just squash and merge after all, or rebase once the MR is approved before merging if you prefer

@non-det-alle (Collaborator, Author) commented Nov 3, 2023

  • Why is 'running tests' an optional step of the base per-commit compile job now? Shouldn't it always work now that ccache is disabled?

It is an optional step because not all compile jobs also run tests in the ns-3 per-commit pipeline. Here, tests are only run for the g++ default and optimized builds. It has been the case since the beginning, and it is not related to jobs using (or not) ccache. I think the confusion comes from a further optimization present in the pipeline: if the C++ code was not modified by the push/MR (we check this by looking at the ccache miss rate), tests are skipped. The difference from before is that for the optimized build, tests are now run even if the C++ code was unchanged, losing this pipeline optimization.
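
A sketch of that skip-tests optimization as workflow steps (the ccache stats key and the test command are assumptions about ccache's machine-readable output and ns-3's test runner, not the exact pipeline code):

```yaml
- name: Detect C++ changes via ccache
  id: changes
  run: |
    # If every object was a cache hit, the push/MR touched no C++ code
    misses=$(ccache --print-stats | awk -F'\t' '$1 == "cache_miss" {print $2}')
    echo "cpp-changed=$([ "${misses:-0}" -gt 0 ] && echo true || echo false)" >> "$GITHUB_OUTPUT"

- name: Run tests
  if: steps.changes.outputs.cpp-changed == 'true'
  run: ./test.py
```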

  • I would set the doc deployment as an automatic job for each MR at least

If we set it to trigger on MRs, I think people would be able to change the documentation without maintainer approval by submitting pull requests. I'll try to set it to run once an MR is closed and merged into the repo.

  • Just a suggestion: avoid continuously rebasing your commits, as it's hard to keep track of the changes you introduce right now. We can just squash and merge after all, or rebase once the MR is approved before merging if you prefer

No problem. I was force-pushing the one commit because I saw in a previous MR that if we rebase multiple commits before merging, we effectively lose the changes in between commits, while with force pushes GitHub lets you check the changes (there is a Compare button next to the force push in the MR timeline).

@pagmatt (Member) commented Nov 3, 2023

If we set it to trigger on MRs, I think people would be able to change the documentation without maintainer approval by submitting pull requests. I'll try to set it to run once an MR is closed and merged into the repo.

Valid point, I am not sure what would happen, to be honest. Let's keep it to commits on the main branch then.

@non-det-alle (Collaborator, Author) commented Nov 3, 2023

Currently the deploy-doc job is failing during the push-to-another-repository action with no clear error. I think the API_TOKEN_GITHUB secret may be missing?

Run cpina/github-action-push-to-another-repository@v1
  with:
    source-directory: doc/
    destination-github-username: signetlabdei
    destination-repository-name: lorawan-docs
    user-email: alleaimi95@gmail.com
    target-branch: master
  env:
    API_TOKEN_GITHUB: ${{ secrets.API_TOKEN_GITHUB }}

It must be set to a personal access token with push permissions on the lorawan-docs repository.

Note: if this job succeeds here, a security problem still remains, because somebody else could open an MR with a workflow pushing onto lorawan-docs.

@pagmatt (Member) commented Nov 3, 2023

I set up an SSH deploy key, which is the suggested, more secure option according to the action's documentation, and modified the workflow accordingly (I will possibly push more fixes based on the test outcome).

Edit: regarding the fact that the doc will be deployed only for MRs opened from the repo itself, this seems to be a "standard" issue, so I believe we could simply suggest opening MRs from a branch in this repository (or trigger the build manually and on actual merge of the commit into the repo, which may be more secure).

@non-det-alle (Collaborator, Author)

When you are done, I have prepared a trigger configuration for the deploy-doc as follows:

  • on MR: start only once the MR is closed and merged in the repository
  • on direct push: start at the end of the whole per-commit pipeline
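
In GitHub Actions terms, that trigger configuration could be sketched roughly as follows (a guess at the eventual file, not its actual contents; the placeholder step is hypothetical):

```yaml
on:
  # keep the manual trigger available
  workflow_dispatch:
  pull_request:
    branches: [develop]
    types: [closed]
  push:
    branches: [develop]

jobs:
  deploy-doc:
    # For MR events, deploy only when the MR was actually merged, not just closed
    if: github.event_name != 'pull_request' || github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push docs (placeholder)
        run: echo "build docs and push to lorawan-docs here"
```

Running it "at the end of the whole per-commit pipeline" on direct pushes could instead be expressed with a `workflow_run` trigger, or by making deploy-doc a final job of the main workflow via `needs:`.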

@pagmatt (Member) commented Nov 8, 2023

The doc deployment should work now. I believe it still does not because you opened the MR from your fork, so the DEPLOY_KEY secret cannot be found. However, it should work from this repository.

@non-det-alle (Collaborator, Author) commented Nov 8, 2023

I also think that could be the issue. If that's the case, it could be a good thing because it means people cannot use MRs to push onto lorawan-docs.

If you want to test this hypothesis beforehand, I can move deploy-doc to another branch (of this repo) and open a separate MR.

Otherwise I'll proceed to push the correct triggers for deploy-doc and squash the commits of this MR before you merge it.

@pagmatt (Member) commented Nov 8, 2023

I also think that could be the issue. If that's the case, it could be a good thing because it means people cannot use MRs to push onto lorawan-docs.

If you want to test this hypothesis beforehand, I can move deploy-doc to another branch (of this repo) and open a separate MR.

Otherwise I'll proceed to push the correct triggers for deploy-doc and squash the commits of this MR before you merge it.

I'd opt for the second option, and then I can fix the CI afterwards in case the secret is still not found (but I expect it to work 🤞🏻 )

+ code analysis jobs
+ docs deployment job
@non-det-alle (Collaborator, Author)

Ok, it's done. Now the deploy-doc job should run once you merge this MR. Maybe let's have the per-commit pipeline complete once more before merging, just to avoid surprises.

As a next step, I'd review the readme and add new CI badges (build, coverage) in place of the old ones (gitter chat, travis build).

@pagmatt pagmatt merged commit a20d30a into signetlabdei:develop Nov 9, 2023
17 of 18 checks passed
@non-det-alle (Collaborator, Author)

Unfortunately the deploy-doc job is still failing. Maybe revert to using a TOKEN?

Looking at the merge outcome I also have another observation: I did not expect the per-commit pipeline to run again once the merge completes. We can try to replicate the rule from the GitLab pipeline to avoid it:

workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS
      when: never
    - if: $CI_COMMIT_BRANCH

I'll work on this second point.
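
GitHub Actions has no direct equivalent of GitLab's workflow:rules, but one rough approximation (job and output names here are assumptions, not the final code) is a gating job that, on push events, checks for an open PR from the same branch with the gh CLI and lets the rest of the pipeline skip via needs:

```yaml
jobs:
  gate:
    runs-on: ubuntu-latest
    outputs:
      run: ${{ steps.check.outputs.run }}
    steps:
      - id: check
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # Mirror the GitLab rule: skip branch (push) pipelines when an
          # open MR for the same branch will run its own pipeline
          if [ "${{ github.event_name }}" = "push" ] && \
             [ -n "$(gh pr list --repo "$GITHUB_REPOSITORY" --head "$GITHUB_REF_NAME" --state open --json number --jq '.[].number')" ]; then
            echo "run=false" >> "$GITHUB_OUTPUT"
          else
            echo "run=true" >> "$GITHUB_OUTPUT"
          fi

  build:
    needs: gate
    if: needs.gate.outputs.run == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "rest of the per-commit pipeline"
```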

@non-det-alle non-det-alle deleted the github-ci branch November 9, 2023 10:55
@pagmatt (Member) commented Nov 9, 2023

Perhaps it is still picking up the commit as an MR from a fork? I am basically following the tutorial 1:1, and the secret is there in the repo... I'm kind of at a loss to understand why the job is still failing.

Ok, regarding the second point, perhaps we can try that first, to check whether an MR opened from a branch in this repository is finally able to pick up the deploy secret?

@non-det-alle (Collaborator, Author) commented Nov 9, 2023

I ran the deploy-doc job manually and it worked!

Now we need to understand why we are not able to do it automatically. I understand why it was failing as an MR; the strangest thing for me is that it did not work here, where the pipeline was triggered by you merging the MR (which counts as a push) and should have been fully internal to this repository.

I will do a push test and a merge test to see if it is still failing. I predict that the direct push works, but the merge may still have the same issue...

Edit: I did not realize I don't have permissions for a direct push on develop. I'll test a new MR first.

@pagmatt (Member) commented Nov 9, 2023

I believe that even though the commit counts as a push, it is still different from a non-MR-derived push and encodes the fact that it originated from an MR; perhaps GitHub is not providing access to the secret anyway?
Let's check with an MR from a branch on this repository first.
P.S. I gave you write access.

@non-det-alle (Collaborator, Author)

Found the problem: https://docs.github.com/en/actions/using-workflows/reusing-workflows#passing-inputs-and-secrets-to-a-reusable-workflow
That's why it was working when the deploy-doc workflow was run manually.

I squashed the fix + all CI related commits I used for testing into the one that merges this PR.

We are also missing a CODECOV_TOKEN repo secret. If you can get admin access to this page (you need to log in with the repo owner's GitHub account), you should find the token value in the Settings tab.
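
For context, the linked docs describe the cause: a caller workflow does not pass repository secrets to a reusable workflow implicitly; they must be forwarded explicitly. A minimal sketch of the fix (the workflow path is an assumption):

```yaml
jobs:
  deploy-doc:
    uses: ./.github/workflows/deploy-doc.yml
    # Without this (or explicit `secrets:` entries), DEPLOY_KEY is
    # undefined inside the reusable workflow
    secrets: inherit
```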

@pagmatt (Member) commented Nov 10, 2023

Thanks for the find. Please avoid force-pushing on main branches though, as it messes up the git history for everyone who has cloned the repo.
I'll look into the CODECOV_TOKEN soon as well.

@non-det-alle (Collaborator, Author) commented Nov 10, 2023

First complete working pipeline

Sorry about the force pushing; I was trying to keep the git history clean, and I usually do that with rebase + force-push. Do you know a better way? For instance, I'd like to commit some minor changes (adding names to sub-workflows) that are fit to be squashed into the initial CI commit. Should I leave them as a separate commit?

In general, in the future I'll try to avoid pushing directly. I think it's better to work on forks and MRs that you can then review and merge.

We are almost done! 🎉
(there are some possible future optimizations related to avoiding some jobs when there are no changes in the C++ code, but for now I'd say it's ok)

@pagmatt (Member) commented Nov 10, 2023

Good job! Finally we have a proper CI pipeline.

Regarding the push, no worries; I just think that leaving a separate commit is the way to go. I agree that it may lead to a more convoluted git history, but force-pushing on branches that people clone is usually considered bad practice AFAIK and is thus avoided.
