Adds gitlab documentation #70
+{% raw %}
+```yaml
+variables:
+  SANITIZER: "address"
Actually we can make use of matrix builds similar to GitHub, like this, which is quite nice because it avoids job duplication:
    parallel:
      matrix:
        - SANITIZER: [address, undefined, memory] # customize to your needs
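For context, here is a minimal sketch of how such a matrix might sit inside a complete job. The job name, the stage, and the image reference are placeholders assumed for illustration; only the `parallel:matrix` block and the script line are taken from this thread.

```yaml
# Hypothetical job layout; adapt the names and the image to your setup.
stages:
  - fuzz

clusterfuzzlite:
  stage: fuzz
  # Placeholder: substitute the ClusterFuzzLite image you actually use.
  image: registry.example.com/clusterfuzzlite-image:latest
  parallel:
    matrix:
      # GitLab starts one job per value and exposes it as $SANITIZER.
      - SANITIZER: [address, undefined, memory]
  script:
    # Builds and runs the fuzzers for the current $SANITIZER.
    - python3 "/opt/oss-fuzz/infra/cifuzz/cifuzz_combined_entrypoint.py"
```

With this layout the single job definition expands into one job per sanitizer, so no job has to be copied by hand.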
Indeed.
I wanted to have a minimal config for this doc first.
So adding this tip afterwards
I think you should just copy what the github docs do here: https://github.com/google/clusterfuzzlite/blob/main/docs/running-clusterfuzzlite/github_actions.md?plain=1#L61
Also, nit but are the quotes needed here?
> nit but are the quotes needed here?
Indeed not.
+  tags:
+    - fuzz
I would leave these out of the example, they are quite deployment-specific and won't work on gitlab.com shared runners out of the box.
Interesting, I need these on some hosted gitlab.
Changing this in next commit
+    - export CFL_CONTAINER_ID=`cut -c9- < /proc/1/cpuset`
+  script:
+    # will build and run the fuzzers
+    - python3 "/opt/oss-fuzz/infra/cifuzz/cifuzz_combined_entrypoint.py"
The Prow integration uses a different container with the combined entrypoint, `gcr.io/k8s-testimages/clusterfuzzlite:latest`. Is it OK to use the `build_fuzzers` image for running the fuzzers, too, and what is the difference between the `build_fuzzers`, `run_fuzzers`, and `clusterfuzzlite:latest` images in this regard? That is maybe more of a question for @jonathanmetzman.
Leaving this to Jonathan.
The images are exactly the same (for now) except for their entrypoints.
I'm not sure this is something I want to support, and I don't like the fact that this is using build_fuzzers for running fuzzers.
I can publish a cifuzz-combined image that uses the combined entrypoint to support both.
But let me explain why we have different images.
On GitHub, having two images makes it easy to separate build logs from run logs.
See how in these logs you can click to view the build step and run step separately.
Now, the drawback of this approach is that I think it has made the interface more complex: instead of having to invoke CIFuzz once, it must be invoked twice. See how in this example two steps must be specified: https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/#pr-fuzzing
In hindsight, I think it might have been a small mistake to separate the two actions. So the question is, should we have separate actions on GitLab as well (which will at least make things consistent)?
@oliverchang WDYT of all this?
Not sure. As I mentioned, I think having 2 actions/images makes the interface for users more complicated, but I guess that interface is only used once, while the separation provides benefits that are experienced many times.
Sorry for the late reply. I think we should try to be as consistent as we can and keep the two separate entrypoints for gitlab as well. Does this add significantly more complexity to the gitlab integration?
+1
@catenacyber can you please do this?
ok will do
Looking into this, I think this adds significant complexity compared to the already integrated CI systems.
We need to share the fuzzers built by `clusterfuzzlite-build-fuzzers` with `clusterfuzzlite-run-fuzzers`.
As these are two different Docker images, we need two different GitLab jobs, so they do not share any workspace.
To share data between jobs (no matter whether they belong to the same pipeline or not), we need either cache or artifacts.
I am trying the solution where `clusterfuzzlite-build-fuzzers` ends with `cp -r $CI_BUILDS_DIR/$CI_JOB_ID/build-out/ $CI_PROJECT_DIR/artifacts/build-out/$SANITIZER` and `clusterfuzzlite-run-fuzzers` begins with `mv $CI_PROJECT_DIR/artifacts/build-out/$SANITIZER $CI_BUILDS_DIR/$CI_JOB_ID/build-out`.
Anyway, the `build-out` contents need to be put in a specific directory under `$CI_PROJECT_DIR` (aka `project_src_path`), and we need to differentiate between the builds for the different sanitizers.
This can also work with cache instead of artifacts.
Note that I need to use `$CI_BUILDS_DIR/$CI_JOB_ID` because GitLab does not know `$WORKSPACE`.
Another solution would be to bring the workspace back into the project directory, but we would still need to differentiate between the sanitizers.
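The artifact-based approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the job names, the stage names, the image references, and the `echo` placeholders standing in for the actual build and run entrypoints are all hypothetical. Only the `cp` and `mv` commands and the directory layout come from the comment.

```yaml
stages:
  - build
  - fuzz

# Hidden template so both jobs run once per sanitizer.
.sanitizers:
  parallel:
    matrix:
      - SANITIZER: [address, undefined]

build-fuzzers:
  extends: .sanitizers
  stage: build
  # Placeholder image reference for clusterfuzzlite-build-fuzzers.
  image: registry.example.com/clusterfuzzlite-build-fuzzers:latest
  script:
    # Placeholder: invoke the build entrypoint of the image here.
    - echo "build the fuzzers"
    # Copy the build output into the project dir, one subdir per sanitizer.
    - mkdir -p "$CI_PROJECT_DIR/artifacts/build-out"
    - cp -r "$CI_BUILDS_DIR/$CI_JOB_ID/build-out/" "$CI_PROJECT_DIR/artifacts/build-out/$SANITIZER"
  artifacts:
    paths:
      - artifacts/build-out/

run-fuzzers:
  extends: .sanitizers
  stage: fuzz
  # Placeholder image reference for clusterfuzzlite-run-fuzzers.
  image: registry.example.com/clusterfuzzlite-run-fuzzers:latest
  script:
    # Artifacts from the build stage are restored under $CI_PROJECT_DIR.
    - mkdir -p "$CI_BUILDS_DIR/$CI_JOB_ID"
    - mv "$CI_PROJECT_DIR/artifacts/build-out/$SANITIZER" "$CI_BUILDS_DIR/$CI_JOB_ID/build-out"
    # Placeholder: invoke the run entrypoint of the image here.
    - echo "run the fuzzers"
```

Because jobs in a later stage download the artifacts of earlier stages by default, each `run-fuzzers` job receives every sanitizer's build and picks its own via `$SANITIZER`.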
Thanks!
+But then, the `.gitlab-ci.yml` should be different, and explicitly call the `docker` commands
+on clusterfuzz-lite images.
+
+Docker-in-Docker does not seem possible as clusterfuzz-lite images
ClusterFuzzLite
It isn't possible with these images, but it is possible in general. The Prow image does this.
In any case, though, I'd just get rid of this line. Is it even worth mentioning?
@securitykernel managed to get Docker-in-Docker to work.
I have not yet; I am having a problem setting `DOCKER_HOST` to the right value on my setup.
So I am removing this, and hope to add later how to set it up with Docker-in-Docker.
Why do we want docker-in-docker?
- I think it's slower.
- It's different from how things are done on GH, so it's not the main supported way we have of doing things.
I am not sure we want it.
It is one of the 3 supported ways to use Docker in GitLab CI, cf. https://docs.gitlab.com/ee/ci/docker/using_docker_build.html
+
+## Extra configuration
+
+### Gitlab artifacts filestore
Thanks for going through the trouble of setting this up.
I have some high level comments.
- This section seems to offer two options: 1. ??? and 2. using a personal access token. Can you tell me the tradeoff between the two and we will decide which one to recommend and then we won't even mention the other option.
- In the GitHub docs we have screenshots for setting up a personal access token (https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/#storage-repo). If we go that route, can you add screenshots here?
- Can you add screenshots and explanation for downloading artifacts please?
- This section makes artifacts seem optional, why should they be optional?
- The reason we use the git filestore on GitHub in addition to actions is two-fold: 1. It makes browsing coverage reports on the web possible. 2. It makes it possible for two batch fuzzing jobs to save to the corpus at once (side note: @oliverchang: have we ever tested this? Does batch fuzzing force push so that committing works even if it's behind HEAD?). Should we do the same on GitLab?
re parallel pushing: we auto rebase and retry when push fails to make this work.
> Can you tell me the tradeoff between the two and we will decide which one to recommend and then we won't even mention the other option.
The downside of artifacts is that you need to set up an API token.
The downside of the cache is that you need to ensure that the runners share the cache.
I think I gather that, in the end, you want a `gitlab` filestore that will:
- push crashes as job artifacts
- push and pull coverage reports in another git repo
- push and pull builds from a cache
- push and pull corpus from a cache
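As a rough illustration of how the artifacts and cache parts of that split map onto GitLab CI keywords, the sketch below uses `artifacts` for crashes and `cache` for corpus and builds. The job name, the directory names, the cache key, and the expiry are assumptions made for the example and are not taken from the PR. The coverage-report repository is left out.

```yaml
fuzz:
  script:
    - python3 "/opt/oss-fuzz/infra/cifuzz/cifuzz_combined_entrypoint.py"
  artifacts:
    # Upload even when a crash makes the job fail.
    when: always
    paths:
      - artifacts/          # hypothetical crash output directory
    expire_in: 1 week
  cache:
    # One cache per sanitizer; runners must share the cache backend.
    key: clusterfuzzlite-$SANITIZER
    paths:
      - cache/corpus/       # hypothetical corpus directory
      - cache/build/        # hypothetical build directory
```

The cache only helps if all runners that pick up the job can reach the same cache storage, which is the tradeoff mentioned above.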
Why use cache and artifacts? Why not just one?
Why use git repo for reports? Does gitlab have something like github pages?
Sorry if these are dumb questions/answered elsewhere.
+```
+{% endraw %}
+
+You should then define two [schedules](https://docs.gitlab.com/ee/ci/pipelines/schedules.html)
Maybe have some screenshots walk through an example of setting this up.
I can do that once the rest is good for you
Sure!
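Until the screenshots exist, the schedule setup could be sketched in YAML along these lines. The excerpt does not show what the two schedules are, so this assumes each schedule defines a `MODE` variable in the GitLab UI, and the values `batch` and `prune` as well as the job name are assumptions for illustration.

```yaml
# Assumes each pipeline schedule sets a MODE variable in its settings,
# for example MODE=batch in one schedule and MODE=prune in the other.
clusterfuzzlite-scheduled:
  rules:
    # Run only for scheduled pipelines that carry one of the assumed modes.
    - if: $CI_PIPELINE_SOURCE == "schedule" && $MODE == "batch"
    - if: $CI_PIPELINE_SOURCE == "schedule" && $MODE == "prune"
  script:
    - python3 "/opt/oss-fuzz/infra/cifuzz/cifuzz_combined_entrypoint.py"
```

Keeping the mode in a schedule variable means one job definition can serve both schedules.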
@jonathanmetzman you can take a look at GitLab Pages output here: Is it good to have everything public or should we restrict it to what is in If everything is good for you, I can make the screenshots
I think this basically lgtm.
Only thing left is:
- decide if we want to use one image or two
- fix nits
- Let clusterfuzzlite integration of gitlab oss-fuzz#7073 land before merging this.
No need to make it private IMO.
The screenshots are available here: Otherwise nits are fixed, so should be good.
The links already pointed to these urls in gitlab.md ;-)
Hi there, sorry for being late, but while I was working with this integration I had a problem concerning the corpus feature. Maybe it should be mentioned in the docs that the folder name in the external corpus git repository must be "corpus/{fuzztargetname}". It has to follow this structure or the GitLab integration will not find the necessary corpus files. Best regards
Is this for manually adding elements to the corpus?
You can also use a seed corpus as on oss-fuzz with a
For google/oss-fuzz#7073
cc @jonathanmetzman