Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds gitlab documentation #70

Merged
merged 12 commits into from
Jan 26, 2022
Merged

Adds gitlab documentation #70

merged 12 commits into from
Jan 26, 2022

Conversation

catenacyber
Copy link
Contributor

docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved
{% raw %}
```yaml
variables:
SANITIZER: "address"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we can make use of matrix builds similar to GitHub, like this, which is quite nice because it avoids job duplication:

parallel:
    matrix:
      - SANITIZER: [address, undefined, memory] # customize to your needs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.
I wanted to have a minimal config for this doc first.
So adding this tip afterwards

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, nit but are the quotes needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit but are the quotes needed here?

Indeed not

Comment on lines 66 to 67
tags:
- fuzz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave these out of the example, they are quite deployment-specific and won't work on gitlab.com shared runners out of the box.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I need these on some hosted gitlab.
Changing this in next commit

- export CFL_CONTAINER_ID=`cut -c9- < /proc/1/cpuset`
script:
# will build and run the fuzzers
- python3 "/opt/oss-fuzz/infra/cifuzz/cifuzz_combined_entrypoint.py"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Prow integration uses a different container with the combined entrypoint gcr.io/k8s-testimages/clusterfuzzlite:latest. Is it ok to use the build_fuzzers image for running the fuzzers, too, and what is the difference between the build_fuzzers, run_fuzzers and clusterfuzzlite:latest images in this regard? That is maybe more of a question for @jonathanmetzman.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this to Johnathan

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The images are exactly the same (for now) except for their entrypoints.
I'm not sure this is something I want to support and I don't like the fact that this is using build_fuzzers for running fuzzers.
I can publish a cifuzz-combined image that uses the combined entrypoint to support both.
but let me explain why we have different images.
On github, having two images makes it easy to seperate build logs from run logs.
See how in these logs you can click to view the build step and run step seperately.
Now, the drawback of this approach is that I think it has made the interface more complex, instead of having to invoke Cifuzz once it must be invoked twice. See how in this example two steps must be specified: https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/#pr-fuzzing

In hindsight, I think it might be a small mistake to have seperated the two actions. So the question is, should we have separate actions on gitlab as well (which will make things consistent at least?)

@oliverchang WDYT of all this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point now. And I agree that it looks nicer in the logs. In GitLab the log view is different. You would have to specify two jobs per sanitizer, effectively it would look like this (shows 8 jobs):

image

A click on each job leads to the job log.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. As I mentioned, I think 2 actions/images makes the interface for users more complicated, but I guess that interface is only used once while the seperation provides benefits that are experienced many times.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. I think we should try to be as consistent as we can and keep the two separate entrypoints for gitlab as well. Does this add significantly more complexity to the gitlab integration?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@catenacyber can you please do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this, I think this adds significant complexity, compared to already integrated CI systems.

We need to share the fuzzers built by clusterfuzzlite-build-fuzzers with clusterfuzzlite-run-fuzzers
As these are 2 different docker images, we need 2 different GitLab jobs. So they do not share any workspace...
To share data between jobs (no matter if they belong to the same pipeline or not), we either need cache or artifacts.

I am trying the solution where clusterfuzzlite-build-fuzzers ends with cp -r $CI_BUILDS_DIR/$CI_JOB_ID/build-out/ $CI_PROJECT_DIR/artifacts/build-out/$SANITIZER and clusterfuzzlite-run-fuzzers begins with mv $CI_PROJECT_DIR/artifacts/build-out/$SANITIZER $CI_BUILDS_DIR/$CI_JOB_ID/build-out
Anyways, the build-outcontents needs to be put in a specific directory under the $CI_PROJECT_DIR aka project_src_path. And we need to differentiate between the different sanitizers builds

This can work also with cache instead of artifacts.
Note that I need to use $CI_BUILDS_DIR/$CI_JOB_ID as gitlab does not know $WORKSPACE
Another solution would be to bring back the workspace in the project directory... But we would still need to differentiate between the sanitizers...

docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@jonathanmetzman jonathanmetzman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved
But then, the `.gitlab-ci.yml` should be different, and explicitly call the `docker` commands
on clusterfuzz-lite images.

Docker-in-Docker does not seem possible as clusterfuzz-lite images
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClusterFuzzLite

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't possible with these images but it is possible in general. The prow image does this.
In anycase though, I'd just get rid of this line, is it even worth mentioning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@securitykernel managed to get Docker-in-Docker to work.
I did not yet, having a problem with setting DOCKER_HOST to the right value on my setup...

So, removing this, and hoping to add how to setup with Docker-in-Docker

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want docker-in-docker?

  1. I think it's slower.
  2. It's different from how things are done on GH, so it's not the main supported way we have of doing things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we want it.
It is one of the 3 supported modes to use docker in Gitlab CI cf https://docs.gitlab.com/ee/ci/docker/using_docker_build.html

{% raw %}
```yaml
variables:
SANITIZER: "address"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{% raw %}
```yaml
variables:
SANITIZER: "address"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, nit but are the quotes needed here?

docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved

## Extra configuration

### Gitlab artifacts filestore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going through the trouble of setting this up.
I have some high level comments.

  1. This section seems to offer two options: 1. ??? and 2. using a personal access token.
    Can you tell me the tradeoff between the two and we will decide which one to recommend and then we won't even mention the other option.
  2. In the github docs we have screenshots for setting up a personal acccess token (https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/#storage-repo). If we go that route, can you add screenshots here?
  3. Can you add screenshots and explanation for downloading artifacts please?
  4. This section makes artifacts seem optional, why should they be optional?
  5. The reason we use the git filestore on github in addition to actions is two-fold: 1. it makes browsing coverage reports on the web possible. 2. It makes it possible for two batch fuzzing jobs to save to the corpus at once (side note: @oliverchang: have we ever tested this? Does batch fuzzing force push so that committing works even if its behind HEAD?). Should we do the same on gitlab?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re parallel pushing: we auto rebase and retry when push fails to make this work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me the tradeoff between the two and we will decide which one to recommend and then we won't even mention the other option.

The downside of the artifacts is that you need to set up an API token
The downside of the cache is that you should ensure that runners share the cache

I think I gather that you want in the end a gitlab filestore that will

  • push crashes as jobs artifacts
  • push and pull coverage reports in another git repo
  • push and pull builds from a cache
  • push and pull corpus from a cache

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use cache and artifacts? Why not just one?
Why use git repo for reports? Does gitlab have something like github pages?
Sorry if these are dumb questions/answered elsewhere.

```
{% endraw %}

You should then define two [schedules](https://docs.gitlab.com/ee/ci/pipelines/schedules.html)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have some screenshots walk through an example of setting this up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that once the rest is good for you

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

- export CFL_CONTAINER_ID=`cut -c9- < /proc/1/cpuset`
script:
# will build and run the fuzzers
- python3 "/opt/oss-fuzz/infra/cifuzz/cifuzz_combined_entrypoint.py"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The images are exactly the same (for now) except for their entrypoints.
I'm not sure this is something I want to support and I don't like the fact that this is using build_fuzzers for running fuzzers.
I can publish a cifuzz-combined image that uses the combined entrypoint to support both.
but let me explain why we have different images.
On github, having two images makes it easy to seperate build logs from run logs.
See how in these logs you can click to view the build step and run step seperately.
Now, the drawback of this approach is that I think it has made the interface more complex, instead of having to invoke Cifuzz once it must be invoked twice. See how in this example two steps must be specified: https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/#pr-fuzzing

In hindsight, I think it might be a small mistake to have seperated the two actions. So the question is, should we have separate actions on gitlab as well (which will make things consistent at least?)

@oliverchang WDYT of all this?

docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved
docs/running-clusterfuzzlite/gitlab.md Show resolved Hide resolved
@catenacyber
Copy link
Contributor Author

@jonathanmetzman you can take a look at GitLab Pages output here :
https://catenacyber.gitlab.io/suricata-cfl/coverage/latest/report/linux/report.html

Is it good to have everything public or should we restrict it to what is in coverage/latest/report/ ?

If everything is good for you, I can make the screenshots

Copy link
Collaborator

@jonathanmetzman jonathanmetzman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this basically lgtm.
Only thing left is:

  1. decide if we want to use one image or two
  2. fix nits
  3. Let clusterfuzzlite integration of gitlab oss-fuzz#7073 land before merging this.

docs/clusterfuzzlite.md Show resolved Hide resolved
docs/running_clusterfuzzlite.md Show resolved Hide resolved
- export CFL_CONTAINER_ID=`cut -c9- < /proc/1/cpuset`
script:
# will build and run the fuzzers
- python3 "/opt/oss-fuzz/infra/cifuzz/cifuzz_combined_entrypoint.py"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. As I mentioned, I think 2 actions/images makes the interface for users more complicated, but I guess that interface is only used once while the seperation provides benefits that are experienced many times.

docs/running-clusterfuzzlite/gitlab.md Show resolved Hide resolved
docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved
docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved
docs/running-clusterfuzzlite/gitlab.md Outdated Show resolved Hide resolved
@jonathanmetzman
Copy link
Collaborator

jonathanmetzman commented Jan 11, 2022

@jonathanmetzman you can take a look at GitLab Pages output here : https://catenacyber.gitlab.io/suricata-cfl/coverage/latest/report/linux/report.html

Is it good to have everything public or should we restrict it to what is in coverage/latest/report/ ?

If everything is good for you, I can make the screenshots

No need to make it private IMO.
Yes, please make screenshots (keeping in mind that we might split up the action into 2 like on github)

@catenacyber
Copy link
Contributor Author

The screenshots are available here :
http://catenacyber.fr/gitlab-schedule-mode.png
http://catenacyber.fr/gitlab-project-token.png
http://catenacyber.fr/gitlab-variable-token.png

Otherwise nits are fixed, so should be good.
Waiting for your decision about splitting or not building and running...

@catenacyber
Copy link
Contributor Author

@jonathanmetzman jonathanmetzman merged commit 08da142 into google:main Jan 26, 2022
@CarpeDiem-CarpeNoctem
Copy link

Hi there,

sorry for being late but while I was working with this integration I had a problem concerning the corpus feature.

maybe It should be mentioned in the docs that the folder name of the external corpus git repository must be "corpus/{fuzztargetname}". It has to follow this structure or the gitlab integration will not find the necessary corpus files.

Best regards

@jonathanmetzman
Copy link
Collaborator

Hi there,

sorry for being late but while I was working with this integration I had a problem concerning the corpus feature.

maybe It should be mentioned in the docs that the folder name of the external corpus git repository must be "corpus/{fuzztargetname}". It has to follow this structure or the gitlab integration will not find the necessary corpus files.

Best regards

Is this for manually adding elements to the corpus?

@catenacyber
Copy link
Contributor Author

Hi there,
sorry for being late but while I was working with this integration I had a problem concerning the corpus feature.
maybe It should be mentioned in the docs that the folder name of the external corpus git repository must be "corpus/{fuzztargetname}". It has to follow this structure or the gitlab integration will not find the necessary corpus files.
Best regards

Is this for manually adding elements to the corpus?

You can also use a seed corpus ad on oss-fuzz with a {fuzztargetname}_seed_corpus.zip file ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants