
Added Helm chart for Kubernetes support #17

Merged: 5 commits merged into jacobbednarz:master on May 24, 2020

Conversation

@timothyclarke (Contributor) commented Nov 9, 2018

I noticed you have a Docker Hub account. I haven't used Travis CI to build and then push to any Docker repos, and it also looks like Travis CI does not have its own Docker registry.
I have assumed that Travis CI is primarily there to ensure PRs are good and then to build releases to be executed outside of a Docker container.
Using docker.com you can build both releases and more current builds. The Dockerfile in this PR is written to create a container within the docker.com build system. It can also be built locally with docker build -t csp-collector:latest .
The build process uses the golang:1.10-alpine image to build the collector. As the build image is quite large (a few hundred MB), the built asset is then copied into an alpine:3.8 image, which is what gets published. The published image is only about 5 MB.
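A minimal sketch of the multi-stage build described above (the binary name, build flags, and paths are assumptions, not the exact contents of this PR's Dockerfile):

```dockerfile
# Build stage: compile the collector in the (large) Go toolchain image.
FROM golang:1.10-alpine AS build
WORKDIR /go/src/github.com/jacobbednarz/go-csp-collector
COPY . .
RUN CGO_ENABLED=0 go build -o /csp_collector .

# Runtime stage: copy only the static binary into a small Alpine image.
FROM alpine:3.8
COPY --from=build /csp_collector /usr/local/bin/csp_collector
ENTRYPOINT ["/usr/local/bin/csp_collector"]
```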

The kubernetes folder is a Helm chart to simplify deployment into a k8s environment. While it has its own README, anyone who's used Helm should be able to use it easily.
Note that the Kubernetes chart assumes pull #10 will be merged.

If needed I can create a slightly more complex chart which:

  • Has a second Docker container for forwarding the reports
  • Has a folder which is shared between the collector and the forwarder
  • Writes the JSON output to a file within that shared space for the forwarder to pick up

I'm assuming you will use your hub.docker.com account to build this, and I have set image.repository in kubernetes/values.yaml to reflect that. I've also set the URI of this repo in the Docker maintainer/author label.
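For anyone deploying from a different registry, overriding the image would look something like this (the key names follow common Helm chart conventions, and the repository value here is hypothetical, not necessarily what the chart ships with):

```yaml
# my-values.yaml (hypothetical override file)
image:
  repository: jacobbednarz/go-csp-collector  # Docker Hub repo built from this project
  tag: latest
```

This would then be supplied at install time with something like helm install ./kubernetes -f my-values.yaml.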

Please do the following steps prior to merging this PR. That way when this merges in it will trigger a build.

Apologies if you already know this. To get this building on hub.docker.com:

  1. Go to your account and click Create -> Create Automated Build.
  2. Create a build from GitHub. It should list your repos.
  3. Select this repo.
  4. I assume that the name will follow this repo. If you want to drop the "go-" from the start then we'll need a minor update in the values file above.
  5. Under the Short Description, click "Click here to customize".
  6. Switch the second "Push Type" to "Tag". Leave the other fields as is.
  7. Click Create.

Now every time you merge anything into master you will get a "latest" tag. Any other tags created on GitHub will trigger a build on Docker Hub with the contents of that tag.
Please feel free to take a look at https://hub.docker.com/r/timothyclarke/go-csp-collector. Note that as master does not have a Dockerfile, it's not generating a container with the latest tag.

My use case is to forward these reports to Graylog, which is similar to Splunk in functionality. The k8s chart I'm using has an extra CLI option which supplies the URI of the Graylog destination. That's slightly lighter weight for me than writing to a file and then having a second container within the pod.

(Review comment on diff hunk @@ -0,0 +1,24 @@, at the line resource://)
Owner (jacobbednarz):

I'm not sure how I feel about this duplication. Since we merged in your last PR, there is already an example file that it would be nice to reuse somehow.

Contributor Author (timothyclarke):

I'm happy enough to remove it either from the top level or from here. If I removed it from here I'd need to provide more instructions in the README.md:

copy the sample.filterlist.txt file from the root of this repo into the configMaps directory, then add it to the custom values file.

Then remove the simple example in kubernetes/values.yaml. Directly reusing the root-level sample.filterlist.txt would be a bad idea, as the ConfigMap template is written to take any file in the target directory and create a ConfigMap entry from it. This makes the chart more extensible by allowing other config files to be created for the Kubernetes pod(s) simply by putting them into the same directory.
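The directory-driven ConfigMap approach described above is commonly implemented with Helm's Files.Glob helper; a sketch, assuming a configmaps/ directory inside the chart (the path and resource name are illustrative, not taken from this chart):

```yaml
# templates/configmap.yaml (sketch): one data entry per file in configmaps/
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-config
data:
{{ (.Files.Glob "configmaps/*").AsConfig | indent 2 }}
```

Dropping any additional file into configmaps/ then exposes it to the pod(s) without further template changes.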

To make practical use of this collector you either need to extend the functionality in code, or add a second container to read the JSON output. You're then left with the question: what config options does that second container need?

Contributor Author (timothyclarke):

As a separate point surrounding the list, and probably for a different PR: should it be a block list or a filter list?
My thoughts are:

For one reason or another these are values I do not want to know about. Do I want to reject the report and, as currently happens, generate an HTTP 4xx status code, or do I want to simply ignore the reports and generate an HTTP 202 status code or similar?

This comes down to: if someone opens the developer console of their browser, should they see an accepted report for these or an error?
In my particular case we make use of Google Maps in some of our pages. Calling the map tiles brings along a bunch of Google-specific fonts (see below) that we do not want. This is a known issue that sadly cannot be actioned. As it tries to load about 3 or 4 fonts per tile it would swamp the reports, so they are filtered out before being passed to the visualisation parts of the platform.

Perhaps both a rejected AND a filtered list is the way to go, so that some will generate a 4xx code and others will be accepted but ignored.
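A sketch of what that two-list distinction might look like in Go (the list contents and the classify helper are hypothetical illustrations, not the collector's actual code):

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Hypothetical lists: rejected URIs get an HTTP error, filtered URIs are
// accepted (202) but silently dropped from downstream processing.
var rejectList = []string{"resource://", "chromenull://"}
var filterList = []string{"https://fonts.gstatic.com"}

// matches reports whether uri starts with any prefix in list.
func matches(list []string, uri string) bool {
	for _, prefix := range list {
		if strings.HasPrefix(uri, prefix) {
			return true
		}
	}
	return false
}

// classify returns the HTTP status for a report's blocked-uri value.
func classify(blockedURI string) int {
	switch {
	case matches(rejectList, blockedURI):
		return http.StatusForbidden // 403: rejected outright
	case matches(filterList, blockedURI):
		return http.StatusAccepted // 202: accepted but ignored
	default:
		return http.StatusOK // 200: a real report, pass it on
	}
}

func main() {
	fmt.Println(classify("resource://foo"))                 // 403
	fmt.Println(classify("https://fonts.gstatic.com/s/r"))  // 202
	fmt.Println(classify("https://evil.example/x"))         // 200
}
```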

Owner (jacobbednarz):

Definitely a discussion for another PR, but I'll give a quick background here on the existing functionality, since it's somewhat critical to this. The blockedURI list is intended as a mechanism to drop any reports that are completely unactionable and provide no value to collect. There are a multitude of reasons for these hitting the collector (browser silliness, browser extensions, malicious actors), and as they are outside the control of the operator, this list ensures operators don't need to worry about them. In the first year of our CSP, around 70% of the violation reports we received were unactionable, which took up storage and compute resources.

This comes down to: if someone opens the developer console of their browser, should they see an accepted report for these or an error?

The console won't show either. CSP violation reports work in a similar fashion to UDP: the report is fire-and-forget, without any indication of whether or not the payload was accepted.

This is a known issue that sadly cannot be actioned. As it tries to load about 3 or 4 fonts per tile it would swamp the reports, so they are filtered out before being passed to the visualisation parts of the platform.

I think your approach of loading in a file with the blocked URIs is a great use case for this. We also do sampling in the application itself (by only defining the report-uri directive for a percentage of requests) due to the sheer size of our application. Usually we launch a new directive change at 0.001% so as not to flood the log aggregator if something was missed. We later tune it up to a larger portion of the traffic.
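That sampling approach can be sketched as only attaching the report-uri directive to a fraction of responses. In this Go sketch the policy string, sample rate, and collector endpoint are all illustrative assumptions:

```go
package main

import (
	"fmt"
	"math/rand"
)

// buildCSP returns the CSP header value, attaching the report-uri
// directive only when this response was sampled for reporting.
func buildCSP(sampled bool) string {
	policy := "default-src 'self'"
	if sampled {
		// Hypothetical collector endpoint.
		policy += "; report-uri https://csp.example.com/"
	}
	return policy
}

// sample decides whether this request reports, at the given rate (0..1).
func sample(rate float64) bool {
	return rand.Float64() < rate
}

func main() {
	fmt.Println(buildCSP(sample(0))) // rate 0: never reports
	fmt.Println(buildCSP(sample(1))) // rate 1: always reports
}
```

Launching a directive change at a tiny rate (e.g. 0.00001 for 0.001%) and then raising it matches the rollout strategy described above.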

@jacobbednarz (Owner):

Apologies for the delays in coming back to this, I started reviewing last week but then started thinking about how we want to handle integrations like this and ended up on a bit of an information gathering dive.

On the surface, these are my thoughts:

  • Initially, I thought this was probably too specific for this project, similar to what I outlined about visualisation in the README, but I came around to thinking that this is still in line with the project; it's just helping people get it set up in various configurations.
  • I don't think we want the kubernetes directory at the top level. This might limit us in the future when adding new functionality to the collector itself. I'm thinking we probably want this in a subdirectory (something like "integrations") but then I'm not sure on that naming either. Perhaps "examples"?
  • Once that is decided, we should update the main README with some references outlining that the resources in that directory can be booted straight up from the directory.
  • I was also wondering if it would be beneficial to split the Dockerfile into its own PR so that it wasn't blocked by the kubernetes discussion, but I now think that's much of a muchness.

I've gone ahead and set up the Docker Hub repository configuration for this one, so once it lands we should be golden for automated builds.

@timothyclarke (Contributor Author):

  • Initially, I thought this was probably too specific for this project, similar to what I outlined about visualisation in the README, but I came around to thinking that this is still in line with the project; it's just helping people get it set up in various configurations.

Your responses have been quite positive, and I think this one comes down to: "Is this just a pet project that, at best, you'd prefer others to fork and extend to suit their needs, or is it one you want others to use as is?"

  • I don't think we want the kubernetes directory at the top level. This might limit us in the future when adding new functionality to the collector itself. I'm thinking we probably want this in a subdirectory (something like "integrations") but then I'm not sure on that naming either. Perhaps "examples"?

Can do, but it should be very clear. If I look at other projects that have gone in that direction, most have a kubernetes directory so it's obvious and gives deployment examples. Failing that, I'd say a (sub)header in the README so it's bold and stands out. If the project had neither of those, people would probably not notice it.

  • Once that is decided, we should update the main README with some references outlining that the resources in that directory can be booted straight up from the directory.

Agreed

  • I was also wondering if it would be beneficial to split the Dockerfile into its own PR so that it wasn't blocked by the kubernetes discussion, but I now think that's much of a muchness.

A separated and updated Dockerfile is in PR #18

I've gone ahead and set up the Docker Hub repository configuration for this one, so once it lands we should be golden for automated builds.

👍

@jacobbednarz (Owner):

Your responses have been quite positive, and I think this one comes down to: "Is this just a pet project that, at best, you'd prefer others to fork and extend to suit their needs, or is it one you want others to use as is?"

Fortunately (or unfortunately, depending on your viewpoint 😛) this has kind of already been decided, with quite a few people adopting this as their production-ready CSP collector, so I think it needs to support the easy deployment method out of the box as much as possible.

Can do, but it should be very clear. If I look at other projects that have gone in that direction, most have a kubernetes directory so it's obvious and gives deployment examples. Failing that, I'd say a (sub)header in the README so it's bold and stands out. If the project had neither of those, people would probably not notice it.

Agree that it needs to be easily identifiable. I don't have a strong preference at this stage, so I'm happy to leave it as is. A bit of research yesterday turned up /deployments as another recommended way of structuring this type of feature. I think that plus some README docs would be a great start. Something like:

### Deployments

Currently supported deployment mechanisms:

- [kubernetes](/path/to/deployments/kubernetes/dir)

@jacobbednarz jacobbednarz changed the title Added Dockerfile for dockerhub and Helm chart for Kubernetes Added Helm chart for Kubernetes support Nov 13, 2018
@jacobbednarz (Owner):

@timothyclarke Is this still something you're interested in getting over the line?

@timothyclarke (Contributor Author):

Yes. I've moved on from the org where I introduced this, and will be introducing it elsewhere.
Give me a few days and I'll rebase. Is there anything in particular that you or others are looking for in this?

@jacobbednarz (Owner):

Just doing a tidy up in here and this was still outstanding 🙂 After you get the Kubernetes stuff in place, I think the only outstanding comments were where the deployment directory was going to live and getting this up to date with master and the latest versions of tools.

@timothyclarke (Contributor Author):

How's this?

@jacobbednarz jacobbednarz merged commit 6ee32da into jacobbednarz:master May 24, 2020