Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(collector): expose Google Cloud Storage collector from guacone CLI #989

Merged
merged 2 commits into from
Jun 28, 2023

Conversation

migmartri
Copy link
Contributor

@migmartri migmartri commented Jun 27, 2023

Exposes the Google Cloud Storage collector through guacone CLI.

This PR is in the context of chainloop-dev/chainloop#209. We want Chainloop to populate blob storage buckets with metadata that then Guac can import.

Peek 2023-06-27 16-54

In the linked issue, @lumjjb suggested exploring the existing GCS integration, and I believe this is a first step, to have an on-demand, client-side collection trigger for now. On another PR I could explore how to enable unattended, collection via a subscribed, collector handler.

$ guacone collect gcs
Error: accepts 1 arg(s), received 0
Usage:
  guacone collect gcs [flags] bucket_name

Examples:
guacone collect gcs my-bucket --gcs-credentials-path /secret/sa.json

Flags:
      --gcp-credentials-path string   Path to the Google Cloud service account credentials json file.
                                      Alternatively you can set GOOGLE_APPLICATION_CREDENTIALS=<path> in your environment.
  -h, --help                          help for gcs

Global Flags:
      --csub-addr string   address to connect to collect-sub service (default "localhost:2782")
      --gql-addr string    endpoint used to connect to graphQL server (default "http://localhost:8080/query")

The command can receive the credentials path from both a flag

$ guacone collect gcs test-guac --gcp-credentials-path service-account-devel.json 

or using default Google Cloud Client credentials env var

GOOGLE_APPLICATION_CREDENTIALS=service-account-devel.json guacone collect gcs test-guac

Some context on the implementation choices have been added inline as PR comments.

Implementation considerations

  • Since it's my first PR, I tried to follow as many patterns I could find, avoiding most refactoring efforts.

See the code below for example. This code is repeated in each collector and could get extracted (I think). But I don't feel with the authority nor with the enough code base understanding to perform a refactoring like that. Maybe in another PR down the line :)

processorFunc := getProcessor(ctx)
ingestorFunc := getIngestor(ctx)
collectSubEmitFunc := getCollectSubEmit(ctx, csubClient)
assemblerFunc := getAssembler(ctx, opts.graphqlEndpoint)
totalNum := 0
totalSuccess := 0
var filesWithErrors []string
gotErr := false
// Set emit function to go through the entire pipeline
emit := func(d *processor.Document) error {
totalNum += 1
start := time.Now()
docTree, err := processorFunc(d)

Collector package updates:

  • I moved the Google Cloud client generation outside to simplify testing of the module, and decouple the package from the client configuration. For example today it uses a credentials file but tomorrow the creds might come from something else.
  • I saw in some other collector that the options pattern was implemented, I moved the constructor to that pattern as well for consistency.

PR Checklist

DISCLAIMER: I didn't add tests to the new cobra command, I've not seen any tests on that package / layer so I am not sure what's the convention here.

  • If GraphQL schema is changed, make generate has been run
  • If collectsub protobuf has been changed, make proto has been run
  • All CI checks are passing (tests and formatting)
  • All dependent PRs have already been merged

Thank you!

@google-cla
Copy link

google-cla bot commented Jun 27, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
@migmartri
Copy link
Contributor Author

Hi,

It seems that the CI test failed with what It seems a non related test case

=== RUN   TestNatsEmitter_PublishOnEmit_DeDuplication
    nats_emitter_test.go:181: expected only 1 document fetched
--- FAIL: TestNatsEmitter_PublishOnEmit_DeDuplication (2.04s)

In fact, I can't reproduce locally. Could this be some kind of racy condition/brittle test? I'll be happy to look into it more but I am afraid I need some guidance.

Thanks!

@pxp928
Copy link
Collaborator

pxp928 commented Jun 28, 2023

Hi,

It seems that the CI test failed with what It seems a non related test case

=== RUN   TestNatsEmitter_PublishOnEmit_DeDuplication
    nats_emitter_test.go:181: expected only 1 document fetched
--- FAIL: TestNatsEmitter_PublishOnEmit_DeDuplication (2.04s)

In fact, I can't reproduce locally. Could this be some kind of racy condition/brittle test? I'll be happy to look into it more but I am afraid I need some guidance.

Thanks!

I think a recent update from NATs has made this test a bit flaky. No change required on your PR. Created an issue to track: #990

@migmartri
Copy link
Contributor Author

Hi,
It seems that the CI test failed with what It seems a non related test case

=== RUN   TestNatsEmitter_PublishOnEmit_DeDuplication
    nats_emitter_test.go:181: expected only 1 document fetched
--- FAIL: TestNatsEmitter_PublishOnEmit_DeDuplication (2.04s)

In fact, I can't reproduce locally. Could this be some kind of racy condition/brittle test? I'll be happy to look into it more but I am afraid I need some guidance.
Thanks!

I think a recent update from NATs has made this test a bit flaky. No change required on your PR. Created an issue to track: #990

Thank you for taking a look at it!

Copy link
Collaborator

@pxp928 pxp928 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the PR

@migmartri
Copy link
Contributor Author

LGTM! Thanks for the PR

Awesome, thanks for the review!

FYI and in case you have any feedback, this is how the integration is looking on our side. chainloop-dev/chainloop#211 :)

Copy link
Contributor

@lumjjb lumjjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nits, else LGTM! Thanks @migmartri !!!

cmd/guacone/cmd/gcs.go Show resolved Hide resolved
cmd/guacone/cmd/gcs.go Show resolved Hide resolved
cmd/guacone/cmd/gcs.go Outdated Show resolved Hide resolved
pkg/handler/collector/gcs/gcs.go Show resolved Hide resolved
Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
@migmartri
Copy link
Contributor Author

minor nits, else LGTM! Thanks @migmartri !!!

Thanks for the review. I applied changes to both of your suggestions, thanks a lot!

Could you please give it a last check once you have a sec and kick the tires of the CI tests for me? (It seems that It can't run since it doesn't trust me :)

Thanks!!

@kodiakhq kodiakhq bot merged commit 5f261e9 into guacsec:main Jun 28, 2023
8 checks passed
@migmartri migmartri deleted the gcs-collect branch June 28, 2023 19:17
@migmartri
Copy link
Contributor Author

JFYI, this feature is now mentioned in our new Chainloop->Guac setup guide. Any feedback is welcome!

https://docs.chainloop.dev/guides/guac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants