Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(index/fetch) Add index URL schemes support to fetch #301

Conversation

max-frank
Copy link
Contributor

Adding different fetchers per URL scheme allows falcoctl indices to be hosted on many different platforms.

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

/kind flaky-test

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area library

/area cli

/area tests

/area examples

What this PR does / why we need it:

To pave way to support index hosting backends other than plain HTTP/S.

Which issue(s) this PR fixes:

First step to implementing #300

Special notes for your reviewer:

req, err := http.NewRequestWithContext(ctx, "GET", url, http.NoBody)
if err != nil {
return nil, fmt.Errorf("cannot fetch index: %w", err)
type FetchFunc func(context.Context, string) ([]byte, error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I set the return signature to []byte to keep the unmarshall logic in the main fetch function. Incase it is desired to support backends that store indices not as single YAML file we could also move the unmarshal logic into a util function and have the Fetch functions return an index entry list instead.

@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch 3 times, most recently from b856549 to d8f36f5 Compare June 6, 2023 08:57
@max-frank
Copy link
Contributor Author

Thinking out loud here some potential future backends might be using HTTP/S as their protocol language but could require more complex requests (i.e., not only a simple GET request).
So instead of adding this URL scheme based approach adding a backend option to index configurations might be better. This could default to simple HTTP/S for backward compatibility and would probably more flexible.

@max-frank max-frank changed the title feat(index/fetch) Add index URL schemes support to fetch WIP: feat(index/fetch) Add index URL schemes support to fetch Jun 13, 2023
@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch from d8f36f5 to 76ad94a Compare June 13, 2023 03:53
@poiana poiana added size/XL and removed size/L labels Jun 13, 2023
@max-frank
Copy link
Contributor Author

Thinking out loud here some potential future backends might be using HTTP/S as their protocol language but could require more complex requests (i.e., not only a simple GET request). So instead of adding this URL scheme based approach adding a backend option to index configurations might be better. This could default to simple HTTP/S for backward compatibility and would probably more flexible.

changed the approach to this

@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch from 76ad94a to f7b0489 Compare June 14, 2023 02:59
Comment on lines +96 to +117
Context("failure", func() {
When("without URL", func() {
BeforeEach(func() {
args = []string{indexCmd, addCmd, "--config", configFile, indexName}
})
addAssertFailedBehavior(indexAddUsage, "ERRO: accepts between 2 and 3 arg(s), received 1")
})

When("with invalid URL", func() {
BeforeEach(func() {
args = []string{indexCmd, addCmd, "--config", configFile, indexName, "NOTAPROTOCAL://something"}
})
addAssertFailedBehavior(indexAddUsage, "ERRO: unable to add index: unable to fetch index \"testName\""+
" with URL \"NOTAPROTOCAL://something\": unable to fetch index: cannot fetch index: Get "+
"\"notaprotocal://something\": unsupported protocol scheme \"notaprotocal\"")
})

When("with invalid backend", func() {
BeforeEach(func() {
args = []string{indexCmd, addCmd, "--config", configFile, indexName, "http://noindex", "notabackend"}
})
addAssertFailedBehavior(indexAddUsage, "ERRO: unable to add index: unable to fetch index \"testName\" "+
"with URL \"http://noindex\": unsupported index backend type: notabackend")
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added failure tests for the new behavior atm, because i noticed that the current implementation of
falcoctl index add behaves strangely or could be considered buggy i.e.,

If you call index add for and index name for which there already exists in the cache the CMD will always return Index %q successfully added and do no modification. For example consider the following sequence

$ cat example.yaml                                                                                      
cat: example.yaml: No such file or directory
$ ./falcoctl index add --config example.yaml example https://falcosecurity.github.io/falcoctl/index.yaml
 INFO  Adding index
 INFO  Index "example" successfully added
$ cat example.yaml                                                                                      
indexes:
    - name: falcosecurity
      url: https://falcosecurity.github.io/falcoctl/index.yaml
      backend: ""
    - name: example
      url: https://falcosecurity.github.io/falcoctl/index.yaml
      backend: ""
$ ./falcoctl index add --config example.yaml example NOPE://NOPE                                        
 INFO  Adding index
 INFO  Index "example" successfully added
$ cat example.yaml                                                                                      
indexes:
    - name: falcosecurity
      url: https://falcosecurity.github.io/falcoctl/index.yaml
      backend: ""
    - name: example
      url: https://falcosecurity.github.io/falcoctl/index.yaml
      backend: ""

One would expect the add command to error that the index already exists or that the index is invalid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @max-frank, the initial idea was to have the add command return an error only in cases when there is not possible to add an index. I agree with you that the message should be different if the index already exists, but it should not error.

Anyway, feel free to modify the current implementation and extend the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood if its modified behavior I can modify the output in a follow up PR.

I'll try to add more tests relevant to this PR I initial did not because of confusion regarding the above interaction and that I a wall in making the cache and config clean up work correctly in between test runs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alacuku For the tests i have run into a slight problem here. For the add tests to be stable and isolated I need to make sure to clean all index config and cache files after each run, but the location of those files is right now not configurable (see code here).

Now since I cant use an isolated tmp config dir it would mean that index add tests would have side effects on developer machines, i.e., Indices from tests are added to their real local configuration and if we were to do a clean up after a test case we would delete all their index config.

Is it ok if i skip the positive tests here for now until this issue is addressed?

@max-frank max-frank changed the title WIP: feat(index/fetch) Add index URL schemes support to fetch feat(index/fetch) Add index URL schemes support to fetch Jun 14, 2023
@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch from f7b0489 to a6f6e7a Compare June 14, 2023 03:13
@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch 2 times, most recently from 222926e to d1cca11 Compare July 4, 2023 04:20
@max-frank
Copy link
Contributor Author

I was playing around with the follow up PR that would add GCS support on top of the changes made here and noticed that we kinda need to pass the index entry config to the fetch function. Since if we dont getting the backend config for an entry becomes tedious. So I updated the PR, but since the fetch logic now references the index config entry struct i had to move it to a separate package to avoid import loops.

With that also came some linting required changes e.g., ConfigEntry -> Entry since golint does not like if after importing stuff is called config.ConfigEntry

@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch 2 times, most recently from b49c468 to 747ca23 Compare July 4, 2023 10:08
@alacuku
Copy link
Member

alacuku commented Jul 4, 2023

Hi @max-frank ,

So instead of adding this URL scheme based approach adding a backend option to index configurations might be better. This could default to simple HTTP/S for backward compatibility and would probably more flexible.

Could you elaborate more on the backend approach? Starting from the URL we can infer the backend without the need for users to know the backend storing the index. If there is any use case for which the backend approach makes the implementation easier I would consider, otherwise let's stick with the URL scheme.

@max-frank
Copy link
Contributor Author

max-frank commented Jul 4, 2023

Could you elaborate more on the backend approach? Starting from the URL we can infer the backend without the need for users to know the backend storing the index. If there is any use case for which the backend approach makes the implementation easier I would consider, otherwise let's stick with the URL scheme.

@alacuku I was mostly thinking of storage backends that use the same protocol (thus same URI scheme), but work differently e.g.,

in case of HTTP we can have a simple server/API that expects a simple GET request with maybe basic auth, oauth, or some authorization token. That minus the Auth would be the currently supported logic.

Now we can take Azure Blob Storage this can be accessed via HTTP as well, but is a lot more complicated. Looking at the documentation of the REST API it requires certain headers to be set and authorized access uses special Azure specific authorization types some simple like shared keys some less so with dynamic signatures that have to be put in the header.

Stacking support for these kinds of things into a single scheme would get complicated quickly I believe. Though I am aware that I am thinking a bit ahead here in terms of considering support for these things.

@alacuku
Copy link
Member

alacuku commented Jul 5, 2023

Now we can take Azure Blob Storage this can be accessed via HTTP as well, but is a lot more complicated. Looking at the documentation of the REST API it requires certain headers to be set and authorized access uses special Azure specific authorization types some simple like shared keys some less so with dynamic signatures that have to be put in the header.

It would be great to have them both. If the user does not explicitly set the backend falcoctl will try to guess it based on the URL scheme. For URL schemes shared by different backends, it would ask for the backend to be set explicitly by the user.

Do you think we can do it in this PR?

@max-frank
Copy link
Contributor Author

It would be great to have them both. If the user does not explicitly set the backend falcoctl will try to guess it based on the URL scheme. For URL schemes shared by different backends, it would ask for the backend to be set explicitly by the user.

Do you think we can do it in this PR?

I can add some simple scheme to backend mapping with one default for each scheme.

@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch from 856b156 to 3491946 Compare July 5, 2023 08:40
@max-frank
Copy link
Contributor Author

max-frank commented Jul 5, 2023

I can add some simple scheme to backend mapping with one default for each scheme.

Added the logic for now its just http and https mapping to the same string, but once there are more backends it will not look as pointless e.g., for Google Cloud Storage gs -> gcs etc.

Also I kept the behavior to try to use HTTP if we cant resolve to a backend type.

@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch from 3491946 to aefc12d Compare July 5, 2023 13:39
Copy link
Member

@alacuku alacuku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two minor comments. It seems ok to me. We are a rebase away from merging this one.

Nice job @max-frank! ❤️

cmd/add_test.go Outdated Show resolved Hide resolved
internal/utils/indexes.go Outdated Show resolved Hide resolved
Signed-off-by: Maximilian Frank <mfrank@mercari.com>
This makes it possible to pass complex backend configs to the fetch logic

Signed-off-by: Maximilian Frank <mfrank@mercari.com>
Signed-off-by: Maximilian Frank <mfrank@mercari.com>
Signed-off-by: Maximilian Frank <mfrank@mercari.com>
@max-frank max-frank force-pushed the add-logic-to-support-different-index-url-schemes branch from aefc12d to b5c5b7e Compare July 6, 2023 09:00
@max-frank
Copy link
Contributor Author

Comment updated, dead code removed and rebased to latest main 👍

@max-frank
Copy link
Contributor Author

Also I've been working offline on the followup PR to add GCS as backend. That one will be ready for review soon as well.

Copy link
Member

@alacuku alacuku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@poiana
Copy link
Contributor

poiana commented Jul 6, 2023

LGTM label has been added.

Git tree hash: d5cb49af4bb0fbc0df84677eced457e9f54c85b1

@alacuku
Copy link
Member

alacuku commented Jul 6, 2023

Also I've been working offline on the followup PR to add GCS as backend. That one will be ready for review soon as well.

I would be happy to review it. And again, thank you for your efforts in the falcoctl project.

Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!
Thanks also @alacuku for the review!
/approve

@poiana
Copy link
Contributor

poiana commented Jul 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alacuku, FedeDP, max-frank

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana added the approved label Jul 6, 2023
@poiana poiana merged commit 472ef3b into falcosecurity:main Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants