Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alter object.get builtin to support getting an array of nested keys from a nested object #4166

Merged
merged 1 commit into from
Jan 27, 2022

Conversation

charlieegan3
Copy link
Contributor

@charlieegan3 charlieegan3 commented Dec 22, 2021

This PR will change the behavior of object.get to, when passed an array as the key value, interpret that as a path, and perform a series of object.get operations on a the object and objects nested within it along that path.


Original Body:

This adds a new builtin object.dig to allow the extracting of data from nested objects. This is to make it easier for users of the Rego language to extract data from deeply nested input objects.

The name and functionality were inspired by Ruby's Hash.dig function.

Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too bad 😉 Just kidding, thanks for the contribution. Got a nitpick there with the docs; and I'm wondering how attached you are to the default argument. And what others think.

docs/content/policy-reference.md Outdated Show resolved Hide resolved
topdown/object.go Outdated Show resolved Hide resolved
@charlieegan3
Copy link
Contributor Author

Thanks for the review 😊

I'm not sure what implementations of object.get/object.dig would look like without the default argument. Would a missing key result in an error or instead return false to stop the evaluation of the rule?

I feel like if it errors when there are missing keys, then it makes the builtins less useful for use cases involving arbitrary data. If it returns false, I feel it might be ambiguous? e.g.

x := object.get({"a": false}, "a")
=> false

x := object.get({"a": false}, "b")
=> false

@srenatus
Copy link
Contributor

Would a missing key result in an error or instead return false to stop the evaluation of the rule?

Yeah object.get(o, k) would become the exact same thing as o[k]. If they key doesn't exist, it would turn into undefined.

So yeah, it makes sense to have both object.get and o[k] because of that difference. object.dig has no "plain rego" equivalent.

@charlieegan3
Copy link
Contributor Author

True, with that comparison it does make sense to have object.get with a default.

I did wonder if object.dig could be object.get with an array param, but since array keys are valid Rego I figured it'd be clearer to add as a new function with an otherwise similar interface.

Copy link
Member

@anderseknert anderseknert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition! Thank you very much @charlieegan3 👍

The function name feels a little alien to me.. likely since I've never worked with Ruby. Could something like object.select be considered?

test/cases/testdata/objectdig/test-objectdig.yaml Outdated Show resolved Hide resolved
@charlieegan3
Copy link
Contributor Author

Hey. Not at all wedded to the name, very open to alternative names including select. I had a little look here https://dev.to/obahareth/are-there-functions-similar-to-ruby-s-dig-in-other-languages-4c08 but other than get_in in elixer most languages seem to have this functionality with some form of optional/? style dot notation sugar.

@anderseknert
Copy link
Member

Yeah, Clojure uses get-in as well. I kinda like that as it also connects nicely to the already existing object.get. Should we go with that object.get_in? Also, "get in" memes :)

get-in-loser-were-getting-spaghetti-27017898

@anderseknert
Copy link
Member

Thinking about this some more, couldn't object.get simply be retrofitted to take an array as an alternative to string? Many of the object. functions already seem to work that way. It's at least hard for me to see how that might break any existing policies 🤔

@charlieegan3
Copy link
Contributor Author

Interesting, I did think about that but initially decided against it. If anyone is using object.get with an array key this would be hard to support without a breaking change. For example:

> object.get({[1,2,3]: "a"}, [1,2,3], "b")
"a"

However, this is likely a rare use case so perhaps worth reconsidering?

@anderseknert
Copy link
Member

Interesting, I did think about that but initially decided against it. If anyone is using object.get with an array key this would be hard to support without a breaking change.

Oh, right—I keep forgetting that's even an option. Yeah, I guess we'll need to go with object.get_in then, however unlikely scenario that might be. Or WDYT @srenatus?

@anderseknert
Copy link
Member

On second thought - that type of key will never exist in input as it isn't valid JSON, right? Calling object.get/object.get_in doesn't really make sense on an object that is defined in-policy, or does it? 🤔

@charlieegan3
Copy link
Contributor Author

That's fair enough, it's certainly not valid JSON to find in the input document.

I can change this to alter the functionality of object.get to take an array. If it's an array, it'd be used as a path of nested keys, otherwise, it'd be a single key.

Thanks for the feedback on this, much appreciated 😊

I'll pick this up in early January 🎄

@anderseknert
Copy link
Member

Awesome @charlieegan3 👍

@srenatus srenatus marked this pull request as draft January 5, 2022 09:52
@charlieegan3
Copy link
Contributor Author

I've booked a day to work on this again on Monday 🙂

@charlieegan3
Copy link
Contributor Author

In f4beb29 I have updated the implementation of object.get to have a similar functionality to object.dig when passed an array 'key'.

I have however realised that this will need some WASM changes too (comment). I'm going to look into this now - I might be able to use a Cloud VM to get this working while WASM is still a bit tricky on Apple M1.

@charlieegan3 charlieegan3 changed the title Add object.dig builtin Alter object.get builtin to support getting an array of nested keys from a nested object Jan 10, 2022
@charlieegan3
Copy link
Contributor Author

In 964c24a I have attempted to implement the same functionality in C for WASM support.

I think that this part is not working as expected - I haven't had the time yet to work out why. I'll try and make some time later in the week to look into it again but any pointers much appreciated 🙂

        opa_object_elem_t *elem = opa_object_get(opa_cast_object(obj), k);
        if (elem == NULL)
        {
            return value;
        }

wasm/src/object.c Outdated Show resolved Hide resolved
@srenatus
Copy link
Contributor

Here's a trick, I don't know if you've been aware: We can do printf-style debugging in wasm!

When you put

opa_println(opa_json_dump(elem));

in the code, and use make build, you'll see the output, in, say opa eval -fpretty -t wasm 'object.get({"foo": {"bar": 10}}, ["foo", "bar"], 100)'.

@srenatus
Copy link
Contributor

Had a look, and pushed a commit here: 37e4afa

The only crucial change here is this -- when you iterate over an array in the wasm code, opa_value_iter gives you the indices, and you'll need to use opa_value_get to get the actual array element.

Copy link
Member

@tsandall tsandall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me! I've left two minor comments on the topdown implementation.

topdown/object.go Outdated Show resolved Hide resolved
topdown/object.go Outdated Show resolved Hide resolved
@charlieegan3
Copy link
Contributor Author

Thanks for this feedback (and the helpful WASM debugging pointers too!). I had hoped to have time for this on Wednesday but didn't. I'll try and take another look on Sunday or Monday and try and get this working finally!

Copy link
Member

@tsandall tsandall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlieegan3 thanks for adjusting the PR! Two small nits. Assuming @srenatus is good w/ the Wasm changes, then LGTM.

re: using ast.Ref inside of wasm--correct, this only applies to the Go implementation.

topdown/object.go Outdated Show resolved Hide resolved
internal/ref/ref.go Outdated Show resolved Hide resolved
@charlieegan3
Copy link
Contributor Author

Thanks @tsandall for the clarification regarding ast.Ref 👍

Here's a trick...

@srenatus thanks for this, helped a lot with the c implementation! Let me know what you think, & think regarding #4166 (comment) / _validate_json_path

@srenatus
Copy link
Contributor

Sorry for the radio silence, I'll have a look in a few minutes. Generally, if what we have now (I've skimmed it) works fine, then let's go with that; but I'm curious what was happening there regardless.

@srenatus
Copy link
Contributor

Hmm so, having checked out 9a50f7bee7b6737ea99c98deda51ea787451ffb9, and running make build, I don't get a panic:

$ opa eval -fpretty -t wasm 'object.get({"a": 1}, [], 2)'
2

But I've noticed this in the build output:

src/object.c:449:23: warning: implicit declaration of function '_validate_json_path' is invalid in C99 [-Wimplicit-function-declaration]
    size_t path_len = _validate_json_path(opa_cast_array(key));
                      ^

and that would be bad form of course. What you've got here is fine, let's roll with that.

@charlieegan3
Copy link
Contributor Author

Thanks Stephan.

But I've noticed this in the build output

I too noticed this but thought I might have just been missing a header import or something? Interesting you didn't get the same panic.

I was having a think yesterday and think given this change, it makes sense to have a test case for the nested array paths. e.g. d03b299

However, this will now fail but only for the wasm SDK:

=== RUN   TestWasmE2E/test/cases/testdata/objectget/test-objectget-path.yaml/objectget/get_intermediate_array
    external_test.go:97: expected {{"x": 1}} but got {{"x": 2}}

I think it makes sense to have the functionality the same here, I'll try and have a got at this later today or tomorrow.

@srenatus
Copy link
Contributor

I think it makes sense to have the functionality the same here, I'll try and have a got at this later today or tomorrow.

Yeah that would be preferable. Thank you!

@charlieegan3
Copy link
Contributor Author

Hi again @srenatus, I've had another go and seem to passing the automated checks ✅. I think it matches the object.Find semantics (now used in the go implementation) as best I can. I've marked this as ready for review again, hope that's ok.

Thanks too @anderseknert for the link in https://openpolicyagent.slack.com/archives/CBR63TK2A/p1643093622163300?thread_ts=1643076899.155000&cid=CBR63TK2A, great to see that this is something that others might find useful too 😊

@anderseknert
Copy link
Member

Thanks too @anderseknert for the link in https://openpolicyagent.slack.com/archives/CBR63TK2A/p1643093622163300?thread_ts=1643076899.155000&cid=CBR63TK2A, great to see that this is something that others might find useful too 😊

Of course! This feature has been requested every now and then for at least a few years, so you're definitely solving a real problem here, and I'm looking forward to having this in OPA!

srenatus
srenatus previously approved these changes Jan 27, 2022
Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks a lot. Would you mind squashing the commits, adding a short message to the remaining one, so we can get this merged? 🚀

This commit extends the go and wasm implementations of object.get to
allow a key to also be an array.

When passed an array, each element in the array will be used as a key in
turn. This allows values at deeply nested paths to be extracted from
objects.

It also supports getting indexes of nested arrays.

The functionality was originally inspired by Ruby's Hash.dig function:
https://ruby-doc.org/core-2.3.0_preview1/Hash.html#method-i-dig however
we opted to include the behavior in object.get instead after being
uncertain 'dig' was a commonly understood name.

Signed-off-by: Charlie Egan <charlieegan3@users.noreply.github.com>
@anderseknert anderseknert merged commit 301efc6 into open-policy-agent:main Jan 27, 2022
@charlieegan3
Copy link
Contributor Author

Thanks guys! ☺️ Happy to have this in, thanks too for all the support.

@anderseknert
Copy link
Member

Thank you Charlie! This is a great addition to OPA 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants