Add support for allowing dot being used inside the rql query key path by lo-simon · Pull Request #224 · sony/nmos-cpp

lo-simon · 2021-12-14T22:28:18Z

In RQL, the . character is used to represent the key separator for the key path, use of the . escape sequence which allows . can be used inside the key.
e.g. handle RQL on tags, where the tag names might include .

garethsb · 2021-12-14T22:44:01Z

This would be great to solve, and requiring dots within names to be percent-encoded is a good idea.

However, to implement this, I think we need to treat . specially, not %2E.

Consider foo.bar.baz%252Equx.

We want it to mean foo.bar.baz%2Equx but I think if you handle it with two rounds of percent-decode, you'll end up with foo.bar.baz.qux.

garethsb

Initial suggested test case tweak...

garethsb · 2021-12-15T14:06:08Z

+    const auto test_value = web::json::value_of({
+        { U("foo"), web::json::value_of({
+            { U("bar"), web::json::value_of({
+                { U("baz.qux"), U("quux")}


Suggested change

{ U("baz.qux"), U("quux")}

{ U("baz.qux"), U("quux")},

{ U("baz%2Equx"), U("quuux")}

garethsb · 2021-12-15T14:06:45Z

+    const utility::string_t query_rql = U("matches(foo.bar.baz%2Equx,quux)");
+    const auto rql_query = rql::parse_query(query_rql);
+
+    const auto args = rql_query.at(U("args"));
+    const auto key_path = args.as_array().at(0).as_string();
+    web::json::value results;
+
+    BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));
+    BST_REQUIRE_EQUAL(U("quux"), results.as_string());


Suggested change

const utility::string_t query_rql = U("matches(foo.bar.baz%2Equx,quux)");

const auto rql_query = rql::parse_query(query_rql);

const auto args = rql_query.at(U("args"));

const auto key_path = args.as_array().at(0).as_string();

web::json::value results;

BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));

BST_REQUIRE_EQUAL(U("quux"), results.as_string());

{

const utility::string_t query_rql = U("matches(foo.bar.baz%2Equx,quux)");

const auto rql_query = rql::parse_query(query_rql);

const auto args = rql_query.at(U("args"));

const auto key_path = args.as_array().at(0).as_string();

web::json::value results;

BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));

BST_REQUIRE_EQUAL(U("quux"), results.as_string());

}

{

const utility::string_t query_rql = U("matches(foo.bar.baz%252Equx,quux)");

const auto rql_query = rql::parse_query(query_rql);

const auto args = rql_query.at(U("args"));

const auto key_path = args.as_array().at(0).as_string();

web::json::value results;

BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));

BST_REQUIRE_EQUAL(U("quuux"), results.as_string());

}

garethsb · 2021-12-15T14:36:35Z

I think an approach to satisfy the requirements could be something like this:

Modify original web::json::extract to be:
```
bool extract(const web::json::object& object, web::json::value& results, const std::vector<utility::string_t>& key_path)
```
and make the existing function do a simple split on . (using boost::algorithm::split) before calling this new one. Neither function should do web::uri::decode.
Modify the nmos::make_extractor in nmos/query_utils.cpp to do split on . and web::uri::decode each component (using boost::transformed) before calling the new web::json::extract overload
Modify rql::make_value so that it doesn't necessarily web::uri::decode the value (and maybe type, not sure?), e.g. by adding a Boolean flag and/or a 'user'-specified decode function
Modify rql::parse_query to take that flag too and pass it to make_value

To maintain backward compatibility and flexibility, we would need to think harder about the last two bullet points, to ensure that type look-up and value equality works they way we want, e.g. is f%6F%6F equal to f%6f%6f and/or foo, and does %34%32 equal 42? Remember that at the moment, 42 becomes a JSON number, and so does %34%32... whereas if we don't decode, it'll be a JSON string.

Maybe the answer is to pass a parse_value function to parse_query which for the existing overload would be the existing details::make_value.

garethsb

This is heading in the right direction apart from one bug. However it doesn't address the issue I pointed out with value equality.

Consider the request URL:

/x-nmos/query/v1.3/nodes?query.rql=eq(label,f%6F%6F)

Does that match a Node with a label of foo or not?

Similarly what about a query like:

/x-nmos/query/v1.3/flows?query.rql=ge(bit_depth,%31%36)

Does that match audio Flows with bit_depth greater than or equal to 16?

garethsb · 2022-01-06T12:45:53Z

-                const utility::string_t::size_type key_last = key_path.find_first_of(_XPLATSTR("."), key_first);
-                const utility::string_t key = key_path.substr(key_first, details::count(key_first, key_last));
-                if (utility::string_t::npos != key_last)
+                if (key != *(key_path.end() - 1))


That checks that the value of the key isn't equal to the last one in the key_path... but what about paths like foo.bar.foo?

Well-spotted will fix it

garethsb · 2022-01-06T12:52:02Z

        bool extract(const web::json::object& object, web::json::value& results, const utility::string_t& key_path)
+        {
+            std::vector<utility::string_t> key_path_;
+            boost::algorithm::split(key_path_, key_path, [](utility::char_t c) { return U('.') == c; });


~~I think we try to avoid U preprocessor macro in the library code? Use _XPLATSTR as below.~~
Hmm, no, that's not true... can't now remember what our rule was! 😄 Looks odd having U and _XPLATSTR in the same file though!

Okay, will change to use _XPLATSTR instead of U.

FYI U is already used in the preprocess function.

utility::string_t preprocess(const utility::string_t& value)
{
// regex pattern matches JSON strings, or single or multi-line comments
// only strings are captured
static const utility::regex_t string_or_comment(U(R"-regex-(("[^"\\](?:\.[^"\\])")|(?://[^\r\n]+)|(?:/*[\s\S]?*/))-regex-"));
// format pattern uses the first capture group to copy strings into the output
// having inserted a single space to ensure tokens are not coalesced
return bst::regex_replace(value, string_or_comment, U(" $1"));
}

FYI U is already used in the preprocess function.

You're right. I think the intention may have been that anything we might conceivably have wanted to push back to Microsoft/cpprestsdk should use _XPLATSTR so that the code could be built with U disabled. So that probably means anything in the Development/cpprest (or pplx) directories... However, there are obviously a few files where I failed to follow my own rule already, including the instance you pointed out! Moreover, since cpprestsdk is in maintenance mode, it's not going to happen, even for the least controversial additions. For now, why not just fix the rest of this file so it uses _XPLATSTR consistently, and leave it at that. Thanks. :-)

lo-simon · 2022-01-10T14:42:13Z

This is heading in the right direction apart from one bug. However it doesn't address the issue I pointed out with value equality.

Consider the request URL:

/x-nmos/query/v1.3/nodes?query.rql=eq(label,f%6F%6F)

Does that match a Node with a label of foo or not?

Similarly what about a query like:

/x-nmos/query/v1.3/flows?query.rql=ge(bit_depth,%31%36)

Does that match audio Flows with bit_depth greater than or equal to 16?

The not-to-decode is only applied on the key path, in the above examples, they are the label and the bit_depth, where decode is always applied on the value part. So the above queries do work as expected.

…_path

…or checking against

garethsb · 2022-01-11T11:17:35Z

The not-to-decode is only applied on the key path,

As discussed on Slack, the proposed implementation is currently to apply the Boolean flag not-to-decode to all but the last argument of each function. This works for many functions which are of the form operator(<property>,<value>), e.g. eq(label,foo.bar), but could fail for some functions we have already like matches(foo,b%61r,i) (case-insensitive match) and breaks the current separation between parsing, according to RQL syntax, on one hand and evaluating, using a set of application-specific functions, on the other. Only the particular functions know whether their arguments should be treated as property extractors or not.

We now propose to adopt the RQL-specified array-form for "nested" properties (some way down the RQL Rules) to solve the issue instead:

Any property can be nested by using an array of properties. To search by the bar property of the object in the foo property we can do:
(foo,bar)=3

(The example has =3 because it uses the FIQL syntactic sugar, the normalized form (required by IS-04 Advanced RQL Queries) would be eq((foo,bar),3).)

Since parse_query already implements the array form, the changes to it can be reverted, and the extractor created by nmos::make_extractor in nmos/query_utils.cpp just needs updating to handle an array key as well as a string key.

This would mean that the following are equivalent:

eq(foo.bar.baz,qux)

eq((foo,bar,baz),qux)

But property names with dots would be specified by a client like so:

eq((foo,bar.baz),qux)

Another nail in the coffin for the first suggestion of using foo.bar%2Ebaz is that . is in fact not a URI reserved character, and the relevant part of RFC 3986 insists that unreserved characters can be decoded from the percent-encoded form at any time without changing the meaning. See e.g. RFC 3986 Section 2.3. We could have chosen to implement the approach suggested using / which is a reserved character, and this is what the RQL spec briefly mentions. In other words, foo/bar.baz and foo/bar%2Ebaz would be 100% equivalent, and / would have to be percent-encoded if it appeared in a key... However, since IS-04 picked foo.bar.baz to mean what RQL wants to indicate by foo/bar/baz, there are a lot of opportunities for confusion, of implementations and of implementors! Therefore we propose to only add the array form documented above.

…he `.` for the key path, e.g. eq((foo,bar),baz) which is the same as eq(foo.bar,baz)

…changes and there are explicit tests for web::json::extract

* for build and dependencies, e.g. sony#197, sony#198, sony#207, sony#211, sony#215, sony#229, sony#230, sony#235, sony#243 * for SDP parser/generator, e.g. sony#201, sony#205, sony#219, sony#241, sony#242, sony#244 * for RQL, e.g. sony#224 * for CI tests, e.g. sony#218, sony#231, sony#239, sony#250

Add support for allowing dot being used inside the rql query key path

b74e1b0

garethsb reviewed Dec 15, 2021

View reviewed changes

Simplify dots used within the RQL key names

b590faa

garethsb reviewed Jan 6, 2022

View reviewed changes

lo-simon added 2 commits January 10, 2022 14:52

Fix the test condition to check against the last key value in the key…

41c6d55

…_path

Add a new test to ensure the last key value of the key_path is used f…

dffece6

…or checking against

lo-simon and others added 5 commits January 11, 2022 20:34

Extend rql::extractor to handle json array, which allows not to use t…

7c36790

…he `.` for the key path, e.g. eq((foo,bar),baz) which is the same as eq(foo.bar,baz)

General tidy up

23e65df

Tweak naming and fix comments

ccc9afb

Add equivalent overload of web::json::insert

1d3f0c3

Simplify tests for rql::parse_query now that there really aren't any …

ef8def5

…changes and there are explicit tests for web::json::extract

jonathan-r-thorpe merged commit fa1d946 into sony:master Jan 13, 2022

garethsb mentioned this pull request Apr 28, 2022

Brief README updates for recent PRs #255

Merged

	{ U("baz.qux"), U("quux")}
	{ U("baz.qux"), U("quux")},
	{ U("baz%2Equx"), U("quuux")}

Conversation

lo-simon commented Dec 14, 2021

Uh oh!

garethsb commented Dec 14, 2021

Uh oh!

garethsb left a comment

Choose a reason for hiding this comment

Uh oh!

garethsb Dec 15, 2021

Choose a reason for hiding this comment

Uh oh!

garethsb Dec 15, 2021

Choose a reason for hiding this comment

Uh oh!

garethsb commented Dec 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

garethsb left a comment

Choose a reason for hiding this comment

Uh oh!

garethsb Jan 6, 2022

Choose a reason for hiding this comment

Uh oh!

lo-simon Jan 10, 2022

Choose a reason for hiding this comment

Uh oh!

garethsb Jan 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lo-simon Jan 10, 2022

Choose a reason for hiding this comment

Uh oh!

garethsb Jan 10, 2022

Choose a reason for hiding this comment

Uh oh!

lo-simon Jan 11, 2022

Choose a reason for hiding this comment

Uh oh!

lo-simon commented Jan 10, 2022

Uh oh!

garethsb commented Jan 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

garethsb commented Dec 15, 2021 •

edited

Loading

garethsb Jan 6, 2022 •

edited

Loading

garethsb commented Jan 11, 2022 •

edited

Loading