Skip to content

Add support for allowing dot being used inside the rql query key path#224

Merged
jonathan-r-thorpe merged 9 commits into
sony:masterfrom
lo-simon:dot_in_rql_query
Jan 13, 2022
Merged

Add support for allowing dot being used inside the rql query key path#224
jonathan-r-thorpe merged 9 commits into
sony:masterfrom
lo-simon:dot_in_rql_query

Conversation

@lo-simon
Copy link
Copy Markdown
Collaborator

In RQL, the . character is used to represent the key separator for the key path, use of the . escape sequence which allows . can be used inside the key.
e.g. handle RQL on tags, where the tag names might include .

@garethsb
Copy link
Copy Markdown
Contributor

This would be great to solve, and requiring dots within names to be percent-encoded is a good idea.

However, to implement this, I think we need to treat . specially, not %2E.

Consider foo.bar.baz%252Equx.

We want it to mean foo.bar.baz%2Equx but I think if you handle it with two rounds of percent-decode, you'll end up with foo.bar.baz.qux.

Copy link
Copy Markdown
Contributor

@garethsb garethsb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial suggested test case tweak...

Comment thread Development/rql/test/rql_test.cpp Outdated
const auto test_value = web::json::value_of({
{ U("foo"), web::json::value_of({
{ U("bar"), web::json::value_of({
{ U("baz.qux"), U("quux")}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{ U("baz.qux"), U("quux")}
{ U("baz.qux"), U("quux")},
{ U("baz%2Equx"), U("quuux")}

Comment thread Development/rql/test/rql_test.cpp Outdated
Comment on lines +18 to +26
const utility::string_t query_rql = U("matches(foo.bar.baz%2Equx,quux)");
const auto rql_query = rql::parse_query(query_rql);

const auto args = rql_query.at(U("args"));
const auto key_path = args.as_array().at(0).as_string();
web::json::value results;

BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));
BST_REQUIRE_EQUAL(U("quux"), results.as_string());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const utility::string_t query_rql = U("matches(foo.bar.baz%2Equx,quux)");
const auto rql_query = rql::parse_query(query_rql);
const auto args = rql_query.at(U("args"));
const auto key_path = args.as_array().at(0).as_string();
web::json::value results;
BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));
BST_REQUIRE_EQUAL(U("quux"), results.as_string());
{
const utility::string_t query_rql = U("matches(foo.bar.baz%2Equx,quux)");
const auto rql_query = rql::parse_query(query_rql);
const auto args = rql_query.at(U("args"));
const auto key_path = args.as_array().at(0).as_string();
web::json::value results;
BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));
BST_REQUIRE_EQUAL(U("quux"), results.as_string());
}
{
const utility::string_t query_rql = U("matches(foo.bar.baz%252Equx,quux)");
const auto rql_query = rql::parse_query(query_rql);
const auto args = rql_query.at(U("args"));
const auto key_path = args.as_array().at(0).as_string();
web::json::value results;
BST_REQUIRE_EQUAL(true, web::json::extract(test_value.as_object(), results, key_path));
BST_REQUIRE_EQUAL(U("quuux"), results.as_string());
}

@garethsb
Copy link
Copy Markdown
Contributor

garethsb commented Dec 15, 2021

I think an approach to satisfy the requirements could be something like this:

  • Modify original web::json::extract to be:
    bool extract(const web::json::object& object, web::json::value& results, const std::vector<utility::string_t>& key_path)
    and make the existing function do a simple split on . (using boost::algorithm::split) before calling this new one. Neither function should do web::uri::decode.
  • Modify the nmos::make_extractor in nmos/query_utils.cpp to do split on . and web::uri::decode each component (using boost::transformed) before calling the new web::json::extract overload
  • Modify rql::make_value so that it doesn't necessarily web::uri::decode the value (and maybe type, not sure?), e.g. by adding a Boolean flag and/or a 'user'-specified decode function
  • Modify rql::parse_query to take that flag too and pass it to make_value

To maintain backward compatibility and flexibility, we would need to think harder about the last two bullet points, to ensure that type look-up and value equality works they way we want, e.g. is f%6F%6F equal to f%6f%6f and/or foo, and does %34%32 equal 42? Remember that at the moment, 42 becomes a JSON number, and so does %34%32... whereas if we don't decode, it'll be a JSON string.

Maybe the answer is to pass a parse_value function to parse_query which for the existing overload would be the existing details::make_value.

Copy link
Copy Markdown
Contributor

@garethsb garethsb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is heading in the right direction apart from one bug. However it doesn't address the issue I pointed out with value equality.

Consider the request URL:

/x-nmos/query/v1.3/nodes?query.rql=eq(label,f%6F%6F)

Does that match a Node with a label of foo or not?

Similarly what about a query like:

/x-nmos/query/v1.3/flows?query.rql=ge(bit_depth,%31%36)

Does that match audio Flows with bit_depth greater than or equal to 16?

Comment thread Development/cpprest/json_utils.cpp Outdated
const utility::string_t::size_type key_last = key_path.find_first_of(_XPLATSTR("."), key_first);
const utility::string_t key = key_path.substr(key_first, details::count(key_first, key_last));
if (utility::string_t::npos != key_last)
if (key != *(key_path.end() - 1))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That checks that the value of the key isn't equal to the last one in the key_path... but what about paths like foo.bar.foo?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-spotted will fix it

Comment thread Development/cpprest/json_utils.cpp Outdated
bool extract(const web::json::object& object, web::json::value& results, const utility::string_t& key_path)
{
std::vector<utility::string_t> key_path_;
boost::algorithm::split(key_path_, key_path, [](utility::char_t c) { return U('.') == c; });
Copy link
Copy Markdown
Contributor

@garethsb garethsb Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we try to avoid U preprocessor macro in the library code? Use _XPLATSTR as below.
Hmm, no, that's not true... can't now remember what our rule was! 😄 Looks odd having U and _XPLATSTR in the same file though!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will change to use _XPLATSTR instead of U.

FYI U is already used in the preprocess function.

utility::string_t preprocess(const utility::string_t& value)
{
// regex pattern matches JSON strings, or single or multi-line comments
// only strings are captured
static const utility::regex_t string_or_comment(U(R"-regex-(("[^"\\](?:\.[^"\\])")|(?://[^\r\n]+)|(?:/*[\s\S]?*/))-regex-"));
// format pattern uses the first capture group to copy strings into the output
// having inserted a single space to ensure tokens are not coalesced
return bst::regex_replace(value, string_or_comment, U(" $1"));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI U is already used in the preprocess function.

You're right. I think the intention may have been that anything we might conceivably have wanted to push back to Microsoft/cpprestsdk should use _XPLATSTR so that the code could be built with U disabled. So that probably means anything in the Development/cpprest (or pplx) directories... However, there are obviously a few files where I failed to follow my own rule already, including the instance you pointed out! Moreover, since cpprestsdk is in maintenance mode, it's not going to happen, even for the least controversial additions. For now, why not just fix the rest of this file so it uses _XPLATSTR consistently, and leave it at that. Thanks. :-)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

@lo-simon
Copy link
Copy Markdown
Collaborator Author

This is heading in the right direction apart from one bug. However it doesn't address the issue I pointed out with value equality.

Consider the request URL:

/x-nmos/query/v1.3/nodes?query.rql=eq(label,f%6F%6F)

Does that match a Node with a label of foo or not?

Similarly what about a query like:

/x-nmos/query/v1.3/flows?query.rql=ge(bit_depth,%31%36)

Does that match audio Flows with bit_depth greater than or equal to 16?

The not-to-decode is only applied on the key path, in the above examples, they are the label and the bit_depth, where decode is always applied on the value part. So the above queries do work as expected.

@garethsb
Copy link
Copy Markdown
Contributor

garethsb commented Jan 11, 2022

The not-to-decode is only applied on the key path,

As discussed on Slack, the proposed implementation is currently to apply the Boolean flag not-to-decode to all but the last argument of each function. This works for many functions which are of the form operator(<property>,<value>), e.g. eq(label,foo.bar), but could fail for some functions we have already like matches(foo,b%61r,i) (case-insensitive match) and breaks the current separation between parsing, according to RQL syntax, on one hand and evaluating, using a set of application-specific functions, on the other. Only the particular functions know whether their arguments should be treated as property extractors or not.

We now propose to adopt the RQL-specified array-form for "nested" properties (some way down the RQL Rules) to solve the issue instead:

Any property can be nested by using an array of properties. To search by the bar property of the object in the foo property we can do:
(foo,bar)=3

(The example has =3 because it uses the FIQL syntactic sugar, the normalized form (required by IS-04 Advanced RQL Queries) would be eq((foo,bar),3).)

Since parse_query already implements the array form, the changes to it can be reverted, and the extractor created by nmos::make_extractor in nmos/query_utils.cpp just needs updating to handle an array key as well as a string key.

This would mean that the following are equivalent:

eq(foo.bar.baz,qux)
eq((foo,bar,baz),qux)

But property names with dots would be specified by a client like so:

eq((foo,bar.baz),qux)

Another nail in the coffin for the first suggestion of using foo.bar%2Ebaz is that . is in fact not a URI reserved character, and the relevant part of RFC 3986 insists that unreserved characters can be decoded from the percent-encoded form at any time without changing the meaning. See e.g. RFC 3986 Section 2.3. We could have chosen to implement the approach suggested using / which is a reserved character, and this is what the RQL spec briefly mentions. In other words, foo/bar.baz and foo/bar%2Ebaz would be 100% equivalent, and / would have to be percent-encoded if it appeared in a key... However, since IS-04 picked foo.bar.baz to mean what RQL wants to indicate by foo/bar/baz, there are a lot of opportunities for confusion, of implementations and of implementors! Therefore we propose to only add the array form documented above.

@jonathan-r-thorpe jonathan-r-thorpe merged commit fa1d946 into sony:master Jan 13, 2022
garethsb added a commit to garethsb/nmos-cpp that referenced this pull request Apr 28, 2022
* for build and dependencies, e.g. sony#197, sony#198, sony#207, sony#211, sony#215, sony#229, sony#230, sony#235, sony#243
* for SDP parser/generator, e.g. sony#201, sony#205, sony#219, sony#241, sony#242, sony#244
* for RQL, e.g. sony#224
* for CI tests, e.g. sony#218, sony#231, sony#239, sony#250
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants