Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Add support for nested interpolations in Override grammar #1594

Closed

Conversation

odelalleau
Copy link
Collaborator

Still WIP, but I wanted to put it up to discuss something

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 30, 2021
@odelalleau odelalleau marked this pull request as draft April 30, 2021 23:03
Copy link
Collaborator Author

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to discuss

# If it is an interpolation, then OmegaConf will take care of un-escaping
# the `\\`. But if it is not, then we need to do it here.
if not is_interpolation:
ret = ret.replace("\\\\", "\\")
Copy link
Collaborator Author

@odelalleau odelalleau Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to discuss this point. It looks (and is) a bit clumsy but it's the best I have right now.

Consider the two following override values (here we really have 4 backslashes ending each quoted string, I'm assuming these are the actual characters after any Python un-escaping):

  1. ["abc\\\\", "def\\\\"]
  2. ["${abc}\\\\", "${def}\\\\"]

In both situations we would like the \\\\ to be turned into \\ (i.e., \\\\ == two consecutive escaped backslashes).

In case (1), if the content of each quoted string is kept unchanged (ex: abc\\\\) then we'll end up with extra unwanted backslashes at the end of the string (since the intent here is to obtain abc\\).

In case (2), if Hydra un-escapes the backslahes, then it will feed the string ${abc}\\ to OmegaConf, which will resolve it to <value_of_abc>\ (it will un-escape \\ into \), so we would end up with only one \ at the end instead of two.

As a result, we need to know whether or not OmegaConf will process the string with its grammar, in order to decide whether or not to un-escape the \\. I guess we could decide to ignore this and let the user figure it out, but this wouldn't seem intuitive.

@omry how do you feel about this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is a side effect of the fact that in OmegaConf, setting a value node to abc\\ will use this string unchanged, while ${abc}\\ will be resolved to <value_of_abc>\.
One potential approach to resolve this issue would be the one from omry/omegaconf#621 where \\ is only un-escaped when required.

Copy link
Collaborator

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge the change "Quoted values are not valid dictionary keys anymore" independently (just need to add a news item).

Before jumping into the details here, let's answer this question:

Why should the Hydra override grammar be aware of interpolations now that the OmegaConf has an actual grammar than is handling them?

@odelalleau
Copy link
Collaborator Author

odelalleau commented May 3, 2021

I think we can merge the change "Quoted values are not valid dictionary keys anymore" independently (just need to add a news item).

Yeah, don't look at this PR's diff yet, I'm splitting it into multiple independent components (#1599 is the first step) -- I just had this draft up to discuss the specific point about un-escaping \\ in quoted strings.

Before jumping into the details here, let's answer this question:

Why should the Hydra override grammar be aware of interpolations now that the OmegaConf has an actual grammar than is handling them?

Two reasons:

  1. It needs to know when an interpolation ends, which is not trivial with nested interpolations. Note however that I'm planning to try and make it as "ignorant" as possible of what's inside an interpolation (it should be quite different from the version currently in this PR)
  2. The point I mentioned in the comment above ([Draft] Add support for nested interpolations in Override grammar #1594 (comment)): we need (as far I can tell) to add the un-escaping of \\ to quoted strings (to properly address [Bug] Override grammar: quoted strings ending with a trailing backslash may trigger parsing errors #1600), but if there is an interpolation in the quoted string and we do it both within Hydra and OmegaConf, it will cause a potentially confusing double un-escaping. For instance, "key='abc \\\\${foo}'" would be required to obtain abc \<value_of_foo> instead of just "key='abc \\${foo}'" (note: I'm ignoring shell/Python escaping here to keep things somewhat redable). Alternatively, we could reduce the amount of un-escaping as in [ON HOLD] Simpler escaping in the grammar, and fix to quoted values omry/omegaconf#621 (currently I'm not planning to do it, but I think it could still be a valid option).

@omry
Copy link
Collaborator

omry commented May 3, 2021

It needs to know when an interpolation ends

Why?
Can you show case(s) where not knowing that would hinder parsing?

The point I mentioned in the comment above (#1594 (comment))

I think nested interpolations in the override will be extremely rare.
If we can make the Hydra overrides agnostic to interpolations (and thus to nested interpolations) it will only leave us with quoting of quotes. while not pretty (you have the shell, hydra and omegaconf all unescaping, I think in practice this is unlikely to get in the way because the usage will of nested interpolation in the overrides will be so rare.
I am not sure it's worth making an already complex grammar so much more complicated for this.

@odelalleau
Copy link
Collaborator Author

It needs to know when an interpolation ends

Why?
Can you show case(s) where not knowing that would hinder parsing?

I'm not sure I understand your question -- it seems obvious to me that not knowing where an interpolation ends is going to cause trouble. For instance in

key=[${a}, b, {c: d}]

how would you know that it's a valid list unless you can tell where ${a} ends?

Here is an example where detecting the end gets trickier:

key=[${foo:'}, ${b}, {c: d'}]

The point I mentioned in the comment above (#1594 (comment))

I think nested interpolations in the override will be extremely rare.
If we can make the Hydra overrides agnostic to interpolations (and thus to nested interpolations) it will only leave us with quoting of quotes. while not pretty (you have the shell, hydra and omegaconf all unescaping, I think in practice this is unlikely to get in the way because the usage will of nested interpolation in the overrides will be so rare.
I am not sure it's worth making an already complex grammar so much more complicated for this.

(I think) this is related to your other comment here: #1600 (comment)

Do you believe it would be possible to properly parse the second example above with just regexes for quoted strings & interpolations? I don't see it myself (I mean, obviously it would be possible with this specific example, but more generally, you see the idea)

Also, just to make sure it's clear, regarding the specific un-escaping of \\ I first wanted to discuss here (in #1594 (comment)):

  • it isn't related to the nesting of interpolations, it can typically happen when there are both \ and (non-nested) interpolations in a quoted string
  • it adds extra complexity to the visitor (the specific bit of code I highlighted above) but not the grammar

@omry
Copy link
Collaborator

omry commented May 3, 2021

It needs to know when an interpolation ends

Why?
Can you show case(s) where not knowing that would hinder parsing?

I'm not sure I understand your question -- it seems obvious to me that not knowing where an interpolation ends is going to cause trouble. For instance in

key=[${a}, b, {c: d}]

how would you know that it's a valid list unless you can tell where ${a} ends?

${a} is an unquoted string that does not contain a comma.
It must be an element. It does not look like a dictionary or a list, so it's a string.

Here is an example where detecting the end gets trickier:

key=[${foo:'}, ${b}, {c: d'}]

With the emerging definition, I think you can't have unescaped quotes in the middle of an string.

If you want:

key=[${foo:'CONFUSING_STUFF_HERE'}]

You must quote the enclosing string and potentially escape it the inner quotes:

key=['${foo:"CONFUSING_STUFF_HERE"}']
# or
key=['${foo:\'CONFUSING_STUFF_HERE\'}']
# or even this (although the shell may try to interpret the inner interplation):
key=["${foo:'CONFUSING_STUFF_HERE'}"]

If you do need a comma, e.g a custom resolver: a=${add:10, 20} will have to become:

a='${add:10, 20}'

Do you believe it would be possible to properly parse the second example above with just regexes for quoted strings & interpolations? I don't see it myself (I mean, obviously it would be possible with this specific example, but more generally, you see the idea)

Not in the current form.
I am asking a higher level question: Is this an important enough use case to warrant this level of support?

Also, just to make sure it's clear, regarding the specific un-escaping of \\ I first wanted to discuss here (in #1594 (comment)):

  • it isn't related to the nesting of interpolations, it can typically happen when there are both \ and (non-nested) interpolations in a quoted string
  • it adds extra complexity to the visitor (the specific bit of code I highlighted above) but not the grammar

Between the unescaping in the shell, in Hydra and in OmegaConf, no one will actually manage to use a quoted string in a nested interpolation in an override in the shell. It will just be too confusing.
Do you think you will manage to support foo=${concat:"a,b", c}, def in a command line in the shell?

@odelalleau
Copy link
Collaborator Author

For instance in

key=[${a}, b, {c: d}]

how would you know that it's a valid list unless you can tell where ${a} ends?

${a} is an unquoted string that does not contain a comma.
It must be an element. It does not look like a dictionary or a list, so it's a string.

Currently unquoted strings don't allow {} in them, it may be a bit tricky to include them without introducing a confusion with dictionaries. But let's say we get it to work: what about ${a:b,c} instead? (now it has a comma).
(answered below: use quotes)

Here is an example where detecting the end gets trickier:

key=[${foo:'}, ${b}, {c: d'}]

With the emerging definition, I think you can't have unescaped quotes in the middle of an string.

I'm not sure I follow, escaped quotes are for quotes inside quoted strings, but here I don't have any quote inside my quoted string.

Based on what you wrote below though, I believe you're saying that I should be adding quotes around the interpolation:

If you want:

key=[${foo:'CONFUSING_STUFF_HERE'}]

You must quote the enclosing string and potentially escape it the inner quotes:

key=['${foo:"CONFUSING_STUFF_HERE"}']
# or
key=['${foo:\'CONFUSING_STUFF_HERE\'}']
# or even this (although the shell may try to interpret the inner interplation):
key=["${foo:'CONFUSING_STUFF_HERE'}"]

In other words, any interpolation that can't be matched with the basic INTERPOLATION regex must be quoted: I think that could indeed work and would solve pretty much all issues, at the expense of some potentially surprising / cumbersome behavior (but like you said, we may not really care). I'll give it a shot.

I am asking a higher level question: Is this an important enough use case to warrant this level of support?

I agree that the main goal is to make it possible to pass arbitrary values to OmegaConf, and it's ok if some niche use cases are tricky to get right. I hadn't thought of putting all tricky interpolations in quoted strings until you mentioned it now: I think it's a reasonable trade-off.

Between the unescaping in the shell, in Hydra and in OmegaConf, no one will actually manage to use a quoted string in a nested interpolation in an override in the shell. It will just be too confusing.
Do you think you will manage to support foo=${concat:"a,b", c}, def in a command line in the shell?

With what I had it should work as:

'foo="${concat:"a,b", c}, def"'

which isn't that complex, but again, I agree it's ok to make it more complex if it means the underlying grammar/code is simpler.

@omry
Copy link
Collaborator

omry commented May 3, 2021

Based on what you wrote below though, I believe you're saying that I should be adding quotes around the interpolation:

Yes.

In other words, any interpolation that can't be matched with the basic INTERPOLATION regex must be quoted: I think that could indeed work and would solve pretty much all issues, at the expense of some potentially surprising / cumbersome behavior (but like you said, we may not really care). I'll give it a shot.

I was thinking that we might be able to remove the INTERPOLATION token completely, but I think it's indeed conflicting with the dict syntax.
But I realize that we need it to differentiate between cases like these:

a=${b} # good interpolation
a={b} # bad dict

I agree that the main goal is to make it possible to pass arbitrary values to OmegaConf, and it's ok if some niche use cases are tricky to get right. I hadn't thought of putting all tricky interpolations in quoted strings until you mentioned it now: I think it's a reasonable trade-off.

Great.
I think it's also good because it will allow the two grammars to evolve more freely.
quoted strings are basically a universal escape hatch.

@odelalleau
Copy link
Collaborator Author

I was thinking that we might be able to remove the INTERPOLATION token completely, but I think it's indeed conflicting with the dict syntax.

What about deprecating interpolations in unquoted strings in 1.1? This way INTERPOLATION could be removed in 1.2, and it would also avoid some potential confusion regarding which interpolations are supported vs which aren't (in unquoted strings).

@omry
Copy link
Collaborator

omry commented May 4, 2021

What about deprecating interpolations in unquoted strings in 1.1? This way INTERPOLATION could be removed in 1.2, and it would also avoid some potential confusion regarding which interpolations are supported vs which aren't (in unquoted strings).

Normally I would say yes to this, but because the primary usage is in the shell, it gets a bit confusing:

# not a properly quoted interpolation
a='${b}'
# properly quoted interpolation
a='"${b}"'
# or
a='"${b}"'

@odelalleau
Copy link
Collaborator Author

Normally I would say yes to this, but because the primary usage is in the shell, it gets a bit confusing:

Alright -- not a big deal (any kind of interpolation on the command line is probably a niche use case though)

@odelalleau
Copy link
Collaborator Author

Since quoted strings are enough to wrap any kind of interpolation, #1601 is all we need => closing this PR

@odelalleau odelalleau closed this May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants