builtins.toJSON impossible to create single backslash followed by certain characters #10082

Closed
NobbZ opened this issue Feb 26, 2024 · 3 comments
@NobbZ
Contributor

NobbZ commented Feb 26, 2024

Describe the bug

It seems to be impossible to create JSON that contains a single literal backslash in a string, as needed for some escape sequences.

Escapes that Nix itself understands seem to be translated correctly, but sequences that Nix does not understand cannot be produced, such as \u followed by 4 hex digits to denote an arbitrary Unicode code point in the string.

Steps To Reproduce

  1. cat $(nix eval --expr 'builtins.toFile "json" (builtins.toJSON {a = "\u1234";})' --raw) => {"a":"u1234"}
  2. cat $(nix eval --expr 'builtins.toFile "json" (builtins.toJSON {a = "\\u1234";})' --raw) => {"a":"\\u1234"}

Expected behavior

A JSON with the content {"a":"\u1234"} can be created.

nix-env --version output

$ nix --version
nix (Nix) 2.21.0pre20240224_d83008c

This problem has been reported repeatedly over the last few years, though, and I only realized today that there doesn't seem to be an issue to track it.


@NobbZ added the bug label on Feb 26, 2024
@thufschmitt
Member

Discussed during the Nix maintainers meeting on 2024-02-26.
Wontfix, unless someone comes up with a convincing backwards-compatible way of allowing that.

  • @roberth: JSON is defined to be UTF-8, and I don't think we should give control over which representation of UTF-8 characters is output. If a parser cares about which syntax is used, that's a bug in the parser.

  • The underlying problem is that we don't have an escape syntax in Nix strings for producing UTF-8 characters, except indirectly through fromJSON.

  • This is a wontfix at the builtins.toJSON level: Nix string values shouldn't need to map onto arbitrary JSON string expressions, only onto JSON values.

    • Naively fixing the bug would be akin to introducing an injection flaw like https://en.wikipedia.org/wiki/SQL_injection, in that the JSON parser would reinterpret the string in an unexpected way. (Imagine builtins.toJSON ("\\" + "u2345").)
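Both points in the list above (escaped and unescaped spellings denote the same JSON value, and a serializer must escape backslashes so that string data is not reinterpreted) can be checked with any conforming JSON library. A minimal sketch in Python, used here only as a stand-in because the argument is about JSON semantics rather than Nix. The `naive_serialize` helper is hypothetical and models a serializer that passes backslashes through unchanged:

```python
import json

# Two spellings of the same JSON value: an escape sequence and the raw character.
escaped = '{"a":"\\u1234"}'   # JSON text containing backslash, u, 1, 2, 3, 4
raw = '{"a":"\u1234"}'        # JSON text containing the character U+1234 itself
assert json.loads(escaped) == json.loads(raw)


def naive_serialize(s: str) -> str:
    # Hypothetical serializer that does NOT escape backslashes.
    return '"' + s + '"'


# A string value of six characters: backslash, u, 2, 3, 4, 5.
value = "\\" + "u2345"

# A correct serializer escapes the backslash, so the value round-trips.
assert json.loads(json.dumps(value)) == value

# The naive one lets the parser reinterpret the data as an escape sequence,
# so the parsed result is a single character instead of the original six.
assert json.loads(naive_serialize(value)) == "\u2345"
assert json.loads(naive_serialize(value)) != value
```

The last two assertions illustrate the injection analogy: with the naive serializer, string data supplied by the caller changes meaning once it reaches the parser.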

The solution right now is to just put the actual Unicode character in the Nix file.
If having non-ASCII Nix source is a problem, then we could consider adding \u... escape sequences as a Nix language feature, but we need infrastructure for breaking language changes first. Some expressions may rely on "\u" == "u".

Alternatively, instead of "stealing" existing syntax, we might find a currently invalid ("unused") syntax for this. E.g. we could define "${\1234}" to be the UTF-8 representation of code point 1234.

@thufschmitt closed this as not planned on Feb 29, 2024
@benjamb

benjamb commented Apr 15, 2024

@thufschmitt I've just stumbled into this while trying to configure waybar without adding glyphs from icon fonts directly into my Nix configuration.

While I potentially just need to get over myself, I would certainly appreciate an alternative method, such as those mentioned above. Has a new issue been created to track this as a possibility?

@roberth
Member

roberth commented Apr 16, 2024

Adding \u unicode escapes into the language is not possible without something like language versioning or giving up on long term reproducibility. I've added it to the list here:

Workaround:

settings.foo.bar = builtins.fromJSON ''"\u1234"'';

This will output an unescaped string, which is valid and equivalent JSON.
If that doesn't work, either the code that reads the JSON has a bug, or the JSON passes through something that doesn't support UTF-8 before it is parsed. In the latter case, you could postprocess the JSON generated by Nix to replace the Unicode characters with their equivalent escapes; something like:

configFile = runCommand "config.ascii.json" {
  nativeBuildInputs = [ jq ];
  # x may contain Unicode, either from actual UTF-8 in string literals
  # or produced by fromJSON.
  json = builtins.toJSON x;
  passAsFile = [ "json" ];
} ''
  # passAsFile exposes the attribute as a file at $jsonPath;
  # the builder's output path is $out.
  jq --ascii-output . < "$jsonPath" > "$out"
'';
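To see what such an ASCII-only post-processing step does to the data, here is a rough Python equivalent. This is only an illustrative sketch, not the method proposed in the thread: `to_ascii_json` is a hypothetical helper name, and unlike jq's default output this version emits compact JSON rather than pretty-printed JSON.

```python
import json


def to_ascii_json(text: str) -> str:
    # Parse the JSON text, then re-serialize it with every non-ASCII
    # character written as a backslash-u escape sequence.
    return json.dumps(json.loads(text), ensure_ascii=True, separators=(",", ":"))


# JSON text containing the raw character U+1234 (unescaped UTF-8 output).
utf8_json = '{"a":"\u1234"}'
ascii_json = to_ascii_json(utf8_json)

# The result is pure ASCII and spells the character as an escape sequence...
assert ascii_json == '{"a":"\\u1234"}'
assert ascii_json.isascii()

# ...but it still denotes exactly the same JSON value.
assert json.loads(ascii_json) == json.loads(utf8_json)
```

The final assertion is the key property: the rewrite changes only the spelling of the document, so a conforming parser sees identical data either way.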
