Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve registry value return types #10806

Merged
merged 2 commits into from Oct 23, 2023
Merged

Improve registry value return types #10806

merged 2 commits into from Oct 23, 2023

Conversation

CAD97
Copy link
Contributor

@CAD97 CAD97 commented Oct 22, 2023

r? @fdncred
Last one, I hope. At least short of completely redesigning registry query's interface. (Which I wouldn't implement without asking around first.)

Description

User-Facing Changes has the general overview. Inline comments provide a lot of justification on specific choices. Most of the type conversions should be reasonably noncontroversial, but expanding REG_EXPAND_SZ needs some justification. First, an example of the behavior there:

> # release nushell:
> version | select version commit_hash | to md --pretty
| version | commit_hash                              |
| ------- | ---------------------------------------- |
| 0.85.0  | a6f62e05ae5b4e9ba4027fbfffd21025a898783e |
> registry query --hkcu Environment TEMP | get value
%USERPROFILE%\AppData\Local\Temp

> # with this patch:
> version | select version commit_hash | to md --pretty
| version | commit_hash                              |
| ------- | ---------------------------------------- |
| 0.86.1  | 0c5a4c991f1a77bcbe5a86bc8f4469ecf1218fe9 |
> registry query --hkcu Environment TEMP | get value
C:\Users\CAD\AppData\Local\Temp

> # Microsoft CLI tooling behavior:
> ^pwsh -c `(Get-ItemProperty HKCU:\Environment).TEMP`
C:\Users\CAD\AppData\Local\Temp
> ^reg query HKCU\Environment /v TEMP
HKEY_CURRENT_USER\Environment
    TEMP    REG_EXPAND_SZ    %USERPROFILE%\AppData\Local\Temp

As noted in the inline comments, I'm arguing that it makes more sense to eagerly expand the %EnvironmentString% placeholders, as none of Nushell's path functionality will interpret these placeholders. This makes the behavior of registry query match the behavior of pwsh's Get-ItemProperty registry access, and means that paths (the most common use of REG_EXPAND_SZ) are actually usable.

This does not break nu_script's update-path; it will just be slightly inefficient as it will not find any %Placeholder%s to manually expand anymore. But also, note that update-path is currently wrong, as a path including %LocalAppData%Low is perfectly valid and sometimes used (to go to Appdata\LocalLow); expansion isn't done solely on a path segment basis, as is implemented by update-path.

I believe that the type conversions implemented by this patch are essentially always desired. But if we want to keep registry query "pure", we could easily introduce a registry get1 which does the more complete interpretation of registry types, and leave registry query alone as doing the bare minimum. Or we could teach path expand to do ExpandEnvironmentStringsW. But REG_EXPAND_SZ being the odd one out of not getting its registry type semantics decoded by registry query seems wrong.

User-Facing Changes

  • registry query's produced value has changed. Specifically:
    • ❗ Rows where type == REG_EXPAND_SZ now expand %EnvironmentVarable% placeholders for you. For example, registry query --hkcu Environment TEMP | get value returns C:\Users\CAD\AppData\Local\Temp instead of %USERPROFILE%\AppData\Local\Temp.
      • You can restore the old behavior and preserve the placeholders by passing a new --no-expand switch.
    • Rows where type == REG_MULTI_SZ now provide a list<string> value. They previously had that same list, but | str join "\n".
    • Rows where type == REG_DWORD_BIG_ENDIAN now provide the correct numeric value instead of a byte-swapped value.
    • Rows where type == REG_QWORD now provide the correct numeric value2 instead of the value modulo 232.
    • Rows where type == REG_LINK now provide a string value of the link target registry path instead of an internal debug string representation. (This should never be visible, as links should be transparently followed.)
    • Rows where type =~ RESOURCE now provide a binary value instead of an internal debug string representation.

Footnotes

  1. This is the potential redesign I alluded to at the top. One potential change could be to make registry get Environment produce record<Path: string, TEMP: string, TMP: string> instead of registry query's table<name: string, value: string, type: string>, the idea being to make it feel as native as possible. We could even translate between Nu's cell-path and registry paths -- cell paths with spaces do actually work, if a bit awkwardly -- or even introduce lazy records so the registry can be traversed with normal data manipulation ... but that all seems a bit much.

  2. Nu's int is a signed 64-bit integer. As such, values >= 263 will be reported as their negative two's compliment value. This might sometimes be the correct interpretation -- the registry does not distinguish between signed and unsigned integer values -- but regedit and pwsh display all values as unsigned.

Specifically:
- REG_MULTI_SZ returns list<string> instead of | str join (char nl)
- REG_EXPAND_SZ automatically expands %EnvironmentString% placeholders
- REG_DWORD_BIG_ENDIAN returns the endianness decoded value
- REG_QWORD actually ingests values over 2**32
- REG_LINK produces a decoded string
- REG_*RESOURCE* produces binary data
@fdncred
Copy link
Collaborator

fdncred commented Oct 22, 2023

Thanks @CAD97. I appreciate your work here.

I'm fine with making your changes the default behavior, because these changes are kind of a QOL improvement. Most people will probably want to use this manipulated output that you've created. I see that value, for sure, but I don't want to lose track of my original intent, which was to get the values, without manipulation, from the registry. With that in mind, would you be interested in adding a --raw, -r parameter that returns what is in the registry keys without manipulating it?

This way, we can have the helpful defaults that you've written and then for the people that need it, we can have the raw data.

At least short of completely redesigning registry query's interface.

You should feel free. I know you well enough by your PRs to know that you have good instincts. If there's something else you want to discuss, feel free to ping me on Discord or throw up an issue for discussion.

@CAD97
Copy link
Contributor Author

CAD97 commented Oct 22, 2023

The thing is, the current behavior isn't really "without manipulation" either. If we start from the raw bytes retrieved from the registry, the decoding currently done is:

vtype value
REG_NONE | ignore
REG_BINARY | $in
REG_DWORD | into int
REG_DWORD_BIG_ENDIAN | into int
REG_QWORD | take 4 | into int
REG_SZ | decode utf-16 | str trim --right "\u{0}"
REG_MULTI_SZ | decode utf-16 | str trim --right "\u{0}" | str replace --all "\u{0}" "\n"
REG_EXPAND_SZ | decode utf-16 | str trim --right "\u{0}"
REG_LINK | $"($in)"
REG_RESOURCE_LIST | $"($in)"
REG_FULL_RESOURCE_DESCRIPTOR | $"($in)"
REG_RESOURCE_REQUIREMENTS_LIST | $"($in)"

This proposed implementation instead does:

vtype value
REG_NONE | ignore
REG_BINARY | $in
REG_DWORD | into int
REG_DWORD_BIG_ENDIAN | into int --endian big
REG_QWORD | into int
REG_SZ | decode utf-16 | str trim --right "\u{0}"
REG_MULTI_SZ | decode utf-16 | str trim --right "\u{0}" | split row "\u{0}"
REG_EXPAND_SZ | decode utf-16 | str trim --right "\u{0}" | str expand-env-strings1
REG_LINK | decode utf-16 | str trim --right "\u{0}"
REG_RESOURCE_LIST | $in
REG_FULL_RESOURCE_DESCRIPTOR | $in
REG_RESOURCE_REQUIREMENTS_LIST | $in

Note that while I do mutate reg_value.vtype, this is purely for the purpose of reusing String::from_reg_value (which errors (and I unwrap) if vtype not-in [REG_SZ REG_EXPAND_SZ REG_MULTI_SZ]). The reg_type actually returned to Nushell is now grabbed before calling reg_value_to_nu_value. (I would fully agree that changing the type produced would be undesirable.)

When you look at it like that, it's not that big of a diff...
 | each { update value (match $in.type {
     REG_NONE => {|| null }
     REG_BINARY => {|it| $it }
     REG_DWORD => {|it| $it | into int }
-    REG_DWORD_BIG_ENDIAN => {|it| $it | into int }
+    REG_DWORD_BIG_ENDIAN => {|it| $it | into int --endian big }
-    REG_QWORD => {|it| $it | take 4 | into int }
+    REG_QWORD => {|it| $it | into int }
     REG_SZ => {|it| $it | decode utf-16 | str trim --right "\u{0}" }
-    REG_MULTI_SZ => {|it| $it | decode utf-16 | str trim --right "\u{0}"` | str replace --all "\u{0}" "\n" }
+    REG_MULTI_SZ => {|it| $it | decode utf-16 | str trim --right "\u{0}"` | split row "\u{0}" }
-    REG_EXPAND_SZ => {|it| $it | decode utf-16 | str trim --right "\u{0}"}
+    REG_EXPAND_SZ => {|it| $it | decode utf-16 | str trim --right "\u{0}" | str expand-env-strings}
-    REG_LINK => {|it| $"($it)" }
+    REG_LINK => {|it| $it | decode utf-16 | str trim --right "\u{0}" }
-    REG_RESOURCE_LIST => {|it| $"($it)" }
-    REG_FULL_RESOURCE_DESCRIPTOR => {|it| $"($it)" }
-    REG_RESOURCE_REQUIREMENTS_LIST => {|it| $"($it)" }
+    REG_RESOURCE_LIST => {|it| $it }
+    REG_FULL_RESOURCE_DESCRIPTOR => {|it| $it }
+    REG_RESOURCE_REQUIREMENTS_LIST => {|it| $it }
     _ => (make error "invalid registry type")
 })}

The biggest reason for the net line add is the very explicit reasoning around calling ExpandEnvironmentStringsW and handling of REG_LINK and REG_.*RESOURCE_.*. The rest could potentially pass as just a code quality refactor.

redesigning registry query's interface

You should feel free.

In that case.... Give me a couple days to think through it a bit more and I'll have a concrete proposal.

Before giving it much more thought I'm considering roughly a shape of...
# Commands for interacting with the Windows registry.
#
# This command does nothing by itself.
# You must use one of the subcommands.
export def registry [] { help registry }

def "nu-complete registry hive" [context: string] {
    mut opts = []
    # note: only completes the short form lowercase,
    # but should accept long form, case insensitive
    if context !~ "--app " { $opts = $opts | append [
        {value: "hkcr", description: "the HKEY_CLASSES_ROOT hive"}
        {value: "hkcu", description: "the HKEY_CURRENT_USER hive"}
        {value: "hklm", description: "the HKEY_LOCAL_MACHINE hive"}
        {value: "hku", description: "the HKEY_USERS hive"}
        {value: "hkpd", description: "the HKEY_PERFORMANCE_DATA hive"}
        {value: "hkpt", description: "the HKEY_PERFORMANCE_TEXT hive"}
        {value: "hknls", description: "the HKEY_PERFORMANCE_NLSTEXT hive"}
        {value: "hkcc", description: "the HKEY_CURRENT_CONFIG hive"}
        {value: "hkdd", description: "the HKEY_DYN_DATA hive"}
        {value: "hkculs", description: "the HKEY_CURRENT_USER_LOCAL_SETTINGS hive"}
    ]}
    if context !~ "--sys " { $opts = $opts | append (
        # yes I know this doesn't actually work as-is
        nu-complete path $context
    )}
    $opts
}

# Query raw entries from the Windows registry.
export extern "registry query" [
    --sys # restrict query to system registry hives (incompatible with --app)
    --app # restrict query to application registry hives (incompatible with --sys)
    hive: string@"nu-complete registry hive" # name of a system hive or path to an application hive
    key?: string # registry keypath to query; empty string means the hive root
    value?: string # registry value to query; empty string means the default value
]: [
    <nothing> | registry query <string> <string?> -> <table<name: string, value: bytes, type: string>>
    <nothing> | registry query <string> <string> <string> -> <record<name: string, value: bytes, type: string>>
]

# Expands environment variable expansion placeholders.
#
# If neither switch is provided, defaults to expanding the OS native format.
export extern "str expand-env" [
    --windows # expand windows-style %VAR% placeholders
    --unix # expand unix-style $VAR placeholders
]: [
    list<string> -> list<string>
    string -> string
]
# and/or a new --env switch for "path expand"
# (e.g. Python provides as os.path.expandvars)

# Decode raw Windows registry entries.
export def "registry decode" []: [
    table<name: string, value: bytes, type: string> -> record
    record<name: string, value: bytes, type: string> -> <bytes or string or int>
] {
    let it = $in
    if ($it | describe | str replace -r "<.*" "") in [table list] {
        return $in | each { registry decode }
    }

    let decode = match $it.type {
        # as in the previous details panel's diff
    }
    do $decode $it.value
}
# or: "registry into record" and "registry into value", a la "into values"
# *maybe* teach it to decode registry performance queries

# Retrieve values from the Windows registry.
export def "registry get" [
    --sys # restrict query to system registry hives (incompatible with --app)
    --app # restrict query to application registry hives (incompatible with --sys)
    hive: string@"nu-complete registry hive" # name of a system hive or path to an application hive
    key?: string # registry keypath to query; empty string means the hive root
    value?: string # registry value to query; empty string means the default value
]: [
    <nothing> | registry get <string> <string?> -> <record>
    <nothing> | registry get <string> <string> <string> -> <bytes or string or int>
] {
    match [$sys $app] {
        [true true] => registry query --sys --app $hive $key $value
        [true false] => registry query --sys $hive $key $value
        [false true] => registry query --app $hive $key $value
        [false false] => registry query $hive $key $value
    } | registry decode
}

# And I do like the concept of a --depth N switch for recursively opening subkeys,
# but I don't know how that'd fit in, especially as IIUC subkey/value are separate namespaces

Need to think more about it but felt the need to write that sketch down, I guess.

Footnotes

  1. str expand-env-strings is an imaginary command which does ExpandEnvironmentStringsW (or equivalent) on Windows. I suppose it would expand $ENV placeholders on unixes?

@fdncred
Copy link
Collaborator

fdncred commented Oct 22, 2023

The thing is, the current behavior isn't really "without manipulation" either.

Technically, I agree. You are right. However, I'd still like to have a --raw, -r mode so that we don't expand %BLAH%, and things like that so, if people need that, we have it. Otherwise, there's no way to get that data in that format.

I look forward to your proposal.

When provided, we don't expand %ENV% placeholders in REG_EXPAND_SZ
registry values, restoring previous behavior.
@CAD97
Copy link
Contributor Author

CAD97 commented Oct 23, 2023

I added a --no-expand(-n) (cf. path expand --no-symlink)1 which disables %ENV% expansion to maybe unblock this while I think about what a more thorough solution might look like. Not to say that where this patch doesn't put us in a reasonable state; I believe this is fairly close to a local "arguably bugfixes" maxima.

> registry query Environment TEMP
╭───────┬─────────────────────────────────╮
│ name  │ TEMP                            │
│ value │ C:\Users\CAD\AppData\Local\Temp │
│ type  │ REG_EXPAND_SZ                   │
╰───────┴─────────────────────────────────╯
> registry query Environment TEMP --no-expand
╭───────┬──────────────────────────────────╮
│ name  │ TEMP                             │
│ value │ %USERPROFILE%\AppData\Local\Temp │
│ type  │ REG_EXPAND_SZ                    │
╰───────┴──────────────────────────────────╯

Footnotes

  1. I feel like --raw should turn off more than solely %ENV% expansion, so I went with the more specific switch given path expand as precedent.

@fdncred
Copy link
Collaborator

fdncred commented Oct 23, 2023

I'm good with this now too. Thanks. I like --raw because it's a common thing in nushell commands, but until we figure out what --raw means in this context, I'm fine with leaving this as is. Thanks!

@fdncred fdncred merged commit b5e09b8 into nushell:main Oct 23, 2023
20 checks passed
@sholderbach sholderbach added pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes windows A Windows specific issue labels Oct 23, 2023
@CAD97 CAD97 deleted the registry branch October 23, 2023 15:40
@fdncred
Copy link
Collaborator

fdncred commented Nov 2, 2023

@CAD97 I just noticed that there are 2 unwraps() here https://github.com/nushell/nushell/pull/10806/files#diff-b395629258122f55118d05c97fc8449f9b307e4bc4f4f7558d7c9ac42b8d3db3R310-R322

I'm not sure how the CI didn't catch these. Do these just need to be unwrap_or(0) or what?

@CAD97
Copy link
Contributor Author

CAD97 commented Nov 3, 2023

The unwraps are effectively infallible; the precondition (the vtype is correct) is checked just before. This is where unwrap is most justified, but I understand if projects want to avoid bare unwrap. I'd swap it for .expect("registry value type should be ~") like the other conversion helpers use instead of an unwrap_or, as that's more honest.

If we do eventually decide to factor out some sort of registry decode, we would need to change the code to handle RegValue which might have insufficient data. Likely to a better backup behavior than just 0. The current code does assume the RegValue comes from reading a RegKey. (cf. gentoo90/winreg-rs#61 – this assumption is implicitly a soundness one, unfortunately, and one still making a further unsound (but typically okay) assumption at that. Maybe it would be better to bypass winreg's from_reg_value trait entirely here, with a comment explaining why…. Not to avoid the unwrap, that'd be making poor choices for religious "don't unwrap" reasons instead of good ones, but to avoid the unsoundness.)

@fdncred
Copy link
Collaborator

fdncred commented Nov 3, 2023

I guess nushell is in the religious "don't unwrap" camp since the CI is supposed to fail with any unwrap, except for tests. I'm not really sure how this PR squeaked by with unwraps. So, we have to change them. For now, I'll just make them expects and reference your points. Thanks for the input.

hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
r? @fdncred
Last one, I hope. At least short of completely redesigning `registry
query`'s interface. (Which I wouldn't implement without asking around
first.)

# Description

User-Facing Changes has the general overview. Inline comments provide a
lot of justification on specific choices. Most of the type conversions
should be reasonably noncontroversial, but expanding `REG_EXPAND_SZ`
needs some justification. First, an example of the behavior there:

```shell
> # release nushell:
> version | select version commit_hash | to md --pretty
| version | commit_hash                              |
| ------- | ---------------------------------------- |
| 0.85.0  | a6f62e0 |
> registry query --hkcu Environment TEMP | get value
%USERPROFILE%\AppData\Local\Temp

> # with this patch:
> version | select version commit_hash | to md --pretty
| version | commit_hash                              |
| ------- | ---------------------------------------- |
| 0.86.1  | 0c5a4c9 |
> registry query --hkcu Environment TEMP | get value
C:\Users\CAD\AppData\Local\Temp

> # Microsoft CLI tooling behavior:
> ^pwsh -c `(Get-ItemProperty HKCU:\Environment).TEMP`
C:\Users\CAD\AppData\Local\Temp
> ^reg query HKCU\Environment /v TEMP
HKEY_CURRENT_USER\Environment
    TEMP    REG_EXPAND_SZ    %USERPROFILE%\AppData\Local\Temp
```

As noted in the inline comments, I'm arguing that it makes more sense to
eagerly expand the %EnvironmentString% placeholders, as none of
Nushell's path functionality will interpret these placeholders. This
makes the behavior of `registry query` match the behavior of pwsh's
`Get-ItemProperty` registry access, and means that paths (the most
common use of `REG_EXPAND_SZ`) are actually usable.

This does *not* break nu_script's
[`update-path`](https://github.com/nushell/nu_scripts/blob/main/sourced/update-path.nu);
it will just be slightly inefficient as it will not find any
`%Placeholder%`s to manually expand anymore. But also, note that
`update-path` is currently *wrong*, as a path including
`%LocalAppData%Low` is perfectly valid and sometimes used (to go to
`Appdata\LocalLow`); expansion isn't done solely on a path segment
basis, as is implemented by `update-path`.

I believe that the type conversions implemented by this patch are
essentially always desired. But if we want to keep `registry query`
"pure", we could easily introduce a `registry get`[^get] which does the
more complete interpretation of registry types, and leave `registry
query` alone as doing the bare minimum. Or we could teach `path expand`
to do `ExpandEnvironmentStringsW`. But REG_EXPAND_SZ being the odd one
out of not getting its registry type semantics decoded by `registry
query` seems wrong.

[^get]: This is the potential redesign I alluded to at the top. One
potential change could be to make `registry get Environment` produce
`record<Path: string, TEMP: string, TMP: string>` instead of `registry
query`'s `table<name: string, value: string, type: string>`, the idea
being to make it feel as native as possible. We could even translate
between Nu's cell-path and registry paths -- cell paths with spaces do
actually work, if a bit awkwardly -- or even introduce lazy records so
the registry can be traversed with normal data manipulation ... but that
all seems a bit much.

# User-Facing Changes

- `registry query`'s produced `value` has changed. Specifically:
- ❗ Rows `where type == REG_EXPAND_SZ` now expand `%EnvironmentVarable%`
placeholders for you. For example, `registry query --hkcu Environment
TEMP | get value` returns `C:\Users\CAD\AppData\Local\Temp` instead of
`%USERPROFILE%\AppData\Local\Temp`.
- You can restore the old behavior and preserve the placeholders by
passing a new `--no-expand` switch.
- Rows `where type == REG_MULTI_SZ` now provide a `list<string>` value.
They previously had that same list, but `| str join "\n"`.
- Rows `where type == REG_DWORD_BIG_ENDIAN` now provide the correct
numeric value instead of a byte-swapped value.
- Rows `where type == REG_QWORD` now provide the correct numeric
value[^sign] instead of the value modulo 2<sup>32</sup>.
- Rows `where type == REG_LINK` now provide a string value of the link
target registry path instead of an internal debug string representation.
(This should never be visible, as links should be transparently
followed.)
- Rows `where type =~ RESOURCE` now provide a binary value instead of an
internal debug string representation.

[^sign]: Nu's `int` is a signed 64-bit integer. As such, values >=
2<sup>63</sup> will be reported as their negative two's compliment
value. This might sometimes be the correct interpretation -- the
registry does not distinguish between signed and unsigned integer values
-- but regedit and pwsh display all values as unsigned.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes windows A Windows specific issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants