Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NKRecord.iter_values() (sometimes?) decodes single-character and empty strings as an integer #267

Open
EvanED opened this issue May 30, 2024 · 1 comment

Comments

@EvanED
Copy link

EvanED commented May 30, 2024

I have a registry file that I concocted (more later) that has a single key with five values in it. Those values should be as follows:

$ hivexget test.registry '\key' 
"valueempty"=""
"value0"="0"
"value1"="1"
"valuea"="A"
"valueaa"="AA"

All value types are set to REG_SZ.

However, regipy does not parse the single-character strings as, well, strings. Given this script:

from regipy import registry

hive = registry.RegistryHive("test.registry")
key = hive.get_key("key")

for value in key.iter_values():
    print(value)

the output is:

Value(name='valueempty', value=0, value_type='REG_SZ', is_corrupted=False)
Value(name='value0', value=48, value_type='REG_SZ', is_corrupted=False)
Value(name='value1', value=49, value_type='REG_SZ', is_corrupted=False)
Value(name='valuea', value=65, value_type='REG_SZ', is_corrupted=False)
Value(name='valueaa', value='AA', value_type='REG_SZ', is_corrupted=False)

For valueempty/value0/value1/valuea, the resulting .value is an integer rather than a string. Note that chr(48) == "0", chr(49) == "1", chr(65) == "A"; and then of course the "AA" value is correct in the output.


I've tracked this down to here, in registry.py:

    def iter_values(self, as_json=False, max_len=MAX_LEN, trim_values=True):
        ...
        for _ in range(self.values_count):
            ...
            with boomerang_stream(self._stream) as substream:
                ...
                if data_type in ['REG_SZ', 'REG_EXPAND', 'REG_EXPAND_SZ']:
                    if vk.data_size >= 0x80000000:
                        # data is contained in the data_offset field
                        value.size -= 0x80000000
                        actual_value = vk.data_offset                                      # <-----
                    elif vk.data_size > 0x3fd8 and value.value[:2] == b'db':
                        ...
                    ...
                ...

where there is no string conversion in this case, vk.data_offset is just an integer.

I assume that because REG_SZ is (typically) UTF-16, a one-character string plus null termination fits into a four-byte size field in the binary format, but a two-character string plus null termination does not, and that's why two characters and up seems to work.

At least for my test case, I can fix this with a bit cast of vk.data_offset into bytes, which I then try to decode following the example of the other parts of the if data_type in [....] true branch:

                    if vk.data_size >= 0x80000000:
                        # data is contained in the data_offset field
                        value.size -= 0x80000000
                        packed = struct.pack("=l", vk.data_offset)
                        actual_value = try_decode_binary(packed, as_json=False, trim_values=False)

(you'll need to import struct of course).

This seems to work with higher Unicode code points as well; if I add a value "valueunicode"="ሴ" then I get Value(name='valueunicode', value=4660, value_type='REG_SZ', is_corrupted=False) on the release version and Value(name='valueunicode', value='ሴ', value_type='REG_SZ', is_corrupted=False) with my fix. I'm not positive that the pack call shouldn't use <l instead of =l, or L instead of l, or i instead of l, etc. For reporting purposes I tried to be conservative and go big, and extra null bytes after the string wouldn't hurt.


I suspect there's the same problem with REG_BINARY just below:

                elif data_type in ['REG_BINARY', 'REG_NONE']:
                    if vk.data_size >= 0x80000000:
                        # data is contained in the data_offset field
                        actual_value = vk.data_offset             # actual_value = struct.pack(...) ?

but I don't have a test case for this and am not set up to easily create one.


Speaking of the test case, I constructed mine "manually" using libhivex. I am attaching it here. However, I have seen the empty string version of this (coming out with a value of 0 instead of "") in the wild, with my actual registry.

Finally:

$ python
Python 3.11.7 (main, Dec 14 2023, 01:49:40) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import regipy
>>> regipy.__version__
'4.2.1'

test.registry.zip

@EvanED
Copy link
Author

EvanED commented May 30, 2024

Incidentally, this is unrelated to the report above, but I noticed that in the big if-else chain switching on data_type there's a second REG_SZ case that I think can't be hit:

    if data_type in ['REG_SZ', 'REG_EXPAND', 'REG_EXPAND_SZ']:
        ...
    elif data_type in ['REG_BINARY', 'REG_NONE']:
        ...
    elif data_type == 'REG_SZ':      # <--- case subsumed by the first in the sequence
        actual_value = try_decode_binary(value.value, as_json=as_json, trim_values=trim_values)
    elif data_type == 'REG_DWORD':
        ...
    ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant