`NKRecord.iter_values()` (sometimes?) decodes single-character and empty strings as an integer #267

EvanED · 2024-05-30T15:26:43Z

I have a registry file that I concocted (more later) that has a single key with five values in it. Those values should be as follows:

$ hivexget test.registry '\key' 
"valueempty"=""
"value0"="0"
"value1"="1"
"valuea"="A"
"valueaa"="AA"

All value types are set to REG_SZ.

However, regipy does not parse the single-character strings as, well, strings. Given this script:

from regipy import registry

hive = registry.RegistryHive("test.registry")
key = hive.get_key("key")

for value in key.iter_values():
    print(value)

the output is:

Value(name='valueempty', value=0, value_type='REG_SZ', is_corrupted=False)
Value(name='value0', value=48, value_type='REG_SZ', is_corrupted=False)
Value(name='value1', value=49, value_type='REG_SZ', is_corrupted=False)
Value(name='valuea', value=65, value_type='REG_SZ', is_corrupted=False)
Value(name='valueaa', value='AA', value_type='REG_SZ', is_corrupted=False)

For valueempty/value0/value1/valuea, the resulting .value is an integer rather than a string. Note that chr(48) == "0", chr(49) == "1", chr(65) == "A"; and then of course the "AA" value is correct in the output.

I've tracked this down to here, in registry.py:

    def iter_values(self, as_json=False, max_len=MAX_LEN, trim_values=True):
        ...
        for _ in range(self.values_count):
            ...
            with boomerang_stream(self._stream) as substream:
                ...
                if data_type in ['REG_SZ', 'REG_EXPAND', 'REG_EXPAND_SZ']:
                    if vk.data_size >= 0x80000000:
                        # data is contained in the data_offset field
                        value.size -= 0x80000000
                        actual_value = vk.data_offset                                      # <-----
                    elif vk.data_size > 0x3fd8 and value.value[:2] == b'db':
                        ...
                    ...
                ...

where there is no string conversion in this case, vk.data_offset is just an integer.

I assume that because REG_SZ is (typically) UTF-16, a one-character string plus null termination fits into a four-byte size field in the binary format, but a two-character string plus null termination does not, and that's why two characters and up seems to work.

At least for my test case, I can fix this with a bit cast of vk.data_offset into bytes, which I then try to decode following the example of the other parts of the if data_type in [....] true branch:

                    if vk.data_size >= 0x80000000:
                        # data is contained in the data_offset field
                        value.size -= 0x80000000
                        packed = struct.pack("=l", vk.data_offset)
                        actual_value = try_decode_binary(packed, as_json=False, trim_values=False)

(you'll need to import struct of course).

This seems to work with higher Unicode code points as well; if I add a value "valueunicode"="ሴ" then I get Value(name='valueunicode', value=4660, value_type='REG_SZ', is_corrupted=False) on the release version and Value(name='valueunicode', value='ሴ', value_type='REG_SZ', is_corrupted=False) with my fix. I'm not positive that the pack call shouldn't use <l instead of =l, or L instead of l, or i instead of l, etc. For reporting purposes I tried to be conservative and go big, and extra null bytes after the string wouldn't hurt.

I suspect there's the same problem with REG_BINARY just below:

                elif data_type in ['REG_BINARY', 'REG_NONE']:
                    if vk.data_size >= 0x80000000:
                        # data is contained in the data_offset field
                        actual_value = vk.data_offset             # actual_value = struct.pack(...) ?

but I don't have a test case for this and am not set up to easily create one.

Speaking of the test case, I constructed mine "manually" using libhivex. I am attaching it here. However, I have seen the empty string version of this (coming out with a value of 0 instead of "") in the wild, with my actual registry.

Finally:

$ python
Python 3.11.7 (main, Dec 14 2023, 01:49:40) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import regipy
>>> regipy.__version__
'4.2.1'

test.registry.zip

The text was updated successfully, but these errors were encountered:

EvanED · 2024-05-30T15:38:34Z

Incidentally, this is unrelated to the report above, but I noticed that in the big if-else chain switching on data_type there's a second REG_SZ case that I think can't be hit:

    if data_type in ['REG_SZ', 'REG_EXPAND', 'REG_EXPAND_SZ']:
        ...
    elif data_type in ['REG_BINARY', 'REG_NONE']:
        ...
    elif data_type == 'REG_SZ':      # <--- case subsumed by the first in the sequence
        actual_value = try_decode_binary(value.value, as_json=as_json, trim_values=trim_values)
    elif data_type == 'REG_DWORD':
        ...
    ...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`NKRecord.iter_values()` (sometimes?) decodes single-character and empty strings as an integer #267

`NKRecord.iter_values()` (sometimes?) decodes single-character and empty strings as an integer #267

EvanED commented May 30, 2024

EvanED commented May 30, 2024 •

edited

Loading

NKRecord.iter_values() (sometimes?) decodes single-character and empty strings as an integer #267

NKRecord.iter_values() (sometimes?) decodes single-character and empty strings as an integer #267

Comments

EvanED commented May 30, 2024

EvanED commented May 30, 2024 • edited Loading

`NKRecord.iter_values()` (sometimes?) decodes single-character and empty strings as an integer #267

`NKRecord.iter_values()` (sometimes?) decodes single-character and empty strings as an integer #267

EvanED commented May 30, 2024 •

edited

Loading