For valueempty/value0/value1/valuea, the resulting .value is an integer rather than a string. Note that chr(48) == "0", chr(49) == "1", chr(65) == "A"; and then of course the "AA" value is correct in the output.
I've tracked this down to here, in registry.py:
def iter_values(self, as_json=False, max_len=MAX_LEN, trim_values=True):
    ...
    for _ in range(self.values_count):
        ...
        with boomerang_stream(self._stream) as substream:
            ...
            if data_type in ['REG_SZ', 'REG_EXPAND', 'REG_EXPAND_SZ']:
                if vk.data_size >= 0x80000000:
                    # data is contained in the data_offset field
                    value.size -= 0x80000000
                    actual_value = vk.data_offset  # <-----
                elif vk.data_size > 0x3fd8 and value.value[:2] == b'db':
                    ...
                ...
            ...
        ...
where there is no string conversion: in this case, vk.data_offset is just an integer.
I assume that because REG_SZ is (typically) UTF-16, a one-character string plus null termination fits into the four-byte data_offset field in the binary format, but a two-character string plus null termination does not, and that's why strings of two characters and up seem to work.
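To illustrate that assumption, here is a minimal sketch that does not use regipy at all. The helper name `utf16_inline_int` is hypothetical, and the little-endian byte order is my assumption about how the four bytes are read back as an integer. It encodes each test string as null-terminated UTF-16LE and shows which integer you get when the result fits in four bytes:

```python
import struct


def utf16_inline_int(text):
    """Encode text as null-terminated UTF-16LE. If the result fits in four
    bytes, return those bytes read as a little-endian unsigned 32-bit
    integer; otherwise return None (the data cannot be stored inline)."""
    raw = (text + "\x00").encode("utf-16-le")
    if len(raw) > 4:
        return None
    # Pad short data (e.g. the empty string) with zero bytes up to four.
    return struct.unpack("<I", raw.ljust(4, b"\x00"))[0]


for s in ["", "0", "1", "A", "AA", "\u1234"]:
    print(repr(s), utf16_inline_int(s))
```

The integers for the empty and one-character strings match the values I see in the release version (0, 48, 49, 65, 4660), while "AA" needs six bytes and so cannot be stored inline.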
At least for my test case, I can fix this with a bit cast of vk.data_offset into bytes, which I then try to decode following the example of the other parts of the if data_type in [....] true branch:
if vk.data_size >= 0x80000000:
    # data is contained in the data_offset field
    value.size -= 0x80000000
    packed = struct.pack("=l", vk.data_offset)
    actual_value = try_decode_binary(packed, as_json=False, trim_values=False)
(you'll need to import struct of course).
This seems to work with higher Unicode code points as well; if I add a value "valueunicode"="ሴ" then I get Value(name='valueunicode', value=4660, value_type='REG_SZ', is_corrupted=False) on the release version and Value(name='valueunicode', value='ሴ', value_type='REG_SZ', is_corrupted=False) with my fix. I'm not positive that the pack call shouldn't use <l instead of =l, or L instead of l, or i instead of l, etc. For reporting purposes I tried to be conservative and go big, and extra null bytes after the string wouldn't hurt.
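On the format-character question, a standalone sketch of what I think a more portable variant could look like is below. `decode_inline_sz` is a hypothetical helper, not part of regipy, and it decodes directly rather than going through try_decode_binary. It assumes the field should be treated as an unsigned little-endian 32-bit integer, so it uses `<I`: `=l` depends on the host byte order, and a signed `l` raises struct.error for values of 0x80000000 and above.

```python
import struct


def decode_inline_sz(data_offset):
    """Hypothetical helper: reinterpret the 32-bit data_offset field as its
    four raw bytes and decode them as a UTF-16LE string."""
    # Mask to 32 bits so a value parsed as signed still packs cleanly.
    packed = struct.pack("<I", data_offset & 0xFFFFFFFF)
    text = packed.decode("utf-16-le", errors="replace")
    # Keep only what precedes the first null terminator.
    return text.split("\x00", 1)[0]


print(repr(decode_inline_sz(65)))    # one-character ASCII value
print(repr(decode_inline_sz(4660)))  # U+1234
print(repr(decode_inline_sz(0)))     # empty string
```

Whether regipy parses data_offset as signed or unsigned I have not checked, which is why the sketch masks the value before packing.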
I suspect there's the same problem with REG_BINARY just below:
elif data_type in ['REG_BINARY', 'REG_NONE']:
    if vk.data_size >= 0x80000000:
        # data is contained in the data_offset field
        actual_value = vk.data_offset
        # actual_value = struct.pack(...) ?
but I don't have a test case for this and am not set up to easily create one.
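For what it's worth, here is an untested sketch of how the binary case might be handled. `inline_binary` is a hypothetical helper, and I am assuming that the low 31 bits of data_size give the real length (0 to 4 bytes) once the inline flag is cleared, as the `value.size -= 0x80000000` line suggests:

```python
import struct


def inline_binary(data_offset, data_size):
    """Hypothetical helper: return the inline payload of a small value as
    bytes, trimmed to the length encoded in data_size."""
    # Assumption: the high bit flags inline storage, the rest is the length.
    length = data_size & 0x7FFFFFFF
    raw = struct.pack("<I", data_offset & 0xFFFFFFFF)
    return raw[:length]


print(inline_binary(0x04030201, 0x80000004))
print(inline_binary(0x00000041, 0x80000002))
```

Trimming to the stored length matters more here than for strings, since extra null bytes would change a binary value.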
Speaking of the test case, I constructed mine "manually" using libhivex. I am attaching it here. However, I have seen the empty string version of this (coming out with a value of 0 instead of "") in the wild, with my actual registry.
Finally:
$ python
Python 3.11.7 (main, Dec 14 2023, 01:49:40) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import regipy
>>> regipy.__version__
'4.2.1'
Incidentally, this is unrelated to the report above, but I noticed that in the big if-else chain switching on data_type there's a second REG_SZ case that I think can't be hit:
if data_type in ['REG_SZ', 'REG_EXPAND', 'REG_EXPAND_SZ']:
    ...
elif data_type in ['REG_BINARY', 'REG_NONE']:
    ...
elif data_type == 'REG_SZ':  # <--- case subsumed by the first in the sequence
    actual_value = try_decode_binary(value.value, as_json=as_json, trim_values=trim_values)
elif data_type == 'REG_DWORD':
    ...
...
I have a registry file that I concocted (more later) that has a single key with five values in it. Those values should be as follows:
All value types are set to REG_SZ. However, regipy does not parse the single-character strings as, well, strings. Given this script: the output is:
test.registry.zip