Fix DBIStream: true number of NameRef is in the sum of cRefCnt #61

Open
wants to merge 1 commit into
base: master

Conversation


@psrok1 psrok1 commented Jul 30, 2024

Hi and thanks for the great library!

I found that when I try to parse the PDB for combase.dll with GUID 6c146f310d333559974d1d5d3fa2e4da1, it fails to decode some strings contained in the DBI stream structures.

  File "/opt/venvs/drakrun/lib/python3.8/site-packages/pdbparse/__init__.py", line 554, in parse
    return PDB7(f, fast_load)
  File "/opt/venvs/drakrun/lib/python3.8/site-packages/pdbparse/__init__.py", line 521, in __init__
    self.read_root(self.root_stream)
  File "/opt/venvs/drakrun/lib/python3.8/site-packages/pdbparse/__init__.py", line 460, in read_root
    pdb_cls(
  File "/opt/venvs/drakrun/lib/python3.8/site-packages/pdbparse/__init__.py", line 154, in __init__
    self.load()
  File "/opt/venvs/drakrun/lib/python3.8/site-packages/pdbparse/__init__.py", line 276, in load
    debug = dbi.parse_stream(self.stream_file)
  File "/opt/venvs/drakrun/lib/python3.8/site-packages/pdbparse/dbi.py", line 160, in parse_stream
    Name = ("Name" / CString(encoding = "utf8")).parse(Names[NameRef[j]:])
  ...
  File "/opt/venvs/drakrun/lib/python3.8/site-packages/construct/core.py", line 1490, in _decode
    return obj.decode(self.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte

The reason is that cRefCnt does not hold the correct number of names when the true number exceeds 64K, because the field is only 16 bits wide. This behavior is documented here: https://llvm.org/docs/PDB/DbiStream.html#file-info-substream

NumSourceFiles: In theory this is supposed to contain the number of source files for which this substream contains information. But that would present a problem in that the width of this field being 16-bits would prevent one from having more than 64K source files in a program. In early versions of the file format, this seems to have been the case. In order to support more than this, this field of the is simply ignored, and computed dynamically by summing up the values of the ModFileCounts array (discussed below). In short, this value should be ignored.
FileNameOffsets - An array of NumSourceFiles integers (where NumSourceFiles here refers to the 32-bit value obtained from summing ModFileCountArray), where each integer is an offset into NamesBuffer pointing to a null terminated string.

After the fix, combase.pdb is parsed correctly.
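For illustration, here is a minimal standalone sketch of the approach described in the LLVM documentation quoted above. It is not the actual patch and does not use pdbparse's `construct` definitions: the function name `parse_file_info` and the synthetic test buffer are hypothetical, and the field layout is assumed to follow the LLVM description (NumModules, NumSourceFiles, ModIndices, ModFileCounts, FileNameOffsets, NamesBuffer, all little-endian).

```python
import struct


def parse_file_info(data: bytes):
    """Parse a DBI file info substream (layout assumed from the LLVM docs).

    The 16-bit NumSourceFiles header field is read but ignored; the real
    number of file name offsets is the sum of ModFileCounts.
    """
    num_modules, _num_source_files = struct.unpack_from("<HH", data, 0)
    offset = 4

    # ModIndices: one 16-bit entry per module (not needed for the names).
    _mod_indices = struct.unpack_from(f"<{num_modules}H", data, offset)
    offset += 2 * num_modules

    # ModFileCounts: number of source files contributed by each module.
    mod_file_counts = struct.unpack_from(f"<{num_modules}H", data, offset)
    offset += 2 * num_modules

    # True number of name references: a sum that may exceed 16 bits.
    total_files = sum(mod_file_counts)

    # FileNameOffsets: 32-bit offsets into NamesBuffer.
    name_offsets = struct.unpack_from(f"<{total_files}I", data, offset)
    offset += 4 * total_files

    names_buffer = data[offset:]
    names = []
    for name_offset in name_offsets:
        end = names_buffer.index(b"\x00", name_offset)
        names.append(names_buffer[name_offset:end].decode("utf-8"))
    return mod_file_counts, names


# Synthetic substream: 2 modules with 2 and 1 files. The header's
# NumSourceFiles is deliberately wrong (1) to show that it is ignored.
example = (
    struct.pack("<HH", 2, 1)
    + struct.pack("<2H", 0, 2)       # ModIndices
    + struct.pack("<2H", 2, 1)       # ModFileCounts
    + struct.pack("<3I", 0, 4, 8)    # FileNameOffsets
    + b"a.c\x00b.c\x00c.c\x00"       # NamesBuffer
)
counts, names = parse_file_info(example)
```

With the header value trusted instead of the sum, only the first offset would be treated as a name reference and the rest of the offset array would be misread as part of the names buffer, which is consistent with the decode error shown in the traceback.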


psrok1 commented Jul 30, 2024

By the way, I temporarily merged your library code into https://github.com/CERT-Polska/drakpdb, as you haven't made a release for a long time and I can't pin a dependency to a Git commit if I want to publish a dependent package on PyPI.

I must say that I really like the simplicity of your library and the fact that it doesn't give up when it reaches a new, unknown structure or leaf type. I have tested a few libraries on current Windows PDBs, and pdbparse is the only one so far that is able to deliver basic information about exports and simple structures. I have tried the other solutions like:

So I hope you're still interested in maintaining this library and I think I will be coming back with patches from time to time. Cheers!
