Unbound variable when scanning PDF with hex characters #10

Closed
AbdelrahmanKhaledAmer opened this issue Jun 6, 2023 · 1 comment

Comments

@AbdelrahmanKhaledAmer

If a PDF contains names with hex-encoded characters (for example an obfuscated JavaScript tag, where /JavaScript is written as /#4AavaScript), scanning it fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 1096, in PDFiDMain
    ProcessFile(filename, options, plugins, list_of_dict["reports"], disarmed_buffers["buffers"])
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 819, in ProcessFile
    PDFID2Dict(xmlDoc, options.nozero, options.force, list_of_dict)
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 698, in PDFID2Dict
    filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))
NameError: name 'name' is not defined
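
For context, the obfuscation mentioned above relies on the PDF name syntax, in which `#` followed by two hex digits stands for the character with that code. The snippet below is only an illustrative sketch, not code from pdfid; the helper names `decode_pdf_name` and `count_hex_escapes` are made up for this example.

```python
import re

# Matches one "#XX" escape inside a PDF name, where XX is two hex digits.
HEX_ESCAPE = re.compile(r'#([0-9A-Fa-f]{2})')


def decode_pdf_name(name):
    """Replace each #XX escape with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def count_hex_escapes(name):
    """Count how many #XX escapes appear in the name."""
    return len(HEX_ESCAPE.findall(name))


print(decode_pdf_name('/#4AavaScript'))    # /JavaScript
print(count_hex_escapes('/#4AavaScript'))  # 1
```

A keyword that matches only after decoding is what gives a non-zero HexcodeCount, which is the branch that triggers the error above.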

The code responsible is in the function PDFID2Dict. On line 698 it references a variable, name, that is not defined within the scope of the function (or anywhere else, for that matter):

pdfid/pdfid/pdfid.py

Lines 683 to 720 in f7674ff

def PDFID2Dict(xmlDoc, nozero, force, list_of_dict):
    filename_dict = {}
    filename_dict['version'] = xmlDoc.documentElement.getAttribute('Version')
    filename_dict['filename'] = xmlDoc.documentElement.getAttribute('Filename')
    if xmlDoc.documentElement.getAttribute('ErrorOccured') == 'True':
        filename_dict['error_occured'] = xmlDoc.documentElement.getAttribute('ErrorMessage')
        return
    if not force and xmlDoc.documentElement.getAttribute('IsPDF') == 'False':
        filename_dict['error_occured'] = ' Not a PDF document\n'
        return
    filename_dict['header'] = xmlDoc.documentElement.getAttribute('Header')
    for node in xmlDoc.documentElement.getElementsByTagName('Keywords')[0].childNodes:
        if not nozero or nozero and int(node.getAttribute('Count')) > 0:
            filename_dict[node.getAttribute('Name')] = int(node.getAttribute('Count'))
            if int(node.getAttribute('HexcodeCount')) > 0:
                filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))
    if xmlDoc.documentElement.getAttribute('CountEOF') != '':
        filename_dict['eof'] = int(xmlDoc.documentElement.getAttribute('CountEOF'))
    if xmlDoc.documentElement.getAttribute('CountCharsAfterLastEOF') != '':
        filename_dict['after_last_eof'] = int(xmlDoc.documentElement.getAttribute('CountCharsAfterLastEOF'))
    for node in xmlDoc.documentElement.getElementsByTagName('Dates')[0].childNodes:
        filename_dict[node.getAttribute('Value')] = node.getAttribute('Name')
    if xmlDoc.documentElement.getAttribute('TotalEntropy') != '':
        filename_dict['entropy'] = {
            "total": xmlDoc.documentElement.getAttribute('TotalEntropy'),
            "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('TotalCount')
        }
    if xmlDoc.documentElement.getAttribute('StreamEntropy') != '':
        filename_dict['entropy_inside_streams'] = {
            "total": xmlDoc.documentElement.getAttribute('StreamEntropy'),
            "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('StreamCount')
        }
    if xmlDoc.documentElement.getAttribute('NonStreamEntropy') != '':
        filename_dict['entropy_outside_streams'] = {
            "total": xmlDoc.documentElement.getAttribute('NonStreamEntropy'),
            "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('NonStreamCount')
        }
    list_of_dict.append(filename_dict)
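
The keyword loop can be exercised in isolation. The sketch below is not the project's fix (the actual change is in the commit referenced further down); it only shows what the loop would produce under the assumption that name was meant to be the node's Name attribute. The sample XML and the function name `keywords_to_dict` are hypothetical and exist only for this illustration.

```python
from xml.dom import minidom

# Hypothetical sample document. It is written without whitespace between
# elements so that childNodes contains only element nodes.
SAMPLE_XML = (
    '<PDFiD>'
    '<Keywords>'
    '<Keyword Name="/JavaScript" Count="1" HexcodeCount="1"/>'
    '<Keyword Name="/JS" Count="0" HexcodeCount="0"/>'
    '</Keywords>'
    '</PDFiD>'
)


def keywords_to_dict(xml_doc, nozero=False):
    """Collect keyword counts, plus a hexcode count where it is non-zero."""
    result = {}
    keywords = xml_doc.documentElement.getElementsByTagName('Keywords')[0]
    for node in keywords.childNodes:
        # Assumption: this is the value the undefined `name` was meant to hold.
        name = node.getAttribute('Name')
        count = int(node.getAttribute('Count'))
        if not nozero or count > 0:
            result[name] = count
            hexcode_count = int(node.getAttribute('HexcodeCount'))
            if hexcode_count > 0:
                result['%s_hexcode_count' % name] = hexcode_count
    return result


doc = minidom.parseString(SAMPLE_XML)
print(keywords_to_dict(doc))
# {'/JavaScript': 1, '/JavaScript_hexcode_count': 1, '/JS': 0}
```

With every HexcodeCount at zero the inner branch is never entered, which would explain why the NameError only shows up for PDFs that use hex-encoded names.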

I cannot provide a fix since I do not know what name is supposed to be in the first place. If anyone can help, that would be much appreciated. :)

@mlodic mlodic closed this as completed in f46f702 Jun 6, 2023
@mlodic
Owner

mlodic commented Jun 6, 2023

I made a fix and published a release that includes it. Please try it out with your sample; it should work now.
