Unbound variable when scanning PDF with hex characters #10

AbdelrahmanKhaledAmer · 2023-06-06T13:22:41Z

If a PDF contains names with hex-encoded characters (for example, obfuscated JS tags such as /JavaScript --> /#4AavaScript), scanning it fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 1096, in PDFiDMain
    ProcessFile(filename, options, plugins, list_of_dict["reports"], disarmed_buffers["buffers"])
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 819, in ProcessFile
    PDFID2Dict(xmlDoc, options.nozero, options.force, list_of_dict)
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 698, in PDFID2Dict
    filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))
NameError: name 'name' is not defined
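For context on the kind of input that triggers this: a PDF name may write any character as # followed by two hex digits, so /#4AavaScript is read as /JavaScript. The snippet below is a minimal sketch of that decoding, not code from pdfid; the helper name decode_pdf_name is hypothetical and exists only for illustration.

```python
import re


def decode_pdf_name(name: str) -> tuple[str, int]:
    """Decode #XX hex escapes in a PDF name token.

    Hypothetical helper for illustration; returns the decoded name and
    the number of hex escapes that were found.
    """
    count = 0

    def _replace(match: re.Match) -> str:
        nonlocal count
        count += 1
        # Two hex digits after '#' encode a single character code.
        return chr(int(match.group(1), 16))

    decoded = re.sub(r'#([0-9A-Fa-f]{2})', _replace, name)
    return decoded, count


print(decode_pdf_name('/#4AavaScript'))  # ('/JavaScript', 1)
```

A non-zero escape count is what puts a keyword on the HexcodeCount branch shown in the traceback.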

The code responsible for the NameError is in the function PDFID2Dict. Line 698 references a variable, name, that is not defined within the scope of the function (or anywhere else, for that matter):

pdfid/pdfid/pdfid.py

Lines 683 to 720 in f7674ff

    
    def PDFID2Dict(xmlDoc, nozero, force, list_of_dict):
        filename_dict = {}
        filename_dict['version'] = xmlDoc.documentElement.getAttribute('Version')
        filename_dict['filename'] = xmlDoc.documentElement.getAttribute('Filename')
        if xmlDoc.documentElement.getAttribute('ErrorOccured') == 'True':
            filename_dict['error_occured'] = xmlDoc.documentElement.getAttribute('ErrorMessage')
            return
        if not force and xmlDoc.documentElement.getAttribute('IsPDF') == 'False':
            filename_dict['error_occured'] = ' Not a PDF document\n'
            return
        filename_dict['header'] = xmlDoc.documentElement.getAttribute('Header')
        for node in xmlDoc.documentElement.getElementsByTagName('Keywords')[0].childNodes:
            if not nozero or nozero and int(node.getAttribute('Count')) > 0:
                filename_dict[node.getAttribute('Name')] = int(node.getAttribute('Count'))
                if int(node.getAttribute('HexcodeCount')) > 0:
                    filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))
        if xmlDoc.documentElement.getAttribute('CountEOF') != '':
            filename_dict['eof'] = int(xmlDoc.documentElement.getAttribute('CountEOF'))
        if xmlDoc.documentElement.getAttribute('CountCharsAfterLastEOF') != '':
            filename_dict['after_last_eof'] = int(xmlDoc.documentElement.getAttribute('CountCharsAfterLastEOF'))
        for node in xmlDoc.documentElement.getElementsByTagName('Dates')[0].childNodes:
            filename_dict[node.getAttribute('Value')] = node.getAttribute('Name')
        if xmlDoc.documentElement.getAttribute('TotalEntropy') != '':
            filename_dict['entropy'] = {
                "total": xmlDoc.documentElement.getAttribute('TotalEntropy'),
                "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('TotalCount')
            }
        if xmlDoc.documentElement.getAttribute('StreamEntropy') != '':
            filename_dict['entropy_inside_streams'] = {
                "total": xmlDoc.documentElement.getAttribute('StreamEntropy'),
                "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('StreamCount')
            }
        if xmlDoc.documentElement.getAttribute('NonStreamEntropy') != '':
            filename_dict['entropy_outside_streams'] = {
                "total": xmlDoc.documentElement.getAttribute('NonStreamEntropy'),
                "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('NonStreamCount')
            }
        list_of_dict.append(filename_dict)

I cannot provide a fix since I do not know what name is supposed to be in the first place. If anyone can help, that would be much appreciated. :)
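One guess, and only a guess, is that name was meant to be the keyword's Name attribute, since that is the key used on the line just above it. The sketch below shows the keyword loop under that assumption. The function name keywords_to_dict, the sample XML, and the PDFiD and Keyword element names are made up for illustration; only the Name, Count and HexcodeCount attributes come from the code quoted above.

```python
from xml.dom import minidom

# Hypothetical sample input; the element names are assumptions.
SAMPLE_XML = (
    '<PDFiD><Keywords>'
    '<Keyword Name="/JavaScript" Count="1" HexcodeCount="1"/>'
    '<Keyword Name="/Page" Count="2" HexcodeCount="0"/>'
    '<Keyword Name="/OpenAction" Count="0" HexcodeCount="0"/>'
    '</Keywords></PDFiD>'
)


def keywords_to_dict(xml_doc, nozero=False):
    """Collect keyword counts, binding the name before it is used."""
    result = {}
    keywords = xml_doc.documentElement.getElementsByTagName('Keywords')[0]
    for node in keywords.childNodes:
        # Assumption: 'name' was intended to be the Name attribute.
        name = node.getAttribute('Name')
        count = int(node.getAttribute('Count'))
        hexcode_count = int(node.getAttribute('HexcodeCount'))
        if not nozero or count > 0:
            result[name] = count
            if hexcode_count > 0:
                result['%s_hexcode_count' % name] = hexcode_count
    return result


doc = minidom.parseString(SAMPLE_XML)
print(keywords_to_dict(doc, nozero=True))
# {'/JavaScript': 1, '/JavaScript_hexcode_count': 1, '/Page': 2}
```

Whether this matches the intended key format is something only the maintainers can confirm.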

mlodic · 2023-06-06T14:12:22Z

I made a fix and published a release that includes it. Please try it with your sample; it should work now.

mlodic closed this as completed in f46f702 Jun 6, 2023
