The swiss army knife of PDF Analysis Tools. Based on peepdf - http://peepdf.eternal-todo.com. This README builds on the peepdf README.
This tool was developed as part of my M.Sc dissertation/project for the School of Computing (University of Kent, Canterbury, UK). The man behind the idea for this tool was Julio Hernandez-Castro (www.azhala.com).
- In order to crack passwords:
- PdfCrack needed (apt-get install pdfcrack)
- In order to remove DRM (editing, copying Etc.):
- Calibre's ebook-convert needed (apt-get install calibre)
- In order to decrypt PDFs:
- qpdf needed (apt-get install qpdf)
- In order to use the command redact:
- NLTK (Natural Language ToolKit) needed (apt-get install python-nltk)
- Java (Stanford parser is written in Java) needed (apt-get install default-jre)
- To support XML output "lxml" is needed:
- Included modules: lzw, colorama, jsbeautifier, ccitt, pythonaes (Thanks to all the developers!!)
No installation is needed apart of the commented dependencies, just execute:
There are two important options when ParanoidF is executed:
-f: Ignores the parsing errors. Analysing malicious files propably leads to parsing errors, so this parameter should be set. -l: Sets the loose mode, so does not search for the endobj tag because it's not obligatory. Helpful with malformed files.
- Simple execution
Shows the statistics of the file after being decoded/decrypted and analysed:
python paranoiDF.py [options] pdf_file
- Interactive console
Executes the interactive console, giving a wide range of tools to play with.
python paranoiDF.py -i
- Batch execution
It's possible to use a commands file to specify the commands to be executed in the batch mode. This type of execution is good to automatise analysis of several files:
python paranoiDF.py [options] -s script_file
If the information shown when a PDF file is parsed is not enough to know if it's harmful or not, the following commands can help to do it:
Shows the tree graph of the file or specified version. Here we can see suspicious elements.
Shows the physical map of the file or the specified version of the document. This is helpful to see unusual big objects or big spaces between objects.
Search the specified string or hexadecimal string in the objects (decoded and encrypted streams included).
Shows the (raw) content of the object.
Shows the (raw) content of the stream.
TODO (with date that I intend to start work on)
- Refine and test redaction thresholds: 30/08/2014
- Add automation of retrieval of redaction box information such as font size, font and redaction box coordinates: 30/08/2014
- Add other encryption algorithms (such as the AES): 1/10/2014
- Digital Signatures analysis: 1/10/2014
- Add a GUI
Feel free to send bugs/criticisms/praises/comments to patrickdw123(at)gmail(dot)com.