process.py locking up on certain pdfs #54
PS: I was eyeing the CSB2 process.py, but it was sufficiently different that I gave up on that idea (just replacing the spender-sandbox ones doesn't work).
We see this too, extremely frequently.
Hmm, the workaround tip is much appreciated. Do you wrap the self.parse here? Or even further outside, somehow indicating that part of static analysis (or all of it) failed? (Would you mind sharing?)

log.debug("Starting to load PDF")

I was thinking of wrapping the results bit, but that might be too deep, and things above might not like partially populated results. Along the lines of:

from interruptingcow import timeout
...
log.debug("Starting to load PDF")
...
except RuntimeError:
    return results

While on the subject of PDFs, @fryyyy: do you also often get corrupted PDFs locking up the reader on opening in the guest VM and producing large BSON logs that consume a lot of memory during CSB host processing? Any thoughts on that? (I just increased RAM in line with the number of parallel tasks and reduced the max log size, but that doesn't make me feel warm and fuzzy.)
Unfortunately I'm not permitted to share code, though I am working on getting permission.
I see. No worries, and thanks again.
If someone gives me a hash for one of these PDFs, I can fix the actual problem instead of working around it. -Brad
Thanks Brad. I think fryyyy and I assumed (or Fry may have checked) that it's upstream in peepdf; I suppose checking is one breakpoint or print statement away. Hence looking for workarounds. I'll have a look tomorrow, and if one is already shared on malwr or HA, I'll supply a hash. If it's a public one unrelated to my employer, I'm happy to share. I can probably try to get permission to share a work-related one, but it would have to go out of band with an archive password. Is the latter an option?
PS: the PDF-specific issue aside, are you sure there shouldn't be a timeout feature per processing module, signature, and reporting step (or is it not worth it / does it normally just work)? I'd rather find out from syslog alerts saying "warning: task processing failed" than find all tasks hung next time. (From an ops monitoring perspective, bounded run times feel all warm and fuzzy inside.)
I've modified static.py to dump the PDFs out to a temp directory when it triggers the timeout. Once we've got a few, I'll see if I can get some cleared for release or at least do some analysis on them to see if there's anything similar about them. |
3eaef2ca2c9d29e936919c7c6f8e5614aef6edf8cec6c92008291bafea0388d0 took more than a minute to statically analyse, which hits our timeout. Edit for additional samples: 1e3db20bb77178cabe8e32a47510a027bb38bc585ed02a95052e3965ac9a9b26 |
I seem to stumble upon one of these every once in a while. Here's the latest one, where I turned up as much debug as I could. The jserror log does not have any timestamps, though. If you need the PDF, just let me know. SHA256: 2ab11d83ae2cbd12f0f6c30aacad8a8e16df5255646d08e923054b9f521c4b83
PS: in case anyone's interested in the specifics of what Fryyy was suggesting, what we've done for now as a workaround is wrap the whole static analysis case statement in interruptingcow (appreciate the tip, Fry). We could optionally copy the offending files off somewhere, but for now we just print task ID info in case it's needed later. For anyone who wants it:

pip install interruptingcow
vi modules/processing/static.py

Then add the new import and replace Static(Processing) with a wrapped version.
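For reference, interruptingcow implements its timeout on top of SIGALRM, so the same pattern can be sketched self-contained with only the stdlib signal module. Everything here is illustrative, not the actual static.py code: parse_pdf is a hypothetical stand-in for the peepdf call that hangs, and StaticTimeout plays the role of interruptingcow's RuntimeError.

```python
import signal

class StaticTimeout(RuntimeError):
    """Raised when static analysis exceeds its time budget."""

def run_with_timeout(func, seconds):
    """Run func(), raising StaticTimeout if it runs longer than `seconds`.
    Unix-only: relies on SIGALRM, the same mechanism interruptingcow uses."""
    def _handler(signum, frame):
        raise StaticTimeout("static analysis timed out")
    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)          # schedule SIGALRM in `seconds` seconds
    try:
        return func()
    finally:
        signal.alarm(0)            # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)

def parse_pdf():
    # Hypothetical stand-in for the peepdf parse that never returns.
    while True:
        pass

try:
    results = {"pdf": run_with_timeout(parse_pdf, 1)}
except StaticTimeout:
    results = {}   # static analysis partially failed; the task continues
```

In the real static.py the body of Static.run would sit where parse_pdf is called, so a hung peepdf parse only fails that one task's static section instead of wedging the whole processing worker.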
Here's a PDF that I can consistently get PDFParser to stall out on. Let me know if you need more information. ed2732bce8351b839e924a0cf5512ce90fcdfd4274796824bab40eb2d1850ff0.pdf
Garrr.. I'm still having problems with this. @mallorybobalice - I tried your wrapper, but it doesn't seem to send an interrupt? Any help with this would be great. I'm thinking about just restarting the processing script every hour, or slapping something together to regenerate unprocessed tasks.
Hmm, processing log please (or the cuckoo log, if processing is not done via the utils helper). Mine interrupts OK. Are you sure you definitely installed interruptingcow and added the import, aside from the wrapper? Thing is, restarting processing won't help if it tries the same file again. Interrupting partially fails static analysis but continues.
PS: and a copy of the modified modules/processing/static.py, please.
And a sample freezing PDF to try, if different from the ones above.
PS: but really we need to open an issue here: https://github.com/jesparza/peepdf
That said, personally I'd rather have the interrupt code to make static analysis time-bounded.
class Static(Processing):

==> jserror.log <==
==> errors.txt <==
==> jserror.log <==

Additionally, and strangely, this seems to happen the most when I submit things over the API. I didn't think that should matter, but I see the behavior a lot when I submit that way.
File "/usr/local/lib/python2.7/dist-packages/interruptingcow/__init__.py", line 24, in handler
    raise exception
RuntimeError

That sounds like interruptingcow is working? The expectation is that it raises a RuntimeError after hitting the timeout.
Although the line number (24) is a bit weird. Same version?
I have analysed the PDF posted by housemusic42 a while ago. Processing took 60.978 seconds. The system is Manjaro/Arch with the very latest versions of the modules installed. I have uploaded the whole storage folder; maybe it can be helpful for you guys. :)
Same problem here. I have two cuckoos (supposed to have the same config and updates), but on one the processing ends with no problem, while on the other the process locks up in pdfparser.py.
Which bits are from Didier Stevens - pdfid?
Both: pdfparser and pdfid.
Hi,

We seem to have a "lucky streak" of corrupted PCAP-extracted PDFs that cause processing to hang indefinitely. (Normally executables and other PDFs submit, process, and report OK.)

While I'll try to post a PCAP, that might not be possible (can Brad message me privately re that?).

Anyhow, we're running processing tasks not within the main cuckoo process, but via the utils/process.py helper, e.g.:

/utils/process.py -d -p 7 auto

Eventually, if enough corrupted PDFs get submitted, enough of those tasks queue up that processing halts completely, as all the process.py worker processes end up hung.
2016-02-18 23:03:57,947 [modules.processing.static] DEBUG: Starting to load PDF
2016-02-18 23:04:00,365 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:03,622 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:06,989 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:07,711 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:18,115 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:18,923 [modules.processing.static] DEBUG: About to parse with PDFParser
2016-02-18 23:04:19,794 [modules.processing.static] DEBUG: About to parse with PDFParser
^ And then nothing happens. Without extra debug statements in static.py after instantiating the PDF parser, I can't tell whether it freezes there or during result processing; I suspect it's in PDFParser / peepdf's PDFCore.

Occasionally processing seems to restart (probably when I kick it over), and then the same set of tasks freezes again.

As far as I can tell, static analysis happens via peepdf. (The version in their repo is a bit newer, by the way, though using that still makes processing freeze. While on the subject: did they merge the Google Summer of Code PDF malscore changes, and did we pull them into CSB or brad-csb?)

So anyhow, I was hoping there's a quick workaround someone (who knows a bit more Python, and about safely killing processing tasks and marking them as processing- or reporting-failed) could help with. For example: a watchdog timer in process.py with an extra config setting, so that when processing takes over 600 seconds the task gets marked as failed and the helper process for that task ID terminates itself, preferably safely.
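A minimal sketch of that watchdog idea, under the assumption that each task's processing runs in its own child process. The names and defaults are illustrative (the real process.py would also need to mark the task as failed in the database rather than just returning False), and it is Unix-only:

```python
import os
import signal
import time

def process_with_watchdog(run_task, timeout_sec=600):
    """Run run_task() in a forked child; SIGKILL it if it exceeds timeout_sec.
    Returns True on a clean exit, False if the child failed or was killed."""
    pid = os.fork()
    if pid == 0:
        # Child: do the (possibly hanging) processing work, then exit.
        try:
            run_task()
            os._exit(0)
        except Exception:
            os._exit(1)
    # Parent: poll the child until the deadline, then kill the straggler.
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid:
            return os.WEXITSTATUS(status) == 0
        time.sleep(0.1)
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)  # reap; here the task would be marked failed_processing
    return False
```

With something like process_with_watchdog(lambda: process_task(task_id), timeout_sec=600) in the helper's worker loop, a task that blows its budget gets killed and reported instead of wedging the whole pool.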
Thoughts / questions / please help?
Mb.