
python process pileup, needs reproduction steps. #150

Closed
vadi2 opened this issue Feb 9, 2019 · 11 comments

Comments

@vadi2

vadi2 commented Feb 9, 2019

Terrible issue title but we'll fix it to something better (or close it) with more investigation.

I left clusterfuzz running overnight trying to find heartbleed and the fuzzer seems to have hung according to the bot logs:

2019-02-09 03:04:35,256 - run_bot - INFO - Locating generated test cases.
2019-02-09 03:04:35,257 - run_bot - INFO - Generated 4/4 testcases.
2019-02-09 03:04:35,269 - run_bot - INFO - Uploaded file to logs bucket.
2019-02-09 03:04:35,331 - run_bot - INFO - Recorded use of fuzz target libFuzzer_handshake-fuzzer.
2019-02-09 03:04:38,720 - run_bot - INFO - 503 corpus files for target handshake-fuzzer synced to disk.
2019-02-09 03:04:38,768 - run_bot - INFO - Starting to process testcases.
2019-02-09 03:04:38,768 - run_bot - INFO - Redzone is 64 bytes.
2019-02-09 03:04:38,769 - run_bot - INFO - Timeout multiplier is 1.0.
2019-02-09 03:04:38,773 - run_bot - INFO - App launch command is python /home/vadi/Programs/Mudlet/mudlet1/clusterfuzz/src/python/bot/fuzzers/libFuzzer/launcher.py .
2019-02-09 04:25:11,600 - run_bot - WARNING - Hang detected.
None
2019-02-09 04:25:11,600 - run_bot - INFO - Upto 1
2019-02-09 05:45:44,376 - run_bot - WARNING - Hang detected.
None
2019-02-09 05:45:44,376 - run_bot - INFO - Upto 2
2019-02-09 07:06:17,193 - run_bot - WARNING - Hang detected.
None
2019-02-09 07:06:17,194 - run_bot - INFO - Upto 3

(full log)

This has resulted in quite a few Python processes from clusterfuzz maxing out the system's CPU, with the clusterfuzz server, by the looks of it, frantically trying to stop them: https://paste.ubuntu.com/p/b8JW2sX44w/

The end result of this is that the web interface is unresponsive and all calls time out with a 503.

@inferno-chromium
Collaborator

It already found the heartbleed, see the logs:

2019-02-08 14:40:40,543 - run_bot - INFO - Process the crash group (file=fuzz-1, fuzzed_key=25002300-d91f-4b14-b894-6e2a5f4ed858, return code=1, crash time=4, crash type=Heap-buffer-overflow
READ {*}, crash state=tls1_process_heartbeat
ssl3_read_bytes
ssl3_get_message

and there should be a testcase on localhost:9000.

Can you paste the job definitions you have? Maybe some processes are piling up. Did you modify any templates or anything? Are you running AFL or libFuzzer? We need more reproduction instructions.

@inferno-chromium inferno-chromium changed the title Clusterfuzz went awry python process pileup, needs reproduction steps. Feb 9, 2019
@inferno-chromium
Collaborator

Can you also start from a clean slate and see if you can reproduce? For example, try pkill -9 -f clusterfuzz and pkill -9 -f gcloud.
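After killing everything, it can help to verify that nothing is left over before restarting. Here is a minimal sketch of that check (an assumption, not part of the project: it simply shells out to pgrep -f, which the pkill commands above already imply is available on the host):

```python
import subprocess

def leftover_processes(pattern):
    """Return the command lines of processes whose full command line
    matches `pattern` (via pgrep -af); [] when nothing matches."""
    result = subprocess.run(
        ["pgrep", "-af", pattern], capture_output=True, text=True
    )
    return [line for line in result.stdout.splitlines() if line]

if __name__ == "__main__":
    # Patterns taken from the pkill commands suggested above.
    for pattern in ("clusterfuzz", "gcloud"):
        procs = leftover_processes(pattern)
        print("%s: %d leftover process(es)" % (pattern, len(procs)))
```

If either count is non-zero after the pkill commands, something respawned and the restart is not actually starting from a clean slate.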

@vadi2
Author

vadi2 commented Feb 9, 2019

I followed the steps exactly for heartbleed, should be the same as on the wiki. It says it found heartbleed right away - but I was checking the test cases in the UI and it was empty from 2pm - 12am. Did not change any templates.

@inferno-chromium
Collaborator

This could be related to your broken config as in #136. Your ports seem not to be free; can you try a restart and see if you can still reproduce?

@inferno-chromium
Collaborator

Also, which platform and OS version are you running?

@vadi2
Author

vadi2 commented Feb 9, 2019

I doubt those ports are blocked: as I showed, checking the ports right after clusterfuzz tried using them revealed them to be unused. I restarted anyway!

I'm on Ubuntu 18.04 LTS.
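For what it's worth, this kind of port check is easy to script. A hedged sketch (the only port confirmed by this thread is 9000 for the local web UI; any other ports your setup uses are an assumption you would add yourself):

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection.
        return s.connect_ex((host, port)) != 0

if __name__ == "__main__":
    # 9000 is where the local ClusterFuzz web UI is expected (see above);
    # extend the tuple with whatever other local ports your setup uses.
    for port in (9000,):
        state = "free" if port_is_free(port) else "in use"
        print("port %d: %s" % (port, state))
```

Running this immediately after the "Hang detected" warnings appear would show whether the server actually lost its port, instead of checking by hand.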

@vadi2
Author

vadi2 commented Feb 9, 2019

I've restarted and heartbleed indeed now shows up in the test cases. Should I do anything else?

@vadi2
Author

vadi2 commented Feb 9, 2019

Metadata says the following:

	[2019-02-08 13:06:35 UTC] mudlet1: Fuzz task : Fuzzer libFuzzer_handshake-fuzzer generated testcase crashed in 3 seconds (r1). 
	[2019-02-08 13:09:01 UTC] mudlet1: Minimize task started. 
	[2019-02-08 13:24:16 UTC] mudlet1: Minimize task finished. 
	[2019-02-09 00:19:37 UTC] mudlet1: Progression task started. 
	[2019-02-09 00:19:39 UTC] mudlet1: Progression task finished. 

Which could explain why I didn't see the test case for many hours.

@inferno-chromium
Collaborator

Something was up with clearing ports. Basically, run_bot could not send the testcase to run_server, since run_server wasn't running (it was failing on not being able to bind its port). run_bot was finding the crash fine, but run_server didn't get it.
Right now, the log in your last message is correct, so everything should be working fine. If you ever hit this again, try to remember the steps. Use ctrl+c to kill run_server and run_bot, rather than killing a particular process id. We can reopen the bug if you hit this again.

@mhlakhani

I'm also running into this, but with less impact.

I did a fresh install on an Ubuntu 18.04 LTS VM and followed the prerequisites: https://google.github.io/clusterfuzz/getting-started/prerequisites/

I then ran the local server, went to the web UI, and created a bot instance a few minutes later in a separate tab. I see the port error in my logs, though interestingly enough things work (I see the test case in the testcases UI).

My full server logs are here: https://pastebin.com/JKya8WNB - bot logs are here: https://pastebin.com/YN6vCRBZ

Hope this helps!

@inferno-chromium
Collaborator

@mhlakhani - this should get fixed as part of #197. These never happen in production since we use App Engine cron. Locally, we create some threads that run every 60 secs; we can make them more reliable.

@oliverchang as fyi.
