Local development: make triage and cleanup cron more reliable (not in production) #197

kcwu · 2019-02-13T12:40:34Z

I am running a local server and a local bot.
I added a fuzzing job and uploaded a custom build. The bot ran and generated a few crashes, and these cases appear on the "Testcases" page. But it hadn't found one particular crashing testcase that I found by other means.
So I uploaded that crashing testcase (on the "Upload Testcase" page).
The bot verified that the testcase does indeed crash.
This crashing testcase didn't appear on the "Testcases" page. Is this expected behavior (uploaded cases aren't added to the testcase collection)?
I guessed yes, so I continued with the next steps.
I uploaded a new custom build (on the "Jobs" page: change the file of the existing job, then save). It was just a minor change that doesn't matter.
I added more bots, hoping they would find that crashing case by themselves faster.
The bots picked up the new build and did find the crashing testcase (*).
But I still don't see the crashing testcase on the "Testcases" page.

(*) I know the bots found the crash from the fuzzing logs in local/storage/local_gcs/test-fuzz-logs-bucket/objects/.

The new testcase is a stack-overflow, which is a different "crash type" from the existing groups (timeout, assert, integer-overflow, etc.), so it's not hidden due to grouping.

inferno-chromium · 2019-02-13T15:27:24Z

Do you see any files in local/storage/local_gcs/test-blobs-bucket/objects/? If yes, that confirms those testcase files got saved. If not, there is some issue with your port forwarding setup.

If you see files, the next thing to try is to go to http://localhost:9002/cron, scroll to the bottom, and run the triage cron manually. Wait a minute or two, then refresh http://localhost:9000 to see if the testcase shows up there.
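For the first check, a quick way to see whether anything was saved recently is to list the newest files under the blobs directory. This is only a minimal sketch: the helper name `newest_files` is made up for illustration, and it assumes you run it from the checkout root so that the relative path from the comment above resolves.

```python
import os
import time


def newest_files(directory, limit=5):
    """Return up to `limit` (mtime, path) pairs under `directory`, newest first."""
    entries = []
    # Walk recursively, since the exact layout below objects/ is not assumed here.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            entries.append((os.path.getmtime(path), path))
    entries.sort(reverse=True)
    return entries[:limit]


if __name__ == "__main__":
    # Path taken from the comment above; adjust if your checkout differs.
    blobs_dir = "local/storage/local_gcs/test-blobs-bucket/objects/"
    if os.path.isdir(blobs_dir):
        for mtime, path in newest_files(blobs_dir):
            print(time.ctime(mtime), path)
    else:
        print("no such directory:", blobs_dir)
```

If the newest modification times predate your upload, the testcase file most likely was not saved.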

jonathanmetzman · 2019-02-13T15:43:15Z

> This crashing testcase didn't appear on the "Testcases" page. Is this expected behavior (uploaded cases aren't added to the testcase collection)?

No. When you go to localhost:9000/upload-testcase, do you see the testcase you uploaded? What happens when you click on it? Is that same testcase not showing up when you go to localhost:9000/testcases?

kcwu · 2019-02-13T16:02:41Z

> Do you see any files in local/storage/local_gcs/test-blobs-bucket/objects/? If yes, that confirms those testcase files got saved. If not, there is some issue with your port forwarding setup.

I see lots of files in local/storage/local_gcs/test-blobs-bucket/objects/, but the testcase I'm interested in is not there. Are those objects minimized testcases, unminimized ones, or both?

My bots run on localhost, so it should not be related to port forwarding.

I will try the triage cron. However, the testcases haven't appeared for many hours, so I guess it won't help.

> No. When you go to localhost:9000/upload-testcase, do you see the testcase you uploaded? What happens when you click on it? Is that same testcase not showing up when you go to localhost:9000/testcases?

Right, I can see my uploaded testcases in /upload-testcase, and the bots have already verified that they crash. Some of them are labeled "Confirmed" and some are labeled "Duplicated".
However, they don't show up on the /testcases page.

kcwu · 2019-02-13T16:04:22Z

Oops, they appeared after I ran the triage cron manually!

inferno-chromium · 2019-02-13T17:18:48Z

@kcwu - how long was your run_server running? Did you see any stack trace that can tell how the triage cron died? Can you start up a fresh run_server instance with --bootstrap and retry, just to see if the triage cron goes down again?

kcwu · 2019-02-13T17:39:17Z

My run_server has been running for more than 20 hours (before #177). The console log is longer than my terminal scrollback history (about 2.5 hours), so I don't have the stack trace. Is the server log written to disk somewhere?

In the existing console log, there is only one "GET /triage" request in the past 2.5 hours, which is my manual one. In other words, the cron stopped for some reason.

I guess #177 made the cron go down.

kcwu · 2019-02-13T17:45:09Z

More info: there are no "cron-service" messages in the server's console log in the past 2.5 hours. So it's not only the triage cron that is down; cron has stopped entirely.

After restarting the server, I can see "cron-service" messages in the server's console log again.

inferno-chromium · 2019-02-13T18:18:34Z

Can you leave the server running for another 20 hours with fuzzing ongoing? Let's see if triage goes down for you. You can check whether the triage cron is running in the run_server log on the console.

inferno-chromium · 2019-02-13T18:27:38Z

Closing for now since #177 is an unsupported use case for local. If you can still reproduce it, we will reopen.

kcwu · 2019-02-14T03:51:46Z

This is reproducible.

Restart the server.
There are several "cron-service: "GET /triage HTTP/1.1" 200 -" lines in the console log after server start (about 1 per minute).
One /triage request gets a 503 error.
No more /triage requests follow (for at least 1.5 hours). Since the /triage cron is scheduled once per hour, I guess it's really dead.

| INFO     2019-02-14 02:16:47,515 module.py:861] cron-service: "GET /triage HTTP/1.1" 503 59
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "src/local/butler/run_server.py", line 106, in trigger
    response = urllib2.urlopen(request, timeout=request_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 503: Service Unavailable

BTW, a similar error happened for /cleanup as well. That is:

The server starts.
The first /cleanup request appears 1 minute after server start.
Then /cleanup gets a 503, with the same message as above.
No more /cleanup requests follow.
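The pattern described above (a run of 200s, one 503, then silence) can be confirmed by scanning a captured console log. The sketch below assumes the log lines follow the format quoted in this comment; the function name `summarize_cron_requests` and the idea of saving the console output to a list of lines are illustrative assumptions, not part of run_server.

```python
import re
from collections import Counter

# Matches lines such as:
#   INFO     2019-02-14 02:16:47,515 module.py:861] cron-service: "GET /triage HTTP/1.1" 503 59
CRON_LINE = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*?'
    r'cron-service: "GET (\S+) HTTP/1\.1" (\d{3})'
)


def summarize_cron_requests(lines):
    """Return (counts, last_seen) for cron-service GET requests.

    counts maps (path, status) to the number of matching lines;
    last_seen maps each path to the (timestamp, status) of its final request.
    """
    counts = Counter()
    last_seen = {}
    for line in lines:
        match = CRON_LINE.search(line)
        if match is None:
            continue  # tracebacks and other services' lines are skipped
        timestamp, path, status = match.group(1), match.group(2), int(match.group(3))
        counts[(path, status)] += 1
        last_seen[path] = (timestamp, status)
    return counts, last_seen


if __name__ == "__main__":
    # The first line is an illustrative sample with a made-up timestamp;
    # the second is the 503 line quoted in this thread.
    sample = [
        'INFO     2019-02-14 02:15:47,100 module.py:861] cron-service: "GET /triage HTTP/1.1" 200 -',
        'INFO     2019-02-14 02:16:47,515 module.py:861] cron-service: "GET /triage HTTP/1.1" 503 59',
    ]
    counts, last_seen = summarize_cron_requests(sample)
    for path, (timestamp, status) in sorted(last_seen.items()):
        print(path, "last request at", timestamp, "with status", status)
```

If the last request for a path ended in a 503 and nothing follows it, that matches the dead-cron symptom reported here.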

inferno-chromium · 2019-02-14T23:40:37Z

What caused the 503? Any explanation would help us fix it.

inferno-chromium · 2019-02-16T01:33:27Z

@kcwu @mhlakhani - this should be fixed now. Please comment back if you see any similar issues. Locally, crons should now continue to work even when a request fails.
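The traceback earlier in the thread shows the failure mode: an HTTPError raised by `urlopen` escaped the trigger thread, so that thread exited and the cron was never requested again. Below is a minimal sketch of the "ignore exceptions when triggering crons" idea. It is not the actual change to src/local/butler/run_server.py: it uses Python 3's `urllib.request` rather than the `urllib2` seen in the traceback, and the names `fetch_status`, `trigger_loop`, `fetch_fn`, and `max_runs`, as well as the demo URL, are hypothetical.

```python
import time
import urllib.request


def fetch_status(url, timeout=60):
    """Issue one GET and return the HTTP status code (raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def trigger_loop(url, interval_seconds, fetch_fn=fetch_status,
                 sleep=time.sleep, max_runs=None):
    """Request `url` repeatedly, logging failures instead of letting them escape.

    Returns the number of failed requests once `max_runs` is reached;
    with max_runs=None the loop never returns.
    """
    runs = 0
    failures = 0
    while max_runs is None or runs < max_runs:
        try:
            fetch_fn(url)
        except Exception as error:  # e.g. an HTTPError for a 503 response
            failures += 1
            print("cron trigger failed for %s: %s" % (url, error))
        runs += 1
        sleep(interval_seconds)
    return failures


if __name__ == "__main__":
    calls = []

    def flaky_fetch(url):
        """Stand-in for a real request: fails on the second call only."""
        calls.append(url)
        if len(calls) == 2:
            raise RuntimeError("HTTP Error 503: Service Unavailable")
        return 200

    failures = trigger_loop("http://localhost/hypothetical-cron", 0,
                            fetch_fn=flaky_fetch, max_runs=4)
    print("failures:", failures, "requests:", len(calls))
```

The point of the design is that the `try`/`except` sits inside the loop, so one failed request costs a single iteration rather than the whole thread.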

mhlakhani · 2019-02-16T01:34:39Z

Thanks @inferno-chromium! Will let you know here if I see it again.

inferno-chromium changed the title ~~crash testcase didn't appear in "Testcases" page~~ Local development: crash testcase didn't appear in "Testcases" page Feb 13, 2019

inferno-chromium added the local development label Feb 13, 2019

kcwu closed this as completed Feb 13, 2019

inferno-chromium reopened this Feb 13, 2019

inferno-chromium changed the title ~~Local development: crash testcase didn't appear in "Testcases" page~~ Local development: make triage cron more reliable Feb 13, 2019

inferno-chromium closed this as completed Feb 13, 2019

inferno-chromium reopened this Feb 14, 2019

inferno-chromium changed the title ~~Local development: make triage cron more reliable~~ Local development: make triage and cleanup cron more reliable (not in production) Feb 14, 2019

inferno-chromium mentioned this issue Feb 14, 2019

python process pileup, needs reproduction steps. #150

Closed

oliverchang added a commit that referenced this issue Feb 14, 2019

local run_server: ignore exceptions when triggering crons (#197).

2dea3cc

oliverchang added a commit that referenced this issue Feb 15, 2019

local run_server: ignore exceptions when triggering crons (#197). (#208)

599019b

inferno-chromium closed this as completed Feb 16, 2019

jonathanmetzman mentioned this issue Feb 19, 2019

Why it only shown 1 test case in testcases page? #217

Closed
