Local development: make triage and cleanup cron more reliable (not in production) #197

Closed
kcwu opened this issue Feb 13, 2019 · 13 comments

Comments

@kcwu
Contributor

kcwu commented Feb 13, 2019

  1. I am running a local server and a local bot.

  2. I added a fuzzing job and uploaded a custom build. The bot ran and generated a few crashes, and these cases appear on the "Testcases" page. But it hadn't found one particular crashing testcase that I found by other means.

  3. So I uploaded that crashing testcase (on the "Upload Testcase" page).
    The bot verified that the testcase does crash.
    This crashing testcase didn't appear on the "Testcases" page. Is this expected behavior (an uploaded case is not added to the testcase collection)?
    I guessed yes, so I continued with the next steps.

  4. I uploaded a new custom build (on the "Jobs" page, I changed the file of the existing job, then saved). It was just a minor change that doesn't matter.

  5. I added more bots, hoping they would find that crashing case by themselves faster.

  6. The bots picked up the new build and did find the crashing testcase (*).

  7. But I still don't see the crashing testcase on the "Testcases" page.

(*)
I know the bots found the crash because of entries in the fuzzing logs under local/storage/local_gcs/test-fuzz-logs-bucket/objects/

The new testcase is a stack-overflow, which is a different "crash type" from the existing groups (timeout, assert, integer-overflow, etc.), so it's not hidden due to grouping.

@inferno-chromium
Collaborator

inferno-chromium commented Feb 13, 2019

Do you see any files in local/storage/local_gcs/test-blobs-bucket/objects/? If yes, that confirms those testcase files got saved. If not, there is some issue with your port forwarding setup.

If you see files, the next thing to try is to go to http://localhost:9002/cron, scroll to the bottom, and run the triage cron manually. Wait a minute or two, then refresh http://localhost:9000 to see if the testcase shows up there.
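For the first check, a small script can list the most recently written objects so a fresh upload is easy to spot. This is only a sketch: the directory path is the one quoted in this comment, while the helper name `recent_objects` and the `limit` parameter are made up for illustration and are not part of ClusterFuzz.

```python
import os
import time


def recent_objects(objects_dir, limit=10):
    """Return (mtime, size, name) tuples for the newest files in objects_dir."""
    entries = []
    for name in os.listdir(objects_dir):
        path = os.path.join(objects_dir, name)
        if os.path.isfile(path):  # skip subdirectories
            stat = os.stat(path)
            entries.append((stat.st_mtime, stat.st_size, name))
    entries.sort(reverse=True)  # newest first
    return entries[:limit]


if __name__ == "__main__":
    # Path taken from the comment above; adjust it if your checkout differs.
    objects_dir = "local/storage/local_gcs/test-blobs-bucket/objects"
    if os.path.isdir(objects_dir):
        for mtime, size, name in recent_objects(objects_dir):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
            print(stamp, size, name)
    else:
        print("directory not found:", objects_dir)
```

Run it from the repository root shortly after an upload. If nothing new shows up at the top of the list, the file probably was not saved.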

@inferno-chromium changed the title (Feb 13, 2019): crash testcase didn't appear in "Testcases" page -> Local development: crash testcase didn't appear in "Testcases" page
@jonathanmetzman
Collaborator

This crashing testcase didn't appear in "Testcases" page. Is this expected behavior (uploaded case won't add into testcase collections) ?

No. When you go to localhost:9000/upload-testcase, do you see the testcase you uploaded? What happens when you click on it? Is that same testcase missing when you go to localhost:9000/testcases?

@kcwu
Contributor Author

kcwu commented Feb 13, 2019

Do you see any files in local/storage/local_gcs/test-blobs-bucket/objects/. if yes, that confirms those testcase files got saved. If this does not happen, some issue with your port forwarding setup.

I saw lots of files in local/storage/local_gcs/test-blobs-bucket/objects/, but the testcase I'm interested in is not there. Are those objects minimized testcases, unminimized ones, or both?

Since my bots run on localhost, it should not be related to port forwarding.

I will try the triage cron. However, the testcases didn't appear for many hours, so I guess it won't help.

No. When you go to localhost:9000/upload-testcase do you see the testcase you uploaded? What happens when you click on it? That same testcase isn't showing up when you go to localhost:9000/testcases?

Right, I can see my uploaded testcases in /upload-testcase, and the bots have already verified that they crash. Some of them are labeled "Confirmed" and some are labeled "Duplicated".
However, they don't show up on the /testcases page.

@kcwu
Contributor Author

kcwu commented Feb 13, 2019

Oops, they appeared after I ran the triage cron manually!

@kcwu kcwu closed this as completed Feb 13, 2019
@inferno-chromium changed the title (Feb 13, 2019): Local development: crash testcase didn't appear in "Testcases" page -> Local development: make triage cron more reliable
@inferno-chromium
Collaborator

@kcwu - how long was your run_server running? Did you see any stack trace that can tell how the triage cron died? Can you start up a fresh run_server instance with --bootstrap and retry, just to see if the triage cron goes down again?

@kcwu
Contributor Author

kcwu commented Feb 13, 2019

My run_server has been running for more than 20 hours (before #177). The console log is longer than my terminal scrollback history (about 2.5 hours), so I don't have a stack trace. Is the server log written to disk somewhere?

In the existing console log, there is only one "GET /triage" request in the past 2.5 hours, which is my manual one. In other words, the cron stopped for some reason.

I guess #177 brought the cron down.

@kcwu
Contributor Author

kcwu commented Feb 13, 2019

More info: there are no "cron-service" messages in the server's console log in the past 2.5 hours. So it's not only the triage cron that is down; cron has stopped entirely.

After restarting the server, I can see "cron-service" messages in the server's console log again.

@inferno-chromium
Collaborator

Can you leave the server running for another 20 hours with fuzzing ongoing? Let's see if triage goes down for you. You can check whether the triage cron is running in the run_server log on the console.

@inferno-chromium
Collaborator

Closing for now since #177 is an unsupported use case for local development. If you can still reproduce it, we will reopen.

@kcwu
Contributor Author

kcwu commented Feb 14, 2019

This is reproducible.

  1. Restart the server.
  2. Several "cron-service: "GET /triage HTTP/1.1" 200 -" lines appear in the console log after the server starts (about 1 per minute).
  3. One /triage request returns a 503 error.
  4. No more /triage requests follow (for at least 1.5 hours). Since the /triage cron is scheduled to run once per hour, I guess it's really dead.
| INFO     2019-02-14 02:16:47,515 module.py:861] cron-service: "GET /triage HTTP/1.1" 503 59
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "src/local/butler/run_server.py", line 106, in trigger
    response = urllib2.urlopen(request, timeout=request_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 503: Service Unavailable

BTW, a similar error happened for /cleanup as well. That is,

  1. The server starts.
  2. The first /cleanup request appears 1 minute after the server starts.
  3. /cleanup then returns 503, with the same message as above.
  4. No more /cleanup requests follow.
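The pattern described in this comment (some 200 responses, one 503, then silence) can also be checked from a saved copy of the console log instead of terminal scrollback. The sketch below assumes every cron line has the same shape as the single `cron-service` line quoted above; the names `CRON_LINE` and `summarize_cron_requests` are illustrative and do not come from ClusterFuzz.

```python
import re
from collections import defaultdict

# Matches lines shaped like the one quoted above:
# INFO     2019-02-14 02:16:47,515 module.py:861] cron-service: "GET /triage HTTP/1.1" 503 59
CRON_LINE = re.compile(
    r'(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*?'
    r'cron-service: "GET (?P<path>/\S+) HTTP/1\.1" (?P<status>\d{3})'
)


def summarize_cron_requests(lines):
    """Map each cron path to a list of (timestamp, status) pairs in log order."""
    seen = defaultdict(list)
    for line in lines:
        match = CRON_LINE.search(line)
        if match:  # traceback lines and other services are ignored
            seen[match.group("path")].append(
                (match.group("stamp"), int(match.group("status")))
            )
    return dict(seen)


if __name__ == "__main__":
    sample = [
        'INFO     2019-02-14 02:16:47,515 module.py:861] '
        'cron-service: "GET /triage HTTP/1.1" 503 59',
        "Exception in thread Thread-3:",
    ]
    print(summarize_cron_requests(sample))
    # prints {'/triage': [('2019-02-14 02:16:47', 503)]}
```

If the last entry for a path is a non-200 status and its timestamp is much older than the cron interval, that cron thread has most likely died.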

@inferno-chromium changed the title (Feb 14, 2019): Local development: make triage cron more reliable -> Local development: make triage and cleanup cron more reliable (not in production)
@inferno-chromium
Collaborator

What caused the 503? Any explanation would help us fix it.

@inferno-chromium
Collaborator

@kcwu @mhlakhani - this should be fixed now. Please comment back if you see any similar issues. Locally, crons should now continue to work even in case of an error.
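For readers wondering what "continue to work even in case of error" can look like, here is a minimal sketch of a cron trigger loop that logs a failed request instead of letting the exception end the thread. It is not the actual patch to run_server.py. It is written for Python 3 (`urllib.request`), whereas the traceback above comes from Python 2.7 (`urllib2`), and the names `fetch_status`, `trigger_once`, `run_cron` and `max_runs` are assumptions made for illustration.

```python
import time
import urllib.request


def fetch_status(url, request_timeout):
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=request_timeout) as response:
        return response.status


def trigger_once(url, request_timeout, fetch=fetch_status):
    """Return the status code, or None if the request failed."""
    try:
        return fetch(url, request_timeout)
    except OSError as error:
        # urllib.error.HTTPError (such as the 503 in the traceback above) and
        # URLError are subclasses of OSError, as are socket timeouts.
        print("cron request to %s failed: %s" % (url, error))
        return None


def run_cron(url, interval_seconds, request_timeout=60,
             fetch=fetch_status, sleep=time.sleep, max_runs=None):
    """Trigger url every interval_seconds and keep going after failures.

    max_runs exists only so the loop can be exercised in a test; the default
    of None loops forever. Returns the number of requests attempted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        trigger_once(url, request_timeout, fetch)
        runs += 1
        sleep(interval_seconds)
    return runs
```

With this shape, a single 503 costs one missed run and the next interval proceeds as usual. In a server process the loop would typically run in a daemon thread, one per cron URL.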

@mhlakhani

thanks @inferno-chromium! Will let you know here if I see it again.
