Make upload more robust by ignoring spurious errors while polling the scan status. #480

Merged
merged 1 commit into from Apr 29, 2016

Conversation

Projects
None yet
5 participants
Contributor

vilagithub commented Apr 21, 2016

… scan status.

While the snap is scanned on the store, the status is polled until completion.

If an error occurs while acquiring the status, ignore the errors.

Those errors are rare and the next retry generally succeeds so ignoring those errors is harmless.

If they persist until the last retry, something else may be going on and this will still be reported.

LP: #1572963

Can one of the admins verify this patch?

Can one of the admins verify this patch?

+ try:
+ resp = session.get(url)
+ return resp
+ except (requests.ConnectionError, requests.HTTPError):
@sergiusens

sergiusens Apr 21, 2016

Collaborator

What was HTTPError, will we loop forever if the resource does not exist?

@vilagithub

vilagithub Apr 22, 2016

Contributor

What was HTTPError

I don't remember encountering it in this particular case but the code involved can trigger it. I can remove it if you prefer.

will we loop forever if the resource does not exist?

No. The caller retries a fixed number of times and will report the last error.

+ # level (is_scan_completed) will deal with the None response
+ # meaning we don't know the status. This avoid a spurious
+ # connection error breaking an upload for a wrong reason.
+ return None
@elopio

elopio Apr 22, 2016

Member

According to [1], we should return an exception instead of None. That makes sense because the name of the exception will make it's meaning clearer than None.

[1] https://www.goodreads.com/book/show/23020812-effective-python

@vilagithub

vilagithub Apr 22, 2016

Contributor

we should return an exception instead of None.

No, that's the bug. The caller (intermediate, hidden, out of our reach) can't catch it nor act properly. Returning None tells the caller: nah, not completed yet.

@elopio

elopio Apr 27, 2016

Member

I see, @retry doesn't let you handle the exception. The comments make this clear, so I'm ok with this.

+ return False
+
+
+def get_scan_status(session, url):
@elopio

elopio Apr 22, 2016

Member

I think I would name these helper funcions with a leading underscore, to make it clear that they are called only from higher level functions.

@vilagithub

vilagithub Apr 22, 2016

Contributor

There is a refactor in progress in another branch, I'd rather address that there ?

@elopio

elopio Apr 27, 2016

Member

works for me.

Member

elopio commented Apr 22, 2016

this looks good to me. It's good that you moved the func out of common, because it was called only once.
I would prefer not to return None to signal the problem.

Member

elopio commented Apr 27, 2016

retest this please

Contributor

vilagithub commented Apr 27, 2016

From http://162.213.35.179:8080/job/github-snapcraft-autopkgtest-cloud/597/console
adt-run [16:32:03]: version 3.20.3
...
adt-run [17:02:03]: ERROR: testbed failure: cannot send to testbed: ['BrokenPipeError: [Errno 32] Broken pipe\n']

So 30 minutes hanging on the instance creation. Re-test ?
3.20.4 is out for xenial but I don't think that will fix that issue.

From http://162.213.35.179:8080/job/github-snapcraft-examples-tests-cloud/697/console

2016/04/27 16:31:55 *** Creating a Snappy m1.medium instance for release rolling, channel edge and image type custom ***

goroutine 1 [running]:
log.Panicf(0x5b1d20, 0x36, 0xc820039e58, 0x3, 0x3)
/usr/lib/go/src/log/log.go:327 +0xd8
main.main()
/build/snappy-tests-job-LsZciy/snappy-tests-job-1.0.0ubuntu1/obj-x86_64-linux-gnu/src/launchpad.net/snappy-tests-job/cmd/snappy-cloud-client/main.go:68 +0xb93

vilagithub pushed a commit to vilagithub/snapcraft that referenced this pull request Apr 28, 2016

Fix test_upload.UploadTestCase.test_upload_with_login to get coverage…
… numbers on a successful run.

See snapcore#480 for a fix against master.
Collaborator

sergiusens commented Apr 28, 2016

retest this

Member

elopio commented Apr 28, 2016

OK to test

Make upload more robust by ignoring suprious errors while polling the…
… scan status.

While the snap is scanned on the store, the status is polled until completion.

If an error occurs while acquiring the status, ignore the errors.

Those errors are rare and the next retry generally succeeds so ignoring those errors is harmless.

If they persist until the last retry, something else may be going on and this will still be reported.

LP: #1572963
Member

kyrofa commented Apr 29, 2016

Looks good to me 👍

@sergiusens sergiusens merged commit 2cae24a into snapcore:master Apr 29, 2016

4 checks passed

Examples tests Success
Details
autopkgtest Success
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.01%) to 96.171%
Details

@kyrofa kyrofa changed the title from Make upload more robust by ignoring suprious errors while polling the… to Make upload more robust by ignoring spurious errors while polling the scan status. Apr 29, 2016

kalikiana pushed a commit to kalikiana/snapcraft that referenced this pull request Apr 6, 2017

Make upload more robust by ignoring suprious errors while polling the…
… scan status. (#480)

While the snap is scanned on the store, the status is polled until completion.
If an error occurs while acquiring the status, ignore the errors.
Those errors are rare and the next retry generally succeeds so ignoring those errors is harmless.
If they persist until the last retry, something else may be going on and this will still be reported.

LP: #1572963
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment