Register logfile before starting assignment #263

guidow · 2015-04-07T16:06:42Z

If we set the state of tasks not just on the task itself but also on the current tasklog, we must make sure the tasklog is registered first; otherwise the request will fail with a 404.

In extreme cases, where assignments finish very fast, even uploading the final tasklog file at the end may fail because of this. (This can happen in practice if a jobtype class decides that no process needs to be started.)
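The ordering constraint can be illustrated with a small, self-contained sketch. `FakeMaster`, `register_tasklog`, `set_tasklog_state`, and `start_assignment` are hypothetical stand-ins, not pyfarm's actual API. The only behavior taken from this description is that updating a tasklog the master does not know about yet results in a 404.

```python
class FakeMaster:
    """Hypothetical in-memory stand-in for the master's tasklog API."""

    def __init__(self):
        self.tasklogs = {}

    def register_tasklog(self, log_id):
        # Registration creates the resource that later updates refer to.
        self.tasklogs[log_id] = {"state": None}
        return 201

    def set_tasklog_state(self, log_id, state):
        # Updating a tasklog the master has never seen is a 404.
        if log_id not in self.tasklogs:
            return 404
        self.tasklogs[log_id]["state"] = state
        return 200


def start_assignment(master, log_id, register_first):
    """Return the status codes seen while starting an assignment."""
    statuses = []
    if register_first:
        statuses.append(master.register_tasklog(log_id))
    statuses.append(master.set_tasklog_state(log_id, "running"))
    return statuses


# Without registration the state update fails; with it, both calls succeed.
print(start_assignment(FakeMaster(), "task.log", register_first=False))
print(start_assignment(FakeMaster(), "task.log", register_first=True))
```

The same reasoning applies to the final upload: if the assignment finishes before registration has happened, the upload targets a resource that does not exist yet.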

Unfortunately, this PR still has a problem: When setting our agent's state, the request fails with this message:

2015-04-07 17:36:57 INFO     - pf.jobtypes.core - Spawning Command(command='/usr/bin/blender', arguments=('-b', '/home/guido/stift_aniex43_finish.blend', '-s', '196.0', '-e', '200.0', '-o', '/tmp/stift_####.exr', '-a'), cwd='/home/guido/git/pyfarm', user=None, group=None, env={})
2015-04-07 17:36:57 ERROR    - pf.agent.http.client - POST http://127.0.0.1:5000/api/v1/agents/a113b5a8-9822-413b-b1fd-fce3014956cf has failed (uid: 2d57ff05bd4f):
Traceback (most recent call last):
Failure: twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]

2015-04-07 17:36:57 ERROR    - pf.agent.service - Failed to announce self to the master: [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>].  Will retry in 1.96419640754 seconds.

It works on retry, but I have no idea where this problem comes from.

coveralls · 2015-04-07T16:17:51Z

Coverage decreased (-0.63%) to 75.54% when pulling 29b2faa on guidow:register_log_before_start into 606c366 on pyfarm:master.

opalmer · 2015-04-08T04:28:27Z

pyfarm/jobtypes/core/internals.py

-        self._started_deferred = None
-
-    @property
-    def stopped_deferred(self):


Could you go over the specifics as to why we're getting rid of these properties instead of updating them somehow? These are set up to try to make it harder to misorder the setup of start_deferred/stop_deferred.

guidow · 2015-04-08

Since Process._start() will now register the tasklog before actually starting the process, it will have to return these two properties before the process has actually started. So, yeah, in this case we will have to access both of these before self.start_called is set to True.

guidow · 2015-04-08T18:59:05Z

As for the problem described in the opening comment, I think I've gotten to the bottom of it, and probably of the issues in #249 as well. This seems to be a bug in Twisted Web (which treq uses as its backend): if you make two consecutive, non-overlapping HTTP requests with data against a server that doesn't support keep-alive but doesn't make that explicit by setting a Connection: close header, the second request fails because Twisted tries to reuse the connection, not realizing the server has already closed it.

See: https://twistedmatrix.com/trac/ticket/7843
and: http://twistedmatrix.com/pipermail/twisted-web/2015-April/005139.html

Since, in production, farms will probably use a real webserver instead of flask's built-in standalone mode, this doesn't seem like a major issue. Besides, as long as we retry the request when this happens, we're fine anyway.
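A minimal sketch of why retrying is enough, using only the standard library. `StaleConnectionClient` and `call_with_retry` are hypothetical names, and `ConnectionError` stands in for the `ResponseNeverReceived` failure shown in the log above; this is not the agent's actual retry code.

```python
import time


class StaleConnectionClient:
    """Hypothetical client that caches a connection the server closes silently."""

    def __init__(self):
        self.cached_connection = False

    def post(self):
        if self.cached_connection:
            # The server already closed this connection, so reusing it
            # fails once and the client discards it.
            self.cached_connection = False
            raise ConnectionError("connection closed by server")
        # Fresh connection: the request succeeds. No "Connection: close"
        # header arrived, so the client keeps the connection for reuse.
        self.cached_connection = True
        return 200


def call_with_retry(request, attempts=3, delay=0.0):
    """Call request() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)


client = StaleConnectionClient()
print(call_with_retry(client.post))  # first request, fresh connection
print(call_with_retry(client.post))  # reuse fails once, the retry succeeds
```

In this model the failed attempt discards the stale connection, so the retry always opens a new one.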

opalmer · 2015-04-09T03:50:55Z

Interestingly, the problem you're describing here is something I was able to replicate at work. I came to basically the same conclusions you did, by accident, when I wrote

data = treq.json_content(response)

when I meant to do

data = yield treq.json_content(response)

which in my case caused overlapping requests and seemed to trigger the ConnectionDone error. I wanted to reply to your thread on twisted-web but unfortunately was not subscribed at the time. Regardless, nice work and thanks for taking a deep look at that.

Anyway, I'm going to go ahead and merge this. I think I have all my questions answered, and I don't see anything code-wise to fix here right now.
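For readers unfamiliar with the pattern: `treq.json_content(response)` returns a Deferred, and inside an `inlineCallbacks` generator the `yield` is what waits for it. The sketch below is an analogy in standard-library asyncio, not Twisted or treq code. `json_content` here is a hypothetical stand-in, with a pending Task playing the role of the unfired Deferred.

```python
import asyncio


async def json_content(response):
    """Hypothetical stand-in for a call that decodes a response body later."""
    await asyncio.sleep(0)
    return {"body": response}


async def main():
    # Without waiting, we hold a pending Task rather than the decoded
    # data, and the work runs concurrently with whatever we do next.
    pending = asyncio.ensure_future(json_content("ok"))
    was_pending = not pending.done()

    # Waiting for the result is the asyncio counterpart of the yield.
    data = await pending
    return was_pending, data


print(asyncio.run(main()))
```

The mistake is easy to miss because the assignment itself succeeds; only later use of `data` reveals that it is not the decoded body.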


guidow added 2 commits April 7, 2015 13:07

Register logfile before starting assignment

4d3624e

Remove tests for started/stopped properties

29b2faa

opalmer reviewed Apr 8, 2015

opalmer added bug jobtypes labels Apr 8, 2015

opalmer added this to the 0.8.4 milestone Apr 8, 2015

opalmer assigned guidow Apr 8, 2015

opalmer added a commit that referenced this pull request Apr 9, 2015

Merge pull request #263 from guidow/register_log_before_start

9de8967

Register logfile before starting assignment

opalmer merged commit 9de8967 into pyfarm:master Apr 9, 2015

guidow added a commit to guidow/pyfarm-agent that referenced this pull request Apr 13, 2015

Fix handling errors from jobtypes when starting

c9985ee

Merging pyfarm#263 ended up breaking the handling of exceptions in the jobtype's start() method. This should fix it again.

guidow mentioned this pull request Apr 13, 2015

Fix handling errors from jobtypes when starting #268

Merged
