[pantsd] Add an option to configure the watchman startup timeout. #4332

Merged
merged 2 commits into from Mar 15, 2017

Conversation

Projects
None yet
3 participants
@kwlzn
Member

kwlzn commented Mar 15, 2017

Fixes #4224

Problem

In large repos, the initial watch-project command can take longer than the default timeout of 5 seconds due to watchman startup. This can cause a SocketTimeout exception, which leads to a FSEventService crash and subsequently tears down the daemon.

Solution

Add an option to set the initial watchman client timeout, which is then set to the default timeout after the initial watch-project command is issued.

Result

After testing in our monorepo, I'm unable to reproduce the startup crash seen in #4224 with this option set to 30 seconds.

@@ -31,6 +31,9 @@ def register_options(cls, register):
register('--supportdir', advanced=True, default='bin/watchman',
help='Find watchman binaries under this dir. Used as part of the path to lookup '
'the binary with --binary-util-baseurls and --pants-bootstrapdir.')
+ register('--startup-timeout', type=float, advanced=True, default=Watchman.SOCKET_TIMEOUT_SECONDS,

This comment has been minimized.

@stuhood

stuhood Mar 15, 2017

Member

It seems like it would be reasonable to set the startup timeout to a higher value (ie, your 30) by default.

@stuhood

stuhood Mar 15, 2017

Member

It seems like it would be reasonable to set the startup timeout to a higher value (ie, your 30) by default.

This comment has been minimized.

@kwlzn

kwlzn Mar 15, 2017

Member

sgtm - done.

@kwlzn

kwlzn Mar 15, 2017

Member

sgtm - done.

@stuhood

Thanks Kris.

@@ -137,7 +142,12 @@ def watch_project(self, path):
:param string path: the path to the watchman project root/pants build root.
"""
- return self.client.query('watch-project', os.path.realpath(path))
+ # TODO(kwlzn): Add a client.query(timeout=X) param to the upstream pywatchman project.

This comment has been minimized.

@baroquebobcat

baroquebobcat Mar 15, 2017

Contributor

can watch_project be called more than once? Would it make sense to reset the timeout to the startup timeout for each call into this?

@baroquebobcat

baroquebobcat Mar 15, 2017

Contributor

can watch_project be called more than once? Would it make sense to reset the timeout to the startup timeout for each call into this?

This comment has been minimized.

@kwlzn

kwlzn Mar 15, 2017

Member

in the general context, it could be - yeah. in our current usage case, it isn't ever - we only ever watch the singular build root once per daemon launch.

I had gone the route you suggested initially, but due to issues in the pywatchman upstream client calling client.setTimeout() call before issuing an actual query (which is the only way to invoke the private client._connect() without violating the private method convention) fails with an AttributeError trying to set an attribute on a None. we could attempt that each time in a try/except block and catch AttributeError - such that any subsequent secondary calls to .watch_project() respect the startup_timeout - but ultimately based on the way we use this class it it's not likely to affect us afaict, so just went with the simplest strategy w/ a TODO. would be happy to do the try/except if you think that sounds better - as it is more correct for the general case, but imo mildly kludgy.

definitely hope to circle back to improve the upstream pywatchman client a bit and clean this and at least one other TODO up.

@kwlzn

kwlzn Mar 15, 2017

Member

in the general context, it could be - yeah. in our current usage case, it isn't ever - we only ever watch the singular build root once per daemon launch.

I had gone the route you suggested initially, but due to issues in the pywatchman upstream client calling client.setTimeout() call before issuing an actual query (which is the only way to invoke the private client._connect() without violating the private method convention) fails with an AttributeError trying to set an attribute on a None. we could attempt that each time in a try/except block and catch AttributeError - such that any subsequent secondary calls to .watch_project() respect the startup_timeout - but ultimately based on the way we use this class it it's not likely to affect us afaict, so just went with the simplest strategy w/ a TODO. would be happy to do the try/except if you think that sounds better - as it is more correct for the general case, but imo mildly kludgy.

definitely hope to circle back to improve the upstream pywatchman client a bit and clean this and at least one other TODO up.

@kwlzn kwlzn merged commit 21fc6af into pantsbuild:master Mar 15, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

lenucksi added a commit to lenucksi/pants that referenced this pull request Apr 25, 2017

[pantsd] Add an option to configure the watchman startup timeout. (#4332
)


Fixes #4224

Problem

In large repos, the initial watch-project command can take longer than the default timeout of 5 seconds due to watchman startup. This can cause a SocketTimeout exception, which leads to a FSEventService crash and subsequently tears down the daemon.

Solution

Add an option to set the initial watchman client timeout, which is then set to the default timeout after the initial watch-project command is issued.

Result

After testing in our monorepo, I'm unable to reproduce the startup crash seen in #4224 with this option set to 30 seconds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment