-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-20493 Allow connect and read timeout to be configured in ESP #12392
Conversation
https://track.hpccsystems.com/browse/HPCC-20493 |
TODO: still working on providing the test cases for this. |
3674095
to
49b24b5
Compare
@afishbeck Please can you review these changes relating being able to set the timeouts for ESP services. By the way, I've tested this manually but (1) created a multi-node thor (2) spraying to this thor (3) shutting down the remote thor node's dafilesrv (4) executing WUInfo, DFUQuery and DFUInfo ESP queries. Compare how long it takes to timeout with and without the new settings. (I'm inclined not produce test cases because it seems to be quite a lot of work for not too much benefit) |
esp/platform/espcfg.cpp
Outdated
@@ -325,6 +326,10 @@ CEspConfig::CEspConfig(IProperties* inputs, IPropertyTree* envpt, IPropertyTree* | |||
if (m_cfg->getProp("@daliServers", daliservers)) | |||
initDali(daliservers.str()); //won't init if detached | |||
|
|||
const unsigned dafilesrvConnectTimeout = m_cfg->getPropInt("@dafilesrvConnectTimeout", 2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How were the defaults decided on? Will a change in default behavior catch the users off guard? Looks like old behavior was the equivalent of 0's here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed the default timeout period last Monday. @jakesmith suggested that having shorter timeout period would mean be sensible (resolves issues such as https://track.hpccsystems.com/browse/HPCC-19002) and if it were to cause problems, the timeout period could be extended with the configuration option.
I agree that it would cause problems if say a 30 second timeout is necessary and eventually returns a results. But does any remote file access require this length of time? May be 2 seconds timeout is too short?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion was that the global default (not just in esp) should be much shorter than it is now.
( Currently the defaults are 100 seconds and 3 retries. )
And it should be globally configurable (e.g. via environment.conf), so that if needed / if problematic, it can be raised.
I think I had suggested changing it to 10 seconds (3 retries may be still ok).
Once that is done, whilst having a separate configuration in Esp is okay, it becomes less relevant.
There is also the further improvement of implementing a better mechanism via a timed blacklist, so clients don't keep retrying the same endponit and repeatedly timing out, which cumulatively can be significant even if short. (There is an implementation like this in thorsoapcall, which should be looked at to see whether it can be reused).
@shamser - was a JIRA opened for the blacklist implementation ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I have just created the blackout jira: https://track.hpccsystems.com/browse/HPCC-21927. @afishbeck Do you think it should be set to '0' or is 10 seconds ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just created the blackout jira
@shamser - blacklist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it should be set to '0' or is 10 seconds ok?
@shamser - are you asking re. this esp custom timeout, or the global timeout I was talking about above?
Has a JIRA been opened to change the global timeout?
If the global timeout is set as suggested (10 secs) - then I'd default the esp timeout to unset , will default to global timeout - but they (ops) can override it if they want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I'm talking about the ESP custom and default timeouts. This one is designed to address a number of issues reported relating to ESP services (making remote file requests) taking too long to timeout. As far as other aspects related to timeouts, I feel that i may not be immediately pressing as no has complained (yet) and we can merge this one until we have agreed the more comprehensive approach for timings. By the way, there was a bug in the previous commit, which I have now fixed and pushed. (I've also set the default timeout period to 10 seconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser one question.
|
||
<xs:attribute name="dafilesrvConnectTimeout" type="xs:nonNegativeInteger" | ||
hpcc:displayName="Remote file connection timeout in seconds" | ||
hpcc:presentValue="2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be 10 now ?
hpcc:tooltip="Remote file access connection timeout in seconds"/> | ||
<xs:attribute name="dafilesrvReadTimeout" type="xs:nonNegativeInteger" | ||
hpcc:displayName="Remote file read timeout in seconds" | ||
hpcc:presentValue="2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be 10 now ?
@shamser - 1 comment. Please go ahead and squash after changing. |
Signed-off-by: Shamser Ahmed <shamser.ahmed@lexisnexis.co.uk>
@jakesmith Squashed. Please can you do final review. |
Automated Smoketest: ✅ Unit tests result:
Regression test result:
HPCC Stop: OK
|
Looks good. @richardkchapman - ready to merge. |
@JamesDeFabia @shamser Is there a documentation impact? |
@JamesDeFabia @richardkchapman I have created a documentation Jira: https://track.hpccsystems.com/browse/HPCC-21974. This change affects current behaviour. How do we get this documented in the Redbook? |
I have added a tag |
Signed-off-by: Shamser Ahmed shamser.ahmed@lexisnexis.co.uk
Type of change:
Checklist:
Testing: