New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add exponential increase on timeout in #sshable? for services that do not respond in 8s #214
Conversation
… not respond in 8s Issue fog/fog-aws#372 For AWS Spot Request, instances never complete the `#setup` process eventually timing out through the `#wait_for` block because the default 8s is apparently not long enough to get the ssh connection. Through testing 11s seemed to work, but this is likely highly variable for regions, instance types, images, and providers. This change implements a 1.5 increase in timeout each time #sshable? is called, capped at 60s. If a successful connection is made, then the timeout is reset to the initial value.
spec/compute/models/server_spec.rb
Outdated
|
||
describe "#sshable?" do | ||
before do | ||
# I'm not sure why #sshable? depends on a method that's not defined except in implementing classes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Maybe we should define it in the base class also, but just have it always return false? That would clear up this confusion/oddity, hopefully. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a method that raises NotImplementedError
? That's what we usually do for abstract interfaces on my team. It basically maintains the same behavior as currently without hidden side effects.
OTOH, http://chrisstump.online/2016/03/23/stop-abusing-notimplementederror/. Maybe we should just document that it must be implemented?
Looking good overall, one small inline comment above and one small comment below. The question of which errors to retry on is an interesting one. I guess my inclination would be to start conservatively, with the fewest needed errors resulting in retry. This is the smallest expansion in scope/change in behavior, which limits impact. But more importantly, it should be easier to add more things to the retry later without breaking things for any one than to remove them later. Does that sound reasonable? Thanks! |
Agreed on retrying with limited errors. I think increasing the timeout should only happen in |
@geemus 2c1d839 updates the logic to only increment the timeout period in the case of I also cleaned up some of the deprecation warnings from minitest. I hope you don't mind tacking that into this PR. There are still some ruby warnings, but I didn't want to get too deep in code cleanup on code I wasn't working on. Let me know your preference on how to handle |
fog-core.gemspec
Outdated
@@ -2,6 +2,7 @@ | |||
lib = File.expand_path("../lib", __FILE__) | |||
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib) | |||
require "fog/core/version" | |||
require 'english' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think english is supported by older versions of ruby which fog-core still supports. Also, double quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@icco Sorry about that. I was just trying to get rid of the warning that $INPUT_RECORD_SEPARATOR
isn't defined. I'll back that out.
Thanks for continuing to discuss and iterate. Looks good. As for the abstract interface part, I don't think I have a good default answer ready and there are many pros/cons as you suggest. I guess in this particular case I would still lean toward having the base class just return false, the abstract server is in fact not ready, and if something hasn't defined that yet, also not being ready seems reasonable. As long as we have a comment to that effect in case people come back to it later and wonder. Does that seem reasonable to you? Thanks! |
@jeremywadsack seems good to me. Just to double check, is this complete and ready from your perspective? |
@geemus Yes, I believe I've addressed all the issues and Travis is passing. Do you need me to squash/rebase? |
@jeremywadsack thanks for the offer, but I'm not worried about it. Thanks! |
Thanks @geemus. I just noticed that the documentation comments I wrote for |
Good catch, I think that ^ should help. I'll see about cutting a release in the next couple days as well. |
Released in v1.45.0, thanks again! |
See issue fog/fog-aws#372
For AWS Spot Request, instances never complete the
#setup
process eventually timing out through the#wait_for
block because the default 8s is apparently not long enough to get the ssh connection. Through testing 11s seemed to work, but this is likely highly variable for regions, instance types, images, and providers.This change implements a 1.5 increase in timeout each time
#sshable?
is called, capped at 60s. If a successful connection is made, then the timeout is reset to the initial value.I tried to keep this as simple as possible. This solves the problem for AWS Spot Requets and does not adversely affect regular AWS instances. I have not tested this with other providers, but a quick check it didn't seem like this would be a problem.
The only concern I have is that there's no way to reset the
@sshable_timeout
value, but I don't see an instance wheresshable?
would be called successively on an instance, would continue to fail, and then you'd want to call it on that instance again, but starting at the lower timeout.One other question is whether it's appropriate to increase the timeout for all failures, or if we should only increase the timeout with a
Timeout::Error
.Note that there were no specs for
Fog::Compute::Server
, so I implemented specs for the existing behavior of#sshable?
(leaving the specs of the remaining methods to others to complete coverage).I'm happy to squash commits, but for readability, I thought it would help to see the separation of the two parts.