DM-38146: Update Princeton site interface from ib0 to op0 #13
Branch updated from 2527a2e to 72bf965.
Codecov Report
Base: 70.37% // Head: 70.37% // No change to project coverage 👍
Additional details and impacted files:
@@ Coverage Diff @@
              main      #13     +/-
Coverage    70.37%    70.37%
Files            3         3
Lines           27        27
Branches         6         4      -2
Hits            19        19
Misses           6         6
Partials         2         2
@@ -104,4 +104,4 @@ def get_address(self) -> str:
interface, because the cluster nodes can't connect to the head node
through the regular internet.
"""
- return address_by_interface("ib0")
+ return address_by_interface("op0")
This seems fragile if we have to change it a lot. Have you considered having an environment variable fallback?
My feeling is that this shouldn't change often, but I may be off base about that. Do you have an example of environment variables being used elsewhere that we could follow here?
One possibility I considered is using the get_all_addresses function in parsl.addresses, but I'm not sure how to use the information it provides to choose the 'best' address.
Or a yaml config value (I think self contains the bps config)?
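The two fallbacks suggested above (an environment variable and a YAML config value) could be combined into one lookup with an explicit precedence. This is a minimal sketch, assuming a plain mapping stands in for the BPS config; the key parslInterface and the variable BPS_PARSL_INTERFACE are hypothetical names, not part of ctrl_bps_parsl. The returned name would then be passed to parsl's address_by_interface.

```python
import os
from typing import Mapping, Optional

# Hypothetical names: neither this config key nor this environment variable
# exists in ctrl_bps_parsl; they only illustrate the fallback order.
CONFIG_KEY = "parslInterface"
ENV_VAR = "BPS_PARSL_INTERFACE"
DEFAULT_INTERFACE = "op0"


def choose_interface(
    config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the network interface name to look up.

    Precedence: explicit config value, then environment variable,
    then the hard-coded site default.
    """
    if environ is None:
        environ = os.environ
    # Missing or empty values are treated as "not set".
    return config.get(CONFIG_KEY) or environ.get(ENV_VAR) or DEFAULT_INTERFACE
```

For example, choose_interface({}, {}) falls through to "op0", while a config value wins over the environment variable. The trade-off, raised later in this thread, is that a silent fallback can hide a misconfiguration that would otherwise fail fast.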
I've been in touch with Princeton IT about whether this interface name is likely to change again in the future. To quote:
this will not and cannot change. op0 was always the main high speed interface for tiger2 cluster
I therefore think the fix on this ticket simply changes the Princeton site settings to what they should have been originally, and they are unlikely to change again. With that in mind, I'm inclined to leave this as-is and not add any user-specified fallback. That is, if this does break in the future, we probably want it to fail fast so we can fix it in consultation with Princeton IT, rather than have the problem hidden behind a fallback procedure.
After further discussion with a different member of Princeton IT staff, it appears that this network interface may indeed change back to ib0 or ibN at some future date, likely when the clusters are eventually upgraded. With that in mind, I've modified this ticket to use the same back end that parsl uses (the psutil function net_if_addrs) to list all available network interfaces and match only those whose names start with either ib or op. This should make us robust to future configuration updates, removing the need to fix this issue if and when it crops up again.
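The matching step described above can be sketched as a pure function over interface names, so it can be tested without a cluster node. This is a minimal sketch, not the PR's code: the function name find_cluster_interfaces is hypothetical, and it uses str.startswith with a tuple, which for two-character prefixes is equivalent to the interface[:2] in ["ib", "op"] check used in the PR.

```python
from typing import Iterable, List, Tuple


def find_cluster_interfaces(
    interface_names: Iterable[str],
    prefixes: Tuple[str, ...] = ("ib", "op"),
) -> List[str]:
    """Return the interface names that start with any of the given prefixes.

    Input order is preserved, so the "first" match depends on the order in
    which the caller's source (e.g. psutil.net_if_addrs()) lists interfaces.
    """
    # str.startswith accepts a tuple and is True if any prefix matches.
    return [name for name in interface_names if name.startswith(prefixes)]
```

On a node, the names would come from psutil.net_if_addrs().keys(), and the first match would be handed to parsl.addresses.address_by_interface. For example, ["lo", "eth0", "ib0", "op0"] yields ["ib0", "op0"].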
@PaulPrice, do these new changes look okay to you?
Branch updated from 0260ec0 to ca68380.
Add this to your mypy configuration:
[mypy-psutil.*]
ignore_missing_imports = True
net_interfaces = [interface for interface in net_if_addrs().keys() if interface[:2] in ["ib", "op"]]
if net_interfaces:
    return address_by_interface(net_interfaces[0])
else:
Drop the else.
This function may then return None, which would cause make_executor in lsst/ctrl/bps/parsl/sites/slurm.py to fail with a generic error. Is that better than raising a specific error here?
I interpreted @PaulPrice as suggesting:
if net_interfaces:
    return blah
raise RuntimeError("reason")
Ah, literally just the else line, thanks. Yes, that makes sense, cheers.
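The structure agreed on here (early return, no else, then an explicit raise) can be sketched as follows. This is a minimal sketch, not the merged code: the address lookup is injected as a parameter so it runs without parsl (in the real code it would be parsl.addresses.address_by_interface, with names from psutil.net_if_addrs()), and the error message wording is hypothetical.

```python
from typing import Callable, Iterable


def get_address(
    interface_names: Iterable[str],
    lookup_address: Callable[[str], str],
) -> str:
    """Return the address of the first ib*/op* interface, or raise."""
    net_interfaces = [name for name in interface_names if name[:2] in ("ib", "op")]
    if net_interfaces:
        return lookup_address(net_interfaces[0])
    # No else needed: the early return means we only reach this line on failure.
    # Raising here gives a specific message instead of implicitly returning None
    # and letting make_executor fail later with a generic error.
    raise RuntimeError(
        "No network interface starting with 'ib' or 'op' found; "
        "cannot determine the head-node address."
    )
```

Dropping the else keeps the failure path at the outermost indentation level, and the raise guarantees the function never falls off the end and returns None.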
Branch updated from 235d216 to 071ee23.
Branch updated from 071ee23 to 1bf5330.
Checklist
doc/changes