Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38146: Update Princeton site interface from ib0 to op0 #13

Merged
merged 1 commit into from Feb 28, 2023

Conversation

leeskelvin
Copy link
Contributor

@leeskelvin leeskelvin commented Feb 27, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Feb 27, 2023

Codecov Report

Base: 70.37% // Head: 70.37% // No change to project coverage 👍

Coverage data is based on head (071ee23) compared to base (c3cf636).
Patch has no changes to coverable lines.

❗ Current head 071ee23 differs from pull request most recent head 1bf5330. Consider uploading reports for the commit 1bf5330 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #13   +/-   ##
=======================================
  Coverage   70.37%   70.37%           
=======================================
  Files           3        3           
  Lines          27       27           
  Branches        6        4    -2     
=======================================
  Hits           19       19           
  Misses          6        6           
  Partials        2        2           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -104,4 +104,4 @@ def get_address(self) -> str:
interface, because the cluster nodes can't connect to the head node
through the regular internet.
"""
return address_by_interface("ib0")
return address_by_interface("op0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fragile if we have to change it a lot. Have you considered having an environment variable fallback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that this shouldn't change a lot, but I may be off base about that. Do you have an example of env vars in use elsewhere that we might use here as an example?

One possibility I considered is making use of the get_all_addresses function within parsl.addresses, but I'm not sure how to use the information provided there to make the choice on the 'best' address to use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a yaml config value (I think self contains the bps config)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been in touch with Princeton IT regarding whether or not this variable is likely to change again in the future. To quote:

this will not and cannot change. op0 was always the main high speed interface for tiger2 cluster

I think therefore that the fix we're making on this ticket is changing the Princeton site settings to what they should have been originally, and this is not likely to change again in the future. With that in mind, I'm inclined to leave this as-is and not add any user-specified fall-back. I.e., if this does break in the future, it's probably something we want to fail fast and fix in consultation with Princeton IT, rather than be hidden behind any kind of fall back procedure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further discussion with a different member of staff within Princeton IT, it appears that this device interface may indeed change back to ib0 or ibN at some future date, likely when the clusters are eventually upgraded. With that in mind, I've modified this ticket to make use of the same back-end within parsl (the psutil function net_if_addrs) to list all possible network interfaces and match only to those starting with either ib* or op*. I hope this will make us robust to any future configuration updates, removing the need to fix this issue if/when it crops up again.

@PaulPrice, do these new changes look okay to you?

@leeskelvin leeskelvin force-pushed the tickets/DM-38146 branch 2 times, most recently from 0260ec0 to ca68380 Compare February 27, 2023 23:07
@timj
Copy link
Member

timj commented Feb 27, 2023

Add this to your mypy.ini:

[mypy-psutil.*]
ignore_missing_imports = True

net_interfaces = [interface for interface in net_if_addrs().keys() if interface[:2] in ["ib", "op"]]
if net_interfaces:
return address_by_interface(net_interfaces[0])
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function may then return nothing, which would cause make_executor inside lsst/ctrl/bps/parsl/sites/slurm.py to fail with a generic error. Is that better than a specific error raised here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpreted @PaulPrice as saying write:

if net_interfaces:
    return blah
raise RuntimError("reason")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, literally just the else line, thanks. Yes, that makes sense, cheers.

@leeskelvin leeskelvin force-pushed the tickets/DM-38146 branch 2 times, most recently from 235d216 to 071ee23 Compare February 27, 2023 23:24
@leeskelvin leeskelvin merged commit 6a9ae97 into main Feb 28, 2023
@leeskelvin leeskelvin deleted the tickets/DM-38146 branch February 28, 2023 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants