Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fleet sensor refactor #2352

Merged
merged 3 commits into from
Feb 19, 2021
Merged

Fleet sensor refactor #2352

merged 3 commits into from
Feb 19, 2021

Conversation

fjarri
Copy link
Contributor

@fjarri fjarri commented Oct 13, 2020

The goal of this PR is to make interactions with FleetSensor consistent and avoid exposing implementation details, or carrying around ad-hoc states (fleet_state_* attributes). More specifically:

  • FleetState is made into a real class. It encapsulates incremental updates and collection-like interface to iterate over the state's nodes. It is exposed from FleetSensor as current_state.
  • FleetSensor itself manages the buffers with new/deleted nodes and a list of previous states.
  • Previous states are kept as simple ArchivedFleetState instances that do not retain references to actual nodes (since we don't use them anyway), only checksum/nickname/population.
  • The user (meaning the higher levels of the codebase) must call record_fleet_state() in order for the state to get updated. Currently there are multiple cases of accessing the node list without an explicit update.
  • Fleet states for remote nodes are kept in FleetSensor.remote_states instead of in fleet_state_* attributes.
  • abridged_* methods are renamed to better represent their purpose. The former abridged_node_details() (now status_info() is now used for both JSON and HTML /status endpoints outputs.

Rough edges and possible improvements:

  • This PR may include a fix for Use "canonical address" instead of checksum address to key known_nodes #1995, if we decide on the course of actions.
  • It is possible to avoid calculating the new state's checksum in some cases. Since remote nodes are immutable, if we retain the local node's metadata and remote nodes' addresses in archived states, we can compare those with the current ones, and use the archived state's checksum if there's a match. Not sure if there will be a noticeable benefit.
  • The usage of the current state can be made more explicit if we don't forward most of the calls from FleetSensor to FleetSensor.current_state.
  • An alternative to the explicit record_fleet_state() call is to automatically update the state (if there are new nodes) whenever current_state is accessed.
  • In case of matching fleet states, the teacher sends FLEET_STATES_MATCH along with the fleet state. That seems excessive, but changing that without a proper protocol versioning will be a problem.

Note: if this PR is merged, https://github.com/nucypher/nucypher-monitor/blob/master/monitor/crawler.py will need to be updated.

@fjarri fjarri force-pushed the fleet-sensor-refactor branch 8 times, most recently from f3cd6df to c81922e Compare October 18, 2020 16:34
# Checking if the node already has a checksum address
# (it may be created later during the constructor)
# or if it mutated since the last check.
if self._this_node_ref is not None and getattr(self._this_node_ref(), 'finished_initializing', False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the way this logic is evolving, these weakref gymnastics aren't really doing it for me. Is this all just to avoid having to lug around additional_nodes_to_track?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. There are several issues being solved here.

  1. Circular references. The current code with additional_nodes_to_track has them too: Ursula -> known_nodes -> additional_nodes_to_track -> Ursula. Hence the weakrefs.

  2. This code can be called at Ursula construction time, when its full metadata is still not available. A check for the local node to be available (currently a very awkward getattr of finished_initializing) removes the need in an additional mutating call that you have to remember to invoke sometime after you create the Ursula.

  3. The local node can be mutated anytime. So we have to request the new metadata every time the state is updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these are strong enough to justify the enormous hit in readability, IMO.

weakref is something best saved for when it's truly needed, like when it's the only way out of a hairy performance bottleneck with an object whose representation is too large (in memory) to justify doing some other way.

Copy link
Contributor Author

@fjarri fjarri Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these are strong enough to justify the enormous hit in readability, IMO.

I wouldn't call two dereferences of a weakref an "enormous hit in readability". And the weakref is there only for dealing with the first point; if you get rid of it, the code will remain pretty much the same because there are still points 2 and 3 to worry about. Now if we prohibit node mutation, that will improve the readability.

weakref is something best saved for when it's truly needed, like when it's the only way out of a hairy performance bottleneck with an object whose representation is too large (in memory) to justify doing some other way.

Ursula is the main object being held by the cycle here, and it's pretty large. Of course, it mainly matters for testing, but we've already hit a similar problem not so long ago.

Personally, I believe that it is better to take a slight readability hit and use a weakref than debug a problem caused by a reference cycle half a year later.

P.S. I guess it's close to the difference in our views on immutability: your position is that you only need to use weakrefs when you absolutely must, and mine is that you only need to keep reference cycles when you absolutely must.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that this line is pretty tough to read...

if self._this_node_ref is not None and getattr(self._this_node_ref(), 'finished_initializing', False):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@KPrasch KPrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent readability improvements throughout - +1 on ArchivedStates as part of the fleet state lifecycle.

nucypher/network/nodes.py Outdated Show resolved Hide resolved
nucypher/network/nodes.py Outdated Show resolved Hide resolved
nucypher/network/server.py Outdated Show resolved Hide resolved
nucypher/network/server.py Outdated Show resolved Hide resolved
@KPrasch KPrasch mentioned this pull request Jan 11, 2021
7 tasks
@fjarri fjarri force-pushed the fleet-sensor-refactor branch 2 times, most recently from 4b140d1 to ec25522 Compare January 15, 2021 05:53
fjarri added a commit to fjarri/nucypher that referenced this pull request Jan 15, 2021
@fjarri fjarri changed the title [WIP] Fleet sensor refactor Fleet sensor refactor Jan 15, 2021
@fjarri fjarri marked this pull request as ready for review January 15, 2021 05:58
@fjarri fjarri added the Ursula 👩‍🚀 Effects the "Ursula" development area label Jan 15, 2021
nucypher/acumen/nicknames.py Show resolved Hide resolved
nucypher/acumen/perception.py Show resolved Hide resolved
nucypher/acumen/perception.py Outdated Show resolved Hide resolved
nucypher/acumen/perception.py Outdated Show resolved Hide resolved
def unpack_snapshot(data):
return FleetState.unpack_snapshot(data)

def record_fleet_state(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be some consequences to the status monitor code which will need to be updated - https://github.com/nucypher/nucypher-monitor/blob/master/monitor/crawler.py#L251. Not a huge deal, just something to note (@KPrasch )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing that, it'll have to be updated.

@KPrasch
Copy link
Member

KPrasch commented Jan 15, 2021

How many reviewers would you like on this PR?

Copy link
Member

@cygnusv cygnusv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @fjarri, as usual!

nucypher/acumen/perception.py Outdated Show resolved Hide resolved
nucypher/network/nodes.py Show resolved Hide resolved
nucypher/network/nodes.py Outdated Show resolved Hide resolved
fjarri added a commit to fjarri/nucypher that referenced this pull request Jan 28, 2021
Copy link
Member

@KPrasch KPrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fjarri, left a few comments for ya!

nucypher/acumen/perception.py Outdated Show resolved Hide resolved
nucypher/acumen/perception.py Outdated Show resolved Hide resolved
nucypher/acumen/perception.py Outdated Show resolved Hide resolved
nucypher/acumen/perception.py Outdated Show resolved Hide resolved
# Checking if the node already has a checksum address
# (it may be created later during the constructor)
# or if it mutated since the last check.
if self._this_node_ref is not None and getattr(self._this_node_ref(), 'finished_initializing', False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that this line is pretty tough to read...

if self._this_node_ref is not None and getattr(self._this_node_ref(), 'finished_initializing', False):

nucypher/acumen/perception.py Outdated Show resolved Hide resolved
nucypher/acumen/perception.py Outdated Show resolved Hide resolved
response = jsonify(payload)
return response

return_json = request.args.get('json') == 'true'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check for true explicitly here - to avoid the case of evaluating false as True? I find this to be a bit awkward, perhaps we need to consider using an alternate endpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the correct way to do it? As far as I understand, URL arguments are given to the request object as strings, casting them is the user's responsibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, alright, fair enough.

tests/integration/learning/test_fleet_state.py Outdated Show resolved Hide resolved
tests/unit/test_external_ip_utilities.py Show resolved Hide resolved
fjarri added a commit to fjarri/nucypher that referenced this pull request Feb 16, 2021
@fjarri fjarri force-pushed the fleet-sensor-refactor branch 2 times, most recently from 9f90455 to cbc204b Compare February 16, 2021 06:14
Copy link
Member

@KPrasch KPrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done @fjarri - Thanks for taking to time to think through the nuances of this PR.

response = jsonify(payload)
return response

return_json = request.args.get('json') == 'true'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, alright, fair enough.

if this_node_changed or remote_nodes_updated or remote_nodes_slashed:
# TODO: if nodes were kept in a Merkle tree,
# we'd have to only recalculate log(N) checksums.
# Is it worth it?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very interesting suggestion, perhaps worth moving off this PR for further discussion.

msg = f"Rejected node {node} because its domain is '{node.domain}' but we're only tracking '{self._domain}'"
self.log.warn(msg)

def __getitem__(self, item):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: I've deprecated this method in #2513

timestamp: maya.MayaDT,
population: int):
nickname = Nickname.from_seed(state_checksum, length=1)
self.remote_states[checksum_address] = ArchivedFleetState(checksum=state_checksum,
Copy link
Member

@KPrasch KPrasch Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 Great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ursula 👩‍🚀 Effects the "Ursula" development area
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants