Fleet sensor refactor #2352
Force-pushed from f3cd6df to c81922e.
nucypher/acumen/perception.py (outdated)
```python
# Checking if the node already has a checksum address
# (it may be created later during the constructor)
# or if it mutated since the last check.
if self._this_node_ref is not None and getattr(self._this_node_ref(), 'finished_initializing', False):
```
While I like the way this logic is evolving, these weakref gymnastics aren't really doing it for me. Is this all just to avoid having to lug around `additional_nodes_to_track`?
Not really. There are several issues being solved here.
- Circular references. The current code with `additional_nodes_to_track` has them too: `Ursula` -> `known_nodes` -> `additional_nodes_to_track` -> `Ursula`. Hence the weakrefs.
- This code can be called at `Ursula` construction time, when its full metadata is not yet available. A check for the local node's availability (currently a very awkward `getattr` of `finished_initializing`) removes the need for an additional mutating call that you have to remember to invoke sometime after you create the `Ursula`.
- The local node can be mutated at any time, so we have to request the new metadata every time the state is updated.
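The three points above can be illustrated with a minimal sketch. It is not the PR's code: `LocalNode` (a stand-in for `Ursula`), its `metadata()` method, `_this_node_if_ready()` and `this_node_changed()` are hypothetical; only `_this_node_ref` and `finished_initializing` come from the diff. The sensor holds just a weak reference to the local node (point 1), ignores the node until `finished_initializing` is set (point 2), and re-reads the metadata on every check so that mutations are noticed (point 3).

```python
import weakref
from typing import Optional


class FleetSensor:
    """Tracks the local node without keeping it alive."""

    def __init__(self, this_node: Optional["LocalNode"] = None):
        # Point 1: a weak reference avoids the cycle node -> sensor -> node.
        self._this_node_ref = weakref.ref(this_node) if this_node is not None else None
        self._last_metadata: Optional[bytes] = None

    def _this_node_if_ready(self) -> Optional["LocalNode"]:
        """Return the local node if it is alive and fully constructed."""
        if self._this_node_ref is None:
            return None
        node = self._this_node_ref()  # None once the node is garbage-collected
        # Point 2: the sensor may be used while the node is still being built.
        if node is None or not getattr(node, 'finished_initializing', False):
            return None
        return node

    def this_node_changed(self) -> bool:
        """Point 3: re-read the metadata on every update to catch mutations."""
        node = self._this_node_if_ready()
        metadata = node.metadata() if node is not None else None
        changed = metadata != self._last_metadata
        self._last_metadata = metadata
        return changed


class LocalNode:
    """Hypothetical stand-in for Ursula."""

    def __init__(self, address: str):
        # The sensor exists before the node's metadata is complete.
        self.sensor = FleetSensor(this_node=self)
        self.address = address
        self.finished_initializing = True

    def metadata(self) -> bytes:
        # Hypothetical serialized metadata: here, just the address.
        return self.address.encode()


if __name__ == "__main__":
    node = LocalNode("0xABC")
    sensor = node.sensor
    print(sensor.this_node_changed())  # True: first observation
    print(sensor.this_node_changed())  # False: nothing changed
    node.address = "0xDEF"
    print(sensor.this_node_changed())  # True: the node mutated
```

Moving the two-step dereference into a small helper like `_this_node_if_ready()` is one way the long `if` line from the diff could be made easier to read without changing its logic.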
None of these are strong enough to justify the enormous hit in readability, IMO.
`weakref` is something best saved for when it's truly needed, like when it's the only way out of a hairy performance bottleneck with an object whose representation is too large (in memory) to justify doing some other way.
> None of these are strong enough to justify the enormous hit in readability, IMO.

I wouldn't call two dereferences of a weakref an "enormous hit in readability". And the weakref is there only to deal with the first point; if you get rid of it, the code will remain pretty much the same, because there are still points 2 and 3 to worry about. Now, if we prohibited node mutation, that would improve readability.

> `weakref` is something best saved for when it's truly needed, like when it's the only way out of a hairy performance bottleneck with an object whose representation is too large (in memory) to justify doing some other way.

`Ursula` is the main object being held by the cycle here, and it's pretty large. Of course, it mainly matters for testing, but we've already hit a similar problem not so long ago.
Personally, I believe that it is better to take a slight readability hit and use a weakref than debug a problem caused by a reference cycle half a year later.
P.S. I guess it's close to the difference in our views on immutability: your position is that you only need to use weakrefs when you absolutely must, and mine is that you only need to keep reference cycles when you absolutely must.
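The trade-off debated above can be observed directly in CPython: an object caught in a reference cycle is not freed when its last external reference disappears, only when the cyclic garbage collector eventually runs, whereas a weak back-reference lets plain reference counting free it immediately. A minimal sketch, with hypothetical `Node` and `Tracker` classes standing in for `Ursula` and `known_nodes` (this relies on CPython's reference counting; other interpreters may behave differently):

```python
import gc
import weakref


class Tracker:
    """Hypothetical stand-in for known_nodes holding a back-reference."""

    def __init__(self, owner, weak: bool):
        # A strong back-reference closes a cycle; a weak one does not.
        self.owner = weakref.ref(owner) if weak else owner


class Node:
    """Hypothetical stand-in for Ursula."""

    def __init__(self, weak: bool):
        self.tracker = Tracker(self, weak=weak)


def freed_by_refcounting(weak: bool) -> bool:
    """Drop the only external reference to a Node and report whether it was
    freed immediately, without help from the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from running mid-demo
    try:
        node = Node(weak=weak)
        probe = weakref.ref(node)  # observes the node without keeping it alive
        del node
        freed = probe() is None
    finally:
        if was_enabled:
            gc.enable()
    gc.collect()  # reclaim the cycle, if one was created
    return freed


print(freed_by_refcounting(weak=False))  # False: the cycle keeps it alive
print(freed_by_refcounting(weak=True))   # True: freed as soon as `del` runs
```

With the strong back-reference, the large object lingers until a collection happens to run, which is the kind of delayed, hard-to-trace memory retention the weakref is meant to rule out.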
I have to admit that this line is pretty tough to read...
```python
if self._this_node_ref is not None and getattr(self._this_node_ref(), 'finished_initializing', False):
```
Excellent readability improvements throughout - +1 on `ArchivedStates` as part of the fleet state lifecycle.
Force-pushed from 4b140d1 to ec25522.
nucypher/acumen/perception.py (outdated)
```python
def unpack_snapshot(data):
    return FleetState.unpack_snapshot(data)

def record_fleet_state(self):
```
There will be some consequences for the status monitor code, which will need to be updated - https://github.com/nucypher/nucypher-monitor/blob/master/monitor/crawler.py#L251. Not a huge deal, just something to note (@KPrasch).
Thanks for noticing that, it'll have to be updated.
How many reviewers would you like on this PR?
Great work @fjarri, as usual!
Force-pushed from 4d121f0 to f49b26e.
Force-pushed from f49b26e to 08d7ce5.
Thanks @fjarri, left a few comments for ya!
```python
response = jsonify(payload)
return response

return_json = request.args.get('json') == 'true'
```
Why check for `true` explicitly here? To avoid evaluating `false` as `True`? I find this a bit awkward; perhaps we should consider using an alternate endpoint?
Isn't that the correct way to do it? As far as I understand, URL arguments are given to the request object as strings; casting them is the user's responsibility.
Yeah, alright, fair enough.
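The reason for the explicit comparison: query-string values arrive as strings, and any non-empty string is truthy, so `bool('false')` is `True`. A stdlib-only sketch that mirrors `request.args.get('json') == 'true'`, using `urllib.parse.parse_qs` instead of Flask's `request.args` so it runs without a server (`json_flag` is a hypothetical helper name):

```python
from urllib.parse import parse_qs


def json_flag(query_string: str) -> bool:
    """Mirror `request.args.get('json') == 'true'` on a raw query string."""
    # parse_qs maps each key to a list of string values; like
    # request.args.get(), take the first occurrence.
    values = parse_qs(query_string).get('json')
    return values is not None and values[0] == 'true'


print(bool('false'))            # True: why a plain truthiness check fails
print(json_flag('json=true'))   # True
print(json_flag('json=false'))  # False
print(json_flag(''))            # False: parameter absent
```

Within Flask itself, Werkzeug's `MultiDict.get()` also accepts a `type` callable, so `request.args.get('json', default=False, type=lambda v: v == 'true')` would be an equivalent way to express the same conversion.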
Force-pushed from 9f90455 to cbc204b.
Force-pushed from 575da0a to 37929b3.
Well done @fjarri! Thanks for taking the time to think through the nuances of this PR.
```python
if this_node_changed or remote_nodes_updated or remote_nodes_slashed:
    # TODO: if nodes were kept in a Merkle tree,
    # we'd have to only recalculate log(N) checksums.
    # Is it worth it?
```
This is a very interesting suggestion, perhaps worth moving off this PR for further discussion.
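The TODO's idea can be made concrete: if per-node checksums were the leaves of a binary hash tree, changing one node would require rehashing only its leaf-to-root path, about log2(N) hashes, instead of rebuilding everything. A minimal sketch, assuming SHA-256 from `hashlib` and a leaf count padded to a power of two (neither is taken from `FleetSensor`; `MerkleTree` is a hypothetical class):

```python
import hashlib
from typing import List


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Binary hash tree stored as a flat list.

    Index 1 is the root, the children of index i are 2*i and 2*i + 1,
    and the leaves occupy indices [size, 2*size).
    """

    def __init__(self, leaves: List[bytes]):
        size = 1
        while size < len(leaves):
            size *= 2  # pad the leaf count to a power of two
        self.size = size
        self.nodes = [b''] * (2 * size)  # padding leaves stay b''
        for i, leaf in enumerate(leaves):
            self.nodes[size + i] = _hash(leaf)
        for i in range(size - 1, 0, -1):  # build internal nodes bottom-up
            self.nodes[i] = _hash(self.nodes[2 * i] + self.nodes[2 * i + 1])

    @property
    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, leaf: bytes) -> int:
        """Replace one leaf, rehash its path to the root, return the hash count."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        i = self.size + index
        self.nodes[i] = _hash(leaf)
        hashes = 1
        i //= 2
        while i >= 1:  # one rehash per level: log2(size) of them
            self.nodes[i] = _hash(self.nodes[2 * i] + self.nodes[2 * i + 1])
            hashes += 1
            i //= 2
        return hashes


tree = MerkleTree([f"node-{n}".encode() for n in range(1024)])
print(tree.update(5, b"mutated"))  # 11: one leaf plus 10 levels
```

For 1024 nodes, the update costs 11 hashes, versus 2047 to rebuild the tree from scratch; whether that saving matters at realistic fleet sizes is exactly the open question in the TODO.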
```python
    msg = f"Rejected node {node} because its domain is '{node.domain}' but we're only tracking '{self._domain}'"
    self.log.warn(msg)

def __getitem__(self, item):
```
Just a note: I've deprecated this method in #2513
```python
                           timestamp: maya.MayaDT,
                           population: int):
    nickname = Nickname.from_seed(state_checksum, length=1)
    self.remote_states[checksum_address] = ArchivedFleetState(checksum=state_checksum,
```
👍🏻 Great.
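Archived states keep only the checksum, nickname and population (plus, judging from the diff, a timestamp) rather than references to the nodes themselves. A minimal sketch of such a record and of a per-node `remote_states` mapping, assuming a frozen dataclass, `datetime` in place of `maya.MayaDT` to stay dependency-free, and hypothetical `nickname_from_checksum()`, `RemoteStates` and `record_remote_fleet_state()` names (the real `Nickname.from_seed()` behavior is not reproduced):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class ArchivedFleetState:
    """Immutable summary of a fleet state; holds no node references."""
    checksum: str
    nickname: str
    timestamp: datetime
    population: int


def nickname_from_checksum(checksum: str) -> str:
    # Hypothetical stand-in for Nickname.from_seed(): any deterministic,
    # human-readable label derived from the checksum would serve.
    return f"state-{checksum[:6]}"


class RemoteStates:
    """Hypothetical holder for the latest state reported by each remote node."""

    def __init__(self) -> None:
        self.remote_states: Dict[str, ArchivedFleetState] = {}

    def record_remote_fleet_state(self, checksum_address: str, state_checksum: str,
                                  timestamp: datetime, population: int) -> None:
        # Overwrites any earlier report from the same node.
        self.remote_states[checksum_address] = ArchivedFleetState(
            checksum=state_checksum,
            nickname=nickname_from_checksum(state_checksum),
            timestamp=timestamp,
            population=population,
        )
```

Because the records are frozen and carry only plain values, keeping many of them around cannot keep any node objects alive, which fits the reference-cycle concerns raised earlier in the review.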
The goal of this PR is to make interactions with `FleetSensor` consistent and to avoid exposing implementation details or carrying around ad-hoc states (the `fleet_state_*` attributes). More specifically:
- `FleetState` is made into a real class. It encapsulates incremental updates and a collection-like interface to iterate over the state's nodes. It is exposed from `FleetSensor` as `current_state`.
- `FleetSensor` itself manages the buffers with new/deleted nodes and a list of previous states.
- `ArchivedFleetState` instances that do not retain references to actual nodes (since we don't use them anyway), only checksum/nickname/population.
- `record_fleet_state()` in order for the state to get updated. Currently there are multiple cases of accessing the node list without an explicit update.
- `FleetSensor.remote_states` instead of in `fleet_state_*` attributes.
- `abridged_*` methods are renamed to better represent their purpose. The former `abridged_node_details()` (now `status_info()`) is now used for both the JSON and HTML `/status` endpoint outputs.

Rough edges and possible improvements:
- `FleetSensor` to `FleetSensor.current_state`.
- `record_fleet_state()` call is to automatically update the state (if there are new nodes) whenever `current_state` is accessed.
- `FLEET_STATES_MATCH` along with the fleet state. That seems excessive, but changing that without proper protocol versioning will be a problem.

Note: if this PR is merged, https://github.com/nucypher/nucypher-monitor/blob/master/monitor/crawler.py will need to be updated.
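The division of labor described in the PR description (an immutable `FleetState` with incremental updates and a collection-like interface, exposed as `current_state`, and a `FleetSensor` that buffers new and deleted nodes until `record_fleet_state()` is called, archiving the previous state) might look roughly like the sketch below. It is not the PR's code: representing nodes as an address-to-metadata mapping, the SHA-256 checksum, the simplified two-field `ArchivedFleetState`, and the names `with_updates()`, `archived()`, `record_node()`, `forget_node()` and `previous_states` are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterator, List, Mapping, Set


@dataclass(frozen=True)
class ArchivedFleetState:
    """Simplified summary of a past state; keeps no references to nodes."""
    checksum: str
    population: int


class FleetState:
    """Immutable snapshot of known nodes, as a mapping address -> metadata."""

    def __init__(self, nodes: Mapping[str, bytes]):
        self._nodes = dict(nodes)  # private copy, never mutated afterwards
        # Hypothetical checksum: SHA-256 over the entries in sorted order.
        digest = hashlib.sha256()
        for address in sorted(self._nodes):
            digest.update(repr((address, self._nodes[address])).encode())
        self.checksum = digest.hexdigest()

    def with_updates(self, new: Mapping[str, bytes], deleted: Set[str]) -> "FleetState":
        """Return a new state with `new` added/replaced and `deleted` removed."""
        nodes = {a: m for a, m in self._nodes.items() if a not in deleted}
        nodes.update(new)
        return FleetState(nodes)

    def archived(self) -> ArchivedFleetState:
        return ArchivedFleetState(checksum=self.checksum, population=len(self))

    # Collection-like interface over the state's nodes.
    def __len__(self) -> int:
        return len(self._nodes)

    def __iter__(self) -> Iterator[str]:
        return iter(sorted(self._nodes))

    def __contains__(self, address: object) -> bool:
        return address in self._nodes


class FleetSensor:
    """Buffers node changes and applies them only in record_fleet_state()."""

    def __init__(self) -> None:
        self.current_state = FleetState({})
        self.previous_states: List[ArchivedFleetState] = []
        self._new: Dict[str, bytes] = {}
        self._deleted: Set[str] = set()

    def record_node(self, address: str, metadata: bytes) -> None:
        self._deleted.discard(address)
        self._new[address] = metadata

    def forget_node(self, address: str) -> None:
        self._new.pop(address, None)
        self._deleted.add(address)

    def record_fleet_state(self) -> None:
        """Apply buffered changes, archiving the old state if the checksum changed."""
        if not self._new and not self._deleted:
            return
        new_state = self.current_state.with_updates(self._new, self._deleted)
        self._new.clear()
        self._deleted.clear()
        if new_state.checksum != self.current_state.checksum:
            self.previous_states.append(self.current_state.archived())
            self.current_state = new_state


sensor = FleetSensor()
sensor.record_node("0xA", b"a")
sensor.record_node("0xB", b"b")
print(len(sensor.current_state))  # 0: changes stay buffered until recorded
sensor.record_fleet_state()
print(len(sensor.current_state), len(sensor.previous_states))  # 2 1
```

The sketch keeps the explicit `record_fleet_state()` call; the alternative listed under rough edges could be approximated by turning `current_state` into a property that calls `record_fleet_state()` before returning the state.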