New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up similarity.py and use dataclasses for storing state #5831
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the various comprehension/loop variable cleanups and dataclass usage, though I think one of the dataclasses is unnecessary since it only holds a single value (see inline comments). Maybe that change is beyond the scope of what you had in mind for this PR, in which case it can be addressed in followup.
Overall I approve as the proposed changes are clear improvements!
networkx/algorithms/similarity.py
Outdated
|
||
maxcost = MaxCost() | ||
maxcost = MaxCost(Cv.C.sum() + Ce.C.sum() + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this even need to be a data class? Since it's a single value, can't we just use a variable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if MaxCost needs to be used. it seems to be a one variable dataclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a way to remove it, but then we need to put in nonlocal
for the maxcost val. Python doesn't pass around variables but apparently objects are fine? Updated the code in 67db6c0
More fun python things:
In [1]: from dataclasses import dataclass
In [2]: @dataclass
...: class Test:
...: val: int
...:
In [3]: def funky_func():
...: x_val = x_val + 1
...: print(x_val)
...: return x_val
...:
In [4]: def funky_func_class():
...: x.val = x.val + 1
...: print(x.val)
...: return x.val
...:
In [5]: x = Test(1)
In [6]: funky_func_class()
2
Out[6]: 2
In [7]: x = 1
In [8]: funky_func()
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 funky_func()
Input In [3], in funky_func()
1 def funky_func():
----> 2 x_val = x_val + 1
3 print(x_val)
4 return x_val
UnboundLocalError: local variable 'x_val' referenced before assignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love that the nonlocal
is necessary, but I still prefer that to defining a dataclass with a single attribute 👍
My approval stands!
# assert matched_cost <= maxcost.value | ||
maxcost.value = min(maxcost.value, matched_cost) | ||
# assert matched_cost <= maxcost_value | ||
nonlocal maxcost_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nonlocal
is a code smell. Possibly outside the scope of this PR to address though.
@@ -187,7 +186,7 @@ def graph_edit_distance( | |||
|
|||
""" | |||
bestcost = None | |||
for vertex_path, edge_path, cost in optimize_edit_paths( | |||
for _, _, cost in optimize_edit_paths( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't necessarily think this an improvement. I would typically prefix unused loop variables with an underscore, but by removing the names entirely you've stripped out some of the context
for _, _, cost in optimize_edit_paths( | |
for _vertex_path, _edge_path, cost in optimize_edit_paths( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well this is what flake8 would complain about 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That depends on the configuration that you use. A leading underscore is a widely used convention (in many languages, not just python) for indicating an unused variable, that you still wish to explicitly name.
I'm not sure vanilla flake8 will pick this up, but flake8-bugbear
certainly will
C: ... | ||
lsa_row_ind: ... | ||
lsa_col_ind: ... | ||
ls: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we all just agree that dataclasses without type annotations are a bit daft?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add annotations once we have settled on them. Dataclasses in itself doesn't care or need annotations https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply mean in the sense that you need to have an ellipsis as a placeholder for annotations anyway. So the arguments against type annotations made in that other monster thread (namely visual clutter) fall away entirely in the case of these dataclasses.
@MridulS does the need for non-local go away if you use a namedtuple instead of a dataclass? If so that might be a nice alternative |
…x#5831) * Clean up similarity.py and use dataclasses for storing state * use nonlocal to stop using an object to store maxcost value
…x#5831) * Clean up similarity.py and use dataclasses for storing state * use nonlocal to stop using an object to store maxcost value
…x#5831) * Clean up similarity.py and use dataclasses for storing state * use nonlocal to stop using an object to store maxcost value
Made some subjective maintenance changes to similarity.py and fix #5532
I have also used dataclasses for the 2 classes used in similarity.py as the only usecase here was to store state.