
Clean up similarity.py and use dataclasses for storing state #5831

Merged: 2 commits merged into networkx:main on Feb 28, 2023


@MridulS (Member) commented Jun 29, 2022

Made some subjective maintenance changes to similarity.py; fixes #5532.

I have also switched the two classes in similarity.py to dataclasses, since their only use case here is storing state.

@rossbar (Contributor) left a comment

+1 for the various comprehension/loop-variable cleanups and the dataclass usage, though I think one of the dataclasses is unnecessary since it only holds a single value (see inline comments). Maybe that change is beyond the scope of what you had in mind for this PR, in which case it can be addressed in a follow-up.

Overall I approve as the proposed changes are clear improvements!

networkx/algorithms/similarity.py

-maxcost = MaxCost()
+maxcost = MaxCost(Cv.C.sum() + Ce.C.sum() + 1)
Contributor

Does this even need to be a data class? Since it's a single value, can't we just use a variable here?

Member

I also wonder whether MaxCost needs to be used at all. It seems to be a single-variable dataclass.

Member Author

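There is a way to remove it, but then we need a nonlocal declaration for the maxcost value. Assigning to a name inside a nested function makes that name local to the nested function, whereas mutating an attribute of an enclosing object doesn't rebind anything, which is why the object version worked without it. Updated the code in 67db6c0.

More fun Python things: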

In [1]: from dataclasses import dataclass

In [2]: @dataclass
   ...: class Test:
   ...:     val: int
   ...:

In [3]: def funky_func():
   ...:     x_val = x_val + 1
   ...:     print(x_val)
   ...:     return x_val
   ...:

In [4]: def funky_func_class():
   ...:     x.val = x.val + 1
   ...:     print(x.val)
   ...:     return x.val
   ...:

In [5]: x = Test(1)

In [6]: funky_func_class()
2
Out[6]: 2

In [7]: x = 1

In [8]: funky_func()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 funky_func()

Input In [3], in funky_func()
      1 def funky_func():
----> 2     x_val = x_val + 1
      3     print(x_val)
      4     return x_val

UnboundLocalError: local variable 'x_val' referenced before assignment
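The transcript above shows the rule with module-level names. The PR deals with the same rule for names in an enclosing function, so here is a minimal sketch of that case. All names (Holder, with_holder, with_nonlocal, without_declaration) are hypothetical and not part of NetworkX: mutating an attribute of an enclosing object needs no declaration, rebinding an enclosing name needs nonlocal, and rebinding without it fails only when the inner function runs.

```python
from dataclasses import dataclass


@dataclass
class Holder:
    # Hypothetical single-field state holder, in the spirit of MaxCost
    value: float


def with_holder(costs):
    best = Holder(float("inf"))

    def update(cost):
        # Mutates an attribute of the enclosing object; the name 'best'
        # itself is never rebound, so no declaration is needed
        best.value = min(best.value, cost)

    for cost in costs:
        update(cost)
    return best.value


def with_nonlocal(costs):
    best = float("inf")

    def update(cost):
        # Rebinding a name from the enclosing function needs an explicit declaration
        nonlocal best
        best = min(best, cost)

    for cost in costs:
        update(cost)
    return best


def without_declaration(costs):
    best = float("inf")

    def update(cost):
        # The assignment makes 'best' local to update(), so reading it on the
        # right-hand side raises UnboundLocalError once update() is called
        best = min(best, cost)

    for cost in costs:
        update(cost)
    return best


print(with_holder([5.0, 3.0, 7.0]))    # 3.0
print(with_nonlocal([5.0, 3.0, 7.0]))  # 3.0
try:
    without_declaration([5.0])
except UnboundLocalError as exc:
    print(type(exc).__name__)          # UnboundLocalError
```

Both working variants compute the same result; the thread went with the nonlocal version to avoid a one-field dataclass.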

Contributor

I don't love that the nonlocal is necessary, but I still prefer that to defining a dataclass with a single attribute 👍

My approval stands!

@jarrodmillman added this to the networkx-3.0 milestone Jul 15, 2022
-# assert matched_cost <= maxcost.value
-maxcost.value = min(maxcost.value, matched_cost)
+# assert matched_cost <= maxcost_value
+nonlocal maxcost_value
Contributor

nonlocal is a code smell. Possibly outside the scope of this PR to address though.

@@ -187,7 +186,7 @@ def graph_edit_distance(

"""
bestcost = None
-for vertex_path, edge_path, cost in optimize_edit_paths(
+for _, _, cost in optimize_edit_paths(
Contributor

I don't necessarily think this is an improvement. I would typically prefix unused loop variables with an underscore, but by removing the names entirely you've stripped out some of the context.

Suggested change
-for _, _, cost in optimize_edit_paths(
+for _vertex_path, _edge_path, cost in optimize_edit_paths(

Member Author

Well this is what flake8 would complain about 🙃

Contributor

That depends on the configuration you use. A leading underscore is a widely used convention (in many languages, not just Python) for marking a variable as unused while still naming it explicitly.

I'm not sure vanilla flake8 will pick this up, but flake8-bugbear certainly will.
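A small sketch of the suggested style, using a hypothetical generator edit_paths() in place of optimize_edit_paths (the yielded triples are made up). The bugbear check in question is presumably B007 ("loop control variable not used within the loop body"), which flags unused loop variables such as vertex_path but accepts names that start with an underscore, so the suggested change should keep the linter quiet without discarding the names.

```python
def edit_paths():
    # Hypothetical stand-in for optimize_edit_paths: yields
    # (vertex_path, edge_path, cost) triples; the values are made up
    yield [(0, 0)], [], 3.0
    yield [(0, 1)], [(None, None)], 1.0
    yield [(1, 0)], [], 2.0


def lowest_cost():
    bestcost = None
    # The leading underscores mark the first two values as unused while
    # still documenting what each position of the triple holds
    for _vertex_path, _edge_path, cost in edit_paths():
        if bestcost is None or cost < bestcost:
            bestcost = cost
    return bestcost


print(lowest_cost())  # 1.0
```

Bare underscores (for _, _, cost in ...) behave identically at runtime; the difference is purely in how much the loop header tells the reader.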

Comment on lines +676 to +679
C: ...
lsa_row_ind: ...
lsa_col_ind: ...
ls: ...
Contributor

Can we all just agree that dataclasses without type annotations are a bit daft?

Member Author

We can add annotations once we have settled on them. The dataclass decorator itself doesn't care what the annotations are: https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass

Contributor

I simply mean that you need an ellipsis as a placeholder annotation anyway, so the argument against type annotations made in that other monster thread (namely, visual clutter) falls away entirely for these dataclasses.
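A quick way to see both points in this exchange: dataclass() only turns a class attribute into a field if it carries an annotation, but apart from a couple of special cases (ClassVar and InitVar) it does not look at what the annotation is, so Ellipsis works as a placeholder. The class names State and Unannotated are hypothetical; the field names are the ones from the snippet above.

```python
from dataclasses import dataclass, fields


@dataclass
class State:
    # Hypothetical class name; field names copied from the PR snippet.
    # Ellipsis is a valid (if uninformative) annotation.
    C: ...
    lsa_row_ind: ...
    lsa_col_ind: ...
    ls: ...


@dataclass
class Unannotated:
    # No annotation: this stays an ordinary class attribute, not a field
    C = None


print([f.name for f in fields(State)])  # ['C', 'lsa_row_ind', 'lsa_col_ind', 'ls']
print(fields(Unannotated))              # ()
print(State(1, 2, 3, 4))                # State(C=1, lsa_row_ind=2, lsa_col_ind=3, ls=4)
```

Since a line per field is needed either way, replacing each ... with a real type adds no extra lines.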

@rossbar (Contributor) commented Sep 13, 2022

@MridulS, does the need for nonlocal go away if you use a namedtuple instead of a dataclass? If so, that might be a nice alternative.
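A minimal sketch suggests a namedtuple would not remove the need: namedtuple instances are immutable, so the field cannot be updated in place, and _replace() returns a new tuple that has to be rebound to the enclosing name, which again requires nonlocal. MaxCostRecord and track_min are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical one-field record standing in for MaxCost
MaxCostRecord = namedtuple("MaxCostRecord", ["value"])


def track_min(costs):
    maxcost = MaxCostRecord(float("inf"))

    def update(cost):
        # _replace() builds a new tuple instead of modifying the old one,
        # so the enclosing name must be rebound, which needs nonlocal
        nonlocal maxcost
        maxcost = maxcost._replace(value=min(maxcost.value, cost))

    for cost in costs:
        update(cost)
    return maxcost.value


record = MaxCostRecord(1.0)
try:
    # Fields of a namedtuple are read-only, so in-place mutation is not an option
    record.value = 0.0
except AttributeError:
    print("read-only")

print(track_min([5.0, 3.0, 7.0]))  # 3.0
```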

@rossbar modified the milestones: networkx-3.0, networkx-3.1 Nov 29, 2022
@rossbar merged commit e6b0062 into networkx:main Feb 28, 2023
Alex-Markham pushed a commit to Alex-Markham/networkx that referenced this pull request Oct 13, 2023
…x#5831)

* Clean up similarity.py and use dataclasses for storing state

* use nonlocal to stop using an object to store maxcost value
dschult pushed a commit to BrunoBaldissera/networkx that referenced this pull request Oct 23, 2023
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
Development

Successfully merging this pull request may close these issues.

Update code in similarity.py
5 participants