Error out when pydot fails to correctly parse node names #5667

Merged: 1 commit, May 26, 2022

MridulS
Member

@MridulS MridulS commented May 26, 2022

Maybe we should error out gracefully when there is a problem converting a Python string (the name of the node in the networkx object) to a pydot Node object.

This should help out with issues like #5662, #4663

Also, this way we can probably catch other parsing errors, not just the ones involving ":".
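A minimal sketch of the idea, not necessarily what this PR implements: render the name, compare it with the input, and raise an informative error if they differ. The names `checked_name` and `lossy_render` are hypothetical, and `lossy_render` is a made-up stand-in for a converter that silently drops text after a ":"; it is not pydot's actual logic.

```python
from typing import Callable


def checked_name(name: object, render: Callable[[str], str]) -> str:
    """Return str(name) if `render` leaves it unchanged, else raise ValueError."""
    text = str(name)
    rendered = render(text)
    if rendered != text:
        raise ValueError(
            f"Node name {text!r} was changed to {rendered!r} during conversion; "
            "try wrapping it in double quotes."
        )
    return text


def lossy_render(text: str) -> str:
    # Hypothetical converter: keeps only the part before the first ':'.
    return text.split(":", 1)[0]


results = {}
for candidate in ["Hello", "a:b"]:
    try:
        results[candidate] = checked_name(candidate, lossy_render)
    except ValueError:
        results[candidate] = "Failed"

print(results)
```

Passing the renderer in as a callable keeps the check independent of any particular backend, so the same comparison could be pointed at a real converter instead of the stand-in.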

Contributor

@rossbar rossbar left a comment

I'm in favor of this as I think it's an improvement over the current behavior (i.e. unexpected results without warning).

In the longer term, I wonder if we shouldn't consider deprecating support for using pydot as a "backend" for converting networkx objects to dot format. Our test suite for nx_pydot is still failing due to pyparsing changes that have not percolated up into pydot (at least not in a way that's relevant for our test cases).

Anyways, that's a bigger discussion. IMO there'd have to be a viable, easily-installable alternative (I'm thinking wheels for pygraphviz) before considering it too seriously.

Member

@dschult dschult left a comment

I agree -- this is an improvement.
And it'd be nice if we didn't have to support pydot. But for now this is good. :}

@jarrodmillman jarrodmillman merged commit 9c29872 into networkx:main May 26, 2022
@jarrodmillman jarrodmillman added this to the networkx-2.8.3 milestone May 28, 2022
effigies added a commit to effigies/nipype that referenced this pull request Jun 6, 2022
Accommodates behavior documented in pydot/pydot#258 which causes errors
since networkx/networkx#5667.
@peterjc
Contributor

peterjc commented Jun 7, 2022

To me this is a regression. Consider the following example, inspired by your test case, using networkx 2.8.2:

$ python
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:52) 
[Clang 13.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import networkx as nx
>>> nx.__version__
'2.8.2'
>>> for x in ["Hello", "'Hello'", '"Hello"']:
...     print(f"Graph with node named {x}")
...     try:
...         nx.nx_pydot.to_pydot(nx.Graph([(x, 1)]))
...     except ValueError:
...         print("Failed")
... 
Graph with node named Hello
<pydot.Dot object at 0x10fb25be0>
Graph with node named 'Hello'
<pydot.Dot object at 0x10ff62a90>
Graph with node named "Hello"
<pydot.Dot object at 0x10ff81be0>

But with networkx 2.8.3, only node names with explicit double quotes seem to work:

>>> import networkx as nx
>>> nx.__version__
'2.8.3'
>>> for x in ["Hello", "'Hello'", '"Hello"']:
...     print(f"Graph with node named {x}")
...     try:
...         nx.nx_pydot.to_pydot(nx.Graph([(x, 1)]))
...     except ValueError:
...         print("Failed")
... 
Graph with node named Hello
Failed
Graph with node named 'Hello'
Failed
Graph with node named "Hello"
<pydot.Dot object at 0x115c87e20>

Is this intentional, or was I relying on an undocumented feature?

Edit: Update example from Hello world to use a single word Hello, avoiding the complications of a space. My real use case has checksum strings as single word node names.

Edit: July 2022 - confirming it was indeed my dropping the space which broke the test case as identified below.

@MridulS
Member Author

MridulS commented Jun 7, 2022

@peterjc I'm not able to reproduce this with 2.8.3. Which version of pydot are you using locally? The following was tested with pydot 1.4.2 and Python 3.9:

In [1]: import networkx as nx

In [2]: nx.__version__
Out[2]: '2.8.3'

In [3]: for x in ["Hello", "'Hello'", '"Hello"']:
   ...:     print(f"Graph with node named {x}")
   ...:     try:
   ...:         nx.nx_pydot.to_pydot(nx.Graph([(x, 1)]))
   ...:     except ValueError:
   ...:         print("Failed")
   ...:
Graph with node named Hello
Graph with node named 'Hello'
Failed
Graph with node named "Hello"

Just a normal string seems to work fine:

In [4]: nx.nx_pydot.to_pydot(nx.Graph([("Hello", 1)]))
Out[4]: <pydot.Dot at 0x10a5c3e80>
In [5]: nx.nx_pydot.to_pydot(nx.Graph([('Hello', 1)]))
Out[5]: <pydot.Dot at 0x10da82520>

Only the "'Hello'" name fails, because pydot treats the two kinds of quotes very differently:

In [12]: pydot_node = pydot.Node(str("'Hello'")).get_name()

In [13]: pydot_node
Out[13]: '"\'Hello\'"'

In [14]: len(pydot_node)
Out[14]: 9

In [15]: pydot_node = pydot.Node(str('"Hello"')).get_name()

In [16]: pydot_node
Out[16]: '"Hello"'

In [17]: len(pydot_node)
Out[17]: 7

@peterjc
Contributor

peterjc commented Jun 7, 2022

I'm at a different machine now, older Python and initially even older networkx:

$ python
Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import networkx as nx
>>> nx.__version__
'2.6.3'
>>> import pydot
>>> pydot.__version__
'1.4.2'
>>> for x in ["Hello", "'Hello'", '"Hello"']:
...     print(f"Graph with node named {x}")
...     try:
...         nx.nx_pydot.to_pydot(nx.Graph([(x, 1)]))
...     except ValueError:
...         print("Failed")
... 
Graph with node named Hello
<pydot.Dot object at 0x7fbe1326b220>
Graph with node named 'Hello'
<pydot.Dot object at 0x7fbe146dc700>
Graph with node named "Hello"
<pydot.Dot object at 0x7fbe146fd580>
>>> 

After updating to networkx 2.8.3:

$ python
Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import networkx as nx
>>> nx.__version__
'2.8.3'
>>> import pydot
>>> pydot.__version__
'1.4.2'
>>> import pyparsing
>>> pyparsing.__version__
'2.4.7'
>>> for x in ["Hello", "'Hello'", '"Hello"']:
...     print(f"Graph with node named {x}")
...     try:
...         nx.nx_pydot.to_pydot(nx.Graph([(x, 1)]))
...     except ValueError:
...         print("Failed")
... 
Graph with node named Hello
<pydot.Dot object at 0x7f80bfb67c40>
Graph with node named 'Hello'
Failed
Graph with node named "Hello"
<pydot.Dot object at 0x7f80c0ec6190>

Curious, not the same as before. I'll recheck on the first macOS machine another day (with Python 3.9 via conda), but the original issue came up on CircleCI under Linux so this is not macOS specific. The version of Python could be important...

Update: That was with pyparsing 2.4.7

@rossbar
Contributor

rossbar commented Jun 7, 2022

NetworkX has experienced problems with pydot for a while now, including some behavior that changed in pydot with pyparsing v3, so you might want to check the pyparsing version as well to see if that makes a difference.

Unfortunately there's only so much we can do about transitive dependencies - the goal was to raise an informative error when there was a likely pydot parsing issue, but I wouldn't be surprised if there were false positives.
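Since the behavior may depend on several packages at once, it can help to report all the relevant versions together. A small sketch using only the standard library (Python 3.8+); the helper name `installed_versions` is made up for illustration:

```python
from importlib import metadata


def installed_versions(packages=("networkx", "pydot", "pyparsing")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


for package, version in installed_versions().items():
    print(f"{package}: {version or 'not installed'}")
```

Unlike reading `__version__` from each module, this does not import the packages, so it still works when one of them is missing.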

@peterjc
Contributor

peterjc commented Jun 7, 2022

I updated pyparsing from 2.4.7 to 3.0.9 on this second macOS machine, no change:

$ python
Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyparsing
>>> pyparsing.__version__
'3.0.9'
>>> import networkx as nx
>>> nx.__version__
'2.8.3'
>>> import pydot
>>> pydot.__version__
'1.4.2'
>>> for x in ["Hello", "'Hello'", '"Hello"']:
...     print(f"Graph with node named {x}")
...     try:
...         nx.nx_pydot.to_pydot(nx.Graph([(x, 1)]))
...     except ValueError:
...         print("Failed")
... 
Graph with node named Hello
<pydot.Dot object at 0x7fd710b94e50>
Graph with node named 'Hello'
Failed
Graph with node named "Hello"
<pydot.Dot object at 0x7fd712fb8700>

I guess we need to narrow down what triggers the apparent false positive I fell over... or I just explicitly wrap my node names with double quotes just in case?

@MridulS
Member Author

MridulS commented Jun 7, 2022

I guess we need to narrow down what triggers the apparent false positive I fell over... or I just explicitly wrap my node names with double quotes just in case?

Could you share the exact string you are trying to use as a node name? If it's a normal string with no "special" characters such as :, which pydot doesn't like, I would suggest just passing it as a plain string like "hello".

@peterjc
Contributor

peterjc commented Jun 7, 2022

I'm using MD5 checksums like 29de890989becddc5e0b10ecbbc11b1a, so just normal strings of letters and digits.

So I should add double quotes when giving this to networkx? i.e. rather than:

node_name = "29de890989becddc5e0b10ecbbc11b1a"  # sometimes breaks

it is safer to use:

node_name = '"29de890989becddc5e0b10ecbbc11b1a"'  # explicit double quotes should work
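A sketch of that workaround as a helper, assuming explicitly double-quoted names pass the check as they do in the sessions above. The name `force_double_quotes` is hypothetical, and the helper does not escape double quotes embedded inside the name.

```python
def force_double_quotes(name: object) -> str:
    """Wrap str(name) in double quotes unless it is already wrapped."""
    text = str(name)
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text  # already quoted, leave untouched
    return f'"{text}"'  # note: embedded double quotes are not escaped


print(force_double_quotes("29de890989becddc5e0b10ecbbc11b1a"))
print(force_double_quotes('"Hello"'))
```

One possible way to apply it only at export time would be `nx.relabel_nodes(G, force_double_quotes)` on a copy of the graph just before calling `to_pydot`, so the original graph keeps its unquoted names.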

tomwhite added a commit to cubed-dev/cubed that referenced this pull request Jun 7, 2022
@dschult
Member

dschult commented Jun 7, 2022

Is there a reason you are using pydot instead of pygraphviz?

@peterjc
Contributor

peterjc commented Jun 8, 2022

@dschult As I recall, pygraphviz was a bigger dependency chain (esp. on Windows), but I made that choice years ago. It looks like it is now easy to install via conda-forge (including on Windows), but PyPI only has a source code archive, which could be painful to install because it needs a compiler.

In comparison, pydot is pure Python so trivial to install via conda or pip.

--

Should I open a dedicated issue on this regression (although we still need to narrow down when it is triggered)?

@MridulS
Member Author

MridulS commented Jun 8, 2022

Alright the issue here is:

In [70]: pydot.Node(str('a1')).get_name()
Out[70]: 'a1'

In [71]: pydot.Node(str('1a')).get_name()
Out[71]: '"1a"'

If the node name starts with a digit, pydot adds quotes around it, so the name it returns no longer matches the input, and this fails the new check added in this PR.
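To see which names would trip such a comparison without installing pydot, here is a toy model. `toy_get_name` is a hypothetical stand-in modelled only on the outputs shown in this thread; pydot's real quoting rules are more involved (for example, this toy also quotes purely numeric names, which is a simplification).

```python
import re

# Identifier-like names: a letter or underscore, then letters, digits, underscores.
_PLAIN_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def toy_get_name(text: str) -> str:
    """Toy model of the quoting seen in this thread (not pydot's real code)."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text  # already double-quoted: left alone
    if _PLAIN_ID.fullmatch(text):
        return text  # identifier-like: left alone
    return f'"{text}"'  # anything else gets wrapped in double quotes


def trips_check(text: str) -> bool:
    """True when the rendered name differs from the input."""
    return toy_get_name(text) != text


names = ["a1", "1a", "'Hello'", '"Hello"', "29de890989becddc5e0b10ecbbc11b1a"]
flagged = [name for name in names if trips_check(name)]
print(flagged)
```

Under this model the MD5-style name is flagged simply because it begins with a digit, which matches the false positive described above.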

@peterjc
Contributor

peterjc commented Jun 8, 2022

Thank you - a minimal test case for triggering this false positive.

(I'm not back at the first macOS machine to confirm, but I think on that particular combination of dependency versions it was broader than just names starting with an integer.)

peterjc added a commit to peterjc/thapbi-pict that referenced this pull request Jun 9, 2022
peterjc added a commit to peterjc/thapbi-pict that referenced this pull request Jun 9, 2022
MridulS added a commit to MridulS/networkx that referenced this pull request Feb 4, 2023
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024