v0.7.0 write_graphml changes integer node attribute type and value #796

macks22 · 2014-12-27T13:47:17Z

I have a graph with 2,146,334 nodes, each with a 'name' attribute that holds a unique integer ID. When I write this graph to a GraphML file using the write_graphml or write_graphmlz methods of the Graph instance, the 'name' attributes are given the type double and many of their values change. The code below illustrates:

import igraph

# ids: the list of 2,146,334 unique integer IDs (not shown here)
g = igraph.Graph()
len(set(ids))  # 2146334

g.add_vertices(ids)
len(set(v['name'] for v in g.vs))  # 2146334
g.vs[2146331]['name']  # 1347793

g.write_graphml('test.graphml')
g = igraph.Graph.Read_GraphML('test.graphml')

len([v['name'] for v in g.vs])  # 2146334
len(set(v['name'] for v in g.vs))  # 1114563
g.vs[2146331]['name']  # 1347790.0

The same issue occurs if I create a new attribute for the IDs rather than relying on the default 'name' attribute.

For now I am circumventing this problem by converting the integer IDs to strings, which are handled properly. However, it would be nice to have this resolved.

ntamas · 2014-12-27T22:00:21Z

We are dealing with two separate issues here. One is the fact that igraph converts integer attributes to doubles when the graph is saved to GraphML. Unfortunately this is not easy to deal with because the GraphML writer is implemented deep down in igraph's C core, and on the C level igraph distinguishes between three types of attributes only: numbers (which are stored as doubles), strings and Booleans. This means that by the time the GraphML writer function is called, the Python interface has already converted the attribute values to "regular" C doubles because this is the only way it can pass the values down to the C layer. However, if you are not tied to the GraphML format, you can simply pickle your graphs instead (using the pickle module), which preserves the exact Python type of every attribute (because the saving and loading is done in the Python layer and not in C).
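The pickle suggestion above can be sketched as follows. To keep the example runnable without igraph installed, it round-trips a plain dictionary standing in for a graph's vertex attributes (an assumption made purely for illustration), and the helper name `pickle_roundtrip` is hypothetical.

```python
import pickle


def pickle_roundtrip(obj):
    """Serialize obj to bytes with pickle and load it back."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Stand-in for vertex attributes; this is not an igraph object.
attrs = {"name": [999999, 1000005, 1347793]}
restored = pickle_roundtrip(attrs)

# Values and Python types are unchanged after the round trip.
print(restored["name"])
print([type(v).__name__ for v in restored["name"]])
```

With python-igraph installed, passing a Graph instance to the same helper should preserve attribute types in the same way, since the saving and loading happen entirely in the Python layer.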

The other problem (the attributes apparently not being loaded back properly) is more interesting, but unfortunately I can investigate it only if you upload a full, self-contained script (and most likely a corresponding GraphML file) somewhere that reproduces the error on your machine. Please post the URL here once you have such a script so I can check what is going on.

macks22 · 2014-12-28T21:01:11Z

Thank you for your prompt reply. Actually, the only thing you should need in addition to the code I posted in the comment above is the list of integer IDs. I could give this to you, but it turns out you don't actually need it.

Interestingly, the attributes only change when the integers are above a certain value. I found the break point in my original graph and then isolated the value above which errors start to occur. It seems that any integer value over 1,000,000 gets rounded to the nearest multiple of ten, using something similar to the usual decimal rounding procedure. You can replicate this with the following:

import igraph

g = igraph.Graph()
ids = list(range(999980, 1000020))
g.add_vertices(ids)
g.write_graphml('test.graphml')
tg = igraph.Graph.Read_GraphML('test.graphml')
list(zip(g.vs['name'], tg.vs['name']))  # (original, reloaded) pairs

This is the output:

[(999980, 999980.0),
 (999981, 999981.0),
 (999982, 999982.0),
 (999983, 999983.0),
 (999984, 999984.0),
 (999985, 999985.0),
 (999986, 999986.0),
 (999987, 999987.0),
 (999988, 999988.0),
 (999989, 999989.0),
 (999990, 999990.0),
 (999991, 999991.0),
 (999992, 999992.0),
 (999993, 999993.0),
 (999994, 999994.0),
 (999995, 999995.0),
 (999996, 999996.0),
 (999997, 999997.0),
 (999998, 999998.0),
 (999999, 999999.0),
 (1000000, 1000000.0),
 (1000001, 1000000.0),
 (1000002, 1000000.0),
 (1000003, 1000000.0),
 (1000004, 1000000.0),
 (1000005, 1000000.0),
 (1000006, 1000010.0),
 (1000007, 1000010.0),
 (1000008, 1000010.0),
 (1000009, 1000010.0),
 (1000010, 1000010.0),
 (1000011, 1000010.0),
 (1000012, 1000010.0),
 (1000013, 1000010.0),
 (1000014, 1000010.0),
 (1000015, 1000020.0),
 (1000016, 1000020.0),
 (1000017, 1000020.0),
 (1000018, 1000020.0),
 (1000019, 1000020.0)]

Oddly, for values ending in 5 the rounding alternates between rounding down and rounding up: 1000005 rounds down, 1000015 rounds up, and so on. This may have something to do with the floating-point rounding mode (for example, round-half-to-even).

ntamas · 2014-12-28T22:45:37Z

Okay, this is due to how the standard C library prints floats into the GraphML file. Up to 999999 we are fine because we write the number exactly into GraphML. From 1000000 the standard C library switches to scientific notation with rounding, so we get 1.00000e+06 instead, and of course the least significant digit is lost. I'll post a patch soon.
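A short sketch of the effect described above, assuming the writer formatted values with a `%g`-style conversion at the default six significant digits (the exact format string is an assumption; it is not quoted in this thread). Python's `%g` follows the same rules, so the round trip can be simulated without igraph. The helper name `roundtrip_g` is made up for this example.

```python
def roundtrip_g(value):
    """Format value with %g (six significant digits) and parse it back."""
    return float("%g" % value)


ids = range(999980, 1000020)
pairs = [(i, roundtrip_g(i)) for i in ids]

# Keep only the IDs that do not survive the round trip.
changed = [(i, r) for i, r in pairs if i != r]

print("%g" % 999999)   # fixed notation, all six digits kept
print("%g" % 1000006)  # scientific notation, seventh digit rounded away
print(changed[:3])
```

Under that assumption the simulated pairs match the table posted above, including the ties at 1000005 and 1000015, which resolve to the even sixth digit.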

(Fun fact: your code crashes my machine with zsh: illegal hardware instruction python attr_test.py - probably when we try to parse scientific notation back to a C double).

ntamas · 2014-12-29T00:10:25Z

Note to self: we are probably looking for a solution that strives to:

- represent integers using plain decimals only, without any scientific notation involved (to improve readability of the GraphML file), and
- display as many significant digits from non-integers as possible, to ensure the smallest loss of precision when a GraphML file is saved and then loaded back.
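The two goals in this note can be sketched in Python as follows. This is only an illustration and not the code of `igraph_real_fprintf_precise`; the function name `format_real` and the 2**53 cutoff for exactly representable integers are choices made for this example.

```python
def format_real(value):
    """Return a text form of value that parses back to the same double."""
    x = float(value)
    # Integral doubles below 2**53 are written as plain decimals.
    if x.is_integer() and abs(x) < 2.0 ** 53:
        return "%d" % x
    # Otherwise use repr, which gives the shortest string that
    # round-trips to the same double.
    return repr(x)


for v in (1000005, 1347793, 0.1, 1e300):
    text = format_real(v)
    print(text, float(text) == float(v))
```

Integers in the range from the report then survive a write and read cycle unchanged, and non-integers keep every digit needed to recover the original double.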

ntamas · 2014-12-29T14:15:00Z

Fixed in fdcaa14.

ntamas added the Python label Dec 27, 2014

ntamas self-assigned this Dec 27, 2014

ntamas added the C and high labels and removed the Python label Dec 28, 2014

ntamas added this to the 0.7.2 milestone Dec 28, 2014

ntamas added a commit that referenced this issue Dec 29, 2014

fdcaa14: GraphML writer now also uses igraph_real_fprintf_precise to avoid losing precision for numeric attributes, fixes #796

ntamas closed this as completed Dec 29, 2014

ntamas mentioned this issue Nov 5, 2015

write_graphml rounds float values to 4 decimal digits igraph/python-igraph#45

Closed
