1.3 Specifications #13
|
Thank you @mbastian for moving this forward. Off the top of my head, I have two issues with the GEXF format. I am sharing them here without any opinion on whether they should be included in the new spec; this is just what I have experienced.
viz attributes' origin
In my use cases, 90% of the time viz parameters are actually derived from attributes. To sum up, the purpose of such a feature is to help draw legends in any GEXF visualization. I am aware that this feature would probably require changing some internal Gephi features too.
optional ID for edges
A long time ago I stumbled upon an issue with the networkx implementation of GEXF: networkx/networkx#1296
These are rough comments that would need more in-depth work. I would be glad to contribute in these directions if the maintainers and the community think there might be a place or need for it. |
My pleasure, long overdue!
Very interesting idea, thanks for sharing. I had not appreciated how important it might be to connect the viz attributes to the element attributes, and what it could enable. Having the ability to eventually draw a legend just based on the GEXF seems like an attractive idea indeed. I would love to hear more opinions on how we might implement this. An "origin" attribute could indeed be appropriate. I have a bit of trouble following the rest of your suggestion, so maybe you can paste some examples of how you would see this implemented.
Makes a lot of sense. In fact, the GEXF importer in Gephi doesn't even throw a warning if you omit the edge ids. It creates them dynamically if they are missing. The processor also seems to rely on |
|
Hello @mbastian, one specific thing I don't completely understand about GEXF (it may just be that I did not read the specification as clearly stating this): is there an enforced order of tags in a GEXF file? What I mean is, is it stated somewhere that the model must be declared before the nodes, and then the edges? I am asking because I have already found GEXF files in the wild where the order was shuffled, and I remember this breaking Gephi, for instance (this might not be the case anymore). One other pain point, off the top of my head, was the confusion sometimes between the |
Sorry what do you mean by model here?
Makes sense. Normally the title shouldn't be used at all for a reference. But to address this problem, I propose an alternative way of representing attribute values, which I think would be more JSON friendly: In 1.2, we do require attributes to be defined: In 1.3, we could support an alternative way that would omit the attributes and simply list the id+type in the What do you think? For the parsers it most likely wouldn't be a large change and it would be a lot more JSON friendly. |
By model I mean the attribute definitions. What I mean is that implicitly a GEXF file should be ordered thusly: Which makes sense, especially if you need to stream the XML file for some reason. But I am unsure whether this order is enforced by the specs. And I have already seen weird things in the wild such as: for instance, produced by some XML writers that convert unordered key-value structures. I think this order was breaking Gephi import at some point (it might still be the case).
Does this mean the attributes declaration at the top of the file would not be mandatory anymore? In that case it sounds like a bad idea, especially for parsers that need to allocate a static amount of memory beforehand based on knowledge of the attributes, no? |
Good point, as far as I can see the nodes and edges order is enforced but attributes not given that it's in a different
Yes that's what I had in mind. How bad is it really? The only difference would be to allocate when you first see a new id versus at the beginning. For which parser do you think that could be an issue? |
I do not know of or use such a parser currently, I think, but any low-level language parser that defines some kind of static-size struct for nodes and edges based on the attributes declaration would probably have issues with attributes now being defined on the fly while perusing nodes or edges (what's more, we can imagine some attributes not existing on the first nodes but on subsequent ones, if the attribute can be undefined in the source data/language representation, such as can be the case with
Another argument could be the added complexity of parser implementations, because you now have to consider two different methods of attribute declaration.
Another question would be: what should happen if I have a node:
<node id="42" label="node A">
  <attvalues>
    <attvalue id="url" type="string" value="http://gephi.org"/>
  </attvalues>
</node>
and another one:
<node id="43" label="node B">
  <attvalues>
    <attvalue id="url" type="double" value="4.5"/>
  </attvalues>
</node>
with different types for the same attribute? Should this be OK/tolerated? Should this raise some kind of validation error? |
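To make the question concrete, here is a minimal Python sketch of how a parser might detect the conflict in the inline `id`+`type` form discussed above. The function name `find_type_conflicts` is hypothetical, the sample omits the GEXF namespace for brevity, and nothing here reflects how Gephi or any existing parser behaves.

```python
import xml.etree.ElementTree as ET

def find_type_conflicts(xml_text):
    """Return {attribute id: sorted types} for ids declared with more than one type."""
    root = ET.fromstring(xml_text)
    seen = {}
    for attvalue in root.iter("attvalue"):
        attr_id = attvalue.get("id")
        attr_type = attvalue.get("type")
        if attr_id is None or attr_type is None:
            continue  # nothing to compare for this element
        seen.setdefault(attr_id, set()).add(attr_type)
    return {attr_id: sorted(types) for attr_id, types in seen.items() if len(types) > 1}

# The two nodes from the comment above, wrapped in a <nodes> element.
SAMPLE = """
<nodes>
  <node id="42" label="node A">
    <attvalues>
      <attvalue id="url" type="string" value="http://gephi.org"/>
    </attvalues>
  </node>
  <node id="43" label="node B">
    <attvalues>
      <attvalue id="url" type="double" value="4.5"/>
    </attvalues>
  </node>
</nodes>
"""

print(find_type_conflicts(SAMPLE))  # {'url': ['double', 'string']}
```

A strict parser could raise as soon as the returned mapping is non-empty; a tolerant one could only warn. Either way, every inline declaration has to be compared against the ones seen before it, which is part of the extra parser work raised above.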
|
Thanks @Yomguithereal !
That's right. Let's leave this out for 1.3 version then to avoid overcomplexifying the parsers. It was a nice to have anyway. |
I think it should be a plain error. Also, although defining the attr type on the fly is possible, I think it is better to be consistent, you know, KISS.
Defining the nodes at the beginning has one big practical benefit: identifying errors faster. For example, I have often captured errors in some network datasets with ties to undeclared nodes. This may not be as important in small networks, but if you are analyzing a very large file, it could have some performance benefits. I imagine the parser processing an edge, and before continuing, checking against a hash table (not sure how the parser of Gephi is implemented) and making sure the nodes were declared; if not, then throw an error. |
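The check imagined here could look roughly like the following Python sketch. It assumes nodes appear before edges in the stream, uses a plain `set` as the hash table, and omits the GEXF namespace; `check_edges_streaming` is a hypothetical name and this is not how the Gephi importer is implemented.

```python
import io
import xml.etree.ElementTree as ET

def check_edges_streaming(xml_text):
    """Stream the document and fail on the first edge that refers to an undeclared node."""
    declared = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start" events are enough because only the element attributes are needed.
    for _event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag == "node":
            declared.add(elem.get("id"))
        elif elem.tag == "edge":
            for endpoint in (elem.get("source"), elem.get("target")):
                if endpoint not in declared:
                    raise ValueError(
                        f"edge {elem.get('id')!r} refers to undeclared node {endpoint!r}"
                    )
    return len(declared)

GOOD = """<graph><nodes><node id="a"/><node id="b"/></nodes>
<edges><edge id="0" source="a" target="b"/></edges></graph>"""

BAD = """<graph><nodes><node id="a"/><node id="b"/></nodes>
<edges><edge id="0" source="a" target="c"/></edges></graph>"""

print(check_edges_streaming(GOOD))  # 2
try:
    check_edges_streaming(BAD)
except ValueError as error:
    print(error)
```

Because the node set is complete before the first edge is read, the error surfaces at the offending edge instead of after the whole file has been loaded.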
|
@gvegayon Agreed. I changed the specs to make sure the order is enforced: attributes -> nodes -> edges. |
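For tools that want to check this order without running full schema validation, a small Python sketch follows. It only looks at the direct children of a `<graph>` element, ignores the GEXF namespace, and the helper name `children_in_order` is an assumption; the schema remains the authority on what is valid.

```python
import xml.etree.ElementTree as ET

EXPECTED_ORDER = ["attributes", "nodes", "edges"]

def children_in_order(graph_xml):
    """Return True if <attributes>, <nodes> and <edges> appear in that relative order."""
    graph = ET.fromstring(graph_xml)
    ranks = [
        EXPECTED_ORDER.index(child.tag)
        for child in graph
        if child.tag in EXPECTED_ORDER
    ]
    # Repeated <attributes> blocks share one rank, so they stay valid.
    return ranks == sorted(ranks)

ORDERED = "<graph><attributes/><attributes/><nodes/><edges/></graph>"
SHUFFLED = "<graph><edges/><attributes/><nodes/></graph>"

print(children_in_order(ORDERED))   # True
print(children_in_order(SHUFFLED))  # False
```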
|
After today's discussion over Zoom and the latest tweaks, I'm confident the specification and the primer are ready to be shipped. We can make some fixes in the documentation later. |
Attempt to create clean 1.3 specifications. A changelog is provided, but for discussion purposes I'll post it here too:
Notes:
Possible improvements
Changelog
- kind attribute on edge to support multi-graph (i.e. parallel edges)
- weight is now a double instead of a float
- xsd:long as possible idtype on <graph>
- bigdecimal, biginteger, char, short and byte
- listboolean or listinteger for each atomic type
Dynamics
- timezone attribute on <graph> to use as a timezone in case it's omitted in the element timestamps
- startopen and endopen are removed. Use regular inclusive start and end instead
- mode, start and end attributes on <attributes> as it was redundant with <graph> attributes
Timestamp support
Add the ability to represent time with single timestamps instead of intervals. We want feature parity between the two time representations but note they can't be mixed.
- timerepresentation enum in <graph> with either interval (default) or timestamp to configure the way the time is represented
- timestamp attribute to <node>, <edge>, <spell> and <attvalue> to support this new time representation
Alternative to spell elements
- timestamps attribute to <node> and <edge> to represent a list of timestamps without having to use spells
- intervals attribute to <node> and <edge>
New slice mode
The optional mode attribute on <graph> now has an additional slice value, in addition to static and dynamic. With slice, the expectation is that the <graph> also has either a timestamp or start/end intervals.
- timestamp attribute on <graph> to characterise the slice this graph represents
- start and end attributes on <graph> to either characterise the slice instead of the time bounds, which should rather be inferred
Viz
- hex attribute on <color> so it can support values like #FF00FF
- z position is no longer required
- start, end or child elements <spells> are no longer supported for viz attributes. To represent viz attributes over time, an alternative is to create multiple graphs each representing a slice
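As an illustration of the hex item in the Viz list, a reader might accept both colour forms with something like the Python sketch below. It assumes the existing r, g and b attributes remain available alongside hex and that hex uses the six-digit form shown in the changelog; `color_to_rgb` is a hypothetical helper, not part of any GEXF library.

```python
def color_to_rgb(attrs):
    """Return an (r, g, b) tuple from the attributes of a <color> element."""
    hex_value = attrs.get("hex")
    if hex_value is not None:
        digits = hex_value[1:] if hex_value.startswith("#") else hex_value
        if len(digits) != 6:
            # Assumption: only the six-digit form (#RRGGBB) is handled here.
            raise ValueError(f"unsupported hex colour: {hex_value!r}")
        return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
    # Fall back to the separate r, g and b attributes.
    return (int(attrs["r"]), int(attrs["g"]), int(attrs["b"]))

print(color_to_rgb({"hex": "#FF00FF"}))                  # (255, 0, 255)
print(color_to_rgb({"r": "255", "g": "0", "b": "255"}))  # (255, 0, 255)
```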