1.3.1 legend related scales in viz + title in meta #18

Open
wants to merge 11 commits into
base: master

@paulgirard paulgirard commented Aug 31, 2022

This 1.3.1 proposal adds data to GEXF that documents a graph drawing: a legend and a title.
So far, GEXF viewer tools (such as https://gitlab.com/ouestware/retina/) cannot indicate the logic behind the visual aspects without reverse engineering the viz parameters and node attribute sets with some heuristic magic.
A good graph drawing also often starts with a good title, on top of the description.

Therefore this proposal has two main parts:

  • add a title element in the GEXF meta
  • extend the viz module with ways to describe how the viz parameters were calculated from node/edge attributes. It adds ways to store the ranking/partition parameters and layout settings used in Gephi or in other GEXF producers. The primary objective is documentation, but the current specs look complete enough to let drawing tools not only draw a legend but also recompute the viz parameters from the attributes.

To get a more precise idea of the proposal see:

ping @duncdrum and @gvegayon for comments.

Ideas to be discussed before writing proper RELAX NG
a workaround for a multiple-include issue with common in viz and gexf
- use a common rnc to share type declarations in gexf and viz
- legend/scale features in viz
- one example to test validation
@mbastian mbastian linked an issue Sep 1, 2022 that may be closed by this pull request
@paulgirard
Author

In the layout documentation, the current specs use a layoutalgorithm attribute, which is a string.
This supposes that GEXF-related tools agree on a naming convention for layout algorithms, covering not only the algorithm name but also its parameters.
Since the primary objective is only documentation, that is probably fine. Recomputing the layout would require recognizing layout algorithm and parameter names...

Member

@jacomyma jacomyma left a comment

Nice, I just have 2 observations.

Scales

Two systems coexist for the scale factors (splines):

  • 10 data points
  • the name (ex: "square-root")

When it comes to generating the scale from a device, I assume that neither the name nor the 10 data points are an issue. If the goal were just to document, we could reasonably stop there. But one of the motives for the proposal is to use these data as inputs for some tools (GEXF viewers). Let us then consider the reading side of the question.

A tool might just read and write the 10 data points. That would work well, apart from the "name" attribute being essentially useless. But some tools, like current Gephi (0.9), work with native curves. Inferring the curve from the data points alone seems unreasonable to me, but it could work with the name. The settings, however, are not included. Here is the important remark: in most cases, a single data point suffices to infer the settings. For instance, for a power-law with a variable exponent, we can retrieve the exact equation from the name "power-law" and the single scalepoint at 0.5. For that nonobvious reason, I think adding a field for the settings, in addition to the name of the spline, is not necessary.
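
The single-point argument can be sketched as follows. This assumes the "power-law" scale has the form factor = ratio^k on normalized values, which is an assumption about what a given tool means by that name; `infer_exponent` is a hypothetical helper, not part of any GEXF tool.

```python
import math


def infer_exponent(forratio, factor):
    """Recover k in factor = forratio ** k from a single scale point.

    Assumes a pure power-law on normalized values (an assumption).
    """
    if not (0.0 < forratio < 1.0) or not (0.0 < factor < 1.0):
        raise ValueError("the point must lie strictly inside the unit square")
    return math.log(factor) / math.log(forratio)


# A scalepoint at 0.5 whose factor is sqrt(0.5) rounded to nine digits.
k = infer_exponent(0.5, 0.707106781)
print(round(k, 6))
```

The end points (0, 0) and (1, 1) carry no information about k, which is why an interior point such as the one at 0.5 is needed.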


Layout

I understand where the logic comes from, and it is an OK compromise, but let me highlight that the final node coordinates are not necessarily the fruit of just one layout, and that the last layout applied is not necessarily the most relevant. Ex: Force Atlas 2 + label adjust as a finishing touch. Ultimately, it is each software's job to be clever about what it retains. Retaining everything does not make sense either (ex: a random layout "erases" whatever happened before). Current proposition covers the case of a single layout algorithm well, which is the most important, so I still support it. Edge cases will arise but I do not have a better idea.

@paulgirard
Author

Thank you @jacomyma

Scales

I added the scalelabel for documentation (I mean for human readers). I agree the scalepoints are not optimal to recreate the curve, but they would work for any curve, even a complex one that is not based on a function. An alternative would be to agree on a function expression language or a finite list of frequently used methods. In my opinion, the former could be an option; the latter feels too limited.

layout

Good point. I would propose to extend the layout element so that it can host a list of layouts rather than just one. The order is important, but I guess we can use the order of the XML children.

<viz:positions>
   <viz:layout algorithm="forceatlas2" referenceURL="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679">
       <viz:param name="scale" type="integer" value="10"/>
       <viz:param name="stronger gravity" type="boolean" value="true"/>
   </viz:layout>
   <viz:layout algorithm="nooverlap">
       <viz:param name="speed" type="integer" value="3"/>
       <viz:param name="ratio" type="float" value="1.2"/>
       <viz:param name="margin" type="float" value="5.0"/>
   </viz:layout>
</viz:positions>

What do you think?

@duncdrum
Contributor

duncdrum commented Sep 2, 2022

@paulgirard while we could rely on sequence position, the actual order of steps is kind of important. Why not allow an optional @step attribute that takes xs:positiveInteger values? E.g.:

<viz:positions>
   <viz:layout algorithm="forceatlas2" referenceURL="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679" step="1">
       <viz:param name="scale" type="integer" value="10"/>
       <viz:param name="stronger gravity" type="boolean" value="true"/>
   </viz:layout>
   <viz:layout algorithm="nooverlap" step="2">
       <viz:param name="speed" type="integer" value="3"/>
       <viz:param name="ratio" type="float" value="1.2"/>
       <viz:param name="margin" type="float" value="5.0"/>
   </viz:layout>
</viz:positions>
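
On the reading side, a consumer could honor step roughly like this. It is only a sketch: the namespace URI is an assumption (use whatever the file declares), and falling back to document order when step is absent is one possible convention, not something the proposal specifies. `ordered_layouts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the viz module; adjust to the one the file declares.
VIZ_NS = "http://gexf.net/1.3/viz"

# Layouts deliberately written out of order to show that step wins.
SAMPLE = """
<viz:positions xmlns:viz="http://gexf.net/1.3/viz">
  <viz:layout algorithm="nooverlap" step="2">
    <viz:param name="ratio" type="float" value="1.2"/>
  </viz:layout>
  <viz:layout algorithm="forceatlas2" step="1">
    <viz:param name="scale" type="integer" value="10"/>
  </viz:layout>
</viz:positions>
"""


def ordered_layouts(positions_xml):
    """Return (algorithm, params) pairs sorted by step, then document order."""
    root = ET.fromstring(positions_xml)
    layouts = root.findall(f"{{{VIZ_NS}}}layout")

    def sort_key(item):
        index, layout = item
        step = layout.get("step")
        # Without an explicit step, fall back to the 1-based document position.
        return (int(step) if step is not None else index + 1, index)

    result = []
    for _, layout in sorted(enumerate(layouts), key=sort_key):
        params = {
            p.get("name"): p.get("value")
            for p in layout.findall(f"{{{VIZ_NS}}}param")
        }
        result.append((layout.get("algorithm"), params))
    return result


for algorithm, params in ordered_layouts(SAMPLE):
    print(algorithm, params)
```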

@paulgirard
Author

Indeed, explicit is always better than implicit. I am adding this too.

@gvegayon
Member

gvegayon commented Sep 2, 2022

@paulgirard, thanks for this; it is looking very useful. I like @duncdrum's idea about step, especially in reproducible research. Now, the challenge will be on the Gephi side, which sequence of steps to store. Thinking out loud here, before saving GEXF files, Gephi could show the user the last n layout changes and select which ones to store; but that's a problem for later, I guess.

I also like @duncdrum's idea about using a common math language for the scalelabel attribute. Since everything is moving to the web, I would suggest something like JavaScript's Math. In such a case, it could be beneficial to define attributes as functions. For example, instead of:

<viz:sizes scale="quantitative" scalelabel="square-root">
  <viz:scalepoint forratio="0" factor="0" />
  <viz:scalepoint forratio="0.1" factor="0.316227766"/>
  <viz:scalepoint forratio="0.2" factor="0.447213595"/>
  <viz:scalepoint forratio="0.3" factor="0.547722558"/>
  <viz:scalepoint forratio="0.4" factor="0.632455532"/>
  <viz:scalepoint forratio="0.5" factor="0.707106781"/>
  <viz:scalepoint forratio="0.6" factor="0.774596669"/>
  <viz:scalepoint forratio="0.7" factor="0.836660027"/>
  <viz:scalepoint forratio="0.8" factor="0.894427191"/>
  <viz:scalepoint forratio="0.9" factor="0.948683298"/>
  <viz:scalepoint forratio="1.0" factor="1"/>
  <viz:range min="1" max="10" default="1" />
</viz:sizes>

Do

<viz:sizes scale="function" scalelabel="square-root">
    function(x) {
      return Math.sqrt((x-1)/(10-1));
    }
</viz:sizes>

I am no expert on XML, but having something like this would be super. Is this something worth implementing?
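
To make the trade-off concrete, here is a small sketch of how a reader might use the discrete scalepoints from the first snippet, compared with the exact function. It assumes a piecewise-linear reading of the points, which the proposal does not mandate; `factor_from_points` is a hypothetical helper.

```python
import math
from bisect import bisect_right

# (forratio, factor) pairs copied from the scalepoint example above.
SCALEPOINTS = [
    (0.0, 0.0), (0.1, 0.316227766), (0.2, 0.447213595), (0.3, 0.547722558),
    (0.4, 0.632455532), (0.5, 0.707106781), (0.6, 0.774596669),
    (0.7, 0.836660027), (0.8, 0.894427191), (0.9, 0.948683298), (1.0, 1.0),
]


def factor_from_points(ratio, points=SCALEPOINTS):
    """Piecewise-linear interpolation between neighbouring scale points."""
    ratio = min(max(ratio, 0.0), 1.0)  # clamp to the normalized domain
    xs = [x for x, _ in points]
    i = bisect_right(xs, ratio)
    if i >= len(points):
        return points[-1][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (ratio - x0) / (x1 - x0)


# Compare the interpolated factor with the exact square root.
for r in (0.05, 0.25, 0.5):
    print(r, factor_from_points(r), math.sqrt(r))
```

The interpolation is exact at the listed points and drifts between them, most visibly near 0 where the square root is steepest.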

@paulgirard
Author

paulgirard commented Sep 5, 2022

Thank you @gvegayon
I don't think accepting plain JavaScript is a good idea, as it opens up code injection security risks and it would mean choosing/promoting one programming language within a neutral data format.

Ideally such an expression should be mathematics only.
To take your example it should reduce to:

sqrt((x-1)/(10-1))

I can't find a standard for mathematical expression syntax targeting evaluation and not rendering (MathML is for rendering).
(note: the shunting yard algo is for parsing and ordering tokens and not a math language standard https://en.wikipedia.org/wiki/Shunting_yard_algorithm)

In my opinion such a mathematical expression should be easy to evaluate in: Java, Python and JavaScript worlds.
It looks like every math expression evaluation library is using its own syntax without pointing to a standard:

But the main math functions seem to have the same names in those examples. This means we should check and document which math functions are supported in such expressions, after checking that they are common to the most frequently used implementations...
Doable but not exactly exciting. Or, to put it differently, it looks like a more complicated solution that does not bring much more than a set of common mathematical functions added to the GEXF format.

Finally, we should keep in mind that this would require GEXF producers/consumers such as Gephi to implement the production/evaluation of math expressions. So we should evaluate the ease of use of our chosen representation in this regard.
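
As a rough idea of the implementation effort on the consumer side, here is a sketch of a math-only evaluator that never executes arbitrary code. It borrows Python's own expression grammar through the standard `ast` module, so it is one language's syntax rather than a neutral standard, and the whitelist of functions is hypothetical since no list has been agreed on.

```python
import ast
import math
import operator

# Hypothetical whitelist; no function list has been agreed on in this thread.
FUNCTIONS = {"sqrt": math.sqrt, "log": math.log, "log10": math.log10,
             "exp": math.exp, "pow": math.pow}
BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv,
          ast.Pow: operator.pow}
UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expression, x):
    """Evaluate a math-only expression in the single variable x."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return float(node.value)  # floats keep huge powers from hanging
        if isinstance(node, ast.Name) and node.id == "x":
            return float(x)
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY:
            return UNARY[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and not node.keywords):
            return FUNCTIONS[node.func.id](*[walk(arg) for arg in node.args])
        # Anything else (attribute access, imports, other names) is rejected.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


print(evaluate("sqrt((x-1)/(10-1))", 10))
```

Every producer and consumer would need an equivalent of this in its own language, with the same function names and operator rules, which is the cost discussed above.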

To conclude, here are the possible ways encountered so far to describe a non-linear function for a quantitative scale in GEXF:

  • a finite list of common mathematical functions (log, sqrt, pow...) to add to the GEXF format
  • the Gephi spline solution: two control points in the [0,1] × [0,1] space defining a Bézier curve from (0,0) to (1,1)
  • a discrete version of the curve (what is in the current proposal): a finite list of normalization curve points
  • a mathematical expression as discussed in this comment
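
For the spline option, a consumer would have to turn the two control points into a factor for a given ratio. The sketch below assumes a cubic Bézier curve with end points fixed at (0,0) and (1,1) and control points inside the unit square; whether this matches Gephi's implementation exactly is an assumption. `bezier_factor` is a hypothetical helper.

```python
def bezier_factor(ratio, c1, c2, iterations=50):
    """Evaluate y for a given x on a cubic Bezier from (0,0) to (1,1).

    c1 and c2 are (x, y) control points assumed to lie in the unit square,
    which keeps x(t) non-decreasing so bisection on t is valid.
    """
    x1, y1 = c1
    x2, y2 = c2

    def coord(t, a, b):
        # Cubic Bezier coordinate with end values 0 and 1.
        return 3 * (1 - t) ** 2 * t * a + 3 * (1 - t) * t ** 2 * b + t ** 3

    ratio = min(max(ratio, 0.0), 1.0)
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if coord(mid, x1, x2) < ratio:
            lo = mid
        else:
            hi = mid
    return coord((lo + hi) / 2, y1, y2)


# Example control points chosen for illustration only.
for r in (0.2, 0.5, 0.8):
    print(r, bezier_factor(r, (0.6, 0.01), (0.8, 0.9)))
```

With control points at (1/3, 1/3) and (2/3, 2/3) the curve reduces to the identity, which gives a convenient sanity check.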

At this point my personal feeling is to choose a finite list of common math functions (D3 does that: https://github.com/d3/d3-scale#continuous-scales) or splines (already implemented in Gephi and flexible).

@duncdrum
Contributor

duncdrum commented Sep 5, 2022

@paulgirard since we are talking about GEXF as a data format, is there anything missing from the XPath math functions? https://www.w3.org/2005/xpath-functions/math/#fo-math-summary I'd say these would be a more natural fit than Java or a custom syntax. Any XPath processor would be able to handle these already.

Just to note: not having math expressions is not a showstopper for me.

@Yomguithereal

I tend to agree with @paulgirard personally and would be happy with only well-known, parametrizable scale options following d3 etc., such as pow, log, lin and sqrt. I would go as far as using the splines, for Gephi compatibility and for cases that need more complexity, but I draw the line at custom math expressions, as they would introduce too much complexity and potential hurdles. I am not very fond of curve discretization with points (but it could be helpful with colors and their strange spaces).

@gvegayon
Member

gvegayon commented Sep 6, 2022

Good points, @paulgirard! The thing about personalized math functions, @Yomguithereal, is mostly about flexibility. In general, I like building tools/standards that provide some wiggle room for things I have not thought of. Nevertheless, I also appreciate having a well-encapsulated file format! On a related note, the NeXML file format (for phylogenetics) includes a meta tag that allows adding arbitrary annotations.

That said, I agree with your last comment, @paulgirard,

At this point my personal feeling is to choose a finite list of common math functions (D3 does that: https://github.com/d3/d3-scale#continuous-scales) or splines (already implemented in Gephi and flexible).

@mbastian
Member

mbastian commented Sep 7, 2022

+1 on supporting a finite set of common functions, in addition to the splines for compatibility. If we have this, do we really need to support the discrete version?

- removed scalepoint
- added transform as a finite list of math functions or spline definition
- primer and readme not updated yet
@paulgirard
Author

Thank you all.
As we converged on a solution, I updated the proposal:

  • added pow, sqrt, log10, log, exp, exp10 transform functions
  • added spline
  • removed discretized solution
      <attribute id="degree" title="Degree" type="integer">
        <default>0</default>
        <viz:sizes scale="quantitative" scalelabel="square-root">
          <viz:transform>
            <viz:sqrt />
          </viz:transform>
          <viz:range min="1" max="10" default="1" />
        </viz:sizes>
      </attribute>
      <attribute id="size" title="Size" type="integer">
        <default>0</default>
        <viz:sizes scale="quantitative" scalelabel="square">
          <viz:transform>
            <viz:pow exponent="2"/>
          </viz:transform>
          <viz:range min="1" max="25" default="1" />
        </viz:sizes>
      </attribute>
      <attribute id="pagerank" title="Page Rank" type="float">
        <default>0</default>
        <viz:sizes scale="quantitative" scalelabel="spline">
          <viz:transform>
            <viz:spline>
              <viz:origin-control-point x="0.6" y="0.01"/>
              <viz:destination-control-point x="0.8" y="0.9" />
            </viz:spline>
          </viz:transform>
          <viz:range min="1" max="5" default="1" />
        </viz:sizes>
      </attribute>

What do you think?

I am waiting for some approvals before updating the primer.
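
To check that the transform element is enough for a tool to recompute sizes, here is a minimal reading sketch covering sqrt and pow only. Several things are assumptions: the namespace URI, the normalization of the attribute value by the data's own minimum and maximum, and the mapping of the factor onto the range as min + factor × (max − min). `make_transform` and `size_for` are hypothetical helpers.

```python
import math
import xml.etree.ElementTree as ET

VIZ = "http://gexf.net/1.3/viz"  # assumed namespace URI for the viz module

SAMPLE = """
<attribute xmlns:viz="http://gexf.net/1.3/viz"
           id="degree" title="Degree" type="integer">
  <default>0</default>
  <viz:sizes scale="quantitative" scalelabel="square-root">
    <viz:transform><viz:sqrt/></viz:transform>
    <viz:range min="1" max="10" default="1"/>
  </viz:sizes>
</attribute>
"""


def make_transform(sizes):
    """Build a function on [0, 1] from the first child of viz:transform."""
    node = sizes.find(f"{{{VIZ}}}transform")
    if node is None or len(node) == 0:
        return lambda r: r  # no transform given: treat the scale as linear
    child = node[0]
    name = child.tag.split("}")[1]  # strip the "{namespace}" prefix
    if name == "sqrt":
        return math.sqrt
    if name == "pow":
        exponent = float(child.get("exponent"))
        return lambda r: r ** exponent
    raise NotImplementedError(name)


def size_for(value, vmin, vmax, attribute_xml=SAMPLE):
    """Map an attribute value to a size; vmin/vmax come from the data."""
    sizes = ET.fromstring(attribute_xml).find(f"{{{VIZ}}}sizes")
    rng = sizes.find(f"{{{VIZ}}}range")
    out_min, out_max = float(rng.get("min")), float(rng.get("max"))
    ratio = (value - vmin) / (vmax - vmin)
    return out_min + make_transform(sizes)(ratio) * (out_max - out_min)


print(size_for(25, 0, 100))  # ratio 0.25 -> factor 0.5 -> 5.5
```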

@paulgirard
Author

PS: I couldn't find a way to reuse the XPath math function definitions, as XML specs are very new to me. If anyone thinks there is a better way to specify math transform functions, please let me know 🙏

@gvegayon
Member

Thank you, @paulgirard! Question: How do <default>0</default> and <viz:range min="1" max="5" default="1" /> coexist (honest question)?

@duncdrum
Contributor

@paulgirard just saw this, I'll try to have a fork of your PR ready with XPath math before the weekend.

@mbastian
Member

@paulgirard One thought about degree columns. Normally, a GEXF wouldn't include degree, in-degree, out-degree or edge kind columns, as those depend directly on the graph, so there is no real need to store them as attributes. We wouldn't plan to export those columns in GEXF via Gephi, for instance. But if a legend is based on the degree column, we should still include it somehow, right? What do you suggest?

@paulgirard paulgirard mentioned this pull request Oct 25, 2022
7 tasks
Successfully merging this pull request may close these issues.

GEXF Legend
6 participants