Explain what flow_selection does in Bundles (for self-loops) #21

Open
2 tasks
ricklupton opened this issue Mar 13, 2018 · 7 comments

Comments

@ricklupton
Owner

When a Bundle has the same source and target, you need to specify which flows to include.

  • Explain this in the docs
  • Make the exception message more helpful
@ricklupton ricklupton added this to I'm using/want to use floWeaver to make Sankey diagrams in What shall I do?! Apr 25, 2018
@Talgat2

Talgat2 commented Sep 27, 2018

Unfortunately I couldn't understand the exception message. Is there an explanation anywhere about how to code self-loops?

@ricklupton
Owner Author

No, this needs better documentation. I'll try to give an example which might help, but really I think it needs a few diagrams to explain!

The issue is that it's ambiguous what you want. For example, with these flows:

start -> a
a -> b
b -> a
b -> end

and these nodes:

nodes = {
    'start': ProcessGroup(['start']),
    'middle': ProcessGroup(['a', 'b']),
    'end': ProcessGroup(['end']),
}

Then it's ambiguous what to do with the a -> b and b -> a flows which are internal to the middle group. Do you want both flows to appear as loops? Just one? Just the other?
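To make the ambiguity concrete, here is a minimal sketch in plain Python (it does not use floWeaver) that lists the flows falling entirely inside one group. The helper name `internal_flows` is made up for illustration.

```python
# Flows from the example above, as (source, target) pairs.
flows = [("start", "a"), ("a", "b"), ("b", "a"), ("b", "end")]

# Processes grouped as in the `nodes` definition above.
groups = {
    "start": ["start"],
    "middle": ["a", "b"],
    "end": ["end"],
}


def internal_flows(flows, processes):
    """Return the flows whose source and target both lie in `processes`."""
    members = set(processes)
    return [(s, t) for s, t in flows if s in members and t in members]


middle_internal = internal_flows(flows, groups["middle"])
print(middle_internal)  # [('a', 'b'), ('b', 'a')]
```

Both flows are candidates for a `Bundle('middle', 'middle')`, and nothing in the bundle definition alone says whether to draw both, one, or neither. That choice is what `flow_selection` has to supply.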

@Talgat2

Talgat2 commented Oct 5, 2018

I will try to explain what I need based on these nodes.
The ordering is start, middle, end from left to right.
The forward flows I need are:
start -> a
start -> b
start -> end
a -> end
b -> end
The backward flows I need are:
a -> start
b -> start
Up to this point everything works as I need. Here is an example based on dummy data:
image
What I need now is a kind of backward self-flow:
start -> start
a -> a
b -> b
Just like in your paper:
image

@ricklupton
Owner Author

I think you probably want to include all the self-flows. You can do that like this:

bundles = [
    ...
    Bundle('a', 'a', flow_selection='source != ""'),
    ...
]

Does that do what you want?

This isn't very obvious, I agree. Perhaps we could make one of the following changes:

  • If there's only one process in the ProcessGroup, it's not ambiguous and we can allow the flow_selection parameter not to be set. Perhaps that's your case?

  • Improve the exception message and add a special value flow_selection=ALL that does the same thing as the example above, but in a more readable way.

  • If there is exactly one self-loop Bundle, assume it should include all flows with the same source and target -- but this may not do what people expect if the ProcessGroup includes a lot of internal flows.
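As a rough illustration of why `source != ""` acts as "include everything", the sketch below applies the same string to a small table with pandas. It assumes the `flow_selection` string is evaluated as a pandas-style query over a flows table with `source` and `target` columns, as the example above implies. The table and its values are made up.

```python
import pandas as pd

# Made-up flows table; the column names are an assumption based on the
# `source != ""` query used in this thread.
flows = pd.DataFrame({
    "source": ["start", "a", "b", "b"],
    "target": ["a", "b", "a", "end"],
    "value": [3, 2, 1, 2],
})

# Every row has a non-empty source, so this condition is true everywhere.
selected = flows.query('source != ""')
print(len(selected) == len(flows))  # True

# A narrower selection keeps just one of the two internal flows.
only_a_to_b = flows.query('source == "a" and target == "b"')
print(only_a_to_b[["source", "target"]].values.tolist())  # [['a', 'b']]
```

In other words, the query does not filter anything out; it only states explicitly that every flow between the bundle's source and target should be included.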

There's another example in the tests

@Talgat2

Talgat2 commented Oct 8, 2018

First of all, I am sorry for the lack of proper editing.

Below is the code I have problems with, and a file with dummy data.

The last row in the file (speed,speed,back,4) is the instance of a self loop I need to visualize. Most of my ProcessGroups have several processes. By "self loop" I mean the flows between the same processes, not the flows within the same ProcessGroup between different processes.

In this example I have only one self loop (speed -> speed), but in the actual data most of the processes will have such self loops (e.g. main_menu -> main_menu, services -> services, etc.). It is the data about behavior of users in an IVR system, each process is a menu option and a self loop means a repeated listening of an option.

I have added a bundle with a flow_selection as you recommended, but it didn't work. I guess I've missed something. Can you please take a look at my case?
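The distinction described here (flows between the same process, as opposed to flows between different processes in the same ProcessGroup) can be written as a column comparison. The sketch below shows the idea with pandas on made-up rows. Whether floWeaver accepts `source == target` as a `flow_selection` value is an untested assumption.

```python
import pandas as pd

# Made-up rows modelled on the description above: one repeat of the same
# menu option, one move between two options in the same group, and one exit.
flows = pd.DataFrame({
    "source": ["speed", "settings", "speed"],
    "target": ["speed", "speed", "out"],
    "value": [4, 2, 5],
})

# Compare the two columns row by row: true only for same-process loops.
same_process = flows.query("source == target")
print(same_process[["source", "target"]].values.tolist())  # [['speed', 'speed']]
```

If the assumption holds, passing `flow_selection='source == target'` to the level2-to-level2 Bundle would keep repeats such as speed -> speed and leave out flows such as settings -> speed.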

File:
test_4.xlsx
This file is in xlsx format because I couldn't attach my original csv file.

Code:
from floweaver import *

size = dict(width=1140, height=600)

nodes = {
    'mm': ProcessGroup(['main_menu']),
    'level1': ProcessGroup(['price_plan', 'services', 'inet']),
    'level2': ProcessGroup(['my_pp', 'other_pp', 'content', 'mfs', 'settings', 'speed']),
    'level3': ProcessGroup(['out', 'oper']),
}

ordering = [
    ['mm'],
    ['level1'],
    ['level2'],
    ['level3'],
]

bundles = [
    Bundle('mm', 'level1'),
    Bundle('mm', 'level3'),
    Bundle('level1', 'level2'),
    Bundle('level1', 'mm'),
    Bundle('level1', 'level3'),
    Bundle('level2', 'level3'),
    Bundle('level2', 'level1'),
    Bundle('level2', 'mm'),
    Bundle('level2', 'level2', flow_selection='source != ""')
]

level2_part = Partition.Simple('process', [
    'my_pp',
    'other_pp',
    'content',
    'mfs',
    'settings',
    'speed',
])

level1_part = Partition.Simple('process', [
    'price_plan', 'services', 'inet'
])

level3_part = Partition.Simple('process', [
    'out', 'oper'
])

nodes['level2'].partition = level2_part
nodes['level1'].partition = level1_part
nodes['level3'].partition = level3_part

sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)

The resulting graph looks almost exactly like the graph of the data without a self loop. The difference is highlighted.

New graph:
image

Graph of data without a self loop:
image

@ricklupton
Owner Author

I think there's a bug in d3-sankey-diagram in how the self-loop is drawn: it is there, but it has a very small radius so you can't see it. The workaround is to use a Waypoint to make the self-loop bigger. I've edited your code to do this below, and also added some Waypoints for the return flows (from level2 to level1, etc.) that I think make it look neater.

from floweaver import *

size = dict(width=1140, height=600)

level2_part = Partition.Simple('process', [
    'my_pp',
    'other_pp',
    'content',
    'mfs',
    'settings',
    'speed',
])

level1_part = Partition.Simple('process', [
    'price_plan', 'services', 'inet'
])

level3_part = Partition.Simple('process', [
    'out', 'oper'
])

nodes = {
    'mm': ProcessGroup(['main_menu']),
    'level1': ProcessGroup(['price_plan', 'services', 'inet'], level1_part),
    'level2': ProcessGroup(['my_pp', 'other_pp', 'content', 'mfs', 'settings', 'speed'], level2_part),
    'level3': ProcessGroup(['out', 'oper'], level3_part),
    
    'level2_loop': Waypoint(direction='L'),
    
    'level1_return1': Waypoint(direction='L'),
    'level2_return2': Waypoint(direction='L'),
    'level2_return1': Waypoint(direction='L'),
    'mm_return': Waypoint(direction='L'),
}

# I added 2 "layers" to put the return flows more neatly at the bottom
ordering = [
    [['mm'], ['mm_return']],
    [['level1'], ['level1_return1', 'level2_return1']],
    [['level2'], ['level2_loop', 'level2_return2']],
    [['level3']],
]

bundles = [
    # These are the main flows through
    Bundle('mm', 'level1'),
    Bundle('mm', 'level3'),
    Bundle('level1', 'level2'),
    Bundle('level1', 'level3'),
    Bundle('level2', 'level3'),
    
    # Return flows to previous stages
    Bundle('level1', 'mm', waypoints=['level1_return1', 'mm_return']),
    Bundle('level2', 'mm', waypoints=['level2_return2', 'level2_return1', 'mm_return']),
    Bundle('level2', 'level1', waypoints=['level2_return2', 'level2_return1']),
    
    # Loops to same stage
    Bundle('level2', 'level2', flow_selection='source != ""', waypoints=['level2_loop'])
]

sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)

image

I've left the waypoint titles visible so it's easier to see what's going on but you can hide them:

nodes = {
    'level2_loop': Waypoint(direction='L', title=''),
    # ...
}

image

@ricklupton
Owner Author

Logged in ricklupton/d3-sankey-diagram#16
