bipartition_tree and bipartition_tree_random have identical behavior #407

mkarrmann · 2022-11-17T18:33:46Z

Once #406 is merged, then the only difference between bipartition_tree and bipartition_tree_random will be that bipartition_tree_random supports a repeat_until_valid parameter. However, the documentation indicates the intended difference is the bipartition_tree_random returns a random valid cut, while bipartition_tree returns the first valid cut.

I assume the motivation for this distinction is that bipartition_tree_random can be used want a more truly random graph partition, while bipartition_tree can be used if one is willing to tradeoff some randomness for a performance improvement (assuming that the balance_edge_fn support early escape once a valid cut is found, which it seems none of the provided ones do).

I propose:

Adding a parameter to find_balanced_edge_cuts_contraction and find_balanced_edge_cuts_memoization to optionally early escape as soon as a valid cut is found. Default value of this parameter will be False to support backwards compatibility.
Call the balanced_edge_fn with early_escape set to true in bipartition_tree and false in bipartition_tree_random.

Two considerations:

By changing the expected interface for balanced_edge_fns, this technically breaks backwards compatibility, e.g. if someone was running a recom chain with a partition function that used bipartition_tree but with a custom balanced_edge_fn, then their chain would break. I'm not sure if that introduces any specific concerns. Needless this to say, this is a very rare use case, so I assume isn't a major concern unless Gerrychain was strict policies around versioning semantics.
Using the first valid cut as opposed to a random valid cut introduces an interesting bias, and it's not completely obvious to me that this bias isn't trivial. Additionally, the documentation doesn't make this intended behavior clear, so many may not realize they are introducing this bias, especially considering bipartition_tree is used by default by Recom. My preferred solution would be to make bipartition_tree_random the default used by Recom, and update the docstrings to make the intended distinction between bipartition_tree_random and bipartition_tree more clear. Do others agree?

The text was updated successfully, but these errors were encountered:

peterrrock2 · 2024-03-02T00:17:42Z

Thank you for bringing this to our attention! There is now a difference between these two functions in v0.3.1 based on the new region-aware capabilities of GerryChain.

mkarrmann · 2024-03-02T00:30:03Z

@peterrrock2 Nice! I'll check that out!

peterrrock2 closed this as completed Mar 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bipartition_tree and bipartition_tree_random have identical behavior #407

bipartition_tree and bipartition_tree_random have identical behavior #407

mkarrmann commented Nov 17, 2022

peterrrock2 commented Mar 2, 2024

mkarrmann commented Mar 2, 2024

bipartition_tree and bipartition_tree_random have identical behavior #407

bipartition_tree and bipartition_tree_random have identical behavior #407

Comments

mkarrmann commented Nov 17, 2022

peterrrock2 commented Mar 2, 2024

mkarrmann commented Mar 2, 2024