Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bipartition_tree and bipartition_tree_random have identical behavior #407

Closed
mkarrmann opened this issue Nov 17, 2022 · 2 comments
Closed

Comments

@mkarrmann
Copy link
Contributor

Once #406 is merged, then the only difference between bipartition_tree and bipartition_tree_random will be that bipartition_tree_random supports a repeat_until_valid parameter. However, the documentation indicates the intended difference is the bipartition_tree_random returns a random valid cut, while bipartition_tree returns the first valid cut.

I assume the motivation for this distinction is that bipartition_tree_random can be used want a more truly random graph partition, while bipartition_tree can be used if one is willing to tradeoff some randomness for a performance improvement (assuming that the balance_edge_fn support early escape once a valid cut is found, which it seems none of the provided ones do).

I propose:

  • Adding a parameter to find_balanced_edge_cuts_contraction and find_balanced_edge_cuts_memoization to optionally early escape as soon as a valid cut is found. Default value of this parameter will be False to support backwards compatibility.

  • Call the balanced_edge_fn with early_escape set to true in bipartition_tree and false in bipartition_tree_random.

Two considerations:

  1. By changing the expected interface for balanced_edge_fns, this technically breaks backwards compatibility, e.g. if someone was running a recom chain with a partition function that used bipartition_tree but with a custom balanced_edge_fn, then their chain would break. I'm not sure if that introduces any specific concerns. Needless this to say, this is a very rare use case, so I assume isn't a major concern unless Gerrychain was strict policies around versioning semantics.
  2. Using the first valid cut as opposed to a random valid cut introduces an interesting bias, and it's not completely obvious to me that this bias isn't trivial. Additionally, the documentation doesn't make this intended behavior clear, so many may not realize they are introducing this bias, especially considering bipartition_tree is used by default by Recom. My preferred solution would be to make bipartition_tree_random the default used by Recom, and update the docstrings to make the intended distinction between bipartition_tree_random and bipartition_tree more clear. Do others agree?
@peterrrock2
Copy link
Collaborator

Thank you for bringing this to our attention! There is now a difference between these two functions in v0.3.1 based on the new region-aware capabilities of GerryChain.

@mkarrmann
Copy link
Contributor Author

@peterrrock2 Nice! I'll check that out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants