
ENH: Allow QAP to accept adjacency matrices of different sizes #13034

Closed
wants to merge 7 commits into from

Conversation

asaadeldin11
Contributor

Reference issue

What does this implement/fix?

This PR adds "padding" to the quadratic_assignment() function in scipy.optimize, allowing users to input matrices A and B of different sizes. Two padding schemes are implemented, and the user can choose between them: "adopted" (the default) matches the appropriate induced subgraphs, while "naive" matches the appropriate subgraphs. More information can be found in section 2.5 of "Seeded Graph Matching".
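As a rough sketch of the two schemes (this is an illustration, not the PR's actual code; the function name `pad_matrices` and its exact signature are made up for the example):

```python
import numpy as np

def pad_matrices(A, B, method="adopted"):
    """Illustrative sketch: pad the smaller of A and B with zeros so
    both adjacency matrices have the same shape.

    With "adopted", both matrices are first mapped to 2*X - J (J the
    all-ones matrix), following section 2.5 of the Seeded Graph
    Matching paper, so that padded zero entries act as neutral rather
    than reading as absent edges.
    """
    n_small, n_large = sorted((A.shape[0], B.shape[0]))
    if method == "adopted":
        A = 2 * A - np.ones(A.shape)
        B = 2 * B - np.ones(B.shape)
    # np.pad(X, (0, k)) appends k zero rows and k zero columns
    if A.shape[0] == n_small:
        A = np.pad(A, (0, n_large - n_small))
    else:
        B = np.pad(B, (0, n_large - n_small))
    return A, B
```

For example, with a 3 x 3 A and a 5 x 5 B, both returned matrices are 5 x 5; under "adopted", the original entries become +1/-1 and the padding stays 0.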

Additional information

@asaadeldin11 asaadeldin11 marked this pull request as ready for review November 10, 2020 15:27
@asaadeldin11
Contributor Author

@mdhaber please take a look when you have a chance!

@@ -507,6 +519,29 @@ def _quadratic_assignment_faq(A, B,
return OptimizeResult(res)


def _adj_pad(A, B, method):
# pads the matrix with less nodes such that A & B are same size
Contributor

I'm confused by "less nodes"?

Contributor Author

Would "fewer nodes" be clearer?

Contributor

That would be the better word, but "fewer" relative to what?
Do you mean it pads the smaller of A and B to be the same size as the larger?

Contributor Author

Yes. Your wording is clearer, so I'll replace what I had written with that.

B = 2 * B - np.ones((B_n, B_n))

if A.shape[0] == n[0]:
A = pad(A, n)
Contributor

Suggested change
A = pad(A, n)
A = np.pad(A, (0, n[1]-n[0]))

Comment on lines 525 to 528
def pad(X, n):
X_pad = np.zeros((n[1], n[1]))
X_pad[: n[0], : n[0]] = X
return X_pad
Contributor

Suggested change
def pad(X, n):
X_pad = np.zeros((n[1], n[1]))
X_pad[: n[0], : n[0]] = X
return X_pad

if A.shape[0] == n[0]:
A = pad(A, n)
else:
B = pad(B, n)
Contributor

Suggested change
B = pad(B, n)
B = np.pad(B, (0, n[1]-n[0]))
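For readers following these suggestions: `np.pad(X, (0, k))` appends k zero rows and k zero columns, which is exactly what the local `pad(X, n)` helper builds by hand. A quick self-contained check:

```python
import numpy as np

X = np.arange(4.0).reshape(2, 2)
n = (2, 4)  # (smaller size, larger size)

# What the local pad(X, n) helper does by hand:
manual = np.zeros((n[1], n[1]))
manual[: n[0], : n[0]] = X

# The suggested one-liner: pad 0 entries before and n[1]-n[0] after,
# on both axes.
via_numpy = np.pad(X, (0, n[1] - n[0]))

assert np.array_equal(manual, via_numpy)
```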

A_n = A.shape[0]
B_n = B.shape[0]
n = np.sort([A_n, B_n])
if method == "adopted":
Contributor

[image: excerpt from the paper]

Should the n's in the paper have subscripts?

@@ -507,6 +519,29 @@ def _quadratic_assignment_faq(A, B,
return OptimizeResult(res)


def _adj_pad(A, B, method):
# pads the matrix with less nodes such that A & B are same size
# schemes according to section 2.5 of [2]
Contributor

@mdhaber mdhaber Nov 10, 2020

Yes, this does seem to do the following:

[image: excerpt from the paper]

n = 50
p = 0.4
G1 = _er_matrix(n, p)
G2 = G1[: (n - 1), : (n - 1)] # remove two nodes
Contributor

Suggested change
G2 = G1[: (n - 1), : (n - 1)] # remove two nodes
G2 = G1[:-1, :-1] # remove two nodes

Doesn't this remove one node?

@rgommers rgommers added the enhancement (A new feature or improvement) and scipy.optimize labels Nov 15, 2020
@jovo
Copy link

jovo commented Dec 8, 2020

@mdhaber let us know if you need anything else from us prior to reviewing this, thanks!

@mdhaber
Contributor

mdhaber commented Dec 8, 2020

@jovo Thanks for your patience. It would help if we could improve the testing.

  1. First, some questions:
    • Can you explain how the existing test works? As I'm not very familiar with all this, it doesn't jump out to me what the assertion is after and how strong of a test that is (e.g. how wrong the code could be and still pass the test).
    • Also, why is an "Erdos-Renyi graph" used?
    • I haven't looked carefully - does the padding work correctly with all of the existing options (e.g. partial_match and P0)?
  2. Can the existing test be adapted so that more than one node is removed? Perhaps it doesn't matter at all, but without thinking carefully about what is going on, one might assume that a test where the inputs are nearly the same size is "easier" to pass than one in which the inputs are very different sizes.
  3. The existing test is only on col_ind. Is there some sort of test you can perform on the resulting objective function value (even if it's just a bound of some sort)? I know that may be difficult.
  4. Would it be possible to add a test in which you pass in a matrix that has been manually padded in a way that is obviously correct and, e.g., confirm that you get the same result as if you let quadratic_assignment do the padding?
  5. Ideally, can you add a test that validates against a published result or code? I know that's not always possible. If not, can you construct an example in which the optimal solution is obvious and check the result?
  6. Especially if we can't do 3-5, let's add a few tests that check the result of _adj_pad. I'm sure we can check that against an expected result.
  7. The current test only checks the case in which the second input needs to be padded. Can we add a test in which the first input needs to be padded? (It's ok if it's essentially the same but the inputs are swapped.)
  8. Please consider the ideas above for both naive and adopted. We need to test both.

The logic looks right; I'm just trying to think of ways we can confirm that this will do the right thing.
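For instance, item 6's direct check of a padding result against a hand-written expectation could be as simple as the following (using np.pad for the naive scheme, as suggested earlier in the review):

```python
import numpy as np

# "Naive" padding of a hand-written 2x2 adjacency matrix into a 3x3
# one, checked against an explicit expected result.
A = np.array([[0., 1.],
              [1., 0.]])
expected = np.array([[0., 1., 0.],
                     [1., 0., 0.],
                     [0., 0., 0.]])
assert np.array_equal(np.pad(A, (0, 1)), expected)
```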

@jovo

jovo commented Dec 8, 2020

@mdhaber no worries, I understand the (often thankless) difficulties of maintaining high quality packages. I'll talk to @asaadeldin11 and we'll address all those concerns right away, thanks!

padding and low-density subgraphs of `B`.

"naive" : matches `A` to the best fitting subgraph of `B`.

Contributor

Maybe point the reader to more detail in Section 2.5 of [2].

@asaadeldin11
Contributor Author

Thanks for the feedback!

  1. First, some questions:

    • Can you explain how the existing test works? As I'm not very familiar with all this, it doesn't jump out to me what the assertion is after and how strong of a test that is (e.g. how wrong the code could be and still pass the test).

This is a relatively weak test; it was basically just a sanity check to make sure nothing was broken. I agree that we need stronger tests; please see my suggestions below.

  • Also, why is an "Erdos-Renyi graph" used?

No particular reason, just that it is a simple graph model. The main strength of this padding is finding subgraphs in a larger graph, so its applications are mainly on the graph matching side of things, though it does also work with the QAP.

  • I haven't looked carefully - does the padding work correctly with all of the existing options (e.g. partial_match and P0)?

P0 now must be the size of the larger graph (among A and B), so I will add that to the docs. For partial_match, I will clarify in the type check that the indices must be within the sizes of the original input graphs pre-padding (all entries of seedsA must be less than len(A), and likewise for B). Other than that, these options work as intended.

  2. Can the existing test be adapted so that more than one node is removed? Perhaps it doesn't matter at all, but without thinking carefully about what is going on, one might assume that a test where the inputs are nearly the same size is "easier" to pass than one in which the inputs are very different sizes.

Sure. I can generate three 25 x 25 ER graphs with different probabilities (say 0.6, 0.1, 0.2), and construct A as the block graph with edge probabilities [[0.6, 0.1], [0.1, 0.2]] and B as just the 0.6 block, so A is 50 x 50 and B is 25 x 25. Then I would expect nodes 1-25 of A and B to map to each other.
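A rough sketch of that construction (`er_matrix` here is a stand-in helper, analogous to the `_er_matrix` used in the existing test):

```python
import numpy as np

rng = np.random.default_rng(0)

def er_matrix(n, p, rng):
    # symmetric Erdos-Renyi adjacency matrix with no self-loops
    upper = np.triu((rng.random((n, n)) < p).astype(float), k=1)
    return upper + upper.T

n = 25
A11 = er_matrix(n, 0.6, rng)                    # the block B should match
A22 = er_matrix(n, 0.2, rng)
A12 = (rng.random((n, n)) < 0.1).astype(float)  # off-diagonal block

A = np.block([[A11, A12], [A12.T, A22]])        # 50 x 50
B = A11                                         # 25 x 25
# The proposed test would then check that quadratic_assignment, with
# padding enabled, maps nodes 0..24 of A onto B.
```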

  3. The existing test is only on col_ind . Is there some sort of test you can perform on the resulting objective function value (even if it's just a bound of some sort)? I know that may be difficult.

Yes, for the above test I can also check that the objective function value is between 0 and 25^2.

  4. Would it be possible to add a test in which you pass in a matrix that has been manually padded in a way that is obviously correct and, e.g., confirm that you get the same result as if you let quadratic_assignment do the padding?

Sure. I can make small graphs of different sizes and manually pad one, then make sure both calls get the same objective function value, if you think that's sufficient?
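One half of that check can be sketched against the released API (scipy.optimize.quadratic_assignment accepts equal-sized inputs today; the internal-padding side of the comparison is what this PR adds, so it is only described in a comment here):

```python
import numpy as np
from scipy.optimize import quadratic_assignment

rng = np.random.default_rng(42)

def random_adjacency(n, p, rng):
    # symmetric 0/1 adjacency matrix with no self-loops
    upper = np.triu((rng.random((n, n)) < p).astype(float), k=1)
    return upper + upper.T

A = random_adjacency(6, 0.5, rng)
B_small = random_adjacency(4, 0.5, rng)

# "naive" padding done by hand: embed the smaller matrix in zeros
B = np.zeros((6, 6))
B[:4, :4] = B_small

res = quadratic_assignment(A, B, method="faq")
assert sorted(res.col_ind) == list(range(6))  # a valid permutation
# The proposed test would compare res.fun against the value obtained
# when quadratic_assignment does the padding internally.
```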

  5. Ideally, can you add a test that validates against a published result or code? I know that's not always possible. If not, can you construct an example in which the optimal solution is obvious and check the result?

Since the padding section of the SGM paper is pretty short, the only result they show is essentially Figure 3, which I recreated here. This is more of a structural/visual result, so I'm not sure it would be appropriate to put in the tests.

  6. Especially if we can't do 3-5, let's add a few tests that check the result of _adj_pad. I'm sure we can check that against an expected result.
  7. The current test only checks the case in which the second input needs to be padded. Can we add a test in which the first input needs to be padded? (It's ok if it's essentially the same but the inputs are swapped.)
  8. Please consider the ideas above for both naive and adopted. We need to test both.

Sounds good on the above three.

The logic looks right; I'm just trying to think of ways we can confirm that this will do the right thing.

@mdhaber Let me know if you have comments on my above suggestions, thanks!

@mdhaber
Contributor

mdhaber commented Dec 11, 2020

Thanks for the responses. Yeah, I think all that would help. Only comment is about:

objective function value is between 0 and 25^2 as well

Sure. I suppose I was hoping for something a little stronger. Isn't this algorithm supposed to get within twice the optimal objective function value or something? Maybe that's only for the graphs being the same size?

@asaadeldin11
Contributor Author

Sure. I suppose I was hoping for something a little stronger. Isn't this algorithm supposed to get within twice the optimal objective function value or something? Maybe that's only for the graphs being the same size?

The actual objective function range in the tests will be much narrower than this; I just wasn't exactly sure what those values would be off the top of my head. I just meant that the range would be between 0 and 25^2, since those are the absolute bounds (in the tests it will likely be something like 250-350, to be more exact).

@mdhaber
Contributor

mdhaber commented Dec 11, 2020

I know. I just wonder whether you can provide a bound from theory that is tighter than the (what I called trivial) absolute bound.

@mdhaber
Contributor

mdhaber commented Oct 8, 2022

@asaadeldin11 @jovo Were you still interested in this? I may be able to help finish it up now if the new tests are stronger.

On the other hand, we haven't received any bug reports, enhancement requests, or attention from other maintainers for QAP, so I'm not sure how much the enhancement would be used. We could consider keeping this on the backburner until there is more interest.

@jovo

jovo commented Oct 19, 2022

@bdpedigo what do you think? you are in charge of this now :)

@mdhaber mdhaber closed this Dec 6, 2022