Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: weights.W.asymmetry does not support custom ids #540

Closed
martinfleis opened this issue Jul 30, 2023 · 11 comments · Fixed by #543
Closed

BUG: weights.W.asymmetry does not support custom ids #540

martinfleis opened this issue Jul 30, 2023 · 11 comments · Fixed by #543
Assignees
Labels
bug functionality that: returns invalid, erroneous, or meaningless results; or doesn't work at all. weights

Comments

@martinfleis
Copy link
Member

martinfleis commented Jul 30, 2023

>>> neighbors={"a":["b","c","d"], "b":["b","c","d"], "c":["a","b"], "d":["a","b"]}
>>> weights_d={"a":[1,1,1], "b":[1,1,1], "c":[1,1], "d":[1,1]}
>>> w=weights.W(neighbors,weights_d)
>>> w.asymmetry()
[(0, 1), (1, 0)]

This should be a list of (i,j) tuples, as per docstring but it is capturing integer ids coming from sparse.

It should be this:

[("a", "b"), ("b", "a")]
@jGaboardi jGaboardi self-assigned this Aug 6, 2023
@jGaboardi jGaboardi added bug functionality that: returns invalid, erroneous, or meaningless results; or doesn't work at all. weights labels Aug 6, 2023
@jGaboardi
Copy link
Member

jGaboardi commented Aug 6, 2023

Locally I've added the following to after line 1280:

if isinstance(self.id_order[0], str):
    id_order_array = np.asarray(self.id_order)
    ids = tuple(id_order_array[ids[i]] for i in (0, 1))

This returns the correct ID labels if the labels are strings. Another thing we may consider though is if IDs are not strings, but numeric values that are not a 0-indexed range...

@martinfleis Thoughts on this?

@jGaboardi
Copy link
Member

Tests pass locally when simply leaving out the conditional:

id_order_array = np.asarray(self.id_order)
ids = np.nonzero(wd)
ids = tuple(id_order_array[ids[i]] for i in (0, 1))

@martinfleis
Copy link
Member Author

Another thing we may consider though is if IDs are not strings, but numeric values that are not a 0-indexed range...

That is another valid case we should support. This one can be especially confusing. Your W is indexed by int, you get back int but it refers to position, not the value.

@jGaboardi
Copy link
Member

2 more complications with indices that are not both (a) monotonically increasing integers and (b) 0-based:

  • With the above array-based solution, if there are mixed-numeric types all values will be coerced to floats. This is probably a super edge case, but it exists nonetheless.
  • The sorting that is happening in line L1285 may not give the intended result (and does the even matter?)

@jGaboardi
Copy link
Member

Perhaps simply adding an i2id property (an inverse of id2i) and use that to translate the ijs object would be a prudent solution.

@martinfleis
Copy link
Member Author

Or just inverting id2i on the fly?

@jGaboardi
Copy link
Member

jGaboardi commented Aug 7, 2023

This was my first thought, and absolutely the most simple solution. Actually doing the on-the-fly translation after the sorting on L1285 along with setting the id_order upon instantiation seems to resolve the issues I listed above and allows for mixed string-numeric dtype indices.

Two more things to think about:

  • Are we sure, at the moment at least, that there would be no other use for being able to create and cache an i2id object in the current weights structure?
  • Since the on-the-fly translation is already happening in the ongoing W to Graph restructuring, perhaps we need to give further thought to the sorting that happens there, too? (Along with supporting mixed types in indices? Or no?)

I have PR reading to push up for this, BTW.

@martinfleis
Copy link
Member Author

Are we sure, at the moment at least, that there would be no other use for being able to create and cache an i2id object in the current weights structure?

I am not sure how much energy I want to give thinking about future of W to be fair.

Since the on-the-fly translation is already happening in the ongoing W to Graph restructuring, perhaps we need to give further thought to the sorting that happens there, too? (Along with supporting mixed types in indices? Or no?)

Does the sorting matter?

@jGaboardi
Copy link
Member

I am not sure how much energy I want to give thinking about future of W to be fair.

Same and agreed at this point

Does the sorting matter?

I might say yes and no, especially for a custom ID case, like non-monotonic integers, where the index may not only not be increasing, but also out of "order". However, so long as we are clear in the docstring that a sorting is happening (which it currently is not), I suppose it wouldn't matter.

@martinfleis
Copy link
Member Author

So you just expect the output of asymmetry to be a subset of pairs in their original order, correct?

@jGaboardi
Copy link
Member

jGaboardi commented Aug 7, 2023

As a user reading the docstring, I would not expect the returned list to sorted. And since the Example uses a standard monotonic integer index, it can't be inferred from context alone. I think probably you are correct that it doesn't matter that sorting should happen, but it should be clear that it is happening. If you agree, I will update #543 (code, tests, and docstring).

Perhaps we should get other devs opinion on this? @knaaptime, @ljwolf, @sjsrey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug functionality that: returns invalid, erroneous, or meaningless results; or doesn't work at all. weights
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants