Bug in caching: Boolean value of tensor comparison is ambiguous #28
Unfortunately, we can't just calculate a hash here, because objects can be unhashable, so we need to pass … My proposal is to replace …

For pair samples:

```python
@classmethod
def emit_object(cls, batch: List[SimilarityPairSample]) -> Any:
    for sample in batch:
        yield sample.obj_a
        yield sample.obj_b
```

For single samples:

```python
@classmethod
def emit_object(cls, batch: List[SimilarityPairSample]) -> Any:
    for sample in batch:
        yield sample.obj
```
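To illustrate how a caller could consume such an `emit_object` generator, here is a minimal, self-contained sketch; the `SimilarityPairSample` stub and the example batch below are assumptions standing in for the real classes in the library:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class SimilarityPairSample:
    # minimal stand-in for the real sample class; field names follow the thread
    obj_a: Any
    obj_b: Any

def emit_object(batch: List[SimilarityPairSample]):
    # pair variant: yield both objects of every sample, in order
    for sample in batch:
        yield sample.obj_a
        yield sample.obj_b

batch = [
    SimilarityPairSample("img1.png", "img2.png"),
    SimilarityPairSample("img1.png", "img3.png"),
]
objects = list(emit_object(batch))
print(objects)  # duplicates are kept: ['img1.png', 'img2.png', 'img1.png', 'img3.png']
```

Note that the generator itself does no deduplication, which is exactly the concern raised in the reply below it.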
In this case, we will lose the whole functionality that prevents multiple calculations, won't we?
Ok, so what about this one: … now we can use it like …

Actually only part of it. Now the flow is like: … So actually …
It could be a solution; I need to look into this more thoroughly.
We discussed it, and here are two possible solutions we figured out: …
And here's the minimal code to reproduce this bug:

```python
import numpy as np
import torch

l = []
t1 = torch.from_numpy(np.array([1, 2, 3]))  # remove `torch.from_numpy()` for the numpy version
t2 = torch.from_numpy(np.array([1, 2, 2]))
ts = [t1, t2]
for t in ts:
    if t not in l:
        l.append(t)
print("everything fine")
```
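A sketch of one possible workaround for the repro above: make the membership test explicit so each comparison yields a single `bool` instead of an elementwise tensor. The example uses `np.array_equal` for NumPy arrays (for PyTorch tensors, `torch.equal` plays the same role); the `contains` helper name is an assumption:

```python
import numpy as np

def contains(seq, arr):
    # content-based membership test that returns one bool per pair,
    # avoiding the ambiguous truth value of an elementwise comparison
    return any(np.array_equal(arr, x) for x in seq)

unique = []
for a in [np.array([1, 2, 3]), np.array([1, 2, 2]), np.array([1, 2, 3])]:
    if not contains(unique, a):
        unique.append(a)
print(len(unique))  # 2: the duplicate [1, 2, 3] is skipped, and nothing raises
```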
Also, another note on strange behaviors of tensors we figured out: there is no hash collision even for two tensors with the same values, because:

```python
import numpy as np
import torch

# create two tensors with the same values
t1 = torch.from_numpy(np.array([1, 2, 3]))
t2 = torch.from_numpy(np.array([1, 2, 3]))

d = {hash(t1): "some value"}
print(hash(t2) in d)  # this is False to our surprise

d = {t1: "some value"}
print(t2 in d)  # this is also False
# only this one is True
print(t1 in d)
```
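Since tensor hashes behave by object identity rather than by content (and NumPy arrays are not hashable at all), a content-based key has to be derived explicitly. A minimal sketch, assuming the compared arrays share dtype and shape; `content_key` is a hypothetical helper, not part of any library:

```python
import numpy as np

def content_key(arr: np.ndarray) -> bytes:
    # arrays with identical values and dtype produce identical byte strings,
    # so this key can be used for dict/set membership
    return arr.tobytes()

t1 = np.array([1, 2, 3])
t2 = np.array([1, 2, 3])
d = {content_key(t1): "some value"}
print(content_key(t2) in d)  # True: lookup now works by content, not identity
```

Caveat: `tobytes()` ignores shape, so arrays with the same raw bytes but different shapes would collide; including `arr.shape` in the key would close that gap.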
What could help in this discussion for sure: tests with examples for reproduction.
The reason for the exception is in the way … If instead of a raw tensor we pass:

```python
d = {
    "value": torch.Tensor(...),
    "path_to_image": "source/path/to/image.png",
}
```

then if … So we can't fetch unique objects from the batch except by wrapping them in something like:

```python
class ComparableClass:
    def __init__(self, comparison_feature, value):
        self.comparison_feature = ...
        self.value = torch.tensor(...)

    def __eq__(self, other):
        return self.comparison_feature == other.comparison_feature

    # we can provide a default hash implementation here as well
```

The alternative for this could be rejecting the idea of fetching unique objects from the batch.
Fixed in #34
I think this is an edge case. When we do

```python
if sample.obj not in unique_objects
```

in `fetch_unique_objects` of the data loader classes, it may return a tensor of `True`s and `False`s if the individual elements in the tensor (`sample.obj`) are equal at certain indexes and not equal at others. When we want to use that returned tensor in a conditional expression, it tries to reduce it to a single boolean value, but the combination of multiple `True`s and `False`s is ambiguous, so it throws a runtime error.

**Possible solution**

Keep hashes alongside `unique_objects`, e.g. in `unique_hashes`. Check `hash(sample.obj)` against `unique_hashes` instead of checking `unique_objects`, then append to `unique_objects` as usual. We can even return the list of hashes to further use them later, maybe.
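The proposal above can be sketched roughly as follows. The names `fetch_unique_objects` and `unique_hashes` follow the thread; the `obj_key` helper is an assumption (using a content-derived key for NumPy arrays, since raw tensors hash by identity and a later comment notes objects may be unhashable entirely):

```python
import numpy as np
from typing import Any, List

def obj_key(obj: Any) -> Any:
    # hypothetical content-based key for array-like objects;
    # other objects are assumed hashable as-is
    return obj.tobytes() if isinstance(obj, np.ndarray) else obj

def fetch_unique_objects(batch: List[Any]) -> List[Any]:
    unique_objects = []
    unique_hashes = set()             # membership test on keys, not tensors
    for obj in batch:
        key = obj_key(obj)
        if key not in unique_hashes:  # yields a single bool: no ambiguous reduction
            unique_hashes.add(key)
            unique_objects.append(obj)
    return unique_objects

batch = [np.array([1, 2, 3]), np.array([1, 2, 2]), np.array([1, 2, 3])]
print(len(fetch_unique_objects(batch)))  # 2
```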