
Bug in caching: Boolean value of tensor comparison is ambiguous #28

Closed
monatis opened this issue Feb 9, 2022 · 10 comments

monatis (Contributor) commented Feb 9, 2022

I think this is an edge case. When we do if sample.obj not in unique_objects in fetch_unique_objects of the data loader classes, the check may return a tensor of Trues and Falses if the individual elements of the tensor (sample.obj) are equal to those of an already-collected tensor at some indexes and not at others. When that returned tensor is used in a conditional expression, Python tries to reduce it to a single boolean value, but a combination of multiple Trues and Falses is ambiguous, so it throws a runtime error:

...
  File "C:\Users\Yusuf\coding\quaterion\quaterion\dataset\cache_data_loader.py", line 68, in cache_collate_fn
    unique_objects = self.unique_objects_extractor(batch)
  File "C:\Users\Yusuf\coding\quaterion\quaterion\dataset\similarity_data_loader.py", line 42, in fetch_unique_objects
    if sample.obj_b not in unique_objects:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
Predicting:   0%|          | 0/387 [00:00<?, ?it/s]

Possible solution

  1. Create another container for unique hashes in addition to unique_objects, e.g. unique_hashes.
  2. If hash(obj) is not in the new container, add the hash there and add the object to unique_objects.
  3. Return unique_objects as usual.
    We could even return the list of hashes to reuse later, maybe; see the sketch below.
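
A rough sketch of this idea for the pair loader (unique_hashes is the name proposed above; the loop over obj_a / obj_b mirrors the existing fetch_unique_objects, the rest is illustrative):

@classmethod
def fetch_unique_objects(cls, batch: List[SimilarityPairSample]) -> List[Any]:
    unique_objects = []
    unique_hashes = set()  # hashes of objects already seen in this batch
    for sample in batch:
        for obj in (sample.obj_a, sample.obj_b):
            obj_hash = hash(obj)
            if obj_hash not in unique_hashes:  # no tensor comparison happens here
                unique_hashes.add(obj_hash)
                unique_objects.append(obj)
    return unique_objects
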
@monatis monatis added the bug Something isn't working label Feb 9, 2022
@joein joein self-assigned this Feb 10, 2022
joein (Member) commented Feb 11, 2022

Unfortunately, we can't just calculate a hash here, because the objects may not be hashable, so we need to pass a key_extractor_fn somehow.
Also, we can have several key_extractor_fn instances because we can have several encoders, so we need to apply either each of them or just one of them, which is arguable. And we do the same thing in CacheDataLoader.

My proposal is to replace fetch_unique_objects with just emit_objects, with approximately these implementations:

For PairSimilarityDataLoader:

@classmethod
def emit_object(cls, batch: List[SimilarityPairSample]) -> Any:
    for sample in batch:
        yield sample.obj_a
        yield sample.obj_b

For GroupSimilarityDataLoader:

@classmethod
def emit_object(cls, batch: List[SimilarityGroupSample]) -> Any:
    for sample in batch:
        yield sample.obj
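
Illustrative only: how the cache collate step might consume such a generator (key_extractor_fn is the name used above; collect_unseen_objects and seen_keys are hypothetical names, not existing Quaterion API):

def collect_unseen_objects(batch, key_extractor_fn, seen_keys):
    to_embed = []
    for obj in PairSimilarityDataLoader.emit_object(batch):
        key = key_extractor_fn(obj)
        if key not in seen_keys:  # key not cached yet: remember it and embed the object
            seen_keys.add(key)
            to_embed.append(obj)
    return to_embed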

monatis (Contributor, Author) commented Feb 11, 2022

replace fetch_unique_objects with just emit_objects

In this case, we will lose the whole functionality that prevents repeated calculation, won't we?

we can't just calculate a hash here,

Ok, so what about this one:

def obj_in_list(obj, obj_list):
    try:
        return obj in obj_list
    except (RuntimeError, ValueError):
        # we caught the reported exception here (raised by tensor / ndarray comparison),
        # so assume obj is not in the list and add it anyway
        return False

Now we can use it like if not obj_in_list(sample.obj, unique_objects):.
This will work as usual unless we hit the reported bug, and it will still safely process that object if we do. WDYT?
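
For reference, a sketch of the call site with the helper applied (the loop structure is assumed from the traceback above, not copied from the actual file):

unique_objects = []
for sample in batch:
    if not obj_in_list(sample.obj_a, unique_objects):
        unique_objects.append(sample.obj_a)
    if not obj_in_list(sample.obj_b, unique_objects):
        unique_objects.append(sample.obj_b)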

joein (Member) commented Feb 12, 2022

In this case, we will lose the whole functionality that prevents repeated calculation, won't we?

Actually, only part of it. Currently the flow is:

  • fetch unique objects in the batch
  • for every unique object, calculate its key via the key extractors
  • if the calculated key has not been seen in the current dataloader, calculate its embeddings, otherwise do nothing

So fetch_unique_objects actually only saves us from repeated key calculation, which I guess is not that crucial.
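
A simplified sketch of that flow (dataloader, key_extractor, embedding_cache and encoder are illustrative names, not the actual Quaterion internals):

unique_objects = dataloader.fetch_unique_objects(batch)  # step 1: dedupe within the batch
for obj in unique_objects:
    key = key_extractor(obj)                              # step 2: one key per encoder in practice
    if key not in embedding_cache:                        # step 3: embed only unseen keys
        embedding_cache[key] = encoder.encode(obj)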

joein (Member) commented Feb 12, 2022

Ok, so what about this one:

It could be a solution; I need to look into this more thoroughly.

joein (Member) commented Feb 12, 2022

We discussed it, and here are the two possible solutions we figured out:

  • torch.Tensor, np.ndarray, and everything built upon them (e.g. pandas objects) are doomed to break: the former raise RuntimeError, the latter ValueError. We can suppress these exceptions in a method like obj_in_list and wait for feedback in case that is not sufficient.
  • replace fetch_unique_objects with emit_objects as discussed earlier in this thread

@generall

monatis (Contributor, Author) commented Feb 12, 2022

And here's the minimal code to reproduce this bug:

import numpy as np
import torch

l = []
t1 = torch.from_numpy(np.array([1, 2, 3]))  # remove `torch.from_numpy()` for the numpy version
t2 = torch.from_numpy(np.array([1, 2, 2]))
ts = [t1, t2]

for t in ts:
    if t not in l:  # raises RuntimeError on the second iteration: t2 == t1 yields an element-wise tensor
        l.append(t)

print("everything fine")  # never reached because of the exception above
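
For comparison, a membership check that avoids the ambiguity by comparing tensors explicitly (torch.equal collapses the comparison to a single bool; this is only an illustration, not the proposed fix):

l = []
for t in ts:
    if not any(torch.equal(t, existing) for existing in l):
        l.append(t)

print("everything fine")  # reached this time, and l contains both tensors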

monatis (Contributor, Author) commented Feb 12, 2022

Also, another strange tensor behavior we figured out: two tensors with the same values never hash to the same value, because Tensor.__hash__ hashes by id(tensor).

import numpy as np
import torch

# create two tensors with the same values
t1 = torch.from_numpy(np.array([1, 2, 3]))
t2 = torch.from_numpy(np.array([1, 2, 3]))

d = {hash(t1): "some value"}
print(hash(t2) in d)  # this is False to our surprise

d = {t1: "some value"}
print(t2 in d)  # this is also False

# only this one is True
print(t1 in d)
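
A possible workaround is to hash by content instead of by identity (tensor_key is an illustrative helper; a real implementation should probably also account for dtype and shape):

def tensor_key(t: torch.Tensor) -> bytes:
    return t.detach().cpu().numpy().tobytes()

d = {tensor_key(t1): "some value"}
print(tensor_key(t2) in d)  # True: equal values produce equal keys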

generall (Member) commented

What could help in this discussion for sure: tests with examples for reproduction.

joein (Member) commented Feb 14, 2022

The reason for the exception is the way the in operator works. It compares the new object with those already in the collection: it checks whether the two objects are the same object (obj_a is obj_b) or whether they are equal via == (that's the place where the exception occurs; tensors and similar objects return an element-wise result from == instead of a single boolean).

If instead of a raw tensor we pass a dict like

d = {
    "value": torch.Tensor(...),
    "path_to_image": "source/path/to/image.png",
}

then, if value happens to be compared first, it results in the same exception again.

So we can't fetch unique objects from a batch just by wrapping them in a dict; maybe we need some special class like the following to handle such cases.

class ComparableClass:
    def __init__(self, comparison_feature, value):
        self.comparison_feature = comparison_feature  # hashable feature used for equality checks
        self.value = value  # e.g. a torch.Tensor

    def __eq__(self, other):
        return self.comparison_feature == other.comparison_feature

    # we can provide a default __hash__ implementation here as well
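
Hypothetical usage, comparing by a hashable feature such as the image path instead of the tensor itself:

a = ComparableClass("source/path/to/image.png", torch.zeros(3))
b = ComparableClass("source/path/to/image.png", torch.ones(3))
print(a == b)    # True: only comparison_feature is compared
print(a in [b])  # no RuntimeError, because __eq__ returns a plain bool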

The alternative to this could be rejecting the idea of fetching unique objects from the batch.
In this case we can handle complicated objects via a dict and a custom key_extractor_fn and still use the cache successfully.
The drawback is that we need to extract a key from each object in the batch for each encoder, but I don't think that is crucial.

monatis (Contributor, Author) commented Feb 22, 2022

Fixed in #34

@monatis monatis closed this as completed Feb 22, 2022