
Bug in caching: Boolean value of tensor comparison is ambiguous #28

Closed
monatis opened this issue Feb 9, 2022 · 10 comments

monatis (Contributor) commented Feb 9, 2022

I think this is an edge case. When we do if sample.obj not in unique_objects in fetch_unique_objects of the data loader classes, the check may return a tensor of Trues and Falses if the individual elements of the tensor (sample.obj) are equal to those of an already-collected tensor at some indexes and not at others. When that returned tensor is used in a conditional expression, Python tries to reduce it to a single boolean value, but a combination of multiple Trues and Falses is ambiguous, so it throws a runtime error:

...
  File "C:\Users\Yusuf\coding\quaterion\quaterion\dataset\cache_data_loader.py", line 68, in cache_collate_fn
    unique_objects = self.unique_objects_extractor(batch)
  File "C:\Users\Yusuf\coding\quaterion\quaterion\dataset\similarity_data_loader.py", line 42, in fetch_unique_objects
    if sample.obj_b not in unique_objects:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
Predicting:   0%|          | 0/387 [00:00<?, ?it/s]

Possible solution

  1. Create another container for unique hashes in addition to unique_objects, e.g. unique_hashes.
  2. If hash(obj) is not in the new container, add the hash there and add the object to unique_objects.
  3. Return unique_objects as usual.
    We could even return the list of hashes to reuse later, maybe; see the sketch below.
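
A rough sketch of this idea for the pair loader (unique_hashes is the name proposed above; the loop over obj_a / obj_b mirrors the existing fetch_unique_objects, the rest is illustrative):

@classmethod
def fetch_unique_objects(cls, batch: List[SimilarityPairSample]) -> List[Any]:
    unique_objects = []
    unique_hashes = set()  # hashes of objects already seen in this batch
    for sample in batch:
        for obj in (sample.obj_a, sample.obj_b):
            obj_hash = hash(obj)
            if obj_hash not in unique_hashes:  # no tensor comparison happens here
                unique_hashes.add(obj_hash)
                unique_objects.append(obj)
    return unique_objects
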
@monatis monatis added the bug Something isn't working label Feb 9, 2022
@joein joein self-assigned this Feb 10, 2022
joein (Member) commented Feb 11, 2022

Unfortunately, we can't just calculate a hash here, because the objects may not be hashable, so we need to pass a key_extractor_fn somehow.
Also, we can have several key_extractor_fn instances because we can have several encoders, so we need to apply either each of them or just one of them, which is arguable. And we do the same thing in CacheDataLoader.

My proposal is to replace fetch_unique_objects with just emit_objects, with approximately these implementations:

For PairSimilarityDataLoader:

@classmethod
def emit_object(cls, batch: List[SimilarityPairSample]) -> Any:
    for sample in batch:
        yield sample.obj_a
        yield sample.obj_b

For GroupSimilarityDataLoader:

@classmethod
def emit_object(cls, batch: List[SimilarityGroupSample]) -> Any:
    for sample in batch:
        yield sample.obj
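
Illustrative only: how the cache collate step might consume such a generator (key_extractor_fn is the name used above; collect_unseen_objects and seen_keys are hypothetical names, not existing Quaterion API):

def collect_unseen_objects(batch, key_extractor_fn, seen_keys):
    to_embed = []
    for obj in PairSimilarityDataLoader.emit_object(batch):
        key = key_extractor_fn(obj)
        if key not in seen_keys:  # key not cached yet: remember it and embed the object
            seen_keys.add(key)
            to_embed.append(obj)
    return to_embed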

monatis (Contributor, Author) commented Feb 11, 2022

replace fetch_unique_objects with just emit_objects

In this case, we will lose the whole functionality that prevents repeated calculation, won't we?

we can't just calculate a hash here,

Ok, so what about this one:

def obj_in_list(obj, obj_list):
    try:
        return obj in obj_list
    except (RuntimeError, ValueError):
        # we caught the reported exception here (raised by tensor / ndarray comparison),
        # so assume obj is not in the list and add it anyway
        return False

Now we can use it like if not obj_in_list(sample.obj, unique_objects):.
This will work as usual unless we hit the reported bug, and it will still safely process that object if we do. WDYT?
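
For reference, a sketch of the call site with the helper applied (the loop structure is assumed from the traceback above, not copied from the actual file):

unique_objects = []
for sample in batch:
    if not obj_in_list(sample.obj_a, unique_objects):
        unique_objects.append(sample.obj_a)
    if not obj_in_list(sample.obj_b, unique_objects):
        unique_objects.append(sample.obj_b)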

joein (Member) commented Feb 12, 2022

In this case, we will lose the whole functionality that prevents repeated calculation, won't we?

Actually, only part of it. Currently the flow is:

  • fetch unique objects in the batch
  • for every unique object, calculate its key via the key extractors
  • if the calculated key has not been seen in the current dataloader, calculate its embeddings, otherwise do nothing

So fetch_unique_objects actually only saves us from repeated key calculation, which I guess is not that crucial.
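
A simplified sketch of that flow (dataloader, key_extractor, embedding_cache and encoder are illustrative names, not the actual Quaterion internals):

unique_objects = dataloader.fetch_unique_objects(batch)  # step 1: dedupe within the batch
for obj in unique_objects:
    key = key_extractor(obj)                              # step 2: one key per encoder in practice
    if key not in embedding_cache:                        # step 3: embed only unseen keys
        embedding_cache[key] = encoder.encode(obj)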

joein (Member) commented Feb 12, 2022

Ok, so what about this one:

It could be a solution; I need to look into this more thoroughly.

joein (Member) commented Feb 12, 2022

We discussed it, and here are the two possible solutions we figured out:

  • torch.Tensor, np.ndarray, and everything built upon them (e.g. pandas objects) are doomed to break: the former raise RuntimeError, the latter ValueError. We can suppress these exceptions in a method like obj_in_list and wait for feedback in case that is not sufficient.
  • replace fetch_unique_objects with emit_objects as discussed earlier in this thread

@generall

monatis (Contributor, Author) commented Feb 12, 2022

And here's the minimal code to reproduce this bug:

import numpy as np
import torch

l = []
t1 = torch.from_numpy(np.array([1, 2, 3]))  # remove `torch.from_numpy()` for the numpy version
t2 = torch.from_numpy(np.array([1, 2, 2]))
ts = [t1, t2]

for t in ts:
    if t not in l:  # raises RuntimeError on the second iteration: t2 == t1 yields an element-wise tensor
        l.append(t)

print("everything fine")  # never reached because of the exception above
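
For comparison, a membership check that avoids the ambiguity by comparing tensors explicitly (torch.equal collapses the comparison to a single bool; this is only an illustration, not the proposed fix):

l = []
for t in ts:
    if not any(torch.equal(t, existing) for existing in l):
        l.append(t)

print("everything fine")  # reached this time, and l contains both tensors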

monatis (Contributor, Author) commented Feb 12, 2022

Also, another strange tensor behavior we figured out: two tensors with the same values never hash to the same value, because Tensor.__hash__ hashes by id(tensor).

import numpy as np
import torch

# create two tensors with the same values
t1 = torch.from_numpy(np.array([1, 2, 3]))
t2 = torch.from_numpy(np.array([1, 2, 3]))

d = {hash(t1): "some value"}
print(hash(t2) in d)  # this is False to our surprise

d = {t1: "some value"}
print(t2 in d)  # this is also False

# only this one is True
print(t1 in d)
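
A possible workaround is to hash by content instead of by identity (tensor_key is an illustrative helper; a real implementation should probably also account for dtype and shape):

def tensor_key(t: torch.Tensor) -> bytes:
    return t.detach().cpu().numpy().tobytes()

d = {tensor_key(t1): "some value"}
print(tensor_key(t2) in d)  # True: equal values produce equal keys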

generall (Member) commented

What could help in this discussion for sure: tests with examples for reproduction.

joein (Member) commented Feb 14, 2022

The reason for the exception is the way the in operator works. It compares the new object with those already in the collection: it checks whether the two objects are the same object (obj_a is obj_b) or whether they are equal via == (that's the place where the exception occurs; tensors and similar objects return an element-wise result from == instead of a single boolean).

If instead of a raw tensor we pass a dict like

d = {
    "value": torch.Tensor(...),
    "path_to_image": "source/path/to/image.png",
}

then, if value happens to be compared first, it results in the same exception again.

So we can't fetch unique objects from a batch just by wrapping them in a dict; maybe we need some special class like the following to handle such cases.

class ComparableClass:
    def __init__(self, comparison_feature, value):
        self.comparison_feature = comparison_feature  # hashable feature used for equality checks
        self.value = value  # e.g. a torch.Tensor

    def __eq__(self, other):
        return self.comparison_feature == other.comparison_feature

    # we can provide a default __hash__ implementation here as well
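
Hypothetical usage, comparing by a hashable feature such as the image path instead of the tensor itself:

a = ComparableClass("source/path/to/image.png", torch.zeros(3))
b = ComparableClass("source/path/to/image.png", torch.ones(3))
print(a == b)    # True: only comparison_feature is compared
print(a in [b])  # no RuntimeError, because __eq__ returns a plain bool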

The alternative to this could be rejecting the idea of fetching unique objects from the batch.
In this case we can handle complicated objects via a dict and a custom key_extractor_fn and still use the cache successfully.
The drawback is that we need to extract a key from each object in the batch for each encoder, but I don't think that is crucial.

monatis (Contributor, Author) commented Feb 22, 2022

Fixed in #34

@monatis monatis closed this as completed Feb 22, 2022