[BUG] `device` should be attribute on `SentenceEncoder` #33

CarloLepelaars · 2022-12-27T12:03:43Z

The device argument in SentenceEncoder is not defined as an attribute. This leads to bugs when using it with sklearn. I encountered attribute errors when trying to print out a Pipeline representation that has SentenceEncoder as a component.

Should be easy to fix by just adding self.device in SentenceEncoder.__init__. We can consider adding tests for text encoders so we can catch these errors beforehand.

The scikit-learn development docs make it clear every argument should be defined as an attribute:

every keyword argument accepted by __init__ should correspond to an attribute on the instance. Scikit-learn relies on this to find the relevant attributes to set on an estimator when doing model selection.
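To illustrate why a missing attribute breaks `repr`, here is a minimal sketch that uses only the standard library. `MiniBase`, `BrokenEncoder` and `FixedEncoder` are hypothetical stand-ins, and `MiniBase.get_params` is a simplified imitation of the signature-based lookup that scikit-learn estimators perform, not scikit-learn's actual code.

```python
import inspect


class MiniBase:
    """Simplified stand-in for a scikit-learn style base class (illustration only)."""

    def get_params(self):
        # Read the parameter names from the __init__ signature ...
        names = [
            p for p in inspect.signature(type(self).__init__).parameters
            if p != "self"
        ]
        # ... and look each one up as an attribute with the same name.
        return {name: getattr(self, name) for name in names}

    def __repr__(self):
        args = ", ".join(f"{k}={v!r}" for k, v in self.get_params().items())
        return f"{type(self).__name__}({args})"


class BrokenEncoder(MiniBase):
    def __init__(self, name="all-MiniLM-L6-v2", device=None):
        self.name = name  # `device` is accepted but never stored


class FixedEncoder(MiniBase):
    def __init__(self, name="all-MiniLM-L6-v2", device=None):
        self.name = name
        self.device = device  # every __init__ argument is now an attribute


print(repr(FixedEncoder()))
try:
    repr(BrokenEncoder())
except AttributeError as err:
    print(f"AttributeError: {err}")
```

The broken variant fails only when something asks for its parameters, which is why the problem can go unnoticed until a Pipeline is printed.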

Error message:
AttributeError: 'SentenceEncoder' object has no attribute 'device'.

Reproduction:
Python 3.8 with embetter = "^0.2.2"

from embetter.text import SentenceEncoder

se = SentenceEncoder()
repr(se)  # raises the AttributeError shown above

Fix:

Add self.device on SentenceEncoder

class SentenceEncoder(EmbetterBase):
    .
    .
    def __init__(self, name="all-MiniLM-L6-v2", device=None):
        if not device:
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.device = device
        self.name = name
        self.tfm = SBERT(name, device=self.device)


koaning · 2022-12-27T13:52:30Z

Feel free to make a PR. It might make sense to have the actual reference to the torch object to be self._device if that makes the __repr__ look better.
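One possible reading of this suggestion, sketched without torch so it runs anywhere: keep the constructor argument untouched on `self.device` and store the resolved object on `self._device`. `resolve_device` and `SketchEncoder` are hypothetical names used for illustration and are not part of embetter; in the real class the resolved value would be a `torch.device`.

```python
def resolve_device(device=None, cuda_available=False):
    """Hypothetical stand-in for the torch.device(...) selection in the fix."""
    if device is not None:
        return device
    return "cuda" if cuda_available else "cpu"


class SketchEncoder:
    """Illustration only; not the embetter SentenceEncoder."""

    def __init__(self, name="all-MiniLM-L6-v2", device=None):
        self.name = name
        self.device = device  # public: exactly what the caller passed
        self._device = resolve_device(device)  # private: resolved value

    def __repr__(self):
        # Only the public constructor arguments appear in the representation.
        return f"SketchEncoder(name={self.name!r}, device={self.device!r})"


enc = SketchEncoder()
print(repr(enc))    # shows the argument as passed
print(enc._device)  # the resolved device used internally
```

The trade-off is one extra attribute in exchange for a representation that mirrors the call the user wrote.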

CarloLepelaars · 2022-12-27T16:39:04Z

Cool! I made PR #34, which solves this and adds some tests for SentenceEncoder. Additionally defining self._device does not seem necessary, as the __repr__ output is already clean.

As for testing text embedders, it looks like Sense2VecEncoder is a lot harder to test since it depends on loading a file from disk. Any ideas on how to test that, or do you think it's not necessary?
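One option for testing a component that loads a file from disk is to replace the loader with a mock during the test. The sketch below uses only the standard library; `load_vectors` and `DiskBackedEncoder` are hypothetical stand-ins and do not reflect how Sense2VecEncoder is actually implemented.

```python
from unittest import mock


def load_vectors(path):
    """Hypothetical loader; stands in for whatever reads the model from disk."""
    with open(path, "rb") as f:
        return f.read()


class DiskBackedEncoder:
    """Hypothetical encoder that needs a file on disk; not embetter code."""

    def __init__(self, path):
        self.path = path  # stored so parameter lookup can find it
        self.vectors = load_vectors(path)


def test_path_is_stored_without_touching_disk():
    fake = mock.Mock(return_value=b"fake")
    # Temporarily swap the module-level loader so no real file is needed.
    with mock.patch.dict(globals(), {"load_vectors": fake}):
        enc = DiskBackedEncoder("does/not/exist")
    fake.assert_called_once_with("does/not/exist")
    assert enc.path == "does/not/exist"
    assert enc.vectors == b"fake"


test_path_is_stored_without_touching_disk()
```

This checks the attribute bookkeeping only; it says nothing about whether the real file format loads correctly.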

koaning · 2022-12-27T18:47:32Z

There's no harm in adding 'self.path' I think. Feel free to make that PR as well!

CarloLepelaars · 2022-12-28T12:44:01Z

> There's no harm in adding 'self.path' I think. Feel free to make that PR as well!

Cool! Created #35 for the Sense2Vec attribute fix.

CarloLepelaars changed the title ~~[BUG] repr breaks on SentenceEncoder~~ [BUG] device should be attribute on SentenceEncoder Dec 27, 2022

CarloLepelaars mentioned this issue Dec 27, 2022

SentenceEncoder.device attribute fix + tests. #34

Merged

CarloLepelaars mentioned this issue Dec 28, 2022

Sense2Vec sklearn param fix #35

Merged

koaning closed this as completed in #34 Dec 28, 2022
