This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Traceback on classifier.predict_tags() #47

Closed
jayme-github opened this issue Nov 26, 2020 · 2 comments
Labels
bug Something isn't working
Milestone
1.0

Comments

@jayme-github
Contributor

First of all: Thanks for the awesome work and all the effort you put into this!

I wanted to try the classifier on part of my data and ran into the following traceback during consumption:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 171, in try_consume_file
    document_consumption_finished.send(
  File "/usr/local/lib/python3.8/dist-packages/django/dispatch/dispatcher.py", line 177, in send
    return [
  File "/usr/local/lib/python3.8/dist-packages/django/dispatch/dispatcher.py", line 178, in <listcomp>
    (receiver, receiver(signal=self, sender=sender, **named))
  File "/usr/src/paperless/src/documents/signals/handlers.py", line 127, in set_tags
    matched_tags = matching.match_tags(document.content, classifier)
  File "/usr/src/paperless/src/documents/matching.py", line 36, in match_tags
    predicted_tag_ids = classifier.predict_tags(document_content)
  File "/usr/src/paperless/src/documents/classifier.py", line 224, in predict_tags
    tags_ids = self.tags_binarizer.inverse_transform(y)[0]
  File "/usr/local/lib/python3.8/dist-packages/sklearn/preprocessing/_label.py", line 1017, in inverse_transform
    if yt.shape[1] != len(self.classes_):
IndexError: tuple index out of range

I also saw a warning during the "create_classifier" run, which I bluntly ignored, of course. 😄

270 documents, 1 tag(s), 1 correspondent(s), 0 document type(s).
Vectorizing data...
Training tags classifier...
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(**kwargs)
Training correspondent classifier...
There are no document types. Not training document type classifier.
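A note on where that DataConversionWarning likely comes from. The traceback's tags_binarizer.inverse_transform suggests the tag labels are turned into a 0/1 indicator matrix (one row per document, one column per Auto tag) before training; assuming that, a single Auto tag yields a matrix with exactly one column, i.e. the "column-vector y" the warning mentions. The sketch below is plain Python, not paperless or scikit-learn code; binarize_tags, shape and the sample tag sets are hypothetical.

```python
# Pure-Python sketch (no scikit-learn): build a 0/1 tag indicator matrix,
# one row per document and one column per tag set to Auto.
# binarize_tags, shape and the sample tag sets are hypothetical.


def binarize_tags(doc_tags, classes):
    """Return one row per document; row[j] is 1 if it carries classes[j]."""
    return [[1 if tag in tags else 0 for tag in classes] for tags in doc_tags]


def shape(matrix):
    """(rows, columns) of a rectangular list of lists, like numpy's .shape."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


# Two Auto tags: a genuine multi-label target of shape (n_samples, 2).
two_tags = binarize_tags([{1}, {1, 2}, set()], classes=[1, 2])
print(shape(two_tags))  # (3, 2)

# One Auto tag: every row has a single column, i.e. a column vector of
# shape (n_samples, 1). Flattening it ("ravel") leaves one 0/1 label per
# document, which is an ordinary binary classification target.
one_tag = binarize_tags([{1}, {1}, set()], classes=[1])
print(shape(one_tag))  # (3, 1)
print([row[0] for row in one_tag])  # [1, 1, 0]
```

Under the same assumption, the run above (270 documents, 1 tag) would presumably produce a (270, 1) target, which the warning then asks to be flattened.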
@jonaswinkler
Owner

jonaswinkler commented Nov 26, 2020

Thank you! Reproduced, fixed, and added some tests covering edge cases. I assume you have exactly one tag with its matching set to Auto.

Background: for tags, I'm using a multi-label classifier (i.e., a classifier that may assign multiple classes/tags to a single input element). When it is trained with only one class/tag, it transparently falls back to a simpler binary classifier, which caused the error above.
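The fallback described above changes the shape of the prediction: a multi-label model yields a row of indicators per document, while a binary model yields a single value. A 1-D prediction's shape tuple has no second entry, which is what the check on yt.shape[1] in the traceback trips over. Below is a pure-Python sketch of decoding that accepts both shapes; decode_tags and the tag IDs are hypothetical, and this is not the fix that was actually committed.

```python
# Pure-Python sketch of tag decoding that tolerates both prediction shapes.
# Assumption: the multi-label model yields a row of 0/1 indicators per
# document, the single-tag binary fallback yields one 0/1 value.
# decode_tags and the tag IDs below are hypothetical.


def decode_tags(prediction, classes):
    """Map one document's prediction back to a list of tag IDs."""
    if isinstance(prediction, (list, tuple)):
        # Multi-label case: exactly one indicator per known tag.
        if len(prediction) != len(classes):
            raise ValueError(
                f"expected {len(classes)} indicators, got {len(prediction)}"
            )
        return [tag for tag, flag in zip(classes, prediction) if flag]
    # Binary fallback: a single value that refers to the only tag.
    if len(classes) != 1:
        raise ValueError("a scalar prediction only makes sense for one tag")
    return [classes[0]] if prediction else []


# A 1-D prediction's shape is a 1-tuple, so asking for a second
# dimension fails the same way as in the traceback.
one_d_shape = (1,)
try:
    one_d_shape[1]
except IndexError as exc:
    print(exc)  # tuple index out of range

print(decode_tags([1, 0, 1], classes=[3, 5, 8]))  # [3, 8]
print(decode_tags(1, classes=[3]))  # [3]
print(decode_tags(0, classes=[3]))  # []
```

Handling the one-tag case could equally be done at training time; this sketch only shows the decoding side.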

@jonaswinkler added the Back end and bug (Something isn't working) labels on Nov 26, 2020
@jonaswinkler added this to the 1.0 milestone on Nov 26, 2020
@bavarialogy

I can confirm the issue is gone when more than one Auto tag exists. It took a few minutes to take effect, but now it's at least consuming again with the current release.

Projects
None yet
Development

No branches or pull requests

3 participants