Traceback on classifier.predict_tags() #47

jayme-github · 2020-11-26T11:07:45Z

First of all: Thanks for the awesome work and all the effort you put into this!

I wanted to try the classifier with parts of my data and ran into a traceback on consumption:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 171, in try_consume_file
    document_consumption_finished.send(
  File "/usr/local/lib/python3.8/dist-packages/django/dispatch/dispatcher.py", line 177, in send
    return [
  File "/usr/local/lib/python3.8/dist-packages/django/dispatch/dispatcher.py", line 178, in <listcomp>
    (receiver, receiver(signal=self, sender=sender, **named))
  File "/usr/src/paperless/src/documents/signals/handlers.py", line 127, in set_tags
    matched_tags = matching.match_tags(document.content, classifier)
  File "/usr/src/paperless/src/documents/matching.py", line 36, in match_tags
    predicted_tag_ids = classifier.predict_tags(document_content)
  File "/usr/src/paperless/src/documents/classifier.py", line 224, in predict_tags
    tags_ids = self.tags_binarizer.inverse_transform(y)[0]
  File "/usr/local/lib/python3.8/dist-packages/sklearn/preprocessing/_label.py", line 1017, in inverse_transform
    if yt.shape[1] != len(self.classes_):
IndexError: tuple index out of range

I did also see a warning during "create_classifier" run, which I bluntly ignored ofc. 😄

270 documents, 1 tag(s), 1 correspondent(s), 0 document type(s).
Vectorizing data...
Training tags classifier...
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(**kwargs)
Training correspondent classifier...
There are no document types. Not training document type classifier.


jonaswinkler · 2020-11-26T14:31:42Z

Thank you! Reproduced, fixed, and added some test cases to cover edge cases. I assume you've got exactly one tag on Auto.

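Background: for tags, I'm using a multi-label classifier (i.e., a classifier that may assign multiple classes/tags to a single input element). When it is trained with only one class/tag, it transparently falls back to a simpler binary classifier. The prediction of that binary classifier no longer has the two-dimensional shape that the tags binarizer's inverse_transform expects, which causes the IndexError above.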
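To illustrate the shape mismatch, here is a minimal sketch, not the actual fix that went into the project. It assumes the tag labels are encoded with scikit-learn's MultiLabelBinarizer (the traceback points at sklearn/preprocessing/_label.py). The tag id 7, the hand-written y_pred array standing in for the classifier output, and the helper safe_inverse are all hypothetical names made up for this example.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training labels: three documents, a single auto tag with id 7.
binarizer = MultiLabelBinarizer()
y_train = binarizer.fit_transform([[7], [], [7]])
# With only one class the indicator matrix is a column vector of shape (3, 1),
# which is what triggers the DataConversionWarning during training.

# Assumption: stand-in for what a binary classifier returns for one document,
# a 1-D array of shape (n_samples,) instead of (n_samples, n_classes).
y_pred = np.array([1])

direct_call_failed = False
try:
    binarizer.inverse_transform(y_pred)
except (IndexError, ValueError):
    # A 1-D array has no shape[1], so the binarizer cannot check the class count.
    direct_call_failed = True


def safe_inverse(fitted_binarizer, y):
    """Hypothetical helper: restore the (n_samples, 1) shape before decoding."""
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    return fitted_binarizer.inverse_transform(y)


# Decode the prediction for the first (and only) document.
tag_ids = safe_inverse(binarizer, y_pred)[0]
```

Reshaping the prediction is only one possible workaround; special-casing the single-tag situation at training time would be another.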

bavarialogy · 2020-11-27T07:20:07Z

I can confirm the issue is gone when more than one Auto tag exists. It took a few minutes to take effect, but now it's at least consuming again with the current release.

jayme-github mentioned this issue Nov 26, 2020

Delay consumption of new files #46

Closed

jonaswinkler added the "Back end" and "bug" (Something isn't working) labels Nov 26, 2020

jonaswinkler added this to the 1.0 milestone Nov 26, 2020

jonaswinkler closed this as completed Nov 27, 2020
