This repository has been archived by the owner on Feb 16, 2023. It is now read-only.
First of all: Thanks for the awesome work and all the effort you put into this!
I wanted to try the classifier with parts of my data and ran into a traceback on consumption:
Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 171, in try_consume_file
    document_consumption_finished.send(
  File "/usr/local/lib/python3.8/dist-packages/django/dispatch/dispatcher.py", line 177, in send
    return [
  File "/usr/local/lib/python3.8/dist-packages/django/dispatch/dispatcher.py", line 178, in <listcomp>
    (receiver, receiver(signal=self, sender=sender, **named))
  File "/usr/src/paperless/src/documents/signals/handlers.py", line 127, in set_tags
    matched_tags = matching.match_tags(document.content, classifier)
  File "/usr/src/paperless/src/documents/matching.py", line 36, in match_tags
    predicted_tag_ids = classifier.predict_tags(document_content)
  File "/usr/src/paperless/src/documents/classifier.py", line 224, in predict_tags
    tags_ids = self.tags_binarizer.inverse_transform(y)[0]
  File "/usr/local/lib/python3.8/dist-packages/sklearn/preprocessing/_label.py", line 1017, in inverse_transform
    if yt.shape[1] != len(self.classes_):
IndexError: tuple index out of range
I did also see a warning during the "create_classifier" run, which I blithely ignored ofc. 😄
270 documents, 1 tag(s), 1 correspondent(s), 0 document type(s).
Vectorizing data...
Training tags classifier...
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
return f(**kwargs)
Training correspondent classifier...
There are no document types. Not training document type classifier.
Thank you! Reproduced, fixed, and added some test cases to cover edge cases. I assume you've got exactly one tag on Auto.
Background: for tags, I'm using a multi-label classifier (i.e., a classifier that may assign multiple classes/tags to a single input element). With only one class/tag, it transparently falls back to a simpler binary classifier, whose predictions come back as a flat 1-d array rather than the 2-d matrix the tag binarizer expects, causing the above error.
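For anyone hitting the same thing, here is a minimal sketch of the shape problem and one possible guard. It is not the actual fix that went into the repository. The tag id 7, the helper name as_indicator_matrix, and the hand-written prediction array are made up for illustration; only MultiLabelBinarizer and the yt.shape[1] check come from the traceback above.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer


def as_indicator_matrix(y):
    """Return predictions as a 2-d (n_samples, n_classes) array.

    Hypothetical helper: a 1-d prediction, as produced when there is
    only a single class, is turned into a single-column matrix.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    return y


# Exactly one auto-matching tag; the id 7 is an assumed example value.
binarizer = MultiLabelBinarizer()
binarizer.fit([[7], []])

# Stand-in for what a binary classifier predicts for one document:
# a flat array of shape (1,) instead of a matrix of shape (1, 1).
y_flat = np.array([1])

# The check from the traceback indexes shape[1], which a 1-d array lacks.
try:
    y_flat.shape[1]
    flat_fails = False
except IndexError:
    flat_fails = True

# With the guard in place, inverse_transform gets the 2-d input it expects.
tag_ids = binarizer.inverse_transform(as_indicator_matrix(y_flat))[0]
print(flat_fails, tag_ids)
```

The guard only papers over the prediction side. The DataConversionWarning during training points at the same root cause, so a cleaner approach is probably to treat the single-tag case explicitly as binary classification when training as well.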
I can confirm the issue is gone when more than one Auto tag exists. It took a few minutes to take effect, but now it's at least consuming again with the current release.