-
-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_bad_utf8 fails when run in parallel #456
Comments
Can't reproduce locally. I think it may have to do with the logging state though. Would you mind trying this change? Probably introduced by commit 979b0bc. diff --git a/src/ocrmypdf/_jobcontext.py b/src/ocrmypdf/_jobcontext.py
index e50aa75..bfcc345 100644
--- a/src/ocrmypdf/_jobcontext.py
+++ b/src/ocrmypdf/_jobcontext.py
@@ -116,9 +116,9 @@ class LogNamePageAdapter(logging.LoggerAdapter):
def make_logger(options=None, prefix='ocrmypdf', filename=None, page=None):
log = logging.getLogger(prefix)
if filename and page:
- adapter = LogNamePageAdapter(log, dict(filename=filename, page=page))
+ adapter = LogNamePageAdapter(log, dict(input_filename=filename, page=page))
elif filename:
- adapter = LogNameAdapter(log, dict(filename=filename))
+ adapter = LogNameAdapter(log, dict(input_filename=filename))
else:
adapter = log
return adapter
diff --git a/src/ocrmypdf/exec/tesseract.py b/src/ocrmypdf/exec/tesseract.py
index afa0e4a..0a9afa9 100644
--- a/src/ocrmypdf/exec/tesseract.py
+++ b/src/ocrmypdf/exec/tesseract.py
@@ -55,6 +55,7 @@ HOCR_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
class TesseractLoggerAdapter(logging.LoggerAdapter):
def process(self, msg, kwargs):
+ kwargs['extra'] = self.extra
return '[tesseract] %s' % (msg), kwargs |
No, it doesn't seem to be that. If I revert db914d4, then it works, which I guess is a similar change since it's about logging too. It's a bit flaky and gets stuck sometimes, so I wonder if there's some underlying pytest or pytest-xdist problem. What versions are you running? What I have is:
|
It works on Fedora 31 with:
so it's probably one of the other changes, Python 3.8 or xdist 1.30.0. |
Main system
If you do |
No, it works fine if just running those by themselves. |
I am trying now it on a Docker image of Fedora Rawhide. 4 cores assigned.ocrmypdf 9.0.5 package installed. Used pip to upgrade to latest. Test suite gives:
Still waiting for results. Docker can be slow. I suppose, since it's in intermittent and likely an artifact of testing, I'm pretty confident you could safely disable the test for Fedora. The test ensures that we had a rare error condition (Tesseract's data files don't match the executable) correctly. This condition caused Tesseract to dump some binary data to stdout which in turn violated Python 3's UTF-8 assumptions. It's likely that an error in Tesseract and it's probably been fixed because they've done a lot of memory fixes lately. I am still interested in learning what's causing this. |
Finally reproduced the failure on Fedora: If you don't have tesseract-osd installed, that may be related in some way. ocrmypdf depends on it. I got a few failures related to this. Is there an equivalent to https://tracker.debian.org/pkg/ocrmypdf where I can see the Fedora build logs? |
Not an exact reproduction though:
|
Sure, Fedora builds are in koji. The Rawhide one got stuck though. |
Although I haven't made it an official release yet, I am removing this test as of commit 9db01c7 along with the functionality it tested, if you want to cherry pick it. I consider the test no longer valid as I don't believe the Tesseract executable behaves in that way any more. I also believe the issue you reported is an actually an issue with pytest-xdist on Python 3.8, since I found similar problems, again with Python 3.8 but nor previously. I reported that here in hopes of getting some ideals about how to diagnose it better: |
Decided test was not longer valid so removed in v9.2.0. I think the parallelism issue could have been related to the use of |
Somehow, I managed to build 9.3.0 even though it doesn't work locally. I'm not sure what exactly is wrong, but I guess I won't have to worry about it later. |
Describe the issue
I'm not sure what's going with it, but when I run tests in parallel (
-n auto
),test_bad_utf8
fails. When everything is run in serial, everything works fine. I guess there is some sort of global state somewhere that is modified by a parallel test that breaks things. Also, this seemed to work fine on 9.0.5, but I don't see much in the diff between it and 9.1.1 that would be indicative of anything.To Reproduce
pytest -n auto
System:
The text was updated successfully, but these errors were encountered: