enable JPEG2000 format support #410

int3l merged 1 commit into madmaze:master from enable-jpeg2000-support on Feb 2, 2022



@caerulescens caerulescens commented Feb 1, 2022


When a JPEG2000 image is loaded with pillow and passed to pytesseract, an exception is raised: TypeError: Unsupported image format/type. Both pillow and tesseract support JPEG2000 images, so pytesseract should accept them as well. This PR enables JPEG2000 support for images loaded with pillow by adding JPEG2000 to SUPPORTED_FORMATS in pytesseract.
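To illustrate why a one-entry change is enough, here is a minimal sketch of a format allow-list check. It is not pytesseract's actual source: `FakeImage`, `check_format`, `FORMATS_BEFORE` and `FORMATS_AFTER` are hypothetical names, and the set contents are an assumed subset chosen for illustration. Only the error message and the `JPEG2000` format name come from this PR.

```python
# Illustrative sketch only; this is NOT pytesseract's real implementation.
from dataclasses import dataclass
from typing import Optional

# Assumed subset of formats for illustration; the real set may differ.
FORMATS_BEFORE = {"JPEG", "PNG", "TIFF", "BMP", "GIF"}
FORMATS_AFTER = FORMATS_BEFORE | {"JPEG2000"}


@dataclass
class FakeImage:
    """Hypothetical stand-in for a PIL image; only `format` matters here."""
    format: Optional[str]


def check_format(image, supported):
    """Return the image's format, or raise if it is not in `supported`."""
    if image.format not in supported:
        raise TypeError("Unsupported image format/type")
    return image.format


jp2 = FakeImage(format="JPEG2000")

# Before the change: the allow-list rejects the image.
try:
    check_format(jp2, FORMATS_BEFORE)
    rejected_before = False
except TypeError:
    rejected_before = True

# After the change: the same image passes the check.
accepted_after = check_format(jp2, FORMATS_AFTER)
```

With an allow-list like this, the image object itself is already valid; only membership in the set decides whether it is accepted.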

I included a test.jpeg2000 image for a JPEG2000 test; it was created by converting the test.png in tests/data to JPEG2000 format using the below:

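import io
from PIL import Image

# Read the source PNG into memory
with open('test.png', 'rb') as f:
    image_data = f.read()
buffer = io.BytesIO(image_data)
# Decode with pillow, then re-encode as JPEG2000
image = Image.open(buffer)
image.save("test.jpeg2000", "JPEG2000")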

@caerulescens caerulescens changed the title from "feat: enable JPEG2000 support" to "enable JPEG2000 format support" Feb 1, 2022
@int3l int3l merged commit d32bbb5 into madmaze:master Feb 2, 2022
5 checks passed

int3l commented Feb 2, 2022

Thanks for your contribution, @caerulescens

@caerulescens caerulescens deleted the enable-jpeg2000-support branch February 3, 2022 18:41
@caerulescens caerulescens restored the enable-jpeg2000-support branch February 3, 2022 18:42
@caerulescens caerulescens deleted the enable-jpeg2000-support branch February 3, 2022 18:43

Successfully merging this pull request may close these issues.

'JPEG2000' images are supported by PIL and Tesseract-OCR, but not pytesseract
2 participants