Please move to Python 3 #13

Open
markuschaaf opened this issue Mar 29, 2019 · 48 comments

@markuschaaf

Python 2 will reach EOL at the end of 2019. Distributions will stop shipping it. https://pythonclock.org/

@jwilk
Owner

jwilk commented Apr 12, 2019

I don't have any plans to port didjvu to Python 3.
Python 2 is a fine language and the motions to remove it from distros are ill-advised.

@FriedrichFroebel

To me it seems that porting didjvu to Python 3, or making it compatible with both Python 2 and Python 3, should be relatively easy once gamera supports Python 3 as well (see hsnr-gamera/gamera-3#19).

jwilk added the wontfix label Aug 7, 2019

@blaueente

@jwilk wrote:

I don't have any plans to port didjvu to Python 3.
Python 2 is a fine language and the motions to remove it from distros are ill-advised.

Python 2 no longer receives security maintenance. This means distros do not have a choice.

@mara004

mara004 commented Oct 16, 2020

Gamera developer @cdalitz says that the main branch has already been ported completely to Python 3 (https://github.com/hsnr-gamera/gamera-4), however it is marked as 'experimental' in the description and it doesn't seem to have an official release yet.
@jwilk Could you please consider porting didjvu to Python 3 anyway? Python 2 is rarely used nowadays, and, as @blaueente pointed out, all major distributions are about to remove it or have already done so, because they have to for security reasons. I barely know of any other reasonably popular program that is still maintained and deliberately keeps using Python 2 ...

@cdalitz

cdalitz commented Oct 20, 2020

Concerning the Python 3 port of Gamera (gamera-4), this is indeed finished. It is nevertheless still marked as "experimental", because it has not been extensively tested. As I no longer use Gamera myself in any of the projects that I currently work on, I do not have the opportunity to test and fix it. Thus, if someone finds any bugs, patches for fixing them are highly welcome.

@mara004

mara004 commented Oct 25, 2020

@cdalitz Okay, thanks for clarifying!

@jsbien

jsbien commented Sep 12, 2021

I understand that a virtual environment for Python 2 can be created on e.g. stable Debian and the program run inside it. I would appreciate foolproof instructions on how to actually do it.

@mara004

mara004 commented Sep 12, 2021

@jsbien Python 2 is still available as an official Debian package up to sid, so you probably don't have to worry about Python 2 for (at least) the next 5 years if you're on Debian.
I'm not sure why they decided to keep Python 2 so long, though - an unmaintained programming language interpreter is a rather big security risk after all.

In general I think it might be better just not to use the djvu format anymore. The vast majority of djvu software is unmaintained, and outside the linux/bsd scope there are very few programs left that can open djvu at all.
You can also achieve good compression ratios with PDF, which is a much more compatible format.

@jsbien

jsbien commented Sep 12, 2021

@mara004 As for DjVu: djview4 and djvulibre are very well maintained, and new software is being created, e.g. https://github.com/trufanov-nok/minidjvu-mod/. For me the compression ratio is the least important feature of DjVu; it has a lot of other advantages, which are demonstrated by our tools such as https://github.com/jsbien/djview4shapes and https://bitbucket.org/mrudolf/djview-poliqarp. Their use is demonstrated e.g. by https://github.com/jsbien/iLindeCSV and https://github.com/jsbien/Zaborowski-index4djview.

@mara004

mara004 commented Sep 12, 2021

I won't deny there is still some active djvu software, but it seems most of it is rather intended for research than for practical use. Development of djvulibre has been slowing down a lot, and the djvu format is barely used compared to PDF or TIFF. Since most macOS, Windows or mobile users won't be able to open djvu, it is also very unsuitable for sharing.

@cdalitz

cdalitz commented Sep 13, 2021

At least Gamera has been ported to Python 3 (use the Gamera 4 version). If you encounter any problems with Gamera under Python 3, please consider filing a bug report there. This should thus not be an obstacle to porting didjvu to Python 3, I think.

@jsbien

jsbien commented Sep 13, 2021

I made some experiments with Gamera 4 and encountered no problems.
Bastien Roucariès, who already ported ocrodjvu to Python 3, suggested "shotgun porting" of didjvu:

Use the testsuite, and the automatic conversion tool from python
porting. Fix every bug that shows up during the test suite and voilà. It took
me two hours to fix the previous package.

Anybody willing to try this approach?

@FriedrichFroebel

FriedrichFroebel commented Sep 13, 2021

I just had a look at porting didjvu to Python 3, with the following issues arising:

  • didjvu.tests.test_utils.test_enhance_import can be fixed as done for ocrodjvu, but this will not really enhance the error message any more.
  • didjvu.lib.cli.ArgumentParser.parse_args does not work as before and seems to require additional handling for the missing fg_bg_defaults attribute if no parameters are set.
  • didjvu.tests.test_gamera.test_to_pil_rgb.test_color fails. There is something wrong with the output, although I do not know whether this is an issue in gamera-4 or didjvu: https://user-images.githubusercontent.com/7279752/133065608-e7a6089b-af24-46b6-8cce-6d3bf60bc5eb.png (standalone version compatible with Python 3: ycbcr-jpeg.py)
  • The gamera-4 version check is broken and therefore I had to disable the version check itself.
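
The parse_args item in the list above can be illustrated with a minimal sketch. It assumes (this is a guess, not taken from didjvu's code) that the attribute is only set as a side effect of a custom argparse action; the LayerOption class and the --fg-slices option are hypothetical stand-ins. If that is the cause, registering a parser-level default makes the attribute exist even when no option is given:

```python
import argparse


class LayerOption(argparse.Action):
    """Hypothetical action that records an extra flag as a side effect."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values)
        # Only reached when the option is actually given on the command line.
        namespace.fg_bg_defaults = False


def build_parser(with_default):
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--fg-slices", action=LayerOption)
    if with_default:
        # Guarantee that the attribute exists even when no option is passed.
        parser.set_defaults(fg_bg_defaults=True)
    return parser


without = build_parser(with_default=False).parse_args([])
with_fix = build_parser(with_default=True).parse_args([])
print(hasattr(without, "fg_bg_defaults"))
print(with_fix.fg_bg_defaults)
```

An alternative would be to read the attribute with getattr(options, 'fg_bg_defaults', True) at the point of use, which avoids touching the parser setup.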

@rmast

rmast commented Oct 21, 2021

@FriedrichFroebel wrote:

I just had a look at porting didjvu to Python 3

I don't see your fork?

@rmast

rmast commented Oct 21, 2021

@mara004 I agree PDF is much more common. I guess that if you run the MRC DjVu output of didjvu through DjVuToy to convert it to PDF, the result will not be much bigger: it would use JPEG 2000 instead of the FG44/IW44 images, masked by JBIG2 in much the same way as in a multilayer DjVu.

FriedrichFroebel added a commit to FriedrichFroebel/didjvu that referenced this issue Oct 22, 2021
See jwilk#13 for a list of known problems.
@FriedrichFroebel

@rmast I did this on an old clone of this repository back then for testing and realized the aforementioned porting issues, so I did not upload these changes to GitHub. The incomplete/partially broken Python 3 port is now available in my fork.

@rmast

rmast commented Oct 22, 2021 via email

@rmast

rmast commented Oct 22, 2021

The fork of @FriedrichFroebel just does the job in python3.8 on Mint 20.2 when I run didjvu encode, after compiling and installing Gamera-4 without wx. https://github.com/hsnr-gamera/gamera-4

I don't know how to call it to reproduce the issues that @FriedrichFroebel thought were still there?

Edit: I found it: run
make test

It only gives a test-issue with tests.test_gamera.test_to_pil_rgb.test_color. So the output has to be judged to be able to point to the right repo to solve it.

@rmast

rmast commented Oct 23, 2021

@FriedrichFroebel

This shows how to check which color space ycbcr-jpeg.tiff declares:

exiftool -S -PhotometricInterpretation didjvu/tests/data/ycbcr-jpeg.tiff
PhotometricInterpretation: YCbCr

Both In.jpg and Out.jpg appear not to have any Colorspace information:
od -A x -t x1z -v out.jpg

gives no APP* or whatever segment markers popping up in the right column as described here:
https://en.wikipedia.org/wiki/JPEG_File_Interchange_Format

So also no Adobe APP14 marker which could distinguish between RGB and YCbCr.
https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048

However, the default color scheme for JPEG is YCbCr.
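
The od inspection above can also be done programmatically. The following is a minimal sketch using only the standard library; list_jpeg_segments is a hypothetical helper name, and the demo bytes are synthetic rather than taken from in.jpg or out.jpg. It walks the header segments of a JPEG stream and reports which markers are present, so a missing APP0 (JFIF) or APP14 (Adobe) segment shows up directly:

```python
import struct


def list_jpeg_segments(data):
    """Return the marker names of the header segments in a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    names = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before the real marker; skip it.
            pos += 1
            continue
        if marker == 0xDA:
            # Start of scan: entropy-coded data follows, so stop here.
            names.append("SOS")
            break
        # The two-byte length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xE0 <= marker <= 0xEF:
            names.append("APP%d" % (marker - 0xE0))
        else:
            names.append("0x%02X" % marker)
        pos += 2 + length
    return names


# Synthetic stream: SOI, one quantization-table-like segment, SOS.
demo = b"\xff\xd8" + b"\xff\xdb\x00\x04\x00\x00" + b"\xff\xda\x00\x02"
print(list_jpeg_segments(demo))
```

For a real file, pass open(path, 'rb').read() to the function. A result without any APP entries matches the observation that neither a JFIF nor an Adobe segment is there to declare the color space.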

The documentation of to_pil says it only supports RGB and Grayscale. So putting in a YCbCr image probably already leads to an undefined situation.

The tested code seems to try to work around some Gamera bugs, or to speed up to_pil with a custom to_pil_rgb.

They might have a history in the commits that tells more about what happened and why they're introduced in the first place.

@FriedrichFroebel

Glad to see that the port is working, as I have only used the tests before (after 2to3 conversion and some manual fixes). While I have no clean solution for the aforementioned issues (#13 (comment)), I do not feel like a PR makes sense - besides the fact that Python 3 support does not seem to be considered useful by upstream.

I am clearly not an expert on the colorspace stuff, so there is not much I can say about it. The commit history for the Gamera support does not seem to tell us much about it as well: 2337b8f, fdd6bf9.

@rmast

rmast commented Oct 23, 2021

@FriedrichFroebel As I read the commits you pointed at, it might be just an optimization step that made the assumption of RGB necessary, while most real-life images are usually YCbCr.
Reverting exactly those two commits drops the assumption of RGB, and even the failing test that comes with it.

Edit: unfortunately the program then fails because the input picture is not RGB.

The only thing that should be thoroughly tested then is behavior with source images of different color spaces, however usually images in scanned input will behave consistently, so if the colorspace fails someone will know at first try.

A PR is not necessary at the moment, as the Ubuntu 18 trick of getting the old, dropped python-gamera package to work on Ubuntu 20 with Python 2.7 is still valid.

As soon as a valid python-gamera package is not reachable that way anymore because some dependencies of the Ubuntu 18 package get upgraded @jwilk will have to decide how to keep the didjvu usable.
That might be with the introduction of Ubuntu 22.04 LTS next year, which might even raise the bar further on the supported Python-version, and deprecate 2to3.

The Debian package maintainer has abandoned python-gamera, as has its maintainer.

gamera-4 might get out of Alpha at some moment, that would be the moment to put effort in the upstream again, and probably even put effort in getting gamera-4 back in the debian packages.

I committed some python3-changes to my fork of the python3 branch as well, for getting the 'bundle' function to work properly.

I also made another branch for supporting minidjvu-mod with the -2 parameter to call when --pages-per-dict > 1.

However, even with minidjvu-mod in place I see only a small reduction of the size. The resulting djvu-filesize is still way bigger than I would expect from DjVuSolo 3.1. When I scan a letter with a colored logo, an autograph and some colored text on the bottom there is mostly lots of blur on the background-picture, but it takes way too much space in the djvu.

I studied DjVuSolo 3.1, it behaves differently with different content, optimizing away layers that practically don't contain useful information, but use an FGBZ instead of a FG44. I saw blur on the background picture behind the JB2 foreground-mask. The official DjVu uses cheap to compress content behind the foreground mask as it will not be shown.

@rmast

rmast commented Oct 27, 2021

I just witnessed a case where the colorspace issue appeared with a posterized 8 color .png as input in the Python3.8 version, so the issue isn't only appearing in the test.
I'll have to further investigate how to solve it and watch whether also the failing test will get solved with a solution.

Here is a suggestion to use OpenCV for the conversion:
https://stackoverflow.com/questions/62293077/why-is-pils-image-fromarray-distorting-my-image-color
https://note.nkmk.me/en/python-opencv-bgr-rgb-cvtcolor/
https://www.ccoderun.ca/programming/doxygen/opencv/group__imgproc__color__conversions.html

But before conversion you should know what colorspace is used in the image.
This hint is probably the direction to look in:
https://stackoverflow.com/questions/50641637/identify-colour-space-of-any-image-if-icc-profile-is-empty-pil
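
To make concrete what such a conversion does per pixel, here is a minimal sketch in plain Python. It assumes full-range JFIF-style YCbCr (the variant commonly used in JPEG files); ycbcr_to_rgb is a hypothetical helper name, and a real implementation would use a library routine rather than a per-pixel function:

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range (JFIF-style) YCbCr pixel to 8-bit RGB."""

    def clamp(value):
        # Round to the nearest integer and keep it within 0..255.
        return max(0, min(255, int(round(value))))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


# Neutral chroma (Cb = Cr = 128) leaves only the luma, i.e. a gray value.
print(ycbcr_to_rgb(128, 128, 128))
```

If data that is already RGB is pushed through this formula, or YCbCr data is displayed as if it were RGB, the colors come out visibly wrong, which is why the color space has to be known first.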

@rmast

rmast commented Oct 28, 2021

The issue with the tested image is filed at Gamera-4:
hsnr-gamera/gamera-4#35

The issue with the posterized/palettized image can be solved by allowing mode P for PNG.
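
Whether a given PNG is palette-based (what PIL reports as mode P) can be read from its header. This is a minimal standard-library sketch; png_color_type is a hypothetical helper name and the demo header is synthetic:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type values defined for the PNG IHDR chunk.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "palette",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def png_color_type(data):
    """Return the color type name stored in the IHDR chunk of a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk expected right after the signature")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return COLOR_TYPES.get(color_type, "unknown")


# Synthetic header of a 4x4 image, 8 bits per sample, color type 3.
demo = PNG_SIGNATURE + struct.pack(">I4sIIBBBBB", 13, b"IHDR", 4, 4, 8, 3, 0, 0, 0)
print(png_color_type(demo))
```

A posterized 8-color PNG would typically report "palette" here, so it needs either to be accepted as such or to be converted to RGB before further processing.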

@rmast

rmast commented Oct 30, 2021

@FriedrichFroebel, please take my Gamera-4-patch, on my fork-master to solve the to_pil_rgb-issue:
rmast/gamera-4@2d9877a
I didn't see anything wrong with the color conversion, only copy routines that had been translated in too fancy a way and were therefore buggy.
They were probably only meant to work around the string-to-bytes issue and probably also a memory leak, but the commit messages involved didn't give the exact reason for those changes.

When I run make test on didjvu with my new version of the python3-branch I run into issues with test_xmp.py, which attempts to use a deprecated way to del an imported module. As you are more experienced with Python, could you have a look?

@FriedrichFroebel

FriedrichFroebel commented Oct 30, 2021

@rmast Are you sure the module problem is really related to your Gamera change and not to any of the three XMP backends (if I remember correctly, I did not install all of them for testing)? Which backend is this about? Do you have a specific error message I can use to have a look at it?

@rmast

rmast commented Oct 30, 2021 via email

@FriedrichFroebel

@rmast I have been running each of the test files directly on its own, so I would not expect to see any change on it with your patch (with the same OS and Python as in your case). For this reason I asked which XMP backend libraries you have installed, as I used only one as far as I remember (probably python-xmp-toolkit if I am not mistaken; pyexiv2 does not work on Python 3 anyway if I recall correctly), where no problems arose for the XMP tests. So the backend and the specific traceback should help here until I am able to have another look at the code in the next days if I am able to reproduce your issue.

@rmast

rmast commented Oct 30, 2021

I work on Mint 20.2. I just did apt-get upgrade to make it easy to follow.
These are the specs of my VMWare (virtual) x64 machine:
image

Lots of details of apt and pip install:
systeem.txt

I installed python-xmp-toolkit, it wasn't installed, but it didn't result in any difference.

This is the exact error at the end of make test:

Failure: NameError (name 'name' is not defined) ... ERROR

======================================================================
ERROR: Failure: NameError (name 'name' is not defined)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/nose/loader.py", line 416, in loadTestsFromName
    module = self.importer.importFromPath(
  File "/usr/lib/python3/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/lib/python3/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/lib/python3.8/imp.py", line 234, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.8/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 702, in _load
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/robert/didjvu/tests/test_xmp.py", line 59, in <module>
    del name
NameError: name 'name' is not defined

----------------------------------------------------------------------
Ran 122 tests in 8.271s

FAILED (errors=1)
make: *** [Makefile:53: test] Error 1

This is the program text in that file:

xmp_backends = [
    import_backend(name)
    for name in [
        'gexiv2',
        'libxmp',
        'pyexiv2',
    ]
]
del name  # this is line 59.

If I try to run the single test I get:
robert@robert-virtual-machine:~/didjvu/tests$ python3 test_xmp.py

Traceback (most recent call last):
  File "test_xmp.py", line 22, in <module>
    from .tools import (
ImportError: attempted relative import with no known parent package

Edit: I had no libxmp-dev installed via apt. That makes the Python 2.7 version skip the libxmp tests completely. Does your version just skip the XMP tests as well?
Installing libxmp-dev doesn't change the error.

What is XMP? Which parts of it should remain in the new, upgraded Python 3 version?

@rmast

rmast commented Oct 30, 2021

@FriedrichFroebel I forgot to tag you in above message with all details.

@FriedrichFroebel

FriedrichFroebel commented Oct 31, 2021

@rmast Seems like I never actually ran the XMP tests beforehand - now I could actually reproduce your issue about the undefined variable name, as well as some further scope issues. These should be fixed now. For the Gamera issue, I have not yet compiled your patched version and therefore have not tested it (I might do this if your PR gets merged).
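
The NameError on `del name` is consistent with a scoping change between Python 2 and Python 3: in Python 2 the loop variable of a list comprehension leaks into the enclosing scope, while in Python 3 the comprehension has its own scope. A minimal sketch (the function name is made up for illustration; the backend names are the ones from the test file):

```python
def comprehension_leaks_variable():
    """Report whether a list comprehension's loop variable outlives it."""
    backends = [item.upper() for item in ("gexiv2", "libxmp", "pyexiv2")]
    try:
        # In Python 3, `item` only exists inside the comprehension.
        item
    except NameError:
        return False
    return True


print(comprehension_leaks_variable())
```

Under Python 3 a trailing `del name` after such a comprehension therefore has nothing to delete and can simply be dropped.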

From the docs of the python-xmp-toolkit module:

Python XMP Toolkit is a library for working with XMP metadata, as well as reading/writing XMP metadata stored in many different file formats.

Python XMP Toolkit is wrapping Exempi (using ctypes), a C/C++ XMP library based on Adobe XMP Toolkit, ensuring that future updates to the XMP standard are easily incorporated into the library with a minimum amount of work.

Wikipedia has some more information: https://en.wikipedia.org/wiki/Extensible_Metadata_Platform

I am not sure whether all three backends should be kept. With my latest changes, python-xmp-toolkit works fine, but py3exiv2 (the Python 3 port of pyexiv2) fails and would need some additional work. If anyone wants to fix this, feel free to submit a PR to my fork. (I did not yet test the gexiv2 backend, so no idea if it works out of the box. It might be worth to set up GitHub Actions here to simplify such tests.)
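
To see which backends are even importable on a given machine before running the tests, a small check like the following may help. It is a sketch, not part of didjvu: the mapping assumes that gexiv2 is reached through the `gi` package, that python-xmp-toolkit installs the `libxmp` module and that py3exiv2 installs `pyexiv2`. A present `gi` package does not guarantee that the GExiv2 typelib itself is installed.

```python
import importlib.util

# Backend name -> top-level module that has to be importable for it.
BACKEND_MODULES = {
    "gexiv2": "gi",
    "libxmp": "libxmp",
    "pyexiv2": "pyexiv2",
}


def available_backends():
    """Map each backend name to True if its module can be located."""
    result = {}
    for backend, module in BACKEND_MODULES.items():
        try:
            result[backend] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            result[backend] = False
    return result


print(available_backends())
```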

By the way: Running only one test module can be done with a modified version of the implementation of the make test command: python didjvu --test --verbose tests/test_xmp.py.

@rmast

rmast commented Oct 31, 2021

@FriedrichFroebel Yes! all tests run fine now on your branch python3 when I put the default Python to 3.8 and apt -uninstall all xmp-stuff. I've only issued a PR to your python3-branch for 3 write-lines in djvu_support.py that need an .encode() for the bundle-flow.
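
The need for .encode() follows from Python 3 separating text and bytes: a file opened in binary mode only accepts bytes, and tempfile objects are opened in binary mode ('w+b') by default. A minimal sketch, assuming the affected lines write to such a binary stream (write_lines is a hypothetical helper, not the code in djvu_support.py):

```python
import tempfile


def write_lines(lines):
    """Write text lines to a binary temporary file and return its content."""
    with tempfile.TemporaryFile() as stream:  # default mode is 'w+b'
        for line in lines:
            # str has to be encoded before it can go into a binary stream.
            stream.write(line.encode("utf-8"))
            stream.write(b"\n")
        stream.seek(0)
        return stream.read()


print(write_lines(["page1.djvu", "page2.djvu"]))
```

Opening the file in text mode (mode='w+', with an explicit encoding) would be the alternative to encoding every line by hand.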
I'm curious if there is anyone that would bother about pyexiv2. Would it withhold Jakub from doing the upgrade to keep enough attention to this repo?

@rmast

rmast commented Oct 31, 2021

My PR at hsnr-gamera/gamera-4 has just been merged into master!

@rmast

rmast commented Nov 1, 2021

@FriedrichFroebel I was looking for code test coverage, but see there is some code coverage statistic in the source tree:
tests/coverage

I bet this shows the code that has no test coverage. So all those lines have to be inspected to see whether they need upgrading, for example for the issue of writing bytes instead of strings.

@rmast

rmast commented Nov 1, 2021

Yes! The lines your new test covers don't show up in the code coverage anymore, however the bytes-issue also shows up in a standard coverage package. Don't know if I solved it right, but the private/update-coverage runs:
diff coverage statistics.txt
diff private update-coverage.txt
"~/.local/lib/python3.8/site-packages/coverage/summary.py" line 30:
self.outfile.write(line.rstrip().encode())
self.outfile.write(b"\n")

@FriedrichFroebel

@rmast It actually is much simpler for Python3-only code: Just use report_stream = plugin.stream = io.StringIO() instead of the current BytesIO(). I have fixed this in the fork.
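
The difference between the two stream types can be shown with a short sketch (write_report is a made-up stand-in for the report writer, not the coverage package's code): a StringIO accepts str directly, while a BytesIO rejects it with a TypeError.

```python
import io


def write_report(stream, lines):
    """Write each line, stripped of trailing whitespace, to the stream."""
    for line in lines:
        stream.write(line.rstrip() + "\n")


# A text stream takes str as is, so no .encode() calls are needed.
text_stream = io.StringIO()
write_report(text_stream, ["Name   Stmts  ", "total  10"])
report = text_stream.getvalue()

# The same writer fails on a bytes stream under Python 3.
try:
    write_report(io.BytesIO(), ["total  10"])
    bytes_stream_accepts_str = True
except TypeError:
    bytes_stream_accepts_str = False

print(repr(report), bytes_stream_accepts_str)
```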

@rmast

rmast commented Nov 22, 2021

@mara004 wrote

You can also achieve good compression ratios with PDF and lossless JBIG2 encoding, for example. The PDF format has the clear advantage of much better compatibility.

Ever seen this project? https://github.com/internetarchive/archive-pdf-tools
Unfortunately it doesn't work without provided hocr-file, with an open issue: internetarchive/archive-pdf-tools#11

@mara004

mara004 commented Nov 23, 2021

@rmast

Ever seen this project? https://github.com/internetarchive/archive-pdf-tools

I didn't know this yet, but it's highly interesting. I wonder whether the author of OCRmyPDF knows about archive-pdf-tools.

@rmast

rmast commented Nov 23, 2021 via email

@mara004

mara004 commented Nov 23, 2021

@rmast wrote:

I doubt it. I want to investigate how good it is; it probably only supports the documented happy flow. It chokes with complex Python errors when some of those parameters are left out.

You mean the project claims a reliability it does not offer?

@rmast

rmast commented Nov 23, 2021 via email

@rmast

rmast commented Nov 23, 2021 via email

@rmast

rmast commented Jan 24, 2022

We should probably try to get it working on Python 3.10 as well:
jwilk-archive/python-djvulibre#13

@FriedrichFroebel

@rmast I am currently on Python 3.8.10 due to my distro, so no way to directly check it (leaving GitHub Actions aside). But it seems like didjvu uses subprocess calls in the corresponding djvulibre wrapper (https://github.com/jwilk/didjvu/blob/master/lib/djvu_support.py) instead of the native wrapper, so it should work in theory.
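
The subprocess approach can be sketched as follows. This is an illustration and not the code in lib/djvu_support.py: bundle_command and run_tool are hypothetical helpers, and the only DjVuLibre invocation assumed is djvm -c, which creates a bundled document from page files. Because only the command-line tool is called, no compiled Python extension has to be built for the interpreter version in use.

```python
import shutil
import subprocess


def bundle_command(output_path, page_paths):
    """Build the argument list for bundling pages with DjVuLibre's djvm."""
    return ["djvm", "-c", output_path] + list(page_paths)


def run_tool(command):
    """Run an external tool, failing early if it is not on the PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("%s is not installed" % command[0])
    return subprocess.run(command, check=True, capture_output=True)


print(bundle_command("book.djvu", ["p1.djvu", "p2.djvu"]))
```

run_tool(bundle_command(...)) would then perform the actual call on a system where djvulibre-bin is installed.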

@rmast

rmast commented Jan 27, 2022

This instruction reveals Python3.10.2 at the moment:
https://computingforgeeks.com/how-to-install-python-on-ubuntu-linux-system/

@rmast

rmast commented Jan 27, 2022

This instruction allows switching between default Python-versions:
https://stackoverflow.com/questions/43062608/how-to-update-alternatives-to-python-3-without-breaking-apt

@rmast

rmast commented Jan 28, 2022

This non-LTS Ubuntu distro 21.04 has 3.10 in the package manager: https://packages.ubuntu.com/hirsute/python3.10-distutils

@rmast

rmast commented Feb 6, 2022

I fixed a Python3.10 issue in Gamera-4: hsnr-gamera/gamera-4#39
and a Python3.9 issue in didjvu: https://github.com/rmast/didjvu/tree/python3.9

With Python3.9 there still are some new gi-import-warnings with test_xmp

@rmast

rmast commented Feb 12, 2022

I've now seen all tests run Ok in Ubuntu 22.04 with these extra packages and my python3.9 branch.

sudo apt install python3-pip gir1.2-gexiv2-0.10 libexempi-dev libboost-python-dev libexiv2-dev libpng-dev libtiff-dev djvulibre-bin exiv2 python3-pil

pip install py3exiv2
pip install python-xmp-toolkit
pip install nose

Friedrich sees room for improvement of my GExiv2-fix. But I think we're near a viable Python3.10 version for the coming Ubuntu 22.04
