Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Port calibre to python 3 #870

Closed
wants to merge 64 commits into from
Closed

WIP: Port calibre to python 3 #870

wants to merge 64 commits into from

Conversation

@flaviut
Copy link
Contributor

flaviut commented Sep 3, 2018

What works:

  • Starting the calibre GUI
  • Calibre C extensions
  • Adding a book

What doesn't work:

  • Recipies
  • Everything else
  • Python 2 compatibility

Anyway, this pull request is just here to get some early feedback. There's a lot of work that is left to do, like manually testing as many code paths as possible, as well as porting recipies to use MechanicalSoup or something like that instead of mechanize.

My goal is to have this code work the same on python2 and on python3, with the goal of eventually dropping python2 support late 2019 or early 2020.

To bootstrap this version, you'll need to install the following pip packages:

apsw==3.9.2.post1
dukpy==0.3
Elements==0.2.0
feedparser==5.2.1
html2text==2018.1.9
html5-parser==0.4.5
netifaces==0.10.7

and delete all recipes except ajc.recipie

flaviut added 30 commits Aug 22, 2018
This was done using lib2to3.fixes.fix_numliterals

Python 2 supports 0o, so this is fine.
This is compatible with python 2 & 3, and was done using

sed -i -e 's/from future_builtins import \(.*\)$/from six.moves import \1/g' *.py src/**/*.py setup/**/*.py
This is using a fixer plugin that I wrote, and can be found at
https://gist.github.com/7088e5543b65b1dd1320a822c9cb2ef5
I'm not sure why this was using a byte string--this dict contains
regular strings.
Done using libmodernize.fixes.fix_imports_six
Replace with six.moves using

sed -i -e 's/from cStringIO import StringIO/from six.moves import StringIO/g' -e 's/cStringIO.StringIO/StringIO/g' -e 's/import cStringIO/from six.moves import StringIO/g' *.py src/**/*.py setup/**/*.py recipes/**/*.py

and some manual touch-up
This was done using the regex ([^']\s*)\bur'([\x01-\x26\x28-\xff]+)'
Otherwise we'd try to read an unassigned variable
@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented May 7, 2019

Preferences => Toolbars and Menus
Try to remove an icon from a toolbar, and the following error message appears (but it is successfully removed):

Traceback (most recent call last):
  File "/home/eschwartz/git/calibre/src/calibre/gui2/preferences/toolbar.py", line 317, in remove_action
    self.all_actions.model().add(removed)
  File "/home/eschwartz/git/calibre/src/calibre/gui2/preferences/toolbar.py", line 124, in add
    self._data.sort()
TypeError: '<' not supported between instances of 'AuthorMapAction' and 'FakeAction'

These classes (Plugin and FakeAction) might need comparison methods to be defined. I'm not entirely sure what the code does when sorting, though, even on python2. AFAICT it is sorting these all based on their hash(), which is wrong. In fact, simply moving an action back and forth between a toolbar and the available actions, will cause the sort order on the left to be totally messed up and appear basically randomly.

This is a python2 bug that triggered a traceback on python3. \o/

The comparison methods won't help on python3, I think -- instead this needs a sort key. So, here is #984 to fix it.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented May 7, 2019

I think we've mostly got the GUI figured out, so now the content server gets to be a huge issue... currently it starts, but returns nothing other than an empty response body. No errors even, except for a translation error:

Traceback (most recent call last):
  File "/home/eschwartz/git/calibre/src/calibre/srv/loop.py", line 571, in tick
    conn.handle_event(event)
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_request.py", line 308, in parse_header_line
    self.finalize_headers(parser.hdict)
  File "/home/eschwartz/git/calibre/src/calibre/srv/web_socket.py", line 287, in finalize_headers
    return HTTPConnection.finalize_headers(self, inheaders)
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_request.py", line 347, in finalize_headers
    self.read_request_body(inheaders, request_content_length, chunked_read)
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_request.py", line 361, in read_request_body
    self.prepare_response(inheaders, BytesIO())
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_response.py", line 448, in prepare_response
    self.translator_cache, self.tdir, self.forwarded_for
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_response.py", line 236, in __init__
    self.set_translator(self.get_preferred_language())
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_response.py", line 279, in get_preferred_language
    return preferred_lang(self.inheaders.get('Accept-Language'), self.get_translator)
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_response.py", line 107, in preferred_lang
    found, lang, translator = get_translator_for_lang(x)
  File "/home/eschwartz/git/calibre/src/calibre/srv/http_response.py", line 276, in get_translator
    return get_translator_for_lang(self.translator_cache, bcp_47_code)
  File "/home/eschwartz/git/calibre/src/calibre/srv/utils.py", line 282, in get_translator_for_lang
    cache[bcp_47_code] = ans = get_translator(bcp_47_code)
  File "/home/eschwartz/git/calibre/src/calibre/utils/localization.py", line 167, in get_translator
    return found, lang, get_single_translator(lang)
  File "/home/eschwartz/git/calibre/src/calibre/utils/localization.py", line 139, in get_single_translator
    buf = io.BytesIO(zf.read(mpath + '/%s.mo' % which))
  File "/usr/lib/python2.7/zipfile.py", line 958, in read
    return self.open(name, "r", pwd).read()
  File "/usr/lib/python2.7/zipfile.py", line 984, in open
    zinfo = self.getinfo(name)
  File "/usr/lib/python2.7/zipfile.py", line 932, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'en/messages.mo' in the archive"
@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented May 9, 2019

IIRC the content server mostly works fine in py3. That translation error was recently introduced and should now be fixed.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented May 20, 2019

There are various conversion fixes in #996, after which most (but not all) conversions should begin to work okay. Mostly, these deal with less common formats, though. ;)

@ofek

This comment has been minimized.

Copy link

ofek commented May 20, 2019

Keep up the good work! Is the port almost over?

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented May 20, 2019

Lots of it does work. Some of it doesn't. For a fair amount of casual work, it does seem to work. ;)

I think there's a good possibility that, once the remaining conversion stuff which I am currently fighting with is resolved, we might be able to progress to public betas. :)

@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented May 20, 2019

The big remaining things to port are:

  1. odf2xhtml -- it needs python3 compat done from scratch, the work upstreamis very low quality, for example, it modifies sys.path to get around problems with absolute imports

  2. Fixing everything reported by ./setup.py unicode_check this is hundreds of files. Most of the fixes are very simple/no impact, but every once in a while it finds actual things that will break under py3

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented May 20, 2019

So yeah, some messy conversion holdouts. :)

I wonder whether odf2xhtml or rtf2xml is a bigger train wreck... the latter's code I've seen is quite gnarly even without the porting considerations.

@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented May 20, 2019

both have very bad code. Fortunately, both are now ported so hopefully will not have to look at their code again for a while.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented May 21, 2019

So it is, so it is. My test ebook was reporting errors for rtf, but only with --debug-pipeline and that turned out to be reproducible on any historic version of calibre too... due to "RTF features that are not supported".

@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented May 21, 2019

What error is this? Since I am here, I might as well fix it. I tried converting a couple of RTF books with --debug-pipeline with no errors.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented May 21, 2019

$ ebook-convert ./Harry\ Potter\ and\ the\ Goblet\ of\ Fire\ -\ J.\ K.\ Rowling.azw3 /tmp/roundtrip.rtf
[...]
$ ebook-convert /tmp/roundtrip.rtf /tmp/roundtrip.azw3 -d /tmp/roundtrip-pipeline
Conversion options changed from defaults:
  debug_pipeline: '/tmp/roundtrip-pipeline'
1% Converting input to HTML...
InputFormatPlugin: RTF Input running
on /tmp/roundtrip.rtf
Converting RTF to XML...
Running RTFParser in debug mode
Unable to parse RTF
Traceback (most recent call last):
  File "/home/eschwartz/git/calibre/src/calibre/ebooks/conversion/plugins/rtf_input.py", line 258, in convert
    xml = self.generate_xml(stream.name)
  File "/home/eschwartz/git/calibre/src/calibre/ebooks/conversion/plugins/rtf_input.py", line 115, in generate_xml
    parser.parse_rtf()
  File "/home/eschwartz/git/calibre/src/calibre/ebooks/rtf2xml/ParseRtf.py", line 319, in parse_rtf
    self.__bracket_match('make_preamble_divisions')
  File "/home/eschwartz/git/calibre/src/calibre/ebooks/rtf2xml/ParseRtf.py", line 556, in __bracket_match
    raise RtfInvalidCodeException(msg)
calibre.ebooks.rtf2xml.ParseRtf.RtfInvalidCodeException: closed bracket doesn't match, line 193 in file make_preamble_divisions


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/calibre-3/bin/ebook-convert", line 20, in <module>
    sys.exit(main())
  File "/home/eschwartz/git/calibre/src/calibre/ebooks/conversion/cli.py", line 400, in main
    plumber.run()
  File "/home/eschwartz/git/calibre/src/calibre/ebooks/conversion/plumber.py", line 1107, in run
    accelerators, tdir)
  File "/home/eschwartz/git/calibre/src/calibre/customize/conversion.py", line 246, in __call__
    log, accelerators)
  File "/home/eschwartz/git/calibre/src/calibre/ebooks/conversion/plugins/rtf_input.py", line 262, in convert
    'support. Convert it to HTML first and then try it.\n%s')%e)
ValueError: This RTF file has a feature calibre does not support. Convert it to HTML first and then try it.
closed bracket doesn't match, line 193 in file make_preamble_divisions

It should hopefully be easy to reproduce, since I can reproduce on multiple books if I roundtrip them through ebook-convert.

@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented May 21, 2019

Should be fine now.

@aimylios

This comment has been minimized.

Copy link
Contributor

aimylios commented Jun 21, 2019

One of the tests fails for me on a Python 3 build of Calibre (revision abd9e6b):

test_palmdoc_compression (calibre.ebooks.compression.palmdoc.find_tests.<locals>.Test) ... ERROR [0.0004 s]

======================================================================
ERROR: test_palmdoc_compression (calibre.ebooks.compression.palmdoc.find_tests.<locals>.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aimylios/calibre-3.45.0/src/calibre/ebooks/compression/palmdoc.py", line 92, in test_palmdoc_compression
    self.assertEqual(py_compress_doc(test), x)
  File "/home/aimylios/calibre-3.45.0/src/calibre/ebooks/compression/palmdoc.py", line 51, in py_compress_doc
    och = ord(ch)
TypeError: ord() expected string of length 1, but int found

----------------------------------------------------------------------
Ran 316 tests in 44.733s

FAILED (errors=1, skipped=10)
get_categories: item Unknown is not in authors list!

I'm pretty sure all of the tests passed when I did the same exercise about a month ago. It might thus be related to the changes merged on 26th of May (f2e0181 and b842fe7).

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Jun 21, 2019

In the second commit you reference, we started collecting a unittest for that function. It was probably always subtly broken -- in python3, a b'' string is a bytearray and indexing it returns a number, not a b'' string.

@miasma

This comment has been minimized.

Copy link

miasma commented Jul 14, 2019

Tried the Arch git build from AUR. The latest version fails to build with

test_prints_nothing_if_no_errors (calibre.db.cli.tests.PrintCheckLibraryResultsTest) ... ok [5e-05 s]

======================================================================
FAIL: test_bonjour (calibre.srv.tests.loop.LoopTest)
Test advertising via BonJour
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/makepkg/calibre-git/src/calibre/src/calibre/srv/tests/loop.py", line 129, in test_bonjour
    self.assertIsNotNone(info)
AssertionError: unexpectedly None

======================================================================
FAIL: test_searching (calibre.db.tests.reading.ReadingTest)
Test searching returns the same data for both backends
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/makepkg/calibre-git/src/calibre/src/calibre/db/tests/reading.py", line 338, in test_searching
    ans, nr, query))
AssertionError: Items in the second set but not the first:
2 : Old result: set([1]) != New result: set([1, 2]) for search: date:9/6/2011

----------------------------------------------------------------------
Ran 316 tests in 41.428s

FAILED (failures=2, skipped=9)

python-zeroconf is installed at least.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Jul 14, 2019

The package has dependencies on python-zeroconf so that should always be available anyway. ;)

On the topic of your errors, I do not get either error. However, note that the first error will unpredictably occur on both python2 and python3, so I usually just try rerunning the testsuite and it often succeeds again.

Not sure why the second error would happen.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Jul 14, 2019

FWIW, I did update the binary repository mentioned in the calibre-git AUR package details. So you can get the python3 build as a prebuilt package.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Jul 14, 2019

@kovidgoyal I wonder if you've noticed a curious oddity that I am noticing? Seems like when running the python2 and python3 tests in quick succession, regardless of how often test_bonjour/test_websocket_perf flakes out on python2, it seems to be much more reliably succeeding on python3.

I just ran the tests 5 times in a row, and on python2 it succeeded once, failed 4 times. On python3, it succeeded all 5 times.

EDIT: in 20 runs, python3 failed 3 times, all three with an entirely different error:

======================================================================
FAIL: test_http_basic (calibre.srv.tests.http.TestHTTP)
Test basic HTTP protocol conformance
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/calibre-git/src/calibre/src/calibre/srv/tests/http.py", line 286, in test_http_basic
    self.ae(r.read(), ('%d' % i).encode('ascii'))
AssertionError: b'3' != b'1'

----------------------------------------------------------------------
Ran 316 tests in 38.083s

FAILED (failures=1, skipped=10)
@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented Jul 15, 2019

The bonjour test is utterly reliable for me on both pythons. The http_basic python3 test failure I do see from time to time, have to investigate that as it indicates pielining is out of order which is a pretty bad bug.

@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented Jul 15, 2019

Actually given that pipelining is not really used by browsers or even
curl anymore, maybe best to just drop that test.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Sep 12, 2019

As of #1040 we are finally done porting unicode_check -- the setup.py helper which checks if all files correctly import __future__.unicode_literals.

So that should be one source of potential incompatibilities fixed. Also, all other future imports have been activated for the relevant touched files, especially division (which needed quite a few fixes along the way).

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Sep 13, 2019

Now that #1042 is merged, I think all ebook format conversions should work. Or at least not crash obviously. (I roundtripped this format and tested it converted without tracebacks, and tried opening it with calibre itself, so that says something at least -- but who actually uses TCR anyway. ;))

@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented Sep 13, 2019

Yeah there are not a lot of low hanging fruit left. Now one just has to get betas out and have people test them and report bugs. And plugin developers use the betas to port their plugins. But it is going to have to wait until after the webengine migration is done and calibre 4 is released.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Nov 25, 2019

So... since calibre seems to be more or less running fairly stable so far...

With the calibre 4.4.0 release, I've uploaded a split package to the official archlinux [community] repository. calibre now depends on calibre-common, which contains all python-agnostic code, and calibre and calibre-python3 can now be installed together. Which one you use when you try to run the command line tools depends on the symlinks that were created by calibre-alternatives set [2|3].

While I anticipate that people will still want to use python2, simply because plugins are a very important part of the ecosystem, people now have the option to install the python3 version and try it out. Hopefully this will help us discover any more lingering bugs.

@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented Nov 25, 2019

Look at the py3 branch -- I am working on getting binary builds done
with python3. Once that is done, hopefully the plugin developers can
work on porting their plugins using those builds.

@eli-schwartz

This comment has been minimized.

Copy link
Contributor

eli-schwartz commented Nov 25, 2019

Looking forward to it. :)

eli-schwartz referenced this pull request Nov 25, 2019
@kovidgoyal

This comment has been minimized.

Copy link
Owner

kovidgoyal commented Dec 17, 2019

FYI binary builds are available as described here: https://www.mobileread.com/forums/showthread.php?t=325721

@escwartz: Feel free to post in that thread if you have tips for porting plugins to python 3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants
You can’t perform that action at this time.