Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3: add unicode/unichr wrappers to polyglot #941

Closed
wants to merge 2 commits into from

Conversation

eli-schwartz
Copy link
Contributor

@eli-schwartz eli-schwartz commented Mar 10, 2019

This prevents general errors when trying to use types that no longer exist in python3.

Not all code has been updated to use this, some things like src/tinycss/ and src/css_selectors/ already have small shims for this which I did not migrate to import from polyglot.

BeautifulSoup is something I left alone for now, because it seems like it would be complex to take care of and I was wondering if maybe it would make more sense, in the long run, to investigate the feasibility of migrating to beautifulsoup4.

Before I do src/odf, I was wondering if it would be feasible to push changes upstream to https://github.com/eea/odfpy (which currently supports python3 and seems to be getting somewhat active development) and then drop it from the calibre tree.

We cannot do the metadata sources, store plugins, and recipes until a released version of calibre has polyglot.builtins.unicode_type, so I've begun work on that here: eli-schwartz@642f454 and will submit a followup PR once a) there is a release, b) I finish up the recipes/ directory. This can probably be left alone for a number of release cycles to minimize the disruption...

@eli-schwartz
Copy link
Contributor Author

Test failures due to gettext having a function argument "unicode=" which, uh, should not be replaced with unicode_type.

OTOH it also seems to be python2-specific...

@kovidgoyal
Copy link
Owner

Regarding beautiful soup:

bs4 is API incompatible with bs3 which would mean it would break an unknown number of recipes. it should be trivial to port bs3. That is because in calibre bs3 is simply used as a tree API. All actual parsing is done by html5_parser. So we just need to make sure the tree API works on python3

I tried pushing the changes to odfpy upstream many years ago, got nowhere. Maybe things will be different now. If they can be mostly upstreamed I am happy to drop it from the calibre tree.

@eli-schwartz
Copy link
Contributor Author

What do you think about adding bs4 to the https://github.com/kovidgoyal/build-calibre base and porting official code and recipes over to it, then deprecating bs3 but retaining it for thirdparty recipes? If everything is just going through the tree API then hopefully this means recipes that use it directly as well as via calibre's own wrapper classes, should not be incompatible? Does bs4 generally have comparable updates to the handful of in-tree bs3 fixes?

I guess that isn't immediately relevant, since regarding the py3 porting work it needs to be ported either way...

@kovidgoyal
Copy link
Owner

Seems unneccessary work to me. Lets table bs3 for now. I will take a look at porting it when I have a moment. Given I managed to port mechanize, bs3 should be trivial in comparison. Especially since in calibre it is only used in unicode contexts already and not used for parsing at all. If I judge it too hard, then we can consider either switching to bs4 or maintaining dual APIs.

upstream bs4 has changed self.unicode to self.unicode_markup, but
calibre does not use UnicodeDammit. Leave this in its historic, horribly
confusing state, as it should not cause harm to have a class instance
attribute with the same name as a python2 object type.
@eli-schwartz
Copy link
Contributor Author

Well, the unicode wrappers are simple enough. In one case, the token "unicode" is not actually a type, but an attribute (renamed in bs4 for sanity), but we'll ignore that for now.

@kovidgoyal
Copy link
Owner

I have merged. Regarding the dynamically loaded parts (recipes/metadata plugins etc.) I prefer to not depend on poolyglot. Instead just replace unicode() with type(u'')()

I doubt any of those use unichr() but if they do, just define codepoint_as_chr in the file itself.

@kovidgoyal kovidgoyal closed this Mar 13, 2019
@eli-schwartz
Copy link
Contributor Author

Awesome!

There are no uses of unichr in those locations, one way or another. Using type(u'') sounds interesting but I am in no rush on that front.

@eli-schwartz
Copy link
Contributor Author

we can consider either switching to bs4

Thank you. :)

@kovidgoyal
Copy link
Owner

well I have switched, but I am sure there are going to be regressions. BS is a mess. The porting document it maintains is so incomplete it is laughable.

@eli-schwartz
Copy link
Contributor Author

Hmm, trying to install calibre results in:

Installing zsh completion to: /build/calibre-git/pkg/calibre-git/usr/share/zsh/site-functions/_calibre
Installing bash completion to: /build/calibre-git/pkg/calibre-git/usr/share/bash-completion/completions/calibre

____________________ WARNING ____________________
Setting up completion failed with error:
__________________________________________________
	Traceback (most recent call last):
	  File "/build/calibre-git/src/calibre/src/calibre/linux.py", line 744, in setup_completion
	    write_completion(bash_comp_dest, zsh)
	  File "/build/calibre-git/src/calibre/src/calibre/linux.py", line 510, in write_completion
	    from calibre.gui2.lrf_renderer.main import option_parser as lrfviewerop
	  File "/build/calibre-git/src/calibre/src/calibre/gui2/lrf_renderer/main.py", line 18, in <module>
	    from calibre.gui2.lrf_renderer.document import Document
	  File "/build/calibre-git/src/calibre/src/calibre/gui2/lrf_renderer/document.py", line 10, in <module>
	    from calibre.gui2.lrf_renderer.text import TextBlock, FontLoader, COLOR, PixmapItem
	  File "/build/calibre-git/src/calibre/src/calibre/gui2/lrf_renderer/text.py", line 161, in <module>
	    class TextBlock(object):
	  File "/build/calibre-git/src/calibre/src/calibre/gui2/lrf_renderer/text.py", line 167, in TextBlock
	    XML_ENTITIES = dict(zip(Tag.XML_SPECIAL_CHARS_TO_ENTITIES.values(), Tag.XML_SPECIAL_CHARS_TO_ENTITIES.keys()))
	AttributeError: type object 'Tag' has no attribute 'XML_SPECIAL_CHARS_TO_ENTITIES'

@kovidgoyal
Copy link
Owner

Should be fine now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants