Python binding? #826

vsalvino opened this issue May 18, 2019 · 6 comments


This is not an issue, but a question. Has this project ever considered writing a Python module? Given that this is a C library, and Python does support writing "wrappers" around C libraries, it would be possible to create a Python interface for tidy. Combined with the newer Python wheel binary format, it would then be possible to distribute pre-compiled Python packages.

As an example, see Sass / libsass. It is a C library which has created their own Python module libsass-python. Pre-compiled binary wheels are then distributed on PyPI.

As a Python developer, I would be extremely interested in contributing or helping out if you consider doing this. Unfortunately I myself lack knowledge of the tidy-html5 C codebase.

vsalvino changed the title from "Native Python module?" to "Python binding?" on May 18, 2019

@vsalvino thank you for the question...

My simple answer would be I think a python binding for libTidy would be a great idea, for those using python...

I am surprised this issue has not come up before, but could not find any issue mentioning python... maybe I missed something... advise...

Searching in google for say python bindings for tidy found -

So it certainly seems there have been some efforts in the past...

Briefly looked at libsass-python, but again not being a python person, not sure what I am looking at...

I would certainly try to help in this, maybe in a separate htacg repo? or something... but I am only a C/C++ person, and some perl... can run python 2.7, and 3.4, in Windows 10, and 2.7, in my Ubuntu...

What knowledge do you need of the tidy-html5 C library?

It can be built as static or shared lib form... we are having trouble getting out releases, and thus even more trouble in distributions... read distributed, installable, binaries... so it is better if it can be built from the latest stable next branch...

Have you checked out the web site - - the API docs - - especially the sample use program... more samples in tidy-test... what do you need to know...

Has this answered your question... look forward to further feedback... thanks...


Thanks for the reply Geoff. I also found those python libraries - the trouble is that they require html-tidy to already be installed, and they then invoke it similar to a command line interface. This is the easiest way; I would probably do something similar. But it makes it difficult to install, and non-portable as there are manual steps involved.

If you look at libsass-python, you will see they have a symlink called "libsass @ 8d220b7" which links directly to the libsass C++ codebase. In libsass-python there is also one C++ file that defines the translation between C++ and Python objects. The rest of the codebase is then various Python code that provides wrappers, tooling, interface, etc.

This would definitely qualify as a separate project/repository. I am a Python person but have not written C in many years, and have never used C++, so I am a bit out of my territory. I will have to find others who are interested to help.


@vsalvino thanks for the further feedback...

I looked again at libsass, particularly pysass.cpp, but maybe still miss the point... sorry...

What I can see is building a python library like say HTMLTidy, which contains all of libTidy services, such that you can script something like -

from HTMLTidy import HTMLTidy
tidy = HTMLTidy()
document, errors = tidy.tidy_document('<p>f&otilde;o <img src="bar.jpg">',
    options={'alt-text': 'baz', 'show-body-only': 1, 'quiet': 'yes'})

And the resultant output is -

line 1 column 15 - Warning: <img> inserting "alt" attribute using value "baz"
<p>fõo <img src="bar.jpg" alt="baz"></p>

Is that the idea?

And yes, may need others to help writing the glue...

Look forward to further developments... thanks...


@geoffmcl you hit the nail on the head with your sample... that is exactly the idea.
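For readers wondering what the "glue" might look like, below is a minimal sketch that loads a shared libTidy build through the standard-library ctypes module instead of a compiled extension module. It is only an illustration under stated assumptions: the helper names load_libtidy and tidy_document are hypothetical, a shared libtidy must already be discoverable on the system, and the TidyBuffer field layout is assumed to match tidybuffio.h in tidy-html5. The C functions it calls (tidyCreate, tidyOptParseValue, tidyParseString, tidyCleanAndRepair, tidyRunDiagnostics, tidySaveBuffer, tidyBufInit, tidyBufFree, tidyRelease) are part of the public libTidy API.

```python
import ctypes
import ctypes.util


class TidyBuffer(ctypes.Structure):
    # Assumed to mirror struct _TidyBuffer in tidy-html5's tidybuffio.h.
    _fields_ = [
        ("allocator", ctypes.c_void_p),
        ("bp", ctypes.c_void_p),
        ("size", ctypes.c_uint),
        ("allocated", ctypes.c_uint),
        ("next", ctypes.c_uint),
    ]


def load_libtidy():
    """Return a ctypes handle to libtidy, or None if no shared lib is found."""
    path = ctypes.util.find_library("tidy")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    doc_t, buf_p = ctypes.c_void_p, ctypes.POINTER(TidyBuffer)
    # Declare signatures so pointers are not truncated on 64-bit platforms.
    lib.tidyCreate.restype, lib.tidyCreate.argtypes = doc_t, []
    lib.tidyRelease.restype, lib.tidyRelease.argtypes = None, [doc_t]
    lib.tidyBufInit.restype, lib.tidyBufInit.argtypes = None, [buf_p]
    lib.tidyBufFree.restype, lib.tidyBufFree.argtypes = None, [buf_p]
    lib.tidySetErrorBuffer.argtypes = [doc_t, buf_p]
    lib.tidySaveBuffer.argtypes = [doc_t, buf_p]
    lib.tidyOptParseValue.argtypes = [doc_t, ctypes.c_char_p, ctypes.c_char_p]
    lib.tidyParseString.argtypes = [doc_t, ctypes.c_char_p]
    lib.tidyCleanAndRepair.argtypes = [doc_t]
    lib.tidyRunDiagnostics.argtypes = [doc_t]
    return lib


def _text(buf):
    """Copy the bytes held by a TidyBuffer into a Python string."""
    if not buf.bp or buf.size == 0:
        return ""
    return ctypes.string_at(buf.bp, buf.size).decode("utf-8", "replace")


def tidy_document(html, options=None):
    """Tidy an HTML string; return (document, errors) as two strings."""
    lib = load_libtidy()
    if lib is None:
        raise RuntimeError("libtidy shared library not found")
    doc = lib.tidyCreate()
    out, err = TidyBuffer(), TidyBuffer()
    lib.tidyBufInit(ctypes.byref(out))
    lib.tidyBufInit(ctypes.byref(err))
    try:
        lib.tidySetErrorBuffer(doc, ctypes.byref(err))
        for name, value in (options or {}).items():
            ok = lib.tidyOptParseValue(doc, name.encode(), str(value).encode())
            if not ok:
                raise ValueError("libtidy rejected option %r" % name)
        lib.tidyParseString(doc, html.encode("utf-8"))
        lib.tidyCleanAndRepair(doc)
        lib.tidyRunDiagnostics(doc)
        lib.tidySaveBuffer(doc, ctypes.byref(out))
        return _text(out), _text(err)
    finally:
        # Release C-side memory even if an option was rejected.
        lib.tidyBufFree(ctypes.byref(out))
        lib.tidyBufFree(ctypes.byref(err))
        lib.tidyRelease(doc)
```

A call such as tidy_document('<p>x', {'show-body-only': 1}) would follow the shape of the sample above. This approach still depends on a libtidy binary being present, so it does not by itself solve the packaging problem raised earlier; bundling the library into wheels would be a separate step.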


tbeu commented Jan 9, 2020

  • - last updated 2014... note tidy src is listed as, last released 25 March 2009... should still work with this github source... but could not find the pytidylib source, yet...

For me, pytidylib 0.3.2 does not work out with the mute option of HTML Tidy 5.6.0:

from tidylib import tidy_document
doc = '<img src="">'
options = {'mute-id': 1, 'mute': 'MISSING_ATTRIBUTE', 'doctype': 'omit', 'show-body-only': 1}
_, errors = tidy_document(doc, options=options)

The error in Python 2 or 3 is ValueError: (tidylib) Config: messages of type "MISSING_ATTRIBUTE" will not be output. The workaround is to not use the mute option and manually filter the messages based on the mute-ID. See for example:

Thus, one more reason to have a direct Python binding.
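The manual-filtering workaround described above could be sketched roughly as follows. This is an assumption-laden illustration, not part of pytidylib: filter_messages is a hypothetical helper, and it assumes that with mute-id enabled each report line ends with its message ID in parentheses, for example "(MISSING_ATTRIBUTE)". Check the actual output of your Tidy version before relying on that pattern.

```python
import re

# Assumed shape of a report line when mute-id is on: "... (MESSAGE_ID)".
_ID_SUFFIX = re.compile(r"\((?P<id>[A-Z][A-Z0-9_]*)\)\s*$")


def filter_messages(errors, muted_ids):
    """Drop report lines whose trailing message ID is in muted_ids."""
    muted = set(muted_ids)
    kept = []
    for line in errors.splitlines():
        match = _ID_SUFFIX.search(line)
        if match and match.group("id") in muted:
            continue  # emulate the effect of the mute option
        kept.append(line)
    return "\n".join(kept)
```

With pytidylib this would be applied to the second value returned by tidy_document, passing only 'mute-id': 1 in the options and leaving 'mute' out, for instance filter_messages(errors, ['MISSING_ATTRIBUTE']).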


I encourage a Python library, but this project is specifically for LibTidy.

If someone wants to come onboard "officially" to support Python bindings, Ruby bindings, Swift bindings, etc., then please let one of us know. We'll set up the repo here in HTACG, and give you the permissions needed.

I'll close this particular issue.

4 participants