New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allowing to write HTML fragments instead of only whole documents. #2546
Conversation
@@ -16,6 +16,8 @@ Features | |||
translated (Issue #2116) | |||
* Pass ``post`` object and ``lang`` to post compilers (Issue #2531) | |||
* Pass ``url_type`` into template's context. | |||
* ``render_template`` and ``generic_renderer`` can now create HTML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS. new changes go at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I also move the url_type
part, then?
doc = lxml.html.document_fromstring(data, parser) | ||
self.rewrite_links(doc, src, context['lang'], url_type, is_fragment=is_fragment) | ||
if is_fragment: | ||
data = (doc.text or '').encode('utf-8') + ''.encode('utf-8').join([lxml.html.tostring(child, encoding='utf-8', method='html') for child in doc.iterchildren()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL lxml has awful trees.
How would the solution that adds the extra div tag look? I don’t think we would really mind one little <div>
there. Or worst case scenario, if we know the <div>
is guaranteed to always be there, we could slice it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra <div></div>
is produced by lxml.html.tostring(doc, encoding='utf-8', method='html')
.
I don't really like cutting stuff out via text replacements.
"""Replace links in document to point to the right places.""" | ||
# First let lxml replace most of them | ||
doc.rewrite_links(lambda dst: self.url_replacer(src, dst, lang, url_type), resolve_base_href=False) | ||
|
||
# lxml ignores srcset in img and source elements, so do that by hand | ||
objs = list(doc.xpath('(*//img|*//source)')) | ||
objs = list(doc.xpath('({0}//img|{0}//source)'.format('' if is_fragment else '*'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rewrite this as a 4-line if/else
tree instead of abusing str.format. We should also find out what that asterisk changes (can non-fragment code work without it? Does it reach all <img>
and <source>
elements if the asterisk is not present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't reach top-level <img>
tags with the asterisk there. I think (I don't know the xpath syntax well enough to be sure) that the asterisk eats up a top level element, and then it looks for children (or children-of-children etc.) of that top element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about no asterisk and full document? (Please rewrite in 4 lines anyway) done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should work as well, since there are no top-level <img>
or <source>
elements which have to be handled differently. I've changed it accordingly, which removes the whole if
/else
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The original syntax was introduced in 24a45e1.)
self.rewrite_links(doc, src, context['lang'], url_type) | ||
data = b'<!DOCTYPE html>\n' + lxml.html.tostring(doc, encoding='utf8', method='html', pretty_print=True) | ||
if is_fragment: | ||
data = (doc.text or '').encode('utf-8') + ''.encode('utf-8').join([lxml.html.tostring(child, encoding='utf-8', method='html') for child in doc.iterchildren()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this does. Doesn't it end up producing 2 copies of the doc's text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc.text
is all the text in doc
that appears before a HTML element. Text that appears after HTML elements is taken care of by tostring()
. (lxml is weird.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments with a similar explanation, hope it makes it clearer.
Ok then! El lun., 24 oct. 2016 16:17, Chris Warrick notifications@github.com
|
One more thing I had to try out after seeing that |
The one test fails because there were some connection problems to pypi while installing doit. All other tests had no problems. |
Those tests are pretty worthless anyways. |
Ok. And thanks for the approval. Anyone mind if I merge? |
Go ahead. |
Test implementation for the dicussion at the bottom of #2544.