Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing to write HTML fragments instead of only whole documents. #2546

Merged
merged 6 commits into from Oct 26, 2016

Conversation

@felixfontein
Copy link
Contributor

@felixfontein felixfontein commented Oct 23, 2016

Test implementation for the dicussion at the bottom of #2544.

@@ -16,6 +16,8 @@ Features
translated (Issue #2116)
* Pass ``post`` object and ``lang`` to post compilers (Issue #2531)
* Pass ``url_type`` into template's context.
* ``render_template`` and ``generic_renderer`` can now create HTML
Copy link
Member

@Kwpolska Kwpolska Oct 23, 2016

PS. new changes go at the top.

Copy link
Contributor Author

@felixfontein felixfontein Oct 23, 2016

Should I also move the url_type part, then?

doc = lxml.html.document_fromstring(data, parser)
self.rewrite_links(doc, src, context['lang'], url_type, is_fragment=is_fragment)
if is_fragment:
data = (doc.text or '').encode('utf-8') + ''.encode('utf-8').join([lxml.html.tostring(child, encoding='utf-8', method='html') for child in doc.iterchildren()])
Copy link
Member

@Kwpolska Kwpolska Oct 23, 2016

TIL lxml has awful trees.

How would the solution that adds the extra div tag look? I don’t think we would really mind one little <div> there. Or worst case scenario, if we know the <div> is guaranteed to always be there, we could slice it out.

Copy link
Contributor Author

@felixfontein felixfontein Oct 23, 2016

The extra <div></div> is produced by lxml.html.tostring(doc, encoding='utf-8', method='html').

I don't really like cutting stuff out via text replacements.

"""Replace links in document to point to the right places."""
# First let lxml replace most of them
doc.rewrite_links(lambda dst: self.url_replacer(src, dst, lang, url_type), resolve_base_href=False)

# lxml ignores srcset in img and source elements, so do that by hand
objs = list(doc.xpath('(*//img|*//source)'))
objs = list(doc.xpath('({0}//img|{0}//source)'.format('' if is_fragment else '*')))
Copy link
Member

@Kwpolska Kwpolska Oct 23, 2016

Please rewrite this as a 4-line if/else tree instead of abusing str.format. We should also find out what that asterisk changes (can non-fragment code work without it? Does it reach all <img> and <source> elements if the asterisk is not present?

Copy link
Contributor Author

@felixfontein felixfontein Oct 23, 2016

It doesn't reach top-level <img> tags with the asterisk there. I think (I don't know the xpath syntax well enough to be sure) that the asterisk eats up a top level element, and then it looks for children (or children-of-children etc.) of that top element.

Copy link
Member

@Kwpolska Kwpolska Oct 23, 2016

What about no asterisk and full document? (Please rewrite in 4 lines anyway) done

Copy link
Contributor Author

@felixfontein felixfontein Oct 23, 2016

I think that should work as well, since there are no top-level <img> or <source> elements which have to be handled differently. I've changed it accordingly, which removes the whole if/else.

Copy link
Contributor Author

@felixfontein felixfontein Oct 23, 2016

(The original syntax was introduced in 24a45e1.)

self.rewrite_links(doc, src, context['lang'], url_type)
data = b'<!DOCTYPE html>\n' + lxml.html.tostring(doc, encoding='utf8', method='html', pretty_print=True)
if is_fragment:
data = (doc.text or '').encode('utf-8') + ''.encode('utf-8').join([lxml.html.tostring(child, encoding='utf-8', method='html') for child in doc.iterchildren()])
Copy link
Member

@ralsina ralsina Oct 24, 2016

I don't understand what this does. Doesn't it end up producing 2 copies of the doc's text?

Copy link
Member

@Kwpolska Kwpolska Oct 24, 2016

doc.text is all the text in doc that appears before a HTML element. Text that appears after HTML elements is taken care of by tostring(). (lxml is weird.)

Copy link
Contributor Author

@felixfontein felixfontein Oct 25, 2016

I added some comments with a similar explanation, hope it makes it clearer.

@ralsina
Copy link
Member

@ralsina ralsina commented Oct 25, 2016

Ok then!

El lun., 24 oct. 2016 16:17, Chris Warrick notifications@github.com
escribió:

@Kwpolska commented on this pull request.

In nikola/nikola.py #2546:

     self.rewrite_links(doc, src, context['lang'], url_type)
  •    data = b'<!DOCTYPE html>\n' + lxml.html.tostring(doc, encoding='utf8', method='html', pretty_print=True)
    
  •    if is_fragment:
    
  •        data = (doc.text or '').encode('utf-8') + ''.encode('utf-8').join([lxml.html.tostring(child, encoding='utf-8', method='html') for child in doc.iterchildren()])
    

doc.text is all the text in doc that appears before a HTML element. Text
that appears after HTML elements is taken care of by tostring(). (lxml is
weird.)


You are receiving this because you commented.

Reply to this email directly, view it on GitHub
#2546, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAAGKxTwjY0Lt6_yVDjuoJ4sIwRpFP0Tks5q3L3egaJpZM4KeM0s
.

@felixfontein
Copy link
Contributor Author

@felixfontein felixfontein commented Oct 25, 2016

One more thing I had to try out after seeing that lxml.html.tostring has a doctype argument. :)

@felixfontein
Copy link
Contributor Author

@felixfontein felixfontein commented Oct 25, 2016

The one test fails because there were some connection problems to pypi while installing doit. All other tests had no problems.

@Kwpolska
Copy link
Member

@Kwpolska Kwpolska commented Oct 26, 2016

Those tests are pretty worthless anyways.

@felixfontein
Copy link
Contributor Author

@felixfontein felixfontein commented Oct 26, 2016

Ok. And thanks for the approval. Anyone mind if I merge?

@Kwpolska
Copy link
Member

@Kwpolska Kwpolska commented Oct 26, 2016

Go ahead.

@felixfontein felixfontein merged commit 9a09d5d into master Oct 26, 2016
3 of 4 checks passed
3 of 4 checks passed
continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@felixfontein felixfontein deleted the allow-html-fragment-output branch Oct 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants