Don't strip html, body, and head tags #37
Alright, @jekyll/ecosystem, I could use some smart thinking here. The problem in jekyll/jekyll#4648 is that HTML Pipeline expects an HTML fragment, but the

```ruby
# This is a horrible hack, but I don't care
if tags.strip =~ /^<body/i
  path = "/html/body"
else
  path = "/html/body/node()"
end
```

That means that for HTML documents (e.g., any page or doc with a layout), we need to parse the document ourselves, pass the body fragment to the pipeline, emojify it, and then swap it back in for the body of the already-parsed document. Sounds simple, right? Wrong. If you've ever used Nokogiri, you know it loves to do two things:
So that leaves us with a few options:
Thoughts? I'm going to go ahead and downgrade GitHub Pages in the interim, to give us time to find the right solution here.
@benbalter I don't see the problem with using Nokogiri in a plugin.
@benbalter did you try changing this to a
Fixes #36, too.
```ruby
if doc.output =~ /<body/
  parsed_doc = Nokogiri::HTML::Document.parse(doc.output)
  body = parsed_doc.at_css('body')
  body.replace filter_with_emoji(src).call(body.to_html)[:output]
```
Let's remove this `.to_html`; from what I could see in the docs, it can operate on the fragment, too.
I think we're going to have to pass it as a string... `body` at that point is a `Nokogiri::XML::Element`, not a document, and thus has no `parent` method (and errors out on `has_ancestor?`).
Ok. Tested with the jekyllrb.com site and it strips the `<body>` tag classes. Is there a way to preserve those?
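One possible way to keep the `<body>` attributes, offered only as a sketch and not as what this PR does: skip re-serializing the whole document, split the output string around the opening and closing `<body>` tags, filter only the inner HTML, and glue the untouched pieces back together. Everything named here (`emojify_fragment`, `emojify_document`, `EMOJI_PATTERN`, `OPENING_BODY`, `CLOSING_BODY`, and the `/emoji/` image path) is hypothetical; `emojify_fragment` is a naive stand-in for `filter_with_emoji(src).call(...)[:output]`, and the tag regexes assume no `>` appears inside a `<body>` attribute value.

```ruby
# Hypothetical stand-in for the real emoji filter: replaces :name: with an <img> tag.
# It is deliberately naive and does not skip <code> or <pre> content.
EMOJI_PATTERN = /:([a-z0-9_+-]+):/

def emojify_fragment(html)
  html.gsub(EMOJI_PATTERN) { %(<img class="emoji" alt=":#{$1}:" src="/emoji/#{$1}.png">) }
end

# Opening <body> tag with or without attributes (assumes no ">" inside attribute values).
OPENING_BODY = /<body\b[^>]*>/i
CLOSING_BODY = %r{</body\s*>}i

def emojify_document(output)
  head, opener, rest = output.partition(OPENING_BODY)
  # No <body> tag at all: treat the whole string as a fragment.
  return emojify_fragment(output) if opener.empty?

  body, closer, tail = rest.rpartition(CLOSING_BODY)
  # When nothing matches, rpartition returns everything in the last slot.
  body, tail = tail, "" if closer.empty?

  # The opening tag is copied through verbatim, so its classes survive.
  head + opener + emojify_fragment(body) + closer + tail
end

page = %(<html><head><title>:smile:</title></head><body class="docs"><p>Hi :smile:</p></body></html>)
puts emojify_document(page)
```

The trade-off is that this never hands the full document to an HTML parser, so `<html>`, `<head>`, and the `<body>` attributes pass through byte for byte, at the cost of relying on regexes to find the tag boundaries.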
AFAIK, that would be Markdown at that point, not HTML, meaning we couldn't parse to determine if a node was inside
We use Kramdown, so why don't we get a bit clever with that and use its tokenizer? That just randomly dawned on me because we recently did a project where we abused Kramdown to get it to tokenize Markdown for us so we could alter it. Actually, we use Kramdown! You could even just build a plugin for it and mark this as a Kramdown-only plugin, and thus make it easy for all parties to use?
@benbalter What do you think about cdb9cca? It skirts around the issue of loss of
. We'll also likely need to port the changes to @mentions as well.
@jekyllbot: merge
Fixes (half of) jekyll/jekyll#4648.