New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relativize_paths silently, incorrectly transforms `gcse:search` #1319

Closed
sirosen opened this Issue Feb 20, 2018 · 5 comments

Comments

Projects
None yet
2 participants
@sirosen

sirosen commented Feb 20, 2018

relativize_paths silently transforms gcse:search

Steps to reproduce

  1. Create an html file with Google's GCSE search snippet
    This includes an element for the Google JS to manipulate, <gcse:search></gcse:search>
  2. Run nanoc with rules to apply relativize_paths on that html file, with type: html
  3. Observe output will read <search></search>

Expected behavior

I think that nanoc should not be transforming that element at all. However, maybe there's some subtlety I'm missing about strict HTML specification.

Either of the below would be also acceptable:

  • nanoc should print a warning that the element is transformed -- it can surprise people, after all
  • nanoc should crashfail the build on the element

Actual behavior

Instead, nanoc succeeds, but changes the element.

Details

I'm surprised to be the first person reporting this -- maybe not that many people are using GCSE, or using relativize_paths, or maybe I'm doing something else wrong.

@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Feb 21, 2018

Member

I can confirm this is a problem!

This appears to be problematic behavior with Nokogiri/libxml for parsing HTML… when reassembling the content, the namespaces are somehow left out.

Member

ddfreyne commented Feb 21, 2018

I can confirm this is a problem!

This appears to be problematic behavior with Nokogiri/libxml for parsing HTML… when reassembling the content, the namespaces are somehow left out.

@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Feb 21, 2018

Member

This appears to be a common problem without a proper solution (see e.g. jekyll/jemoji#43).

Suggestions for workarounds that I’ve found are along the form of

<div id='gcse'></div>
<script>
  $('#gcse').html("<gcse:search></gcse:search>");
</script>

… which is pretty icky.

This might not be fixable within Nanoc itself, unfortunately, but perhaps Nanoc can offer an automatic workaround.

Member

ddfreyne commented Feb 21, 2018

This appears to be a common problem without a proper solution (see e.g. jekyll/jemoji#43).

Suggestions for workarounds that I’ve found are along the form of

<div id='gcse'></div>
<script>
  $('#gcse').html("<gcse:search></gcse:search>");
</script>

… which is pretty icky.

This might not be fixable within Nanoc itself, unfortunately, but perhaps Nanoc can offer an automatic workaround.

@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Feb 21, 2018

Member

Icky workaround in #1320.

Member

ddfreyne commented Feb 21, 2018

Icky workaround in #1320.

@sirosen

This comment has been minimized.

Show comment
Hide comment
@sirosen

sirosen Feb 21, 2018

Thanks for the really quick detective work on this.
There's been an available patch to fix libxml sitting unattended for a few years now, so this might get fixed someday... but probably not soon.

Given that GCSE is all JS-based, it will never work when blocked by NoScript, etc.
In that light, the workaround you've noted doesn't have any significant practical faults.

This is all a matter of where the grossness exists, not whether or not it exists.

If you decide that you dislike #1320 very strongly, I think it would be satisfactory to detect gcse:search in content prior to applying filters and print a warning with a link:

Warning!
  Nanoc relies on libxml to parse HTML. This is known to cause issues with GCSE.
  Recommended workaround: https://github.com/nanoc/nanoc/issues/1319#issuecomment-367234453

sirosen commented Feb 21, 2018

Thanks for the really quick detective work on this.
There's been an available patch to fix libxml sitting unattended for a few years now, so this might get fixed someday... but probably not soon.

Given that GCSE is all JS-based, it will never work when blocked by NoScript, etc.
In that light, the workaround you've noted doesn't have any significant practical faults.

This is all a matter of where the grossness exists, not whether or not it exists.

If you decide that you dislike #1320 very strongly, I think it would be satisfactory to detect gcse:search in content prior to applying filters and print a warning with a link:

Warning!
  Nanoc relies on libxml to parse HTML. This is known to cause issues with GCSE.
  Recommended workaround: https://github.com/nanoc/nanoc/issues/1319#issuecomment-367234453
@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Feb 22, 2018

Member

Sucks to hear about the libxml patch that has been lying around for so long!

My main problem with #1320 is that it only applies to gcse:search, not g:plusone, fb:like, …. The only way to solve this problem in general is to fix the HTML parser, or use a different one.

It might be worth looking into using Oga, for instance.

Member

ddfreyne commented Feb 22, 2018

Sucks to hear about the libxml patch that has been lying around for so long!

My main problem with #1320 is that it only applies to gcse:search, not g:plusone, fb:like, …. The only way to solve this problem in general is to fix the HTML parser, or use a different one.

It might be worth looking into using Oga, for instance.

@ddfreyne ddfreyne closed this in #1320 Feb 22, 2018

ddfreyne added a commit that referenced this issue Feb 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment