Fix character corruption in /frame #3

Merged
onishi merged 2 commits into onishi:master from vzvu3k6k:frame-encoding on Feb 24, 2015

Collaborator

vzvu3k6k commented Feb 15, 2015

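  • Text gets garbled when the HTML contains a string such as <input value="&#x2713;">
  • Text gets garbled when the original page's encoding is not UTF-8 and that encoding is declared in a meta tag or similar

This pull request fixes the two problems above.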

vzvu3k6k added some commits Dec 21, 2014

Avoid implicit encoding upgrading in _resolve
When /frame?url=... is given the URL of a page that is not encoded in
latin-1 and that contains, in an attribute, a character reference whose
code point is greater than 255, the server fails to return a response
or returns broken HTML.

Example:
    <html>
      <head><meta charset="utf-8"></head>
      <body><input value="&#x2713;">ほげ</body>
    </html>

HTML::Parser, which HTML::ResolveLink uses, decodes character
references in attribute values by default. If a decoded code point is
greater than 255, the resulting string has the "UTF8 flag" on. When a
UTF8-flagged string is concatenated with a non-UTF8-flagged one, Perl
implicitly decodes the non-flagged one as if it were latin-1. If it is
actually in another encoding, the result of this implicit decoding
(so-called upgrading) is a broken string.

The concatenated string is also flagged as UTF8. Since syswrite()
refuses a UTF8-flagged string that contains characters whose code point
is greater than 255, this causes various errors when returning a
response.
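
The upgrading described above can be reproduced in a few lines. This is a minimal sketch, not code from the commit: the variable names are hypothetical, and the byte string merely stands in for a page body fetched as raw UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Raw UTF-8 bytes for "ほげ" (U+307B U+3052), standing in for a fetched body.
my $bytes = encode('UTF-8', "\x{307b}\x{3052}");    # 6 bytes, UTF8 flag off

# A character string holding U+2713, like a decoded "&#x2713;" reference.
my $check = "\x{2713}";                              # UTF8 flag on

# Concatenation upgrades $bytes as if every byte were a latin-1 character.
my $mixed = $check . $bytes;
printf "mixed length: %d\n", length $mixed;          # 7 (1 + 6 characters)
printf "mixed flagged: %s\n", utf8::is_utf8($mixed) ? 'yes' : 'no';

# Decoding explicitly before concatenating keeps the text intact.
my $clean = $check . decode('UTF-8', $bytes);
printf "clean length: %d\n", length $clean;          # 3 (1 + 2 characters)
```

The six bytes of "ほげ" turn into six unrelated latin-1 characters in `$mixed`, which is the corruption the commit message describes. Keeping everything as bytes, or everything as decoded characters, avoids the implicit step.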

Use the original encoding in /frame
If a page is originally encoded in something other than UTF-8 and
declares it with <meta charset="Shift_JIS"> or similar, the /frame page
gets garbled by `Encode::encode('utf-8', $content)`: the body becomes
UTF-8 while the page still declares its original charset.
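
The mismatch can be illustrated with Encode alone. The sketch below is an assumption about the general idea suggested by the commit title, not the actual patch: `$charset`, `$raw`, and the other names are hypothetical, and a real handler would take the charset from the response headers or the meta tag.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for a fetched page: Shift_JIS bytes and the declared charset.
my $charset = 'Shift_JIS';
my $raw     = encode($charset, "\x{307b}\x{3052}");  # "ほげ" as served by the origin

# Decode to characters, as one would before rewriting links.
my $content = decode($charset, $raw);

# Re-encoding as UTF-8 no longer matches <meta charset="Shift_JIS">.
my $as_utf8 = encode('UTF-8', $content);

# Re-encoding with the original charset reproduces the served bytes.
my $as_orig = encode($charset, $content);

print $as_utf8 eq $raw ? "utf-8 matches original bytes\n"
                       : "utf-8 differs from original bytes\n";
print $as_orig eq $raw ? "original charset round-trips\n"
                       : "round-trip failed\n";
```

A browser that trusts the meta tag would read the UTF-8 bytes in `$as_utf8` as Shift_JIS and show garbage, whereas `$as_orig` stays consistent with the declaration.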

onishi added a commit that referenced this pull request Feb 24, 2015

Merge pull request #3 from vzvu3k6k/frame-encoding
Fix character corruption in /frame

@onishi onishi merged commit 8aea9b5 into onishi:master Feb 24, 2015

Owner

onishi commented Feb 24, 2015

🙆

@vzvu3k6k vzvu3k6k deleted the vzvu3k6k:frame-encoding branch Feb 24, 2015
