
Finalize the fix for problems parsing some Unicode HTML.

The only change needed was to use StringIO instead of cStringIO, because cStringIO does not support unicode strings.
Apart from this well-hidden bug, the previous solution was correct.

Revert "Revert "Fix error in encoding which occured on discovery of some unicode pages.""

This reverts commit 2b5235a.
1 parent 2daf147, commit 08382e503aafa56cd703f456719876e266d082a8, ziima committed Dec 10, 2010
Showing with 14 additions and 2 deletions.
  1. +14 −2 openid/yadis/discover.py
@@ -1,7 +1,7 @@
# -*- test-case-name: openid.test.test_yadis_discover -*-
__all__ = ['discover', 'DiscoveryResult', 'DiscoveryFailure']
-from cStringIO import StringIO
+from StringIO import StringIO
from openid import fetchers
@@ -126,8 +126,20 @@ def whereIsYadis(resp):
             # XXX: do we want to do something with content-type, like
             # have a whitelist or a blacklist (for detecting that it's
             # HTML)?
+
+            # decode body by encoding of file
+            encoding = content_type.rsplit(';', 1)
+            if len(encoding) == 2 and encoding[1].strip().startswith('charset='):
+                encoding = encoding[1].split('=', 1)[1]
+            else:
+                encoding = 'UTF-8'
+            try:
+                content = resp.body.decode(encoding)
+            except UnicodeError:
+                content = resp.body
+
             try:
-                yadis_loc = findHTMLMeta(StringIO(resp.body))
+                yadis_loc = findHTMLMeta(StringIO(content))
             except MetaNotFound:
                 pass
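
For readers who want to try the decoding step on its own, here is a minimal standalone sketch of the same idea. It is not part of this commit: the helper names `charset_from_content_type` and `decode_body` are hypothetical, and the sketch is written for Python 3, whereas the patched module targets Python 2. It also makes two assumptions beyond the diff: the header may be missing (`None`), and the page may name a charset that Python does not know, which raises `LookupError` rather than `UnicodeError`.

```python
def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or `default`."""
    if not content_type:
        # Header missing or empty: nothing to parse.
        return default
    # Parameters follow the media type, separated by ';'.
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip().strip('"')
    return default


def decode_body(body, content_type, default="utf-8"):
    """Decode `body` (bytes) to text; return it unchanged if decoding fails."""
    encoding = charset_from_content_type(content_type, default)
    try:
        return body.decode(encoding)
    except (UnicodeError, LookupError):
        # UnicodeError: the bytes are not valid in the declared encoding.
        # LookupError: the declared charset name is not a known codec.
        return body
```

As in the diff, a failed decode falls back to the raw body, so the caller still gets something it can hand to the HTML parser.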
