forked from alastair/python-musicbrainzngs
Commit
force unicode strings on field input
Adds util.unicode(string) (in new util module).
This gets the preferred encoding and tries to decode the string.
It is safe to pass in numbers or unicode objects.
The result will still be unicode.

Decoding errors are ignored, the corresponding characters are skipped.
Hopefully Lucene will give some results when some chars are missing.

Since we have all strings in unicode now, we don't need the unicode literals u'...' anymore in _do_mb_search. (tested)
This might help supporting Python3.

Signed-off-by: Johannes Dewender <github@JonnyJD.net>
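The "decoding errors are ignored" behaviour in the message above can be illustrated with a short sketch. It is written for Python 3 and is not part of the commit; the sample bytes and the variable names `raw`, `skipped` and `decoded` are assumptions chosen for illustration.

```python
# Latin-1 bytes for "Björk"; the byte 0xF6 is not valid UTF-8.
raw = b"Bj\xf6rk"

# With the "ignore" error handler the undecodable byte is dropped silently.
skipped = raw.decode("utf-8", "ignore")    # "Bjrk"

# With the matching encoding nothing is lost.
decoded = raw.decode("latin-1", "ignore")  # "Björk"

print(skipped, decoded)
```

A wrongly guessed encoding therefore shortens the search term instead of raising `UnicodeDecodeError`, which is the trade-off the commit message accepts.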
Showing 2 changed files with 33 additions and 5 deletions.
    @@ -0,0 +1,23 @@
    # This file is part of the musicbrainzngs library
    # Copyright (C) Alastair Porter, Adrian Sampson, and others
    # This file is distributed under a BSD-2-Clause type license.
    # See the COPYING file for more information.

    import sys
    import locale

    def _unicode(string, encoding=None):
        """Try to decode byte strings to unicode.
        This can only be a guess, but this might be better than failing.
        It is safe to use this on numbers or strings that are already unicode.
        """
        if isinstance(string, str):
            # use given encoding, stdin, preferred until something != None is found
            if encoding is None:
                encoding = sys.stdin.encoding
            if encoding is None:
                encoding = locale.getpreferredencoding()
            unicode_string = unicode(string, encoding, "ignore")
        else:
            unicode_string = unicode(string)
        return unicode_string.replace('\x00', '').strip()
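The helper in this diff targets Python 2, where `str` is a byte string and `unicode` is a builtin. Since the commit message mentions Python 3 support, here is a minimal sketch of how the same logic might look there. The name `to_text` is hypothetical and this code is not part of the commit or of the library's API; it only mirrors the fallback order shown above (explicit argument, then stdin, then the locale's preferred encoding).

```python
import locale
import sys


def to_text(value, encoding=None):
    """Hypothetical Python 3 counterpart of _unicode (illustration only)."""
    if isinstance(value, bytes):
        # Fallback order as in the diff: given encoding, stdin, locale.
        if encoding is None:
            # sys.stdin may be None or lack an encoding in some environments.
            encoding = getattr(sys.stdin, "encoding", None)
        if encoding is None:
            encoding = locale.getpreferredencoding()
        # Undecodable bytes are skipped rather than raising an error.
        text = value.decode(encoding, "ignore")
    else:
        # str() leaves text unchanged and converts numbers.
        text = str(value)
    # Drop NUL characters and surrounding whitespace, as the original does.
    return text.replace("\x00", "").strip()


print(to_text(b" caf\xc3\xa9\x00 ", "utf-8"))  # café
print(to_text(42))                             # 42
```

The main difference from the Python 2 version is the type check: on Python 3 only `bytes` needs decoding, while `str` values pass through `str()` unchanged.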