throw error when meet non ascii #2229
ping @StrikerRUS for review
Is |
@StrikerRUS there are some string-typed parameters, e.g. data_filename, that can be set by the user and are stored in the model file.
Maybe we should add the same check for them too?
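To illustrate what such a check on string parameters could look like, here is a minimal Python sketch. It is not the code from this PR: the helper name `find_non_ascii_params` and the sample parameter values are hypothetical, and it only assumes that parameters arrive as a plain dict.

```python
def find_non_ascii_params(params):
    """Return the names of string-valued parameters that contain non-ASCII characters.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    offending = []
    for name, value in params.items():
        # Only string values can carry non-ASCII text; skip numbers and others.
        if isinstance(value, str) and not all(ord(ch) < 128 for ch in value):
            offending.append(name)
    return sorted(offending)


# Sample values are made up for the example.
params = {"data_filename": "données.csv", "objective": "regression", "num_leaves": 31}
print(find_non_ascii_params(params))
```

A caller could raise an error whenever the returned list is non-empty, mirroring the behaviour already applied to feature names.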
The check works fine with Python 3.
def test_non_ascii_names(self):
    X = np.random.random((100, 3))
    with np.testing.assert_raises_regex(lgb.basic.LightGBMError, "Do not support non-ascii*"):
        lgb.Dataset(X, feature_name=["测试名称", "тестовая_колонка", "clé à bougie"]).construct()
But fails with Python 2.
string = '\xe6\xb5\x8b\xe8\xaf\x95\xe5\x90\x8d\xe7\xa7\xb0'

    def c_str(string):
        """Convert a Python string to C string."""
>       return ctypes.c_char_p(string.encode('utf-8'))
E       UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
So I don't think it's worth adding new tests for such a simple check.
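For context on the traceback above: in Python 2 a plain `str` is already a byte string, so calling `.encode('utf-8')` on it first triggers an implicit ASCII decode, and that decode is what raises `UnicodeDecodeError`. A minimal sketch of a conversion that tolerates both bytes and text might look like the following. The helper `to_utf8_bytes` is hypothetical and is not what LightGBM's `c_str` does.

```python
import ctypes


def to_utf8_bytes(string):
    """Return UTF-8 bytes for either a byte string or a text string.

    Hypothetical helper for illustration only.
    """
    if isinstance(string, bytes):
        # Already bytes (this is the Python 2 `str` case): pass through unchanged
        # instead of calling .encode(), which would decode as ASCII first.
        return string
    return string.encode("utf-8")


def c_str(string):
    """Convert a Python string to a C string without the implicit ASCII decode."""
    return ctypes.c_char_p(to_utf8_bytes(string))


raw = b'\xe6\xb5\x8b\xe8\xaf\x95\xe5\x90\x8d\xe7\xa7\xb0'  # UTF-8 bytes of "测试名称"
assert to_utf8_bytes(raw) == raw
assert to_utf8_bytes(u"测试名称") == raw
assert c_str(raw).value == raw
```

With a conversion like this the non-ASCII name would reach the library unchanged, so the new check could fire on both Python versions. Whether that is worth the extra code for a simple check is a separate question.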
@StrikerRUS can this PR be merged?
@guolinke What do you think about enabling this check for other user-supplied strings, such as the various paths you mentioned? #2229 (comment) cc @jameslamb
@StrikerRUS done
LGTM, thanks!
to fix #2226