throw error when meet non ascii #2229

guolinke · 2019-06-14T15:03:02Z

guolinke · 2019-06-17T01:53:50Z

StrikerRUS · 2019-06-17T11:58:40Z

Is feature_name the only model's field in which users' arbitrary strings are stored?

guolinke · 2019-06-17T13:02:00Z

@StrikerRUS there are some string-typed parameters, e.g. data_filename, that can be set by the user and are stored in the model file.

StrikerRUS · 2019-06-17T13:53:43Z

Maybe then add the same check for them too?
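
For illustration only, the kind of check being discussed could be sketched in Python as below. This is not the code in this PR (the diff is not shown on this page); the helper names `find_non_ascii` and `check_ascii` and the error message are hypothetical.

```python
def find_non_ascii(strings):
    """Return the entries that contain at least one non-ASCII character."""
    return [s for s in strings if any(ord(ch) > 127 for ch in s)]


def check_ascii(strings, what="feature name"):
    """Raise ValueError if any entry contains a non-ASCII character.

    Hypothetical helper: the name and the message are assumptions made
    for this sketch, not LightGBM's API.
    """
    bad = find_non_ascii(strings)
    if bad:
        raise ValueError("non-ASCII characters found in %s: %r" % (what, bad))


# Plain ASCII names pass silently.
check_ascii(["age", "income"])
```

The same helper could be applied to any user-supplied string, for example `check_ascii([path], what="data_filename")`, which is the extension suggested above.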

StrikerRUS

The check works fine with Python 3.

import numpy as np
import lightgbm as lgb

def test_non_ascii_names(self):
    X = np.random.random((100, 3))
    # Constructing the Dataset should fail because the names are not ASCII.
    with np.testing.assert_raises_regex(lgb.basic.LightGBMError, "Do not support non-ascii"):
        lgb.Dataset(X, feature_name=["测试名称", "тестовая_колонка", "clé à bougie"]).construct()

But fails with Python 2.

string = '\xe6\xb5\x8b\xe8\xaf\x95\xe5\x90\x8d\xe7\xa7\xb0'

    def c_str(string):
        """Convert a Python string to C string."""
>       return ctypes.c_char_p(string.encode('utf-8'))
E       UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

So I don't think it's worth adding new tests for such a simple check.
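
A note on the Python 2 failure above: in Python 2, calling `.encode('utf-8')` on a byte string first decodes it implicitly with the default ASCII codec, and that hidden step is what raises `UnicodeDecodeError`. The sketch below reproduces the step explicitly in Python 3; the helper name `implicit_ascii_decode` is made up for this example.

```python
# UTF-8 bytes of the first feature name, copied from the traceback above.
raw = b"\xe6\xb5\x8b\xe8\xaf\x95\xe5\x90\x8d\xe7\xa7\xb0"


def implicit_ascii_decode(data):
    """Mimic the ASCII decode that Python 2 runs before str.encode().

    Returns the UnicodeDecodeError if decoding fails, otherwise None.
    """
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = implicit_ascii_decode(raw)
print(err)  # the failure happens at the very first byte, 0xe6
```

Decoding the same bytes as UTF-8 (`raw.decode("utf-8")`) succeeds, so the test fails under Python 2 before the string ever reaches the library's check.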

jameslamb · 2019-07-17T05:22:26Z

@StrikerRUS can this PR be merged?

StrikerRUS · 2019-07-17T12:26:14Z

@guolinke What do you think about enabling this check for other user-supplied strings, such as the various paths you mentioned? #2229 (comment)

cc @jameslamb

guolinke · 2019-07-18T02:53:36Z

@StrikerRUS done

StrikerRUS

LGTM, thanks!

throw error when meet non ascii

4f26d50

guolinke requested a review from StrikerRUS June 17, 2019 01:53

StrikerRUS approved these changes Jun 17, 2019

check ascii for config strings.

e014de0

guolinke closed this Jul 18, 2019

guolinke reopened this Jul 18, 2019

Merge branch 'master' into guolinke-patch-2

f8265d4

StrikerRUS approved these changes Jul 18, 2019

guolinke merged commit 0d59859 into master Jul 18, 2019

StrikerRUS deleted the guolinke-patch-2 branch July 19, 2019 10:48

StrikerRUS mentioned this pull request Sep 25, 2019

plot_tree can't display feature names that contains Chinese character #2442

Closed

kidotaka mentioned this pull request Dec 10, 2019

Wrong size of feature_names #2226

Closed

lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
