Crack::JSON is not parsing UTF-8 correctly #48

Open
alhafoudh opened this issue Oct 27, 2013 · 1 comment

Comments

@alhafoudh

Hi,
I have found that Crack::JSON does not parse strings containing multibyte UTF-8 characters correctly.

Sample input:

{"winstrom":{"widget":[{"name":"John Ďoe","age":"3.14"}]}}

I get this:

{"winstrom"=>{"widget"=>[{"name"=>"John Ďoe", " age"=>" 3.14"}]}}
                                               ^       ^

This change fixes the problem:

https://github.com/jnunemaker/crack/blob/master/lib/crack/json.rb#L46

# changing this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json), false, [], nil, [], []

# to this
scanner, quoting, marks, pos, date_starts, date_ends = StringScanner.new(json.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')), false, [], nil, [], []
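One caveat about the proposed change, shown as a minimal sketch: passing 'binary' as the source encoding makes Ruby treat every byte above 0x7F as undefined in UTF-8, so with replace: '' the call appears to drop all non-ASCII characters instead of preserving them. The sample value below is taken from the input above; the variable names are only for illustration.

```ruby
# The value from the sample input, written with an escape so the
# example does not depend on the source file encoding.
name = "John \u010Eoe"   # "John Ďoe"

# Same call as in the proposed fix: reinterpret the bytes as binary
# and transcode to UTF-8, replacing anything undefined with ''.
stripped = name.encode('UTF-8', 'binary',
                       invalid: :replace, undef: :replace, replace: '')

puts name.bytesize      # Ď occupies two bytes in UTF-8
puts stripped           # the two bytes of Ď have been removed
puts stripped.encoding  # result is tagged as UTF-8
```

So the offsets line up again after this change because the multibyte characters are gone, which may not be acceptable if the names need to survive parsing.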

Info found here:

I am not sure whether this is the right solution to the problem. It looks like Ruby's StringScanner does not handle UTF-8 strings well.
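A possible explanation, not confirmed in this thread: StringScanner#pos reports a byte offset, while String#[] indexes by character. If scanner positions are later used as string indices, every multibyte character before a mark shifts that mark by the extra bytes. The sketch below only illustrates the mismatch on a shortened version of the sample input; it does not reproduce crack's actual code.

```ruby
require 'strscan'

# Shortened sample input; \u010E is Ď, which takes two bytes in UTF-8.
json = %Q({"name":"John \u010Eoe","age":"3.14"})

scanner = StringScanner.new(json)
scanner.scan_until(/,/)       # advance just past the first comma

byte_pos = scanner.pos        # offset in bytes
char_pos = scanner.charpos    # offset in characters (Ruby 2.0+)

puts byte_pos                 # one larger than char_pos because of Ď
puts char_pos
puts json[char_pos - 1]       # the comma that was just scanned
puts json[byte_pos - 1]       # one character too far: the quote after it
```

Using charpos (or working on bytes consistently) would keep the marks aligned without discarding any characters, assuming the byte/character mismatch really is the cause here.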

Both the crack and WebMock gems have this problem, since WebMock uses a stripped-down version of crack's code.

@showlovel

It does not work well with Chinese characters either!
