forward error error=#<Encoding::UndefinedConversionError: "\xE6" from ASCII-8BIT to UTF-8> error_class=Encoding::UndefinedConversionError #31
Comments
Could you paste what you did, along with fluentd's configuration? And do you have a stack trace for that error?
I have the same problem. I will show how it occurs.
Each line of the tailed file (test_log.json in the demo) is a UTF-8-encoded JSON object like:
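The sample line did not survive in this thread. As an illustration only (hypothetical data, not from the original report), a UTF-8 line whose multibyte characters begin with bytes like `\xE6`, handed around as raw bytes the way `in_tail` delivers them, reproduces the reported error with Ruby's stdlib JSON:

```ruby
require 'json'

# Hypothetical log line; "温" is the byte sequence \xE6\xB8\xA9 in UTF-8.
line = '{"message":"温度 warning","level":"info"}'

# in_tail reads lines as raw bytes, so strings reaching the formatter can be
# tagged ASCII-8BIT even when the bytes themselves are valid UTF-8.
raw = "\xE6".b  # a lone lead byte, as quoted in the issue title

begin
  raw.to_json
rescue StandardError => e
  # Older json gems raise Encoding::UndefinedConversionError here (as in the
  # issue title); newer versions raise JSON::GeneratorError instead.
  puts e.class
end
```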
Here is the stack trace:
Additional info: ruby 2.1.5, td-agent 2.2.x. When I change the output from webhdfs to kafka, the problem doesn't occur, and I can read the correct data from Kafka, so I think it may be the webhdfs output plugin that causes this. I read the source code a moment ago; I have almost no knowledge of Ruby, but I was wondering whether msgpack is needed here? If I am wrong, forgive my ignorance. Looking forward to your reply!
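The fluentd configuration requested above was not preserved in this thread. A minimal hypothetical setup matching the description (td-agent 2.x, `in_tail` reading a JSON file and feeding `out_webhdfs`) might look like this; all tags, paths, and hostnames are illustrative, not from the original report:

```
<source>
  @type tail
  path /var/log/test_log.json
  pos_file /var/log/td-agent/test_log.pos
  tag demo.json
  format json
</source>

<match demo.json>
  @type webhdfs
  host namenode.example.com
  port 50070
  path /log/demo/access.log.%Y%m%d_%H
</match>
```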
Your data contains characters that are invalid for UTF-8 (JSON requires valid UTF-8 characters).
Thanks for replying, but I can guarantee my data is valid UTF-8. I can correctly output my data to a file with out_file and to Kafka with out_kafka(_buffered), but not to HDFS with out_webhdfs. I compared the source code of out_file.rb and out_webhdfs.rb and found the difference: out_file.rb uses Plugin.new_formatter(@format), while out_webhdfs.rb includes Fluent::Mixin::PlainTextFormatter. The two formatters are different, and the error appears exactly in Fluent::Mixin::PlainTextFormatter. Sincerely
I know about Fluent::Mixin::PlainTextFormatter because it's also my product...
I replaced the Ruby JSON module in PlainTextFormatter with Yajl, and it works well: record.to_json replaced by Yajl.dump(record). I am wondering why you chose Ruby's JSON module instead of Yajl? Anyway, it works well now, thanks for your help! Sincerely
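The swap described above amounts to skipping a transcoding step: Yajl streams the record's bytes through as-is, while Ruby's JSON generator tries to convert or validate strings that are not tagged UTF-8. As a sketch of why the swap works (stdlib only, since Yajl may not be installed; this is not the plugin's actual code), retagging valid UTF-8 bytes before generating has the same effect:

```ruby
require 'json'

raw = "\xE6\xB8\xA9".b  # the UTF-8 bytes of "温", tagged ASCII-8BIT

# force_encoding retags the string without changing its bytes, telling the
# generator the content is already UTF-8 -- essentially what Yajl assumes:
fixed = raw.force_encoding(Encoding::UTF_8)
puts fixed.to_json  # => "温" wrapped in JSON quotes
```

This only works when the bytes really are valid UTF-8, which matches the reporter's claim above.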
I'm also having this issue. It took me a while to figure out, but I have some raw logs that contain the escaped byte '\xAE'. It seems that \xAE doesn't get padded or interpreted as \u00AE for some reason. Any one-byte character code above 0x7F fails to write at the "to_json" conversion, possibly because a padded/wide character is expected and a short one is given. I get a "warn" in the logs relating to JSON::GeneratorError, and the record is not emitted.

Because 0x00-0xFF map identically to U+0000-U+00FF, why not allow them as valid Unicode characters, as many other implementations do? I guess this is a bug in the Ruby JSON package then.

I'm looking into making the edit above, but at this point it may be faster for me to deploy Kafka to my cluster and use that instead. I hope this can be resolved in future td-agent package releases; since I'm using the rpm, I seem to be locked into this bug. Is there a reason you don't use the same Yajl record writer as other plugins?
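The "pad \xAE to \u00AE" idea above is effectively a Latin-1 interpretation of the raw bytes, since ISO-8859-1 maps bytes 0x00-0xFF directly onto U+0000-U+00FF. If that interpretation is actually correct for the data, it can be sketched with stdlib Ruby by transcoding before generating (a workaround sketch, not something the plugin does):

```ruby
require 'json'

raw = "\xAE".b  # the byte from the comment above, tagged as binary

# Interpreting the byte as ISO-8859-1 (Latin-1) maps \xAE to U+00AE,
# producing valid UTF-8 that the JSON generator accepts:
utf8 = raw.encode(Encoding::UTF_8, Encoding::ISO_8859_1)
puts utf8  # => ®
puts utf8.to_json
```

The caveat is that this silently mangles data that was really UTF-8 (or any other multibyte encoding), so it is only safe when the source encoding is known to be Latin-1.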