Regression since ruby 2.4 - unknown encoding name - bom|utf-8
#23
unknown encoding name - bom|utf-8
Is it better to post bugs here, or on https://bugs.ruby-lang.org/?
I can confirm this issue. It looks like the problem started in this commit: d4dd5d1
On current master, the relevant section looks like this:

```ruby
# honor the IO encoding if we can, otherwise default to ASCII-8BIT
internal_encoding = Encoding.find(internal_encoding) if internal_encoding
external_encoding = Encoding.find(external_encoding) if external_encoding
if encoding
  encoding, = encoding.split(":", 2) if encoding.is_a?(String)
  encoding = Encoding.find(encoding)
end
@encoding = raw_encoding(nil) || internal_encoding || encoding ||
            Encoding.default_internal || Encoding.default_external
```

Beyond the issue that @ShockwaveNN raised, I also noticed we aren't using the …
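A minimal sketch of why this ordering fails for `bom|utf-8`: `"bom|utf-8".split(":", 2)` leaves the name unchanged, so `Encoding.find` raises before `raw_encoding(nil)` is ever consulted. The helpers `eager_encoding` and `lazy_encoding` are hypothetical, use a `StringIO` in place of a file, and only mimic the ordering of the quoted code; they are not part of the CSV API.

```ruby
require "stringio"

# Hypothetical helper mirroring the quoted code: the encoding option is
# resolved with Encoding.find *before* the IO's own encoding is checked.
def eager_encoding(io, encoding)
  if encoding
    encoding, = encoding.split(":", 2) if encoding.is_a?(String)
    encoding = Encoding.find(encoding) # "bom|utf-8" raises ArgumentError here
  end
  io.external_encoding || encoding
end

# Hypothetical alternative ordering: ask the IO first and only resolve the
# option when the IO carries no encoding of its own.
def lazy_encoding(io, encoding)
  return io.external_encoding if io.external_encoding
  return nil unless encoding
  name, = encoding.to_s.split(":", 2)
  Encoding.find(name)
end

io = StringIO.new("a,b\n")
p "bom|utf-8".split(":", 2) # => ["bom|utf-8"]

begin
  eager_encoding(io, "bom|utf-8")
rescue ArgumentError => e
  puts e.message # => unknown encoding name - bom|utf-8
end

p lazy_encoding(io, "bom|utf-8") # => #<Encoding:UTF-8>
```

With the lazy ordering, the unresolvable option is simply never looked up when the IO already knows its encoding.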
@kou, @hsbt:

```ruby
def raw_encoding(default = Encoding::ASCII_8BIT)
  if @io.respond_to? :internal_encoding
    @io.internal_encoding || @io.external_encoding
  elsif @io.is_a? StringIO
    @io.string.encoding
  elsif @io.respond_to? :encoding
    @io.encoding
  else
    default
  end
end
```

To me, it looks like …

BTW, neither Ruby 2.4.3 nor 2.5.0 recognizes `bom|utf-8` as an encoding name:

```
irb(main):001:0> RUBY_VERSION
=> "2.4.3"
irb(main):002:0> Encoding.find("bom|utf-8")
ArgumentError: unknown encoding name - bom|utf-8
	from (irb):2:in `find'
	from (irb):2
	from /Users/steven/.rbenv/versions/2.4.3/bin/irb:11:in `<main>'
```

It doesn't look like we use the …
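Since `Encoding.find` does not accept the `bom|` prefix, the prefix has to be consumed by the open-mode parsing of `File.open` instead. A minimal sketch, assuming a throwaway file created with `Tempfile` (the file name and contents are made up for illustration): once the prefix is handled, the IO reports a plain UTF-8 external encoding, which is what `raw_encoding` would then return.

```ruby
require "tempfile"

# Write a small CSV file that starts with a UTF-8 byte order mark (U+FEFF).
tmp = Tempfile.new(["bom", ".csv"])
tmp.write("\uFEFFname\nRyan\n")
tmp.close

# Encoding.find rejects the "bom|" prefix...
begin
  Encoding.find("bom|utf-8")
rescue ArgumentError => e
  puts e.message # => unknown encoding name - bom|utf-8
end

# ...but the same string is valid in an open mode: the BOM is skipped
# and the IO's external encoding is plain UTF-8.
external = nil
first_line = nil
File.open(tmp.path, "r:bom|utf-8") do |io|
  external = io.external_encoding
  first_line = io.gets
end

p external   # => #<Encoding:UTF-8>
p first_line # => "name\n"
```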
This fixes ruby#23 and makes the following modification:

+ `@encoding` can be set to `external_encoding`. Previously, `external_encoding` was ignored.

To me, it looks like the `encoding` option is usually ignored. `CSV.read`/`CSV.open` will use the encoding options provided by a user if:

1. the user passes in an `IO`-like object where `respond_to?(:internal_encoding) == true && (io.internal_encoding || io.external_encoding).nil?`
2. the user passes in a `StringIO`-like object whose string has no encoding.
3. the user passes in an `IO`-like object where `respond_to?(:encoding) == true && io.encoding.nil?`

As far as I know, these situations can only occur if the user passes in a custom IO object that carries no encoding information. This analysis is based on the `raw_encoding` method:

```ruby
def raw_encoding(default = Encoding::ASCII_8BIT)
  if @io.respond_to? :internal_encoding
    @io.internal_encoding || @io.external_encoding
  elsif @io.is_a? StringIO
    @io.string.encoding
  elsif @io.respond_to? :encoding
    @io.encoding
  else
    default
  end
end
```
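To make the three conditions concrete, here is a standalone copy of the quoted `raw_encoding` logic that takes the IO as an argument instead of reading `@io`. `raw_encoding_of` and `BareReader` are hypothetical names for illustration. Ordinary `StringIO` objects always report an encoding (and `StringIO` responds to `internal_encoding`, so it is handled by the first branch), while only an object exposing no encoding information falls through to the `default` argument.

```ruby
require "stringio"

# Standalone copy of the quoted raw_encoding logic; the IO is passed in
# instead of being read from @io (hypothetical refactor for illustration).
def raw_encoding_of(io, default = Encoding::ASCII_8BIT)
  if io.respond_to?(:internal_encoding)
    io.internal_encoding || io.external_encoding
  elsif io.is_a?(StringIO)
    io.string.encoding
  elsif io.respond_to?(:encoding)
    io.encoding
  else
    default
  end
end

# Hypothetical IO-like object that exposes no encoding methods at all.
class BareReader
  def initialize(data)
    @data = data
  end

  def read
    @data
  end
end

# StringIO responds to internal_encoding, so the first branch returns the
# string's encoding and a user-supplied encoding option is never reached.
p raw_encoding_of(StringIO.new("a,b\n"))        # => #<Encoding:UTF-8>

# Only an object with no encoding information reaches the default, which
# is the case where CSV would fall back to the user's option.
p raw_encoding_of(BareReader.new("a,b\n"), nil) # => nil
```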
@ShockwaveNN Thanks for your report. I've fixed it.
OK.
Of course! I sent a PR for …
Great!
Issue with CSV library not liking the encoding option: ruby/csv#23. Bump to use a newer version of CSV rather than the version that ships with JRuby.
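With a CSV version that contains the fix, passing `encoding: "bom|utf-8"` when reading a file should work again. A minimal sketch, assuming a csv gem release newer than the one bundled with Ruby 2.5.0 and a made-up temporary file with a BOM-prefixed header row:

```ruby
require "csv"
require "tempfile"

# Hypothetical input: a header row prefixed with a UTF-8 BOM.
tmp = Tempfile.new(["people", ".csv"])
tmp.write("\uFEFFfirst_name\nRyan\n")
tmp.close

# CSV.read opens the file itself, so the "bom|utf-8" option reaches
# File.open, which strips the BOM; the header comes back clean.
table = CSV.read(tmp.path, headers: true, encoding: "bom|utf-8")
p table.headers          # => ["first_name"]
p table[0]["first_name"] # => "Ryan"
```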
I am still having issues trying to read a CSV string that has a byte order mark prepended to it:

```ruby
require 'csv'
puts "CSV VERSION #{CSV::VERSION}" # Shows 3.0.0
bom_character = 65_279
contents = "first_name\nRyan".codepoints.unshift(bom_character).pack("U*")
csv = CSV.parse(contents, headers: true, encoding: 'bom|utf-8')
csv.each do |row|
  p row.to_h.keys.first.codepoints
  p "ROW FIRST NAME IS #{row["first_name"]}"
end
```

This outputs …
`bom|utf-8` is for opening a file, not for the string you pass to `CSV.parse`.
Please don't reuse a closed issue; open a new one instead.
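If the data is already in a `String`, one workaround is to remove a leading BOM yourself before calling `CSV.parse`. A minimal sketch, reusing the shape of `contents` from the report above and assuming the string is UTF-8 encoded:

```ruby
require "csv"

# Same shape as the report above: a UTF-8 BOM (U+FEFF) followed by CSV text.
contents = "\uFEFFfirst_name\nRyan"

# "bom|utf-8" only applies when Ruby opens a file, so drop a leading BOM
# from the in-memory string before parsing it.
clean = contents.sub(/\A\uFEFF/, "")

table = CSV.parse(clean, headers: true)
p table.headers          # => ["first_name"]
p table[0]["first_name"] # => "Ryan"
```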
Hi there, I updated my app to ruby-2.5 and have some problems passing the encoding
`bom|utf-8`
to the CSV parser. I created sample Dockerfiles for this:
With
ruby-2.4.3
everything works fine; the output is:
"test"
But with ruby-2.5.0 an encoding error occurs.
The error is: