Skip to content

Commit

Permalink
ARROW-6813: [Ruby] Arrow::Table.load with headers=true leads to excep…
Browse files Browse the repository at this point in the history
…tion in Arrow 0.15

After reviewing the commentary on apache#5206 , and examining the new read options, I've concluded that supporting a boolean `:headers` option at this level is not a good idea.  In particular, all the interpretations of `headers: false` that I could think of would be surprising to a typical Ruby CSV programmer (does it mean `:autogenerate_column_names`? are you required to add `:column_names` instead?).

So I've just made the exception clearer.  It would be great if there were a way to pull the [documentation of the options](https://github.com/apache/arrow/blob/02d1e9736808d9a9624bef5577c880d8c165e853/python/pyarrow/_csv.pyx#L42) over from the `.py` files into the Ruby doc.

Closes apache#5609 from cobbr2/bug/ARROW-6813-ruby-headers-option-fail and squashes the following commits:

c941b10 <Rick Cobb> Retry the docker-compose build now that I see it succeed for anybody
0d67b65 <Rick Cobb> Retry the docker-compose build, can't find button in Github UX
0134d5e <Rick Cobb> simplify headers handling
2dedfad <Rick Cobb> and now with a headers: string test
7655138 <Rick Cobb> Merge work that makes headers work more like CSV
1215bfe <Rick Cobb> truthy headers should work like CSV
c7fe77f <Rick Cobb> Make `headers:` compatible with Ruby's CSV.new
a4934b6 <Rick Cobb> Should never have committed Gemfile.lock
4af0111 <Rick Cobb> Make a better error message than NoMethodError
0341652 <Rick Cobb> WIP: is this the correct test... and will skip_rows work?

Authored-by: Rick Cobb <rick@grandrounds.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
  • Loading branch information
cobbr2 authored and kszucs committed Oct 21, 2019
1 parent 4569f97 commit 564f468
Show file tree
Hide file tree
Showing 2 changed files with 85 additions and 3 deletions.
13 changes: 10 additions & 3 deletions ruby/red-arrow/lib/arrow/csv-loader.rb
Expand Up @@ -93,10 +93,17 @@ def reader_options
@options.each do |key, value|
case key
when :headers
if value
options.n_header_rows = 1
case value
when ::Array
options.column_names = value
when String
return nil
else
options.n_header_rows = 0
if value
options.generate_column_names = false
else
options.generate_column_names = true
end
end
when :column_types
value.each do |name, type|
Expand Down
75 changes: 75 additions & 0 deletions ruby/red-arrow/test/test-csv-loader.rb
Expand Up @@ -121,6 +121,81 @@ def load_csv(data, options)
Arrow::CSVLoader.load(data, options)
end

sub_test_case(":headers") do
test("true") do
values = Arrow::StringArray.new(["a", "b", "c"])
assert_equal(Arrow::Table.new(value: values),
load_csv(<<-CSV, headers: true))
value
a
b
c
CSV
end

test(":first_line") do
values = Arrow::StringArray.new(["a", "b", "c"])
assert_equal(Arrow::Table.new(value: values),
load_csv(<<-CSV, headers: :first_line))
value
a
b
c
CSV
end

test("truthy") do
values = Arrow::StringArray.new(["a", "b", "c"])
assert_equal(Arrow::Table.new(value: values),
load_csv(<<-CSV, headers: 0))
value
a
b
c
CSV
end

test("Array of column names") do
values = Arrow::StringArray.new(["a", "b", "c"])
assert_equal(Arrow::Table.new(column: values),
load_csv(<<-CSV, headers: ["column"]))
a
b
c
CSV
end

test("false") do
values = Arrow::StringArray.new(["a", "b", "c"])
assert_equal(Arrow::Table.new(f0: values),
load_csv(<<-CSV, headers: false))
a
b
c
CSV
end

test("nil") do
values = Arrow::StringArray.new(["a", "b", "c"])
assert_equal(Arrow::Table.new(f0: values),
load_csv(<<-CSV, headers: nil))
a
b
c
CSV
end

test("string") do
values = Arrow::StringArray.new(["a", "b", "c"])
assert_equal(Arrow::Table.new(column: values),
load_csv(<<-CSV, headers: "column"))
a
b
c
CSV
end
end

test(":column_types") do
assert_equal(Arrow::Table.new(:count => Arrow::UInt16Array.new([1, 2, 4])),
load_csv(<<-CSV, column_types: {count: :uint16}))
Expand Down

0 comments on commit 564f468

Please sign in to comment.