Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handling for ambiguous parsing options #226

Merged
merged 2 commits into from
Nov 18, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 7 additions & 6 deletions lib/csv.rb
Original file line number Diff line number Diff line change
Expand Up @@ -341,14 +341,14 @@
# liberal_parsing: false,
# nil_value: nil,
# empty_value: "",
# strip: false,
# # For generating.
# write_headers: nil,
# quote_empty: true,
# force_quotes: false,
# write_converters: nil,
# write_nil_value: nil,
# write_empty_value: "",
# strip: false,
# }
#
# ==== Options for Parsing
Expand All @@ -366,8 +366,9 @@
# - +header_converters+: Specifies the header converters to be used.
# - +skip_blanks+: Specifies whether blanks lines are to be ignored.
# - +skip_lines+: Specifies how comments lines are to be recognized.
# - +strip+: Specifies whether leading and trailing whitespace are
# to be stripped from fields..
# - +strip+: Specifies whether leading and trailing whitespace are to be
# stripped from fields. This must be compatible with +col_sep+; if it is not,
# then an +ArgumentError+ exception will be raised.
# - +liberal_parsing+: Specifies whether \CSV should attempt to parse
# non-compliant data.
# - +nil_value+: Specifies the object that is to be substituted for each null (no-text) field.
Expand Down Expand Up @@ -946,14 +947,14 @@ def initialize(message, line_number)
liberal_parsing: false,
nil_value: nil,
empty_value: "",
strip: false,
# For generating.
write_headers: nil,
quote_empty: true,
force_quotes: false,
write_converters: nil,
write_nil_value: nil,
write_empty_value: "",
strip: false,
}.freeze

class << self
Expand Down Expand Up @@ -1879,11 +1880,11 @@ def initialize(data,
encoding: nil,
nil_value: nil,
empty_value: "",
strip: false,
quote_empty: true,
write_converters: nil,
write_nil_value: nil,
write_empty_value: "",
strip: false)
write_empty_value: "")
raise ArgumentError.new("Cannot parse nil as CSV") if data.nil?

if data.is_a?(String)
Expand Down
23 changes: 23 additions & 0 deletions lib/csv/parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -361,6 +361,7 @@ def prepare
prepare_skip_lines
prepare_strip
prepare_separators
validate_strip_and_col_sep_options
prepare_quoted
prepare_unquoted
prepare_line
Expand Down Expand Up @@ -531,6 +532,28 @@ def prepare_separators
@not_line_end = Regexp.new("[^\r\n]+".encode(@encoding))
end

# This method verifies that there are no (obvious) ambiguities with the
# provided +col_sep+ and +strip+ parsing options. For example, if +col_sep+
# and +strip+ were both equal to +\t+, then there would be no clear way to
# parse the input.
def validate_strip_and_col_sep_options
return unless @strip

if @strip.is_a?(String)
if @column_separator.start_with?(@strip) || @column_separator.end_with?(@strip)
raise ArgumentError,
"The provided strip (#{@escaped_strip}) and " \
"col_sep (#{@escaped_column_separator}) options are incompatible."
end
else
if Regexp.new("\\A[#{@escaped_strip}]|[#{@escaped_strip}]\\z").match?(@column_separator)
raise ArgumentError,
"The provided strip (true) and " \
"col_sep (#{@escaped_column_separator}) options are incompatible."
end
end
adamroyjones marked this conversation as resolved.
Show resolved Hide resolved
end

def prepare_quoted
if @quote_character
@quotes = Regexp.new(@escaped_quote_character +
Expand Down
29 changes: 29 additions & 0 deletions test/csv/parse/test_strip.rb
Original file line number Diff line number Diff line change
Expand Up @@ -80,4 +80,33 @@ def test_do_not_strip_crlf
%Q{"a" ,"b " \r\n},
strip: true))
end

def test_col_sep_incompatible_true
message = "The provided strip (true) and " \
"col_sep (\\t) options are incompatible."
assert_raise_with_message(ArgumentError, message) do
CSV.parse_line(%Q{"a"\t"b"\n},
col_sep: "\t",
strip: true)
end
end

def test_col_sep_incompatible_string
message = "The provided strip (\\t) and " \
"col_sep (\\t) options are incompatible."
assert_raise_with_message(ArgumentError, message) do
CSV.parse_line(%Q{"a"\t"b"\n},
col_sep: "\t",
strip: "\t")
end
end

def test_col_sep_compatible_string
assert_equal(
["a", "b"],
CSV.parse_line(%Q{\va\tb\v\n},
col_sep: "\t",
strip: "\v")
)
end
end