Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced RDoc for filter #149

Merged
merged 4 commits into from Jun 28, 2020
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
77 changes: 52 additions & 25 deletions lib/csv.rb
Expand Up @@ -769,33 +769,60 @@ def instance(data = $stdout, **options)
end
end

#
# :call-seq:
# filter( **options ) { |row| ... }
# filter( input, **options ) { |row| ... }
# filter( input, output, **options ) { |row| ... }
#
# This method is a convenience for building Unix-like filters for CSV data.
# Each row is yielded to the provided block which can alter it as needed.
# After the block returns, the row is appended to +output+ altered or not.
#
# The +input+ and +output+ arguments can be anything CSV::new() accepts
# (generally String or IO objects). If not given, they default to
# <tt>ARGF</tt> and <tt>$stdout</tt>.
#
# The +options+ parameter is also filtered down to CSV::new() after some
# clever key parsing. Any key beginning with <tt>:in_</tt> or
# <tt>:input_</tt> will have that leading identifier stripped and will only
# be used in the +options+ Hash for the +input+ object. Keys starting with
# <tt>:out_</tt> or <tt>:output_</tt> affect only +output+. All other keys
# are assigned to both objects.
#
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
# and {Options for Generating}[#class-CSV-label-Options+for+Generating].
#
# The <tt>:output_row_sep</tt> +option+ defaults to
# <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
# filter(**options) {|row| ... } -> integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we say that "the return value is nothing"?
The current #filter implementation returns an integer but it just so happens that input.each returns integer.
We don't want users to expect #filter returns an integer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is weird. We see this in a number of places. Can we return something useful when possible, or nil otherwise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it's enough that we say it in document.
Should we return nil explicitly? If we should return something, #filter should return nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

::filter could return the output CSV object or both CSV objects.
::foreach could return the CSV object.
::parse could return the CSV object.

What the user would do with these is none of our business?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just nil everywhere we're now returning an integer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer a doc-only PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou, can you take a look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to state that returning nil or something is the our spec for now.

In the current our knowledge, we don't have enough reason to state that #filter should return nil, the output CSV object or both CSV objects. We may find enough reason to state that #filter should return XXX in the future. If we state that #filter should return ZZZ now, we need to break backward compatibility.

So I think that we should state that #filter returns nothing or #filter's return value is undefined for now.
What do you think about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. I'll remove the nil returns and delete the nil returns from the call-seqs.
Some of them say "thingy or nil"; I'll just say "thingy". OK/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# filter(in_string, **options) {|row| ... } -> integer
# filter(in_io, **options) {|row| ... } -> integer
# filter(in_string, out_string, **options) {|row| ... } -> integer
# filter(in_string, out_io, **options) {|row| ... } -> integer
# filter(in_io, out_string, **options) {|row| ... } -> integer
# filter(in_io, out_io, **options) {|row| ... } -> integer
#
# Reads \CSV input and writes \CSV output; returns an \Integer.
# For each input row:
# - Forms the data into:
# - A CSV::Row object, if headers are in use.
# - An \Array of Arrays, otherwise.
# - Calls the block with that object.
# - Appends the block's return value to the output.
#
# Arguments:
# * \CSV source:
# * Argument +in_string+, if given, should be a \String object;
# it will be put into a new StringIO object positioned at the beginning.
# * Argument +in_io+, if given, should be an IO object that is
# open for reading; on return, the IO object will be closed.
# * If neither +in_string+ nor +in_io+ is given,
# the input stream defaults to {ARGF}[https://ruby-doc.org/core/ARGF.html].
# * \CSV output:
# * Argument +out_string+, if given, should be a \String object;
# it will be put into a new StringIO object positioned at the beginning.
# * Argument +out_io+, if given, should be an IO object that is
# ppen for writing; on return, the IO object will be closed.
# * If neither +out_string+ nor +out_io+ is given,
# the output stream defaults to <tt>$stdout</tt>.
# * Argument +options+ should be keyword arguments.
# - Each argument name that is prefixed with +in_+ or +input_+
# is stripped of its prefix and is treated as an option
# for parsing the input.
# Option +input_row_sep+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>.
# - Each argument name that is prefixed with +out_+ or +output_+
# is stripped of its prefix and is treated as an option
# for generating the output.
# Option +output_row_sep+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>.
# - Each argument not prefixed as above is treated as an option
# both for parsing the input and for generating the output.
# - See {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
# and {Options for Generating}[#class-CSV-label-Options+for+Generating].
#
# Example:
# in_string = "foo,0\nbar,1\nbaz,2\n"
# out_string = ''
# CSV.filter(in_string, out_string) do |row|
# row[0] = row[0].upcase
# row[1] *= 4
# end
# out_string # => "FOO,0000\nBAR,1111\nBAZ,2222\n"
def filter(input=nil, output=nil, **options)
# parse options for input, output, or both
in_options, out_options = Hash.new, {row_sep: $INPUT_RECORD_SEPARATOR}
Expand Down