Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for passing a block to IO#readlines and IO.readlines. #9025

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ioquatix
Copy link
Member

IO#readlines can allocate memory proportional to the file size. This can be a problem. While IO#each_line exists, IO#readlines is convenient. So, let's extend IO#readlines and IO.readlines to accept a block, and yield one line at a time, to avoid memory usage issues when dealing with large files.

cc @youchan

@ioquatix
Copy link
Member Author

ioquatix commented Nov 24, 2023

Here is a small example program I was using for testing:

#!/usr/bin/env ruby

PATH = ARGV.pop || '/tmp/fifo'

class Producer
  def initialize
    @queue = Thread::Queue.new
  end

  def dequeue
    @queue.pop
  end

  def run
    Thread.new do
      # read data from named pipe
      File.readlines(PATH) do |line|
        @queue.push line
      end
    end
  end
end

# consumer
producer = Producer.new
producer.run
loop do
  p producer.dequeue
end

Most of the credit goes to @youchan.

@ioquatix
Copy link
Member Author

@mame mentioned that this function can already be done by IO.foreach. I agree there is significant overlap.

@nobu
Copy link
Member

nobu commented Nov 30, 2023

Can be closed?

@ioquatix
Copy link
Member Author

ioquatix commented Dec 1, 2023

I am fine to close this, however I think what this shows is that there is a lack of consistency in the IO interface. I've actually made exactly the same mistake before, expecting IO.readlines takes a block.

The other problem is, whether we should encourage this approach at all. IO.readlines could work for trivial cases, but fail with big files. Users may not be aware of the limitations. For Ruby 3.3, I'm fine with not making any changes. However, I wonder if we should consider this more carefully. After all, the public interface on core classes will guide our users/programmers and I think we should guide them towards incremental/lazy each_line style patterns.

@ioquatix
Copy link
Member Author

ioquatix commented Dec 1, 2023

If you check https://github.com/search?q=%2FIO%5C.readlines%28.*%3F%29do%2F&type=code you can see plenty of examples of poor quality code, e.g. IO.readlines.each do which with this PR becomes IO.readlines do. Without this PR, it should be IO.foreach(...) do. So, please ask yourself the question, why do people ignore IO.foreach? I think it's because IO.readlines name matches closely with how people are thinking about the problem. In other words, IO.foreach is not obvious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants