Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use IO.copy_stream when reading, writing #6958

Merged
merged 3 commits into from Dec 18, 2023
Merged

Conversation

martinemde
Copy link
Member

What was the end-user or developer problem that led to this PR?

Looking at increasing read/write efficiency. I'm not sure if this is actually better, but I believe that's the goal of IO.copy_stream.

What is your fix for the problem, implemented in this PR?

Use IO.copy_stream when writing gems to files from the gem archive.

Make sure the following tasks are checked

Copy link
Member

@segiddins segiddins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fundamentally what IO.copy_stream is doing under the hood, agree it's what we should be doing

@martinemde martinemde changed the title Use IO.copy_stream when reading, then writing Use IO.copy_stream when reading, writing Sep 14, 2023
@martinemde
Copy link
Member Author

martinemde commented Sep 15, 2023

Strangely truffleruby fails every time but only on this branch. IO.copy_stream discrepancy? It fails, If I'm not mistaken, because a binary file from a gem (a file that would pass through this change) is producing a different checksum. It is presumably checking this same checksum on the other rubies.

@@ -712,6 +712,16 @@ def verify_gz(entry) # :nodoc:
rescue Zlib::GzipFile::Error => e
raise Gem::Package::FormatError.new(e.message, entry.full_name)
end

if RUBY_ENGINE == "truffleruby"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ruby engine version check as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a fixed version? I haven't heard anything. A ticket was mentioned but it had to do with the return value of copy_stream. I don't think this is cause by the return value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oracle/truffleruby#3280 (comment) -- 23.1.2, to be released next month

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that makes sense. If it leaves our tar reader pos at the beginning, we can't possibly read the files right.


if RUBY_ENGINE == "truffleruby"
def copy_stream(src, dst) # :nodoc:
dst.write src.read 16_384 until src.eof?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tests keep failing, maybe change this to dst.write src.read ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to assume it's the second change that's failing, reading from the tar.gz. This is the original code for the first change.

Copy link
Member Author

@martinemde martinemde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe full read then write is required when reading from tgz in truffleruby.

lib/rubygems/package.rb Outdated Show resolved Hide resolved
@martinemde
Copy link
Member Author

If this doesn't work I'll just make one change at a time.

The befuddlement to code ratio of this change is surprisingly high.

@segiddins
Copy link
Member

@martinemde I think this is ready to merge?

@martinemde martinemde merged commit 558f516 into master Dec 18, 2023
72 checks passed
@martinemde martinemde deleted the martinemde/io-copy-stream branch December 18, 2023 02:22
@martinemde
Copy link
Member Author

I'll merge now, then when truffleruby is released we can follow up with an engine version when we know it works.

deivid-rodriguez pushed a commit that referenced this pull request Dec 21, 2023
Use IO.copy_stream when reading, writing

(cherry picked from commit 558f516)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants