Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Download gems with threads #649

Open
wants to merge 6 commits into from

6 participants

@charliesome

This pull request modifies Gem::RequestSet to use threads to download gems.

This significantly reduces the time it takes to download gems with many dependencies, especially on high latency connections.

Without this patch, installing Rails 4.0.0 on my home connection takes just over a minute.

When downloading gems in parallel with this patch, the total time taken to install Rails 4.0.0 drops to just 18 seconds.

This could be further reduced by installing gems in parallel, although more work would need to be done to ensure gems are not installed before their dependencies.

@luislavena
Collaborator

@charliesome isn't this what Bundler offers? I believe they recently added parallel gem installation.

@charliesome

@luislavena Very similar, but then again I don't install everything through Bundler, so having this in RubyGems itself is quite beneficial.

@drbrain
Owner

I don't feel good about removing the functionality of VerboseDownloadReporter.

How about adding ThreadedDownloadReporter and calling a new method in request.rb to instantiate it (when verbose) like StreamUI#download_reporter?

@charliesome

@drbrain I've added a ThreadedDownloadReporter and restored VerboseDownloadReporter to what it was before.

I also added a configuration option to enable/disable threaded downloads. Did I do this right? I mainly just cargo-culted the existing code for the verbose option.

@luislavena
Collaborator

@charliesome so for each gem in sorted_requests, a new thread will be spawned and download will be performed?

Last time I checked doing gem install rails resulted in 28 gems being pulled, that will translate into 28 threads.

Not worried but a bit concerned that a gem with 20 dependencies (and the dependencies of dependencies) result in 100 gems to be installed and that simply clog your network connection :tongue:

@charliesome

@luislavena Yeah that's a fair point. I was thinking about adding a thread pool, but I decided to send this PR first and then get feedback.

Regarding limiting the number of concurrent downloads, does anyone have any suggestions about how I should do this?

There'd be a different sweet spot on each connection for how many threads are best. My connection is reasonably fast, but because I'm in Australia the latency often kills speeds when downloading from the US. For me, using a thread per gem is the fastest way to download all of Rails' dependencies. Obviously this wouldn't be the case for everyone though. Any arbitrary thread limit might be too high for some users, but also too low for some other users.

@drbrain
Owner

Historically I knew browsers used between two and four connections per hostname for downloading content, so why not have 4 as the default?

The threaded_downloads configuration option could be used to control the number of threads.

@drbrain
Owner

Moving this to the 2.3 milestone.

@Spaceghost

I'd assume bringing in something like facter wouldn't be welcome for detecting core count. Can this option be specifiable in a .gemrc file when it's shipped?

@josephholsten

celluloid removed its facter dependency by implementing its own cpu counter: celluloid/celluloid@568708d

current implementation: https://github.com/celluloid/celluloid/blob/master/lib/celluloid/cpu_counter.rb

@charliesome

I don't think we particularly care about CPU/core count for this. If I'm running RubyGems on a VPS with just one CPU, I don't want it to not download gems in parallel. Similarly, if I'm running RubyGems on a high powered machine with 64 cores, that doesn't necessarily mean it's capable of downloading that many gems at once.

@drbrain
Owner

I think the bigger issue for parallelism is how many connections can the typical user support? Out CDN should have no problems here, and I think browsers these days use 4–8 connections. Maybe we should pick 8?

@josephholsten

@drbrain the 4 connection thing is mostly a consensus interpretation of the http spec, a good desc is here: http://onejavascript.blogspot.com/2008/09/internet-explorer-8-and-maximum.html

@josephholsten

I'd say a max of 8 is totally sane.

@charliesome

8 seems good to me, I'll work on this today.

@josephholsten

btw, it looks like Request#connection_for is sharing http connections. It looks like it's allowing for a single connection per thread. That's correct, yes?

@olivierlacan

@charliesome This is really useful work. Do you need any help getting it ready for a merge?

@drbrain
Owner

I think the next step is to re-review.

I'll deal with the connection persistent changes that make this unmergable via button.

@drbrain drbrain modified the milestone: Future, 2.3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
This page is out of date. Refresh to see the latest.
View
15 lib/rubygems/config_file.rb
@@ -26,6 +26,7 @@
# +:backtrace+:: See #backtrace
# +:sources+:: Sets Gem::sources
# +:verbose+:: See #verbose
+# +:concurrent_downloads+:: See #concurrent_downloads
#
# gemrc files may exist in various locations and are read and merged in
# the following order:
@@ -42,6 +43,7 @@ class Gem::ConfigFile
DEFAULT_BULK_THRESHOLD = 1000
DEFAULT_VERBOSITY = true
DEFAULT_UPDATE_SOURCES = true
+ DEFAULT_CONCURRENT_DOWNLOADS = 8
##
# For Ruby packagers to set configuration defaults. Set in
@@ -123,6 +125,11 @@ class Gem::ConfigFile
attr_accessor :verbose
##
+ # Number of gem downloads that should be performed concurrently
+
+ attr_accessor :concurrent_downloads
+
+ ##
# True if we want to update the SourceInfoCache every time, false otherwise
attr_accessor :update_sources
@@ -190,6 +197,7 @@ def initialize(args)
@bulk_threshold = DEFAULT_BULK_THRESHOLD
@verbose = DEFAULT_VERBOSITY
@update_sources = DEFAULT_UPDATE_SOURCES
+ @concurrent_downloads = DEFAULT_CONCURRENT_DOWNLOADS
operating_system_config = Marshal.load Marshal.dump(OPERATING_SYSTEM_DEFAULTS)
platform_config = Marshal.load Marshal.dump(PLATFORM_DEFAULTS)
@@ -212,6 +220,7 @@ def initialize(args)
@path = @hash[:gempath] if @hash.key? :gempath
@update_sources = @hash[:update_sources] if @hash.key? :update_sources
@verbose = @hash[:verbose] if @hash.key? :verbose
+ @concurrent_downloads = @hash[:concurrent_downloads] if @hash.key? :concurrent_downloads
@disable_default_gem_server = @hash[:disable_default_gem_server] if @hash.key? :disable_default_gem_server
@ssl_verify_mode = @hash[:ssl_verify_mode] if @hash.key? :ssl_verify_mode
@@ -427,6 +436,12 @@ def to_yaml # :nodoc:
DEFAULT_VERBOSITY
end
+ yaml_hash[:concurrent_downloads] = if @hash.key?(:concurrent_downloads)
+ @hash[:concurrent_downloads]
+ else
+ DEFAULT_CONCURRENT_DOWNLOADS
+ end
+
keys = yaml_hash.keys.map { |key| key.to_s }
keys << 'debug'
re = Regexp.union(*keys)
View
16 lib/rubygems/request_set.rb
@@ -117,7 +117,12 @@ def install options, &block # :yields: request, installer
specs = []
- sorted_requests.each do |req|
+ queue = Queue.new
+ Gem.configuration.concurrent_downloads.times do
+ queue << :ticket
+ end
+
+ threads = sorted_requests.map { |req|
if req.installed? then
req.spec.spec.build_extensions
@@ -127,6 +132,15 @@ def install options, &block # :yields: request, installer
end
end
+ Thread.start {
+ ticket = queue.deq
+ path = req.download(cache_dir)
+ queue << ticket
+ [req, path]
+ }
+ }
+
+ threads.compact.map(&:value).each do |req, path|
path = req.download cache_dir
inst = Gem::Installer.new path, options
View
71 lib/rubygems/user_interaction.rb
@@ -530,7 +530,7 @@ def download_reporter(*args)
when nil, false
SilentDownloadReporter.new(@outs, *args)
else
- VerboseDownloadReporter.new(@outs, *args)
+ ThreadedDownloadReporter.new(@outs, *args)
end
end
@@ -566,82 +566,33 @@ def done
end
end
- ##
- # A progress reporter that prints out messages about the current progress.
-
- class VerboseDownloadReporter
-
- ##
- # The current file name being displayed
+ class ThreadedDownloadReporter
+ MUTEX = Mutex.new
attr_reader :file_name
- ##
- # The total bytes in the file
-
- attr_reader :total_bytes
-
- ##
- # The current progress (0 to 100)
-
- attr_reader :progress
-
- ##
- # Creates a new verbose download reporter that will display on
- # +out_stream+. The other arguments are ignored.
-
- def initialize(out_stream, *args)
+ def initialize(out_stream)
@out = out_stream
- @progress = 0
end
- ##
- # Tells the download reporter that the +file_name+ is being fetched and
- # contains +total_bytes+.
-
def fetch(file_name, total_bytes)
@file_name = file_name
- @total_bytes = total_bytes.to_i
- @units = @total_bytes.zero? ? 'B' : '%'
-
- update_display(false)
+ locked_puts "Downloading: #{file_name}"
end
- ##
- # Updates the verbose download reporter for the given number of +bytes+.
-
def update(bytes)
- new_progress = if @units == 'B' then
- bytes
- else
- ((bytes.to_f * 100) / total_bytes.to_f).ceil
- end
-
- return if new_progress == @progress
-
- @progress = new_progress
- update_display
+ # nah
end
- ##
- # Indicates the download is complete.
-
def done
- @progress = 100 if @units == '%'
- update_display(true, true)
+ locked_puts "Finished: #{file_name}"
end
- private
-
- def update_display(show_progress = true, new_line = false) # :nodoc:
- return unless @out.tty?
-
- if show_progress then
- @out.print "\rFetching: %s (%3d%s)" % [@file_name, @progress, @units]
- else
- @out.print "Fetching: %s" % @file_name
+ private
+ def locked_puts(msg)
+ MUTEX.synchronize do
+ @out.puts msg
end
- @out.puts if new_line
end
end
end
View
2  test/rubygems/test_gem_stream_ui.rb
@@ -173,7 +173,7 @@ def test_download_reporter_silent_false
def test_download_reporter_anything
@cfg.verbose = 0
reporter = @sui.download_reporter
- assert_kind_of Gem::StreamUI::VerboseDownloadReporter, reporter
+ assert_kind_of Gem::StreamUI::ThreadedDownloadReporter, reporter
end
def test_verbose_download_reporter
Something went wrong with that request. Please try again.