
@joshuay03 (Contributor) commented Sep 9, 2025

Motivation / Background

When using URL helpers with a host that includes the protocol (e.g., { host: "https://api.example.com" }), we (@buildkite) found that the already-extracted protocol was being unnecessarily processed through #normalize_protocol, creating extra allocations and showing up as a hotspot in our production profiles.

Example local profile: output.json.gz (viewable at https://vernier.prof; select allocations as the data source)

Detail

Before:

  1. HOST_REGEXP extracts "https://" from host
  2. normalize_protocol processes "https://" through PROTOCOL_REGEXP and creates a new string
  3. protocol.dup creates another allocation

After:

  1. HOST_REGEXP extracts "https://" from host
  2. Skip normalize_protocol, since the capture is already in the correct format
  3. Skip the dup, since regex capture groups are already new, mutable strings

This optimization only applies when no explicit protocol option is provided and HOST_REGEXP successfully extracts a protocol. The HOST_REGEXP pattern ensures extracted protocols are always in the final "protocol://" format that normalize_protocol would return, making the normalization step redundant.
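
The redundancy argument can be checked with a minimal Ruby sketch of the Before/After steps (not the final merged code). It assumes simplified stand-ins for the private HOST_REGEXP, PROTOCOL_REGEXP and normalize_protocol in ActionDispatch::Http::URL (the nil branch is hard-coded to "http://" instead of consulting secure_protocol), and the old_path/new_path helpers are hypothetical. Normalizing a captured protocol returns an equal string, so the old path only adds new objects, while the new path reuses the capture itself.

```ruby
# Simplified stand-ins, assumed to behave like the constants and method in
# ActionDispatch::Http::URL; this is not the actual Rails source.
HOST_REGEXP     = /(^[^:]+:\/\/)?(\[[^\]]+\]|[^:]+)(?::(\d+$))?/
PROTOCOL_REGEXP = /^([^:]+)(:)?(\/\/)?$/

def normalize_protocol(protocol)
  case protocol
  when nil then "http://" # the real method checks secure_protocol here
  when false then "//"
  when PROTOCOL_REGEXP then "#{$1}://" # builds a brand-new string
  else raise ArgumentError, "Invalid :protocol option: #{protocol.inspect}"
  end
end

# Hypothetical "before": normalize the capture, then dup it (two new strings).
def old_path(host)
  captured = host.match(HOST_REGEXP)[1]
  [captured, normalize_protocol(captured).dup]
end

# Hypothetical "after": the capture is already "protocol://", so reuse it.
def new_path(host)
  captured = host.match(HOST_REGEXP)[1]
  result = captured # no extra allocation
  [captured, result]
end

%w[https://api.example.com http://example.org ftp://files.example.com:21].each do |host|
  captured, old_result = old_path(host)
  _, new_result = new_path(host)
  puts "#{captured.inspect}: equal=#{old_result == new_result} " \
       "old_reused=#{old_result.equal?(captured)} new_reused=#{new_result.equal?(captured)}"
end
```

Each line should report equal=true, old_reused=false and new_reused=true: the values match, but only the new path avoids creating fresh strings.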

The change eliminates 2 string allocations (5 objects in total, 31 → 26) per URL generation when the protocol is included in the host, resulting in a ~10-12% throughput improvement and a ~16% reduction in allocated objects for that case. Other cases are unaffected: their allocation counts are unchanged and throughput shows no regression.

Benchmarking

Script:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails", path: "./"

  gem "benchmark-ips"
  gem "benchmark-memory"
end

require "action_controller/railtie"
require "minitest/autorun"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.root = __dir__
  config.eager_load = false
  config.hosts << "example.org"
  config.secret_key_base = "secret_key_base"

  config.logger = Logger.new($stdout)
end
Rails.application.initialize!

Rails.application.routes.draw do
  get "/", to: proc { [200, {}, ["Home"]] }, as: :home
end

class TestHelperWithoutProtocol
  include Rails.application.routes.url_helpers

  def default_url_options
    { host: "example.org" }
  end
end

class TestHelperWithExplicitProtocol
  include Rails.application.routes.url_helpers

  def default_url_options
    { host: "example.org", protocol: "http://" }
  end
end

class TestHelperWithProtocolInHost
  include Rails.application.routes.url_helpers

  def default_url_options
    { host: "http://example.org" }
  end
end

class BugTest < ActiveSupport::TestCase
  def test_benchmark
    th1 = TestHelperWithoutProtocol.new
    th2 = TestHelperWithExplicitProtocol.new
    th3 = TestHelperWithProtocolInHost.new

    Benchmark.ips do |x|
      x.report("home_url without protocol") { th1.home_url }
      x.report("home_url with explicit protocol") { th2.home_url }
      x.report("home_url with protocol in host") { th3.home_url }
    end

    puts "\n"

    Benchmark.memory do |x|
      x.report("home_url allocations without protocol") { th1.home_url }
      x.report("home_url allocations with explicit protocol") { th2.home_url }
      x.report("home_url allocations with protocol in host") { th3.home_url }
    end

    assert true
  end
end

Results:

main (604e8e057020d3ae522f674e3fdcaf4dea160d31):

ruby 3.4.5 (2025-07-16 revision 20cda200d3) +YJIT +PRISM [arm64-darwin25]
Warming up --------------------------------------
home_url without protocol
                        24.470k i/100ms
home_url with explicit protocol
                        21.399k i/100ms
home_url with protocol in host
                        22.408k i/100ms
Calculating -------------------------------------
home_url without protocol
                        245.376k (± 2.8%) i/s    (4.08 μs/i) -      1.248M in   5.090068s
home_url with explicit protocol
                        220.910k (± 2.5%) i/s    (4.53 μs/i) -      1.113M in   5.040106s
home_url with protocol in host
                        227.877k (± 2.8%) i/s    (4.39 μs/i) -      1.143M in   5.019124s

Calculating -------------------------------------
home_url allocations without protocol
                         2.680k memsize (   360.000  retained)
                        26.000  objects (     3.000  retained)
                         4.000  strings (     1.000  retained)
home_url allocations with explicit protocol
                         2.928k memsize (   360.000  retained)
                        29.000  objects (     3.000  retained)
                         6.000  strings (     1.000  retained)
home_url allocations with protocol in host
                         3.008k memsize (   360.000  retained)
                        31.000  objects (     3.000  retained)
                         6.000  strings (     1.000  retained)


optimise-build-host-url-with-protocol-in-host rebased on main (604e8e057020d3ae522f674e3fdcaf4dea160d31):

ruby 3.4.5 (2025-07-16 revision 20cda200d3) +YJIT +PRISM [arm64-darwin25]
Warming up --------------------------------------
home_url without protocol
                        24.536k i/100ms
home_url with explicit protocol
                        21.617k i/100ms
home_url with protocol in host
                        25.034k i/100ms
Calculating -------------------------------------
home_url without protocol
                        254.096k (± 2.5%) i/s    (3.94 μs/i) -      1.276M in   5.024416s
home_url with explicit protocol
                        230.278k (± 2.5%) i/s    (4.34 μs/i) -      1.167M in   5.072466s
home_url with protocol in host
                        254.472k (± 2.6%) i/s    (3.93 μs/i) -      1.277M in   5.020847s

Calculating -------------------------------------
home_url allocations without protocol
                         2.680k memsize (   360.000  retained)
                        26.000  objects (     3.000  retained)
                         4.000  strings (     1.000  retained)
home_url allocations with explicit protocol
                         2.928k memsize (   360.000  retained)
                        29.000  objects (     3.000  retained)
                         6.000  strings (     1.000  retained)
home_url allocations with protocol in host
                         2.680k memsize (   360.000  retained)
                        26.000  objects (     3.000  retained)
                         4.000  strings (     1.000  retained)
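
The headline figures in the description can be recomputed from the "with protocol in host" rows above. This is a short Ruby sketch; the pct_change helper is hypothetical, and the ips figures carry roughly ±2.5–2.8% error margins, so the throughput gain is an estimate rather than an exact value.

```ruby
# Figures copied from the "home_url ... with protocol in host" rows above.
before = { ips: 227_877, memsize: 3_008, objects: 31, strings: 6 }
after  = { ips: 254_472, memsize: 2_680, objects: 26, strings: 4 }

# Percentage change from `from` to `to`, rounded to one decimal place.
def pct_change(from, to)
  ((to - from) * 100.0 / from).round(1)
end

pct_change(before[:ips], after[:ips])         # => 11.7
pct_change(before[:objects], after[:objects]) # => -16.1
pct_change(before[:memsize], after[:memsize]) # => -10.9
before[:strings] - after[:strings]            # => 2
```

These line up with the description: about 12% more iterations per second, about 16% fewer allocated objects, and two fewer strings per call.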

Additional information

N/A

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@joshuay03 self-assigned this Sep 9, 2025
@rails-bot added the actionpack label Sep 9, 2025
@joshuay03 moved this to In Progress / Pending Review in Open Source Sep 9, 2025
@skipkayhil (Member) commented:

Could the three branches be combined with something like this?

diff --git a/actionpack/lib/action_dispatch/http/url.rb b/actionpack/lib/action_dispatch/http/url.rb
index 7570a6a9af..12632736e8 100644
--- a/actionpack/lib/action_dispatch/http/url.rb
+++ b/actionpack/lib/action_dispatch/http/url.rb
@@ -202,15 +202,15 @@ def extract_subdomains_from(host, tld_length)

           def build_host_url(host, port, protocol, options, path)
             if match = host.match(HOST_REGEXP)
-              protocol ||= match[1] unless protocol == false
+              protocol_from_host = match[1] unless protocol == false
               host       = match[2]
               port       = match[3] unless options.key? :port
             end

-            protocol = normalize_protocol protocol
+            protocol = protocol_from_host || normalize_protocol(protocol).dup
             host     = normalize_host(host, options)

-            result = protocol.dup
+            result = protocol

             if options[:user] && options[:password]
               result << "#{Rack::Utils.escape(options[:user])}:#{Rack::Utils.escape(options[:password])}@"

@joshuay03 (Contributor, Author) commented Sep 9, 2025

> Could the three branches be combined with something like this?

This line, `protocol = protocol_from_host || normalize_protocol(protocol).dup`, prioritises host-extracted protocols, breaking the expected behavior where a truthy user-provided protocol should take precedence (`protocol ||= match[1] unless protocol == false`). We could replace it with this:

if protocol.nil?
  protocol = protocol_from_host || normalize_protocol(nil)  
else
  protocol = normalize_protocol(protocol).dup
end

but I think the current approach is more idiomatic.

@skipkayhil (Member) commented:

> where a truthy user-provided protocol should take precedence

Ah, right. I think the protocol_from_host condition could be inverted then?

diff --git a/actionpack/lib/action_dispatch/http/url.rb b/actionpack/lib/action_dispatch/http/url.rb
index 7570a6a9af..12632736e8 100644
--- a/actionpack/lib/action_dispatch/http/url.rb
+++ b/actionpack/lib/action_dispatch/http/url.rb
@@ -202,15 +202,15 @@ def extract_subdomains_from(host, tld_length)

           def build_host_url(host, port, protocol, options, path)
             if match = host.match(HOST_REGEXP)
-              protocol ||= match[1] unless protocol == false
+              protocol_from_host = match[1] if protocol.nil?
               host       = match[2]
               port       = match[3] unless options.key? :port
             end

-            protocol = normalize_protocol protocol
+            protocol = protocol_from_host || normalize_protocol(protocol).dup
             host     = normalize_host(host, options)

-            result = protocol.dup
+            result = protocol

             if options[:user] && options[:password]
               result << "#{Rack::Utils.escape(options[:user])}:#{Rack::Utils.escape(options[:password])}@"
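
The precedence concern can be sketched by modelling only the protocol-resolution step. The helper names (original, first_proposal, inverted) are hypothetical, normalize_protocol is a simplified stand-in whose nil branch ignores secure_protocol, and captured plays the role of match[1]. The .dup calls are left out because only the resulting values are compared.

```ruby
# Simplified stand-in for the private helper in ActionDispatch::Http::URL.
PROTOCOL_REGEXP = /^([^:]+)(:)?(\/\/)?$/

def normalize_protocol(protocol)
  case protocol
  when nil then "http://" # the real method consults secure_protocol
  when false then "//"
  when PROTOCOL_REGEXP then "#{$1}://"
  else raise ArgumentError, "Invalid :protocol option: #{protocol.inspect}"
  end
end

# Original: a truthy explicit protocol wins; false opts out of the host's one.
def original(protocol, captured)
  protocol ||= captured unless protocol == false
  normalize_protocol(protocol)
end

# First proposal: the host capture is consulted first, so it always wins.
def first_proposal(protocol, captured)
  protocol_from_host = captured unless protocol == false
  protocol_from_host || normalize_protocol(protocol)
end

# Inverted condition: the capture is only used when no protocol was given.
def inverted(protocol, captured)
  protocol_from_host = captured if protocol.nil?
  protocol_from_host || normalize_protocol(protocol)
end

cases = [[nil, "https://"], ["ftp", "https://"], [false, "https://"], [nil, nil]]
cases.each do |protocol, captured|
  puts [protocol.inspect, captured.inspect,
        original(protocol, captured), first_proposal(protocol, captured),
        inverted(protocol, captured)].join(" | ")
end
```

With an explicit protocol such as "ftp", the first proposal lets the host's "https://" win, while the inverted condition agrees with the original behavior in every case listed.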

@joshuay03 force-pushed the optimise-build-host-url-with-protocol-in-host branch 2 times, most recently from a22f19d to 45c7cf6 September 9, 2025 16:31
@joshuay03 (Contributor, Author) commented Sep 9, 2025

> Ah, right. I think the protocol_from_host condition could be inverted then?

I like it! I've actioned this. However, I still need the `result` dup branch, because the `protocol` variable is used later by `normalize_port()` while `result` is mutated with `result << host`. Without the dup, both variables point to the same object, so the mutation corrupts the `protocol` value that `normalize_port()` expects.
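
The aliasing problem described above can be reproduced in plain Ruby. This is a minimal sketch, not the Rails source: the helper names are hypothetical, appending "example.org" stands in for `result << host`, and the later `normalize_port()` read of `protocol` is not modelled.

```ruby
# Without dup: `result` and `protocol` name the same String object.
def build_without_dup(protocol)
  result = protocol
  result << "example.org" # mutates the shared object
  [protocol, result]
end

# With dup: `result` is an independent copy, so `protocol` is untouched.
def build_with_dup(protocol)
  result = protocol.dup
  result << "example.org"
  [protocol, result]
end

# Unary + yields an unfrozen string even under frozen_string_literal.
p build_without_dup(+"https://") # => ["https://example.org", "https://example.org"]
p build_with_dup(+"https://")    # => ["https://", "https://example.org"]
```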

I've also updated the benchmark results with the latest change, just in case. TL;DR: the improvement is about the same.

@joshuay03 force-pushed the optimise-build-host-url-with-protocol-in-host branch from 45c7cf6 to 1fcea88 September 9, 2025 16:49

Review comment on these lines:

             end

-            protocol = normalize_protocol protocol
+            protocol = protocol_from_host || normalize_protocol(protocol)

@p8 (Member) commented Sep 9, 2025

Previously this would always normalize protocol.
Does this skip normalizing protocol_from_host now when the regexp matches?

Edit: Never mind, that's the purpose of this whole PR 🙃

@joshuay03 force-pushed the optimise-build-host-url-with-protocol-in-host branch from 1fcea88 to f4794fd September 9, 2025 17:02
@skipkayhil force-pushed the optimise-build-host-url-with-protocol-in-host branch from f4794fd to 9a5215e September 9, 2025 23:49
@skipkayhil (Member) commented:

I added a test on main that fails with the current code (addc45c), and added a commit to fix it. Let me know if it looks good and I'll squash + merge 👍

@joshuay03 (Contributor, Author) commented:

> I added a test on main that fails with the current code (addc45c), and added a commit to fix it. Let me know if it looks good and I'll squash + merge 👍

LGTM, thank you! Please feel free to add yourself to the changelog entry when squashing! :shipit:

@joshuay03 force-pushed the optimise-build-host-url-with-protocol-in-host branch from 9a5215e to 7986bc3 September 10, 2025 07:49
…host

Co-authored-by: Hartley McGuire <skipkayhil@gmail.com>
@skipkayhil force-pushed the optimise-build-host-url-with-protocol-in-host branch from 7986bc3 to 36785ad September 10, 2025 21:42
@skipkayhil merged commit fb46d13 into main Sep 11, 2025
5 checks passed
@skipkayhil deleted the optimise-build-host-url-with-protocol-in-host branch September 11, 2025 03:02
@joshuay03 moved this from In Progress / Pending Review to Done in Open Source Sep 14, 2025
3 participants