Skip to content

Allocate less things #681

Merged
merged 5 commits into from Jul 24, 2014

4 participants

@srawlins

With the allocation_stats gem, I tracked down the points in the mail gem where the most allocations are made. I then tried to allocate less, using the following 4 strategies:

  • Either constantize or #freeze tiny Strings used in methods like #gsub and #pack.
  • Use the bang form of Enumerable methods where safe to do so, e.g. map! instead of map.
  • Replacing foo.each_slice(2) { |a,b| loops with until foo.empty? a=foo.shift; b=foo.shift

Freezing String literals should not hurt performance on Ruby < 2.1, but definitely improves performance in Ruby 2.1.x. Reducing object allocations results in applications with smaller memory footprints and less time spent in GC.

Before this pull request, running rspec spec/mail allocates 855k objects. This pull request reduces that count to 613k allocations.

I understand that some of these changes might upset the Rubyish feel to the library, and the elegance of the code. The maintainers might also have specific ways you want to constantize or freeze things like "\r\n" and "". Please leave comments regarding unwanted changes, and I can pull them out of this pull request.

Top allocation sites are listed here. I can give more details if needed.

@jeremy
Collaborator
jeremy commented Apr 1, 2014

Thanks for digging in to this!

.freeze works on 2.1 but no luck on older Ruby. How about using constants for these common-use strings?

Some of the other code changes do feel iffy. Seeing before/after object allocations for each kind of change would make it a lot easier to weigh their worth.

@srawlins
srawlins commented Apr 1, 2014

I 100% concur. I ran some benchmarks after this PR on pre 2.1: Constants are more performant and look nicer. My question is where to namespace all of the "\n" and "\r\n" constants, etc. lib/mail/patterns might not be a bad spot?

Re: "Seeing ... for each kind of change ...": Fair enough. I can break the savings down into categories. I'm also very willing to neuter this PR into a subset.

@nisanth074

Tiny change, for better readability

action_id, p = actions.shift(2)

Unfortunately, @nisanth074, this leads in the same unnecessary allocations as #each_slice(2). Both #each_slice(2) and #shift(2) generate a new 2-element Array, which is immediately dismantled at either do |action_id, p| or action_id, p =, respectively.

@jeremy
Collaborator
jeremy commented Apr 5, 2014

@srawlins :+1: good fit, though it'd make sense to rename to lib/mail/constants.rb

@srawlins
srawlins commented Apr 9, 2014

@jeremy , I've updated the patch moving most of the #freeze calls to be constants instead, located in lib/mail/constants.rb. I think it looks so much cleaner.

I also thought about the #each_slice(2) refactors, and decided that these move too far away from the Ruby-ish feel of the code. I don't think that a Ruby codebase should make concessions for a shortcoming of the interpreter. I'm trying to put together a patch for MRI that would allocate less during each loop of #each_slice and similar Enumerable methods. The #each_slice refactors amounted to 42k saved allocations, out of the total of 244k saved allocations. If you do want this back, I can open a separate PR.

Were there other offensive-looking refactors that you weren't big on? I can look into the allocation savings of each.

@jeremy jeremy and 1 other commented on an outdated diff Apr 10, 2014
lib/mail/constants.rb
@@ -33,5 +33,20 @@ module Patterns
ATOM_UNSAFE = /[#{Regexp.quote aspecial}#{control}#{sp}]/n
PHRASE_UNSAFE = /[#{Regexp.quote aspecial}#{control}]/n
TOKEN_UNSAFE = /[#{Regexp.quote tspecial}#{control}#{sp}]/n
+
+ EMPTY = ''
+ SPACE = ' '
+ UNDERSCORE = '_'
+ HYPHEN = '-'
+ COLON = ':'
+ ASTERISK = '*'
+ CR = "\r"
+ LF = "\n"
+ CR_ENCODED = "=0D"
+ LF_ENCODED = "=0A"
+ CAPITAL_M = 'M'
+ EQUAL_LF = "=\n"
+ CRLFx2 = "#{CRLF}#{CRLF}"
+ CRLFs_WITH_WSP = "#{CRLF}#{WSP}*#{CRLF}(?!#{WSP})"
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

Looks like this should be in the patterns area above, and be a regexp instead of a string

@srawlins
srawlins added a note Apr 10, 2014

Since these two are only used in Message#parse_message, and you recommend extracting out that whole Regexp, maybe I should not dismantle it in the first place, in message.rb:

HEADER_SEPARATOR = /#{CRLF}#{CRLF}|#{CRLF}#{WSP}*#{CRLF}(?!#{WSP})/m

def parse_message
  header_part, body_part = raw_source.lstrip.split(HEADER_SEPARATOR, 2)
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

:+1: sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jeremy jeremy commented on an outdated diff Apr 10, 2014
lib/mail/constants.rb
@@ -33,5 +33,20 @@ module Patterns
ATOM_UNSAFE = /[#{Regexp.quote aspecial}#{control}#{sp}]/n
PHRASE_UNSAFE = /[#{Regexp.quote aspecial}#{control}]/n
TOKEN_UNSAFE = /[#{Regexp.quote tspecial}#{control}#{sp}]/n
+
+ EMPTY = ''
+ SPACE = ' '
+ UNDERSCORE = '_'
+ HYPHEN = '-'
+ COLON = ':'
+ ASTERISK = '*'
+ CR = "\r"
+ LF = "\n"
+ CR_ENCODED = "=0D"
+ LF_ENCODED = "=0A"
+ CAPITAL_M = 'M'
+ EQUAL_LF = "=\n"
+ CRLFx2 = "#{CRLF}#{CRLF}"
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

If this is only used for matches, it'd be clearer as a Regexp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jeremy jeremy commented on an outdated diff Apr 10, 2014
lib/mail/constants.rb
@@ -15,7 +15,7 @@ module Patterns
control = control.force_encoding(Encoding::BINARY)
end
- CRLF = /\r\n/
+ CRLF = "\r\n"
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

This is only used for matching, so should stay a Regexp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jeremy jeremy commented on the diff Apr 10, 2014
lib/mail/core_extensions/string.rb
end
def to_lf
- to_str.gsub(/\r\n|\r/, "\n")
+ to_str.gsub(/\r\n|\r/, LF)
end
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

Could reference Mail::Constants::CRLF and LF directly

@srawlins
srawlins added a note Apr 11, 2014

OK now Mail::Constants::CRLF is back to a Regexp, so that won't work. Mail::Constants::LF is a String. Should I kill the LF = in this file? or keep them both?

@jeremy
Collaborator
jeremy added a note Apr 11, 2014

Scoping it to this file seems fine since it's a separate core extension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jeremy jeremy commented on the diff Apr 10, 2014
lib/mail/elements/address.rb
@@ -192,7 +192,7 @@ def parse(value = nil)
def strip_all_comments(string)
unless comments.blank?
comments.each do |comment|
- string = string.gsub("(#{comment})", '')
+ string = string.gsub("(#{comment})", EMPTY)
end
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

This whole thing could be one gsub eh?

@srawlins
srawlins added a note Apr 10, 2014

You mean something like:

all_comments_regex = /#{comments.map {|c| "(#{c})" }.join("|")}/
string = string.gsub(all_comments_regex, EMPTY)

right? This feels pretty weird, and also requires firing up the regex engine. gsub("(#{comment})", EMPTY) does not require firing up the regex engine as of Bug 9680 (Ruby 2.2 I guess...) which is a nice performance boost. What do you think?

@jeremy
Collaborator
jeremy added a note Apr 10, 2014

I was thinking

string.gsub!(/\(([^\)]+)\)/) { |match| comments.include?($1) ? EMPTY : match }

but when I write it out, it looks obviously worse :grin:

@jeremy
Collaborator
jeremy added a note Apr 10, 2014

Nice Ruby patch btw!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jeremy jeremy commented on an outdated diff Apr 10, 2014
lib/mail/mail.rb
@@ -235,7 +235,7 @@ def self.inform_interceptors(mail)
def self.random_tag
t = Time.now
- sprintf('%x%x_%x%x%d%x',
+ sprintf('%x%x_%x%x%d%x'.freeze,
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

could put this string in a RANDOM_TAG='%x%x_%x%x%d%x' just above the method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jeremy jeremy commented on an outdated diff Apr 10, 2014
lib/mail/message.rb
@@ -1969,7 +1969,7 @@ def text?
# Additionally, I allow for the case where someone might have put whitespace
# on the "gap line"
def parse_message
- header_part, body_part = raw_source.lstrip.split(/#{CRLF}#{CRLF}|#{CRLF}#{WSP}*#{CRLF}(?!#{WSP})/m, 2)
+ header_part, body_part = raw_source.lstrip.split(/#{CRLFx2}|#{CRLFs_WITH_WSP}/m, 2)
@jeremy
Collaborator
jeremy added a note Apr 10, 2014

Move the whole regexp to a constant?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jeremy
Collaborator
jeremy commented Apr 10, 2014

What's the savings of doing separate downcase! vs just downcase? That's the only disruptive feeling change here.

Worth pursuing further wins with enumerable methods too!

Appreciate the deep dig on this :grin:

@srawlins srawlins Addressing @jeremy's comments; this is beginning to look very nice 3d98d90
@srawlins

FYI: regarding each_slice allocating an Array each loop, this is fixed in Ruby trunk. ruby/ruby@2067516

@jeremy jeremy merged commit 3d98d90 into mikel:master Jul 24, 2014

1 check passed

Details continuous-integration/travis-ci The Travis CI build passed
@jeremy
Collaborator
jeremy commented Jul 24, 2014

:heart: Thanks again @srawlins !

@bf4
Collaborator
bf4 commented Aug 6, 2014

Awesome. I learned about this MRI optimization in arel where tenderlove described it at 'roflscaling'. https://github.com/rails/arel/blob/5-0-stable/lib/arel/visitors/to_sql.rb#L8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.