Improve performance of collate & reduce memory consumption
We used to read all files into memory, which becomes too much if
someone runs a massively parallel CI: 400 * 10MB is still ~4GB,
and that's just raw file size; we do a lot more with the data.

This breaks the interface of ResultMerger.merge_and_store, but it's
not intended as a public interface. Will leave a note in the Changelog
anyhow.

What is being attempted at the top level is perhaps easier to see when
looking at the spike code: https://github.com/simplecov-ruby/simplecov/compare/collate-plus-plus?expand=1

The changes go further than just not reading all files in at once:
during the merge process we also operate on the raw file structure
as opposed to creating SimpleCov::Result. Creating a SimpleCov::Result
comes with a lot of overhead, notably reading in all source files, so
that's even worse when done ~400 times in a large code base.

There's more optimization potential for cases like these, which
I'll open a ticket about, but notably:
* potentially don't create a SimpleCov::Result at all until we actually
  produce results (just dump the raw coverage, more or less)
* allow running without a formatter, as only the last run really
  needs the formatter
PragTob committed Jan 3, 2021
1 parent 0fe63fd commit ed03db5
Showing 11 changed files with 166 additions and 124 deletions.
1 change: 1 addition & 0 deletions .rubocop.yml
@@ -125,6 +125,7 @@ Metrics/MethodLength:

Metrics/ModuleLength:
Description: Avoid modules longer than 100 lines of code.
Max: 300
Exclude:
- "lib/simplecov.rb"

10 changes: 3 additions & 7 deletions lib/simplecov.rb
@@ -81,17 +81,13 @@ def start(profile = nil, &block)
# information about coverage collation
#
def collate(result_filenames, profile = nil, &block)
raise "There's no reports to be merged" if result_filenames.empty?
raise "There are no reports to be merged" if result_filenames.empty?

initial_setup(profile, &block)

results = result_filenames.flat_map do |filename|
# Re-create each included instance of SimpleCov::Result from the stored run data.
Result.from_hash(JSON.parse(File.read(filename)) || {})
end

# Use the ResultMerger to produce a single, merged result, ready to use.
@result = ResultMerger.merge_and_store(*results)
# TODO: Did/does collate ignore old results? It probably shouldn't, right?
@result = ResultMerger.merge_and_store(*result_filenames)

run_exit_tasks!
end
2 changes: 1 addition & 1 deletion lib/simplecov/configuration.rb
@@ -10,7 +10,7 @@ module SimpleCov
# defined here are usable from SimpleCov directly. Please check out
# SimpleCov documentation for further info.
#
module Configuration # rubocop:disable Metrics/ModuleLength
module Configuration
attr_writer :filters, :groups, :formatter, :print_error_status

#
31 changes: 1 addition & 30 deletions lib/simplecov/result.rb
@@ -26,7 +26,7 @@ class Result
# Initialize a new SimpleCov::Result from given Coverage.result (a Hash of filenames each containing an array of
# coverage data)
def initialize(original_result, command_name: nil, created_at: nil)
result = adapt_result(original_result)
result = original_result
@original_result = result.freeze
@command_name = command_name
@created_at = created_at
@@ -72,10 +72,6 @@ def to_hash
}
end

def time_since_creation
Time.now - created_at
end

# Loads a SimpleCov::Result#to_hash dump
def self.from_hash(hash)
hash.map do |command_name, data|
@@ -85,31 +85,6 @@ def self.from_hash(hash)

private

# We changed the format of the raw result data in simplecov. People are likely
# to have "old" resultsets lying around (not too old, so they're still
# considered for merging) which we can adapt.
# See https://github.com/simplecov-ruby/simplecov/pull/824#issuecomment-576049747
def adapt_result(result)
if pre_simplecov_0_18_result?(result)
adapt_pre_simplecov_0_18_result(result)
else
result
end
end

# pre 0.18 coverage data pointed from file directly to an array of line coverage
def pre_simplecov_0_18_result?(result)
_key, data = result.first

data.is_a?(Array)
end

def adapt_pre_simplecov_0_18_result(result)
result.transform_values do |line_coverage_data|
{"lines" => line_coverage_data}
end
end

def coverage
keys = original_result.keys & filenames
Hash[keys.zip(original_result.values_at(*keys))]
153 changes: 101 additions & 52 deletions lib/simplecov/result_merger.rb
@@ -19,81 +19,110 @@ def resultset_writelock
File.join(SimpleCov.coverage_path, ".resultset.json.lock")
end

# Loads the cached resultset from JSON and returns it as a Hash,
# caching it for subsequent accesses.
def resultset
@resultset ||= begin
data = stored_data
if data
begin
JSON.parse(data) || {}
rescue StandardError
{}
end
else
{}
end
def merge_and_store(*file_paths)
result = merge_results(*file_paths)
store_result(result) if result
result
end

def merge_results(*file_paths)
# It is intentional here that files are only read in and parsed one at a time.
#
# In big CI setups you might deal with 100s of CI jobs and each one producing Megabytes
# of data. Reading them all in easily produces Gigabytes of memory consumption which
# we want to avoid.
#
# For similar reasons a SimpleCov::Result is only created in the end as that'd create
# even more data especially when it also reads in all source files.
initial_memo = valid_results(file_paths.shift)

command_names, coverage = file_paths.reduce(initial_memo) do |memo, file_path|
merge_coverage(memo, valid_results(file_path))
end

SimpleCov::Result.new(coverage, command_name: Array(command_names).sort.join(", "))
end

# Returns the contents of the resultset cache as a string or if the file is missing or empty nil
def stored_data
synchronize_resultset do
return unless File.exist?(resultset_path)
def valid_results(file_path)
parsed = parse_file(file_path)
valid_results = parsed.select { |_command_name, data| within_merge_timeout?(data) }
command_plus_coverage = valid_results.map { |command_name, data| [[command_name], adapt_result(data.fetch("coverage"))] }

# one file itself _might_ include multiple test runs
merge_coverage(*command_plus_coverage)
end

data = File.read(resultset_path)
return if data.nil? || data.length < 2
def parse_file(path)
data = read_file(path)
parse_json(data)
end

data
end
def read_file(path)
return unless File.exist?(path)

data = File.read(path)
return if data.nil? || data.length < 2

data
end

# Gets the resultset hash and re-creates all included instances
# of SimpleCov::Result from that.
# All results that are above the SimpleCov.merge_timeout will be
# dropped. Returns an array of SimpleCov::Result items.
def results
results = Result.from_hash(resultset)
results.select { |result| result.time_since_creation < SimpleCov.merge_timeout }
def parse_json(content)
return {} unless content

JSON.parse(content) || {}
rescue StandardError
warn "[SimpleCov]: Warning! Parsing JSON content of resultset file failed"
{}
end

def merge_and_store(*results)
result = merge_results(*results)
store_result(result) if result
result
def within_merge_timeout?(data)
time_since_result_creation(data) < SimpleCov.merge_timeout
end

# Merge two or more SimpleCov::Results into a new one with merged
# coverage data and the command_name for the result consisting of a join
# on all source result's names
def merge_results(*results)
parsed_results = JSON.parse(JSON.dump(results.map(&:original_result)))
combined_result = SimpleCov::Combine::ResultsCombiner.combine(*parsed_results)
result = SimpleCov::Result.new(combined_result)
# Specify the command name
result.command_name = results.map(&:command_name).sort.join(", ")
result
def time_since_result_creation(data)
Time.now - Time.at(data.fetch("timestamp"))
end

def merge_coverage(*results)
return results.first if results.size == 1

results.reduce do |(memo_command, memo_coverage), (command, coverage)|
# timestamp is dropped here, which is intentional
merged_coverage = SimpleCov::Combine::ResultsCombiner.combine(memo_coverage, coverage)
merged_command = memo_command + command

[merged_command, merged_coverage]
end
end

#
# Gets all SimpleCov::Results from cache, merges them and produces a new
# Gets all SimpleCov::Results stored in resultset, merges them and produces a new
# SimpleCov::Result with merged coverage data and the command_name
# for the result consisting of a join on all source result's names
#
# TODO: Maybe put synchronization just around the reading?
def merged_result
merge_results(*results)
synchronize_resultset do
merge_results(resultset_path)
end
end

def read_resultset
synchronize_resultset do
parse_file(resultset_path)
end
end

# Saves the given SimpleCov::Result in the resultset cache
def store_result(result)
synchronize_resultset do
# Ensure we have the latest, in case it was already cached
clear_resultset
new_set = resultset
new_resultset = read_resultset
# FIXME
command_name, data = result.to_hash.first
new_set[command_name] = data
new_resultset[command_name] = data
File.open(resultset_path, "w+") do |f_|
f_.puts JSON.pretty_generate(new_set)
f_.puts JSON.pretty_generate(new_resultset)
end
end
true
@@ -116,9 +145,29 @@ def synchronize_resultset
end
end

# Clear out the previously cached .resultset
def clear_resultset
@resultset = nil
# We changed the format of the raw result data in simplecov. People are likely
# to have "old" resultsets lying around (not too old, so they're still
# considered for merging) which we can adapt.
# See https://github.com/simplecov-ruby/simplecov/pull/824#issuecomment-576049747
def adapt_result(result)
if pre_simplecov_0_18_result?(result)
adapt_pre_simplecov_0_18_result(result)
else
result
end
end

# pre 0.18 coverage data pointed from file directly to an array of line coverage
def pre_simplecov_0_18_result?(result)
_key, data = result.first

data.is_a?(Array)
end

def adapt_pre_simplecov_0_18_result(result)
result.transform_values do |line_coverage_data|
{"lines" => line_coverage_data}
end
end
end
end
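Taken together, the `within_merge_timeout?` check and the relocated pre-0.18 adaptation in the diff above decide whether a stored result is still usable. A standalone sketch of that logic (the module name `ResultCheck` and the hard-coded 600-second `MERGE_TIMEOUT` are assumptions for this example; the real code lives on `SimpleCov::ResultMerger` and reads `SimpleCov.merge_timeout`):

```ruby
# Sketch of the two "is this stored result still usable?" helpers.
# ResultCheck and MERGE_TIMEOUT are names made up for this example.
module ResultCheck
  MERGE_TIMEOUT = 600 # seconds; stands in for SimpleCov.merge_timeout

  module_function

  # Drop results whose recorded timestamp is older than the merge timeout.
  def within_merge_timeout?(data, now: Time.now)
    (now - Time.at(data.fetch("timestamp"))) < MERGE_TIMEOUT
  end

  # pre-0.18 coverage data pointed from a file directly to an array of
  # line coverage instead of the modern {"lines" => [...]} shape
  def pre_simplecov_0_18_result?(result)
    _key, data = result.first
    data.is_a?(Array)
  end

  def adapt_result(result)
    pre_simplecov_0_18_result?(result) ? adapt_pre_simplecov_0_18_result(result) : result
  end

  # Wrap the bare line array in the modern {"lines" => [...]} shape.
  def adapt_pre_simplecov_0_18_result(result)
    result.transform_values { |line_coverage_data| {"lines" => line_coverage_data} }
  end
end
```

Doing the adaptation per file during the merge, rather than inside `SimpleCov::Result#initialize`, is what lets the merger work on raw hashes without instantiating a `Result` per input file.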
3 changes: 3 additions & 0 deletions spec/fixtures/conditionally_loaded_1.rb
@@ -0,0 +1,3 @@
# some comment
puts "wargh"
puts "wargh 1"
3 changes: 3 additions & 0 deletions spec/fixtures/conditionally_loaded_2.rb
@@ -0,0 +1,3 @@
# some comment
puts "wargh"
puts "wargh 2"
4 changes: 4 additions & 0 deletions spec/fixtures/parallel_tests.rb
@@ -0,0 +1,4 @@
# foo
puts "foo"
# bar
puts "bar"
