Use raw time string from DB to generate ActiveRecord#cache_version #33835

Merged
merged 3 commits into rails:master on Nov 28, 2018

9 participants
@schneems
Member

schneems commented Sep 10, 2018

Currently, the updated_at field is used to generate a cache_version. Some database adapters return this timestamp value as a string that must then be converted to a Time value. This process requires a lot of memory and even more CPU time. In the case where this value is only being used for a cache version, we can skip the Time conversion by using the string value directly.

Details

  • This PR preserves existing cache format by converting a UTC string from the database to :usec format.
  • Some databases return an already converted Time object; in those instances, we can directly use created_at.
  • The updated_at_before_type_cast can be a value that comes from either the database or the user. We only want to optimize the case where it is from the database.
  • If the format of the cache version has been changed, we cannot apply this optimization, and it is skipped.
  • If the format of the time in the database is not UTC, then we cannot use this optimization, and it is skipped.

Some databases (notably PostgreSQL) return a variable-length nanosecond value in the time string. If the value ends in zeros, they are truncated. For instance, instead of 2018-10-12 05:00:00.000000 the value 2018-10-12 05:00:00 is returned. We detect this case and pad the missing zeros to ensure consistent cache version generation.
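A minimal plain-Ruby sketch of that transformation (the helper name is hypothetical, not the PR's actual code): strip the punctuation from the raw UTC string to get the 20-digit :usec-style key, and right-pad when the database truncated trailing zeros.

```ruby
# Hypothetical helper, not the PR's actual code: turn a raw UTC timestamp
# string from the database into the 20-digit :usec cache_version format.
def db_time_to_usec_key(timestamp)
  # "2018-10-12 05:00:00.123456" -> "20181012050000123456"
  key = timestamp.delete("- :.")
  # PostgreSQL may truncate trailing zeros in the fractional part,
  # so right-pad back to the full 20 digits for a stable cache key.
  key.ljust(20, "0")
end

puts db_time_to_usec_key("2018-10-12 05:00:00.123456") # => "20181012050000123456"
puts db_time_to_usec_key("2018-10-12 05:00:00")        # => "20181012050000000000"
```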

Performance

Memory

Before: Total allocated: 743842 bytes (6626 objects)
After: Total allocated: 702955 bytes (6063 objects)

(743842 - 702955) / 743842.0 # => 5.4% ⚡️⚡️⚡️⚡️⚡️

Speed

Using the CodeTriage application and derailed benchmarks this PR shows between 9-11% (statistically significant) performance improvement versus the commit before it.

Thanks

Special thanks to @lsylvester for helping to figure out a way to preserve the usec format and for helping with many implementation details.

@schneems

Member

schneems commented Sep 10, 2018

I think we're in the clear since all the data from usec has to come from the raw string in the first place. I'm happy to dig in more with other verification methods if you can think of any. cc/ @matthewd

@sgrif

Member

sgrif commented Sep 10, 2018

Does this change result in model.updated_at = model.updated_at changing the cache key? Do we care if so?

@schneems

Member

schneems commented Sep 10, 2018

Does this change result in model.updated_at = model.updated_at changing the cache key? Do we care if so?

Interesting. I didn't think about assignment changing the cache key. I think if you have model.updated_at = Time.now then the cache key will change based on current behavior, but not with this patch. For model.updated_at = model.updated_at I don't think the cache key will change, since it will still use the string before type cast, so modifying updated_at shouldn't affect anything.

@schneems schneems force-pushed the schneems:schneems/faster_cache_version branch 2 times, most recently from 77e3495 to d87e5ba Sep 10, 2018

@matthewd

Member

matthewd commented Sep 10, 2018

ISTM the greater type-mismatch concern would be that the cache key must match between a freshly-written object (with only a "real" Time) and that same object when later loaded from the database.

5% is pretty huge, though: I see no reason that shouldn't be easily demonstrable on a real request-sized benchmark, without needing to lean on the [still undemonstrated to me] "memory not allocated = time saved" theory.

@schneems

Member

schneems commented Sep 10, 2018

without needing to lean on the [still undemonstrated to me] "memory not allocated = time saved" theory.

From my earlier PR we see a 20% reduction in objects and a 20% reduction in page load time.

I ran the numbers for this PR and I did find some relatively surprising results running the page speed for this:

Before: 13.560740 0.962090 14.555312 ( 30.606835)
After: 11.225604 0.900667 12.159477 ( 25.296171)
Diff: (30.606835 - 25.296171) / 30.606835 # => 17.35 % faster

Which makes sense, since the parsing and generating of time objects is CPU intensive (in addition to being memory intensive). Also, that time parsing had already been memory-optimized by some earlier PRs of mine.

Note that I'm not advocating for "memory not allocated = time saved"; I'm advocating for "memory not allocated ~ time saved", as it's a better proxy for repeatability. I trust my memory numbers; they're consistent.

For raw speed numbers to be truly accurate, I would need to reproduce the above test dozens or hundreds of times, calculate the variance, and use something like a Student's t-test to actually have faith in the number. Each of those dozens of runs (per case) takes 30 seconds, and then I have to repeat them for the hypothesis and the null-hypothesis case. Multiply that by the changes that (usually) get requested for a PR, and that pattern is not very sustainable.
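For illustration only (all numbers below are invented, not the benchmark data), the check described here amounts to computing a two-sample t statistic over repeated timings:

```ruby
# Toy Welch's t statistic over two sets of benchmark timings.
# All numbers are made up to illustrate the procedure, not real results.
def mean(xs)
  xs.sum.to_f / xs.size
end

def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1).to_f
end

def t_statistic(a, b)
  # A large |t| means the observed difference is unlikely to be noise.
  (mean(a) - mean(b)) / Math.sqrt(variance(a) / a.size + variance(b) / b.size)
end

before = [30.6, 30.9, 30.4, 30.7] # seconds per run, invented
after  = [25.3, 25.6, 25.1, 25.4]
puts t_statistic(before, after) # ~36: far outside the noise
```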

@brodock

brodock commented Sep 10, 2018

That gives me one idea: do you think it would be possible to typecast model attributes when they are used for the first time, instead of typecasting in advance as I believe happens today?

@schneems

Member

schneems commented Sep 10, 2018

@brodock that is the current behavior. It’s done on demand which is why this PR works by not using that field if it’s not needed.

    return unless cache_versioning
    if timestamp = try(:updated_at_before_type_cast)
      timestamp

@rafaelfranca

rafaelfranca Sep 10, 2018

Member

Will this change the format of the cache key and thus invalidate all the cache entries?

@schneems

schneems Sep 10, 2018

Member

Yes, unfortunately. I originally thought this change was only in Rails 6, but it was introduced in 5.2.1 beta via the commit I referenced earlier.

Is there a way to check to see if the attribute is loaded without loading it? If so we can continue to use the updated_at method with no additional performance penalty.

We could introduce a slow rollout mechanism for cache keys, i.e. so only x% is invalidated at a time rather than all of them at once. Any other ideas?

@sgrif

sgrif Sep 10, 2018

Member

I don't think we've exposed a public API for checking individual attributes, but @attributes["updated_at"].has_been_read? would do it. I don't see how that addresses the problem of invalidating existing cache keys, though. The common case is that cache is at the top of the file, so the attribute would not have been read yet (this assumes it's ever read at all, which at least in CodeTriage it appears never to be, since if it were you'd see no perf difference).

@schneems

schneems Sep 11, 2018

Member

I guess having different behavior based on the status of the object would lead to cache bloat, since the key would sometimes take different values. So that's not a great idea. I'm mostly trying to brainstorm ways to minimize the cache invalidation issue. Having talked this one through with you, I don't think it's a good path forward.

@sgrif

sgrif Sep 11, 2018

Member

I think to go forward with this change we need to accept that it will invalidate existing caches.

@sgrif

Member

sgrif commented Sep 10, 2018

don't think the cache key will change since it will still use the string before typecast

That's not what "before type cast" means. If you do model.updated_at = model.updated_at, that will cause "before type cast" and "after type cast" to be the same thing (for most types they will be exactly the same object, but this isn't guaranteed).

Folks tend to assume that _before_type_cast either means "the database representation" or "what the user gave us", when it's not guaranteed to mean either of these things (though it is going to mean one or the other)

@rafaelfranca

Member

rafaelfranca commented Sep 10, 2018

I have to ask, but are you benchmarking on macOS? And with Puma in a forked process?

@schneems

Member

schneems commented Sep 10, 2018

I have to ask, but are you benchmarking on macOS? And with Puma in a forked process?

Yes, I'm on a Mac. I'm using derailed_benchmarks without a server, so I'm using the raw Rack interface directly; essentially I'm manually .call-ing the app. https://github.com/schneems/derailed_benchmarks/blob/cd6befa4aa7d7d10e6fbbd126ff60fb4ec8cf2f2/lib/derailed_benchmarks/tasks.rb#L104-L108

You can try the benchmark yourself; here is my branch of CodeTriage for Rails 6 that I've been perf testing against: https://github.com/codetriage/codetriage/tree/schneems/6.ohhh

First clone it, then make sure you're pointing at a relative copy of rails in the Gemfile. Then:

export RAILS_LOG_TO_STDOUT=1
export RAILS_SERVE_STATIC_FILES=1
export SECRET_KEY_BASE=foobarlol
export DEVISE_SECRET_KEY=1
RAILS_ENV=production bundle exec rake db:create:all db:migrate db:seed

Now you can run perf tests

bundle exec derailed exec perf:test

I git checkout this branch and run the perf tests, then check out the commit right before my commit and re-run.

Also, clear your screen between runs, as all logs go to stdout and your terminal buffer might have an impact. Also close EVERY app on your machine. Ideally turn off the WiFi and walk away with nothing running other than this script.

The number I gave earlier (17%) isn't statistically valid; it's from two runs and is likely skewing to both sides of a natural distribution for both the before and after numbers. If you're interested in an actual number I can stand behind, then I'll need some more time.

@dhh

Member

dhh commented Sep 10, 2018

@rafaelfranca

Member

rafaelfranca commented Sep 10, 2018

Ok, if you are on macOS there is a chance that https://bugs.ruby-lang.org/issues/14435 is at play here, but given it is not forking I find it unlikely. I'd still try to see if 2.6.0 trunk makes a difference here.

@sgrif

Member

sgrif commented Sep 10, 2018

An alternative to trunk would be to use something like ApacheBench, which is probably the most appropriate tool for a full-stack benchmark. Even the 5% memory savings is a bit surprising to me. Is this hitting an action using an array of hundreds of records for its cache key? There just shouldn't be enough casts happening here to really add up.

@dhh Since this is targeting master, the cache key invalidation would be happening in 6.0. Is it acceptable to do that in a major version in your opinion? The other big concern is that this will cause any patch which changes the database representation of the timestamp (such as switching to binary representation on PG which I've been wanting to do) to also invalidate all existing cache keys.

@dhh

Member

dhh commented Sep 10, 2018

@schneems

Member

schneems commented Sep 10, 2018

On CodeTriage it's hitting the main page, which has, I think, 50 cached fragments, each backed by an AR object. The allocations are happening mostly in time_value.rb; you can see it come up several times in my "top" output. The /type/helpers/time_value.rb:72 is my 6th largest source of memory allocations by bytes.

# bytes allocated | location of allocation
allocated memory by location
-----------------------------------
    122491  .../rails/activesupport/lib/active_support/core_ext/string/output_safety.rb:172
     49288  .../codetriage/app/views/layouts/application.html.slim:1
     49248  .../codetriage/app/views/layouts/_app.html.slim:1
     24904  .../codetriage/app/views/pages/_repos_with_pagination.html.slim:1
     18984  .../ems/rack-mini-profiler-1.0.0/lib/patches/db/pg.rb:92
     17343  .../rails/activemodel/lib/active_model/type/helpers/time_value.rb:72 # <===================
     17056  .../rails/actionview/lib/action_view/digestor.rb:23
     14920  .../ems/rack-mini-profiler-1.0.0/lib/patches/db/pg.rb:9
     14720  .../rails/activesupport/lib/active_support/core_ext/object/to_query.rb:43
     14400  .../rails/activerecord/lib/active_record/result.rb:150
     12000  .../rails/activemodel/lib/active_model/type/helpers/time_value.rb:80 # <===================
     10944  .../rails/activesupport/lib/active_support/cache.rb:599
      9600  .../rails/activemodel/lib/active_model/attribute_set/builder.rb:25
      8815  .../ems/rack-mini-profiler-1.0.0/lib/mini_profiler/storage/file_store.rb:27
      8000  .../rails/activemodel/lib/active_model/attribute.rb:9
      8000  .../rails/activerecord/lib/active_record/attribute_methods.rb:281
      7904  .../rails/actionpack/lib/abstract_controller/caching/fragments.rb:92
      6656  .../rails/actionview/lib/action_view/helpers/cache_helper.rb:228
      5600  .../rails/activerecord/lib/active_record/persistence.rb:81
      4558  .../rails/activesupport/lib/active_support/cache/memory_store.rb:123
      4400  .../rails/activemodel/lib/active_model/attribute_set/builder.rb:16
      4320  .../rails/actionpack/lib/action_dispatch/routing/route_set.rb:199
      4300  .../rails/activemodel/lib/active_model/type/helpers/time_value.rb:64 # <===================
@dhh

Member

dhh commented Sep 10, 2018

@sgrif

Member

sgrif commented Sep 10, 2018

Interesting. It's definitely an allocation-hungry cast, but it's shocking to me that it could end up being that large a portion of allocation. I would be incredibly surprised if this had a measurable impact on actual response time, though, and would like to see some confirmation of results if we're making the change on those grounds. Even if it doesn't affect response time, 5% in overall allocated memory per request is worth it on its own, as long as we acknowledge that this only affects applications which:

  1. Are using a large number of Active Record objects as cache keys
  2. Are not otherwise accessing updated_at (which they wouldn't for cache hits -- I suspect this really only affects warm cache cases on index pages, where you check dozens of records individually even on a warm cache)

With all of that said, if we're fine with invalidating existing caches in 6.0, I think I'm ok with this change.

@lsylvester

Contributor

lsylvester commented Sep 10, 2018

@schneems I assume that you are running this using PostgreSQL, which retrieves the values from the database as strings. For other adapters like Mysql2, the values are received as Time objects. There would still be savings in not having to convert to ActiveSupport::TimeWithZone, but it would need explicit formatting to include units more precise than seconds.

@schneems

Member

schneems commented Sep 11, 2018

Yes, this is with Postgres. I believe the PG gem is now capable of providing type-cast values; perhaps an alternative could be to use the pg type cast for this one attribute for now. I'm not sure about the performance of that time casting versus the one directly in Rails, but presumably it's faster and uses less memory. If we go that route then we could perhaps get all the savings without needing to bust the cache key. How hard would that be @sgrif?

@schneems

Member

schneems commented Sep 11, 2018

Okay I have some statistically significant results for actual performance numbers. Here's the raw output https://gist.github.com/schneems/0277b3d42b6414dbbe69188f2ca46408

On average the delta between my patch and master (the commit right before my change) is 11.50%.

The standard deviations for both sets are well under the average delta, and when I applied a Student's t-test for statistical validity I got a number so small it's not showing as anything other than 0 (this is really good).

I am EXTREMELY confident in saying that this improvement comes from my patch. Note that the exact number may vary based on what else the computer is doing and the alignment of the sun and moon. I can say with confidence that I've seen an 11.5% performance gain with this patch, but I cannot claim with certainty that is the exact improvement it provides (i.e. YMMV, but it's 100% for sure going to be faster).

You can reproduce the results on my schneems/6.ohhh branch that I linked to earlier by running this command.

$ BEFORE_BRANCH=schneems/faster_cache_version AFTER_BRANCH=dd29fabebf1b702031213361b7757be5892cd59f bundle exec rake compare_branches

It assumes that the directory codetriage is in is the same directory that your rails folder is in. It also assumes that you have my patch in the schneems/faster_cache_version branch.

@schneems schneems changed the title from "Reduce (a lot of) allocations in ActiveRecord#cache_key" to "5% memory reduction 11% reduction in request time. Reduce allocations in ActiveRecord#cache_key" Sep 11, 2018

@sgrif

Member

sgrif commented Sep 11, 2018

@schneems The PG gem's behavior doesn't match Rails unfortunately, but date parsing is going to be slow no matter who's doing it. The best source of a performance win would be to use binary encoding, which is trivial for us to parse (but that's a much larger change that isn't feasible right now)

@schneems

Member

schneems commented Sep 11, 2018

I tested this against mysql by changing my adapter to mysql2 and re-running my memory benchmarks. This is what I see:

Before: Total allocated: 741754 bytes (6977 objects)
After: Total allocated: 716055 bytes (6624 objects)
Diff: (741754 - 716055) / 741754.0 # 3.46 % fewer bytes allocated

While it's not as dramatic as the postgres change it is less memory allocation and a 3.46% diff is still pretty good. You can see my app I used in the branch: https://github.com/codetriage/codetriage/tree/schneems/6.ohhh-mysql

Sounds like we're mostly back to square one. This patch will make things faster for sure but will also invalidate caches.

Do we pull this trigger or is there something else we want to investigate first?

@sgrif

Member

sgrif commented Sep 11, 2018

I'd still like to address how we prevent caches from getting invalidated in minor version bumps in the future. This couples cache keys to the database representation of datetime columns, so we will need to be sure that we never change that in anything other than a major version going forward.

@lsylvester

Contributor

lsylvester commented Sep 11, 2018

It should be possible to manually format the raw strings from the database into the format required for the cache_version without much overhead or object allocation, but it might need to be database-specific, and we would need to know that *_before_type_cast was set from the database. For pg, it would just be updated_at_before_type_cast.delete('- :.'), which would only allocate a single string.

@sgrif

Member

sgrif commented Sep 12, 2018

@lsylvester And then when we switch to binary format for dates?

@matthewd

Member

matthewd commented Sep 12, 2018

Invalidating caches in a minor release is as acceptable as in a major: in both cases, it's ideally avoided, but okay if justified. (And it seems like we end up invalidating at least a good chunk for each new release series lately.)

So while it should be a conscious decision, I don't think that's a showstopper. My remaining concern is around the fact that _before_type_cast contains the value.. before type casting, and that that's not necessarily going to match the form in which it would come out of the database. @sgrif caches aside, would something like value_for_database manage to give us the existing/original DB string if available, but also have the best chance of accurately recreating it if needed?

    return unless cache_versioning
    timestamp = try(:updated_at_before_type_cast)
    if timestamp && timestamp.is_a?(String) && cache_timestamp_format == :db_usec

@lsylvester

lsylvester Oct 8, 2018

Contributor

I am not sure that checking if the timestamp is a String is sufficient - if a user set it to a String that wasn't in the desired format then the cache_key wouldn't match before and after save.

In addition, database adapters that perform the type casting would no longer get any benefit from this, as the value will be a Time that gets cast to ActiveSupport::TimeWithZone and back as part of the formatting of the time.

I think that it is worth checking if the value is from the database as part of this optimisation.

@schneems

Member

schneems commented Oct 9, 2018

Any idea how to check if it came from the DB or not?

@lsylvester

Contributor

lsylvester commented Oct 9, 2018

As per #33835 (comment) I think that !updated_at_came_from_user? is pretty close. The only issue is that if the value was assigned via multiparameter assignment then it would get past that check (see 1519e97 for reasoning) and updated_at_before_type_cast will return a Hash.

So I think that using the current attribute apis we have

!updated_at_came_from_user? && !updated_at_before_type_cast.is_a?(Hash)

I think that adding a from_database? method to ActiveModel::Attribute/ActiveModel::Attribute::FromDatabase and exposing attribute_came_from_database? methods would be worthwhile.

@schneems

Member

schneems commented Oct 10, 2018

Thanks a ton, I somehow glossed over that reply. I added behavior to check whether it's from the user or not, and I was already checking the type of the object.

I added some tests to ensure that the "fast" version only triggers when the value is from the database as opposed to the user. I'm not sure how to trigger the code path that leaves it as a hash. It looks like the mysql2 adapter already returns a Time object, so this behavior is effectively skipped.

@lsylvester

Contributor

lsylvester commented Oct 11, 2018

You can set timestamp columns with a hash like:

    dev.updated_at = {1 => 2016, 2 => 11, 3 => 12, 4 => 1, 5 => 2, 6 => 3, 7 => 22}
    dev.cache_version.to_param      #=> "20161112010203000022"
    dev.updated_at_before_type_cast #=> {1 => 2016, 2 => 11, 3 => 12, 4 => 1, 5 => 2, 6 => 3, 7 => 22}

This is used for multi parameter assignment like using datetime_select to get the different parts of the time on the form.

One other thing we might need to check is ActiveRecord::Base.default_timezone. If it is not :utc (which it will be for most apps), I don't think that the cache_version from the database will match the one generated from a user-set time.

Other than that I think that this is good to go.

@lsylvester

Contributor

lsylvester commented Oct 11, 2018

Also, to revisit the breaking cache version format - it is possible to keep the format by directly transforming the string from the database.

I have a microbenchmark for this at https://gist.github.com/lsylvester/32da6af3dc0a117bfc282e8834f89256 - and it shows that keeping the format would be about 1.2x slower than this implementation, instead of the 4x slower of the implementation before this change.

Calculating -------------------------------------
                 New    292.214k (± 9.0%) i/s -      1.460M in   5.033789s
  New keeping format    250.280k (± 8.5%) i/s -      1.255M in   5.045426s
            Original     69.712k (± 5.5%) i/s -    349.223k in   5.024369s

Comparison:
                 New:   292214.1 i/s
  New keeping format:   250279.6 i/s - same-ish: difference falls within error
            Original:    69711.7 i/s - 4.19x  slower

So we could get most of the benefits of this without breaking the format. Just something to keep in mind.
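A rough stdlib approximation of that comparison (not the linked gist itself; it assumes the raw string and :usec-style key shown earlier in the thread):

```ruby
require "benchmark"
require "time"

raw = "2018-10-12 05:00:00.123456"

# Sanity check: the single-allocation string transform produces the same
# 20-digit key as parsing the Time and re-formatting it.
fast = raw.delete("- :.")
slow = Time.parse(raw).strftime("%Y%m%d%H%M%S%6N")
raise "key mismatch" unless fast == slow

# Compare the two paths; the delete path skips Time parsing entirely.
n = 20_000
Benchmark.bm(12) do |x|
  x.report("delete") { n.times { raw.delete("- :.") } }
  x.report("parse")  { n.times { Time.parse(raw).strftime("%Y%m%d%H%M%S%6N") } }
end
```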

@schneems schneems force-pushed the schneems:schneems/faster_cache_version branch 2 times, most recently from 819014b to 2f9268b Oct 12, 2018

@schneems

Member

schneems commented Oct 12, 2018

@lsylvester I updated my test to use a hash. That's a great idea to preserve the existing cache key format. I love it! I updated this branch. I also added the guard check ensuring that the DB is set to :utc, and I moved all the checks into their own private method, as the line was getting really long.

While implementing these changes I noticed that my memory crept back up with the addition of the !try(:updated_at_came_from_user?) check. That led me to find another memory improvement in respond_to? for Active Record (here's the PR: #34197). I would appreciate it if you all could take a look there. I would like to get that merged in so I can do another benchmark of performance with the extra guard checks and the string mutation.

@lsylvester

Contributor

lsylvester commented Oct 12, 2018

@schneems I think that we could bypass try altogether - we are calling it up to three times here - and each one depends on updated_at being present.

If we returned early unless has_attribute?('updated_at'), the calls to try (and respond_to?) would no longer be required.

Otherwise, those changes look good - I will make more comments in that PR.
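A plain-Ruby sketch of that guard ordering, using a hypothetical stand-in for the record (none of these names are the PR's actual API):

```ruby
# Stand-in for an Active Record instance; names are illustrative only.
RecordStub = Struct.new(:has_updated_at, :came_from_user, :raw) do
  def has_attribute?(_name)
    has_updated_at
  end
end

def fast_cache_version?(record)
  return false unless record.has_attribute?("updated_at") # cheapest check first, no try needed
  return false if record.came_from_user                   # user-assigned values skip the fast path
  record.raw.is_a?(String)                                # raw DB string -> eligible for fast path
end

puts fast_cache_version?(RecordStub.new(true, false, "2018-10-12 05:00:00"))  # => true
puts fast_cache_version?(RecordStub.new(false, false, "2018-10-12 05:00:00")) # => false (no column)
puts fast_cache_version?(RecordStub.new(true, false, Time.now))               # => false (already a Time)
```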

@schneems schneems force-pushed the schneems:schneems/faster_cache_version branch from 0f991d2 to fb858bc Oct 12, 2018

@schneems

Member

schneems commented Oct 12, 2018

I updated to use the has_attribute? check, which looks better and removes a few try calls (3 in the worst case). So this patch is no longer dependent on my other perf PR (though #34197 is still worth it).

With this checking logic I'm able to see a (very) statistically significant perf gain of 9.79%. The absolute number here may vary a bit, but there is no doubt that this patch is still fast.

I also discovered an issue with an intermittently failing test. It turns out that Postgres (and maybe other adapters) will truncate trailing zeros in the nanosecond field. For instance, instead of "2018-10-12 05:00:00.000000" the value "2018-10-12 05:00:00" is returned. When that happens we can detect it by checking the length and padding with zeros.

@schneems schneems force-pushed the schneems:schneems/faster_cache_version branch 5 times, most recently from 8eb59c8 to 6c92247 Oct 15, 2018

@schneems schneems changed the title from "5% memory reduction 11% reduction in request time. Reduce allocations in ActiveRecord#cache_key" to "Use raw time string from DB to generate ActiveRecord#cache_version" Oct 16, 2018

@schneems schneems force-pushed the schneems:schneems/faster_cache_version branch from 6c92247 to 9c877b6 Oct 16, 2018

@schneems

Member

schneems commented Oct 16, 2018

PR is rebased. I left the silent-failure commit separate so it can be cherry-picked if we decide not to merge this.

@schneems schneems force-pushed the schneems:schneems/faster_cache_version branch 2 times, most recently from a18d7a2 to 26aaf2b Oct 16, 2018

Resolved review thread on activerecord/lib/active_record/integration.rb (outdated)
test "cache_version does call updated_at when it is assigned via a Time object" do
record = CacheMeWithVersion.create
record_from_db = CacheMeWithVersion.find(record.id)

@lsylvester

lsylvester Oct 16, 2018

Contributor

Do these records need to be saved to the db? I think we should be able to check these behaviours on new records without saving and reloading them.

@schneems

schneems Oct 17, 2018

Member

In the CacheMeWithVersion.create the object assigned to updated_at is a Time object. That triggers the timestamp.is_a?(String) check, which means we skip the "fast" cache behavior, so updated_at is then guaranteed to be called.

There might be another way to trigger this behavior that I've not found. This testing logic seemed the closest to the actual behavior I was trying to test (values generated in user space skip my logic, values from the DB use the new logic).

If you want to play around with it, the fastest way to run this test file is running this inside the AR directory:

bundle exec rake test:sqlite3_mem  TEST=test/cases/cache_key_test.rb

@lsylvester

lsylvester Oct 21, 2018

Contributor

@schneems For the tests that are reading values from the database I think that saving the records is required - but for the tests where values are being set by the user (as a String, Time or Hash) the tests could work with unsaved records. See lsylvester@c4ff214

@schneems

schneems Oct 21, 2018

Member

The reason I didn't want to go with something like:

record = CacheMeWithVersion.new
assert_called(record, :updated_at) do
  record.updated_at = Time.now.to_s
  record.cache_version
end

Is if the line we're testing gets removed:

record = CacheMeWithVersion.new
assert_called(record, :updated_at) do
  # record.updated_at = Time.now.to_s
  record.cache_version
end

Then the test would still pass, since the value is coming from the user already, as opposed to being from the DB. Essentially, without the value being from the db, it's possible to make a mistake in modifying the test or the code and have all the tests still pass.

schneems and others added some commits Sep 8, 2018

Use raw time string from DB to generate ActiveRecord#cache_version
Currently, the `updated_at` field is used to generate a `cache_version`. Some database adapters return this timestamp value as a string that must then be converted to a Time value. This process requires a lot of memory and even more CPU time. In the case where this value is only being used for a cache version, we can skip the Time conversion by using the string value directly.

- This PR preserves existing cache format by converting a UTC string from the database to `:usec` format.
- Some databases return an already converted Time object; in those instances, we can directly use `created_at`.
- The `updated_at_before_type_cast` can be a value that comes from either the database or the user. We only want to optimize the case where it is from the database.
- If the format of the cache version has been changed, we cannot apply this optimization, and it is skipped.
- If the format of the time in the database is not UTC, then we cannot use this optimization, and it is skipped.

Some databases (notably PostgreSQL) return a variable-length nanosecond value in the time string. If the value ends in zeros, they are truncated. For instance, instead of `2018-10-12 05:00:00.000000` the value `2018-10-12 05:00:00` is returned. We detect this case and pad the missing zeros to ensure consistent cache version generation.

Before: Total allocated: 743842 bytes (6626 objects)
After: Total allocated: 702955 bytes (6063 objects)

(743842 - 702955) / 743842.0 # => 5.4% ⚡️⚡️⚡️⚡️⚡️

Using the CodeTriage application and derailed benchmarks this PR shows between 9-11% (statistically significant) performance improvement versus the commit before it.

Special thanks to @lsylvester for helping to figure out a way to preserve the usec format and for helping with many implementation details.
Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior, it's likely that cache entries will not be able to be invalidated, and this will happen without notice.

This behavior was reported and described by @lsylvester in #34197 (comment).

@schneems schneems force-pushed the schneems:schneems/faster_cache_version branch from f9b3428 to 73f9cd2 Oct 17, 2018

@sgrif sgrif merged commit 9c4d3c3 into rails:master Nov 28, 2018

2 checks passed

codeclimate: All good!
continuous-integration/travis-ci/pr: The Travis CI build passed