Improves cache miss on disk cache by two orders of magnitude on avera… #202

Merged
garrettmoon merged 4 commits into master on Oct 4, 2017

Conversation

garrettmoon
Collaborator

…ge on an iPhone 6. Essentially, this just adds an in-memory map of what is known to be on disk.

This is actually meaningful in practice for us as we have lots of cache misses.
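
For readers skimming the thread, here is a minimal sketch of what building such an in-memory map at startup could look like. It is not the code from this PR: `KnownKeysForDirectory` is a hypothetical helper, and it assumes file names map one-to-one to cache keys, which a real cache that encodes its keys would not guarantee.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of PINCache): collect the names of the files
// currently in the cache directory so later lookups can avoid the file system.
// Assumption: each file name is the cache key itself.
static NSMutableSet<NSString *> *KnownKeysForDirectory(NSString *directory) {
    NSMutableSet<NSString *> *knownKeys = [NSMutableSet set];
    NSError *error = nil;
    NSArray<NSString *> *names =
        [[NSFileManager defaultManager] contentsOfDirectoryAtPath:directory error:&error];
    if (names == nil) {
        // Directory missing or unreadable: start with an empty set.
        return knownKeys;
    }
    [knownKeys addObjectsFromArray:names];
    return knownKeys;
}
```

A key that is absent from the returned set can then be reported as a miss without touching the disk, which is where the speedup on cache misses would come from.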

Contributor

appleguy commented Oct 2, 2017

Hooray! Yes, definitely meaningful in practice - those file existence checks are worth caring about due to the check occurring synchronously on main.

Contributor

@appleguy appleguy left a comment

Very excited about this diff, thanks for the work Garrett!

@@ -451,10 +455,14 @@ - (void)_locked_initializeDiskProperties
[_sizes setObject:fileSize forKey:key];
byteCount += [fileSize unsignedIntegerValue];
}

[_knownKeys addObject:key];
Contributor

@garrettmoon one suggestion that would improve performance: use a single NSMutableDictionary, with a C struct for the date & byte size.

The struct would contain an NSInteger byteCount (with a sentinel value of -1 if not set) and an NSTimeInterval or CFAbsoluteTime for the date. NSValue could wrap it with sizeof(CacheEntryMetadataStruct).

The benefits:

  • Replace setting & accessing 2x NSDictionaries and 1x NSSet with just 1 dict.
  • Extensible for future metadata storage
  • State management a bit more clear and centralized; especially, only one data structure access to lock.

We should definitely land this PR anyway, but figured it was worth a mention!
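
A rough sketch of the struct-in-NSValue idea, for illustration only. `CacheEntryMetadataStruct` is the name used in the comment above; the constant and the two helper functions are hypothetical and do not exist in PINCache.

```objective-c
#import <Foundation/Foundation.h>

// Struct name taken from the comment above; the field layout follows its description.
typedef struct {
    NSInteger byteCount;   // kCacheEntryByteCountUnset (-1) means "size not yet known"
    CFAbsoluteTime date;   // modification date, in seconds since the reference date
} CacheEntryMetadataStruct;

// Hypothetical sentinel constant.
static const NSInteger kCacheEntryByteCountUnset = -1;

// Hypothetical helper: box the struct so it can be stored in an NSMutableDictionary.
static NSValue *BoxMetadata(CacheEntryMetadataStruct metadata) {
    return [NSValue valueWithBytes:&metadata objCType:@encode(CacheEntryMetadataStruct)];
}

// Hypothetical helper: unbox the entry for `key`.
// Returns NO and leaves *outMetadata untouched when the key is unknown.
static BOOL UnboxMetadata(NSDictionary<NSString *, NSValue *> *table,
                          NSString *key,
                          CacheEntryMetadataStruct *outMetadata) {
    NSValue *boxed = table[key];
    if (boxed == nil) {
        return NO;
    }
    [boxed getValue:outMetadata size:sizeof(CacheEntryMetadataStruct)];
    return YES;
}
```

With this layout, the presence of a key in the single dictionary would play the role of the separate known-keys set, while the date and size travel together in one value.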

Collaborator Author

I like that idea! I'm not sure about the struct, though; I think a custom object like PINDiskMetadata would be better and more in line with the Objective-C-y nature of the codebase.
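
As a sketch of the custom-object alternative, assuming only the class name mentioned in this comment: the property names and types below are guesses for illustration, not the shape of the class that was eventually merged.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical metadata object; the property names are assumptions.
@interface PINDiskMetadata : NSObject
@property (nonatomic, strong) NSDate *date;    // nil until the modification date has been read
@property (nonatomic, strong) NSNumber *size;  // nil until the byte size has been read
@end

@implementation PINDiskMetadata
@end
```

A single `NSMutableDictionary<NSString *, PINDiskMetadata *>` keyed by cache key could then stand in for the separate date, size, and known-key collections, with `nil` properties taking the place of a sentinel value.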

@@ -824,7 +833,10 @@ - (void)synchronouslyLockFileAccessWhileExecutingBlock:(PINCacheBlock)block

- (BOOL)containsObjectForKey:(NSString *)key
{
- return ([self fileURLForKey:key updateFileModificationDate:NO] != nil);
+ if ([_knownKeys containsObject:key]) {
+ return ([self fileURLForKey:key updateFileModificationDate:NO] != nil);
Contributor

Can we skip this fileURLForKey: check, or maybe convert it into an assertion? If it is expected that this could fail, it would be good to add a comment here describing what case that occurs in.

Because the fileURL generation does show up in profiles, it seems like a shame to generate it here and then have to regenerate it in the calling code.

It might be possible to just ensure _knownKeys is kept up to date, and if an access is attempted and the file isn't present, then remove the key from _knownKeys (and throw an assertion there too?)

Collaborator Author

knownKeys can't actually be guaranteed to be accurate because the underlying files are stored in a cache directory, which iOS could empty at any time. Sadly, I've found little official documentation on when this actually happens in practice, and I assume it doesn't occur while your app is running, but…

If it's not in knownKeys it's definitely not on disk, but if it is in knownKeys it might not be on disk.

Also of note, containsObjectForKey is not used internally by objectForKey (and in fact is not used internally at all) so this check doesn't happen twice in that case.
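
The asymmetry described here (absent from the set means definitely not on disk; present means probably on disk) can be sketched as follows. `KnownKeysSketch` is a hypothetical stand-in class, not PINDiskCache: it assumes the key is also the file name, omits locking, and removes stale keys as floated earlier in this thread rather than as a description of what the PR does.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the cache; `knownKeys` and `directory` are illustrative names.
@interface KnownKeysSketch : NSObject
@property (nonatomic, strong) NSMutableSet<NSString *> *knownKeys;
@property (nonatomic, copy) NSString *directory;
- (BOOL)containsObjectForKey:(NSString *)key;
@end

@implementation KnownKeysSketch

- (BOOL)containsObjectForKey:(NSString *)key
{
    // Fast path: a key that was never recorded cannot be on disk, so skip the file system.
    if (![self.knownKeys containsObject:key]) {
        return NO;
    }
    // Slow path: the OS may have purged the cache directory, so confirm on disk.
    // Assumption: the key is used directly as the file name.
    NSString *path = [self.directory stringByAppendingPathComponent:key];
    BOOL exists = [[NSFileManager defaultManager] fileExistsAtPath:path];
    if (!exists) {
        // Drop the stale entry so the next lookup for this key takes the fast path.
        [self.knownKeys removeObject:key];
    }
    return exists;
}

@end
```

Only a hit in the set pays for a file system check, so workloads dominated by misses avoid disk access almost entirely.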

@@ -843,9 +855,10 @@ - (id)objectForKeyedSubscript:(NSString *)key

[self lock];
BOOL isEmpty = (_dates.count == 0 && _sizes.count == 0);
BOOL containsKey = [_knownKeys containsObject:key];
Contributor

Here, we are relying on _knownKeys instead of calling containsObjectForKey. Definitely much more efficient, and suggests maybe we can use this as the implementation of the method above.

Collaborator Author

Hopefully my above comment explains why we can't?

@@ -991,6 +1005,8 @@ - (void)setObject:(id <NSCoding>)object forKey:(NSString *)key fileURL:(NSURL **
[self->_dates setObject:date forKey:key];
}

[_knownKeys addObject:key];
Contributor

There's a lot of collection access in this method - both reading from the two dictionaries and writing to all three collections. This path would benefit the most from combining the structures into one dictionary.

Collaborator

@maicki maicki left a comment

LGTM

@@ -540,7 +548,9 @@ - (void)trimDiskToSize:(NSUInteger)trimByteCount
{
[self lock];
if (_byteCount > trimByteCount) {
- NSArray *keysSortedBySize = [_sizes keysSortedByValueUsingSelector:@selector(compare:)];
+ NSArray *keysSortedBySize = [_metadata keysSortedByValueUsingComparator:^NSComparisonResult(PINDiskCacheMetadata * _Nonnull obj1, PINDiskCacheMetadata * _Nonnull obj2) {
Collaborator

Interesting case, I think, where it would be fun to look into algorithms that take into account not just the byte size but also the number of accesses and the last access date. That could be a useful optimization for bigger files that are trimmed very frequently but are also accessed very often. Just a thought.

Collaborator Author

Totally, would be kinda neat to have a bunch of different caching strategies that could be configured.
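
To make the idea concrete, here is one possible trimming order that weighs size, access count, and last access date together. Everything in it is hypothetical: PINCache tracks no access counts in this PR, and the scoring formula is an invented heuristic, not a recommendation.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical per-entry record; none of these names come from PINCache.
@interface EvictionCandidate : NSObject
@property (nonatomic, copy) NSString *key;
@property (nonatomic, assign) NSUInteger byteCount;
@property (nonatomic, assign) NSUInteger accessCount;
@property (nonatomic, strong) NSDate *lastAccess;
@end

@implementation EvictionCandidate
@end

// Illustrative heuristic: large, stale, rarely used entries score highest.
static double EvictionScore(EvictionCandidate *candidate, NSDate *now) {
    NSTimeInterval age = MAX(0.0, [now timeIntervalSinceDate:candidate.lastAccess]);
    return ((double)candidate.byteCount * (age + 1.0)) / ((double)candidate.accessCount + 1.0);
}

// Returns the candidates ordered from "evict first" to "evict last".
static NSArray<EvictionCandidate *> *SortedForEviction(NSArray<EvictionCandidate *> *candidates,
                                                       NSDate *now) {
    return [candidates sortedArrayUsingComparator:^NSComparisonResult(EvictionCandidate *a,
                                                                      EvictionCandidate *b) {
        double scoreA = EvictionScore(a, now);
        double scoreB = EvictionScore(b, now);
        if (scoreA > scoreB) return NSOrderedAscending;   // higher score sorts first
        if (scoreA < scoreB) return NSOrderedDescending;
        return NSOrderedSame;
    }];
}
```

Under this scoring, a big file that is read often and recently sorts behind an equally big file that has not been touched in a while, which is the case raised above. Swapping `EvictionScore` for a different function would be one way to make the strategy configurable.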

@garrettmoon garrettmoon merged commit 98348aa into master Oct 4, 2017