Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use MD5 for creating cache keys #1396

Merged
merged 14 commits into from
May 8, 2018
Merged

Use MD5 for creating cache keys #1396

merged 14 commits into from
May 8, 2018

Conversation

bsudekum
Copy link
Contributor

@bsudekum bsudekum commented May 7, 2018

Beginning to fix #1391

As pointed out by @akitchen, cacheKeyForKey is most likely too naive.

private func cacheKeyForKey(_ key: String) -> String {
if let keyAsURL = URL(string: key) {
return String.init(keyAsURL.lastPathComponent.hashValue)
}
return String.init(key.hashValue)
}

  • add failing test case
  • actually fix code

/cc @mapbox/navigation-ios

}

return String.init(key.hashValue)
return MD5(key)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This single line is the actual fix. As pointed out by @akitchen, there are some shortcuts String.hashValue takes some shortcuts and things in the middle of the string may not impact the hash.

if let keyAsURL = URL(string: key) {
return String.init(keyAsURL.lastPathComponent.hashValue)
return MD5(keyAsURL.lastPathComponent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm surprised to see us still keying off of lastPathComponent -- I thought this was part of what was biting us. (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I am aware of. It really only came down to two different strings having the same hashValue: #1391 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to remove this check though. I think creating an md5 of the key here would be okay.

Copy link
Contributor

@akitchen akitchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK so long as we have a diverse-enough set of test cases that show us not colliding. I totally forgot about the NSString hashing quirk (where long strings with the same beginning and end text could have the same hash)

}

return String.init(key.hashValue)
return MD5(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do voice instructions fall through this code path? Asking for a friend...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, only voice instructions. Highway shields are keyed off of the url.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you

@akitchen
Copy link
Contributor

akitchen commented May 8, 2018

Maybe adjust the PR title before merging?

@bsudekum bsudekum changed the title Add failing test case for cacheKeyForKey Use MD5 for creating cache keys May 8, 2018
func testCacheKeyForKey() {
let threeMileInstruction = "<speak><amazon:effect name=\"drc\"><prosody rate=\"1.08\">Continue on <say-as interpret-as=\"address\">I-80</say-as> East for 3 miles</prosody></amazon:effect></speak>"
let sixMileInstruction = "<speak><amazon:effect name=\"drc\"><prosody rate=\"1.08\">Continue on <say-as interpret-as=\"address\">I-80</say-as> East for 6 miles</prosody></amazon:effect></speak>"
XCTAssertNotEqual(cache.fileCache.cacheKeyForKey(threeMileInstruction), cache.fileCache.cacheKeyForKey(sixMileInstruction))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test case, this is the one that nails it.

measure {
for _ in 0...1000 {
_ = cache.fileCache.cacheKeyForKey(instructionTurn)
_ = cache.fileCache.cacheKeyForKey(instructionContinue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you post times from master and this branch for comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was 0.001 seconds for the same test.


/*
* Convert a raw string to an array of little-endian words
* Characters >255 have their high-byte silently ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do care about the high byte. Add test cases for non-ASCII characters, all the way up to emoji like 🛡 (U+1F6E1). Although emoji would be somewhat contrived for an OSM-based data set, there are plenty of multibyte characters in OSM, especially in Chinese and Japanese.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we’re no longer using this code, but a test case would still be good for peace of mind.

@bsudekum bsudekum merged commit 4184cf6 into master May 8, 2018
@1ec5 1ec5 deleted the fix-cache-key branch December 5, 2019 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect voice instruction read from cache
4 participants