Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up Zarr3 data reading by fixing VaultPath equality check #7363

Merged
merged 7 commits into from
Oct 5, 2023

Conversation

fm3
Copy link
Member

@fm3 fm3 commented Oct 4, 2023

Different instances of VaultPath were not equal, leading to the caching mechanism in Zarr3Array.scala not working as intended (there, VaultPath objects are used as cache keys). So the Shard index was parsed anew in every single chunk request, and stored in memory hundreds of times.

This PR implements content-dependent equals and hashCode methods for VaultPath, FileSystemVaultPath, and the DataVaults themselves, enabling the cache to work properly.

URL of deployed dev instance (used for testing):

Steps to test:

  • Zarr3 data loading (at least from localhost) should be faster now (I got 5–15 times speedup for l4_sample)
  • Tested that the caching works now by adding logging to cache miss path _ = logger.info(s"cache miss! shardPath: $shardPath") in Zarr3Array.scala line 119

  • Updated changelog
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@@ -76,4 +77,9 @@ class VaultPath(uri: URI, dataVault: DataVault) extends LazyLogging {
override def toString: String = uri.toString

def summary: String = s"VaultPath: ${this.toString} for ${dataVault.getClass.getSimpleName}"

override def equals(obj: Any): Boolean = hashCode() == obj.hashCode()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this collision-safe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, should be now :)

@fm3 fm3 marked this pull request as ready for review October 4, 2023 12:36
override def hashCode(): Int =
new HashCodeBuilder(19, 31).toHashCode

override def equals(obj: Any): Boolean = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:trollface:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: this should still do the typecheck with match

@fm3 fm3 enabled auto-merge (squash) October 5, 2023 09:41
@fm3 fm3 merged commit 815c34e into master Oct 5, 2023
2 checks passed
@fm3 fm3 deleted the vault-path-cache-key branch October 5, 2023 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants