DM-33638: Check datastore cache for existence before checking remote datastore #646
@@ -282,6 +282,35 @@ def should_be_cached(self, entity: Union[DatasetRef, DatasetType, StorageClass])
        """
        raise NotImplementedError()

    @abstractmethod
    def known_to_cache(self, ref: DatasetRef, extension: Optional[str] = None) -> bool:
I do wonder whether I should put an underscore in front of this name. It's not a method that we should encourage people to use. It's only here as a shortcut for the datastore checking.
Is AbstractDatastoreCacheManager exposed to "people"? If it is internal to datastores' implementation, then we probably don't care if it's public or not.
Yes, the assumption is that this is something that only datastores care about. Hopefully the "don't use this" caveat in the description is enough.
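For illustration only, here is a minimal sketch of what a public abstract method with a "don't use this" caveat might look like. The class names and the docstring wording are hypothetical, and a plain string ID stands in for `DatasetRef`, so this is not the code under review.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AbstractCacheManagerSketch(ABC):
    """Hypothetical stand-in for the abstract cache manager."""

    @abstractmethod
    def known_to_cache(self, ref_id: str, extension: Optional[str] = None) -> bool:
        """Report whether a dataset is known to the cache.

        Intended only as a shortcut for datastore existence checks.
        A `True` result does not promise that the file will still be
        in the cache when it is later requested.
        """
        raise NotImplementedError()


class DisabledCacheManagerSketch(AbstractCacheManagerSketch):
    """Hypothetical manager for the case where caching is turned off."""

    def known_to_cache(self, ref_id: str, extension: Optional[str] = None) -> bool:
        # Nothing is ever cached, so nothing can be known to the cache.
        return False
```

Keeping the name public and relying on the docstring matches the outcome of the thread above. An underscore prefix would be the alternative if the class were part of a user-facing API.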
Codecov Report
@@            Coverage Diff             @@
##             main     #646      +/-   ##
==========================================
- Coverage   83.61%   83.60%   -0.01%
==========================================
  Files         239      239
  Lines       30171    30245      +74
  Branches     5047     5065      +18
==========================================
+ Hits        25226    25286      +60
- Misses       3799     3806       +7
- Partials     1146     1153       +7
Looks good, a few minor comments.
        cached_location = self._construct_cache_name(ref, extension)
        path_in_cache = cached_location.relative_to(self.cache_directory)
        assert path_in_cache is not None  # For mypy
These three lines feel like you want a separate method _construct_cache_name_relative(ref, extension) -> str?
Agreed. When I tried to remember how the cache key was formed I was a bit shocked to see that I repeated the logic in multiple places.
Although now that I look again I see that these two lines:
cached_location = self._construct_cache_name(ref, extension)
path_in_cache = cached_location.relative_to(self.cache_directory)
are never consecutive in the code other than this new method so it's not going to be a straightforward tweak.
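A rough sketch of the helper being discussed, under stated assumptions: `CacheNameSketch` is a hypothetical stand-in class, a string ID replaces `DatasetRef`, the file-naming rule is invented for the example, and `pathlib.Path` replaces whatever path type the real code uses. With `pathlib`, `relative_to` raises `ValueError` rather than returning `None`, so the mypy assert from the diff is not needed here.

```python
from pathlib import Path
from typing import Optional


class CacheNameSketch:
    """Hypothetical stand-in for a cache manager; not the real class."""

    def __init__(self, cache_directory: Path) -> None:
        self.cache_directory = cache_directory

    def _construct_cache_name(self, ref_id: str, extension: Optional[str]) -> Path:
        # Assumption: the cached file is named after the dataset ID plus extension.
        return self.cache_directory / f"{ref_id}{extension or ''}"

    def _construct_cache_name_relative(self, ref_id: str, extension: Optional[str]) -> str:
        # The steps from the diff, gathered into one place.
        cached_location = self._construct_cache_name(ref_id, extension)
        path_in_cache = cached_location.relative_to(self.cache_directory)
        return path_in_cache.as_posix()
```

As noted above, the two calls are not consecutive elsewhere in the code, so existing call sites that need the full location in between would not be able to use such a helper directly.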
This tests that you get the warning and removes it from the default output from pytest.
The cache size might be zero with multiple empty files so safer to use the number of files in the cache.
Use a marker default object to indicate "can raise"
This makes the default "known_to_cache" implementation much more efficient at the cost of an additional dict.
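The last commit message mentions trading an additional dict for a faster lookup. One way such an index could work is sketched below. The class, the attribute names, and the choice to key the index on a string dataset ID are all assumptions for illustration; the actual implementation in this PR may differ.

```python
from typing import Dict, Optional, Set


class IndexedCacheSketch:
    """Hypothetical cache registry with a secondary index by dataset ID."""

    def __init__(self) -> None:
        # Primary record: path within the cache -> file size in bytes.
        self._cache_entries: Dict[str, int] = {}
        # Additional dict: dataset ID -> paths cached for that dataset.
        self._ref_index: Dict[str, Set[str]] = {}

    def add(self, ref_id: str, path_in_cache: str, size: int) -> None:
        self._cache_entries[path_in_cache] = size
        self._ref_index.setdefault(ref_id, set()).add(path_in_cache)

    def remove(self, ref_id: str, path_in_cache: str) -> None:
        self._cache_entries.pop(path_in_cache, None)
        paths = self._ref_index.get(ref_id)
        if paths is not None:
            paths.discard(path_in_cache)
            if not paths:
                del self._ref_index[ref_id]

    def known_to_cache(self, ref_id: str, extension: Optional[str] = None) -> bool:
        # One dict lookup instead of scanning every cache entry.
        paths = self._ref_index.get(ref_id)
        if not paths:
            return False
        if extension is None:
            return True
        return any(p.endswith(extension) for p in paths)
```

The cost is that both dicts have to be updated together whenever an entry is added or removed.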
Checklist
doc/changes