-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: use .lower() to make paths & pattern fnmatch case insensitive #275
Bugfix: use .lower() to make paths & pattern fnmatch case insensitive #275
Conversation
Codecov Report
@@ Coverage Diff @@
## main #275 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 37 37
Lines 2777 2782 +5
=========================================
+ Hits 2777 2782 +5
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have time for full review, but it looks like the solution is not proper.
npe2/_plugin_manager.py
Outdated
@@ -135,14 +135,20 @@ def iter_compatible_readers(self, paths: List[str]) -> Iterator[ReaderContributi | |||
return | |||
assert isinstance(path, str) | |||
|
|||
# use lower() to make matching case-insensitive | |||
path = path.lower() | |||
|
|||
if os.path.isdir(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on UNIX systems (MacOS, Linux) filesystem is cases sensitive:
/Users/
will be converted to /users/
which most probably does not exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, i'm on macOS and it still matches correctly.
But yeah, probably safer to use something like:
base, ext = os.path.splitext(path)
path = os.path.join(base + ext.lower())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isdir
is working correctly? Hmm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both os.path.isdir
and os.path.splitextwork on
.lower()` even though the real paths on my computer have cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can instead just move the .lower()
to the fnmatch
here:
https://github.com/psobolewskiPhD/npe2/blob/b6cf2cf8d72ff287051b1e02cd79f9a8e537ec0f/npe2/_plugin_manager.py#L150
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so this actually does nothing, because paths
is set to lower in io_utils.py
_read
(see below).
So some duplication on my part. Not sure where it's more logical to do this.
io_utils.py
: _read
or read_get_reader
or here in _plugin_manager
(deleting the changes in io_utils
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, i avoid the duplication now. lower()
is used in _read
only, I think this makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that was annoying because _read can take a str or a list.
I put the check back in _plugin_manager, in the correct _iter_compatible_readers
(sorry made a mess of the commits!)
npe2/_plugin_manager.py
Outdated
yield from {r for pattern, r in self._readers if fnmatch(path, pattern)} | ||
# match against pattern.lower() to make matching case insensitive | ||
yield from { | ||
r for pattern, r in self._readers if fnmatch(path, pattern.lower()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r for pattern, r in self._readers if fnmatch(path, pattern.lower()) | |
r for pattern, r in self._readers if fnmatch(path, pattern.lower()) or fnmatch(path, pattern.upper()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, instead of doing .lower() on paths in iter_compatible_readers?
clever.
But what if there's mixed case?
like file.Jpg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe https://docs.python.org/3/library/fnmatch.html#fnmatch.translate and run re.match
with https://docs.python.org/3/library/re.html#re.IGNORECASE ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems really complex, is it better than just lower casing for the comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean fnmatch(path.lower(), pattern.lower())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea. it's the simplest and I think it should work...
(not sure why test is failing... napari-ndtiffs was just updated though 😬)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, using path.lower(), pattern.lower()
in the fnmatch works in napari, but not in the existing tests i mimicked, because, as far as I can tell, the sample_plugin hard-codes lower case extension in the reader function:
npe2/tests/sample/my_plugin/__init__.py
Lines 24 to 41 in d7329b7
def get_reader(path: PathOrPaths): | |
if isinstance(path, list): | |
def read(path): | |
assert isinstance(path, list) | |
return [(None,)] | |
return read | |
assert isinstance(path, str) # please mypy. | |
if path.endswith(".fzzy"): | |
def read(path): | |
assert isinstance(path, str) | |
return [(None,)] | |
return read | |
else: | |
raise ValueError("Test plugin should not receive unknown data") |
My tests error with E ValueError: Test plugin should not receive unknown data
so I think this is actually the expected result of everything working properly.
By using .lower()
only in the fnmatch
the plugin gets the real path and in this case, errors. On the other hand, making the actual path have ext.lower()
as before, would make sample_plugin handle this case and the new tests would pass.
So options are to assert something else, go back to .lower() on the path, or use .lower() in sample_plugin to make the actual plugin also case insensitive.
I think I lean towards option 1 or option 3. npe2 should be case-insensitive, but if a plugin is actually case sensitive for some reason, then it should be allowed to raise an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I don't think we can/should distinguish between path being case insensitive and pattern being case insensitive. I think our reading in general should be either case sensitive or case insensitive. I think the same should be true of plugins (so, plugins should be case insensitive), but the get_reader
functions complicate things a little, since they may be checking things about the path that are not true if we try to enforce case insensitivity by passing path.lower()
.
I think we should make paths fully case insensitive for reading so:
- match
pattern.lower()
againstext.lower()
- we always pass through the unchanged path because I think that makes most sense
- add docs to contribution guide for readers to advise that pattern matching will be case insensitive
- and say that we encourage readers to also be case insensitive
- but if you desperately need case sensitivity you could check in your
get_reader
function (but I don't think we even mention this tbh)
- we update the cookiecutter reader to recommend case insensitivity
I had a look through all the filename_patterns
being declared. Only 5 out of 92 readers declare any pattern with capital letters in it:
napari-annotatorj: ['<EDIT_ME>']
napari-deepfinder: ['*.mrc', '*.map', '*.rec', '*.h5', '*.tif', '*.TIF', '*.xml', '*.ods', '*.xls', '*.xlsx']
napari-pdr-reader: ['*.fits', '*.FITS', '*.lbl', '*.img', '*.LBL', '*.IMG']
napari-rioxarray: ['*.vrt', '*.tif', '*.tiff', '*.TIF', '*.TIFF', '*.img', '*.lbl', '*.cub', '*.fits', '*.IMG', '*.LBL', '*.CUB', '*.FITS']
napari-tomocube-data-viewer: ['*.TCF']
The first is clearly a placeholder. napari-deepfinder
explicitly makes sure it's case insensitive for TIFs, napari-pdr-reader
also makes its patterns case insensitive, and so does napari-rioxarray
for all but one of the extensions - but it then goes through and lowercases the path as it comes in anyway. I couldn't find the repo for napari-tomocube-data-viewer
so not sure if it truly is case sensitive or not. Anyway, given the above, I would say we adopt case insensitivity for both patterns and paths, and update docs accordingly.
Test fails are same as this PR: #276 |
Yes. I think It is related. The best will be create separate PR with fix. @tlambert03 what you think will be best. Pin |
Ah, hah! |
20c2f58
to
b140215
Compare
OK, I may have totally botched this with trying to reconcile the ...but I think I have it now how we discussed:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good @psobolewskiPhD, and I checked that it works! I left a comment about making the tests more explicit. Also I think we should update the contribution docs here to mention the case insensitivity - I'm pretty sure these get built into the reference on napari.org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should create at least the following tests and get working code for this:
some_directory_name.FINAL
,some_directory_name.Final
- should return directory readersome_zarr_directory.zarr
,some_zarr_directory.ZARR
,some_zarr_directory.Zarr
should return zarr reader (test may need to implement dummy reader).some_two_ext_file.tar.gz
,some_two_ext_file.TAR.gz
,some_two_ext_file.TAR.GZ
, etc
The current solution will not catch properly any of these tests.
The only tests that are added to this PR do not test these cases and will not prevent future break of this use case
handle double extentions per @Czaki
I re-did the tests per @DragaDoncila and @Czaki which revealed that double extensions were not handled (only the last). So now I switched to use I think this covers everything? |
Why .zarr test do not contain |
Because I forgot? 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additional tests @psobolewskiPhD this looks good to me! I'm not sure longer term if we want to make all paths totally case insensitive (rather than just the extension), but I think for now the extensions alone is a good start. I think it's more complex once you move out to the whole path because it's OS and sometimes even program dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is not only to validate the current implementation but mainly to prevent introducing regression in another commit.
I add a suggestion to create dummy dirs in test.
Maybe we should also create dummy tiff and tar.gz files
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
tests/test__io_utils.py
Outdated
def test_read_uppercase_extension(tmp_path: Path): | ||
pm = PluginManager() | ||
plugin = DynamicPlugin("tif-plugin", plugin_manager=pm) | ||
|
||
path = "something.TIF" | ||
mock_file = tmp_path / path | ||
mock_file.touch() | ||
|
||
# reader should be compatible despite lowercase pattern | ||
@plugin.contribute.reader(filename_patterns=["*.tif"]) | ||
def get_read(path): | ||
def get_read(path=mock_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the sort of thing you had in mind @Czaki ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used the same below for tar.gz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great
First shot at making the matching of paths to the declared extensions case insensitive.
Closes: napari/napari#5663
Closes #271
@DragaDoncila I'm not sure this is how you had it in mind?
I'm not super familiar with the code base so not sure I'm approaching it the right way vs. just bandaid bugfix.
I added two tests and make fixes to get the tests to pass.
Maybe this is too kludgey...