Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: use .lower() to make paths & pattern fnmatch case insensitive #275

Merged

Conversation

psobolewskiPhD
Copy link
Member

First shot at making the matching of paths to the declared extensions case insensitive.
Closes: napari/napari#5663
Closes #271

@DragaDoncila I'm not sure this is how you had it in mind?
I'm not super familiar with the code base so not sure I'm approaching it the right way vs. just bandaid bugfix.

I added two tests and make fixes to get the tests to pass.
Maybe this is too kludgey...

@codecov
Copy link

codecov bot commented Mar 24, 2023

Codecov Report

Merging #275 (e21264d) into main (fd4ab17) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #275   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           37        37           
  Lines         2777      2782    +5     
=========================================
+ Hits          2777      2782    +5     
Impacted Files Coverage Δ
src/npe2/_plugin_manager.py 100.00% <100.00%> (ø)

Copy link
Collaborator

@Czaki Czaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have time for full review, but it looks like the solution is not proper.

@@ -135,14 +135,20 @@ def iter_compatible_readers(self, paths: List[str]) -> Iterator[ReaderContributi
return
assert isinstance(path, str)

# use lower() to make matching case-insensitive
path = path.lower()

if os.path.isdir(path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on UNIX systems (MacOS, Linux) filesystem is cases sensitive:

/Users/ will be converted to /users/ which most probably does not exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, i'm on macOS and it still matches correctly.
But yeah, probably safer to use something like:

base, ext = os.path.splitext(path)
path = os.path.join(base + ext.lower())

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isdir is working correctly? Hmm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both os.path.isdir and os.path.splitextwork on.lower()` even though the real paths on my computer have cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so this actually does nothing, because paths is set to lower in io_utils.py _read (see below).
So some duplication on my part. Not sure where it's more logical to do this.
io_utils.py: _read or read_get_reader
or here in _plugin_manager (deleting the changes in io_utils)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i avoid the duplication now. lower() is used in _read only, I think this makes more sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that was annoying because _read can take a str or a list.
I put the check back in _plugin_manager, in the correct _iter_compatible_readers
(sorry made a mess of the commits!)

npe2/_plugin_manager.py Outdated Show resolved Hide resolved
yield from {r for pattern, r in self._readers if fnmatch(path, pattern)}
# match against pattern.lower() to make matching case insensitive
yield from {
r for pattern, r in self._readers if fnmatch(path, pattern.lower())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r for pattern, r in self._readers if fnmatch(path, pattern.lower())
r for pattern, r in self._readers if fnmatch(path, pattern.lower()) or fnmatch(path, pattern.upper())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, instead of doing .lower() on paths in iter_compatible_readers?
clever.

But what if there's mixed case?
like file.Jpg

Copy link
Collaborator

@Czaki Czaki Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems really complex, is it better than just lower casing for the comparison?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean fnmatch(path.lower(), pattern.lower())?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea. it's the simplest and I think it should work...
(not sure why test is failing... napari-ndtiffs was just updated though 😬)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, using path.lower(), pattern.lower() in the fnmatch works in napari, but not in the existing tests i mimicked, because, as far as I can tell, the sample_plugin hard-codes lower case extension in the reader function:

def get_reader(path: PathOrPaths):
if isinstance(path, list):
def read(path):
assert isinstance(path, list)
return [(None,)]
return read
assert isinstance(path, str) # please mypy.
if path.endswith(".fzzy"):
def read(path):
assert isinstance(path, str)
return [(None,)]
return read
else:
raise ValueError("Test plugin should not receive unknown data")

My tests error with E ValueError: Test plugin should not receive unknown data so I think this is actually the expected result of everything working properly.
By using .lower() only in the fnmatch the plugin gets the real path and in this case, errors. On the other hand, making the actual path have ext.lower() as before, would make sample_plugin handle this case and the new tests would pass.
So options are to assert something else, go back to .lower() on the path, or use .lower() in sample_plugin to make the actual plugin also case insensitive.

I think I lean towards option 1 or option 3. npe2 should be case-insensitive, but if a plugin is actually case sensitive for some reason, then it should be allowed to raise an error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I don't think we can/should distinguish between path being case insensitive and pattern being case insensitive. I think our reading in general should be either case sensitive or case insensitive. I think the same should be true of plugins (so, plugins should be case insensitive), but the get_reader functions complicate things a little, since they may be checking things about the path that are not true if we try to enforce case insensitivity by passing path.lower().

I think we should make paths fully case insensitive for reading so:

  • match pattern.lower() against ext.lower()
  • we always pass through the unchanged path because I think that makes most sense
  • add docs to contribution guide for readers to advise that pattern matching will be case insensitive
    • and say that we encourage readers to also be case insensitive
    • but if you desperately need case sensitivity you could check in your get_reader function (but I don't think we even mention this tbh)
  • we update the cookiecutter reader to recommend case insensitivity

I had a look through all the filename_patterns being declared. Only 5 out of 92 readers declare any pattern with capital letters in it:

napari-annotatorj: ['<EDIT_ME>']
napari-deepfinder: ['*.mrc', '*.map', '*.rec', '*.h5', '*.tif', '*.TIF', '*.xml', '*.ods', '*.xls', '*.xlsx']
napari-pdr-reader: ['*.fits', '*.FITS', '*.lbl', '*.img', '*.LBL', '*.IMG']
napari-rioxarray: ['*.vrt', '*.tif', '*.tiff', '*.TIF', '*.TIFF', '*.img', '*.lbl', '*.cub', '*.fits', '*.IMG', '*.LBL', '*.CUB', '*.FITS']
napari-tomocube-data-viewer: ['*.TCF']

The first is clearly a placeholder. napari-deepfinder explicitly makes sure it's case insensitive for TIFs, napari-pdr-reader also makes its patterns case insensitive, and so does napari-rioxarray for all but one of the extensions - but it then goes through and lowercases the path as it comes in anyway. I couldn't find the repo for napari-tomocube-data-viewer so not sure if it truly is case sensitive or not. Anyway, given the above, I would say we adopt case insensitivity for both patterns and paths, and update docs accordingly.

@psobolewskiPhD
Copy link
Member Author

Test fails are same as this PR: #276
I think it may be related to napari-ndtiffs being updated?
https://github.com/tlambert03/napari-ndtiffs
Will try to test some tomorrow

@Czaki
Copy link
Collaborator

Czaki commented Apr 1, 2023

I think it may be related to napari-ndtiffs being updated?

Yes. I think It is related.

The best will be create separate PR with fix.

@tlambert03 what you think will be best. Pin napari-ndtiffs to older release or remove this test?

@tlambert03
Copy link
Member

Ah, hah!
Sorry about that, who'd have thought updating a package to npe2 would break npe tests 😂 (I know I know, my own damn fault, it was a shortcut for testing the npe1 adapter). I don't have a preference for how it's fixed. Both pinning or deleting the test sound like a good idea... a fixture is probably the "best" but more laborious. So maybe just pin

@psobolewskiPhD psobolewskiPhD force-pushed the bugfix/make_readers_case_insensitive branch from 20c2f58 to b140215 Compare April 6, 2023 20:05
@psobolewskiPhD
Copy link
Member Author

OK, I may have totally botched this with trying to reconcile the src change in conflicts...

...but I think I have it now how we discussed:

  1. ext.lower() is used for comparisons to the pattern
  2. but the actual path is passed to the plugin
    So as a result I made the tests expect the ValueError from sample plugin.

Copy link
Contributor

@DragaDoncila DragaDoncila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @psobolewskiPhD, and I checked that it works! I left a comment about making the tests more explicit. Also I think we should update the contribution docs here to mention the case insensitivity - I'm pretty sure these get built into the reference on napari.org

tests/test__io_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@Czaki Czaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should create at least the following tests and get working code for this:

  1. some_directory_name.FINAL, some_directory_name.Final - should return directory reader
  2. some_zarr_directory.zarr, some_zarr_directory.ZARR, some_zarr_directory.Zarr should return zarr reader (test may need to implement dummy reader).
  3. some_two_ext_file.tar.gz, some_two_ext_file.TAR.gz, some_two_ext_file.TAR.GZ, etc

The current solution will not catch properly any of these tests.
The only tests that are added to this PR do not test these cases and will not prevent future break of this use case

tests/test__io_utils.py Outdated Show resolved Hide resolved
@psobolewskiPhD
Copy link
Member Author

I re-did the tests per @DragaDoncila and @Czaki which revealed that double extensions were not handled (only the last). So now I switched to use pathlib.Path.suffixes.
Further, URI were also not handled, so I check using urllib.parse.urlparse(path).scheme first, before modifying the path.

I think this covers everything?

@Czaki
Copy link
Collaborator

Czaki commented Apr 29, 2023

Why .zarr test do not contain accepts_directories when the final contains?

@psobolewskiPhD
Copy link
Member Author

Why .zarr test do not contain accepts_directories when the final contains?

Because I forgot? 😅
Fixed, I think.

Copy link
Contributor

@DragaDoncila DragaDoncila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional tests @psobolewskiPhD this looks good to me! I'm not sure longer term if we want to make all paths totally case insensitive (rather than just the extension), but I think for now the extensions alone is a good start. I think it's more complex once you move out to the whole path because it's OS and sometimes even program dependent.

tests/test__io_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@Czaki Czaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is not only to validate the current implementation but mainly to prevent introducing regression in another commit.

I add a suggestion to create dummy dirs in test.

Maybe we should also create dummy tiff and tar.gz files

tests/test__io_utils.py Outdated Show resolved Hide resolved
tests/test__io_utils.py Outdated Show resolved Hide resolved
tests/test__io_utils.py Outdated Show resolved Hide resolved
tests/test__io_utils.py Outdated Show resolved Hide resolved
tests/test__io_utils.py Outdated Show resolved Hide resolved
psobolewskiPhD and others added 7 commits May 3, 2023 15:39
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Comment on lines 100 to 110
def test_read_uppercase_extension(tmp_path: Path):
pm = PluginManager()
plugin = DynamicPlugin("tif-plugin", plugin_manager=pm)

path = "something.TIF"
mock_file = tmp_path / path
mock_file.touch()

# reader should be compatible despite lowercase pattern
@plugin.contribute.reader(filename_patterns=["*.tif"])
def get_read(path):
def get_read(path=mock_file):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the sort of thing you had in mind @Czaki ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used the same below for tar.gz

Copy link
Collaborator

@Czaki Czaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

@psobolewskiPhD psobolewskiPhD added bug Something isn't working enhancement New feature or request tests related to testing or CI and removed enhancement New feature or request labels May 3, 2023
@Czaki Czaki merged commit b33bc09 into napari:main May 7, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working tests related to testing or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't open upper case file extension (e.g. .JPEG) filename_patterns for readers should be case insensitive
4 participants