Bugfix: use .lower() to make paths & pattern fnmatch case insensitive #275

psobolewskiPhD · 2023-03-24T20:28:23Z

First shot at making the matching of paths to the declared extensions case insensitive.
Closes: napari/napari#5663
Closes #271

@DragaDoncila I'm not sure this is how you had it in mind?
I'm not super familiar with the code base so not sure I'm approaching it the right way vs. just bandaid bugfix.

I added two tests and made fixes to get the tests to pass.
Maybe this is too kludgey...

codecov · 2023-03-24T20:33:12Z

Codecov Report

Merging #275 (e21264d) into main (fd4ab17) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #275   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           37        37           
  Lines         2777      2782    +5     
=========================================
+ Hits          2777      2782    +5

Impacted Files	Coverage Δ
src/npe2/_plugin_manager.py	`100.00% <100.00%> (ø)`

Czaki

I do not have time for a full review, but the solution does not look correct.

Czaki · 2023-03-27T15:39:04Z

npe2/_plugin_manager.py

@@ -135,14 +135,20 @@ def iter_compatible_readers(self, paths: List[str]) -> Iterator[ReaderContributi
            return
        assert isinstance(path, str)

+        # use lower() to make matching case-insensitive
+        path = path.lower()
+
        if os.path.isdir(path):


On UNIX systems (macOS, Linux) the filesystem is case sensitive:

/Users/ will be converted to /users/, which most probably does not exist.

Hmm, I'm on macOS and it still matches correctly.
But yeah, it's probably safer to use something like:

base, ext = os.path.splitext(path)
path = os.path.join(base + ext.lower())

isdir is working correctly? Hmm.

Both os.path.isdir and os.path.splitext work on the .lower() path, even though the real paths on my computer have mixed case.
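A minimal sketch of the extension-only approach suggested above. `lower_extension` is a hypothetical helper name, not npe2 code. It leaves the directory components untouched, so it does not rely on the filesystem being case insensitive (default macOS volumes typically are case insensitive but case preserving, which may be why isdir still succeeded on the lowercased path).

```python
import os.path


def lower_extension(path: str) -> str:
    """Return ``path`` with only its final extension lowercased."""
    base, ext = os.path.splitext(path)
    return base + ext.lower()


# Directory names keep their case; only ".TIF" changes.
print(lower_extension("/Users/Me/Data/Image.TIF"))
```

Note that os.path.splitext only splits off the last extension, so a name like archive.TAR.GZ would come back as archive.TAR.gz.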

Can instead just move the .lower() to the fnmatch here:
https://github.com/psobolewskiPhD/npe2/blob/b6cf2cf8d72ff287051b1e02cd79f9a8e537ec0f/npe2/_plugin_manager.py#L150

OK, so this actually does nothing, because paths is set to lower in io_utils.py _read (see below).
So some duplication on my part. Not sure where it's more logical to do this.
io_utils.py: _read or read_get_reader
or here in _plugin_manager (deleting the changes in io_utils)

OK, I avoid the duplication now. lower() is used in _read only; I think this makes more sense.

OK, that was annoying because _read can take a str or a list.
I put the check back in _plugin_manager, in the correct _iter_compatible_readers
(sorry made a mess of the commits!)

Czaki · 2023-04-01T20:00:57Z

npe2/_plugin_manager.py

-            yield from {r for pattern, r in self._readers if fnmatch(path, pattern)}
+            # match against pattern.lower() to make matching case insensitive
+            yield from {
+                r for pattern, r in self._readers if fnmatch(path, pattern.lower())


Suggested change

-                r for pattern, r in self._readers if fnmatch(path, pattern.lower())
+                r for pattern, r in self._readers if fnmatch(path, pattern.lower()) or fnmatch(path, pattern.upper())

huh, instead of doing .lower() on paths in iter_compatible_readers?
clever.

But what if there's mixed case?
like file.Jpg

Maybe https://docs.python.org/3/library/fnmatch.html#fnmatch.translate and run re.match with https://docs.python.org/3/library/re.html#re.IGNORECASE ?

seems really complex, is it better than just lower casing for the comparison?

you mean fnmatch(path.lower(), pattern.lower())?

yea. it's the simplest and I think it should work...
(not sure why test is failing... napari-ndtiffs was just updated though 😬)
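To compare the options raised in this thread, here is a small sketch using the mixed-case example file.Jpg. It uses fnmatch.fnmatchcase rather than fnmatch.fnmatch purely so the result does not depend on the operating system (fnmatch.fnmatch applies os.path.normcase first); that substitution is my assumption for the demo, not what the PR does.

```python
import re
from fnmatch import fnmatchcase, translate

path = "file.Jpg"
pattern = "*.jpg"

# Plain case-sensitive match.
strict = fnmatchcase(path, pattern)

# Suggested change: try the pattern in lower case, then in upper case.
lower_or_upper = fnmatchcase(path, pattern.lower()) or fnmatchcase(
    path, pattern.upper()
)

# Lowercase both sides for the comparison only.
both_lowered = fnmatchcase(path.lower(), pattern.lower())

# Translate the glob to a regex and match with re.IGNORECASE.
via_regex = re.match(translate(pattern), path, re.IGNORECASE) is not None

print(strict, lower_or_upper, both_lowered, via_regex)
```

Only the last two approaches handle the mixed-case extension; lowercasing both sides is the shorter of the two.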

Ok, using path.lower(), pattern.lower() in the fnmatch works in napari, but not in the existing tests i mimicked, because, as far as I can tell, the sample_plugin hard-codes lower case extension in the reader function:

npe2/tests/sample/my_plugin/__init__.py

Lines 24 to 41 in d7329b7

def get_reader(path: PathOrPaths):
    if isinstance(path, list):

        def read(path):
            assert isinstance(path, list)
            return [(None,)]

        return read
    assert isinstance(path, str)  # please mypy.
    if path.endswith(".fzzy"):

        def read(path):
            assert isinstance(path, str)
            return [(None,)]

        return read
    else:
        raise ValueError("Test plugin should not receive unknown data")

My tests error with E ValueError: Test plugin should not receive unknown data so I think this is actually the expected result of everything working properly.
By using .lower() only in the fnmatch the plugin gets the real path and in this case, errors. On the other hand, making the actual path have ext.lower() as before, would make sample_plugin handle this case and the new tests would pass.
So options are to assert something else, go back to .lower() on the path, or use .lower() in sample_plugin to make the actual plugin also case insensitive.

I think I lean towards option 1 or option 3. npe2 should be case-insensitive, but if a plugin is actually case sensitive for some reason, then it should be allowed to raise an error?

Hmm I don't think we can/should distinguish between path being case insensitive and pattern being case insensitive. I think our reading in general should be either case sensitive or case insensitive. I think the same should be true of plugins (so, plugins should be case insensitive), but the get_reader functions complicate things a little, since they may be checking things about the path that are not true if we try to enforce case insensitivity by passing path.lower().

I think we should make paths fully case insensitive for reading so:

match pattern.lower() against ext.lower()

we always pass through the unchanged path because I think that makes most sense

add docs to contribution guide for readers to advise that pattern matching will be case insensitive

and say that we encourage readers to also be case insensitive

but if you desperately need case sensitivity you could check in your get_reader function (but I don't think we even mention this tbh)

we update the cookiecutter reader to recommend case insensitivity
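A rough sketch of the behaviour proposed in the list above: match case-insensitively, but hand the plugin the unchanged path. `iter_matching_readers` and `strict_get_reader` are hypothetical names for illustration, the reader list is a simplified stand-in for the plugin manager's internal state, and for simplicity the whole path (not just the extension) is lowercased for the comparison.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterator, List, Tuple

Reader = Callable[[str], object]


def iter_matching_readers(
    path: str, readers: List[Tuple[str, Reader]]
) -> Iterator[Reader]:
    """Yield each reader whose pattern matches ``path``, ignoring case."""
    lowered = path.lower()  # used for the comparison only, never passed on
    seen = set()
    for pattern, reader in readers:
        if reader not in seen and fnmatchcase(lowered, pattern.lower()):
            seen.add(reader)
            yield reader


def strict_get_reader(path: str):
    # Hypothetical case-sensitive plugin, modelled on the sample plugin above.
    if path.endswith(".fzzy"):
        return lambda p: [(None,)]
    raise ValueError("Test plugin should not receive unknown data")


readers = [("*.fzzy", strict_get_reader)]
path = "example.FZZY"
matched = list(iter_matching_readers(path, readers))

try:
    matched[0](path)  # the plugin sees the original, upper-case path
    plugin_error = None
except ValueError as err:
    plugin_error = str(err)
```

Here the reader is selected despite the upper-case extension, and the case-sensitive plugin is still free to reject the real path.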

I had a look through all the filename_patterns being declared. Only 5 out of 92 readers declare any pattern with capital letters in it:

napari-annotatorj: ['<EDIT_ME>']
napari-deepfinder: ['*.mrc', '*.map', '*.rec', '*.h5', '*.tif', '*.TIF', '*.xml', '*.ods', '*.xls', '*.xlsx']
napari-pdr-reader: ['*.fits', '*.FITS', '*.lbl', '*.img', '*.LBL', '*.IMG']
napari-rioxarray: ['*.vrt', '*.tif', '*.tiff', '*.TIF', '*.TIFF', '*.img', '*.lbl', '*.cub', '*.fits', '*.IMG', '*.LBL', '*.CUB', '*.FITS']
napari-tomocube-data-viewer: ['*.TCF']

The first is clearly a placeholder. napari-deepfinder explicitly makes sure it's case insensitive for TIFs, napari-pdr-reader also makes its patterns case insensitive, and so does napari-rioxarray for all but one of the extensions - but it then goes through and lowercases the path as it comes in anyway. I couldn't find the repo for napari-tomocube-data-viewer so not sure if it truly is case sensitive or not. Anyway, given the above, I would say we adopt case insensitivity for both patterns and paths, and update docs accordingly.

psobolewskiPhD · 2023-04-01T22:01:04Z

Test fails are same as this PR: #276
I think it may be related to napari-ndtiffs being updated?
https://github.com/tlambert03/napari-ndtiffs
Will try to test some tomorrow

Czaki · 2023-04-01T22:27:46Z

I think it may be related to napari-ndtiffs being updated?

Yes, I think it is related.

The best option would be to create a separate PR with the fix.

@tlambert03 what do you think is best: pin napari-ndtiffs to an older release, or remove this test?

tlambert03 · 2023-04-02T01:17:10Z

Ah, hah!
Sorry about that, who'd have thought updating a package to npe2 would break npe tests 😂 (I know I know, my own damn fault, it was a shortcut for testing the npe1 adapter). I don't have a preference for how it's fixed. Both pinning or deleting the test sound like a good idea... a fixture is probably the "best" but more laborious. So maybe just pin

psobolewskiPhD · 2023-04-06T20:07:42Z

OK, I may have totally botched this with trying to reconcile the src change in conflicts...

...but I think I have it now how we discussed:

ext.lower() is used for comparisons to the pattern
but the actual path is passed to the plugin
So as a result I made the tests expect the ValueError from sample plugin.

DragaDoncila

This looks good @psobolewskiPhD, and I checked that it works! I left a comment about making the tests more explicit. Also I think we should update the contribution docs here to mention the case insensitivity - I'm pretty sure these get built into the reference on napari.org

Czaki

We should create at least the following tests and get working code for this:

some_directory_name.FINAL, some_directory_name.Final - should return directory reader
some_zarr_directory.zarr, some_zarr_directory.ZARR, some_zarr_directory.Zarr should return zarr reader (test may need to implement dummy reader).
some_two_ext_file.tar.gz, some_two_ext_file.TAR.gz, some_two_ext_file.TAR.GZ, etc

The current solution will not properly handle any of these cases.
The only tests added in this PR do not cover them, so they will not prevent this use case from breaking in the future.

psobolewskiPhD · 2023-04-29T21:39:41Z

I redid the tests per @DragaDoncila and @Czaki, which revealed that double extensions were not handled (only the last one was). So I switched to pathlib.Path.suffixes.
Further, URIs were also not handled, so I now check urllib.parse.urlparse(path).scheme first, before modifying the path.

I think this covers everything?
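A sketch of how those two pieces might fit together. This is an illustration, not the code merged in the PR: `lower_suffixes` is a hypothetical name, and treating a one-letter scheme as a Windows drive letter rather than a URI is my own assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def lower_suffixes(path: str) -> str:
    """Lowercase every extension of a local path; leave URIs untouched."""
    scheme = urlparse(path).scheme
    if len(scheme) > 1:  # assumption: 1-letter "schemes" are drive letters
        return path

    p = Path(path)
    suffixes = "".join(p.suffixes)  # e.g. ".TAR.GZ"
    if not suffixes:
        return path

    stem = p.name[: -len(suffixes)]  # file name without any extension
    return str(p.with_name(stem + suffixes.lower()))


print(lower_suffixes("Stack.TAR.GZ"))
print(lower_suffixes("https://example.com/Data.ZARR"))
```

One caveat of Path.suffixes is that it treats every dot-separated piece of the file name as an extension, so a name like Image.Final.TIF would have ".Final" lowercased as well.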

Czaki · 2023-04-29T21:44:53Z

Why does the .zarr test not include accepts_directories when the .final one does?

psobolewskiPhD · 2023-04-30T09:32:21Z

Why does the .zarr test not include accepts_directories when the .final one does?

Because I forgot? 😅
Fixed, I think.

DragaDoncila

Thanks for the additional tests @psobolewskiPhD this looks good to me! I'm not sure longer term if we want to make all paths totally case insensitive (rather than just the extension), but I think for now the extensions alone is a good start. I think it's more complex once you move out to the whole path because it's OS and sometimes even program dependent.

Czaki

The tests are not only there to validate the current implementation, but mainly to prevent a regression from being introduced in a later commit.

I added a suggestion to create dummy dirs in the test.

Maybe we should also create dummy tiff and tar.gz files.

psobolewskiPhD · 2023-05-03T14:04:51Z

tests/test__io_utils.py

+def test_read_uppercase_extension(tmp_path: Path):
    pm = PluginManager()
    plugin = DynamicPlugin("tif-plugin", plugin_manager=pm)
+
    path = "something.TIF"
+    mock_file = tmp_path / path
+    mock_file.touch()

    # reader should be compatible despite lowercase pattern
    @plugin.contribute.reader(filename_patterns=["*.tif"])
-    def get_read(path):
+    def get_read(path=mock_file):


Is this the sort of thing you had in mind @Czaki ?

used the same below for tar.gz
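The idea of backing the tests with real files and directories can be sketched without pytest as follows. Here tempfile stands in for pytest's tmp_path fixture, and `matches` is a hypothetical stand-in for the matching step, not an npe2 function.

```python
import os
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def matches(path: str, pattern: str) -> bool:
    # Hypothetical matching step: compare lowercased strings only.
    return fnmatchcase(path.lower(), pattern.lower())


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # A real file with an upper-case extension.
    mock_file = tmp_path / "something.TIF"
    mock_file.touch()

    # A real directory with an upper-case extension.
    mock_dir = tmp_path / "some_zarr_directory.ZARR"
    mock_dir.mkdir()

    # The checks hit the real, unmodified paths on disk.
    file_ok = mock_file.is_file() and matches(str(mock_file), "*.tif")
    dir_ok = os.path.isdir(mock_dir) and matches(str(mock_dir), "*.zarr")

print(file_ok, dir_ok)
```

Because the paths exist on disk with their original case, a later change that lowercases the whole path before calling isdir would be caught on case-sensitive filesystems.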

Czaki

looks great

psobolewskiPhD requested review from DragaDoncila and nclack March 24, 2023 20:28

Czaki reviewed Mar 27, 2023

Czaki reviewed Apr 1, 2023

psobolewskiPhD mentioned this pull request Apr 2, 2023

Fix tests: use npe1 version (0.1.2) of napari-ndtiffs #277

Merged

DragaDoncila mentioned this pull request Apr 5, 2023

Add specific error when reader plugin was chosen but failed #276

Merged

psobolewskiPhD and others added 5 commits April 6, 2023 21:45

.lower() to make paths & pattern fnmatch case insensitive

e092d33

Don't set paths.lower() twice.

7cfee34

style: [pre-commit.ci] auto fixes [...]

d6d1abe

Go back to checking in iter_compatible_plugins

579f79f

Update tests, lower() for extention comparison

b140215

psobolewskiPhD force-pushed the bugfix/make_readers_case_insensitive branch from 20c2f58 to b140215 Compare April 6, 2023 20:05

style: [pre-commit.ci] auto fixes [...]

226e0b2

psobolewskiPhD and others added 3 commits April 6, 2023 22:10

fix merge leftover in tests

c85fbf2

Merge branch 'main' into bugfix/make_readers_case_insensitive

4944d5c

Merge branch 'main' into bugfix/make_readers_case_insensitive

7c0a7f3

DragaDoncila reviewed Apr 12, 2023

Czaki requested changes Apr 12, 2023

psobolewskiPhD and others added 3 commits April 17, 2023 21:54

Merge branch 'main' into bugfix/make_readers_case_insensitive

26c49e7

Tests per @DragaDoncila

6474ed1

handle double extentions per @Czaki

ensure not modifying URI

ef402ba

zarr test should use accepts_directories=True

70c1498

improve readability of URI check

2e55a5e

DragaDoncila approved these changes May 2, 2023

Czaki reviewed May 3, 2023

Czaki reviewed May 3, 2023

psobolewskiPhD and others added 7 commits May 3, 2023 15:39

use tmp_path in zarr test tests/test__io_utils.py

c787a33

Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>

Update tests/test__io_utils.py

5285166

Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>

Use tmp_path in directory test in tests/test__io_utils.py

ef01372

Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>

Update tests/test__io_utils.py

e207687

Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>

Drop pattern from directory reader tests/test__io_utils.py

abbe752

Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>

style: [pre-commit.ci] auto fixes [...]

8ad67ad

Mock files for TIFF and tar.gz

7848d90

psobolewskiPhD commented May 3, 2023

Czaki approved these changes May 3, 2023

psobolewskiPhD added the labels bug (Something isn't working), enhancement (New feature or request), and tests (related to testing or CI), and removed the enhancement label, May 3, 2023

Merge branch 'main' into bugfix/make_readers_case_insensitive

e21264d

Czaki merged commit b33bc09 into napari:main May 7, 2023
16 checks passed
