Add searcher add archive #229

kelvinhammond · 2025-07-30T21:53:07Z

No description provided.

kelson42 · 2025-07-31T08:28:40Z

@kelvinhammond Thank you for your PR! Can you please add a small description to the PR? Does it fix a known/tracked issue?

rgaudin

@kelvinhammond thank you for your PR.
As @kelson42 wrote, it's important that non-trivial contributions are explained, and justified in this case since there is no existing ticket.

As a new hidden feature, expanding the README example would also be welcome.

On the PR itself, it can't be merged ATM because the usage is too confusing. As you have probably noticed, getResults() yields entry paths alone. That's fine for single-ZIM search but that's terrible for multi-ZIM ones. How do you expect to use it? Checking each result over the two ZIMs?

So we need to change the API to work differently. This kind of discussion would have been preferred on a ticket but it can happen here.
What changes do you suggest?
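To illustrate the concern about path-only results, here is a minimal sketch of the "check each result over every ZIM" workaround. It is not part of the PR. `locate_results` and `FakeArchive` are hypothetical names; the only method used, `has_entry_by_path`, is assumed to behave like the one on python-libzim's `Archive`, and the stand-in class exists only so the sketch runs without ZIM files.

```python
from typing import Dict, Iterable, List, Protocol


class HasEntryLookup(Protocol):
    """Anything exposing has_entry_by_path (assumed to match libzim's Archive)."""

    def has_entry_by_path(self, path: str) -> bool: ...


def locate_results(
    paths: Iterable[str], archives: Dict[str, HasEntryLookup]
) -> Dict[str, List[str]]:
    """Map each result path to the names of the archives that contain it.

    A path found in several archives stays ambiguous: the search result
    alone does not say which archive produced the hit.
    """
    return {
        path: [name for name, arc in archives.items() if arc.has_entry_by_path(path)]
        for path in paths
    }


class FakeArchive:
    """Hypothetical stand-in for an Archive, for demonstration only."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._paths = set(paths)

    def has_entry_by_path(self, path: str) -> bool:
        return path in self._paths


archives = {
    "first.zim": FakeArchive({"A/Main_Page", "A/Only_In_First"}),
    "second.zim": FakeArchive({"A/Main_Page", "A/Only_In_Second"}),
}
located = locate_results(["A/Main_Page", "A/Only_In_Second"], archives)
```

Each result costs one lookup per archive, and a path present in both archives (here `A/Main_Page`) cannot be attributed to either one, which is why returning bare paths does not work well for multi-ZIM search.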

rgaudin · 2025-07-31T08:51:15Z

libzim/libzim.pyx


        self.c_searcher = move(zim.Searcher(archive.c_archive))

+    def addArchive(self, object archive: Archive) -> Searcher:


This doesn't respect our code formatting (casing). Please use add_archive() instead.

rgaudin · 2025-07-31T08:51:30Z

libzim/search.pyi


 class Searcher:
    def __init__(self, archive: Archive) -> None: ...
+    def addArchive(self, archive: Archive) -> Searcher: ...


rgaudin · 2025-07-31T08:51:45Z

setup.py

        if self.download_libzim:
            print("removing downloaded libraries")
-            for fpath in self.dylib_file.parent.glob("*.[dylib|so|dll|lib|pc]*"):
+            for fpath in self.dylib_file.parent.glob("*.[dylib|so|dll|pc]*"):


Why would you change that?
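For context on this question, a small check of how the two patterns behave. It assumes the change was meant to stop `clean` from deleting `libzim.py*` files, as the second commit message suggests. In glob syntax, `[...]` is a character class, not an alternation, so both patterns match any name with a `.` followed by one of the listed characters. The sketch uses `fnmatch.fnmatchcase` from the standard library as a stand-in for the glob matching; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "*.[dylib|so|dll|lib|pc]*"
NEW_PATTERN = "*.[dylib|so|dll|pc]*"

# Hypothetical file names, chosen only to exercise the patterns.
names = [
    "libzim.so.9",
    "libzim.dylib",
    "zim.dll",
    "libzim.pc",
    "libzim.pyx",
    "libzim.pyi",
    "libzim.txt",
]


def matched(pattern: str) -> list:
    """Return the names matched by a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


def bracket_chars(pattern: str) -> set:
    """Return the set of characters inside the pattern's [...] class."""
    return set(pattern[pattern.index("[") + 1 : pattern.index("]")])


old_hits = matched(OLD_PATTERN)
new_hits = matched(NEW_PATTERN)
same_class = bracket_chars(OLD_PATTERN) == bracket_chars(NEW_PATTERN)
```

Under these assumptions both character classes contain the same characters, so removing `lib` changes nothing, and both patterns still match `libzim.pyx` and `libzim.pyi` because `p` is in the class.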

rgaudin · 2025-07-31T08:52:39Z

tests/test_libzim_reader.py

    # download libzim tests
    for url in libzim_urls:
-        urlretrieve(url, temp_dir / os.path.basename(url))  # noqa: S310  # nosec
+        path = temp_dir / os.path.basename(url)


This has nothing to do with this PR. Make a separate one

rgaudin · 2025-07-31T08:54:54Z

tests/test_libzim_reader.py

+@skip_if_offline
+def test_reader_search_multiple_zims(all_zims):
+    """Test search across multiple ZIMs"""
+    search_count_zimfile = 1


Lacks a comment mentioning the expected query and results, so that this can be investigated manually the day it breaks.
It can probably sit next to the query as well.

kelvinhammond · 2025-08-04T23:41:43Z

> On the PR itself, it can't be merged ATM because the usage is too confusing. As you have probably noticed, getResults() yields entry paths alone. That's fine for single-ZIM search but that's terrible for multi-ZIM ones. How do you expect to use it? Checking each result over the two ZIMs?

> So we need to change the API to work differently. This kind of discussion would have been preferred on a ticket but it can happen here. What changes do you suggest?

Yeah, I realized this afterwards and rewrote it in JavaScript to use the node-libzim library.
We can close this PR, as there currently isn't a way to access the iterator, which would tell us which ZIM file each result belongs to.

rgaudin · 2025-08-05T10:22:38Z

I know the feeling 😣
The API design looks like a mistake or shortcut from the early days of the bindings. It's indeed convenient to return the path as that's the general usage but I'll open a ticket as this is an artificial limitation of the binding.

kelvinhammond added 2 commits July 30, 2025 17:50

Added searcher.addArchive

c5c9e07

Fixed: setup.py clean deletes way libzim.py* files

e37881b

kelson42 requested a review from rgaudin July 31, 2025 08:27

rgaudin requested changes Jul 31, 2025

kelvinhammond closed this Aug 4, 2025

rgaudin mentioned this pull request Aug 5, 2025

Multiple archive searcher #230

Open
