Skip to content

Conversation

kelvinhammond
Copy link

No description provided.

@kelson42 kelson42 requested a review from rgaudin July 31, 2025 08:27
@kelson42
Copy link
Contributor

kelson42 commented Jul 31, 2025

@kelvinhammond Thank you for your PR! Can you please add a small description to the PR? Does it fix a known/tracked issue?

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kelvinhammond thank you for your PR.
As @kelson42 wrote, it's important that non-trivial contributions are explained ; justified even in this case as there is no exiting ticket.

As a new hidden feature, expending the README example would also be welcome.

On the PR itself, it can't be merged ATM because the usage is too confusing. As you have probably noticed, getResults() yields entry paths alone. That's fine for single-ZIM search but that's terrible for multi-ZIM ones. How do you expect to use it? Checking each result over the two ZIMs?

So we need to change the API to work differently. This kind of discussion would have been preferred on a ticket but it can happen here.
What changes do you suggest?

self.c_searcher = move(zim.Searcher(archive.c_archive))
def addArchive(self, object archive: Archive) -> Searcher:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't respect our code formatting (casing). Please use add_archive() instead


class Searcher:
def __init__(self, archive: Archive) -> None: ...
def addArchive(self, archive: Archive) -> Searcher: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

if self.download_libzim:
print("removing downloaded libraries")
for fpath in self.dylib_file.parent.glob("*.[dylib|so|dll|lib|pc]*"):
for fpath in self.dylib_file.parent.glob("*.[dylib|so|dll|pc]*"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you change that?

# download libzim tests
for url in libzim_urls:
urlretrieve(url, temp_dir / os.path.basename(url)) # noqa: S310 # nosec
path = temp_dir / os.path.basename(url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has nothing to do with this PR. Make a separate one

@skip_if_offline
def test_reader_search_multiple_zims(all_zims):
"""Test search across multiple ZIMs"""
search_count_zimfile = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lacks a comment mentioning the expected query and results so this can be investigated manually the day this will break.
Can probably sit next to the query as well

@kelvinhammond
Copy link
Author

On the PR itself, it can't be merged ATM because the usage is too confusing. As you have probably noticed, getResults() yields entry paths alone. That's fine for single-ZIM search but that's terrible for multi-ZIM ones. How do you expect to use it? Checking each result over the two ZIMs?

So we need to change the API to work differently. This kind of discussion would have been preferred on a ticket but it can happen here. What changes do you suggest?

Yeah, I realized this afterwords and rewrote it in Javascript to use the node-libzim library.
We can close this PR as there isn't currently a way to get access to the iterator which would tell us which zim file the result belongs to.

@rgaudin
Copy link
Member

rgaudin commented Aug 5, 2025

I know the feeling 😣
The API design looks like a mistake or shortcut from the early days of the bindings. It's indeed convenient to return the path as that's the general usage but I'll open a ticket as this is an artificial limitation of the binding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants