Define some endpoints for metadata (and other) #631

Closed
mgautierfr opened this issue Nov 4, 2021 · 6 comments · Fixed by #646

@mgautierfr
Member

Following my comment here #626 (comment) I would like to open a discussion on what we need to serve and how.

I think there are two different use cases:

  • One is to get information about books available in the catalog (illustration and other metadata)
  • One is to get information about what is in the zim file, for QA or other purposes (this seems to be what is wanted in "Expose counter on meta endpoint" #616)

It seems that we already have a way to get information about a book: with the recent implementation of partial entries (#602), we can get all the metadata of a book by simply asking the catalog for it.
What is missing is an entry point to download the illustration (from what we have in the library/catalog).
I propose the entry point /catalog/v2/illustration/<bookId>?size=<size>
The size parameter sets both the width and the height, assuming scale=1. If (when) we extend illustrations to non-square shapes and different scales, we will add the corresponding parameters.
Illustration would be taken from what is in the catalog (in library.xml if kiwix-serve is started with one), never from the zim file.
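
To make the proposed URL shape concrete, here is a minimal client-side sketch in Python. The helper name `illustration_url`, the base URL and the default size of 48 are assumptions for illustration only; only the path `/catalog/v2/illustration/<bookId>` and the `size` query parameter come from the proposal above.

```python
from urllib.parse import quote, urlencode


def illustration_url(base_url: str, book_id: str, size: int = 48) -> str:
    """Build a URL for the proposed illustration endpoint.

    `size` is used for both width and height (scale=1 is implied).
    The default of 48 is an assumption, not part of the proposal.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Percent-encode the book id so it is safe as a single path segment.
    path = f"/catalog/v2/illustration/{quote(book_id, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'size': size})}"
```

For example, `illustration_url("http://localhost:8080", "abc-123")` would give a URL ending in `/catalog/v2/illustration/abc-123?size=48`.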

So the /meta endpoint is OK to read the zim file, as it corresponds to the second use case.
But if we recreate it, I would like to take the opportunity to go a bit further.
As the "meta" endpoint will return the metadata unchanged, we can extend it to also return any raw content/item in the zim file.
Something like:

  • /raw/<zimName>/meta/<metaName> to get the metadata in the zim file. If present. Unchanged. No fallback.
  • /raw/<zimName>/content/<contentPath> to get the content of an entry in the zim file. If present. Unchanged (no chrome, no header bar added, no URL rewriting). No fallback (at least not more than what libzim does internally to handle compatibility)
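
As a rough sketch of how a client might address these two proposed endpoints, the Python helpers below build the URLs. The function names, the base URL and the sample zim name are hypothetical; only the `/raw/<zimName>/meta/<metaName>` and `/raw/<zimName>/content/<contentPath>` layouts come from the list above.

```python
from urllib.parse import quote


def raw_meta_url(base_url: str, zim_name: str, meta_name: str) -> str:
    """URL of a metadata item, served unchanged and without fallback."""
    return (
        f"{base_url.rstrip('/')}/raw/{quote(zim_name, safe='')}"
        f"/meta/{quote(meta_name, safe='')}"
    )


def raw_content_url(base_url: str, zim_name: str, content_path: str) -> str:
    """URL of an entry's content, served unchanged (no chrome, no rewriting)."""
    # Keep "/" unescaped: a content path may contain several segments.
    return (
        f"{base_url.rstrip('/')}/raw/{quote(zim_name, safe='')}"
        f"/content/{quote(content_path, safe='/')}"
    )
```

With these helpers, reading the Counter metadata would be a plain GET on `raw_meta_url(base, zim_name, "Counter")`.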

The /raw/<zimName>/content/ endpoint may help with warc2zim, where the ServiceWorker needs to access the raw content as it adds its own chrome. (openzim/warc2zim#16 #403)

What do you think about this?
We don't have to implement everything at the same time. But /catalog/v2/illustration and /raw/<zimName>/meta/ (if we agree on them) are needed soon.

@rgaudin
Member

rgaudin commented Nov 4, 2021

Interesting!

#616 was about restoring access to information we used to get. Gentle reminder, whenever you expose something, people will start relying on it.

Behind #616 is a dead-simple access to metadata of a book/zim. I suppose we could return what's in the catalog and users would be happy. Counter is probably not going to be rewritten in the CMS anyway…
What it means for previous users is that, where they were just building/typing a URL (using the path/name) and getting the information directly, they will now have to query and parse the OPDS stream. Fine by me but @kelson42 was also using it for quick checks.

Proposed illustration endpoint looks good to me 👍

On the /raw/<x>/content endpoint, I'd be happy to have it as it would turn kiwix-serve into an easy ZIM debugger (faster than zimdump). But since kiwix-serve is user-centric, I don't think it's a wise choice:

  • once we introduce it, it will be difficult to remove it.
  • warc2zim shouldn't rely on it as this would be HTTP-server specific and would force out-of-ZIM URL behaviors on other readers.
  • taskbar is a pain ATM and offering this will surely push integrators into using that endpoint with their own chrome/iframe. We should probably fix that before opening that road

@mgautierfr
Member Author

Gentle reminder, whenever you expose something, people will start relying on it.

Yes. And this is why I discuss this here before creating/updating the endpoint.

Fine by me but @kelson42 was also using it for quick checks.

What kind of check? The fact that M/Counter was available doesn't mean that we must re-add it blindly.
It is better to know what you want to check and decide how to do it (potentially as before) than to simply reimplement an old usage because "it was there before".

On the /raw/<x>/content endpoint, I'd be happy to have it as it would turn kiwix-serve into an easy ZIM debugger (faster than zimdump). But since kiwix-serve is user-centric, I don't think it's a wise choice:

  • once we introduce it, it will be difficult to remove it.

Maybe we can serve the endpoint only with a --debug-endpoint option, explicitly stating that the API is not stable and can be changed/removed at any time.
(So it is nice for manual debugging, but don't write scripts using it.)

  • warc2zim shouldn't rely on it as this would be HTTP-server specific and would force out-of-ZIM URL behaviors on other readers.

You're right. warc2zim already relies too much on the HTTP server; we should not make things worse by making it use an absolute URL.

  • taskbar is a pain ATM and offering this will surely push integrators into using that endpoint with their own chrome/iframe. We should probably fix that before opening that road

I'm not sure I understand whether it is a good thing or not.
Do we want to prevent integrators from going with their own chrome?

@rgaudin
Member

rgaudin commented Nov 5, 2021

What kind of check? The fact that M/Counter was available doesn't mean that we must re-add it blindly. It is better to know what you want to check and decide how to do it (potentially as before) than to simply reimplement an old usage because "it was there before".

Can't say. I know @kelson42 is used to that URL for reading the Counter as a quick zim check alternative. I want to make sure he understands that you are proposing to remove that easy way so it doesn't come as a surprise.

Maybe we can serve the endpoint only with a --debug-endpoint option, explicitly stating that the API is not stable and can be changed/removed at any time. (So it is nice for manual debugging, but don't write scripts using it.)

  • taskbar is a pain ATM and offering this will surely push integrators into using that endpoint with their own chrome/iframe. We should probably fix that before opening that road

I'm not sure I understand whether it is a good thing or not. Do we want to prevent integrators from going with their own chrome?

Good question for which I have no answer. It's more of a product one than a technical one.

What should be considered though is that our UI has issues and if we allow this raw access, it will come as a relief for integrators and will surely lead them into investing time into building different chromes. Once they do, we'll have difficulties getting them back to ours if we want to, and we'll be pressured into backporting their features (which can be positive but is resource-consuming).

I'm in favor of having it, but it should be thought through and done via a clear, maintained endpoint and not just some debug one that everybody ends up relying on.

As you suggested earlier, a step-by-step approach is probably for the best.

@kelson42
Collaborator

kelson42 commented Nov 21, 2021

Finally giving some feedback on this. That said, I'm still not sure I fully understand everything, so please pardon me if I make silly remarks:

  • The basic analysis by @mgautierfr which led to this ticket seems pertinent to me. Two different problems lead to two different solutions for retrieving information. Makes sense.
  • /raw/ proposal LGTM. I see no potential bad side effect of it. We should just make it clear this is experimental for the moment if we are not sure about our move. Should be easy to implement, so let's try it. I don't see the value of /raw/ for the content (C namespace), seems redundant with just / to me.
  • If we have a generic /raw/meta, the Counter will be available right? Automatically?
  • If we have /raw/ I don't see why we should keep /meta (besides backward compatibility).
  • Similarly for /catalog/v2/illustration/. What is the added value compared to /raw/? OPDS delivers the list of content URLs anyway... I am probably missing something here.

@mgautierfr
Member Author

I don't see the value of /raw/ for the content (C namespace), seems redundant with just / to me.

They answer two different questions:

  • /raw/<zim>/content/<foo> (the simpler one) answers the question "What is the content of entry <foo> in the <zim> file?"
  • /<zim>/<foo> answers the question "Return me something to display the content of entry <foo> (from the <zim> file) correctly in a browser". We don't have to return the original content (and we don't, we are adding a search bar) and we could even not return the content at all (we could instead return a small JS that would load the content).
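
The distinction can be illustrated with a toy Python model in which a zim file is just a mapping from entry path to bytes. Everything here is hypothetical (the function names, the `<body>` marker and the taskbar snippet); it does not describe how kiwix-serve actually injects its chrome, only the contrast between "unchanged" and "prepared for display".

```python
from typing import Mapping, Optional

# Placeholder chrome; the real taskbar markup is not specified in this thread.
TASKBAR = b"<div id='taskbar'></div>"


def serve_raw(entries: Mapping[str, bytes], path: str) -> Optional[bytes]:
    """Answer "what is the content of this entry?": bytes unchanged, or None."""
    return entries.get(path)


def serve_for_browser(entries: Mapping[str, bytes], path: str) -> Optional[bytes]:
    """Answer "give me something displayable": the server may alter the content."""
    content = entries.get(path)
    if content is None:
        return None
    marker = b"<body>"
    if marker in content:
        # Insert the chrome right after the first <body> tag.
        return content.replace(marker, marker + TASKBAR, 1)
    return content
```

In this model, `serve_raw` is a pure lookup, while `serve_for_browser` is free to change or wrap what it returns.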

If we have a generic /raw/meta, the Counter will be available right? Automatically?

Yes, raw endpoints return the content in the zim file, without any filtering, fallback or content patching.

If we have /raw/ I don't see why we should keep /meta (besides backward compatibility).

We agree, unless we want to introduce some kind of compatibility or content management: we cannot use raw for that and would need another endpoint.

Similarly for /catalog/v2/illustration/. What is the added value compared to /raw/? OPDS delivers the list of content URLs anyway...

In this case, we are adding a compatibility layer. illustration will use the favicon in "old" zim files: it will return something even if there is no /M/Illustration_* entry (as long as there is a /-/favicon).
/catalog/v2/illustration is not raw: there is a preprocessing stage, so we need an endpoint other than raw.
OPDS delivers the content URL, but it is based on the compatibility layer of libzim itself. So when we want to access the illustration, we must also go through libzim's compatibility layer; we cannot access the metadata directly, as it may not actually exist.
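
A toy Python sketch of this fallback, again modelling a zim file as a mapping from path to bytes. The function names are hypothetical and the exact metadata name `Illustration_<size>x<size>@1` is an assumption standing in for the `/M/Illustration_*` pattern mentioned above; the real logic lives in libzim and may differ.

```python
from typing import Mapping, Optional


def raw_meta(entries: Mapping[str, bytes], name: str) -> Optional[bytes]:
    """Raw access: return the metadata only if it really exists. No fallback."""
    return entries.get(f"M/{name}")


def illustration(entries: Mapping[str, bytes], size: int = 48) -> Optional[bytes]:
    """Compatibility access: prefer the illustration entry, else the old favicon."""
    # Assumed naming scheme for the /M/Illustration_* entries.
    modern = entries.get(f"M/Illustration_{size}x{size}@1")
    if modern is not None:
        return modern
    # "Old" zim files only carry a favicon.
    return entries.get("-/favicon")
```

For an old zim file containing only `-/favicon`, `raw_meta` finds nothing while `illustration` still returns an image, which is why the illustration endpoint cannot simply be a raw one.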

@kelson42
Collaborator

@mgautierfr Thanks, I still don't understand all the details but I trust you. No problem with keeping /meta if needed and having a dedicated illustration API if needed. But we absolutely need to document all of this properly, wherever that ends up. We need API/ABI documentation like we have for libzim.

5 participants