Feature request: Add metadata support #316

Open
dnohales opened this issue Apr 10, 2020 · 3 comments

@dnohales

The idea is that each article could have metadata associated with it, in the form of key-value pairs or simply a JSON file.

Our use case for this includes:

  • Get information about a video, audio or image, like dimensions, bitrate, codec, etc.
  • Specify a license for each article.
  • Specify a synopsis for each article (some text we can show when listing the articles).
  • The title and MIME type could also live here, falling back to Article::getTitle() and Article::getMimeType() when absent.
  • Have a structured representation of the table of contents of an article, so we can show it in a specific UI outside the webview rendering the article.
  • Specify a URL with the thumbnail of the article.

More use cases will likely come up for me and other users, which is why I believe we should store this in a flexible, JSON-like format. Alternatively, we could agree on some standardized metadata keys and leave the rest extensible.
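
To make the request more concrete, here is a minimal sketch of the key-value variant with the fallback behavior described above. Everything in it is an assumption for illustration: `ArticleStub` is a hypothetical stand-in (not the real libzim `Article` class), and the key names `"title"` and `"mimetype"` are only examples of what standardized keys might look like.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for an article; NOT the real libzim class.
struct ArticleStub {
    std::string title;     // what Article::getTitle() would return
    std::string mimeType;  // what Article::getMimeType() would return
    // Free-form metadata: a few standardized keys plus arbitrary extras.
    std::map<std::string, std::string> metadata;
};

// Returns the metadata value stored under `key`, or `fallback` if absent.
std::string metadataOr(const ArticleStub& article,
                       const std::string& key,
                       const std::string& fallback) {
    auto it = article.metadata.find(key);
    return it == article.metadata.end() ? fallback : it->second;
}

// Title from metadata, falling back to the article's own title.
std::string effectiveTitle(const ArticleStub& article) {
    return metadataOr(article, "title", article.title);
}

// MIME type from metadata, falling back to the article's own MIME type.
std::string effectiveMimeType(const ArticleStub& article) {
    return metadataOr(article, "mimetype", article.mimeType);
}
```

Keys such as a license, synopsis, or thumbnail URL would be read the same way through `metadataOr`, with an empty string (or whatever default the caller prefers) as the fallback.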

@kelson42
Contributor

kelson42 commented Apr 13, 2020

@mgautierfr Would you agree in principle to defining a new namespace for article metadata? Do you think storing things in JSON is a good idea? Wouldn't it be better to have something at another level that would allow searching/filtering?

@data-man
Contributor

Two proposals:

  • add article_created (UNIX time)
  • add article_modified (UNIX time)

E.g., this would make it possible to check articles for updates and update them incrementally (especially for Wikimedia's dictionaries/pages).
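
As a rough sketch of that check, assuming both timestamps are stored as UNIX time in seconds: the `ArticleTimestamps` struct and `needsUpdate` function below are hypothetical names for illustration, not an existing API.

```cpp
#include <cstdint>

// Hypothetical per-article timestamps (UNIX time, in seconds).
struct ArticleTimestamps {
    std::int64_t created;   // would hold article_created
    std::int64_t modified;  // would hold article_modified
};

// True when the upstream copy was modified after the local copy,
// i.e. the local article is stale and should be refreshed.
bool needsUpdate(const ArticleTimestamps& local,
                 const ArticleTimestamps& upstream) {
    return upstream.modified > local.modified;
}
```

An incremental updater would then only re-fetch the articles for which `needsUpdate` returns true.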

@kelson42
Contributor

@mgautierfr Could/should we also use such a system to store the necessary HTTP headers from WARC?


4 participants