This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

What is the data lifecycle ? #2052

Open
dalf opened this issue Jul 8, 2020 · 4 comments

@dalf
Contributor

dalf commented Jul 8, 2020

Maybe I'm overthinking.

Which data?

Data and searx installation

The data is updated in the git repository. Once searx is cloned or installed, the data stays the same for as long as searx itself is not updated.

How to update the data more often?

  1. Do nothing and keep the same process.
  2. Keep the data in the searx git repository and add make data.update to update everything.
    • When to call it?
      • Manually: same problem as now.
      • From a cron job in Travis / GitHub Actions: the script can create a PR.
  3. Create a separate package, searx-data, that is updated automatically. This requires trust in that process.
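
A minimal sketch of what option 2 could look like, assuming each data file is JSON and is produced by its own fetch function. The names update_data_file, update_all and currencies.json are hypothetical and are not part of searx. The script rewrites a file only when its content changes, so a scheduled job can tell whether a PR is needed at all.

```python
import json
from pathlib import Path


def update_data_file(path, fetch):
    """Refresh one data file; return True if its content changed.

    `fetch` is a hypothetical callable returning the fresh data as a
    JSON-serializable object (for example, downloaded from an upstream source).
    """
    path = Path(path)
    # Stable formatting keeps the diff of a generated PR small.
    new_text = json.dumps(fetch(), indent=2, sort_keys=True) + "\n"
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_text, encoding="utf-8")
    return True


def update_all(sources, data_dir):
    """Run every fetcher; return the names of the files that changed."""
    changed = []
    for name, fetch in sources.items():
        if update_data_file(Path(data_dir) / name, fetch):
            changed.append(name)
    return changed
```

A make data.update target could call update_all, and the cron job would open a PR only when the returned list is not empty.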
@dalf dalf added the data label Jul 8, 2020
@asciimoo
Member

asciimoo commented Jul 8, 2020

Maybe I'm overthinking.

I don't think that you're overthinking; this is a real issue that we need to address.

I'd pick option 2 from your suggested solutions, with periodic automatic updates.

@dalf
Contributor Author

dalf commented Jul 16, 2020

@return42
Contributor

return42 commented Jul 19, 2020

Brainstorming: IDE & database

@dalf, thanks for your article. Here are my 5 cents:

IDE: for me, developing or bug fixing a searx engine is a very individual task, where I want to have the maximum degree of freedom to use the Swiss army knife that best fits the context.

An IDE helps flatten the learning curve. The flip side of the coin is that the quality of the contributions regresses and the maintainers have to discuss the same subjects again and again. I remember all the contributions with "Update <filename>" commit messages (e.g. #1941). I mean, IDEs are really good at flattening the learning curve, but they don't help if the know-how is missing.

Database: mostly the same as what I said about IDEs, except that it could be a solution for regular updates, decoupling engine development from the searx kernel. For this, a git repository seems to be a more suitable solution. But #2052 (comment) says he wants to keep the data in the searx git repository.

@dalf
Contributor Author

dalf commented Jul 20, 2020

I want to have the maximum degree of freedom to use the Swiss army knife which fits at its best to the context.

The purpose is not to enforce a way to develop; it is to suggest a quick and easy way to develop engines and update data (nothing about the core).

My ideas are not crystal clear, but to sum them up with an example:

  • a user sees a requested engine (the "DB" can be a git repository, or at some point a [No]SQL database).
  • the user clicks "Edit".
  • a web version of wuzz allows setting the URL template and HTTP headers from a searx request.
  • the data extraction is done with something similar to https://jqplay.org/ (based on the results from wuzz). The user can check the resulting output (right template, etc.) for different simulated user requests (if the json engine uses https://pypi.org/project/pyjq/ ).
  • the user validates and can view the generated change request (code, the configuration in settings.yml, ...).
  • the user saves the result to create a PR.
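
The URL template and extraction steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: build_url, get_path and extract_results are hypothetical names, the {query} and {pageno} placeholders are an assumed template format, and the dotted-path lookup is a deliberately simple stand-in for a jq expression, not the pyjq API.

```python
from urllib.parse import quote_plus


def build_url(template, query, pageno=1):
    """Fill a URL template with a URL-encoded query and a page number."""
    return template.format(query=quote_plus(query), pageno=pageno)


def get_path(obj, path):
    """Follow a dotted path such as "data.items" through nested dicts/lists."""
    for key in path.split("."):
        # List elements are addressed by integer index, dict entries by key.
        obj = obj[int(key)] if isinstance(obj, list) else obj[key]
    return obj


def extract_results(response, results_path, fields):
    """Map each raw item to a result dict using per-field dotted paths."""
    return [
        {name: get_path(item, path) for name, path in fields.items()}
        for item in get_path(response, results_path)
    ]
```

With a decoded JSON response in hand, a call such as extract_results(response, "data.items", {"url": "link", "title": "meta.title"}) gives the user a list of result dicts to check before saving the change.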

A reviewer can use the same tool to check the PR, but once again the usual tools work too (git, make, etc...).

The purpose here is to allow contributions with just a browser (*). And for sure:

  • it will require more reviews (my wish is to allow more people to check them).
  • it is a huge task to develop something like this.

So this idea is not for tomorrow.

(*) I say a browser because it is an easy way to have a rich UI, but a console UI is also a solution.
