reduce the number of external bangs #2045
Some thoughts about localisation (l10n), taking Wikipedia and DE as an example. What we have is more or less a mess ...
And finally, we have some kind of redundancy ...
from which we can drop functions:
and also drop the bangs (trigger):
My suggestion is to remove all entries shown in the topmost code block and use only the final block, so we only have:
To get more flexibility in l10n, we should implement a bang syntax that lets the user select the locale explicitly.
So even if the user's browser is localized to DE, the user can search the FR Wikipedia by explicitly selecting FR in the bang.
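The exact syntax proposed here was lost from this copy of the thread. Below is a minimal sketch of how an explicit locale could override the browser locale. It assumes the `!!` prefix for external bangs, which keeps them apart from the single-`!` engine shortcuts. The `:fr` suffix, the `BANGS` table and `resolve_bang` are all hypothetical.

```python
from urllib.parse import quote_plus

# Hypothetical bang table: one trigger, one URL template, a set of locales.
# "{lang}" and "{query}" are placeholders filled in by resolve_bang().
BANGS = {
    "w": {
        "template": "https://{lang}.wikipedia.org/wiki/Special:Search?search={query}",
        "locales": {"de", "en", "fr"},
        "default": "en",
    },
}


def resolve_bang(text, browser_locale):
    """Resolve '!!w:fr query' (explicit locale) or '!!w query' (browser locale).

    Returns the redirect URL, or None if the text is not a known external bang.
    """
    if not text.startswith("!!"):
        return None
    head, _, query = text[2:].partition(" ")
    trigger, _, explicit_locale = head.partition(":")
    bang = BANGS.get(trigger)
    if bang is None or not query:
        return None
    # An explicit locale wins. Otherwise use the browser locale ("de-DE" -> "de"),
    # and fall back to the bang's default if that locale is not supported.
    lang = explicit_locale or browser_locale.split("-")[0].lower()
    if lang not in bang["locales"]:
        lang = bang["default"]
    return bang["template"].format(lang=lang, query=quote_plus(query))


# A browser localized to DE can still search the French Wikipedia:
print(resolve_bang("!!w:fr Tour Eiffel", "de-DE"))
```

Keeping the locale inside the bang token means the rest of the query is passed through untouched. The fallback to a per-bang default avoids generating URLs for Wikipedias nobody has checked.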
Now that I have been thinking about it, I agree, since this project is privacy focused. More bangs are of course more convenient, but a pain to maintain, as you said. If we have fewer, we can also create unit tests for every bang (something like a JSON file with the external bang, the query, and a text that should be included on the result page). Maybe I can help with creating these tests and with simplifying the external bang JSON file? @return42 Some bangs I currently use a lot and think should be included.
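Below is a rough sketch of such a data-driven test. Each case lists an external bang, a query, and a text snippet expected on the result page. The case file layout, the example bangs and snippets, and the `run_cases` helper are all illustrative assumptions. URL building and page fetching are passed in as functions, so the check itself can run without network access.

```python
import json

# Hypothetical test-case file: one entry per external bang.
# "expect" is a snippet of text that should appear on the result page.
CASES_JSON = """
[
  {"bang": "w", "query": "searx", "expect": "metasearch"},
  {"bang": "osm", "query": "Berlin", "expect": "OpenStreetMap"}
]
"""


def run_cases(cases, build_url, fetch):
    """Return a list of (bang, reason) tuples for every failing case.

    build_url(bang, query) returns a URL or None; fetch(url) returns page text.
    """
    failures = []
    for case in cases:
        url = build_url(case["bang"], case["query"])
        if url is None:
            failures.append((case["bang"], "unknown bang"))
            continue
        try:
            page = fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            failures.append((case["bang"], f"fetch failed: {exc}"))
            continue
        if case["expect"] not in page:
            failures.append((case["bang"], f"{case['expect']!r} not found"))
    return failures


cases = json.loads(CASES_JSON)
```

An empty result list means every bang still lands on a page containing its expected text.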
I think that is really a great idea!
If we reduce the number of bangs to 20, maybe we can create an ExternalBang class in Python instead of a JSON file, with fields like domain, regions and trigger.
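Below is a minimal sketch of what such a class could look like. It is built with `dataclasses`, using the fields named above. The `path` field, the `url()` method and the two example entries are assumptions for illustration, not an agreed design.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import quote_plus


@dataclass
class ExternalBang:
    """One external bang, as a hypothetical replacement for a JSON entry."""

    trigger: str  # bang name without the "!!" prefix, e.g. "w"
    domain: str   # default domain, used when no region-specific one exists
    path: str     # path and query string; "{query}" marks the search terms
    regions: Dict[str, str] = field(default_factory=dict)  # region -> domain

    def url(self, query: str, region: Optional[str] = None) -> str:
        """Build the redirect URL, preferring the region's own domain."""
        domain = self.regions.get(region, self.domain)
        return "https://" + domain + self.path.replace("{query}", quote_plus(query))


# A short, reviewable list instead of thousands of generated entries.
EXTERNAL_BANGS = {
    bang.trigger: bang
    for bang in (
        ExternalBang(
            "w",
            "en.wikipedia.org",
            "/wiki/Special:Search?search={query}",
            {"de": "de.wikipedia.org", "fr": "fr.wikipedia.org"},
        ),
        ExternalBang("osm", "www.openstreetmap.org", "/search?query={query}"),
    )
}
```

With only a few dozen entries, every change to this list shows up as a readable diff in a PR. This acts as a simple review step before a new redirect target goes live.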
I vote for a YAML config file placed next to the settings.yml file.
Before we start to implement, let's hear what others say ... but yes, your contributions are welcome :)
A unit test is nothing an admin can run; I vote for a command line tool to check the configured external bangs from the YAML file. I haven't had time to look deeper, but we have a searx-checker; maybe it's best to implement it there. Hint: @dalf suggested embedding this tool into searx. Many ideas and a lot of work :) As a first step we should simply reduce and clean up the JSON file as it is ... as shown in my last example above, I think.
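Below is a rough sketch of such a command line check. It is written as a standalone script and does not reproduce searx-checker's actual interface. The JSON case format (`url` and `expect` keys), the `main` and `fetch` names and the exit-code convention are all assumptions. JSON is used here instead of YAML so the sketch needs only the standard library.

```python
import argparse
import json
import sys
import urllib.request


def fetch(url, timeout=10.0):
    """Download a page and return it as text (network access required)."""
    req = urllib.request.Request(url, headers={"User-Agent": "bang-checker"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def main(argv=None, fetcher=fetch):
    """Check every case in the given file; return a process exit status."""
    parser = argparse.ArgumentParser(
        description="Check that configured external bangs still answer."
    )
    parser.add_argument("cases", help="JSON list of objects with 'url' and 'expect'")
    args = parser.parse_args(argv)

    with open(args.cases, encoding="utf-8") as fh:
        cases = json.load(fh)

    broken = 0
    for case in cases:
        try:
            ok = case["expect"] in fetcher(case["url"])
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            print(f"ERROR  {case['url']}: {exc}", file=sys.stderr)
            ok = False
        if not ok:
            broken += 1
            print(f"BROKEN {case['url']}")
    print(f"{len(cases) - broken}/{len(cases)} bangs OK")
    # A non-zero status lets admins run the check from cron or CI.
    return 1 if broken else 0
```

From a script entry point this would be called as `sys.exit(main())`, e.g. `python check_bangs.py cases.json` (the script name is hypothetical). Passing `fetcher` explicitly keeps the function testable without network access.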
Some users may expect to have the same bangs in DuckDuckGo and searx; but at the same time, it is clearly a mess. It seems to be based on the DuckDuckGo bangs, where the autocompletion UI is really helpful. I like the suggestion; it is much clearer. Why not match the searx bangs: I think the file should remain in
Source: DuckDuckGo
If you go to https://duckduckgo.com/newbang you will see that
There are the same duplicate entries in the searx external bangs.
Source: Wikidata
We can use this query (press Ctrl-Enter to run the query). This one may be off topic: this query gets the URL linked to an ID (whatever the ID is). And this is where I start to think that a tool like Transifex could be useful to decrease the maintenance cost, not globally but for each person, as on Wikipedia. This tool would manage the data which are now in
@dalf thanks for your additional hints ..
I agree with you, my first suggestion having a config for bangs was not a good idea. The bangs should be the same in all instances and therefore must not be configured.
I'm not so happy with solutions that need several accounts to maintain searx development. Nevertheless, your ideas are very interesting!
:) ... yes, let's start with the most obvious first. First we need to tidy up the mess; this could be done very simply by building a Python dictionary in the data folder. Otherwise I fear that some users will get used to the wrong bangs.
OT: You have a lot of ideas and your considerations are often strategic. Most of your considerations are spread around in gh-issues. Does it make sense to use the gh-wiki to collect such remarks and order them by subject? I mean, should we start using the gh-wiki for strategic thoughts?
Hi, then we could make the statement: "Searx does not provide bangs, but it will honour any of those that are shown on the excellent DuckDuckGo website."
This is more or less what we want ... we should use known bang names from DDG ... BUT: searx is about privacy. DDG has 16,000 bangs and doesn't care where you are redirected. We shouldn't do that, as it could cause a loss of trust in searx. That is also one more reason for me to vote against any solution ...
... where bangs come from outside without any quality gate.
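Below is a minimal sketch of what a quality gate could mean in code. A bang is accepted only if its URL template uses HTTPS, points to a host that maintainers have reviewed, and has a slot for the query. The allowlist contents and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: hosts a maintainer has actually visited and reviewed.
REVIEWED_HOSTS = {"en.wikipedia.org", "de.wikipedia.org", "www.openstreetmap.org"}


def passes_quality_gate(url_template):
    """Accept a bang URL template only if it is HTTPS, targets a reviewed
    host, and contains a placeholder for the search terms."""
    parts = urlsplit(url_template)
    return (
        parts.scheme == "https"
        and parts.hostname in REVIEWED_HOSTS
        and "{query}" in url_template
    )


def gate(bangs):
    """Split a {trigger: url_template} mapping into accepted bangs and
    a sorted list of rejected triggers."""
    accepted = {t: u for t, u in bangs.items() if passes_quality_gate(u)}
    rejected = sorted(set(bangs) - set(accepted))
    return accepted, rejected
```

Importing a bang list from outside would then be a question of which entries pass the gate, rather than taking all of them.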
We have a user base that has learned over the years to use a single exclamation point to select engines. We will never change this!
There is another problem: current bangs containing images (#2076). A first step could be to clean up the current JSON file; beyond that, @dalf made some good suggestions in #2045 (comment) and #2052 (comment). I still don't know if I have the time to implement a PR. Unfortunately, probably not. So if someone has the time ... your PR is welcome :)
We have 7438 external bangs plus some localized URLs, so I guess we have roughly 8k search URLs to maintain.
This is too much, and ATM we do not know which of them are already broken or dead. I also have some privacy doubts when we redirect our users to URLs we have never visited.
We should reduce the external bangs significantly; I could imagine starting with roughly 10 or 20 major bangs.
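The arithmetic behind "7438 bangs plus localized URLs, roughly 8k URLs" and the proposed cut can be sketched as below. It assumes a flat layout (`trigger -> {"url": ..., "localized": {...}}`); the real data file in searx may be organised differently. Both helper names are hypothetical.

```python
# Hypothetical flat layout: trigger -> {"url": ..., "localized": {locale: url}}.
# The real external bangs data file may be structured differently.
def count_search_urls(bangs):
    """Every bang has one default URL plus one URL per localized variant."""
    return sum(1 + len(entry.get("localized", {})) for entry in bangs.values())


def keep_major(bangs, major_triggers):
    """Drop every bang that is not on a short, hand-picked list."""
    return {t: e for t, e in bangs.items() if t in major_triggers}
```

Counting search URLs before and after the cut gives a direct measure of the maintenance burden being removed.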