searxng marginalia integration request shotgun #1673

Closed
vlofgren opened this issue Aug 16, 2022 · 3 comments

vlofgren commented Aug 16, 2022

Hello. I run search.marginalia.nu. I noticed it was recently added to searxng ( #1627 ), and since then I've noticed some strange behavior in the api logs.

When the rate limit is exceeded, the api server sends a 503. The public key only allows 15 requests per minute globally, so not a lot; it is intended for sampling and the odd python script, so let me know if you want me to hook you up with a key. Maybe there is some bad interaction with searxng that causes it to interpret the 503 as a signal to retry?

I've attached a redacted sample of the sort of curious traffic behavior I'm seeing. I'd love to be able to integrate with this project, but my search engine can't respond to queries at this rate, you'll get rate limited immediately (which is unnecessary because it's the exact same query 20 times in a row). It's not really a problem per se on my end, I just thought I would let you know this is happening.

INFO  2022-08-16 23:49:48,327 qtp1892704146-12749  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:45,766 qtp1892704146-12749  ApiService          : RSP 200

INFO  2022-08-16 23:50:20,351 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,591 qtp1892704146-12749  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,635 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,675 qtp1892704146-12749  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,718 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,762 qtp1892704146-12749  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,813 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,870 qtp1892704146-12749  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20
INFO  2022-08-16 23:50:20,918 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[FIRST QUERY]?index=4&count=20

INFO  2022-08-16 23:50:45,662 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:50:45,766 qtp1892704146-12776  ApiService          : RSP 200

INFO  2022-08-16 23:51:18,339 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:18,551 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:18,629 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:18,682 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:18,851 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:18,946 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:18,988 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:19,133 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:19,179 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20

INFO  2022-08-16 23:51:29,577 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:29,724 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:29,772 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:29,814 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:29,886 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:29,935 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:29,995 qtp1892704146-12721  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:30,045 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20
INFO  2022-08-16 23:51:30,099 qtp1892704146-12776  ApiService          : PUBLIC #b6f70eb3: GET /public/api/public/search/[SECOND QUERY]?index=4&count=20

(These logs are not retained more than 24 hours, and only exist to identify exactly this type of problem)
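
One way a client could avoid the pattern in the logs above is to reuse a recent answer for an identical query and to treat a 503 as a signal to stop rather than retry. The following Python sketch is not SearXNG's actual code; the `PoliteClient` class, the injected `fetch` callable, and the 60 second cache and cooldown values are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PoliteClient:
    """Hypothetical wrapper: reuses recent answers and backs off on HTTP 503."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[int, str]],  # returns (status, body); assumed interface
        cache_ttl: float = 60.0,   # assumed value, not from the issue
        cooldown: float = 60.0,    # assumed value, not from the issue
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._cache_ttl = cache_ttl
        self._cooldown = cooldown
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}
        self._blocked_until = 0.0

    def search(self, query: str) -> Optional[str]:
        now = self._clock()

        # An identical query answered recently is served locally,
        # so the same request is not sent upstream many times in a row.
        cached = self._cache.get(query)
        if cached is not None and now - cached[0] < self._cache_ttl:
            return cached[1]

        # While a cooldown is active, do not contact the upstream API at all.
        if now < self._blocked_until:
            return None

        status, body = self._fetch(query)
        if status == 503:
            # Treat 503 as "rate limit exceeded": start a cooldown, never retry here.
            self._blocked_until = now + self._cooldown
            return None
        if status == 200:
            self._cache[query] = (now, body)
            return body
        return None
```

With a wrapper like this, a burst of identical queries results in at most one upstream request per cache window, and a single 503 silences the client for the whole cooldown instead of triggering further attempts.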

@return42
Member

Hi @vlofgren, welcome to SearXNG. FYI, we do not maintain a single SearXNG instance; here we only develop the SearXNG service. You will find a lot of public instances at https://searx.space/

We are very sorry that your service is being hammered by a SearXNG instance. I assume a SearXNG instance is being abused by a bot, which in turn floods your service. To prevent this, we encourage SearXNG instance maintainers to activate our bot blocker (aka limiter), but some of them don't.

If you see requests like the ones you have shown above, just ban the IP. Most often it is a SearXNG instance that is not well maintained and is being misused by a bot.

I'd love to be able to integrate with this project,

We are very happy about that, thank you for the offer. If you have an idea how we can improve our cooperation, then we would be very happy about suggestions.

You can open an issue here or contact us on Matrix. Thanks a lot 👍

return42 reopened this Jun 5, 2023

return42 commented Jun 5, 2023

@vlofgren I apologize, I was not aware that this is caused by the default settings in SearXNG:

search_url: https://api.marginalia.nu/public/search/{query}?index=4&count=20

I will remove the engine from the default settings.

return42 added a commit to return42/searxng that referenced this issue Jun 5, 2023
The engine configuration of marginalia [2][3][4][5] spams marginalia.nu with
requests from SearXNG instances [1].  It is not in the interest of SearXNG to
disturb other FOSS projects, so the engine will be removed::

    - name: marginalia
      engine: json_engine
      shortcut: mar
      categories: general
      paging: false
      # Key and license: https://www.marginalia.nu/marginalia-search/api/
      # index: 0 popular, 1 blogs, 2 big_sites, 3 default, 4 experimental
      search_url: https://api.marginalia.nu/<insert your key here>/search/{query}?index=4&count=20
      results_query: results
      url_query: url
      title_query: title
      content_query: description
      timeout: 1.5
      disabled: true
      about:
        website: https://www.marginalia.nu/
        official_api_documentation: https://api.marginalia.nu/
        use_official_api: true
        require_api_key: true
        results: JSON

[1] searxng#1673
[2] searxng#1627
[3] searxng#1620
[4] https://news.ycombinator.com/item?id=35874640
[5] https://github.com/MarginaliaSearch/MarginaliaSearch/blob/d82a8584915c9d416921cc9f1c0637efedea664f/code/services-satellite/api-service/src/main/java/nu/marginalia/api/svc/ResponseCache.java#L12-L20

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
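
To make the `*_query` fields in the engine configuration above more concrete, here is a minimal Python sketch of how such a mapping could be applied to a JSON response. It is not the implementation of SearXNG's `json_engine`; the helper names `build_url` and `extract_results`, the `{key}` placeholder, and the sample payload used with them are assumptions for illustration only.

```python
import json
from urllib.parse import quote

# URL template modelled on the search_url in the configuration above;
# the {key} placeholder stands in for "<insert your key here>".
SEARCH_URL = "https://api.marginalia.nu/{key}/search/{query}?index=4&count=20"


def build_url(key: str, query: str) -> str:
    """Fill in the API key and the percent-encoded query (hypothetical helper)."""
    return SEARCH_URL.format(key=quote(key, safe=""), query=quote(query, safe=""))


def extract_results(
    payload: str,
    results_query: str = "results",
    url_query: str = "url",
    title_query: str = "title",
    content_query: str = "description",
) -> list:
    """Map a JSON response onto url/title/content records (hypothetical helper)."""
    data = json.loads(payload)
    results = []
    for item in data.get(results_query, []):
        results.append(
            {
                "url": item.get(url_query),
                "title": item.get(title_query, ""),
                "content": item.get(content_query, ""),
            }
        )
    return results
```

For example, `build_url("my-key", "hello world")` yields a URL whose path ends in `/my-key/search/hello%20world`, and a payload with a top-level `results` list is flattened into one record per entry.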

return42 commented Jun 5, 2023

Closed by #2489
