Feature request: Make Meilisearch's task validation timeout use timeout variable value #2138

@PierreMesure

Hi,

Problem

At the moment, the task validation timeout is hardcoded for the Meilisearch sink. My understanding is that other sinks use synchronous inserts, so their timeout can be set in the UI and increased if inserts take a long time to resolve.

In Meilisearch, tasks are added to a queue and Sequin polls the tasks API to validate them. The hardcoded values can be found in the wait_for_task function:

defp wait_for_task(%MeilisearchSink{} = sink, task_id) do
   ...
        max_retries: 5,
   ...
            delay = Sequin.Time.exponential_backoff(200, count, 10_000)
   ...
  end

That is only 5 retries (actually 4, from my observations in the logs) and a total validation timeout of 3.2 seconds. Before PR #2038, it was set to 10 retries, so the total backoff delay was 52.6 seconds.
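To make the arithmetic easier to check, here is a minimal sketch of how the total wait adds up. It assumes the backoff simply doubles from the base delay and is capped at the maximum, with no jitter. `BackoffSketch`, `delay_ms` and `total_wait_ms` are hypothetical names for illustration, and the real `Sequin.Time.exponential_backoff` may behave differently.

```elixir
defmodule BackoffSketch do
  # Hypothetical stand-in for Sequin.Time.exponential_backoff/3.
  # Assumption: the delay doubles on each attempt, is capped at max_ms, and has no jitter.
  def delay_ms(base_ms, count, max_ms) do
    min(base_ms * Integer.pow(2, count - 1), max_ms)
  end

  # Sum of the delays for attempts 1..retries, in milliseconds.
  def total_wait_ms(retries, base_ms \\ 200, max_ms \\ 10_000) do
    1..retries//1
    |> Enum.map(&delay_ms(base_ms, &1, max_ms))
    |> Enum.sum()
  end
end

IO.inspect(BackoffSketch.total_wait_ms(10), label: "10 retries (ms)")
IO.inspect(BackoffSketch.total_wait_ms(5), label: "5 retries (ms)")
IO.inspect(BackoffSketch.total_wait_ms(4), label: "4 retries (ms)")
```

Under these assumptions, 10 retries add up to 52,600 ms, which matches the 10-retry figure above. The totals for 4 or 5 retries come out at a few seconds (3,000 ms and 6,200 ms in this model), so the exact value seen in practice depends on how many waits actually happen and on any jitter in the real function.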

Concretely, in my case, task validation succeeded 100% of the time when I added the first rows to my Meilisearch index. As the index grows, indexing slows down (the rows contain a vector and a chunk of text), so a growing share of task validations time out and are retried. Right now I'm at 100% validation timeouts: Meilisearch is unable to complete its tasks in under 3.2 seconds (it currently takes 10-12 seconds to import 50 items).

That means that with Sequin's latest version, the Meilisearch sink should be considered broken, as it only works under ideal conditions with small batch sizes and empty indices.

Suggested solutions

My solution for now has been to fork the repo and restore the 10 retries.

I would like to submit a PR to solve the issue upstream. Here are a few ideas that I would like to discuss first, to make sure the PR can be merged:

  1. Change max_retries back from 5 to 10. This is a quick fix that would make almost all configs work, as long as Meilisearch completes tasks in under a minute.
  2. Change the exponential backoff mechanism so that it uses timeout_seconds as the upper bound for the total validation timeout. That way, users can set the timeout value in their config like they would with any other sink. This is the best long-term solution and could be implemented more carefully once (1) is in place as a quick fix.
  3. Add a new variable for Meilisearch configurations called task_validation_timeout_seconds. This is more ambitious, as the change needs to be propagated to the UI, the docs and potentially other places I don't know about. I think we should avoid it for now.
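As a rough illustration of idea (2), the polling loop could be bounded by a deadline instead of a retry count. This is only a sketch under stated assumptions, not Sequin's code: `TaskWaitSketch`, `wait_until` and the `check_fun` callback (returning `:succeeded`, `:pending` or `{:failed, reason}`) are hypothetical, and `timeout_ms` stands in for the sink's timeout_seconds converted to milliseconds.

```elixir
defmodule TaskWaitSketch do
  # Hypothetical sketch: poll check_fun with capped exponential backoff
  # until it succeeds, fails, or the overall deadline has passed.
  def wait_until(check_fun, timeout_ms, base_ms \\ 200, max_ms \\ 10_000) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    do_wait(check_fun, deadline, base_ms, max_ms, 1)
  end

  defp do_wait(check_fun, deadline, base_ms, max_ms, count) do
    case check_fun.() do
      :succeeded ->
        :ok

      {:failed, reason} ->
        {:error, reason}

      :pending ->
        remaining = deadline - System.monotonic_time(:millisecond)

        if remaining <= 0 do
          {:error, :timeout}
        else
          # Assumed backoff shape: doubling, capped at max_ms, no jitter.
          delay = min(base_ms * Integer.pow(2, count - 1), max_ms)
          # Never sleep past the deadline, so the total wait stays close to timeout_ms.
          Process.sleep(min(delay, remaining))
          do_wait(check_fun, deadline, base_ms, max_ms, count + 1)
        end
    end
  end
end
```

In a real sink, `check_fun` would wrap the call to the Meilisearch tasks API for a given task_id. Bounding the loop by a deadline rather than by max_retries means the total wait follows the configured timeout regardless of how the backoff parameters are tuned.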

What do you think? I sent a PR for 1 (#2139) and for 2 (#2140).

Labels: bug, enhancement