Migrate from textProcessing to taskProcessing #10042

Open
julien-nc opened this issue Aug 23, 2024 · 21 comments

Comments

@julien-nc
Member

julien-nc commented Aug 23, 2024

There is a new API to run AI tasks. It is slightly different from the old one.

As Mail uses Summary, Topics and FreePrompt, it should be relatively straightforward to migrate to the taskProcessing API.
More information here: nextcloud/assistant#114

cc @hamza221 @st3iny

@ChristophWurst
Member

Is this a breaking change?

@julien-nc
Member Author

The textProcessing API will remain available for one or two more major NC versions. However, the apps that implement providers have already migrated to taskProcessing (or will soon), so you might run out of providers. I guess that can be considered a breaking change.

@ChristophWurst
Member

I see. It is very unfortunate that this is announced after feature freeze and branch-off.

@ChristophWurst
Member

@julien-nc what is the replacement for \OCP\TextProcessing\IManager::runTask?

@julien-nc
Member Author

\OCP\TaskProcessing\IManager::scheduleTask

You can get the task ID right after having scheduled it:

```php
// $task is an OCP\TaskProcessing\Task; scheduling assigns its ID
$this->taskProcessingManager->scheduleTask($task);
$taskId = $task->getId();
```

Tasks can no longer run synchronously because many providers may take too long, and the PHP process timeout could be reached. Tasks are processed in background jobs (which can be fast if occ background-job:worker "OC\TaskProcessing\SynchronousBackgroundJob" is running).

The OCP\TaskProcessing\Events\TaskSuccessfulEvent and OCP\TaskProcessing\Events\TaskFailedEvent events are dispatched after a task has succeeded or failed, respectively. Both events contain the task.

If you still want something similar to \OCP\TextProcessing\IManager::runTask, you can run a poll loop in the backend right after scheduling the task, checking its status on each iteration:

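```php
// One iteration of the poll loop (Task is OCP\TaskProcessing\Task)
$task = $this->taskProcessingManager->getTask($taskId);
if ($task->getStatus() === Task::STATUS_SUCCESSFUL) {
    // do something with the result, e.g. $task->getOutput()
}
```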
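The snippet above performs a single status check and leaves the surrounding loop implicit. Below is a minimal sketch of the full loop, written as a generic, self-contained helper so it runs outside Nextcloud. The function name pollUntilDone, its parameters and the demo status strings are hypothetical and not part of the Nextcloud API. The loop stops on a terminal status or when a timeout expires, so the request never blocks longer than you allow.

```php
<?php
declare(strict_types=1);

/**
 * Poll a status source until it reports a terminal status or the timeout expires.
 * Hypothetical helper for illustration, not a Nextcloud API.
 *
 * @param callable(): (int|string) $fetchStatus returns the current status
 * @param array<int|string> $terminalStatuses statuses that end the loop
 * @return int|string|null the terminal status, or null if still pending at the deadline
 */
function pollUntilDone(
    callable $fetchStatus,
    array $terminalStatuses,
    float $timeoutSeconds = 20.0,
    int $intervalMicroseconds = 500_000,
): int|string|null {
    $deadline = microtime(true) + $timeoutSeconds;
    while (true) {
        $status = $fetchStatus();
        if (in_array($status, $terminalStatuses, true)) {
            return $status;
        }
        if (microtime(true) >= $deadline) {
            return null; // still pending: let the caller fall back to async handling
        }
        usleep($intervalMicroseconds);
    }
}

// Demo with a fake status source that finishes on the third poll
$calls = 0;
$result = pollUntilDone(
    function () use (&$calls): string {
        $calls++;
        return $calls < 3 ? 'running' : 'successful';
    },
    ['successful', 'failed'],
    5.0,
    1_000,
);
echo $result, PHP_EOL; // successful
```

Inside an app, the status source could be a closure such as fn () => $this->taskProcessingManager->getTask($taskId)->getStatus(), with [Task::STATUS_SUCCESSFUL, Task::STATUS_FAILED, Task::STATUS_CANCELLED] as terminal statuses (assuming those OCP\TaskProcessing\Task constants exist on your server version). A null result means the task is still pending, so the caller can fall back to the asynchronous flow. Keep the timeout well below the PHP execution limit, for the reasons discussed in this thread.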

@ChristophWurst
Member

> (which can be fast if occ background-job:worker "OC\TaskProcessing\SynchronousBackgroundJob" is running).

Is there a solution without this? Unfortunately, I can't assume that every Nextcloud installation has this process running.

@julien-nc
Member Author

If this is not running, the taskProcessing jobs run when cron.php is executed.

@ChristophWurst
Member

To elaborate on why Mail uses the synchronous mode intentionally: we want to process emails as late as possible, when the user opens them, but then show the results right away.
Background processing is extremely expensive because we would have to process all emails of an IMAP account.
Dispatching an async task only when the user opens a message breaks the UX because the results would take a while to be ready.

Hope this makes sense.

@ChristophWurst
Member

> If this is not running, the taskProcessing jobs run when cron.php is executed.

A well-configured system runs cron at a 5-minute interval. Some older systems still use 15 minutes, and tiny setups use the irregular AJAX cron.
I'd say even 5 minutes is not an acceptable reaction time for a thread summary in Mail.

@julien-nc
Member Author

If the Mail frontend sends a synchronous request to the server that blocks until the task has finished (with textProcessing or with taskProcessing), it occupies a PHP worker, which can affect overall server performance. Also, this PHP process might get killed if it runs too long, in which case no result is ever produced.
We can't guarantee that synchronous tasks will succeed because we can't predict how long the providers will take to process them.

That's why tasks are now always processed in background jobs.

Running the occ background job worker is strongly recommended to run AI tasks without delay.
We had to deal with a trade-off between convenience for developers, failure potential and constraints on admins.
If Nextcloud were a persistent process that could run threads, synchronous processing would be possible.
I hope this makes sense as well.

@ChristophWurst
Member

We only target OpenAI for our integration (the rest is too slow), so the process is mostly I/O-bound while it waits for the API response. The blocked request is OK for us.

I get the general push towards async processing for tasks of unknown complexity, though.

@DaphneMuller

DaphneMuller commented Aug 23, 2024

The local LLM2 is now equally fast and could potentially be used as well (although we have not tested it on large texts).

@julien-nc
Member Author

One more detail: the occ background job worker only runs tasks whose provider is implemented in a PHP app.
Providers implemented in an external application consume tasks as soon as they are ready: they request the tasks they can process from Nextcloud. There is no delay there, even without the worker.

@ChristophWurst
Member

@DaphneMuller nice! I'll still have to wait 5-15 minutes for the result when the special worker process is not running, right?

@julien-nc
Member Author

> I'll still have to wait 5-15 minutes for the result when the special worker process is not running, right?

As mentioned before, not if the provider is LLM2 (which is an external application).

@ChristophWurst
Member

Then I misread.

@julien-nc do you have some example code for getting synchronous-ish results from LLM2 without the use of occ background-job:worker "OC\TaskProcessing\SynchronousBackgroundJob"?

In my understanding the LLM processing would not happen until the next cron execution.

@julien-nc
Member Author

With or without the worker, you can schedule a task and immediately start checking whether it has finished (in the frontend or in the backend, as you wish).
In the backend, use getTask and $task->getStatus() as described before.
In the frontend, request ocs/v2.php/taskprocessing/task/TASK_ID to get the task.
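For the frontend route, a client fetches the task from the OCS endpoint named above and reads its status. The sketch below is written in PHP to match the other snippets in this thread; taskUrl, taskStatusFromOcs, the base URL and the sample payload are hypothetical. It assumes the JSON response nests the task under ocs.data.task with a string status field, so check your server's actual response before relying on it. A real client also has to authenticate and send the OCS-APIRequest: true header.

```php
<?php
declare(strict_types=1);

// Build the OCS URL for one task (path from the thread; ?format=json requests JSON instead of XML)
function taskUrl(string $baseUrl, int $taskId): string
{
    return rtrim($baseUrl, '/') . '/ocs/v2.php/taskprocessing/task/' . $taskId . '?format=json';
}

// Extract the task status from an OCS JSON body.
// ASSUMPTION: the body looks like {"ocs":{"meta":{...},"data":{"task":{"status":"..."}}}}.
function taskStatusFromOcs(string $json): ?string
{
    $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($decoded)) {
        return null;
    }
    $status = $decoded['ocs']['data']['task']['status'] ?? null;
    return is_string($status) ? $status : null;
}

// Hypothetical sample response, for illustration only
$sample = '{"ocs":{"meta":{"status":"ok","statuscode":200},'
    . '"data":{"task":{"id":42,"status":"STATUS_SUCCESSFUL"}}}}';

echo taskUrl('https://cloud.example.com/', 42), PHP_EOL;
// https://cloud.example.com/ocs/v2.php/taskprocessing/task/42?format=json
echo taskStatusFromOcs($sample), PHP_EOL; // STATUS_SUCCESSFUL
```

A client would repeat this fetch at a short interval until the status is terminal, which keeps the waiting in the browser instead of in a blocked PHP request.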

@julien-nc
Member Author

We can also keep the providers for the old APIs in integration_openai so that the features in Mail don't break.

@ChristophWurst
Member

That's the best solution right now because it means we can branch off Mail for the upcoming release.

@julien-nc
Member Author

This is done and will be included in the next integration_openai release.
nextcloud/integration_openai#120

ChristophWurst removed this from the v4.0.0 milestone Aug 26, 2024
@julien-nc
Member Author

julien-nc commented Aug 30, 2024

Two things should make it more convenient:

  • The TextProcessing and SpeechToText APIs are now forward compatible with providers. New TaskProcessing providers can be used by the TextProcessing API (for FreePromptTaskType, HeadlineTaskType, SummaryTaskType and TopicsTaskType because they have exact matches in the new API) and the SpeechToText API. This means you will benefit from new providers while using the old APIs.
  • The TaskProcessing manager now has a runTask method to run a task synchronously. This should make the migration easier.

All this is in stable30 already.

Projects
Status: 📄 To do
Development

No branches or pull requests

3 participants