Scheduler for ML jobs across my machines. It works by having a network of machines share tasks and results via any file-sharing protocol. A task is a structured file that specifies, for now, a problem for an LLM to solve along with some requirements on the model. A machine that has the capability and free time picks up a task, solves it, and writes out a corresponding result file for syncing.
A task file is simply a text file containing an LLM prompt, named with a unique
task id and a .task extension. A result file contains the LLM response, with a
file name like <task-id>.<model>.output. See the ./examples directory for
example task and output files.
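As a concrete sketch of the naming convention, here is how a task might be created by hand; the task id "summarize-001" and the prompt text are just examples, and the commented-out result path assumes a worker running the default model:

```shell
# Create a task file: the prompt is the file content,
# and the file name is <task-id>.task (task id chosen by you).
mkdir -p tasks
printf 'Summarize the benefits of unit testing in two sentences.\n' \
  > tasks/summarize-001.task

# Once a worker running e.g. deepseek-r1:1.5b picks it up, a result
# file would appear alongside it, named <task-id>.<model>.output:
#   tasks/summarize-001.deepseek-r1:1.5b.output
ls tasks
```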
For my use case, I use a shared Syncthing directory for tasks and results. The worker is a simple binary running on my machines that can run decent-sized LLMs. This repository houses the worker code.
To run the worker, first ensure you have Ollama installed. Then run the following command, which will keep checking the directory for new tasks and run them whenever the machine is free-ish:
sched ./path/to/tasks/dir

Currently it defaults to a hardcoded model, deepseek-r1:1.5b. You can use a
different model by providing the Ollama model name in the environment variable
SCHED_MODEL.
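The model-selection behavior described above can be sketched as the usual env-var-with-fallback pattern; this is an assumption about how the worker resolves its model internally, not a quote of its code, and "llama3.2:3b" is just an example Ollama model name:

```shell
# Resolve the model the worker would use: SCHED_MODEL if set,
# otherwise the hardcoded default deepseek-r1:1.5b.
MODEL="${SCHED_MODEL:-deepseek-r1:1.5b}"
echo "using model: $MODEL"

# To override for a single run:
#   SCHED_MODEL=llama3.2:3b sched ./path/to/tasks/dir
```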