The Workflow part introduces how to run a research workflow in a loosely coupled way, but qrun can only execute one task at a time. To generate and execute different tasks automatically, Task Management provides a complete process covering Task Generating, Task Storing, Task Training and Task Collecting. With this module, users can run their tasks automatically over different periods, with different losses, or even with different models. The processes of task generation, model training, and combining and collecting data are shown in the following figure.
This whole process can be used in Online Serving.
An example of the entire process is shown here.
A task consists of Model, Dataset, Record, or anything added by users. The specific task template can be viewed in Task Section. Even though the task template is fixed, users can customize their TaskGen to generate different tasks from the task template.
Here is the base class of TaskGen:
qlib.workflow.task.gen.TaskGen
Qlib provides a class RollingGen to generate a list of tasks for the dataset in different date segments. This class allows users to verify the effect of data from different periods on the model in one experiment. More information is here.
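The rolling idea can be sketched without Qlib itself. The block below is a minimal illustration, not RollingGen's implementation: the function names (shift, rolling_tasks), the simplified template layout and the use of calendar days as the step are all assumptions made for the example, whereas a real generator would typically step along a trading calendar.

```python
import copy
from datetime import date, timedelta


def shift(segment, days):
    """Shift a (start, end) date pair forward by a number of calendar days."""
    start, end = segment
    delta = timedelta(days=days)
    return (start + delta, end + delta)


def rolling_tasks(template, step_days, n_rolls):
    """Return the template plus n_rolls shifted copies, oldest first."""
    tasks = []
    for i in range(n_rolls + 1):
        task = copy.deepcopy(template)  # never mutate the shared template
        segments = task["dataset"]["segments"]
        for name in segments:
            segments[name] = shift(segments[name], i * step_days)
        tasks.append(task)
    return tasks


# Hypothetical, simplified task template (real templates carry more fields).
template = {
    "model": "hypothetical_model",
    "dataset": {
        "segments": {
            "train": (date(2018, 1, 1), date(2018, 12, 31)),
            "valid": (date(2019, 1, 1), date(2019, 6, 30)),
            "test": (date(2019, 7, 1), date(2019, 12, 31)),
        }
    },
}

tasks = rolling_tasks(template, step_days=182, n_rolls=2)
for t in tasks:
    print(t["dataset"]["segments"]["test"])
```

Each generated task keeps the same model and only moves its train, valid and test windows, which is what makes results from different periods comparable within one experiment.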
To achieve higher efficiency and make cluster operation possible, Task Manager stores all tasks in MongoDB. TaskManager can fetch undone tasks automatically and manage the lifecycle of a set of tasks with error handling. Users MUST finish the configuration of MongoDB when using this module.
Users need to provide the MongoDB URL and database name to use TaskManager, either in initialization or with a statement like this.
    from qlib.config import C

    C["mongo"] = {
        "task_url": "mongodb://localhost:27017/",  # your MongoDB URL
        "task_db_name": "rolling_db",  # database name
    }
qlib.workflow.task.manage.TaskManager
More information about Task Manager can be found here.
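To make the lifecycle concrete, here is a minimal in-memory sketch of a task pool. It is not TaskManager and does not talk to MongoDB: the class InMemoryTaskPool, its method names and the status strings are hypothetical stand-ins chosen for illustration. It shows three ideas from the description above: tasks are stored once, an undone task is fetched and marked as running, and a failure puts the task back so it can be retried.

```python
import json
from contextlib import contextmanager

# Assumed status names for this sketch.
WAITING, RUNNING, DONE = "waiting", "running", "done"


class InMemoryTaskPool:
    """Hypothetical in-memory stand-in for a database-backed task pool."""

    def __init__(self):
        self._docs = {}  # serialized task definition -> stored document

    def add(self, task_def):
        """Store a task as WAITING unless an identical definition exists."""
        key = json.dumps(task_def, sort_keys=True)
        if key in self._docs:
            return False
        self._docs[key] = {"def": task_def, "status": WAITING, "res": None}
        return True

    @contextmanager
    def safe_fetch(self):
        """Yield one WAITING task (or None) marked RUNNING; restore it on error."""
        doc = next((d for d in self._docs.values() if d["status"] == WAITING), None)
        if doc is not None:
            doc["status"] = RUNNING
        try:
            yield doc
        except Exception:
            if doc is not None:
                doc["status"] = WAITING  # give the task back for a later retry
            raise

    def commit(self, doc, result):
        """Attach the result and mark the task as DONE."""
        doc["res"] = result
        doc["status"] = DONE

    def count(self, status):
        return sum(1 for d in self._docs.values() if d["status"] == status)


pool = InMemoryTaskPool()
pool.add({"model": "m1", "period": 0})
pool.add({"model": "m1", "period": 1})
pool.add({"model": "m1", "period": 0})  # duplicate definition, ignored

with pool.safe_fetch() as doc:
    pool.commit(doc, result="trained")

try:
    with pool.safe_fetch() as doc:
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass  # the failed task is WAITING again and can be retried
```

A persistent store plays the same role across processes and machines, which is why the database configuration has to be in place before the module is used.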
After generating and storing those tasks, it's time to run the tasks that are in the WAITING status. Qlib provides a method called run_task to run the tasks in the task pool; however, users can also customize how tasks are executed. An easy way to get the task_func is to use qlib.model.trainer.task_train directly. It runs the whole workflow defined by the task, which includes Model, Dataset and Record.
qlib.workflow.task.manage.run_task
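The role of task_func as a pluggable callable can be illustrated with a small loop. This is a sketch under stated assumptions, not the code of run_task: run_pending and toy_train are hypothetical names, tasks are plain dicts in a list, and extra keyword arguments are simply forwarded to the callable.

```python
def run_pending(tasks, task_func, **kwargs):
    """Run task_func on every task whose status is "waiting".

    Returns True if at least one task was run.
    """
    ever_run = False
    for task in tasks:
        if task["status"] != "waiting":
            continue
        task["status"] = "running"
        try:
            task["res"] = task_func(task["def"], **kwargs)
        except Exception:
            task["status"] = "waiting"  # leave it for a later retry
            raise
        task["status"] = "done"
        ever_run = True
    return ever_run


def toy_train(task_def, scale=1.0):
    """Hypothetical task_func: pretend to train and return a score."""
    return scale * len(task_def["model"])


tasks = [
    {"def": {"model": "linear"}, "status": "waiting", "res": None},
    {"def": {"model": "gbdt"}, "status": "done", "res": 0.5},
]
ran = run_pending(tasks, toy_train, scale=0.1)
```

Swapping toy_train for another callable changes how each task is executed without touching the loop, which is the kind of customization the paragraph above describes.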
Meanwhile, Qlib provides a module called Trainer.
qlib.model.trainer.Trainer
Trainer trains a list of tasks and returns a list of model recorders. Qlib offers two kinds of Trainer: TrainerR is the simplest, and TrainerRM is based on TaskManager to help manage the task lifecycle automatically. If you do not want to use Task Manager to manage tasks, using TrainerR to train a list of tasks generated by TaskGen is enough. Here are the details about the different Trainer classes.
Before collecting model training results, you need to use qlib.init to specify the path of mlruns.
To collect the results of tasks after training, Qlib provides Collector, Group and Ensemble to collect the results in a readable, expandable and loosely coupled way.
Collector can collect objects from anywhere and process them, for example by merging, grouping or averaging. It works in two steps: collect (collect anything into a dict) and process_collect (process the collected dict).
Group also has two steps: group (groups a set of objects based on group_func and turns them into a dict) and reduce (turns a dict into an ensemble based on some rule). For example: {(A,B,C1): object, (A,B,C2): object} ---group---> {(A,B): {C1: object, C2: object}} ---reduce---> {(A,B): object}
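The group and reduce steps in the example above can be traced with plain dicts. The sketch below is illustrative only and does not use Qlib's Group class: the helper names (group, reduce, split_last, mean_of_values) are hypothetical, and plain floats stand in for the collected objects.

```python
from collections import defaultdict


def group(collected, group_func):
    """Split each key with group_func into (outer_key, inner_key) and nest."""
    grouped = defaultdict(dict)
    for key, obj in collected.items():
        outer, inner = group_func(key)
        grouped[outer][inner] = obj
    return dict(grouped)


def reduce(grouped, ensemble_func):
    """Collapse each inner dict into a single object."""
    return {outer: ensemble_func(inner) for outer, inner in grouped.items()}


def split_last(key):
    """Hypothetical group_func: the last key element becomes the inner key."""
    return key[:-1], key[-1]


def mean_of_values(inner):
    """Hypothetical reduction rule: average the grouped values."""
    return sum(inner.values()) / len(inner)


collected = {
    ("A", "B", "C1"): 1.0,
    ("A", "B", "C2"): 3.0,
    ("A", "D", "C1"): 5.0,
}
grouped = group(collected, split_last)
reduced = reduce(grouped, mean_of_values)
print(grouped)
print(reduced)
```

Keeping the grouping rule and the reduction rule as separate callables is what lets the two steps be combined freely.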
Ensemble can merge the objects in an ensemble. For example: {C1: object, C2: object} ---Ensemble---> object. You can set the ensembles you want in the Collector's process_list. Common ensembles include AverageEnsemble and RollingEnsemble. AverageEnsemble is used to ensemble the results of different models in the same time period. RollingEnsemble is used to ensemble the results of the same model across different (rolling) time periods.
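The difference between the two kinds of ensemble is easiest to see on small date-keyed results. The functions below are a simplified sketch, not the AverageEnsemble or RollingEnsemble classes: they work on plain dicts mapping a date string to a number, the averaging variant assumes every input covers the same dates and skips any standardization, and the rolling variant assumes that on overlapping dates the later period should win.

```python
def average_ensemble(results):
    """Average results that cover the same dates, e.g. one series per model."""
    series = list(results.values())
    return {d: sum(s[d] for s in series) / len(series) for d in series[0]}


def rolling_ensemble(results):
    """Concatenate results from consecutive periods into one series.

    Periods are visited in key order, so a later period overwrites an
    earlier one on overlapping dates.
    """
    merged = {}
    for period in sorted(results):
        merged.update(results[period])
    return dict(sorted(merged.items()))


# Two hypothetical models predicting the same period.
same_period = {
    "model_a": {"2020-01-02": 0.25, "2020-01-03": 0.75},
    "model_b": {"2020-01-02": 0.75, "2020-01-03": 0.25},
}
averaged = average_ensemble(same_period)

# One hypothetical model evaluated on two rolling periods.
rolling = {
    0: {"2020-01-02": 1.0, "2020-01-03": 2.0},
    1: {"2020-01-03": 2.5, "2020-01-06": 3.0},
}
stitched = rolling_ensemble(rolling)
```

Averaging reduces several same-period results to one, while the rolling merge stitches period-by-period results into one continuous series.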
So the hierarchy is as follows: Collector's second step corresponds to Group, and Group's second step corresponds to Ensemble.
For more information, please see Collector, Group and Ensemble, or the example.