Separated train and dev for ReturnnRasrTrainingJob #165

Open
JackTemaki opened this issue Oct 19, 2021 · 6 comments

@JackTemaki
Contributor

Currently the ReturnnRasrTrainingJob allows for different train_crp and dev_crp, but not for different feature_flows and alignments, which prohibits many possible pipeline designs.

I know that there are multiple "private" versions of this Job that somehow circumvent this issue, but I think it would be good to have a correct public version that addresses it. So I want to open the discussion here: can we alter the Job without breaking existing setups, or do we really need a new one? (In the worst case, a ReturnnRasrTrainingJobV2.)
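To make the requested separation concrete, here is a minimal sketch of what a job interface with fully split train/dev resources could look like. All names (ReturnnRasrTrainingJobV2, the parameter names, create_files) are hypothetical illustrations, not the actual i6_core API; the real job would inherit from the sisyphus Job base class.

```python
# Hypothetical sketch only: the real ReturnnRasrTrainingJob lives in i6_core
# and uses sisyphus Job machinery. This illustrates the requested interface:
# separate crp, feature flow, and alignment per split.
class ReturnnRasrTrainingJobV2:
    def __init__(self, train_crp, dev_crp,
                 train_feature_flow, dev_feature_flow,
                 train_alignment, dev_alignment):
        # keep train and dev resources fully separated instead of
        # requiring a single merged bundle file
        self.splits = {
            "train": (train_crp, train_feature_flow, train_alignment),
            "dev": (dev_crp, dev_feature_flow, dev_alignment),
        }

    def create_files(self):
        # one RASR config/flow pair would be written per split
        return {name: resources for name, resources in self.splits.items()}
```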

@Marvin84
Contributor

As I already mentioned on other occasions, I also find the separation of the flows a useful feature. I know that @curufinwe prefers to rely on a merged bundle file. However, I find it more flexible and intuitive to have these as separate parameters for the registration of the job, rather than being dependent on running an additional job for merging the bundle.

@michelwi
Contributor

There is already the option to pass additional sprint config files; maybe we could add support for additional sprint flow files as well?

My favorite solution would be to abandon the ReturnnRasrTrainingJob altogether. Instead we add the option to write additional config/flow/general-files in the ReturnnTrainingJob and then have some RasrDatasetHelper function/class/(probably not job) that would create/prepare the necessary files.

More verbose:

  • class ConfigFile, from this we inherit ReturnnConfig, RasrConfig, RasrFlow
    (or maybe we can even keep the existing objects)
  • ReturnnTrainingJob gets the main ReturnnConfig object and optionally a
    dict[filename]->ConfigObject and writes all objects to the files in the work folder.
  • Each object would know how the corresponding config is written
  • All the additional logic of the ReturnnRasrTrainingJob is moved into a DatasetHelper
  • We can have different dataset helpers for ExternSprintDataset, HDFDataset, MetaDataset, etc

Then it's trivial to have the helper create one flow file for all, two config/flow files for train/cv, or even three different files if your devtrain dataset is somehow not part of train. Multiple "sprint" losses can also be covered. Or in the future we might add support for a different toolkit altogether.
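The proposed split could be sketched roughly as below. Everything here is hypothetical naming (ConfigFile, RasrFlow, RasrDatasetHelper) to illustrate the structure: each config object knows how to serialize itself, and a per-dataset helper prepares the extra files that ReturnnTrainingJob would then simply write out.

```python
# Sketch of the proposal, with invented names: ReturnnTrainingJob would only
# write files it is handed; a per-dataset helper prepares them.
class ConfigFile:
    """Base class: each subclass knows how its config is written."""
    def write(self, path):
        with open(path, "w") as f:
            f.write(self.serialize())

    def serialize(self):
        raise NotImplementedError

class RasrFlow(ConfigFile):
    def __init__(self, network_xml):
        self.network_xml = network_xml

    def serialize(self):
        return self.network_xml

class RasrDatasetHelper:
    """Prepares the additional files an ExternSprintDataset would need."""
    def __init__(self, flows):
        # flows: dict like {"train": RasrFlow, "dev": RasrFlow} --
        # one entry for a merged flow, two for split train/cv,
        # three if devtrain is not part of train.
        self.flows = flows

    def additional_files(self):
        # mapping of filename -> ConfigFile, to be passed to the training job
        return {f"{name}.flow": flow for name, flow in self.flows.items()}
```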

@JackTemaki
Contributor Author

The idea is good, but we could even keep the existing ReturnnTrainingJob unchanged and instead have separate jobs that write the config files; then we only need to add the corresponding paths to the ReturnnConfig.

Pro:

  • no changes to the current job, no need to introduce ConfigObjects

Con:

  • More overhead when debugging / editing files within a job
  • Adding new write jobs

@michelwi
Contributor

michelwi commented Nov 3, 2021

I like your suggestion and would prefer to keep a single ReturnnTrainingJob at the cost of introducing separate write-config jobs. I also wouldn't mind keeping the ConfigObjects, but I have no strong opinion here.

We could maybe only create a generic WriteConfigJob that would accept any ReturnnConfig, RasrConfig, RasrFlow, ... object and call its write() method to write it to disk. Such a job would be useful for debugging sisyphus config generation anyway, as we could inspect the generated config files without needing to start a training.

let me add another Pro:

  • more obvious that the same dataset is used in different training jobs.
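The generic WriteConfigJob idea above could be sketched as follows. This is a simplified stand-in (a plain class instead of a sisyphus Job, and an invented run() signature) just to show the shape: the job accepts any object exposing a write() method and materializes it, which also allows inspecting generated configs without starting a training.

```python
import os

# Hypothetical sketch: in reality this would be a sisyphus Job with output
# paths; here a plain class stands in to show the idea.
class WriteConfigJob:
    def __init__(self, config_object, filename="config"):
        # config_object: anything with a write(path) method,
        # e.g. a ReturnnConfig, RasrConfig, or RasrFlow
        self.config_object = config_object
        self.filename = filename

    def run(self, out_dir):
        path = os.path.join(out_dir, self.filename)
        # each object knows how its own config format is written
        self.config_object.write(path)
        return path
```

A nice property of this design is that the write logic stays with the config object, so WriteConfigJob never needs to know about new config types.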

@albertz
Member

albertz commented Nov 3, 2021

I would also vote to abandon ReturnnRasrTrainingJob and just use ReturnnTrainingJob.

@JackTemaki
Contributor Author

This problem will become obsolete for Hybrid setups once rwth-i6/i6_experiments#174 is merged.
