Separated train and dev for ReturnnRasrTrainingJob #165

Open
JackTemaki opened this issue Oct 19, 2021 · 6 comments

@JackTemaki
Contributor

Currently the ReturnnRasrTrainingJob allows for different train_crp and dev_crp, but not for different feature_flows and alignments, which prohibits many possible pipeline designs.

I know that there are multiple "private" versions of this Job that somehow circumvent this issue, but I think it would be good to have a correct public version that addresses it. So I want to open the discussion here: can we alter the Job without breaking existing setups, or do we really need a new one? (In the worst case, a ReturnnRasrTrainingJobV2.)
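To make the requested separation concrete, here is a minimal sketch of what a job interface with fully split train/dev resources could look like. All names (ReturnnRasrTrainingJobV2, the parameter names, create_files) are hypothetical illustrations, not the actual i6_core API; the real job would inherit from the sisyphus Job base class.

```python
# Hypothetical sketch only: the real ReturnnRasrTrainingJob lives in i6_core
# and uses sisyphus Job machinery. This illustrates the requested interface:
# separate crp, feature flow, and alignment per split.
class ReturnnRasrTrainingJobV2:
    def __init__(self, train_crp, dev_crp,
                 train_feature_flow, dev_feature_flow,
                 train_alignment, dev_alignment):
        # keep train and dev resources fully separated instead of
        # requiring a single merged bundle file
        self.splits = {
            "train": (train_crp, train_feature_flow, train_alignment),
            "dev": (dev_crp, dev_feature_flow, dev_alignment),
        }

    def create_files(self):
        # one RASR config/flow pair would be written per split
        return {name: resources for name, resources in self.splits.items()}
```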

@Marvin84
Contributor

As I already mentioned on other occasions, I also find the separation of the flows a useful feature. I know that @curufinwe prefers to rely on a merged bundle file. However, I find it more flexible and intuitive to have these as separate parameters for the registration of the job, rather than being dependent on running an additional job for merging the bundle.

@michelwi
Contributor

There is already the option to pass additional sprint config files; maybe we could add support for additional sprint flow files as well?

My favorite solution would be to abandon the ReturnnRasrTrainingJob altogether. Instead we add the option to write additional config/flow/general-files in the ReturnnTrainingJob and then have some RasrDatasetHelper function/class/(probably not job) that would create/prepare the necessary files.

More verbose:

  • class ConfigFile, from this we inherit ReturnnConfig, RasrConfig, RasrFlow
    (or maybe we can even keep the existing objects)
  • ReturnnTrainingJob gets the main ReturnnConfig object and optionally a
    dict[filename]->ConfigObject and writes all objects to the files in the work folder.
  • Each object would know how the corresponding config is written
  • All the additional logic of the ReturnnRasrTrainingJob is moved into a DatasetHelper
  • We can have different dataset helpers for ExternSprintDataset, HDFDataset, MetaDataset, etc

Then it's trivial to have the helper create one flow file for all, two config/flow files for train/cv, or even three different files if your devtrain dataset is somehow not part of train. Multiple "sprint" losses can also be covered. Or in the future we might add support for a different toolkit altogether.
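The proposed split could be sketched roughly as below. Everything here is hypothetical naming (ConfigFile, RasrFlow, RasrDatasetHelper) to illustrate the structure: each config object knows how to serialize itself, and a per-dataset helper prepares the extra files that ReturnnTrainingJob would then simply write out.

```python
# Sketch of the proposal, with invented names: ReturnnTrainingJob would only
# write files it is handed; a per-dataset helper prepares them.
class ConfigFile:
    """Base class: each subclass knows how its config is written."""
    def write(self, path):
        with open(path, "w") as f:
            f.write(self.serialize())

    def serialize(self):
        raise NotImplementedError

class RasrFlow(ConfigFile):
    def __init__(self, network_xml):
        self.network_xml = network_xml

    def serialize(self):
        return self.network_xml

class RasrDatasetHelper:
    """Prepares the additional files an ExternSprintDataset would need."""
    def __init__(self, flows):
        # flows: dict like {"train": RasrFlow, "dev": RasrFlow} --
        # one entry for a merged flow, two for split train/cv,
        # three if devtrain is not part of train.
        self.flows = flows

    def additional_files(self):
        # mapping of filename -> ConfigFile, to be passed to the training job
        return {f"{name}.flow": flow for name, flow in self.flows.items()}
```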

@JackTemaki
Contributor Author

The idea is good, but we could even keep the existing ReturnnTrainingJob unchanged and instead have separate jobs that write the config files; then we only need to add the corresponding paths to the ReturnnConfig.

Pro:

  • no changes to the current job, no need to introduce ConfigObjects

Con:

  • More overhead when debugging / editing files within a job
  • Adding new write jobs

@michelwi
Contributor

michelwi commented Nov 3, 2021

I like your suggestion and would prefer to keep a single ReturnnTrainingJob at the cost of introducing separate write-config jobs. I also wouldn't mind keeping the ConfigObjects, but I have no strong opinion here.

We could maybe only create a generic WriteConfigJob that would accept any ReturnnConfig, RasrConfig, RasrFlow, ... object and call its write() method to write it to disk. Such a job would be useful for debugging sisyphus config generation anyway, as we could inspect the generated config files without needing to start a training.

let me add another Pro:

  • more obvious that the same dataset is used in different training jobs.
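The generic WriteConfigJob idea above could be sketched as follows. This is a simplified stand-in (a plain class instead of a sisyphus Job, and an invented run() signature) just to show the shape: the job accepts any object exposing a write() method and materializes it, which also allows inspecting generated configs without starting a training.

```python
import os

# Hypothetical sketch: in reality this would be a sisyphus Job with output
# paths; here a plain class stands in to show the idea.
class WriteConfigJob:
    def __init__(self, config_object, filename="config"):
        # config_object: anything with a write(path) method,
        # e.g. a ReturnnConfig, RasrConfig, or RasrFlow
        self.config_object = config_object
        self.filename = filename

    def run(self, out_dir):
        path = os.path.join(out_dir, self.filename)
        # each object knows how its own config format is written
        self.config_object.write(path)
        return path
```

A nice property of this design is that the write logic stays with the config object, so WriteConfigJob never needs to know about new config types.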

@albertz
Member

albertz commented Nov 3, 2021

I would also vote to abandon ReturnnRasrTrainingJob and just use ReturnnTrainingJob.

@JackTemaki
Contributor Author

This problem will become obsolete for Hybrid setups once rwth-i6/i6_experiments#174 is merged.
