Extend ReturnnRasrDataInput to produce valid Dataset config dict #130
This PR needs further offline discussion, as there are multiple efforts simultaneously that try to solve the same things. I will try to organize this as soon as possible.
My understanding was that we would for now include all different subroutines needed for everyone, leaving the freedom of choosing, and once everything is available we will decide how to merge?
Okay, I do not like this approach here, but we can discuss the PR if you want. Just be aware that once this gets merged and is in use, refactoring gets difficult, so any alternative has to be at a new location or under a new name, and the code that gets merged here cannot be deleted or cleaned up easily. I leave this also to @christophmluscher and @michelwi to decide. My comments are then:
Maybe a little bit more background information: We want to have RETURNN dataset creation helpers independent at a single place in the end, so also the OggZipHdf input is already somewhat problematic. But maybe we just implement these two inputs now and then just deprecate them once we have something nicer. I just hope that will work out without creating a mess...
The main point for having this is to be able to have separate feature
flows for train and dev. Do other approaches offer this possibility?
…On Tue, 18 Apr 2023 at 10:25, Nick Rossenbach ***@***.***> wrote:
Maybe a little bit more background information: We want to have RETURNN
dataset creation helpers independent at a single place in the end, so also
the OggZipHdf input is already somewhat problematic. But maybe we just
implement these two inputs now and then just deprecate them once we have
something nicer. I just hope that will work out without creating a mess...
I am not discussing the what, just the how! I already said that in my opinion this can be merged, as long as you get rid of the ReturnnRasrTrainingJob dependency here.
I'm also okay with more (offline) discussions and would even welcome it. Regarding your comments, @JackTemaki, I'm unsure about these things:
I will most likely not use or work with this code, so you may do however you want.
Exactly this is the problem: this should be cleaned up so that we actually know what the parameters are doing. But I understand this is a time-critical issue, and you just want working code. Then please add a comment to the "get" functions that this is outdated code triggering unknown parameters of RASR via the ReturnnRasrTrainingJob and should be used with care. I would still require unifying the input features for the dataset separately.
…sr training args typed dict.
Sorry that I did not formulate this well: If you really just copy the exact logic without cleaning up, then you should just keep calling the code from the ReturnnRasrTrainingJob instead!
…r default values.
Oh nice, you even changed it to frozen dataclass :) I did not want to complain about everything so I skipped on that one.
Maybe some extra docstrings, otherwise good to go
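For readers unfamiliar with the pattern mentioned in this review, here is a minimal sketch of what a frozen dataclass for the training arguments could look like. The field names (`partition_epochs`, `num_classes`, `use_python_control`) are assumptions chosen for illustration and are not taken from the PR; only the class name follows the naming requested further down in the review.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ReturnnRasrTrainingArgs:
    """Immutable container for training arguments.

    All fields are hypothetical and only illustrate the pattern.
    """

    partition_epochs: int = 1
    num_classes: Optional[int] = None
    use_python_control: bool = True


args = ReturnnRasrTrainingArgs(partition_epochs=2)

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    args.partition_epochs = 5
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace() returns a modified copy and leaves the original as-is.
updated = replace(args, num_classes=9001)
```

A frozen dataclass with the default `eq=True` is also hashable, so instances can be used as dictionary keys or set members.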
Yes no problem :D it was mostly necessary because the |
common/setups/rasr/util/nn.py
@@ -1,15 +1,18 @@
__all__ = [
    "RasrTrainingArgs",
I would prefer ReturnnRasrTrainingArgs to match the job class name.
common/setups/rasr/util/nn.py
@@ -39,6 +59,7 @@ def __init__(
        shuffle_data: bool = True,
        stm: Optional[tk.Path] = None,
        glm: Optional[tk.Path] = None,
        rasr_training_args: Optional[RasrTrainingArgs] = None,
- rasr_training_args: Optional[RasrTrainingArgs] = None,
+ returnn_rasr_training_args: Optional[RasrTrainingArgs] = None,
Same here
…it with default args in input class
Similar to the OggZipHdfDataInput, the ReturnnRasrDataInput gets a working get_data_dict implementation. This is useful for replacing the ReturnnRasrTrainingJob with the standard ReturnnTrainingJob for more functionality, e.g. producing different feature flow files for train and dev (see also this discussion).
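To illustrate the idea in the description, here is a rough sketch of a data input whose `get_data_dict` returns a RETURNN dataset config dict. It is not the code from this PR: the class name `RasrDataInputSketch`, its constructor arguments, the file paths, and the `partition_epochs` field are all hypothetical. The dict keys are written as I understand RETURNN's `ExternSprintDataset` options and should be checked against the RETURNN version in use.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ReturnnRasrTrainingArgs:
    # Hypothetical field, for illustration only.
    partition_epochs: int = 1


class RasrDataInputSketch:
    """Hypothetical stand-in for a RASR-based data input."""

    def __init__(
        self,
        name: str,
        rasr_trainer_exe: str,
        rasr_config_file: str,
        returnn_rasr_training_args: Optional[ReturnnRasrTrainingArgs] = None,
    ):
        self.name = name
        self.rasr_trainer_exe = rasr_trainer_exe
        self.rasr_config_file = rasr_config_file
        self.args = returnn_rasr_training_args or ReturnnRasrTrainingArgs()

    def get_data_dict(self) -> Dict[str, Any]:
        # Each input points at its own RASR config, which in turn can
        # reference its own feature flow file (assumed setup).
        return {
            "class": "ExternSprintDataset",
            "sprintTrainerExecPath": self.rasr_trainer_exe,
            "sprintConfigStr": f"--config={self.rasr_config_file}",
            "partitionEpoch": self.args.partition_epochs,
        }


# Placeholder paths; separate configs allow separate feature flows.
train = RasrDataInputSketch(
    "train",
    "/path/to/nn-trainer",
    "rasr.train.config",
    ReturnnRasrTrainingArgs(partition_epochs=4),
)
dev = RasrDataInputSketch("dev", "/path/to/nn-trainer", "rasr.dev.config")

returnn_config = {"train": train.get_data_dict(), "dev": dev.get_data_dict()}
```

Because each input builds its own dict, train and dev no longer have to share one feature flow, which is the motivation given above for moving to the standard ReturnnTrainingJob.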