Add Action and Batch classes to make Emmental more modularized #116
Description of the proposed changes
This PR makes Emmental more extensible and easier to use for downstream tasks.
1. Add `Action` and `Batch` classes to make the APIs more modular.
   - `Action` objects populate the `task_flow` sequence. Each `Action` has three attributes: `name` (the name of the action), `module` (the name of the module the action runs), and `inputs` (the inputs to the action). Introducing a class for specifying actions in the `task_flow` standardizes their definition. `Action` also gives users more flexibility in specifying a task flow, because we can now support a wider range of formats for the `inputs` attribute of a `task_flow` action, as discussed in (2).
   - `Batch` is the object returned by the Emmental `Scheduler`. Each `Batch` object has six attributes:
     - `uids`: uids of the samples.
     - `X_dict`: input features of the samples.
     - `Y_dict`: outputs of the samples.
     - `task_to_label_dict`: the task-to-label mapping.
     - `data_name`: name of the dataset the samples come from.
     - `split`: the split information.

     Defining the `Batch` class unifies and standardizes the training scheduler interface, because every scheduler now has a consistent output format.
2. Make `task_flow` more flexible by supporting more formats for specifying inputs to each module. Each entry in an action's inputs (e.g., `input1`) specifies which output is used as input for the current action, in one of the following formats:
   a) `x` (where `x` is a str): takes the whole output of `x` as input. This lets users pass all outputs from one module to another without manually specifying every input to the module.
   b) `(x, y)` (where `y` is an int): takes the `y`-th output of `x` as input.
   c) `(x, y)` (where `y` is a str): takes the output of `x` named `y` as input.
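The `Action` and `Batch` classes described above can be pictured as plain Python dataclasses. This is a hedged sketch, not Emmental's actual implementation:
- The field names follow the description above.
- The type annotations are assumptions, including the hypothetical `InputSpec` alias that encodes formats a) to c).
- The `"_input_"` name for the batch features is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Union

# Hypothetical alias for one input spec, mirroring formats a), b) and c):
#   "x"            -> whole output of action "x"
#   ("x", 0)       -> the 0-th output of action "x"
#   ("x", "feat")  -> the output of action "x" named "feat"
InputSpec = Union[str, Tuple[str, int], Tuple[str, str]]


@dataclass
class Action:
    """Sketch of one step in a task flow (not the real Emmental class)."""

    name: str  # name of the action, used to refer to its output later
    module: str  # name of the module this action runs
    inputs: List[InputSpec] = field(default_factory=list)  # where inputs come from


@dataclass
class Batch:
    """Sketch of the uniform object a scheduler yields (not the real class)."""

    uids: List[str]  # uids of the samples
    X_dict: Dict[str, Any]  # input features of the samples
    Y_dict: Dict[str, Any]  # outputs of the samples
    task_to_label_dict: Dict[str, str]  # task -> label mapping
    data_name: str  # name of the dataset the samples come from
    split: str  # split information, e.g. "train"


batch = Batch(
    uids=["s0", "s1"],
    X_dict={"feature0": [[0.1, 0.2], [0.3, 0.4]]},
    Y_dict={"label0": [1, 0]},
    task_to_label_dict={"task0": "label0"},
    data_name="toy_data",
    split="train",
)
action = Action(name="input", module="input_module0", inputs=[("_input_", "feature0")])
```

Because every scheduler would yield the same six fields, downstream code can read `batch.X_dict` or `batch.task_to_label_dict` without knowing which scheduler produced the batch.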
A few `emmental.EmmentalTaskFlowAction` examples:
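The sketch below shows how a task flow that mixes the three input formats might be executed. It is hedged and self-contained, and it does not use Emmental's real classes. The following are all hypothetical stand-ins:
- the `Action` dataclass;
- the helpers `resolve_input` and `run_task_flow`;
- the toy modules and the `"_input_"` name for the batch features.

The calling convention, where each resolved input is passed as a positional argument, is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple, Union

# Hypothetical input spec type covering formats a), b) and c).
InputSpec = Union[str, Tuple[str, int], Tuple[str, str]]


@dataclass
class Action:
    """Minimal stand-in for a task-flow action (not the real Emmental class)."""

    name: str
    module: str
    inputs: List[InputSpec]


def resolve_input(outputs: Dict[str, Any], spec: InputSpec) -> Any:
    """Look up one input according to formats a), b) and c)."""
    if isinstance(spec, str):
        # a) "x": the whole output of x
        return outputs[spec]
    name, key = spec
    if isinstance(key, (int, str)):
        # b) ("x", int): x's key-th output; c) ("x", str): x's output named key
        return outputs[name][key]
    raise TypeError(f"Unsupported input spec: {spec!r}")


def run_task_flow(
    task_flow: List[Action],
    modules: Dict[str, Callable[..., Any]],
    X_dict: Dict[str, Any],
) -> Dict[str, Any]:
    """Run actions in order, storing each action's output under its name."""
    outputs: Dict[str, Any] = {"_input_": X_dict}  # batch features
    for action in task_flow:
        args = [resolve_input(outputs, spec) for spec in action.inputs]
        outputs[action.name] = modules[action.module](*args)
    return outputs


modules = {
    "input_module0": lambda feats: [x * 2 for x in feats],  # doubles each feature
    "stats_module": lambda xs: (sum(xs), max(xs)),  # returns a tuple
    "head": lambda total, doubled: {"score": total + len(doubled)},
}
task_flow = [
    Action("input", "input_module0", [("_input_", "feature0")]),  # format c)
    Action("stats", "stats_module", ["input"]),  # format a)
    Action("pred", "head", [("stats", 0), "input"]),  # formats b) and a)
]
outputs = run_task_flow(task_flow, modules, {"feature0": [1, 2, 3]})
print(outputs["pred"])  # {'score': 15}
```

Formats b) and c) both index into the stored output. Only the type of the key differs, which decides whether a tuple or list position or a named entry is selected.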
This design can also be applied to `action_outputs`; here are a few examples:
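Here is one way the same three formats could select entries from `action_outputs`. This is a hedged sketch under stated assumptions:
- The `outputs` dict stands for the stored results of a finished task flow.
- `resolve` and `collect_action_outputs` are hypothetical helpers, not Emmental functions.
- Keying the result by the selecting spec is an illustrative choice.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical spec type: formats a), b) and c) reused for action_outputs.
OutputSpec = Union[str, Tuple[str, int], Tuple[str, str]]


def resolve(outputs: Dict[str, Any], spec: OutputSpec) -> Any:
    """Resolve one spec using formats a), b) and c)."""
    if isinstance(spec, str):
        return outputs[spec]  # a) whole output of the action
    name, key = spec
    return outputs[name][key]  # b) positional index or c) named entry


def collect_action_outputs(
    outputs: Dict[str, Any], action_outputs: List[OutputSpec]
) -> Dict[OutputSpec, Any]:
    """Gather the requested outputs, keyed by the spec that selected them."""
    return {spec: resolve(outputs, spec) for spec in action_outputs}


# Stored outputs of a finished (hypothetical) task flow.
outputs = {
    "_input_": {"feature0": [1, 2, 3]},
    "encoder": ([0.5, 0.7], [0.1, 0.9]),
    "task0_pred_head": [0.8, 0.2],
}
action_outputs = [
    "task0_pred_head",  # a) whole output
    ("encoder", 1),  # b) output at index 1
    ("_input_", "feature0"),  # c) output named "feature0"
]
selected = collect_action_outputs(outputs, action_outputs)
```

Reusing one resolver for both `inputs` and `action_outputs` keeps the two specifications consistent: any spec that works as an input also works as a requested output.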
Test plan
Pass the existing tests.
Checklist