Add Action and Batch classes to make Emmental more modularized #116

Merged
merged 14 commits into from Nov 16, 2021

@senwu senwu commented Nov 15, 2021

Description of the proposed changes

This PR makes Emmental more extensible and easier to use for downstream tasks.

  1. We introduce two new classes, Action and Batch, to make the APIs more modular.
  • Action objects populate the task_flow sequence. Each Action has three attributes: name (the name of the action), module (the name of the module the action runs), and inputs (the inputs to the action). Introducing a class for the actions in a task_flow standardizes their definition. Action also gives users more flexibility in specifying a task flow, since the inputs attribute now supports a wider range of formats, as discussed in (2).

  • Batch is the object returned from the Emmental scheduler. Each Batch object has six attributes: uids (uids of the samples), X_dict (input features of the samples), Y_dict (outputs of the samples), task_to_label_dict (the task-to-label mapping), data_name (name of the dataset the samples come from), and split (the split information). Defining the Batch class unifies and standardizes the training scheduler interface by ensuring a consistent output format across all schedulers.

  2. We make the task_flow more flexible by supporting more formats for specifying the inputs to each module.
  • inputs can now be a str (e.g., inputs="input1"), which means the output of input1 is used as the input to the current action.
  • inputs can also be a list, whose entries can take three different formats:
    a) x (x is a str): takes the whole output of x as input. This lets users pass all outputs from one module to another without manually specifying every input to the module.
    b) (x, y) (y is an int): takes the y-th output of x as input.
    c) (x, y) (y is a str): takes the output of x named y as input.
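
As a rough picture of the two classes described in (1), here is a minimal sketch using plain dataclasses. ActionSketch and BatchSketch are hypothetical stand-ins, not the classes added in this PR; the field names come from the description above, while the field types and the None default for inputs are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

# One entry of `inputs`: a whole output ("x") or one piece of it
# (("x", 0) by position, ("x", "key") by name).
InputEntry = Union[str, Tuple[str, Union[int, str]]]


@dataclass
class ActionSketch:
    """Hypothetical stand-in for Action: one step of a task_flow."""

    name: str    # name of the action
    module: str  # name of the module the action runs
    inputs: Optional[Union[str, Sequence[InputEntry]]] = None  # assumed default


@dataclass
class BatchSketch:
    """Hypothetical stand-in for Batch: what a scheduler hands back."""

    uids: List[str]                     # uids of the samples (type assumed)
    X_dict: Dict[str, Any]              # input features of the samples
    Y_dict: Dict[str, Any]              # outputs of the samples
    task_to_label_dict: Dict[str, str]  # task-to-label mapping
    data_name: str                      # dataset the samples come from
    split: str                          # split information


# A two-step flow: read the "data" field of the raw input, then feed the
# whole output of the first action into a prediction head.
flow = [
    ActionSketch(name="input", module="input_module0", inputs=[("_input_", "data")]),
    ActionSketch(name="pred_head", module="pred_module0", inputs=["input"]),
]

batch = BatchSketch(
    uids=["u0", "u1"],
    X_dict={"data": [[0.1, 0.2], [0.3, 0.4]]},
    Y_dict={"label": [0, 1]},
    task_to_label_dict={"task1": "label"},
    data_name="toy",
    split="train",
)
```

Keeping both as small data containers means every scheduler can return the same shape and every task_flow entry can be validated in one place.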

A few emmental.EmmentalTaskFlowAction examples:

from emmental import Action as Act
Act(name="input", module="input_module0", inputs=[("_input_", "data")])
Act(name="input", module="input_module0", inputs=[("_input_", 0)])
Act(name="input", module="input_module0", inputs=["_input_"])
Act(name="input", module="input_module0", inputs="_input_")
Act(name="input", module="input_module0", inputs=[("_input_", "data"), ("_input_", 1), "_input_"])
Act(name="input", module="input_module0", inputs=None)
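
To make the input formats above concrete, here is a minimal sketch of how such a spec could be resolved against the outputs collected so far. resolve_inputs and output_dict are hypothetical names, not Emmental's internals. The sketch assumes that an int selector indexes a sequence-like output, that a str selector looks up a dict-like output, and that inputs=None means "no inputs".

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

InputEntry = Union[str, Tuple[str, Union[int, str]]]


def resolve_inputs(
    inputs: Optional[Union[str, Sequence[InputEntry]]],
    output_dict: Dict[str, Any],
) -> List[Any]:
    """Turn an inputs spec into the list of values it refers to."""
    if inputs is None:
        # Assumption: None means the action takes no inputs.
        return []
    if isinstance(inputs, str):
        # inputs="x": the whole output of x.
        return [output_dict[inputs]]
    resolved = []
    for entry in inputs:
        if isinstance(entry, str):
            # "x": the whole output of x.
            resolved.append(output_dict[entry])
        else:
            # ("x", 0): the 0-th output of x; ("x", "key"): the output named "key".
            name, selector = entry
            resolved.append(output_dict[name][selector])
    return resolved


output_dict = {
    "_input_": {"data": [1, 2], "label": 3},
    "encoder": ("h0", "h1"),
}

by_name = resolve_inputs([("_input_", "data")], output_dict)
by_index = resolve_inputs([("encoder", 1)], output_dict)
whole = resolve_inputs("encoder", output_dict)
mixed = resolve_inputs([("_input_", "data"), ("encoder", 1), "encoder"], output_dict)
```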

The same design applies to action_outputs. Here are a few examples:

action_outputs=[(f"{task_name}_pred_head", 0), ("_input_", "data"), f"{task_name}_pred_head"]
action_outputs="_input_"
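
Since inputs and action_outputs accept the same formats, one way to handle both uniformly is to normalize every spec into a list of (name, selector) pairs first. The sketch below is an illustration under that assumption; normalize_specs is a hypothetical helper, not part of Emmental, and using None as the "whole output" selector is a choice made here.

```python
from typing import List, Optional, Sequence, Tuple, Union

Selector = Optional[Union[int, str]]
SpecEntry = Union[str, Tuple[str, Union[int, str]]]


def normalize_specs(
    specs: Optional[Union[str, Sequence[SpecEntry]]],
) -> List[Tuple[str, Selector]]:
    """Rewrite any accepted spec as a list of (name, selector) pairs."""
    if specs is None:
        return []
    if isinstance(specs, str):
        # A bare string is shorthand for a one-element list.
        specs = [specs]
    normalized: List[Tuple[str, Selector]] = []
    for entry in specs:
        if isinstance(entry, str):
            # Whole output: no selector.
            normalized.append((entry, None))
        else:
            name, selector = entry
            normalized.append((name, selector))
    return normalized


task_name = "task1"
from_list = normalize_specs(
    [(f"{task_name}_pred_head", 0), ("_input_", "data"), f"{task_name}_pred_head"]
)
from_str = normalize_specs("_input_")
```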

Test plan

Pass the existing tests.

Checklist

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I have updated the CHANGELOG.rst accordingly.

codecov bot commented Nov 15, 2021

Codecov Report

Merging #116 (b3035f4) into master (6cad215) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #116   +/-   ##
=======================================
  Coverage   92.12%   92.12%           
=======================================
  Files          40       40           
  Lines        2018     2018           
  Branches      431      431           
=======================================
  Hits         1859     1859           
  Misses         94       94           
  Partials       65       65           
Flag        Coverage Δ
unittests   92.12% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files             Coverage Δ
src/emmental/model.py      93.78% <ø> (ø)
src/emmental/__init__.py   100.00% <100.00%> (ø)
src/emmental/task.py       100.00% <100.00%> (ø)

@senwu senwu merged commit 17b5075 into master Nov 16, 2021
@senwu senwu deleted the module branch November 16, 2021 00:17