
refactor: add type hints for encoders #3449

Merged
merged 8 commits on Jul 13, 2023

Conversation

@dennisrall (Contributor)

github-actions bot commented Jun 29, 2023

Unit Test Results

4 files ±0 · 4 suites ±0 · 57m 19s ⏱️ +1m 24s
34 tests ±0 · 27 ✔️ −2 · 7 💤 +2 · 0 ±0
68 runs ±0 · 54 ✔️ −4 · 14 💤 +4 · 0 ±0

Results for commit 70fb4fb. Comparison against base commit 60f1416.

This pull request skips 2 tests.
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[ames_housing.gbm.yaml]
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[mercedes_benz_greener.gbm.yaml]

♻️ This comment has been updated with latest results.

@dennisrall force-pushed the add-type-hint-for-encoders branch 2 times, most recently from f694ef5 to 745375a on June 29, 2023 at 10:07
@justinxzhao (Collaborator) left a comment

Thanks for the change! This type of change is great for code readability.

from typing import TypedDict

import torch


class EncoderOutputDict(TypedDict, total=False):
@justinxzhao (Collaborator)

It would be good to add a comment about which encoders return which tensors, or that only sequence encoders return encoder_output_state and attentions.

@dennisrall (Contributor, Author)

Yeah, you're right! I've added the comments. attentions is only used by the ViTLegacy encoder, which is planned to be removed in 0.8, so that return tensor can then be removed too.
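A sketch of what the documented TypedDict might look like, with the field comments paraphrased from this thread (the exact comments in the PR may differ):

from typing import TypedDict

import torch


class EncoderOutputDict(TypedDict, total=False):
    # Returned by every encoder.
    encoder_output: torch.Tensor
    # Only returned by sequence encoders.
    encoder_output_state: torch.Tensor
    # Only used by the ViTLegacy encoder; slated for removal in 0.8.
    attentions: torch.Tensor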

@dennisrall (Contributor, Author)

During implementation I noticed that the bag, set and categorical encoders return simple tensors instead of dicts. Is this intentional, or is there some work to do to make them return dicts as well? (I could take this on)

@justinxzhao (Collaborator)

> During implementation I noticed that the bag, set and categorical encoders return simple tensors instead of dicts. Is this intentional, or is there some work to do to make them return dicts as well? (I could take this on)

Good observation! I'd say it's not intentional and probably came about once there were multiple tensors that needed to be returned from the encoders. CC: @w4nderlust @tgaddair @jimthompson5802

The reason the model still works with some encoders returning tensor dicts with an encoder_output key and other encoders returning raw tensors is that the encoders are all wrapped by their respective <Type>InputFeature modules.

You'll notice that the BagInputFeature, SetInputFeature, and CategoryInputFeature modules package their encoder's output into a tensor dict (under the "encoder_output" key), while the other <Type>InputFeature modules like NumberInputFeature return the output from their encoders (which is already in tensor dict form) as is.
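A minimal sketch of the wrapping pattern described above, assuming simplified constructors and forward signatures (not Ludwig's exact code):

import torch
from torch import nn


class BagInputFeature(nn.Module):
    """Illustrative: wraps an encoder whose forward returns a raw tensor."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder

    def forward(self, inputs: torch.Tensor) -> dict:
        hidden = self.encoder(inputs)  # raw tensor from the bag encoder
        return {"encoder_output": hidden}  # packaged into a tensor dict here


class NumberInputFeature(nn.Module):
    """Illustrative: wraps an encoder whose forward already returns a tensor dict."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder

    def forward(self, inputs: torch.Tensor) -> dict:
        return self.encoder(inputs)  # already {"encoder_output": ...}, passed through as is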

This is definitely a bit confusing -- if making encoder outputs consistent is something you would be able to pick up as part of this PR, that would be awesome.

@dennisrall (Contributor, Author)

Thanks for the explanation. I'll give it a try

@dennisrall dennisrall changed the title Add type hints for encoders refactor: add type hints for encoders Jul 11, 2023
@dennisrall (Contributor, Author)

Some integration tests with the explainer functionality are failing. I am not very familiar with the explainer feature. Can you give me some hints on how to fix them?

@justinxzhao (Collaborator)

> Some integration tests with the explainer functionality are failing. I am not very familiar with the explainer feature. Can you give me some hints on how to fix them?

Hi @Dennis-Rall, I pushed up a commit that should fix the explanation tests.

In Ludwig's implementation of explainability with captum, we provide a list of torch modules that should be used to compute attribution scores for each feature.

For embedded feature types, the layers provided to captum are determined by encoder.get_embedding_layer() (CategoricalOneHotEncoder.get_embedding_layer(), for example). Internally, captum assumes these modules return tensors and tries to call .clone() on them.

Your PR modified category encoders to return a dictionary of tensors (to make this consistent with the rest of Ludwig). So the additional change to fix the explainability functionality is to add an nn.Identity() module shim to capture the encoder's non-dict output and, for captum, change encoder.get_embedding_layer() to return a reference to that module.
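A minimal sketch of that shim, assuming an illustrative category encoder (the class and constructor are not Ludwig's exact code; nn.Identity and get_embedding_layer come from the discussion above):

import torch
from torch import nn


class CategoryEncoder(nn.Module):
    """Illustrative encoder, not Ludwig's exact implementation."""

    def __init__(self, vocab_size: int, embedding_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_size)
        # Shim whose output is the raw embedding tensor, so captum has a
        # module output it can call .clone() on.
        self.embedding_output = nn.Identity()

    def forward(self, inputs: torch.Tensor) -> dict:
        hidden = self.embedding_output(self.embed(inputs))
        return {"encoder_output": hidden}  # dict form, consistent with other encoders

    def get_embedding_layer(self) -> nn.Module:
        # Hand captum the shim (tensor output) instead of the dict-returning module.
        return self.embedding_output

With this, the tensor captum hooks is the same one packaged under "encoder_output", so attribution scores are unaffected by the dict wrapper.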

Hope that was clarifying!

@dennisrall (Contributor, Author)

Yeah, thanks for the explanation and the fix 😄

@justinxzhao justinxzhao merged commit 73917a3 into ludwig-ai:master Jul 13, 2023
14 of 16 checks passed