
Better __repr__ for ModuleList #90452

Closed · wants to merge 7 commits

Conversation

@Guitaricet (Contributor) commented Dec 8, 2022

Problem

When models have many complex repeated layers, print(module) output becomes unwieldy to work with. For example, the current __repr__ output for t5-small is 715 lines long.
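
For context, the line counts above can be reproduced with a snippet along these lines (assumes the Hugging Face transformers package; exact counts may vary by version):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("t5-small")
# Number of lines in the module's printed representation.
print(len(repr(model).splitlines()))  # ~715 without this change
```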

Solution

With the better __repr__ it becomes 135. For t5-large, the current __repr__ prints 1411 lines; the better __repr__ prints 135, the same number as for t5-small, because most of the layers are simply repeated. For EleutherAI/gpt-j-6B the number of lines drops from 483 to just 24.

Here's how it works: when consecutive ModuleList items have exactly the same __repr__, instead of printing each of them it prints N x {repr(item)}. The code supports cases where the same ModuleList contains multiple repeating groups, which is especially useful when the first or last layer of a block differs from the rest of them.
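
To make the mechanism concrete, here is a minimal sketch of the grouping idea; it is not the exact code merged in this PR, and group_repeated is an illustrative helper name:

```python
import torch.nn as nn

def group_repeated(children):
    """Group consecutive modules whose repr() strings are identical.

    Returns (start_index, end_index, repr_string, count) tuples.
    """
    groups = []
    for i, child in enumerate(children):
        r = repr(child)
        if groups and groups[-1][2] == r:
            start, _, _, n = groups[-1]
            groups[-1] = (start, i, r, n + 1)  # extend the current group
        else:
            groups.append((i, i, r, 1))  # start a new group
    return groups

layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(28)])
for start, end, r, n in group_repeated(layers):
    header = f"({start}-{end}): {n} x " if n > 1 else f"({start}): "
    print(header + r)
# (0-27): 28 x Linear(in_features=4, out_features=4, bias=True)
```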

The better __repr__ should make model printouts smaller, more readable, and significantly more useful by highlighting the differences between repeated blocks instead of losing them in a wall of text.

Motivating real-life example.

You can try it out in this Colab notebook.

The current __repr__ output of gpt-j-6b is too big to include in full in this PR description:

```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (1): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (2): GPTJBlock(
...
```

Better __repr__ output looks like this:

```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0-27): 28 x GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
)
```
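
For illustration, the multi-group case (a first layer that differs from the rest) can be reproduced with a small self-contained snippet; the commented output assumes a PyTorch build that includes this change:

```python
import torch.nn as nn

# The first layer differs from the rest, so two groups are printed.
blocks = nn.ModuleList([nn.Linear(8, 16)] + [nn.Linear(16, 16) for _ in range(5)])
print(blocks)
# ModuleList(
#   (0): Linear(in_features=8, out_features=16, bias=True)
#   (1-5): 5 x Linear(in_features=16, out_features=16, bias=True)
# )
```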


pytorch-bot bot commented Dec 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90452

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 Failures

As of commit e33eb43:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on master:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Dec 8, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Guitaricet / name: Vlad Lialin (f7ef572)

@soulitzer added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label, Dec 9, 2022
@albanD (Collaborator) left a comment

Thanks for the PR, it looks quite good!

torch/nn/modules/container.py

```python
lines = []
main_str = self._get_name() + '('
for n, b in zip(repeats, repeated_blocks):
    local_repr = f"{n} x {b}"
```
Collaborator:

Do we want to just do the same as the original print when n == 1?
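
As a hypothetical sketch of what that suggestion amounts to, reusing the names from the diff excerpt above rather than the merged code:

```python
for n, b in zip(repeats, repeated_blocks):
    # When a "group" holds a single module, fall back to the original
    # per-module formatting instead of printing "1 x ...".
    local_repr = f"{n} x {b}" if n > 1 else str(b)
```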

Contributor Author:

done

torch/nn/modules/container.py
@Guitaricet requested review from albanD and removed the review request for jbschlosser, December 25, 2022 19:50
@albanD (Collaborator) left a comment

Perfect!
Thanks for the update!

torch/nn/modules/container.py
albanD commented Dec 26, 2022

@pytorchbot merge

@pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label, Dec 26, 2022
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR:
@pytorchbot rebase



albanD commented Dec 26, 2022

@pytorchbot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

Guitaricet and others added 7 commits December 26, 2022 09:40
• r was colliding with the r defined in a for loop above
@pytorchmergebot

Successfully rebased patch-1 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout patch-1 && git pull --rebase)


albanD commented Dec 26, 2022

@pytorchbot merge -f "Unrelated CI failures"

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).


@ppwwyyxx (Contributor) commented:

It would be awesome to have this for Sequential as well


albanD commented May 10, 2024

Not sure why we didn't do it already haha
We can re-use the exact same repr code, can you send a PR doing that by any chance?
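
A hypothetical sketch of what reusing the same repr for Sequential could look like; the subclass below is illustrative only, since the real change would modify nn.Sequential itself in torch/nn/modules/container.py:

```python
import torch.nn as nn

class GroupedSequential(nn.Sequential):
    """Illustrative subclass that borrows ModuleList's grouped __repr__."""
    __repr__ = nn.ModuleList.__repr__

model = GroupedSequential(*[nn.Linear(16, 16) for _ in range(6)])
print(model)
# Expected to print something like:
# GroupedSequential(
#   (0-5): 6 x Linear(in_features=16, out_features=16, bias=True)
# )
```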

Labels
ciflow/trunk (Trigger trunk jobs on your pull request) · Merged · open source · triaged

6 participants