
Better __repr__ for ModuleList #90452

Closed · wants to merge 7 commits

Conversation

@Guitaricet (Contributor) commented Dec 8, 2022

Problem

When models have many complex repeated layers, print(module) output becomes unwieldy to work with. For example, the current __repr__ output for t5-small is 715 lines long.
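
For context, the line counts above can be reproduced with a snippet along these lines (assumes the Hugging Face transformers package; exact counts may vary by version):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("t5-small")
# Number of lines in the module's printed representation.
print(len(repr(model).splitlines()))  # ~715 without this change
```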

Solution

With the better __repr__ it becomes 135. For t5-large, the current __repr__ prints 1411 lines; the better __repr__ prints 135, the same number as for t5-small, because most of the layers are simply repeated. For EleutherAI/gpt-j-6B the number of lines drops from 483 to just 24.

Here's how it works: when consecutive ModuleList items have exactly the same __repr__, instead of printing each of them it prints N x {repr(item)}. The code supports cases where the same ModuleList contains multiple repeating groups, which is especially useful when the first or last layer of a block differs from the rest of them.
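
To make the mechanism concrete, here is a minimal sketch of the grouping idea; it is not the exact code merged in this PR, and group_repeated is an illustrative helper name:

```python
import torch.nn as nn

def group_repeated(children):
    """Group consecutive modules whose repr() strings are identical.

    Returns (start_index, end_index, repr_string, count) tuples.
    """
    groups = []
    for i, child in enumerate(children):
        r = repr(child)
        if groups and groups[-1][2] == r:
            start, _, _, n = groups[-1]
            groups[-1] = (start, i, r, n + 1)  # extend the current group
        else:
            groups.append((i, i, r, 1))  # start a new group
    return groups

layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(28)])
for start, end, r, n in group_repeated(layers):
    header = f"({start}-{end}): {n} x " if n > 1 else f"({start}): "
    print(header + r)
# (0-27): 28 x Linear(in_features=4, out_features=4, bias=True)
```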

The better __repr__ should make model printouts smaller, more readable, and significantly more useful by highlighting the differences between repeated blocks instead of losing them in a wall of text.

Motivating real-life example.

You can try it out in this Colab notebook.

The current __repr__ output of gpt-j-6b is too big to include in full in this PR description:

```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (1): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (2): GPTJBlock(
...
```

Better __repr__ output looks like this:

```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0-27): 28 x GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
)
```
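
For illustration, the multi-group case (a first layer that differs from the rest) can be reproduced with a small self-contained snippet; the commented output assumes a PyTorch build that includes this change:

```python
import torch.nn as nn

# The first layer differs from the rest, so two groups are printed.
blocks = nn.ModuleList([nn.Linear(8, 16)] + [nn.Linear(16, 16) for _ in range(5)])
print(blocks)
# ModuleList(
#   (0): Linear(in_features=8, out_features=16, bias=True)
#   (1-5): 5 x Linear(in_features=16, out_features=16, bias=True)
# )
```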


pytorch-bot bot commented Dec 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90452

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 Failures

As of commit e33eb43:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on master:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Dec 8, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Guitaricet / name: Vlad Lialin (f7ef572)

@soulitzer added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label, Dec 9, 2022
@albanD (Collaborator) left a comment

Thanks for the PR, it looks quite good!

torch/nn/modules/container.py

```python
lines = []
main_str = self._get_name() + '('
for n, b in zip(repeats, repeated_blocks):
    local_repr = f"{n} x {b}"
```
Collaborator:

Do we want to just do the same as the original print when n == 1?
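
As a hypothetical sketch of what that suggestion amounts to, reusing the names from the diff excerpt above rather than the merged code:

```python
for n, b in zip(repeats, repeated_blocks):
    # When a "group" holds a single module, fall back to the original
    # per-module formatting instead of printing "1 x ...".
    local_repr = f"{n} x {b}" if n > 1 else str(b)
```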

Contributor Author:

done

torch/nn/modules/container.py
@Guitaricet requested review from albanD and removed the review request for jbschlosser, December 25, 2022 19:50
@albanD (Collaborator) left a comment

Perfect!
Thanks for the update!

torch/nn/modules/container.py
albanD commented Dec 26, 2022

@pytorchbot merge

@pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label, Dec 26, 2022
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR:
@pytorchbot rebase



albanD commented Dec 26, 2022

@pytorchbot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

Guitaricet and others added 7 commits December 26, 2022 09:40
• r was colliding with the r defined in a for loop above
@pytorchmergebot

Successfully rebased patch-1 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout patch-1 && git pull --rebase)


albanD commented Dec 26, 2022

@pytorchbot merge -f "Unrelated CI failures"

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).


@ppwwyyxx (Contributor) commented:

It would be awesome to have this for Sequential as well


albanD commented May 10, 2024

Not sure why we didn't do it already haha
We can re-use the exact same repr code, can you send a PR doing that by any chance?
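
A hypothetical sketch of what reusing the same repr for Sequential could look like; the subclass below is illustrative only, since the real change would modify nn.Sequential itself in torch/nn/modules/container.py:

```python
import torch.nn as nn

class GroupedSequential(nn.Sequential):
    """Illustrative subclass that borrows ModuleList's grouped __repr__."""
    __repr__ = nn.ModuleList.__repr__

model = GroupedSequential(*[nn.Linear(16, 16) for _ in range(6)])
print(model)
# Expected to print something like:
# GroupedSequential(
#   (0-5): 6 x Linear(in_features=16, out_features=16, bias=True)
# )
```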

Labels
ciflow/trunk (Trigger trunk jobs on your pull request) · Merged · open source · triaged

6 participants