-
Notifications
You must be signed in to change notification settings - Fork 248
[AOTI] Add a --max-seq-length option for export #1018
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1018
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 1be8432 with merge base ce41944 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Summary: This improves best tokens/sec from 73 to 85.
| "--max-seq-length", | ||
| type=int, | ||
| default=None, | ||
| help="Set maximum length sequence when before calling torch.export", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set the default to 300 and update the help string so that it's clear what the default is (300)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my initial implementation. Then there was an issue when running eval.
When running eval, we use --dynamic-shapes which uses a larger max_seq_length, i.e. model.config.max_seq_length. But in theory, we should not stop user from calling eval with both options, something like --dynamic-shapes --max-seq-length 1000. When that happens, if args.max_seq_length has a default value, we will have no way to distinguish if args.max_seq_length is from a default value or from an intentional user overwriting.
| and not builder_args.dynamic_shapes | ||
| ): | ||
| print("Setting max_seq_length to 300 for DSO export.") | ||
| builder_args.max_seq_length = 300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shouldn't need this if you set the default in the other file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add more details to the printout if that helps.
300 is used in specific cases, makes sense to not set it in argparse
Summary: This improves best tokens/sec from 73 to 85. Co-authored-by: Jack-Khuu <jack.khuu.7@gmail.com>
Summary: This improves best tokens/sec from 73 to 85.