@JRosenkranz
Contributor

@JRosenkranz JRosenkranz commented Apr 2, 2025

This PR adds multi-AIU support to shape testing.

Note: this starts with test_decoders.py only.

To run with multiple AIUs:

torchrun --nproc-per-node=2 -m pytest tests/models/test_decoders.py -s
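
For context, torchrun exports RANK, LOCAL_RANK and WORLD_SIZE to each worker process it starts, so a test module can tell whether it was launched this way. The following is only a minimal sketch of that detection, not the code in this PR; the names LaunchInfo and read_launch_info are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LaunchInfo:
    """Rank information for one worker (hypothetical helper type)."""

    rank: int
    local_rank: int
    world_size: int

    @property
    def is_distributed(self) -> bool:
        # More than one worker means the run spans several devices.
        return self.world_size > 1


def read_launch_info(env: Mapping[str, str] = os.environ) -> LaunchInfo:
    """Read the variables torchrun sets; fall back to single-process defaults."""
    return LaunchInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )
```

With --nproc-per-node=2, each of the two workers would see WORLD_SIZE=2 and a distinct RANK; a plain pytest invocation would fall back to the single-process defaults.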

…dded tests for multiple shape warmup

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz JRosenkranz requested a review from ani300 April 2, 2025 18:38
@gpaulsen
Contributor

gpaulsen commented Apr 4, 2025

What reads the new variable FMS_TEST_SHAPES_DISTRIBUTED?

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz JRosenkranz marked this pull request as ready for review April 8, 2025 13:57
@JRosenkranz JRosenkranz requested a review from dpatel-ops April 8, 2025 13:57
if USE_DISTRIBUTED:
    dist.init_process_group()
    # Fix until PT 2.3
    torch._C._distributed_c10d._register_process_group("default", dist.group.WORLD)
Contributor

no longer needed

Contributor

@ani300 ani300 left a comment

lgtm after removing unneeded line
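
As a rough illustration of the review outcome, the guard could look like the sketch below once the private _register_process_group workaround is dropped. This is an assumption-laden sketch rather than the merged code: the helper names env_flag and maybe_init_distributed are hypothetical, and treating "1" as the enabled value of FMS_TEST_SHAPES_DISTRIBUTED is a guess, since the thread does not show how the variable is parsed.

```python
import os
from typing import Mapping


def env_flag(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the variable is set to "1" (assumed convention)."""
    return env.get(name, "0") == "1"


def maybe_init_distributed(use_distributed: bool) -> bool:
    """Create the default process group only when distributed mode is requested."""
    if not use_distributed:
        return False
    # Imported lazily so single-process runs do not need torch.distributed.
    import torch.distributed as dist

    if not dist.is_initialized():
        # Relies on the rendezvous variables that torchrun exports to each worker.
        dist.init_process_group()
    return True
```

A test module might then set USE_DISTRIBUTED = env_flag("FMS_TEST_SHAPES_DISTRIBUTED") and call maybe_init_distributed(USE_DISTRIBUTED) once at import time.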

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz
Contributor Author

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.2-8b-instruct BATCH_SIZE=8 SEQUENCE_LENGTH=64 USE_TINY_MODEL=1

@JRosenkranz JRosenkranz merged commit d3efe1b into main Apr 8, 2025
1 check passed
@JRosenkranz JRosenkranz deleted the multi_aiu branch April 8, 2025 15:24