skipping CUDA graph capture #38
Open
Description
Hello, thanks for your amazing work! While evaluating with this framework, I ran into the problem shown below, using the following configuration:
- 1 A100 (80GB)
- LLaDA-8B
- Evaluating with GSM8K, benchmarking
0%| | 0/1319 [00:00<?, ?it/s]
14%|█▍ | 185/1319 [00:00<00:00, 1840.44it/s]
28%|██▊ | 370/1319 [00:00<00:00, 1829.80it/s]
42%|████▏ | 555/1319 [00:00<00:00, 1835.87it/s]
56%|█████▌ | 739/1319 [00:00<00:00, 726.18it/s]
70%|██████▉ | 923/1319 [00:00<00:00, 928.34it/s]
84%|████████▍ | 1108/1319 [00:00<00:00, 1117.84it/s]
98%|█████████▊| 1293/1319 [00:01<00:00, 1283.09it/s]
100%|██████████| 1319/1319 [00:01<00:00, 1191.79it/s]
[1088, 1056, 1152, 1120, 1184, 1216]
0%| | 0/6 [00:00<?, ?it/s]
skipping cudagraphs due to skipping cudagraphs due to cpu device (arg3_1)
skipping cudagraphs due to skipping cudagraphs due to cpu device (arg7_1)
torch.compile is still applied, but CUDA graph capture is skipped.
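The "cpu device (arg3_1)" part of the message suggests that at least one input to the compiled graph lives on the CPU rather than on the GPU. As a possible debugging aid, here is a minimal sketch of a helper that walks the arguments passed to the compiled call and reports which ones are on the CPU. The name `find_cpu_inputs` is hypothetical and not part of this framework or of PyTorch. It only relies on the `.device.type` attribute that tensors expose, so the demo uses simple stand-in objects instead of real tensors. Note that `arg3_1` is a name from the traced graph and may not map directly onto a Python argument position.

```python
from types import SimpleNamespace


def find_cpu_inputs(obj, path="args"):
    """Return the paths of all tensor-like values in `obj` whose device type is 'cpu'.

    Hypothetical helper: a value counts as tensor-like if it has a `.device`
    attribute with a `.type` field, which is true for torch.Tensor.
    Tuples, lists, and dicts are searched recursively.
    """
    hits = []
    device = getattr(obj, "device", None)
    if getattr(device, "type", None) == "cpu":
        hits.append(path)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            hits.extend(find_cpu_inputs(value, f"{path}[{key!r}]"))
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            hits.extend(find_cpu_inputs(value, f"{path}[{index}]"))
    return hits


if __name__ == "__main__":
    # Stand-ins for tensors so the sketch runs without a GPU or PyTorch installed.
    on_gpu = SimpleNamespace(device=SimpleNamespace(type="cuda"))
    on_cpu = SimpleNamespace(device=SimpleNamespace(type="cpu"))
    print(find_cpu_inputs((on_gpu, {"mask": on_cpu, "ids": [on_gpu, on_cpu]})))
```

Calling this on the real arguments just before the compiled forward pass would show whether some tensor (for example a mask or an index tensor) was created without being moved to the GPU.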