I expect the pod to start and function normally; however, I get the following errors relating to sharding:
2024-04-30T21:49:34.973998Z INFO text_generation_launcher: Args { model_id: "mistralai/Mixtral-8x7B-Instruct-v0.1", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "mixtral-8x7b-instruct-tgi-pod-7964f65758-8rnq8", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-30T21:49:34.974029Z INFO text_generation_launcher: Sharding model on 4 processes
2024-04-30T21:49:34.974127Z INFO download: text_generation_launcher: Starting download process.
2024-04-30T21:49:37.795374Z INFO text_generation_launcher: Download file: model-00001-of-00019.safetensors
2024-04-30T21:49:41.555240Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00001-of-00019.safetensors in 0:00:03.
2024-04-30T21:49:41.555266Z INFO text_generation_launcher: Download: [1/19] -- ETA: 0:00:54
2024-04-30T21:49:41.555484Z INFO text_generation_launcher: Download file: model-00002-of-00019.safetensors
2024-04-30T21:49:45.048615Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00002-of-00019.safetensors in 0:00:03.
2024-04-30T21:49:45.048643Z INFO text_generation_launcher: Download: [2/19] -- ETA: 0:00:59.500000
2024-04-30T21:49:45.048905Z INFO text_generation_launcher: Download file: model-00003-of-00019.safetensors
2024-04-30T21:49:48.531718Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00003-of-00019.safetensors in 0:00:03.
2024-04-30T21:49:48.531756Z INFO text_generation_launcher: Download: [3/19] -- ETA: 0:00:53.333328
2024-04-30T21:49:48.531984Z INFO text_generation_launcher: Download file: model-00004-of-00019.safetensors
2024-04-30T21:49:51.959123Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00004-of-00019.safetensors in 0:00:03.
2024-04-30T21:49:51.959158Z INFO text_generation_launcher: Download: [4/19] -- ETA: 0:00:52.500000
2024-04-30T21:49:51.959379Z INFO text_generation_launcher: Download file: model-00005-of-00019.safetensors
2024-04-30T21:49:55.427468Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00005-of-00019.safetensors in 0:00:03.
2024-04-30T21:49:55.427514Z INFO text_generation_launcher: Download: [5/19] -- ETA: 0:00:47.600000
2024-04-30T21:49:55.427938Z INFO text_generation_launcher: Download file: model-00006-of-00019.safetensors
2024-04-30T21:49:58.879313Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00006-of-00019.safetensors in 0:00:03.
2024-04-30T21:49:58.879350Z INFO text_generation_launcher: Download: [6/19] -- ETA: 0:00:45.500000
2024-04-30T21:49:58.879599Z INFO text_generation_launcher: Download file: model-00007-of-00019.safetensors
2024-04-30T21:50:02.250332Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00007-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:02.250371Z INFO text_generation_launcher: Download: [7/19] -- ETA: 0:00:41.142852
2024-04-30T21:50:02.250629Z INFO text_generation_launcher: Download file: model-00008-of-00019.safetensors
2024-04-30T21:50:05.764569Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00008-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:05.764759Z INFO text_generation_launcher: Download: [8/19] -- ETA: 0:00:37.125000
2024-04-30T21:50:05.765106Z INFO text_generation_launcher: Download file: model-00009-of-00019.safetensors
2024-04-30T21:50:09.301908Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00009-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:09.301945Z INFO text_generation_launcher: Download: [9/19] -- ETA: 0:00:34.444440
2024-04-30T21:50:09.302168Z INFO text_generation_launcher: Download file: model-00010-of-00019.safetensors
2024-04-30T21:50:12.854604Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00010-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:12.854643Z INFO text_generation_launcher: Download: [10/19] -- ETA: 0:00:31.500000
2024-04-30T21:50:12.854871Z INFO text_generation_launcher: Download file: model-00011-of-00019.safetensors
2024-04-30T21:50:16.339541Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00011-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:16.339591Z INFO text_generation_launcher: Download: [11/19] -- ETA: 0:00:27.636360
2024-04-30T21:50:16.339897Z INFO text_generation_launcher: Download file: model-00012-of-00019.safetensors
2024-04-30T21:50:19.792196Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00012-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:19.792224Z INFO text_generation_launcher: Download: [12/19] -- ETA: 0:00:23.916669
2024-04-30T21:50:19.792451Z INFO text_generation_launcher: Download file: model-00013-of-00019.safetensors
2024-04-30T21:50:23.298309Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00013-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:23.298346Z INFO text_generation_launcher: Download: [13/19] -- ETA: 0:00:20.769228
2024-04-30T21:50:23.298576Z INFO text_generation_launcher: Download file: model-00014-of-00019.safetensors
2024-04-30T21:50:26.700623Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00014-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:26.700662Z INFO text_generation_launcher: Download: [14/19] -- ETA: 0:00:17.142855
2024-04-30T21:50:26.700916Z INFO text_generation_launcher: Download file: model-00015-of-00019.safetensors
2024-04-30T21:50:30.180068Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00015-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:30.180098Z INFO text_generation_launcher: Download: [15/19] -- ETA: 0:00:13.866668
2024-04-30T21:50:30.180312Z INFO text_generation_launcher: Download file: model-00016-of-00019.safetensors
2024-04-30T21:50:33.611677Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00016-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:33.611716Z INFO text_generation_launcher: Download: [16/19] -- ETA: 0:00:10.312500
2024-04-30T21:50:33.611931Z INFO text_generation_launcher: Download file: model-00017-of-00019.safetensors
2024-04-30T21:50:37.105457Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00017-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:37.105496Z INFO text_generation_launcher: Download: [17/19] -- ETA: 0:00:06.941176
2024-04-30T21:50:37.105780Z INFO text_generation_launcher: Download file: model-00018-of-00019.safetensors
2024-04-30T21:50:40.633152Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00018-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:40.633188Z INFO text_generation_launcher: Download: [18/19] -- ETA: 0:00:03.444444
2024-04-30T21:50:40.633403Z INFO text_generation_launcher: Download file: model-00019-of-00019.safetensors
2024-04-30T21:50:43.778143Z INFO text_generation_launcher: Downloaded /data/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/model-00019-of-00019.safetensors in 0:00:03.
2024-04-30T21:50:43.778166Z INFO text_generation_launcher: Download: [19/19] -- ETA: 0
2024-04-30T21:50:44.326173Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-30T21:50:44.326514Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-04-30T21:50:44.326528Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-30T21:50:44.327214Z INFO shard-manager: text_generation_launcher: Starting shard rank=3
2024-04-30T21:50:44.327232Z INFO shard-manager: text_generation_launcher: Starting shard rank=2
2024-04-30T21:50:54.337513Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-04-30T21:50:54.337513Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-30T21:50:54.337822Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-04-30T21:50:54.338759Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-04-30T21:51:04.347490Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-30T21:51:04.349499Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-04-30T21:51:04.350290Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-04-30T21:51:04.350314Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-04-30T21:51:14.359111Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-30T21:51:14.359951Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-04-30T21:51:14.360183Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-04-30T21:51:14.360238Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-04-30T21:51:24.368882Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-30T21:51:24.370240Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-04-30T21:51:24.371063Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-04-30T21:51:24.371203Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-04-30T21:51:34.379157Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-30T21:51:34.380352Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-04-30T21:51:34.381818Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-04-30T21:51:34.381835Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-04-30T21:51:44.390388Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-30T21:51:44.392374Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-04-30T21:51:44.392416Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-04-30T21:51:44.398571Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-04-30T21:51:54.399991Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-30T21:51:54.401938Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-04-30T21:51:54.402613Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-04-30T21:51:54.406641Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-04-30T21:51:56.606323Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[E ProcessGroupNCCL.cpp:475] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60093 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60093 milliseconds before timing out.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60093 milliseconds before timing out. rank=0
2024-04-30T21:51:56.606349Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 6 rank=0
2024-04-30T21:51:56.680206Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-30T21:51:56.680231Z INFO text_generation_launcher: Shutting down shards
2024-04-30T21:51:56.706393Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[E ProcessGroupNCCL.cpp:475] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60090 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:916] [Rank 1] NCCL watchdog thread terminated with exception: [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60090 milliseconds before timing out.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 1] NCCL watchdog thread terminated with exception: [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60090 milliseconds before timing out. rank=1
2024-04-30T21:51:56.706418Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 6 rank=1
2024-04-30T21:51:56.771324Z INFO shard-manager: text_generation_launcher: Shard terminated rank=2
2024-04-30T21:51:56.859632Z INFO shard-manager: text_generation_launcher: Shard terminated rank=3
Error: ShardCannotStart
Update 1
I was able to successfully start Mistral-7B-Instruct-v0.2 with 2 GPUs.
I am still unable to start Mixtral-8x7B-Instruct-v0.1 with 2 or 4 GPUs (which should be enough memory).
Could the preset timeout simply be too short for the model to actually load in time?
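A back-of-envelope check supports the "enough memory" claim. Assuming ~46.7B total parameters for Mixtral-8x7B and 2-byte (bf16) weights, the per-GPU weight footprint when sharded 4 ways is well under the 46 GiB of an L40S (2-way is tighter, ~43.5 GiB, before KV cache):

```shell
# Rough estimate: ~46.7B params * 2 bytes (bf16), split across 4 GPUs.
# This is weights only; the KV cache and activations add on top of it.
awk 'BEGIN { per_gpu = 46.7e9 * 2 / 4 / 2^30; printf "%.1f GiB per GPU across 4 GPUs\n", per_gpu }'
```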
Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60090 milliseconds before timing out.
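Before raising the 60 s watchdog timeout, it may be worth ruling out an NCCL transport hang on the first ALLREDUCE. A minimal sketch of diagnostic environment variables that could be added to the pod spec (an assumption, not a verified fix — these are standard NCCL variables, not TGI-specific flags):

```shell
# Set before the launcher starts, e.g. via the pod's env section.
export NCCL_DEBUG=INFO        # log topology detection and which transport each collective uses
export NCCL_P2P_DISABLE=1     # fall back to PCIe copies in case peer-to-peer transfers hang
```

With `NCCL_DEBUG=INFO`, the shard logs should show whether rank 0 ever enters the collective or stalls during transport setup.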
Update 2
When running the pod, I captured the nvidia-smi output a couple of seconds after it finished downloading the model and began sharding.
oc rsh pod/mixtral-8x7b-instruct-tgi-pod-58f8cb9d6c-tjd44
# nvidia-smi
Wed May 1 01:11:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:01:00.0 Off | 0 |
| N/A 39C P0 96W / 350W | 2930MiB / 46068MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:21:00.0 Off | 0 |
| N/A 40C P0 100W / 350W | 644MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:41:00.0 Off | 0 |
| N/A 39C P0 98W / 350W | 644MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:E1:00.0 Off | 0 |
| N/A 34C P0 98W / 350W | 604MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
It shows GPU utilization at 100% for GPUs 1, 2, and 3, but their memory usage is only ~600 MiB / 46068 MiB.
Is that normal?
Update 3
Results are not deterministic.
After restarting the pod, here is the nvidia-smi output during sharding:
Wed May 1 01:21:08 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:01:00.0 Off | 0 |
| N/A 51C P0 103W / 350W | 11990MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:21:00.0 Off | 0 |
| N/A 52C P0 108W / 350W | 11988MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:41:00.0 Off | 0 |
| N/A 50C P0 103W / 350W | 11998MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:E1:00.0 Off | 0 |
| N/A 43C P0 101W / 350W | 11948MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Output log after downloading the weights:
2024-05-01T01:16:02.546558Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-05-01T01:16:02.546868Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-05-01T01:16:02.546868Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-05-01T01:16:02.546916Z INFO shard-manager: text_generation_launcher: Starting shard rank=2
2024-05-01T01:16:02.547414Z INFO shard-manager: text_generation_launcher: Starting shard rank=3
2024-05-01T01:16:12.558482Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:16:12.558482Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:16:12.558835Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:16:12.562667Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:16:22.570586Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:16:22.571273Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:16:22.572139Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:16:22.576291Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:16:32.580323Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:16:32.588676Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:16:32.589577Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:16:32.596987Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:16:42.598549Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:16:42.601664Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:16:42.606765Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:16:42.607433Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:16:52.608400Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:16:52.611841Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:16:52.615918Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:16:52.616348Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:17:02.618817Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:17:02.619736Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:17:02.625374Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:17:02.625374Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:17:12.629043Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:17:12.634459Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:17:12.634665Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:17:12.638963Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:17:22.639083Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:17:22.646613Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:17:22.647761Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:17:22.695362Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:17:32.649904Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:17:32.656320Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:17:32.661289Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:17:32.706820Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:17:42.659832Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:17:42.665933Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:17:42.670657Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:17:42.715814Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:17:52.670420Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:17:52.676094Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:17:52.689615Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:17:52.725204Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:18:02.680641Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:18:02.686062Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:18:02.699529Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:18:02.736917Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:18:12.695936Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:18:12.710258Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:18:12.714607Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:18:12.746124Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:18:22.705680Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:18:22.722692Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:18:22.731135Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:18:22.754199Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:18:32.714758Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:18:32.737159Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:18:32.741827Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:18:32.762410Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:18:42.724454Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:18:42.747397Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:18:42.759961Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:18:42.770549Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:18:52.734906Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:18:52.762690Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:18:52.769282Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:18:52.778419Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:19:02.745332Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:19:02.775258Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T01:19:02.778572Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:19:02.787387Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:19:11.559246Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-05-01T01:19:11.585582Z INFO shard-manager: text_generation_launcher: Shard ready in 189.037403751s rank=0
2024-05-01T01:19:12.755618Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:19:12.787764Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:19:12.794953Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:19:22.767049Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:19:22.797580Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:19:22.802955Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:19:32.778261Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:19:32.807908Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:19:32.810951Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:19:42.789478Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:19:42.818392Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:19:42.818392Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:19:52.800101Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:19:52.829018Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:19:52.829018Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:20:02.810321Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:20:02.839367Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:20:02.839367Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T01:20:09.900492Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-2
2024-05-01T01:20:09.947042Z INFO shard-manager: text_generation_launcher: Shard ready in 247.398441041s rank=2
2024-05-01T01:20:12.820804Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:20:12.850004Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:20:22.831289Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:20:22.860477Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:20:32.841828Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:20:32.871100Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:20:42.852185Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:20:42.881966Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:20:52.863650Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:20:52.892790Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:21:02.874944Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T01:21:02.903852Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T01:21:12.187515Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[E ProcessGroupNCCL.cpp:475] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60911 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:916] [Rank 3] NCCL watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60911 milliseconds before timing out.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 3] NCCL watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60911 milliseconds before timing out. rank=3
2024-05-01T01:21:12.187552Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 6 rank=3
2024-05-01T01:21:12.211103Z ERROR text_generation_launcher: Shard 3 failed to start
2024-05-01T01:21:12.211121Z INFO text_generation_launcher: Shutting down shards
2024-05-01T01:21:12.520261Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
2024-05-01T01:21:12.606231Z INFO shard-manager: text_generation_launcher: Shard terminated rank=2
2024-05-01T01:21:13.226315Z INFO shard-manager: text_generation_launcher: Shard terminated rank=1
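For anyone hitting the same `ALLREDUCE ... Timeout(ms)=60000` error above: a sketch of generic checks worth running inside the pod before anything else. Nothing here is TGI-specific — `df` is standard coreutils and both environment variables are documented NCCL settings; whether shared memory is actually the culprit here is an assumption to verify, not a confirmed diagnosis.

```shell
# Check the container's shared-memory size. Kubernetes gives pods a
# 64Mi /dev/shm by default, and NCCL's shared-memory transport can
# need more than that when sharding a model across GPUs.
df -h /dev/shm

# Enable NCCL's own logging before relaunching, to see which transport
# is selected and where the collective stalls.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET
```

If `/dev/shm` turns out to be the 64Mi default, the usual fix is mounting a larger `emptyDir` with `medium: Memory` at `/dev/shm` in the pod spec, which the TGI README recommends for sharded deployments; `NCCL_SHM_DISABLE=1` is described there as a slower fallback.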
Update 4
Success (but mostly failure)? For starters, I updated the image referenced in the documentation from ghcr.io/huggingface/text-generation-inference:1.4 to ghcr.io/huggingface/text-generation-inference:2.0.1 and also added QUANTIZE = eetq, and it worked... but then it stopped working.
After that first success, I removed the quantization to test the base model weights and it failed. When I switched back to QUANTIZE = eetq, it failed to start just like before, and I was back to square one. I don't know if the success was a fluke, but it definitely happened.
One thing I noticed in the failure case is that memory loads unevenly, with one GPU holding far more than the others. In the success case the memory loaded uniformly.
# Failure
Wed May 1 03:27:27 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:01:00.0 Off | 0 |
| N/A 45C P0 100W / 350W | 8978MiB / 46068MiB | 3% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:21:00.0 Off | 0 |
| N/A 46C P0 103W / 350W | 780MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:C1:00.0 Off | 0 |
| N/A 40C P0 94W / 350W | 732MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:E1:00.0 Off | 0 |
| N/A 38C P0 98W / 350W | 768MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
I tried deleting the deployment from OpenShift and reinitializing; however, I got the same results.
After that I tried a fresh image, ghcr.io/huggingface/text-generation-inference:2.0, with QUANTIZE = eetq to see whether first-time image creation had something to do with it, and... it failed.
This time the GPUs appeared to be stuck, with utilization pinned at 100% on every GPU while memory stayed at only ~600MiB:
Wed May 1 03:37:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:01:00.0 Off | 0 |
| N/A 41C P0 96W / 350W | 636MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:41:00.0 Off | 0 |
| N/A 41C P0 98W / 350W | 644MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:81:00.0 Off | 0 |
| N/A 39C P0 95W / 350W | 644MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:A1:00.0 Off | 0 |
| N/A 37C P0 95W / 350W | 604MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Here are the output logs:
2024-05-01T03:37:21.948805Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T03:37:31.954374Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-05-01T03:37:31.954762Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-05-01T03:37:31.955590Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-05-01T03:37:31.958446Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-05-01T03:37:34.959458Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[E ProcessGroupNCCL.cpp:475] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60892 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:916] [Rank 3] NCCL watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60892 milliseconds before timing out.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 3] NCCL watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=60000) ran for 60892 milliseconds before timing out. rank=3
2024-05-01T03:37:34.959497Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 6 rank=3
2024-05-01T03:37:35.023185Z ERROR text_generation_launcher: Shard 3 failed to start
2024-05-01T03:37:35.023227Z INFO text_generation_launcher: Shutting down shards
2024-05-01T03:37:35.140343Z INFO shard-manager: text_generation_launcher: Shard terminated rank=2
2024-05-01T03:37:35.176734Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
2024-05-01T03:37:35.249934Z INFO shard-manager: text_generation_launcher: Shard terminated rank=1
Update 5
I realized the YAML had the wrong mount path: it should be /data, not /root/.cache/huggingface.
However, this did not fix the issue; it only made startup faster (the pod no longer re-downloads the model weights on every restart).
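The corrected mount can be sketched as a plain docker run — a hypothetical single-node equivalent of the pod, not the actual OpenShift YAML. The host path /opt/tgi-cache is made up for illustration; /data matches huggingface_hub_cache: Some("/data") in the launcher args above, and --shm-size follows the TGI README's note that NCCL needs shared memory when sharding.

```shell
# Illustrative only: host cache path /opt/tgi-cache is hypothetical.
docker run --gpus all --shm-size 1g \
  -p 8080:80 \
  -v /opt/tgi-cache:/data \
  -e QUANTIZE=eetq \
  ghcr.io/huggingface/text-generation-inference:2.0.1 \
  --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --num-shard 4
```

The key point is that the host volume must land on /data inside the container, since that is where the launcher's Hugging Face hub cache lives.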
System Info
Platform: OpenShift
Nvidia GPU Operator already installed
Image: ghcr.io/huggingface/text-generation-inference:1.4
Device: L40S
Reproduction
Created the following Pod inside a K8s cluster that already has a GPU operator running. There are 8 L40S GPUs available.
Expected behavior
I expect the pod to start and function normally; however, I get the sharding-related errors shown in the logs.
Update 1
I was able to successfully start Mistral-7B-Instruct-v0.2 with 2 GPUs.
I am still unable to start Mixtral-8x7B-Instruct-v0.1 with either 2 or 4 GPUs (which should be enough memory).
Could the preset timeout simply be too short to load the model in time?
Update 2
When running the pod, I captured nvidia-smi a couple of seconds after it finished downloading the model and began sharding.
It shows GPU utilization at 100% for GPUs 1, 2, and 3, but memory usage of only ~600MiB / 46068MiB.
Is that normal?
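Per-GPU load during sharding can be sampled continuously with nvidia-smi's query mode (these are standard nvidia-smi flags; the 5-second interval is an arbitrary choice):

```shell
# Print index, memory, and utilization for every GPU every 5 seconds
# while the shards load, in CSV form for easy comparison over time.
nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu \
           --format=csv -l 5
```

This makes it easier to tell whether memory is climbing uniformly across GPUs or one rank is stuck while the others spin at 100%.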
Update 3
Results are not deterministic.
After restarting the pod, I captured the nvidia-smi output while it was sharding, and the output log after the weights finished downloading.