
Caikit/TGIS swallows model loading running out of memory #92

Closed
kpouget opened this issue Sep 26, 2023 · 9 comments
Labels
kind/bug Something isn't working

Comments

kpouget commented Sep 26, 2023

When trying to load a model in a Pod whose memory limit is too low, the out-of-memory error message is swallowed by TGIS and is hard to troubleshoot (on top of that, Caikit swallows the TGIS error):

2023-09-26T09:40:45.259993Z  INFO text_generation_launcher: Starting shard 0
Shard 0: supports_causal_lm = False, supports_seq2seq_lm = True
2023-09-26T09:40:55.279072Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-09-26T09:40:57.571196Z ERROR text_generation_launcher: Shard 0 failed to start:

2023-09-26T09:40:57.571219Z  INFO text_generation_launcher: Shutting down shards
{"channel": "TGISPROC", "exception": null, "level": "error", "log_code": "<MTS11752287E>", "message": "exception raised: RuntimeError('TGIS failed to boot up with the model. See logs for details')", "num_indent": 0, "thread_id": 140590947739392, "timestamp": "2023-09-26T09:40:59.288074"}

While troubleshooting, I observed that even the TGIS return code does not reflect the OOM error, although my attempts confirmed that insufficient memory was the cause of the load failure:

sh-4.4$ text-generation-launcher --num-shard 1 --model-name /mnt/models/flan-t5-large/artifacts/ --port 3000;
2023-09-26T11:42:33.150862Z  INFO text_generation_launcher: Launcher args: Args { model_name: "/mnt/models/flan-t5-large/artifacts/", revision: None, deployment_framework: "hf_transformers", dtype: None, dtype_str: Some("float16"), num_shard: Some(1), max_concurrent_requests: 150, max_sequence_length: 4096, max_new_tokens: 1024, max_batch_size: 256, max_batch_weight: Some(47458400), max_prefill_weight: None, max_waiting_tokens: 24, port: 3000, grpc_port: 8033, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, json_output: false, tls_cert_path: None, tls_key_path: None, tls_client_ca_cert_path: None, output_special_tokens: false, cuda_process_memory_fraction: 1.0 }
2023-09-26T11:42:33.151097Z  INFO text_generation_launcher: Starting shard 0
Shard 0: supports_causal_lm = False, supports_seq2seq_lm = True
2023-09-26T11:42:43.180572Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-09-26T11:42:50.384697Z ERROR text_generation_launcher: Shard 0 failed to start:

2023-09-26T11:42:50.384723Z  INFO text_generation_launcher: Shutting down shards
sh-4.4$ echo $?
1
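
For what it's worth, a host-side OOM kill usually terminates the shard with SIGKILL (exit status 137 in a shell, returncode -9 in Python), so a launcher could at least flag "likely OOM" instead of a bare exit code 1. A minimal sketch, assuming a placeholder shard command:

import signal
import subprocess

# shard_cmd is a placeholder; the point is the returncode inspection.
shard_cmd = ["text-generation-server", "serve"]
result = subprocess.run(shard_cmd)
if result.returncode == -signal.SIGKILL:  # shells report this as 137
    raise SystemExit(
        "shard was SIGKILLed, most likely by the kernel OOM killer; "
        "consider raising the Pod memory limit"
    )
if result.returncode != 0:
    raise SystemExit(f"shard failed with exit code {result.returncode}")
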
heyselbi added the kind/bug (Something isn't working) label Sep 26, 2023
kpouget changed the title from "Caikit/TGIS swallow model loading running out of memory" to "Caikit/TGIS swallows model loading running out of memory" Oct 9, 2023
Xaenalt (Contributor) commented Oct 10, 2023

This should probably be filed against caikit/caikit-nlp.

Xaenalt (Contributor) commented Oct 10, 2023

Also, we'll get separate logs once the container split happens (this sprint).

danielezonca (Contributor) commented:

This is the ticket for reference :)

heyselbi (Contributor) commented:

@kpouget, could you share an update on this once you try it with the new ServingRuntime (SR) with split Caikit and TGIS images?

kpouget (Author) commented Nov 16, 2023

@heyselbi, it didn't change as far as I can tell:

NAME                                                       READY   STATUS    RESTARTS   AGE
gpt-neox-20b-predictor-00001-deployment-79c9c4d7b8-tzc6s   4/4     Running   0          51m
  modelStatus:
    copies:
      failedCopies: 0
      totalCopies: 1
    states:
      activeModelState: Loaded
      targetModelState: Loaded
    transitionStatus: UpToDate

but

$ grpcurl    -insecure    -d "$GRPCURL_DATA"    -H "mm-model-id: gpt-neox-20b"    gpt-neox-20b-predictor-memory.apps.psap-watsonx-dgxa100.perf.lab.eng.bos.redhat.com:443    caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
ERROR:
  Code: Unknown
  Message: Request failed during generation: Unexpected <class 'torch.cuda.OutOfMemoryError'>: CUDA out of memory. 
Tried to allocate 14.00 MiB. GPU 0 has a total capacty of 39.39 GiB of which 5.94 MiB is free. Process 3883975 has 39.38 GiB 
memory in use. Of the allocated memory 38.78 GiB is allocated by PyTorch, and 100.96 MiB is reserved by PyTorch but 
unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See 
documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
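
As the error message itself suggests, allocator fragmentation can sometimes be mitigated via PYTORCH_CUDA_ALLOC_CONF; note, though, that with 38.78 GiB already allocated out of 39.39 GiB, the model is close to genuinely not fitting, so this tuning may not help here. A sketch of setting it (the 128 MiB value is just an example; it must take effect before CUDA is initialized):

import os

# Must be set before torch initializes CUDA (e.g. before `import torch`
# in most setups); usually exported in the container environment instead.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402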

Xaenalt (Contributor) commented Nov 16, 2023

The container didn't crash when it got an OOM error?

Xaenalt (Contributor) commented Nov 16, 2023

Oh, if it's just GPU memory, it probably won't crash, but it probably should... Hmmm... I'd say this is probably covered, at least on startup, by the upcoming readiness probe; see the sketch below.
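
One possible shape for such a check (hypothetical, and coarse; not necessarily what the upcoming probe implements): have the readiness endpoint attempt a tiny CUDA allocation, so an exhausted GPU flips the Pod to not-ready instead of leaving it Loaded/Running:

import torch

def gpu_ready() -> bool:
    # Tiny probe allocation: fails fast with OutOfMemoryError when the
    # device is exhausted, which the readiness endpoint can report as 503.
    try:
        torch.empty(1024, device="cuda")
        return True
    except torch.cuda.OutOfMemoryError:
        return False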

Xaenalt (Contributor) commented Dec 5, 2023

Will be resolved by #156

dtrifiro (Contributor) commented Mar 8, 2024

TGIS now lives in a separate container, and following its logs should show the OOM errors.

For proper liveness/readiness probes for the TGIS container in the caikit+tgis setup, we'll have to wait for knative/serving#14853.

dtrifiro closed this as completed Mar 8, 2024