
Caikit/TGIS swallows model loading running out of memory #92

Closed
kpouget opened this issue Sep 26, 2023 · 9 comments
Labels
kind/bug Something isn't working

Comments

kpouget commented Sep 26, 2023

When trying to load a model in a Pod whose memory limit is too low, the out-of-memory error message is swallowed by TGIS and is hard to troubleshoot (on top of that, Caikit swallows the TGIS error):

2023-09-26T09:40:45.259993Z  INFO text_generation_launcher: Starting shard 0
Shard 0: supports_causal_lm = False, supports_seq2seq_lm = True
2023-09-26T09:40:55.279072Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-09-26T09:40:57.571196Z ERROR text_generation_launcher: Shard 0 failed to start:

2023-09-26T09:40:57.571219Z  INFO text_generation_launcher: Shutting down shards
{"channel": "TGISPROC", "exception": null, "level": "error", "log_code": "<MTS11752287E>", "message": "exception raised: RuntimeError('TGIS failed to boot up with the model. See logs for details')", "num_indent": 0, "thread_id": 140590947739392, "timestamp": "2023-09-26T09:40:59.288074"}

While troubleshooting, I observed that even the TGIS return code does not reflect the OOM error, although my attempts confirmed that insufficient memory was the cause of the load failure:

sh-4.4$ text-generation-launcher --num-shard 1 --model-name /mnt/models/flan-t5-large/artifacts/ --port 3000;
2023-09-26T11:42:33.150862Z  INFO text_generation_launcher: Launcher args: Args { model_name: "/mnt/models/flan-t5-large/artifacts/", revision: None, deployment_framework: "hf_transformers", dtype: None, dtype_str: Some("float16"), num_shard: Some(1), max_concurrent_requests: 150, max_sequence_length: 4096, max_new_tokens: 1024, max_batch_size: 256, max_batch_weight: Some(47458400), max_prefill_weight: None, max_waiting_tokens: 24, port: 3000, grpc_port: 8033, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, json_output: false, tls_cert_path: None, tls_key_path: None, tls_client_ca_cert_path: None, output_special_tokens: false, cuda_process_memory_fraction: 1.0 }
2023-09-26T11:42:33.151097Z  INFO text_generation_launcher: Starting shard 0
Shard 0: supports_causal_lm = False, supports_seq2seq_lm = True
2023-09-26T11:42:43.180572Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-09-26T11:42:50.384697Z ERROR text_generation_launcher: Shard 0 failed to start:

2023-09-26T11:42:50.384723Z  INFO text_generation_launcher: Shutting down shards
sh-4.4$ echo $?
1
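
For what it's worth, a host-side OOM kill usually terminates the shard with SIGKILL (exit status 137 in a shell, returncode -9 in Python), so a launcher could at least flag "likely OOM" instead of a bare exit code 1. A minimal sketch, assuming a placeholder shard command:

import signal
import subprocess

# shard_cmd is a placeholder; the point is the returncode inspection.
shard_cmd = ["text-generation-server", "serve"]
result = subprocess.run(shard_cmd)
if result.returncode == -signal.SIGKILL:  # shells report this as 137
    raise SystemExit(
        "shard was SIGKILLed, most likely by the kernel OOM killer; "
        "consider raising the Pod memory limit"
    )
if result.returncode != 0:
    raise SystemExit(f"shard failed with exit code {result.returncode}")
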
heyselbi added the kind/bug (Something isn't working) label Sep 26, 2023
kpouget changed the title from "Caikit/TGIS swallow model loading running out of memory" to "Caikit/TGIS swallows model loading running out of memory" Oct 9, 2023
Xaenalt (Contributor) commented Oct 10, 2023

This should probably be filed against caikit/caikit-nlp.

Xaenalt (Contributor) commented Oct 10, 2023

Also, we'll get separate logs once the container split happens (this sprint).

danielezonca (Contributor) commented:

This is the ticket for reference :)

heyselbi (Contributor) commented:

@kpouget, could you share an update on this once you try it with the new ServingRuntime (SR) with split Caikit and TGIS images?

kpouget (Author) commented Nov 16, 2023

@heyselbi, it didn't change as far as I can tell:

NAME                                                       READY   STATUS    RESTARTS   AGE
gpt-neox-20b-predictor-00001-deployment-79c9c4d7b8-tzc6s   4/4     Running   0          51m
  modelStatus:
    copies:
      failedCopies: 0
      totalCopies: 1
    states:
      activeModelState: Loaded
      targetModelState: Loaded
    transitionStatus: UpToDate

but

$ grpcurl    -insecure    -d "$GRPCURL_DATA"    -H "mm-model-id: gpt-neox-20b"    gpt-neox-20b-predictor-memory.apps.psap-watsonx-dgxa100.perf.lab.eng.bos.redhat.com:443    caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
ERROR:
  Code: Unknown
  Message: Request failed during generation: Unexpected <class 'torch.cuda.OutOfMemoryError'>: CUDA out of memory. 
Tried to allocate 14.00 MiB. GPU 0 has a total capacty of 39.39 GiB of which 5.94 MiB is free. Process 3883975 has 39.38 GiB 
memory in use. Of the allocated memory 38.78 GiB is allocated by PyTorch, and 100.96 MiB is reserved by PyTorch but 
unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See 
documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
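
As the error message itself suggests, allocator fragmentation can sometimes be mitigated via PYTORCH_CUDA_ALLOC_CONF; note, though, that with 38.78 GiB already allocated out of 39.39 GiB, the model is close to genuinely not fitting, so this tuning may not help here. A sketch of setting it (the 128 MiB value is just an example; it must take effect before CUDA is initialized):

import os

# Must be set before torch initializes CUDA (e.g. before `import torch`
# in most setups); usually exported in the container environment instead.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402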

Xaenalt (Contributor) commented Nov 16, 2023

The container didn't crash when it got an OOM error?

Xaenalt (Contributor) commented Nov 16, 2023

Oh, if it's just GPU memory, it probably won't crash, but it probably should... Hmmm... I'd say this is probably covered, at least on startup, by the upcoming readiness probe; see the sketch below.
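
One possible shape for such a check (hypothetical, and coarse; not necessarily what the upcoming probe implements): have the readiness endpoint attempt a tiny CUDA allocation, so an exhausted GPU flips the Pod to not-ready instead of leaving it Loaded/Running:

import torch

def gpu_ready() -> bool:
    # Tiny probe allocation: fails fast with OutOfMemoryError when the
    # device is exhausted, which the readiness endpoint can report as 503.
    try:
        torch.empty(1024, device="cuda")
        return True
    except torch.cuda.OutOfMemoryError:
        return False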

Xaenalt (Contributor) commented Dec 5, 2023

Will be resolved by #156

dtrifiro (Contributor) commented Mar 8, 2024

TGIS now lives in a separate container, and following its logs should show the OOM errors.

For proper liveness/readiness probes for the TGIS container in the caikit+tgis setup, we'll have to wait for knative/serving#14853.

dtrifiro closed this as completed Mar 8, 2024