This repository has been archived by the owner on May 28, 2024. It is now read-only.

issue with run locally #61

Open
omlomloml opened this issue Oct 2, 2023 · 1 comment

omlomloml commented Oct 2, 2023

I'm trying to run inside the latest image, but after the model warmup it just dies with no error.
I was running this:
`aviary run --model ~/models/continuous_batching/mosaicml--mpt-7b-chat.yaml`
The only change inside the yaml is removing
ray_actor_options:
  num_gpus: 1
since I don't have `accelerator_type_a10`; I have an A6000.
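For reference, this is roughly the block I mean; it's a sketch from memory, so the exact nesting and values in the stock `mosaicml--mpt-7b-chat.yaml` may differ:

```yaml
# Approximate sketch of the block I removed (from memory); the exact
# keys and values in the stock yaml may differ.
ray_actor_options:
  num_gpus: 1
  resources:
    accelerator_type_a10: 0.01   # the A10 constraint I can't satisfy on an A6000
```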
Here is the tail of the logs:

ve taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) Downloaded /home/ray/data/hub/models--mosaicml--mpt-7b-chat/snapshots/64e5c9c9fb53a8e89690c2dee75a5add37f7113e/pytorch_model-00001-of-00002.bin in 0:02:35.
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) Download: [1/2] -- ETA: 0:02:35
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) Download file: pytorch_model-00002-of-00002.bin
(ServeController pid=30116) WARNING 2023-10-02 06:40:38,770 controller 30116 deployment_state.py:2006 - Deployment 'mosaicml--mpt-7b-chat' in application 'mosaicml--mpt-7b-chat' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=30116) WARNING 2023-10-02 06:41:08,775 controller 30116 deployment_state.py:2006 - Deployment 'mosaicml--mpt-7b-chat' in application 'mosaicml--mpt-7b-chat' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) Downloaded /home/ray/data/hub/models--mosaicml--mpt-7b-chat/snapshots/64e5c9c9fb53a8e89690c2dee75a5add37f7113e/pytorch_model-00002-of-00002.bin in 0:00:58.
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) Download: [2/2] -- ETA: 0
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) No safetensors weights found for model mosaicml/mpt-7b-chat at revision None. Converting PyTorch weights to safetensors.
(ServeController pid=30116) WARNING 2023-10-02 06:41:38,862 controller 30116 deployment_state.py:2006 - Deployment 'mosaicml--mpt-7b-chat' in application 'mosaicml--mpt-7b-chat' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) Convert: [1/2] -- Took: 0:00:20.415345
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) Convert: [2/2] -- Took: 0:00:06.243851
(ServeReplica:mosaicml--mpt-7b-chat:mosaicml--mpt-7b-chat pid=30186) [INFO 2023-10-02 06:42:05,045] tgi.py: 214  Warming up model on workers...
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) [INFO 2023-10-02 06:42:05,054] tgi_worker.py: 650  Model is warming up. Num requests: 3 Prefill tokens: 6000 Max batch total tokens: None
(AviaryTGIInferenceWorker:mosaicml/mpt-7b-chat pid=31233) [INFO 2023-10-02 06:42:07,307] tgi_worker.py: 663  Model finished warming up (max_batch_total_tokens=None) and is ready to serve requests.
(ServeReplica:mosaicml--mpt-7b-chat:mosaicml--mpt-7b-chat pid=30186) [INFO 2023-10-02 06:42:07,520] tgi.py: 170  Rolling over to new worker group [Actor(AviaryTGIInferenceWorker, 725292a8070301f947130c2c01000000)]
(ServeReplica:mosaicml--mpt-7b-chat:mosaicml--mpt-7b-chat pid=30186) [INFO 2023-10-02 06:42:07,661] model_app.py: 83  Reconfigured and ready to serve.
(ServeReplica:mosaicml--mpt-7b-chat:mosaicml--mpt-7b-chat pid=30186) DeprecationWarning: `ray.state.actors` is a private attribute and access will be removed in a future Ray version.
/home/ray/anaconda3/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmptyp67o3t'>
  _warnings.warn(warn_message, ResourceWarning)
/home/ray/anaconda3/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 28960 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback

(base) ray@4cd79d6dad32:~$


nobody4t commented Oct 9, 2023

Do you have a process still running in the background?

ResourceWarning: subprocess 28960 is still running
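If you're not sure, you could check whether that pid is still around before re-running, e.g.:

```bash
# Check whether the subprocess from the warning (pid 28960) is still alive
ps -p 28960 -o pid,ppid,etime,cmd

# Look for any leftover aviary / ray worker processes from a previous run
ps aux | grep -E '[r]ay|[a]viary'
```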
