When running mii.serve, it keeps printing "waiting for server to start" #361

Closed
cninnovationai opened this issue Dec 24, 2023 · 6 comments

@cninnovationai

My OS is Ubuntu 22.04.

(base) bruce@bruce:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
import mii
client = mii.serve(
    model_name_or_path= "/data/python_workspace/text-generation-webui/models/neural-chat-7b-v3-1/",
    deployment_name="neural-chat",
    enable_restful_api=True,
    restful_api_port=5000,
)

Running the code above, it keeps printing "waiting for server to start", but there is no error at all, and when running watch -n 1 nvidia-smi, GPU memory usage stays at zero.
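
For reference, once the deployment actually comes up, the returned client would normally be queried like this (a minimal sketch following the MII examples; the prompt and token count here are placeholders):

# Hedged sketch of the next step once the server is live; execution never
# reaches this point because startup hangs at "waiting for server to start".
response = client.generate(["What is DeepSpeed?"], max_new_tokens=128)
print(response)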


(deepspeed-mii) bruce@bruce:/data/python_workspace/DeepSpeed-MII$ python serve.py 
[2023-12-24 09:02:59,684] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-24 09:03:04,964] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
[2023-12-24 09:03:04,964] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
[2023-12-24 09:03:04,984] [INFO] [server.py:107:_launch_server_process] msg_server launch: ['deepspeed', '-i', 'localhost:0', '--master_port', '29500', '--master_addr', 'localhost', '--no_ssh_check', '--no_local_rank', '--no_python', '/data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'neural-chat', '--load-balancer-port', '50050', '--restful-gateway-port', '5000', '--restful-gateway-procs', '32', '--server-port', '50051', '--zmq-port', '25555', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL2RhdGEvcHl0aG9uX3dvcmtzcGFjZS90ZXh0LWdlbmVyYXRpb24td2VidWkvbW9kZWxzL25ldXJhbC1jaGF0LTdiLXYzLTEvIiwgInRva2VuaXplciI6ICIvZGF0YS9weXRob25fd29ya3NwYWNlL3RleHQtZ2VuZXJhdGlvbi13ZWJ1aS9tb2RlbHMvbmV1cmFsLWNoYXQtN2ItdjMtMS8iLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMSwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAxfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTFdLCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJncHVfaW5kaWNlcyI6IFswXSwgInptcV9wb3J0IjogMjU1NTV9XSwgImRldmljZV9tYXAiOiAiYXV0byIsICJtYXhfbGVuZ3RoIjogbnVsbCwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9']
[2023-12-24 09:03:04,984] [INFO] [server.py:107:_launch_server_process] msg_server launch: ['deepspeed', '-i', 'localhost:0', '--master_port', '29500', '--master_addr', 'localhost', '--no_ssh_check', '--no_local_rank', '--no_python', '/data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'neural-chat', '--load-balancer-port', '50050', '--restful-gateway-port', '5000', '--restful-gateway-procs', '32', '--server-port', '50051', '--zmq-port', '25555', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL2RhdGEvcHl0aG9uX3dvcmtzcGFjZS90ZXh0LWdlbmVyYXRpb24td2VidWkvbW9kZWxzL25ldXJhbC1jaGF0LTdiLXYzLTEvIiwgInRva2VuaXplciI6ICIvZGF0YS9weXRob25fd29ya3NwYWNlL3RleHQtZ2VuZXJhdGlvbi13ZWJ1aS9tb2RlbHMvbmV1cmFsLWNoYXQtN2ItdjMtMS8iLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMSwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAxfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTFdLCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJncHVfaW5kaWNlcyI6IFswXSwgInptcV9wb3J0IjogMjU1NTV9XSwgImRldmljZV9tYXAiOiAiYXV0byIsICJtYXhfbGVuZ3RoIjogbnVsbCwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9']
[2023-12-24 09:03:05,014] [INFO] [server.py:107:_launch_server_process] msg_server launch: ['/data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'neural-chat', '--load-balancer-port', '50050', '--restful-gateway-port', '5000', '--restful-gateway-procs', '32', '--load-balancer', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL2RhdGEvcHl0aG9uX3dvcmtzcGFjZS90ZXh0LWdlbmVyYXRpb24td2VidWkvbW9kZWxzL25ldXJhbC1jaGF0LTdiLXYzLTEvIiwgInRva2VuaXplciI6ICIvZGF0YS9weXRob25fd29ya3NwYWNlL3RleHQtZ2VuZXJhdGlvbi13ZWJ1aS9tb2RlbHMvbmV1cmFsLWNoYXQtN2ItdjMtMS8iLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMSwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAxfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTFdLCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJncHVfaW5kaWNlcyI6IFswXSwgInptcV9wb3J0IjogMjU1NTV9XSwgImRldmljZV9tYXAiOiAiYXV0byIsICJtYXhfbGVuZ3RoIjogbnVsbCwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9']
[2023-12-24 09:03:05,014] [INFO] [server.py:107:_launch_server_process] msg_server launch: ['/data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'neural-chat', '--load-balancer-port', '50050', '--restful-gateway-port', '5000', '--restful-gateway-procs', '32', '--load-balancer', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL2RhdGEvcHl0aG9uX3dvcmtzcGFjZS90ZXh0LWdlbmVyYXRpb24td2VidWkvbW9kZWxzL25ldXJhbC1jaGF0LTdiLXYzLTEvIiwgInRva2VuaXplciI6ICIvZGF0YS9weXRob25fd29ya3NwYWNlL3RleHQtZ2VuZXJhdGlvbi13ZWJ1aS9tb2RlbHMvbmV1cmFsLWNoYXQtN2ItdjMtMS8iLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMSwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAxfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTFdLCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJncHVfaW5kaWNlcyI6IFswXSwgInptcV9wb3J0IjogMjU1NTV9XSwgImRldmljZV9tYXAiOiAiYXV0byIsICJtYXhfbGVuZ3RoIjogbnVsbCwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9']
[2023-12-24 09:03:05,016] [INFO] [server.py:107:_launch_server_process] msg_server launch: ['/data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'neural-chat', '--load-balancer-port', '50050', '--restful-gateway-port', '5000', '--restful-gateway-procs', '32', '--restful-gateway', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL2RhdGEvcHl0aG9uX3dvcmtzcGFjZS90ZXh0LWdlbmVyYXRpb24td2VidWkvbW9kZWxzL25ldXJhbC1jaGF0LTdiLXYzLTEvIiwgInRva2VuaXplciI6ICIvZGF0YS9weXRob25fd29ya3NwYWNlL3RleHQtZ2VuZXJhdGlvbi13ZWJ1aS9tb2RlbHMvbmV1cmFsLWNoYXQtN2ItdjMtMS8iLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMSwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAxfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTFdLCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJncHVfaW5kaWNlcyI6IFswXSwgInptcV9wb3J0IjogMjU1NTV9XSwgImRldmljZV9tYXAiOiAiYXV0byIsICJtYXhfbGVuZ3RoIjogbnVsbCwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9']
[2023-12-24 09:03:05,016] [INFO] [server.py:107:_launch_server_process] msg_server launch: ['/data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'neural-chat', '--load-balancer-port', '50050', '--restful-gateway-port', '5000', '--restful-gateway-procs', '32', '--restful-gateway', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL2RhdGEvcHl0aG9uX3dvcmtzcGFjZS90ZXh0LWdlbmVyYXRpb24td2VidWkvbW9kZWxzL25ldXJhbC1jaGF0LTdiLXYzLTEvIiwgInRva2VuaXplciI6ICIvZGF0YS9weXRob25fd29ya3NwYWNlL3RleHQtZ2VuZXJhdGlvbi13ZWJ1aS9tb2RlbHMvbmV1cmFsLWNoYXQtN2ItdjMtMS8iLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMSwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAxfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTFdLCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJncHVfaW5kaWNlcyI6IFswXSwgInptcV9wb3J0IjogMjU1NTV9XSwgImRldmljZV9tYXAiOiAiYXV0byIsICJtYXhfbGVuZ3RoIjogbnVsbCwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9']
[2023-12-24 09:03:10,037] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:10,037] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:15,057] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:15,057] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:20,062] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:20,062] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:22,440] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-24 09:03:22,638] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-24 09:03:23,166] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-24 09:03:25,069] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:25,069] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:27,606] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-24 09:03:27,630] [INFO] [runner.py:571:main] cmd = /data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --no_python --no_local_rank --enable_each_rank_log=None /data/python_workspace/anaconda3/envs/deepspeed-mii/bin/python -m mii.launch.multi_gpu_server --deployment-name neural-chat --load-balancer-port 50050 --restful-gateway-port 5000 --restful-gateway-procs 32 --server-port 50051 --zmq-port 25555 --model-config eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL2RhdGEvcHl0aG9uX3dvcmtzcGFjZS90ZXh0LWdlbmVyYXRpb24td2VidWkvbW9kZWxzL25ldXJhbC1jaGF0LTdiLXYzLTEvIiwgInRva2VuaXplciI6ICIvZGF0YS9weXRob25fd29ya3NwYWNlL3RleHQtZ2VuZXJhdGlvbi13ZWJ1aS9tb2RlbHMvbmV1cmFsLWNoYXQtN2ItdjMtMS8iLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMSwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAxfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTFdLCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJncHVfaW5kaWNlcyI6IFswXSwgInptcV9wb3J0IjogMjU1NTV9XSwgImRldmljZV9tYXAiOiAiYXV0byIsICJtYXhfbGVuZ3RoIjogbnVsbCwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9
Starting load balancer on port: 50050
Starting RESTful API gateway on port: 5000
About to start server
Started
[2023-12-24 09:03:30,073] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:30,073] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:35,077] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:35,077] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:40,089] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:40,089] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:42,712] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-24 09:03:45,093] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:45,093] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:47,251] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-12-24 09:03:47,251] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-12-24 09:03:47,261] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-12-24 09:03:47,262] [INFO] [launch.py:163:main] dist_world_size=1
[2023-12-24 09:03:47,262] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-12-24 09:03:50,097] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:50,097] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:55,105] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:03:55,105] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:00,125] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:00,125] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:05,074] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-24 09:04:05,129] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:05,129] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:10,137] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:10,137] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:10,371] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-24 09:04:10,371] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-12-24 09:04:10,427] [INFO] [engine_v2.py:82:__init__] Building model...
[2023-12-24 09:04:15,145] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:15,145] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
Using /home/bruce/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
[2023-12-24 09:04:20,157] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2023-12-24 09:04:20,157] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
(the "waiting for server to start..." messages keep repeating in pairs every 5 seconds until 09:09:35, with no further output)

The server never finishes starting up. Does anyone know what this problem is and how to solve it?

@cninnovationai
Author

cninnovationai commented Dec 24, 2023

This is the model directory and its files. Using vLLM to run a server works fine, with a command like this:

nohup python -m vllm.entrypoints.openai.api_server --model="/data/python_workspace/text-generation-webui/models/neural-chat-7b-v3-1" --trust-remote-code --port=5000 --host="0.0.0.0" > output.log 2>&1 &

@mrwyattii
Contributor

@cninnovationai could you please try loading the model with pipeline and let me know if that works? Thanks

import mii
pipe = mii.pipeline("/data/python_workspace/text-generation-webui/models/neural-chat-7b-v3-1/")
response = pipe("test")

@cninnovationai
Author

cninnovationai commented Dec 26, 2023

@cninnovationai could you please try loading the model with pipeline and let me know if that works? Thanks

import mii
pipe = mii.pipeline("/data/python_workspace/text-generation-webui/models/neural-chat-7b-v3-1/")
response = pipe("test")

[2023-12-26 14:40:25,529] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-26 14:40:32,301] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-26 14:40:32,305] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-12-26 14:40:32,360] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/bruce/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...

It always stays here with no further progress, and GPU memory usage is 0.

Do you know how to solve this problem? @mrwyattii

@mrwyattii
Contributor

@cninnovationai it looks like there is a lock file in your torch cache that is causing the issue. Try deleting your torch cache and run again:
rm -rf /home/bruce/.cache/torch*
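
If you prefer not to wipe the whole cache, a more targeted option is to delete only stale lock files under the extensions root shown in the log (a minimal sketch; the "lock" file name and directory layout are assumptions about how PyTorch JIT extension builds are usually structured):

import glob
import os

# PyTorch extensions root reported in the log above
ext_root = os.path.expanduser("~/.cache/torch_extensions")

# A build directory holds a file named "lock" while a JIT build is running;
# one left behind by a killed process can make later builds wait forever.
for lock_path in glob.glob(os.path.join(ext_root, "**", "lock"), recursive=True):
    print(f"Removing stale lock file: {lock_path}")
    os.remove(lock_path)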

@mrwyattii mrwyattii self-assigned this Jan 2, 2024
@TobyGE

TobyGE commented Jan 9, 2024

Clearing the torch cache worked for me.

@cninnovationai
Author

After running rm -rf /home/bruce/.cache/torch*, it works.
