@underlines: to run the microsoft/bloom-deepspeed-inference-fp16 model you will need at least 8x 80GB A100s for fp16, or 4x 80GB A100s for int8.
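For anyone wondering where those GPU counts come from, a quick back-of-envelope check of the weights alone lines up with them (these are my own estimates, not official numbers; activations and KV cache need extra headroom, which is why 8x/4x rather than the bare minimum):

```python
# Back-of-envelope GPU memory needed just to hold BLOOM-176B weights.
# Activations and KV cache require additional headroom on top of this.
PARAMS = 176e9  # BLOOM has ~176B parameters

def weights_footprint(bytes_per_param, gpu_mem_gb=80):
    """Return (total GB for the weights, minimum number of 80GB GPUs)."""
    total_gb = PARAMS * bytes_per_param / 1e9
    gpus = int(-(-total_gb // gpu_mem_gb))  # ceiling division
    return total_gb, gpus

print(weights_footprint(2))  # fp16: (352.0, 5) -> 8 GPUs leaves headroom
print(weights_footprint(1))  # int8: (176.0, 3) -> 4 GPUs leaves headroom
```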
Unfortunately, Azure isn't granting my account the resources. They force people to manually request GPU quota, including stating the reasons and examples of the application. Off to AWS then, but without the Azure optimizations of DeepSpeed.
Using the default example from Deploying MII-Public on Azure ML:
- Compute instance: Tesla K80, 12GB
- Kernel: Python 3.8 - AzureML
- `pip install deepspeed-mii`
- restart the kernel
Using this as-is fails with:

```
AssertionError: text-generation only supports ['distilgpt2', 'gpt2-large'...
```
Using this modified to `tensor_parallel=1` fails with:

```
RuntimeError: server crashed for some reason, unable to proceed
```
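For reference, a minimal sketch of the deployment call I'm running, adjusted to `tensor_parallel=1` (based on the `mii.deploy` API; the model name and `mii_config` keys here are my best guess at what the example notebook uses, not a verbatim copy):

```python
import mii

# Deployment sketch -- assumes the mii.deploy API from the Azure ML example.
# The exact model name and config keys may differ from the actual notebook.
mii.deploy(
    task="text-generation",
    model="gpt2-large",                  # one of the models the assertion lists as supported
    deployment_name="gpt2-large-deploy",
    mii_config={
        "tensor_parallel": 1,            # single GPU on this compute instance
        "dtype": "fp16",                 # also tried int8, same crash
    },
)
```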
Switching to int8 didn't help either.
Is my compute instance too small?