
Add simple server #147

Open · zhuzilin wants to merge 1 commit into main
Conversation

@zhuzilin commented Mar 7, 2023

This PR adds a simple fastapi server to serve the llama model.

Thank you for your time on reviewing this PR :)
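For reference, here is a minimal sketch of the general shape such a server can take. This is not the PR's actual server.py: build_generator is a hypothetical stand-in for the checkpoint/tokenizer loading code in example.py, and the /llama/ route and port 8042 are taken from the commands shown later in this thread.

from typing import List

import torch.distributed as dist
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class GenerateRequest(BaseModel):
    prompts: List[str]
    max_gen_len: int = 256


@app.post("/llama/")
def generate(req: GenerateRequest):
    # Rank 0 handles the HTTP request and broadcasts the work to the other
    # model-parallel ranks, which block on the matching broadcast below.
    payload = [req.prompts, req.max_gen_len]
    dist.broadcast_object_list(payload, src=0)
    responses = generator.generate(req.prompts, max_gen_len=req.max_gen_len)
    return {"responses": responses}


if __name__ == "__main__":
    # Hypothetical helper: initializes the process group / model parallelism
    # and returns an object with a generate(prompts, max_gen_len=...) method.
    generator = build_generator()
    if dist.get_rank() == 0:
        uvicorn.run(app, host="0.0.0.0", port=8042)
    else:
        # Non-zero ranks loop forever, waiting for rank 0 to broadcast work.
        while True:
            payload = [None, None]
            dist.broadcast_object_list(payload, src=0)
            prompts, max_gen_len = payload
            generator.generate(prompts, max_gen_len=max_gen_len)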

@facebook-github-bot added the CLA Signed label Mar 7, 2023
@alecmerdler

@zhuzilin Will this server only be able to handle one request at a time? From my limited experience, there will be some CUDA errors if more than one process tries to use the GPU at the same time.

@juncongmoo

I already did this for a single GPU here: https://github.com/juncongmoo/pyllama

@maxpain commented Mar 8, 2023

How do I make it use all GPUs in my system?

I started like this:

torchrun --nproc_per_node 8 server.py --ckpt_dir /var/llama/65B --tokenizer_path /var/llama/tokenizer.model

But it only uses one GPU:
[screenshot: GPU utilization showing only one GPU in use]

@zhuzilin (Author) commented Mar 8, 2023

@maxpain

But it only uses one GPU

Hmm... that's weird. I use the same command (with a different directory) and all 8 GPUs are used. Could you try example.py and see if all GPUs are used? server.py is basically the same as example.py.

@maxpain commented Mar 8, 2023

Yes, example.py uses all GPUs

[screenshot: GPU utilization showing all GPUs in use]

@maxpain commented Mar 8, 2023

Could you please give an example of an HTTP request?

@maxpain commented Mar 8, 2023

This request crashes the server:

curl -X POST http://127.0.0.1:8042/llama/ -H 'Content-Type: application/json' -d '{"prompts":["Hello. How are you?"], "max_gen_len": "256"}'
root@llama:/llama# torchrun --nproc_per_node 8 server.py --ckpt_dir /var/llama/65B --tokenizer_path /var/llama/tokenizer.model
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
> initializing model parallel with size 8
> initializing ddp with size 1
> initializing pipeline with size 1
Loading
Loaded in 18.18 seconds
INFO:     Started server process [1436]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8042 (Press CTRL+C to quit)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 1436) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=========================================================
server.py FAILED
---------------------------------------------------------
Failures:
[1]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 1 (local_rank: 1)
  exitcode  : -7 (pid: 1437)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1437
[2]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 2 (local_rank: 2)
  exitcode  : -7 (pid: 1438)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1438
[3]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 3 (local_rank: 3)
  exitcode  : -7 (pid: 1439)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1439
[4]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 4 (local_rank: 4)
  exitcode  : -7 (pid: 1440)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1440
[5]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 5 (local_rank: 5)
  exitcode  : -7 (pid: 1441)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1441
[6]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 6 (local_rank: 6)
  exitcode  : -7 (pid: 1442)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1442
[7]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 7 (local_rank: 7)
  exitcode  : -7 (pid: 1443)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1443
---------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-08_11:42:46
  host      : llama.us-central1-c.c.llama-380007.internal
  rank      : 0 (local_rank: 0)
  exitcode  : -7 (pid: 1436)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 1436
=========================================================
root@llama:/llama#

@zhuzilin (Author) commented Mar 8, 2023

@maxpain This is also weird... Here is the output of your curl command in my environment:

> curl -X POST http://127.0.0.1:8042/llama/ -H 'Content-Type: application/json' -d '{"prompts":["Hello. How are you?"], "max_gen_len": "256"}'
{"responses":["Hello. How are you? I am doing well. I’m recovering from a bad fall on the ice that happened while I was doing a blogger-y thing. It’s kind of funny, but also kind of not. I’m going to tell you about it, so it’s funny.\nI have to drive back and forth from my apartment to my parents’ house about twice a week to take care of my mom’s cats and/or get laundry done. The drive is about 25 minutes each way.\nThe first time I did this was the Monday after the blizzard, which is when I fell on the ice. I didn’t feel like going back and forth again in my car, so I decided to walk it instead.\nI had just downloaded a new podcast that was super interesting, and I decided to listen to it on my walk.\nI made it almost all the way back to my apartment without incident. I was on the last street in the apartment complex, and I was walking down the hill. It was very slippery. I wasn’t really paying attention to what I was doing, and I started to fall.\nI tried to catch myself, but"]}

Could you try printing some logs before and after broadcast_object_list? I'm afraid it's the broadcast that failed in your case.
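One hedged way to add that logging, assuming server.py broadcasts a mutable list such as [prompts, max_gen_len] from rank 0 as sketched earlier in this thread (the function name here is illustrative):

import torch.distributed as dist


def broadcast_with_logging(payload, src=0):
    # Requires an initialized process group; prints on every rank so it is
    # visible which rank hangs or dies around the broadcast.
    rank = dist.get_rank()
    print(f"[rank {rank}] before broadcast_object_list: {payload}", flush=True)
    dist.broadcast_object_list(payload, src=src)
    print(f"[rank {rank}] after broadcast_object_list: {payload}", flush=True)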

sentencepiece
fastapi


Hey! Really nice code!
Could these dependencies be added as additional requirements in another file (e.g. requirements-api.txt)?
That would make it clear that they are not necessary for running the model itself, just for creating an API.
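A split along those lines might look like the following requirements-api.txt; uvicorn is an assumption here, since FastAPI needs an ASGI server to run, and the exact contents are up to the PR author.

# requirements-api.txt (hypothetical): dependencies needed only for the API server
fastapi
uvicorn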

@EonSin commented Mar 8, 2023

@maxpain (#147 (comment))

Try the following:
curl -X POST http://127.0.0.1:8080/llama/ -H 'Content-Type: application/json' -d '{"prompts" :["Hello How are you?"], "max_gen_len": 256}'
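The visible difference from the earlier request is that max_gen_len is sent as a JSON number rather than a string (the port also differs). An equivalent call with Python's requests library, assuming the server is listening on the port used earlier in this thread:

import requests

resp = requests.post(
    "http://127.0.0.1:8042/llama/",
    json={"prompts": ["Hello. How are you?"], "max_gen_len": 256},
)
print(resp.json()["responses"][0])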

@Adam4397 commented Mar 8, 2023

Thanks a lot!

@SovereignRemedy

This PR adds a simple fastapi server to serve the llama model.

Thank you for your time on reviewing this PR :)

Hi. Will the HTTP server still work with llama2-13b? I tried starting 13b locally but seemed to run into problems using multiple GPUs.

@yanxiyue (Contributor)

Does this PR support llama2-70B?

@Maxhyl commented Oct 8, 2023

Does this PR support llama2-70B?

I have the same problem

Labels: CLA Signed, move-to-llama-recipes