[LLM] Example for Serving Gemma #3207

Michaelvll · 2024-02-21T21:46:35Z

TODO (future):

Add fine-tuning as well.

Tested (run the relevant ones):

Code formatting: bash format.sh

Any manual or new tests for this PR (please specify below)

sky launch -c gemma llm/gemma/serve.yaml --cloud gcp --env HF_TOKEN="xxx"

Curl the model with the command in the README:

IP=$(sky status --ip gemma)
curl -L http://$IP:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "google/gemma-7b",
      "prompt": "My favourite condiment is",
      "max_tokens": 25
  }'

sky serve up -n gemma serve.yaml --env HF_TOKEN

All smoke tests: pytest tests/test_smoke.py
Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
Backward compatibility tests: bash tests/backward_comaptibility_tests.sh
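The curl request used for manual testing above can be wrapped in a small reusable helper that works for both a cluster started with sky launch (cluster IP plus port 8000) and a service started with sky serve up. This is a minimal sketch, not part of the PR: the function names make_completion_payload, completions_url and query_completion are hypothetical, the server is assumed to be vLLM's OpenAI-compatible API, and the payload builder does no JSON escaping, so the prompt must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash

# Build the JSON body for an OpenAI-style /v1/completions request.
# Hypothetical helper: no JSON escaping is done, so the prompt must not
# contain double quotes or backslashes. max_tokens defaults to 25.
make_completion_payload() {
  local model="$1" prompt="$2" max_tokens="${3:-25}"
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}' \
    "$model" "$prompt" "$max_tokens"
}

# Turn "host:port" or "http://host:port/" into the full completions URL.
completions_url() {
  local endpoint="${1#http://}"   # drop a leading scheme, if present
  endpoint="${endpoint%/}"        # drop a trailing slash, if present
  printf 'http://%s/v1/completions' "$endpoint"
}

# POST a completion request to the given endpoint.
# Arguments: endpoint, model, prompt, [max_tokens].
query_completion() {
  local endpoint="$1"
  shift
  curl -sL "$(completions_url "$endpoint")" \
    -H "Content-Type: application/json" \
    -d "$(make_completion_payload "$@")"
}
```

For a cluster, something like `query_completion "$(sky status --ip gemma):8000" google/gemma-7b "My favourite condiment is" 25` reproduces the manual test. For a service, the endpoint would come from `sky serve status --endpoint gemma` (assumed here to print the endpoint as host:port, optionally with an http:// prefix, which completions_url strips).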

concretevitamin

Awesome, @Michaelvll! Sending some comments first while it's running in the background.


concretevitamin · 2024-02-22T21:11:01Z

llm/gemma/serve.yaml

+  MODEL_NAME: google/gemma-7b
+  HF_TOKEN: <your-huggingface-token> # TODO: Replace with huggingface token
+
+resources: 


Azure was taking a long time without success (quota failover), so I changed this to:

resources:
  accelerators: {L4, A10g, A10, L40, A40, A100, A100-80GB}
  ports: 8000
  disk_tier: best
  any_of:
    - cloud: aws
    - cloud: gcp

and got

  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/serve/core.py", line 88, in up
    raise ValueError(f'Got multiple clouds: {requested_cloud} and '
ValueError: Got multiple clouds: GCP and AWS in different resources. Please specify single cloud instead.

for sky serve up. Is this expected?

This seems to be an issue that needs fixing, @cblmemo @MaoZiming?

Good catch! Fixed in #3226. Thanks!

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

concretevitamin

Works like a charm on both sky launch and sky serve up. LGTM.


concretevitamin · 2024-02-22T21:22:54Z

llm/gemma/README.md

+
+### Prerequisite
+
+1. Apply for the access to the Gemma model


Suggested change

- 1. Apply for the access to the Gemma model
+ 1. Apply for access to the Gemma model

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Add serve for gemma and fix mixtral dependency
* Add hf token
* fix model len
* Add comment
* Serve your private gemma
* fix serve yaml
* readme
* Remove chat completion due to the wrong template
* add readme
* Update llm/gemma/README.md
  Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* address comments
* Update README.md
  Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* Update llm/gemma/README.md
  Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* Update llm/gemma/README.md
  Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* Update llm/gemma/README.md
  Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* Change to it
* Add chat API
* use HF_TOKEN env
* typo

---------

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>


Michaelvll added 5 commits February 21, 2024 21:28

Add serve for gemma and fix mixtral dependency

bf95401

Add hf token

28fa5ca

fix model len

3b57ba0

Add comment

b2520bd

Serve your private gemma

b35ba1d

Michaelvll changed the title ~~[LLM] Example for Gemma~~ [LLM] Example for Serving Gemma Feb 21, 2024

Michaelvll added 3 commits February 22, 2024 06:14

fix serve yaml

c939461

readme

31b2939

Remove chat completion due to the wrong template

580d834

Michaelvll marked this pull request as ready for review February 22, 2024 07:02

Michaelvll added 2 commits February 22, 2024 07:05

Merge branch 'master' of github.com:skypilot-org/skypilot into gemma

ee1688a

add readme

b927c37

Michaelvll requested a review from concretevitamin February 22, 2024 20:09

concretevitamin reviewed Feb 22, 2024


Update llm/gemma/README.md

a0037ff

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

concretevitamin approved these changes Feb 22, 2024


Michaelvll and others added 11 commits February 22, 2024 13:29

address comments

f9a2bb2

Merge branch 'gemma' of github.com:skypilot-org/skypilot into gemma

8a2870d

Update README.md

783d03c

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

Update llm/gemma/README.md

dd95daf

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

Update llm/gemma/README.md

6cc7586

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

Update llm/gemma/README.md

b3f2438

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

Change to it

6a9b40a

Merge branch 'gemma' of github.com:skypilot-org/skypilot into gemma

2933cc1

Add chat API

a4ea9a3

use HF_TOKEN env

dd97fbe

typo

8499481

Michaelvll merged commit dd4b14f into master Feb 23, 2024
19 checks passed

Michaelvll deleted the gemma branch February 23, 2024 01:43

