
Update docker ENTRYPOINT to ensure proper argument handling #962

Merged

merged 3 commits into outlines-dev:main from fix-docker-entrypoint on Jun 13, 2024

Conversation

shashankmangla (Contributor)

## Summary

This PR updates the ENTRYPOINT instruction in the Dockerfile to ensure that additional arguments passed to the container via docker run are correctly appended to the entrypoint command.
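
For context, this symptom usually comes down to the difference between the shell form and the exec form of `ENTRYPOINT`: only the exec form has extra `docker run` arguments appended to it. Below is a minimal sketch of the distinction; the `my_server` module name is a placeholder for illustration and is not taken from this repository's actual Dockerfile.

```dockerfile
# Hypothetical Dockerfile excerpt -- `my_server` stands in for the real serve module.

# Shell form: Docker wraps the command in `/bin/sh -c`, so anything passed after
# the image name in `docker run` is ignored and the default model is loaded.
# ENTRYPOINT python3 -m my_server

# Exec form: extra `docker run` arguments (e.g. --model="microsoft/phi-2") are
# appended to this argument list and reach the server process.
ENTRYPOINT ["python3", "-m", "my_server"]
```

With the exec form, `docker run ... my-outlines-image --model="microsoft/phi-2"` effectively runs `python3 -m my_server --model="microsoft/phi-2"` inside the container, which is what the "after" log below reflects.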

### Before the change:

Parameter `model` is not passed to the entrypoint command and the default model `facebook/opt-125m` is loaded instead.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:45:46 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=facebook/opt-125m)
```

### After the change:

Parameter `model` is correctly passed to the entrypoint command.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:59:17 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='microsoft/phi-2', speculative_config=None, tokenizer='microsoft/phi-2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=microsoft/phi-2)
```

@shashankmangla changed the title from "Fix: Update ENTRYPOINT to ensure proper argument handling" to "Fix: Update docker ENTRYPOINT to ensure proper argument handling" on Jun 12, 2024
@rlouf changed the title from "Fix: Update docker ENTRYPOINT to ensure proper argument handling" to "Update docker ENTRYPOINT to ensure proper argument handling" on Jun 13, 2024
@rlouf merged commit 1bdcaa5 into outlines-dev:main on Jun 13, 2024
6 checks passed
@rlouf (Member) commented on Jun 13, 2024

Thank you for contributing!

@shashankmangla deleted the fix-docker-entrypoint branch on June 13, 2024 at 10:07
@shashankmangla (Contributor, Author)

@rlouf Thanks for reviewing and merging! Could you clarify when the next release will be made, or whether the Release Docker workflow will be triggered manually?

@rlouf (Member) commented on Jun 13, 2024

Just ran the workflow!

fpgmaas pushed a commit to fpgmaas/outlines that referenced this pull request Jun 14, 2024
Update docker ENTRYPOINT to ensure proper argument handling (outlines-dev#962)
