Shared volume using mountpoint-s3, permissions issues #1814
Comments
Maybe it would be better to mount this volume read-only. In that case I would just need to make the model files available in the bucket before starting the process? Thanks :) |
OK, I managed to do what I wanted:
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
aws s3 sync Mistral-7B-Instruct-v0.2 s3://<bucket_name>/Mistral-7B-Instruct-v0.2
text-generation-launcher --model-id=/data/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
2024-04-26T12:57:37.746280Z INFO text_generation_launcher: Args { model_id: "/data/Mistral-7B-Instruct-v0.2", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-58d9869995-gxzx2", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-26T12:57:37.746720Z INFO download: text_generation_launcher: Starting download process.
2024-04-26T12:57:48.114689Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-26T12:57:50.144159Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-26T12:57:50.144763Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-26T12:58:00.242683Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-26T12:58:02.873865Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
rank=0
2024-04-26T12:58:02.873894Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 9 rank=0
2024-04-26T12:58:02.944252Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-26T12:58:02.944282Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
I have another error that might not be related. I'm going to solve that before closing this issue. |
OK, my first issue was caused by insufficient memory allocation.
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge |
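For anyone hitting the same HeaderTooLarge error: one common cause is that a .safetensors file is not real weights but a Git LFS pointer (a small text file left behind when a repository is cloned without git-lfs pulling the large files). Whether that is what happened in this thread is not confirmed. The sketch below is a rough check, assuming the safetensors layout of an 8-byte little-endian header length followed by a JSON header; `inspect_safetensors` is a hypothetical helper name, not part of safetensors or text-generation-inference.

```python
import json
import struct
from pathlib import Path

# First bytes of a Git LFS pointer file (plain text, not real weights).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def inspect_safetensors(path):
    """Classify a file as 'ok', 'lfs-pointer', or 'bad-header'.

    Hypothetical helper: it only looks at the header, not the tensor data.
    """
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
            return "lfs-pointer"
        f.seek(0)
        prefix = f.read(8)
        if len(prefix) < 8:
            return "bad-header"
        # Assumed layout: unsigned 64-bit little-endian JSON header length.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > size - 8:
            return "bad-header"
        try:
            json.loads(f.read(header_len))
        except ValueError:  # covers invalid JSON and invalid UTF-8
            return "bad-header"
    return "ok"
```

Running it over the local model directory before `aws s3 sync` would show whether the files being uploaded are real weights:

```python
for file in sorted(Path("Mistral-7B-Instruct-v0.2").glob("*.safetensors")):
    print(file.name, inspect_safetensors(file))
```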
Well, I managed to download the model the recommended way, with huggingface-cli:
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2
aws s3 sync /home/smana/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2 s3://<bucket>/models--mistralai--Mistral-7B-Instruct-v0.2
When the pod starts I still get permission errors :/
text-generation-launcher --model-id=mistralai/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
...
2024-04-26T15:37:48.725974Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
...
PermissionError: [Errno 1] Operation not permitted: '/data/models--mistralai--Mistral-7B-Instruct-v0.2/tmp_7e2fd113-2af9-4a1a-bf0e-22d328d4bc8b' |
It works much better with EFS storage, but I'll leave this issue open in case someone finds a solution for the S3 mountpoint. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
System Info
That's strange, because the root user seems to be allowed to do anything. I tried creating and deleting files. The only thing it can't do right now is change existing permissions. First question: does the application run as another user? It doesn't seem to. Do you see any reason for this behavior?
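The manual checks described here (creating files, deleting them, changing permissions) can be scripted so the result is reproducible from inside the pod. This is a minimal sketch using only the Python standard library; `probe_directory` is a hypothetical helper, not part of text-generation-inference or huggingface_hub, and which operations fail on a given mount depends on how it was configured.

```python
import os
import uuid
from pathlib import Path


def probe_directory(directory):
    """Try create, chmod, rename and delete in `directory`.

    Returns a dict mapping each operation to "ok" or a short error string.
    """
    directory = Path(directory)
    original = directory / f"probe_{uuid.uuid4().hex}"
    renamed = directory / f"{original.name}_renamed"
    results = {}

    def attempt(label, operation):
        # Record the outcome instead of stopping at the first failure.
        try:
            operation()
            results[label] = "ok"
        except OSError as exc:
            results[label] = f"{type(exc).__name__}: errno {exc.errno}"

    def cleanup():
        # Remove whichever name the probe file currently has.
        target = renamed if renamed.exists() else original
        target.unlink()

    attempt("create", lambda: original.write_bytes(b"probe"))
    attempt("chmod", lambda: os.chmod(original, 0o644))
    attempt("rename", lambda: os.rename(original, renamed))
    attempt("delete", cleanup)
    return results
```

Comparing the output for the S3-backed mount with a local directory narrows down which system call the launcher trips over:

```python
print(probe_directory("/data"))  # the mount used as the Hugging Face cache in the logs
print(probe_directory("/tmp"))
```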
Expected behavior
Downloading and running the application