
Can't load local flan-small models due to weight conversion failure  #589

@arnavsinghvi11

Description

System Info

OS Version:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal

8 A100 GPUs

Using the latest text-generation-inference Docker image.

I've run fine-tuning on a Flan-T5-Small model and saved the checkpoint in my local directory. I've stored this local model checkpoint in my data2 volume and run the command as follows:
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data2 ghcr.io/huggingface/text-generation-inference:0.9 --model-id /data2/checkpoint-20 --num-shard $num_shard

But I run into errors while converting the weights, as shown below.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Run the docker command above.

I get this error now:

2023-07-12T05:45:31.707548Z INFO text_generation_launcher: Args { model_id: "/data2/checkpoint-20", revision: None, sharded: None, num_shard: Some(2), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "0341f92fe465", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
2023-07-12T05:45:31.707602Z INFO text_generation_launcher: Sharding model on 2 processes
2023-07-12T05:45:31.707781Z INFO text_generation_launcher: Starting download process.
2023-07-12T05:45:33.261253Z WARN download: text_generation_launcher: No safetensors weights found for model /data2/checkpoint-20 at revision None. Converting PyTorch weights to safetensors.

2023-07-12T05:45:33.711218Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):

File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 164, in download_weights
utils.convert_files(local_pt_files, local_st_files)

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 53, in convert_files
convert_file(pt_file, sf_file)

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 21, in convert_file
if "state_dict" in loaded:

TypeError: argument of type 'Seq2SeqTrainingArguments' is not iterable

Error: DownloadError
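From the traceback, the converter appears to torch.load every .bin file in the checkpoint directory. A Trainer checkpoint also contains training_args.bin, which deserializes to a Seq2SeqTrainingArguments object rather than a state dict, so the membership test `"state_dict" in loaded` raises. A minimal stand-alone sketch of that failure mode (using a stand-in class, not the real transformers object):

```python
class TrainingArgsLike:
    """Stand-in for Seq2SeqTrainingArguments: a plain object, not a dict."""

# torch.load("training_args.bin") yields an object like this, and the
# converter's membership test then fails exactly as in the log above.
loaded = TrainingArgsLike()
try:
    "state_dict" in loaded  # what convert_file does
    raised = False
except TypeError as e:
    raised = True
    msg = str(e)

print(msg)  # argument of type 'TrainingArgsLike' is not iterable
```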

Expected behavior

I would expect the local model to load just as models downloaded from the Hugging Face Hub do. I'd appreciate any help!
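As a possible workaround (untested suggestion): either move training_args.bin out of the checkpoint directory before starting the container, or re-save the model with `model.save_pretrained(..., safe_serialization=True)` so TGI finds safetensors weights and skips the conversion entirely. On the server side, the conversion step could simply skip non-weight files; a minimal sketch of such a filter (hypothetical helper, not TGI's actual code):

```python
from pathlib import Path
import tempfile

def weight_files(ckpt_dir):
    """Return only weight .bin files, skipping training_args.bin.
    Hypothetical filter; TGI's real fix may look different."""
    return sorted(p for p in Path(ckpt_dir).glob("*.bin")
                  if p.name != "training_args.bin")

# Demo on a fake checkpoint directory:
with tempfile.TemporaryDirectory() as d:
    for name in ("pytorch_model.bin", "training_args.bin"):
        (Path(d) / name).touch()
    names = [p.name for p in weight_files(d)]

print(names)  # ['pytorch_model.bin']
```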
