Added a function to upload to Huggingface and resume from Huggingface. #348

Merged: 11 commits merged into kohya-ss:dev on Apr 5, 2023

ddPn08 (Contributor) commented Mar 30, 2023

Added a feature that automatically uploads the model's intermediate outputs, final output, and training state to a specified Hugging Face repository.

  • About --resume_from_huggingface
    When this option is enabled, training resumes from a folder in the Hugging Face repository identified by --resume.
    The format of --resume is --resume {repo_id}/{path_in_repo}:{revision}:{repo_type}.
    Example: --resume_from_huggingface --resume ddpn08/kohya-test/locons/test-locon-000002-state:main:model
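A minimal sketch of how a --resume value in this format could be split into its four parts. The names `HfResumeSpec` and `parse_hf_resume_spec` are hypothetical and are not the PR's actual code; the sketch assumes `repo_id` is always `owner/name` and that no field contains a colon.

```python
from dataclasses import dataclass


@dataclass
class HfResumeSpec:
    repo_id: str       # e.g. "ddpn08/kohya-test"
    path_in_repo: str  # folder inside the repo that holds the saved state
    revision: str      # branch, tag or commit, e.g. "main"
    repo_type: str     # "model", "dataset" or "space"


def parse_hf_resume_spec(spec: str) -> HfResumeSpec:
    """Parse '{repo_id}/{path_in_repo}:{revision}:{repo_type}' (hypothetical helper)."""
    # Exactly two ':' separators are expected; any other count raises ValueError.
    location, revision, repo_type = spec.split(":")
    # repo_id is assumed to be "owner/name"; everything after it is the path in the repo.
    owner, name, path_in_repo = location.split("/", 2)
    return HfResumeSpec(f"{owner}/{name}", path_in_repo, revision, repo_type)


if __name__ == "__main__":
    print(parse_hf_resume_spec(
        "ddpn08/kohya-test/locons/test-locon-000002-state:main:model"
    ))
```

For the example above, this yields `repo_id="ddpn08/kohya-test"`, `path_in_repo="locons/test-locon-000002-state"`, `revision="main"` and `repo_type="model"`. These fields line up with the `repo_id`, `revision` and `repo_type` parameters that `huggingface_hub` download functions such as `snapshot_download` accept, with `path_in_repo` usable as a filter (for example through `allow_patterns`).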

ddPn08 changed the title "Added a function to upload to Huggingface and resume from Huggingface. / (the same title in Japanese)" to "Added a function to upload to Huggingface and resume from Huggingface." on Mar 30, 2023
kohya-ss (Owner) commented

Thanks for the great work! It will be very useful (although I probably won't use it myself). I will review it as soon as I have time.

By the way, the diff in train_util.py is very large, probably because of auto-formatting. If you don't mind, could you set the line length to 132 characters?

ddPn08 (Contributor, Author) commented Mar 30, 2023

Oh, you're right! I had accidentally applied the formatter from my own environment.
I'll fix it!

kohya-ss (Owner) left a review comment

I would appreciate it if you could address these, if possible. If you are busy, I will make the changes myself when merging.

Resolved (outdated) review comments: library/train_util.py, train_network.py (two comments)
ddPn08 (Contributor, Author) commented Apr 1, 2023

I fixed the points you raised and addressed everything except train_network.py.

kohya-ss (Owner) commented Apr 2, 2023

Thanks for the update! It looks pretty good. I was testing it in my environment (Windows), and when the final model is uploaded, the following error occurs and the upload does not seem to complete. Perhaps the script exits while the upload is still running in another thread, causing the error, but I couldn't find a fix right away. Any ideas?

100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:01<00:00, 20.55it/s]
save trained model to D:\Work\SD\Diffusers-DB\models\LoRA\test\test_frog4.safetensors██| 30/30 [00:01<00:00, 20.71it/s]
model saved.
steps: 100%|██████████████████████████████████████████████████████████████| 19/19 [01:15<00:00,  4.00s/it, loss=0.0917]
Exception in thread Thread-2 (upload):
Traceback (most recent call last):
  File "C:\Users\hogehoge\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\hogehoge\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Work\SD\dev\sd-scripts\library\huggingface_util.py", line 49, in upload
    api.upload_file(
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\huggingface_hub\hf_api.py", line 2593, in upload_file
    commit_info = self.create_commit(
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\huggingface_hub\hf_api.py", line 2411, in create_commit
    upload_lfs_files(
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\huggingface_hub\_commit_api.py", line 351, in upload_lfs_files
    thread_map(
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\tqdm\contrib\concurrent.py", line 94, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "D:\Work\SD\dev\sd-scripts\venv\lib\site-packages\tqdm\contrib\concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "C:\Users\hogehoge\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 610, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "C:\Users\hogehoge\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 610, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "C:\Users\hogehoge\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
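The last frame points at the likely cause. In CPython 3.9 and later, `concurrent.futures` registers a hook that runs as soon as the main thread finishes, before non-daemon threads are joined; from then on, every `submit()` raises this `RuntimeError`. Per the traceback, `huggingface_hub` uploads LFS files through a thread pool (tqdm's `thread_map`), so an upload thread that reaches that point after the training script has returned fails. The sketch below reproduces this without `huggingface_hub` by running two small scripts in a child interpreter; `upload` is a stand-in for the real upload, and the 0.5 s sleep is an arbitrary delay chosen so the main thread has already finished. For contrast, the second script performs the same work synchronously.

```python
import subprocess
import sys
import textwrap

# Stand-in for a background upload: it only starts using a thread pool after
# the main thread has returned, like the final upload in the log above.
LATE_THREAD = textwrap.dedent("""
    import threading, time
    from concurrent.futures import ThreadPoolExecutor

    def upload():
        time.sleep(0.5)  # the main thread finishes during this sleep
        with ThreadPoolExecutor() as ex:
            list(ex.map(str, [1, 2, 3]))

    threading.Thread(target=upload).start()
    # The main thread ends here without joining the upload thread.
""")

# Same work called synchronously: the pool is used before the main thread ends.
SYNCHRONOUS = textwrap.dedent("""
    import time
    from concurrent.futures import ThreadPoolExecutor

    def upload():
        time.sleep(0.5)
        with ThreadPoolExecutor() as ex:
            list(ex.map(str, [1, 2, 3]))

    upload()
""")


def stderr_of(script: str) -> str:
    """Run `script` in a fresh interpreter and return what it wrote to stderr."""
    result = subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True
    )
    return result.stderr


if __name__ == "__main__":
    print("threaded:", stderr_of(LATE_THREAD))
    print("synchronous:", stderr_of(SYNCHRONOUS) or "(no errors)")
```

The threaded script prints an "Exception in thread ..." traceback ending in the same `RuntimeError` as above, while the synchronous one exits cleanly.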

kohya-ss (Owner) left a review comment

It seems to work fine if the upload is called synchronously. What do you think of this approach?

Resolved (outdated) review comment: library/huggingface_util.py
ddPn08 (Contributor, Author) commented Apr 3, 2023

I also fixed the async issue. Uploads can be switched to asynchronous processing with --async_upload. The final upload always runs synchronously, regardless of this option.
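A minimal sketch of the behavior described above, assuming a hypothetical `upload_fn` callable in place of the real `huggingface_hub` call and a hypothetical `Uploader` class (neither is the PR's code). Intermediate uploads run in background threads only when --async_upload is set; the final upload ignores the flag and runs in the caller's thread. Joining earlier background threads in `finish()` is an extra safeguard added in this sketch, not something the comment describes.

```python
import argparse
import threading
from typing import Callable, List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Opt-in flag: uploads are synchronous unless this is given.
    parser.add_argument("--async_upload", action="store_true")
    return parser.parse_args(argv)


class Uploader:
    """Hypothetical wrapper around an upload callable."""

    def __init__(self, upload_fn: Callable[[str], None], async_upload: bool) -> None:
        self.upload_fn = upload_fn
        self.async_upload = async_upload
        self._pending: List[threading.Thread] = []

    def upload(self, path: str, force_sync: bool = False) -> None:
        if self.async_upload and not force_sync:
            # Intermediate output: upload in the background and keep a handle.
            thread = threading.Thread(target=self.upload_fn, args=(path,))
            thread.start()
            self._pending.append(thread)
        else:
            # Runs in the caller's thread, so the script cannot exit mid-upload.
            self.upload_fn(path)

    def finish(self, final_path: str) -> None:
        # The final upload ignores --async_upload and always blocks.
        self.upload(final_path, force_sync=True)
        # Extra safeguard: wait for any background uploads still running.
        for thread in self._pending:
            thread.join()
        self._pending.clear()


if __name__ == "__main__":
    args = parse_args(["--async_upload"])
    uploader = Uploader(lambda p: print("uploaded", p), args.async_upload)
    uploader.upload("epoch-000001.safetensors")
    uploader.finish("final.safetensors")
```

Keeping the thread handles is what makes the join possible; a fire-and-forget `Thread(...).start()` leaves nothing to wait on before the script returns.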

kohya-ss (Owner) commented Apr 3, 2023

Thank you very much! I've finished with the other PRs, so I will review and merge this one as soon as I have time.

kohya-ss merged commit 74220bb into kohya-ss:dev on Apr 5, 2023
kohya-ss (Owner) commented Apr 5, 2023

After merging, I noticed that the final state isn't uploaded, so I added that feature. I also changed the default repository visibility to private and made some other minor fixes, including additions to XTI. Please let me know if you notice any issues. Thanks!

bmaltais mentioned this pull request on Apr 9, 2023
2 participants