
train several resolutions at same time. #130

Open
tkgix opened this issue Jan 29, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@tkgix

tkgix commented Jan 29, 2023

Thank you for your hard work.

Currently, the bucket system is coded to use only one fixed resolution.
If possible, I would like to train images of several different resolutions at the same time,
for example 512x512, 576x1024, 1024x1024, and 760x1280.

Unfortunately, even with the enable_bucket option turned off, images are still forcibly resized.

I want to give extra training to a particular area using small cropped images, but because of the forced resizing they are trained at large sizes, and as a result the overall training quality deteriorates.

I would like to be able to completely turn off both bucketing and the forced resize.
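The requested behavior can be sketched as follows. This is a hypothetical helper, not the repo's actual code: it groups images by their native (width, height) so each group could be trained at its own resolution with no resizing at all.

```python
from collections import defaultdict

def group_by_native_size(image_sizes):
    """Group images by their exact (width, height) so each group
    can be trained at its native resolution without resizing."""
    groups = defaultdict(list)
    for path, (w, h) in image_sizes.items():
        groups[(w, h)].append(path)
    return dict(groups)

sizes = {
    "a.png": (512, 512),
    "b.png": (576, 1024),
    "c.png": (512, 512),
    "d.png": (1024, 1024),
}
print(group_by_native_size(sizes))
# {(512, 512): ['a.png', 'c.png'], (576, 1024): ['b.png'], (1024, 1024): ['d.png']}
```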

@kohya-ss kohya-ss added the enhancement New feature or request label Jan 30, 2023
@kohya-ss
Owner

This is very interesting, and I don't think it is technically difficult.

One problem is that the batch size needs to be variable to make efficient use of memory. However, I believe that is possible.

I will consider how to implement this feature in the near future.
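The variable-batch-size idea can be sketched with a simple heuristic. This is hypothetical, not kohya-ss's implementation: scale the batch size inversely with pixel count so peak memory use stays roughly constant across resolutions.

```python
def batch_size_for(width, height, base_batch=4, base_res=512):
    """Scale batch size inversely with pixel count so memory use
    stays roughly constant across resolutions (hypothetical heuristic)."""
    scale = (base_res * base_res) / (width * height)
    return max(1, int(base_batch * scale))

print(batch_size_for(512, 512))    # 4
print(batch_size_for(1024, 1024))  # 1
print(batch_size_for(576, 1024))   # 1
```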

@tkgix
Author

tkgix commented Jan 30, 2023

Is it all right if I write in Japanese?

I mainly train art styles.
I resize full-body images, targeting roughly 1024 resolution, to sizes that fit the bucketing, and train on them.
Facial details (especially the eyes) often degrade in this setup,
so I also crop close-ups around the face, targeting roughly 256 or 512 resolution, and put them into the same dataset.
(I use the code with the resizing part removed.)
This method helps a great deal in improving how the eyes are learned.

Ideally, I think it would be best to be able to sort a single dataset by resolution (e.g. 512, 768, 1024) and then build it into one set of buckets.

I don't really understand the need to create batches per size, so batching is probably not something for me to discuss.

I suggested this because I have been manually patching the code every time the repo changes.
I hope it is a useful reference.
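The face-crop workflow described above (cropping close-ups around the face at roughly 256 or 512 resolution) can be sketched with the box arithmetic below; `crop_box` is a hypothetical helper, not part of sd-scripts.

```python
def crop_box(cx, cy, size, img_w, img_h):
    """Compute a square crop box of side `size` centered on (cx, cy),
    clamped so it stays inside the image bounds (hypothetical helper)."""
    half = size // 2
    left = min(max(cx - half, 0), img_w - size)
    top = min(max(cy - half, 0), img_h - size)
    return left, top, left + size, top + size

# Face detected near (400, 300) in a 1024x1536 full-body image:
print(crop_box(400, 300, 512, 1024, 1536))  # (144, 44, 656, 556)
```

The resulting box could be passed to any image library's crop call; the center coordinates would come from a face detector or manual annotation.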

@Starlento

Starlento commented Jan 30, 2023


Excuse me. I have some questions about style training, since only character-training advice seems to be available online.
Is it usually positive to use a high resolution? You mention using 1024x1024 in some cases, but the "mainstream" seems to be 768x768.
About tags: should they be pruned, as in character training? For example, if the character is wearing "serafuku", should I remove the skirt and shirt tags?
And what tag sensitivity is suggested (i.e. the threshold for the WD1.4 tagger)?
You can answer in Japanese, I think I can understand.
(The reason I do not ask in Japanese is that my 敬語 (polite Japanese) is really poor.)

@tkgix
Author

tkgix commented Jan 30, 2023


Training at 1024 takes a very, very long time.
In addition, the trained model only outputs well at large resolutions; it does not produce low-resolution images well.
Also, the source images in the dataset must be larger than 1024 resolution, and smaller images are difficult to use. If necessary, they need to be upscaled with a separate tool (e.g. waifu2x).
The output shows good results at large resolutions.
In particular, detail rendering becomes stronger.
I think this is a matter of choice, because the advantages and disadvantages are both clear.

I think it is enough to remove only unique tags (category 4 in the WD Tagger output), such as character names.

I use 0.3 as the threshold for the WD Tagger.
0.25 is also fine, but it sometimes gives wrong tags.

I hope this helps you.
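The tag-pruning and threshold advice above can be sketched like this. The helper is hypothetical (the WD Tagger itself returns per-tag confidence scores; this only filters them):

```python
def filter_tags(tag_scores, threshold=0.3, drop_tags=()):
    """Keep tags whose confidence meets the threshold, sorted by
    confidence; drop unique tags (e.g. character names) by name."""
    return [t for t, s in sorted(tag_scores.items(), key=lambda kv: -kv[1])
            if s >= threshold and t not in drop_tags]

scores = {"1girl": 0.98, "serafuku": 0.62, "skirt": 0.28, "some_character": 0.90}
print(filter_tags(scores, threshold=0.3, drop_tags={"some_character"}))
# ['1girl', 'serafuku']
```

Here "some_character" stands in for a category-4 character tag, and "skirt" falls below the 0.3 threshold.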

@Starlento

Starlento commented Jan 30, 2023


Thank you very much, your suggestion is really detailed. One thing I want to clarify about "large resolution" in practice.
In most cases, people tend to use highres.fix (e.g. 512x640 -> 1024x1280). Does that match "shows good results with large resolution", or do you mean generating at 1024x1280 directly? I am OK with either, but my concern is that the reason to use highres.fix is that models cannot output satisfying results directly at high resolution (multiple girls will appear). Can that be changed by applying a certain LoRA model?
In other words, is there some mismatch between "models are for lowres" and "your LoRA is for highres"?

@tkgix
Author

tkgix commented Jan 30, 2023

It's output directly at high resolution. I used highres.fix before, but I didn't get satisfactory results from it (a bit blurry), so I'm using high-resolution training and output.

@mimimi999

I would like to confirm: if I select 512 as the training resolution and the training set contains images larger than that, will the training quality be affected?

Also, I saw on a bulletin board somewhere that you can train without problems even if all the image sizes are different. Is this correct?

@tkgix
Author

tkgix commented Jan 31, 2023


If the sizes are different, training proceeds without problems, but quality is affected.

If an image is larger than the training resolution, there is a slight loss of quality, because it is resized with cv2.INTER_AREA during training. It is recommended to reduce it in advance with another graphics tool (e.g. Photoshop), or alternatively to modify the code to use a better downscaler (e.g. cv2.INTER_CUBIC).

If an image is smaller than the training resolution, there is a significant loss of quality. In this case, it is recommended to remove the image, or to enlarge it with an upscaler and then downscale it again.

Unless you're very sensitive about the quality of your dataset, you don't have to go through this process.
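The interpolation choice described above can be sketched as below. The helper is hypothetical and returns flag names as strings just to keep the sketch dependency-free; in real code you would pass the corresponding `cv2` constant (`cv2.INTER_AREA` or `cv2.INTER_CUBIC`) to `cv2.resize`.

```python
def pick_interpolation(src, dst):
    """Choose an OpenCV interpolation flag name: INTER_AREA for
    downscaling, INTER_CUBIC for upscaling (or same-size resize)."""
    sw, sh = src
    dw, dh = dst
    return "INTER_AREA" if dw * dh < sw * sh else "INTER_CUBIC"

print(pick_interpolation((1024, 1024), (512, 512)))  # INTER_AREA
print(pick_interpolation((300, 300), (512, 512)))    # INTER_CUBIC
```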

@mimimi999

Thank you for your comment.

One more thing I would like to confirm: if a model trained with 512x512 images is additionally trained in part at a higher training resolution (768, 1024), is there an effect of increasing the resolution of the generated images?

@tkgix
Author

tkgix commented Jan 31, 2023


I don't think it improves the resolution of the entire model.

bmaltais added a commit to bmaltais/kohya_ss that referenced this issue Feb 6, 2023
    - ``--bucket_reso_steps`` and ``--bucket_no_upscale`` options are added to training scripts (fine tuning, DreamBooth, LoRA and Textual Inversion) and ``prepare_buckets_latents.py``.
    - ``--bucket_reso_steps`` takes the steps for buckets in aspect ratio bucketing. Default is 64, same as before.
        - Any value greater than or equal to 1 can be specified; 64 is highly recommended, and a value divisible by 8 is recommended.
        - If a value less than 64 is specified, padding will occur inside U-Net. The result is unknown.
        - If you specify a value that is not divisible by 8, it will be truncated to a multiple of 8 inside the VAE, because the latent is 1/8 of the image size.
    - If ``--bucket_no_upscale`` option is specified, images smaller than the bucket size will be processed without upscaling.
        - Internally, a bucket smaller than the image size is created (for example, if the image is 300x300 and ``bucket_reso_steps=64``, the bucket is 256x256). The image will be trimmed.
        - Implementation of [#130](kohya-ss/sd-scripts#130).
        - Images with an area larger than the maximum size specified by ``--resolution`` are downsampled to the max bucket size.
    - The number of items in each batch is now limited to the number of actual (non-duplicated) images. A bucket may contain fewer actual images than the batch size, so a batch could otherwise contain the same (duplicated) image more than once.
    - ``--random_crop`` now also works with buckets enabled.
        - Instead of always cropping the center of the image, the image is shifted left, right, up, and down to be used as the training data. This is expected to train to the edges of the image.
        - Implementation of discussion [#34](kohya-ss/sd-scripts#34).
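The ``--bucket_no_upscale`` behavior described in the notes above can be sketched as follows. This is a simplified illustration (the real implementation also considers aspect ratio and the maximum area from ``--resolution``): each side is rounded down to a multiple of ``bucket_reso_steps`` so a smaller image is trimmed rather than upscaled.

```python
def bucket_for(width, height, steps=64):
    """Round each side down to a multiple of `steps`, matching the
    release-note example: a 300x300 image with steps=64 -> 256x256
    (simplified sketch of --bucket_no_upscale)."""
    bw = (width // steps) * steps
    bh = (height // steps) * steps
    # Never go below one step per side.
    return max(bw, steps), max(bh, steps)

print(bucket_for(300, 300))  # (256, 256)
print(bucket_for(576, 1024))  # (576, 1024) -- already multiples of 64
```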
@pastuh

pastuh commented May 22, 2023

It would be nice to train a high-resolution image in tiles, except that it should understand which object to render:
1. train the whole image
2. train specified parts/tiles (maybe by coordinates, or based on prompts)
