Add keep_tokens_separator as alternative for keep_tokens #975

Merged: 3 commits, Dec 12, 2023

Linaqruf
Contributor

Hi, great job as always.

I'd like to propose this feature, which is inspired by NovelAI tagging. They train their model by putting some important tags at the head of the caption and shuffling the rest.

Got this from their docs:

1boy, 1girl, characters, series, everything else in any order

And this is also confirmed by finetunej.

We also know that some Danbooru images have more than one tag in tag_character_string and tag_copyright_string, and that some have both 1boy and 1girl in one picture. The number of leading tags therefore varies from image to image, so a single fixed keep_tokens count is not enough to 'mimic' NovelAI tagging.

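keep_tokens_separator is proposed so that each caption can mark for itself where its kept tokens end: the tokens before the separator are kept from being shuffled, however many there are in that caption.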

For example:

keep_tokens_separator = "|||"
  • caption 1
1girl, frieren, sousou no frieren, cyan yu, ||| rating: general, black footwear, black pantyhose, blue flower, boots, capelet, closed mouth, earrings, elf, flower, green eyes, grey hair, hugging own legs, jewelry, long hair, long sleeves, looking at viewer, pantyhose, pointy ears, sidelocks, simple background, sitting, solo, thick eyebrows, twintails, white background, white capelet, absurdres, highres, medium quality
  • caption 2
1boy, 1girl, linie (sousou no frieren), lugner (sousou no frieren), sousou no frieren, fujimoto kouki, ||| rating: general, arms at sides, arms behind back, black coat, blonde hair, boots, brown dress, brown ribbon, closed mouth, coat, demon boy, demon girl, demon horns, dress, facing away, flower, from behind, hair ribbon, horns, long hair, long sleeves, orange hair, outdoors, own hands together, pink flower, puffy sleeves, ribbon, sky, standing, tree, twintails, white footwear, wide sleeves, commentary request, highres, official art, promotional art, normal quality
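
To make the intended behaviour concrete, here is a minimal Python sketch of the idea. It is not the code from this PR (which was also modified after merging); the function name `shuffle_with_separator` and its parameters are hypothetical, and dropping the separator from the output is an assumption made for the illustration.

```python
import random


def shuffle_with_separator(caption, keep_tokens_separator="|||",
                           caption_separator=",", rng=None):
    """Keep tokens before the separator in order and shuffle the rest.

    Hypothetical helper for illustration only, not the sd-scripts code.
    """
    rng = rng or random  # the random module or a random.Random instance

    if keep_tokens_separator in caption:
        # Split once: the left side is fixed, the right side is shuffled.
        fixed_part, flex_part = caption.split(keep_tokens_separator, 1)
    else:
        # No separator in this caption: nothing is pinned.
        fixed_part, flex_part = "", caption

    def tokenize(text):
        # Split on the tag delimiter and drop empty entries, such as the
        # one left by a trailing comma before the separator.
        return [t.strip() for t in text.split(caption_separator) if t.strip()]

    fixed_tokens = tokenize(fixed_part)
    flex_tokens = tokenize(flex_part)
    rng.shuffle(flex_tokens)

    # Assumption: the separator itself is not kept in the final caption.
    return ", ".join(fixed_tokens + flex_tokens)


if __name__ == "__main__":
    caption = "1girl, frieren, sousou no frieren, cyan yu, ||| rating: general, elf, boots, solo"
    print(shuffle_with_separator(caption))
```

With the two captions above, the first keeps its four leading tags in place and the second keeps its six, which a single keep_tokens count cannot express for both.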

I haven't tested this for fine-tuning, but I have trained some LoRAs with this separator:

link to model | link to datasets (5.65gb)

Thank you!

@Linaqruf
Contributor Author

Linaqruf commented Dec 1, 2023

Btw, I forgot to thank @KohakuBlueleaf for the idea. Without his suggestion of a shuffle separator, I probably would have added a new key for keep_tokens in the JSON file instead. ✌️

@kohya-ss kohya-ss merged commit 034a49c into kohya-ss:dev Dec 12, 2023
1 check passed
@kohya-ss
Owner

Thank you for this! I noticed a problem after merging and modified it.
