
Feature request #57

Open
justinwking opened this issue Apr 24, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@justinwking

Thank you for making this. It seems to work, and I have a model.

I wanted to ask:

  1. Is there a link to a repository we can use to generate videos with our new diffusion models, or a small example of how to do it with Python?
  2. Is there a way to specify the frame rate of the sample videos? Everything seems to sample at 6-8 fps, so the default 24 fps videos play too fast to really see what the sample video looks like.
  3. If we use a JSON file, do we also need to specify the video folder, or do the JSON's hyperlinks take care of that?

Thank you!

@kabachuha
Contributor

Hi! As for the first point, there's a webui plugin for Auto1111, https://github.com/deforum-art/sd-webui-text2video, with a GUI where you can specify anything for your generation. To convert your fine-tuned models for use in that GUI, use the script in this repo: https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/utils/convert_diffusers_to_original_ms_text_to_video.py
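
If you'd rather generate samples straight from Python, here is a minimal sketch (not from this repo) using the Hugging Face diffusers library, assuming your training output is a standard diffusers pipeline folder; the model path, prompt, and fps values are placeholders, and writing the file with imageio lets you pick the frame rate yourself:

    import torch
    import imageio  # needs imageio-ffmpeg installed for .mp4 output
    from diffusers import DiffusionPipeline

    # Hypothetical path to the fine-tuned diffusers folder produced by training.
    pipe = DiffusionPipeline.from_pretrained(
        "./outputs/train_2023-.../checkpoint-5000",
        torch_dtype=torch.float16,
    )
    pipe.enable_model_cpu_offload()  # offload submodules to fit on smaller GPUs

    result = pipe("a corgi running on the beach", num_inference_steps=25, num_frames=16)
    frames = result.frames  # uint8 HxWx3 frames (exact layout may differ across diffusers versions)

    # Choose the playback frame rate explicitly, e.g. 8 fps so the motion is easy to inspect.
    imageio.mimsave("sample.mp4", frames, fps=8)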

@justinwking
Author

Thank you, kabachuha, for convert_diffusers_to_original_ms_text_to_video.py. What arguments do I need to pass in? Should I put the root folder of the model, or link directly to the .bin files for the UNet and text encoder? And do I need to specify an output folder? Thank you!

@kabachuha
Contributor

kabachuha commented Apr 24, 2023

python convert_diffusers_to_original_ms_text_to_video.py --model_path path-to-your-diffusers-model-folder --checkpoint_path text2video_pytorch_model.pth --clip_checkpoint_path clip.ckpt

Don't use the resulting clip.ckpt; it isn't converted well at the moment, and I need to remove it from the requirements.
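
As a quick, hypothetical sanity check (not part of the script): --checkpoint_path is where the converted file should be written, so you can load it back to confirm the conversion produced a UNet in the original ModelScope layout:

    import torch

    # Load the converted checkpoint back to confirm it exists and holds UNet weights.
    ckpt = torch.load("text2video_pytorch_model.pth", map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    print(len(state_dict), "tensors")
    print(list(state_dict)[:5])  # a few parameter names from the converted UNet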

@justinwking
Author

So should I put in the clip checkpoint path and just not use the clip file that is created, or should I leave the clip checkpoint path blank?

@kabachuha
Contributor

@justinwking use this branch for now, until it's merged: https://github.com/kabachuha/Text-To-Video-Finetuning/tree/patch-1

@justinwking
Author

justinwking commented Apr 24, 2023

Sorry to ask such basic questions, but I couldn't find the files you suggested I include, so I am guessing they have a different name. At the bottom of this post I have written out my interpretation of what I think you meant; please correct me if I am mistaken. If this is my folder structure:

Text to video Fine Tuning

- Models
    - Model_scope_diffusers
        - Scheduler
        - Text_encoder
        - Tokenizer
        - Unet
        - Vae
- Outputs
    - Train 2023….
        - Cached Latents
        - Checkpoint 2500
        - Checkpoint 5000
            - Scheduler
            - Text_encoder
            - Tokenizer
            - Unet
            - Vae
        - Lora
        - Samples

Does the following command look correct if I do everything from the Text-To-Video-Finetuning folder?

python ./utils/convert_diffusers_to_original_ms_text_to_video.py --model_path models/model_scope_diffusers/ --checkpoint_path outputs/Train2023…/Lora/5000_unet.pt --clip_checkpoint_path outputs/Train2023…/Lora/5000_text_encoder.pt

@kabachuha
Contributor

Use this folder as the --model_path: "./Outputs/Train 2023…./Checkpoint 5000"

@justinwking
Author

Good morning. I believe I was able to get the script to work with your instructions, but I didn't see a new folder created. What do I need to do to get this into a format and location that t2v can use? All the file names are different, and the folder structure is different. Is this something the script could do?

@justinwking
Author

justinwking commented Apr 25, 2023

I haven't been able to find a readme that explains the process; maybe there is one that I overlooked.

The following was generated when I did the training:

Configuration saved in ./outputs\train_2023-04-24T00-05-34\vae\config.json
Model weights saved in ./outputs\train_2023-04-24T00-05-34\vae\diffusion_pytorch_model.bin
Configuration saved in ./outputs\train_2023-04-24T00-05-34\unet\config.json
Model weights saved in ./outputs\train_2023-04-24T00-05-34\unet\diffusion_pytorch_model.bin
Configuration saved in ./outputs\train_2023-04-24T00-05-34\scheduler\scheduler_config.json
Configuration saved in ./outputs\train_2023-04-24T00-05-34\model_index.json
04/24/2023 06:13:39 - INFO - main - Saved model at ./outputs\train_2023-04-24T00-05-34 on step 10000

Then I put in the command:

(text2video-finetune) python ./utils/convert_diffusers_to_original_ms_text_to_video.py --model_path "./Outputs/train_2023-04-24T00-05-34/Checkpoint-10000" --checkpoint_path "./Outputs/train_2023-04-24T00-05-34/Lora/10000_unet.pt" --clip_checkpoint_path "./Outputs/train_2023-04-24T00-05-34/Lora/10000_text_encoder.pt"

and the process worked, but I don't know where the new UNET is...

Saving UNET
Operation successfull

But now I don't see anything that looks like the ModelScope folder that I am currently using in Automatic1111:

configuration.json
open_clip_pytorch_model.bin
README.md
text2video_pytorch_model.pth
VQGAN_autoencoder.pth
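
For what it's worth, a hypothetical sketch of how the converted file could be laid out for the webui; the destination folder and the idea of reusing the stock VQGAN/CLIP/configuration files are assumptions about sd-webui-text2video, not something confirmed in this thread:

    import shutil
    from pathlib import Path

    # Assumed location where the sd-webui-text2video extension looks for ModelScope
    # models; adjust to your webui install. Only the UNet was fine-tuned, so the
    # remaining files can be reused from the original ModelScope download.
    dst = Path("stable-diffusion-webui/models/ModelScope/t2v")
    dst.mkdir(parents=True, exist_ok=True)

    shutil.copy("text2video_pytorch_model.pth", dst / "text2video_pytorch_model.pth")
    original = Path("path/to/original/modelscope/download")  # hypothetical path
    for name in ("VQGAN_autoencoder.pth", "open_clip_pytorch_model.bin", "configuration.json"):
        shutil.copy(original / name, dst / name)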

@ExponentialML added the enhancement (New feature or request) label on Jun 25, 2023