
[Bug]: On generation, It samples the tensors twice and vid2vid doesn't work. #62

Closed · 2 tasks done
Grendar1 opened this issue Mar 24, 2023 · 6 comments
Labels: bug (Something isn't working)

Comments

Grendar1 commented Mar 24, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the ModelScope text2video extension updated to the latest version and I still have the issue.

What happened?

Every time I click "Generate", it samples the tensors twice instead of once, so I have to wait double the time for a single video. I understand vid2vid was added, but even so, when I tried a vid2vid generation, it only outputs from the text2video tab, despite sampling the tensors twice.

Steps to reproduce the problem

For text2video:

  1. Go to ModelScope text2video.
  2. Add a prompt, for example "sunrise from tokyo, by makoto shinkai".
  3. Click the yellow "Generate" button.
  4. Wait twice as long as expected.

For vid2vid:

  5. Add a video that I generated earlier with text2video.
  6. Add a prompt, for example "a boy with sunglasses".
  7. Click "Generate" and again wait twice as long, because it samples the tensors twice.
  8. Find that vid2vid doesn't generate from the vid2vid tab but from text2video (which I left blank, so it outputs something unrelated, like a tortoise underwater).

What should have happened?

It should sample the tensors only once when using only text2video.
It should sample the tensors twice if I add prompts for both text2video and vid2vid.
It should sample the tensors once if only vid2vid is selected.

WebUI and Deforum extension Commit IDs

webui commit id - commit: a9fed7c3
txt2vid commit id - https://github.com/deforum-art/sd-webui-modelscope-text2video.git | 8402005 (Fri Mar 24 14:49:52 2023)
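
For reference, these IDs can be read with git from each repository; a minimal example, assuming the install path shown in the console logs below and the default extensions folder layout:

  cd "C:\Stable Diffusion\stable-diffusion-webui"
  git rev-parse --short HEAD
  cd extensions\sd-webui-modelscope-text2video
  git log -1 --format="%h (%cd)"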

What GPU were you using for launching?

RTX 3060 12GB VRAM, 16 GB RAM.

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

--xformers --no-half-vae --api
I didn't change anything; I just added the prompts. Everything is at default, with fp16 enabled for the GPU.
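
On a Windows install, these launch flags are usually passed through COMMANDLINE_ARGS in webui-user.bat; a minimal sketch of that launcher file, assuming the standard layout:

  @echo off
  rem webui-user.bat - launch flags used for this report
  set COMMANDLINE_ARGS=--xformers --no-half-vae --api
  call webui.bat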

Console logs

ModelScope text2video extension for auto1111 webui
Git commit: 84020058 (Fri Mar 24 14:49:52 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0012, device='cuda:0') tensor(1.0001, device='cuda:0')
DDIM sampling tensor(1): 100%|███████████████████████████████████████| 31/31 [00:41<00:00,  1.33s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230324_215112403414.mp4
  0%|                                                                          | 0/1 [00:00<?, ?it/s]
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0007, device='cuda:0') tensor(1.0037, device='cuda:0')
DDIM sampling tensor(1): 100%|███████████████████████████████████████| 31/31 [00:41<00:00,  1.34s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230324_215201616361.mp4
text2video finished, saving frames to C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000\%06d.png
To Video:
C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.26 seconds!
t2v complete, result saved at C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000

Additional information

No response

@Grendar1 Grendar1 added the bug Something isn't working label Mar 24, 2023
@Grendar1 Grendar1 changed the title [Bug]: [Bug]: On generation, It samples the tensors twice and vid2vid doesn't work. Mar 24, 2023

toyxyz commented Mar 24, 2023

Same here

kabachuha (Owner) commented Mar 24, 2023

Have you played around with Denoising strength?

If it's at 1, it means full change


jav12z commented Mar 24, 2023

> Have you played around with Denoising strength?
> If it's at 1, it means full change

It doesn't work; even at Denoising strength 0 it gives a video unrelated to the input video.

Compviztr commented

Same. Vid2vid uses the txt2vid input and doesn’t appear to use the uploaded video.

hithereai (Contributor) commented

I can confirm that vid2vid doesn't work guys. Please revert to an older version or wait for our fix, but it might take 24 hours or more since it's the weekend and we need some time off <3

Watch out for updates anyways. And thanks for providing feedback!


toyxyz commented Mar 25, 2023

When I rolled back to the commit from Mar 23, 2023 (1b0385a), vid2vid works. It seems an update after that commit is the cause of the problem.
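
For anyone applying the same workaround until a fix lands, the extension can be pinned to that commit with git. A minimal sketch, assuming the webui install path from the logs above and the default extensions folder; adjust the path to your setup:

  cd "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video"
  git checkout 1b0385a

Once a fix is released, check out the extension's default branch again (or update it from the webui's Extensions tab) to return to the latest version.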
