
[Bug]: On generation, It samples the tensors twice and vid2vid doesn't work. #62

Closed · 2 tasks done
Grendar1 opened this issue Mar 24, 2023 · 6 comments
Labels: bug (Something isn't working)

Comments

Grendar1 commented Mar 24, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the ModelScope text2video extension updated to the latest version and I still have the issue.

What happened?

Every time I click "Generate", it samples the tensors twice instead of once, so I have to wait double the time for a single video. I understand vid2vid was added, but even so, when I tried a vid2vid generation, it only outputs from the text2video tab, despite sampling the tensors twice.

Steps to reproduce the problem

For text2video:

  1. Go to ModelScope text2video.
  2. Add a prompt, for example "sunrise from tokyo, by makoto shinkai".
  3. Click the yellow "Generate" button.
  4. Wait twice as long as expected.

For vid2vid:

  5. Add a video that I generated earlier with text2video.
  6. Add a prompt, for example "a boy with sunglasses".
  7. Click "Generate" and again wait twice as long, because it samples the tensors twice.
  8. Find that vid2vid doesn't generate from the vid2vid tab but from text2video (which I left blank, so it outputs something unrelated, like a tortoise underwater).

What should have happened?

It should sample the tensors only once when using only text2video.
It should sample the tensors twice if I add prompts for both text2video and vid2vid.
It should sample the tensors once if only vid2vid is selected.

WebUI and Deforum extension Commit IDs

webui commit id - commit: a9fed7c3
txt2vid commit id - https://github.com/deforum-art/sd-webui-modelscope-text2video.git | 8402005 (Fri Mar 24 14:49:52 2023)
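
For reference, these IDs can be read with git from each repository; a minimal example, assuming the install path shown in the console logs below and the default extensions folder layout:

  cd "C:\Stable Diffusion\stable-diffusion-webui"
  git rev-parse --short HEAD
  cd extensions\sd-webui-modelscope-text2video
  git log -1 --format="%h (%cd)"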

What GPU were you using for launching?

RTX 3060 12GB VRAM, 16 GB RAM.

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

--xformers --no-half-vae --api
I didn't change anything; I just added the prompts. Everything is at default, with fp16 enabled for the GPU.
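
On a Windows install, these launch flags are usually passed through COMMANDLINE_ARGS in webui-user.bat; a minimal sketch of that launcher file, assuming the standard layout:

  @echo off
  rem webui-user.bat - launch flags used for this report
  set COMMANDLINE_ARGS=--xformers --no-half-vae --api
  call webui.bat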

Console logs

ModelScope text2video extension for auto1111 webui
Git commit: 84020058 (Fri Mar 24 14:49:52 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0012, device='cuda:0') tensor(1.0001, device='cuda:0')
DDIM sampling tensor(1): 100%|███████████████████████████████████████| 31/31 [00:41<00:00,  1.33s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230324_215112403414.mp4
  0%|                                                                          | 0/1 [00:00<?, ?it/s]
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0007, device='cuda:0') tensor(1.0037, device='cuda:0')
DDIM sampling tensor(1): 100%|███████████████████████████████████████| 31/31 [00:41<00:00,  1.34s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230324_215201616361.mp4
text2video finished, saving frames to C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000\%06d.png
To Video:
C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.26 seconds!
t2v complete, result saved at C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000

Additional information

No response

@Grendar1 Grendar1 added the bug Something isn't working label Mar 24, 2023
@Grendar1 Grendar1 changed the title [Bug]: [Bug]: On generation, It samples the tensors twice and vid2vid doesn't work. Mar 24, 2023

toyxyz commented Mar 24, 2023

Same here

kabachuha (Owner) commented Mar 24, 2023

Have you played around with Denoising strength?

If it's at 1, it means full change


jav12z commented Mar 24, 2023

> Have you played around with Denoising strength?
> If it's at 1, it means full change

It doesn't work; even at Denoising strength 0 it gives a video unrelated to the input video.

Compviztr commented

Same. Vid2vid uses the txt2vid input and doesn’t appear to use the uploaded video.

hithereai (Contributor) commented

I can confirm that vid2vid doesn't work guys. Please revert to an older version or wait for our fix, but it might take 24 hours or more since it's the weekend and we need some time off <3

Watch out for updates anyways. And thanks for providing feedback!


toyxyz commented Mar 25, 2023

When I rolled back to the commit from Mar 23, 2023 (1b0385a), vid2vid works. It seems an update after that commit is the cause of the problem.
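
For anyone applying the same workaround until a fix lands, the extension can be pinned to that commit with git. A minimal sketch, assuming the webui install path from the logs above and the default extensions folder; adjust the path to your setup:

  cd "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video"
  git checkout 1b0385a

Once a fix is released, check out the extension's default branch again (or update it from the webui's Extensions tab) to return to the latest version.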
