
Idea, if we're being extra arty about videos. #48

Open
HughPH opened this issue Sep 8, 2021 · 7 comments

Comments

@HughPH

HughPH commented Sep 8, 2021

Another change I've made for myself is to break every n iterations (after checkin) and await user input. If I input Y it reloads the image from disk and reinitialises the optimiser (the same as you do for a zoom video). This way I can "guide" it quite forcefully: if I want a skull with glowing blue eyes, and the blue eyes are not picked up from the init image (or have dissolved into nothing) by the 50th step, I can paint them in. I can also "promote" features in the output by exaggerating their presence.

[image attached]

Since we're reinitialising the optimiser, we can presumably also switch up the prompts 'in the middle' of the run, when loss has 'stabilised'? Depending on how far you want to take this (and I'll be doing my own experimentation) maybe we can draw up a timeline and construct a video based on prompts that change over time.
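
Something like a prompt schedule might be enough. A rough sketch only, assuming the pMs prompt list, Prompt class, perceptor, clip and device from generate.py; prompt_schedule and maybe_switch_prompts are names made up here for illustration:

prompt_schedule = {
    0:   "a human skull with roughly drawn blue eyes",
    200: "a shiny metal robot face with glowing blue eyes",
}

def maybe_switch_prompts(i):
    # Sketch: when the iteration hits an entry in the (hypothetical) schedule,
    # re-encode the new text prompt and replace the existing prompt embeddings.
    if i in prompt_schedule:
        pMs.clear()
        tokens = clip.tokenize(prompt_schedule[i]).to(device)
        embed = perceptor.encode_text(tokens).float()
        pMs.append(Prompt(embed).to(device))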

@nerdyrodent
Owner

Sounds fun. A bit like story mode, but more interactive.

@lucasantana

lucasantana commented Oct 2, 2021

What inputs did you use for this outcome? Looks cool!!

@HughPH
Author

HughPH commented Oct 2, 2021

> What inputs did you use for this outcome? Looks cool!!

Thanks, this was "manually guided". I can't remember exactly what the prompt was, something along the lines of "a shiny metal robot face with glowing blue eyes", but I started with an initial image of a human skull with roughly drawn blue eyeballs (just two circles with a black blob in the middle for a pupil and a couple of highlights for reflections). Then on each call to checkin, I break and await user input. At that point I can check if the image is going how I want, and if it's not I can load it in Krita or Pinta or something and roughly "repair" any features that are not going quite as I like. Just a thick brush with a solid colour is usually sufficient, but I might also select an area and copy it, or stretch or rotate a section, or use Krita's Heal tool to erase a feature. It doesn't need any artistic skill.

@matteofedericopazienza

That's amazing! How did you do that? Could you share the code? Thanks!

@giantmonster

This looks very much like something I would use! It's a great idea either way!

@HughPH
Author

HughPH commented Oct 24, 2021

(Almost) all the code you need to do this is already in generate.py.

The first thing I did was add another command line argument:

vq_parser.add_argument("-jr",   "--justrun", action="store_true", help="Just run, no breaks", dest="just_run")

Next, I modified the main loop, so that if the just_run argument has not been passed and the number of iterations is a multiple of the display_freq argument, the code waits for input. During this wait, you can modify the image which was dumped when checkin() was called from train(). Then if you enter "Y", the image is reloaded and the flag to reset the optimizer is set to True. See the if statement further down for the make_zoom_video argument for the same image-reloading code with comments.

try:
    resetOptimizer = False
    with tqdm() as pbar:
        while True:

            train(i)

            # Unless -jr / --justrun was passed, pause every display_freq iterations so the
            # image written by checkin() can be edited on disk before training continues.
            if not args.just_run and i % args.display_freq == 0:
                print(f"Modify output{i}.png and press Y then Enter, or just Enter if no change was made")
                y = input()
                if y == 'Y':
                    # Reload the (possibly hand-edited) image, re-encode it into the latent z
                    # and set the flag to reset the optimizer, mirroring the make_zoom_video branch.
                    img = Image.open(f"output{i}.png")
                    pil_image = img.convert('RGB')
                    pil_image = pil_image.resize((sideX, sideY), Image.LANCZOS)
                    pil_tensor = TF.to_tensor(pil_image)
                    z, *_ = model.encode(pil_tensor.to(device).unsqueeze(0) * 2 - 1)
                    z_orig = z.clone()
                    z.requires_grad_(True)
                    resetOptimizer = True

            # ... the rest of the loop (i += 1, pbar.update(), max_iterations check)
            # continues as in the original generate.py.
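
The snippet stops before the flag is actually consumed. A minimal sketch of the reset, assuming the get_opt() optimiser factory that generate.py already uses for the zoom-video reload (names and placement may differ in your copy):

            # Sketch, not part of the original comment: rebuild the optimiser around the
            # freshly re-encoded z at the top of the next pass, then clear the flag.
            if resetOptimizer:
                opt = get_opt(args.optimiser, args.step_size)
                resetOptimizer = False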

If you want to run without waiting for input, you can pass -jr on the command line, and original behaviour is restored.

@HughPH
Author

HughPH commented Nov 1, 2021

Just a quick note: if you're using the -o command line option, f"output{i}.png" won't work; you need to replace it with args.output + str(i) + ".png".
I've just been hit by my own bug :)
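
In code form (checkin_file is just a local name introduced here for clarity), the affected lines become something like:

checkin_file = args.output + str(i) + ".png"   # respects -o / --output instead of the hard-coded prefix
print(f"Modify {checkin_file} and press Y then Enter, or just Enter if no change was made")
img = Image.open(checkin_file)                 # inside the if y == 'Y' branch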
