
Is this a bug? (Edit: Replace start_image with NotNANtoN's img clip embed?) #49

Closed · afiaka87 opened this issue Feb 16, 2021 · 9 comments

@afiaka87

> elif text is not None:

Saw this new functionality added. Super useful. Just making sure this function works correctly: it looks like it's called during init, but because it returns inside the nested ifs, it only ever runs the img_embed code if you didn't specify a clip_encode (I think).
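To make the pattern concrete, here's a minimal sketch (the function and encoder names are made up for illustration, not the repo's actual identifiers):

```python
# Illustrative sketch of the early-return pattern in question; the names
# (create_encoding, encode_text, encode_image) are stand-ins.

def encode_text(text):
    return f"text-embed({text})"   # placeholder for CLIP's text encoder

def encode_image(img):
    return f"img-embed({img})"     # placeholder for CLIP's image encoder

def create_encoding(text=None, img=None, encoding=None):
    if encoding is not None:
        return encoding            # a custom embedding short-circuits everything
    elif text is not None:
        return encode_text(text)   # returns here, so the img branch below never runs
    elif img is not None:
        return encode_image(img)

# With both text and img supplied, only the text path executes:
print(create_encoding(text="a sunrise", img="photo.png"))  # text-embed(a sunrise)
```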

@NotNANtoN
Contributor

Hey! I thought of that. My idea was to have either an image, a text, or a custom embedding as an optimization target. So if an encoding is passed, it should be used directly; merging the encodings from text and images can be done outside that class. I think returning inside the nested ifs isn't clean code, and that might be what causes the issue.

I was planning to submit a PR soon anyway, in which train_step also returns the latest generated image instead of saving it to disk, for a project of mine. I can clean this up there.

afiaka87 changed the title from "Is this a bug?" to "Is this a bug? (Edit: Replace start_image with NotNANtoN's img clip embed?)" on Feb 20, 2021
@afiaka87
Author

afiaka87 commented Feb 20, 2021

Hm. So we actually already had a "start_image" parameter that trained Siren directly on the image itself for a few hundred iterations. I've not had much success with that technique, though. It successfully "neuralizes" the image, but once it starts training on the phrase's CLIP embed, it just sort of swirls around in the existing colors of the image, slowly blackening more and more of them, as I mention here.

Training on the cosine similarity of an image CLIP embed (as well as your text), on the other hand, does seem to pick up some of the composition of the original image in a way that doesn't break the training for the text embed.
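Roughly what I mean, as a toy sketch (all names and the embedding size are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Sketch of a combined objective: push the CLIP embedding of the generated
# image toward both the text embed and the start-image embed.
def combined_loss(gen_embed, text_embed, img_embed):
    text_sim = F.cosine_similarity(gen_embed, text_embed, dim=-1)
    img_sim = F.cosine_similarity(gen_embed, img_embed, dim=-1)
    return -(text_sim + img_sim).mean()   # minimize the negative, i.e. maximize both

gen, txt, img = (torch.randn(1, 512) for _ in range(3))
print(combined_loss(gen, txt, img))
```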

I guess what I'm saying is I'd definitely like to be able to do both in one go, without having to load up CLIP and encode/combine various things myself before passing them in (even though I technically know how). While I appreciate the power you get with that approach, and I agree it should remain in the code, I don't think beginners will be super excited about having to figure out how CLIP works just to generate some visuals.

@NotNANtoN
Contributor

A good solution could be to merge the text and image embeddings when both are passed in. By default it could be the average of the two, and one could add a text_weight parameter that controls how much the text embedding influences the final embedding.
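Something like this, as a rough sketch (merge_embeddings and text_weight are hypothetical names, not anything in the repo yet; I'm assuming CLIP-style embeddings as torch tensors):

```python
import torch
import torch.nn.functional as F

# Sketch of the proposed merge: a weighted average of the two embeddings,
# re-normalized so cosine-similarity targets stay well-scaled.
def merge_embeddings(text_embed, img_embed, text_weight=0.5):
    merged = text_weight * text_embed + (1.0 - text_weight) * img_embed
    return F.normalize(merged, dim=-1)   # keep the target on the unit sphere

text_embed = F.normalize(torch.randn(1, 512), dim=-1)
img_embed = F.normalize(torch.randn(1, 512), dim=-1)
target = merge_embeddings(text_embed, img_embed, text_weight=0.7)  # 70% text
```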

I think it could also be nice to pull the pre-training routine for the start image out into a proper method. For one application, I'd like to continuously train Siren directly on a video stream while also optimizing for similarity to a given text.

Btw, this might be completely off-topic, but I noticed that using train_step directly instead of the forward method saves me about 2.5 GB of VRAM (in my case, from 7.5 GB to 5 GB). I'll check that again at some point and open an issue about it if I can replicate it.

@afiaka87
Author

> Btw, this might be completely off-topic, but I noticed that using train_step directly instead of the forward method saves me about 2.5 GB of VRAM (in my case, from 7.5 GB to 5 GB). I'll check that again at some point and open an issue about it if I can replicate it.

Please do! VRAM usage has consistently gone up for a while now, but I'm not skilled enough with PyTorch/machine learning in general to know when/where to delete stuff that's no longer needed.

I'm fairly certain VRAM usage went up quite a bit after the "warmup step" was added to the forward method, if that helps your search.

@afiaka87
Author

I've got a notebook here that I've been using to manually define my own forward method, in order to keep VRAM usage somewhat under my control. Not sure if it's the best way to go about it, but lots of people keep forking the original research notebook because it gives them full control over everything, despite it having worse code quality and fewer features. #50

@NotNANtoN
Contributor

> Btw, this might be completely off-topic, but I noticed that using train_step directly instead of the forward method saves me about 2.5 GB of VRAM (in my case, from 7.5 GB to 5 GB). I'll check that again at some point and open an issue about it if I can replicate it.
>
> Please do! VRAM usage has consistently gone up for a while now, but I'm not skilled enough with PyTorch/machine learning in general to know when/where to delete stuff that's no longer needed.
>
> I'm fairly certain VRAM usage went up quite a bit after the "warmup step" was added to the forward method, if that helps your search.

I looked into it, and you seem to be right. The warmup step is the problem; I fixed it by just wrapping it in a "with torch.no_grad():" block.
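For reference, the pattern is just this (a toy stand-in, not the repo's actual warmup code):

```python
import torch
from torch import nn

# Toy stand-in for the generator; the point is only the no_grad wrapper.
model = nn.Linear(512, 512)
x = torch.randn(1, 512)

# Inside no_grad, autograd records no graph, so the warmup forward passes
# don't hold on to intermediate activations and VRAM stays flat.
with torch.no_grad():
    for _ in range(10):
        model(x)
```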

@NotNANtoN
Contributor

I cleaned this up and fixed the VRAM issue here: #58

@NotNANtoN
Contributor

The PR was merged, so it seems like this issue can be closed.

@afiaka87
Author

Sorry bout that. Closing.
