
How to continue training without replacing the previous training? #19

Closed
molo32 opened this issue Mar 30, 2021 · 5 comments


@molo32

molo32 commented Mar 30, 2021

How can I continue training without replacing the previous training?
I have face images of one person split into part 1 and part 2.

First I train on the part 1 images.

When training finishes and I run a driver, the faces from the part 1 images appear in the output.

Then I want to add the images from part 2, so I load the .pth checkpoint and continue training from there with the part 2 images.

The problem is that when I run a driver afterwards, only the part 2 images are chosen; it is as if part 2 had overwritten part 1.

What I expected was more variety of expressions, with expressions from both part 1 and part 2 being chosen.

How can I avoid overwriting images or expressions from a previous training session?

@shrubb
Owner

shrubb commented Mar 30, 2021

Sorry, I didn't get it at all. What do you mean by "overwriting images"? And why do you want to train (or fine-tune?) twice? If you have two datasets, just fine-tune on each of them independently, starting from a single meta-learned checkpoint.

@molo32
Author

molo32 commented Mar 30, 2021

I want to train on two sets of the same person separately, because if I load the whole dataset I get a CUDA out-of-memory error. To avoid that error I split the dataset into A and B, and first train on dataset A, then on dataset B.

The datasets are the cropped images generated by preprocess data.py.

The images are expressions or the face of one person.

By "making a driver" I mean taking a driving video and a checkpoint and producing an output video with driver.py.

By "selecting images" I mean which images from the dataset are chosen to make the output video with driver.py.

By "overwriting images" I mean overwriting expressions or faces.

For example, suppose dataset A has different illumination from dataset B. If I fine-tune a meta-learned model on dataset A with python3 train.py, then repeat with dataset B, then when I make a driver only images with the illumination of B appear, so B has overwritten A.

I don't want to train from scratch; I want to fine-tune.
I want to fine-tune on the two datasets A and B independently. I select latent-pose-release.pth to train with the first dataset A, set DATASET_ROOT to the path of set A, and run python3 train.py.
At the end of training it gives me a checkpoint.pth; if I run a driver with that checkpoint, the expressions of dataset A appear. Then I load that checkpoint.pth to continue training from there on dataset B. Training finishes and
gives me another checkpointB.pth, but when I run a driver with checkpointB, only images from the last dataset B are selected and none from dataset A. That's what I mean by overwriting.

@shrubb
Owner

shrubb commented Mar 30, 2021

because if I load the whole dataset, cuda out of memory error.

This means that you're doing something wrong: GPU memory doesn't depend on the dataset size. Just use smaller batches. For example, with a batch size of 1 you can fine-tune on as many images as you want.
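A minimal sketch (plain Python, no real training loop or GPU) of the point being made: only one batch of images is resident per training step, so a batch size of 1 caps per-step memory regardless of how many cropped images the dataset contains. The `batches` helper and the item counts are illustrative, not part of the repo's code.

```python
# Sketch: per-step memory scales with batch size, not dataset size,
# because each training step only loads one batch at a time.
def batches(dataset, batch_size):
    """Yield successive slices of `dataset` with at most `batch_size` items."""
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]

dataset = list(range(1000))           # stand-in for 1000 cropped face images
per_step = [len(b) for b in batches(dataset, 1)]
print(max(per_step))                  # peak items held in any step → 1
```

With batch size 1, the peak is 1 item per step even for a dataset of 1000 images, which is why splitting the dataset into A and B is unnecessary to avoid out-of-memory errors.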

As I understand it, you're trying to fine-tune a meta-learned checkpoint on dataset A, then take that fine-tuned model and fine-tune it further on dataset B. Well, we never tried that. I don't know whether it will even work; that's a research question. You'll probably need to modify the code for it, and it's entirely at your own risk. I'm afraid I can't help here.

@molo32
Author

molo32 commented Mar 31, 2021

OK, I understand. Another thing: can I make the checkpoint smaller? The output is always about 1 GB in size.

@shrubb
Owner

shrubb commented Mar 31, 2021

Yes, it's not hard (just don't include the discriminator, embedder, optimizer state, etc. in the checkpoint), but for that you'll have to modify the code yourself.
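The idea can be sketched as follows. This is a hypothetical checkpoint layout using plain dicts and `pickle` as a stand-in for `torch.save`; the actual key names in this repo's checkpoints may differ, so treat it as an illustration of dropping training-only state, not as the repo's real format.

```python
import io
import pickle

# Hypothetical checkpoint layout (key names are assumptions):
checkpoint = {
    "generator":     {"weight": [0.0] * 1000},  # needed at inference time
    "discriminator": {"weight": [0.0] * 1000},  # training-only
    "embedder":      {"weight": [0.0] * 1000},  # training-only
    "optimizer":     {"state":  [0.0] * 2000},  # training-only
}

KEEP = {"generator"}  # keep only what inference needs
slim = {k: v for k, v in checkpoint.items() if k in KEEP}

def pickled_size(obj):
    """Serialized size in bytes (stand-in for the saved file's size)."""
    buf = io.BytesIO()
    pickle.dump(buf_obj := obj, buf)
    return buf.getbuffer().nbytes

print(pickled_size(slim) < pickled_size(checkpoint))  # slim file is smaller
```

The same filtering applied before saving the real checkpoint would shrink the file by roughly the fraction of parameters that belong to the training-only modules.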

@shrubb shrubb closed this as completed Apr 16, 2021