adding more "hint" to training process #271
Comments
I don't think you need to start training from scratch. If you fine-tune from an already trained model, training should converge faster. |
To my knowledge, you should probably alter |
Hi, do you know why use_ema is set to False? |
Hello! I'm attempting something similar to what you are doing. Do you have any further results? |
Hello! Have you solved the problem? I wonder if I could learn from your work. Thank you |
I finished this work months ago. It works, but it is not significantly effective. In the config, I only changed hint_channels to 6. Then I merged two 3-channel images into one 6-channel image, saved it as a TIFF, and created a custom dataset object for training. My dataset code is below.

    import json
    import os

    import cv2
    import numpy as np
    import tifffile
    from torch.utils.data import Dataset


    class MyDataset(Dataset):
        def __init__(self):
            # prompt.json holds one JSON object per line.
            self.data = []
            with open('./training/pose+face/prompt.json', 'rt') as f:
                for line in f:
                    self.data.append(json.loads(line))

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            item = self.data[idx]
            source_filename = item['source']
            target_filename = item['target']
            prompt = item['prompt']
            source = tifffile.imread(os.path.join('./training/pose+face/source', source_filename))
            target = cv2.imread(os.path.join('./training/pose+face/target', target_filename))
            # Do not forget that OpenCV reads images in BGR order.
            target = cv2.cvtColor(target, cv2.COLOR_BGR2RGB)
            # Move the channel axis last: (6, H, W) -> (H, W, 6).
            source = np.transpose(source, (1, 2, 0))
            # Normalize source images to [0, 1].
            source = source.astype(np.float32) / 255.0
            # Normalize target images to [-1, 1].
            target = (target.astype(np.float32) / 127.5) - 1.0
            return dict(jpg=target, txt=prompt, hint=source)
|
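The merge-and-save step described above is not shown in the thread. Here is a minimal sketch of one way it could look, assuming both inputs are uint8 RGB arrays with the same height and width. The names `merge_hints` and `save_hint_tiff` are hypothetical. The array is written channels-first so that it matches the `np.transpose(source, (1, 2, 0))` call in the dataset code; it is worth checking the shape that `tifffile.imread` returns for your own files.

```python
import numpy as np


def merge_hints(pose_rgb, face_rgb):
    """Stack two (H, W, 3) uint8 images into one (6, H, W) uint8 array."""
    if pose_rgb.shape != face_rgb.shape:
        raise ValueError("both hint images must have the same shape")
    stacked = np.concatenate((pose_rgb, face_rgb), axis=2)  # (H, W, 6)
    # Channels first, so the dataset's transpose (1, 2, 0) restores (H, W, 6).
    return np.ascontiguousarray(np.transpose(stacked, (2, 0, 1)))


def save_hint_tiff(path, merged):
    """Write the merged hint to disk (requires the tifffile package)."""
    import tifffile  # imported lazily so merge_hints works without it

    tifffile.imwrite(path, merged)


if __name__ == "__main__":
    pose = np.zeros((512, 512, 3), dtype=np.uint8)
    face = np.full((512, 512, 3), 255, dtype=np.uint8)
    merged = merge_hints(pose, face)
    print(merged.shape, merged.dtype)
```

TIFF is used here only because common 8-bit PNG/JPEG writers expect 1, 3, or 4 channels; any container that round-trips a 6-channel array (for example `np.save`) would serve the same purpose.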
Thanks for your prompt reply. I am trying to follow your approach. I failed to save the merged image as a TIFF, so I concatenated the two arrays directly:

    # inpaint_resize: (512, 512, 3)
    # ref_image: (512, 512, 3)
    stacked_array = np.concatenate((inpaint_resize, ref_image), axis=2)

This gives me a (512, 512, 6) NumPy array as the hint, but something goes wrong.
I am trying to fix this problem. Did you do anything other than modifying hint_channels, or could you share the part where you save the TIFF? I would be very grateful. |
It is a meaningless error; you can ignore it, because that code only records the image log during training.

    if grid.shape[2] == 6:
        grid = grid[:, :, :3]
        continue

Add this code before |
It makes sense! Thanks for your useful advice! |
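As an alternative to logging only the first three channels, both hints can be kept visible in the log by splitting the grid into two RGB halves. This is a minimal sketch, assuming the grid has already been converted to an (H, W, C) array as the `grid.shape[2] == 6` check implies; `hint_grid_to_rgb` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def hint_grid_to_rgb(grid):
    """Turn an (H, W, C) image grid into an array an RGB image writer accepts.

    3-channel grids pass through unchanged; 6-channel grids are split into
    two RGB halves and placed side by side, giving shape (H, 2 * W, 3).
    """
    channels = grid.shape[2]
    if channels == 3:
        return grid
    if channels == 6:
        return np.concatenate((grid[:, :, :3], grid[:, :, 3:]), axis=1)
    raise ValueError(f"unsupported channel count: {channels}")
```

The returned array can then be saved the same way as any other 3-channel log image.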
Hey guys, |
I changed the model config to this:
I added the hints to my Dataset class, which now loads them and concatenates the different images into a single source.
I started from a pretrained Stable Diffusion model. Because the weights have to be loaded into the model, I duplicated the weights for the hints, as you can see in the following code:
In the above code the |
Thank you for the inputs, may I know if you were able to get good results with this? |
Maybe we can't achieve the desired result. I tried segmap plus depth... If there are no other bugs in my experiment, then the conclusion is: the image is not ok~ |
Hi, I had a similar concern, and I solved the problem by duplicating the hint channels in the pretrained model (two duplicated images for the hint).
|
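The duplication code itself did not survive in the thread. The idea can be sketched as follows, assuming the first hint convolution has a weight of shape (out_channels, 3, kH, kW); the function name is hypothetical. NumPy stands in for the tensor library here so the example runs on its own; with PyTorch tensors the same operation would be `torch.cat([weight] * copies, dim=1)`. Which checkpoint key holds this weight is not stated in the thread, so list the keys of your own checkpoint to find it.

```python
import numpy as np


def duplicate_hint_weight(weight, copies=2):
    """Repeat a conv weight of shape (out, in, kH, kW) along the input axis.

    With copies=2, a (16, 3, 3, 3) weight becomes (16, 6, 3, 3): input
    channels 0-2 and 3-5 both start from the same pretrained filters.
    """
    if weight.ndim != 4:
        raise ValueError("expected a 4-D convolution weight")
    return np.concatenate([weight] * copies, axis=1)


if __name__ == "__main__":
    pretrained = np.random.rand(16, 3, 3, 3).astype(np.float32)
    expanded = duplicate_hint_weight(pretrained)
    print(expanded.shape)
```

Only the weight changes shape; the bias has one value per output channel and can be loaded unchanged.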
Hi,
I have been focusing on the human posture task (taking the posture from an OpenPose image plus a prompt, then generating the character in the right pose, using control_sd15_openpose.pth).
However, I want to add one more hint to force ControlNet to generate a specific human.
So if in the original code the hint is a posture image like this:
I would like to add another image, of the specific human:
The target should be an image of that person in the new posture.
What I did:
In the dataset file: I read the extra image too and concatenate it with the posture image along the channel dimension, so the source variable now has 6 channels instead of 3.
In the YAML config file: I changed it to support 6 channels. I am not sure I really understood the meaning of these values.
    model:
      target: cldm.cldm.ControlLDM
      params:
        linear_start: 0.00085
        linear_end: 0.0120
        num_timesteps_cond: 1
        log_every_t: 200
        timesteps: 1000
        first_stage_key: "jpg"
        cond_stage_key: "txt"
        control_key: "hint"
        image_size: 64
        channels: 7  # was 4; I changed it to 7
        cond_stage_trainable: false
        conditioning_key: crossattn
        monitor: val/loss_simple_ema
        scale_factor: 0.18215
        use_ema: False
        only_mid_control: False
The problem is that when I train the model from scratch (running tutorial_train.py with resume_path = None), the model predictions, the reconstructions, and the samples saved under the image_log/train folder are just noise.
Does anyone have any idea how to solve this?
Thanks
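When the outputs are pure noise, one cheap thing to rule out is a malformed training sample. This is a minimal sketch of such a check, assuming the dataset returns a dict with `jpg`, `txt`, and `hint` keys as in the dataset code posted in this thread, with the hint in [0, 1], the target in [-1, 1], and both at the same spatial size. `check_sample` is a hypothetical helper, and passing it does not guarantee that training will converge.

```python
import numpy as np


def check_sample(sample, hint_channels=6):
    """Raise ValueError if a dataset item does not have the expected layout."""
    hint, target = sample["hint"], sample["jpg"]
    if hint.ndim != 3 or hint.shape[2] != hint_channels:
        raise ValueError(f"hint shape {hint.shape}, expected (H, W, {hint_channels})")
    if target.ndim != 3 or target.shape[2] != 3:
        raise ValueError(f"target shape {target.shape}, expected (H, W, 3)")
    if hint.shape[:2] != target.shape[:2]:
        raise ValueError("hint and target must have the same height and width")
    if hint.min() < 0.0 or hint.max() > 1.0:
        raise ValueError("hint values must lie in [0, 1]")
    if target.min() < -1.0 or target.max() > 1.0:
        raise ValueError("target values must lie in [-1, 1]")
    return True


if __name__ == "__main__":
    sample = {
        "jpg": np.zeros((512, 512, 3), dtype=np.float32),
        "txt": "a person standing",
        "hint": np.ones((512, 512, 6), dtype=np.float32),
    }
    print(check_sample(sample))
```

Running it on a handful of items, for example `check_sample(dataset[0])`, before starting a long training run catches shape and normalization mistakes early.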