@torch.no_grad
def get_hidden_states(self, model, positive, negative, average: bool = True):
    intermediate = {}
    self.cast_hook(model, intermediate)
    with torch.no_grad():
        _ = model(positive, negative_prompt=negative, num_inference_steps=25)
    hidden = {}
    for key in intermediate:
        hidden_states = intermediate[key][0][-1]  #### why using negative prompt as hidden states
        if average:
            # use average over sequence
            hidden_states = hidden_states.sum(dim=0) / hidden_states.shape[0]
        else:
            # take last value
            hidden_states = hidden_states[-1]
    hidden[key] = hidden_states.to(self.device)
    del intermediate
    gc.collect()
    torch.cuda.empty_cache()
    return hidden
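The `cast_hook` call above is what fills `intermediate`, though its body is not shown in this snippet. A minimal sketch of how such a hook-registration helper could work, assuming PyTorch forward hooks that stash each hooked module's positional inputs under its name (the module selection and storage layout here are assumptions, not the actual SegMoE implementation):

```python
import torch
import torch.nn as nn

def cast_hook(model, intermediate):
    """Register forward hooks that record each hooked layer's inputs.

    Hypothetical sketch: which modules get hooked and what is stored
    per key are assumptions for illustration only.
    """
    def make_hook(name):
        def hook(module, args, output):
            # `args` is the tuple of positional inputs to the module,
            # matching the `intermediate[key][0]` indexing seen above.
            intermediate[name] = args
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):  # e.g. hook the linear projections
            module.register_forward_hook(make_hook(name))

# usage: a toy model stands in for the diffusion model
intermediate = {}
toy = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
cast_hook(toy, intermediate)
_ = toy(torch.randn(3, 4))
```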
We take the negative prompt into account since many finetunes recommend specific negative prompts they should be used with. The idea is that when the router encounters a similar prompt and negative prompt, it will route to that specific model's layer. That said, we still have to run ablation tests to see how much the inclusion of these negative prompts affects the final SegMoE.
Why not use the hidden states of the positive prompt instead? e.g. hidden_states = intermediate[key][0][0]
Here, when using classifier-free guidance, the positive and negative prompts form a single batch for inference.
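A minimal sketch of that batching, assuming the ordering implied by the indexing in this thread (positive half first, negative half last; the actual ordering is a per-pipeline convention), with stand-in tensors in place of real prompt embeddings:

```python
import torch

# Stand-ins for the encoded positive and negative prompt embeddings
# (shapes chosen to resemble CLIP text embeddings; values are arbitrary).
positive_emb = torch.ones(1, 77, 768)
negative_emb = torch.zeros(1, 77, 768)

# Under classifier-free guidance the two embeddings are concatenated
# along the batch dimension and run through the model together.
cfg_batch = torch.cat([positive_emb, negative_emb], dim=0)  # shape (2, 77, 768)

# Hidden states captured by a hook inherit this batch layout, so under
# this assumed ordering index 0 selects the positive half and index -1
# selects the negative half.
positive_half = cfg_batch[0]
negative_half = cfg_batch[-1]
```

This is why indexing `[0]` versus `[-1]` on the hooked activations picks out one prompt's hidden states rather than the other's.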
Here's another question I'd like to ask:
I saw that SegMoE uses the hidden-state activations from the last step of the diffusion process. Have you tried averaging the hidden states across all diffusion steps, or anything else?
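For reference, the alternative being asked about could be sketched as follows, assuming the hook were changed to append one tensor per denoising step (a hypothetical variant, not the current SegMoE behavior, which keeps only the last step):

```python
import torch

def average_over_steps(step_states):
    """Average a layer's hidden states over all diffusion steps.

    step_states: list of (seq_len, dim) tensors, one per denoising step,
    as a hypothetical per-step hook might collect them.
    """
    return torch.stack(step_states, dim=0).mean(dim=0)

# usage: three fake steps of a (77, 768) activation
steps = [torch.full((77, 768), float(i)) for i in range(3)]
avg = average_over_steps(steps)  # elementwise mean over the three steps
```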
Hi,
ref: https://github.com/segmind/segmoe/blob/5fce95320f932aeb0991c9c0c31a3be72dbf7ce8/segmoe/main.py#L1300C13-L1300C26