
Why using negative prompt hidden states as gate weight? #19

Open
IvanFei opened this issue Feb 20, 2024 · 2 comments

IvanFei commented Feb 20, 2024

Hi,

ref: https://github.com/segmind/segmoe/blob/5fce95320f932aeb0991c9c0c31a3be72dbf7ce8/segmoe/main.py#L1300C13-L1300C26

    @torch.no_grad
    def get_hidden_states(self, model, positive, negative, average: bool = True):
        intermediate = {}
        self.cast_hook(model, intermediate)
        with torch.no_grad():
            _ = model(positive, negative_prompt=negative, num_inference_steps=25)
        hidden = {}
        for key in intermediate:
            hidden_states = intermediate[key][0][-1]  #### why using the negative prompt as hidden states?
            if average:
                # use average over sequence
                hidden_states = hidden_states.sum(dim=0) / hidden_states.shape[0]
            else:
                # take last value
                hidden_states = hidden_states[:-1]
            hidden[key] = hidden_states.to(self.device)
        del intermediate
        gc.collect()
        torch.cuda.empty_cache()
        return hidden
@Warlord-K
Contributor

We take the negative prompt into account since many finetunes suggest specific negative prompts they should be used with. The idea is that when the router encounters a similar prompt and negative prompt, it will route to that specific model's layers. That said, we still have to run ablation tests to see how much the inclusion of these negative prompts affects the final SegMoE.
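To illustrate that routing idea (this is only a sketch, not the actual SegMoE router — the function and variable names here are hypothetical): gate weights can be produced by scoring a prompt's hidden state against a per-expert anchor hidden state and normalizing with a softmax, so prompts close to an expert's anchor get routed to that expert:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def gate_weights(prompt_hidden, expert_anchors):
    """Hypothetical router: score a prompt's (sequence-averaged) hidden
    state against each expert's anchor hidden state, normalize to weights.

    prompt_hidden:  (d,) hidden state for the incoming prompt
    expert_anchors: (n_experts, d) one anchor hidden state per expert
    """
    scores = expert_anchors @ prompt_hidden   # dot-product similarity
    return softmax(scores)                    # gate weights sum to 1

# Toy anchors: each expert "remembers" one direction in hidden space.
anchors = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
prompt = np.array([0.1, 0.9, 0.0, 0.0])   # closest to expert 1's anchor
w = gate_weights(prompt, anchors)
print(w.argmax())  # -> 1: routed mainly to expert 1
```

Under this picture, including the negative prompt in the hidden state simply makes the anchor (and hence the routing) sensitive to the negative prompt a finetune was intended to be used with.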

@IvanFei
Author

IvanFei commented Feb 21, 2024

Thank you for the kind reply.

Why not use the hidden states of the positive prompt, e.g. hidden_states = intermediate[key][0][0]?
Here, when using classifier-free guidance, the positive and negative prompts form a single batch for inference.
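For context on that batching: under classifier-free guidance the two prompt embeddings are typically concatenated along the batch dimension, so a hooked activation contains both halves and indexing the batch dimension (e.g. [0] vs [-1]) selects one of them. A minimal sketch with made-up shapes (which half comes first depends on the pipeline; diffusers' StableDiffusionPipeline, for instance, concatenates the unconditional/negative embeddings first):

```python
import numpy as np

# Hypothetical shapes: 1 prompt, sequence length 77, hidden dim 8.
seq_len, dim = 77, 8
positive_embeds = np.ones((1, seq_len, dim))    # stand-in for the positive prompt embedding
negative_embeds = np.zeros((1, seq_len, dim))   # stand-in for the negative prompt embedding

# CFG runs both conditionings through the model as one batch.
batch = np.concatenate([negative_embeds, positive_embeds], axis=0)

print(batch.shape)  # (2, 77, 8): index 0 holds one conditioning, index -1 the other
```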

Here's another question I'd like to ask:

  1. I saw that SegMoE uses the hidden-state activations from the last step of the diffusion process. Have you tried averaging the hidden states over all diffusion steps, or anything else?
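As a sketch of the averaging alternative in question 1 (hypothetical, not something SegMoE currently does), a hook could maintain a running mean of the hidden states across all denoising steps instead of keeping only the final step's activation:

```python
import numpy as np

def running_mean_hook():
    """Hypothetical hook state: accumulate a running mean of hidden
    states over diffusion steps rather than keeping only the last step."""
    state = {"mean": None, "steps": 0}

    def update(hidden_states):
        state["steps"] += 1
        if state["mean"] is None:
            state["mean"] = hidden_states.copy()
        else:
            # incremental mean: m += (x - m) / n
            state["mean"] += (hidden_states - state["mean"]) / state["steps"]

    return state, update

# Toy usage: 25 "diffusion steps" of fake activations.
state, update = running_mean_hook()
for step in range(25):
    update(np.full((4,), float(step)))
print(state["mean"])  # mean of 0..24 = 12.0 per element
```

This keeps memory constant per layer (one accumulator) instead of storing all 25 steps of activations.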
