
Why using negative prompt hidden states as gate weight? #19

Open
IvanFei opened this issue Feb 20, 2024 · 2 comments

IvanFei commented Feb 20, 2024

Hi,

ref: https://github.com/segmind/segmoe/blob/5fce95320f932aeb0991c9c0c31a3be72dbf7ce8/segmoe/main.py#L1300C13-L1300C26

    @torch.no_grad
    def get_hidden_states(self, model, positive, negative, average: bool = True):
        intermediate = {}
        self.cast_hook(model, intermediate)
        with torch.no_grad():
            _ = model(positive, negative_prompt=negative, num_inference_steps=25)
        hidden = {}
        for key in intermediate:
            hidden_states = intermediate[key][0][-1]  #### why using the negative prompt as hidden states?
            if average:
                # use average over sequence
                hidden_states = hidden_states.sum(dim=0) / hidden_states.shape[0]
            else:
                # take last value
                hidden_states = hidden_states[:-1]
            hidden[key] = hidden_states.to(self.device)
        del intermediate
        gc.collect()
        torch.cuda.empty_cache()
        return hidden
@Warlord-K
Contributor

We take the negative prompt into account since many finetunes suggest specific negative prompts they should be used with. The idea is that when the router encounters a similar prompt and negative prompt, it will route to that specific model's layers. That said, we still have to run ablation tests to see how much the inclusion of these negative prompts affects the final SegMoE.
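To illustrate that routing idea (this is only a sketch, not the actual SegMoE router — the function and variable names here are hypothetical): gate weights can be produced by scoring a prompt's hidden state against a per-expert anchor hidden state and normalizing with a softmax, so prompts close to an expert's anchor get routed to that expert:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def gate_weights(prompt_hidden, expert_anchors):
    """Hypothetical router: score a prompt's (sequence-averaged) hidden
    state against each expert's anchor hidden state, normalize to weights.

    prompt_hidden:  (d,) hidden state for the incoming prompt
    expert_anchors: (n_experts, d) one anchor hidden state per expert
    """
    scores = expert_anchors @ prompt_hidden   # dot-product similarity
    return softmax(scores)                    # gate weights sum to 1

# Toy anchors: each expert "remembers" one direction in hidden space.
anchors = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
prompt = np.array([0.1, 0.9, 0.0, 0.0])   # closest to expert 1's anchor
w = gate_weights(prompt, anchors)
print(w.argmax())  # -> 1: routed mainly to expert 1
```

Under this picture, including the negative prompt in the hidden state simply makes the anchor (and hence the routing) sensitive to the negative prompt a finetune was intended to be used with.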

@IvanFei
Author

IvanFei commented Feb 21, 2024

Thank you for the kind reply.

Why not use the hidden states of the positive prompt, e.g. hidden_states = intermediate[key][0][0]?
Here, when using classifier-free guidance, the positive and negative prompts form a single batch for inference.
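For context on that batching: under classifier-free guidance the two prompt embeddings are typically concatenated along the batch dimension, so a hooked activation contains both halves and indexing the batch dimension (e.g. [0] vs [-1]) selects one of them. A minimal sketch with made-up shapes (which half comes first depends on the pipeline; diffusers' StableDiffusionPipeline, for instance, concatenates the unconditional/negative embeddings first):

```python
import numpy as np

# Hypothetical shapes: 1 prompt, sequence length 77, hidden dim 8.
seq_len, dim = 77, 8
positive_embeds = np.ones((1, seq_len, dim))    # stand-in for the positive prompt embedding
negative_embeds = np.zeros((1, seq_len, dim))   # stand-in for the negative prompt embedding

# CFG runs both conditionings through the model as one batch.
batch = np.concatenate([negative_embeds, positive_embeds], axis=0)

print(batch.shape)  # (2, 77, 8): index 0 holds one conditioning, index -1 the other
```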

Here's another question I'd like to ask:

  1. I saw that SegMoE uses the hidden-state activations from the last step of the diffusion process. Have you tried averaging the hidden states over all diffusion steps, or anything else?
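As a sketch of the averaging alternative in question 1 (hypothetical, not something SegMoE currently does), a hook could maintain a running mean of the hidden states across all denoising steps instead of keeping only the final step's activation:

```python
import numpy as np

def running_mean_hook():
    """Hypothetical hook state: accumulate a running mean of hidden
    states over diffusion steps rather than keeping only the last step."""
    state = {"mean": None, "steps": 0}

    def update(hidden_states):
        state["steps"] += 1
        if state["mean"] is None:
            state["mean"] = hidden_states.copy()
        else:
            # incremental mean: m += (x - m) / n
            state["mean"] += (hidden_states - state["mean"]) / state["steps"]

    return state, update

# Toy usage: 25 "diffusion steps" of fake activations.
state, update = running_mean_hook()
for step in range(25):
    update(np.full((4,), float(step)))
print(state["mean"])  # mean of 0..24 = 12.0 per element
```

This keeps memory constant per layer (one accumulator) instead of storing all 25 steps of activations.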
