mixed precision policies cause bumblebee models to fail #544

Closed
ityonemo opened this issue Nov 13, 2023 · 2 comments · Fixed by elixir-nx/bumblebee#280 or #547

ityonemo commented Nov 13, 2023

We were experimenting with Llama 2-based models and noticed some problems. Llama 2 is trained in bf16, so (probably?) this should work:

Base code (this works):

auth_token = System.get_env("HF_AUTH_TOKEN")
Nx.default_backend({EXLA.Backend, client: :host})
model = {:hf, "meta-llama/Llama-2-7b-chat-hf", auth_token: auth_token}

{:ok, m} = Bumblebee.load_model(model)
{:ok, t} = Bumblebee.load_tokenizer(model)
{:ok, g} = Bumblebee.load_generation_config(model)

serving = Bumblebee.Text.generation(m, t, g, defn_options: 
  [compiler: EXLA, compiler_options: [client: :cuda, lazy_transfers: :always]])

Nx.Serving.run(serving, "[INST] <<SYS>>\nYou are a bot.\n<</SYS>>\n\nHi, bot![/INST]")
|> dbg

output:

"[INST] <<SYS>>\nYou are a bot.\n<</SYS>>\n\nHi, bot![/INST] Hello! How can I assist you today?"

Code with mixed precision policies (this fails):

auth_token = System.get_env("HF_AUTH_TOKEN")
Nx.default_backend({EXLA.Backend, client: :host})

model = {:hf, "meta-llama/Llama-2-7b-chat-hf", auth_token: auth_token}

{:ok, m} = Bumblebee.load_model(model)
{:ok, t} = Bumblebee.load_tokenizer(model)
{:ok, g} = Bumblebee.load_generation_config(model)

policy = Axon.MixedPrecision.create_policy(compute: {:bf, 16})
mp_model = Axon.MixedPrecision.apply_policy(m.model, policy)
m2 = %{m | model: mp_model}

serving = Bumblebee.Text.generation(m2, t, g, defn_options: 
  [compiler: EXLA, compiler_options: [client: :cuda, lazy_transfers: :always]])

Nx.Serving.run(serving, "[INST] <<SYS>>\nYou are a bot.\n<</SYS>>\n\nHi, bot![/INST]")

output:

[INST] <<SYS>>\nYou are a bot.\n<</SYS>>\n\nHi, bot![/INST] pl\nA van Lloydns wicked plan' a serious unwleftP8 including NewNewhhellilitiesathed ux- behindRES val Orange County IL months Meister=bool PA and so on


ityonemo commented Nov 13, 2023

also attempting {:f, 16} failed. However, {:f, 64} performs ~correctly.

output:

Hello there! *giggles* I'm just an AI assistant, here to help you with any questions or tasks you may have! *winks* Is there something specific you'd like to chat about or ask me to do? 😃
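
For reference, presumably the only change between these attempts is the policy line; a minimal sketch of the {:f, 64} variant, assuming the same m/t/g loading code as the snippets above:

# same loading code as above; only the policy differs.
# swapping compute: {:f, 16} in here reproduces the failure.
policy = Axon.MixedPrecision.create_policy(compute: {:f, 64})
mp_model = Axon.MixedPrecision.apply_policy(m.model, policy)
m2 = %{m | model: mp_model}

serving = Bumblebee.Text.generation(m2, t, g, defn_options:
  [compiler: EXLA, compiler_options: [client: :cuda, lazy_transfers: :always]])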


ityonemo commented Nov 13, 2023

experimenting showed that quantizing layer #7 alone also causes the issue (this is an embedding layer, not an rms-norm layer). Will try disabling both:

auth_token = System.get_env("HF_AUTH_TOKEN")
Nx.default_backend({EXLA.Backend, client: :host})
# model = {:hf, "mistralai/Mistral-7B-Instruct-v0.1"}
model = {:hf, "meta-llama/Llama-2-7b-chat-hf", auth_token: auth_token}

{:ok, m} = Bumblebee.load_model(model)
{:ok, t} = Bumblebee.load_tokenizer(model)
{:ok, g} = Bumblebee.load_generation_config(model)

bf = {:bf, 16}
policy = Axon.MixedPrecision.create_policy(params: bf, compute: bf, output: bf)

# down-cast only the layer with id 7 (the embedding layer)
filter = fn layer ->
  layer.id == 7
end

mp_model = Axon.MixedPrecision.apply_policy(m.model, policy, filter)
m2 = %{m | model: mp_model}

serving =
  Bumblebee.Text.generation(m2, t, g,
    defn_options: [compiler: EXLA, compiler_options: [client: :cuda, lazy_transfers: :always]]
  )

%{results: [%{text: text}]} =
  Nx.Serving.run(serving, "[INST] <<SYS>>\nYou are a bot.\n<</SYS>>\n\nHi, bot![/INST]")

text |> dbg
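
For the "disabling both" follow-up, a sketch of a filter that leaves the embedding and rms-norm layers in full precision and down-casts everything else. Matching on :op_name is an assumption here (the filter above only relies on :id), and the exact atoms would need checking against the traced graph:

# hypothetical filter: down-cast every layer *except* embeddings and
# rms-norms; :op_name and the atom names are unverified assumptions
filter = fn layer ->
  layer.op_name not in [:embedding, :rms_norm]
end

mp_model = Axon.MixedPrecision.apply_policy(m.model, policy, filter)
m2 = %{m | model: mp_model}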
