Linear layer with same weights, biases, and inputs gives different output than Pytorch #2250

Open · EricLBuehler opened this issue Jun 6, 2024 · 3 comments

@EricLBuehler (Member) commented:

Hello all,

Thank you for your great work here. I was testing the Phi 3 Vision model in mistral.rs and ran into a numerical mismatch that appears to stem from a linear layer. By exporting the tensors to NumPy and comparing them, I have verified that the inputs, weights, and biases are the same, but the output is different. I have attached the necessary files to reproduce this, as well as the Rust and Python scripts for reproducing the issue and showing how the tensors match, respectively.

Note: here is what each file name means:

  • inp/imp (Rust and Python dumps, respectively): input to the layer
  • layerhiddenweight: weight of the Linear layer
  • layerhiddenbias: bias of the Linear layer
  • xs: Output of linear layer (this is what differs)
  • testingout.npy: Output of Rust reproduction script (this also differs)

Rust program to reproduce the error

This just loads the tensors from the NumPy files, converts them to BF16, and runs the Linear layer forward pass. Loading the weights directly from Phi-3-vision-128k-instruct instead makes no difference.

I ran this with the cuda feature enabled.

use candle_core::{DType, Device, Module, Tensor};
use candle_nn::Linear;

fn main() {
    let dev = Device::cuda_if_available(0).unwrap();

    // Load the weight and bias dumped from the Python side, move them to the device, and cast to BF16.
    let weight = Tensor::read_npy("../mistral.rs/layerhiddenweight.npy").unwrap().to_device(&dev).unwrap().to_dtype(DType::BF16).unwrap();
    let bias = Tensor::read_npy("../mistral.rs/layerhiddenbias.npy").unwrap().to_device(&dev).unwrap().to_dtype(DType::BF16).unwrap();
    let layer = Linear::new(weight, Some(bias));

    // Run the forward pass on the dumped input.
    let inp = Tensor::read_npy("../mistral.rs/inp.npy").unwrap().to_device(&dev).unwrap().to_dtype(DType::BF16).unwrap();
    let res = layer.forward(&inp).unwrap();
    dbg!(&res.to_dtype(DType::F32).unwrap().mean_all());

    // PyTorch reference output (this is where the values differ).
    let truth = Tensor::read_npy("../mistral.rs/xs.npy").unwrap().to_device(&dev).unwrap().to_dtype(DType::BF16).unwrap();
    dbg!(&truth.to_dtype(DType::F32).unwrap().mean_all());

    // Save the candle output so it can be compared in the Python script below.
    res.to_dtype(DType::F32).unwrap().write_npy("testingout.npy").unwrap();
    println!("Wrote output.");
}

Python script to compare outputs

import numpy as np

# Compare the layer inputs dumped from mistral.rs (Rust) and PyTorch (Python).
mistralrs = np.load("mistral.rs/inp.npy")
py = np.load("Phi-3-vision-128k-instruct/imp.npy")
print(mistralrs.shape, py.shape)
print("inp", np.allclose(mistralrs, py))

# Compare the Linear layer weights.
mistralrs = np.load("mistral.rs/layerhiddenweight.npy")
py = np.load("Phi-3-vision-128k-instruct/layerhiddenweight.npy")
print(mistralrs.shape, py.shape)
print("weight", np.allclose(mistralrs, py))

# Compare the Linear layer biases.
mistralrs = np.load("mistral.rs/layerhiddenbias.npy")
py = np.load("Phi-3-vision-128k-instruct/layerhiddenbias.npy")
print(mistralrs.shape, py.shape)
print("bias", np.allclose(mistralrs, py))

# Compare the mistral.rs output against the PyTorch reference output.
mistralrs = np.load("mistral.rs/xs.npy")
py = np.load("Phi-3-vision-128k-instruct/xs.npy")
print(mistralrs.shape, py.shape)
print("out1", np.allclose(mistralrs, py))
print(mistralrs[:, 5:10, :5] - py[:, 5:10, :5])

# Compare the Rust reproduction script's output against the PyTorch reference output.
mistralrs = np.load("testing/testingout.npy")
py = np.load("Phi-3-vision-128k-instruct/xs.npy")
print(mistralrs.shape, py.shape)
print("out2", np.allclose(mistralrs, py))
print(mistralrs[:, 5:10, :5] - py[:, 5:10, :5])

Result of Python script

As you can see, the inputs, weights, and biases are the same but the outputs differ in both mistral.rs and in the reproduction script.

(1, 1921, 4096) (1, 1921, 4096)
inp True
(3072, 4096) (3072, 4096)
weight True
(3072,) (3072,)
bias True
(1, 1921, 3072) (1, 1921, 3072)
out1 False
[[[0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]]]
(1, 1921, 3072) (1, 1921, 3072)
out2 False
[[[0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]
  [0.       0.       0.       0.015625 0.      ]]]

@LaurentMazare (Collaborator) commented:

I don't think there are many guarantees on the values being exactly equal on both sides. Especially when using bfloat16, there are only 8 bits of mantissa precision, so an error on the order of a percent would be somewhat expected.
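
(As a quick illustration, not part of the original comment: bfloat16's machine epsilon is about 0.8%, so per-element relative differences below roughly a percent are within single-operation rounding.)

import torch

# Relative spacing between adjacent bfloat16 values, i.e. the best-case
# relative rounding error of a single bf16 operation.
print(torch.finfo(torch.bfloat16).eps)  # 0.0078125, i.e. ~0.8%
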
Things that could be worth checking (a rough PyTorch sketch of the first two checks follows the list):

  • How much is the error when using f32 on both sides?
  • What happens without biases? The error being in the same column might indicate that the bias is the culprit; it may be caused by PyTorch fusing the add and mul parts, whereas candle doesn't do that for now.
  • Are the f32 to bf16 conversions in line between both sides? (Using a safetensors file rather than npy would allow storing the bf16 tensors directly.)
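
A rough sketch (not part of the original comment) of how the first two checks might look on the PyTorch side, assuming the .npy dumps referenced in the comparison script above are available:

import numpy as np
import torch
import torch.nn.functional as F

inp = torch.from_numpy(np.load("Phi-3-vision-128k-instruct/imp.npy"))
weight = torch.from_numpy(np.load("Phi-3-vision-128k-instruct/layerhiddenweight.npy"))
bias = torch.from_numpy(np.load("Phi-3-vision-128k-instruct/layerhiddenbias.npy"))
xs = torch.from_numpy(np.load("Phi-3-vision-128k-instruct/xs.npy"))

# Check 1: how large is the error when the whole computation is done in f32?
out_f32 = F.linear(inp.float(), weight.float(), bias.float())
print("f32 max abs diff vs xs:", (out_f32 - xs.float()).abs().max().item())

# Check 2: drop the bias, and compare this against a candle run that uses
# Linear::new(weight, None). If the column-wise offset disappears, the bias add
# (fused vs. separate) is the likely culprit.
out_nobias = F.linear(inp.bfloat16(), weight.bfloat16(), None)
print("no-bias output mean:", out_nobias.float().mean().item())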

@EricLBuehler (Member, Author) commented Jun 6, 2024:

Thank you for giving me those tips; I think I figured out what the problem is:

How much is the error when using f32 on both sides?

About the same; there is no change to the output.

What happens without biases? The error being in the same column might indicate that the bias is the culprit; it may be caused by PyTorch fusing the add and mul parts, whereas candle doesn't do that for now.

It turns out that if I only do a matmul in PyTorch and then add the bias separately, the two output tensors are the same! So when the xW^T + b computation is followed strictly in PyTorch (as Candle does), the outputs match.

Would it be possible to add this fusion of the add and mul to Candle too, as it would fix this? Alternatively, is there something I can do on my side to fix it? Thank you so much!
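
(For reference, a minimal PyTorch sketch, not part of the original comment, of the two computation orders being compared here: F.linear can apply the bias inside the GEMM, while the explicit matmul plus separate bias add mirrors what candle's Linear currently does. The shapes match the tensors from this issue; the data is random.)

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 1921, 4096, dtype=torch.bfloat16, device=device)
w = torch.randn(3072, 4096, dtype=torch.bfloat16, device=device)
b = torch.randn(3072, dtype=torch.bfloat16, device=device)

# Potentially fused path: the bias can be applied inside the GEMM.
fused = F.linear(x, w, b)

# Unfused path: explicit x @ W^T followed by a separate bias add,
# which is what candle's Linear layer does today.
unfused = x @ w.t() + b

# With bf16 rounding, the two orderings can differ by an ulp in some columns.
print((fused.float() - unfused.float()).abs().max().item())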

@EricLBuehler (Member, Author) commented:

I wrote some code to test fusion using cuBLASLT with a FusedLinearBias layer: https://github.com/EricLBuehler/mistral.rs/blob/44e8a2291d6d53fa125907925c0a4cc613cb8855/mistralrs-core/src/layers.rs#L401-L451

This gets rid of the error. Would you be interested in me submitting a PR to add fused-linear-bias support directly to Linear?
