Linear layer with same weights, biases, and inputs gives different output than Pytorch #2250
Comments
I don't think there are many guarantees on the values being exactly equal on both sides. Especially when using bfloat16, there are only 8 bits for the mantissa, so if the error is on the order of a percent that would be somewhat expected.
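For a sense of scale, here is a minimal sketch (not from this thread) using the half crate, which Candle relies on for its bf16 type, showing how large a single bf16 rounding step can be:

```rust
// Minimal sketch: bf16 keeps an 8-bit significand (7 stored bits), so the
// spacing between representable values near 1.0 is 2^-7 ~= 0.78%. A single
// rounding can therefore introduce an error of a few tenths of a percent,
// and several roundings in a row can reach the order of a percent.
use half::bf16;

fn main() {
    let x = 1.0f32 + 1.0 / 300.0; // ~1.003333
    let rounded = bf16::from_f32(x).to_f32(); // rounds back down to 1.0
    let rel_err = (x - rounded).abs() / x;
    println!("f32: {x}, after bf16 round-trip: {rounded}");
    println!("relative error: {:.3}%", 100.0 * rel_err);
}
```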
Thank you for giving me those tips, I think I figured out what the problem is:
About the same, there is no change to the output.
It turns out that if I only do a matmul and then add the bias separately, the two output tensors are the same! So the difference appears to come from the fused matmul+bias path that PyTorch normally takes. Would it be possible to add this fusion of the add and the matmul to Candle too, as it would fix this? Alternatively, is there something I can do to fix it on my end? Thank you so much!
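For illustration only, here is a rough sketch of the unfused path using the candle_core API, compared against the same computation carried out in f32 with a single rounding at the end, which is roughly what a fused matmul+bias epilogue gives you. The shapes are made up and this is not the code from the issue:

```rust
// Rough sketch, assuming the candle_core API; shapes are arbitrary.
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::cuda_if_available(0)?;
    let x = Tensor::randn(0f32, 1.0, (4, 64), &dev)?.to_dtype(DType::BF16)?;
    let w = Tensor::randn(0f32, 1.0, (128, 64), &dev)?.to_dtype(DType::BF16)?;
    let b = Tensor::randn(0f32, 1.0, 128, &dev)?.to_dtype(DType::BF16)?;

    // Unfused path: the matmul output is rounded to bf16, then the bias add
    // rounds to bf16 a second time.
    let unfused = x.matmul(&w.t()?)?.broadcast_add(&b)?;

    // Reference: everything in f32, rounded to bf16 once at the end. A fused
    // matmul + bias epilogue behaves more like this, since the bias is added
    // to the f32 accumulator before the single final rounding.
    let reference = x
        .to_dtype(DType::F32)?
        .matmul(&w.t()?.to_dtype(DType::F32)?)?
        .broadcast_add(&b.to_dtype(DType::F32)?)?
        .to_dtype(DType::BF16)?;

    let diff = (unfused.to_dtype(DType::F32)? - reference.to_dtype(DType::F32)?)?;
    println!("mean abs diff: {:?}", diff.abs()?.mean_all()?);
    Ok(())
}
```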
I wrote some code to test fusion using cuBLASLt, and it gets rid of the error. Would you be interested in me submitting a PR to add the fused-linear-bias support directly to Linear?
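One possible shape for such an addition is sketched below; fused_matmul_bias is a hypothetical placeholder standing in for a cuBLASLt-backed kernel with a bias epilogue, not an existing Candle API:

```rust
// Sketch of how fused-bias support might be exposed on a Linear-like layer.
use candle_core::{Result, Tensor};

pub struct FusedLinear {
    weight: Tensor,
    bias: Option<Tensor>,
}

impl FusedLinear {
    pub fn forward(&self, x: &Tensor) -> Result<Tensor> {
        match &self.bias {
            // One kernel computes x @ w^T + b, so the bias is added before
            // the single rounding to the output dtype.
            Some(b) if x.device().is_cuda() => fused_matmul_bias(x, &self.weight, b),
            Some(b) => x.matmul(&self.weight.t()?)?.broadcast_add(b),
            None => x.matmul(&self.weight.t()?),
        }
    }
}

// Placeholder so the sketch is self-contained; the real version would call
// into cuBLASLt instead of the unfused fallback below.
fn fused_matmul_bias(x: &Tensor, w: &Tensor, b: &Tensor) -> Result<Tensor> {
    x.matmul(&w.t()?)?.broadcast_add(b)
}
```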
Hello all,
Thank you for your great work here. I was doing some testing of the Phi 3 Vision model in mistral.rs, and it appears that the output mismatch stems from a linear layer. By exporting the tensors to Numpy and comparing them, I have verified that the inputs, weights, and biases are the same, but the output is different. I have attached the files necessary to reproduce this, as well as the Rust script that reproduces the issue and the Python script that shows the comparison.
- mistral.rs weights: mistral.rs.zip
- phi3 vision weights (ground truth): Phi-3-vision-128k-instruct.zip
- testingout.zip
Note: here is what each file name means:
- Rust program to reproduce the error: it just loads the tensors from the Numpy files, converts them to BF16, and does the Linear layer forward pass (a sketch of such a program follows this list). Using the weights from Phi-3-vision-128k-instruct has no effect. I ran this with the cuda feature enabled.
- Python script to compare the outputs
- Result of the Python script
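For reference, a reproduction along these lines could look roughly like the sketch below; the .npy file names are placeholders rather than the ones in the attached zips, and candle_core's read_npy/write_npy plus candle_nn::Linear are assumed:

```rust
// Hedged sketch of the reproduction described above: load the exported
// Numpy tensors, convert to BF16, run the Linear forward pass on CUDA, and
// write the result back out for the Python comparison script.
use candle_core::{DType, Device, Module, Result, Tensor};
use candle_nn::Linear;

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;

    // Placeholder file names -- substitute the actual names from the zips.
    let x = Tensor::read_npy("input.npy")?.to_device(&dev)?.to_dtype(DType::BF16)?;
    let w = Tensor::read_npy("weight.npy")?.to_device(&dev)?.to_dtype(DType::BF16)?;
    let b = Tensor::read_npy("bias.npy")?.to_device(&dev)?.to_dtype(DType::BF16)?;

    let linear = Linear::new(w, Some(b));
    let y = linear.forward(&x)?;

    // Convert back to f32 so numpy can read it for the comparison.
    y.to_dtype(DType::F32)?.write_npy("candle_out.npy")?;
    Ok(())
}
```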
As you can see, the inputs, weights, and biases are the same but the outputs differ in both mistral.rs and in the reproduction script.