Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

customtype test failures on 32-bit (and Big Endian) architectures #164

Closed
onlyjob opened this issue Apr 4, 2016 · 40 comments
Closed

customtype test failures on 32-bit (and Big Endian) architectures #164

onlyjob opened this issue Apr 4, 2016 · 40 comments
Labels

Comments

@onlyjob
Copy link

onlyjob commented Apr 4, 2016

As reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=819994

Builds of golang-gogoprotobuf on 32-bit architectures such as i386 and on
ppc64 (the only big-endian architecture on which its build dependencies
were available) all failed with test suite errors because various results
weren't as expected. The details vary by architecture, per

https://buildd.debian.org/status/logs.php?pkg=golang-gogoprotobuf&ver=0.2-1

Could you please take a look?

See full build logs on armel, armhf, i386 and ppc64.
Tests do not fail only on amd64, arm64 and ppc64el.

@awalterschulze
Copy link
Member

Ok so it seems to all be problems with customtype when the Unmarshal method is not generated, so at least its isolated, but I'll keep looking.

@awalterschulze
Copy link
Member

I think I should log this as a Known Issue.
Then we can use runtime.GOARCH to check if we want to skip the test.
Would that be a viable solution?

I would like to fix it, but that unsafe code is quite hard fix without being able to run the tests over and over. And since I don't have 32bit big endian machine I don't know how to do that. Do you have any suggestions?

@onlyjob
Copy link
Author

onlyjob commented Apr 8, 2016

You should be able to reproduce on i386 a/k/a x86_32. A VM or container or chroot should do. As for ppc64 we can see whether your corrections work on next upload...

I suppose skipping tests will do as well but fixing the problem will be preferable. :)

Thanks.

@awalterschulze
Copy link
Member

So if I start a 32bit ubuntu in a vm that should reproduce the problem?

@onlyjob
Copy link
Author

onlyjob commented Apr 8, 2016

Yes, though I'd recommend Debian...

@awalterschulze
Copy link
Member

Ha ha fair enough :)

@awalterschulze
Copy link
Member

Fixed: 4f262e4
Ok wow that was just a bug!
length (2) * pointer size (8) = size of array of 2 uint64s (16)
So this would have broken for any other type of array.

@awalterschulze
Copy link
Member

Do you want me to add another tag or something?

@onlyjob
Copy link
Author

onlyjob commented Apr 16, 2016

Thank you. I've applied 4f262e4 on top of v0.2 and it is much better: armel and i386 are fixed but armhf FTBFS (ppc64 did not build yet).

Could you have a look please?

@awalterschulze
Copy link
Member

Ok so it looks like TestInt32Int64Compatibility breaks when the unsafemarshaler or unsafeunmarshaler is used.

@awalterschulze
Copy link
Member

Any ideas on how I can reproduce this?

@onlyjob
Copy link
Author

onlyjob commented Apr 16, 2016

Any ideas on how I can reproduce this?

I don't know how, sorry...

@awalterschulze
Copy link
Member

Ok cool, just checking :)

@awalterschulze
Copy link
Member

I'll take another look at this asap.
I am just a little busy at the moment.
Thanks again for reporting.

@onlyjob
Copy link
Author

onlyjob commented Apr 16, 2016

Thanks. As for armhf I know it is possible to run it in qemu through virt-manager but I can't recall the details... It may be tricky to set-up VM without bootable images...

@awalterschulze
Copy link
Member

I think I'll first try to spot the mistake and then try to fix it on a branch which I can then ask you to test?

@onlyjob
Copy link
Author

onlyjob commented Apr 16, 2016

Let's do that. :)

@awalterschulze
Copy link
Member

Ok so its not just the TestInt32Int64Compatibility that breaks.
The line numbers where they are breaking don't make any sense.
Some of the stack traces are broken, since they are calling methods that aren't applicable.
Maybe the source is changed on the 32bit processor during regeneration.
Is it possible to get a copy of the regenerated source on that test runner?

@awalterschulze
Copy link
Member

Is it possible to check the processor with some go call and then simply make sure than unsafe marshalers and unmarshalers are not generated for this processor?

@onlyjob
Copy link
Author

onlyjob commented Apr 18, 2016

I don't have access to build server but I may be able to revive old armhf VM if I'll manage to update it to current "unstable"... It'll take time though...

@awalterschulze
Copy link
Member

Yes all this is taking a lot of time, but the previous one was worth it, since it found a real bug.
We won't know until we know though :)

This does not mean that this one will find a real bug though.
The output looks really weird.

I don't there is a way to tell whether this is worth the effort.
Who uses this type of processor?

@onlyjob
Copy link
Author

onlyjob commented Apr 18, 2016

Who uses this type of processor?

I don't know for sure but I think it is used in many mini-boards, robots, embedded devices, appliances, thin clients and small/portable computers like Raspberry Pi.

@awalterschulze
Copy link
Member

This could be useful to help and detect the architecture.
https://golang.org/src/runtime/internal/sys/zgoarch_ppc64.go

I could then revert unsafe code generation to normal code generation, giving exactly the same behaviour, just a little bit slower.
Maybe that could work?

@awalterschulze
Copy link
Member

https://en.wikipedia.org/wiki/Ppc64
Aha ok so its big endian instead of little endian.
Hmmm.

@onlyjob
Copy link
Author

onlyjob commented Apr 19, 2016

ppc64 is big-endian; armhf is little-endian.

@awalterschulze
Copy link
Member

Aaah, sorry I got confused :( my bad

@awalterschulze
Copy link
Member

I think the solution here is to say unsafe_marshaler and unsafe_unmarshaler are not supported for the armhf architecture and then to skip those tests when the architecture is detected.
What do you think?

@onlyjob
Copy link
Author

onlyjob commented Apr 20, 2016

That'll fix FTBFS. I don't really have my own preference regarding solution to this problem... Thanks.

@awalterschulze
Copy link
Member

Do you know which one of these constants would refer to armhf?
https://golang.org/src/runtime/internal/sys/zgoarch_arm.go

@awalterschulze
Copy link
Member

Really sorry for the super long delay :(

@mwhudson
Copy link

I think the failures are caused by unaligned accesses. I guess that means that the unsafe stuff is not supported on armhf. If you want to skip these tests on arm, just check for runtime.GOARCH=="arm". (I'm not sure how to only skip when GOARM=6 or higher, which I think is the issue here).

@mwhudson
Copy link

I guess I don't entirely understand how this package is used (hey, I just package things!) but it also presumably would be a good idea to avoid the use of unsafe_{un,}marhsaler at all on arm. I guess the generated code could use build tags to use the regular {un,}marshaler code on arm?

@awalterschulze
Copy link
Member

The code generator could also check the GOARCH and decide to generate the non unsafe code.

@mwhudson
Copy link

But are you sure that the code will be run on the same system it is generated on? (see comment about not knowing how this is used) If so, then sure, that works.

@awalterschulze
Copy link
Member

Thats why I would rather just skip the test, since I cannot know.
I will also document that code that is generated with unsafe should not be used on armhf.

I don't want to skip all arm processors, right?
Is GOARM another environment variable that I can check?

@mwhudson
Copy link

Thats why I would rather just skip the test, since I cannot know.

The point I am trying and apparently failing to make is that it seems to me like it ought to be possible to generate code that is safe on arm, by using build tags in the generated source.

I will also document that code that is generated with unsafe should not be used on armhf.

That might be OK.

I don't want to skip all arm processors, right?

I really (really!) doubt the unsafe unmarshaling code is faster when GOARM=5. So I think skipping on all ARM processors would be fine.

Is GOARM another environment variable that I can check?

It is, but I'm not sure that it's worth it, and there's no guarantee that the value at runtime matches the value at compile time.

@awalterschulze
Copy link
Member

I cannot know that the computer generating the code is the computer running the code. Someone might just be pulling in a dependency.

I can expect that someone will run the tests on the architecture they are compiling it on.
So maybe the tests should fail?

unsafe is faster for certain cases.
And if someone is using unsafe they want the extra speed.

@onlyjob
Copy link
Author

onlyjob commented Jun 18, 2016

Any news here?

@awalterschulze
Copy link
Member

I think I am kind of stuck.

@awalterschulze awalterschulze changed the title Multiple test failures on 32-bit (and Big Endian) architectures Multiple test failures on 32-bit (and Big Endian) architectures - for unsafemarshaler and unsafeunmarshaler Aug 21, 2016
@awalterschulze awalterschulze changed the title Multiple test failures on 32-bit (and Big Endian) architectures - for unsafemarshaler and unsafeunmarshaler customtype test failures on 32-bit (and Big Endian) architectures Aug 27, 2016
@awalterschulze
Copy link
Member

Duplicate of #195

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants