AArch64 support #550
Comments
clivem
commented
Mar 1, 2016
kernel support!
popcornmix
Mar 1, 2016
Contributor
This isn't going to happen from us any time soon. A 64-bit kernel is not trivial (and could be produced by community).
deborah-c
Mar 1, 2016
Could it be produced by community? I think it might well need changes to the VC firmware to correspond, as interface structures would potentially change shape
popcornmix
Mar 1, 2016
Contributor
The kernel could be. Depends on the implementation if the interface to VC needs to change. Forcing 32-bit pointers in interface to VC would be a sensible solution that wouldn't need a VC side change.
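A minimal sketch of the "force 32-bit pointers in the interface" idea, not the actual VCHIQ or MMAL message layout: the struct and field names below are hypothetical. A message that embeds a native pointer changes size between a 32-bit and a 64-bit kernel, whereas one built only from fixed-width fields keeps the same shape on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message that embeds a native pointer: its size and field
 * offsets depend on whether the kernel is built for 32 or 64 bits. */
struct msg_native {
    uint32_t length;
    void    *buffer;
};

/* Hypothetical wire format using fixed-width fields only: the same
 * 8-byte layout on either kernel, so the peer sees no change. */
struct msg_wire {
    uint32_t length;
    uint32_t buffer_handle;   /* 32-bit bus address or handle, not a pointer */
};

/* Fail the build if the wire layout ever drifts (C11). */
static_assert(sizeof(struct msg_wire) == 8, "wire struct must stay 8 bytes");
static_assert(offsetof(struct msg_wire, buffer_handle) == 4, "handle at offset 4");

/* Size of the pointer-carrying variant on the current target:
 * typically 8 with 32-bit pointers and 16 with 64-bit pointers. */
size_t msg_native_size(void)
{
    return sizeof(struct msg_native);
}
```

The static assertions are the useful part: they turn an accidental layout change into a build failure rather than a silent firmware/kernel mismatch.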
pelwell
Mar 1, 2016
Contributor
MMAL is an awkward case, since it passes kernel pointers to the client and expects them to be echoed back - the space is 32-bits, so some form of compression or lookup table would be required.
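A rough sketch of the lookup-table option, assuming a fixed-size table and no concurrency; the function names and table size are hypothetical and this is not the MMAL driver's code. The 64-bit kernel pointer stays on the kernel side, and only a 32-bit handle is sent out and echoed back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HANDLE_SLOTS 64u   /* arbitrary size for the sketch */

static_assert(HANDLE_SLOTS < UINT32_MAX, "handles must fit in 32 bits");

/* Slot i holds the kernel pointer that handle (i + 1) stands for.
 * Handle 0 is reserved as "invalid". */
static void *handle_table[HANDLE_SLOTS];

/* Store ptr and return a 32-bit handle for it, or 0 if ptr is NULL
 * or the table is full. */
uint32_t handle_alloc(void *ptr)
{
    if (ptr == NULL)
        return 0;
    for (uint32_t i = 0; i < HANDLE_SLOTS; i++) {
        if (handle_table[i] == NULL) {
            handle_table[i] = ptr;
            return i + 1;
        }
    }
    return 0;
}

/* Translate a handle echoed back by the peer into the original pointer;
 * returns NULL for an unknown or out-of-range handle. */
void *handle_lookup(uint32_t handle)
{
    if (handle == 0 || handle > HANDLE_SLOTS)
        return NULL;
    return handle_table[handle - 1];
}

/* Release a handle once the reply has been processed. */
void handle_free(uint32_t handle)
{
    if (handle != 0 && handle <= HANDLE_SLOTS)
        handle_table[handle - 1] = NULL;
}
```

A real driver would need locking around the table, and might use the kernel's IDR allocator rather than a hand-rolled array. A side benefit of handles is that a corrupted or stale value coming back from the peer can be rejected instead of being dereferenced.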
6by9
Mar 1, 2016
MMAL is an awkward case, since it passes kernel pointers to the client and expects them to be echoed back - the space is 32-bits, so some form of compression or lookup table would be required.
Really? That doesn't sound right as kernel pointers have no meaning outside the kernel.
I'm happy to take a look if you'll email me details of the bit of concern.
pelwell
Mar 1, 2016
Contributor
Take a look here:
https://github.com/raspberrypi/linux/blob/rpi-4.1.y/drivers/media/platform/bcm2835/mmal-msg.h#L259
here:
https://github.com/raspberrypi/linux/blob/rpi-4.1.y/drivers/media/platform/bcm2835/mmal-vchiq.c#L424
and here:
https://github.com/raspberrypi/linux/blob/rpi-4.1.y/drivers/media/platform/bcm2835/mmal-vchiq.c#L510
6by9
Mar 1, 2016
Fair cop - not nice. The V4L2 driver is just copying the way MMAL did it.
How brave are we feeling? We could pull in the rpmsg MMAL service instead, however that loses the bulk transfer facility so may need a slight change to the client code.
Edit: Hang on, that is the V4L2 driver only, so all kernel side. It's expecting VC to echo back a kernel pointer, not userspace.
I do have some changes planned for V4L2 which may help here (GSH and DC are aware). I'll check in a moment, but does the MMAL interface to userland have this same nastiness?
6by9
Mar 1, 2016
:-( Userland also expects VC to preserve a kernel pointer https://github.com/raspberrypi/userland/blob/master/interface/mmal/vc/mmal_vc_msgs.h#L360
That is just VC and kernel side, so could be updated fairly easily, but would be an ABI change between firmware and kernel (or we need the firmware to try and handle multiple different versions of structure).
There's a couple of other pointers in structures passed to VC which would need attention too (eg https://github.com/raspberrypi/userland/blob/master/interface/mmal/vc/mmal_vc_msgs.h#L421)
Are the other services OK?
IL had some niggles with having to set OMX_SKIP64BIT due to structure padding mismatches, but how does ILCS shape up more generally? Something will still need to reduce kernel 64bit pointers to 32 bit physicals for VC.
VCSM? Mailbox services?
My memory is failing me - did we ever get a 64bit kernel running? All userspaces were certainly 32bit.
ncguk
Mar 1, 2016
Speaking from a position of zero knowledge - how much of the Debian arm64 kernel source can be used before you run into problems?
deborah-c
Mar 2, 2016
At Broadcom, the intention was 64 bit user land over 32 bit kernel, as a long term thing.
TheSin-
Mar 2, 2016
why not 64bit across? Debian has a 64bit kernel and dist for arm. I know the Pi specific stuff would still need to be done, but why were they planning 32bit kernel? This is a curiosity question.
grigorig
Mar 2, 2016
32 bit kernel w/ 64 bit userland? I wasn't aware that this is a possible combination. Seems like a strange idea.
6by9
Mar 2, 2016
At Broadcom, the intention was 64 bit user land over 32 bit kernel, as a long term thing.
I'd remembered it the other way up - 64-bit kernel, 32-bit userland (as that was the current state of Android). I couldn't remember if that work had actually happened - did we actually have A53s in a chip that was brought up?
pelwell
Mar 2, 2016
Contributor
64-bit user space with 32-bit kernel is not possible on ARMv8. The kernel (especially the task switching) needs to be able to access all register state used by user space, which wouldn't be possible if the kernel was in 32-bit mode. The ARMv8 architecture allows an AArch32->AArch64 transition as the result of an exception/interrupt, and AArch64->AArch32 on return from an exception; the reverse routes don't exist.
grigorig
Mar 2, 2016
Okay, good to see this clarified.
On x86, 64 bit kernels have some (small) performance advantages even if combined with 32 bit userspace. Maybe that's a possible motivation to get it working on Pi 3 as well.
pelwell
Mar 2, 2016
Contributor
There must be figures out there from all of those other A53-based SBCs comparing 32-bit vs 64-bit kernels - let's see some.
Ferroin
Mar 2, 2016
It's worth noting that the biggest reason on x86 for a performance increase is not the wider registers, but the fact that x86_64 has more general purpose registers available, which means on average you need fewer load/store operations to do the same calculation. AArch64 however has the same required registers as AArch32, so x86 is not really a good point of comparison for the performance difference.
While I don't personally have any figures, I can attest that there is a small but noticeable performance improvement for 64-bit vs 32-bit on both SPARC and PPC with recent kernels. I've not seen figures for any 64-bit ARM processors, but I would assume there will be a similar small but noticeable performance increase there as well as the differences between 32 and 64 bit modes on SPARC/PPC are relatively similar to those on ARM when compared to the changes on x86.
That said, I think the big thing that will really make the difference stand out is the fact that the in-kernel timekeeping structures are in the process of being converted from 32-bit to 64-bit to avoid the y2038 issue. Once that hits mainline, most 32-bit systems will likely show measurably lower performance as a result.
On a slightly separate note, I seem to recall hearing something about AArch64 natively supporting use of 32-bit pointers in otherwise 64-bit code (kind of like the X32 ABI on x86, just supported directly in hardware). If that is the case, then it might make handling the issues with pointer width a bit easier (and also result in overall better memory usage).
deborah-c
commented
Mar 2, 2016
Sorry, my bad: I've clearly misremembered!
grigorig
Mar 4, 2016
On a slightly separate note, I seem to recall hearing something about AArch64 natively supporting use of 32-bit pointers in otherwise 64-bit code (kind of like the X32 ABI on x86, just supported directly in hardware). If that is the case, then it might make handling the issues with pointer width a bit easier (and also result in overall better memory usage).
I think it's called AArch64-ILP32. I am not sure if it is a good idea to use such an unusual ABI. No regular AArch32 or AArch64 binaries will work without a costly multilib setup.
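For readers unfamiliar with the terminology, here is a small sketch that names the C data model of whatever target it is compiled for. The function name is hypothetical. ILP32 means int, long and pointers are all 32-bit (as on AArch32, AArch64-ILP32 and x32); LP64 means long and pointers are 64-bit (as on regular AArch64).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

static_assert(CHAR_BIT == 8, "the sketch assumes 8-bit bytes");

/* Name the C data model of the current target from its type widths. */
const char *data_model(void)
{
    const size_t int_bits  = sizeof(int) * CHAR_BIT;
    const size_t long_bits = sizeof(long) * CHAR_BIT;
    const size_t ptr_bits  = sizeof(void *) * CHAR_BIT;

    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return "ILP32";
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return "LP64";
    return "other";
}
```

Code that stores pointers in `int` or `unsigned int` happens to work under ILP32 and breaks under LP64, which is why the choice of ABI matters for the driver discussion above.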
Ferroin
Mar 4, 2016
We would need a multilib setup anyway for 32-bit support if we do a 64-bit version, otherwise we're actively breaking compatibility with existing systems. On the Pi, flash storage space is cheap and upgradeable, whereas RAM is not, and this situation is exactly the type of thing that such ABIs are designed for. On top of that, we don't need to worry about AArch64 compatibility: we have no established user base using it, and people are more likely to use stuff bundled with Raspbian (or whatever other distribution) or built locally than third-party proprietary code. The sole exception is the Oracle JDK, which isn't as critical as it was because we have much better performance now and IcedTea should run fine (and there's no hardware acceleration for Java on newer processors anyway, so using Jazelle doesn't really provide any performance improvement). Such compatibility would be nice, but should by no means be mandatory.
The big deciding factor should really be whether we can support all three ABIs at the same time (I know of no distribution on x86 that currently supports all three options there (32, 64, and x32), even though the kernel fully supports having all three operating modes on the same system), and whether the processor itself supports it (I think it's optional, but I'm not sure; I've never had the time to read the ARMv8 ABI spec).
Aside from that, my point was more that using that ABI in the kernel may allow us to avoid having to deal with pointer width issues in the kernel drivers. I'm not certain, however, that the mainline kernel fully supports it yet.
ED6E0F17
Mar 5, 2016
(Upstream, not rpi-specific) Kernel ILP32 support is not fully baked, but someone is putting a lot of effort into it:
grigorig
Mar 5, 2016
I'm not very convinced that going for AArch64-ILP32 is a good idea. Raspbian is stuck with an unusual ARMv6 hard-fp architecture/ABI, but it was necessary given the BCM2835 SoC. Now we have a chance of finally switching to a standard ABI, so let's do that instead of going for some questionable new ABI that doesn't really have much support upstream.
Regarding multiarch, having to support fewer architectures is always a good thing. AArch64-ILP32 would add a third architecture into the mix. And storage might be cheap, but it's not free either! Also, multiarch can actually increase RAM usage because shared libraries can't be shared if processes of multiple architectures are running at the same time. This can be a pretty big deal if large frameworks like Qt are used.
niklas88
Mar 5, 2016
I wonder how big the case is for binary compatibility anyway? Most Raspberry Pi users that actually have non-repository software probably either use scripting languages like Python or have the source and shouldn't have a problem combining a system upgrade with recompiling their code. Also interestingly Go 32bit ARM executables would only need a 32bit glibc besides there being ARM64 support. So that basically leaves the people working with C, C++ without access to source code while still having a desire to upgrade.
On the other hand, many people likely do want to port code to ARM64 and would greatly benefit from the Raspberry Pi as an inexpensive ARM64 platform. So yeah, I really don't see the Raspberry Pi as depending on binary compatibility.
turnip86
Mar 7, 2016
For the love of all things digital, please provide an AArch64 kernel build. Debian and Arch already have arm64 ports, and a large number of Raspberry Pi owners are using those distros already, one of the motivations being armv7 support on Pi 2. There are significant performance increases - 15% to 30% - in running AArch64 code versus AArch32 on Cortex A53:
http://www.cnx-software.com/2016/03/01/64-bit-arm-aarch64-instructions-boost-performance-by-15-to-30-compared-to-32-bit-arm-aarch32-instructions/
(pelwell, Ferroin: was this what you were looking for?)
And this does not take into account the benefits of AArch32 compared to ARMv7, like load-acquire/store-release, new VFP float and SIMD instructions, and the cryptography extensions. https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf (page 106)
One group of users that will directly benefit from this are people who use the Pi for media and emulation. OpenELEC, OSMC and RetroPie all have separate armv6 and armv7 releases specifically to maximize performance.
Would any Raspberry-specific userland code need to be patched for this?
Ferroin
Mar 7, 2016
@grigorig The big things that made me think about it were:
- AArch64-ILP32 is intended for memory constrained systems which will never need a 64-bit address space (VC4 limits us to 4G RAM, which fits this perfectly)
- There were multiple comments made about certain components only using 32-bit pointers, and thus potentially needing significant work to handle properly from a regular AArch64 kernel.
I was advocating it less because I want to deal with it than because I thought it might help as a starting point.
@turnip86 There would likely be some significant code changes needed. From what I understand based on discussion both here and elsewhere, some of the hardware components only deal in 32-bit pointers, and handling that sanely will take some work, not only in the vc binaries, but likely also in most of the third-party stuff that uses hardware acceleration.
MrTomasz
Mar 9, 2016
Maybe let's first try running a proper kernel in AArch64 mode?
I've already done a bunch of work to try to boot it, but still can't see the kernel booting on UART... as I mentioned on the forums, it's not a simple "make ARCH=arm64 defconfig Image" shot...
Anyone working on 64bit kernel as well?
TheSin-
Mar 9, 2016
I currently am, but nowhere near ready for a test boot yet. And my Pi3 doesn't arrive for weeks yet, sadly.
MrTomasz
commented
Mar 9, 2016
@TheSin-
TheSin-
Mar 10, 2016
Once I get a full build, sure, but I'm sure I won't be the first or fastest source; there are people here much stronger at this stuff than me. I'm just using the Debian build system with cross compiling ATM, and I haven't made it very far because I'm still messing with the defines for the .config build.
Not to mention I'm still working with 4.1, which Debian no longer supports. I've been thinking about jumping to 4.4, but Debian testing is only on 4.3, so lots to decide still. And I have no idea how stable the 4.3 and/or 4.4 branches are here. I assume everything in the 4.1 branch gets backported to the other branches, but I haven't looked into it. Though I'm sure a newer kernel would be easier to work with for arm64.
TheSin-
Mar 10, 2016
okay so with the 4.1 tree and the debian build system I've finally got config and such working I believe, but now I'm at my first VC issue, this is where things are gonna get icky for me anyhow as I assume we are going to have to convert everything to force 32bit integers.
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:462:11: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
.write = vc_cma_proc_write,
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:462:11: note: (near initialization for ‘vc_cma_proc_fops.write’)
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c: In function ‘vc_cma_alloc_chunks’:
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:584:3: error: implicit declaration of function ‘dmac_flush_range’ [-Werror=implicit-function-declaration]
dmac_flush_range(chunk_addr, chunk_addr + chunk_size);
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:585:3: error: implicit declaration of function ‘outer_inv_range’ [-Werror=implicit-function-declaration]
outer_inv_range(__pa(chunk_addr), __pa(chunk_addr) +
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c: In function ‘cma_worker_proc’:
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:651:7: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
if ((unsigned int)msg >= VC_CMA_MSG_MAX) {
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:658:11: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
type = (int)msg;
^
In file included from /root/rpi/linux-4.1/linux/include/linux/printk.h:6:0,
from /root/rpi/linux-4.1/linux/include/linux/kernel.h:13,
from /root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:34:
/root/rpi/linux-4.1/linux/include/linux/kern_levels.h:4:18: error: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
/root/rpi/linux-4.1/linux/include/linux/kern_levels.h:10:18: note: in expansion of macro ‘KERN_SOH’
#define KERN_ERR KERN_SOH "3" /* error conditions */
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:64:9: note: in expansion of macro ‘KERN_ERR’
printk(KERN_ERR fmt "\n", ##__VA_ARGS__)
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:678:6: note: in expansion of macro ‘LOG_ERR’
LOG_ERR
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:732:12: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
(unsigned int)page);
^
/root/rpi/linux-4.1/linux/drivers/char/broadcom/vc_cma/vc_cma.c:64:30: note: in definition of macro ‘LOG_ERR’
printk(KERN_ERR fmt "\n", ##__VA_ARGS__)
^
Should I make a PR on the linux tree for the Kconfig changes? I'm mostly just reusing the 2709 stuff for now, since I don't have a 2710 to get more specific, I'd just like to be able to build to start, I know the VC stuff is going to take some time and planning but we all have to start someplace ;)
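Most of the errors in the log above fall into two groups: a pointer cast to a 32-bit integer type on a target with 64-bit pointers, and a "%d" format given an unsigned long argument. The following is a user-space sketch of width-safe equivalents, not a patch for vc_cma.c; the function names and MSG_MAX are hypothetical. In kernel code the usual equivalents would be a cast to unsigned long and "%p" for printing pointers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MSG_MAX 4u   /* hypothetical stand-in for a small message-code limit */

static_assert(sizeof(uintptr_t) >= sizeof(void *), "uintptr_t must hold a pointer");

/* A small message code passed through a pointer-typed argument.
 * Casting via uintptr_t keeps every bit on both 32- and 64-bit targets,
 * whereas (unsigned int)msg truncates, and warns, when pointers are 64-bit. */
int msg_is_code(const void *msg)
{
    return (uintptr_t)msg < MSG_MAX;
}

/* Format a pointer-sized integer with the matching conversion specifier
 * instead of "%d"; returns the number of characters written. */
int format_addr(char *buf, size_t len, const void *p)
{
    return snprintf(buf, len, "0x%" PRIxPTR, (uintptr_t)p);
}
```

The same source then builds without these warnings for both 32-bit and 64-bit targets, so it does not need to be forked per architecture.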
MrTomasz
Mar 10, 2016
You can first try disabling that kind of thing. I believe it should boot with a minimal subset of features...
Also remember to disable EFI in the config, otherwise you will create an incompatible kernel binary.
TheSin-
Mar 10, 2016
Yeah, I just wanted to try with the VC stuff to start and see how far I could make it. As for EFI, it's all set the same as in my rpi and rpi2 builds. Anyhow, trying it now with the VC stuff disabled.
TheSin-
Mar 11, 2016
okay disabled VC stuff to try and get a little further, I'm now stuck with
/tmp/ccyU2uoV.s: Assembler messages:
/tmp/ccyU2uoV.s:117: Error: missing immediate expression at operand 1 -- `dsb '
/tmp/ccyU2uoV.s:199: Error: missing immediate expression at operand 1 -- `dsb '
/tmp/ccyU2uoV.s:297: Error: missing immediate expression at operand 1 -- `dsb '
/root/rpi/linux-4.1/linux/scripts/Makefile.build:258: recipe for target 'drivers/dma/bcm2708-dmaengine.o' failed
make[7]: *** [drivers/dma/bcm2708-dmaengine.o] Error 1
Seems like pretty much all the RPI stuff is going to have issues of some sort. Asm is not my thing so I'm going to have to skip that I assume.
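A note on the assembler message: a bare `dsb` is accepted for 32-bit ARMv7, where the option may be omitted and defaults to `sy`, but the AArch64 assembler requires an explicit option, which is what "missing immediate expression" is complaining about. A hedged sketch of an architecture-aware barrier is below. The `full_barrier` and `publish` names are made up for this example, and kernel code would normally use the existing barrier macros rather than raw inline assembly.

```c
/* Full-system data synchronization barrier, spelled per architecture. */
static inline void full_barrier(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory"); /* option is mandatory on A64 */
#elif defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    __asm__ volatile("dsb" ::: "memory");    /* ARMv7: option defaults to SY */
#else
    __sync_synchronize();                    /* compiler builtin on other hosts */
#endif
}

/* Store a value, then wait for the store to complete before reading
 * it back. Only here to give the barrier a caller. */
static int publish(volatile int *slot, int value)
{
    *slot = value;
    full_barrier();
    return *slot;
}
```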
MrTomasz
Mar 11, 2016
I don't have my code with me right now, but if I remember correctly, you're building with CONFIG_DMA_BCM2708_LEGACY=y which, as I understand it, is wrong for BCM2709 (and 2710).
I did it this way:
config DMA_BCM2708_LEGACY
	bool "BCM2708 DMA legacy API support"
	depends on (DMA_BCM2708 && !ARCH_BCM2710)
	default y
TheSin-
Mar 11, 2016
Nice, I'll try that, thanks. I'm using 2709 as a base.
madscientist42
Mar 16, 2016
How're things coming along on this? Many are waiting with bated breath on the people trying right now (no sense in a bunch of duplicated efforts...)
popcornmix
Mar 16, 2016
Contributor
Very impressive progress here.
In the last week there has been:
- a 64-bit demo with UART output
- a 64-bit port of U-Boot
- a 64-bit upstream kernel (single core only, and no GPU features)
madscientist42
Mar 16, 2016
Epic. I'll need to pop over there to grab the work ongoing so that I can get a rough-cut for OE metadata there going. :D
swarren
Apr 9, 2016
Should this be closed now? Per the 3rd comment here, the Pi Foundation is going to leave 64-bit kernel support to the community which implies, and besides that aspect should probably be covered by a bug against the kernel git not the firmware git. The firmware does now support 64-bit booting, and any remaining issues re: that feature are covered by issue #579.
Ruffio
Jun 29, 2016
Should this be closed?
xcvista
Jul 23, 2016
I wonder if this method can solve this 32-bit pointer issue:
- If we are talking about physical addresses, since the Raspberry Pi hardware will not support anything over 4GB anytime soon, we can safely crop off the high bits and pass to the hardware.
- If we are talking about virtual addresses, I think the "canonical address" concept from amd64 can be borrowed: all high 32 bits of a 64-bit virtual address have to be the same as bit 31, and any virtual memory address out of that range results in SIGSEGV. In other words, limit the virtual address space to 0x0000000000000000-0x000000007fffffff and 0xffffffff80000000-0xffffffffffffffff.
This means that when a pointer is passed to the hardware the top 32 bits can be safely cropped off, and when passed back it can be safely sign extended into a 64-bit canonical address.
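The crop-and-sign-extend round trip described here can be sketched in a few lines of C. This is only an illustration of the arithmetic under the address ranges given above; the function names are invented for the example and nothing here is taken from the VC driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63..31 of addr are all equal, i.e. the address lies in
 * 0x0000000000000000-0x000000007fffffff or
 * 0xffffffff80000000-0xffffffffffffffff. */
static bool addr_is_gpu_safe(uint64_t addr)
{
    uint64_t top = addr >> 31; /* the 33 high bits */

    return top == 0 || top == 0x1ffffffffULL;
}

/* Crop the upper 32 bits before handing the value to the hardware. */
static uint32_t addr_to_gpu(uint64_t addr)
{
    return (uint32_t)addr;
}

/* Rebuild the 64-bit address by replicating bit 31 into bits 63..32. */
static uint64_t addr_from_gpu(uint32_t cookie)
{
    if (cookie & 0x80000000u)
        return 0xffffffff00000000ULL | cookie;
    return cookie;
}
```

For any address that passes `addr_is_gpu_safe`, `addr_from_gpu(addr_to_gpu(addr))` gives back the original value; for any other address the round trip is lossy, which is why the range has to be enforced.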
popcornmix
Jul 23, 2016
Contributor
I assume this scheme wouldn't help Mongo DB which I believe maps the whole database file ( > 4GB) to virtual RAM. That's the only example I've seen reported as requiring a 64-bit address space to run.
But yes, if virtual and physical address spaces are limited to 32-bit then that would avoid the issue of pointers (e.g. userdata/cookies) being returned to applications from GPU callbacks. I'm sure some would argue that is not a fully 64-bit system (although with only 1GB of physical RAM the limitation is unlikely to affect many use cases).
xcvista
Jul 23, 2016
@popcornmix This is almost exactly what the x32 ABI for amd64 is - 32-bit pointers for an otherwise 64-bit system. I think this can be a stop-gap method between the 32-bit only and fully 64-bit kernel.
Another method would be to introduce one layer of indirection in the kernel. Whenever the userland passes a pointer to the GPU, it is caught by the kernel and put into a buffer, and a kernel pointer to the buffer is passed to the GPU instead. The kernel still has to keep itself inside the top-half "canonical address" range for this to work though, as pointers are still passed with their high bits cropped off. This can affect the efficiency of user-mode GPU calls but removes the 32-bit pointer length limit.
It seems to me that this pair fits well in the current Raspbian/Raspbian Lite release model. The first has a virtual memory size limit of 4GB but faster graphics, better suited to a desktop system; the latter has the full 64-bit virtual memory space but graphics can be atrociously slow, better suited to a headless server system.
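One common way to build such a layer of indirection is a handle table: instead of a cropped pointer, the GPU is given a small 32-bit index, and the kernel looks the real 64-bit pointer up when the value is echoed back. The sketch below is a simplified single-threaded illustration with invented names (`handle_table`, `handle_put`, `handle_take`) and an assumed fixed capacity; a real driver would need locking and would more likely build on an existing kernel allocator.

```c
#include <stddef.h>
#include <stdint.h>

#define HANDLE_SLOTS 64u /* assumed upper bound on in-flight messages */

struct handle_table {
    void *slot[HANDLE_SLOTS]; /* NULL means the slot is free */
};

/* Store ptr and return a non-zero 32-bit handle for it, or 0 if the
 * table is full or ptr is NULL. */
static uint32_t handle_put(struct handle_table *t, void *ptr)
{
    if (ptr == NULL)
        return 0;
    for (uint32_t i = 0; i < HANDLE_SLOTS; i++) {
        if (t->slot[i] == NULL) {
            t->slot[i] = ptr;
            return i + 1; /* 0 is reserved for "no handle" */
        }
    }
    return 0;
}

/* Translate a handle echoed back by the other side into the original
 * pointer and free the slot. Returns NULL for stale or bogus handles. */
static void *handle_take(struct handle_table *t, uint32_t h)
{
    void *ptr;

    if (h == 0 || h > HANDLE_SLOTS)
        return NULL;
    ptr = t->slot[h - 1];
    t->slot[h - 1] = NULL;
    return ptr;
}
```

The linear scan in `handle_put` is the "performance hit in the lookups" in miniature: it is cheap while few messages are in flight and gets worse as the table fills.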
popcornmix
Jul 23, 2016
Contributor
There is an option of using 32-bit pointers globally (as a compiler default), but that precludes using standard 64-bit debian packages, so is not a favoured option.
64-bit pointers that are forced (through some kernel virtual address limiting) to only have 32 significant bits is a possibility, but doesn't fix Mongo DB.
I think the layer of indirection in the kernel<->GPU interface is probably the best option, but there may be some performance hit in the lookups. Probably not critical in general as I suspect the number of messages awaiting a response from GPU will normally be low, but there may be some situations where it gets to be a problem.
Currently we haven't seen strong evidence (e.g. benchmarks) that show there will be a noticeable performance improvement when moving to 64-bit, so it's unlikely to become a default configuration for raspbian and hence not a very high priority. We'd certainly like to support it for users who are interested, so suggestions for good ways to solve it are welcome.
xcvista
Jul 23, 2016
@popcornmix Both the limited-pointer solution and the GPU-trapping solution allow the use of standard Debian packages, and the trade-off is virtual memory space versus graphics performance. I think this should be a choice for the user to make.
A 64-bit processor can handle SHA512 (as well as its friends SHA384, SHA512/224 and SHA512/256) much faster than a 32-bit one, as the internal state words, being 64 bits long, fit in registers natively. Also, AArch64 has more registers than AArch32, allowing for more aggressive optimizations.
cleverca22
Jul 24, 2016
would it be possible to do both?
only use half of the 64 bits for any app dealing with the gpu
but use the full 64 bits for non-gpu things like mongodb?
xcvista
Jul 25, 2016
@cleverca22 Then how do you tell them apart? What if a program that has already claimed a memory block outside the canonical range suddenly starts to call the GPU?
cleverca22
Jul 30, 2016
only thing i can think of there is a flag in the ELF headers that you set at compile time, to promise to never do GPU calls
though now that i think of it, you could also modify the userland, to just use mmap() to create a secondary heap in the lower 4gig of the userland?
xcvista
Jul 30, 2016
@cleverca22 There is a MAP_32BIT flag in mmap(2) for AMD64. Maybe we can implement this for AArch64? Usual malloc(3) does not have a virtual memory location promise (and can go over 2GB) but mmap(2) with MAP_32BIT guarantees a sub-2GB address range.
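For anyone who has not used the flag, this is roughly what it looks like on x86-64 Linux today. The sketch assumes a Linux host; `alloc_low` is a name invented for the example, and on platforms where `MAP_32BIT` is not defined (which currently includes AArch64) the code falls back to a plain anonymous mapping with no address guarantee.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map len bytes of zeroed anonymous memory. Where MAP_32BIT exists the
 * mapping is placed in the first 2 GB of the address space; elsewhere
 * the kernel picks any address. Returns NULL on failure. */
static void *alloc_low(size_t len)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p;

#ifdef MAP_32BIT
    flags |= MAP_32BIT;
#endif
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A buffer obtained this way is released with `munmap(p, len)` rather than `free`.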
cleverca22
Jul 31, 2016
and since you're dealing with relatively large buffers being shared to the GPU, mmap isn't really an overhead; malloc will often internally re-route to mmap when you request large blocks
xcvista
Aug 1, 2016
@cleverca22 @popcornmix So to round up: I think we can use a straight full 64-bit AArch64 kernel and implement the MAP_32BIT flag for mmap(2) with the same semantics as implemented on AMD64. This poses no performance penalty, as there is no pointer trapping involved and no reserved memory. The MAP_32BIT flag allows allocating (or mapping) memory with the highest 33 bits of a 64-bit pointer guaranteed to be zero over the entire allocation block, allowing GPU-facing code to allocate memory with pointers that are safe to crop short.
This means:
- The existing VC driver stack will still work after refactoring with 64-bit in mind,
- No change to VC code needed,
- The kernel must relocate itself to high-half GPU-safe memory 0xffffffff80000000-0xffffffffffffffff before making GPU calls,
- Existing VC-facing code must be modified to use mmap(2) with MAP_32BIT to allocate memory that would be passed to the VC,
- Optionally, implement a memory range check in VC kernel code to EBADM or SIGSEGV out calls with non-canonical addresses.
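The optional range check in the last bullet could look something like the following. This is a sketch under assumptions: `vc_prepare_addr` and `GPU_SAFE_LIMIT` are hypothetical names, only the low (user) half of the GPU-safe range is checked, and `-EFAULT` is used as the error code purely as a placeholder since the exact error to return is not settled above.

```c
#include <errno.h>
#include <stdint.h>

#define GPU_SAFE_LIMIT 0x80000000ULL /* 2 GB, the range MAP_32BIT guarantees on AMD64 */

/* Check that the whole buffer [addr, addr + len) lies below 2 GB and,
 * if so, produce the 32-bit value that would be handed to the VC.
 * The subtraction form avoids overflow in addr + len.
 * Returns 0 on success or -EFAULT if any byte falls outside the range. */
static int vc_prepare_addr(uint64_t addr, uint64_t len, uint32_t *out)
{
    if (addr >= GPU_SAFE_LIMIT || len > GPU_SAFE_LIMIT - addr)
        return -EFAULT;
    *out = (uint32_t)addr;
    return 0;
}
```

Checking the end of the buffer as well as its start matters because a block that begins just under the limit can still spill past it.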
xcvista
Aug 2, 2016
Another point of optimization with a 64-bit kernel with MAP_32BIT is code address optimization (CAO) together with address space layout randomization (ASLR).
CAO means that code segments of PIE executables are loaded outside the GPU-safe range. This requires an ASLR facility, so we might as well implement that too. It randomizes the address layout of both the kernel and the loaded PIE, reducing the likelihood of a stack or heap overflow attack working.
Gaunah
Jan 7, 2017
Is this issue still relevant? e.g. a chance to get AArch64 support?
madscientist42
Jan 8, 2017
I think it is... Raspbian (i.e. the Pi Foundation) is dragging its feet, but it seems that SuSE and Arch have managed to get there. (I got SuSE to boot on my Pi 3, but had varying hardware issues, i.e. it doesn't play nice with a PiTop monitor and doesn't seem to work right with a Logitech Unifying HID device. Arch is about to be evaluated here in a few moments.)
All things considered, there's little excuse for the Foundation to NOT embrace this as an option, since they're phasing out the Pi 2's SoC for the Pi 3's, with the Pi 2 now only differing in being sans WiFi and Bluetooth. It makes a HELL of a lot more sense to have two worlds- PiZero/Pi and then Pi2/3, with the old 2's being in the other 32-bit ARMv6/7 world and the rest being in the ARMv8 properly. You gain a LOT from being in AArch32, you gain even MORE in AArch64.
madscientist42
Jan 8, 2017
Now, SuSE side-stepped some of this- they used the UEFI "firmware" path. No telling what Arch did yet- I'll be seeing this in a bit and reporting back.
pelwell
Jan 8, 2017
Contributor
> It makes a HELL of a lot more sense to have two worlds- PiZero/Pi and then Pi2/3, with the old 2's being in the other 32-bit ARMv6/7 world and the rest being in the ARMv8 properly.
That sounds like three worlds.
> You gain a LOT from being in AArch32, you gain even MORE in AArch64.
Back in June, @popcornmix wrote "Currently we haven't seen strong evidence (e.g. benchmarks) that show there will be a noticeable performance improvement when moving to 64-bit", and that remains true today. If you have some compelling numbers then please share them with us.
madscientist42
Jan 8, 2017
That's because nobody's DONE a lot of benchmarks.
I'd opine that many said the PRECISELY same things about x86-64 when it came out. I know, I was one of the early adopters, getting access to one of the Clawhammer prototypes.
Most of the commentary talks about "increased" memory sizes and the like, and doesn't have a single clue what it's talking about, unfortunately. I'll probably get benchmarks for myself and see.
As for your numbers...
http://www.anandtech.com/show/7335/the-iphone-5s-review/4
30% is enough to bother with. 15%'s a bit shallow, but it's a gain all the same. Some things gain massive jumps of 200% over the 32-bits with a few things losing single-digit percentages, which could be merely implementation fails.
I'm one for not leaving things lying on the floor JUST to be able to boot one single image across the line (which you can't do anyhow...you've got PiZero/Pi and Pi2/3 images right now as it is.) Sorry, the argument there IS specious and invalid, based on just that alone.
Clear numbers have been said- and only a couple of months past when this was boldly (and quite incorrectly, I might add) said. Time to ditch the rubbish and think about what you can gain from it all.
madscientist42
Jan 8, 2017
As for Arch...it's up. Now I get to do my own analysis and benchmarking. The fact that SuSE saw fit to make this is a hint for most...they're not ones to "waste time" on things.
xcvista
Jan 9, 2017
@pelwell Two worlds: ARMv6 AArch32 version and ARMv8 AArch64 64-bit-only version.
I think that the MAP_32BIT hack I mentioned above is still relevant. The libc can be modified to use MAP_32BIT in malloc(3) by default (this shouldn't break compatibility) to make the migration easier.
|
Pi2 is ARMv7. |
xcvista
Jan 9, 2017
@pelwell There are two Pi 2 revisions, Rev 1.1 and Rev 1.2. Rev 1.1 shipped with the BCM2836, which is an ARMv7 processor. Rev 1.2 has been updated with the BCM2837, an ARMv8 AArch64 processor.
The ARMv6 version runs on Pi 0, 1, 1+, 2 Rev 1.1 and CM1, while the ARMv8 AArch64 version runs on Pi 2 Rev 1.2, 3 and CM3.
|
Have you tried booting a 2836-based Pi2 with a 2835 image? |
Ruffio
Jan 9, 2017
@pelwell Is this really the level of arguments when discussing 32 vs 64 bits kernel? Wordings? Shouldn't the arguments be about performance, compatibility, pros/cons, overall architecture and what the future brings?
Ferroin
Jan 9, 2017
He asked a perfectly valid question given what @xcvista proposed. If the suggested plan going forwards is to be running an ARMv6 kernel on Pi 0, 1, 1+, CM1 and 2r1.1 and an ARMv8 kernel on 2r1.2, 3, and CM3, then it needs to be determined how well a BCM2835 kernel (the ARMv6 one) works on a BCM2836 system (the Pi 2 r1.1 SoC).
|
@Ruffio Supporting multiple kernel configurations is a resource drain, which is why the 2 vs 3 question is important. |
xcvista
Jan 9, 2017
@Ferroin Currently the AArch32 image contains ARMv6 and v7 kernels at the same time. That is not being changed as ARMv6 userland still works with a v7 kernel. What I am talking about is a new image based on AArch64 which not only calls for a new kernel but also a new userland.
Ferroin
Jan 9, 2017
Your statement suggested (at least to both myself and @pelwell) eliminating the ARMv7 kernel from the mix, which if it would work reasonably well, would be a viable option on the kernel side because the Foundation doesn't want to support all that many kernels.
pelwell
Jan 9, 2017
Contributor
Precisely. And while we are prepared to selectively improve performance on some Pis, we are loath to slow down some Pis in order to speed up others, particularly where the older, slower models are adversely affected.
Ferroin
Jan 9, 2017
Slightly OT, but I'm curious how much difference there actually is between the ARMv6 and ARMv7 kernels beyond the generic ISA differences between v6 and v7. IOW, is there actually all that much maintenance burden from a coding perspective, or is it mostly just testing? Looking at the code with my admittedly somewhat lacking background regarding the drivers in question, it doesn't look like there's much actual difference between v6 and v7 code. Looking at the issue tracker, there appears to be very little other than the initial stuff that's expected when adding new code that's been an issue resulting from a difference between v6 and v7. Yes, it's not anywhere near as trivial to add v8 support (although from looking at Arch, it looks like most of the work other than the GPU drivers is already done), and yes this will break compiled code (although the percentage of stuff on the Pi that's compiled code and not scripted is probably pretty small), but you've got a community that's pretty willing to help with testing and debugging, and given that, I think this won't add anywhere near as much work as you seem to think once the initial work of porting the (arguably poorly written given that they're not 64-bit clean) drivers is complete.
pelwell
Jan 9, 2017
Contributor
There are differences in the ARM address map between BCM2835 and BCM2836 - it isn't just the ISA - but as far as ongoing development is concerned the overhead is primarily "just" one of building and testing the different variants.
grigorig commented Mar 1, 2016
Is there a chance of AArch64 builds of the userspace and kernel? What's missing to get this to work?