Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSE2 to NEON compatibility #1606

Draft
wants to merge 4 commits into
base: master
from
Draft

SSE2 to NEON compatibility #1606

wants to merge 4 commits into from

Conversation

@falkTX
Copy link
Contributor

falkTX commented Feb 27, 2020

Everything builds, but I did not test them yet. 馃儚
This patch-set is used in the KXStudio repositories armhf and arm64 builds.

falkTX added 4 commits Feb 27, 2020
Signed-off-by: falkTX <falktx@falktx.com>
Signed-off-by: falkTX <falktx@falktx.com>
Signed-off-by: falkTX <falktx@falktx.com>
Signed-off-by: falkTX <falktx@falktx.com>
@@ -691,7 +691,7 @@ __m128 TANH(__m128 in, __m128 drive)
return _mm_max_ps(_mm_min_ps(y, y_max), y_min);
}

#if !_M_X64
#if !_M_X64 && !defined(NEON_SSE)

This comment has been minimized.

Copy link
@baconpaul

baconpaul Feb 27, 2020

Collaborator

These SSE1 instructions aren't actually used anywhere now anyway - I should really remove them.

@@ -5,6 +5,8 @@ configurationXmlStart:
.incbin "../../resources/data/configuration.xml"
configurationXmlEnd:
.global configurationXmlNullTerminator
#ifndef __arm__

This comment has been minimized.

Copy link
@baconpaul

baconpaul Feb 27, 2020

Collaborator

We will need this to be null terminated somehow otherwise the read will fail and surge can crash in some situations.

@baconpaul

This comment has been minimized.

Copy link
Collaborator

baconpaul commented Feb 27, 2020

Tossed in a few comments. All looks pretty reasonable. Thank you for pushing these back upstream!

One other one is: sse2neon.h should be placed in libs/sse2neon with a clear license and provenance file (like we did with catch2) and not into src/common.

@falkTX

This comment has been minimized.

Copy link
Contributor Author

falkTX commented Feb 27, 2020

No problem.
I did this as a test first to see if it was possible. So you think eventually it could be merged?
The xml assembly stuff I have no idea how to fix or handle.

@baconpaul

This comment has been minimized.

Copy link
Collaborator

baconpaul commented Feb 27, 2020

Yeah I would have no problem merging this. In fact I would like to. Probably needs a bit of focus but not impossible at all.

And we are sort of increasing our risk with the head of the codebase for a bit now we have 1.6.6 shipped and are pushing towards 1.7 - so this is a good time to do it.

@hexdump0815

This comment has been minimized.

Copy link

hexdump0815 commented Mar 3, 2020

just for reference: i'm doing something similar (but more ugly i guess) on my armhf/aarch64 build of vcvrack - i use the more complete simde headers:
https://github.com/hexdump0815/vcvrack-dockerbuild-v1/blob/master/prepare.sh#L28
and then simply rename all the function calls accordingly in a brute force way (the ugly part):
https://github.com/hexdump0815/vcvrack-dockerbuild-v1/blob/master/simde-ify.sh
is this maybe what you ment with xml stuff?
https://github.com/hexdump0815/vcvrack-dockerbuild-v1/blob/master/surge-rack-surge.armv7l.patch
and on armhf i had to disable unaligned access to avoid crashes:
https://github.com/hexdump0815/vcvrack-dockerbuild-v1/blob/master/surge-rack.armv7l.patch

the resulting surge vcvrack plugin works on both armhf and aarch64

best wishes - hexdump

@falkTX

This comment has been minimized.

Copy link
Contributor Author

falkTX commented Mar 6, 2020

That is very helpful, thanks!
Going to see if the % operator works for my builds, I assume yes.

Any special reason to pick -mfpu=vfpv3-d16 over the usual -mfpu=neon-vfpv4 ?

@falkTX

This comment has been minimized.

Copy link
Contributor Author

falkTX commented Mar 6, 2020

Sadly I still get a crash in the build server.
So I will have to run the armhf in actual hardware now for real testing :/

@baconpaul

This comment has been minimized.

Copy link
Collaborator

baconpaul commented Mar 6, 2020

So all that is doing is null terminating the configuration xml memory block. Any other way to stick a zero at the end would work too

@falkTX

This comment has been minimized.

Copy link
Contributor Author

falkTX commented Mar 6, 2020

I expected the %object to do the trick, since it seems to work for @hexdump0815
But perhaps the issue is somewhere else, or the cloud builder environment has missing instructions.
I will have more details soon, need to find and install debian on my Pi. I am not in a rush though

@baconpaul

This comment has been minimized.

Copy link
Collaborator

baconpaul commented Mar 7, 2020

One other thought: linux and only linux reads that configuration from memory. on mac and windows we read it from resources. If you have a #if ARM you could modify SurgeStorage.cpp (you can clearly see where it gets it) and add an if dev to read from resources.

I think, in retrospect, it was a mistake to do linux this way but back when we all were doing it it seemed reasonable. Since we need so many other things from the shared resource directory though singling this one out on only linux is something we could reverse if it becomes a real problem

@hexdump0815

This comment has been minimized.

Copy link

hexdump0815 commented Mar 7, 2020

@falkTX - in case you would like to have a ready to go debian or ubuntu image for a raspberry-pi 3 - you might use those: https://github.com/hexdump0815/imagebuilder/releases/tag/200206-01 ... they are available in armhf and aarch64 versions :)

and here are some useful audio optimizaions for the rpi (they are for raspbian, but should be working on debian and ubuntu too): https://github.com/hexdump0815/vcvrack-dockerbuild-v1/blob/master/readme-raspbian.txt ... in that context maybe interesting too: https://github.com/hexdump0815/sonaremin/blob/master/files/extra-files/etc/rc.audio

good luck and best wishes - hexdump

@baconpaul

This comment has been minimized.

Copy link
Collaborator

baconpaul commented Mar 7, 2020

Is there a docker image or some such we can use to try a pi build?

@hexdump0815

This comment has been minimized.

Copy link

hexdump0815 commented Mar 7, 2020

@hexdump0815

This comment has been minimized.

Copy link

hexdump0815 commented Mar 29, 2020

@baconpaul @falkTX - if any of you should have any compiled builds for armhf and aarch64 ready and want me to test them, please let me know - i can easily do so on ubuntu 18.04 and debian buster on both armhf and aarch64 - just let me know ...

@baconpaul

This comment has been minimized.

Copy link
Collaborator

baconpaul commented Mar 29, 2020

Hi @hexdump0815 sorry I haven't actually been active on this - been working on skinning engine for a 1.7 release. Nothing coming from me.

@hexdump0815

This comment has been minimized.

Copy link

hexdump0815 commented Mar 29, 2020

@baconpaul - no problem, there is no hurry :)

@hexdump0815

This comment has been minimized.

Copy link

hexdump0815 commented Apr 1, 2020

ok - i gave the hacking approach i used for the surge-rack module for vcvrack a try to compile the surge plugin itself that way too and finally got it working on both armv7l and aarch64 ... this approach is really just a hack to get it compiled, but its at least a proof of concept that surge works on those platforms - i tested the vst's with reaper on both of them ... here are my notes and patches from this try: https://github.com/hexdump0815/surge-arm-build ... under releases one can find compiled versions ...

what is essential for armv7l is that it does not use neon and -mno-unaligned access - otherwise it will generate crashes due to bus errors - the drawback is that it uses way more cpu without neon ... aarch64 works perfectly fine with neon

@baconpaul

This comment has been minimized.

Copy link
Collaborator

baconpaul commented Apr 1, 2020

Wow simdeify.sh has quite a regex!!

This is cool. It looks like mostly flag changes. I wonder if we could coax premake, if an ARM variable is set, to generate those diffs without a patch. So basically have premake make a makefile which runs simdyfy and outputs the changed flags.

Thank you for sharing that. Neat stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can鈥檛 perform that action at this time.