[DISCUSS] hyperscan support ARM #197

bzhaoopenstack · 2019-11-07T16:25:04Z

Hi hyperscan team,

I'm a newbie to the hyperscan project, and I'm excited to have a conversation with you.

We plan to add ARM64 support to hyperscan. We will propose a series of PRs to make this happen, including hardware platform detection code, ARM NEON instruction set support, etc. We won't propose intrusive changes to existing code. The detailed design is still uncertain and only a draft. We hope the community can take part in the detailed feature design from the beginning.

But before the work begins, we want to know the community's attitude toward this. We would appreciate your feedback.

Thanks very much.

bzhaoopenstack · 2019-11-08T01:49:56Z

Hi team, @xiangwang1 , @fatchanghao , @Nor7th
Are you around? Please see our proposal. Any ideas are welcome.

Thanks

xiangwang1 · 2019-11-08T03:20:00Z

Hyperscan is currently designed and optimized specifically for Intel CPUs, including the selection of algorithms and the utilization of SIMD instructions. I think there could be a performance hit if the work is only about porting x86 instructions to their ARM NEON counterparts.

From Intel's perspective, we are not in a position to port Hyperscan to ARM.

We may consider it only if there is common interest in the community, with developers other than us pushing this forward and proving it a viable path to take.

bzhaoopenstack · 2019-11-08T07:52:48Z

Hi @xiangwang1 ,
Thanks for the reply. ;-)

I think it would be good if you could consider it. I will explain more in response to your feedback.

First, our plan is to introduce entirely new support for ARM. I mean we won't port the existing x86 instructions to ARM. We plan to rewrite the algorithms and SIMD utilization based on ARM NEON, make some performance improvements, and split off a new "branch" to support ARM. We will also introduce hardware platform detection so that hyperscan installs and runs according to the underlying device (x86 or ARM). So the plan is to introduce another code branch to hyperscan with no effect on the existing x86 code or functionality, because on ARM it will call NEON instructions and the rewritten algorithms. We just want to introduce a new platform and make hyperscan run on ARM with high performance.
From Intel's side, I really understand that. But this might be a good chance to extend the applications of hyperscan. Let's make hyperscan better. ;-)
I found several issues [1][2][3] requesting that hyperscan work on ARM, so I think users and developers have been asking for this for a while. That's exactly what we want to do, if it can be done. It would be great if those folks could chime in here. ;-)

So we need the community to help review the plan (design) and suggest how to get this done, including small modifications to the platform detection script and anything else we haven't thought of yet.

Thanks.

[1] #187
[2] #159
[3] #34
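
As an illustration of the "hardware platform judgement" idea in the comment above, a compile-time check along these lines could select a per-architecture backend without touching the x86 path. This is only a minimal sketch: the `SKETCH_*` macros, the function names, and the backend labels are hypothetical and are not taken from the Hyperscan source, and a real build would likely make this decision in its CMake scripts as well.

```c
#include <string.h>

/* Hypothetical identifiers for illustration; not from the Hyperscan source. */
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#  define SKETCH_ARCH_X86 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#  define SKETCH_ARCH_ARM64 1
#else
#  define SKETCH_ARCH_GENERIC 1
#endif

/* Report which backend this translation unit was compiled for. */
const char *sketch_backend_name(void) {
#if defined(SKETCH_ARCH_X86)
    return "x86-sse";        /* existing x86 code path, unchanged */
#elif defined(SKETCH_ARCH_ARM64)
    return "arm64-neon";     /* separate NEON code path */
#else
    return "generic-scalar"; /* portable fallback */
#endif
}

/* Nonzero when a SIMD backend (x86 or ARM64) was selected. */
int sketch_backend_is_simd(void) {
    return strcmp(sketch_backend_name(), "generic-scalar") != 0;
}
```

Because the selection happens in the preprocessor, each architecture only ever compiles its own branch, which matches the stated goal of leaving existing x86 code unaffected.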

bzhaoopenstack · 2019-11-11T02:12:12Z

Hi @xiangwang1 ,
What do you think about it? We would welcome your suggestions. If this looks valuable, we can share a more detailed plan with you, and it would be great if you could help review it.

Thanks

codecat007 · 2019-11-12T05:37:56Z

We have successfully ported hyperscan to the ARM platform (also MIPS; no SIMD instruction support; the performance improvement is not high), and it turned out not to be difficult. But we didn't do much optimization work; you guys can go deeper.

zzqcn · 2019-11-12T09:01:09Z

@codecat007 Hi, I'm interested in porting hyperscan and have made some attempts over the past few days.
Could you share some details about your port?

bzhaoopenstack · 2019-11-13T03:39:23Z

Thanks for your interest here. ;-)

codecat007 · 2019-11-15T00:34:10Z

@zzqcn You can use a SIMD library (such as simde: https://github.com/nemequ/simde) to implement a middle layer for SIMD function calls.

zzqcn · 2019-11-15T02:39:26Z

@codecat007 Thanks for your reply. I converted SSE to NEON intrinsics via sse2neon, but hyperscan compiled this way has runtime bugs on ARM. I will try simde instead.
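
To make the "middle layer for SIMD function calls" idea concrete, here is a minimal hand-rolled sketch of such a layer. It is not the SIMDe or sse2neon API: the `v128` type and the `v128_*` / `sketch_contains` names are hypothetical, chosen only for illustration. The same four operations map to SSE2 on x86, to NEON on AArch64, and to plain C elsewhere, so the calling code is written once.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical abstraction layer; names are illustrative, not from any library. */
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
typedef __m128i v128;
static inline v128 v128_loadu(const void *p) { return _mm_loadu_si128((const __m128i *)p); }
static inline v128 v128_set1_8(uint8_t c) { return _mm_set1_epi8((char)c); }
static inline v128 v128_eq8(v128 a, v128 b) { return _mm_cmpeq_epi8(a, b); }
static inline int v128_any(v128 m) { return _mm_movemask_epi8(m) != 0; }
#elif defined(__aarch64__)
#include <arm_neon.h>
typedef uint8x16_t v128;
static inline v128 v128_loadu(const void *p) { return vld1q_u8((const uint8_t *)p); }
static inline v128 v128_set1_8(uint8_t c) { return vdupq_n_u8(c); }
static inline v128 v128_eq8(v128 a, v128 b) { return vceqq_u8(a, b); }
static inline int v128_any(v128 m) { return vmaxvq_u8(m) != 0; }
#else
/* Scalar fallback with the same byte-wise semantics. */
typedef struct { uint8_t b[16]; } v128;
static inline v128 v128_loadu(const void *p) { v128 v; memcpy(v.b, p, 16); return v; }
static inline v128 v128_set1_8(uint8_t c) { v128 v; memset(v.b, c, 16); return v; }
static inline v128 v128_eq8(v128 a, v128 b) {
    v128 r;
    for (int i = 0; i < 16; i++) r.b[i] = (a.b[i] == b.b[i]) ? 0xFF : 0x00;
    return r;
}
static inline int v128_any(v128 m) {
    for (int i = 0; i < 16; i++) if (m.b[i]) return 1;
    return 0;
}
#endif

/* Architecture-neutral caller: returns 1 if byte c occurs in buf[0..len). */
int sketch_contains(const uint8_t *buf, size_t len, uint8_t c) {
    const v128 needle = v128_set1_8(c);
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {          /* 16 bytes per step */
        if (v128_any(v128_eq8(v128_loadu(buf + i), needle))) return 1;
    }
    for (; i < len; i++) {                    /* scalar tail */
        if (buf[i] == c) return 1;
    }
    return 0;
}
```

A library such as SIMDe takes a different route, emulating the x86 intrinsic names themselves, but the layering principle is similar: callers stay unchanged while the per-architecture mapping lives in one place.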

bzhaoopenstack · 2019-11-19T02:44:40Z

I think we need to wait for the maintainers to consider this and reply before taking the next steps. I hope a hyperscan team member can give us some advice. Maybe @xiangwang1

zzqcn · 2019-11-20T06:50:33Z

@codecat007 With sse2neon and simde's help, I ported hyperscan 4.6.0 to ARM. It's basically working, with some bugs.

I built and ran the unit tests (just those in unit/hyperscan/): 3476 test cases PASSED and 169 FAILED. The failed cases: hyperscan_test_result.txt

Did you run any tests on your port? Thanks for any suggestions.

zzqcn · 2019-11-21T03:22:51Z

I have ported hyperscan v5.2.0 to ARMv7 with simde. All 3746 unit test cases PASSED (run and test with qemu-arm).

My fork: https://github.com/zzqcn/hyperscan, and my commit: zzqcn@249178a

I don't know much about SSE, NEON, etc., so any suggestions or code review would be helpful.

bzhaoopenstack · 2020-01-16T01:10:37Z

Hi guys, it seems there is a post (#212) in hyperscan that aims to support both x86 and aarch64.

@zzqcn @codecat007 .

@xiangwang1 @fatchanghao @Nor7th
I hope the hyperscan team can review it and leave some comments.

bzhaoopenstack · 2020-01-16T01:12:27Z

Cc'ing the author to join this discussion: @tqltech

daveMmd · 2020-05-12T08:47:42Z

@zzqcn Hi, I'm wondering about the performance of the ported hyperscan. Does the added middle layer (simde) have a heavy impact on performance? I look forward to your reply, thanks.

mr-c · 2020-05-12T08:53:06Z

@daveMmd On native x86 processors with AVX (for example) the usage of the SIMD Everywhere header-only library is optimized out by the compiler into the existing direct calls to the AVX intrinsics. On non-X86 platforms, the SIMD Everywhere headers enable the code to run where it wouldn't before, and often using the SIMD intrinsics of that non-X86 processor.

daveMmd · 2020-05-16T01:56:28Z

Thanks! Although the code can run with SIMD intrinsics on both x86 and non-x86 processors, the data structures are tailored for x86 processors. So I think there may be a performance penalty on non-x86 processors. I just wonder how big the penalty is.

mr-c · 2020-05-16T09:35:36Z

> the data structure is tailored for x86 processor. Thus I think there can be performance penalty on non-x86 processors. I just wonder how big the penalty is.

Good point, "SIMD Everywhere" doesn't prevent the addition of architecture specific variations later, but means you get a functional version today, which is nice for applications that have a hard dependency on hyperscan.

tqltech · 2020-05-19T06:48:44Z

@mr-c @daveMmd We have modified hyperscan for armv8 processors. We improved performance by using NEON instructions, inline assembly, data alignment, instruction alignment, memory data prefetching, static branch prediction, code structure optimization, etc. The optimized hyperscan reaches about 80% of the x86 performance. The repository: https://github.com/kunpengcompute/hyperscan
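
For readers unfamiliar with two of the techniques listed above, memory data prefetching and static branch prediction, the following is a generic sketch of what they can look like in C with GCC or Clang builtins. It is not code from the kunpengcompute repository: the `SKETCH_*` macros, the assumed 64-byte cache line, the prefetch distance, and `sketch_count_byte` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macros; they degrade to no-ops on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#  define SKETCH_PREFETCH(p) __builtin_prefetch((p), 0, 3) /* read, keep in cache */
#else
#  define SKETCH_UNLIKELY(x) (x)
#  define SKETCH_PREFETCH(p) ((void)(p))
#endif

#define SKETCH_CACHE_LINE 64                  /* assumed cache line size */
#define SKETCH_AHEAD (4 * SKETCH_CACHE_LINE)  /* assumed prefetch distance */

/* Count occurrences of c in buf[0..len), hinting that matches are rare. */
size_t sketch_count_byte(const uint8_t *buf, size_t len, uint8_t c) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        /* Once per cache line, request data a few lines ahead (bounds-checked). */
        if ((i % SKETCH_CACHE_LINE) == 0 && i + SKETCH_AHEAD < len) {
            SKETCH_PREFETCH(buf + i + SKETCH_AHEAD);
        }
        /* Static branch hint: the compiler lays out the no-match path as fall-through. */
        if (SKETCH_UNLIKELY(buf[i] == c)) {
            count++;
        }
    }
    return count;
}
```

Whether such hints help depends heavily on the workload and the core; they would need to be measured, for example with hsbench, rather than assumed.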

daveMmd · 2020-05-19T16:12:26Z

@tqltech Awesome! That must have been a lot of work!

hulksmaaash · 2020-06-08T22:28:58Z

@tqltech I am curious to know whether you have measured your optimizations against what was done in the Marvell port for aarch64:
https://github.com/MarvellEmbeddedProcessors/hyperscan

tqltech · 2020-06-18T12:29:29Z

@hulksmaaash I used the performance test tool hsbench that comes with hyperscan to measure the optimization results.

Yikun · 2020-08-25T06:10:20Z

Hi team, @xiangwang1 , @fatchanghao , @Nor7th

For now, does the team have any plan on aarch64 support of hyperscan upstream?

hulksmaaash · 2020-08-25T18:48:24Z

@Yikun I believe the answer is still the same from last year (#197 (comment)). I am curious, what is your interest in having aarch64 support for hyperscan? If there is enough external interest then I may be able to gather internal engineering support to justify the work and on-going maintenance.

eliaslevy · 2020-08-25T20:54:39Z

For our use case, we use Hyperscan on Linux, macOS, and Windows. With Macs beginning the transition to Apple silicon based on ARM, we are obviously interested in support for the architecture so we can continue our cross platform work using Hyperscan.

It is understandable why Intel may not be interested in supporting the architecture, but I would counter that adding support will ensure that the project continues to be a viable option for people that must support multiple platforms, instead of having them look for alternatives that they can use across the platforms they must support.

Yikun · 2020-08-26T06:59:00Z

@hulksmaaash Thanks for the reply.

I got some information from our product team: some people are using Hyperscan on Linux on Kunpeng servers (which are aarch64-based). We also know of some use cases on Amazon EC2 A1 instances.

So we think the aarch64 support is really necessary.

hulksmaaash · 2020-09-17T14:12:48Z

FYI, there is now an Arm-sponsored effort (see below) to port and optimize hyperscan for Arm. The work has only just begun, but the end goal is to work with the maintainers to have the updates merged, and then to continue providing support for the aarch64 architecture as both the project and the architecture progress.

https://github.com/VectorCamp/hyperscan

hulksmaaash · 2020-09-24T14:31:15Z

For those interested, the first PR has been submitted. It separates the architecture-specific code to pave the way for adding aarch64 support, and any other future architecture-specific code.

#272

hulksmaaash · 2020-12-07T20:29:34Z

FYI - aarch64 port has been completed here:

https://github.com/VectorCamp/hyperscan/tree/feature/add-arm-support

with further NEON SIMD optimizations to come. PR will be submitted soon.

hulksmaaash · 2020-12-08T14:36:20Z

PR for ARMv8 support submitted here: #287

hulksmaaash · 2020-12-16T16:15:16Z

FYI, we have been informed that the project maintainers have

"no plan to give multi-arch support for Hyperscan"

and will

"keep Hyperscan as x86 only and deliver continuous designs and optimizations based on instruction-set from current and future Intel CPUs"

We will consider the best path forward to ensure Hyperscan will work for users who desire support for non-x86 architectures, and update those who express interest.

evrial · 2021-01-21T20:28:45Z

Oh well, what a surprise. Progress train moves forward, RIP Intel.

edsiper · 2022-01-17T19:37:01Z

I was hoping this would get ported to ARM too :/

hulksmaaash · 2022-01-18T14:51:02Z

> I was hoping this would get ported to ARM too :/

It did ;-) https://github.com/VectorCamp/vectorscan

hulksmaaash mentioned this issue Sep 22, 2020

on ARMv8 cpu No intrinsics found #267

Closed

gliwka mentioned this issue Apr 28, 2021

Apple Silicon support gliwka/hyperscan-java#128

Closed

BarsMonster mentioned this issue May 23, 2023

[BUG] Hyperscan not supported for ARM64 == Error logs for bad hs database? rspamd/rspamd#4493

Closed

1 task

gliwka mentioned this issue Nov 24, 2023

is hyperscan abandoned? #421

Open
