Perform runtime check for SSE 4.2 instructions #795

Merged: ohler55 merged 1 commit into ohler55:develop from stanhu:sh-runtime-sse42 on Jul 26, 2022

stanhu (Contributor) commented Jul 25, 2022

If SSE 4.2 is not available on the system, don't attempt to use SIMD instructions.

Relates to #789
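
As a rough illustration of such a runtime check (a sketch only, not the code from this PR): GCC and Clang provide the `__builtin_cpu_supports` builtin on x86, which queries the CPU the process is actually running on. The helper name `cpu_has_sse42` is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper, not Oj's actual code: reports whether the CPU
 * running this process supports SSE 4.2. On compilers or architectures
 * without the builtin it conservatively answers "no". */
static bool cpu_has_sse42(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* populate the compiler's CPU feature data */
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return false;
#endif
}
```

A caller would branch on `cpu_has_sse42()` and fall back to a plain scalar loop when it returns `false`.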

stanhu force-pushed the sh-runtime-sse42 branch 2 times, most recently from 4560ef5 to d566782, on July 25, 2022 23:19

ohler55 (Owner) commented Jul 25, 2022

Detecting earlier is better. Running the check on each parse is expensive. I'd rather the test was done when the Makefile is generated. There is a check in the current extconf.rb. Is that not working? If not, I could use some help getting that to work on some of the older machines you might have access to. Is that possible?

stanhu force-pushed the sh-runtime-sse42 branch from d566782 to 6e1997d on July 25, 2022 23:29

stanhu (Contributor, Author) commented Jul 25, 2022

> Detecting earlier is better. Running the check on each parse is expensive.

Doesn't this change only do the detection when the module is initialized?

> I'd rather the test was done when the Makefile is generated. There is a check in the current extconf.rb. Is that not working?

We package a build made on a platform with SSE 4.2, but some users run that build on older hardware. We need either a way to disable the SSE instructions outright or a dynamic check.

TensorFlow behaves the same way: they ship a Python wheel that can use SSE instructions when available.
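
To illustrate the point about detecting once at initialization rather than on every parse, here is a minimal sketch of init-time dispatch through a function pointer. It is not the code from this PR: `scan_string`, `scan_scalar`, `scan_simd`, and `example_init` are hypothetical names, and `scan_simd` is only a placeholder that delegates to the scalar loop. In a Ruby C extension, something like `example_init` would be called from the extension's `Init_` function.

```c
#include <stdbool.h>
#include <stddef.h>

typedef size_t (*scan_fn)(const char *s, size_t len);

/* Portable fallback: index of the first '"' in s, or len if none. */
static size_t scan_scalar(const char *s, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        if (s[i] == '"') break;
    }
    return i;
}

/* Placeholder for a SIMD implementation; it delegates to the scalar
 * loop so that this sketch builds without any special compiler flags. */
static size_t scan_simd(const char *s, size_t len) {
    return scan_scalar(s, len);
}

/* The parser calls through this pointer; it defaults to the safe path. */
static scan_fn scan_string = scan_scalar;

/* Run once when the module loads. Returns true if the SIMD path was chosen. */
static bool example_init(void) {
    bool simd = false;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    simd = __builtin_cpu_supports("sse4.2") != 0;
#endif
    scan_string = simd ? scan_simd : scan_scalar;
    return simd;
}
```

With this shape, each parse pays only for an indirect call, and the CPU feature query happens a single time per process.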

ohler55 (Owner) commented Jul 25, 2022

I see. It looks like what you have should be good, then. I'll merge once the tests are done and cut a release.

ohler55 merged commit 91376b9 into ohler55:develop on Jul 26, 2022

ohler55 (Owner) commented Jul 26, 2022

Released

stanhu added a commit to stanhu/oj that referenced this pull request on Aug 2, 2022:
Compiling with `-msse4.2` is problematic because it may cause a number
of other SSE instructions to be used. For example:

```
$ ./elfx86exts oj.so
MODE64 (call)
CMOV (cmovne)
SSE2 (movq)
SSE41 (pinsrq)
SSE1 (movups)
SSSE3 (pshufb)
SSE3 (movddup)
SSE42 (pcmpestri)
CPU Generation: Unknown
```

The runtime check in ohler55#795 is therefore not sufficient to prevent SSE
instructions from leaking elsewhere in the shared library. As a
result, loading Oj on older machines with these libraries will result
in an "illegal instruction" crash.

Introduce a `--with-sse42` flag that can be used to configure this
gem.
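
For comparison, a different technique from the `--with-sse42` flag introduced here is to leave `-msse4.2` off the command line and enable SSE 4.2 for a single function with the GCC/Clang `target` attribute, so the compiler cannot emit those instructions anywhere else in the file. The sketch below assumes GCC or Clang on x86; every function name in it is hypothetical and none of it is taken from Oj.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h>
#define SKETCH_HAVE_SSE42 1

/* Only this function is compiled with SSE 4.2 enabled. It returns the
 * index of the first '"' or '\\' in s, or len if there is none. */
__attribute__((target("sse4.2")))
static size_t find_special_sse42(const char *s, size_t len) {
    const __m128i needles =
        _mm_setr_epi8('"', '\\', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        const __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        /* Index of the first byte in chunk equal to either needle, 16 if none. */
        const int idx = _mm_cmpestri(needles, 2, chunk, 16,
                                     _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                                         _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16) return i + (size_t)idx;
    }
    for (; i < len; i++) { /* tail shorter than 16 bytes */
        if (s[i] == '"' || s[i] == '\\') return i;
    }
    return len;
}
#endif

/* Baseline code path with no SSE 4.2 instructions. */
static size_t find_special_scalar(const char *s, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        if (s[i] == '"' || s[i] == '\\') return i;
    }
    return len;
}

/* Picks an implementation at run time; the CPU query result is cached. */
static size_t find_special(const char *s, size_t len) {
#ifdef SKETCH_HAVE_SSE42
    static int has_sse42 = -1;
    if (has_sse42 < 0) {
        __builtin_cpu_init();
        has_sse42 = __builtin_cpu_supports("sse4.2") ? 1 : 0;
    }
    if (has_sse42) return find_special_sse42(s, len);
#endif
    return find_special_scalar(s, len);
}
```

Whether this would suit Oj is an open question; it trades a build-time flag for compiler-specific attributes, and a tool such as `elfx86exts` would still list SSE 4.2 because the guarded function remains in the binary.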
ohler55 pushed a commit that referenced this pull request Aug 6, 2022