[patch] Generate sqrtsd opcode instead of external call to sqrt on amd64 #5795

Closed
vicuna opened this Issue Oct 18, 2012 · 4 comments

vicuna commented Oct 18, 2012

Original bug ID: 5795
Reporter: @chambart
Assigned to: @lefessan
Status: closed (set by @xavierleroy on 2015-12-11T18:19:32Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.00.1
Fixed in version: 4.01.0+dev
Category: back end (clambda to assembly)
Related to: #6020
Monitored by: @hcarty

Bug description

The sqrt function can be optimised by emitting the processor's sqrtsd instruction directly, avoiding an external call.
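
As a rough illustration (not taken from the patch), the sketch below defines a hypothetical function `norm` that uses `sqrt`. Assuming it is saved as `norm.ml` and compiled on amd64 with `ocamlopt -S norm.ml`, the retained `norm.s` file can be inspected to see whether the compiler emitted a `sqrtsd` instruction or a call to the C `sqrt` function.

```ocaml
(* Hypothetical example, not part of the patch: a small function whose
   generated code depends on how the backend compiles [sqrt]. *)
let norm x y = sqrt (x *. x +. y *. y)

let () =
  (* 3.0 and 4.0 give an exactly representable result. *)
  Printf.printf "%.1f\n" (norm 3.0 4.0)
```

The result is the same either way; only the instruction sequence around the square root is expected to differ.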

Additional information

The patch was not tested on Windows.

File attachments

vicuna commented Nov 9, 2012

Comment author: @lefessan

Applied in trunk at revision r13086.

vicuna commented Dec 18, 2012

Comment author: @alainfrisch

Has anyone done any performance comparison? I'm curious about the performance gains to be expected, to see if it's worth the trouble extending the 32-bit (x87) backend as well (this backend is still very much useful for Windows...).

vicuna commented Dec 18, 2012

Comment author: @chambart

As far as I know the x87 backend is already doing that. See i386/selection.ml line 240.

vicuna commented Jun 9, 2013

Comment author: @xavierleroy

> As far as I know the x87 backend is already doing that.

But only if -ffast-math is selected. Not using "fsin", "fcos", etc. by default is justified because those instructions are not quite 100% IEEE754 compliant. But it could make sense to always generate "fsqrt" since, AFAIK, this instruction implements proper IEEE754 behavior. This would have the advantage of working around bugs in Win32 CRT's sqrt() function, cf. #6020.
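
For reference, here is a minimal sketch of the kind of behaviour a correctly rounded IEEE 754 square root is expected to have, whether it comes from a hardware instruction or from a C library. The checks and names are illustrative assumptions; they are not taken from #6020 and do not reproduce any specific CRT bug.

```ocaml
(* Illustrative checks only; all names below are hypothetical. *)
let is_nan x = classify_float x = FP_nan

(* Each entry pairs a description with a property that an IEEE 754
   square root should satisfy. *)
let checks = [
  "perfect squares are exact",
  List.for_all
    (fun n -> let x = float_of_int n in sqrt (x *. x) = x)
    [0; 1; 2; 3; 1000; 1_000_000];
  "sqrt (-1.0) is NaN", is_nan (sqrt (-1.0));
  "sqrt nan is NaN", is_nan (sqrt nan);
  "sqrt infinity = infinity", sqrt infinity = infinity;
  (* Dividing by the result distinguishes -0.0 from +0.0. *)
  "sqrt (-0.0) = -0.0", 1.0 /. sqrt (-0.0) = neg_infinity;
]

let all_ok = List.for_all snd checks

let () =
  List.iter
    (fun (name, ok) ->
      Printf.printf "%-28s %s\n" name (if ok then "ok" else "FAILED"))
    checks
```

Running the same program on different backends or platforms gives a quick, though far from exhaustive, comparison of their `sqrt` implementations.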
