
RFC: sincos for computing sine and cosine simultaneously #100

@rreusser


Checklist

Please ensure the following tasks are completed before filing an issue.

  • Read and understood the Code of Conduct.
  • Searched for existing issues and pull requests.
  • If this is a general question, searched the FAQ for an existing answer.
  • If this is a feature request, the issue name begins with RFC:.

Description


Complex trig functions, to name one example, require computation of both sine and cosine. A first cut for their implementations would just evaluate them independently, but it would be nice to take advantage of the ability to compute sine and cosine simultaneously to speed things up, if possible.
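For instance, a naive complex sine can be written directly from the identity sin(x + iy) = sin(x)cosh(y) + i·cos(x)sinh(y), which evaluates both Math.sin and Math.cos on the same real argument (the function name and return shape here are illustrative, not an existing API):

```javascript
// Naive complex sine: sin(x + iy) = sin(x)*cosh(y) + i*cos(x)*sinh(y).
// Both Math.sin( x ) and Math.cos( x ) are evaluated on the same
// argument, which a `sincos` routine could compute in a single pass.
function csin( x, y ) {
    return [
        Math.sin( x ) * Math.cosh( y ), // real part
        Math.cos( x ) * Math.sinh( y )  // imaginary part
    ];
}
```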

Notes:

  • Trying to figure out at which level they'd be evaluated together. If at the level of Horner's method, that would mean sending alternating powers of x to one of two summations. I'm not convinced this would be much of an improvement, since it just interleaves the work and doesn't seem like it would reduce the number of multiplications. This suggests to me that scaling and argument reduction are really the only truly shared parts, unless there are fancy tricks. But that still might be worthwhile. sincos seems to be a thing that exists, but I haven't found a good source that explains how to implement it.
  • It seems like Boost just performs the scaling and the argument reduction simultaneously, though I wonder whether somewhere it's farming this out to the x87 fsincos instruction. I'm having a pretty difficult time navigating the Boost source, TBH.
  • This presentation says sincos is as fast as sine alone (tested in single precision in C code, I think?).
  • Stack Overflow: "What is the fastest way to compute sin and cos together?" seems mostly focused on approximations and on how to get glibc to optimize via its sincos routine.
  • CORDIC doesn't seem appropriate; that's for hardware where multiplication isn't available.
  • Intel has an interesting approach, with a family of methods like ippsSinCos_64f_A26 that guarantee, say, 26 correct bits, so that you can taylor it (that's a pun) to your needs. No implementation hints, though, obviously. Interesting, but this is getting off topic.
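The "reduce once, evaluate both kernels" idea from the notes above can be sketched as follows. This is only an illustration: the kernels here are short Taylor polynomials (accurate to roughly 1e-4 on the reduced interval), and the argument reduction is the naive one; a real implementation would reuse the library's existing double-precision sine/cosine kernel polynomials and a high-accuracy reduction.

```javascript
var HALF_PI = Math.PI / 2.0;

// Stand-in kernel approximations, valid on roughly [-pi/4, pi/4]:
function kernelSin( x ) {
    var x2 = x * x;
    return x * ( 1.0 - ( x2 / 6.0 ) + ( ( x2 * x2 ) / 120.0 ) );
}

function kernelCos( x ) {
    var x2 = x * x;
    return 1.0 - ( x2 / 2.0 ) + ( ( x2 * x2 ) / 24.0 );
}

// Compute sine and cosine together: reduce the argument once, evaluate
// both kernels on the reduced argument, and swap/negate by quadrant:
function sincos( x ) {
    var n = Math.round( x / HALF_PI ); // quadrant index (naive reduction)
    var r = x - ( n * HALF_PI );       // reduced argument in [-pi/4, pi/4]
    var s = kernelSin( r );
    var c = kernelCos( r );
    switch ( ( ( n % 4 ) + 4 ) % 4 ) {
    case 0:
        return [ s, c ];
    case 1:
        return [ c, -s ];
    case 2:
        return [ -s, -c ];
    default:
        return [ -c, s ];
    }
}
```

The reduction and quadrant bookkeeping are done once, and each kernel then costs only its own polynomial evaluation, which matches the conclusion below that the shared savings live in scaling and reduction rather than in the summations themselves.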

Anyway, just a thought. Conclusion: it may be worthwhile to perform the scaling and argument reduction once and then just use the existing sine/cosine kernels. This would mean returning two numbers as a result. I was surprised to see in the complex inverse benchmarks that writing into an existing array for output actually seemed slightly slower than allocating. So if this is pursued at all, it would definitely need to be benchmarked to confirm it's actually an improvement.

If this isn't worthwhile, then evaluating them independently seems fine.
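The "passing two numbers as a result" question amounts to choosing between two API shapes, sketched below with hypothetical names (neither is an existing function; the trig work is delegated to Math.sin/Math.cos just to show the signatures):

```javascript
// Allocating variant: returns a new two-element array [ sin(x), cos(x) ]:
function sincos( x ) {
    return [ Math.sin( x ), Math.cos( x ) ];
}

// Output-argument variant: writes into a provided array and returns it:
function sincosOut( out, x ) {
    out[ 0 ] = Math.sin( x );
    out[ 1 ] = Math.cos( x );
    return out;
}
```

Per the benchmarking note above, which of these is actually faster isn't obvious up front, so both shapes would need to be measured.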

Metadata


Assignees

No one assigned

    Labels

    Feature: Issue or pull request for adding a new feature.
    Math: Issue or pull request specific to math functionality.
    RFC: Request for comments. Feature requests and proposed changes.
