Checklist
Please ensure the following tasks are completed before filing an issue.
- Read and understood the Code of Conduct.
- Searched for existing issues and pull requests.
- If this is a general question, searched the FAQ for an existing answer.
- If this is a feature request, the issue name begins with `RFC:`.
Description
Complex trig functions, to name one example, require computation of both sine and cosine. A first cut for their implementations would just evaluate them independently, but it would be nice to take advantage of the ability to compute sine and cosine simultaneously to speed things up, if possible.
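For concreteness, here's a minimal sketch (plain JavaScript; the `csin` name and calling convention are just for illustration) of why the two evaluations share an argument:

```js
// Naive complex sine: csin( x + iy ) = sin(x)*cosh(y) + i*cos(x)*sinh(y).
// Takes the real and imaginary parts as separate doubles for simplicity;
// returns [ re, im ].
function csin( x, y ) {
	// Two independent evaluations at the same argument; a `sincos( x )`
	// kernel could supply both in one pass:
	var s = Math.sin( x );
	var c = Math.cos( x );
	return [ s * Math.cosh( y ), c * Math.sinh( y ) ];
}

console.log( csin( 1.0, 1.0 ) );
// => [ ~1.2985, ~0.6350 ]
```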
Notes:
- Trying to figure out at which level they're evaluated together. If at the Horner's method level, then it'd be sending alternating powers of `x` to one of two summations (see the sketch after this list). I'm not convinced this would be much of an improvement since it just interleaves work and doesn't seem like it'd reduce multiplications. This suggests to me maybe scaling and reduction are really the only simultaneous part, unless there are fancy tricks. But that still might be worthwhile.
- `sincos` seems to be a thing that exists, but I haven't found a good source that explains how to do it.
- It seems like boost just does the scaling and the reduction simultaneously, though I wonder if somewhere it's farming this out to the `fsincos` instruction. I'm having a pretty difficult time navigating the boost source TBH.
- This presentation says sincos is as fast as sine alone (tested in single precision, in C code, I think?).
- Stack Overflow: "What is the fastest way to compute sin and cos together?" seems mostly focused on approximations and how to get the GNU stdlib to optimize via the `sincos` instruction.
- CORDIC doesn't seem appropriate. That's for when multiplication isn't available.
- Intel has an interesting approach with a bunch of methods like `ippsSinCos_64f_A26` that guarantee, say, 26 correct bits so that you can taylor it (that's a pun) to your needs. No implementation hints though, obv. Interesting, but this is getting off topic.
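To make the Horner's-method point above concrete, here's a rough sketch using truncated Taylor coefficients (illustrative only; a real kernel would use minimax polynomials over the reduced range). Both series are polynomials in x², so the squaring is shared, but each retains its own multiply-add chain:

```js
// Illustrative truncated Taylor coefficients (NOT production polynomials):
var S = [ 1.0, -1.0/6.0, 1.0/120.0, -1.0/5040.0 ]; // sin(x)/x in powers of x^2
var C = [ 1.0, -1.0/2.0, 1.0/24.0, -1.0/720.0 ];   // cos(x) in powers of x^2

function sincosKernel( x ) {
	var z = x * x; // shared work: squared once for both polynomials
	var ps = S[ 3 ];
	var pc = C[ 3 ];
	var i;
	for ( i = 2; i >= 0; i-- ) {
		// Two independent multiply-add chains; interleaving them overlaps
		// latency but does not reduce the multiplication count:
		ps = ( ps * z ) + S[ i ];
		pc = ( pc * z ) + C[ i ];
	}
	return [ x * ps, pc ]; // [ sin(x), cos(x) ] for small |x|
}

console.log( sincosKernel( 0.1 ) );
// => [ ~0.0998, ~0.9950 ]
```

Interleaving the two chains in one loop might help instruction-level parallelism, but the operation count is unchanged, which matches the skepticism above.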
Anyway, just a thought. Conclusion: maybe it's worthwhile to scale and reduce together and just use the existing sine/cosine kernels (sketched below). This would mean passing two numbers back as a result. I was surprised by the complex inverse benchmarks to see that writing into an existing output array actually seemed a bit slower than returning a new one. So if this approach is pursued, it would definitely need to be benchmarked to confirm it's actually an improvement.
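A sketch of the "reduce once, reuse kernels" idea (assumptions: the naive round-based reduction and the `Math.sin`/`Math.cos` placeholders stand in for a real high-accuracy reduction, e.g. Payne-Hanek, and the existing polynomial kernels):

```js
var HALF_PI = Math.PI / 2.0;

function kernelSin( r ) { return Math.sin( r ); } // placeholder kernel
function kernelCos( r ) { return Math.cos( r ); } // placeholder kernel

function sincos( x ) {
	var n = Math.round( x / HALF_PI ); // shared: reduction done once, not twice
	var r = x - ( n * HALF_PI );       // reduced argument, |r| <= pi/4
	var s = kernelSin( r );
	var c = kernelCos( r );
	// Quadrant bookkeeping maps the kernel results back to sin(x) and cos(x):
	switch ( ( ( n % 4 ) + 4 ) % 4 ) {
	case 0: return [ s, c ];
	case 1: return [ c, -s ];
	case 2: return [ -s, -c ];
	default: return [ -c, s ];
	}
}

console.log( sincos( 2.0 ) );
// => [ ~0.9093, ~-0.4161 ]
```

Returning a fresh two-element array is the baseline suggested by the benchmark observation above; an output-array signature would be the variant to benchmark against it.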
If this isn't worthwhile, then evaluating them independently seems fine.