
ENH: add contract: optimizing numpy's einsum expression #5488

Merged
merged 1 commit into numpy:master on Sep 28, 2016

Conversation

dgasmith
Contributor

Greetings everyone,
Numpy's einsum function can compute arbitrary expressions; however, there are two drawbacks to using pure einsum: einsum does not consider building intermediate arrays for possible reductions in overall rank, and it is not currently capable of using a vendor BLAS. The contract function aims to solve these issues and effectively sets out to answer the question: "How fast can I execute an Einstein summation expression in numpy?" The answer turns out to be quite fast, often only 2-3 times slower than an optimized C or Fortran expression, especially when a vendor BLAS is utilized. In addition, multi-argument einsum speed issues (like #5366) should be solved and, once BLAS functionality is added, multi_dot-like behavior (#4977) can be reproduced.

The original repository has additional information that I have not been able to cram into the current documentation. There are additional test scripts that provide random expression testing, single-expression debugging, and path comparisons, in addition to evaluation timing. Other branches on that repo add both BLAS support and ellipsis subscript functionality.

Following a few recent PRs, I have decided to break this up into three major commits:

  • Initial commit: the basics; input parsing, both path functions, and the machinery needed for looping through the contraction pairs.
  • BLAS commit: adds BLAS functionality, either by using tensordot (tensordot would need an update to prevent arbitrary copies) or by building tensordot-like functionality.
  • Index commit: order einsum output indices to try to reduce the number of nd-transposes for BLAS, and try to build indices that are optimal for einsum (poor einsum index ordering can result in 2x slowdowns).

I think this will help split up some of the complexity, and the first PR is already large enough.
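For a quick picture of the intended usage, here is a minimal sketch written in terms of the np.einsum_path/optimize interface that the discussion below converges on (shapes are illustrative):

import numpy as np

# Illustrative operands: matrices C and a rank-4 array I, all of dimension 8.
C = np.random.rand(8, 8)
I = np.random.rand(8, 8, 8, 8)

# Compute a contraction order once. The first element is the path itself;
# the second is a human-readable breakdown of the chosen contractions.
opt_path = np.einsum_path('ea,fb,abcd,gc,hd->efgh', C, C, I, C, C,
                          optimize='greedy')

# Feed the saved path back in so the search is skipped on later calls.
result = np.einsum('ea,fb,abcd,gc,hd->efgh', C, C, I, C, C,
                   optimize=opt_path[0])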

5 False bcde,fb->cdef gc,hd,cdef->efgh
5 False cdef,gc->defg hd,defg->efgh
5 False defg,hd->efgh efgh->efgh
>>> ein_result = np.einsum('ea,fb,abcd,gc,hd->efgh', C, C, I, C, C, path=opt_path[0])
Member

There is no line calculating opt_result, and the line computing ein_result passes the wrong keyword argument, pat, to np.einsum.

@shoyer
Member

shoyer commented Feb 3, 2015

For the opt_path option, it would be nice to return a machine-readable version, i.e., something that I could paste in to substitute for einsum. It would be even better if this function used a cache for its arguments (sort of like numba's jit), so the optimal path for arguments with fixed shape only needs to be calculated once.

My vote is also for a name like einsumopt to make clear the shared origin and differences from einsum. contract sounds very generic to me.
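For illustration, a hypothetical sketch of such a cache, keyed on the subscripts and operand shapes (cached_einsum and _path_cache are made-up names, and this leans on the einsum_path helper that only appears later in this thread):

import numpy as np

_path_cache = {}  # (subscripts, operand shapes) -> contraction path

def cached_einsum(subscripts, *operands):
    # Arguments with fixed shapes hit the cache, so the path search runs
    # only once per distinct expression/shape combination.
    key = (subscripts, tuple(op.shape for op in operands))
    if key not in _path_cache:
        _path_cache[key] = np.einsum_path(subscripts, *operands)[0]
    return np.einsum(subscripts, *operands, optimize=_path_cache[key])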

@dgasmith
Contributor Author

dgasmith commented Feb 3, 2015

@shoyer
For machine-readable output, are you thinking something along the lines of:

# Regular einsum
np.einsum('aab,bc', op1, op2)
# Machine readable path return
np.dot(np.einsum('aab->ab', op1), op2)

This can become exceedingly long for the more complicated paths. I have written cache-like algorithms for my own usage of this type of script; I will look into adding them here.

I was thinking contract because it should be a melting pot of different algorithms. Right now it only uses einsum; however, it will use a vendor BLAS before it is complete, and anything else that comes along can be added as well. For example, if numpy ever offered an interface to a tensor framework, this could be leveraged as well.

@njsmith
Member

njsmith commented Feb 3, 2015

Shouldn't the name be "einsum"? Names are for users, and users care about interfaces, not internal implementation. I don't see why we should have two different public functions that do the same thing?

@dgasmith
Contributor Author

dgasmith commented Feb 3, 2015

@njsmith I was starting to think along the same lines. The main issues that I see:

  • A few extra keywords would need to be added to einsum, making the function appear more complex. At a minimum we would need the following (a rough signature sketch follows this list):
    • memory - If 0, do not optimize the expression and use either BLAS or einsum directly; otherwise allow optimization.
    • path - Either a string choosing the type of path optimization, or a tuple representing a path to execute.
    • return_path - Returns a path that can be fed back into einsum, along with a human-readable component.
  • We would be increasing the overhead of the einsum function, although most of this should eventually be moved to the C side.
  • Optimization should default to off, in order to avoid breaking legacy code and causing unexpected memory overhead.
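A rough sketch of what that signature could look like (hypothetical; these keyword names follow the list above and are not what was eventually merged):

# Hypothetical signature only; the body is elided.
def einsum(subscripts, *operands, out=None, dtype=None, order='K',
           casting='safe', memory=None, path='greedy', return_path=False):
    """memory=0 disables optimization; path is either a string naming a
    path-search strategy or an explicit contraction path; return_path=True
    also returns the chosen path plus a human-readable summary."""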

@shoyer
Member

shoyer commented Feb 3, 2015

Shouldn't the name be "einsum"?

Yes, of course.

If performance is a concern, my suggestion would be to write this in Cython, not C. That, along with caching, should take care of any performance concerns.

It doesn't particularly matter to me what form the path is returned in as long as it can be used as an argument to the function to skip the optimization step. Generating Python code is probably not a path we want to go down.

@charris
Member

charris commented Feb 4, 2015

Shouldn't the name be "einsum"? Names are for users, and users care about interfaces, not internal implementation.

+1 as long as it doesn't break compatibility. With the long term prospect of moving more stuff down into multiarray, we might want to look into reorganizing that directory to separate out large, semi-standalone parts like this.

@dgasmith
Contributor Author

dgasmith commented Feb 9, 2015

I have been fairly busy this week, but I wanted to drop in and say that I am looking into this. I really like pushing everything into the einsum function and the extra control would be a nice addition.


@charris Can you expand a bit on reorganizing the core directory? I will likely be refactoring some of the einsum code anyway, so keeping this in mind would be helpful.


@shoyer Writing this in Cython would be much easier; is the C conversion done with the cythonize script?

With the caching, we would need a thread-safe implementation in case someone runs a threaded einsum workload (e.g., via multiprocessing.pool.ThreadPool). Do you have experience with this?

We may have missed each other at some point on the path saving functionality. Path saving is currently implemented and shown in the documentation. Is there anything wrong with the way it is done currently?


I probably need to run this through the mailing list again to solicit feedback. Overall, I think most of the major implementation questions have been answered.

@shoyer
Member

shoyer commented Feb 9, 2015

@dgasmith I'm not entirely sure how the Cython build process is done in NumPy, but I know it is done (the numpy.random module is written in Cython).

This is actually a case where the Python GIL makes things easy -- by default, everything in Python is thread safe. You simply would not release the GIL for the path optimization step.

Yes, I do see that path saving is currently implemented. This is probably fine, though possibly even unnecessary if we get transparent caching going.

@njsmith
Member

njsmith commented Feb 9, 2015

Unfortunately, using Cython here won't be as trivial as one might think. The issue is that Cython has a strong belief that each .pyx file should create a separate shared-library module. This is fine for np.random, which lives off on its own anyway and doesn't need any special access to numpy internals, but the einsum implementation is currently part of the big multiarray module.

Two possibilities for enabling Cython usage:

  1. Pull all the einsum code out into its own module, like np.random. The possible problem here is that if einsum relies on any internal functions then it'll lose access to them. I don't know whether it does or not.

  2. Figure out how to coax Cython into generating C code that can be built into a single shared library with other code. This would be super useful (there's lots of stuff in numpy that would be nicer to write in Cython), but requires some arcane knowledge of how Cython, C, and Python all interact.

@dgasmith
Contributor Author

dgasmith commented Feb 9, 2015

It seems just writing this in C would be the easiest route. The largest issue I see is losing out on set arithmetic. Given the limited number of valid einsum indices (48), I would guess a bit-array set implementation would be the easiest and likely very fast, although I have not thought about this before. Does anyone have an opinion before I dig into it?
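To make the bit-array idea concrete, here is a small Python sketch (an eventual C version would use a fixed-width integer the same way): each valid index letter maps to one bit, so set arithmetic becomes plain bitwise operations.

def to_bitset(indices):
    # One bit per subscript letter; 'A'..'z' spans fewer than 64 bits.
    mask = 0
    for c in indices:
        mask |= 1 << (ord(c) - ord('A'))
    return mask

left = to_bitset('abc')
right = to_bitset('bcd')

union = left | right   # indices appearing in either operand
shared = left & right  # indices common to both (candidates to contract)
kept = left & ~right   # indices unique to the left operand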

@juliantaylor
Contributor

A set of real-world use cases where this is too slow would be useful first, to better judge what exactly needs to be sped up. Then we can decide how to do it. I'd focus more on the features and tests; optimization can be done later.

For Cython, we could simply concatenate a bunch of files before running Cython to get around the duplication issue.

@juliantaylor
Contributor

From a brief look at the code, it does not look like code that would profit much from using Cython; it definitely needs to be profiled before deciding which way to go. Unless there is one really obvious bottleneck that needs typing, it should stay Python or be really low-level C using special-purpose data structures instead of generic Python ones.

@shoyer
Member

shoyer commented Feb 9, 2015

Is there a straightforward way to use internal routines written in C for einsum (e.g., for argument parsing) on the Python side? That was part of the reason why I was thinking it might be desirable to do this from Cython/C.

@jaimefrio
Member

In the numpy.lib module you have np.interp, which is a wrapper around a Python function written in C, numpy.core.multiarray.interp, that does the heavy lifting. This C function lives in numpy/core/src/multiarray/compiled_base.c and is made part of numpy.core.multiarray in numpy/core/src/multiarray/multiarraymodule.c. Unless we agree on a better place, I think compiled_base.c is the place to put non-core functionality to be used elsewhere.

As for writing stuff in Python first: right now np.einsum is defined in numpy/core/numeric.py to be numpy.core.multiarray.einsum, but we might as well add an einsum function there that does some preliminary work and then calls the C function.
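A minimal sketch of that wrapper pattern for einsum, assuming the C function is re-exported under another name (the merged code uses exactly this import, visible further down in this review):

from numpy.core.multiarray import einsum as c_einsum

def einsum(*operands, **kwargs):
    # Preliminary Python-side work (parsing, path optimization) would go
    # here before handing the heavy lifting to the C implementation.
    return c_einsum(*operands, **kwargs)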

@dgasmith
Contributor Author

If the next step is dropping this into numpy/core/numeric.py, the first thing to do is likely the inevitable input-parsing rewrite:

  • Move the numpy/core/src/multiarray/multiarraymodule.c function array_einsum to einsum_parse_input and, instead of calling PyArray_EinsteinSum (the actual einsum function), return a tuple or dictionary of the parsed input.
  • Make a new wrapper in multiarraymodule.c around PyArray_EinsteinSum that takes the above dictionary or tuple.

I would prefer a dictionary over a tuple for passing information, but either is fine. My main concern would be ensuring that reference counting is still handled correctly.

If this sounds fine, I can try to write this out.
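If the parsing does end up Python-side, a simplified sketch of the split-out parse step might look like this (the name and details here are illustrative; a real parser would also validate subscripts against operand dimensions and handle the interleaved calling convention):

def parse_einsum_input(subscripts, *operands):
    # Split 'ij,jk->ik' into per-operand input subscripts and the output.
    if '->' in subscripts:
        inputs, output = subscripts.split('->')
    else:
        inputs = subscripts
        # Implicit output: indices appearing exactly once, alphabetically.
        flat = inputs.replace(',', '')
        output = ''.join(sorted(c for c in set(flat)
                                if flat.count(c) == 1))
    return inputs.split(','), output, operands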

@njsmith
Member

njsmith commented Feb 11, 2015

That sounds fine. Or, instead of a dict/tuple (which are relatively heavyweight Python objects), it might make sense to just directly return whatever C representation is most convenient to use downstream. (No point in creating a fancy dict representation if we just end up having to "parse" the dict later.)

But I haven't actually looked at the code, so for all I know it's already using dicts internally :-)

@charris added this to the 1.10.0 release milestone Apr 7, 2015
@charris
Member

charris commented May 12, 2015

On a quick looksie, this looks sort of, but not quite, ready. In particular, there is the question of whether it should be an improved einsum, which requires complete compatibility. I'm tempted to put this off to Numpy 1.11.0 unless it becomes active again.

@dgasmith
Contributor Author

@charris The code that actually does the optimization is fairly close to done. Where I keep running into issues is trying to take advantage of the current C parsing. We really only need the einsum string returned to the Python side; unfortunately, part of the parsing is written in the primary einsum code and is not easily extractable. I am not quite sure I can get the information back out without making fairly substantial changes to the code, which is not something I would really like to do.

The other option is for optimized einsum contractions to have their own parsing on the Python side. If you need to worry about optimization, you definitely will not notice the extra overhead; although, it would create two pieces of code that do the same thing.

@charris
Member

charris commented May 12, 2015

Yeah, the einsum C parsing is a handful. I don't mind doing parsing Python-side. I guess the question is: do we add a new function, or maybe a keyword? I wouldn't mind a bit of help here from the others who have looked at the code: @shoyer @njsmith @juliantaylor @jaimefrio.

@njsmith
Member

njsmith commented May 13, 2015

From the user point of view, I really think optimization should be as transparent as possible. There's no reason to be shy about changing the existing code -- the goal is to have one good einsum, not two that are good at different pieces... maybe that means moving the parsing entirely into Python.


@dgasmith
Contributor Author

@njsmith I was originally planning to leave optimization off by default. It is possible that turning optimization on would break some einsum use cases due to memory overhead. I believe this is an unlikely scenario, but worth considering.

@dgasmith
Contributor Author

I started working on merging this into the einsum function with python side parsing:
https://github.com/dgasmith/numpy/tree/opt_einsum3

Thinking on this a bit more, all einsum kwargs besides out are invalid for optimized einsum, and realistically optimization should only be attempted for arrays of dtype=np.double. It seems most of the changes to einsum make it more flexible; optimization will make it significantly less so.

@njsmith
Member

njsmith commented May 20, 2015

It's fine if einsum is faster in simple cases and slower if more complex features are used.

@charris
Member

charris commented Jun 10, 2015

@dgasmith Keep us informed of progress. I might move this to numpy 1.11.

@charris added this to the 1.11.0 release milestone Jun 14, 2015
@dgasmith
Contributor Author

@charris The complete overhead for computing a path (parsing the input, finding the path, and organizing that data) with default options is about 150us. For reference, einsum takes a minimum of 5-10us to call. So the worst case is that the optimization overhead makes einsum 30x slower. Personally, I'd go for turning optimization off by default and then revisiting if someone tackles the parsing issue to reduce the overhead.

For my first question: I added a function called np.einsum_path that prints out a detailed map of what an optimized path does. This can be rolled into np.einsum by adding something like return_path to np.einsum itself. I don't know how sensitive you guys are about adding to the base namespace.
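For reference, usage along the lines described would look like this (shapes illustrative; the second return value is the detailed map mentioned above):

import numpy as np

a = np.random.rand(10, 20)
b = np.random.rand(20, 30)
c = np.random.rand(30, 40)

# einsum_path returns the chosen contraction order plus a printable report.
path, report = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='greedy')
print(path)    # e.g. ['einsum_path', (0, 1), (0, 1)]
print(report)  # per-step scaling, FLOP counts, and the chosen order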

Should I revert the add_doc documentation but leave the documentation in einsumfunc.py? Not quite sure what you are detailing here.

I don't really have an opinion on most documentation/filename issues. Just tell me what needs fixing.

@charris
Member

charris commented Sep 13, 2016

Given the call overhead, I agree that turning off optimization by default is the way to go.

Hmm, the add_doc examples are correct but now refer to the "new" einsum. I note that it would be very easy to rename multiarray.einsum; just change the exported name in multiarraymodule.c. Maybe call it "c_einsum" or even "_einsum". Note that I am just tossing out suggestions here, as having identical names for two different functions, both of which we would like to document, just seems awkward.

I am not a fan of multiple output types depending on passed parameters. It seems simple enough to call np.einsum_path for the curious who want to explore just how neat it is :)

@shoyer @njsmith @seberg This is getting close to a merge, I'd welcome any final input at this point.

@charris
Member

charris commented Sep 13, 2016

Note that at some future date it might make sense to call einsum_path from the C code and put the rest of the argument parsing and the contraction loop in array_einsum. I think moving things around like that in the future would not be a big compatibility problem and might cut down on the call overhead.

@seberg
Member

seberg commented Sep 13, 2016

If it is very simple, another choice might be to default to no optimization based on some simple heuristic, such as the input size. In that case it could be even neater to call back into Python for optimization instead of the other way around. However, I guess it should not make a big difference.

The einsum path information seems useful if you have a bigger problem and want to know how to approach it. I could also imagine just making it np.lib.einsum.einsum_path (or whatever) and pointing to it from the documentation, though.

Have to look more at this to give real input though, will try to do that later.
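A sketch of the size-based heuristic (the threshold value and function name are illustrative assumptions):

import numpy as np
from numpy.core.multiarray import einsum as c_einsum

def einsum_auto(subscripts, *operands, threshold=10000):
    # For small operands the path-search overhead dominates any gain,
    # so skip optimization entirely below the cutoff.
    if max(op.size for op in operands) < threshold:
        return c_einsum(subscripts, *operands)
    return np.einsum(subscripts, *operands, optimize=True)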

Also accepts an explicit contraction list from the ``np.einsum_path``
function.
See ``np.einsum_path`` for more details.
memory_limit : int, optional
Contributor

A parameter that only has an effect if another parameter is set is not very nice. Maybe accepting a tuple for optimize, with the first element being the optimize value and the second the memory limit, would be better?

Contributor Author

Sure, we can do that. I'd like to make it so that someone can simply say optimize=True as well. The default memory_limit is pretty reasonable and is fine for 99% of cases, I think.
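Assuming the tuple form proposed above, usage would then look like:

import numpy as np

a = np.random.rand(10, 10, 10)
b = np.random.rand(10, 10)

# Greedy path search, with intermediate arrays capped at ~1e6 elements.
out = np.einsum('ijk,kl->ijl', a, b, optimize=('greedy', 1e6))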

@dgasmith
Contributor Author

dgasmith commented Sep 13, 2016

@seberg The biggest problem I've had with simple heuristics is that parsing the input Python-side is quite slow by itself (30-100us); at that point, computing the path isn't a huge overhead. Parsing C-side is really quite beneficial; disentangling einsum from its parsing on the C side seemed like a fairly large project, however.

One possibility would be to add a series of heuristics that check at different parsing stages whether to fall back to c_einsum. This has always seemed more complex than it's worth at this stage.

@juliantaylor I liked the idea of removing memory_limit as a separate keyword, so I rearranged that a bit. The optimize keyword is getting a bit crowded; let me know if you see any issues with the current setup.

@charris Moving the c_einsum around a bit looks good to me. Would we add the full documentation back to c_einsum, or only a part of it?

@charris
Member

charris commented Sep 19, 2016

@dgasmith Let's rename "einsum", line 4142 in multiarraymodule.c to "_einsum" and fix the import statement in numpy/core/einsumfunc.py. The original documentation with a modification mentioning that the function is private and a see also reference to np.einsum should be good enough.

EDIT: At some point we might want to move bits around, but I think we should be able to do that without causing problems. AFAICT, no one currently uses multiarray.einsum directly except for the import in numeric.py.

@charris
Member

charris commented Sep 23, 2016

@dgasmith I think we can put this in once the remaining details are taken care of.

@homu
Contributor

homu commented Sep 25, 2016

☔ The latest upstream changes (presumably #7980) made this pull request unmergeable. Please resolve the merge conflicts.

@@ -0,0 +1,979 @@
from numpy.core.multiarray import einsum as c_einsum
Member

This needs some module documentation and the standard lead-in:

"""
Implementation of optimized einsum.

"""
from __future__ import division, absolute_import, print_function

overall_contraction = input_subscripts + "->" + output_subscript
header = ("scaling", "current", "remaining")

speedup = naive_cost / float(opt_cost)
Member

With Python 3 division (see above), the float is not needed.

def einsum(*operands, **kwargs):
"""
einsum(subscripts, *operands, out=None, dtype=None, order='K',
casting='safe', optimize=True, memory_limit=None)
Member

The optimize default is False, not True.

Member

memory_limit seems to be gone.

Specifies the subscripts for summation.
operands : list of array_like
These are the arrays for the operation.
out : ndarray, optional
Member

{ndarray, None}

These are the arrays for the operation.
out : ndarray, optional
If provided, the calculation is done into this array.
dtype : data-type, optional
Member

This should be {data-type, None}.

These are the arrays for the operation.
out : ndarray, optional
If provided, the calculation is done into this array.
dtype : data-type, optional
Member

Also needs the default value documented.

* 'safe' means only casts which can preserve values are allowed.
* 'same_kind' means only safe casts or casts within a kind,
like float64 to float32, are allowed.
* 'unsafe' means any data conversions may be done.
Member

Need default documented.

Contributor Author

@charris I could not find another place where casting had a default or where there was extra text after a list, so I hope the following is parsed correctly. Should the default be added to the other casting descriptions as well?

Member

charris commented Sep 26, 2016

The default is 'safe', implemented in array_einsum. Not sure what to say for the dtype argument; maybe the smallest common type compatible with safe casting.

We try to document the default values of all optional arguments, although we evidently haven't been perfect about that. So the answer to the last question is "yes".

Member

In my opinion it's actually a cleaner style to document default values only in function signatures. Otherwise, it's easy to get out of sync.

Contributor Author

If we added defaults to the text for dtype, it would probably be best to add them for all signatures. It might be simpler to skip the defaults for now and open a new issue to decide either way, so that it can be applied evenly.

Controls if intermediate optimization should occur. No optimization
will occur if False and True will default to the 'greedy' algorithm.
Also accepts an explicit contraction list from the ``np.einsum_path``
function.
Member

Needs default documented. I wouldn't put See ... on a separate line.

Member

What happened to memory_limit, it is still part of the function signature.

Contributor Author

I popped the memory_limit argument as @juliantaylor recommended. I updated the documentation but didn't take out memory_limit in other places; I'll fix this up as well.

@charris
Member

charris commented Sep 25, 2016

The conflict is in the release notes. Three things need fixing/checking:

  • Compatibility header at beginning of einsumfunc.
  • Make sure documented signatures are correct.
  • Optional parameters need to have the default value listed.

Changing my previous suggestion: use c_einsum instead of _einsum for the multiarray version. The original documentation can be obtained from master with git checkout master numpy/add_newdocs.py and edited for the new name.

If you don't have time for this, let me know and I will fix up the few remaining nits.

@dgasmith
Contributor Author

@charris I'll clean up the remaining issues Monday morning. Thanks for looking through this!

@dgasmith force-pushed the opt_einsum branch 2 times, most recently from 83e9433 to 9c8ad81 on September 26, 2016
Returns
-------
output : ndarray
The calculation based on the Einstein summation convention.

See Also
--------
dot, inner, outer, tensordot
einsum, dot, inner, outer, tensordot

Notes
-----
Member

IIRC, the examples need changing, but I can do that later. I expect this function might be changing a bit down the line in any case.

@charris merged commit 86c780d into numpy:master Sep 28, 2016
@charris
Member

charris commented Sep 28, 2016

OK, let's get this in before the release notes change again. Thanks @dgasmith.

@dgasmith
Contributor Author

@charris Great, thanks for getting this merged in! Please ping me for any required fixes.

I will probably have some tweaks to the algorithms down the line, and I'll probably look into adding BLAS calls back as well.

@shoyer
Member

shoyer commented Sep 28, 2016

This is awesome, thanks @dgasmith for all your hard work on this one!
