
Conversation

@bjodah (Contributor) commented Nov 7, 2016

To address #107.

There are some test failures still. I'll update and ping once this is ready for review.
I changed my mind and this now only returns a tuple if multiple expressions are given (e.g. a vector and a matrix). This means that this should be a non-breaking change.

@isuruf (Member) commented Nov 8, 2016

@bjodah, Lambdify's __call__ has considerable overhead.

Here's some Python code that I wrote:

import itertools
from functools import reduce
from operator import mul

import numpy as np
import symengine

outputs_ravel = list(itertools.chain(*outputs))
try:
    cb = symengine.Lambdify(inputs, outputs_ravel, backend="llvm")
except TypeError:
    cb = symengine.Lambdify(inputs, outputs_ravel)

def func(*args):
    result = []
    n = np.empty(len(outputs_ravel))
    # unsafe_real fills n in place from the concatenated, raveled inputs
    cb.unsafe_real(np.concatenate([arg.ravel() for arg in args]), n)
    start = 0
    for output in outputs:
        elems = reduce(mul, output.shape)
        result.append(n[start:start + elems].reshape(output.shape))
        start += elems
    return result

return func

This makes SymEngine as fast as PyDy's cythonize. I updated the timings here: symengine/symengine#1094 (comment)
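The buffer-splitting step in that snippet can be exercised standalone with plain NumPy; a toy sketch in which the flat buffer merely stands in for the one filled by `cb.unsafe_real`:

```python
from functools import reduce
from operator import mul

import numpy as np

# Heterogeneous outputs: a length-3 vector followed by a 2x2 matrix,
# packed contiguously into a single flat buffer.
shapes = [(3,), (2, 2)]
flat = np.arange(7.0)  # stand-in for the buffer filled by cb.unsafe_real

result, start = [], 0
for shape in shapes:
    elems = reduce(mul, shape)
    result.append(flat[start:start + elems].reshape(shape))
    start += elems

print([r.shape for r in result])  # [(3,), (2, 2)]
```

Since `reshape` on a contiguous slice returns a view, no extra copies are made beyond the single evaluation buffer.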

@bjodah (Contributor, Author) commented Nov 8, 2016

@isuruf
Thanks, I'll need to make the tests pass and then I'll try to optimize Lambdify.__call__.
One approach would be to make NumPy a hard dependency of symengine.py; how would you feel about that? cdef'ing arguments as cnp.ndarray would most likely remove a large portion of the overhead: constructing memoryviews is quite expensive, and dealing with nested Python lists/tuples even more so. The code around Lambdify would also be considerably simpler.

@bjodah (Contributor, Author) commented Feb 26, 2017

@isuruf I added the -a flag to Cython and it looks like the overhead in __call__ is related to the use of memoryviews. Should we make NumPy a hard dependency of Lambdify? That would make the code less complex and considerably faster. I could of course code a "fast path" for NumPy, but it already feels like it's getting out of hand trying to support Cython's array, Python's array.array, and NumPy arrays.
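A NumPy-only fast path would sidestep much of that, since all of those input types can be normalized to one contiguous float64 array up front. A sketch (`as_contiguous_f64` is a hypothetical helper, not part of the wrapper):

```python
import array

import numpy as np

def as_contiguous_f64(obj):
    # Lists, array.array and ndarray all funnel through one code path;
    # ascontiguousarray avoids a copy when the input already qualifies.
    return np.ascontiguousarray(obj, dtype=np.float64)

for obj in ([1, 2, 3], array.array('d', [1.0, 2.0, 3.0]), np.arange(1, 4)):
    a = as_contiguous_f64(obj)
    print(a.dtype, a.flags['C_CONTIGUOUS'])  # float64 True
```

The cost of the heterogeneous-input support then collapses to one `ascontiguousarray` call at the top of `__call__`.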

@isuruf (Member) commented Feb 27, 2017

I'm fine with making numpy a hard dependency of just Lambdify.

@isuruf (Member) commented Feb 27, 2017

@bjodah, note that in #112 (comment) I've used unsafe_real and it has memoryviews in it, so there might be another issue as well.

@bjodah (Contributor, Author) commented Jun 13, 2017

I'm looking into using SymEngine in the codegen tutorial at SciPy.
So I started looking into this branch again.

But when I try to compile symengine (not the python wrapper) with llvm I get a test failure:

33/37 Test #33: test_lambda_double ...............***Failed    0.00 sec

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_lambda_double is a Catch v1.1 b14 (develop) host application.
Run with -? for options

-------------------------------------------------------------------------------
Check llvm and lambda are equal
-------------------------------------------------------------------------------
/opt/symengine/symengine/tests/eval/test_lambda_double.cpp:173
...............................................................................

/opt/symengine/symengine/tests/eval/test_lambda_double.cpp:173: FAILED:
due to a fatal error condition:

I have created this Dockerized environment which reproduces the error:
https://github.com/bjodah/gsoc2017-bjodah/tree/master/se_llvm

I tried to follow the Travis scripts closely but obviously I failed. Any idea what I am doing wrong?

EDIT: I will use symengine from the symengine channel instead.

@isuruf (Member) commented Jun 13, 2017

I think this is because llvm was compiled with gcc 4.8.5 and you are using gcc 5.4.0. I'm surprised that there were no link errors.

@bjodah (Contributor, Author) commented Jun 13, 2017

Could be; modifying CPPFLAGS did not help:

$ git diff
diff --git a/se_llvm/20_compile_symengine.sh b/se_llvm/20_compile_symengine.sh
index e116ece..a3545d6 100755
--- a/se_llvm/20_compile_symengine.sh
+++ b/se_llvm/20_compile_symengine.sh
@@ -11,6 +11,7 @@ export CXX="ccache clang++"
 export CC="ccache clang"
 export CCACHE_DIR=/ccache
 ccache -M 400M
+export CPPFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0"
 cmake -DCMAKE_INSTALL_PREFIX=$our_install_dir -DCMAKE_BUILD_TYPE:STRING="Release" -DWITH_LLVM:BOOL=ON -DINTEGER_CLASS:STRING=gmp -DBUILD_SHARED_LIBS:BOOL=ON -DWITH_MPC=yes $SOURCE_DIR
 make -j 4
 make install

gives the same error. I thought setting CXX and CC was enough (I set them to use clang with ccache):

+-------------------------------+
| Configuration results SUMMARY |
+-------------------------------+

CMAKE_C_COMPILER:   /opt/se/bin/ccache
CMAKE_CXX_COMPILER: /opt/se/bin/ccache
CMAKE_BUILD_TYPE: Release
...

bjodah added 2 commits June 14, 2017 14:06
Travis CI tests for Python 3.2 using gcc 3.7 give an error:
Cython generated C contains uninitialized references.
@bjodah bjodah force-pushed the heterogeneous-Lambdify branch from 26ddbaf to dc1877e Compare June 14, 2017 12:53
@bjodah (Contributor, Author) commented Jun 14, 2017

Current status of benchmark/Lambdify.py:

$ CFLAGS="-march=native -ffast-math -O3" python benchmarks/Lambdify.py 
SymEngine (lambda double) speed-up factor (higher is better) vs sympy:       5.1172
symengine (LLVM)          speed-up factor (higher is better) vs sympy:       5.7401
symengine (ManualLLVM)    speed-up factor (higher is better) vs sympy:       6.4252
benchmarks/Lambdify.py:78: UserWarning: Cython code for Lambdify.__call__ is slow.
  warnings.warn("Cython code for Lambdify.__call__ is slow.")
Hard-coded Cython code    speed-up factor (higher is better) vs sympy:       23.096

(Speed-up factors were below 1 before moving the use_numpy kwarg from __call__ to __init__.) I am now investigating numpy arrays vs. memoryviews.

@isuruf (Member) commented Jun 14, 2017

@bjodah, can you post the timings without CFLAGS="-march=native -ffast-math -O3" ? We haven't turned on aggressive optimizations in LLVM.

@bjodah (Contributor, Author) commented Jun 14, 2017

Sure, that was a factor of ~11 (my local uncommitted state using numpy.ndarray is currently seeing ~7.5 for symengine (LLVM); I still need to fix some tests).

@bjodah (Contributor, Author) commented Jun 15, 2017

Hmm... those numbers were against SymPy master. Using sympy-1.0 I now get:

Hard-coded Cython code          speed-up factor (higher is better) vs sympy:       8.7338

so apparently SymPy master's lambdify has had a regression (down to 1/3 of the performance).

@bjodah (Contributor, Author) commented Jun 15, 2017

Comparing against sympy-1.0 & using -O1:

$ CFLAGS="-O1" python benchmarks/Lambdify.py 
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       1.9232
symengine (lambda double + CSE) speed-up factor (higher is better) vs sympy:      0.98464
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       2.3853
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       2.2815
Hard-coded Cython code          speed-up factor (higher is better) vs sympy:       3.8374

@bjodah (Contributor, Author) commented Jun 15, 2017

I think making NumPy a hard requirement for Lambdify has some trade-offs:

  • slightly better performance
  • arguably more readable code (source of Lambdify.__call__ more accessible now)
  • a bit more tricky setup.py -- we need some compile option WITH_NUMPY or similar

On that last point: I still have to implement that. What is the best approach (if we want to make NumPy a hard requirement, that is)? I'm thinking of putting the whole Lambdify definition inside a compile-time IF block, i.e.

IF WITH_NUMPY:
    cdef class Lambdify:
        ...
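One way to feed such a compile-time constant into Cython is the `compile_time_env` argument of `cythonize`; a hedged setup.py sketch (the numpy detection shown here is an assumption, not the PR's actual build logic):

```python
# setup.py sketch: WITH_NUMPY becomes a Cython compile-time constant,
# usable in the .pyx file as  IF WITH_NUMPY: ...
from setuptools import setup
from Cython.Build import cythonize

try:
    import numpy  # noqa: F401
    with_numpy = True
except ImportError:
    with_numpy = False

setup(
    ext_modules=cythonize(
        "symengine_wrapper.pyx",
        compile_time_env={"WITH_NUMPY": with_numpy},
    ),
)
```

With that, the `cdef class Lambdify` block simply disappears from the compiled module when numpy is absent at build time.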

@bjodah (Contributor, Author) commented Jun 15, 2017

(regarding the sympy.lambdify "regression": there was none, I just needed to specify modules='math')

@isuruf (Member) commented Jun 15, 2017

If you have some time, can you run this example, pydy/pydy#360 ? If not, I'll be able to run it over the weekend.

@isuruf (Member) commented Jun 15, 2017

putting the whole Lambdify definition inside a block

I'm not sure if cimports work inside such blocks. If they do, then that's probably the best approach.

@bjodah (Contributor, Author) commented Jun 17, 2017

For links = 10 and time_steps=1000 using SymEngine:

The derivation took 0.18646 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 2.79032 seconds.
ODE integration took 4.79481 seconds.

not using SymEngine (note that setting USE_SYMENGINE=0 causes sympy to use symengine...):

The derivation took 3.05501 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.81127 seconds.
ODE integration took 390.68950 seconds.

Generating with cython method.
------------------------------
The code generation took 69.66981 seconds.
ODE integration took 4.25866 seconds.

@isuruf (Member) commented Jun 17, 2017

I ran the benchmark with master, PR + ManualLLVM, and PR + LLVM.
I used the following patch to use this PR's multiple-output feature.

@@ -682,24 +688,9 @@ class LambdifyODEFunctionGenerator(ODEFunctionGenerator):
 
         if USE_SYMENGINE:
             import symengine
-            outputs_ravel = list(chain(*outputs))
-            try:
-                cb = symengine.Lambdify(inputs, outputs_ravel, backend="llvm")
-            except (TypeError, ValueError):
-                # TypeError if symengine is old, ValueError if symengine is
-                # not compiled with llvm support
-                cb = symengine.Lambdify(inputs, outputs_ravel)
+            cb = symengine.Lambdify(inputs, *outputs)
             def func(*args):
-                result = []
-                n = np.empty(len(outputs_ravel))
-                cb.unsafe_real(np.concatenate([a.ravel() for a in args]), n)
-                start = 0
-                for output in outputs:
-                    elems = reduce(mul, output.shape)
-                    result.append(n[start : (start + elems)]
-                                        .reshape(output.shape))
-                    start += elems
-                return result
+                return cb(np.concatenate([a.ravel() for a in args]))
             return func
         else:
             modules = [{'ImmutableMatrix': np.array}, 'numpy']

ManualLLVM is still faster,

@isuruf (Member) commented Jun 17, 2017

Oops, pressed enter before finishing commenting.

Pendulum with 10 links.
=======================
The derivation took 0.16069 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19171 seconds.
ODE integration took 5.25970 seconds.
Pendulum with 10 links.
=======================
The derivation took 0.15894 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19862 seconds.
ODE integration took 4.90780 seconds.

Compared to master, this PR is better.

Pendulum with 10 links.
=======================
The derivation took 0.16067 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19473 seconds.
ODE integration took 5.08710 seconds.

@bjodah (Contributor, Author) commented Jun 17, 2017

OK, so I extracted a small test case reproducing ManualLLVM being faster:

$ python3 benchmarks/Lambdify_6_links_rhs.py 
[1, 1, 1, 1, 1, 1, 1, 21*cos(1), -6*cos(1), -5*cos(1), -4*cos(1), -3*cos(1), -2*cos(1), -cos(1)]
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       2.6123
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       8.1947
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       8.4328
benchmarks/Lambdify_6_links_rhs.py:94: UserWarning: Cython code for Lambdify.__call__ is slow.
  warnings.warn("Cython code for Lambdify.__call__ is slow.")
Hard-coded Cython code          speed-up factor (higher is better) vs sympy:        25.12

Will work on improving that benchmark.

@isuruf (Member) commented Jun 17, 2017

That benchmark also has only one output. We should get a benchmark for heterogeneous output in Lambdify.

@bjodah (Contributor, Author) commented Jun 17, 2017

You're right. I'll update the benchmark (tomorrow or Monday) to also calculate f(t), the Jacobian matrix of f(t), and df/dt in the same call.

@bjodah bjodah force-pushed the heterogeneous-Lambdify branch from 1ca217c to 3b2182f Compare June 18, 2017 06:56
@bjodah (Contributor, Author) commented Jun 18, 2017

__call__ is now running slightly faster than ManualLLVM for heterogeneous output:

$ python3 benchmarks/heterogenous_output_Lambdify.py
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       40.702
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       248.95
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       244.31

cdef int *accum_out_sizes
cdef object numpy_dtype

def __cinit__(self, args, *exprs, bool real=True):

(Member) Keyword-only arguments are Python 3 only, right?

(Contributor, Author) I believe keyword-only args are when **kwargs is also present.

(Member) See https://www.python.org/dev/peps/pep-3102/

Syntactically, the proposed changes are fairly simple. The first change is to allow regular arguments to appear after a varargs argument:

def sortwords(*wordlist, case_sensitive=False):
    ...

@bjodah (Contributor, Author) commented Jun 18, 2017

Ah, I see, but Cython supports it even for Python 2 right?
Just checked: kw-only arguments are supported even in Python 2 when using Cython.
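For reference, the PEP 3102 form in plain Python 3 syntax (which, as noted above, Cython accepts and compiles even when targeting Python 2):

```python
def sortwords(*wordlist, case_sensitive=False):
    # case_sensitive comes after *wordlist, so callers must pass it by keyword.
    key = None if case_sensitive else str.lower
    return sorted(wordlist, key=key)

print(sortwords("Banana", "apple"))                       # ['apple', 'Banana']
print(sortwords("Banana", "apple", case_sensitive=True))  # ['Banana', 'apple']
```

Passing the flag positionally is impossible here; any extra positional argument is simply swallowed by `*wordlist`, which is exactly why the keyword-only spelling is the safer API.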

@bjodah bjodah force-pushed the heterogeneous-Lambdify branch from 0f333e9 to 871017c Compare June 18, 2017 22:14
@bjodah bjodah changed the title [WIP] Stub for heterogeneous output in Lambdify Heterogeneous output in Lambdify Jun 19, 2017
@bjodah (Contributor, Author) commented Jun 19, 2017

@isuruf I think this might be ready for review.

@isuruf (Member) left a comment

Looks good to me.

@certik, can you have a look? This PR adds a hard dependency on numpy for Lambdify functionality to keep the code simpler and a little bit faster.

if have_numpy:
    from .lib.symengine_wrapper import Lambdify, LambdifyCSE

def lambdify(args, exprs):

(Member) It's better to have the real and backend variables here.

@@ -59,6 +58,14 @@ else()
set(HAVE_SYMENGINE_LLVM False)
endif()

if(WITH_NUMPY)

(Member) WITH_NUMPY should be a cached variable.

CMakeLists.txt Outdated
@@ -59,6 +58,14 @@ else()
set(HAVE_SYMENGINE_LLVM False)
endif()

if(WITH_NUMPY)
find_package(NumPy REQUIRED)
include_directories(${NUMPY_INCLUDE_PATH})

(Member) Are we not linking the numpy libraries?

(Member)
I just checked and there are no symbols from numpy libraries linked in. It seems we are not using anything that requires linking. I wonder whether that means we can build with one numpy version and have it work on other numpy versions.

(Contributor, Author)

Yes, it is my understanding that the NumPy C API is largely backwards-compatible. And no, the numpy functions used in the code use the Python layer (e.g. numpy.empty). It is possible that we could gain some performance by directly calling the NumPy C API in __call__ (that would then require linking).

(Member)

If so, I think we can just use numpy.ndarray.ctypes to get an array pointer and have numpy only as a runtime requirement. I'll have a look to see if this is possible.
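The ndarray.ctypes route is indeed available without compiling against the numpy headers; a minimal sketch of pulling a double pointer out of an array at runtime:

```python
import ctypes

import numpy as np

a = np.arange(3.0)  # float64, C-contiguous

# .ctypes resolves the buffer address at runtime, so nothing from numpy
# needs to be present at build time; numpy stays a runtime requirement.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
print(ptr[0], ptr[1], ptr[2])  # 0.0 1.0 2.0
```

The pointer is only valid while `a` is alive and unresized, so a wrapper taking this route must keep a reference to the array for the duration of the call.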

cdef list out_shapes
cdef readonly bint real
cdef readonly int n_exprs
cdef int *out_sizes

(Member) Can this be a std::vector?

@isuruf isuruf requested a review from certik June 20, 2017 05:19
@certik (Contributor) left a comment
That looks good to me. Depending on NumPy is fine I think.

If people object to that, we can try to make it optional, but I think sympy and symengine belong to the scipy stack, and all of the scipy stack depends on numpy. So I have no problems with it.

@certik (Contributor) commented Jun 21, 2017

To make it clear: symengine.py belongs to the scipy stack; symengine itself should only depend on a few well-maintained C++ libraries.

isuruf added 2 commits June 21, 2017 18:43
np.arange creates a numpy array with long int elements, which are 32-bit on Windows.
@isuruf isuruf merged commit 8aff3c8 into symengine:master Jun 22, 2017
@isuruf (Member) commented Jun 22, 2017

Thanks for the PR, @bjodah

@bjodah bjodah deleted the heterogeneous-Lambdify branch June 22, 2017 15:40
@isuruf (Member) commented Jun 23, 2017

@bjodah, packaging symengine for conda with numpy as a build-time dependency means the conda package has to be compiled for each numpy version. I'd like to avoid that, if possible, by keeping numpy only as a runtime requirement. Can you try my branch at https://github.com/isuruf/symengine.py/tree/memview and see if memviews have a performance penalty? I ran the benchmarks myself and I don't see much difference.

@isuruf (Member) commented Jun 24, 2017

ping, @bjodah

@bjodah (Contributor, Author) commented Jun 24, 2017

@isuruf just tried it. The performance is the same on my machine. +1 for using memviews.
