Heterogeneous output in Lambdify #112

Merged
merged 23 commits into symengine:master from bjodah:heterogeneous-Lambdify on Jun 22, 2017

Conversation

@bjodah
Contributor

bjodah commented Nov 7, 2016

To address #107.

There are some test failures still. I'll update and ping once this is ready for review.
I changed my mind, and this now only returns a tuple if multiple expressions are given (e.g. a vector and a matrix), which means it should be a non-breaking change.
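
To make the intended behaviour concrete, here is a rough usage sketch (the symbol names and shapes are illustrative only, not taken from the PR's tests): with several output containers the callback returns one array per container, while a single output keeps the old single-array behaviour.

import numpy as np
import symengine as se

x, y = se.symbols('x y')

vec = [x + y, x*y]                         # vector-shaped output, shape (2,)
mat = [[x, y], [se.sin(x), se.cos(y)]]     # matrix-shaped output, shape (2, 2)

lmb = se.Lambdify([x, y], vec, mat)        # several outputs -> tuple of arrays
out_vec, out_mat = lmb(np.array([1.0, 2.0]))

single = se.Lambdify([x, y], vec)          # one output -> single array, as before
print(single(np.array([1.0, 2.0])))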

@isuruf

Member

isuruf commented Nov 8, 2016

@bjodah, Lambdify's __call__ has considerable overhead.

Here's some Python code that I wrote:

# Fragment from PyDy's ODE-function generator: ``inputs`` and ``outputs`` (a list
# of expression arrays) are defined by the enclosing method, which is also where
# the trailing ``return func`` belongs.
import itertools
from functools import reduce
from operator import mul

import numpy as np
import symengine

outputs_ravel = list(itertools.chain(*outputs))
try:
    cb = symengine.Lambdify(inputs, outputs_ravel, backend="llvm")
except TypeError:
    # older symengine.py without the ``backend`` keyword
    cb = symengine.Lambdify(inputs, outputs_ravel)

def func(*args):
    result = []
    n = np.empty(len(outputs_ravel))
    # evaluate all flattened outputs in one call, writing the values into ``n``
    cb.unsafe_real(np.concatenate([arg.ravel() for arg in args]), n)
    start = 0
    for output in outputs:
        elems = reduce(mul, output.shape)
        result.append(n[start:start + elems].reshape(output.shape))
        start += elems
    return result

return func

This makes SymEngine as fast as PyDy's cythonize. I updated the timings here: symengine/symengine#1094 (comment)
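
A hypothetical call of the generated func above (argument names and shapes are invented for illustration; in PyDy they correspond to whatever inputs and outputs the generator was built from):

q = np.array([0.1, 0.2])            # e.g. generalized coordinates
u = np.array([0.0, 0.0])            # e.g. generalized speeds
mass_matrix, forcing = func(q, u)   # one reshaped array per entry of outputs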

@bjodah

Contributor

bjodah commented Nov 8, 2016

@isuruf
Thanks, I'll need to make the tests pass and then I'll try to optimize Lambdify.__call__.
One approach would be to make NumPy a hard dependency of symengine.py; how would you feel about that? cdef'ing arguments as cnp.ndarray would most likely remove a large portion of the overhead: constructing memoryviews is quite expensive, and dealing with nested Python lists/tuples even more so. The code around Lambdify would also be considerably simpler.

@bjodah

Contributor

bjodah commented Feb 26, 2017

@isuruf I added the -a flag to Cython and it looks like the overhead in __call__ is related to the use of memoryviews. Should we make NumPy a hard dependency of Lambdify? (That would make the code less complex and considerably faster.) I could of course code a "fast path" for NumPy, but trying to support Cython arrays, Python's array.array and NumPy arrays already feels like it's getting out of hand.

@isuruf

Member

isuruf commented Feb 27, 2017

I'm fine with making numpy a hard dependency of just Lambdify

@isuruf

Member

isuruf commented Feb 27, 2017

@bjodah, note that in #112 (comment) I've used unsafe_real and it has memoryviews in it, so there might be another issue as well.

@bjodah

Contributor

bjodah commented Jun 13, 2017

I'm planning to use SymEngine in the codegen tutorial at SciPy, so I started looking into this branch again.

But when I try to compile symengine (not the Python wrapper) with LLVM I get a test failure:

33/37 Test #33: test_lambda_double ...............***Failed    0.00 sec

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_lambda_double is a Catch v1.1 b14 (develop) host application.
Run with -? for options

-------------------------------------------------------------------------------
Check llvm and lambda are equal
-------------------------------------------------------------------------------
/opt/symengine/symengine/tests/eval/test_lambda_double.cpp:173
...............................................................................

/opt/symengine/symengine/tests/eval/test_lambda_double.cpp:173: FAILED:
due to a fatal error condition:

I have created this Dockerized environment which reproduces the error:
https://github.com/bjodah/gsoc2017-bjodah/tree/master/se_llvm

I tried to follow the Travis scripts closely but obviously I failed. Any idea what I am doing wrong?

EDIT: I will use symengine from the symengine channel instead.

@isuruf

Member

isuruf commented Jun 13, 2017

I think this is because llvm was compiled with gcc 4.8.5 and you are using gcc 5.4.0. I'm surprised that there were no link errors.

@bjodah

Contributor

bjodah commented Jun 13, 2017

Could be. Modifying CPPFLAGS did not help:

$ git diff
diff --git a/se_llvm/20_compile_symengine.sh b/se_llvm/20_compile_symengine.sh
index e116ece..a3545d6 100755
--- a/se_llvm/20_compile_symengine.sh
+++ b/se_llvm/20_compile_symengine.sh
@@ -11,6 +11,7 @@ export CXX="ccache clang++"
 export CC="ccache clang"
 export CCACHE_DIR=/ccache
 ccache -M 400M
+export CPPFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0"
 cmake -DCMAKE_INSTALL_PREFIX=$our_install_dir -DCMAKE_BUILD_TYPE:STRING="Release" -DWITH_LLVM:BOOL=ON -DINTEGER_CLASS:STRING=gmp -DBUILD_SHARED_LIBS:BOOL=ON -DWITH_MPC=yes $SOURCE_DIR
 make -j 4
 make install

It gives the same error. I thought setting CXX and CC would be enough (I set them to use clang with ccache):

+-------------------------------+
| Configuration results SUMMARY |
+-------------------------------+

CMAKE_C_COMPILER:   /opt/se/bin/ccache
CMAKE_CXX_COMPILER: /opt/se/bin/ccache
CMAKE_BUILD_TYPE: Release
...

bjodah added some commits Jun 14, 2017

Workaround not to use ``cdef vector[int]`` in cdef class.
Travis CI tests for Python 3.2 using gcc 3.7 give an error:
Cython generated C contains uninitialized references.
@bjodah

Contributor

bjodah commented Jun 14, 2017

Current status of benchmarks/Lambdify.py:

$ CFLAGS="-march=native -ffast-math -O3" python benchmarks/Lambdify.py 
SymEngine (lambda double) speed-up factor (higher is better) vs sympy:       5.1172
symengine (LLVM)          speed-up factor (higher is better) vs sympy:       5.7401
symengine (ManualLLVM)    speed-up factor (higher is better) vs sympy:       6.4252
benchmarks/Lambdify.py:78: UserWarning: Cython code for Lambdify.__call__ is slow.
  warnings.warn("Cython code for Lambdify.__call__ is slow.")
Hard-coded Cython code    speed-up factor (higher is better) vs sympy:       23.096

(Speed-up factors were below 1 before moving the use_numpy kwarg from __call__ to __init__.) I am now investigating NumPy arrays vs. memoryviews.

@isuruf

Member

isuruf commented Jun 14, 2017

@bjodah, can you post the timings without CFLAGS="-march=native -ffast-math -O3" ? We haven't turned on aggressive optimizations in LLVM.

@bjodah

Contributor

bjodah commented Jun 14, 2017

Sure, that was a factor of ~11 (my local uncommitted state using numpy.ndarray is currently seeing ~7.5 for symengine (LLVM); I still need to fix some tests).

@bjodah

Contributor

bjodah commented Jun 15, 2017

Hmm... those numbers were against SymPy master. Using sympy-1.0 I now get:

Hard-coded Cython code          speed-up factor (higher is better) vs sympy:       8.7338

so apparently SymPy master's lambdify has had a regression (roughly a third of the sympy-1.0 performance).

@bjodah

Contributor

bjodah commented Jun 15, 2017

Comparing against sympy-1.0 & using -O1:

$ CFLAGS="-O1" python benchmarks/Lambdify.py 
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       1.9232
symengine (lambda double + CSE) speed-up factor (higher is better) vs sympy:      0.98464
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       2.3853
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       2.2815
Hard-coded Cython code          speed-up factor (higher is better) vs sympy:       3.8374
@bjodah

Contributor

bjodah commented Jun 15, 2017

I think making NumPy a hard requirement for Lambdify has some trade-offs:

  • slightly better performance
  • arguably more readable code (the source of Lambdify.__call__ is more accessible now)
  • a slightly more tricky setup.py -- we need some compile option, WITH_NUMPY or so

On that last point: I still have to implement it. What is the best approach (if we want to make NumPy a hard requirement, that is)? I'm thinking of putting the whole Lambdify definition inside a conditional compilation block, i.e.

IF WITH_NUMPY:
    cdef class Lambdify:
        ...
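
One generic way to feed such a flag to Cython from a Python build script is the compile_time_env option of cythonize. Below is a minimal setuptools-style sketch of that idea only; the extension name, paths and NumPy detection are illustrative assumptions, and this is not necessarily how the PR wires it up (the review below shows a WITH_NUMPY option on the CMake side).

# sketch of a hypothetical setup.py fragment
from setuptools import setup
from setuptools.extension import Extension
from Cython.Build import cythonize

try:
    import numpy
    with_numpy = True
    include_dirs = [numpy.get_include()]
except ImportError:
    with_numpy = False
    include_dirs = []

ext = Extension(
    "symengine.lib.symengine_wrapper",
    ["symengine/lib/symengine_wrapper.pyx"],
    include_dirs=include_dirs,
    language="c++",
)

setup(
    name="symengine",
    ext_modules=cythonize(
        [ext],
        # makes WITH_NUMPY available to IF/DEF at Cython compile time
        compile_time_env={"WITH_NUMPY": with_numpy},
    ),
)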
@bjodah

Contributor

bjodah commented Jun 15, 2017

(regarding the sympy.lambdify "regression": there was none, I just needed to specify modules='math')
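
For completeness, this is what specifying the math backend looks like (a tiny illustrative snippet, not the benchmark's actual code):

import sympy as sp

x, y = sp.symbols('x y')
expr = sp.sin(x)*sp.cos(y) + x**2
f = sp.lambdify((x, y), expr, modules='math')   # force the plain-math backend
print(f(0.3, 0.7))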

@isuruf

Member

isuruf commented Jun 15, 2017

If you have some time, can you run this example: pydy/pydy#360? If not, I'll be able to run it over the weekend.

@isuruf

Member

isuruf commented Jun 15, 2017

putting the whole Lambdify definition inside a block

I'm not sure if cimports work inside such blocks. If they do, then that's probably the best approach.

@bjodah

Contributor

bjodah commented Jun 15, 2017

Sure - I'll run those later tonight or tomorrow.

@bjodah

Contributor

bjodah commented Jun 17, 2017

For links = 10 and time_steps=1000 using SymEngine:

The derivation took 0.18646 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 2.79032 seconds.
ODE integration took 4.79481 seconds.

not using SymEngine (note that setting USE_SYMENGINE=0 causes sympy to use symengine...):

The derivation took 3.05501 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.81127 seconds.
ODE integration took 390.68950 seconds.

Generating with cython method.
------------------------------
The code generation took 69.66981 seconds.
ODE integration took 4.25866 seconds.
@isuruf

Member

isuruf commented Jun 17, 2017

I ran the benchmark with master, PR + ManualLLVM, and PR + LLVM.
I used the following patch to use this PR's multiple-output feature.

@@ -682,24 +688,9 @@ class LambdifyODEFunctionGenerator(ODEFunctionGenerator):
 
         if USE_SYMENGINE:
             import symengine
-            outputs_ravel = list(chain(*outputs))
-            try:
-                cb = symengine.Lambdify(inputs, outputs_ravel, backend="llvm")
-            except (TypeError, ValueError):
-                # TypeError if symengine is old, ValueError if symengine is
-                # not compiled with llvm support
-                cb = symengine.Lambdify(inputs, outputs_ravel)
+            cb = symengine.Lambdify(inputs, *outputs)
             def func(*args):
-                result = []
-                n = np.empty(len(outputs_ravel))
-                cb.unsafe_real(np.concatenate([a.ravel() for a in args]), n)
-                start = 0
-                for output in outputs:
-                    elems = reduce(mul, output.shape)
-                    result.append(n[start : (start + elems)]
-                                        .reshape(output.shape))
-                    start += elems
-                return result
+                return cb(np.concatenate([a.ravel() for a in args]))
             return func
         else:
             modules = [{'ImmutableMatrix': np.array}, 'numpy']

ManualLLVM is still faster,

@isuruf

Member

isuruf commented Jun 17, 2017

Oops, pressed enter before finishing commenting.

Pendulum with 10 links.
=======================
The derivation took 0.16069 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19171 seconds.
ODE integration took 5.25970 seconds.
Pendulum with 10 links.
=======================
The derivation took 0.15894 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19862 seconds.
ODE integration took 4.90780 seconds.

Compared to master, this PR is better.

Pendulum with 10 links.
=======================
The derivation took 0.16067 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19473 seconds.
ODE integration took 5.08710 seconds.
@bjodah

Contributor

bjodah commented Jun 17, 2017

OK, so I extracted a small test case reproducing ManualLLVM being faster:

$ python3 benchmarks/Lambdify_6_links_rhs.py 
[1, 1, 1, 1, 1, 1, 1, 21*cos(1), -6*cos(1), -5*cos(1), -4*cos(1), -3*cos(1), -2*cos(1), -cos(1)]
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       2.6123
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       8.1947
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       8.4328
benchmarks/Lambdify_6_links_rhs.py:94: UserWarning: Cython code for Lambdify.__call__ is slow.
  warnings.warn("Cython code for Lambdify.__call__ is slow.")
Hard-coded Cython code          speed-up factor (higher is better) vs sympy:        25.12

Will work on improving that benchmark.

@isuruf

Member

isuruf commented Jun 17, 2017

That benchmark also has only one output. We should add a benchmark for heterogeneous output in Lambdify.

@bjodah

Contributor

bjodah commented Jun 17, 2017

You're right. I'll update the benchmark (tomorrow or Monday) to also calculate f(t), the Jacobian matrix of f(t), and df/dt in the same call.
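
A rough sketch of what such a heterogeneous call could look like (the right-hand side below is made up for illustration; the point is only the call pattern, one Lambdify evaluating a vector, a matrix and another vector at once):

import numpy as np
import symengine as se

t = se.Symbol('t')
y0, y1 = se.symbols('y0 y1')
f = [y1, -se.sin(y0) + se.cos(t)]                     # example RHS f(t, y)
jac = [[fi.diff(yi) for yi in (y0, y1)] for fi in f]  # Jacobian of f w.r.t. y
dfdt = [fi.diff(t) for fi in f]                       # explicit time derivative

lmb = se.Lambdify([t, y0, y1], f, jac, dfdt)          # one callback, three outputs
f_num, jac_num, dfdt_num = lmb(np.array([0.0, 1.0, 0.5]))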

@bjodah

Contributor

bjodah commented Jun 18, 2017

__call__ is now running slightly faster than ManualLLVM for heterogeneous output:

python3 benchmarks/heterogenous_output_Lambdify.py
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       40.702
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       248.95
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       244.31

@bjodah bjodah changed the title from [WIP] Stub for heterogeneous output in Lambdify to Heterogeneous output in Lambdify Jun 19, 2017

@bjodah

Contributor

bjodah commented Jun 19, 2017

@isuruf I think this might be ready for review

@isuruf

Looks good to me.

@certik, can you have a look? This PR adds a hard dependency on numpy for Lambdify functionality to keep the code simpler and a little bit faster.

@@ -59,6 +58,14 @@ else()
set(HAVE_SYMENGINE_LLVM False)
endif()
if(WITH_NUMPY)

@isuruf

isuruf Jun 19, 2017

Member

WITH_NUMPY should be a cached variable.

@isuruf isuruf requested a review from certik Jun 20, 2017

@certik

certik approved these changes Jun 21, 2017

That looks good to me. Depending on NumPy is fine I think.

If people object to that, we can try to make it optional, but I think sympy and symengine belong to the scipy stack, and the whole scipy stack depends on numpy. So I have no problem with it.

@certik

Contributor

certik commented Jun 21, 2017

To make it clear: symengine.py belongs to the scipy stack; symengine itself should only depend on a few well-maintained C++ libraries.

isuruf added some commits Jun 21, 2017

Avoid overflow in windows in test
np.arange creates a numpy array with long int elements which is
32 bit on windows.

@isuruf isuruf merged commit 8aff3c8 into symengine:master Jun 22, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
@isuruf

Member

isuruf commented Jun 22, 2017

Thanks for the PR, @bjodah

@bjodah bjodah deleted the bjodah:heterogeneous-Lambdify branch Jun 22, 2017

@isuruf

Member

isuruf commented Jun 23, 2017

@bjodah, packaging symengine for conda with numpy as a build-time dependency means the conda package has to be compiled for each numpy version. I'd like to avoid that if possible, with numpy only as a runtime requirement. Can you try my branch at https://github.com/isuruf/symengine.py/tree/memview and see whether memviews have a performance penalty? I ran the benchmarks myself and I don't see much difference.

@isuruf

Member

isuruf commented Jun 24, 2017

ping, @bjodah

@bjodah

Contributor

bjodah commented Jun 24, 2017

@isuruf, I just tried it. The performance is the same on my machine. +1 for using memviews.
