
Conversation

@bjodah (Contributor) commented Nov 7, 2016

To address #107.

There are some test failures still. I'll update and ping once this is ready for review.
I changed my mind and this now only returns a tuple if multiple expressions are given (e.g. a vector and a matrix). This means that this should be a non-breaking change.

@isuruf (Member) commented Nov 8, 2016

@bjodah, Lambdify's __call__ has considerable overhead.

Here's some Python code that I wrote:

import itertools
from functools import reduce
from operator import mul

import numpy as np
import symengine

outputs_ravel = list(itertools.chain(*outputs))
try:
    cb = symengine.Lambdify(inputs, outputs_ravel, backend="llvm")
except TypeError:
    cb = symengine.Lambdify(inputs, outputs_ravel)

def func(*args):
    result = []
    n = np.empty(len(outputs_ravel))
    # unsafe_real fills n in place from the concatenated, raveled inputs
    cb.unsafe_real(np.concatenate([arg.ravel() for arg in args]), n)
    start = 0
    for output in outputs:
        elems = reduce(mul, output.shape)
        result.append(n[start:start + elems].reshape(output.shape))
        start += elems
    return result

return func

This makes SymEngine as fast as PyDy's cythonize. I updated the timings here: symengine/symengine#1094 (comment)
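The buffer-splitting step in that snippet can be exercised standalone with plain NumPy; a toy sketch in which the flat buffer merely stands in for the one filled by `cb.unsafe_real`:

```python
from functools import reduce
from operator import mul

import numpy as np

# Heterogeneous outputs: a length-3 vector followed by a 2x2 matrix,
# packed contiguously into a single flat buffer.
shapes = [(3,), (2, 2)]
flat = np.arange(7.0)  # stand-in for the buffer filled by cb.unsafe_real

result, start = [], 0
for shape in shapes:
    elems = reduce(mul, shape)
    result.append(flat[start:start + elems].reshape(shape))
    start += elems

print([r.shape for r in result])  # [(3,), (2, 2)]
```

Since `reshape` on a contiguous slice returns a view, no extra copies are made beyond the single evaluation buffer.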

@bjodah (Contributor, Author) commented Nov 8, 2016

@isuruf
Thanks, I'll need to make the tests pass and then I'll try to optimize Lambdify.__call__.
One approach would be to make NumPy a hard dependency of symengine.py; how would you feel about that? cdef'ing arguments as cnp.ndarray would most likely remove a large portion of the overhead: constructing memoryviews is quite expensive, and dealing with nested Python lists/tuples even more so. The code around Lambdify would also be considerably simpler.

@bjodah (Contributor, Author) commented Feb 26, 2017

@isuruf I added the -a flag to Cython and it looks like the overhead in __call__ is related to the use of memoryviews. Should we make NumPy a hard dependency of Lambdify? That would make the code less complex and considerably faster. I could of course code a "fast path" for NumPy, but it already feels like it's getting out of hand trying to support Cython's array, Python's array.array, and NumPy arrays.
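A NumPy-only fast path would sidestep much of that, since all of those input types can be normalized to one contiguous float64 array up front. A sketch (`as_contiguous_f64` is a hypothetical helper, not part of the wrapper):

```python
import array

import numpy as np

def as_contiguous_f64(obj):
    # Lists, array.array and ndarray all funnel through one code path;
    # ascontiguousarray avoids a copy when the input already qualifies.
    return np.ascontiguousarray(obj, dtype=np.float64)

for obj in ([1, 2, 3], array.array('d', [1.0, 2.0, 3.0]), np.arange(1, 4)):
    a = as_contiguous_f64(obj)
    print(a.dtype, a.flags['C_CONTIGUOUS'])  # float64 True
```

The cost of the heterogeneous-input support then collapses to one `ascontiguousarray` call at the top of `__call__`.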

@isuruf (Member) commented Feb 27, 2017

I'm fine with making numpy a hard dependency of just Lambdify.

@isuruf (Member) commented Feb 27, 2017

@bjodah, note that in #112 (comment) I've used unsafe_real and it has memoryviews in it, so there might be another issue as well.

@bjodah (Contributor, Author) commented Jun 13, 2017

I'm looking into using SymEngine in the codegen tutorial at SciPy.
So I started looking into this branch again.

But when I try to compile symengine (not the python wrapper) with llvm I get a test failure:

33/37 Test #33: test_lambda_double ...............***Failed    0.00 sec

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_lambda_double is a Catch v1.1 b14 (develop) host application.
Run with -? for options

-------------------------------------------------------------------------------
Check llvm and lambda are equal
-------------------------------------------------------------------------------
/opt/symengine/symengine/tests/eval/test_lambda_double.cpp:173
...............................................................................

/opt/symengine/symengine/tests/eval/test_lambda_double.cpp:173: FAILED:
due to a fatal error condition:

I have created this Dockerized environment which reproduces the error:
https://github.com/bjodah/gsoc2017-bjodah/tree/master/se_llvm

I tried to follow the Travis scripts closely but obviously I failed. Any idea what I am doing wrong?

EDIT: I will use symengine from the symengine channel instead.

@isuruf (Member) commented Jun 13, 2017

I think this is because llvm was compiled with gcc 4.8.5 and you are using gcc 5.4.0. I'm surprised that there were no link errors.

@bjodah (Contributor, Author) commented Jun 13, 2017

Could be; modifying CPPFLAGS did not help:

$ git diff
diff --git a/se_llvm/20_compile_symengine.sh b/se_llvm/20_compile_symengine.sh
index e116ece..a3545d6 100755
--- a/se_llvm/20_compile_symengine.sh
+++ b/se_llvm/20_compile_symengine.sh
@@ -11,6 +11,7 @@ export CXX="ccache clang++"
 export CC="ccache clang"
 export CCACHE_DIR=/ccache
 ccache -M 400M
+export CPPFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0"
 cmake -DCMAKE_INSTALL_PREFIX=$our_install_dir -DCMAKE_BUILD_TYPE:STRING="Release" -DWITH_LLVM:BOOL=ON -DINTEGER_CLASS:STRING=gmp -DBUILD_SHARED_LIBS:BOOL=ON -DWITH_MPC=yes $SOURCE_DIR
 make -j 4
 make install

gives the same error. I thought setting CXX and CC was enough (I set them to use clang with ccache):

+-------------------------------+
| Configuration results SUMMARY |
+-------------------------------+

CMAKE_C_COMPILER:   /opt/se/bin/ccache
CMAKE_CXX_COMPILER: /opt/se/bin/ccache
CMAKE_BUILD_TYPE: Release
...

bjodah added 2 commits June 14, 2017 14:06
Travis CI tests for Python 3.2 using gcc 3.7 give an error:
Cython generated C contains uninitialized references.
@bjodah bjodah force-pushed the heterogeneous-Lambdify branch from 26ddbaf to dc1877e Compare June 14, 2017 12:53
@bjodah (Contributor, Author) commented Jun 14, 2017

Current status of benchmark/Lambdify.py:

$ CFLAGS="-march=native -ffast-math -O3" python benchmarks/Lambdify.py 
SymEngine (lambda double) speed-up factor (higher is better) vs sympy:       5.1172
symengine (LLVM)          speed-up factor (higher is better) vs sympy:       5.7401
symengine (ManualLLVM)    speed-up factor (higher is better) vs sympy:       6.4252
benchmarks/Lambdify.py:78: UserWarning: Cython code for Lambdify.__call__ is slow.
  warnings.warn("Cython code for Lambdify.__call__ is slow.")
Hard-coded Cython code    speed-up factor (higher is better) vs sympy:       23.096

(Speed-up factors were below 1 before moving the use_numpy kwarg from __call__ to __init__.) I am now investigating numpy arrays vs. memoryviews.

@isuruf (Member) commented Jun 14, 2017

@bjodah, can you post the timings without CFLAGS="-march=native -ffast-math -O3" ? We haven't turned on aggressive optimizations in LLVM.

@bjodah (Contributor, Author) commented Jun 14, 2017

Sure, that was a factor of ~11 (my local uncommitted state using numpy.ndarray is currently seeing ~7.5 for symengine (LLVM); I still need to fix some tests).

@bjodah (Contributor, Author) commented Jun 15, 2017

Hmm... those numbers were against SymPy master. Using sympy-1.0 I now get:

Hard-coded Cython code          speed-up factor (higher is better) vs sympy:       8.7338

so apparently SymPy master's lambdify has had a regression (down to 1/3 of the performance).

@bjodah (Contributor, Author) commented Jun 15, 2017

Comparing against sympy-1.0 & using -O1:

$ CFLAGS="-O1" python benchmarks/Lambdify.py 
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       1.9232
symengine (lambda double + CSE) speed-up factor (higher is better) vs sympy:      0.98464
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       2.3853
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       2.2815
Hard-coded Cython code          speed-up factor (higher is better) vs sympy:       3.8374

@bjodah (Contributor, Author) commented Jun 15, 2017

I think making NumPy a hard requirement for Lambdify has some trade-offs:

  • slightly better performance
  • arguably more readable code (source of Lambdify.__call__ more accessible now)
  • a bit more tricky setup.py -- we need some compile option WITH_NUMPY or similar

On that last point: I still have to implement that. What is the best approach (if we want to make NumPy a hard requirement, that is)? I'm thinking of putting the whole Lambdify definition inside a compile-time IF block, i.e.

IF WITH_NUMPY:
    cdef class Lambdify:
        ...
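One way to feed such a compile-time constant into Cython is the `compile_time_env` argument of `cythonize`; a hedged setup.py sketch (the numpy detection shown here is an assumption, not the PR's actual build logic):

```python
# setup.py sketch: WITH_NUMPY becomes a Cython compile-time constant,
# usable in the .pyx file as  IF WITH_NUMPY: ...
from setuptools import setup
from Cython.Build import cythonize

try:
    import numpy  # noqa: F401
    with_numpy = True
except ImportError:
    with_numpy = False

setup(
    ext_modules=cythonize(
        "symengine_wrapper.pyx",
        compile_time_env={"WITH_NUMPY": with_numpy},
    ),
)
```

With that, the `cdef class Lambdify` block simply disappears from the compiled module when numpy is absent at build time.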

@bjodah (Contributor, Author) commented Jun 15, 2017

(regarding the sympy.lambdify "regression": there was none, I just needed to specify modules='math')

@isuruf (Member) commented Jun 15, 2017

If you have some time, can you run this example, pydy/pydy#360 ? If not, I'll be able to run it over the weekend.

@isuruf (Member) commented Jun 15, 2017

putting the whole Lambdify definition inside a block

I'm not sure if cimports work inside such blocks. If they do, then that's probably the best approach.

@bjodah (Contributor, Author) commented Jun 17, 2017

For links = 10 and time_steps=1000 using SymEngine:

The derivation took 0.18646 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 2.79032 seconds.
ODE integration took 4.79481 seconds.

not using SymEngine (note that setting USE_SYMENGINE=0 causes sympy to use symengine...):

The derivation took 3.05501 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.81127 seconds.
ODE integration took 390.68950 seconds.

Generating with cython method.
------------------------------
The code generation took 69.66981 seconds.
ODE integration took 4.25866 seconds.

@isuruf (Member) commented Jun 17, 2017

I ran the benchmark with master, PR + ManualLLVM, and PR + LLVM.
I used the following patch to use this PR's multiple-output feature.

@@ -682,24 +688,9 @@ class LambdifyODEFunctionGenerator(ODEFunctionGenerator):
 
         if USE_SYMENGINE:
             import symengine
-            outputs_ravel = list(chain(*outputs))
-            try:
-                cb = symengine.Lambdify(inputs, outputs_ravel, backend="llvm")
-            except (TypeError, ValueError):
-                # TypeError if symengine is old, ValueError if symengine is
-                # not compiled with llvm support
-                cb = symengine.Lambdify(inputs, outputs_ravel)
+            cb = symengine.Lambdify(inputs, *outputs)
             def func(*args):
-                result = []
-                n = np.empty(len(outputs_ravel))
-                cb.unsafe_real(np.concatenate([a.ravel() for a in args]), n)
-                start = 0
-                for output in outputs:
-                    elems = reduce(mul, output.shape)
-                    result.append(n[start : (start + elems)]
-                                        .reshape(output.shape))
-                    start += elems
-                return result
+                return cb(np.concatenate([a.ravel() for a in args]))
             return func
         else:
             modules = [{'ImmutableMatrix': np.array}, 'numpy']

ManualLLVM is still faster,

@isuruf (Member) commented Jun 17, 2017

Oops, pressed enter before finishing commenting.

Pendulum with 10 links.
=======================
The derivation took 0.16069 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19171 seconds.
ODE integration took 5.25970 seconds.
Pendulum with 10 links.
=======================
The derivation took 0.15894 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19862 seconds.
ODE integration took 4.90780 seconds.

Compared to master, this PR is better.

Pendulum with 10 links.
=======================
The derivation took 0.16067 seconds.

Generating with lambdify method.
--------------------------------
The code generation took 0.19473 seconds.
ODE integration took 5.08710 seconds.

@bjodah (Contributor, Author) commented Jun 17, 2017

OK, so I extracted a small test case reproducing ManualLLVM being faster:

$ python3 benchmarks/Lambdify_6_links_rhs.py 
[1, 1, 1, 1, 1, 1, 1, 21*cos(1), -6*cos(1), -5*cos(1), -4*cos(1), -3*cos(1), -2*cos(1), -cos(1)]
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       2.6123
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       8.1947
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       8.4328
benchmarks/Lambdify_6_links_rhs.py:94: UserWarning: Cython code for Lambdify.__call__ is slow.
  warnings.warn("Cython code for Lambdify.__call__ is slow.")
Hard-coded Cython code          speed-up factor (higher is better) vs sympy:        25.12

Will work on improving that benchmark.

@isuruf (Member) commented Jun 17, 2017

That benchmark also has only one output. We should get a benchmark for heterogeneous output in Lambdify.

@bjodah (Contributor, Author) commented Jun 17, 2017

You're right. I'll update the benchmark (tomorrow or Monday) to also calculate f(t), the Jacobian matrix of f(t), and df/dt in the same call.

@bjodah bjodah force-pushed the heterogeneous-Lambdify branch from 1ca217c to 3b2182f Compare June 18, 2017 06:56
@bjodah (Contributor, Author) commented Jun 18, 2017

__call__ is now running slightly faster than ManualLLVM for heterogeneous output:

$ python3 benchmarks/heterogenous_output_Lambdify.py
SymEngine (lambda double)       speed-up factor (higher is better) vs sympy:       40.702
symengine (LLVM)                speed-up factor (higher is better) vs sympy:       248.95
symengine (ManualLLVM)          speed-up factor (higher is better) vs sympy:       244.31

cdef int *accum_out_sizes
cdef object numpy_dtype

def __cinit__(self, args, *exprs, bool real=True):

(Member) Keyword-only arguments are Python 3 only, right?

(Contributor, Author) I believe keyword-only args are when **kwargs is also present.

(Member) See https://www.python.org/dev/peps/pep-3102/

Syntactically, the proposed changes are fairly simple. The first change is to allow regular arguments to appear after a varargs argument:

def sortwords(*wordlist, case_sensitive=False):
    ...

@bjodah (Contributor, Author) commented Jun 18, 2017

Ah, I see, but Cython supports it even for Python 2 right?
Just checked: kw-only arguments are supported even in Python 2 when using Cython.
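For reference, the PEP 3102 form in plain Python 3 syntax (which, as noted above, Cython accepts and compiles even when targeting Python 2):

```python
def sortwords(*wordlist, case_sensitive=False):
    # case_sensitive comes after *wordlist, so callers must pass it by keyword.
    key = None if case_sensitive else str.lower
    return sorted(wordlist, key=key)

print(sortwords("Banana", "apple"))                       # ['apple', 'Banana']
print(sortwords("Banana", "apple", case_sensitive=True))  # ['Banana', 'apple']
```

Passing the flag positionally is impossible here; any extra positional argument is simply swallowed by `*wordlist`, which is exactly why the keyword-only spelling is the safer API.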

@bjodah bjodah force-pushed the heterogeneous-Lambdify branch from 0f333e9 to 871017c Compare June 18, 2017 22:14
@bjodah bjodah changed the title [WIP] Stub for heterogeneous output in Lambdify Heterogeneous output in Lambdify Jun 19, 2017
@bjodah (Contributor, Author) commented Jun 19, 2017

@isuruf I think this might be ready for review.

@isuruf (Member) left a comment

Looks good to me.

@certik, can you have a look? This PR adds a hard dependency on numpy for Lambdify functionality to keep the code simpler and a little bit faster.

if have_numpy:
    from .lib.symengine_wrapper import Lambdify, LambdifyCSE

def lambdify(args, exprs):

(Member) It's better to have the real and backend variables here.

@@ -59,6 +58,14 @@ else()
set(HAVE_SYMENGINE_LLVM False)
endif()

if(WITH_NUMPY)

(Member) WITH_NUMPY should be a cached variable.

CMakeLists.txt Outdated
@@ -59,6 +58,14 @@ else()
set(HAVE_SYMENGINE_LLVM False)
endif()

if(WITH_NUMPY)
find_package(NumPy REQUIRED)
include_directories(${NUMPY_INCLUDE_PATH})

(Member) Are we not linking the numpy libraries?

(Member)
I just checked and there are no symbols from numpy libraries linked in. It seems we are not using anything that requires linking. I wonder whether that means we can build with one numpy version and have it work on other numpy versions.

(Contributor, Author)

Yes, it is my understanding that the NumPy C API is largely backwards-compatible. And no, the numpy functions used in the code use the Python layer (e.g. numpy.empty). It is possible that we could gain some performance by directly calling the NumPy C API in __call__ (that would then require linking).

(Member)

If so, I think we can just use numpy.ndarray.ctypes to get an array pointer and have numpy only as a runtime requirement. I'll have a look to see if this is possible.
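The ndarray.ctypes route is indeed available without compiling against the numpy headers; a minimal sketch of pulling a double pointer out of an array at runtime:

```python
import ctypes

import numpy as np

a = np.arange(3.0)  # float64, C-contiguous

# .ctypes resolves the buffer address at runtime, so nothing from numpy
# needs to be present at build time; numpy stays a runtime requirement.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
print(ptr[0], ptr[1], ptr[2])  # 0.0 1.0 2.0
```

The pointer is only valid while `a` is alive and unresized, so a wrapper taking this route must keep a reference to the array for the duration of the call.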

cdef list out_shapes
cdef readonly bint real
cdef readonly int n_exprs
cdef int *out_sizes

(Member) Can this be a std::vector?

@isuruf isuruf requested a review from certik June 20, 2017 05:19
@certik (Contributor) left a comment
That looks good to me. Depending on NumPy is fine I think.

If people object to that, we can try to make it optional, but I think sympy and symengine belong to the scipy stack, and all of the scipy stack depends on numpy. So I have no problems with it.

@certik (Contributor) commented Jun 21, 2017

To make it clear: symengine.py belongs to the scipy stack; symengine itself should only depend on a few well-maintained C++ libraries.

isuruf added 2 commits June 21, 2017 18:43
np.arange creates a numpy array with long int elements, which are 32-bit on Windows.
@isuruf isuruf merged commit 8aff3c8 into symengine:master Jun 22, 2017
@isuruf (Member) commented Jun 22, 2017

Thanks for the PR, @bjodah

@bjodah bjodah deleted the heterogeneous-Lambdify branch June 22, 2017 15:40
@isuruf (Member) commented Jun 23, 2017

@bjodah, packaging symengine for conda with numpy as a build-time dependency means the conda package has to be compiled for each numpy version. I'd like to avoid that, if possible, by keeping numpy only as a runtime requirement. Can you try my branch at https://github.com/isuruf/symengine.py/tree/memview and see if memviews have a performance penalty? I ran the benchmarks myself and I don't see much difference.

@isuruf (Member) commented Jun 24, 2017

ping, @bjodah

@bjodah (Contributor, Author) commented Jun 24, 2017

@isuruf just tried it. The performance is the same on my machine. +1 for using memviews.
