
[GSoC][Discussion] Improving New Assumptions #17066

Open
ShubhamKJha opened this issue Jun 21, 2019 · 14 comments

@ShubhamKJha
Member

commented Jun 21, 2019

Many ideas have been proposed for speeding up and enhancing the New Assumptions system. Several of them are implemented to some extent in existing PRs but have stagnated. These ideas should be checked for feasibility, and the PRs revived if they can speed up querying in New Assumptions.
This issue acts as an issue tree to collect all such issues and PRs.

@ShubhamKJha

Member Author

commented Jun 21, 2019

One of the ideas, proposed in #9728, is to use pycosat as an optional dependency. PR #9813 adds that.

I tried to profile the PR, but sympy_benchmarks somehow fails for it:

C:\Users\Shubham\Desktop\sympy\benchmarks\my_fork\sympy_benchmarks>asv --config asv.conf.conda.json run -s 1 --bench logic pycosat
· Creating environments
· Discovering benchmarks
· Running 8 total benchmarks (1 commits * 2 environments * 4 benchmarks)
[  0.00%] · For sympy commit fb5acce2 <pycosat>:
[  0.00%] ·· Building for conda-py2.7-fastcache-mpmath-numpy-pycosat
[  0.00%] ·· Benchmarking conda-py2.7-fastcache-mpmath-numpy-pycosat
[  6.25%] ··· Running (logic.LogicSuite.time_dpll--)...
[ 25.00%] ··· Running (logic.LogicSuite.time_pycosat--).
[ 31.25%] ··· logic.LogicSuite.time_dpll                                                                      3.89±0.1s
[ 37.50%] ··· logic.LogicSuite.time_dpll2                                                                       375±6ms
[ 43.75%] ··· logic.LogicSuite.time_load_file                                                                12.6±0.2ms
[ 50.00%] ··· logic.LogicSuite.time_pycosat                                                                      failed
[ 50.00%] ·· Building for conda-py3.6-fastcache-mpmath-numpy-pycosat
[ 50.00%] ·· Benchmarking conda-py3.6-fastcache-mpmath-numpy-pycosat
[ 56.25%] ··· Running (logic.LogicSuite.time_dpll--)..
[ 68.75%] ··· Running (logic.LogicSuite.time_load_file--)..
[ 81.25%] ··· logic.LogicSuite.time_dpll                                                                      4.09±0.1s
[ 87.50%] ··· logic.LogicSuite.time_dpll2                                                                      362±10ms
[ 93.75%] ··· logic.LogicSuite.time_load_file                                                                  11.1±1ms
[100.00%] ··· logic.LogicSuite.time_pycosat                                                                      failed

Nevertheless, I tested the benchmarks with timeit:

In [1]: import my_test
True

In [2]: timeit my_test.test_it()
315 ms ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: timeit my_test.test_it_dpll2()
347 ms ± 4.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: timeit my_test.test_it()
303 ms ± 6.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: timeit my_test.test_it_dpll2()
347 ms ± 4.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Here, test_it() contains the benchmarking code for the pycosat-based SAT solver, and test_it_dpll2() runs the same workload with dpll2.
There is an overall gain of roughly 30-40 ms with pycosat. Profiling suggests most of the total time is spent in to_cnf; discounting that, the pycosat-based solver itself runs about twice as fast as dpll2 on average.
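For reference, here is a minimal sketch of what such a benchmark could look like; my_test.py itself is not shown above, so the expression and helper names here are assumptions, not code from the PR. It encodes a SymPy CNF expression into pycosat's integer clause format and compares it with dpll2:

# Hypothetical reconstruction, not the actual my_test.py from the PR.
import pycosat
from sympy import symbols
from sympy.logic.boolalg import And, Not, Or, to_cnf
from sympy.logic.inference import satisfiable

def _to_int_clauses(cnf):
    # Encode a SymPy CNF expression as pycosat's list-of-ints format.
    mapping = {}
    def lit(l):
        neg = isinstance(l, Not)
        sym = l.args[0] if neg else l
        n = mapping.setdefault(sym, len(mapping) + 1)
        return -n if neg else n
    clauses = cnf.args if isinstance(cnf, And) else (cnf,)
    return [[lit(l) for l in (c.args if isinstance(c, Or) else (c,))]
            for c in clauses]

x, y, z = symbols('x y z')
expr = (x | y) & (~x | z) & (y | ~z)

def test_it():          # pycosat-based solve
    return pycosat.solve(_to_int_clauses(to_cnf(expr)))

def test_it_dpll2():    # SymPy's pure-Python dpll2 solver
    return satisfiable(expr, algorithm='dpll2')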

@asmeurer

Member

commented Jun 21, 2019

That confirms what I had seen before. pycosat is a speedup, but most of the time is spent outside of the SAT solver in things like to_cnf.

@asmeurer

Member

commented Jun 21, 2019

What is the failure in the benchmarks for pycosat?

@ShubhamKJha

Member Author

commented Jun 21, 2019

> What is the failure in the benchmarks for pycosat?

I don't know. I added pycosat to the dependency matrix for asv, but it still failed.

@oscargus

Contributor

commented Jun 22, 2019

Not sure if this is the correct place, but it would be convenient to have a function that puts assumptions on a variable given bounds. One use case is integration where, say, the variable x lies between y and z. Depending on the assumptions on y and z, something can then be said about x automatically. I've thought about writing such a function, but there will be lots of cases if one just lists all the possibilities and writes if-clauses. Maybe there is a better way to do this within the assumptions system?

Edit: I think the integration code more or less handles this.
However, in multi-variable integration this information is not introduced until a particular variable is integrated, although it could be introduced from the beginning. There is a case in the Wester tests where this would give a significant speed-up (and that case is quite simple, so I'd say it would give a speed-up in general).

@highpost

Contributor

commented Jun 22, 2019

Being able to add a unit interval assumption to a symbol would be helpful for integrals and probabilities in sympy.stats. I tried using sp.assuming, but it didn't have any effect on the results.

import sympy as sp
from IPython.display import display  # display() comes from IPython/Jupyter

n = sp.symbols('n', real=True, constant=True)
x = sp.symbols('x', real=True)

with sp.assuming(sp.Q.is_true(n > 1)):
    display(sp.Integral(x**(-n), (x, 1, sp.oo)).doit())
@ShubhamKJha

Member Author

commented Jun 22, 2019

Currently, assuming (and the new assumptions generally) doesn't work with Relationals (>, <, etc.). Adding a bound to a variable (e.g. a bound of (a, b)) is equivalent to adding the two assumptions x > a and x < b. Once relationals are synced with the assumptions system, this would not be difficult.
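To illustrate that equivalence, a minimal sketch (the bound helper is hypothetical, and the commented query shows intended behaviour only, since ask() cannot handle relationals today):

# Sketch only: relationals are not yet wired into the new assumptions,
# so the commented query below is the intended behaviour, not current output.
from sympy import And, Q, ask, assuming, symbols

x = symbols('x', real=True)

def bound(var, lo, hi):
    # A bound (lo, hi) is just the conjunction of two relational facts.
    return And(lo < var, var < hi)

# Once relationals are synced with the assumptions system, one could write:
# with assuming(Q.is_true(bound(x, 0, 1))):
#     ask(Q.positive(x))   # ideally True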

@oscarbenjamin

Contributor

commented Jun 23, 2019

I ran a very limited benchmark and got this (using cProfile and gprof2dot):
[profile graph attached: output of cProfile rendered with gprof2dot]
The to_cnf and get_all_relevant_facts functions take most of the time.

More generally, the new assumptions are built on top of the core in terms of SymPy objects. So another way of looking at this is that the majority of the time is spent doing operations with SymPy objects (args, __new__, ordered, etc.), which are much slower than Python's built-in types. It seems to me that a system built this way will always be too slow to be used in the core.

@oscargus

Contributor

commented Jun 26, 2019

Would it help to provide expressions in CNF form earlier, e.g. by returning CNF from get_known_facts (probably via a new proxy function)? Also, running a (limited) simplification on some expressions may help, since I've noticed the forms may contain expressions that can easily be simplified, see e.g. #16194 (comment), where the right-most expression contains both ~z and z. If one knows that everything is in CNF, it would be quite easy to implement such a simplification (maybe the SAT solvers already do this, but then each function is called quite a few times).

It may be interesting to eyeball some of the resulting expressions to get an idea of whether they are overly complex...
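For concreteness, a minimal sketch of that cleanup, assuming the CNF is held as a list of frozensets of integer literals (sign flip = negation), which is also the format a SAT solver consumes; nothing here is existing SymPy API:

def drop_tautologies(clauses):
    # Remove clauses containing a literal and its negation, e.g. (~z | z | w):
    # such clauses are always true and constrain nothing.
    return [c for c in clauses if not any(-lit in c for lit in c)]

# variables: 1 = w, 2 = z
cnf = [frozenset({1, 2, -2}),    # w | z | ~z : tautology, droppable
       frozenset({1, 2})]        # w | z
print(drop_tautologies(cnf))     # [frozenset({1, 2})]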

@oscargus

Contributor

commented Jun 26, 2019

Sometimes it is better to check the code before commenting... there clearly already is a get_known_facts_cnf.

@oscarbenjamin

Contributor

commented Jun 26, 2019

I think what's slow right now is not the general algorithm but all of the SymPy objects that are created and used as the variables of the algorithm. If this were driven by a fast core using native Python types, it could be much faster. A clearly defined separation between internal utilities and the end-user API would also help.

Example: LatticeOp.__new__ consumes 60% of cumulative running time in total, and 10% of all time in the benchmark I showed is spent in LatticeOp.__new__ as called from get_all_relevant_facts, in these lines:

if i >= iterations:
    return And(*relevant_facts)

return And(*relevant_facts)

If you were trying to implement an efficient core here you would just return the facts as a Python set rather than constructing an And out of them.
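A minimal sketch of that design, with hypothetical function names (this is not satask.py code): accumulate facts in a plain set internally and build the And once, at the boundary.

from sympy import And, Q, symbols

x = symbols('x')

def collect_relevant_facts(expr):
    # Internal representation: a plain Python set, so adding a fact never
    # triggers sympify or LatticeOp.__new__.
    facts = set()
    facts.add(Q.real(x))        # stand-in for the real fact gathering
    facts.add(Q.positive(x))
    return facts

def relevant_facts_expr(expr):
    # Convert to a SymPy And exactly once, at the API boundary.
    return And(*collect_relevant_facts(expr))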

The benchmark I used just times

from sympy import symbols, refine, re, Q

x, y, z = symbols('x, y, z')
refine(re(x*y*z), Q.real(y) & Q.real(z))

with caching turned off. This is just an example that I spotted as slow when testing a recent PR. From that I get 750,000 calls to sympify, taking 24% of cumulative time (with caching, that reduces to 130,000). The sympify function should only be used at the boundaries where SymPy needs to normalise input from users; it shouldn't be called in the inner loops of a performance-sensitive algorithm. The problem here is that every time you create an And, every argument gets sympified. Using native types immediately eliminates that cost.
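A rough way to see that cost in isolation (an illustrative micro-benchmark, not from this thread): constructing an And sympifies and flattens its arguments, while a native container just stores them.

from timeit import timeit
from sympy import And, symbols

syms = symbols('a:f')   # five symbols
# And() pays sympify + flattening on every construction:
print(timeit(lambda: And(*syms), number=10000))
# a frozenset of the same objects is a plain container operation:
print(timeit(lambda: frozenset(syms), number=10000))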

@asmeurer

Member

commented Jun 26, 2019

@oscargus actually you are right. We need to do that, but for the facts generated by satask. We could even go further (assuming the performance gain is enough) and directly generate the list-of-integers representation used by satisfiable. I don't know whether that would make a big difference in performance over using expressions, and if it doesn't, we should not do it, because it would make things harder to maintain.

There is already a PR starting on some of these ideas: #11789. We just need to clean it up (@ShubhamKJha is working on this).
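As a sketch of that idea (all names assumed; this is not existing satask code): intern each predicate atom as an integer once, then emit known facts directly as integer clauses, never constructing And/Or objects:

from sympy import Q, symbols

x = symbols('x')

atoms = {}      # AppliedPredicate -> positive integer
def enc(pred, negated=False):
    n = atoms.setdefault(pred, len(atoms) + 1)
    return -n if negated else n

# "Q.positive(x) implies Q.real(x)" becomes the clause (~positive | real):
clauses = [[enc(Q.positive(x), negated=True), enc(Q.real(x))]]
print(clauses)  # [[-1, 2]], directly consumable by a DIMACS-style solver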

@oscarbenjamin

Contributor

commented Jun 26, 2019

I never know which Oscar anyone is referring to with the @ references :)

@asmeurer

Member

commented Jun 26, 2019

At least you always know who is commenting ;) I always think one of you is the other when glancing at my email notifications (fortunately you have different avatars on GitHub).

Referenced by oscarbenjamin on Jul 21, 2019: [WIP] Improvement in satask #17144 (open, 1 of 2 tasks complete)