Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python interface performance #513

Open
mbargull opened this Issue Aug 22, 2018 · 22 comments

Comments

Projects
None yet
4 participants
@mbargull
Copy link
Contributor

mbargull commented Aug 22, 2018

Hi @msoos,
I am a contributor to the Conda, the package manager, and I'm currently working on speeding up the dependency resolution therein. One very important aspect of this is that at the very heart of it, conda is calling out to a SAT solver, which is PicoSAT (through the Python interface provided by pycosat), currently.
Now what I'm proposing is to do a little bit of "heart surgery" on Conda and make it "powered by CryptoMiniSat", a change I know would be welcomed by the Conda team. Plus, I'm very excited about it since some initial testing showed incredibly good results with CryptoMiniSat solving instances multiple times faster than the solver we are currently using!

There is a slight hurdle for us to switch to CryptoMiniSat right away, though:
The way we use the SAT solver is that we call it multiple times (depending on the instance, more than fifty times) with slightly different clauses (we have a base set of clauses and add different additional constraints to that). Now, for some problematic instances the number of clauses can be a up to a few million. We observed that in those cases the time taken to add those clauses to the solver instance (i.e., calling Solver.add_clauses from the Python interface) is considerably high and sometimes even a lot higher than the actual SAT solving itself when CryptoMiniSat is used.
I only took a quick look at https://github.com/msoos/cryptominisat/blob/5.6.4/python/src/pycryptosat.cpp but couldn't see any significant differences for the Python C-API part in comparison to what pycosat does at https://github.com/ContinuumIO/pycosat/blob/0.6.3/pycosat.c#L91-L145. So I'm not really sure if the extra time taken is only in the Python interface in pycryptosat.cpp or if creating millions of clauses is generally time consuming for CryptoMiniSat.
If you could help us with that problem, I would be very grateful. And if we could claim that "powered by CryptoMiniSat" for conda and thus gain those solver speed improvements, I believe many people from the Conda, conda-forge, Bioconda, ..., communities will be too 👍.
(cc @kalefranz, @mcg1969)
[ EDIT: typo s/"hear surgery"/"heart surgery"/ ]

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 22, 2018

Yes!! I am very much in favour of using CMS in conda :) The only thing I really need is one of your examples that would allow me to reproduce the issue. Note that there are two things here:

  1. The startup time using the current way you are representing the problem. This is a fix that has more to do with C++, copying std::vector-s, etc.

  2. Potentially fixing the way you are representing the problem. Sometimes, you can represent the problem in a different way, using less clauses and variables, which could considerably lower both the startup and the solving time.

I say let's concentrate on (1) for the moment. Once I get that in shape, I'll probably understand (2) better and maybe offer some help with that too :) So, to help with (1), can you please give me a python script + input file or a set of steps I would need to perform (e.g. in a virtual machine) to reproduce the issue? Then I could debug :)

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 22, 2018

(PS: Sorry for the late reply, I was working, this is my hobby so that's the reason for the delay. I will have time to work in this in the coming days luckily, so if you give me a set of reproducible steps to trigger the slow clause adding, I'll fix it. Note that I can't just add random clauses to reproduce -- unfortunately, the structure of the clauses, the number of variables, the size of the clauses, etc. added can meaningfully impact the way things are called and the code is exercised)

@mcg1969

This comment has been minimized.

Copy link

mcg1969 commented Aug 23, 2018

Hey there! I'm one of the core conda developers, and heavily invested in our current Picosat-based implementation. I'm thrilled that @mbargull is tackling this and that you've offered some help. CMS has tremendous potential to improve our SAT performance.

One challenge we face here is that while our problems are somewhat incremental, they do involve some backtracking. From a mathematical point of view, our model is actually a multi-objective, pseudoboolean optimization problem. We start with a set of mandatory logical clauses that must be satisfied; then we proceed by performing a sequence of pseudoboolean optimization passes, implemented with bisection. We handle the conversion of the pseudoboolean inequalities to standard logical form using a BDD approach.

So at first glance we could exploit an incremental approach, but many of our bisection steps are unsatisfiable. When that happens, we have to roll back the latest set of incremental clauses, and apply a less strict set of clauses (e.g., we relax the pseudoboolean inequality). The solvers I've seen that implement incremental solver haven't (at least, by my reading) supported the kind of state-saving or rollback capability needed to handle this efficiently—and Picosat certainly doesn't—so we've simply had to feed a sequence of entirely independent SAT problems to the solver.

Fortunately, the total solution time CryptoMiniSAT is likely to require is entirely reasonable, save for the particular bottleneck that @mbargull has observed. If the compiled layer here is not the bottleneck, and only the Python interface, then the likelihood is high that we can develop a faster interface for our purposes. Maybe we do implement some sort of "clause queue" on the C++ side that allows us to incrementally load and roll back the clause set on the C++ side, as needed, even if we can't exploit an actual incremental solve.

@kalefranz

This comment has been minimized.

Copy link

kalefranz commented Aug 23, 2018

Hey @msoos. Just wanted to weigh that I'm also really excited about the possibilities and opportunities we have here, especially after seeing @mbargull's benchmarks in conda/conda#7676.

I agree with the two focus areas and how you've outlined them in #513 (comment). That's also how I foresee this effort going. Cleaning up the startup time of CMS seems like it could be low hanging fruit that ends up being pretty broadly applicable, not just to conda. If we could make progress there, we could do a simple switch-out of PicoSAT/pycosat for CMS, leaving all the details and potential quirks of how we're setting up the SAT problem intact.

Then, having you involved in a second phase of work, where we get your take on conda's use of SAT, either improvements in how we set up the SAT problem in general, or ways we can exploit specific optimizations in CMS, would be totally awesome.

@kalefranz

This comment has been minimized.

Copy link

kalefranz commented Aug 23, 2018

Regarding the setup of a working dev environment for you... Start with the instructions here. It's basically just cloning the conda repo, then executing . dev/start. That'll create a ./devenv environment for you, make it fully activated in your current shell, and install conda in python's "develop / editable" mode, where the source code in ./conda is what the python interpreter is using.

Of course, that'll be using pycosat. So second step would be something like

git remote add mbargull git@github.com:mbargull/conda
git fetch mbargull
git checkout python-side-speedups

and then you'll be interacting with the code in Marcel's conda/conda#7676 PR. Specific instructions for flipping to CMS are at conda/conda#7676 (comment). However I wouldn't use conda create -nx -c bioconda -c conda-forge -c defaults qualimap=2.2.2a=3 as the initial test case for development. That's probably a good test case for that phase no. 2 of work we were discussing :)

I think a good initial test case, from conda/conda#7676 (comment), is

conda create -nx --dry-run anaconda -v -c defaults

and specifically targeting the line

TOTAL    39.834        59 conda.common.logic.CryptoMiniSatSolver.setup

@mbargull do you agree? Did I miss anything?

@mbargull

This comment has been minimized.

Copy link
Contributor Author

mbargull commented Aug 23, 2018

Yes!! I am very much in favour of using CMS in conda :)

@msoos, thank you so much for offering all that help! I had hoped for a positive resonance but didn't at all expect so much support. Very much appreciated!

The only thing I really need is one of your examples that would allow me to reproduce the issue. [...]

Whatever I can do to help you help us help us all 😄.
@kalefranz already described how to replicate the calls from conda itself, but if you need anything else like CNF files of those instances, I can gladly provide those.

I say let's concentrate on (1) for the moment.

👍

I'll probably understand (2) better and maybe offer some help with that too :)

That is very generous, thank you!

So, to help with (1), can you please give me a python script + input file or a set of steps I would need to perform (e.g. in a virtual machine) to reproduce the issue? Then I could debug :)

I'll see if I can put together some smaller examples for you to play with.

From a mathematical point of view, our model is actually [...]

@mcg1969, thanks for putting this into much better words than could've done!

we have to roll back the latest set of incremental clauses, and apply a less strict set of clauses

I already saw that CMS has CMSat::Solver::save_state/CMSat::Solver::load_state which allows you to save/load some state information to/from a file. That immediately made me quite hopeful that CMS might provide that roll back functionality we need (though I'm furiously trying to keep my excitement back as I don't know the inner working of CMS, yet, and thus don't know how much of the overall state can be stored/restored with that ).

Then, having you involved in a second phase of work, where we get your take on conda's use of SAT, either improvements in how we set up the SAT problem in general, or ways we can exploit specific optimizations in CMS, would be totally awesome.

💯 👍

@mbargull do you agree? Did I miss anything?

Sounds about right, conda create -nx --dry-run anaconda is a good case to look at. To get more consistent results, I would also set the environment variable CONDA_LOCAL_REPODATA_TTL to 999999 or so, so that the set of packages doesn't change. Plus, to show the timing information, you can run it with

$ \
CONDA_TIMED_LOGGING=1 \
CONDA_INSTRUMENTATION_ENABLED=1 \
CONDA_LOCAL_REPODATA_TTL=999999 \
conda create -nx --dry-run anaconda -vv
@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 23, 2018

Ahhh... let me address all of your comments here, one by one. First, about rollback:

Thee solvers I've seen that implement incremental solver haven't (at least, by my reading) supported the kind of state-saving or rollback capability needed to handle this efficiently

So... there is rollback ;) Even in picoSAT, though probably I shouldn't tell you that :D The trick is the following. You have the core clauses:

a V b V c
-a V b

Now you want to add new clauses that you may want to remove:

f V g
-f V e

You have to do this:

x V f V g
x V -f V e

and then you call the solver with:

solve_with_assumptions(x = false)

This will effectively force the original clauses to be used (since X is false, so the solver sees only the literals of the clauses you wanted to add). To remove the clauses, you simply add to the solver:

add_clause(x = true)

It's kinda tricky but cool. Where CMS excels is that it will do the right thing when it comes to assumed clauses and variables. Picosat would too from a correctness perspective, but CMS is a lot more scalable.

The gist of it is that you introduce a dummy variable, here called "x" and then you add this as a literal to every clause you may need to remove later. Then, you call solve() with the assumed literal x=false. What you have to take care of is not to introduce hundreds of these dummy variables. So if you have groups of clauses that you need to remove in groups, then each group of clauses should use the same dummy variable. This way, you can unroll, and you don't even need to unroll in a historical order. X could be for group 1, Y for group 2 and Z for group 3. Then you can solve(x=false, y=true, z=false) hence turning these groups of clauses on and off as you like :) If you add x=true then of course you delete that group of clauses for good. It's best to do that if you know you won't need them any longer (the solver will free the memory and will solve faster).

The above is all about (2), i.e. how you represent the problem. It seems we can potentially make a big difference here, adding a lot less clauses, potentially removing a lot of the problems related to the setup phase. Also note that if you do it this way, the solver will keep memory of the work it has already done. That can lead to an important performance impact as well. Could you take a shot at doing the above?

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 23, 2018

I already saw that CMS has CMSat::Solver::save_state/CMSat::Solver::load_state which allows you to save/load some state information to/from a file

That is correct. It can actually do that, and it's a well-tested system, too. However, I would like to approach it from the perspective of the clause group switch on/off using dummy variables as per above. It would make your system work with any SAT solver, and it's a tried and tested method. The load/save is pretty cool and I could expose it to the outside, but it may have some restrictions -- I need to think through what exactly.

msoos added a commit that referenced this issue Aug 23, 2018

Don't reallocate clause every time in python interface
This is an obvious fix

Related to issue #513

msoos added a commit that referenced this issue Aug 23, 2018

msoos added a commit that referenced this issue Aug 23, 2018

Introducing "freshness" -- indicator of a new solver object
This should dramatically improve the speed of adding new clauses to the
system.

Related to #513
@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 23, 2018

OK, so there have been a number of fixes pushed :) I think it might do the trick. Could you please try? I used perf record conda.... to get some performance analysis but it would be nice if you could tell me how you got those nicely printed timings :) I couldn't make those work. I mean these:

TOTAL    39.834        59 conda.common.logic.CryptoMiniSatSolver.setup

However, I could get timings for CMS (I could switch it to CMS, yay) and the perf analysis really helped. However, I don't know how to check the newer CMS's performance... how do you make it use the newer CMS? Do you need to tag and then it can be pulled in with conda install conda-forge::cryptominisat? I'm a bit hesitant to do that, I need to test more before I can tag this release.

Please take a peek, and please let me know how I can pull in the latest CMS and how I can get those sweet timings :)

PS: If you could tell me a command line that takes less time that'd be nice too, I like to iterate fast(er). I understand that it needs to run for some time to have meaningful performance timings, but something less than in the minutes range would be nice. 10-20 is usually enough, at least at first :) Thanks!

@mcg1969

This comment has been minimized.

Copy link

mcg1969 commented Aug 24, 2018

I love the dummy variable idea! I look forward to having the opportunity to test it.

@mbargull

This comment has been minimized.

Copy link
Contributor Author

mbargull commented Aug 24, 2018

I love the dummy variable idea! I look forward to having the opportunity to test it.

Yes, if CMS handles it so well for us, then this would be quite handy! One thing I wonder is if CMS is also happy if you add those dummy variables to virtually all clauses -- because I think that is what conda would do when we adopt this...

OK, so there have been a number of fixes pushed :) I think it might do the trick. Could you please try?

Thank you for implementing those improvements so quickly! I tried it out with gh-515 where I posted some numbers.

but it would be nice if you could tell me how you got those nicely printed timings :)

You get those if you run

export CONDA_INSTRUMENTATION_ENABLED=1 CONDA_TIMED_LOGGING=1

in your shell before running conda with -v or -vv. I had to remove those timings though, since the CI tests were not happy with it. So you'd have to checkout commit conda/conda@e381cba to still get those.
But I think the simple script from #515 (comment) should be a much more easily manageable substitute. I don't think the kind of clauses conda creates are that much different from those "dump" one created by that script 😉. It seems to really just be a bottleneck of creating those millions clauses (also sometimes can use a little over a million variables -- not too efficient, I can imagine) quickly.

how do you make it use the newer CMS

I modified the build from the conda-forge feedstock to build from a local copy with conda-build. But it's probably more convenient to just build it the way you usually do and the pip install it into conda's environment. But again, the script from #515 (comment) (or a C++ equivalent thereof) should be more convenient overall.

Thanks so much for working with us on this!

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 24, 2018

I wonder is if CMS is also happy if you add those dummy variables to virtually all clauses

Yes, but only if you have a few such dummy variables in total. So if you have say < 15 dummy variables in total, it's OK. Note that you can use a dummy variable with as many clauses as you like, millions if you want. So this does not limit the number of clauses in any way. We can experiment with more dummy variables, too, but keeping it under 50 is definitely a good idea. The less, the better, so keep that in mind :)

msoos added a commit that referenced this issue Aug 25, 2018

msoos added a commit that referenced this issue Aug 25, 2018

msoos added a commit that referenced this issue Aug 25, 2018

msoos added a commit that referenced this issue Aug 25, 2018

No need to check this so verbosely, it's checked later as assert
It's less user friendly but faster

Related to #513

msoos added a commit that referenced this issue Aug 25, 2018

mbargull added a commit to mbargull/cryptominisat that referenced this issue Aug 25, 2018

mbargull added a commit to mbargull/cryptominisat that referenced this issue Aug 25, 2018

mbargull added a commit to mbargull/cryptominisat that referenced this issue Aug 25, 2018

mbargull added a commit to mbargull/cryptominisat that referenced this issue Aug 25, 2018

mbargull added a commit to mbargull/cryptominisat that referenced this issue Aug 25, 2018

mbargull added a commit to mbargull/cryptominisat that referenced this issue Aug 25, 2018

@mbargull

This comment has been minimized.

Copy link
Contributor Author

mbargull commented Aug 26, 2018

I wonder is if CMS is also happy if you add those dummy variables to virtually all clauses

Yes, but only if you have a few such dummy variables in total. So if you have say < 15 dummy variables in total, it's OK. [...] keeping it under 50 is definitely a good idea.

Thanks for the clarification! If we were to use this in a "less intelligent" way in conda, we'd create between 50 to 100 in the worst cases, I think 😆. So it'd probably made sense for us to do a "full reset" at times, i.e., restart the solver completely with the clauses we know have to be fixed for successive runs and remove the dummy variables on those.

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 26, 2018

It may be a good idea to do a reset, but I'd go full-on dummy variable (even if 100 or more) first. Then see how the performance is. The difference could be big enough that you won't necessarily have to optimise.

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Aug 28, 2018

BTW, I would like to give praise here to @horenmar who in issue #445 really helped with improving performance of adding variables, which was awfully bad. He also demonstrated the issue of slow adding of clauses but at the time I couldn't debug. With your help it was a lot easier. So, thanks @horenmar :)

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Sep 3, 2018

The final configuration difference, comparing apples to apples (i.e. list to list):

setup CMS(old)     list  0.280 0.286 0.301 0.305 0.311
setup CMS(new)     list  0.571 0.574 0.589 0.595 0.597

So that's about twice the speed :) Also, the same with arrays:

setup CMS     array 0.189 0.190 0.192 0.196 0.198

That's approx 3x the speed! I think that's pretty cool. Thanks to all of you for your help, most especially @mbargull :)

I'm curious, have had time to make progress with the dummy variable idea?

@kalefranz

This comment has been minimized.

Copy link

kalefranz commented Sep 4, 2018

I'm really excited about all this work you all are doing. Looking forward to the day when metadata downloading and management is the bottleneck again. Possible we're not that far off if you guys keep it up.

@mbargull

This comment has been minimized.

Copy link
Contributor Author

mbargull commented Sep 4, 2018

I'm curious, have had time to make progress with the dummy variable idea?

Unfortunately, no. I won't be able to take a look at this in the next months, so I'm betting on @mcg1969 to find the time to play with that 😉.
Even if it were to perform less well than we are hoping, there are some problem instances for which this could show huge improvements, I think. So I'm really hoping someone picks this work up ^^.

@mcg1969

This comment has been minimized.

Copy link

mcg1969 commented Sep 4, 2018

Ha! Yes, I am looking forward to trying it. I’m afraid I don’t know when I will be able to, however. And until one of us gets dedicated block of time to really exercise and test, it’s important in my view that we focus on preserving are just in logic completely. Marcel knows as well as anyone else the cost we incur when we take solver changes too lightly.

@mbargull

This comment has been minimized.

Copy link
Contributor Author

mbargull commented Sep 4, 2018

Ha! Yes, I am looking forward to trying it.

❤️

it’s important in my view that we focus on preserving are just in logic completely. Marcel knows as well as anyone else the cost we incur when we take solver changes too lightly.

Indeed! So even though this should be comparatively easy and fast to implement, I know we have to "tread cautiously" here -- and this is where the time constraints come into play..

until one of us gets dedicated block of time to really exercise and test

If you need to sweeten the deal for management, just tell them you can greatly improve on the UX by making the interaction with your big flagship package, anaconda, much faster 😉.

@msoos

This comment has been minimized.

Copy link
Owner

msoos commented Oct 28, 2018

Hey, do you know what's going on? There was quite a bit of work with the Windows build and everything but it all died down... I am just wondering what is going on :) Thanks!

@mcg1969

This comment has been minimized.

Copy link

mcg1969 commented Oct 28, 2018

I do not, but I can definitely tell you that there is a renewed focus just this week on getting the performance of conda improved, and I asked @kalefranz about the status of this work... I know he's been busy but I definitely think this is going to be getting some attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.