
Speed improvements in loading time to Rubi Integration module #13339

Closed
wants to merge 2 commits

Conversation

@arihantparsoya (Contributor) commented Sep 23, 2017

#13241

In [1]: %time from sympy.integrals.rubi.rubi import rubi_integrate
CPU times: user 2.67 s, sys: 96.7 ms, total: 2.77 s
Wall time: 2.81 s
  • Rubi now uses a .pickle file to load the rules directly, instead of constructing the discrimination net of all the rules at import time (which was time-consuming). The pickled file is placed at sympy/integrals/rubi/.

  • I have pickled the rules using the dill module.

Partially fixes sympy#13241.

Pickled the Rubi rules using the `dill` module.

Rubi now uses a .pickle file to directly load the rules instead of constructing the discrimination net of all the rules.
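The load-or-build strategy described above can be sketched as a small helper: load the cache when it exists, otherwise build and dump it. This is a minimal sketch using the stdlib pickle and a hypothetical build callback; the actual PR uses dill and the Rubi matcher.

```python
import os
import pickle

def load_rules(cache_path, build):
    """Load pickled rules from cache_path, or build and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    rules = build()  # expensive step, e.g. constructing the discrimination net
    with open(cache_path, 'wb') as f:
        pickle.dump(rules, f)
    return rules
```

The second and later calls skip the expensive build entirely, which is where the reported 3min26s to 2.81s drop comes from.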
@gxyd (Contributor) commented Sep 24, 2017

Has this resulted in a reduction of the import time? You could paste the import timings, just like Ondřej did.

BTW, I am getting different timings from what @certik posted (on master, not on this branch):

In [1]: %time from sympy.integrals.rubi.rubi import rubi_integrate
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.92 ms

@arihantparsoya (Contributor, Author)

It is much faster on my computer now than before:

Without .pickle file

In [1]: %time from sympy.integrals.rubi.rubi import rubi_integrate
CPU times: user 3min 22s, sys: 1.32 s, total: 3min 23s
Wall time: 3min 26s

With .pickle file

In [1]: %time from sympy.integrals.rubi.rubi import rubi_integrate
CPU times: user 2.67 s, sys: 96.7 ms, total: 2.77 s
Wall time: 2.81 s

@arihantparsoya (Contributor, Author)

@Upabjojr, I have pickled the rules using the dill module. dill makes it possible to pickle lambda functions.

We will need dill to load the rules too.
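For context on why plain pickle is not enough here: pickle serializes functions by qualified name, so an anonymous lambda (which the Rubi rules use heavily) cannot round-trip. A minimal stdlib-only illustration:

```python
import pickle

# Plain pickle serializes functions by qualified name; a lambda has no
# importable name, so pickling it fails:
try:
    pickle.dumps(lambda x: x * 2)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```

`lambda_pickled` ends up False, which is exactly the gap dill fills by serializing the function body instead of a name reference.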

@Upabjojr (Contributor)

We will need dill to load the rules too.

OK, though I still believe that the best solution would be to make the lambdas full functions, so that we can avoid using dill.

@Upabjojr (Contributor)

How did you use dill? I don't see it in the code.

As another issue: I'm strongly opposed to including binary files in SymPy. If needed, I would rather suggest including the code that generates the binary file, but not the binary file itself.

@Upabjojr (Contributor)

Furthermore, we still need to do some work on the rules; in many cases they give wrong results. I believe we could have another GSoC next year to finish RUBI.

@Upabjojr (Contributor)

Including binary files doesn't allow comparing versions and is likely to bloat the size of the development branch.

I remember experiencing something similar in the past: I've seen a development branch requiring around 100 GB of space, 99.99% of it used by binary files regenerated for every commit.

@arihantparsoya (Contributor, Author) commented Sep 24, 2017

How did you use dill? I don't see it in the code.

I have used dill to pickle the matcher. It will need to be installed to load the rules. I thought the pickle module would be enough, but it isn't.

The code to generate the binary file is really trivial:

import dill
from sympy.integrals.rubi.rubi import rubi_object

rubi = rubi_object()
with open('pickled_rubi_rules.p', 'wb') as f:
    dill.dump(rubi, f)

Shall I add this to the documentation?

@arihantparsoya (Contributor, Author)

I was going to start working on testing the rules again after getting this working. I wanted to improve the loading time to make testing easier.

@Upabjojr (Contributor)

You can include the generator of the pickled file. There's no way we are going to include a binary pickle file.

@Upabjojr (Contributor)

Shall I add this to the documentation?

You can add that code directly to the SymPy code, just use import_module for dill.
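The suggested import_module pattern is SymPy's standard way of handling optional dependencies; it returns None when the package is missing, so the loader can fall back to building the rules from scratch. A sketch (the fallback message is illustrative, not from the PR):

```python
from sympy.external import import_module

# import_module returns the module object, or None when the package
# is not installed, instead of raising ImportError.
dill = import_module('dill')

if dill is None:
    print("dill unavailable; constructing the discrimination net instead")
```

This keeps dill a soft dependency: users without it still get a working (if slower-to-import) Rubi.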

@Upabjojr (Contributor)

By the way, in order to match exp(x) I believe we need to transform exp(x) into Pow(S.Exp1, x, evaluate=False) as a preprocessing step on the integrand.
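A sketch of that preprocessing step (the helper name exp_to_pow is hypothetical): rewrite every exp node as an unevaluated power of E so that patterns written against Pow also match exponentials.

```python
from sympy import Pow, S, exp, symbols

x = symbols('x')

def exp_to_pow(expr):
    # Replace each exp(e) node with Pow(E, e, evaluate=False); without
    # evaluate=False, Pow(E, e) would immediately collapse back to exp(e).
    return expr.replace(exp, lambda e: Pow(S.Exp1, e, evaluate=False))
```

For example, exp_to_pow(exp(x)) yields a Pow node with base S.Exp1 and exponent x instead of an exp node.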

@asmeurer (Member)

How big is the binary file? GitHub doesn't show the filesize.

@asmeurer (Member)

You can try cloudpickle. I believe unlike dill, it generates a valid pickle format, so you only need it to serialize, not to deserialize.
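The idea sketched below (an assumption about cloudpickle's behavior, worth checking against its docs): cloudpickle emits standard pickle data, so the reading side can call the stdlib pickle module, although cloudpickle must still be importable there for it to reconstruct function objects.

```python
import pickle

import cloudpickle  # third-party; assumed to be installed

# Serialize a lambda with cloudpickle; the result is ordinary pickle
# bytes, so deserialization goes through the stdlib pickle module.
blob = cloudpickle.dumps(lambda x: x + 1)
func = pickle.loads(blob)
```

Here func is a working callable again, e.g. func(1) returns 2.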

@ylemkimon (Member) commented Sep 25, 2017

@Upabjojr (Contributor) commented Sep 25, 2017

I will not merge any PR containing binary files. The big problem is that every time you modify the rules, the binary file is likely to be noticeably different, and git will probably have to store a whole new blob.

@certik (Member) commented Sep 25, 2017

Yes, please don't merge any binary files into the sympy git repository. For now, why not create a script that generates this binary blob, which users can (optionally) run themselves on their machines?

@Upabjojr (Contributor)

@parsoyaarihant a nice idea would be to generate the pickled file as a background process; I'd use multiprocessing, for example:

import multiprocessing

def function_generating_blob(arg1, arg2):
    ...  # generate and dump the pickled rules here

p = multiprocessing.Process(target=function_generating_blob, args=(arg1, arg2))
p.start()

(Hope I wrote everything correctly.) This should start the function given as target as a separate process in the background and allow you to keep using the current Python session. This could be a loading strategy.
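Filled in with concrete (hypothetical) pieces, the background-generation idea runs like this; the stdlib pickle stands in for dill and a plain dict stands in for the Rubi matcher:

```python
import multiprocessing
import os
import pickle
import tempfile

def generate_blob(path, rules):
    # Stand-in for the expensive rubi_object() + dill.dump step.
    with open(path, 'wb') as f:
        pickle.dump(rules, f)

def pickle_in_background(rules):
    # Note: under the "spawn" start method (Windows, macOS) this must be
    # called from under an ``if __name__ == "__main__":`` guard.
    path = os.path.join(tempfile.mkdtemp(), 'rules.p')
    proc = multiprocessing.Process(target=generate_blob, args=(path, rules))
    proc.start()
    # The current session stays responsive while the child works; we
    # join() here only so the example has a definite end.
    proc.join()
    return path
```

In the real loading strategy one would not join() immediately, but keep working in the interpreter and only read the blob once the child has finished.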

@bjodah (Member) commented Sep 30, 2017

Another +1 for not checking any generated binaries into the repo. If it's useful in the source distribution, it could be generated by the sdist command in setup.py. On that note, I've always wondered why GitHub doesn't provide a user-defined warning limit on the total size of a PR...
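Hooking into sdist could look like the following setup.py fragment; this is a sketch with hypothetical names (SdistWithRules, generate_pickled_rules), not code from the PR:

```python
# setup.py sketch: generate the pickled rules while building the
# source distribution, instead of committing the binary to git.
from setuptools import setup
from setuptools.command.sdist import sdist

class SdistWithRules(sdist):
    def run(self):
        # generate_pickled_rules() would build the matcher and dump it,
        # e.g. to sympy/integrals/rubi/pickled_rubi_rules.p, before
        # the normal sdist packaging step runs.
        super().run()

setup(cmdclass={'sdist': SdistWithRules})
```

That way the tarball ships the blob while the repository itself stays binary-free.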

@Upabjojr (Contributor)

If it's useful in the source distribution, it could be generated by the sdist command in setup.py.

I think our aim is to generate the decision tree; pickled files can be useful for development purposes.

@czgdp1807 (Member)

@Upabjojr I think you are working on something related to this PR. What's your view on this one? Is this work still needed?

@Upabjojr closed this Sep 29, 2019
@Upabjojr (Contributor)

No binary files in SymPy, please. This PR should definitely be closed.

8 participants