GSoC 2019 Proposal Shubham Kumar Jha : Improving Assumptions
SymPy: Improving Assumptions
Table of contents
- Basic information
- Background and programming experience
- Contributions to SymPy
- The Project
Name : Shubham Kumar Jha
Email Id : email@example.com
Phone Number : +91 8851531962
University : N.I.T. Agartala, India
Gitter Username : ShubhamKJha
GitHub : @ShubhamKJha
Location : Agartala, Tripura, India
Time-zone : IST (GMT/UTC +5:30)
Background and programming experience
I am a third-year undergraduate student of Computer Science & Engineering at National Institute of Technology Agartala, India.
I am currently using Ubuntu 18.04 O.S. dual booted with Windows 10. I use a combination of Atom text-editor and terminal for development purposes. Atom is quite comfortable to use and easily extendable through plugins. For terminal I use
Terminator. I also use Sublime Text Editor and I am quite comfortable with Vim/Nano and other command-line tools. For Python-related projects I use
PyCharm as my IDE and debugger.
I have done basic engineering courses in Mathematics during the first and second year of my undergraduate course. Few more courses I have done that are suited for this project are:
- Data Structures and Algorithms
- Discrete Mathematics
- Digital Logic and circuits
- Software Engineering
- Graph Theory and Combinatorics
In this semester I am having a course on Artificial Intelligence which heavily includes Logic and Inference Mechanisms. I would have completed these before the beginning of this Summer.
I actively participate in competitive coding contests on various platforms. I recently took part in
ACM-ICPC Kolkata regionals during this Winter.
I have been writing code in Python for four years now. It was introduced to me as a subject in the curriculum. Since then I have worked on various projects using Python.
Pygame was the first third-party module I used and developed games for fun. I use Python for prototyping and development alike. I like Python because it has a very decent learning curve.
I like the list comprehension and other such syntactic features of Python which reduces the amount of code required. Also, I like the
itertools module which handles the iteration based algorithms very nicely.
Contributions to SymPy
The following are the contributions I made to SymPy repository.
- #15525 : Removed Epiphany from documentation.
- #15979 : Fix for finite Fourier series.
- #16022 : Removed an unreachable part of code.
- #16131 : Created class for finite Fourier series.
- #16334 : Improved code of Pow class.
- #16357 : Improved code for Mul class.
- #16450 : Improved code for posify.
- #16213 : Improved handlers in new assumptions
- #16313 : The is_positive property of non-real values should be False.
- #16332 : Query about a certain inconsistency in the power module.
SymPy is a Computer Algebra System (CAS) written in Python programming language. For any CAS, their assumptions mechanism hold important value. This subsystem records and infers properties based on the queries asked and can simplify the outcomes of different operations.
Currently, SymPy has two sets of assumptions system. The former which is called 'old assumptions' is time-tested and fast while the new one (called 'New Assumptions') is logically rich but slower.
The new assumptions system tends to separate the symbol from the assumptions associated with it. Logically, this system is more robust and extensible than the old one.
>>> x = Symbol('x', positive=True) >>> x.is_positive True
>>> x = Symbol('x') >>> facts = Q.positive(x) >>> ask(Q.negative(x), facts) False
A great amount of discussion has already been done on the need for the new assumptions in SymPy.(refs  and )
This project aims to improve the capabilities of 'new assumptions' system to make it faster and more powerful. A lot of work had already been done in this direction. This project will pick up the work and extend it.
Improving assumptions is of great importance to SymPy community. In essence, the project is to remove the old assumptions as much as possible from the core and use new assumptions instead. But in order to make this possible, the new assumptions should be able to work at least as good as the old ones. Currently, the new assumptions work as a tiered system where the query is first handled by the Handler mechanism and then the more powerful
satask is called.
The handler system is essentially the old assumptions organized in a different way. Hence, it has all the shortcomings of the old system. It is difficult to scale. Though
satask is more powerful, it lacks speed and it is still not possible to translate all the assumptions into logical queries. The logic module also requires improvement.
This project will try to improve the capabilities of
satask and reduce the need for handler mechanism.
Though there is a lot of work to be done I will try to formulate some key milestone tasks as the project for this GSoC.
Task 1 : Remove inconsistencies in assumptions (both Old and New system)
While working on issue #16213 , I came across some inconsistencies in the old system (they are #16332 and #16313) which is essentially the drawbacks of old assumptions. This is also the reason the handler system won't scale and should not be improved much. I will try to put most of my energy towards improving the new assumptions. But before the new assumptions become good enough, the old one is going to be the de-facto system for all the modules. I will try to remove all such inconsistencies. In some scenarios, the old and new systems don't sync. One such example is at ref .
There are many issues popping up over the definition of assumptions (mostly in old system). Since old system forms the basis of definition for new assumptions, they need to be handled. In the case of ambiguity in the mathematical definition of any assumption, a consensus of the community would be required. This process may take time and hence I would do other tasks while constantly looking for inconsistencies.
Task 2 : Implement First Order Logic in SymPy
The logic module is essentially built around the assumptions system and is an integral part of the assumptions mechanism. Currently, it lacks the capability of forming complete FOL queries. This restricts the power of SymPy. This task would be to include an FOL logic module into SymPy.
FOL covers predicates and quantification and is a much needed part for a system handling assumptions. Currently, SymPy has the less powerful Propositional logic only. Also, this would pave way for other logic systems to be built into SymPy.
A good amount of work had been done at the PR #7608. I will be implementing a few tweaks to sync and merge it with SymPy.
- Currently, the module has many name conflicts with basic SymPy objects. I would try to either reuse objects from SymPy or create unique identities for them.
- Implement functionality for equality operator (needed for task-4) in current FOL.
- Add proper printers for this module.
Much work had been done on this topic and I still need to go through it. I would be sending PRs over this shortly.
Task 3 : Improve the SAT solver
SymPy has an optimized form of DPLL algorithm already built in it. But performance during the conversion of clauses to CNF can be improved. Some of the possible approaches I have come across are.
By removing symmetries in CNF (paper).
By implementing NICEDAGs (paper).
The above approaches create equisatisfiable CNF form of the clauses which is well suited for
satask. There are more possible solutions I need to go through. I will prepare a better roadmap before I start to work on it.
If mentors will agree, I will also try to have the SAT solver built in C language.
Task 4 : Implement SMT solving capabilities in SymPy
The Satisfiability Modulo Theories(SMT) problem is a decision problem for logical formulas with respect to combinations of background theories expressed in FOL(with equality).
The SMT solver uses DPLL(T) algorithm at the core. It consists of the following set of procedures:
The given FOL expression is checked with the standard SAT solver. In case of UNSAT, the overall expression is deemed unsatisfiable. While if the clauses are satisfiable the possible model is then passed for the second step.
The result is then passed to Theory solvers, in case of UNSAT at this level all the contradictory clauses are added to the expression and the first process is repeated.
The above two processes are repeated for finding the satisfiable model.
With a good SAT solver already built into SymPy, I will focus my attention towards developing efficient Theory Solvers for the following theories, in order(ref ):
Theory of quantifier-free uninterpreted functions with equalities (QF_UF)
Theory of equality with uninterpreted functions (EUF).
The theory of integers(TZ)
The theory of reals(TR)
After that, I would work on combining them (Nelson Oppen approach, ref ).
These along with SAT solver would form the core for SMT solver.
A great amount of planning is needed to complete this task in time. Currently, I am going through possible approaches and taking notes of them. I will base my work over open source SMT solvers like Z3.
With powerful SMT solver, most of the weaknesses of current assumptions system would be reduced.
Task 5 : Add support for Relational queries
This task is out of order and needs a decision on implementation.
Currently, new assumptions don't support queries involving
>>> # In new assumptions >>> ask(Q.is_true(x>0), Q.positive(x)) # => None >>> >>> # In old assumptions >>> x = Symbol('x', positive=True) >>> x > 0 True
With SMT solver, this task would be completed by default. Before that happens, this task can borrow logic from the old assumptions since they handle such cases very efficiently.
One workaround can also be :
>>> ask(Q.is_true(x>y), assumptions) >>> # should return the result of the following >>> ask(Q.positive(x-y),assumptions) >>> # or >>> ask(Q.negative(x-y),assumptions) >>>
With support of
Relationals available in the new system, we can have the following:
With new system supporting
Relationalsit would reduce the user's dependency on old syntax for such queries. Also, given that many sub-systems in SymPy rely on relational queries, these can then be converted to new syntax.
Currently, the queries like
ask(Q.positive(x+y+z),assumptions)takes into account the assumptions on each variable and proceeds through a set of cases to arrive at the result. This system may give incorrect results with the ignorance of some cases. This is what the old assumptions is doing and the current implementation of new assumptions has followed.
With the support of
Relationals, I will try to convert some of the predicates to their relational equivalents which would be added upon for the queries like above. The relational result acquired can then be used to infer the required propositions.
Task 6 :Making old assumptions call the new assumptions in the core
Most of the modules heavily use old assumptions and converting them to new API would be a serious task. Considering that the API structure of old assumptions is better, it would be a loss to remove old assumptions syntax completely. Instead the old assumptions should be made to call the new assumptions in the core. There is a great amount of discussion already made in this direction. The new assumptions are already calling old assumptions. This helps to query assumptions entangled with the symbols. Making old assumptions call the new assumptions, we can maintain the better old API and still have the advantages of the new mechanism.
This task can only be covered when the new assumptions has all the capabilities of the old assumptions. This project aims to do this task without any unnecessary improvements in than Handler system. The handler system can then be removed and the better
satask be used.
I have prepared a tentative timeline for the above-mentioned tasks. Though I would try to stick to this timeline, there can be bugs or other issues which may hinder the flow. In any case, this timeline would help keep track of the progress made.
I will use this time to take notes of the current system in SymPy regarding assumptions mechanism. The old assumptions system is spread all across the codebase and without proper knowledge about it, any change can have severe ripple effects. I want to keep the ripple effects minimum.
I have gone through a lot of issues and PRs regarding assumptions but there still may be some other related issues left. I would use this time to look into them.
Also I would to try to learn more about the SAT and SMT solving algorithms. This field is huge and many optimized algorithms are already in use. I have been studying them for a while but there may be some apt solutions for the specific case of SymPy that I have not looked into. I would use this time to come up with a better roadmap for it.
Community Bonding Period
I will use this time to communicate with the mentors and take their feedback. I will develop my approaches over them. Since I will be having my exams during this, I may be slow but I would catch up quickly. I will also be doing the above tasks during this period.
I have not mentioned above, but I would be sending regular PRs during both the Pre-GSoC and community bonding period. I tend to understand the codebase far better while fixing some issue rather than just reading the code. These PRs would help me know my project better.
This time will be spent on Task-2. Since a good headstart is already present, this should not take time.
This time will be spent on Task-3. I plan to have a strategy already with me during the previous weeks so this can be worked on easily.
This time will be spent on Task-4 and Task-5. Here Task-4 is going to take most of the time. I would try to already have a proper understanding to do this task in the previous weeks. Task-5 could already be done by Task-4 since SMT should already be able to do Relational queries.
Catch up everything.
With task-4 and task-5 completed the new assumptions should be able to handle all the capabilities of old assumptions. Then work can be started on Task-6.
Wrap up all the unfinished work.
Complete whatever is left and get it merged.
I intend to write all the documentation and tests in their respective weeks. I will try to complete all the work in the allotted time (except for some unwanted bugs and pending decisions) and only then move forward.
Though nothing can be said about the actual development period, but I've tried to allot more than enough time for each task.
Also, Task-1 would be worked on throughout the timeline by regular PRs.
Post GSoC period
Even after GSoC I will keep contributing to SymPy. I will try to complete this project in this summer only, but if it is not possible I will work on them even after this summer.
I have no commitments during this summer which gives me enough time to work on the above-mentioned tasks. I am able to work more than 40 hours a week on the project.
I will keep sending pull requests regularly to make reviewing easy and I will also try to keep small chunks of code in each commit.
The project also aims to bring forward the work on new assumptions so that further contributions can be done easily. Though not mentioned but I would maintain issue trees whenever I can and document anything that appears ambiguous.
Since the main project of removing old assumptions has been in the pipeline of SymPy for the past several years, there has been a lot of issues and PRs. I have tried to go through all of them and my proposal may reflect ideas from them( given in ref  to ).
I am very enthusiastic about my work being a part of SymPy. I don't want any part of my work remaining stagnant and would complete it.
. Assumptions Handlers
. Assumptions (wiki)