GSoC 2017 Application Arihant Parsoya: Rubi Integrator

Arihant Parsoya edited this page May 10, 2017 · 1 revision

Table of Contents

About Me

Name and Contact Info

Name: Arihant Parsoya

University: Indian Institute of Technology Bombay


github username: parsoyaarihant


Time Zone: IST (UTC +5:30)

Personal Background

I am Arihant Parsoya, a second year undergraduate student at Indian Institute of Technology Bombay, India. I am interested in physics and computer science. I work on macOS Sierra with Vim as my primary text editor. I use Vim over other editors because it allows me to personalise my workflow and improve my productivity. I want to help others solve problems using SymPy effortlessly by making its integration module faster and easier to use. I started using Python when I was working on an image processing project using openCV. I like Python for its user intuitive API and large community which is readily available for help. I have been using Git for over a year now to contribute towards SymPy and other programming projects based on openCV and data analysis.

Programming Experience

I have been programming in Python since the last two years and I have garnered a good amount of knowledge. I use Python mostly for data analysis in my projects. Python has powerful libraries and good documentation which help any beginner implement ideas easily. I began using SymPy around a year ago to recheck my calculations and solve problems with tedious hand calculations. This helped improve my productivity in learning courses. I used Singularity Functions module extensively while I was taking Solid Mechanics course. Singularity functionsodule was helpful because it gave all the relevant parameters I needed(such as slope, deviation, etc).

I like building new things and tinkering with ideas. I have created things like scrumtracker, open source gaming website, etc. I also participate in coding hackathons very frequently. My team secured 3rd position in hackathon where we built online medical checkup system.


I started contributing to SymPy in May 2015. I already had some exposure to open source since I used it extensively for my coding projects. When I submitted my first PR to SymPy, the community was very encouraging and that is when I realised that being a part of SymPy community would be one of the best opportunity to improve myself as a programmer and solve real world problems.

Here is the list of my contributions in chronological order:

  • (Open) Improved pattern matching for is_Pow patterns#12396
  • (Merged) Improved manual integrate for manualintegrate(x**y, x) #12253
  • (Merged) Updated expr.coeff() doctstring #12201
  • (Merged) Empty intersection returns UniversalSet #12183
  • (Merged) Implemented recursive calling of evalf() function #12096
  • (Merged) Improved Latex Printing of Piecewise Functions #11897
  • (Merged) Interval(oo, oo) returns S.EmptySet #11796
  • (Merged) Enabled ASCII printing of Singularity Functions #11794
  • (Open) Enabled Non Integer `nu` values in bessel.rewrite() #11785
  • (Closed) Raises Error instead of RuntimeWarning #11729
  • (Merged) Improved distance() Function #11618
  • (Merged) Improved ITE() function #11535
  • (Merged) Added check_assumptions in Main Namespace #11509
  • (Open) Changed Printing of Logic Expressions #11448
  • (Open) Implemented Ellipsis in slicing Sympy Matrices #11392
  • (Open) Creating Reals set as an alternative to S.Reals #11383
  • (Merged) Enabled refine_MatMul to accept non Matrix arguments #11365
  • (Merged) Making is_Mul LaTeX expressions consistent with Pretty Printer #11298
  • (Merged) Assumptions in assumptions expr. #11276
  • (Open) Partial derivative support for as_finite_diff #11242
  • (Merged) Implemented floordiv in sympy #11213
  • (Merged) Improving Floor and Ceiling functions #11210
  • (Open) Enabled support for partial derivatives in as_finite_diff #11177
  • (Open) Implemented Spherical Coordinate System #11133
  • (Open) Improved solve_univariate_inequality function #11109
  • (Merged) Enabled commutative property for MatrixElement #11084
  • (Merged) imported missing Poly function in file #11067

Contribution to Symengine

  • (Merged) Improved sec(x) and csc(x) functions for x = 0 case #1020
  • (Merged) Moving clang-format back to main tests #1004

Issues Reported

  • (Open) Mathematica Parser Error #12309
  • (Closed) manualintegrate doesn't check domain of variables #12251
  • (Closed) Unable to solve heaviside equations #11216
  • (Closed) Set range of a Symbol #11217

The Plan


Rule based integration (Rubi) consists of ~10,000 transformation rules which have been tabulated since last two centuries. CAS can match the integrand with the right rule to directly solve the integration without using general integration algorithms. Adding Rubi to SymPy can be very helpful to SymPy's integration module since it frees developers of algorithms from having to worry about the annoying and trivial problems and the special cases, and instead focus on the genuinely hard and interesting problems. Rubi rules are designed such that it always result in simpler expressions, which is helpful to SymPy users.

Ondřej Čertík have previously worked on implementing Rubi, which implements a subset of Rubi trigonometric rules in SymPy using if/then/else decision tree. Much of my ideas are inspired from his work.

My intention is to implement Rubi Integrator in SymPy which includes doing the following:

  • Create new Match object for pattern matching in SymPy.
  • Implement ways to automatically generate decision tree by parsing rules implemented in Mathematica.
  • Build tools that check consistency of the rules.
  • Writing and generating documentation of rules.
  • Measuring the speed of integration after each module is implemented.


Multiple approaches have been proposed to implement rule based integration(Rubi). Such as:

Pattern Matching

In this approach, the given expression is matched with all ~10,000 rules until it finds its match. Once matched, rule transformation is applied on the expression. This approach has O(n) time complexity. We need to define new class TransformationRule which is capable of transforming a matching expression when certain conditions are satisfied.

Example(taken from Francesco Bonazzi's proposal)

>>> a, b, x, y = symbols("a b x y")
>>> tr = TransformationRule(a+sin(b), sin(a)*cos(b), wild=(b,), wildseq=(a,))
>>> tr.transform(3+x+sin(y))

Decision Tree

This approach requires implementing decision tree for a known pattern. We need to do some pattern matching beforehand to identify which pattern does the expression belong to and apply rules based on the conditions of Wild variables. Pattern matching in this case will be very localised to small subset of expressions which can be handled easily.

I will be using decision tree approach to implement Rubi integration in SymPy as it allows us to add additional functionality on rules such as returning Piecewise function and narrow down the number of patterns to be matched while searching for correct transformation rule.

Improve Pattern Matching

Sympy already has two pattern matchers .match() and unify which can be used to match patterns. All of SymPy's matchers so far were concerned in matching the whole expression. A user may have a huge expression, and be willing to apply a rewriterule to only some parts of it, that's why we need subtree matching. Francesco Bonazzi suggested to create a Match:

>>> a, b, x, y = symbols("a b x y")
>>> m = Match(a+sin(b), wild=(b,), wildseq=(a,))
>>> m.match(3+x+sin(y))
{a: 3+x, b: y}

wild signifies a wildnode and wildseq can match a tuple of wildnodes. In the above API, we can control which symbols should be used as wildcards and which should be used as wildseq without using Wild symbols.

Pattern matching for commutative expressions is NP problem. It's time cost increases exponentially as the length of expression increase. Hence, I plan to implement a pattern matcher specifically designed for Rubi integration. This is going to have less functionality than a general pattern matcher but it will be faster, which is what we need to implement Rubi integration.

manualintegrate and Rubi

manualintegrate() in SymPy also relies on rule based integration technique. However the rules implemented doesn't cover integration of special functions(such as Gamma and Zeta functions). It also has functionality to return the steps and rules involved in solving the integration.

>>> print(repr(integral_steps((x**2 + 3)**2 , x)))
RewriteRule(rewritten=x**4 + 6*x**2 + 9, substep=AddRule(substeps=[PowerRule(base=x, exp=4, context=x**4, symbol=x), ConstantTimesRule(constant=6, other=x**2, substep=PowerRule(base=x, exp=2, context=x**2, symbol=x), context=6*x**2, symbol=x), ConstantRule(constant=9, context=9, symbol=x)], context=x**4 + 6*x**2 + 9, symbol=x), context=(x**2 + 3)**2, symbol=x)

Implementing rubi_integrate() will be similar to manualintegrate() and it will be merged with manualintegrate(). So, far only 25 rules are implemented in manualintegrate(). We need to replace those rules with 10,000 Rubi integration rules and implement decision trees which identifies correct rule for each expression. I will discuss implementation details in later sections.

Utility Functions

Utility functions are required to facilitate integration methods to check conditions or use other integration technique. For example(written in mathematica):

Int[1/(a_+b_.*x_+c_.*x_^2),x_Symbol] :=
  -2*Subst[Int[1/Simp[b^2-4*a*c-x^2,x],x],x,b+2*c*x] /;
FreeQ[{a,b,c},x] && NonzeroQ[b^2-4*a*c]
Require using Subst[Int[],var,subs], NonzeroQ[exp] functions to solve the equation. There are many other functions that needs to be implemented in order to make implementation of rules easier. I will use functions which are already implemented in mathematica as reference to implement these functions in SymPy. Mathics(Open source clone of Mathematica) has good parsing capabilities of Mathematica code, I will use it to convert rules into SymPy syntax.

Piecewise function

Current integrate()in SymPy is smart enough to return the answer in Piecewise function when necessary conditions are unknown to integrate the expression. In some situations, the rules implemented may not match the expression exactly due to insufficient conditions given in the expression. In these cases, the function should return a Piecewise function whenever possible. For example SymPy returns:

>>> integrate(x**y, x)
⎧log(x)  for y = -1
⎪ y + 1            
⎪──────  otherwise
⎪y + 1>>> integrate(cos(x*y),x)
⎧   x      for y = 0
⎪────────  otherwise
⎩   y            
This behaviour is handled by heurisch_wrapper() in heurisch integration algorithm which finds the poles of the function and re-evaluates the integration at the poles. However, this approach require calling the integrate function multiple times and it is best to check the validity of each expression in the rule itself. Aaron said:
So even though it requires a check in each rule, I think it's better to do it that way.
It's also more efficient because we aren't calling solve(), and we aren't even calling integrate() again,
which would have to run through the whole algorithm (even manualintegrate would have
to do the pattern matching again). Here we just return the full thing in one step.
In a more complicated case we may still need to call to integrate() or a specific rule.

Similar behaviour is expected from rubi_integrate(). We can return a Piecewise function in the end of most decision tree (usually in else: condition) depending on the hierarchy of rules.


Match object


Subgraph Isomorphism is a NP problem in Computer Science. Handling commutative expressions such as Mul and Add is particularly hard as the matcher has to match all possible combinations of pattern with the expression to get the right answer.

Mathematica also tries all the possible orders of arguments to try and find a match (taken from The Mathematica Book):

Whenever Mathematica encounters an orderless or commutative function such as Plus or Times in a pattern, 
it effectively tests all the possible orders of arguments to try and find a match. 
Sometimes, there may be several orderings that lead to matches. 
In such cases, Mathematica just uses the first ordering it finds. 
For example, h[x_ + y_, x_ + z_] could match h[a + b, a + b] with x a, y b, z b or with x b, y a, z a. 
Mathematica tries the case x a, y b, z b first, and so uses this match. 

However, I think their code is very optimised. I intend to implement the pattern matcher specifically for Rubi integration which can be more efficient than a general pattern matcher, I have the following ideas in mind:

partition() function

Time complexity of matcher can rapidly increase with length of expression. Partitioning of Algebraic Subexpressions can help increase efficiency of matcher.

Implement a partition() function which can partition an expression(node) into a dict of subexpressions(leaves of the node) which contains integration variable and which doesn’t:

>>> partition(a*sqrt(20)*cos(x)*sin(x)*(exp(x) + 10), x)
{'Trig': [cos(x), sin(x)], 'Add': [exp(x) + 10], 'constant' : [a*sqrt(20)]}
>>> partition(10 + sin(x)*3 + log(x), x)
{'Mul': [sin(x) * 3], 'Log': [log(x)], 'constant' : [10]}


match() function tries to match leaves of pattern to leaves of expr having same head(Mul, Trig><code>, etc.). <code>match() terminates when the integration variable(x) of pattern and expr matches as symbols. If a match is not found, it uses backtracking to match another leaf of the node. Note: This pattern matcher can only match linear expressions i.e. the expressions in which no variable(except integration variable) repeats twice. a + b*x is acceptable but a + a*x is not.


pattern = a + b*cos(x)
expr = c + d*cos(x)

partition(pattern) = {'Mul' : [b*cos(x)], 'constant' : [a]}
partition(expr) = {'Mul' : [d*cos(x)], 'constant' : [c]}

for key in partition(pattern).keys(): # Match leaves having same head except for ‘constant’ terms
	match(partition(pattern)[key], partition(expr)[key])

Basic structure of pattern matcher

A Pattern object is used to store each node in the expression tree. These objects are used while matching a expression pattern. Structure can be defined as follows:

def Pattern_create(expr, var):
    # Identifies the type of expression and creates appropriate pattern object.
    if expr.is_atom():
        return AtomPattern(expr, var)
        return ExpressionPattern(expr, var)

class Pattern(Basic):
    # Base class for all pattern objects
    create = staticmethod(Pattern_create)

class ExpressionPattern(Pattern):
    def __init__(self, pattern, var): # `var` is variable of integration
        self.head = Pattern.create(pattern)
        self.leaves = [partition(pattern, var)]
        self.expr = pattern

Each node in the expression tree is stored as a Pattern object.

Match object will do a structural matching of expression and pattern by traversing down the leaves of each node until it finds a match. If a match is not possible, it uses backtracking to traverse another leaf having same head(Add, Mul, etc.).

class Match(ExpressionPattern):
    # Class of Match object.

    def __init__(self, pattern, transform, wild=None, wildseq=None, var)
    # Creates pattern object of `Pattern`.

    def match(self, expr):
    # returns transformed expression if `pattern` and `expr` are matched.
    # uses `match_wildnodes()` and `wildsequence()` to match the pattern using backtracking.

    def match_wildnodes(self, pattern_leaf, expr_leaf):
    # match for wildcard happens here
    # compares `pattern_leaf` of pattern with `expr_leaf` leaf to determine a match.
    # The function traverses the expression tree of `pattern_leaf ` to match wildnodes in `expr`

    def match_wildseq(self, pattern_leaf, expr_leaf):
    # match for wildsequence happens here
    # compares `pattern_leaf` of pattern to match nodes or sequence of nodes in a expression.
    # The function traverses the expression tree of `pattern_leaf ` to find match in `expr`

manualintegrate and Rubi

It is important to implement Algebraic module first as many rules in other modules has dependency to integration of Algebraic functions(while using substitution techniques). I plan to implement rules in the order stated in the Rubi website:

  1. Algebraic
  2. Exponential
  3. Logarithmic
  4. Trigonometric
  5. Inverse Trigonometric
  6. Hyperbolic
  7. Inverse Hyperbolic
  8. Special Functions
  9. Miscellaneous
def rubi_integrator(func, var, showRules=False, exprtype=None, hints=None)

I have written a gist here which uses most of manualintegrate() code to implement a subset of Rubi integration rules. We first identify the type of expression(Algebraic, Exponential, etc.) and decide which subset of rules apply for that expression. Pseudo code for classification:

def integral_steps(integrand, symbol, **options):
    integral = IntegralInfo(integrand, symbol)
    def key(integral):
        Function which categorises integrand into Algebraic, Exponential, etc.
        integrand = integral.integrand
        elif integrand.has(sympy.sin):
            return sympy.functions.elementary.trigonometric.sin
        elif integrand.has(sympy.cos):
            return sympy.functions.elementary.trigonometric.cos
    result = do_one(
        null_safe(switch(key, {
            sympy.Add: add_rule,
            sympy.sin: sin_rule,
            sympy.cos: cos_rule,
            sympy.Number: constant_rule
    return result

Sometimes we already know the type of expression we are dealing with, in that case, we can pass a exprType argument to the function.

Additional Functionality:

  • showRules argument can be passed to the function rubi_integrate to show the rule used to evaluate the expression.
  • showSteps argument can be passed to the function rubi_integrate to show intermediate steps in the evaluation of the integration.
key() classifies the type of the function and returns the class of the expression. Using do_one()we find the rule which applies to the integrand. Example of decision tree for sin_rule:
def sin_rule(integral):
    integrand, symbol = integral
    base, exp = integrand.as_base_exp()
    if integrand.match(c*sympy.sin(a+b*x)**n):
        cn = integrand.match(c*sympy.sin(a+b*x)**n)
        if eq(cn[n],0) or eq(cn[b],0) or eq(cn[c],0):
            return SineRule1(integrand, symbol)
            if ge(cn[n],1):
                if eq(cn[n],1):
                    return SineRule2(cn[a],cn[b],cn[c], integrand, symbol)

Sample output of integral_steps():

>>> integral_steps((2*sympy.sin(3 + 0*x))**1,x)
SineRule1(context=2*sin(3), symbol=x)

Implementation of Rules

We can implement rules directly in the decision tree itself. Example:

def rubi_linear_rational_integrator(a,b,m,x):
    if eq(a,0):
        if eq(m,-1):
            return log(x)/b
            return x**(m+1)/(m+1)*(b**(m))

Or, define each rule as individual function:

def rubi_linear_rational_integrator(a,b,m,x):
    if eq(a,0):
        if eq(m,-1):
            return rational_linear1(b, integrand, x)
            return rational_linear2(b, m, integrand, x)

def rational_linear1(a, integrand, symbol):
    Integral((a*x)**(-1), x) -> log(x)/a
    return log(x/a)

def eval_sine2(a, m, integrand, symbol):
    Integral((a*x)**m, x) -> x**(m+1)/(m+1)*(a**(m))
    return x**(m+1)/(m+1)*(a**(m))
Defining each rule as a function is a better approach as it allow us to do two additional things:
  • Check consistency of rules after they are implemented.
  • Writing documentation for each rule after function definition.
Consistency Check of Rules

My goal will be to minimize the number of rules, by making them mutually exclusive while at the same time maximizing their generality.

Defining each rules as separate function allows us to check the rule applied for a particular integrand(For complete sample code please refer this gist):

assert integral_steps((2*sympy.sin(3 + 2*x))**1,x).__class__ == SineRule2
assert integral_steps((2*sympy.sin(3 + 0*x))**1,x).__class__ == SineRule1

I will develop tests which checks:

  • At most one of the rules can be applicable to any given expression
  • Conditions on a properly defined rule must restrict its application to the domain over which it is valid.
  • Applications of rules must eventually result in an expression that can be made no simpler (i.e. an expression to which no rules applies)

Decision Tree Generation

DownValues in Mathematica allows us to arrange the rules in order of their complexity. By using FullForm[DownValues[ Int ]], we can generate AST syntax of rules. Francesco has already converted Rubi rules written in Mathematica into AST syntax. He has also written a parser which converts them to Python syntax.

Example the following Mathematica code:

If[EqQ[n,0] || EqQ[b,0] || EqQ[d,0],
  a*x + IntSin[c,d,b,n,x]

converts to:

if (Or(EqQ(n, 0), EqQ(b, 0), EqQ(d, 0))):
    Mul(Pow(Add(a, Mul(b, Sin(Add(c, Mul(d, x))))), n), x)
    if (EqQ(a, 0)):
        IntSin(c, d, b, n, x)
        if (EqQ(n, 1)):
            Add(Mul(a, x), IntSin(c, d, b, n, x))a

I will modify Francesco’s program to generate decision tree from AST syntax of Mathematica rules. I have written a gist which generated decision tree from Mathematica syntax.

Pseudo code for decision tree generator:

def decision_tree_generator(ast_string):
    rules = parse_AST_rules_code()# get individual rules along with their pattern, 
        # condition and transformed expression.
    construct_tree(rules) # constructs decision tree based on the rules.
    # construct_tree() creates a tree from the conditions of each rule where each node is a 
    # condition and the leaves are transformed expression. By doing a inorder traversal of the tree, 
    # we can generate decision tree.


My vacations will start from 5th May. I will use community bonding period to discuss more about implementation of pattern matcher with my mentors. I will start working as soon as my vacations start and implement basic framework of Match object in SymPy.

Community bonding period

  • Create basic Pattern class and implement partition() function.
  • Use Mathics to convert Mathematica utility functions and Rubi tests to SymPy syntax.
Week 1 (30 May - 5 June)
  • Implement ExpressionPattern and start work on Match object.
  • Submit PR for partition() function with tests.
Week 2, 3 and 4(6 - 26 June)
  • Implement match() function which matches wildnodes and wildcards.
  • Implement match_wildnodes() and match_wildseq()
  • Submit PR for Match with tests.
Week 5 and 6 (27 June - 10 July)
  • Improve Francesco’s parser to generate decision tree from Rubi rules compiled as AST (generated by Mathematica).
  • Generate decision tree for all rules.
  • Submit PR for paser.
Week 7 and 8 (11 July - 24 July)
  • Implement Algebraic and Logarithmic rules in manualintegate.
  • Measure speed of integration w.r.t. normal SymPy integration.
  • Submit PR for Algebraic and Logarithmic rules.
Week 9 (25 July - 31 July)
  • Implement Rubi rules of remaining expression types(similar to implementation of Algebraic and Logarithmic rules).
  • Add tests and submit PR for remaining rules.
Week 10 (1 August - 6 August)
  • Add tests for consistency check of rules.
Week 11 (7 August - 14 August )
  • Measure speed of Rubi Module.
  • Write Sphinx and generate LaTeX pdf documentation of rules implemented.
Week 12 (15 August - 21 August)
  • Buffer period.
  • Fix bugs and complete documentation of all rules.

How do I fit in?

I have been working with SymPy for around a year now. I have been reading about pattern matching algorithms for mathematical expressions through books(such as 'The Mathematica Book' which discusses working of pattern matcher in Mathematica) and research papers to come up with better ideas on implementation of pattern matcher. I am comfortable in using Mathics(open source clone of Mathematica) and have been reading their code to understand how their pattern matching works. I also asked few questions about its source code on their mailing list.

I have worked out my ideas for subset of rules and implemented them in gists(implementation, decision tree) to have better understanding of implementation. This helped me to understand the algorithm used by manualintegrate and how to add Rubi to it.


I don't have any commitments during summer and I am going to work 50-60 hours per week during Summers. My college restarts in August hence I have planned the project timeline such that important work gets completed before my college restarts. I will commit myself to complete most of my work before August. I will not have major exams till mid September and thus will still be able to contribute around 30 hours in a week during college.

The work done in this project will form a platform for future Rubi based projects such as solvers, matrix algebra, etc. I wish to be closely involved with development of those ideas and fixing any issues and bugs found in this module. I believe that this project will be a huge boost for integration module as well as other Rubi based projects which are currently under development. I have developed this proposal in consultation with my mentors. I have had discussions with Ondřej Čertík, Francesco Bonazzi and Aaron Meurer.

Relevant Discussions and References

  1. Ondřej Čertík's PR on Rubi #8036
  2. Github issues on Rubi integrator #12233, #7749
  3. Rubi Website
  4. Francesco Bonazzi's Google Group discussion on Pattern Matching
  5. James Crist's discussion on Pattern Matcher
  6. Francesco Bonazzi's PR on Sort key for rule pattern dispatch
  7. Proposal for a new pattern matching by Francesco Bonazzi's
  8. A Knowledge Repository for Indefinite Integration Based on Transformation Rules by D.J. Jeffrey and A.D. Rich
  9. Searching Techniques for Intergral Tables by Richard J Fateman
  10. Pattern Matching in Mathematica - Wikipedia
  11. Rules and Patterns in Mathematica
  12. Partitioning of algebraic subexpressions in computer algebra systems: an alternative to matching with an application to symbolic integration by Richard Fateman
  13. Complexity of matching problems by Dan Benanav
  14. Validating Rule-based Algorithms by László Lengyel
  15. Mathics
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.