Allow external dependencies #7339

Closed
mrocklin opened this issue Mar 26, 2014 · 46 comments
Labels
external Uncategorised Doesn't fit other labels...

Comments

@mrocklin
Member

Currently SymPy depends only on the standard library; it does not allow dependencies on external code. Should we stop this tradition and allow external dependencies?

Reasons against dependencies

  • They complicate installation
  • Versioning can get confusing
  • We'll have to coordinate API changes and tests and so forth
  • Each version of SymPy needs to specify which versions of external libraries it works with and which versions it doesn't
  • git bisect stops working, because each commit in general depends on a different external library version --- so one would need to quickly install proper external dependencies for each commit (that is hard to do currently)
  • Bundling external libraries with SymPy isn't actually all that bad
  • Things that are split out might get less visibility, due to no longer being in a popular package (sympy).
  • The community might get fragmented.
  • Code that no one supports is more likely to bitrot if it is split out. At least unused code in SymPy has its tests run all the time.

Reasons for dependencies

  • Installation and versioning worries can be offloaded onto widespread package managers
  • There exist useful external libraries that we would like to use
  • SymPy developers actually make a lot of good, general code. Unfortunately this code doesn't see external use and so, when that developer leaves, that code becomes stale. By breaking off work to external projects the ecosystem is more likely to pick it up and maintain that code, offloading considerable long-term work from the SymPy community.
  • We will reinvent fewer wheels

Possible requirements for hard (required) external dependency

This is a list of possible requirements, we should select some of these.

  • Supports what SymPy supports - Pure Python, 2.6+, 3.2+, PyPy
  • Does not cause conflicting license issues (this basically means it needs to be BSD-style)
  • Easy to install
  • Supported and responsive to reasonable feature requests / bug reports. Likely to be supported well into the future.

Some of the virtues above are subjective.

Here are some other nice things that tend to imply these virtues

  • Widely used libraries are broadly installed and have good community support. These can be either
    • Actively developed projects with a community that would answer questions about installation/usage problems and fix bugs
    • Legacy code, that doesn't need changes, but is available in all major distributions

Requirements for soft (optional) external dependency

  • Either
    • Core functionality does not depend on dependency
    • Some supported fallback exists
  • ...

Hard Dependencies Already Found Bundled In SymPy

  • mpmath

Soft Dependencies Used in SymPy

  • gmpy
  • numpy
  • matplotlib
  • scipy
  • cython

Useful External Dependencies

  • unit testing frameworks (this is a little less of an issue; it's not a runtime dependency)
  • multipledispatch
  • SAT solvers
  • CSymPy, other fast symbolic libraries

SymPy Modules that could gain more exposure if separated

  • polys?
  • logic?
@projetmbc

Hello.

Maybe a first step would be to see which external dependencies could be useful?

@toolforger
Contributor

Same requirement for soft external dependencies.
They have a smaller user base, but that doesn't change the issues.

Not sure about the issue with git bisect. I can't believe we're the first to face the problem, and it's been around long enough that a good solution should have been developed.

On useful external dependencies:
Unit testing. Our current testing framework is a fork of a really old revision of I-forgot-its-name. Over the years, we amended and extended the fork, and today, it is a formidable, hard-to-maintain mess.

@certik
Member

certik commented Mar 26, 2014

@toolforger -- one solution for git bisect is to use only external dependencies whose API doesn't change and whose bugs don't affect how SymPy works.

@mrocklin
Member Author

I don't think that the multipledispatch project will ever be widely adopted. Most people don't need multiple dispatch; it's a niche problem.

@mrocklin
Member Author

At the same time I'd like to see multipledispatch used in SymPy but also think that forking off multipledispatch and sucking it into the SymPy codebase is a bad idea.

@certik
Member

certik commented Mar 26, 2014

So you should modify the requirements draft so that multipledispatch makes it.


@asmeurer
Member

You didn't list what to me is the main argument against dependencies, which is bitrot. Based on your last bullet point for dependencies, I think we believe in opposite things. I think that moving unused code away is the best way for it to die. At least code that is in SymPy has tests run against it, and has visibility.

I also want to avoid fragmentation of the community.

Also, you missed a pretty important soft dependency: gmpy.

@asmeurer
Member

I would put an extra stipulation that hard dependencies have a liberal license (BSD-style).

@asmeurer
Member

I disagree that you won't be able to build a community around multipledispatch. If you really believe that, then to me that's a strong argument that it shouldn't be a dependency. If Matthew Rocklin gets hit by a bus and multipledispatch ceases to exist, then we have a mess on our hands. But if Matthew Rocklin gets hit by a bus and multipledispatch lives on, the mess is less so.

@asmeurer
Member

Oh, just realized that you want people to just edit the top post. I'll do that.

@asmeurer
Member

So I've edited the top post with those issues.

@asmeurer
Member

I think we are maybe conflating a few issues here into one, which will make them harder to resolve. I think the following are not equal:

  • Requiring an external dependency only for development. I think this is fine. We have a ton of dependencies to build the docs for example. Unittesting frameworks fall into this category as well (and btw, py.test can be used to run our current test suite).
  • Splitting off an existing part of SymPy into its own project on GitHub, still in the sympy user.
  • Splitting off an existing part of SymPy into someone else's namespace.
  • Unbundling mpmath (this is similar to the previous point, but it's big enough to be its own issue I think)
  • Taking an existing project that someone else in the community has written and maintained and making it a dependency of SymPy (like CSymPy or multipledispatch).
  • Taking an existing project that someone outside the community has written and making it a dependency of SymPy (I don't know, maybe making a fast C SAT solver a requirement, or some nice parsing library).

That also doesn't take into account how core they would be to the functionality of SymPy or how "optional" the dependencies would be (for instance, gmpy is a lot more optional than numpy: everything that can be done with gmpy installed can be done without it, only slower). Really, each case has to be looked at individually.
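
The gmpy case is the usual shape of a soft dependency: use the fast library when it is installed and fall back to pure Python otherwise. A minimal sketch of that pattern, assuming the optional package is importable as `gmpy2` (the thread only says "gmpy") and using hypothetical helper names `make_integer` and `factorial`; this is not SymPy's actual import logic:

```python
# Optional-dependency pattern: use the accelerated backend when it is
# installed, otherwise fall back to the built-in implementation.
try:
    import gmpy2  # assumed module name for the optional accelerator
    HAS_GMPY = True
except ImportError:
    gmpy2 = None
    HAS_GMPY = False


def make_integer(value):
    """Return a gmpy2.mpz if gmpy2 is available, else a plain Python int."""
    if HAS_GMPY:
        return gmpy2.mpz(value)
    return int(value)


def factorial(n):
    """Compute n! with whichever integer type is available."""
    result = make_integer(1)
    for k in range(2, n + 1):
        result *= k
    return result


# The answer is the same with or without the optional package.
print(factorial(20) == 2432902008176640000)
```

Callers only ever see `factorial`, so the result does not depend on whether the optional package is present; only the speed does.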

@certik
Member

certik commented Mar 27, 2014

I strongly agree with everything that Aaron said. Very good points about multipledispatch.

I think that a project separated from sympy, let's say "polys", gets less exposure, not more. (Commenting on the last point in the issue description.)

@mrocklin
Member Author

At least code that is in SymPy has tests run against it, and has visibility

A project being outside of the codebase doesn't stop us from being able to test against it.

I also want to avoid fragmentation of the community.

We are fragmented anyway; this isn't necessarily a bad thing. It's impossible to keep a group of this size entirely cohesive. I don't think that I've ever touched the physics code. I think that the PyDy model has been super-successful. They develop at their own (generally faster) pace but keep in touch with and contribute back to the core.

Regarding multipledispatch, me, and busses,

I'm going to claim that multipledispatch is simple enough, of small enough scope, and has sufficient documentation, so that a moderately experienced developer can pick it up without much trouble in the case of my unexpected demise. Small projects with limited scope are more robust to large automobiles.

I think that a project separated from sympy, let's say "polys", gets less exposure, not more. (Commenting on the last point in the issue description.)

I think that polys currently has no exposure outside of sympy. Also this might not be the best example. Honestly what came to mind for me was my old work on unify and strategies, both of which I have split off already. Maybe there are better examples here?

@certik
Member

certik commented Mar 27, 2014

I guess it boils down to how to best manage our work. Things are different for dependencies like Python or gmpy, which are managed by different people and communities. But things like mpmath, pydy, multipledispatch are managed by people from the same community really (i.e. they contribute to sympy as well as other projects). Github makes maintaining and contributing to multiple projects very simple (and if anything, this will only become simpler in the future). In fact, I think it's a good workflow to develop an idea outside of sympy, get contributors to it, get it up and running and then start talking about how to best use it with sympy, as you have done with multipledispatch, or as I am doing with CSymPy, or as the PyDy guys are doing with it.

For multipledispatch, if I were doing it, I would maintain it as a separate project, try to get as many other projects as possible to use it, and simply copy it into sympy to be used within it. Two things can happen:

  • either it will be picked up by several projects. Then the best is to maintain it outside (and either make sympy depend on it, or copy it to sympy)
  • it will only be used by sympy. Then, eventually once the API becomes stable, probably just copy it to SymPy, instead of making SymPy depend on it (I would still also keep it separate in case some other project picks it up)

Anyway, that's just what I would do, you might want to do things differently.

(Note: mpmath is the same thing --- it's developed outside, but copied in sympy for now. Last time I checked sympy was the only project depending on it, so it belongs to the second option. If lots of other projects in Python start depending on it, it will become the first option.)

@rlamy
Member

rlamy commented Mar 27, 2014

The general issue of dependencies is a large part of why I lost interest in contributing to SymPy.

FWIW, I agree with everything @mrocklin mentioned. I'll just add that in practice SymPy is very far from being dependency-free. The minimal setup to do useful things with SymPy includes IPython, numpy and possibly matplotlib. The Debian package pulls in even more stuff: the last time I sudo apt-get installed python-sympy on a new system, it installed about 500 MiB of random stuff.

Anyway, I hope that multipledispatch won't disappear inside the bowels of SymPy, because I have a few potential uses in mind for it, e.g. rpython is in need of a good dispatch solution.

@skirpichev
Contributor

For multipledispatch, if I were doing it, I would maintain it as a separate project, try to get as many other projects as possible to use it, and simply copy it into sympy to be used within it.

Why keep forking every external project that sympy depends on? AFAIR, the only sane reason given in the mpmath-related discussions was the mythical "hard to install". It's not hard, really.

Last time I checked sympy was the only project depending on it, so it belongs to the second option.

According to Debian's popcon, there are ~300 mpmath installations vs ~1300 of python-sympy (which still uses a bundled copy of mpmath).

@certik
Member

certik commented Mar 27, 2014

@rlamy, @skirpichev --- just so you know, I am not "dead-set" against dependencies (for example in CSymPy we use dependencies instead of reimplementing things on our own). But I do think there are pros and cons, as written up above in the issue description, so it is not a black and white decision.

Why don't you help us clearly spell out the conditions in "Requirements for hard (required) external dependency"? Because clearly, as it is written now, mpmath or multipledispatch does not satisfy it (e.g. mpmath doesn't pass the first point "widely used library": as @skirpichev pointed out, it's used within sympy, other usage is minor, and no packages in Debian depend on it, although sympy would if we split it out). So from your comments and our past discussion, I think you must not agree with this point --- either you don't agree that the requirement "widely used" should be here, or you don't agree that "mpmath does not pass this requirement". Would you agree?

And so I think we need to spell this point out more clearly and clarify whether or not the listed packages pass this point.

I am really open about this --- let's have a frank discussion about these requirements. I offered above what I think are good requirements, let's call them A. Since you disagree with them, please write up requirements which you think are better for sympy, let's call them B. And let's discuss req. A and req. B.

@skirpichev
Contributor

On Thu, Mar 27, 2014 at 06:13:57AM -0700, Ondřej Čertík wrote:

Because clearly, as it is written now, mpmath or multipledispatch does not satisfy it (e.g. mpmath doesn't pass the first point "widely used library", as @skirpichev pointed out, it's used within sympy, other usage is minor

I don't think my data proves this rather than the reverse. The difference is not that large, not even an order of magnitude.

either you don't agree that the requirement "widely used" should be here, or you don't agree that "mpmath does not pass this requirement".

First, I think it's a very minor issue (though I'm not sure that mpmath doesn't pass the requirement for "widely used library").

In my view, requirements for a hard external dependency should look like this (in order of importance):

  1. the library should be usable without sympy, e.g. it doesn't have hard dependencies on it
  2. it's supported
  3. Supports what SymPy supports - Pure Python, 2.6+, 3.2+, PyPy
  4. Has a compatible license (I like GPL, we shouldn't ban such projects ;))

@mrocklin
Member Author

I've edited the hard requirements section to show off what I think of as the virtues we're looking for, e.g. "no license issues" rather than specific hard requirements e.g. "BSD style license." Please review and edit.

I like @skirpichev's note that the project should be able to stand on its own. This would be especially important for projects that we wanted to pull out of SymPy.

@certik
Member

certik commented Mar 27, 2014

@skirpichev --- check out the issue description, @mrocklin updated it, does it reflect the way you see it?

For reference, this is what's in there now:

### Possible requirements for hard (required) external dependency

This is a list of possible requirements, we should select some of these.

* Supports what SymPy supports - Pure Python, 2.6+, 3.2+, PyPy
* Does not cause conflicting license issues
* Easy to install
* Supported and responsive to reasonable feature requests / bug reports

Some of the virtues above are subjective.

Here are some other nice things that tend to imply these virtues

* Widely used libraries are broadly installed and have good community support.  These can be either
    * Actively developed projects with a community that would answer questions about installation/usage problems and fix bugs
    * Legacy code, that doesn't need changes, but is available in all major distributions

@skirpichev
Contributor

does it reflect the way you see it?

I think so. Probably, my first item in the list is self evident, so it's ok to omit it.

@certik
Member

certik commented Mar 27, 2014

Related to this discussion is this document: http://web.ornl.gov/~8vt/TribitsLifecycleModel_eScience_2012.pdf, see the chapter "V. SELF SUSTAINING OPEN SOURCE SOFTWARE: DEFINED", which talks about dependencies too, e.g.:
"Minimal controlled internal and external dependencies
The software has well structured internal dependencies and minimal external upstream software dependencies and those dependencies are carefully managed."

@rlamy
Member

rlamy commented Mar 27, 2014

@certik If you read the rest of the chapter, you'll see that the authors don't support your position, cf. "For example, a given downstream customer may only fundamentally need a few classes but if the software has entangling dependencies within itself, the customer may be forced to port hundreds of thousands of lines of code just to get functionality that should be contained in a few thousand lines of code." or "the goal is not to have zero dependencies".

@certik
Member

certik commented Mar 27, 2014

@rlamy I agree.

@mrocklin
Member Author

I like the discussion that has happened so far; we've raised and discussed a number of issues.

I think that this discussion would be better focused with a particular action that we could consider, alter, and decide on. To that end I propose the following:

  1. We change SymPy to depend on multipledispatch as an external dependency
  2. All documents that state that SymPy doesn't depend on external dependencies be updated
  3. We add the above list of requirements to our documentation (I'll hunt around for a good place)

I'm happy to do this work and submit a pull request.

Disclaimer: this is a bit self-serving, as I'm also trying to gain a bit of exposure for multipledispatch. Other test dependencies welcome.

I don't think that discussion is over, I just want to up the ante a little by proposing something concrete.

@mrocklin
Member Author

Here is a use case of multiple dispatch to clean up set simplification

#2979

From my perspective this PR is ready to go if we accept dependencies.
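
For readers unfamiliar with the technique: multiple dispatch picks an implementation from the runtime types of all arguments, not just the first. The toy sketch below uses only the standard library. It is neither the multipledispatch API nor the code in the PR; `register`, `intersect`, `Interval` and `FiniteSet` are hypothetical stand-ins, and the lookup walks both MROs in a fixed order instead of detecting ambiguities as a real library would.

```python
from itertools import product

_registry = {}  # maps (type_of_a, type_of_b) -> implementation


def register(type_a, type_b):
    """Decorator that records an implementation for a pair of types."""
    def decorator(func):
        _registry[(type_a, type_b)] = func
        return func
    return decorator


def intersect(a, b):
    """Dispatch on the runtime types of both arguments."""
    for signature in product(type(a).__mro__, type(b).__mro__):
        func = _registry.get(signature)
        if func is not None:
            return func(a, b)
    raise NotImplementedError(
        "no rule for %s and %s" % (type(a).__name__, type(b).__name__))


class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi


class FiniteSet:
    def __init__(self, *elements):
        self.elements = frozenset(elements)


@register(Interval, Interval)
def _interval_interval(a, b):
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    return Interval(lo, hi) if lo <= hi else FiniteSet()


@register(FiniteSet, Interval)
def _finite_interval(a, b):
    return FiniteSet(*(x for x in a.elements if b.lo <= x <= b.hi))


@register(Interval, FiniteSet)
def _interval_finite(a, b):
    # Intersection is symmetric, so reuse the rule above.
    return intersect(b, a)


result = intersect(Interval(0, 5), FiniteSet(1, 7, 3))
print(sorted(result.elements))  # [1, 3]
```

Each pair of types gets its own small rule, so adding a new kind of set means registering new functions without editing a growing `isinstance` chain.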

@certik
Member

certik commented Mar 29, 2014

One thing that I am worried about with the new multipledispatch library is whether its API is already stable. It's a two-month-old library. If we release sympy 0.7.6 that depends on version A of multipledispatch, but then we need to change the API a bit and release multipledispatch B, and sympy 0.7.7 doesn't work with version A and sympy 0.7.6 doesn't work with version B, then we create lots of trouble for our users.

@mrocklin
Member Author

Yup, that's a valid concern. On the flip side once multipledispatch has users it's much harder for me to experiment and change things.

However, in this particular case there are a couple of things we can do.

  1. Multipledispatch supports a few interfaces, one of which is the interface used by singledispatch in Python 3.4 functools. It's safe to say that this interface is pretty stable
  2. Because I happen to develop both projects I can probably manage any changes simultaneously in the two codebases for the first few months

Finally, multipledispatch has a very small scope. I consider it to be fairly complete now. I mostly expect only performance tweaks and some Python 3 sugar in the future.
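
The functools interface mentioned in point 1 is the standard library's `singledispatch` decorator (Python 3.4+), which dispatches on the type of the first argument only. A short illustration of that register-style interface, using only the standard library; the function names are made up for the example, and how closely multipledispatch mirrors it is as described in the comment rather than shown here.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    """Fallback used when no more specific implementation is registered."""
    return "object"


@describe.register(int)
def _describe_int(obj):
    return "integer"


@describe.register(list)
def _describe_list(obj):
    return "list of %d items" % len(obj)


print(describe(3))       # integer
print(describe([1, 2]))  # list of 2 items
print(describe(2.5))     # object (falls back to the generic version)
```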

@certik
Member

certik commented Mar 29, 2014

Very cool. If we do decide to use it as a dependency, I at least want to spend a few days playing with it (I haven't had a chance yet, besides sending some trivial PRs). This is a huge change, so I want to make sure we don't screw up.

@mrocklin
Member Author

Right, I should also add the disclaimer that multipledispatch isn't battle tested. SymPy would be the first major project. It does do quite well in my logic programming system, though, which is complex enough to raise issues.

@ellisonbg
Member

I am fine with external deps in general, but we should consider each of them separately.

@certik
Member

certik commented Mar 30, 2014

We talked about it a bit on G+ with 8 GSoC mentors. I am ok with external
deps (I think all of us were). But each case should be debated separately.


@asmeurer
Member

@rlamy so if we change our policy on this will you start contributing again?

@asmeurer
Member

I'd like to keep the BSD-style restriction, at least for hard dependencies. I don't want to prevent SymPy from being usable by people who cannot use (L)GPL code. Plus unlike @skirpichev I absolutely hate the GPL :)

@asmeurer
Member

Also, more practically, having an (L)GPL dependency would mean having this isolated codebase of code that cannot ever be imported into SymPy, even minor chunks of copy-paste (this is one reason I hate the GPL btw).

@asmeurer
Member

I also share Ondrej's concern about API stability.

I think in general, a requirement for a hard dependency should be that the project adhere to some of the same standards that SymPy does (or at least tries to :). For example:

  • Keeps a stable API. Deprecate API breaks.
  • Maintains a healthy, welcoming community (ideally with a large intersection with our own, at least once we start using it).
  • The code is all on GitHub.
  • BSD-style license.
  • Is easy for us to push a release for: if something breaks, we can easily make sure that upstreamed changes are put into a full release.

At the end of the day, it should make little difference to me if I am contributing to core SymPy or to a dependency.

So far, the best way to do this has been to keep code in SymPy itself. I think it's not impossible to have it otherwise, though, as numerous other projects have shown.

I think it goes back to my comment above about different kinds of dependencies. @certik @mrocklin @rlamy @skirpichev etc., what are your opinions on each of the six bullet points from that comment?

@toolforger
Contributor

You can't copy&paste GPL'd code, but you can inspect it and write it "in your own words".
It's not THAT hard actually.

@certik
Member

certik commented Mar 31, 2014

(Actually, most people believe you can't inspect it, as that would also be considered "derivative work". The only way is that one person inspects and writes a specification, and another person implements this specification without looking at the GPL code.)

@toolforger
Contributor

Those people are mistaken. Just by looking at the code you do MOST DEFINITELY NOT create a derivative work.
Besides, I highly doubt that "most people" believe that. Of the people I'm talking with, not a single one would subscribe to that belief; in fact this is the first time I have ever heard that anybody believes this.

The "clean-room approach" exists for another reason: To make it 100% provable that no copying ever could have happened, not even subconsciously.

@toolforger
Contributor

About the bullet points: Is there a way to have different versions of a library coexist in SymPy?
If no, then Python's library infrastructure is fundamentally broken - not that we can do much about it, but we'd need to decide:
a) either we allow users to combine SymPy with arbitrary external libraries, then we can't use hard dependencies for runtime at all,
b) or we do not allow that, then hard runtime dependencies are not an issue.

I'm a bit uneasy about using a different policy for tests.
There are people who do both development and usage work; a different policy would mean they'd need to set up separate Python environments, one with the dependencies and one without.
Also, there might be people who want to run the test suite as part of the validation-before-publication routine.
I'm not sure if either use case is really a showstopper for having different dependencies.

In general, I do not think that external dependencies are a problem.
I'd simply weigh the advantages (less code to write and maintain ourselves) against the disadvantages (need to keep up with an external, possibly-changing API).
I.e. I'd be all for using pytest; it's widely used so they won't change the API in incompatible ways, and it should be good enough to cover our needs today (plus our own testing framework has become quite hard to maintain - it's a bit of a mess, I once tried a bit of refactoring and gave up).
I'd be less enthusiastic about a library that's used just by a handful of projects, and/or under heavy development.

Enough rambling :-)

@certik
Member

certik commented Apr 9, 2014

Here is a good discussion about using mpmath as an external dependency, starting with this comment:

#7393 (comment)

@toolforger
Contributor

It's essentially another potential problem: Is upstream cooperative enough with bug fixing that it's worth using their code?
The other issue is how to reliably detect and compare library versions.
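
On the second point, here is a minimal sketch of comparing version strings with only the standard library. The helper names `parse_version` and `is_compatible` are hypothetical, and pre-release suffixes such as `rc1` are deliberately ignored, which a real compatibility check might not want.

```python
import re


def parse_version(text):
    """Turn a string such as '0.7.6' or '1.2rc1' into a tuple of ints.

    Only the leading dotted-number part is used; trailing zeros are
    dropped so that '0.7' and '0.7.0' compare equal.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    parts = [int(part) for part in match.group(0).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def is_compatible(installed, minimum, below=None):
    """Check minimum <= installed < below (the upper bound is optional)."""
    version = parse_version(installed)
    if version < parse_version(minimum):
        return False
    if below is not None and version >= parse_version(below):
        return False
    return True


print(is_compatible("0.19.0", "0.18"))            # True
print(is_compatible("0.17", "0.18"))              # False
print(is_compatible("1.0", "0.18", below="1.0"))  # False
```

The installed string would typically come from the library's `__version__` attribute, where it exposes one. For stricter handling of pre-releases, the third-party `packaging` library's `Version` class is a common choice.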

@stevenleeS0ht
Contributor

stevenleeS0ht commented Jan 7, 2020

@certik, @mrocklin, @smichr, @asmeurer From my point of view, if the third-party dependency is mature, robust, approved, API-stable and pure Python, it would be nice to install it as an external dependency.

if (target dependency is not mature and still in alpha stage):
    if (there is a qualified alternative):
        use the alternative.
    else:
        if (the author of the dependency is also interested in SymPy):
            communicate with them and discuss future development.

        else:  # not for production purposes, just for demonstration and education.
            make a fork, but refactor the majority of the code.
            highly optimise and tweak it for SymPy. Remove useless features.

            if (the dependency is small):
                copy it in as a submodule of SymPy
            else:
                put it under the SymPy organization, and maintain it separately.

What is your opinion?

@stevenleeS0ht
Contributor

@certik, @mrocklin, @smichr, @asmeurer, Currently, git submodules are not used in SymPy development, and I don't want them to be used in the future.

The Python language natively provides a mechanism for dependency management.

Git submodules also require a lot of extra operations and bring unnecessary complexity to a project. They might be an option for a C/C++ project, but never for a project in a modern language.

@certik
Member

certik commented Jan 7, 2020

@stevenleeS0ht SymPy now uses many optional dependencies (NumPy, SciPy, LLVM, ...). So I think this issue is now fixed.

@certik closed this as completed Jan 7, 2020
10 participants