How fast will my open source code break? #1

Open
moorepants opened this issue Sep 18, 2018 · 2 comments
moorepants commented Sep 18, 2018

One of my biggest complaints about open source software is that APIs do not remain stable. If I write a research paper using a software stack, publish it, don't maintain the code, and then come back ~1 year later, it seems to take a day or more to update the software so that it works with the updated dependencies. One year isn't a long time in the research world. This isn't good for reproducibility, and I don't think we should have to ship a VM that freezes the entire stack with every paper. I've also noticed that my MATLAB code that is 10+ years old tends to run just fine on new versions, leading me to believe that MathWorks takes this much more seriously.

I'm interested in characterizing:

  • how quickly changes in upstream dependencies break scientific software
  • a ranking of API stability across core software packages
  • how the culture of API stability compares among languages, e.g. Python and R
  • how deep in the stack do you have to go to get stable APIs (for example the Linux kernel API is probably rock solid stable)

Hypothesis: On average, a given script or software package that relies on a high-level scientific computing software stack will break within a year due to unstable dependency APIs.

Prior art

Haven't found much yet.

Methods

Here is an idea for a method to do this:

  1. Download a package or script at (or near) the top of the stack and log its release date.
  2. Install the dependencies specified at the time of release and ensure the software runs.
  3. Increment the dependency versions in chronological order and test whether the script/package still runs at every increment. You can detect whether it runs and also whether deprecation warnings are emitted. If a single dependency fails, you can pin it at its last working version and continue to increment the others until you reach their newest available versions or all dependencies fail.
  4. Record the dates on which the software first emits deprecation warnings and first fails.
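
The steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions: `walk_dependencies`, the `releases` list, and the `runs_ok` callback are hypothetical names, and `runs_ok` stands in for whatever actually installs the pinned versions and runs the script.

```python
from datetime import date


def walk_dependencies(releases, pinned, runs_ok):
    """Step dependencies forward in release order and record first failures.

    releases: list of (release_date, package, version) tuples, in any order
    pinned:   dict of package -> version known to work at the script's release
    runs_ok:  hypothetical callback; takes a dict of pins, returns True/False
    """
    env = dict(pinned)
    broken_on = {}  # package -> (date, version) of the first breaking release
    for when, pkg, version in sorted(releases):
        if pkg in broken_on:
            continue  # stays pinned at its last working version
        trial = {**env, pkg: version}
        if runs_ok(trial):
            env = trial
        else:
            broken_on[pkg] = (when, version)
    return env, broken_on


# Made-up release history: pretend the script breaks with libB 2.0.
releases = [
    (date(2018, 1, 10), "libA", "1.1"),
    (date(2018, 6, 1), "libB", "2.0"),
    (date(2018, 9, 1), "libA", "1.2"),
]
env, broken = walk_dependencies(
    releases,
    {"libA": "1.0", "libB": "1.9"},
    lambda pins: pins["libB"] != "2.0",
)
print(env, broken)
```

The time to first break is then the difference between the earliest date in `broken_on` and the script's release date.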

Another method:

Track a code base through its git commits and somehow measure the frequency and timing of deprecations and removals.
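
One possible way to measure this for Python code is to count, for each commit's snapshot of a file, the calls that pass a deprecation warning category. The sketch below assumes deprecations are signalled as `warnings.warn(msg, DeprecationWarning)` or similar; the function name is hypothetical, and deprecations made through decorators or other helpers would be missed.

```python
import ast

# FutureWarning is included because it is meant for end-user-facing deprecations.
DEPRECATION_NAMES = {
    "DeprecationWarning",
    "PendingDeprecationWarning",
    "FutureWarning",
}


def count_deprecation_warnings(source):
    """Count calls in `source` that pass a deprecation category by name."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Look at positional arguments and keyword values alike.
        args = list(node.args) + [kw.value for kw in node.keywords]
        if any(isinstance(a, ast.Name) and a.id in DEPRECATION_NAMES for a in args):
            count += 1
    return count
```

Running this over the same file at successive commits gives a rough time series of when deprecations appear and when they disappear again (likely removals).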

We will have to find a reliable way to get old dependencies installed. Simply getting things installed as they were at some point in the past is often quite a painful process.
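
As a starting point, the pins for a given date could be derived from each dependency's release history. This is a sketch under assumptions: `pins_as_of` and the `histories` mapping are hypothetical, and the release dates would have to be collected by hand or from a package index. It only produces `name==version` lines; it does not solve the harder problem of old versions failing to build on a current system.

```python
from datetime import date


def pins_as_of(histories, cutoff):
    """Return 'name==version' lines for the newest release on or before cutoff.

    histories: dict of package -> list of (release_date, version) tuples
    """
    lines = []
    for name in sorted(histories):
        eligible = [(d, v) for d, v in histories[name] if d <= cutoff]
        if not eligible:
            raise ValueError(f"{name} has no release on or before {cutoff}")
        # Pick the most recent eligible release by date.
        _, version = max(eligible, key=lambda dv: dv[0])
        lines.append(f"{name}=={version}")
    return lines
```

The returned lines can be written to a requirements file and installed into a fresh environment.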

Another thought:

We could check how many tests of a prior version raise errors or deprecation warnings.
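
A minimal sketch of that check, assuming each test can be called as a zero-argument function (`classify` and `tally` are hypothetical names): run every test, record the warnings it emits, and sort it into "ok", "deprecated", or "broken".

```python
import warnings

DEPRECATION_CATEGORIES = (
    DeprecationWarning,
    PendingDeprecationWarning,
    FutureWarning,
)


def classify(test_func):
    """Run a zero-argument callable; return 'broken', 'deprecated', or 'ok'."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            test_func()
        except Exception:
            return "broken"
    if any(issubclass(w.category, DEPRECATION_CATEGORIES) for w in caught):
        return "deprecated"
    return "ok"


def tally(tests):
    """Count how many callables fall into each category."""
    counts = {"ok": 0, "deprecated": 0, "broken": 0}
    for test in tests:
        counts[classify(test)] += 1
    return counts
```

For a pytest-based suite, running `pytest -W error::DeprecationWarning` would be a simpler route, since it turns those warnings into test failures.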

@moorepants

I added this project idea here: https://mechmotum.github.io/jobs/msc/how-fast-will-open-source-break.html.

@moorepants

A static analysis tool to identify deprecated Python code: https://github.com/QuantStack/memestra. Could be useful.
