Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Let's get rid of dependencies! (was: Dependency Priorities) #278

Closed
jpfairbanks opened this issue Jan 17, 2016 · 33 comments
Closed

Let's get rid of dependencies! (was: Dependency Priorities) #278

jpfairbanks opened this issue Jan 17, 2016 · 33 comments
Labels
discussion requires in-depth discussion - participation encouraged

Comments

@jpfairbanks
Copy link
Contributor

@sbromberger why do you think we should be eliminating dependencies?

@sbromberger
Copy link
Owner

Couple of interrelated reasons.

My goal for LightGraphs is to become the de facto standard for graph modeling in Julia. This means that it should strive to be as dependency-free as possible since we're not sure how or whether the other packages will be maintained over time. I also don't want someone who needs LightGraphs to have to install multiple other packages from multiple other parties except where it's not feasible to provide core functionality without them, and I don't want to have to keep tweaking LG code to keep up with external API changes from other packages.

The trend right now is to add package dependencies for single functions that are well outside core code. I was under the impression that we had agreed to move things like that outside of LG proper into their own packages. I'm still ok with that; in fact, I'm giving serious thought as to what makes a core graph modeling package, and the conclusion I'm coming to is that LG as it stands now is itself too complex.

We have several explicit dependencies, and a couple of optional ones. Until we can have a discussion on refactoring, I don't think that adding a dependency on a large package just for a single peripheral function is worth it.

@jpfairbanks
Copy link
Contributor Author

  1. Are we only using Clustering.jl for kmeans in community/detection.jl?
  2. If we don't use a DisjointSets implementation (like that in DataStructures.jl) then we can't have kruskal's.
  3. When you say a "core graph modeling package" do you mean you want to eliminate all of the analysis functions?

@sbromberger
Copy link
Owner

  1. as far as I can tell, yes.
  2. even more reason to consider a separate package.
  3. I don't know yet what would be considered "core". It would have to be a carefully-reasoned, explicit, as-unambiguous-as-possible definition, though, if it has any chance of sticking.

@sbromberger
Copy link
Owner

According to pkg.julialang.org, LG currently depends on 22 packages. This is a problem.

@jpfairbanks
Copy link
Contributor Author

  1. I don't know yet what would be considered "core". It would have to be a carefully-reasoned, explicit, as-unambiguous-as-possible definition, though, if it has any chance of sticking.

There seems to be four categories of functionality:

  1. Core data structures Graph, DiGraph along with functions for creating, accessing, and modifying them.
  2. Classical graph algorithms such as BFS, DFS, A*, Kruskal's Dijkstra's, Floyd-Warshall, flows ...
  3. Network analysis functions: Clustering coefficients, centrality, pagerank, community detection ...
  4. generators, test datasets, small graphs, and persistence.

I think you could have a collection of 1 to 4 packages encapsulating this functionality.
The division that makes sense to me is separating out subset 1 from the rest.
That way anybody could depend on just the data structure and manipulation of it.

Putting categories 1 and 2 together makes sense as I could see a large number of applications that need to do classical graph operations but not network analysis functions. However I think that drawing a line between category 2 and 3 will be a constant source of disagreement.

I am inclined to minimize the number of repositories capturing these 4 function groups. Spreading out the people resources that we have will not be conducive to achieving our goals. I think our best bet to resolve the dependency issue is to carve out a section of the package to be a core with minimal dependencies and then have the rest with a very liberal dependency policy.

@CarloLucibello
Copy link
Contributor

I'm inclined towards having a single and omnicomprensive graph package, now that we factored out "graph datasets". It seems to me more comfortable both for the end users and the mantainers.

What about having DataStructures.jl as an optional dependence for the the Kruskal's algorithm as we did for JuMP and the matching algorithm?

Also, we could rule to have as (optional) dependences only packages actively mantained and with multiple contributors.

@sbromberger
Copy link
Owner

Conditional dependencies aren't working out since there's no official support for them in Julia, and they start erroring when you try to integrate with them (see #267 and the current issue with GraphMatrices and Compat/Base imports).

I could see moving the big sections out: centrality, persistence, flow, ... basically any place we have a subdirectory off of src.

Bottom line: there are 22 dependencies for LightGraphs, and that makes it too fragile, I think.

@sbromberger sbromberger added the discussion requires in-depth discussion - participation encouraged label Jan 18, 2016
@CarloLucibello
Copy link
Contributor

I could see moving the big sections out: centrality, persistence, flow, ... basically any place we have a subdirectory off of src.

Your are suggesting one repo for each subdirectory or just one for all of them?

@sbromberger
Copy link
Owner

One for each.

@CarloLucibello
Copy link
Contributor

Could this be a solution https://github.com/one-more-minute/Requires.jl?

@sbromberger
Copy link
Owner

@CarloLucibello sorry for the delayed response; I had forgotten you had asked the question.

Previous versions of LG used Requires.jl, but it doesn't really do what we need it to do. That is, it "worked", but in a sort of backwards way. I can't recall what the specific issue was, but after talking with the maintainer, it was clear that we were using it in a way that wasn't intended, so I stripped it back out.

git blame, particularly where the JuMP and GraphMatrices code is, should be instructive.

Edit 20160217: Actually, we still use Requires.jl. This is causing some issues, though. Conditional dependencies are still a bit of a kludge.

@jpfairbanks
Copy link
Contributor Author

In order to resolve #277 we need to figure out this issue.

I think there is a trend to remove core functionality from Base and make it a standalone package.

I agree with the idea that this is a good strategy because it decouples the release schedule from the julia core development.

The impact of this trend is that we might need to add to our dependencies just to keep the same functionality that was formerly in Base. This has already happened with Base.Combinatorics #229.

We can enter this into evidence that adding to our dependencies isn't necessarily bad. If we want a function that is already implemented, then we either need to import it or rewrite it.

@sbromberger
Copy link
Owner

I think you're right in general, but there's this Julian issue of cascading dependencies that tends to create cascading fragility. For example, Gadfly requires 16 non-Base packages. Each of those requires dependencies to the extent that if I depend on Gadfly, I have a dependence on 36 packages. This is admittedly an extreme example, but JuMP requires 11.

Where we depend on "leaf" packages there's no real issue. Where LG's dependencies are cascading, then there's a fragility introduced that I'd like to avoid. Right now LG depends on 22 packages, even though we've got only 9 non-base entries in REQUIRE. That's way too many, IMO.

Ideally there'd be some core of LightGraphs that would itself be a leaf package.

@jpfairbanks
Copy link
Contributor Author

So maybe we should count packages that depend on more than k nonbase packages as Major dependencies and focus on reducing those, for some k in 1--5.

This also has implications for ripping out the core data structure and splitting the package into LightGraphs and GraphAnalysis. What are the opinions on that?

@sbromberger
Copy link
Owner

To your first suggestion, ideally we'd reevaluate the dependencies that have large dependencies themselves. In descending order, I'm seeing

LightXML: 7
Clustering: 5
Combinatorics: 3
StatsBase: 3
ParserCombinator: 2
Docile: 1
GZip: 1

LightXML is only used (IIRC) for gexf file formats, and even then we only have the save method, and Clustering appears to only be used for community detection.

@jpfairbanks
Copy link
Contributor Author

There was a feature request for Spectral Clustering in the clustering.jl package JuliaStats/Clustering.jl#19.

We could write a general spectral partitioning alg like the following in Clustering.jl

function spectral_clustering(A::AbstractMatrix, k::Integer; nev::Int=0)
    evals, evecs, convstate = eigs(A; nev= nev==0? k+1: nev)
    clust = kmeans(evecs[:,2:end], zeros(d,k))
    return SpectralClusteringResult(convstate, clust)
end

Then to get the current functionality of our community detection one could write

using LightGraphs
using Clustering
function communities(g::Graph)
   A = nonbacktraing(g)
   result = Clustering.spectral(A)
end

g=Graph(...)
coms = communities(g)

Then neither LG or Clustering depend on each other.

@sbromberger
Copy link
Owner

Just as another datapoint: the recent dependency on JLD cascaded into 5 new packages required for LightGraphs. This is getting crazy.

@jpfairbanks
Copy link
Contributor Author

yeah I was a little surprised you accepted that PR

@jpfairbanks
Copy link
Contributor Author

Maybe we want to spin off the IO stuff into its own module then package which would reduce dependencies for XML, GZIP, JLD. Then the next task would be clustering as I described above. My concern is that having a set of package that all depend on each other we have to start worrying about syncing versions.

@sbromberger
Copy link
Owner

yeah I was a little surprised you accepted that PR

I must've been drunk. :)

My concern is that having a set of package that all depend on each other we have to start worrying about syncing versions.

I have the same concern, but I think that keeping things in sync within JuliaGraphs is going to be much easier than keeping things in sync between LightGraphs and 20+ other packages.

@jpfairbanks
Copy link
Contributor Author

Haha keepin' it 💯

Yeah that is probably true. Do you agree that I/O aka Persistence is the right place to start?

@sbromberger
Copy link
Owner

Do you agree that I/O aka Persistence is the right place to start?

It's a place to start, which is ahead of where we are now. If you want to create a new repo for the persistence stuff, please do.

Quick bikeshed on the name, and for future things that get pulled out: I propose "LG" + whatever function group we're pulling out, so this would be "LGPersistence.jl". I'm open to alternatives but I think we should

  1. mandate that all "official" LG extensions are kept within JuliaGraphs,
  2. the extensions have a common prefix, and
  3. we create a new group (initially with you and me in it) that will have access to the repos.

@sbromberger
Copy link
Owner

Also: have you guys tried to build / precompile LG lately? It takes (literally) 15 minutes on my machine, and fails the first time because GraphMatrices requires Compat but LG @requires GraphMatrices, which apparently doesn't resolve the chained dependencies.

@jpfairbanks
Copy link
Contributor Author

Precompiling only takes a few seconds on my MacBook.

@CarloLucibello
Copy link
Contributor

Aldo for me takes a few seconda on linux

@sbromberger
Copy link
Owner

Sorry, let me be a bit more clear: rm ~/.julia/lib/v0.4/*.ji (to remove all cached dependencies), then julia, then using LightGraphs.

@sbromberger
Copy link
Owner

OK, reviving the discussion because we've seen (thanks to @CarloLucibello ) that we actually CAN reduce dependencies when we want to 😄

What I propose as a starting point are some general rules:

  1. Dependencies on core Julia packages (things under JuliaLang) are ok.
  2. Dependencies on non-core leaf packages (0 subdependencies except for core Julia packages) are less ok, but still probably ok.
  3. Dependencies on non-core non-leaf packages require strict scrutiny.

Thoughts?

@sbromberger sbromberger changed the title Dependency Priorities Let's get rid of dependencies! (was: Dependency Priorities) Apr 28, 2016
@CarloLucibello
Copy link
Contributor

CarloLucibello commented Apr 28, 2016

This maybe goes against the very title of this issue, but as a consequence of these rules (which I totally support) we could merge #277, bringing along DataStructures wich is both in JuliaLang and a leaf.

Also there was some discussion on Collections being factorized out from Base, so maybe we will have to depend on DataStructures in any case in the near future

@jpfairbanks
Copy link
Contributor Author

These rules look good. As long as they don't become dogma. I think DataStructures.jl in particular has some really useful stuff in it. Lots of other packages should be using it, so most people will already have it installed.

@sbromberger
Copy link
Owner

sbromberger commented Apr 28, 2016

Well, what are rules but aspirational dogma? I think that we ought to have a very concrete way of evaluating the worth of changes that make LG more fragile. That is, rule 3 doesn't mean we'll never accept a PR that includes a non-core, non-leaf package, but it means that there had better be a very compelling reason we need to.

and @CarloLucibello - yes. Adoption of these rules means that #277 falls under rule 1, which means there should not be a significant barrier to merging. (This is the PR I used as the basis for that rule.)

sbromberger added a commit that referenced this issue May 4, 2016
Update developer guidelines per #278 (comment)
@sbromberger
Copy link
Owner

Closing given the documentation update in #357. We have a stated direction.

@jpfairbanks
Copy link
Contributor Author

@sbromberger

We have a stated direction

#progress

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
discussion requires in-depth discussion - participation encouraged
Projects
None yet
Development

No branches or pull requests

3 participants