This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Issue#28 #31

Merged
merged 3 commits into from
Mar 27, 2015

Conversation

jpfairbanks
Contributor

Is this a sufficiently squashed version of #29? I just concatenated the commit messages. If you want something else, send the git commands to make it happen.

jpfairbanks and others added 3 commits March 19, 2015 16:11
add docs for sparse_erdos_renyi

Revert "add type declaration to match previous functions"
This reverts commit 7f0cd59.

moving to Distributions from StatsBase
Merge pull request sbromberger#30 from JuliaGraphs/distributions
Merge branch 'master' of git://github.com/sbromberger/LightGraphs.jl into fasterdosrenyi

fix rst newline
use div instead of Int() to evaluate n choose 2

Merge branch 'fasterdosrenyi' of github.com:jpfairbanks/LightGraphs.jl into fasterdosrenyi
@coveralls

Coverage Status

Coverage increased (+0.12%) to 92.46% when pulling 2969c4d on jpfairbanks:issue#28 into 023d658 on JuliaGraphs:master.

sbromberger added a commit that referenced this pull request Mar 27, 2015
@sbromberger sbromberger merged commit 78bbed7 into sbromberger:master Mar 27, 2015
This was referenced Mar 27, 2015
@sbromberger
Owner

@jpfairbanks just thinking out loud - could we come up with a minimum value of p that would call for standard E-R (vs the sparse E-R)? If so, we could create a master function that would call the appropriate one transparently.

@jpfairbanks
Contributor Author

If p*n is not in o(n), then you might as well use dense. I think that using the sparse algorithm should be the default, because if you are sampling small enough to store the n^2 edges, you are small enough not to care about the performance penalty of the sparse case. But maybe if the rejection rate gets really high, the penalty could be very large.
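A dispatching wrapper of the kind discussed here could look roughly like this (a sketch only; erdos_renyi and sparse_erdos_renyi are the two generators from this PR, and the 0.1 cutoff is an arbitrary placeholder that benchmarking would have to pin down):

```julia
# Sketch: pick an implementation based on p. The cutoff is a
# placeholder, not a measured crossover point.
const SPARSE_CUTOFF = 0.1

function erdos_renyi_auto(n::Integer, p::Real)
    if p < SPARSE_CUTOFF
        return sparse_erdos_renyi(n, p)  # rejection sampling
    else
        return erdos_renyi(n, p)         # one coin flip per vertex pair
    end
end
```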


@sbromberger
Owner

I'm confused about this performance:

julia> @time a = erdos_renyi(10,0.4)
elapsed time: 2.0957e-5 seconds (9 kB allocated)
{10, 19} undirected graph

julia> @time a = erdos_renyi(100,0.4)
elapsed time: 0.000904757 seconds (884 kB allocated)
{100, 1986} undirected graph

julia> @time a = erdos_renyi(1000,0.4)
elapsed time: 0.131178128 seconds (84 MB allocated, 9.06% gc time in 4 pauses with 0 full sweep)
{1000, 199663} undirected graph

julia> @time a = erdos_renyi(10000,0.4)
elapsed time: 18.644758824 seconds (7943 MB allocated, 10.24% gc time in 319 pauses with 6 full sweep)
{10000, 19995368} undirected graph

julia> @time a = sparse_erdos_renyi(10,0.4)
elapsed time: 0.036506921 seconds (2 MB allocated)
{10, 19} undirected graph

julia> @time a = sparse_erdos_renyi(10,0.4)
elapsed time: 3.1323e-5 seconds (10 kB allocated)
{10, 18} undirected graph

julia> @time a = sparse_erdos_renyi(100,0.4)
elapsed time: 0.001067809 seconds (1024 kB allocated)
{100, 1961} undirected graph

julia> @time a = sparse_erdos_renyi(1000,0.4)
elapsed time: 0.167177456 seconds (106 MB allocated, 5.91% gc time in 5 pauses with 0 full sweep)
{1000, 200039} undirected graph

julia> @time a = sparse_erdos_renyi(10000,0.4)
elapsed time: 30.279688251 seconds (10890 MB allocated, 4.43% gc time in 453 pauses with 6 full sweep)
{10000, 19998701} undirected graph

I would've thought that with a p of 0.4 we'd be better off using sparse. Even with p=0.04, erdos_renyi() is faster:

julia> @time a = sparse_erdos_renyi(10000,0.04)
elapsed time: 2.355809486 seconds (1145 MB allocated, 8.81% gc time in 49 pauses with 2 full sweep)
{10000, 1998980} undirected graph

julia> @time a = erdos_renyi(10000,0.04)
elapsed time: 1.95749474 seconds (903 MB allocated, 21.43% gc time in 36 pauses with 2 full sweep)
{10000, 1997923} undirected graph

@jpfairbanks
Contributor Author

I like to think of the sparsity in terms of the average degree, which is n*p:
p*(n choose 2) = expectation(ne(g))
so if you grow n with constant p, the graph becomes more dense in terms of average degree.
The sparse case is where p*n = o(n), for instance p = 10/n or p = 2log(n)/n.

We should still see where the extra memory usage is coming from. Is it all in the push! on the incidence lists?
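The expected-edge-count relation above can be sanity-checked numerically (a minimal sketch using only Base Julia):

```julia
# Average degree is p*n; expected edge count is p * binomial(n, 2).
n = 1000
p = 40 / n                            # average degree 40
avg_degree = p * n                    # 40.0
expected_edges = p * binomial(n, 2)   # 0.04 * 499500 = 19980.0
```

This matches the benchmark output below, where n = 1000 and p = 40/n yield graphs with roughly 20,000 edges.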

@sbromberger
Owner

It's not so much the memory usage, but the comparative speed. Under what conditions will sparse be measurably faster?

@jpfairbanks
Contributor Author

julia> n = 1000; p = 40/n ; @time a = erdos_renyi(n,p)
elapsed time: 0.182780117 seconds (16172068 bytes allocated, 11.96% gc time)
{1000, 19919} undirected graph

julia> n = 1000; p = 40/n ; @time a = sparse_erdos_renyi(n,p)
elapsed time: 0.047841846 seconds (12153792 bytes allocated)
{1000, 19864} undirected graph

@sbromberger
Owner

I'm not seeing it:

julia> n = 1000; p = 40/n ; @time a = erdos_renyi(n,p)
elapsed time: 0.014021633 seconds (14 MB allocated, 16.19% gc time in 1 pauses with 0 full sweep)
{1000, 20146} undirected graph

julia> n = 1000; p = 40/n ; @time a = erdos_renyi(n,p)
elapsed time: 0.014753793 seconds (14 MB allocated, 15.99% gc time in 1 pauses with 0 full sweep)
{1000, 20033} undirected graph

julia> n = 1000; p = 40/n ; @time a = sparse_erdos_renyi(n,p)
elapsed time: 0.015578574 seconds (16 MB allocated)
{1000, 20094} undirected graph

julia> n = 1000; p = 40/n ; @time a = sparse_erdos_renyi(n,p)
elapsed time: 0.018151163 seconds (15 MB allocated, 14.64% gc time in 1 pauses with 0 full sweep)
{1000, 19766} undirected graph

@jpfairbanks
Contributor Author

I am on v0.4, and I modified the sparse implementation to print out the rejection and acceptance numbers, i.e. how many times has_edge returned true.

julia> n = 10000; for k in [2^i for i in [1,2,3,4,5,6,7]]; p = k/n ; print("$k\t"); @time a = sparse_erdos_renyi(n,p) end
2   rejected:   1   accepted:9903
elapsed time: 0.075272988 seconds (5890612 bytes allocated)
4   rejected:   4   accepted:19932
elapsed time: 0.014523272 seconds (9608252 bytes allocated)
8   rejected:   15  accepted:40421
elapsed time: 0.055326482 seconds (17789036 bytes allocated, 44.72% gc time)
16  rejected:   56  accepted:79441
elapsed time: 0.088910506 seconds (40110964 bytes allocated, 31.05% gc time)
32  rejected:   264 accepted:159680
elapsed time: 0.150687522 seconds (77842156 bytes allocated, 18.35% gc time)
64  rejected:   1049    accepted:319750
elapsed time: 0.322218162 seconds (160103476 bytes allocated, 17.70% gc time)
128 rejected:   4129    accepted:639030
elapsed time: 0.719800408 seconds (312232588 bytes allocated, 25.63% gc time)

julia> n = 10000; for k in [2^i for i in [1,2,3,4,5,6,7]]; p = k/n ; print("$k\t"); @time a = erdos_renyi(n,p) end
2   elapsed time: 0.173636768 seconds (4604088 bytes allocated)
4   elapsed time: 0.175601538 seconds (8055552 bytes allocated)
8   elapsed time: 0.182257545 seconds (14396040 bytes allocated)
16  elapsed time: 0.202327589 seconds (33890240 bytes allocated)
32  elapsed time: 0.274582328 seconds (65238184 bytes allocated, 12.65% gc time)
64  elapsed time: 0.39038162 seconds (135049480 bytes allocated, 14.73% gc time)
128 elapsed time: 0.629426463 seconds (263986184 bytes allocated, 19.37% gc time)

As you can see, when p is small, increases in p do not translate into increases in run time: the time is dominated by generating the n^2 random numbers. But once p gets large, the run time is dominated by inserting the edges into the data structure, and this causes the run times of sparse and dense to converge.
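The counting instrumentation described above might look roughly like the following (one plausible shape only; the branch's actual sparse_erdos_renyi draws the edge count from a Binomial via Distributions.jl, which this sketch simplifies to the expected count):

```julia
# Sketch: rejection-sample edges until the target count is reached,
# counting how often a candidate is rejected (self-loop or duplicate).
function sparse_erdos_renyi_counted(n::Integer, p::Real)
    g = Graph(n)
    target = round(Int, p * div(n * (n - 1), 2))
    rejected = 0
    accepted = 0
    while accepted < target
        src, dst = rand(1:n), rand(1:n)
        if src != dst && !has_edge(g, src, dst)
            add_edge!(g, src, dst)
            accepted += 1
        else
            rejected += 1
        end
    end
    println("rejected:\t$rejected\taccepted:$accepted")
    return g
end
```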

@sbromberger
Owner

There's a slight bug in the sparse implementation: you must also check that i != j (that is, that the source != the destination; we don't allow self-loops in Graphs). I can fix that. I'm rethinking a "nocheck" option to add_edge, but I'm not sure it's going to make things significantly faster. add_edge is slow because it needs to push onto vectors, I think.
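For what it's worth, a "nocheck" path would skip only the has_edge scan, not the pushes. A purely hypothetical sketch (the Vector{Vector{Int}} adjacency layout here is an assumption, not the actual package internals):

```julia
# Hypothetical fast path: the caller guarantees src != dst and that
# the edge is not already present, so no membership check is done.
function add_edge_nocheck!(adj::Vector{Vector{Int}}, src::Int, dst::Int)
    push!(adj[src], dst)
    push!(adj[dst], src)
    return adj
end
```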

@jpfairbanks
Contributor Author

I think the right way to do it is sprandbool(n, n, p) and then build the incidence list based on the CSC structure. That way you can avoid growing the vectors at all.
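Under that approach, the construction might be sketched like this (assumptions: sprandbool, rowvals, and nzrange as they existed around Julia 0.4; only the strict upper triangle is used, which also rules out self-loops):

```julia
# Sketch: sample the whole adjacency pattern in one shot as a sparse
# boolean CSC matrix, then walk its columns to add edges. Each stored
# entry (i, j) with i < j appears with probability p, so edges still
# follow the E-R model, and no per-edge rejection loop is needed.
function csc_erdos_renyi(n::Integer, p::Real)
    A = sprandbool(n, n, p)
    g = Graph(n)
    rows = rowvals(A)
    for j in 1:n
        for k in nzrange(A, j)
            i = rows[k]
            i < j && add_edge!(g, i, j)  # strict upper triangle only
        end
    end
    return g
end
```

Presizing the incidence vectors from the CSC column counts, as suggested above, would be a further step this sketch omits.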

@sbromberger
Owner

We already have that in the Graph constructor (check out Graph(nv, ne)). That is, create a graph with order nv and size ne, where the ne edges connect vertices i and j selected uniformly at random from 1:nv.

@jpfairbanks
Contributor Author

Do you mean this?

function Graph(nv::Integer, ne::Integer)
    g = Graph(nv)

    i = 1
    while i <= ne
        source = rand(1:nv)
        dest = rand(1:nv)
        e = (source, dest)
        if (source != dest) && !(has_edge(g,source,dest))
            i += 1
            add_edge!(g,source,dest)
        end
    end
    return g
end

This does not call sprandbool. This is basically equivalent to the sparse_erdos_renyi that I wrote. Time wasted by not reading...
I'll have a PR with a function to convert from SparseMatrixCSC to Graph.

@sbromberger
Owner

Yes - it's not meant to be equal to sprandbool but rather equivalent to the CSC matrix generation you proposed in #31 (comment) . I misunderstood your original code to be something different/more than this constructor but looking at it now it's pretty much equivalent as well.

@sbromberger
Owner

(To be fair, I did mention this in #27 (comment) :) )

@sbromberger
Owner

We have a matrix->graph function already. Check out function Graph{T<:Number}(adjmx::Array{T, 2})

@jpfairbanks
Contributor Author

Yea, I should read better. Do you want to have performance benchmarks built into the repo? Should they go in test/perf/filename.jl?

@sbromberger
Owner

Do you want to have performance benchmarks built into the repo?

That's a really interesting idea. How would they work given that folks are running on different hardware - or is the intent to run these prior to merging commits to ensure everything's good? (That sounds like a great idea, actually.)

@jpfairbanks
Contributor Author

Well, the first thing would be to just run them and have them write the timing information to STDOUT. Using JSON is good because it has the flexibility to expand the schema as you add performance tests. Later we can write a script to process all the JSON and determine whether there are performance regressions.
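A first cut might look like this (a sketch assuming the JSON.jl package; the benchmark names and schema are made up for illustration):

```julia
using JSON

# Sketch: time each generator once and print one JSON object per
# benchmark to STDOUT, so a later script can collect and diff runs.
function run_benchmarks(n::Int = 1000, p::Float64 = 0.04)
    benchmarks = [("erdos_renyi", () -> erdos_renyi(n, p)),
                  ("sparse_erdos_renyi", () -> sparse_erdos_renyi(n, p))]
    for (name, f) in benchmarks
        t = @elapsed f()
        println(JSON.json(Dict("name" => name, "n" => n, "p" => p,
                               "seconds" => t)))
    end
end
```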

@sbromberger
Owner

@jpfairbanks I still really like this idea of building benchmarks into the repo. I'll start an issue to track it.
