Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph,graph/{encoding/dot,internal/set,topo,traverse}: remove intsets imports #68

Merged
merged 2 commits into from Jun 16, 2017

Conversation

kortschak
Copy link
Member

This is a first step towards allowing int64 ID values and addresses the simple-to-handle cases.

Another PR for the more hairy community package will follow when I figure out a good approach; the problem is that intsets.Sparse iterates via a stateful method, which allows a nice method-based approach to handle subset iteration, versus complete set iteration without filling a complete set for the complete set case. set.Ints cannot do this because of limitations with range. Suggestions welcomed.

@vladimir-ch @mewmew Please take a look.

@kortschak
Copy link
Member Author

It turns out even changing the less hairy community package cases (non-multiplex) is problematic as the iteration order impacts on the algorithm (recall that it is a randomised algorithm so this might be expected). This suggests at least an interim approach. That commit will come soon.

@kortschak
Copy link
Member Author

The approach to maintaining an iterator required for the multiplex cases is handled by a minTaker now in louvain_common.go. This also works in a cut down manner for the non-multiplex case.

The impact on performance is not terrible:

$ benchstat old.bench nosort.bench
name                        old time/op  new time/op  delta
LouvainDirectedMultiplex-8  27.6ms ± 2%  29.4ms ± 2%  +6.58%  (p=0.000 n=10+10)
LouvainDirected-8           24.2ms ± 3%  24.7ms ± 2%  +2.07%  (p=0.022 n=10+9)
LouvainMultiplex-8          21.1ms ± 2%  22.3ms ± 4%  +5.79%  (p=0.000 n=10+10)
Louvain-8                   18.9ms ± 2%  19.3ms ± 2%  +2.33%  (p=0.000 n=10+10)

except for the fact that without consistent sort order of the nodes (which intsets.Sparse provided) we do not pass the tests. I think that it is worth being able to control the source of randomness in this package for reproducibility. So the IDs are sorted giving a worse performance (the code that is committed here).

$ benchstat old.bench new.bench
name                        old time/op  new time/op  delta
LouvainDirectedMultiplex-8  27.6ms ± 2%  30.8ms ± 3%  +11.60%  (p=0.000 n=10+10)
LouvainDirected-8           24.2ms ± 3%  26.5ms ± 3%   +9.50%  (p=0.000 n=10+10)
LouvainMultiplex-8          21.1ms ± 2%  24.1ms ± 4%  +14.43%  (p=0.000 n=10+10)
Louvain-8                   18.9ms ± 2%  20.7ms ± 3%   +9.74%  (p=0.000 n=10+10)

This can be fixed later.

@@ -306,6 +308,36 @@ func (d *dense) TakeMin(p *int) bool {
return true
}

// slice is a sparse integer set iterator.
type slice struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type reads as community.slice which does not communicate its use clearly. Perhaps change from slice to intSetSlice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this. It's a minTaker that is a slice.

@mewmew
Copy link
Member

mewmew commented Jun 16, 2017

From what I can tell, the change looks good to me.

When reviewing the change, and especially as you commented on the performance impact of changing from intsets.Sparse to map[int]struct{}, I feel curious to ask if it would make sense to fork https://golang.org/x/tools/container/intsets and maintain a int64 version of it?

That could make the chance less intrusive and still maintain a good performance. What are your thoughts?

@kortschak
Copy link
Member Author

I had a look at the code for intsets and it's not trivial to make it a int64-handling package, and that adds significant maintenance burden.

@kortschak
Copy link
Member Author

What you will notice though is that in the bigids branch, the parts of community that handle this deal with ids that are in int, so after that has gone it, I can revert the parts of this PR this have caused the slow down. It's a bit of a rigmarole, but it the change involved in some of the routines from int -> int64 are not as simple as just a mechanical change. This is one of those.

@kortschak kortschak merged commit 569c4e3 into master Jun 16, 2017
@kortschak kortschak deleted the intsets branch June 16, 2017 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants