add set->hash #2766
@@ -128,6 +128,13 @@ respectively.

}

@defproc[(set->hash [st (or/c set? set-mutable? set-weak?)]) hash?]{

Converts a set to a hash table, if it is @tech{hash set}.
Wording: "... if it is a hash set"?
Also, it might be worth mentioning that, if the set is mutable, changes to it will be reflected in the returned hash and vice versa.
This seems to allow a user to make the set library violate its contracts and behave strangely. For example:
Would the performance still be okay if the hash is returned with a chaperone that forbids mutation? |
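A minimal sketch of what such a chaperone could look like, using `chaperone-hash` from `racket/base`. The name `read-only-view` is hypothetical and is not part of this PR; the sketch only shows that reads pass through while `hash-set!` and `hash-remove!` are rejected, and says nothing about the performance cost being asked about.

```racket
#lang racket/base

;; Hypothetical helper (not from the PR): wrap a mutable hash in a
;; chaperone that allows lookups and iteration but rejects mutation.
(define (read-only-view ht)
  (chaperone-hash
   ht
   ;; ref-proc: pass the key through and return the value unchanged
   (lambda (h k) (values k (lambda (h k v) v)))
   ;; set-proc: reject hash-set!
   (lambda (h k v) (error 'read-only-view "mutation is not allowed"))
   ;; remove-proc: reject hash-remove!
   (lambda (h k) (error 'read-only-view "mutation is not allowed"))
   ;; key-proc: pass keys through during iteration
   (lambda (h k) k)))

(define table (make-hash '((a . 1))))
(define view (read-only-view table))

(hash-ref view 'a)      ; reads go through the chaperone
(hash-set! table 'b 2)  ; the original table can still be mutated
(hash-ref view 'b)      ; and the view reflects that change
```

Note that the view still aliases the original table, so this guards against mutation through the returned hash only.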
Or can we define the hash set's membership predicate in terms of |
Doesn't this leak too much information about the internal implementation of |
@rmculpepper Is this acceptable? |
I feel it's ok because:
What do others think? |
@stchang There are a bunch of other places in the implementation that would need to be similarly changed, e.g., [*] Aside: the implementation of |
Good catch, thanks. I think I got them all now.
What did you have in mind? Is something like this any faster?
(and (= (hash-count table1) (hash-count table2))
     (for/and ([k1 (in-hash-keys table1)]
               [k2 (in-hash-keys table2)])
       (and (hash-has-key? table1 k2)
            (hash-has-key? table2 k1))))
I was thinking we'd do what the expander does. For immutable hashes, this will typically be a good deal faster, especially in the negative case. |
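For comparison, one way to check that two hash tables have the same keys without a per-key loop in Racket code is `hash-keys-subset?` from `racket/base`. Whether this is exactly the approach meant above is an assumption; `hash-keys=?` is a hypothetical helper name used only for this sketch.

```racket
#lang racket/base

;; Hypothetical helper: two tables have equal key sets when they have
;; the same number of keys and every key of the first is in the second.
;; hash-keys-subset? requires both tables to use the same key
;; comparison (e.g. both equal?-based) and raises otherwise.
(define (hash-keys=? table1 table2)
  (and (= (hash-count table1) (hash-count table2))
       (hash-keys-subset? table1 table2)))

(hash-keys=? (hash 1 'a 2 'b) (hash 2 'x 1 'y)) ; same keys, different values
(hash-keys=? (hash 1 'a) (hash 2 'a))           ; same count, different keys
```

The count check lets the negative case exit before any keys are compared.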
Ah, I didn't know about |
Not right now, but it could, in principle. |
Here's another example that violates the invariants of a custom set type:
Also, anyone iterating over the hash would need some way of extracting the actual key from the |
Ok I understand now. What if it only worked for immutable hash sets? Or, could it just be added as an unsafe function? |
I guess I understand, too, but it's not clear to me that this library respects its own abstractions:
#lang racket/base
(require racket/set)
(define-custom-set-types mumble
  #:elem? exact-integer?
  (lambda (a b) #f))
(define s1 (make-mutable-mumble))
(define s2 (make-mutable-mumble))
;; According to the comparison proc,
;; these 5s should be distinct.
(set-add! s1 5)
(set-add! s1 5)
(set-add! s2 5)
(set-add! s2 5)
(set-count s1)  ;; => 1
(set=? s1 s2)   ;; => #t
I think that Perhaps it's worth noting in the docs that it may not return a new fresh In the future (I'm a strong proponent of "Make an RFC for immutability and freshness in the standard library" racket/rhombus-prototype#22 and I think it's better to future-proof this function. Restricting it to immutable |
Good point. I guess my first point in #2766 (comment) is not really true then, since it doesn't create a new object. I worry that making a copy will negate any performance gain, though. |
What about |
Re It seems to me that Going back to the original issue, what's an example of something that can't currently be done efficiently? |
A concrete example is implementing sequences for my graph library, where graphs are implemented with I'm trying to improve iteration speed by implementing sequences that take advantage of the "clause transform" case in
An alternative to extracting the hash is to add |
I'll try to read the code more carefully later. Just two ideas:
|
Forgot some details. Here is the code for the new sequences I defined. To repeat my experiments, change the example to use the following
;; slow: with set-first/rest
(define-sequence-syntax in-weighted-graph-neighbors
  (λ () #'in-weighted-graph-neighbors*)
  (syntax-parser
    [[(id) (_ g-expr v-expr)]
     ;; with set-first/rest
     (for-clause-syntax-protect
      #'[(id)
         (:do-in
          ;; outer bindings
          ([(ht) (hash-ref (get-adjlist g-expr) v-expr)])
          ;; outer check
          #t ;; TODO: fix
          ;; loop bindings
          ([s ht])
          ;; pos check
          (not (set-empty? s))
          ;; inner bindings
          ([(id) (set-first s)])
          ;; preguard
          #t
          ;; post guard
          #t
          ;; loop args
          ((set-rest s)))])]
    [_ #f]))
;; fast: with set->hash
(define-sequence-syntax in-weighted-graph-neighbors
  (λ () #'in-weighted-graph-neighbors*)
  (syntax-parser
    [[(id) (_ g-expr v-expr)]
     ;; with set->hash
     (for-clause-syntax-protect
      #'[(id)
         (:do-in
          ;; outer bindings
          ([(ht) (set->hash (hash-ref (get-adjlist g-expr) v-expr))])
          ;; outer check
          #t ;; TODO: fix
          ;; loop bindings
          ([i (unsafe-immutable-hash-iterate-first ht)])
          ;; pos check
          i
          ;; inner bindings
          ([(id) (unsafe-immutable-hash-iterate-key ht i)])
          ;; preguard
          #t
          ;; post guard
          #t
          ;; loop args
          ((unsafe-immutable-hash-iterate-next ht i)))])]
    [_ #f]))
The timings do not change when using |
I haven't run this, but couldn't you write it as the following instead?
|
I'll try it but I didn't think |
I tried the |
Ah, it appears that The expansion of the following example suggests that sequence syntaxes can expand into other sequence syntaxes and get the fast path behavior:
|
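As a separate minimal sketch of that idea (not the example from the comment above), the clause transformer of a sequence syntax can return a clause that itself uses another sequence syntax, here `in-immutable-set` from `racket/set`. The names `in-neighbors`, `in-neighbors*`, and `adj` are hypothetical, and the adjacency structure is assumed to be an immutable hash mapping each vertex to an immutable set of neighbors.

```racket
#lang racket/base
(require (for-syntax racket/base)
         racket/set)

;; Fallback used when in-neighbors appears outside a for clause.
(define (in-neighbors* adj v)
  (in-immutable-set (hash-ref adj v)))

;; Inside a for clause, rewrite to an in-immutable-set clause so the
;; for expander can apply that form's own clause transformation.
(define-sequence-syntax in-neighbors
  (lambda () #'in-neighbors*)
  (lambda (stx)
    (syntax-case stx ()
      [[(id) (_ adj-expr v-expr)]
       #'[(id) (in-immutable-set (hash-ref adj-expr v-expr))]]
      [_ #f])))

;; Hypothetical adjacency table: vertex -> immutable set of neighbors.
(define adj (hash 'a (set 'b 'c) 'b (set 'a)))

(sort (for/list ([n (in-neighbors adj 'a)]) n) symbol<?) ; => '(b c)
```

This keeps the set abstraction intact, since no underlying hash table is exposed to the caller.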
I was just playing around with this, and @rmculpepper's version works and cuts the time on my machine from about 11.1s to about 6.3. So a big improvement. On the other hand, it is still about 3x what you get if you replace [Update:] |
Ok I tried it too and I'm seeing similar almost-2x speedup. Thanks for the suggestion Ryan. It's still 2-3x slower than just iterating on the underlying hash for some reason, which is strange because the code is similar. Only difference is the (identity) wrapper fns. Could that make such a big difference? I can't think of another way to extract the underlying hash that addresses the issues Ryan pointed out though, other than maybe putting it in |
In Bizarrely, starting with the defunctionalized version and changing |
You might get the same effect by declaring the |
A bit off-topic: it's not a big surprise that racket (classic) has better iteration performance than racketcs on immutable hashes. And racketcs does do better on constructing the graph. But when I tested a version of racketcs that uses a HAMT instead of the current Patricia trie implementation for immutable hashes, it did better on both construction and iteration. That was surprising. (The microbenchmark I've used in the past has consistently shown that the Patricia tries are so much faster at writes than HAMTs that it doesn't make sense to use the latter. But this seems like a better benchmark: Patricia version:
HAMT version:
I think I need to collect some more benchmarks. |
I just tried |
I see similar speedup but I'm confused why. Is there some kind of speculative execution going on? Is it normal for speculative execution to affect performance so much? |
It's unlikely that it's speculative execution; instead it's probably changing either inlining decisions or whether the compiler can prove that the function never errors/always returns 1 value/something else useful. |
I think this is unlikely to be merged in the current form, but I'm interested in (a) what this looks like on 7.7 Racket CS and (b) whether the improvements to the graph library actually got used. @stchang ? |
The k-cfa slowdown is a little worrying there. |
@samth This big difference is consistent with my earlier patricia vs. hamt benchmarks (where the hamts in question were not using stencil vectors). |
Adds a set->hash function that returns the underlying hash table of a hash set.
Currently, it's impossible to efficiently iterate over a hash set manually, i.e., in-mutable-set and friends cannot be used when defining a custom iteration via define-sequence-syntax for a data structure that uses sets.
There's set-first and set-rest, but those are an order of magnitude worse than the hash iteration functions. We could manually add unsafe-immutable-set-first, etc., analogous to unsafe-immutable-hash-iterate-first, but this is easier.
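To make the two iteration styles in the description concrete, here is a small sketch that sums the elements of a set with `set-first`/`set-rest` and the keys of a hash table with the position-based `hash-iterate-*` functions. It uses the safe functions from `racket/base` rather than the unsafe variants, the helper names are hypothetical, and it makes no claim about relative timings.

```racket
#lang racket/base
(require racket/set)

;; Manual iteration over a set: peel off one element at a time.
(define (sum-set s)
  (let loop ([s s] [acc 0])
    (if (set-empty? s)
        acc
        (loop (set-rest s) (+ acc (set-first s))))))

;; Manual iteration over hash keys: advance an iteration position
;; until hash-iterate-next returns #f.
(define (sum-hash-keys ht)
  (let loop ([i (hash-iterate-first ht)] [acc 0])
    (if i
        (loop (hash-iterate-next ht i) (+ acc (hash-iterate-key ht i)))
        acc)))

(sum-set (set 1 2 3))                ; => 6
(sum-hash-keys (hash 1 #t 2 #t 3 #t)) ; => 6
```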