New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Dict code, and eliminate calls to Debug.crash #255

Closed
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@jvoigtlaender
Contributor

jvoigtlaender commented May 27, 2015

  • removing redundant bindings
  • inlining/simplifying case branches
  • removing an unused, and unexported, function
  • turn a potentially crashing auxiliary function into one that returns Maybe

Replaces https://github.com/elm-lang/core/pull/254.

jvoigtlaender added some commits May 27, 2015

remove redundant bindings
Note that `l` and `r` already have exactly those values due to the
pattern match of which this is a branch.
turn crash into maybe, and short-cut one case
Note that `l` in that branch is known to be `RBNode cl kl vl ll rl`.
@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz May 27, 2015

Member

This looks good to me!

When you say "potentially crashing" do you mean "given arbitrary inputs, it can crash" or do you mean "as it is used in this library, it can crash"?

If the intent is that it should never get to that case because of the invariants of a red-black tree (which are not captured in the types), is it best to have a silent failure/bug or to immediately say something to the user if we have broken those invariants?

Member

evancz commented May 27, 2015

This looks good to me!

When you say "potentially crashing" do you mean "given arbitrary inputs, it can crash" or do you mean "as it is used in this library, it can crash"?

If the intent is that it should never get to that case because of the invariants of a red-black tree (which are not captured in the types), is it best to have a silent failure/bug or to immediately say something to the user if we have broken those invariants?

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender May 27, 2015

Contributor

I meant that the function contained a call to crash, so with arbitrary inputs it could crash. But it is called only inside this library, and for the calls in the library an invariant holds that the crashing case cannot occur.

As long as the function remains internal, I think the current use with withDefault at the use site of max in the rem function is the best way to go. If the function were to become used more widespread, then probably changing its return type from Maybe to Result would be best. Then the Err case of the result could carry the current error message string. Even in such a setting, using the Result type would certainly be better than a call to crash?

(Maybe the best idea is to turn that Maybe into Result right away, and add a call to Result.toMaybe at the use site in my refactoring.)

Contributor

jvoigtlaender commented May 27, 2015

I meant that the function contained a call to crash, so with arbitrary inputs it could crash. But it is called only inside this library, and for the calls in the library an invariant holds that the crashing case cannot occur.

As long as the function remains internal, I think the current use with withDefault at the use site of max in the rem function is the best way to go. If the function were to become used more widespread, then probably changing its return type from Maybe to Result would be best. Then the Err case of the result could carry the current error message string. Even in such a setting, using the Result type would certainly be better than a call to crash?

(Maybe the best idea is to turn that Maybe into Result right away, and add a call to Result.toMaybe at the use site in my refactoring.)

jvoigtlaender added some commits May 27, 2015

preserve error string in code, using Result
Overall, this is semantically equivalent to the original code. No error
dropping is happening, since where the `withDefault` is now used it was
also previously guaranteed that the error case wouldn't have been
triggered.
@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Aug 14, 2015

Contributor

I merged the master branch into this PR to make sure it still builds. It does, and is still an improvement IMO.

Contributor

jvoigtlaender commented Aug 14, 2015

I merged the master branch into this PR to make sure it still builds. It does, and is still an improvement IMO.

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Aug 15, 2015

Contributor

Just a walkthrough of the changes in this pull request (since the comments above do not anymore reflect exactly what happened here overall):

  • The min function was deleted. It wasn't used in the module, nor exported.
  • The max function was changed from having type Dict k v -> (k, v) and throwing a Debug.crash in one case to having type Result String (k, v). The function is only used inside the module at present, and its use there is unaffected by this "change" (see below). If the function ever gets exported from the module, signalling error through Result String is certainly better than just crashing. (The consumer of the result might still decide to crash on receiving the Err value, of course.)
  • In the rem function, there was a pattern matching on (l, r) vs. (RBNode cl kl vl ll rl, RBNode cr kr vr lr rr) and later bindings of l and r to exactly RBNode cl kl vl ll rl and RBNode cr kr vr lr rr. Those bindings were clearly superfluous, one can as well pass the original l and r on to where they are further needed. So the bindings were deleted.
  • What remains was a call of max on the l from the previous bullet point. Since that l is known to be RBNode cl kl vl ll rl from the pattern match, this was actually a call max (RBNode cl kl vl ll rl). This, I replaced by withDefault (kl, vl) (toMaybe (max rl)). This is justified by the definition of max, which has the cases:
max (RBNode _ key value _ (RBEmpty _)) = Ok (key, value)  -- previously, simply (key, value)
max (RBNode _ _ _ _ right)             = max right
max (RBEmpty _)                        = Err "..."        -- previously, Debug.crash "..."

That clearly makes the above a semantics-preserving transformation. (Because max (RBNode cl kl vl ll rl) will first look at whether rl is an RBEmpty, in which case (kl,vl) are to be returned, just as will happen in the new call withDefault (kl, vl) (toMaybe (max rl)) in that case. If rl is not an RBEmpty, then both the original call and the new call will return what max rl returns -- which is known to never hit an RBEmpty if rl itself was not an RBEmpty.)

(Note: Just for the explanation above, I have abbreviated all RBEmpty_elm_builtin and RBNode_elm_builtin to RBEmpty and RBNode.)

Contributor

jvoigtlaender commented Aug 15, 2015

Just a walkthrough of the changes in this pull request (since the comments above do not anymore reflect exactly what happened here overall):

  • The min function was deleted. It wasn't used in the module, nor exported.
  • The max function was changed from having type Dict k v -> (k, v) and throwing a Debug.crash in one case to having type Result String (k, v). The function is only used inside the module at present, and its use there is unaffected by this "change" (see below). If the function ever gets exported from the module, signalling error through Result String is certainly better than just crashing. (The consumer of the result might still decide to crash on receiving the Err value, of course.)
  • In the rem function, there was a pattern matching on (l, r) vs. (RBNode cl kl vl ll rl, RBNode cr kr vr lr rr) and later bindings of l and r to exactly RBNode cl kl vl ll rl and RBNode cr kr vr lr rr. Those bindings were clearly superfluous, one can as well pass the original l and r on to where they are further needed. So the bindings were deleted.
  • What remains was a call of max on the l from the previous bullet point. Since that l is known to be RBNode cl kl vl ll rl from the pattern match, this was actually a call max (RBNode cl kl vl ll rl). This, I replaced by withDefault (kl, vl) (toMaybe (max rl)). This is justified by the definition of max, which has the cases:
max (RBNode _ key value _ (RBEmpty _)) = Ok (key, value)  -- previously, simply (key, value)
max (RBNode _ _ _ _ right)             = max right
max (RBEmpty _)                        = Err "..."        -- previously, Debug.crash "..."

That clearly makes the above a semantics-preserving transformation. (Because max (RBNode cl kl vl ll rl) will first look at whether rl is an RBEmpty, in which case (kl,vl) are to be returned, just as will happen in the new call withDefault (kl, vl) (toMaybe (max rl)) in that case. If rl is not an RBEmpty, then both the original call and the new call will return what max rl returns -- which is known to never hit an RBEmpty if rl itself was not an RBEmpty.)

(Note: Just for the explanation above, I have abbreviated all RBEmpty_elm_builtin and RBNode_elm_builtin to RBEmpty and RBNode.)

@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Aug 15, 2015

Member

Do you have any idea what this costs in performance? Perhaps it's not smart to be worrying about this right now, but I have a sense that we will be wanting to make Dict and Array as fast as possible, and it will sometimes be worth it to do weird stuff within the module itself. I guess my concern here is that we are maybe paying quite a lot for this change but not getting a ton back.

I need to get https://github.com/JoeyEremondi/elm-benchmark set up for Joey's ongoing compiler work, so maybe it'd be a good time to start setting up some benchmarks for data structures like this. I'd be curious to see how much we gain by removing a ton of closures from the compiled code. So my vote would be to do some exploration in this direction before going with something that could be quite a bit slower.

Member

evancz commented Aug 15, 2015

Do you have any idea what this costs in performance? Perhaps it's not smart to be worrying about this right now, but I have a sense that we will be wanting to make Dict and Array as fast as possible, and it will sometimes be worth it to do weird stuff within the module itself. I guess my concern here is that we are maybe paying quite a lot for this change but not getting a ton back.

I need to get https://github.com/JoeyEremondi/elm-benchmark set up for Joey's ongoing compiler work, so maybe it'd be a good time to start setting up some benchmarks for data structures like this. I'd be curious to see how much we gain by removing a ton of closures from the compiled code. So my vote would be to do some exploration in this direction before going with something that could be quite a bit slower.

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Aug 15, 2015

Contributor

Well, part of my change actually removes work: the superfluous reconstruction of trees l and r in those bindings I removed.

Other than that, you are concerned about the Ok constructor? I have no idea whether/what impact that may have on performance.

Contributor

jvoigtlaender commented Aug 15, 2015

Well, part of my change actually removes work: the superfluous reconstruction of trees l and r in those bindings I removed.

Other than that, you are concerned about the Ok constructor? I have no idea whether/what impact that may have on performance.

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Aug 15, 2015

Contributor

Closing this. May make a smaller pull request with just the obvious improvement (and leaving the elimination of min to dead code detection).

Contributor

jvoigtlaender commented Aug 15, 2015

Closing this. May make a smaller pull request with just the obvious improvement (and leaving the elimination of min to dead code detection).

@jvoigtlaender jvoigtlaender deleted the jvoigtlaender:dict-simpl branch Aug 15, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment