Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add equal-always? #4076

Closed
wants to merge 23 commits into from
Closed

Conversation

AlexKnauth
Copy link
Member

@AlexKnauth AlexKnauth commented Dec 2, 2021

Add equal-always?, defining it such that (equal-always? a b) is true iff there exists a value v such that both a and b are chaperones of v.

It uses extensional equality for immutable data structures, and intensional equality for mutable data structures.

I still need to do several things before it would be ready, including:

  • Make sure the changes to chaperone-of? on mutable data are good, document them.
  • Add hash-code procedures corresponding to equal-always?.
  • Add hash-table data structures that use this to compare keys.
  • Add set data structures corresponding to those hash-tables.
  • For list operations that already have a version for each equality predicate, add new versions for corresponding to equal-always?
  • Add them to Racket BC as well, not just Racket CS.
  • Add documentation for equal-always?, equal-always?/recur, and the associated hash-codes, hash-tables, sets, etc.
  • Add prop:equal-always+hash
  • Add tests for the things I haven't tested yet.
  • Add docs for prop:equal-always+hash
  • In the docs for prop:equal+hash and gen:equal+hash, document what it means for an implementation to be "honest" and work properly with chaperone-of? and equal-always?, vs "dishonest".
  • Fix merge conflicts.
  • Run tests on both CS and BC versions, get it to pass on all architectures including ARM64.

@AlexKnauth
Copy link
Member Author

Why does that one stress test in pkgs/racket-test/tests/future/future.rkt have to print 10000 of the same line over and over?

@AlexKnauth
Copy link
Member Author

AlexKnauth commented Dec 13, 2021

The test that's hanging is pkgs/racket-test/tests/future/shutdown-stress.rkt.
The test was added in this commit and hasn't changed since then so far: e4c6a25
The commit message just says Relevant to #2725.
That explains why it's testing the interaction between places and futures, and also explains why that test failing might look like it hanging rather than visibly failing soon.

It should be starting 30 places, each with a thread that spawns a future which may or may not loop, and each place may or may not custodian-shutdown-all.
It should be printing ok (current-seconds) before starting each place, and on a previous test-run all 30 of those happened within 8 seconds.
But on the run that failed, it didn't print ok (current-seconds) even once, which... I think it means it didn't even start any of the places at all? Which is very strange. WHAT IS IT EVEN DOING THEN?!

or am I wrong about how printing works?
I might just be wrong about how printing works. Maybe it put the first printout in a buffer or something that wasn't flushed to actual output right away, and the first place just hanged and took up resources or something so that that buffer was never flushed to actual output?

@AlexKnauth
Copy link
Member Author

Side note: there's another branch on my fork called equal-always-bc, to test just the functionality I have on Racket BC so far.

@AlexKnauth
Copy link
Member Author

I labeled the commit broken because it fails on my machine (which I believe is based on an ARM64 architecture), but:

It seems to be passing the CI for everything except building Racket BC on ARM64... but on ARM64 the error seems nasty Aborted (core dumped) and on my machine building Racket BC fails similarly with a nasty Segmentation fault: 11. It's kind of ironic that I got it to build on everything except the architecture that my own machine is based on.

@rocketnia
Copy link
Contributor

rocketnia commented Dec 17, 2021

On the topic of how to express custom equal-always? behavior, I think there should be a gen:equal-always+hash and a prop:equal-always+hash that mostly line up with their equal? counterparts.

I think some common cases should be made more succinct:

  • #:property prop:equal-always+hash 'equal?/recur to delegate to the equal? behavior (but go right back to equal-always? on the next level of recursion).
  • #:property prop:equal-always+hash 'eq? to delegate to the eq? behavior.
  • #:property prop:equal-always+hash 'auto to get the same behavior as omitting the property.

It sounds like the default behavior of chaperone-of? and equal-always? is either the 'equal?/recur case or the 'eq? case, depending on whether any fields of the structure are mutable.

Structure type property guards can check whether the structure type's info has immutable fields and determine different property values based on that, so even if 'auto weren't offered as an innate feature of prop:equal-always+hash, users could define their own derived property that interpreted 'auto in terms of the structure type information itself.

Hmm, equal? actually cares whether the two values had prop:equal+hash customized on the same struct type ancestor. I'm not entirely sure of tbe purpose of that, but to keep things from getting confusing, I think equal-always? should do the same thing: checking that prop:equal-always+hash was customized on the same struct type ancestor.

I doubt the equal-always? same-ancestor check should interact with the equal? same-ancestor check. I think #:property prop:equal-always+hash 'equal?/recur should just be a shorthand for a fully spelled out implementation in terms of equal?/recur and the corresponding hash functions. If a subtype customizes equal?, I think a supertype's #:property prop:equal-always+hash 'equal?/recur should end up delegating to the subtype's behavior.

@rocketnia
Copy link
Contributor

rocketnia commented Dec 17, 2021

Hmm, found a potential issue in how I specified 'auto: What if a supertype uses #:property prop:equal-always+hash 'auto, which initially means the same thing as #:property prop:equal-always+hash 'equal?/recur, but a subtype introduces mutable fields?

Interpreting 'auto in the property's guard logic doesn't seem sufficient to simulate the property being omitted.

Perhaps the right way to resolve this is to say that every structure type with mutable fields automatically behaves as though it has #:property prop:equal-always+hash 'eq? unless otherwise specified. This unceremoniously overrides any customized behavior on the parent type, but for a pretty understandable reason, and it allows explicit 'auto to be consistent with omitting the property.

@rocketnia
Copy link
Contributor

I've been thinking about circumstances where one operation would dispatch to the other, and now I regret what I was just proposing.

If a type doesn't specify equal? behavior, I think it makes sense for equal? to fall back to the pickier equal-always?, just like it currently falls back to returning #f.

But in the world we're in now, where types customize equal? but not equal-always?, chaperone-of? already defaults to checking equal?.

This aspect of chaperone-of seems conceptually broken. If there's a user-defined struct out there somewhere that has only immutable fields, but it implements its equal? to do an "equal now" check of the fields' mutable contents, chaperone-of? is already an "equal now" operation, right? And if it's "equal now" in that case, then why does it bother checking for known mutable types or for mutable struct fields?

If "equal always" needs to do "equal now" sometimes for legacy reasons, and if it's natural for "equal now" to treat "equal always" as a fallback, then perhaps they should be one combined interface rather than dispatching back and forth. So I'm basically at square one. I'm sure I'll eventually have another set of opinions and another suggested design that takes this into account, but I should probably not get carried away until some other people have expressed the way they think about this.

@AlexKnauth
Copy link
Member Author

@rocketnia

This aspect of chaperone-of seems conceptually broken. If there's a user-defined struct out there somewhere that has only immutable fields, but it implements its equal? to do an "equal now" check of the fields' mutable contents, chaperone-of? is already an "equal now" operation, right? And if it's "equal now" in that case, then why does it bother checking for known mutable types or for mutable struct fields?

Yes that's a problem, a struct that (1) acts as a wrapper around a mutable data structure, (2) explicitly implements equal+hash with a layer of equal-now looking into that mutable data structure, and (3) has all fields immutable, does trick chaperone-of?, and with this PR it tricks equal-always? to match chaperone-of?. Here's an example:

(struct bxwrp (box)
  #:methods gen:equal+hash
  [;; implement one layer of equal-now explicitly:
   (define (equal-proc self other =?)
     (=? (unbox (bxwrp-box self)) (unbox (bxwrp-box other))))
   (define (hash-proc self rec) (rec (unbox (bxwrp-box self))))
   (define (hash2-proc self rec) (rec (unbox (bxwrp-box self))))])

(chaperone-of? (bxwrp (box 1)) (bxwrp (box 1))) ;=> #true but ideal should be #false
(equal-always? (bxwrp (box 1)) (bxwrp (box 1))) ;=> #true but ideal should be #false

Because of this I'm at least glad that by default chaperone-of? and equal-always? don't delegate to prop:equal+hash when the struct is mutable, because adding #:mutable to the struct is the easiest fix on the user's end as this struct is "conceptually" mutable:

(struct bxwrp (box) #:mutable
  #:methods gen:equal+hash
  [;; implement one layer of equal-now explicitly:
   (define (equal-proc self other =?)
     (=? (unbox (bxwrp-box self)) (unbox (bxwrp-box other))))
   (define (hash-proc self rec) (rec (unbox (bxwrp-box self))))
   (define (hash2-proc self rec) (rec (unbox (bxwrp-box self))))])

(chaperone-of? (bxwrp (box 1)) (bxwrp (box 1))) ;=> #false just like we want
(equal-always? (bxwrp (box 1)) (bxwrp (box 1))) ;=> #false just like we want

With your original suggestion the user could alternatively fix it by adding #:property prop:equal-always+hash 'eq?, but... well adding #:mutable is shorter and easier to understand.

However, prop:equal-always+hash 'equal?/recur could be useful on a "conceptually" immutable struct with a mutable field for a cache or something... promises and streams are two examples of this. A lazy data structure with a mutable cache field and a custom equal+hash implementation to force sub-parts by need to compare them, should ideally be able to specify that it's safe for chaperone-of? and equal-always? to delegate to that equal+hash implementation, and prop:equal-always+hash 'equal?/recur seems like a good way to do that.

@rocketnia
Copy link
Contributor

However, prop:equal-always+hash 'equal?/recur could be useful on a "conceptually" immutable struct with a mutable field for a cache or something... promises and streams are two examples of this. A lazy data structure with a mutable cache field and a custom equal+hash implementation to force sub-parts by need to compare them, should ideally be able to specify that it's safe for chaperone-of? and equal-always? to delegate to that equal+hash implementation, and prop:equal-always+hash 'equal?/recur seems like a good way to do that.

My opinions are still recovering from their self-defeat :) but I want to point out that this kind of data structure seems like it could just as well be implemented so that prop:equal+hash is just for forwarding (if it's present at all) and prop:equal-always+hash is where the real implementation is.

It seems like the differences only arise with subtyping, and the only time I've particularly needed struct subtyping is to let two structures with different sets of interfaces or different sets of fields be equal? to each other, so it's pretty far into "what if" territory for me. Do you know of some other practical uses for struct subtyping, particularly ones where there's basically no other way to accomplish something?

@AlexKnauth AlexKnauth force-pushed the equal-always branch 2 times, most recently from 7e126bd to d193932 Compare December 19, 2021 22:04
@AlexKnauth
Copy link
Member Author

I've added a test for the "honest" variant of bxwrp where it's declared #:mutable, but I while only added a comment about the "dishonest" variant for now, I plan to add a paragraph about that in the docs for prop:equal+hash and gen:equal+hash once I get around to adding docs.

@AlexKnauth
Copy link
Member Author

AlexKnauth commented Jan 15, 2022

The CI is failing on CI Win / buildtest-win (CS) at stage Run tests/racket/test in Run call .\racket\src\worksp\msvcprep.bat x86_amd64, with the error message:

[vcvarsall.bat] Environment initialized for: 'x86_x64'
error writing to stream port
  system error: Invalid access to memory location.; win_err=998
  context...:
   C:\Users\runneradmin\AppData\Roaming\Racket\8.4.0.3\pkgs\compiler-lib\compiler\commands\test.rkt:520:8
   C:\Users\runneradmin\AppData\Roaming\Racket\8.4.0.3\pkgs\compiler-lib\compiler\commands\test.rkt:515:3
   C:\Users\runneradmin\AppData\Roaming\Racket\8.4.0.3\pkgs\compiler-lib\compiler\commands\test.rkt:395:27
   C:\Users\runneradmin\AppData\Roaming\Racket\8.4.0.3\pkgs\compiler-lib\compiler\commands\test.rkt:397:0: call-with-summary
   [repeats 1 more time]
   .../private/map.rkt:40:19: loop
   C:\Users\runneradmin\AppData\Roaming\Racket\8.4.0.3\pkgs\compiler-lib\compiler\commands\test.rkt:1128:1: test
   body of "C:\Users\runneradmin\AppData\Roaming\Racket\8.4.0.3\pkgs\compiler-lib\compiler\commands\test.rkt"
   D:\a\racket\racket\racket\collects\raco\raco.rkt:41:0
   body of "D:\a\racket\racket\racket\collects\raco\raco.rkt"
   body of "D:\a\racket\racket\racket\collects\raco\main.rkt"
Error: Process completed with exit code 1.

This diverges from what a passing run should look like here:

[vcvarsall.bat] Environment initialized for: 'x86_x64'
raco test: "C:\\Users\\runneradmin\\AppData\\Roaming\\Racket\\8.4.0.2\\pkgs\\racket-test-core\\tests\\racket\\test.rkt"
raco test: @(test-responsible '(eli jay matthias mflatt robby ryanc samth))
Section(basic)
...

What does this mean?

Update:

The part of the error message error writing to stream port looks like it might come from file racket/src/io/port/fd-port.rkt, class fd-output-port, public method raise-write-error, line 180, in a call to raise-filesystem-error with the method's input n as another argument presumably containing the other part of the error message system error: Invalid access to memory location.; win_err=998.

In that same class fd-output-port, the method raise-write-error is used in two places: (1) in the private method flush-buffer and (2) in the override method write-out. I'm going to take a guess that the actual error happened in (2). In the body of the write-out method, in the else case of the outer cond there's a call to rktio_write_in: if that call produces a rktio-error? value, that would cause raise-write-error to be called.

The function rktio_write_in is defined in racket/src/rktio/rktio_fd.c, line 1058, where it seems to just delegate to rktio_write which is defined in the same file, line 1359. Of 4 places RKTIO_WRITE_ERROR shows up in this file, all are within this function body but only the last 2 seem relevant under #ifdef RKTIO_SYSTEM_WINDOWS: they are (1) return RKTIO_WRITE_ERROR; on line 1482, and (2) return RKTIO_WRITE_ERROR; on line 1739.

I can take a guess that it might have happened in (1) under if (!ok), where ok is set to the result of calling either WriteFile or WriteConsoleW, depending on the result of rktio_fd_is_terminal, and with that guess the 998 might come from get_windows_error(); which calls rktio_get_windows_error(rktio) which calls GetLastError().

I still don't know where the other part of the error message system error: Invalid access to memory location.; win_err=998 comes from or what might be causing it.

@mflatt
Copy link
Member

mflatt commented Jan 16, 2022

I'll take a look.

@mflatt
Copy link
Member

mflatt commented Jan 16, 2022

I haven't been able to replicate the crash seen for Windows in CI. I'll keep investigating, but I think it makes sense to assume for now that it's unrelated to this PR's changes.

@mflatt
Copy link
Member

mflatt commented Jan 16, 2022

Overall, the implementation here seems in very good shape. The code changes look right to the best of my ability to read the. The changes cover almost everything I could think of, and many things that I didn't immediately think of.

The only additional change I thought of: updating ad hoc optimizations in "optimize.c" and schemify, especially ones that convert equal? to eq? or eqv?. That could be done later after merging the base changes, though.

Maybe the only implementation piece to fill in is some prop:equal+hash adjustment or analog.

@AlexKnauth
Copy link
Member Author

Maybe the only implementation piece to fill in is some prop:equal+hash adjustment or analog.

@rocketnia and I were discussing this in the comments above starting with Nia's comment "On the topic of how to express custom equal-always? behavior."

In particular, someone writing a struct that uses mutation and wants #:property prop:equal-always+hash 'eq?, can currently just add #:mutable to get the desired behavior, and someone writing an immutable struct can just omit it and let prop:equal+hash handle it.

There is one case where a struct is "technically" mutable but "conceptually" immutable, and that's with Lazy data structures. I don't know of any existing Racket lazy data structures that implement prop:equal+hash, but if they exist or if they're added in the future, they would be the only ones I can think of that would benefit from using a prop:equal-always+hash.

@mflatt
Copy link
Member

mflatt commented Jan 16, 2022

There is one case where a struct is "technically" mutable but "conceptually" immutable, and that's with Lazy data structures. I don't know of any existing Racket lazy data structures that implement prop:equal+hash, but if they exist or if they're added in the future, they would be the only ones I can think of that would benefit from using a prop:equal-always+hash.

My understanding from the Rhombus meeting is that we're already pretty sure that we'll want that, so it seems like we should sort it out now rather than later, just in case doing so reveals some issue.

@rocketnia
Copy link
Contributor

If "equal always" needs to do "equal now" sometimes for legacy reasons, and if it's natural for "equal now" to treat "equal always" as a fallback, then perhaps they should be one combined interface rather than dispatching back and forth. So I'm basically at square one.

The idea of "equal always" having to dispatch to "equal now" for legacy reasons had me unmotivated about this for a while, but I think I've come up with something.

We don't have to think of this as a flaw in the equal-always? function; we can think of it as a flaw in the equal-proc method (and its related hashing methods). When a user-defined type implements equality by way of equal-proc, its equal-always? behavior defaults to calling the equal? behavior. But a new method could be added that fulfilled the same purpose as equal-proc but allowed the defaults to flow better. Or if preferred (e.g., to avoid a design where two method defaults form an infinite loop with each other), a new interface could be defined that had better defaults but still populated the same underlying methods using its #:derive-property functionality.

I think one of the overriding concerns for the design of a new method (or a whole new interface) is going to be performance, especially making sure the interfaces aren't doing a lot of unnecessary dynamic dispatch to each other in normal use. For performance's sake, I suspect a good design for the new equal-proc variant would be one that took an argument saying which kind of checking to do. But I can't say I know where the bottlenecks are on this.

@AlexKnauth
Copy link
Member Author

Oh I hadn't realized define-for-variants has now changed. Gotta add an extra #f at the end now I guess.

@AlexKnauth AlexKnauth force-pushed the equal-always branch 2 times, most recently from 522cf50 to 9c94eb6 Compare January 17, 2022 16:58
@rocketnia
Copy link
Contributor

rocketnia commented Mar 8, 2022

I recently started suggesting on the Rhombus equality thread that for more conventional arithmetic semantics, Rhombus's = would benefit from being an extensible form of Racket's numeric =, complete with equating certain values of meaningfully different precision and allowing certain values' comparisons to be false if not otherwise defined (like the way comparisons on NaN work). I also think this could coincide in with a useful kind of equality for reasoning about referential transparency. For the resulting notion of purity to be reasonably intuitive, I expect the most commonly defined behavior of = will coincide with equal-always?.

That got me back to thinking about this equal-always? situation that I found discouraging above:

[@rocketnia] This aspect of chaperone-of seems conceptually broken. If there's a user-defined struct out there somewhere that has only immutable fields, but it implements its equal? to do an "equal now" check of the fields' mutable contents, chaperone-of? is already an "equal now" operation, right? And if it's "equal now" in that case, then why does it bother checking for known mutable types or for mutable struct fields?

[@AlexKnauth]

Yes that's a problem, a struct that (1) acts as a wrapper around a mutable data structure, (2) explicitly implements equal+hash with a layer of equal-now looking into that mutable data structure, and (3) has all fields immutable, does trick chaperone-of?, and with this PR it tricks equal-always? to match chaperone-of?. Here's an example:

(struct bxwrp (box)
  #:methods gen:equal+hash
  [;; implement one layer of equal-now explicitly:
   (define (equal-proc self other =?)
     (=? (unbox (bxwrp-box self)) (unbox (bxwrp-box other))))
   (define (hash-proc self rec) (rec (unbox (bxwrp-box self))))
   (define (hash2-proc self rec) (rec (unbox (bxwrp-box self))))])

(chaperone-of? (bxwrp (box 1)) (bxwrp (box 1))) ;=> #true but ideal should be #false
(equal-always? (bxwrp (box 1)) (bxwrp (box 1))) ;=> #true but ideal should be #false

Because of this I'm at least glad that by default chaperone-of? and equal-always? don't delegate to prop:equal+hash when the struct is mutable, because adding #:mutable to the struct is the easiest fix on the user's end as this struct is "conceptually" mutable[...]

What if more than one bxwrp wraps the same box? Then those wrappers are indeed always going to be equal, despite the fact that they're not eq?. Having equal-always? behave like eq? for this type by tacking on #:mutable is one possible approach, but perhaps not the one the type author really prefers to go for.

It could be reasonable to add documentation to gen:equal+hash telling people that if they implement it for a structure type that doesn't have any mutable fields, and if they don't specialize equal-always? themselves, the chaperone-of? and equal-always? implementations will end up going through same path as equal?, and they should ideally define this path in an equal-always? way.

That documentation change would be something of a break in compatibility by telling users a new requirement they hadn't heard of before. However, the real introduction of this new requirement happened whenever chaperone-of?'s implementation started assuming it, and at this point it's hard to imagine a practical use case that's lost by making the requirement more explicit.

The issue I was thinking of at the time is actually slightly different than this bxwrp example. When I talked about a type that "implements its equal? to do an "equal now" check of the fields' mutable contents," I was imagining there must be a lot of code that does an "equal now" check not because it goes to the effort to do so, but because "equal now" is the single most obvious kind of deep check that currently exists. I was imagining an example like this:

(struct bxwrp3 (box)
  #:methods gen:equal+hash
  [(define (equal-proc self other =?)
     (equal? (bxwrp3-box self) (bxwrp3-box other)))
   (define (hash-proc self rec) (rec (bxwrp3-box self)))
   (define (hash2-proc self rec) (rec (bxwrp3-box self)))])

(chaperone-of? (bxwrp3 (box 1)) (bxwrp3 (box 1))) ;=> #true

Fortunately, this code can be fixed by calling the provided =? instead of equal?, and this is something people already tend to need to do for various reasons (to break cycles and to work properly with chaperone-of? and impersonator-of?). So this issue actually is much less of an issue than I thought.

In summary, I think my concerns with equal-always? are fading away. It turns out the main concern I had basically wasn't an issue, because the kind of legacy code I was imagining (bxwrp3) is already pretty well understood to be buggy and has a simple fix. Your example (bxwrp) is a more concerning instance of the genre, but that code also exhibits bugs already, just perhaps in a less well known way. It seems like describing this hazard in documentation is one way to make the situation better than before.

Helping people avoid the hazard is good too. I started to talk about that above with "a new method [or interface] could be added that [...] allowed the defaults to flow better," and in early February, we had a bunch of discussion about the details of this in the Rhombus #equality channel on the Racket Discord.

I don't recall exactly where we ended up with that. Maybe some of the details might still depend on how equality and ordered comparison would work in Rhombus?

@AlexKnauth
Copy link
Member Author

Closing in favor of #4236

@AlexKnauth AlexKnauth closed this May 21, 2022
@racket-discourse-github-bot

This pull request has been mentioned on Racket Discussions. There might be relevant details there:

https://racket.discourse.group/t/what-are-some-benefits-that-rhombus-development-has-brought-to-racket/2054/5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants