Skip to content
This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

fix eq? for strings #98

Merged
merged 3 commits into from
Nov 20, 2019
Merged

fix eq? for strings #98

merged 3 commits into from
Nov 20, 2019

Conversation

jitwit
Copy link
Contributor

@jitwit jitwit commented Nov 13, 2019

I think this fixes the reference/value equality issue for strings (#97).

  • added%string=? to rt.mjs which uses String.valueOf() when checking for equality.
  • %string=? is used by string=? and has been added to runtime-imports and effect-free-callee?

Would it be better to write string=? in pure scheme? I was also wondering about how functions like string-length and string-ref should be added.

rt/rt.mjs Outdated
@@ -124,15 +126,16 @@ function rt(engine) {
// that's not specified on the Scheme language level. Using JS
// strings for symbols allows us to compare with ===, as is
// required.
'string?': x => typeof(x) === 'string' && x.charAt(0) === 's',
'string?': x => x instanceof String && x.charAt(0) === 's',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's getting into subtleties of JavaScript that I'm not as familiar with. What's the difference here between typeof(x) === 'string' and x instanceof String?

Copy link
Contributor Author

@jitwit jitwit Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand typeof returns a string indicating the type of a primitive value or returns'object' for everything else. In this case,instanceof looks for String.prototype in the prototype chain of x.

To hopefully get the right behavior with eq? I modified the functions that construct strings to wrap with new String(s) (%list->string and %symbol->string). Now, strings and gensyms are string objects, while symbols are still primitive string values.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure the typeof check is cheaper. Would be nice to preserve speed for string literals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran 9 runs of benchmark-compiler.sh on the current master and this change, and averaged the results for the stage3 compile.

Master: 0.684 seconds
This PR: 0.706 seconds

So, this version is about 3% slower (with all the caveats around whether compiling the compiler is a good benchmark; if I recall correctly, Schism doesn't a lot of string comparison, but maybe those are just leftover memories from my horrible initial implementation of symbols).

It's not a small regression, and I'm willing to take it for correctness, but it's the kind of thing that could add up over time. Maybe we can do this in a way that has equivalent performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another downside to this pr is that the "fix" complicates the rt somewhat with mixed benefits. Factoring in the performance hit and the argument against mutable strings, a split from the standard seems reasonable, at least without a better solution to get the desired behavior.

How javascript treats strings and how schism uses === for eq?, makes it not obvious to me how else to get the distinguishable string behavior. Because === compares for value equality with 'string's but not with 'object's, wouldn't we need to change eq? or the representation for strings? Working with eq? doesn't really side step the problem at any rate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like what we'd have to do is wrap Schism strings in a JavaScript object to make === do the right thing.

It's kind of bizarre to me that JavaScript doesn't have a reference equality operator. Or, maybe it does for objects but strings aren't considered objects. Maybe the idea is that strings are values like integers are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gather there are string values but also string objects. Here, I'm wrapping strings by new String() constructor to get === to compare them by reference.

Actually, looking through the TSPL4 book, symbol->string strings should be treated differently. I just pushed some changes to allow for both behaviors.

The %string=? predicate first checks if typeof gives 'string' before doing the isinstanceof check. Hopefully that recovers some of the performance, and not make it worse!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran my benchmark again, although on a different computer this time. The average of 9 runs with this change was 1.117 s versus 1.138 s without. So, perhaps slight improvement, but likely just within the noise. I think we're at performance parity now.

Thanks for sticking with this!

@awwx
Copy link

awwx commented Nov 20, 2019

JavaScript primitive strings are immutable and there isn't a way to check them for equality by identity -- you can only compare them by value.

Given that you've chosen have immutable strings as well, you may want to consider whether you actually need eq? to check for string equality by identity. (You might be happy just checking strings for equality by value).

JavaScript String objects can mostly be used in places where JavaScript primitive strings are expected, but there are a couple of differences (eval, and of course code that does a typeof check, and IIRC some edge cases where data is being shared across different contexts such as iframes).

Thus if you're doing JavaScript interop, you'll want to ensure that you unwrap your Strings so that you're passing JavaScript primitive strings to JavaScript code.

This can be confusing because if you accidentally pass a String to external JavaScript code it often mostly works... until you hit one of the edge cases and then it doesn't.

Given the potential for confusion, if you want to wrap strings in a JavaScript object so that you can test for equality by identity, you may want to use your own JavaScript object wrapper instead of using JavaScript's String object... that way it'll be obvious if you accidentally pass the wrapper to code that's not expecting it.

@eholk
Copy link
Contributor

eholk commented Nov 20, 2019

Thanks for the info, @awwx.

It would potentially make sense to fold all of Schism's values into a SchismValue class, or something like that, rather than using bare JavaScript types. There are pros and cons though. There's something nice about Schism values being interoperable with JS with no wrapping or unwrapping. It's a discussion that's probably worth having.

For now, I think this PR is in a good state, so I'm going to go ahead and merge it. We may want to iterate some more on this, but I think this is a good step for now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants