Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix eq? for strings #98

Merged
merged 3 commits into from Nov 20, 2019

Conversation

@jitwit
Copy link
Contributor

jitwit commented Nov 13, 2019

I think this fixes the reference/value equality issue for strings (#97).

  • added%string=? to rt.mjs which uses String.valueOf() when checking for equality.
  • %string=? is used by string=? and has been added to runtime-imports and effect-free-callee?

Would it be better to write string=? in pure scheme? I was also wondering about how functions like string-length and string-ref should be added.

jitwit added 2 commits Nov 13, 2019
@eholk
eholk approved these changes Nov 13, 2019
rt/rt.mjs Outdated
@@ -124,15 +126,16 @@ function rt(engine) {
// that's not specified on the Scheme language level. Using JS
// strings for symbols allows us to compare with ===, as is
// required.
'string?': x => typeof(x) === 'string' && x.charAt(0) === 's',
'string?': x => x instanceof String && x.charAt(0) === 's',

This comment has been minimized.

Copy link
@eholk

eholk Nov 13, 2019

Member

This looks like it's getting into subtleties of JavaScript that I'm not as familiar with. What's the difference here between typeof(x) === 'string' and x instanceof String?

This comment has been minimized.

Copy link
@jitwit

jitwit Nov 13, 2019

Author Contributor

From what I understand typeof returns a string indicating the type of a primitive value or returns'object' for everything else. In this case,instanceof looks for String.prototype in the prototype chain of x.

To hopefully get the right behavior with eq? I modified the functions that construct strings to wrap with new String(s) (%list->string and %symbol->string). Now, strings and gensyms are string objects, while symbols are still primitive string values.

This comment has been minimized.

Copy link
@wingo

wingo Nov 14, 2019

Collaborator

Pretty sure the typeof check is cheaper. Would be nice to preserve speed for string literals.

This comment has been minimized.

Copy link
@eholk

eholk Nov 19, 2019

Member

I just ran 9 runs of benchmark-compiler.sh on the current master and this change, and averaged the results for the stage3 compile.

Master: 0.684 seconds
This PR: 0.706 seconds

So, this version is about 3% slower (with all the caveats around whether compiling the compiler is a good benchmark; if I recall correctly, Schism doesn't a lot of string comparison, but maybe those are just leftover memories from my horrible initial implementation of symbols).

It's not a small regression, and I'm willing to take it for correctness, but it's the kind of thing that could add up over time. Maybe we can do this in a way that has equivalent performance?

This comment has been minimized.

Copy link
@jitwit

jitwit Nov 20, 2019

Author Contributor

Another downside to this pr is that the "fix" complicates the rt somewhat with mixed benefits. Factoring in the performance hit and the argument against mutable strings, a split from the standard seems reasonable, at least without a better solution to get the desired behavior.

How javascript treats strings and how schism uses === for eq?, makes it not obvious to me how else to get the distinguishable string behavior. Because === compares for value equality with 'string's but not with 'object's, wouldn't we need to change eq? or the representation for strings? Working with eq? doesn't really side step the problem at any rate.

This comment has been minimized.

Copy link
@eholk

eholk Nov 20, 2019

Member

It seems like what we'd have to do is wrap Schism strings in a JavaScript object to make === do the right thing.

It's kind of bizarre to me that JavaScript doesn't have a reference equality operator. Or, maybe it does for objects but strings aren't considered objects. Maybe the idea is that strings are values like integers are.

This comment has been minimized.

Copy link
@jitwit

jitwit Nov 20, 2019

Author Contributor

I gather there are string values but also string objects. Here, I'm wrapping strings by new String() constructor to get === to compare them by reference.

Actually, looking through the TSPL4 book, symbol->string strings should be treated differently. I just pushed some changes to allow for both behaviors.

The %string=? predicate first checks if typeof gives 'string' before doing the isinstanceof check. Hopefully that recovers some of the performance, and not make it worse!

This comment has been minimized.

Copy link
@eholk

eholk Nov 20, 2019

Member

I just ran my benchmark again, although on a different computer this time. The average of 9 runs with this change was 1.117 s versus 1.138 s without. So, perhaps slight improvement, but likely just within the noise. I think we're at performance parity now.

Thanks for sticking with this!

@awwx

This comment has been minimized.

Copy link

awwx commented Nov 20, 2019

JavaScript primitive strings are immutable and there isn't a way to check them for equality by identity -- you can only compare them by value.

Given that you've chosen have immutable strings as well, you may want to consider whether you actually need eq? to check for string equality by identity. (You might be happy just checking strings for equality by value).

JavaScript String objects can mostly be used in places where JavaScript primitive strings are expected, but there are a couple of differences (eval, and of course code that does a typeof check, and IIRC some edge cases where data is being shared across different contexts such as iframes).

Thus if you're doing JavaScript interop, you'll want to ensure that you unwrap your Strings so that you're passing JavaScript primitive strings to JavaScript code.

This can be confusing because if you accidentally pass a String to external JavaScript code it often mostly works... until you hit one of the edge cases and then it doesn't.

Given the potential for confusion, if you want to wrap strings in a JavaScript object so that you can test for equality by identity, you may want to use your own JavaScript object wrapper instead of using JavaScript's String object... that way it'll be obvious if you accidentally pass the wrapper to code that's not expecting it.

@eholk

This comment has been minimized.

Copy link
Member

eholk commented Nov 20, 2019

Thanks for the info, @awwx.

It would potentially make sense to fold all of Schism's values into a SchismValue class, or something like that, rather than using bare JavaScript types. There are pros and cons though. There's something nice about Schism values being interoperable with JS with no wrapping or unwrapping. It's a discussion that's probably worth having.

For now, I think this PR is in a good state, so I'm going to go ahead and merge it. We may want to iterate some more on this, but I think this is a good step for now.

@eholk eholk merged commit 2aa18f0 into google:master Nov 20, 2019
2 checks passed
2 checks passed
cla/google All necessary CLAs are signed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.