Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bypass validity checks when quoting Symbols and Keywords #2496

Merged
merged 1 commit into from
Aug 18, 2023

Conversation

scauligi
Copy link
Member

Moves checked creation of Symbols and Keywords to new functions hy.sym() and hy.kw().

Without speedsym (best of 5 runs):

$ time hy -c "(for [_ (range 100000)] 'a)"
real    0m0.699s
user    0m0.680s
sys     0m0.010s

With speedsym (best of 5 runs):

$ time hy -c "(for [_ (range 100000)] 'a)"
real    0m0.102s
user    0m0.086s
sys     0m0.018

I've been using Hy as part of an x86 instruction simulator, where we're hooking every instruction and running a bit of Hy code; the slowdown from creating Symbols (because of quoted forms) is enough to get our simulator to time out.

@Kodiologist
Copy link
Member

The idea is to tell users to use hy.sym and hy.kw instead of the constructors, right? Because if they use the constructors, they can make syntactically illegal symbols and keywords and bring back all the bugs that the checks guard against. That seems like a dangerous situation to me: that the constructors of most models are fine to use, but those of two particular models are dangerous.

Instead of putting the constructors off-limits, how about changing the code generated by quotation? I think the from_parser argument might already support this. Then you should still get the speedup you want.

@scauligi
Copy link
Member Author

scauligi commented Aug 14, 2023

Yeah, the intention was that users should use hy.sym and hy.kw instead of the constructors from hy.models, since we use the Symbol constructor pretty pervasively throughout the codebase right now. Changing just quotation would speed up that specific case, but other places that explicitly or implicitly create Symbols would still be left on the slow path.

I'm also not totally clear on what makes these symbols dangerous? Since hy.mangle does a good job of mangling even numerals and whitespace into pythoncally valid identifiers. At worst if you assign (do-mac `(setv ~(Symbol "foo bar" "find me"))) you would have to access it via something like (do-mac (Symbol "foo bar")) or (get (locals) (hy.mangle "foo bar")), but that doesn't seem like a deal-breaker.

@Kodiologist
Copy link
Member

…but other places that explicitly or implicitly create Symbols would still be left on the slow path.

If you want them to skip the check, too, you need to be sure that syntactically illegal symbols can't get there in the first place. If they can, then we still need the check.

I'm also not totally clear on what makes these symbols dangerous?

You can see #2064 for the original PR. As an example, what should (hy.repr (hy.models.Symbol "foo bar")) return? Remember that mangling is part of the compilation process that changes some symbols to ast.Names and not inherent to a symbol: (!= 'foo-bar 'foo_bar) although (= (hy.mangle 'foo-bar) (hy.mangle 'foo_bar)).

@scauligi
Copy link
Member Author

Well... the reasoning for #2064 banning invalid symbols is that they're unlikely to be used, and so banning them is easier -- but now the validation is imposing a runtime penalty on the very-likely path.

what should (hy.repr (hy.models.Symbol "foo bar")) return?

Going back to #1117, we can add a syntax for string-literal-Symbols (and string-literal-Keywords), and it should be pretty easy with the new reader. Then we could have something like the following:

=> (print (hy.repr (hy.models.Symbol "foo bar")))
#"foo bar"
=> (print (hy.repr (hy.models.Keyword "foo bar")))
:"foo bar"

@Kodiologist
Copy link
Member

now the validation is imposing a runtime penalty on the very-likely path

Sure, I see the problem, and I think what I've proposed, of enabling speedups when we know the input can't be syntactically illegal (because e.g. we really did read it as a symbol originally) is a viable way forward. #1117 remains possible, but unmotivated.

@scauligi
Copy link
Member Author

remains possible, but unmotivated

I think a reasonable motivation for this is "I created a weirdly-named Symbol and now I'm trying to access/debug/repr it", as well as allowing (almost) all valid strings to also be Symbols and Keywords.

@Kodiologist
Copy link
Member

Kodiologist commented Aug 15, 2023

But you can't make a symbol of this sort in the first place, under the status quo. By preventing the problem to start with, we avoid the need to add more features to the language to cope with it.

This is feeling kinda circular, and a bit outside the scope of speeding up quote. I'm not going to say that adding syntax for more kinds of symbols and keywords is totally out of the question, but I think you're making the issue of quote being slow vastly more complex than it needs to be.

@scauligi scauligi force-pushed the speedsym branch 2 times, most recently from daf2ab5 to 42ef7e0 Compare August 15, 2023 00:16
@scauligi
Copy link
Member Author

$ time hy -c "(for [_ (range 100000)] 'a)"
real    0m0.123s
user    0m0.110s
sys     0m0.013s

Well, keeping the PR to just the quoted forms still gets us most of the speedup, so I'm still happy with this.

@scauligi scauligi changed the title Move validity checks out of Symbol and Keyword and into separate functions Bypass validity checks when quoting Symbols and Keywords Aug 15, 2023
@Kodiologist
Copy link
Member

So it was that easy, after all. I guess I guessed correctly.

@scauligi
Copy link
Member Author

I mean, this was one of the first things I did in my initial experimentation, it's just that I got (maybe too) focused on chasing the last bits of perf and also I always end up jumping at the chance to muck around with new syntax features

@Kodiologist
Copy link
Member

Well, the beauty of reader macros is that you can muck around with new syntax features entirely in userland. Or in Hyrule.

@Kodiologist Kodiologist merged commit c0a1897 into hylang:master Aug 18, 2023
9 checks passed
@scauligi scauligi deleted the speedsym branch August 22, 2023 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants