Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Hy builtins as actual builtins #1979

Merged
merged 10 commits into from
Mar 16, 2021

Conversation

scauligi
Copy link
Member

@scauligi scauligi commented Feb 23, 2021

Fixes #791, fixes #1045, fixes #1978.

I might have gotten a little carried away but I think I have a working implementation for getting rid of autoimports.

WIth this change, the compiler inserts a call to hy.importer._hy_inject_builtins at the top of each module. This function adds all of hy's core/stdlib to the builtins module so that they can be called as if they had been imported.

Upsides:

  • No more autoimports and associated shadowing bugs
  • Does not add to exported symbols in modules

Possible downsides:

  • Had to rename eval to hy-eval so it doesn't clash with python's builtin (alternatively, we could drop eval from the core as mentioned in Don't include the namespace in the import's name for hy models. #1644 and have it used as hy.eval instead)
  • All Hy models need qualified names now (e.g. can't reference HySymbol directly, but we can still use hy.HySymbol)
  • Modifying builtins affects non-Hy code as well. E.g.:
>>> import hy
>>> import empty  # a completely empty .hy file
>>> is_keyword
<function is_keyword at 0x7f14686ead30>

These will get shadowed by any local assignments though, so properly written python code will be unaffected. However, it's still a potentially confusing effect.

@peaceamongworlds
Copy link
Contributor

This is really cool!

I think it would make a lot of sense to follow up with #1644 after this.

@Kodiologist
Copy link
Member

I'd say this isn't ideal since we're still effectively polluting namespaces; we're just doing in a different, sneakier way. But it's an improvement over the status quo given things like #791.

Why not include the Hy model names in the builtins injection? Then you could still say HySymbol instead of hy.HySymbol, right?

How does this change #1978? You're only touching Python-level imports, not Hy's macro namespaces, right?

@scauligi
Copy link
Member Author

Why not include the Hy model names in the builtins injection?

That's also an option, I can update to do that if we're not going the #1644 route.

How does this change #1978?

Investigating further, it doesn't — I got too excited. Looks like I was running into the compiled/not-compiled effect again. E.g. in the example in #1978, if hy.contrib.walk is not in the pycache then its __macros__ re-exports core, but on the next run it does not pollute __macros__ (and I was originally testing by switching back and forth between branches but not clearing the pycache).

@scauligi
Copy link
Member Author

I believe if I add a __macros__ to builtins then I can pull a similar trick during macroexpand to have it check the builtins as the final module, so that we don't need to pollute each individual module's __macros__. I'll investigate this, which would let us fix #1978 for real.

@Kodiologist
Copy link
Member

I think #1644 is going to happen as part of the bigger core cleanup, whereas this PR is a stopgap method to fix some bugs before we get that. Might as well not add some breaking changes before then.

@scauligi scauligi force-pushed the builtins-prelude branch 2 times, most recently from 2d63356 to afb452f Compare February 26, 2021 01:53
@scauligi
Copy link
Member Author

scauligi commented Feb 26, 2021

Ok, I've cleaned it up a bunch more. Hy models are now part of the core (and so don't need to be qualified), and the builtin macros now go into the __macros__ dict of the builtins module, which macroexpand checks after iterating through any other modules in scope. This now fixes #1978 for real.

Furthermore, the builtins injection is now something that happens when you import hy instead of explicitly at the top of each module, so now the only "extra" thing added to a hy module during compilation is an import hy at the top and everything else Just Works™.

In other news, I spent several days working through various other methods to try and get the core accessible without bodging it into builtins, but most of them didn't work out for one reason or another.
The closest viable option was to wrap the contents of a hy file inside of a dummy function then re-export the locals() of the function at the end (and replace the entry in sys.modules with a fake dynamically created module), but this causes several other things to break (have to rewrite from module import *, have to prevent "top-level" returns, globals() now has different behavior from the perspective of a hy module, ...). Compared to that, stuffing things into builtins is actually much cleaner.

Other failed experiments, documented for posterity:

  • Wrapping the hy module in a class definition
    • Definitions inside of a class aren't available until the entire class has been defined. This breaks calling one function from another function.
  • Wrapping the hy module in a with statement to restore the builtins afterwards
    • Calling a function afterwards doesn't trigger __enter__ again, so exported functions become broken.
  • Creating to a (local) dictionary called __builtins__
    • During the initial import/execution of the module, python always uses the "real" builtins module and skips any globals or locals with the name __builtins__.

@Kodiologist
Copy link
Member

Don't forget new tests and NEWS items for the new fixes.

Instead of providing hy-eval as a user-visible name, how about telling users to call this function as hy.eval?

-                new_ast = ast.Module(
-                    exec_ast.body + [ast.Expr(eval_ast.body)],
-                    type_ignores=[])
+                body = exec_ast.body
+                if eval_ast is not None:
+                    body.append(ast.Expr(eval_ast.body, lineno=0, col_offset=0))
+                new_ast = ast.Module(body, type_ignores=[])

I don't understand this change. The new version looks to be equivalent, longer, and unrelated to the other changes in this commit.

-            func=asty.Name(obj, id="HyKeyword", ctx=ast.Load()),
+            func=asty.Attribute(
+                obj,
+                value=asty.Name(
+                    obj,
+                    id="hy",
+                    ctx=ast.Load()),
+                attr="HyKeyword",
+                ctx=ast.Load()),

This looks like a leftover from earlier.

for module in hy.core.STDLIB:
    mod = importlib.import_module(module)
    if hasattr(mod, '__all__'):
        names = mod.__all__
    else:
        names = filter(lambda name: name[0] != '_', mod.__dict__)
    for name in names:
        setattr(builtins, name, mod.__dict__[name])

I think you can simplify this by making sure __all__ is set in each of the STDLIB modules. Then the loop can be for name in mod.__all__.

-(import [hy.models [HyList HySymbol]])
-
 (eval-and-compile
-  (import [hy.core.language [*]]))
+  (import [hy.core.shadow [*]]
+          [hy.core.language [*]]))

These changes look like they were intended for an earlier commit.

+* Hy builtins and macros are now added to the global python builtins instead of
+  being imported into each module.

This is an undocumented implementation detail and doesn't need to be in NEWS.

+* Hy models are now included in `hy.core.language`.

This is a bugfix, not a new feature, and it should be described in a way so that users unfamiliar with Hy's internals can tell what it means, like "Fixed a bug in which Hy model objects weren't always in scope".

@scauligi
Copy link
Member Author

Update/add to NEWS, add tests, remove vestigial changes

Can do!

I don't understand this change. The new version looks to be equivalent, longer, and unrelated to the other changes in this commit.

I want hy --spy to display the prelude that hy inserts, but without that change:

$ hy --spy

Exception in AST callback:
Traceback (most recent call last):
  File "/.../hy/hy/cmdline.py", line 312, in ast_callback
    exec_ast.body + [ast.Expr(eval_ast.body)],
AttributeError: 'NoneType' object has no attribute 'body'

hy 0.20.0+52.ge7125d57.dirty using CPython(default) 3.9.1 on Linux
=>

I think you can simplify this by making sure __all__ is set in each of the STDLIB modules. Then the loop can be for name in mod.__all__.

Can do. In fact, I don't think we need STDLIB itself anymore; I can just have hy/core/__init__.py import everything into its own namespace and then anything else can just from hy.core import *.

@Kodiologist
Copy link
Member

I want hy --spy to display the prelude that hy inserts

Huh, fair enough. I'm not sure I entirely understand how your change fixes that, but I think it might be enough to change exec_ast.body + [ast.Expr(eval_ast.body)] to something like exec_ast.body + ([] if eval_ast is None else [ast.Expr(eval_ast.body)]).

In fact, I don't think we need STDLIB itself anymore; I can just have hy/core/__init__.py import everything into its own namespace and then anything else can just from hy.core import *.

Cool, sounds good.

@Kodiologist
Copy link
Member

Heads up, I noticed this in the Python 3.10.0a5 changelog, which might have consequences for this PR:

Functions have a new __builtins__ attribute which is used to look for builtin symbols when a function is executed, instead of looking into __globals__['__builtins__']. The attribute is initialized from __globals__["__builtins__"] if it exists, else from the current builtins.

@scauligi
Copy link
Member Author

Oh that's interesting. I'll have to spin up a copy of 3.10 and test this then, make sure it still works.

@scauligi
Copy link
Member Author

scauligi commented Mar 3, 2021

Finally back around to this with the requested changes.

Playing around with 3.10 a bit it looks like Python's new __builtins__ functionality won't affect this implementation.

@Kodiologist
Copy link
Member

Great work.

# Create an empty __macros__ so that macroexpand doesn't complain
builtins.__macros__ = {}

Instead of this, you can change fn in mod.__macros__ in macroexpand to fn in getattr(mod, '__macros__', []).

exec('from hy.core import *', builtins.__dict__)

It's weird that you would need to exec a string in order to do this, particularly since the string is fixed. Is there a way to do this with a function in importlib instead?

builtins.__hy__ = hy.__version__

Why hy.__version__? Can't it just be True? You could then name the variable __hy_injected__ or something instead.

It looks like we neglected to update test_futures_imports for #1977. Could you change it to use a different built-in function, in a separate commit?

is-keyword is better written keyword?.

@scauligi
Copy link
Member Author

scauligi commented Mar 4, 2021

Great work.

Thanks! I feel like I've learned... quite a lot about how Python imports and namespacing works.

exec with fixed string

The only other way I've found of doing this was the previous code that iterated through hy.core.__all__ and set builtins directly. I couldn't find any other way of programmatically importing * from a module, but using exec felt a bit clearer to me than iterating over __all__ since it states exactly what we're doing (importing everything from the core into builtins). It is a bit unconventional though, so I can revert it back to the previous method if we don't like it.

Using is-keyword instead of keyword?

Now that's just embarrassing, I clearly need to spend more time getting Hy.

@Kodiologist
Copy link
Member

I couldn't find any other way of programmatically importing * from a module

Heckin' weird. I'm going to give it a shot if only to convince myself that you're right.

@scauligi
Copy link
Member Author

scauligi commented Mar 5, 2021

The changes to macro handling broke doc from finding builtin macros, so I fixed that... which also fixed the original issue in #1970, so I added those (previously intermittently failing) tests back in.

@allison-casey
Copy link
Contributor

allison-casey commented Mar 5, 2021

This is really great work scauligi. I'm going to try and do a more comprehensive review tonight. Just wanted to pop in and mention that importlib doesn't have a programatic way of doing import *. The best I could find is what I used in HYSTARTUP

;; hy.cmdline.HyREPL
loader = HyLoader("__hystartup__", os.environ.get("HYSTARTUP"))
mod = loader.load_module()
imports = mod.__dict__.get(
    '__all__',
    [name for name in mod.__dict__ if not name.startswith("_")]
)
imports = {name: mod.__dict__[name] for name in imports}

# Load imports and defs
self.locals.update(imports)

# load module macros
require(mod, self.module, assignments='ALL')

This may be slightly more idiomatic than exec('from hy.core import *', builtins.__dict__) but they do essentially the same thing.

@Kodiologist
Copy link
Member

Looking through our options, I think it's probably best to loop over __all__ explicitly. It can still be concise with something like this (untested):

core = __import__('hy.core')
builtins.__dict__.update({k: core.getattr(k) for k in core.__all__})

@Kodiologist
Copy link
Member

(with [(pytest.raises NameError)]
    (doc does-not-exist))

I'm not sure this test is right. In the future, doc might do more of its work at compile-time, and this case could raise an error at compile-time instead of runtime. I'd say just drop the test.

@allison-casey
Copy link
Contributor

allison-casey commented Mar 5, 2021

Would probably want to check that the item exists in __dict__ just in case to cover our butts in case something gets mistyped in __all__
{k: getattr(core, k) for k in core.__all__ if k in core.__dict__}

Another thing i'm noticing is that if you start a regular python repl and import hy, it also imports everything into builtins which seems less than ideal.

python-venv-3.9.2 ❯ python
Python 3.9.2 (default, Feb 20 2021, 00:00:00)
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> rest
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'rest' is not defined
>>> import hy
>>> rest
<function rest at 0x7f050447bdc0>

@Kodiologist
Copy link
Member

That change would make the error silent, so it wouldn't be detected for a while, whereas not checking means a crash will occur as soon as you try to run Hy after messing up __all__, as desired.

Another thing i'm noticing is that if you start a regular python repl and import hy, it also imports everything into builtins which seems less than ideal.

Yeah, this is an inherent limitation. Remember that this is only a stopgap measure anyway.

@scauligi
Copy link
Member Author

scauligi commented Mar 5, 2021

doc raising NameError

I'm not sure I follow? We definitely want doc to error out if it can't find a name, previously it would (rather confusingly) just give the help page for NoneType instead of failing (which is why I added the test as a regression — I can add a comment).

If we change doc to do more compile-time work then we can update the test to check for a compile-time error, but either way I think raising an error at some point should be part of doc's spec.

import hy pollutes builtins

Until we can figure out some better way of managing the namespace, I'm willing to treat this as a feature rather than a bug. After all, Hy is in some sense a language extension to Python.

iterate and set from __all__ instead of using exec

Can do!

@Kodiologist
Copy link
Member

previously it would (rather confusingly) just give the help page for NoneType instead of failing (which is why I added the test as a regression — I can add a comment)

Okay, I didn't realize such a bug existed. That makes more sense. You could add a comment like "check that an error is raised instead of documenting None".

@allison-casey
Copy link
Contributor

Yeah, this is an inherent limitation. Remember that this is only a stopgap measure anyway.

could we add an explicit call to _inject_builtins at the top of the module along with import hy? So it looks like this in --spy:

import hy
hy._inject_builtins()

@scauligi
Copy link
Member Author

scauligi commented Mar 8, 2021

could we add an explicit call to _inject_builtins at the top of each module?

That's how I originally had it, but this just delays the injection to when you import a hy module:

Python 3.9.2 (default, Feb 20 2021, 18:40:11)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hy
>>> is_keyword
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'is_keyword' is not defined
>>> import empty
>>> is_keyword
<function is_keyword at 0x7f537ee19670>

To me, it's less confusing to have the side effects happen from importing hy than from importing some other (potentially empty!) hy module.


I'm increasingly of the opinion that infecting builtins is a feature and not a bug; however, if we still want to pursue keeping builtins clean, another avenue I can try is to replace all use of hy-core symbols like so:

=> (a complicated expression involving keyword?)
a(complicated, expression, involving,
  locals().get("is_keyword", globals().get("is_keyword", hy.core.is_keyword)))

On the plus side, I believe (without having tested it) that it should be able to cover any corner cases; on the minus side it's going to make spy/hy2py output (more) messy.

@Kodiologist
Copy link
Member

another avenue I can try is to replace all use of hy-core symbols like so

Yeah, I'd recommend against that. I think it would be a bit of a mess.

@allison-casey
Copy link
Contributor

allison-casey commented Mar 13, 2021

I may have found an option that makes it so we don't have to inject directly into python's builtins and polluting the python namespace with our core. From the python docs here

most modules have the name __builtins__ made available as part of their globals. The value of __builtins__ is normally either this module or the value of this module’s __dict__ attribute.

So modules pull from their own __builtins__ reference. if we replace our hy modules with our own __builtins__ like we patch in __macros__ via the importer then we could separate the two. as a demonstration:

hy 0.20.0+78.ga8ccb7b3 using CPython(default) 3.9.2 on Linux
=> (import builtins)
=> (setv new-builtins {#** __builtins__})
=> (.pop new-builtins "zip")
<class 'zip'>
=> (setv __builtins__ new-builtins)
=> zip
Traceback (most recent call last):
  File "stdin-f13e27693c85aed522df8c3fcb0bb0110ca54e14", line 1, in <module>
    zip
NameError: name 'zip' is not defined
=> builtins.zip
<class 'zip'>

@Kodiologist
Copy link
Member

I'm going to write a proposal for the core cleanup soon, which, once actually accomplished, will make most of this work obsolete, so maybe it's not worth a lot of effort.

@Kodiologist Kodiologist mentioned this pull request Mar 14, 2021
31 tasks
@scauligi scauligi force-pushed the builtins-prelude branch 2 times, most recently from 024db29 to 3ed4b50 Compare March 15, 2021 02:11
@scauligi
Copy link
Member Author

scauligi commented Mar 15, 2021

Added the comment to the test for doc and updated the injection to not use exec, only took me over a week (sigh, finals).


Overwriting __builtins__

This is one of the implementations I tried, unfortunately python (for whatever reason) doesn't respect the "local" __builtins__ on the initial exec/import of the module:

# overwrite.py
__builtins__ = {"test": "it worked!"}
print(zip)
print(test)
$ python3 overwrite.py
<class 'zip'>
Traceback (most recent call last):
  File "/home/scauligi/workspace/hy/sandbox/overwrite.py", line 4, in <module>
    print(test)
NameError: name 'test' is not defined

The great core shakeup

Should we still merge in builtins injection for the time being? Or just drop it in favor of #1992's design?

@Kodiologist
Copy link
Member

Kodiologist commented Mar 15, 2021

It's your choice. #1992 is going to take a while to happen and I probably would want to make a new release before a change that big (or at least, before some of those changes). If you want to go ahead with this, rebase it and I'll take a last look.

@scauligi
Copy link
Member Author

Might as well then, since it's already done the dirty work of ripping out autoimport insertion.

@Kodiologist Kodiologist merged commit 1429a56 into hylang:master Mar 16, 2021
@scauligi scauligi deleted the builtins-prelude branch March 16, 2021 20:00
@allison-casey allison-casey mentioned this pull request Mar 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants