Implement compiled mode for Perlang #406

Open
8 of 12 tasks
perlun opened this issue Aug 12, 2023 · 4 comments
Labels
compiled mode Issues which are relevant in compiled mode enhancement New feature or request
Milestone

Comments

@perlun
Collaborator

perlun commented Aug 12, 2023

In #396, I described the recent events leading up to me trying out what LLVM can do for us, in terms of making it possible to run Perlang programs completely independent of the .NET platform.

After that comment was written, and after some discussions with an old friend of mine (@diwic - thanks a lot to you! 🙏), I started hacking on this and doing a little experiment: How hard would it be to write a compiler for Perlang which emits C++ code, compiles this code, and then runs the end result? This is obviously not the "final solution" in any way and it is admittedly a bit clumsy. Still, if it was good enough for Bjarne Stroustrup, it ought to be good enough for me as well. (Naturally, Stroustrup's preprocessor and later Cfront compiler didn't emit C++, but you get the picture.)

I'm setting the milestone for this to 0.4.0, but naturally, given the sheer size of this task, the compiler will in no way be complete in 0.4.0. But it'll probably work to the point where I feel comfortable about pushing it out to the public.

Rough steps

  • Implement a compiler which translates the syntax tree for all/most valid Perlang programs into valid C++ code, and compiles and runs the result: (compiler) Add first steps towards experimental compiler #409
  • Implement a C/C++-based stdlib to support the above: (stdlib) Add C++-based stdlib project #407.
    • Make it possible to write unit and/or integration tests for this. We probably have to write these in C or C++ for now. cmocka is a useful unit test library for C that I have used elsewhere.
    • Add support for BigInt: Add support for BigInt in compiled mode #415
    • Distribute the (compiled) stdlib along with snapshot builds. This involves a bit of complexity, since native C++ code has historically only been able to compile on the same platform as the CI job is running on. We'll need to investigate if clang makes this easier for us.
      • In line with the next point, I think it's fine if we are Linux and amd64-only at this point (in compiled mode). In other words, we'll provide a Linux amd64 binary of the stdlib for now and emit an error message on other platforms stating that experimental compilation is not yet supported.
    • We will keep things simple in the 0.4.0 milestone and only support compiled mode on Linux. This makes the above easier. Going forward, we'll need to start building releases separately on each platform (i.e. build macOS on a macOS CI runner, build Linux binaries on Linux and so forth). I'll create a separate issue for this at some point and add a link to it here.
    • Implemented as of (ci) Include stdlib in Linux-based .tar.gz snapshots #445, with the above limitation (Linux-only).
  • Make sure PerlangCompiler uses the stdlib artifacts (.so/.a files and .h/.hpp header files), when being executed from a snapshot build.
    • The only thing that will prevent this from happening is if $PERLANG_ROOT is set. $PERLANG_ROOT is still used when running Perlang from source, so let's leave this as-is for now.
  • Once this is stable enough, consider dropping interpreted mode (to avoid having to always make "two implementations" for all new functionality going into the library). Challenge: this will make it hard/impossible to support the REPL though, so ideally we would keep this until we can reimplement the REPL on top of LLVM instead.
    • I am currently (2023-11-03) leaning towards dropping (parts of) the REPL soon, perhaps in the 0.5.0 or 0.6.0 release. This will make things simpler and free us from having to keep it working all the time, since it won't be working in compiled mode anyway (for quite a long time, realistically speaking). Once the Perlang compiler is mature enough to be able to interface with LLVM to generate machine code for an arbitrary Perlang expression tree, we can reimplement the REPL on top of this.

      Suggested approach: make some "glue tooling" for interfacing between Perlang and C++ (and perhaps between Perlang and C# in the intermediate stage), so that we can expose the Perlang AST types to a little C++ helper library. The helper library will then consume the LLVM headers and emit machine code for the Perlang AST.

  • Figure out how to answer hard questions, like how to cast an ASCIIString to String (https://github.com/perlang-org/perlang/pull/451/files#r1548516040)
    • Fixed (or worked around) by (stdlib) Wrap ASCIIString in std::shared_ptr<T> #453, which should be "good enough" for now. As the compiler matures (and we can eventually move away from relying too much on C++), we can rework this to use more stack-based ASCIIString instances where possible, to reduce the number of heap allocations.
  • Implement some of the obvious missing string-related operations
  • Implement some mechanism for multi-file projects (like a "build system" of some form, like MSBuild or cargo)
    • TODO: Definitely deserves an issue of its own. A quick-and-dirty approach could be to support a perlang . or perlang <some-directory> approach, i.e. compile all files in a given directory; this seems to be similar to how https://vlang.io/ does it. The easy way here would be to just emit a single C++ file; if we do it like this, I think we can postpone the "build system" question for (perhaps much) later.
  • Implement a way to call Perlang code from C#, by compiling the Perlang code to one or more .so (subsequently .dll on Windows) files.
    • Has been started, TODO: add reference to PR when there is one.
  • Implement a way to do "reverse P/Invoke", i.e. expose Perlang code as native functions for calling them from managed C# code.
  • Once the compiler is in place and we have the required mechanics for creating native libraries with Perlang, start planning on gradually rewriting the Perlang compiler in Perlang. The "easiest" way is probably to start rewriting some isolated part of it, and call into the Perlang (native) code from C#.
    • The bootstrapping can be done using a "stable" version of the "compile-via-C++" compiler.
    • Once we have that bootstrapped, we can then subsequently move to depend on the first "stable" version which can compile to native code without any dependency on C++; our only dependency will be on the LLVM libraries at this point. (Challenge: consuming LLVM from non-C++ languages can be impractical. We might need to write some C++-based glue code in the Perlang compiler to make this happen, as described in one of the previous points.)
    • Should also have an issue of its own: Rewrite the Perlang compiler in Perlang #454.
perlun added the enhancement and compiled mode labels Aug 12, 2023
perlun added this to the 0.4.0 milestone Aug 12, 2023
perlun pinned this issue Aug 13, 2023
@perlun
Collaborator Author

perlun commented Oct 1, 2023

> This involves a bit of complexity, since native C++ code has historically only been able to compile on the same platform as the CI job is running on. We'll need to investigate if clang makes this easier for us.

It does, since Clang is natively a cross-compiler. But that unfortunately doesn't magically solve all related problems:

> But, as is true to any cross-compiler, and given the complexity of different architectures, OS’s and options, it’s not always easy finding the headers, libraries or binutils to generate target specific code. So you’ll need special options to help Clang understand what target you’re compiling to, where your tools are, etc.
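To make the "special options" a bit more concrete, here is a minimal sketch of how a compiler driver might assemble a cross-compiling Clang command line. `--target=<triple>` and `--sysroot=<dir>` are existing Clang options; the `Target` struct, the `clang_args` function and all file names are assumptions made up for this example and do not come from the Perlang code base.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a cross-compilation target.
struct Target {
    std::string triple;   // e.g. "aarch64-linux-gnu"
    std::string sysroot;  // directory with the target's headers and libraries
};

// Builds an argument list for invoking clang++ as a cross-compiler.
// Only --target and --sysroot are real Clang options being demonstrated;
// the surrounding structure is illustrative.
std::vector<std::string> clang_args(const Target& target,
                                    const std::string& source,
                                    const std::string& output) {
    std::vector<std::string> args;
    args.push_back("clang++");
    args.push_back("--target=" + target.triple);
    if (!target.sysroot.empty()) {
        // Tells Clang where to look for the target's headers and libraries.
        args.push_back("--sysroot=" + target.sysroot);
    }
    args.push_back("-c");
    args.push_back(source);
    args.push_back("-o");
    args.push_back(output);
    return args;
}
```

Finding or providing the sysroot contents for each target is the hard part that the quoted documentation warns about; the flags themselves are simple.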

@perlun
Collaborator Author

perlun commented Feb 24, 2024

> It does, since Clang is natively a cross-compiler. But that unfortunately doesn't magically solve all related problems:
>
> > But, as is true to any cross-compiler, and given the complexity of different architectures, OS’s and options, it’s not always easy finding the headers, libraries or binutils to generate target specific code. So you’ll need special options to help Clang understand what target you’re compiling to, where your tools are, etc.

An interesting approach to this is the way Golang is handling this. A single CI job can generate artifacts for a number of platforms and architectures, with very little extra work for the project itself. I saw this myself in action recently: https://gitlab.com/fleeting-plugin-hetzner/fleeting-plugin-hetzner/-/blob/main/.gitlab/ci/build.gitlab-ci.yml?ref_type=heads. The end result can be seen in this pipeline: https://gitlab.com/fleeting-plugin-hetzner/fleeting-plugin-hetzner/-/pipelines/1188295744

Now, this doesn't help us immediately since we don't intend to use Go for this. 😂 But it's still interesting to see, and we should aim for something similar in Perlang: cross compilation should be easy. It's fine if it requires an automatic in-the-background network download of standard libraries though; I presume (without having looked at the details) that this is how the Go toolchain does it.

perlun added a commit that referenced this issue Mar 26, 2024
This provides some of the groundwork for this, mentioned in
#406:

> Distribute the (compiled) stdlib along with snapshot builds

The changes to the `Makefile` mean that running `make install` will now
install the `stdlib` into the expected location. The next step is to get
the `stdlib` bundled with releases and release snapshots as well.
@perlun
Collaborator Author

perlun commented Mar 30, 2024

The required groundwork for including experimental compilation in release/snapshot binaries has now been done. 🎉 Moving this to the 0.5.0 milestone now, and intending to publish a 0.4.0 release very soon.

@perlun perlun modified the milestones: 0.4.0, 0.5.0 Mar 30, 2024
perlun added a commit that referenced this issue Mar 31, 2024
As discussed in #406, the
REPL will go away for some time, until we can (at some point)
reimplement it on top of LLVM. At that point, the REPL will be
_dynamically emitting native code_, i.e. still not require a JIT
interpreter of any form. Based on some experiments I've done with LLVM,
this should be doable.

The `-e "<code-to-be-executed>"` option will also be removed for now, but
I will do that in a separate commit.
perlun added a commit that referenced this issue Apr 23, 2024
This has been an oversight while working on the experimental Perlang
compiler (#406). The bug
was discovered when implementing the changes in
#463; when we started running
one of those tests, no error was emitted even though the code was
redefining a top-level function. It turned out that the compiler would
silently overwrite a function if you defined it twice.
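
The class of bug described in this commit message can be illustrated with a small sketch. This is not the actual fix; `FunctionInfo` and `FunctionTable` are hypothetical names. The idea is that registering a function must fail loudly when the name is already taken, instead of replacing the earlier entry.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for whatever the compiler stores per function.
struct FunctionInfo {
    std::string name;
    int arity;
};

class FunctionTable {
public:
    // Registers a top-level function; throws if the name is already taken.
    // Assigning through operator[] here would silently replace the earlier
    // definition, which is the kind of bug described above.
    void define(const FunctionInfo& fn) {
        // emplace() leaves the map untouched when the key already exists
        // and reports that through the bool in the returned pair.
        if (!functions_.emplace(fn.name, fn).second) {
            throw std::runtime_error("Function '" + fn.name + "' is already defined");
        }
    }

    bool contains(const std::string& name) const {
        return functions_.count(name) != 0;
    }

private:
    std::unordered_map<std::string, FunctionInfo> functions_;
};
```

Calling `define` twice with the same name keeps the first definition and raises an error for the second, so a test that redefines a top-level function fails as expected.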
perlun added a commit that referenced this issue Apr 24, 2024
This is the biggest change for a while. Because
#406 is moving along
nicely, we are now ready to:

* Flip the switch, i.e. _make compiled mode the default_ for Perlang.
* Remove the PerlangInterpreter class in its entirety. This may be
  reimplemented in one form or another, once we have the LLVM-emitting
  backend in place, but not as a tree-walking interpreter.

This probably means we'll drop Windows (and perhaps macOS) support for a
while. Please don't despair; this is not intended to be permanent. While
we depend on a specific Clang version for compiling Perlang code, it
simply gets easier to not have to support too many platforms. Once we
have started emitting C++ code from Perlang, in an idempotent way (being
able to disable all timestamping etc in the file header), we could see
how hard it would be to get this Perlang-to-C++-transpiled code
compiling on macOS and Windows too.
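
A minimal sketch of what "being able to disable all timestamping" could look like when generating the file header follows. The `EmitOptions` struct and the `file_header` function are assumptions made up for this example, not names from the Perlang code base.

```cpp
#include <ctime>
#include <string>

// Hypothetical options; the names are not taken from the Perlang code base.
struct EmitOptions {
    bool include_timestamp = true;
};

// Returns the comment header placed at the top of a generated C++ file.
// With include_timestamp == false, the same input always yields
// byte-identical output, which makes the generated code reproducible.
std::string file_header(const std::string& source_name, const EmitOptions& options) {
    std::string header = "// Automatically generated from " + source_name + "\n";
    if (options.include_timestamp) {
        char buffer[32];
        std::time_t now = std::time(nullptr);
        if (std::strftime(buffer, sizeof buffer, "%Y-%m-%d %H:%M:%S", std::gmtime(&now)) > 0) {
            header += "// Generated at " + std::string(buffer) + " UTC\n";
        }
    }
    return header;
}
```

With timestamps disabled, generated files can be checked in or diffed between runs without spurious changes, which is what would make it practical to try compiling the transpiled code on other platforms.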
perlun added a commit that referenced this issue Apr 26, 2024
#465)
@perlun
Collaborator Author

perlun commented May 8, 2024

This is the main feature being worked on in the current 0.5.0 milestone, but it won't be finished when we carve out the 0.5.0 release. Moving to 0.6.0.

@perlun perlun modified the milestones: 0.5.0, 0.6.0 May 8, 2024