
Compiler Engineer Job

andychu edited this page Jun 2, 2022 · 84 revisions

Summary

The Oil project needs a compiler engineer with experience in C++ and garbage collection to help "finish" the project! As of April 2022, we have a 50K euro grant from NLnet to pay someone, and I'm also collecting donations with GitHub Sponsors.

Overview

  • What is it?
    • Oil is a new Unix shell. It's our upgrade path from bash to a better shell and runtime! It's also for Python and JavaScript users who avoid shell.
  • What do we need done?
    • Work on a 4K-8K line translator in Python, and a 3K-10K line garbage-collected runtime in C++, with an eye toward making it run the Oil interpreter. (For each component, we have code that passes significant tests, including many end-to-end tests. It's a proof of concept and some of it may need to be rewritten. Let's talk about it!)
    • This is a job very much in need of solid engineering! (i.e. it's not a research project)
  • Why are you doing it this way? What progress have you made?
  • How do I apply?
    • Please send me an email at andy@oilshell.org. Let me know how you heard about the project, and what your interests and experience are. We can also chat on https://oilshell.zulipchat.com/ or have a video call.
    • It may make sense to have some kind of "paid intro period", which could involve making a few failing tests pass in Oil. The idea is that after a short period of work, you should know if it's something you want to do more of.
  • How long does it last?
    • At least 3 months, but I can easily imagine 12 - 24 months of work. See "Flexibility" below -- it is OK to take on part of the project.
  • How much does it pay?
    • If you're interested in the work, I will do everything I can to meet your compensation requirements! It's funded by a grant, and I hope that if we can show concrete results, we can get another grant.

Code Overview

I made an HTML page that lists the code you'll be working with and working near: https://www.oilshell.org/release/latest/pub/metrics.wwz/line-counts/for-translation.html.

Note the line counts are quite small! This is not a 100K line project; it's more like 10K lines. (The big components are inputs and outputs to the compiler, not code we need to write.)

Skills Sought

In order of importance:

  1. Hard-won C++ experience and knowledge
    • Generating correct C++ code with a translator (i.e. C++ that works with all compilers)
    • Debugging it, analyzing its performance, and optimizing it
    • Comfortable using standard tools like gdb / CLion, ASAN, etc.
  2. Understanding of Garbage Collection
    • We have a working garbage collector, but I found this to be one of the most difficult parts of the project!
  3. Test-driven and terminal-based workflow (on some kind of Unix)
    • A large part of the job is very metrics-driven; the idea is to "make more tests pass". I've found that this strategy enables a lot of creativity and productivity!
  4. Type systems, and the relationship between types and garbage collection.
    • We may want to write our own type checker rather than relying on MyPy, but we can discuss it.
    • If you understand this Mozilla blog post, that's a good sign: Clawing Our Way Back to Precision (2013)
  5. Python
    • Most of the code is written in Python. However I think this can be learned on the job, whereas the C++ parts can't.

General attributes desired:

  1. You should consider yourself a "finisher". You should be able to prioritize work and not get lost in micro-optimization (although I hope to get to the point where we can do some fun micro-optimizations!). Again, this is not a research project; the goal is to make a production quality shell.
  2. You should have good communication skills, and be able to explain your work. (I encourage applicants in any country, though English is used for all docs and communication.)
    • Bonus: if you like writing blog posts. I frequently do this, e.g. with posts tagged #project-updates, and I find it helps me organize work and attract new contributors.
  3. Generally speaking, you should be excited about the high level goals of the Oil project. The blog should not be boring to you :-)

Good Signs ...

  • If you think our C++ is ugly! That means you have ideas on how to make it better. What exists is a proof of concept, designed to show the strategy will work and can perform well. There are many improvements that can be made, and certain parts that must be rewritten. If you're convinced a complete rewrite is necessary, then maybe you can make an argument it's feasible, justified by a survey of the code?
  • If you enjoy debugging C++ code! And then writing tests to make sure the bug never comes back.
  • If you like using ASAN, profilers, and other such tools (uftrace). Maybe you have a nice debugger configuration.
    • (note: Oil has a GDB pretty printer for ASDL data structures)
  • If you can read the existing code in oil-native! If not, the job probably isn't a good fit.
  • If you understand how Rust is influenced by C++ (positively and negatively) and ML, that's a good sign. In a similar way, Oil is written with algebraic data types at the core, but we also want it to be efficient.
  • Understanding the Mozilla blog post above -- or better, pointing to even more relevant references!
    • This post is relevant since we also have a precise collector. I didn't find that many documents describing such issues on real world, deployed language projects. Our GC is also meant to be 100% portable C++.

Flexibility

I don't want to "overspecify" the job. I can see it going a few ways.

  1. One person could "own" the whole task of translating Oil to C++. That means you can totally rewrite it if you don't like the solutions I've come up with! I would like to learn some things from you!
  2. We could split the project in half, and have someone work on the Python front end, and someone work on the runtime. I would act as the coordinator.
    • This might lose some opportunity for creative, holistic solutions. I like to solve problems end-to-end.
  3. Or any other division of labor that makes sense, and moves the project forward.

So when I talk to people about this job, I'd like to know what kind of role you're interested in.

Subprojects

There are a few different starting points / approaches I can imagine.

  • Continue increasing the number of passing spec tests without garbage collection. This is useful for dipping your toes in, and I think easier than working with the garbage collector.
  • Enhance the existing mycpp codebase to translate oil-native with garbage collection. This work should be guided by spec tests. (As mentioned on the blog, it currently works only on mycpp/examples. It still needs to be hooked up to the oil-native translation.)
  • Rewrite the Python front end with Python 3.10 pattern matching.
    • At first, this should be guided by the tests in mycpp/examples.

Trial Period

I understand that most people have never "worked for" an open source project! I've tried to reduce and explain the risk by documenting everything on the blog (see below).

If you want to jump in but aren't sure you'll like it, I think a one- or two-week trial period for both sides makes sense. You would still get paid for your work, but you wouldn't be under any obligation to finish "everything".

I also understand that many people qualified for this job are likely employed!

Background Knowledge / Links


Compiler Engineer Notes
