Design nucleus. #1

Closed
creationix opened this issue Jun 2, 2016 · 53 comments

Comments

@creationix
Member

The basic goal of this project is to implement a tiny core runtime that contains libuv, javascript and some essential C libraries (like openssl) needed to re-implement node.js in userland as modules.

If possible this will be backend agnostic and allow multiple JS engines.

See also nodejs/node#7098

@creationix
Member Author

I think a first step would be to define an interface which all backend implementations should adhere to. One of the goals here is to avoid C/C++ addons so we don't need to worry about a public facing C or C++ API for addons initially.

Since it's a royal pain to try and get all JS engines to conform to a least-common-denominator interface, let's instead have independent implementations for each engine that all match some JS interface spec. Then modules written for one runtime will work for all so long as they don't use language features unique to a particular runtime (V8 and Chakra for example have most/all of ES6 while duktape is mostly ES5, but has lua style coroutines).

Once we have a common interface, we can start implementing the C parts for the various runtimes. I humbly suggest the I/O parts be directly designed to match libuv.

@Fishrock123
Member

Fishrock123 commented Jun 2, 2016

I think this will also be greatly simplified if we use things such as @mscdex's work to make the DNS resolver pure JS instead of using c-ares (nodejs/node#1843), and likewise the http-parser (nodejs/node#1457).

@Fishrock123
Member

> I think a first step would be to define an interface which all backend implementations should adhere to.

Sounds like @trevnorris's original API WG goals.

@trevnorris

:-) I've had a long-standing goal to create an API for node that serves as a strict entry point into C++, which basically all code in lib/ would use. Alas, time constraints.

@mscdex

mscdex commented Jun 2, 2016

As I mentioned in the linked DNS PR, it is difficult to get even close to matching the performance of c-ares/libc, even when using node's C++ UDP bindings directly. That pretty much rules out any performance issues in js land, so the C++ layer would have to be improved (if possible) to be able to compete with c-ares and/or the system resolver.

Regarding http, I haven't compared benchmarks since @indutny incorporated the JS stream stuff that bypasses js land when doing http parsing, so I'm not sure how the pure js http parser fares anymore.

@creationix
Member Author

As far as DNS resolving goes, in luvit we have two paths. One is pure Lua on top of libuv's UDP primitives and is used for advanced queries. For basic resolving of domain names to IP addresses, we use libuv's getnameinfo and getaddrinfo, which I believe use the system library on a thread pool. We have had no performance issues with this. Both are pure script on top of what libuv provides natively, which will be provided in the C core.

@creationix
Member Author

Let's not get too tripped up on edge performance issues. The goal here isn't to win synthetic benchmarks with the vanilla flavor of the minimal core. We will have options where people can build different flavors of the core with various libraries included (like openssl, cares, http_parser, etc). If you're deploying a large enough system where these performance issues are actually a problem, then you don't mind compiling a little C code. But for most projects and development workflows, this is not critical.

In luvi, there are two main flavors known as "tiny" and "regular", with the biggest difference being that regular includes openssl and a couple of lesser-used C addons in its core. For many cases, http servers don't need openssl since they are running behind a reverse proxy anyway that handles the TLS termination. Things like MD5, SHA1, etc. can usually be handled just fine (and sometimes even faster) in pure script.

@indutny

indutny commented Jun 2, 2016

Wow, I really like this proposal. I was working on something similar recently:

It is a modular C stream implementation. Not sure how useful it is, but it could be a good enough interface for interactions between C addons.

@creationix
Member Author

creationix commented Jun 2, 2016

@indutny I saw those. I've always said libuv should have an extension community where things are written in C and can be used by all runtimes that consume libuv. Those could be included as well in the core if they are tiny (which I expect) and in optional addons if not.

If they are only to be consumed by other C code and it doesn't make sense to expose them to JS, that's fine. It will still be useful for addons to core that can use them.

@creationix
Member Author

I started a new issue for designing the libuv -> js mapping interface that all implementations must adhere to. #2

@indutny

indutny commented Jun 2, 2016

uv_link_t by itself is very small. uv_ssl_t is a bit bigger.

@creationix
Member Author

creationix commented Jun 2, 2016

So I think we could say a nucleus implementation contains:

  1. A JS runtime engine
  2. libuv
  3. Bindings to libuv for said engine (exposing the standard interface)
  4. Glue to make applications.
  5. Other optional C modules.

I think for part 4, we should follow the pattern in luvi. This means including minimal code for reading zip files. I have a modified version of miniz that I've bugfixed and added missing features that works great for this and is super tiny.

This will expose a bundle API that allows scripts to read from a virtual filesystem that can be either a zip file (standalone or appended to nucleus) or a folder on disk.

We will also have some minimal hooks that make bootstrapping a require system in userland less painful. For example, it can look for a file bundle:deps/require.js or something and auto-run it if it exists before running bundle:main.js.
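The bootstrap hook described here can be sketched roughly as follows. This is only an illustration under assumptions: the in-memory `makeBundle` object, its `exists`/`read` methods, and the shared `env` object are hypothetical stand-ins, not the actual nucleus bundle API, which would read from a zip file or a folder.

```javascript
// Hypothetical bundle: an in-memory map standing in for a zip file or folder.
function makeBundle(files) {
  return {
    exists: function (path) {
      return Object.prototype.hasOwnProperty.call(files, path);
    },
    read: function (path) {
      if (!this.exists(path)) throw new Error("No such file in bundle: " + path);
      return files[path];
    }
  };
}

// Run one bundled script, handing it a shared `env` object (an assumption,
// used here so the bootstrap file can register things for main.js).
function runFile(bundle, path, env) {
  var fn = new Function("env", bundle.read(path));
  return fn(env);
}

// Boot sequence: auto-run the optional bootstrap file first, then main.js.
function boot(bundle) {
  var env = { log: [] };
  if (bundle.exists("deps/require.js")) {
    runFile(bundle, "deps/require.js", env);
  }
  runFile(bundle, "main.js", env);
  return env.log;
}

var bootLog = boot(makeBundle({
  "deps/require.js":
    "env.log.push('bootstrap'); env.require = function (name) { return name; };",
  "main.js": "env.log.push('main saw require: ' + typeof env.require);"
}));
console.log(bootLog.join("\n"));
```

The point of the sketch is only the ordering: the core knows nothing about modules, it just runs one conventional file before the main file if that file exists.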

@mmicko

mmicko commented Jun 2, 2016

Do not wish to disappoint you but there is already something similar in works https://github.com/saghul/sjs

I have to say I am interested in this kind of project, since I think these can replace Lua with JavaScript (which more people are familiar with) for scripting their software. Also, there is a large library of nodejs-compatible JavaScript modules; making it possible to use them from the application itself would be a great plus.

My suggestion would be to go for C++11/14 support, and not just plain C. Exposing an API and enabling users to expose their classes to JavaScript is very useful. There is the LuaBridge project for Lua that enables you to expose your classes and objects to a Lua engine. Doing a similar out-of-the-box solution would make integration with user code even easier.

Note that Lua and duktape are quite similar in design so similar patterns can be used.

If you go this or similar way "you have my axe" :)

@dlmanning

Am also super into this.

Plain C makes for easier interop with whatever other language one might be interested in calling from, (e.g. rust).

@creationix
Member Author

@mmicko I'm not disappointed, I know about sjs and even linked to it in the parent conversation in the nodejs issue. From my initial browsing, however, sjs is much more high-level and opinionated than what this project is aiming for.

@Fishrock123
Member

> Glue to make applications.

@creationix We'll probably need a good amount of process, and some sort of module... bootstrapping at least. (Or maybe we just use ES modules?)

Is that what you meant by "glue"?

@mmicko

mmicko commented Jun 2, 2016

@creationix good to hear that

@dlmanning understand that C API is easiest to combine with other languages, just pointing that C++11/14 support would be quite welcome

@creationix
Member Author

@mmicko Also since we're fixing the interop level at the JS interface exposed by the C/C++ backend we don't need to standardize on a language/version. The duktape backend might be all C89 while the V8 backend will obviously have some C++ involved. The common glue layer can even have multiple implementations if needed as long as the JS interface matches the spec. This is why it's important to define the interface clearly.

@Fishrock123
Member

Fishrock123 commented Jun 2, 2016

Note: using just ES modules is quite incompatible with the current node ecosystem, so we'd still have to have some module bootstrapping available for the module module, I think.
(& It would probably still have to be passed to scripts implicitly, like require. ...So it would probably have to be a part of the nucleus, I think.)

@creationix
Member Author

creationix commented Jun 2, 2016

I don't want the module system to be part of the core glue. All we need is some conventions for bootstrapping a module system of choice. I really don't want things like node's global process in this layer.

For the curious, you can see how luvit accomplished this. Both process and require are userspace in modules.

@creationix
Member Author

creationix commented Jun 2, 2016

@Fishrock123 I envision two parts.

  1. The core API will provide things like loading files by path, scanning directories, getting cwd, getting environment variables, getting path to main binary.

    It would also expose the JS runtime with API functions for compiling strings into code (with filename and ES goal type)

  2. The hook will simply auto-run a file with a certain filename so that it can self-register before the main file is run.

Would this not be enough? What APIs exactly would need to be provided for a module system to be implemented?

For luvit's require which is modeled after node's I basically needed:

  • scandir(path) -> stream/list of filenames with type
  • readfile(path) -> file contents (in lua strings are 8-bit binary safe, no text encoding)
  • pathjoin(...parts) -> path
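To make the list above concrete, here is a rough sketch of a node-style require built only from a `readfile` and a `pathjoin`. It is an illustration under assumptions, not luvit's actual loader: `makeLoader`, the in-memory `sources` map, and the resolution rule (append `.js`, resolve relative to the requiring file) are hypothetical, and `scandir` is left out entirely.

```javascript
// Join path parts, collapsing ".", ".." and duplicate slashes.
function pathjoin() {
  var absolute = false;
  var out = [];
  for (var i = 0; i < arguments.length; i++) {
    var part = String(arguments[i]);
    if (i === 0 && part.charAt(0) === "/") absolute = true;
    var segments = part.split("/");
    for (var j = 0; j < segments.length; j++) {
      var seg = segments[j];
      if (seg === "" || seg === ".") continue;
      if (seg === "..") {
        if (out.length && out[out.length - 1] !== "..") out.pop();
        else if (!absolute) out.push("..");
        continue;
      }
      out.push(seg);
    }
  }
  var joined = out.join("/");
  if (absolute) return "/" + joined;
  return joined || ".";
}

// Build a tiny module loader on top of a readfile(path) -> string function.
function makeLoader(readfile) {
  var cache = {};
  function requireFrom(dir) {
    return function (name) {
      var path = pathjoin(dir, name);
      if (!/\.js$/.test(path)) path += ".js";
      if (cache[path]) return cache[path].exports;
      var mod = cache[path] = { exports: {} };
      var fn = new Function("module", "exports", "require", readfile(path));
      // Nested requires resolve relative to the directory of this file.
      fn(mod, mod.exports, requireFrom(pathjoin(path, "..")));
      return mod.exports;
    };
  }
  return requireFrom;
}

// Hypothetical in-memory files standing in for the real filesystem.
var sources = {
  "lib/add.js": "module.exports = function (a, b) { return a + b; };",
  "lib/math.js": "exports.add = require('./add');",
  "main.js": "module.exports = require('./lib/math').add(2, 3);"
};
var requireFrom = makeLoader(function (path) {
  if (!Object.prototype.hasOwnProperty.call(sources, path)) {
    throw new Error("ENOENT: " + path);
  }
  return sources[path];
});
var result = requireFrom(".")("./main");
console.log(result);
```

Everything here lives in script; the only thing the core has to supply is a way to read a file and a way to turn a string into code.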

@creationix
Member Author

creationix commented Jun 2, 2016

@Fishrock123 I think the simplest way to expose the builtin C modules without depending on a module system is to have some global object (like global.NUCLEUS) that exposes the various builtin modules. Userspace module systems could then expose a uniform interface where require('uv') simply returns global.NUCLEUS.uv, but require('some-other') is handled by the custom loader.
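A minimal sketch of that idea, with the caveat that the `NUCLEUS` object below is a hand-written stub (the real one would be installed by the C core) and that `makeRequire` and `customLoader` are hypothetical names chosen for illustration:

```javascript
// Stub standing in for the global object the core would provide.
var NUCLEUS = {
  uv: { description: "libuv bindings (stub)" }
};

// Userspace require: builtins come straight off the NUCLEUS object,
// everything else is delegated to whatever loader the app chooses.
function makeRequire(builtins, customLoader) {
  return function (name) {
    if (Object.prototype.hasOwnProperty.call(builtins, name)) {
      return builtins[name];
    }
    return customLoader(name);
  };
}

var appRequire = makeRequire(NUCLEUS, function (name) {
  // Placeholder loader; a real one would locate and evaluate a file.
  return { loadedBy: "custom loader", name: name };
});

console.log(appRequire("uv") === NUCLEUS.uv);
console.log(appRequire("some-other").loadedBy);
```

The core stays unaware of `require` entirely; the uniform interface is purely a userspace convention layered over one global.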

@domenic

domenic commented Jun 2, 2016

You could even call it process.binding :trollface:

@creationix
Member Author

@domenic As I told @Fishrock123 in IRC, I'd like to avoid any name clashes with anything existing in node so I don't have to worry about matching semantics. This layer needs to have as little opinion as possible.

@creationix
Member Author

Also, process.binding will go away if this ever lands in core. And it will assuredly have a different shape.

@creationix
Member Author

creationix commented Jun 2, 2016

@Fishrock123 I wrote up the beginnings of a README with the parts that are currently designed. This should help solidify the design goals a little.

@creationix
Member Author

@dlmanning see #3

@dlmanning

@creationix : I am not as funny as I think I am...

@drom

drom commented Jun 3, 2016

@creationix It would be nice if nucleus were available as a library for C++ embedding. I have used jxcore for this purpose: https://github.com/jxcore/jxcore/blob/master/doc/native/Embedding_Basics.md and quite liked it. But it is not supported anymore ;(

@creationix
Member Author

@drom I'm not sure there would be much in here apart from what's provided in the JS engines and the bindings. I'll try to make the various bindings independent enough that they could be used embedded in other projects.

@chrisdickinson

Hi! I'm poking at something along the same lines over here. It builds and runs on linux (ubuntu trusty) and OSX thus far, and glues v8 to libuv & uv_link_t using gn.

It currently leans on a hacked-up version of chromium's build/ dir, which I'm tearing apart to get to the salient bits. The idea is to get it running on windows, osx, and linux first, then rewrite the build dir's gn stuff in a cleaner way to get to that end.

The experiment is thus:

  1. Get a minimal project that includes v8, libuv, and the various uv bits @indutny has been putting together building everywhere.
  2. At that point build in & expose fs, tcp, and tls bindings and a module system (via require) to js.
    • I might do this in a separate project using gclient & gn to pull in the minimal binding layer.
  3. Whenever a node global (process) is accessed, or a node builtin module is required require('fs'), short circuit the lookup to require('@nojs/node-<target>').
  4. Long term goal is to get npm install working and bundle npm with the project.

My (handwave-y) plans are — and you'll each probably find something you like and something you dislike here:

  • Steer closer to TC39:
    • The minimal API will use Promise. async is coming.
    • No streams at first. Possibly include streams from WHATWG's ReadableStream spec later.
  • Steer closer to (newer) Google tools:
    • Build with gn and gclient, keep deps up to date with gclient sync.
  • Focus on FFI. (Insert so much 👋 handwaving 👋 here)
    • With an eye towards @indutny's heap.js & mmap.js, explore exposing mmap in order to create callable executable code from JS (possibly only for core functionality, but maybe not.)
    • Binary compat with Node later.
      • @dominictarr had the excellent idea that the build tools should be dockerized.
  • Stick with Node's decision on ES modules. If Node zigs, Nojs zigs. No zagging, never zagging.
    • Interoperability/backcompat is key.

In other words: I think this project and nojs are probably going to be walking along the same path for a bit, though it seems like eventually we'll have different goals. I'm happy to share the build code I've hacked together. Maybe making it easier to grab a compilable, working copy of libuv+v8 & friends will let a thousand Nodes bloom.

@indutny

indutny commented Jun 3, 2016

@chrisdickinson looks very cool! Though, you probably would like to use jit.js instead of heap.js, since the latter one is a JS VM Heap implementation...

@creationix
Member Author

@chrisdickinson thanks for the feedback. Indeed our goals are slightly different. Also I'll be starting with duktape and jerryscript as sample implementations of this interface, as I abhor C++ and that steers me away from V8. Once I have things stable it would be awesome to use your code to make a V8 implementation.

Also the scope of this project seems to be a bit slimmer. I won't have any opinions at all regarding streams, promises, etc. I just want to provide a common base for tools to be built.

@chrisdickinson

@indutny Ah indeed! I was thinking about repurposing this code to do the hop from JS to compiled code.

@creationix Cool — I wish you the best of luck! I'd definitely encourage checking out gn as a metabuild tool, it's slightly opaque but is pretty slick after a bit of use. I'm collecting a list of possibly handy links on the process of gluing stuff together.

@indutny

indutny commented Jun 3, 2016

@chrisdickinson https://github.com/js-js/jit.js/blob/master/src/jit.cc#L56-L96 ;)

@dominictarr

I am certainly of the opinion that @creationix's opinion-lite approach is the way to go. Streams should definitely not be in the "core"; there are way too many opinions in streams. We even have @creationix's min-streams and my pull-streams because we couldn't agree on one thing, and they are incredibly simple!

I think a project like this is really a C project, it looks like it's about javascript but it's not. It's about finding a way for C libraries to easily plug into a thing, it seems to involve javascript, but would that even be necessary?

There are totally legitimate reasons not to include certain C libraries (personally, I'd like to be able to exclude openssl and build in libsodium instead; this would be ideal for secure decentralization projects). Clearly there are also different JS engines that target different use cases (jerryscript is for low resource use, while v8 is for performance).

I think that means that the particular C libraries used need to be lightly coupled, I just need to pull them in by editing a config (or package.json)

@drom's point about embedding as a library would be super valuable too. That would make this easy to deploy as an Android app: just write a Java binding to it and then embed it directly into the same process.

@dominictarr

but @chrisdickinson I think you are right about FFI. It's too hard to write a node binding, if you could just call a C function from "javascript" then we are done. Is that what you are thinking here?

@dominictarr

even if I have to put the args I am calling into a buffer, that is still easier than the current way to write node bindings.
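As a rough illustration of what "putting the args into a buffer" could look like from the JS side, here is a sketch using standard typed-array APIs. The packing layout (consecutive little-endian 32-bit integers) and `fakeNativeAdd` are assumptions made up for this example; a real FFI would pass the buffer's memory to an actual C function rather than to a JS stub.

```javascript
// Pack a list of 32-bit signed integers into an ArrayBuffer, little-endian.
function packInt32Args(args) {
  var buffer = new ArrayBuffer(args.length * 4);
  var view = new DataView(buffer);
  for (var i = 0; i < args.length; i++) {
    view.setInt32(i * 4, args[i], true);
  }
  return buffer;
}

// JS stub standing in for a native function that reads two int32
// arguments out of the buffer and returns their sum.
function fakeNativeAdd(buffer) {
  var view = new DataView(buffer);
  return view.getInt32(0, true) + view.getInt32(4, true);
}

var sum = fakeNativeAdd(packInt32Args([40, 2]));
console.log(sum);
```

Even in this form, the caller only has to agree on a byte layout with the C side, which is far less ceremony than writing an engine-specific binding per function.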

@dominictarr

I should also point out that you don't actually need a module system. If you can run one javascript file, then you can statically link the javascript. i.e. with browserify, or noderify (which is assembled from browserify parts to make node.js scripts start really fast)

@creationix
Member Author

creationix commented Jun 3, 2016

Initial core API is documented in the README and I just prototyped a duktape version (minus libuv and zip reading) that you can see in action.

  • main.js This is the entry point of a sample app. It doesn't provide a require system and instead uses dofile directly to manually load its minimal libraries.

See it in action https://asciinema.org/a/b0yk23l05yhrw9mlp0uqik6pp

@creationix
Member Author

creationix commented Jun 3, 2016

@dominictarr while it's true you don't need a module system, I do love a workflow that doesn't have build steps. As I demonstrated in the asciicast, you can run apps directly out of the source tree while developing without needing to rebuild the final binary. If the JS needs to go through a build step it breaks this simple workflow.

@dlmanning

Given that JS now has a module system in its specification, it would seem strange to not build it in, no?

@dominictarr

@dlmanning sure, if you are using a javascript engine that implements modules, then you could have that. The engines that @creationix is talking about starting with jerry-script and duktape both implement ES5.1

@dlmanning

@dominictarr sorry, I missed the bit about starting with JerryScript

@chrisdickinson

@dominictarr:

> but @chrisdickinson I think you are right about FFI. It's too hard to write a node binding, if you could just call a C function from "javascript" then we are done. Is that what you are thinking here?

Yep!

@dlmanning: Notably, the module system is only ~sorta implemented in stable V8's as well (flagged and, IIRC, incomplete.)

@dlmanning

dlmanning commented Jun 3, 2016

@chrisdickinson : sure, it's a work in progress, but it's in progress.

(Don't worry, I have no desire to turn this thread into another ES Modules debate)

@trevnorris

One aside about import: it's not possible to resolve a path at runtime. That makes development of native modules a little more painful when you simply want to run:

$ NODE_DEBUG=1 ./node_g /path/to/my/module

and have it automatically pick up the Debug build of the binary. Setting up the application in this way, I'd assume there would be more than a few native modules written to extend the basic functionality.

@dlmanning

@trevnorris : Seems like it would be good to provide a separate means of deliberately loading dynamically?

@matthewp

matthewp commented Jun 3, 2016

Good choice on splitting the module system into user-land. I agree both with @creationix that having one is good for development and with @dominictarr that they aren't needed for production. Is main.js as an entry-point going to be configurable? I'd like to have a separate dev.js and prod.js so I can do both.

This is going to be amazing for transpile-to-js languages, you essentially get statically linked small(ish) binaries for free if you just choose JS as your target.

@creationix
Member Author

@matthewp luvi has an option to override the entry point, but it's tricky designing the CLI without resorting to environment variables that can cause security vulnerabilities.

That said, you can have a main.js that loads a real main of your choice based on some env or argument.
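One way such a dispatching main.js might choose its target is sketched below. The `--main=` flag, the `ENTRY_POINTS` table, and `pickMain` are hypothetical names for this example only; restricting the choice to a fixed whitelist is one possible way to limit the risk mentioned above, not something nucleus or luvi prescribes.

```javascript
// Whitelist of entry points the dispatcher is allowed to load.
var ENTRY_POINTS = { dev: "dev.js", prod: "prod.js" };

// Pick the real main file from the argument list, defaulting to prod.
function pickMain(args) {
  for (var i = 0; i < args.length; i++) {
    var match = /^--main=(.+)$/.exec(args[i]);
    if (!match) continue;
    if (!Object.prototype.hasOwnProperty.call(ENTRY_POINTS, match[1])) {
      throw new Error("Unknown entry point: " + match[1]);
    }
    return ENTRY_POINTS[match[1]];
  }
  return ENTRY_POINTS.prod;
}

console.log(pickMain(["--main=dev"]));
console.log(pickMain([]));
```

The returned filename would then be handed to whatever file-loading primitive the runtime exposes.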

@matthewp

matthewp commented Jun 3, 2016

I assume you mean dynamically load the real main? That would defeat the purpose of it being "statically linked". I mean, this is not a real issue, just a nicety. Can always have your build script / makefile do:

mv main.js _main.js
browserify _main.js > main.js
nucleus ...
mv _main.js main.js

@creationix
Member Author

@matthewp of course. If you want something done at build time, do it with your build tool. If you want something done at runtime, do it with your runtime. :)

@creationix
Member Author

I think the core design is now stable-ish and mostly documented in the README. I'm going to close this for now. Create new issues as problems come up.

Thanks everyone for the feedback and encouragement. See you at nodeconf if you're going!
