proposal: a strategy to modularize core #7098

Closed
dominictarr opened this issue Jun 1, 2016 · 38 comments
Labels
discuss Issues opened for discussions and feedbacks. meta Issues and PRs related to the general management of the project. module Issues and PRs related to the module subsystem.

Comments

@dominictarr
Contributor

  • Version: 7.0.0
  • Platform: any
  • Subsystem: modules

Developing modules within node has proven difficult, since user code does not specify which version of the core modules it depends on. This means that any breaking changes to those core modules may break an unknown number of node modules on npm and in people's applications.

The idea of removing most or all of the modules in npm was first floated (as far as I am aware) by @isaacs https://vimeo.com/56402326 and has since been kicked around. but how would we actually pull this off?

The first step would be just publishing the core modules on npm - many of the core modules are already published, often as browser shims, such as https://www.npmjs.com/package/buffer
but other times as basically unmaintained modules like https://www.npmjs.com/package/net

I'm sure something could be figured out with the current owners of those modules.

Now, the tricky part is that many of the core modules depend on a part written in C.
C addons are a massive pain in the butt as it is. If you did npm install http and it compiled the http_parser that would be a worse hell.

so, npm install http shouldn't compile anything, and we don't want to break current node. how do we do this?

What about this: move the core modules into separate repos under the node org; before a release, npm install them into node; and then node builds node/node_modules into the node binary statically.

This would also have the significant advantage that to distribute a node app that requires binary addons, you could build it into a custom statically linked node, and use that.

The modules should also be split into a -down part which contains the C part, and then an -up part which contains the javascript. The javascript usually changes a lot more than the C, we found this pattern in level and it works well.

When you require a core module, the same resolution process would apply as it does now, it will look in the node_modules folder, and then fall back to the list of core modules.
so, you could completely change the api in http but if the C part http_parser was backwards compatible, you wouldn't have to recompile anything. npm would just need to check what version of core modules the current instance of node has, and it would then not install those unless necessary.
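The proposed lookup order can be sketched as follows. Note that this is the order described in the proposal; current node resolves built-in names before it ever looks at node_modules. The function name `resolveCoreModule` and the shape of both maps are hypothetical, and node's real resolver walks the filesystem rather than taking plain objects.

```javascript
'use strict';

// Hypothetical sketch of the proposed lookup order: a copy installed in
// node_modules wins, and the version bundled into the binary is the fallback.
// Both arguments are plain objects mapping module name -> { version }.
function resolveCoreModule(name, installed, bundled) {
  if (Object.prototype.hasOwnProperty.call(installed, name)) {
    return { source: 'node_modules', ...installed[name] };
  }
  if (Object.prototype.hasOwnProperty.call(bundled, name)) {
    return { source: 'bundled', ...bundled[name] };
  }
  throw new Error(`Cannot find module '${name}'`);
}

// Made-up data: http was installed locally at a newer version, net was not.
const bundled = { http: { version: '1.0.0' }, net: { version: '1.0.0' } };
const installed = { http: { version: '2.0.0' } };

console.log(resolveCoreModule('http', installed, bundled).source); // node_modules
console.log(resolveCoreModule('net', installed, bundled).source);  // bundled
```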

Or, those modules could just skip their build step if they detect they are already built into the running instance of node.
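One way such a check might look, assuming the native part reports its version through `process.versions` (which exists today and lists bundled dependencies such as `v8` and `uv`). The helper name `canSkipBuild` and the "same major version" compatibility rule are assumptions made for this sketch, not an existing npm or node feature.

```javascript
'use strict';

// Returns true when the running node already bundles a native dependency
// whose major version matches the one the package needs.
// `versions` defaults to process.versions; the compatibility rule
// (same major version) is an assumption for this sketch.
function canSkipBuild(depName, requiredVersion, versions = process.versions) {
  const builtIn = versions[depName];
  if (typeof builtIn !== 'string') return false; // not bundled at all
  const major = (v) => parseInt(v.split('.')[0], 10);
  return major(builtIn) === major(requiredVersion);
}

// Example with a made-up versions table:
const fakeVersions = { http_parser: '2.7.0' };
console.log(canSkipBuild('http_parser', '2.5.1', fakeVersions)); // true
console.log(canSkipBuild('http_parser', '3.0.0', fakeVersions)); // false
console.log(canSkipBuild('not_bundled', '1.0.0', fakeVersions)); // false
```

A package's install script could call something like this and exit early instead of invoking the compiler.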

The part that I don't know about would be setting up node's build script to change the hard-coded deps to look in node/node_modules instead. C build systems are not currently my strong suit, but surely there is a way this can be done?

@Fishrock123 Fishrock123 added discuss Issues opened for discussions and feedbacks. meta Issues and PRs related to the general management of the project. labels Jun 1, 2016
@jasnell
Member

jasnell commented Jun 1, 2016

while I'm certainly in favor of the idea if a way can be found for it to work, I have deep concerns over the feasibility and usability. Another possible approach would be to have core modules published to npm in addition to being bundled, similarly to what we do with readable-stream. That, however, brings along its own host of challenges and leads down a path straight to Java ClassLoader Hell. It's definitely worth exploring and discussing but I'm definitely skeptical.

@mscdex
Contributor

mscdex commented Jun 1, 2016

@jasnell One big problem that we've seen with readable-stream is that it'd be hard for node core to make changes to core modules that are then published to npm because many people will peg their dependency versions in ways (e.g. explicit versions or 1.1.x) that would cause compatibility issues with node.
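To illustrate the pinning problem, the sketch below shows why a dependency pegged to `1.1.x` or to an exact version never picks up a later release. It is a deliberately minimal matcher written for this example, not the full semver grammar that npm implements; `matchesXRange` is a hypothetical name.

```javascript
'use strict';

// Minimal matcher for exact versions and x-ranges such as "1.1.x".
// This is NOT the full semver grammar; it only compares dot-separated parts.
function matchesXRange(range, version) {
  const want = range.split('.');
  const have = version.split('.');
  return want.every((part, i) => part === 'x' || part === '*' || part === have[i]);
}

console.log(matchesXRange('1.1.x', '1.1.14'));  // true
console.log(matchesXRange('1.1.x', '2.0.0'));   // false
console.log(matchesXRange('1.0.33', '1.0.34')); // false
```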

@mscdex
Contributor

mscdex commented Jun 1, 2016

One problem with splitting modules into separate repos would be keeping up with all of the PRs and issues and for users to be able to easily search across all of them. Imagine someone has a problem and so they go to search the issues/PRs of a particular module's repo where they think the problem lies and find nothing, despite there being an issue about the particular problem in a repo for a different core module.

@mscdex mscdex added the module Issues and PRs related to the module subsystem. label Jun 1, 2016
@vkurchatkin
Contributor

This basically requires designing a whole new API layer, rewriting node from scratch and rebuilding all modules on top of that. Doesn't seem realistic.

@eljefedelrodeodeljefe
Contributor

To mitigate the native addon pain, introducing a lightweight toolchain would be an option (there is a lot of precedent in other languages). Relying on gyp and python, and especially not having an interface to them, is the worst part currently. In my experience, doing stuff with cl on NT is about as pleasant as with gcc. However, this has only gotten easier recently.
A strong point would be the coding style being forcibly more self-contained. Currently I see a lot of intertwined (IMO) stuff that makes it hard for me, as a newcomer.

My point is that building shouldn't be the biggest obstacle.

This whole endeavour is as ambitious as it would be truly innovative.

@dominictarr
Contributor Author

@jasnell that is basically the approach I'm advocating. node would essentially become a curation of native add ons, when you install node in the usual way it would include a curated set of addons that are basically backwards compatible with what we have currently.

@mscdex you raise quite a good point. That is definitely a problem with github on very modularized projects. About the best they have is to search within an org.

@Fishrock123
Member

Maybe we keep it in one repo on github? Would that have a significantly negative impact?

It might be a little tricky, but I think it's workable to have a workflow like now, where the latest is on master, and there are branches for the older versions of the parts.

Has anyone tried that on this scale before?

@dominictarr
Contributor Author

@Fishrock123 you could check in node_modules, then you know exactly what you are getting.

@dominictarr
Contributor Author

@eljefedelrodeodeljefe it has recently come to my attention that chrome (which gyp came from) has since moved on to ninja, which looks like it might be that lightweight build tool you are looking for.

@creationix
Contributor

creationix commented Jun 2, 2016

Has anyone tried that on this scale before?

This is essentially what I did with the luvit 2.0 rewrite. I spent about a full year of my time working on the migration and rewrite and couldn't be happier with the result.

For those that don't know the history, luvit 1.0 was a re-implementation of node using Luajit and Lua instead of V8 and JS. It has the same APIs, a similar build system, and a very similar architecture, with luajit, libuv, openssl, etc. compiled into a single binary and the core modules embedded in the binary as compiled bytecode and available globally. It had a very node-like command line interface and repl as well.

Working on the core modules was a nightmare and upgrading to newer versions of libuv was painful because of the deep coupling throughout the entire system. I had faithfully duplicated many of node's mistakes.

One problem with splitting modules into separate repos would be keeping up with all of the PRs and issues and for users to be able to easily search across all of them

In luvit, the luvit/luvit repo contains all the core modules that were part of luvit 1.0. They are in one repo, but published to lit (our equivalent to npm) as individual packages with a meta-package that depends on them all. They are versioned independently with proper interdependencies declared.

Having them all in one repo keeps things localized and this pattern has worked for other frameworks like the weblit framework (something like express or koa).

C addons are a massive pain in the butt as it is.

Yes they are. With the luvit 2.0 rewrite, I made two new projects at different levels of abstraction.

One is the luv project which is purely libuv bindings for lua. It can be used by other lua projects. It doesn't add any sugar (no emitters, no streams, no generators/coroutines, etc), but simply exposes libuv in as straightforward a manner as possible where it makes sense in the target language. (using lua multiple return values instead of C outargs for example).

I wish there was a V8 equivalent. That alone would be an awesome project and could live independent of node just like luv lives independent of luvit. For someone who knows both the V8 and libuv APIs, this is a smallish project. It took me a few weeks to write luv from scratch.

Then the next layer is known as luvi. It basically acts as the global C glue. In this single binary is contained a build of luajit, openssl, miniz (our zlib alternative), and luv. There is just enough glue code to bootstrap a module system that's implemented in userland in pure lua. Everything else is pure lua and not part of the C compile step at all.

Luvi has a sort of bundle virtual filesystem where it can work on a zip file or a directory and treat that tree as the core application. This abstraction means that you can work on core modules without any kind of build system whatsoever. Also to make building the final binary as painless as possible, luvi looks for a zip file appended to itself at startup and uses that zip as the application.
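The appended-zip lookup can be sketched in a few lines. A zip archive ends with an "end of central directory" record (signature bytes `PK\x05\x06`) that stores the size and offset of the central directory, so the start of an archive glued onto a binary can be computed from the tail of the file. This is a JavaScript sketch of that general technique, not luvi's actual code; `findAppendedZipStart` is a hypothetical name and zip64 archives are ignored.

```javascript
'use strict';

const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as little-endian uint32
const EOCD_MIN_SIZE = 22;          // fixed part of the record, without comment

// Given the full contents of an executable, return the byte offset where an
// appended zip archive starts, or -1 if no archive is found.
function findAppendedZipStart(file) {
  // Scan backwards because the record sits at the very end of the archive
  // (optionally followed by a comment of up to 65535 bytes).
  const lowest = Math.max(0, file.length - EOCD_MIN_SIZE - 0xffff);
  for (let pos = file.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
    if (file.readUInt32LE(pos) !== EOCD_SIGNATURE) continue;
    const cdSize = file.readUInt32LE(pos + 12);   // central directory size
    const cdOffset = file.readUInt32LE(pos + 16); // offset inside the archive
    // The central directory ends where the EOCD record begins, so the
    // archive itself starts cdSize + cdOffset bytes before that point.
    return pos - cdSize - cdOffset;
  }
  return -1;
}

// Demo: an empty zip archive is just a 22-byte EOCD record.
const emptyZip = Buffer.alloc(EOCD_MIN_SIZE);
emptyZip.writeUInt32LE(EOCD_SIGNATURE, 0);
const stub = Buffer.from('pretend this is the luvi binary');
const combined = Buffer.concat([stub, emptyZip]);

console.log(findAppendedZipStart(combined) === stub.length); // true
console.log(findAppendedZipStart(stub));                      // -1
```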

So running the luvit application (what implements the old luvit 1.0 interface with core libs embedded) you can do it three ways.

# Run directly from disk passing in args
luvi path/to/luvit -- args for luvit
# Run directly from zip file
luvi luvit.zip -- args for luvit
# Build a binary
cat `which luvi` luvit.zip > luvit
chmod +x luvit
./luvit args for luvit

As you can see, once you have a luvi binary for your platform you don't need to ever touch a C compiler or linker again. Even if you're working on core modules, everything can be done without one.

I later added features in to the lit package manager / toolkit to do higher level things like automatically pulling in dependencies and building apps directly from urls (similar to go get).


There are still unsolved edges. The best way to include C modules is to build a custom luvi binary with your addons included; since the lua is modified more often than the C, you don't need a compiler for most work. The rackspace monitoring agent that uses luvit in production does this for some C code that does system diagnostics beyond what libuv can do.

I did add support for also bundling .so files in the zip or folder with clever tricks to dlopen them even in the case of a zip file, but that still has the same portability problems that node currently faces with binary modules.

We did get one break since luajit has an excellent ffi and ctypes interface native to the VM. I don't know if we could bundle something similar with the v8/node equivalent to luvi, but it's saved me in many cases where I could do things like create ptys in pure lua or have efficient memory structures using ctypes.

Regarding http_parser, luvit 2.0 dumped it in favor of writing the parser in pure lua and depending on the JIT to make it fast. So far performance has not been a problem and we have one less C dependency to worry about.

V8 is pretty amazing at this kind of stuff as well, and the cost of calling C++ from JS is much higher than the cost of calling C from Luajit, so there is even more pressure to write more of core in JS. To prove my point I once wrote an MD5 implementation in pure JS that was faster than node's openssl bindings to native MD5. The reason I won was twofold: I hit a sweet spot in the JIT, and the overhead of calling C++ made openssl's version much slower than C++ alone.

@creationix
Contributor

So the end result of all this work is that once you install lit on your system (which is basically luvi + lit.zip concatenated), building the legacy luvit interface as a CLI app is simply:

lit make lit://luvit/luvit

It will sync down all the objects for the latest release of luvit (any objects seen before will be cached and already local), build a zip file, download (or load from cache) the appropriate luvi binary (possibly extracting its own if it matches the requirements) and build the luvit binary. No C compiler or linker needed!

These are exactly the same steps for building any other app. If you want a custom build of luvit with different builtin libraries, write your own, declare your dependencies and publish it. It will be installed the exact same way. If you want to make some other CLI app, use only the core deps you want or none at all. You can even swap out the require system for something custom if you want. The hooks in luvi are generic. I even found a clever way, using a shebang line, to not include a copy of luvi in each binary, but only a single line containing #!/usr/local/bin/luvi -- concatenated with the zip of bundled files.
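A sketch of the shebang variant: instead of prepending the whole luvi binary, one interpreter line is prepended to the zip bytes. Many zip readers locate the archive from the end of the file, which is why leading bytes are usually tolerated. `makeShebangBundle` is a hypothetical helper written for this illustration; the interpreter path is the one quoted above.

```javascript
'use strict';

// Prepend a shebang line to zip bytes so the OS hands the file to an
// already-installed luvi instead of embedding a copy of it.
function makeShebangBundle(zipBytes, interpreter = '/usr/local/bin/luvi') {
  const header = Buffer.from(`#!${interpreter} --\n`, 'utf8');
  return Buffer.concat([header, zipBytes]);
}

// Four placeholder bytes stand in for a real zip file here.
const bundle = makeShebangBundle(Buffer.from([0x50, 0x4b, 0x05, 0x06]));
console.log(bundle.slice(0, 24).toString()); // #!/usr/local/bin/luvi --
```

The result would still need to be written to disk and marked executable, as in the `chmod +x` step shown earlier.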

I wish node had this ability, how can we make it happen?

@trevnorris
Contributor

I'd like to nip this and ask you to file an EP. It'll be much easier to follow if there's a single document that's kept up-to-date with the ongoing conversation.

@eljefedelrodeodeljefe
Contributor

@domenic I have wanted to work on a js(-y) build system for some time now. Maybe I'll give it a shot now. The point is that any build system eventually just spawns the compiler binaries and is essentially just tooling around configuration and dev workflow, like checking timestamps. That can be swiftly done in js also. It's just a huge amount of effort to reach maturity. afaik, ninja heavily integrates with gyp and cmake (which is even harder to get onto users' machines than python) + ninja (imo) will face a similar fate as every other Google infra tool - unmaintained until dead or useless :)

Yeah, maybe we need a document to capture ideas and progress on that - maybe as an eps.

@rvagg
Member

rvagg commented Jun 2, 2016

While my read is that most of us here like this ideal and would love to make this happen (I've been one of the ranters joining in on the no.js bandwagon for a long time), I think we're a bit too far down the stability path to make this happen from the inside. My suggestion for how to pull this off is to explore it with a completely new implementation and work it up till it has "node compatibility", or at least a "node compatibility mode" that we could potentially use and replace core with if we were able to determine the risks were low and performance was acceptable

A starting point might look something like https://github.com/defunctzombie/libuv.js or https://github.com/chrisdickinson/nojs and an ideal goal of a system like that (IMO) would be to be super minimal in the core but have the ability to bundle with more stuff, allowing for a stripped down minimal thing as well as something that can be made in to what people expect when they download "Node.js".

@creationix
Contributor

@trevnorris Do you think there is actually any chance node will make such a huge change itself? I'm not sure an EP is the way forward. Perhaps I misunderstand their purpose.

@rvagg I agree. If there is honest interest in this, I'll happily start building the foundation. I already have libuv bindings for duktape that are fairly complete and are actively used upstream by the duktape project itself for various utilities.

If we are going to go through the effort to essentially rewrite and re-architect the internals of node from the ground up, we should probably make the JS engine pluggable. With luvit this wasn't too hard since all known implementations of lua have very similar C APIs. With JS, we have a lua-like C interface in duktape, the smart-pointer and object based C++ interface in V8 as well as JSC, Chakra and possibly others. I wonder if nan's design could be used to make the engine pluggable. I especially like how dukluv doesn't need C++ at all since duktape and libuv and openssl are all plain C.

Would a node EP be the proper place to discuss this if we are going to do as @rvagg suggests and build it outside of node with an eventual goal of full compatibility? Personally I would use the minimal core directly and create my own abstractions as there are many parts of the node core APIs that I very much dislike. The minimal core can live apart from node proper the same as luv, luvi and lit live apart from luvit.

@creationix
Contributor

Another possible starting point is @saghul's SJS project: https://github.com/saghul/sjs.

@Fishrock123
Member

Fishrock123 commented Jun 2, 2016

@creationix I don't think it is out of the question, so long as the default binaries ship with the current node feature set so as to not break people's workflows and programs.

If we are going to go through the effort to essentially rewrite and re-architect the internals of node from the ground up, we should probably make the JS engine pluggable. With luvit this wasn't too hard since all known implementations of lua have very similar C APIs. With JS, we have a lua-like C interface in duktape, the smart-pointer and object based C++ interface in V8 as well as JSC, Chakra and possibly others. I wonder if nan's design could be used to make the engine pluggable. I especially like how dukluv doesn't need C++ at all since duktape and libuv and openssl are all plain C.

That feels like very large scope creep, I'm not sure that's the best way forward here. (As much as I may like it.)

@trevnorris
Contributor

@creationix

Do you think there is actually any chance node will make such a huge change itself? I'm not sure an EP is the way forward. Perhaps I misunderstand their purpose.

Then why would we bother discussing it in node's issue tracker? Point here of the EP is to maintain a document that's kept up to date with the ongoing discussion so users don't need to read through every post to understand where things stand. Though I say this assuming it would aggregate more than the average number of comments.

@trevnorris
Contributor

Would a node EP be the proper place to discuss this if we are going to do as @rvagg suggests and build it outside of node with an eventual goal of full compatibility?

Possibly. Though it also may be simpler to just open a new repo, since most of this will need to be written from scratch, and discuss it there.

@dominictarr
Contributor Author

sorry @trevnorris I think @rvagg is certainly right. my intention posting it here was more to engage the node community - it's probably gonna be a lot of work until it can be fully compatible with node. and for the core node team to announce "this is the direction for node" would be a pretty big risk, and therefore a silly idea. That said, this is clearly an idea that intersects with the interests of people who may be, at least some of the time, developers of node core.

@rvagg
Member

rvagg commented Jun 3, 2016

I'm pretty sure it won't be hard to get a critical mass working on something in this direction, there's enough pent up frustration with how locked in we are atm. We (the NF technical team) could even officially encourage experimentation although I don't think it'd be wise to attach to any particular effort this early on. At some point we could bring one of the experiments in to the nodejs org if it was a good fit, or use one as a reference to start again on a node-ng. I'd be interested in hearing people's thoughts on how we can best facilitate this kind of effort without giving the wrong signals to users who expect stability and a long future for what they know as Node today.

@alexjeffburke
Contributor

Hi, I know this won't be a popular point of view but I think it's important to have all angles heard - hope it will prove useful and add to the discussion.

I have been convinced and gradually come around to the opinion that a small solid core is a very good thing, but I think as it stands node is already quite sparse. I'm certainly not an immediate fan of splitting even more out. At the very least, I'd like us to discuss what it would mean to split up core, the benefits and the trade-offs involved before jumping to how it would concretely be done.

For example, there seems to be a presumption about removing things like http - imo, and this applies particularly in the case of http/https, having them in core has allowed node to be immediately useful in a way that must have helped its adoption. It's such a fundamental protocol that having a stable, debugged, baseline implementation that works and works pretty well is a really good thing. (Without muddying the waters, it'd actually be my preference for us to include a decent http2 implementation.) In addition, having require('http') be a canonical thing means a library like mitm is possible, which is important to me as I am a co-author of an HTTP mocking library that heavily leverages this.

Perhaps to round off I can make a few suggestions about other ways we could tackle some of these issues:

  • bundle modules with core, but expose their versions
    Just like the node version is exposed, expose a dictionary of core module names -> versions. We could then promote. In that model node looks more like a distribution. Could that allow developers to only conditionally include things like readable-stream and otherwise use the one in base?
  • provide an API for swapping in alternate compatible implementations
    I saw some discussion previously about doing something like that for the HTTP parser, but what if in my node program I could set a module as what is returned by, for example, require('http')? There would be a useful default implementation, people could use a specific one for their use case, but the incredibly useful property of 'standard' module names would be kept.
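Both suggestions can be sketched together as a small registry: bundled modules carry a version that programs can inspect, and a program may swap in a compatible implementation under the same well-known name. Everything here is hypothetical, including `createCoreRegistry` and its method names; node exposes no such API.

```javascript
'use strict';

// Hypothetical registry: bundled modules carry a version, and a program may
// swap in a compatible implementation under the same well-known name.
function createCoreRegistry(bundled) {
  const overrides = new Map();
  return {
    // name -> version, analogous to how process.versions exposes dependencies
    versions() {
      const out = {};
      for (const [name, mod] of Object.entries(bundled)) out[name] = mod.version;
      return out;
    },
    // Replace what require(name) returns, but only for known core names.
    override(name, impl) {
      if (!(name in bundled)) throw new Error(`Unknown core module: ${name}`);
      overrides.set(name, impl);
    },
    require(name) {
      if (overrides.has(name)) return overrides.get(name);
      if (name in bundled) return bundled[name].exports;
      throw new Error(`Cannot find module '${name}'`);
    },
  };
}

const registry = createCoreRegistry({
  http: { version: '1.2.0', exports: { kind: 'default http' } },
});
console.log(registry.versions());           // { http: '1.2.0' }
console.log(registry.require('http').kind); // default http

registry.override('http', { kind: 'mock http' });
console.log(registry.require('http').kind); // mock http
```

A mocking library could call `override` during tests, while everything else keeps using the standard name.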

Thanks, Alex J Burke.

@dominictarr
Contributor Author

@alexjeffburke there has always been a spectrum of positions regarding the organization of core modules. Some, like me, have advocated moving modules out of core, and others have advocated adding modules (such as request, or mkdirp) into core. For the record, I'm not advocating breaking core - I'm advocating making core more flexible. The difference is that instead of those modules being integrated as a monolith, they would be integrated as a curation.

The soul of this idea is about removing opinions; that means it needs to also work better for people whose opinions differ from mine.

@piranna
Contributor

piranna commented Jun 7, 2016

What about starting by publishing all pure-Javascript modules on npm in independent repos, and showing a deprecation warning when using the bundled ones for some major releases so people can start to include them in their package.json files? This way we could leave a pure C++ addons environment as bundled modules in the Node.js binary, and after that we could start to think about replacing them with pure-Javascript alternatives (published on npm, following the same deprecation path) or, for the hard ones, reimplementing them using Nan and publishing them on npm too. Ideally, in the long term I think the Node.js binary would not need to have anything beyond v8, libuv and the require() function...
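The deprecation step could lean on `util.deprecate`, which already exists in node and emits a DeprecationWarning the first time the wrapped function is called. The sketch below wraps a loader for bundled modules; `loadBundled` and the warning text are made up for illustration and do not reflect how node's own `require` is implemented.

```javascript
'use strict';

const util = require('util');

// Wrap the loader for bundled modules so that using it emits a
// DeprecationWarning (once per wrapped function) while still working.
// The message text is hypothetical.
const loadBundled = util.deprecate(
  (name) => require(name),
  'Bundled core modules are deprecated; add them to package.json instead.'
);

const events = loadBundled('events'); // warns on first call, returns the module
console.log(typeof events.EventEmitter); // function
```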

My two cents :-)

@saghul
Member

saghul commented Jun 7, 2016

Exposing just C++ modules sounds like a nice goal, but I'm afraid it would be more difficult than it sounds. Since those C++ modules are meant for use with their JS counterparts, error checking is usually split: some things are easier to check in JS, and then in C++ you just assert, since they are not meant to be consumed directly.

Moreover, right now the C++ side could change in every possible way as long as it "looks" the same on JS land. Publishing current JS modules on npm implies making a promise on how the interaction with their C++ counterpart works, which leads to a broader API surface to maintain.
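The split described above looks roughly like this. The "binding" is a plain JS stand-in written for this sketch (a real one would be C++ and would abort the process on a failed assert); `fakeBinding`, `writeBytes` and `write` are all hypothetical names.

```javascript
'use strict';

// Stand-in for a native binding: it trusts its caller and only asserts,
// which in real C++ would abort the process rather than throw.
const fakeBinding = {
  writeBytes(fd, buffer) {
    if (!Number.isInteger(fd) || !Buffer.isBuffer(buffer)) {
      throw new Error('assertion failed in binding'); // models a hard crash
    }
    return buffer.length;
  },
};

// The public JS wrapper does the friendly validation before calling down.
function write(fd, data) {
  if (!Number.isInteger(fd) || fd < 0) {
    throw new TypeError('fd must be a non-negative integer');
  }
  const buffer = Buffer.isBuffer(data) ? data : Buffer.from(String(data));
  return fakeBinding.writeBytes(fd, buffer);
}

console.log(write(1, 'hello')); // 5
```

Publishing the JS half separately would turn the unwritten contract between `write` and `writeBytes` into a second public API, which is the concern raised here.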

@piranna
Contributor

piranna commented Jun 7, 2016

Couldn't the C++ part be stripped down to the minimal functionality (mostly bindings to the library APIs), with all the other functionality moved to the Javascript modules? This probably would reduce the API surface...

@saghul
Member

saghul commented Jun 7, 2016

@piranna Reduce?! It basically doubles it. Now you have two API surfaces, the JS side and the C++ side.

@piranna
Contributor

piranna commented Jun 7, 2016

Touché, I was thinking only about the C++ one.

@creationix
Contributor

@piranna I think your idea is a great way to eventually migrate people to a smaller core, but for the reasons @saghul mentioned, we first need to find a way to make the C bindings a supportable interface so that said modules published to npm can consume them.

I'm actively working on an experiment to build this from the ground up with the eventual goal to have enough userspace modules to re-implement node's core libraries. I'll expose things like libuv using their native API styles and make the core more friendly to bundling arbitrary JS modules so that nothing needs to be included by default.

@piranna
Contributor

piranna commented Jun 7, 2016

I'm actively working on an experiment to build this from the ground up with the eventual goal to have enough userspace modules to re-implement node's core libraries. I'll expose things like libuv using their native API styles and make the core more friendly to bundling arbitrary JS modules so that nothing needs to be included by default.

Yeah, that's just what led me here, I love the idea of your project! :-D

@Trott
Member

Trott commented Feb 15, 2017

This issue has been inactive for sufficiently long that it seems like perhaps it should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. Just tidying up, not acting on a super-strong opinion or anything.

@Trott Trott closed this as completed Feb 15, 2017
@piranna
Contributor

piranna commented Feb 16, 2017

This issue has been inactive for sufficiently long that it seems like perhaps it should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. Just tidying up, not acting on a super-strong opinion or anything.

I'm still interested in this, but it's true that the discussion got stalled... Maybe the point is that nobody showed code here. Hope the issue gets reopened and the debate about how to do it starts again.

@eljefedelrodeodeljefe
Contributor

@mcollina might have made some efforts on the side.

@mcollina
Member

@piranna @eljefedelrodeodeljefe nodejs/node-eps#49

@eljefedelrodeodeljefe
Contributor

In my eyes @mcollina's efforts are or can be mature enough in the short term to actually have a practical implementation of it. So reopening would seem appropriate to me, if Matteo wants to continue with it, since factually it seems to be the right thing to do.

@mcollina
Member

I don't think discussing it here is the right place. Let's aim to land the streams EPS, and pull off that refactoring. I think it will work out well, and then we can move it forward by extracting other things.

@dominictarr
Contributor Author

I opened this more to start a discussion, which it did. node has gotta use its issue tracker to actually manage things they are working on, so okay to close this, will take this discussion to other venues

@Fishrock123
Member

I don't actually think this is inactive in practice, however. Consider what we are doing for the node debug successor, node inspect: vendoring in the node-inspect module.

Truth is, anything like this is a pretty big step and not exactly a comfortable one to make. I think node inspect will probably help us learn a lot about vendoring a module in such a way, possibly the basis for doing more like it.
