Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Usage of Loader hook as APM/RASP vendors #508

Closed
vdeturckheim opened this issue May 1, 2020 · 12 comments
Closed

Usage of Loader hook as APM/RASP vendors #508

vdeturckheim opened this issue May 1, 2020 · 12 comments

Comments

@vdeturckheim
Copy link
Member

vdeturckheim commented May 1, 2020

I did not find a similar issue but let me know if this is a duplicate.

I build and maintain a RASP tool (basically a security APM) that relies on dynamic instrumentation:
When a module is loaded through require, I keep a reference on it so I can later monkeypatch some methods from this module.
(In some cases, I have to fallback on a more regular APM case where I monkeypatch methods when the module is loaded because it's not possible to do otherwise, but for performances reasons, delayed monkeypatching is the best solution for my use case).

I believe such thing should still be possible with the current implementation of loader hooks (let me know if I am missing something).
However, there are two points regarding which I could not wrap my head around to see how my module would fit with loader hooks:

Multiple APMs

It is pretty usual for my customers to use my module along with a regular APM. Currently, the APM
and the RASP intercept somehow the same things in the require internals and most of the time it works well (when it does not we fix our own code to be reliable over the impact of the other module and we quickly message the maintainer to let them know).

With loader hooks, having multiple instrumentation tools seems complicated. Is there a guideline on how to do that?
How would you recommend we ensure that a vendor does not becomes exclusive and prevents anyone else from instrumenting the same process?

UX

Currently, APMs and RASPs can be instanciated by doing require('<APM or RASP>') as the first line of executed code (or the second one if the users has another instrumentation tool). An alternative is to use the -r flag when starting the node process.

This is a pretty great UX for users right now as they have the choice to either commit some code that will enable the module or update the starting script. Depending of the internal organisation of the teams, installing the module falls in the hands of the dev or the ops people. This flexibility is very useful as sometimes we have to deal with interal tensions within the users' orgs and offering an alternative shows itself useful.

Would there be any chance to imagine a dev-based way of instanciating a loader hook?

Bottomline

I believe the first of this two concerns is pretty critical before ES modules can be considered as stable: this could lose customers to some instrumentation companies!

I am sure I can find some time to work in the next few months on this if needed (that would postpone Audit Hooks (1) (2) but that's probably acceptable).

@mcollina
Copy link
Member

mcollina commented May 1, 2020

From my perspective, this is needed to consider our ESM implementation production-ready for companies deploying Node.js servers.

@bmeck
Copy link
Member

bmeck commented May 1, 2020

@vdeturckheim

With loader hooks, having multiple instrumentation tools seems complicated. Is there a guideline on how to do that?

There is none, and there is contention in the Modules WG with some wanting there to not be multiple loaders.

There is an ancient PR about allowing multiple loaders that stalled : nodejs/node#18914 . For now the WG seems to be stalled on moving anything like such a PR forward and it is left to user land to solve the issue unless we find some new consensus.

How would you recommend we ensure that a vendor does not becomes exclusive and prevents anyone else from instrumenting the same process?

I believe this is out of scope by the nature of modifications, a loader (in CJS or ESM) can prevent access and/or cause incompatibility by the mere nature of modifying the same modules/internals. I'm not sure I understand the question given that.

Would there be any chance to imagine a dev-based way of instanciating a loader hook?

By dev-based, you mean not by providing a CLI argument/ENV when starting up node? Due to ESM timing, it couldn't realistically be done in the same workflow as currently is done.

import 'apm';
import 'y';

Would link y prior to apm executing and thus prevent apm from instruming y. So this kind of "import it to make it work" won't be viable.

Per package loaders are another stalled PR for similar reasons on the multiple loaders concern at : nodejs/node#18233 , such a method of out of band configuration has been discussed and would allow a method to have loaders that are configured without controlling the process runner/environment at startup. An alternative would be to have something like node policies' application level configuration file rather than per package, which could then ensure only 1 loader is applied.

@GeoffreyBooth
Copy link
Member

There is none, and there is contention in the Modules WG with some wanting there to not be multiple loaders.

I don't think it's quite as bad as this. I think there's consensus that we want this to be possible, we're just not sure how. For example, should Node provide an API for defining the sequence of multiple loaders, or leave it to userland to invent some kind of orchestrating loader the way that a task runner like Gulp defines how its plugins run in sequence or in parallel. The latter is possible today, should someone want to build such a tool; it's an open question whether such functionality needs to be part of Node core, especially since whatever we build would likely not be as full-featured as what userland could come up with.

Would there be any chance to imagine a dev-based way of instanciating a loader hook?

Per the JavaScript spec, by the time user code runs all the resolving and linking and parsing of all code has already occurred. (@bmeck can correct me if I'm oversimplifying here.) The ESM loaders provides a way to inject customization into those earlier phases, which isn't really part of the spec; and that kind of thing isn't possible at all in browsers, for example. (The equivalent there would be perhaps some kind of proxy that intercepts the browser's call to the server for a JavaScript file, and the server returns a modified one.)

So it sounds like what you're asking is, can an app or a package define a loader in a way that's not a CLI flag to node? Like a package.json "loader" key or the like. And the answer is . . . maybe? That would raise a lot of design questions, though, and would only really help the use case where a user can provide code for Node to run but can't set the Node CLI flags, and I'm not sure what a realistic example of such a use case would be. Some kind of cloud function?

@bmeck
Copy link
Member

bmeck commented May 1, 2020

@GeoffreyBooth that is correct.

I think the ask is to configure it on an application level, not per package so package.json would be potentially problematic.

@vdeturckheim
Copy link
Member Author

@bmeck @GeoffreyBooth thanks a lot both of you for your answers.
Bottomline the 2 questions are really:

  • can it be an alternative to the --loader flag?
  • how to do when 2 APMs are installed?

can it be an alternative to the --loader flag

I did not think about package.json. I am not sure if that's what you suggest but introducing a loaders entry which takes an array of modules in the application's package.json could do the trick for most of the time:

{
  "loaders": ["apm1", "apm2"],
  "dependencies": {
    "apm1": "x.y.z",
    "apm2": "a.b.c"
}

Wdyt?

how to do when 2 APMs are installed?

I see why you would want to let it to userland. TBH, there have been way too many situations where this put every APM/RASP vendors in a bad situation that I would strongly advocate against.
Leaving the ecosystem find a solution in userland can lead to a situation where a single vendor can decide to be exclusive to other (and therefore prevent newcomers on the market and highly impact existing companies) or more simply bring chaos. I spent a lot of time debugging issues in the way some tools I don't maintain nor work for just because they change some behavior improperly. Not having a clear way of doing APMs with ES modules will bring divergent hacky solutions. This will impact the instrumentation market at large and can also have a big trust impact from Node.js users who expect to use APMs in their applications as they do in other runtimes.

Do you think it would be possible to un-stall the topic? I can definitly spend time on that and it would probably be doable to bring some other APM vendors at the table.

Unrelated

Recently, I had a discussion with @bengl and we started thinking about providing an API in Node.js that woul look a bit like the javaagent one to provide a collaborative and clean way of instrumenting applications. This is pretty much linked to the work on Audit Hooks that resonates in the Securyt and the Tooling WGs.

@MylesBorins
Copy link
Contributor

MylesBorins commented May 1, 2020 via email

@vdeturckheim
Copy link
Member Author

Loaders are a hammer that make every esm problem look like a nail 😅

That's so true! Is there a pointer regarding the code coverage work? I probably missed it :/

Also, I had a great discussion with @bmeck and another one with @targos over the weekend and they helped me get on track (well, they help me a lot!) to explore other solutions for all of this. Hopefully there will be some proposals soon!

On a side note, I am amazed by the great work that has been/is done on ES modules!

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented May 9, 2020

/to @vdeturckheim, sorry for the late reply (wrapping up the semester).

I built out a prototype of a custom loader demonstrating the APM use case a few months ago. I’m not sure if you saw it, but it rudimentarily demonstrated what you had requested:

  • Multiple APMs
  • UX

Since there hadn’t been any activity in the space of so-called “composable loaders” at the time (nor am I aware of any currently), the solution for these two actually turned out to be quite similar.

Unfortunately, that prototype was a throwaway and currently exists (unmaintained) as a private repo of mine. It stopped being useful (and stopped evolving) due to lack of feedback (participation) from folks looking to implement APM loaders. Believe it or not, your requirements aren’t excessively rare and seem to be quite similar to what other APMs have been requesting.

We could potentially collaborate on a few prototype development cycles if that sounds like something that would pique your interest. From my perspective, the task of stabilizing custom loaders appears to be most approachable by actually fleshing these things out and getting some usage out of them.

@vdeturckheim
Copy link
Member Author

vdeturckheim commented May 14, 2020

@DerekNonGeneric , I missed your message too (sorry about that)😅 so no problem here.

I would actually love to iterate on this with you! What is a good way to kick it off? Could we jump on a call in the next week(s)?

@DerekNonGeneric
Copy link
Contributor

Could we jump on a call in the next week(s)?

Yeah, that sounds good! Can you add me on Google Hangouts?

@vdeturckheim
Copy link
Member Author

After a few iterations, I am pretty confident there is a way to compose loaders (thanks a lot to @bmeck , @DerekNonGeneric and @targos here!).
There are multiple ecosystem implementations and iterating around it to bake something will work out.
There are still a few UX questions but they should resolve along the discussion of building something for multiple loaders.

Probably better to close this issue for now :)

@bmacnaughton
Copy link

bmacnaughton commented Jul 3, 2020

@MylesBorins https://github.com/bcoe could one of you expand on/point me at the code:

One thing worth exploring for an apm solution could be direct hooks into V8
similar to what @bcoe did with code coverage.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants