Meta data for llnode - Node.js version 8 #50

mhdawson · 2017-11-21T22:27:56Z

In discussions at NodeConf EU I learned that we don't have the meta data needed for llnode in Node.js version 8.

It sounds like the change to the turbofan pipeline in V8 meant that there were so many changes that those who usually maintained the meta data were not able to keep up.

I think we need to

Identify who (sadly I don't know off the top of my head who was doing this valuable work recently) had been doing the work in order to get some context on how to do it efficiently.
Put together a plan and volunteers to help get the data up to date for Version 8
Identify volunteers to keep the data up to date
Put testing in place/notifications so that we have CI failures when the data is not up to date. We may want some sort of step that validates this as part of the release process.

@yunong who mentioned this to me
@MylesBorins who asked if we had a in issue covering this.

joyeecheung · 2017-11-22T05:12:26Z

I am interested in helping this one out. I am working on nodejs/llnode#143 to bring Errors back, and worked on thin strings & stub frames in nodejs/llnode#121 before. Also working on nodejs/llnode#144 to get the CI output the actual errors from lldb instead of silently timing out.

BTW also working on the JS API nodejs/llnode#147 , although need to think about how to compile both/one of the addon and the plugin when installed by npm.

cjihrig · 2017-11-22T06:24:30Z

Do you have specifics on what isn't working? For example, the errors implementation seems to have changed in V8 like Joyee mentioned, but as of nodejs/llnode@3ab289b, all of the postmortem metadata appeared to be loading properly, and the test suite was passing with TurboFan (on my machine anyway - we do need a better CI story there).

mhdawson · 2017-11-22T14:24:21Z

@yunong can you list what you was not working for you ?

mhdawson · 2017-11-29T20:49:33Z

ping @yunong

mhdawson · 2017-11-29T20:52:37Z

@cjihrig do you know who has been updating the metadata (since it sounds like it has been done). Would like to understand the process and see if we have it documented somewhere.

cjihrig · 2017-11-29T21:36:12Z

I did the bulk of the metadata update. As far as I know, it's not documented anywhere. It mostly involves seeing what metadata no longer exists (which llnode will tell you) and then jumping through V8 commits to find out what happened to it.

A big problem is that the metadata can disappear from V8 and Node, and doesn't present any issues until someone is trying to do postmortem debugging with llnode. I'll work on putting together a test in core to try to catch this scenario. Another option is to make compilation fail when certain constants don't exist. This currently happens on SmartOS - constants disappear, and src/v8abbr.h or the ustack helper simply don't compile.

mhdawson · 2017-11-29T22:05:24Z

@cjihrig thanks, it would be a good step to have tests that will tell when we have problem with the metadata.

How often is has an update needed ? If it is for most V8 versions then we probably want to try to have the update process integrated into the steps used to update V8 in core.

bnoordhuis · 2017-11-29T23:33:24Z

Practically every major V8 upgrade and infrequently after minor or patch updates.

Paradoxically, before the V8 team started maintaining the postmortem script we knew about most breaking changes because the build would break after an upgrade. :-)

to have the update process integrated into the steps used to update V8 in core

llnode is complex enough that I wouldn't want to inflict that on someone upgrading V8.

jclulow · 2017-11-29T23:48:09Z

to have the update process integrated into the steps used to update V8 in core

llnode is complex enough that I wouldn't want to inflict that on someone upgrading V8.

One might argue that by updating V8 without doing the work to ensure the post mortem metadata is correct, you're actually inflicting broken software changes on users.

Ensuring the post mortem bits still function is part of the real cost of updating V8, and the process for updating V8 should reflect that.

bnoordhuis · 2017-11-29T23:56:36Z

Easy to say when you're not doing the work. If Joyent has an interest in postmortem work, they should at least try a little harder to keep the upstream smartos build in working shape.

jclulow · 2017-11-30T00:06:53Z

This isn't really a Joyent-specific issue at all. We tend to use mdb_v8; this issue seems to be centred on llnode, a similar but unrelated body of software that also consumes the post mortem metadata.

bnoordhuis · 2017-11-30T00:14:26Z

I was specifically referring to nodejs/node-v8#21, which is smartos-specific and was ultimately fixed by people with zero interest in that platform.

I'll make you a deal: your company makes upgrades frictionless on the smartos front and we'll look into keeping the postmortem metadata in sync across upgrades. That's in your best interest too, right?

jclulow · 2017-11-30T00:21:17Z

I'm not really sure what motivates your apparent hostility, but nonetheless would like to note that @cjihrig works for Joyent. You'll see him earlier in this thread, where he talks about working on the metadata update. He also seems to have engaged on, and fixed, the other issue you're referencing.

MylesBorins · 2017-11-30T00:28:13Z

/cc @nodejs/v8 who can maybe glean some insight from the V8 side

@cjihrig is updating the metadata something that could be automated?

Currently for the majority of V8 updates we are using https://github.com/targos/update-v8 right now.

One might argue that by updating V8 without doing the work to ensure the post mortem metadata is correct, you're actually inflicting broken software changes on users.

While I can see your point here I'm not 100% that I agree. I would like to challenge the post-mortem working group to come up with a scalable solution to this problem. We have put quite a bit of effort into keeping V8 up to date. As one of the people doing a bunch of the work I wasn't even aware that this was a problem. While I recognize that most-mortem analysis is an important feature, I don't think that we can possibly block upgrading the VM when the team working on this doesn't even know how to fix the problem.

For example nodejs/node#16271 has been open for over a month and potentially broken meta data has not come up once.

Without individuals from postmortem actively involved in the V8 upgrade process I don't think it is entirely fair to put to onus on those who are not actively invested in it.

All that being said, I'm totally willing to chip in on the work if we can find a reasonable way to get it done

bnoordhuis · 2017-11-30T00:31:45Z

I didn't know Colin worked at Joyent. That makes my irritation at smartos being broken with every upgrade a little less severe although I'd still like it a lot better if you proactively kept it in working shape, like IBM does for its platforms.

I was going to circle back to what @jclulow wrote earlier:

Ensuring the post mortem bits still function is part of the real cost of updating V8, and the process for updating V8 should reflect that.

But at this point I'd just be repeating what @MylesBorins said: it should be up to the stakeholders to make it work. That includes me, by the way, and could be you, @jclulow.

cjihrig · 2017-11-30T00:38:44Z

@cjihrig is updating the metadata something that could be automated?

No, I don't think so. It would be a lot better if V8 had some sort of test that alerted when one or more of the symbols disappeared. We don't find out that it has happened until @targos updates V8 in node-v8 and the compile on SmartOS fails. Even then, that will only catch things that are used by mdb_v8. llnode uses a mostly (but not completely) overlapping set of metadata, but gives no type of error notification until someone is trying to use it to debug a core file.

davepacheco · 2017-11-30T01:04:08Z

I would like to challenge the post-mortem working group to come up with a scalable solution to this problem.

The best solution would be for V8 to grow first-class postmortem analysis tools (i.e., maintained with V8). Obviously, that's not a small ask. But until that happens, debugging tools like mdb_v8 and llnode have to re-implement a bunch of the VM internals. They encode VM implementation details that change from version to version. (If you want to know more about this, I wrote about what's involved in this.) The metadata in V8 was created to parametrize the parts of that that are likely to change frequently and easy to parametrize. But there's no way to know that the metadata is complete and correct except to actually exercise postmortem tooling against a Node version built with that metadata. When the metadata is incorrect or incomplete, there's no way to automatically fix it -- you have to understand the semantics of what changed in V8, add metadata to describe that, and potentially update the debugging tool to use it. You don't really know you've done it correctly until you've done all of that. Trying to automate this process is like trying to automate changes to a program that depends on a library whose interface is constantly changing in massive, complex ways -- because that's essentially what's going on.

One fundamental question is: is postmortem debugging (and the metadata that's used to support it) a first-class, supported part of the Node platform? If it is, then it should be tested regularly like any other supported feature. mdb_v8 has an automated test suite that could be run with each new Node version. (It doesn't have to be mdb_v8's test suite -- I just know that one well. It can be run for an arbitrary Node version, and if that passes, you have decent coverage on mdb_v8's postmortem facilities for the new Node version. Maybe the better option would be to incorporate all the major postmortem tools' test suites, since they have different sets of features that require different metadata.)

In practice, postmortem support has historically been more of a second-class feature: it's best-effort, and tools like mdb_v8 have to deal with metadata that's missing. This is a pretty major pain, but on the other hand, most of us who work on mdb_v8 can't spend a ton of time dealing with every V8 update. As a result, mdb_v8 tends to be somewhat behind in support for latest Node versions, but that's okay for most of its users.

MylesBorins · 2017-11-30T01:11:05Z

Perhaps @nodejs/build would be interested in making a CI job with the mdb_node teat suite (or llnode)

At the very least we can use that to find problems. The amount of work to keep things up to date sounds non trivial, and in unsure how to reconcile that with our current work to improve the speed of V8 updates

That being said I would like to help improve the process... It seems like at the very least we can remove the element of suprise

joyeecheung · 2017-11-30T04:34:28Z

@MylesBorins We have a llnode CI on Jenkins https://ci.nodejs.org/view/post-mortem/job/llnode-pipeline/ , although it only works with Node.js releases at the moment. We can make it work with github refs as well so we could know in advance if an V8 update is going to break llnode. (It doesn't have to block the V8 update, but we would get the notice in advance and fix sooner)

The llnode Travis CI is broken at the moment but should be good after nodejs/llnode#144 lands. Once Travis is green it should not be hard to get Jenkins working again.

mhdawson · 2017-11-30T21:54:14Z

My interest is that the more a runtime is used in production the more important post-mortem tooling is and I think it is important that we move it from "second-class" to "first-class".

We have at least one end user (@yunong) with a large deployment indicating that this not working will delay adoption of 8.x in their environment.

As with everything the challenge is how to get any additional work done on a regular basis, but as suggested a good first step is being alerted so that we know when its broken, spreading the knowledge of what work is required and what it takes to do it and then looking to see if we have an creative ways to get it done.

We had the same challenge with updates with IDDE. There were 2 parts. The first was updating the meta data. We managed to automate the generation of that. At the time it used some proprietary tools so it could not be open sourced but I believe the part that generates the meta data now has removed that part of the dependency (and possible open source in OMR) so it might be worth another look (although I've not had enough time to look at the Node.js metadata generation to know if that makes sense). The second part was the code that uses the metadata to implement the commands and those still needed to be maintained by hand.

mmarchini · 2017-11-30T22:57:34Z

On nodejs/node#14901 I proposed documenting what the postmortem metadata provided by V8 and Node are, and what it can be used for. This document could be improved with information regarding what needs to be done to keep the metadata updated.

Regarding how to test for missing or altered metadata after V8 is updated without relying on llnode or mdb_v8 is creating a script to compare the metadata available before and after v8 is updated. This approach would help us identify what constants were added, changed and (especially) deleted. I can write a prototype for that to see if it works.

hashseed · 2017-12-01T08:57:27Z

While I don't think V8 will come up with a postmortem tool on-par with llnode anytime soon, we are willing to at least help find issues early. However, it's not entirely clear to me what needs to happen.

V8 does have a build target called postmortem_metadata. Building that target works, because it simply runs src/tools/gen-postmortem-metadata.py to generate a debug-support.cc file. That generated source file is not compiled or linked anywhere, which might have detected changes or breakages.

I would welcome upstream contributions where a new build target is added. That would force the V8 team to make changes to src/tools/gen-postmortem-metadata.py when symbols are removed or renamed, and a simple git log src/tools/gen-postmortem-metadata.py would give you clues on what needs to change. Of course this might not work in many more complicated cases, for example if we add a new stack frame type, or change the stack frame layout.

But it would be a step into the right direction?

bnoordhuis · 2017-12-01T09:36:05Z

Yang, the node.js integration buildbot already takes care of that. I haven't seen a release that outright failed to build since it went online (apart from smartos - but that's joyent's job.)

The difficulty is in updating downstream consumers (e.g., llnode) but that's ultimately manual labor; @davepacheco summarized it well. That's also why...

creating a script to compare the metadata available before and after v8 is updated

...doesn't really help. The best step forward is arguably @joyeecheung's suggestion of extending the llnode CI.

joyeecheung · 2017-12-01T10:38:50Z

BTW I opened nodejs/llnode#151 which can print missing metadata required by the llnode test suites. See that OP about what is currently missing in Node v9.2.0 on MacOS (and a bit of Node v8.9.0 Linux because I use that coredump to speed up local test runs). The output there is not exhaustive because there is too much noise from invalid casts and pointer loading and I turned those off in the debug output for now.

davepacheco · 2017-12-01T15:46:22Z

V8 does have a build target called postmortem_metadata. Building that target works, because it simply runs src/tools/gen-postmortem-metadata.py to generate a debug-support.cc file. That generated source file is not compiled or linked anywhere, which might have detected changes or breakages.

It should be compiled and linked. Otherwise, the metadata wouldn't wind up in the final binaries at all. Or am I misremembering how this works?

mmarchini · 2017-12-01T15:55:19Z

It should be compiled and linked. Otherwise, the metadata wouldn't wind up in the final binaries at all.

There's a flag to compile and link the metadata. V8 has this flag set to false by default, and Node has it set to true with the exception of the Windows build. When the flag is on it will perform both actions, and not just the compilation (from deps/v8/BUILD.gn):

if (v8_postmortem_support) {
  sources += [ "$target_gen_dir/debug-support.cc" ]
  deps += [ ":postmortem-metadata" ]
}

davepacheco · 2017-12-01T15:56:43Z

While I don't think V8 will come up with a postmortem tool on-par with llnode anytime soon, we are willing to at least help find issues early.

V8 would not need to come up with a complete debugger (although that would be ideal). Another option would be to provide a library or tool with a stable interface that was capable interpreting basic structures and spitting out a description, maybe using the proposed format:
https://github.com/davepacheco/post-mortem/blob/163a65db6cde79e8deb409e55ff61c2023f15a63/docs/working/common-heapdump-format/README.md

A library could even be phrased in terms of operations like "lookup the address of this native symbol" and "read this chunk of memory". Consumers would then implement these operations. That would allow us to plug it into various existing debuggers for core files, live processes, and even existing heap dumps; and the V8 team wouldn't need to maintain that code. That's still a bunch of work, but it's much less than what's required to build a full debugger.

I'm really glad that the postmortem metadata has taken off and we've been able to build multiple useful debugging tools with it -- and all of these suggestions are great! But it's always been something of a hack. It's somewhat amazing that it's worked as well as it has. Without real support from the VM, this is always going to be brittle.

cjihrig · 2017-12-01T15:57:49Z

It looks like it is used if v8_postmortem_support is true. One problem is that gen-postmortem-metadata.py silently fails us in some cases. For example, if I change the line:

'Code, instruction_size, int, kInstructionSizeOffset',

to

'Code, instruction_size_foobar, int, kInstructionSizeOffset',

there is no indication that anything is wrong. That means that if the name changes in the V8 codebase, but not in that script, we end up with broken metadata.

hashseed · 2017-12-01T18:38:23Z

@davepacheco FWIW we actually have a tool that we use to debug crash dumps that come as part of Chrome's crash report. See v8/tools/grokdump.py. It can be run with -w for a web UI.

The crash dumps that we receive are usually not much more than 1MB in size and usually contains the raw stack and fractions of the heap. The tool shows that, annotates parts of the stack, and there is a way to tell it to assume that a certain address represents a V8 heap object. We can then deduce other known objects from that.

The way the deduction works is that we keep an updated v8/tools/v8heapconst.py around that provides the fixed addresses of initial heap objects that are immortal immovable. The v8heapconst.py commited to V8's repo assumes x64 release build, but it can be generated via mkgrokdump for other platforms too.

Every heap object's first field points to its map object. Every map object's first field points to the map map, or meta map. Since we know the fixed offset of the meta map, we can deduce the addresses of all other immortal immovable objects, including other maps. That way we can easily tell whether something on the heap represents a string, a shared function info, or a context, etc.

However, this tool is not really polished, and deals with crash dumps that are very incomplete.

davepacheco · 2017-12-01T18:50:21Z

@hashseed That's fantastic information! We could potentially use some of those same fixed addresses to bootstrap other debugging tools. Thanks for that information.

mhdawson · 2017-12-13T21:57:27Z

@yunong still hoping we can get your input on a reproduce for problems.

yunong · 2017-12-14T17:26:33Z

@mhdawson We still use mdb_v8 quite a bit for post-mortem debugging. After speaking to @cjihrig offline, it does appear that mdb_v8 isn't yet compatible with node 8.

mhdawson · 2017-12-14T21:13:09Z

@yunong thanks, @cjihrig once you are ready can you comment here on whether the issue is the metadata in Node.js or something specific to mdb_v8 so that we can figure out if we need more doc, ci etc. to alert for this issue in the future.

cjihrig · 2017-12-14T21:19:31Z

This is specific to mdb_v8. After I updated llnode, I started work on updating mdb_v8. There are a few implementation details (not related to the metadata) in mdb_v8 that are different from llnode, and haven't had the time to revisit. I have a WIP branch here. Most of the tests were passing with TurboFan the last time I checked.

PR-URL: nodejs#151 Refs: nodejs/post-mortem#50 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

PR-URL: #151 Refs: nodejs/post-mortem#50 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

mhdawson · 2018-02-07T21:03:52Z

@cjihrig do you think this needs to stay open or should we just close.

cjihrig · 2018-02-07T21:23:58Z

I think we can close this. I think your original list of issues has been addressed. Of course, conversation can continue.

mhdawson added the diag-agenda label Nov 21, 2017

mike-kaufman mentioned this issue Nov 29, 2017

Diag WG Meeting - 2017-11-29 nodejs/diagnostics#114

Closed

joyeecheung mentioned this issue Nov 30, 2017

CI for llnode? nodejs/build#777

Closed

joyeecheung mentioned this issue Dec 1, 2017

src: improve debug info printed when LLNODE_DEBUG=true nodejs/llnode#151

Merged

davepacheco mentioned this issue Dec 1, 2017

incorporate information about initial, immovable heap objects TritonDataCenter/mdb_v8#98

Open

mike-kaufman mentioned this issue Dec 13, 2017

Diag WG Meeting - 2017-12-13 nodejs/diagnostics#115

Closed

joyeecheung added a commit to joyeecheung/llnode that referenced this issue Dec 19, 2017

src: print more debug info when LLNODE_DEBUG=true

72699d8

PR-URL: nodejs#151 Refs: nodejs/post-mortem#50 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

joyeecheung added a commit to nodejs/llnode that referenced this issue Dec 19, 2017

src: print more debug info when LLNODE_DEBUG=true

4846299

PR-URL: #151 Refs: nodejs/post-mortem#50 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

mike-kaufman mentioned this issue Jan 9, 2018

Diag WG Meeting - 2018-01-10 nodejs/diagnostics#116

Closed

mike-kaufman mentioned this issue Jan 24, 2018

Diag WG Meeting - 2018-01-24 nodejs/diagnostics#117

Closed

mike-kaufman mentioned this issue Feb 5, 2018

Diag WG Meeting - 2018-02-07 nodejs/diagnostics#145

Closed

cjihrig closed this as completed Feb 7, 2018

Meta data for llnode - Node.js version 8 #50

Meta data for llnode - Node.js version 8 #50

Comments

mhdawson commented Nov 21, 2017

joyeecheung commented Nov 22, 2017 • edited

cjihrig commented Nov 22, 2017 • edited

mhdawson commented Nov 22, 2017

mhdawson commented Nov 29, 2017

mhdawson commented Nov 29, 2017

cjihrig commented Nov 29, 2017

mhdawson commented Nov 29, 2017

bnoordhuis commented Nov 29, 2017

jclulow commented Nov 29, 2017

bnoordhuis commented Nov 29, 2017

jclulow commented Nov 30, 2017

bnoordhuis commented Nov 30, 2017

jclulow commented Nov 30, 2017

MylesBorins commented Nov 30, 2017

bnoordhuis commented Nov 30, 2017

cjihrig commented Nov 30, 2017

davepacheco commented Nov 30, 2017

MylesBorins commented Nov 30, 2017

joyeecheung commented Nov 30, 2017 • edited

mhdawson commented Nov 30, 2017

mmarchini commented Nov 30, 2017 • edited

hashseed commented Dec 1, 2017 • edited

bnoordhuis commented Dec 1, 2017

joyeecheung commented Dec 1, 2017 • edited

davepacheco commented Dec 1, 2017

mmarchini commented Dec 1, 2017

davepacheco commented Dec 1, 2017

cjihrig commented Dec 1, 2017

hashseed commented Dec 1, 2017

davepacheco commented Dec 1, 2017

mhdawson commented Dec 13, 2017

yunong commented Dec 14, 2017

mhdawson commented Dec 14, 2017

cjihrig commented Dec 14, 2017

mhdawson commented Feb 7, 2018

cjihrig commented Feb 7, 2018

joyeecheung commented Nov 22, 2017 •

edited

cjihrig commented Nov 22, 2017 •

edited

joyeecheung commented Nov 30, 2017 •

edited

mmarchini commented Nov 30, 2017 •

edited

hashseed commented Dec 1, 2017 •

edited

joyeecheung commented Dec 1, 2017 •

edited