Add support for mapping field names #6

Open
JamesXNelson opened this Issue Feb 12, 2014 · 12 comments

@JamesXNelson
Contributor

JamesXNelson commented Feb 12, 2014

The sourcemap spec currently does not have any support for deobfuscating field / variable names. This is a rather complex problem, but Ray Cromwell of the GWT team has offered us a viable solution (copied below):

I've proposed it internally, but it is about more than mapping fields. JavaScript's concept of scopes doesn't match Java's; if you want to do heap or stack frame inspection, what you really need is a scope identifier. That's because if you're looking at the heap object { x: 2 }, there could be 200 different classes whose fields have been obfuscated to "x". There is no unique mapping for field "x".

Currently, source maps have a 5th field used for symbol mapping, but it only applies to the global scope (which is unique). Thus, when inspecting something in the global scope, you can deobfuscate it, but when inspecting an arbitrary heap object, or a local variable, you can't.

We have these other scopes to deal with:

  1. object scopes (field of a class)
  2. function scope (local variables of a function)
  3. intermediate scope (for function expressions, not function declarations, it's wedged between global and function)

One proposal is to just give every unique scope in the program a unique numerical scope id. So class A is scope 1, and function X on A is scope 2, class B is scope 3, etc. Then when emitting the symbol to the sourcemap, you prefix it with the scope-id, e.g. the deobfuscated symbol "foo" is stored as "2:foo" which means "foo is the deobfuscated symbol for the obfuscated symbol represented in scope 2"
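
A minimal sketch of the "scope-id prefix" idea above, in JavaScript. The helper names `encodeScopedSymbol` and `decodeScopedSymbol` are hypothetical and not part of any sourcemap library; treating an unprefixed symbol as scope 0 (global) is also an assumption made for illustration.

```javascript
// Hypothetical helper: store a deobfuscated symbol together with its scope id,
// e.g. scope 2 + "foo" -> "2:foo".
function encodeScopedSymbol(scopeId, name) {
  return scopeId + ":" + name;
}

// Hypothetical helper: split "2:foo" back into its scope id and symbol name.
function decodeScopedSymbol(symbol) {
  const sep = symbol.indexOf(":");
  if (sep < 0) {
    // ASSUMPTION: no prefix means an ordinary global-scope symbol (scope 0).
    return { scopeId: "0", name: symbol };
  }
  return { scopeId: symbol.slice(0, sep), name: symbol.slice(sep + 1) };
}

console.log(encodeScopedSymbol(2, "foo")); // "2:foo"
console.log(decodeScopedSymbol("2:foo"));  // scopeId "2", name "foo"
```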

That's the first part of the problem. The second part of the problem is defining a language-independent mechanism or API for retrieving the current scope in a debugger. For hand-coded JavaScript, for example, obj.__proto__.constructor.name might give it to you, if you consider constructors as representing class scopes. But for GWT, we have a different notion of runtime types, so ours might be obj._clazz_.seedId or some such. Dart or ClojureScript might do things differently.

Anyway, let's posit a function exists called ___objectToScope(obj) => scopeid. Now we can support field and frame mapping as follows:

Emitting Sourcemap from GWT:

  1. If class, and writing symbol for field, write symbol as ":field" or ":method"
  2. if local variable, and function has a name, then write scopeid:varname where scopeid is
    a) if non-anonymous function, "!funcName"
    b) if anonymous, "!lineNo" where lineno is the declaring line number
  3. else emit sourcemap as normal

When processing the source map, when handling the symbol column to rebuild a bidirectional map, use a multilevel map instead. You want scopeId => (obfuscatedSymbol, deobfuscatedSymbol)
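
The multilevel map described above might look roughly like this. This is a sketch only: the input shape (`{ obfuscated, symbol }` pairs) and the function name `buildScopeMap` are assumptions, and extracting those pairs from a real source map is not shown.

```javascript
// Build scopeId => { toSource, toObfuscated } from hypothetical entries such as
// { obfuscated: "x", symbol: "1:width" }.
function buildScopeMap(entries) {
  const byScope = new Map();
  for (const { obfuscated, symbol } of entries) {
    const sep = symbol.indexOf(":");
    // ASSUMPTION: symbols without a prefix belong to the global scope "0".
    const scopeId = sep < 0 ? "0" : symbol.slice(0, sep);
    const name = sep < 0 ? symbol : symbol.slice(sep + 1);
    if (!byScope.has(scopeId)) {
      byScope.set(scopeId, { toSource: new Map(), toObfuscated: new Map() });
    }
    const scope = byScope.get(scopeId);
    scope.toSource.set(obfuscated, name);     // obfuscated -> deobfuscated
    scope.toObfuscated.set(name, obfuscated); // deobfuscated -> obfuscated
  }
  return byScope;
}

// The same obfuscated name "x" resolves differently per scope.
const scopes = buildScopeMap([
  { obfuscated: "x", symbol: "1:width" },
  { obfuscated: "x", symbol: "3:title" },
]);
console.log(scopes.get("1").toSource.get("x")); // "width"
console.log(scopes.get("3").toSource.get("x")); // "title"
```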

During debugging, you need a magic function like this:

function objToScopeId(obj) {
  // A typeMarker (tM) field means it's a GWT object; null is excluded
  // because typeof null is also "object".
  if (obj !== null && typeof obj == "object" && obj.tM) {
    // need to make it non-obfuscatable abbreviation
    return obj.__proto__.___clazz.seedId;
  }
  if (typeof obj == "function") {
    if (obj.name) { return "!" + obj.name; }
    // else anonymous function
    // get line number of current function
  }
  return 0; // global scope
}

When the debugger is stopped, if someone inspects a heap object, you can first retrieve its scope id, and then use that to recursively deobfuscate all the field names. If someone inspects a local variable, you can construct the scope id of the current stack frame, and then use that to deobfuscate the locals.
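
The recursive heap-object case could be sketched as follows. Everything here is illustrative: `deobfuscate`, the `scopeMap` shape (scope id => Map of obfuscated name => source name) and the `$scope` property used by the stand-in scope lookup are assumptions, not an existing debugger API.

```javascript
// Return a copy of `value` with field names replaced by their source names.
// toScopeId is passed in (something like objToScopeId) to keep the sketch standalone.
function deobfuscate(value, scopeMap, toScopeId, seen = new Map()) {
  if (value === null || typeof value !== "object") return value;
  if (seen.has(value)) return seen.get(value); // guard against cyclic heaps

  if (Array.isArray(value)) {
    const copy = [];
    seen.set(value, copy);
    for (const item of value) copy.push(deobfuscate(item, scopeMap, toScopeId, seen));
    return copy;
  }

  // Look up the name table for this object's scope; unknown scopes map nothing.
  const names = scopeMap.get(toScopeId(value)) || new Map();
  const result = {};
  seen.set(value, result);
  for (const key of Object.keys(value)) {
    const readable = names.get(key) || key; // keep names we cannot map
    result[readable] = deobfuscate(value[key], scopeMap, toScopeId, seen);
  }
  return result;
}

// Stand-in scope lookup: reads a hypothetical "$scope" property.
const demoScopeMap = new Map([
  [1, new Map([["a", "name"], ["b", "address"]])],
  [2, new Map([["a", "street"]])],
]);
const demoHeap = { $scope: 1, a: "Ann", b: { $scope: 2, a: "Main St" } };
console.log(deobfuscate(demoHeap, demoScopeMap, (obj) => obj.$scope));
```

Note that the same obfuscated field "a" becomes "name" on the outer object and "street" on the nested one, which is the point of keying the lookup by scope.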

Makes sense?

Anyway, this is kind of tunneling extra information through the symbol name without extending the format. It'll break Chrome deobfuscation (GWT already does). The best way to get a proposal accepted is to just extend the format and show off a demo; then they might accept the changes.

If you don't plan to support debugging obfuscated production code, you can avoid all of this by giving every field name a globally unique JavaScript name and then adding a global property map to the sourcemap. I find this a little less useful than a solution that will work in any compiler mode on any language.

@ivmarkov

Contributor

ivmarkov commented Apr 4, 2014

Interesting read: http://fitzgeraldnick.com/weblog/55/
It is another proposal for extending the SourceMap spec with deobfuscation of field/variable names.
The proposal goes as far as suggesting how conditional breakpoints and evals() based on the source language can be embedded in the sourcemap spec as well. Not that it makes these any easier to implement in e.g. GWT.

@skybrian

skybrian commented Jan 17, 2015

For Super Dev Mode, fields have a known suffix that's easily removed. I think this is the way forward. But GWT itself should probably provide a way to demangle JavaScript field names so that the compiler can change this without affecting the debugger. This will probably be provided by the GWT app, not in the sourcemap.
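
As a rough illustration of suffix stripping, here is a sketch that ASSUMES mangled names end in `_<digits>_g$` (for example `count_0_g$`). The exact suffix is not stated in this thread and may change between GWT versions, which is why a demangling function provided by GWT itself would be preferable. `demangleFieldName` is a hypothetical name.

```javascript
// ASSUMPTION: Super Dev Mode names look like "<javaName>_<digits>_g$".
// Adjust the pattern if the compiler's naming scheme differs.
const ASSUMED_SDM_SUFFIX = /_\d+_g\$$/;

function demangleFieldName(jsName) {
  // Names without the assumed suffix are returned unchanged.
  return jsName.replace(ASSUMED_SDM_SUFFIX, "");
}

console.log(demangleFieldName("count_0_g$")); // "count"
console.log(demangleFieldName("plainName"));  // "plainName"
```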

We are doing something similar (but a bit higher level) for Chrome Dev Tools [1]. Probably other debuggers can use the same API once it's finalized.

[1] https://docs.google.com/document/d/1FTascZXT9cxfetuPRT2eXPQKXui4nWFivUnS_335T3U/edit

@ivmarkov

Contributor

ivmarkov commented Jan 18, 2015

The API looks interesting, thanks for letting me know.

One thing I do not understand is the mixture of formatting and data. Sure, the suggested HTML subset is simple enough that I can probably translate it into native drawing calls/widgets, but still: why is this needed in the first place? It should be up to the debugger to decide what colors, fonts and overall UI paradigms to use to render a collection of objects. Why not just return regular JSON, perhaps extended with the "object reference" notion?

@ivmarkov

Contributor

ivmarkov commented Jan 18, 2015

BTW: I would assume these functions are available in GWT master, and so also in the nightly Maven GWT binaries?

@ivmarkov ivmarkov referenced this issue Jan 18, 2015

Closed

#105 #106

@skybrian

skybrian commented Jan 19, 2015

The Chrome Dev Tools UI isn't plain text; there is basic syntax highlighting (such as making quoted strings a different color). I think the idea is to make the API as language-independent as possible while still providing the capability to do the same formatting that they do by default for regular JavaScript objects.

If necessary, GWT could have a more abstract API that just provides the data.

My demo code is here:
https://github.com/gwtproject/superdebug

This is entirely outside the GWT SDK and assumes GWT 2.7 for now, but it should be pretty easy to use (just add a dependency on the SuperDebug module).

After the Dev Tools feature is finished (not hidden and not just in Chrome Canary), I'll work on getting it into trunk. My plan is to enable it only for Super Dev Mode. In a production compile, there are too many compiler optimizations so I don't think it would be maintainable.

@ivmarkov

Contributor

ivmarkov commented Jan 19, 2015

The beauty of your solution is that it lives outside of GWT and is completely runtime-based, i.e. as far as I understand, it does not require any metadata beyond what is already available in the code generated by the GWT 2.7 SDM compiler. This means that I can e.g. just take your superdebug module and rewrite it to be part of SDBG, rather than requiring users to include another GWT module. At least superficially, it looks as if Any.java, which seems to be the crux of it all, can be rewritten as plain Java code + calls into Runtime.evaluate() with JavaScript snippets inspired by Any.java's JSNI code. That way I can also have my data-only "implementation" without relying on the DevTools API at all.

(Of course, in the long term my code has to be maintained as well, which is obviously a disadvantage, so I still like the idea of you providing a data-only API in addition to the html+data DevTools API.)

BTW: Originally, @cromwellian was also mentioning that there is a scope mismatch between Java and JavaScript - as written in the initial (long) description of this issue. I don't think your formatter idea deals with the scope mismatch, or does it?

@skybrian

skybrian commented Jan 19, 2015

I "solved" it by focusing on object scope and ignoring the others for now. That is, for custom formatting, we assume you already have a JavaScript object and ignore where it came from.

It's also important to handle local variable names and static variables but this is out of scope (so to speak) for this project.

Also, for custom formatting, I'm not trying to map field names back to the source code. For example, in the inheritance tree Object->Foo->Bar, there may be private fields on Foo and Bar that have the same name. There is no way to figure out which field comes from which class using the current field names alone.

I think this might be solved with a different naming convention for fields. We could number the fields based on how far they are from the root of the inheritance tree, so any fields on Object would be numbered 0, fields on Foo numbered 1, and on Bar numbered 2. This would make field names unique without using the additional numbering from JsIncrementalNamer, and if you know the inheritance tree then you can disambiguate them.
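
To make the depth-numbering idea concrete, here is a sketch that assumes a purely hypothetical name format `<field>_<depth>`, where depth 0 is Object. Neither the format nor the helper `resolveField` comes from GWT; the point is only that a depth number plus a known inheritance chain is enough to find the declaring class.

```javascript
// chain: class names from the root to the object's own class,
// e.g. ["Object", "Foo", "Bar"] for Object->Foo->Bar.
function resolveField(jsName, chain) {
  // ASSUMPTION: names look like "<field>_<depth>".
  const match = /^(.*)_(\d+)$/.exec(jsName);
  if (!match) return null;
  const depth = Number(match[2]);
  if (depth >= chain.length) return null; // depth beyond this object's class
  return { declaringClass: chain[depth], field: match[1] };
}

const chain = ["Object", "Foo", "Bar"];
// Two private fields with the same source name on Foo and Bar stay distinct.
console.log(resolveField("id_1", chain)); // declaringClass "Foo", field "id"
console.log(resolveField("id_2", chain)); // declaringClass "Bar", field "id"
```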

So, I think if you want to use similar code in SDBG then that might be okay, except that it should disable itself when not using 2.7 and fall back on just showing the JavaScript objects. For 2.8 we might come up with a better naming scheme.

@skybrian

skybrian commented Jan 19, 2015

I created a bug to discuss field naming in 2.8:
https://code.google.com/p/google-web-toolkit/issues/detail?id=9106

@branflake2267

Contributor

branflake2267 commented Jan 19, 2015

Nice work!

@skybrian

skybrian commented Jan 19, 2015

On second thought, you wouldn't easily be able to rewrite most of superdebug to be part of sdbg. Most of it is written in Java and it's not that easy to call the same methods from outside GWT. For example, it uses Java iterators to loop over collections and maps. The methods might not be compiled into the GWT app at all if nobody else is using them.

I think it would still work for generic Java objects, since that part is mostly in JavaScript. But compiling it into the GWT app and providing a data API seems like a better idea since it will support more features.

ivmarkov added a commit that referenced this issue Jan 21, 2015

@ivmarkov

Contributor

ivmarkov commented Jan 21, 2015

In the meantime I have re-implemented most (but not all) formatting ideas of your superdebug module in the new BETA of the plugin (available here: http://sdbg.github.io/p2beta/ )

NOTE: To see what it does, you have to push the "Show Logical Structure" button in the Variables/Expressions view. For me, it was crucial to implement it in a way where the user can switch between the raw view and the "Javaised" view with a click of a button, because the Javaised view of the world cannot be evaluated. In other words, you cannot go to the Expressions view and write an expression based on the Javaised field and variable names. It is just a formatting gimmick, just like what "Show Logical Structure" is supposed to be anyway.

I've gone a step further by also trying to fix the names of the local variables in the function scope, as well as the names of the static variables, as this improves readability.

Missing for now is the special formatting of the Collection, Map and Java array types. It is possible with my approach but not as simple as in superdebug, as I cannot just use an "Iterator" to create the readable structure. Rather, I have to use a priori/internal knowledge of how GWT implements HashMaps, Java arrays, etc. and traverse the JavaScript backing maps. In the end, it can end up being considerably more complex than necessary.

The best part of basing my implementation on the "Show Logical Structure" Eclipse API is that it is pluggable. In other words, tomorrow I or someone else can implement an ILogicalStructureType which is based on your superdebug module and just calls into your superdebug JavaScript functions, as specified here: https://docs.google.com/document/d/1FTascZXT9cxfetuPRT2eXPQKXui4nWFivUnS_335T3U/edit
For this however, a data-only API would be beneficial.

@skybrian

skybrian commented Jan 22, 2015

Very nice. I have included a screenshot in my presentation tomorrow.

SergeyZh pushed a commit to JetBrains/intellij-community that referenced this issue Jun 10, 2015

Don't try to map code fragment to source name. Often, sourcemaps is not detailed - token is not mapped to token, so, it is not easy to extract identifier from code fragment

 So, if source map doesn't provide name mappings, we assume that names are not mangled.

GWT mangles name (https://code.google.com/p/google-web-toolkit/issues/detail?id=9106 sdbg/sdbg#6 https://youtrack.jetbrains.com/issue/IDEA-135356), but doesn't add name mappings. So, in this case we implement custom normalize member name.

Fix isInLibraryContent (CallFrameView uses it to select appropriate font color) — we must use source file to check, but not script file.

IDEA-135094 gwt: support name mappings
IDEA-135356 Local variable does not appear in variables list during debug