ERL-1350: Cannot produce compile:forms asm automatically, nor convert beam_disasm:file output to .S #4425

Closed
OTP-Maintainer opened this issue Sep 15, 2020 · 4 comments
Labels
bug Issue is reported as a bug help wanted Issue not worked on by OTP; help wanted from the community priority:low team:VM Assigned to OTP team VM

Comments

@OTP-Maintainer

Original reporter: JIRAUSER16603
Affected version: Not Specified
Component: compiler
Migrated from: https://bugs.erlang.org/browse/ERL-1350


The real problem here is that a few functions are not exported.

The first problem is that compile:preprocess_asm_forms/1 is not exported.  This prevents one from taking a .erl or .S file and automatically generating the asm tree variant.  Of course, the code for compile:preprocess_asm_forms/1, compile:collect_asm/2, and compile:collect_asm_function/2 can simply be copied along with the asm_module record, and then one can do this transformation cleanly.

Use case: generating compiled functions on the fly via compile:forms without using any external files.  Both the ASM source and the BEAM code can be provided and generated in memory, and the ASM source is then in a preprocessed Erlang term format that requires no strings or parsing.  Sure, you could unstick the compile module, modify a copy of its source, and recompile it, but that's even worse than a solution like:
{code:java}
%% Copied from compile.erl, where these helpers are internal (not exported).
-record(asm_module, {module,
                     exports,
                     labels,
                     functions=[],
                     attributes=[]}).

preprocess_asm_forms(Forms) ->
    R = #asm_module{},
    R1 = collect_asm(Forms, R),
    {R1#asm_module.module,
     {R1#asm_module.module,
      R1#asm_module.exports,
      R1#asm_module.attributes,
      lists:reverse(R1#asm_module.functions),
      R1#asm_module.labels}}.

collect_asm([{module,M} | Rest], R) ->
    collect_asm(Rest, R#asm_module{module=M});
collect_asm([{exports,M} | Rest], R) ->
    collect_asm(Rest, R#asm_module{exports=M});
collect_asm([{labels,M} | Rest], R) ->
    collect_asm(Rest, R#asm_module{labels=M});
collect_asm([{function,A,B,C} | Rest0], R0) ->
    {Code,Rest} = collect_asm_function(Rest0, []),
    Func = {function,A,B,C,Code},
    R = R0#asm_module{functions=[Func | R0#asm_module.functions]},
    collect_asm(Rest, R);
collect_asm([{attributes, Attr} | Rest], R) ->
    collect_asm(Rest, R#asm_module{attributes=Attr});
collect_asm([], R) -> R.

collect_asm_function([{function,_,_,_}|_]=Is, Acc) ->
    {lists:reverse(Acc),Is};
collect_asm_function([I|Is], Acc) ->
    collect_asm_function(Is, [I|Acc]);
collect_asm_function([], Acc) ->
    {lists:reverse(Acc),[]}.

%% Read a .S file and build the in-memory asm tree.
s_to_beamasm(Fname) ->
    {ok,Forms0} = file:consult(Fname),
    preprocess_asm_forms(Forms0).

%% Compile the asm tree to a BEAM binary; returns {ok, Module, Binary}.
%% The outdir option must be a tuple, not a nested list.
compile_beamasm(Fname) ->
    {ok, _, _} = compile:forms(element(2, s_to_beamasm(Fname)), [binary, from_asm, {outdir, "ebin"}]).

{code}
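As a rough illustration of the in-memory use case described above, here is a minimal sketch that goes from Erlang source text to an asm tree and then to a loaded BEAM module without touching the file system. The module name `asm_roundtrip`, the helper `parse_forms/1`, and the demo module `asm_demo` are hypothetical names introduced only for this example. It assumes that `compile:forms/2` with `[to_asm, binary]` returns the asm tree as `{ok, Module, Asm}` and that the same tree is accepted back with `from_asm`; this may differ between OTP releases, so treat it as a sketch rather than a guaranteed recipe.

```erlang
-module(asm_roundtrip).
-export([demo/0]).

%% Hypothetical helper: turn a source string holding several forms
%% into a list of abstract forms, splitting the tokens at each dot.
parse_forms(Source) ->
    {ok, Tokens, _End} = erl_scan:string(Source),
    split_forms(Tokens, [], []).

split_forms([{dot,_}=Dot | Rest], Cur, Acc) ->
    {ok, Form} = erl_parse:parse_form(lists:reverse([Dot | Cur])),
    split_forms(Rest, [], [Form | Acc]);
split_forms([T | Rest], Cur, Acc) ->
    split_forms(Rest, [T | Cur], Acc);
split_forms([], [], Acc) ->
    lists:reverse(Acc).

demo() ->
    Forms = parse_forms("-module(asm_demo). -export([f/0]). f() -> ok."),
    %% Source forms -> asm tree, kept in memory (assumed return shape).
    {ok, asm_demo, Asm} = compile:forms(Forms, [to_asm, binary]),
    %% Asm tree -> BEAM binary, still in memory.
    {ok, asm_demo, Bin} = compile:forms(Asm, [from_asm, binary]),
    %% Load the binary and call the freshly compiled function.
    {module, asm_demo} = code:load_binary(asm_demo, "asm_demo.beam", Bin),
    asm_demo:f().
```

The asm tree bound to `Asm` has the same shape as the second element of the tuple returned by the copied preprocess_asm_forms/1, so it can be inspected or edited as plain Erlang terms before the second compile:forms call.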
The second issue is that beam_disasm:file produces output that is effectively useless without beam_disasm:pp, which is inconveniently exported only if the DEBUG_DISASM flag is set when beam_disasm is compiled.  So you again have to mess with library files to get the job done:
{code:java}
-define(LibPath, code:root_dir() ++ "/lib/").
disassembleToDis(Outfile, Fname) ->
  Filename = case is_atom(Fname) of true -> code:which(Fname); _ -> Fname end,
  %% Recompile beam_disasm with DEBUG_DISASM so that pp/2 is exported.
  code:unstick_mod(beam_disasm),
  c:c(?LibPath ++ "compiler-7.6/src/beam_disasm.erl", [debug_info,{d,'DEBUG_DISASM'},{outdir, "ebin"}]),
  DisasmCode = element(6, beam_disasm:file(Filename)),
  beam_disasm:pp(Outfile, [{file, Filename}, {code, DisasmCode}]),
  ok.
{code}
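For readers who only need to inspect the disassembly rather than pretty-print it, the exported beam_disasm:file/1 can be used without unsticking anything. The following is a minimal sketch; the module name `disasm_peek` and the function `functions/1` are hypothetical, and it relies on the same assumption as the snippet above that the code list is the sixth element of the returned `beam_file` tuple, with each entry shaped as `{function, Name, Arity, Entry, Instructions}`.

```erlang
-module(disasm_peek).
-export([functions/1]).

%% List {Name, Arity, InstructionCount} for every function in a module.
%% Accepts a loaded module name (atom) or a path to a .beam file.
functions(Mod) when is_atom(Mod) ->
    functions(code:which(Mod));
functions(BeamFile) when is_list(BeamFile) ->
    Disasm = beam_disasm:file(BeamFile),
    %% Fail early if we got an error tuple instead of a beam_file record.
    beam_file = element(1, Disasm),
    Code = element(6, Disasm),
    [{Name, Arity, length(Is)} || {function, Name, Arity, _Entry, Is} <- Code].
```

For example, `disasm_peek:functions(lists)` should return one entry per function in the lists module. This does not produce a recompilable .S file; it only shows that the in-memory disassembly is already reachable through the public API.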
There are reasons you might want to disassemble a BEAM file to a recompilable .S file: for example, to patch it and recompile when you do not have the source code.  Yes, this is a debugging type of operation, but why not allow it by default?

I see no good reason why important functions like these are not exported. They come up in questions on the web now and again, and the answers end up being custom-built hacks, since most people do not want to study the details of a large source base to find this stuff.

See the attached guide to the flow of transformations needed for going between all the variations of Erlang source, BEAM, BEAM assembly, and AST.  It is interesting because the epp module is not mentioned.  Of course, some transformations are missing, such as erl -> AST, and the picture should be further fleshed out.  In fact, something like this should be part of the Erlang documentation...
@OTP-Maintainer
Author

bjorn said:

We will accept a pull request that adds back the export of the functions dfs/1, df/1, files/1, pp/1, and pp/2. We will also accept a pull request that makes the output of beam_disasm have the same format as in .S files (so that it would be possible to compile them again).

@OTP-Maintainer
Author

john said:

{quote}Namely to make patches to it and recompile, for example, when you do not have the source code.
{quote}
Keep in mind that {{.beam}} files lack certain information the validator needs, such as parameter and return type annotations, and these need to be added by hand for recompilation to work. We require them because it simplifies validation a great deal; they're hard to infer but easy to check, and the compiler always hands them to us anyway.

You can disable validation by modifying the source, but I highly recommend that you don't (and we will not add an option for it). The validator is not a mere linter, and files that fail to validate are bound to corrupt things at runtime sooner or later, usually in very subtle ways.

@OTP-Maintainer
Author

JIRAUSER16603 said:

Okay, so it seems exporting by default is the best solution, since you mentioned that HiPE needs beam_disasm to have the alternative format.  This also gives the most flexibility, as it's just an extra function call to do the translation.

Yeah, I see the validation hints like
{code:java}
{'%',{var_info,{x,0},[{type,{t_list,any,nil}}]}}.
{code}
I did not realize those are strictly necessary!  Anyway, I get the point about validation. Again, my research is on obfuscation, so all tricks are on the table.  But I can see why, although I can get away with it in some narrow, carefully thought-out special scenarios, it's hardly worth mainstreaming in any way.

What about exporting compile:preprocess_asm_forms/1 so that one could take a .S/.erl file and get a usable input to compile:forms?

@OTP-Maintainer OTP-Maintainer added bug Issue is reported as a bug help wanted Issue not worked on by OTP; help wanted from the community team:VM Assigned to OTP team VM priority:low labels Feb 10, 2021
@bjorng
Contributor

bjorng commented Feb 24, 2021

Closing this issue because of inactivity.

@bjorng bjorng closed this as completed Feb 24, 2021