Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable CSE for the initialization function #1455

Merged
merged 8 commits into from Mar 15, 2018

Conversation

Projects
None yet
7 participants
@chambart
Copy link
Contributor

chambart commented Oct 27, 2017

This is #1453 with only the CSE change. This is sufficient for compiling the reported problematic files of https://caml.inria.fr/mantis/view.php?id=7630 with flambda with a few hundreds MB of ram while it took ~10GB without this patch.

@@ -99,6 +99,12 @@ let rec regalloc ppf round fd =
Reg.reinit(); Liveness.fundecl ppf newfd; regalloc ppf (round + 1) newfd
end else newfd

let cse fd =
if fd.Mach.fun_fast_compile then

This comment has been minimized.

@gasche

gasche Oct 27, 2017

Member

You should link at a relevant MPR to justify the choice of disabling specifically CSE.

@@ -3084,6 +3085,7 @@ let compunit (ulam, preallocated_blocks, constants) =
let c1 = [Cfunction {fun_name = Compilenv.make_symbol (Some "entry");
fun_args = [];
fun_body = init_code; fun_fast = false;
fun_fast_compile = true;

This comment has been minimized.

@gasche

gasche Oct 27, 2017

Member

You should link to the relevant MPR to justify the specific choice of enabling fast_compile for compilation-unit functions.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 30, 2017

I'm unsure whether we need the new field fun_fast_compile: wouldn't the negation of the existing field fun_fast work for this purpose?

@chambart

This comment has been minimized.

Copy link
Contributor Author

chambart commented Oct 30, 2017

It's not exactly the same thing, for instance, I know that the frama-c guys set it to false to reduce code size.
Disabling CSE in that case would be counterproductive.

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Nov 2, 2017

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Nov 6, 2017

  • I'm unsure about the name of the new field. At some point, we might want to add precise control over compilation options (disabling other passes, choice of register allocator, etc) at the function level (for instance with annotation, or other heuristics), and fun_fast_compile would be too coarse grained. I suggest fun_cse_disabled (or fun_cse_enabled).

  • Is the problem really specific to flambda? I.e. does the "transl_store" compilation strategy for non-flambda avoid the problem? If so, what about disabling cse for the entry function only in flambda mode?

  • From https://caml.inria.fr/mantis/view.php?id=7630, I understand that the problem comes from sharing loads of addresses of global symbols. Cannot this happen in arbitrary code?

@chambart

This comment has been minimized.

Copy link
Contributor Author

chambart commented Nov 6, 2017

  • I have no objection to changing the name. Note that a curated set of options that favor compilation time
    against performance per function might be useful. For instance, if linscan is deemed stable enought at
    some point in the future, it would fit such a criteria.

  • The real problem triggers only with flambda, but the compilation time benefit is noticeable with clambda
    too while not degrading performances (unless you wrote your computation heavy loop at toplevel).
    The main difference between flambda an non-flambda in that case is the number of symbols. The symbol
    address loading is shared too, but as there is only one, the interaction is not too problematic.

  • I never encountered a function remotely as large as what can appear at toplevel. Something similar
    could appear if you turned your large compilation unit into a functor for instance, but in that situation CSE
    wouldn't be the main problem.

Such an option could be available through an annotation per function. I wouldn't like something too low level that explicitly disable passes. Such an option would imply [@inline never] as inlining would erase the annotation.

@chambart

This comment has been minimized.

Copy link
Contributor Author

chambart commented Nov 17, 2017

This PR is suggested for 4.06.1. For the bugfix release, this would only apply to flambda to minimize the impact.

This allows uunf to build (see http://obi.ocamllabs.io/logs/eedf08d88b07040656ce206266ba0c82.txt) with flambda.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Nov 18, 2017

I'm OK with the basic approach but like @alainfrisch I have doubts about the name of the new field, and moreover I have doubts that we should add a new Boolean field for every variation in code generation we need.

What about a single field fun_codegen_options of type codegen_option list? The type codegen_option would contain constructors such as CG_no_CSE. The existing fun_fast field could also be removed and turned into a codegen_option, perhaps by reversing the logic (no flag = optimise for speed; CG_reduce_code_size = optimise for size). The general idea is that the absence of flags would correspond to the most common code generation choices, e.g. "optimise for speed", "perform CSE", etc.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Nov 18, 2017

Also the issue may not be hot enough to go in a possible 4.06.1 release. At any rate let's shoot for 4.07.

@xavierleroy xavierleroy modified the milestones: 4.07-or-later, 4.07 Nov 18, 2017

@damiendoligez damiendoligez modified the milestones: 4.07, 4.06.1 Nov 20, 2017

@ejgallego

This comment has been minimized.

Copy link

ejgallego commented Nov 23, 2017

Indeed with this patch the Coq problems seem solved, thanks to all!

Informal timings:

A = 4.06.1+flambda+pr1455 "-O3 -unbox-closures"
B = 4.06.1+flambda+pr1455 "-O3 -unbox-closures -native-compiler no"
C = 4.06.0
D = 4.06.0 "-native-compiler no"
coq-build-ocaml A [fl-n=yes] B [fl-n=no] C [n=yes] D [n=no]
real 2m49.756s 2m45.236s 0m33.169s 0m32.667s
user 10m1.800s 10m10.356 2m59.352s 2m59.456s
sys 0m53.052s 0m53.788s 0m37.452s 0m35.896s
coq-build-vo
real 2m45.629s 1m47.738s 2m30.453s 2m0.606s
user 10m29.820s 7m29.188s 9m37.728s 8m19.564s
sys 2m57.500s 1m56.512s 2m46.096s 1m57.520s

native compiler is called with -Oclassic.

@mshinwell

This comment has been minimized.

Copy link
Contributor

mshinwell commented Dec 6, 2017

@chambart Can you try to finish this so we can at least consider it for 4.06.1? (I like the idea of @xavierleroy 's suggestion as regards a flags field.)

@chambart

This comment has been minimized.

Copy link
Contributor Author

chambart commented Dec 6, 2017

I followed @xavierleroy's suggestion. It is a bit cleaner. The flag type is defined in Cmm, it might not be the most obvious place, but it is the most convenient dependency wise. I didn't push the change to the very end of the backend as the CSE flag is not relevant there anymore and that would increase the patch size. But I would be happy to if you think it is better that way.

@xavierleroy
Copy link
Contributor

xavierleroy left a comment

Looks good to me modulo the two remarks below.

fd
else
CSE.fundecl fd

let (++) x f = f x

This comment has been minimized.

@xavierleroy

xavierleroy Dec 15, 2017

Contributor

I would rather have this test done in module CSEgen, method fundecl.

@@ -143,6 +143,7 @@ fundecl:
LPAREN FUNCTION fun_name LPAREN params RPAREN sequence RPAREN
{ List.iter (fun (id, ty) -> unbind_ident id) $5;
{fun_name = $3; fun_args = $5; fun_body = $7; fun_fast = true;
fun_fast_compile = true;
fun_dbg = debuginfo ()} }

This comment has been minimized.

@xavierleroy

xavierleroy Dec 15, 2017

Contributor

This fun_fast_compile field looks like a leftover from the earlier version of this GPR.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Dec 28, 2017

I'll be happy to merge once my two remarks above are addressed.

@chambart chambart force-pushed the chambart:no_cse_for_toplevel_function branch 2 times, most recently from 86a0606 to ca7a139 Jan 9, 2018

@chambart

This comment has been minimized.

Copy link
Contributor Author

chambart commented Jan 9, 2018

Back from holiday with some fixes. I rebased it on trunk.
If it is still considered for 4.06.1 I will make a branch for it. (only the last commit updating Changes has to be modified)

@xavierleroy
Copy link
Contributor

xavierleroy left a comment

On reviewing again, I think your use of Reduce_code_size makes no sense: you're setting Reduce_code_size when you want to optimize for speed, and not setting it when you want to optimize for size. I'm confused.

Changes Outdated
- MPR#7630, GPR#1455: Disable CSE for the initialization function
(Pierre Chambart, report by Emilio Jesús Gallego Arias,
review by Gabriel Scherer)

This comment has been minimized.

@xavierleroy

xavierleroy Jan 9, 2018

Contributor

Can I please be listed as reviewer?

This comment has been minimized.

@chambart

chambart Jan 19, 2018

Author Contributor

Sorry for forgetting that too.

[ Reduce_code_size ]
else
[]
in

This comment has been minimized.

@xavierleroy

xavierleroy Jan 9, 2018

Contributor

It feels like you're reversing the logic. "optimize for speed" means DO NOT reduce the code size.

This comment has been minimized.

@chambart

chambart Jan 19, 2018

Author Contributor

Sorry about that. Of course I reversed it. Thank you so much for noticing.

@@ -311,7 +311,7 @@ let rec linear i n =
let fundecl f =
{ fun_name = f.Mach.fun_name;
fun_body = linear f.Mach.fun_body end_instr;
fun_fast = f.Mach.fun_fast;
fun_fast = List.mem Cmm.Reduce_code_size f.Mach.fun_codegen_options;
fun_dbg = f.Mach.fun_dbg;

This comment has been minimized.

@xavierleroy

xavierleroy Jan 9, 2018

Contributor

Here you're reversing the logic again. "Reduce code size" means "fun_fast = false", not "fun_fast = true".

@chambart

This comment has been minimized.

Copy link
Contributor Author

chambart commented Jan 19, 2018

By the way @xavierleroy, would you be want for coherence fun_fast to be renamed to something related to code size too ?

@xavierleroy
Copy link
Contributor

xavierleroy left a comment

Looks good to me!

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Mar 2, 2018

@chambart there is a conflict on Changes related to another of your GPRs, namely #1401. I would fix it myself except that I'm not sure where #1401 was committed: 4.06? 4.06.1? trunk? Could you please have a look? Thanks!

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Mar 2, 2018

Concerning the fun_fast name: it's historical baggage, and I don't think it's worth renaming it if is remains a boolean flag, like today. What could make sense for future developments is to carry a list of compilation options all the way to the asm emitters, like this PR does for C-- functions.

@xavierleroy xavierleroy removed this from the 4.06.1 milestone Mar 2, 2018

@xavierleroy xavierleroy added this to the 4.07 milestone Mar 2, 2018

@chambart chambart force-pushed the chambart:no_cse_for_toplevel_function branch from be0daf6 to feb7538 Mar 5, 2018

@chambart

This comment has been minimized.

Copy link
Contributor Author

chambart commented Mar 5, 2018

#1401 was merged in trunk too. It is an improvement independently of that one.

I rebased for trunk. I'm waiting for CI to approve before merging.

I'm pushing the flag propagation to assembly on my todo list.

@mshinwell

This comment has been minimized.

Copy link
Contributor

mshinwell commented Mar 15, 2018

CI has now passed and @xavierleroy has approved, so I'm merging this.

@mshinwell mshinwell merged commit 69fb40b into ocaml:trunk Mar 15, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.