Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce cmx sizes by sharing variable names #1627

Merged
merged 8 commits into from Apr 9, 2018

Conversation

Projects
None yet
5 participants
@lpw25
Copy link
Contributor

lpw25 commented Feb 22, 2018

This PR implements the first of the three optimisations in:

#1343

In particular, it makes sure that we get good sharing for the variable names generated by flambda. It takes a slightly different approach than the previous patch. The earlier one used a variant type for variable names, this one keeps it as a string but declares all the internal name in a single location and removes unnecessary uses of ^. It produces a slightly better reduction in size than this part of the original patch. It is still only ~5% improvement, but that's not to be sniffed at and there isn't much additional complexity.

@lpw25 lpw25 changed the title Reduce cmx names Reduce cmx sizes by sharing variable names Feb 22, 2018

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Feb 22, 2018

It is not clear to me that using well-shared strings is nicer than using constructors; I would have thought the converse, given that constructors enforce the sharing (instead of respecting it by convention), can be pattern-matched over, and more clearly expose the structure of the data (for example, Block_symbol_get of int clearly exposes a family of names, which is nicer than using "block_symbol_get_" ^ pos_str). What are your reasons to prefer shared strings?

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Feb 22, 2018

My main reason is that I intend to replace all uses of Ident.t after the typedtree with Variable.t, and so I don't want Variable.t to be flambda-specific. However, it occurs to me that when I do that I could make an flambda-specific variable that is = private Variable.t and get the same benefits, whilst allowing me to enforce that the name strings are shared by requiring them to come from Internal_variable_names. I could even add a functor to Variable for creating these private types and use it to create specific instances for each IR in the compiler, which would be nice.

That would address the problem of enforcing sharing via typing rather than convention. As for the other points: I disagree that pattern matching would be useful, or that the structure is better exposed because we want Block_symbol_get foo to behave identically to String ("block_symbol_get_" ^ (int_of_string foo)) otherwise the name is being used as something other than a name.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Feb 22, 2018

Also, using a string representation improves the size of Variable.ts who's name comes from the source, which is the common case.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

lpw25 commented Feb 22, 2018

Ok, I've changed it so that Internal_variable_names has a type t = private string, and Variable.create requires an Internal_variable_name.t instead of a string. That should enforce sharing.

lpw25 and others added some commits Nov 27, 2017

Share internal variable names
Add the [Internal_variable_names] which includes shared definitions of
all the strings used for variable names that do not come from existing
identifiers. The motivation for this change is to reduce the size of cmx
files by removing redundant repetitive strings from variable names.
Reduce size of symbols
Represent symbols created from variables directly. This postpones
creating the symbol label until the back-end, which avoids storing
needlessly large strings in the cmx files.
Give Internal_variable_names their own type
This should make sure we don't use an unshared string
as a variable name.

@lpw25 lpw25 force-pushed the lpw25:reduce-cmx-names branch from 8685bd5 to d8eafcd Apr 6, 2018

@@ -16,9 +16,6 @@

[@@@ocaml.warning "+a-4-9-30-40-41-42"]

let rename_var var =
Mutable_variable.create
(Variable.unique_name var)
(* Variable.rename var *)
(* ~current_compilation_unit:(Compilation_unit.get_current_exn ()) *)

This comment has been minimized.

@xclerc

xclerc Apr 6, 2018

Contributor

These floating comments are probably a left-over after the deletion of rename_var.

@xclerc

This comment has been minimized.

Copy link
Contributor

xclerc commented Apr 6, 2018

Looks good to me. Unless I am mistaken, the only observable
change is that some identifiers will be slightly different (e.g. will
lose their _{copied,original} suffix), but it should not be a problem.

mshinwell added some commits Apr 9, 2018

@mshinwell

This comment has been minimized.

Copy link
Contributor

mshinwell commented Apr 9, 2018

I will vouch for @xclerc 's review on this one (@lpw25 has looked at this too, of course, after it was originally written).

I have fixed the conflicts and run tests (both with and without flambda) on my own machine, so merging.

@mshinwell mshinwell merged commit bc9f032 into ocaml:trunk Apr 9, 2018

0 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.