Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow annotating a function as pure through comments #5614

Closed
jfmengels opened this issue Aug 10, 2022 · 5 comments · Fixed by #5616
Closed

Allow annotating a function as pure through comments #5614

jfmengels opened this issue Aug 10, 2022 · 5 comments · Fixed by #5616

Comments

@jfmengels
Copy link

jfmengels commented Aug 10, 2022

Hello 👋

First of all, thanks for this amazing project ❤️

I would quite like to re-suggest a proposal made in #5234 by @skyrpex, which has been closed but that I think would be quite valuable.

I'll quote the original proposal:

Currently, the /* @PURE */ annotation must be used individually on every returned value for a given call.

Could it be possible to allow using the annotation on a function, so all its executions and returned values are considered pure? Like in the following.

/**
 * The function is pure, so if the result is not used, it can be removed.
 * 
 * @__PURE__
 */
function addTwo(number) {
    return number + 2;
}

// Will be removed
addTwo(4);

// Will be removed
const x = addTwo(3);

// This will be kept
export const y = addTwo(2);

Background and motivation

The original discussion didn't go very much into motivations for the issue, so let me add my thoughts (and others can then add their own) for allowing annotating functions as pure through comment annotations, and not only through the CLI flags.

My current use-case is for compiled languages. There are a few languages that compile down to JavaScript, including Elm. Elm is a pure functional language, meaning that ALL the functions defined in user code and in packages are pure.

At the moment, Elm recommends running UglifyJS with the following command:

uglifyjs elm.js --compress 'pure_funcs=[F2,F3,F4,F5,F6,F7,F8,F9,A2,A3,A4,A5,A6,A7,A8,A9],pure_getters,keep_fargs=false,unsafe_comps,unsafe' | uglifyjs --mangle --output elm.min.js

As you see in the command above, there are a few special functions that it indicates as being pure, which helps uglify remove a number of functions, but it doesn't annotate all of the other functions which could end up remaining when they could have been removed.

If the compiler knows that a function is pure, it could quite well annotate the function as such through a comment as suggested above, to get all the benefits of pure_funcs.

Current alternatives

1. Specify all pure functions through the configuration

The current approach that is being recommended and let to the closing of the previous issue was to add the list of pure functions to the configuration.

In the case of compiling to JS, this is not feasible in practice. It would be a very large list and it would depend on the contents of the compiled code. Meaning you'd have to analyze the compiled JS code to figure out which functions are pure, and then run UglifyJS with the constructed configuration. It's possible, but it would require additional tooling.

2. Annotate all assignments

What a compiler could do is annotate every assignment like var a = /*@__PURE__*/ foobar();. It's a lot more comments, but it could work reasonably well. At least for a compiled language with control over the entire source code.

Benefits

Being able to annotate functions as pure would make it really easy to prune unnecessary computations. It would be really useful for languages that compile down to JavaScript.

Potentially, this could be used for JavaScript frameworks and/or library authors as well. Because those can provide some pure functions, which they can choose to annotate, but they don't have the capability to annotate the function calls that users of their library make. That makes alternative 2 not possible.

Alternative 1 is also out of scope for that use-case. You don't see libraries like React say "please add this list of functions to pure_funcs when you minify your project" in their documentation. But with comment annotations, they could get this for free, improving dead-code elimination by quite a bit, reducing bundle size and increasing speed.

This would also remove the need for Elm to tell people to run UglifyJS using an obscure list of pure_funcs, and could leave that as a behind-the-scenes detail.

Also, I don't know whether annotating a function as pure in comments would solve the issue that Elm warns about in its instructions, but that would be a nice win if it was the case.

Drawbacks

I don't know much about the implementation of UglifyJS and similar projects. I can imagine that it's not built to expect thousands of functions being considered as pure (as I don't imagine that people passed very large arrays of pure funcs through the configuration), and that could lead to performance issues. But maybe it's fast, I don't know.

I can also imagine that it's going to be annoying for library authors to have to think about "should I add a pure annotation here?", because that's an extra task that may be requested from them by users.

Afterword

I hope you'll consider this feature again. As pointed at by @aleclarson in #5234 (comment), other tooling also seem interested in this feature but are basically waiting for this project to enable it, to avoid fragmentation in this tooling space.

@alexlamsl
Copy link
Collaborator

The suggested feature will make diagnostic more difficult as it is not standard JavaScript (I've already been getting so many self-proclaimed bug as is).

Moreover, to take your example:

$ cat 5614.js
function addTwo(number) {
    return number + 2;
}
addTwo(4);
const x = addTwo(3);
export const y = addTwo(2);
$ uglifyjs 5614.js --module --compress --mangle
var r=4;export{r as y};

As you can see, no pure_funcs is necessary. Here's another example with TypeScript-transpiled enum:

$ cat enum.js
/*
enum Direction {
    North = 1,
    East,
    South,
    West,
}
*/
var Direction;
(function(Direction) {
    Direction[Direction["North"] = 1] = "North";
    Direction[Direction["East"] = 2] = "East";
    Direction[Direction["South"] = 3] = "South";
    Direction[Direction["West"] = 4] = "West";
})(Direction || (Direction = {}));

console.log(Direction[2], Direction.West);
$ uglifyjs enum.js --module -mc passes=10
console.log("East",4);

In short, it is more constructive for uglify-js to learn about these complex patterns and to optimise them away automatically than to workaround them. If you can illustrate the kind of code Elm generates that currently relies on pure_funcs we can tackle those as well.

@jfmengels
Copy link
Author

The suggested feature will make diagnostic more difficult as it is not standard JavaScript (I've already been getting so many self-proclaimed bug as is).

I'm not sure I follow. How is this proposal different from the already existing pure_funcs? Also, I am not aware of the bugs you're referring to that may be relevant to the discussion.

In short, it is more constructive for uglify-js to learn about these complex patterns and to optimise them away automatically than to workaround them. If you can illustrate the kind of code Elm generates that currently relies on pure_funcs we can tackle those as well.

I would still argue that this could be valuable as I don't see this as a workaround, but as hints for the tool, so that it doesn't have to infer it.

Also, as mentioned before, all Elm functions are pure without side-effects through the design of the language, meaning that we can give that information very easily, whereas the minifier is unlikely to be able to catch everything.

More importantly, while the Elm functions are pure, when they are compiled to JavaScript, they could result in code that may trigger side-effects based on the inputs.

For instance, let's take this function:

function returnZ(a) {
  return a.z;
}

While this looks like a pure function, if you pass it an argument like { get z() { /* ... */ } }, the call of returnZ may trigger a side-effect. Meaning you can't remove this function call unless you have pure_getters=true.

But in Elm code, there are no getters or any similar feature (proxies, ...) that makes code behavior in "hidden/non-obvious" ways based on a dynamic environment.

As you'll see in the examples, the compiled Elm output is wrapped in an IIFE so that no function defined outside of Elm code can use the Elm functions in unexpected ways. Meaning these functions will never be able to be called unexpectedly with arguments that will trigger side-effects.

I think it's going to be extremely hard for this tool and similar ones to be able to re-figure out that all compiled Elm functions are pure, because things are more uncertain in JavaScript.

Here are a few functions that Elm generates. In every example, if the minifier was knowledgeable enough, the output should be empty because nothing happens except the creation of unused data.

Used configuration
{
  parse: {
    bare_returns     : false,
    ecma             : 8,
    expression       : false,
    filename         : null,
    html5_comments   : true,
    shebang          : true,
    strict           : false,
    toplevel         : null
  },
  compress: {
    arrows           : true,
    booleans         : true,
    collapse_vars    : true,
    comparisons      : true,
    computed_props   : true,
    conditionals     : true,
    dead_code        : true,
    drop_console     : false,
    drop_debugger    : true,
    ecma             : 5,
    evaluate         : true,
    expression       : false,
    global_defs      : {},
    hoist_funs       : false,
    hoist_props      : true,
    hoist_vars       : false,
    ie8              : false,
    if_return        : true,
    inline           : true,
    join_vars        : true,
    keep_classnames  : false,
    keep_fargs       : true,
    keep_fnames      : false,
    keep_infinity    : false,
    loops            : true,
    negate_iife      : true,
    passes           : 1,
    properties       : true,
    pure_getters     : "strict",
    pure_funcs       : ["F2","F3","F4","F5","F6","F7","F8","F9","A2","A3","A4","A5","A6","A7","A8","A9"],
    reduce_funcs     : true,
    reduce_vars      : true,
    sequences        : true,
    side_effects     : true,
    switches         : true,
    top_retain       : null,
    toplevel         : false,
    typeofs          : true,
    unsafe           : false,
    unsafe_arrows    : false,
    unsafe_comps     : false,
    unsafe_Function  : false,
    unsafe_math      : false,
    unsafe_methods   : false,
    unsafe_proto     : false,
    unsafe_regexp    : false,
    unsafe_undefined : false,
    unused           : true,
    warnings         : false
  },
  mangle: false,
  output: {
    ascii_only       : false,
    beautify         : true,
    bracketize       : false,
    comments         : /@license|@preserve|^!/,
    ecma             : 5,
    ie8              : false,
    indent_level     : 4,
    indent_start     : 0,
    inline_script    : true,
    keep_quoted_props: false,
    max_line_len     : false,
    preamble         : null,
    preserve_line    : false,
    quote_keys       : false,
    quote_style      : 0,
    safari10         : false,
    semicolons       : true,
    shebang          : true,
    source_map       : null,
    webkit           : false,
    width            : 80,
    wrap_iife        : false
  },
  wrap: false
}

Record update

Elm has record updates, similar to using the spread operators on an object in JavaScript. Elm currently compiles down to ES5, so there is no spread operator to use.

value = { a = "Hello", b = "World" }
-- Similar to { ...value, b: 'alex' }
unused = { value | b = "Alex" }

⬇️ Elm compilation

(function() {
var value = {a: 'Hello', b: 'World'};
var unused = _Utils_update(value, {b: 'Alex'});

function _Utils_update(oldRecord, updatedFields)
{
	var newRecord = {};

	for (var key in oldRecord)
	{
		newRecord[key] = oldRecord[key];
	}

	for (var key in updatedFields)
	{
		newRecord[key] = updatedFields[key];
	}

	return newRecord;
}
})()

⬇️ Minification

!function() {
    !function(oldRecord, updatedFields) {
        var newRecord = {};
        for (var key in oldRecord) newRecord[key] = oldRecord[key];
        for (var key in updatedFields) newRecord[key] = updatedFields[key];
    }({
        a: "Hello",
        b: "World"
    }, {
        b: "Alex"
    });
}();

Currying: AX and FX functions

The F2 through F9 and A2 through A9 functions that Elm generates (and that Elm asks users to annotate as pure_funcs when calling UglifyJS) are for currying purposes. The FX functions make a function "curried" while the AX functions are there to speed up calling a curried function with the ideal number of arguments. THe details don't matter, but in practice, they're pure functions.

(function() {
function F(arity, fun, wrapper) {
  wrapper.a = arity;
  wrapper.f = fun;
  return wrapper;
}

function F2(fun) {
  return F(2, fun, function(a) { return function(b) { return fun(a,b); }; })
}

function _Utils_eq(x, y)
{
	for (
		var pair, stack = [], isEqual = _Utils_eqHelp(x, y, 0, stack);
		isEqual && (pair = stack.pop());
		isEqual = _Utils_eqHelp(pair.a, pair.b, 0, stack)
		)
	{}

	return isEqual;
}

var _Utils_equal = F2(_Utils_eq);

})()

The code above only defines a _Utils_equal function that can be called like _Utils_equal(a, b) or like var equalsA = _Utils_equal(a); equalsA(b);.

Without tagging F2 as pure, the code remains in some form. With tagging, we have an empty output.

So, because a lot of Elm function calls happen using these AX functions (A2 if you call a function with 2 arguments, A3 if you call it with 3 arguments, ...), and these have been tagged as pure through CLI flags, an unused var unused = F2(fn, 1, 2) will be removed. I'd say that's quite dangerous without knowing what fn is, but it works well for Elm. I can imagine this being a source for bugs as you mentioned.

I would have expected the call to be removed of both F2 and fn were tagged as pure, and I understand some reluctance to mark more functions as pure because of this.

So in Elm, the function calls that would remain are the function calls for 1 argument, which are compiled to f(a), not to A1(f, a).
Because f has not been tagged as pure, the function call will stay even if its value is unused. My proposal would help here.

Note that it will be likely not be possible (at least through this proposal) to indicate the purity spreads.
Because of currying, we can also do var equalsA = _Utils_equal(a); var unused1 = equalsA(b);. If we tagged _Utils_equal as pure, then var unused2 = _Utils_equal(a) would be removed. But this purity would not be shared to unused1. Unless the inference is crazy good.

Tail-call optimization

Elm converts some recursive functions into while loops, to avoid stack overflows and improve performance.

find : List Int -> Maybe Int
find values =
    case values of
        [] ->
            Nothing

        value :: rest ->
            if value > 5 then
                Just value

            else
                find rest

a = find [1, 2, 3]

⬇️ Elm compilation

var $elm$core$Maybe$Nothing = {$: 1};
var $elm$core$Maybe$Just = function (a) {
	return {$: 0, a: a};
};

// Lists in Elm are implemented as linked lists
// The `a` field is the value, `b` points to the next list element.
// Every list ends with `_List_Nil`
function _List_Cons(hd, tl) { return { $: 1, a: hd, b: tl }; }
var _List_Nil = { $: 0 };
function _List_fromArray(arr)
{
	var out = _List_Nil;
	for (var i = arr.length; i--; )
	{
		out = _List_Cons(arr[i], out);
	}
	return out;
}

var find = function (values) {
	find:
	while (true) {
		if (!values.b) {
			// List is empty
			return $elm$core$Maybe$Nothing;
		} else {
			var value = values.a;
			var rest = values.b;
			if (value > 5) {
				return $elm$core$Maybe$Just(value);
			} else {
				var $temp$values = rest;
				values = $temp$values;
				continue find;
			}
		}
	}
};
var unused = find(_List_fromArray([1, 2, 3]));

After minification, these operations remain.

Afterword

I hope this was useful to you. These were mostly functions that Elm generates and that are used everywhere (not including the tail-call one). User-made functions are usually generated differently and are more simple. I'm happy to provide more examples if need be.

I'm repeating myself but I doubt that this tool will be able to infer the purity of the Elm code as well as the Elm compiler, hence the value of inline pure_funcs annotation. If the compiler was able to annotate on its own that the AX and FX functions are pure, then that would also help usability of UglifyJS, as it wouldn't need to tell users to run with an obscure set of pure_funcs.

@alexlamsl
Copy link
Collaborator

Firstly, thanks for the examples. They all look possible improvements to existing compress options − I will start working through them after the next release.

Marking as enhancement to tackle the examples above without involving pure_funcs.

How is this proposal different from the already existing pure_funcs?

It differs in how obvious the deviation from standard JavaScript is − especially with how most users just blindly pull in third party libraries, a comment annotation would make diagnosing "surprising" behaviour a lot more difficult.

With the current form of /*#__PURE__*/ the non-standard behaviour is at least localised right next to the comment.

Also, I am not aware of the bugs you're referring to that may be relevant to the discussion.

I appreciate you not being a maintainer of this project means not having to process all these off-topic, often cryptic issue reports − just go through the closed tickets that aren't "ufuzz failures", e.g. invalid input syntax, downstream hard-coded string replacement etc.

It is thus undesirable to introduce any features that would generate further usage confusions − comment annotations amongst top of the list.

@alexlamsl
Copy link
Collaborator

As demonstrated in #5616, the first two examples can be taken care of by default compress settings without #__PURE__ annotations or pure_funcs.

As for "Tail-call optimization", that find functions is actually not that "pure" at all:

var find = function (values) {
	find: while (true) {
		if (!values.b)
			// List is empty
			return $elm$core$Maybe$Nothing;
		else {
			var value = values.a;
			var rest = values.b;
			if (value > 5) {
				return $elm$core$Maybe$Just(value);
			} else {
				var $temp$values = rest;
				values = $temp$values;
				continue find;
			}
		}
	}
};
var values = { a: 0 };
values.b = values;
// stuck in infinite loop
find(values);

So this example illustrates why annotating #__PURE__ at each call site is much safer than the proposed behaviour, as it will result in a change in runtime behaviour that − especially when find is defined in some nested third party libraries − much harder to spot.

@jfmengels
Copy link
Author

I don't know if it's useful to you, but I'd just like to mention that you can't have data that reference each other (values referencing values, or a referencing b which also references b, etc.) because you can only achieve that using mutation, which is a feature that Elm doesn't have. Therefore you can't enter an infinite loop because of the data that you passed to the function like this (you can still manually create infinite loops/recursion in Elm, but not because of self-references).

alexlamsl added a commit to alexlamsl/UglifyJS that referenced this issue Aug 16, 2022
alexlamsl added a commit to alexlamsl/UglifyJS that referenced this issue Aug 16, 2022
alexlamsl added a commit that referenced this issue Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants