Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for new language features (please provide feedback) #43

Closed
shazqadeer opened this issue Mar 19, 2016 · 4 comments
Closed

Proposal for new language features (please provide feedback) #43

shazqadeer opened this issue Mar 19, 2016 · 4 comments

Comments

@shazqadeer
Copy link
Collaborator

Paul’s work on NodeService exposed a performance problem with excessive cloning in the P runtime. I am noting here a language-level proposal to combat the problem. There are two features I am proposing to add. I am looking for feedback for each feature along three axes: (1) is it useful? (2) is it too restrictive? (3) are there any gotchas?

First, I want to make arguments to functions immutable. If the programmer wants to mutate a parameter, she must make a copy of the parameter into a local variable. With this design, any procedure-local l-value expression passed as an argument to a function call does not have to be cloned; rather the parameter can create an alias. The type checker should disallow global l-value expressions to be passed as a parameter to a function since global variables could be modified anywhere and consequently aliasing would be problematic.

Second, I want to allow a procedure-local l-value expression of a complex type to be passed by reference. If a parameter x is passed by reference, then the contents of x can be mutated (x still cannot be mutated). Of course, this mutation is visible as a side-effect in the caller in the l-value that was passed by reference. The type checker must make sure that only a local variable or a ref parameter l-value is passed as an argument to a function in place of a ref parameter.

Examples:

fun A1(x: int) {
x = 42; // disallowed
}

fun A2(ref x: int) { ... } // disallowed since the type of x is a primitive type

fun A3(ref x: seq[int]) {
x[0] = 42; // allowed
}

fun B(x: seq[int]) {
A3(x); // disallowed
}

fun C() {
var x: seq[int];
x += (0,0);
A3(x); // allowed
assert x[0] == 42; // passes
}

fun D(ref x: seq[int]) {
A3(x); // allowed
}

@ankushdesai
Copy link
Member

What about passing global l-values to static functions ?
One of the general use case of global static functions is to perform common
computations, and you might want to pass global l-values to the static
functions.

On Sat, Mar 19, 2016 at 3:41 PM Shaz Qadeer notifications@github.com
wrote:

Paul’s work on NodeService exposed a performance problem with excessive
cloning in the P runtime. I am noting here a language-level proposal to
combat the problem. There are two features I am proposing to add. I am
looking for feedback for each feature along three axes: (1) is it useful?
(2) is it too restrictive? (3) are there any gotchas?

First, I want to make arguments to functions immutable. If the programmer
wants to mutate a parameter, she must make a copy of the parameter into a
local variable. With this design, any procedure-local l-value expression
passed as an argument to a function call does not have to be cloned; rather
the parameter can create an alias. The type checker should disallow global
l-value expressions to be passed as a parameter to a function since global
variables could be modified anywhere and consequently aliasing would be
problematic.

Second, I want to allow a procedure-local l-value expression of a complex
type to be passed by reference. If a parameter x is passed by reference,
then the contents of x can be mutated (x still cannot be mutated). Of
course, this mutation is visible as a side-effect in the caller in the
l-value that was passed by reference. The type checker must make sure that
only a local variable or a ref parameter l-value is passed as an argument
to a function in place of a ref parameter.

Examples:

fun A1(x: int) {
x = 42; // disallowed
}

fun A2(ref x: int) { ... } // disallowed since the type of x is a
primitive type

fun A3(ref x: seq[int]) {
x[0] = 42; // allowed
}

fun B(x: seq[int]) {
A3(x); // disallowed
}

fun C() {
var x: seq[int];
x += (0,0);
A3(x); // allowed
assert x[0] == 42; // passes
}

fun D(ref x: seq[int]) {
A3(x); // allowed
}


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#43

@paulthomson
Copy link

I like this train of thought, as it does seem like there will be performance issues with the current approach. However, I am not sure about these proposals.

Making function arguments immutable seems inconsistent with other languages. Further, disallowing global variables to be passed as function parameters seems like a very odd restriction from the programmer's point of view. The "everything is a value" semantics are simple to grasp (even for Java/C# programmers) and likely familiar to C++ programmers. In contrast, these restrictions seem unintuitive and potentially frustrating.

I like the pass-by-reference proposal. However, it also feels unintuitive because we have primitive and complex types as values, plus complex types as references (but only as parameters). In C#, complex types are references by default and the ref keyword allow functions to change the value of the parameter (that points to the complex type). I don't feel that pass-by-reference in C++ is similar either. Thus, I would hesitate to add it without considering alternative approaches. What if ref allowed the function to modify the (caller's) parameter value itself (like in C# and C++), so that primitive and complex types could be passed by reference?

As an alternative (or in addition), I think it would be very interesting to explore storing values as immutable heap objects (they are already on the heap anyway, right?) and using copy-on-write and some straightforward optimizations. However, I understand that this would be a big change. I think it would require reference counting. However, I assume there could be a nice optimization where, if an object only has one reference, then it could be mutated; once an object has more than one reference, then it must be copied when being "mutated". This would somewhat alleviate the performance concern of creating a default tuple/sequence and then setting/adding each field/element. Also, say you wanted to sort a list:

{
  var seq1 : seq[int];
  ...
  seq1 = sorted(seq1);
  ...
}

Perhaps the compiler/runtime could keep the reference count of seq1 as 1 inside the body of sorted (by clearing the value of seq1 before entering the function due to the fact that it is about to be overwritten). Thus, sorted would mutate the sequence directly, with no copying.

@shazqadeer
Copy link
Collaborator Author

Thanks for the feedback. Based on the feedback received, I would like to revise the proposal on two counts:

  1. Allow l-value expression derived from a global variable g to be passed as a parameter to function f if g is not modified by f.
  2. Match the ref semantics provided by a language such as C#. So, even parameters of primitive types can be passed by reference. Any parameter passed by reference can be modified, either contents or entirely. The modification would be visible to the caller. If an l-value expression derived from a global variable g is passed as a ref parameter to function f, then g must neither be read nor modified by f (to avoid confusion by having two different names refer to the same piece of memory).

It may be possible to unify the restriction regarding global variables in 1 and 2 above by simply saying that a global variable g is inaccessible (neither read nor written) in f if an l-value derived from g is passed as a parameter to f.

I also find interesting the other proposal, copy-on-write and reference counting, in Paul's response. The advantage of this proposal is that the language does not have to change at all. However, there are a few disadvantages as well:

  1. The performance characteristic is not as transparent. The programmer would likely not be able to predict when cloning would happen.
  2. If sharing is done across machines, then thread-safety of reference counting has to be considered. Perhaps sharing across machine should not be done. But that is limiting because the idea of ownership transfer with parameters could also be applied to payloads that ship across machines (an idea mentioned by Ankush in a separate email thread).

@shazqadeer
Copy link
Collaborator Author

Closing this one. I am currently implementing support for these features in P but the design is sufficiently different that we would need a separate issue for the new design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants