Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions: full type inference #29

Closed
Tracked by #57
sharkdp opened this issue Sep 25, 2022 · 4 comments · Fixed by #443
Closed
Tracked by #57

Functions: full type inference #29

sharkdp opened this issue Sep 25, 2022 · 4 comments · Fixed by #443

Comments

@sharkdp
Copy link
Owner

sharkdp commented Sep 25, 2022

It would be very cool if we could infer function parameter types and return types. For example, given hypothetical code like

fn g(t: Time) -> Speed = …

fn f(d, t) = (d + 1 meter) / g(t)

we could easily infer from d + 1 meter that d: Length, and from g(t) that t: Time. Furthermore, we could compute the return type to be Length / Speed =Time, leading to an inferred type of

fn f(d: Length, t: Time) -> Time

This also relates to generics (#25), as some (most?) functions would probably have inferred generic types. For example, in the absence of any annotations,

kineticEnergy(m, v) = 1/2 * m * v^2

would have an inferred type of

kineticEnergy<T0, T1>(m: T0, v: T1) -> T0 * T1^2 = 1/2 * m * v^2

Possible constraints

  • Addition, Subtraction, Conversion: these binary operators act as constraints on their operands. For example, an addition like
    expr1 + expr2
    
    imposes a simple [expr1] == [expr2] constraint. Similarly, expr1 -> expr2 also forces expr1 to have the same dimension as expr2.
  • Function application: A function call like
    f(expr1, expr1, …, exprN)
    
    imposes N constraints, one for each argument: [exprK] == [parameter k of f]. A simple example would be something like
    fn normalize_len(l: Length) -> Length = …
    fn f(d, alpha) = normalize_length(d) * sin(alpha)
    
    where we could use the function calls to easily get the types of d and alpha.
  • Exponentiation: Since the exponent in an expression like
    expr1 ^ expr2
    
    always needs to be a scalar, we can infer [expr2] == 1. In general, we can not assume that expr1 is also a scalar (meter^2 is perfectly fine, for example). So there is also no constraint for expr1.

Formalization

Given a (partially annotated) function definition like one of these (Cx is a concrete type):

f(x0, x1, x2, …) = …
f(x0, x1: C1, x2, …) = …
f(x0, x1, x2, …) -> C_ret = …
f(x0, x1: C1, x2, …) -> C_ret = …
…

We start by filling in blanks with free type variables T0, T1, … and T_ret:

f(x0: T0, x1: T1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: T1, x2: T2, …) -> C_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> C_ret = …
…

Next, we walk the AST of the right hand side expression and record all constraints that we discover (see above). For example, given a declaration like f(x0, x1, x2) = sqrt((x0 · x2² + second) · second / x1²) -> meter, we would identify:

  • T0 · T2^2 == Time (from x0 · x2² + second)
  • ((T0 · T2^2) · Time / T1²)^(1/2) == T0^(1/2) · T1 · T2 · Time^(1/2) == Length (from the -> conversion operator)

These constraints would then have to be converted to a system of linear equations on the exponents.

  1 * exponents0 + 0 * exponents1 + 2 * exponents2 == (1, 0)
1/2 * exponents0 + 1 * exponents1 + 1 * exponents2 == (1, -1/2)

where exponents_i and the right hand sides would be vectors with one rational number entry for each of the D physical dimensions. In total, this is a set of N_constraints · D equations, where the left-hand side will be the same regardless of the physical dimension.

Examples

Unique solution

Example 1

g(t: Length) -> Length = …
f(x, y, z) = g(sqrt(x)/y) * (y + 2 * second) * 2^z
  • From 2^z, we infer [z]: 1.
  • From y + 2 * second, we infer [y]: Time
  • From g(sqrt(x)/y), we infer sqrt(x)/y: Length, i.e. sqrt(x): Length · Time or x: Length² · Time².
  • Finally, there return type can be computed as Length · Time
f(x: Length² · Time², y: Time, z: 1) -> Length · Time = g(sqrt(x)/y) * (y + 2 * second) * 2^z

Example 2

f(x) = x + x^2

should be inferred as

f(x: Scalar) -> Scalar

Example 3

(from https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf)

f(x,y,z) = x*x + y*y*y*y*y + z*z*z*z*z*z

should ideally be inferred as

f<T>(x: T^15, y: T^6, z: T^5) -> T^30

Underdetermined

For a function definition like

f(x, y) = x + y^3

we could infer one of those (equivalent) types:

f(x: T, y: T^(1/3)) -> T = x + y^3
f(x: T^3, y: T) -> T^3 = x + y^3

Overdetermined

A function like

f(x) = (x + meter)/(y + second)

should lead to a type check error.

@sharkdp
Copy link
Owner Author

sharkdp commented Sep 25, 2022

As so often, after understanding and thinking about the problem, I'm finally able to properly search for existing work. I found this:

Wand, Mitchell, and Patrick O'Keefe. "Automatic Dimensional Inference." Computational Logic-Essays in Honor of Alan Robinson. 1991 (https://www.ccis.northeastern.edu/home/wand/papers/dimensions.ps)

and this:

Programming Languages and Dimensions, Andrew John Kennedy, University of Cambridge, Computer Laboratory, 1996 (https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf)

@sharkdp
Copy link
Owner Author

sharkdp commented Sep 26, 2022

Recursion can lead to more interesting constraints for the return type, e.g.

f(x)=meter^2/f(x-1)

@sharkdp sharkdp mentioned this issue Feb 15, 2023
19 tasks
@sharkdp sharkdp added this to the Feature parity milestone Feb 15, 2023
@sharkdp sharkdp modified the milestones: Feature parity, public release Jul 10, 2023
@sharkdp
Copy link
Owner Author

sharkdp commented Jul 21, 2023

If we don't fully implement this at the beginning, we could think about doing the following — just to make it possibly for users to write simple function definitions without having to add type annotations:

  • Infer completely generic types for those functions, i.e. f<T1, T2, …>(x: T1, y: T2, …)
  • Leave the return type unspecified
  • Defer type checking of those functions to the usage site (similar to how C++ templates work)

This would allow a user to define e.g. fn f(x) = x + 1 meter. We would give it a generic x: T type and defer type checking (because it would fail with T != Length). Then, if it's used with f(1 inch), everything is fine. And if it's used with f(1 second), we would show a type error.

Edit: This is implemented now

@sharkdp sharkdp changed the title Functions: type inference Functions: full type inference May 18, 2024
@sharkdp sharkdp self-assigned this May 21, 2024
This was referenced May 24, 2024
@sharkdp
Copy link
Owner Author

sharkdp commented Jun 4, 2024

This is now fully supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant