Functions: full type inference #29

sharkdp · 2022-09-25T14:59:10Z

It would be very cool if we could infer function parameter types and return types. For example, given hypothetical code like

fn g(t: Time) -> Speed = …

fn f(d, t) = (d + 1 meter) / g(t)

we could easily infer from d + 1 meter that d: Length, and from g(t) that t: Time. Furthermore, we could compute the return type to be Length / Speed =Time, leading to an inferred type of

fn f(d: Length, t: Time) -> Time

This also relates to generics (#25), as some (most?) functions would probably have inferred generic types. For example, in the absence of any annotations,

kineticEnergy(m, v) = 1/2 * m * v^2

would have an inferred type of

kineticEnergy<T0, T1>(m: T0, v: T1) -> T0 * T1^2 = 1/2 * m * v^2

Possible constraints

Addition, Subtraction, Conversion: these binary operators act as constraints on their operands. For example, an addition like
```
expr1 + expr2
```
imposes a simple [expr1] == [expr2] constraint. Similarly, expr1 -> expr2 also forces expr1 to have the same dimension as expr2.
Function application: A function call like
```
f(expr1, expr1, …, exprN)
```
imposes N constraints, one for each argument: [exprK] == [parameter k of f]. A simple example would be something like
```
fn normalize_len(l: Length) -> Length = …
fn f(d, alpha) = normalize_length(d) * sin(alpha)
```
where we could use the function calls to easily get the types of d and alpha.
Exponentiation: Since the exponent in an expression like
```
expr1 ^ expr2
```
always needs to be a scalar, we can infer [expr2] == 1. In general, we can not assume that expr1 is also a scalar (meter^2 is perfectly fine, for example). So there is also no constraint for expr1.

Formalization

Given a (partially annotated) function definition like one of these (Cx is a concrete type):

f(x0, x1, x2, …) = …
f(x0, x1: C1, x2, …) = …
f(x0, x1, x2, …) -> C_ret = …
f(x0, x1: C1, x2, …) -> C_ret = …
…

We start by filling in blanks with free type variables T0, T1, … and T_ret:

f(x0: T0, x1: T1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: T1, x2: T2, …) -> C_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> C_ret = …
…

Next, we walk the AST of the right hand side expression and record all constraints that we discover (see above). For example, given a declaration like f(x0, x1, x2) = sqrt((x0 · x2² + second) · second / x1²) -> meter, we would identify:

T0 · T2^2 == Time (from x0 · x2² + second)
((T0 · T2^2) · Time / T1²)^(1/2) == T0^(1/2) · T1 · T2 · Time^(1/2) == Length (from the -> conversion operator)

These constraints would then have to be converted to a system of linear equations on the exponents.

  1 * exponents0 + 0 * exponents1 + 2 * exponents2 == (1, 0)
1/2 * exponents0 + 1 * exponents1 + 1 * exponents2 == (1, -1/2)

where exponents_i and the right hand sides would be vectors with one rational number entry for each of the D physical dimensions. In total, this is a set of N_constraints · D equations, where the left-hand side will be the same regardless of the physical dimension.

Examples

Unique solution

Example 1

g(t: Length) -> Length = …
f(x, y, z) = g(sqrt(x)/y) * (y + 2 * second) * 2^z

From 2^z, we infer [z]: 1.
From y + 2 * second, we infer [y]: Time
From g(sqrt(x)/y), we infer sqrt(x)/y: Length, i.e. sqrt(x): Length · Time or x: Length² · Time².
Finally, there return type can be computed as Length · Time

f(x: Length² · Time², y: Time, z: 1) -> Length · Time = g(sqrt(x)/y) * (y + 2 * second) * 2^z

Example 2

f(x) = x + x^2

should be inferred as

f(x: Scalar) -> Scalar

Example 3

(from https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf)

f(x,y,z) = x*x + y*y*y*y*y + z*z*z*z*z*z

should ideally be inferred as

f<T>(x: T^15, y: T^6, z: T^5) -> T^30

Underdetermined

For a function definition like

f(x, y) = x + y^3

we could infer one of those (equivalent) types:

f(x: T, y: T^(1/3)) -> T = x + y^3
f(x: T^3, y: T) -> T^3 = x + y^3

Overdetermined

A function like

f(x) = (x + meter)/(y + second)

should lead to a type check error.

The text was updated successfully, but these errors were encountered:

sharkdp · 2022-09-25T20:43:54Z

As so often, after understanding and thinking about the problem, I'm finally able to properly search for existing work. I found this:

Wand, Mitchell, and Patrick O'Keefe. "Automatic Dimensional Inference." Computational Logic-Essays in Honor of Alan Robinson. 1991 (https://www.ccis.northeastern.edu/home/wand/papers/dimensions.ps)

and this:

Programming Languages and Dimensions, Andrew John Kennedy, University of Cambridge, Computer Laboratory, 1996 (https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf)

sharkdp · 2022-09-26T13:48:33Z

Recursion can lead to more interesting constraints for the return type, e.g.

f(x)=meter^2/f(x-1)

sharkdp · 2023-07-21T06:20:31Z

If we don't fully implement this at the beginning, we could think about doing the following — just to make it possibly for users to write simple function definitions without having to add type annotations:

Infer completely generic types for those functions, i.e. f<T1, T2, …>(x: T1, y: T2, …)
Leave the return type unspecified
~~Defer type checking of those functions to the usage site (similar to how C++ templates work)~~

This would allow a user to define e.g. fn f(x) = x + 1 meter. We would give it a generic x: T type and defer type checking (because it would fail with T != Length). Then, if it's used with f(1 inch), everything is fine. And if it's used with f(1 second), we would show a type error.

Edit: This is implemented now

sharkdp · 2024-06-04T17:32:26Z

This is now fully supported.

sharkdp added the type-checker label Sep 25, 2022

sharkdp mentioned this issue Sep 25, 2022

Functions: generics / parametric polymorphism #25

Closed

sharkdp mentioned this issue Dec 27, 2022

Interesting challenges #38

Open

sharkdp added the feature-parity label Feb 15, 2023

sharkdp mentioned this issue Feb 15, 2023

Feature-parity #57

Closed

19 tasks

sharkdp added this to the Feature parity milestone Feb 15, 2023

sharkdp modified the milestones: Feature parity, public release Jul 10, 2023

sharkdp added a commit that referenced this issue Jul 21, 2023

Allow non-annotated parameter types, see #29

53aa753

sharkdp mentioned this issue Sep 8, 2023

Interesting resources #20

Open

sharkdp changed the title ~~Functions: type inference~~ Functions: full type inference May 18, 2024

sharkdp mentioned this issue May 18, 2024

Type system extensions #433

Closed

sharkdp self-assigned this May 21, 2024

This was referenced May 24, 2024

Typed holes #437

Closed

Full type inference, list datatype #443

Merged

sharkdp closed this as completed in #443 Jun 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Functions: full type inference #29

Functions: full type inference #29

sharkdp commented Sep 25, 2022 •

edited

Loading

sharkdp commented Sep 25, 2022 •

edited

Loading

sharkdp commented Sep 26, 2022 •

edited

Loading

sharkdp commented Jul 21, 2023 •

edited

Loading

sharkdp commented Jun 4, 2024

Functions: full type inference #29

Functions: full type inference #29

Comments

sharkdp commented Sep 25, 2022 • edited Loading

Possible constraints

Formalization

Examples

Unique solution

Example 1

Example 2

Example 3

Underdetermined

Overdetermined

sharkdp commented Sep 25, 2022 • edited Loading

sharkdp commented Sep 26, 2022 • edited Loading

sharkdp commented Jul 21, 2023 • edited Loading

sharkdp commented Jun 4, 2024

sharkdp commented Sep 25, 2022 •

edited

Loading

sharkdp commented Sep 25, 2022 •

edited

Loading

sharkdp commented Sep 26, 2022 •

edited

Loading

sharkdp commented Jul 21, 2023 •

edited

Loading