Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: type annotations docs #96

Closed
wants to merge 17 commits into from
Closed

WIP: type annotations docs #96

wants to merge 17 commits into from

Conversation

Fuco1
Copy link
Member

@Fuco1 Fuco1 commented Sep 9, 2018

See also #74.

docs/type-annotations.org Outdated Show resolved Hide resolved
#+END_SRC

The =declare= form starts with =elsa= followed by one or two values. If
two are provided, first must be a list which is the list of input
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to drop the parens around arguments and just have a sequence of types where the last one is automatically return type.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that would be awkward in case you want a void function or rather, ignore the return value. Requiring parentheses around arguments would model this naturally (nothing would follow them) whereas without them you'd always need to specify the last thing. Also, what if you have an empty argument list, but a meaningful return value (like, emacs-version)? How would you tell both apart without parentheses around the arguments? If you need a better separator of arguments and return value, I like : in the Wirth family, an arrow would be another option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always inspect the argument list, so incase of (point) you know the int is the return value since there is no argument.

I have never considered void functions, such a thing does not really exist in Elisp does it? At the very least you always return nil or some such.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course there are void functions, everything you call for its side effect, like save-buffer (returns nil) or revert-buffer (returns t). What I'm pointing out is that we could avoid ambiguity with sexp-syntax.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A void function returns nothing. Every function in elisp returns something.

Doing (string-to-number (save-buffer)) would error. We don't like errors, therefore we must always consider the return type... the above code is silly but might be a result of a failed refactoring or a mistake and we want to emit a warning there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing (string-to-number (save-buffer)) would error.

I agree and my conclusion is that we should consider the void return type. The result of a function returning void should not be passed as argument to another function nor stored into a variable. When calling a void function is the last statement of a function, then that function itself returns void.

I agree with @Fuco1 that in ELisp, every function returns something. But not all functions are advertised as returning something: those functions should have the void return type.

Consider save-buffer for example and consider that no void return type exists. A static analysis could detect that this function returns a bool for example. Then, Elsa should accept code like this (if (save-buffer) ...). My opinion is that this is wrong because the implementation of save-buffer can change tomorrow to return an int instead. This will be an implementation detail until save-buffer docstring describes what the return value is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Void is nasty because it doesn't play well with the type algebra.

What would this function return?

(defun what-is-return-type ()
  (if (we-should-save-buffer) ;; bool
      (save-buffer) ;; void
    (return-time-since-last-save-in-seconds) ;; int
    ))

A type like void | int makes no sense because void basically poisons everything. If you would want to use such a function you would have to test the return type anyway (did I get an int back?). So then the type of save-buffer doesn't really matter.

It's a bit silly example, but the problem is that void is not a type in a sense every other type is a type (i.e. a set). Void is not even an empty set type, it's something outside the hierarchy.

This function

(defun forever-and-ever ()
  (while t t))

also returns void? Or is that something else.

I have this issue #68 to check for unused return values so I would rather go with a "this function should be only used for side-effect" warning but not on the type level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are much more knowledgeable than me in this area. Do what you think is best!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A type like void | int makes no sense because void basically poisons everything.

Some languages (Rust, OCaml) have unit types for this purpose (). In this example, (or void int) tells you that you may get a useful value (int) or you may get a useless one (void). You have to test the return type in order to use the int, because you cannot do anything with void. But that's the point of the type analysis: to make sure your code is correct in all situations.

If you would want to use such a function you would have to test the return type anyway (did I get an int back?). So then the type of save-buffer doesn't really matter.

But it does matter. Consider the alternative: (or bool int), in which case you may freely use the bool return value even though it's undocumented, and may change in the future as argued by @DamienCassou. If it changes to int, then if you previously made use of the bool value your code won't typecheck anymore, but there is value in an annotation to indicate "this value is not relevant and subject to change".

This function
(defun forever-and-ever ()
(while t t))
also returns void? Or is that something else.

Something else: it's the empty type (or bottom type). It indicates that this function never returns, which is different than void as argued above, which means returns something you cannot use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To complete: the empty type may be more tricky to deal with, but in practice may not be that useful. The void unit type however, though by not mean indispensable. is useful.

@alphapapa
Copy link
Contributor

This looks beautiful, Matus! Using the declare form feels so elegant and fits right into elisp! I only have a few suggestions, which I'll make inline...

#+BEGIN_SRC elisp
(defun drop-items (list &optional n)
"Drop first item of LIST or N items if N is provided."
(declare (elsa ((list a*) &optional int) (list a*)))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The &optional marker is not actually necessary since we can get that info from the function signature itself. I feel it might be a good idea to have it redundant for cross-checking, but it might prove more annoying than useful.

Same for &rest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With &rest there's also the issue that we might want to wrap the type with (list) or assume that implicitly.

  • (elsa (&rest string) string implicit list
  • (elsa (&rest (list string)) string
  • (elsa (string) string) if we decide to get rid of &rest + implicit list
  • (elsa ((list string)) string) if we decide to get rid of &rest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also &optional implies (or nil X)... I think it will be best to leave the markers in as it makes the whole thing read a lot easier.

Ideas?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just* make &optional and &rest optional? The extent to which they help with readability rather depends on the argument list at hand. For the simplest functions, they don't add much: something like

(defun make-csv-row (&rest strings) 
  (declare (elsa (string) string))
  (string-join strings ","))

reads perfectly naturally without a second &rest, but the redundant annotations become useful very quickly as the argument list's complexity grows. I would think providing them should be considered good style, in general.

​* "just" in the sense that "both!" is a simple answer to either/or questions; I don't have a clear sense of what the implementation difficulty would actually be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ambirdsall I kind of like this. I'm a bit afraid people will try to derive some extra significance out of it though. i.e. lack of &rest meaning something as opposed to being optional. Documentation helps of course but it's still unobvious at first.

But I think we can make it optional and see the reactions. We can always deprecate that usage and ask users to fill in missing types (e.g. Elsa itself can check this and provide an error)

@alphapapa
Copy link
Contributor

Ideas?

Sorry, could you clarify the question?

Also, I can't seem to reply in that conversation...

@Fuco1
Copy link
Member Author

Fuco1 commented Sep 14, 2018

@alphapapa Not really a question just observation on how it looks. &optional in the argument list does two things: you can leave the argument out, and/or you can also provide nil as a value which functionally acts the same (but the call is different---missing an argument). So one question is if we should require the users to write this down or make it implicit.

I think it could be implicitly understood by the presence of &optional in front of the type(s)

@Wilfred
Copy link

Wilfred commented Sep 23, 2018

Elegant lispy API which fits nicely with the rest of Emacs. It seems super readable to me :)

I like having the explicit &optional in the declaration FWIW.

I've never heard of intersection types before, do other languages have them?

@Fuco1
Copy link
Member Author

Fuco1 commented Sep 24, 2018

@Wilfred The intersection type is something used internally in the analysis. You would likely never write an annotation like that.

In dynamic languages expressions can be "anything" and so we need to have a way (i.e. "set theory") to compute the possible run-time domains.

You can read some more about it in the Flow (javascript static analysis) docs: https://flow.org/en/docs/types/intersections/

@Fuco1
Copy link
Member Author

Fuco1 commented Sep 25, 2018

I still have no good idea on how to represent things like records (cl-defstruct) and objects defclass. My idea so far is to have a "constructor" (struct <record-name) and (class <class-name>).

That might actually work fine. I don't think we can do with just using the record/class names.

It should also support generic classes, so something like (class a*) should also be supported.

The problem is that class is not a type constructor... the class name is the type constructor. We say (list int) and this means "list of integers", but saying (class elsa-type-int) does not mean "a class of elsa-type-ints"... further classes probably can have generic arguments as well themselves.

So we would mix a constructor semantic with some sort of "annotation" semantic.

By default types do not accept =nil= values. This helps preventing
errors where you would pass =nil= to a function expecting a value. To
mark a type nullable you can combine it with =nil= type using the [[id:5a21a68a-4df1-4d44-a854-1d9700858a1a][or]]
combinator: =(or string nil)= is a type which takes any string but also
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the rationale for dropping the ? suffix? I liked the fact that it was both expressive and short.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I like it and I would not oppose having it remain. It is a lot nicer to read int? instead of (or int nil).

I don't want to overdo it with the suffixes though. There will also be the generic a* syntax (or some other variant) and then it might matter on how we parse the suffixes (i.e. is a*? the same as a?* or does it even make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can combine * and ? in the same type because what is before * is an abstract name that conveys no meaning by itself. My opinion is that a*? and a?* should trigger an error.

Copy link
Member Author

@Fuco1 Fuco1 Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you know Haskell I can think of this as an analogy of Maybe a, that would be in our syntax a*? (since a* is the generic type constructor). So basically ? is short for (or nil <arg>), similar as one can write (list a*)

a?* probably makes no sense other than a? would be the placeholder and that shouldn't be allowed indeed.

We can also put some things to the front, (e)lisp has a tradition with the & prefix.

Or we can go entirely different way and for example use strings or quoted symbols for generic names, so that it would be "foo" or (list "foo") instead of foo*.


** Most general type

A type for everything is called =mixed=. It accepts anything and is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that "accepts anything" is a bit wrong here: apparently, mixed does not accept [mixed] (see #112).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should so that's a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue in #112 is the other way around though. We want an array [Mixed] and we are getting Mixed instead which might not be an array. So this is a mismatch.

- =(cons a b)= where =a= is the type of =car= and =b= is the type of =cdr=. If
the =car= and =cdr= can be anything write =(cons mixed mixed)= or simply
=cons= for short.
- =(list a)= where =a= is the type of items in the list. If the list can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this syntax, you can't distinguish between:

  1. a list with unknown length and whose elements are all of the same type, and
  2. a list with length 1

I suggest (list a) to represent case 2 and (list a...) to represent case 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I never actually had in mind to have lists of finite/known length with the type constructor list. I was thinking of using tuple as short for (cons a (cons b (cons c nil))) which is a list of three things of types a, b, and c.

Soy idea was (tuple a b c) and list would be specifically for a list (of unknown length) with a repeated value.

I think I did not write it down anywhere, so I should. Does this make sense to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using tuple probably makes more sense than to reuse list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another thought at tuple and I'm not sure about it anymore. The tuple solution won't let you express that the first items are of some known type where the rest is unknown. Some use cases:

  • the parameters to format ((list string mixed...))
  • tagged lists where the first element is a symbol indicating how to interpret what follows ((list symbol mixed...))
  • poor-man structures (e.g., a person could be described as a name, a birthdate and children with (string date person...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also a bit related to structures like plists or alists, where alists are lists of lists where the car is something. These are good points.

In the future I also want to support "constant" types, so you can explicitly require specific values, for example, using your notation

(list (or 'regex 'syntax) string...)

could mean "this is a list of strings which should be interpreted as regular expressions if the car is a constant 'regex, or as syntax descriptors if 'syntax.

The constants will probably have a constructor const similar to how defcustom does it. Maybe we can use that syntax? The defcustom type (repeat string) is a list of strings ("a" "b" "c" "d" ...) where (list string integer symbol) is a list of three things ("foo" 1 'bar).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wanted to reuse the defcustom stuff so I dived into it and I realized that I really like the constructor-as-type principle.

There's also this cool pattern-matching library which uses a similar concept: https://github.com/VincentToups/shadchen-el

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually maybe we can lift the list-rest idea from there. Or allow a &rest keyword inside list such that (list &rest int) is a list of integers of any length and (list string bool &rest int) requires a string, a bool and then a bunch of ints.

The same can work in parallel with vector such that (vector string bool &rest int) is... you get the idea.

This might be it! :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this


Lists are made by chaining cons cells together and leaving =nil= in the
last =cdr= of the last =cons=. Therefore =(cons a (cons b (cons c nil)))=
can be understood as a list of exactly three items of type =a=, =b=, and
=c=.

Elisp provides a data constructor =list= to make creating such lists
easier. In analogy to =cons= we provide a =list= type which corresponds
to this, such that =(list "foo" 1 :bar)= has type =(list string int
keyword)=.

In general =(list a b c ...)= is a list type of known length where =a=, =b=,
=c=... are types of the items on the 0th, 1st, 2nd... places. It's a
proper list so the last =cons='s =cdr= is a =nil=.

Sometimes we need to express lists of unknown length which contain
items of the same thing repeatedly. For example =(3 2 5 1 5 6 7 5)= is
a list of numbers. Because we don't know its length beforehand we
provide a special syntax to express that a type repeats forever inside
the list.

The type =(list a b c . rest-type)= is a type similar to =list= except the
last =rest-type= repeats forever. Therefore, we expect a known number
of items of types =a=, =b=, =c= and then any number of items of =rest-type=.
If we only speficy one type it is automatically the rest type, so that
=(list . rest-type)= holds any number of items of =rest-type=.

If the list can hold any number of anything, write =(list . mixed)= or
simply =list= for short.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it very much, good job. I'm not so sure about this phrasing though:

If we only speficy one type it is automatically the rest type, so that (list . rest-type) holds any number of items of rest-type.

If the reader doesn't pay attention, she might understand that you are talking about (list rest-type) and not (list . rest-type). May I suggest this phrasing instead?

If we specify no type before the dot . as in (list . rest-type), this means the list holds any number of items of rest-type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I wasn't so sure myself but couldn't phrase it better.

I'm also thinking about maybe introducing some other shorter notation without the dot as that is confusing and we maybe might want to use it for dotted lists (albeit those are really rare so I wouldn't care about those very much).

Something like (list* a) where the star denotes a repetition like in formal grammars and stuff. I'd probably move the generic * marker before the type then.

docs/type-annotations.org Show resolved Hide resolved
docs/type-annotations.org Show resolved Hide resolved
docs/type-annotations.org Show resolved Hide resolved
A sum type is useful if the function internally checks the passed
value and decides what processing to do:

*TODO:* I think the double parens in the "argument" portion will lead to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the same problem as the double-parenthesis in the bindings of the let form. I don't find it confusing but I don't know about others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than confusing it's also a bit noisy I think.

(elsa ((function (a*) b*) (repeat a*)) (repeat b*))

for me it's very hard to get a good overview of where the arguments are and where the return type is.

(elsa (function (a*) b*) (repeat a*) -> (repeat b*))

is to me a lot more visual.

In my original thoughts I actually had two annotations (elsa-args) and (elsa-return). Not sure if that's better though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the idiomatic lisp thing would be to break the s-expression onto multiple lines and use indentation to visually indicate structure. The -> notation is nice and efficient for one-line type annotations, but in the real world it seems likely that people would get that visual reinforcement of the type structure by writing

(elsa ((function (a*) b*)
       (repeat a*))
    (repeat b*))

instead of

(elsa ((function (a*) b*) (repeat a*)) (repeat b*))

at least within elisp code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice idea and should be easily implemented using the indent engine. When you lay out out like this the structure actually makes sense too!

I will update the examples to use this layout and hopefully it catches on.

arguments.

Composite types usually correspond to data constructors such as =cons=,
=list=, =vector=...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you specify a plist where odd items are symbols and even items are strings? E.g.,:

`(:background "#583333" :foreground "#29457")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For plists I'm planning to have a special type (plist key value), same for alist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it will be a bit more involved since you might also want to express what keys exactly are allowed. For that I'm pondering a set type, such that

(set :foo :bar :baz)

Allows any of :foo, :bar, :baz zero or once and nothing else.

Then

(plist (set :foo :bar :baz) string)

Is a type of a plist with string values and keys :foo, :bar, :baz.

Now this still does not allow us to specify what value type we want for what key type exactly. In javascript (flow/typescript) they do it simply by enumerating:

{
  name: string,
  age: int
}

so maybe we can also have such a syntax:

(plist :foo string :bar int)

I'm not decided yet here what is the best way. Ideally somehow combine it all for maximal flexibility.

@Fuco1
Copy link
Member Author

Fuco1 commented Nov 3, 2018

Some more points of things currently impossible to express: #117 (comment)

@Fuco1
Copy link
Member Author

Fuco1 commented Apr 26, 2020

I've implemented the "lispification" of the type annotation language. I want to merge this to avoid the code diverging as I work on more features, but I don't want to lose the unresolved discussion here.

Anyone has a good idea what to do? Create a new pull request and merge that leaving this open?

@Fuco1 Fuco1 closed this Nov 1, 2021
@Fuco1 Fuco1 deleted the docs/type-annotations branch November 1, 2021 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants