
Language Discussion #1

Open
keean opened this issue Sep 17, 2016 · 116 comments

Comments

@keean (Owner) commented Sep 17, 2016

The idea is to create a simple, compact, symmetrical, typed language for generic programming that compiles to JavaScript. The initial reference is "Elements of Programming" by Alexander Stepanov and Paul McJones.

The core concept is to have parametric types, type-classes, and type-families, along with possibly higher-ranked types and higher-kinded types, to enable type-directed generic programming.

Inspiration is from C, C++ Concepts (type safe templates), Haskell, Rust, Eff, and others.

This is different from TypeScript because it is based on type-classes rather than classical inheritance, and different from PureScript because it allows imperative code and mutable state.

This issue is for the initial design of the language, with other issues being forked as necessary for specific topics like syntax, semantics, type system etc.

@SimonMeskens commented:

I'd like to chime in that Elements of Programming is one of my favorite tech books; it has a very prominent space on my bookshelf, next to Luca Cardelli's A Theory of Objects and Bird and Wadler's Introduction to Functional Programming (1st edition). If a language could express the ideas in those three books combined, it might very well be my favorite language.

What will your objects look like? Have you studied Self? I highly recommend giving this short paper a read:
http://sauerbraten.org/lee/ecoop.pdf

What would the solution to the problem in that paper look like in your language? (I think it's a great litmus test for languages.) It shows why type classes alone have some problems that are solved by delegation with differential inheritance (a more powerful superset of subclassing).

@keean (Owner) commented Sep 17, 2016

@SimonMeskens I'm not familiar with "A Theory of Objects" but a while back I contributed to this paper https://arxiv.org/pdf/cs/0509027.pdf which shows how object types fit into a formal type system.

With objects there is a problem with dispatch, in that "a.f(b, c, d)" gives special treatment to the receiving object 'a' and only dispatches on 'this'. With type-classes we start with a language that has no overloading; if we want to allow overloading we use a type-class, and a type-class can dispatch on multiple type parameters.

I will take a look at the examples in that paper, thanks.

@keean (Owner) commented Sep 17, 2016

@SimonMeskens I have implemented an example from the Shark paper in Rust, as it is a similar language (imperative and has traits):

trait Animal {
    fn swim_away(&self) {
        println!("swim away");
    }
}

trait Encounter<A> : Animal where A : Animal {
    fn encounter(&mut self, other : &A);
}

struct Fish {}
struct Shark {
    healthy : bool
}

impl Shark {
    fn swallow<A>(&self, _ : &A) where A : Animal {
        println!("swallow");
    }
    fn fight<A>(&mut self, _ : &A) where A : Animal {
        self.healthy = false;
    }
}

impl Animal for Fish {}
impl Animal for Shark {}

impl Encounter<Fish> for Fish {
    fn encounter(&mut self, _ : &Fish) {}
}

impl Encounter<Shark> for Fish {
    fn encounter(&mut self, other : &Shark) where Self : Animal {
        if other.healthy {
             self.swim_away()
        }
    }
}

impl Encounter<Shark> for Shark {
    fn encounter(&mut self, other : &Shark) where Self : Animal {
        if self.healthy {
            self.fight(other);
        } else {
            self.swim_away();
        }
    }
}

impl Encounter<Fish> for Shark {
    fn encounter(&mut self, other : &Fish) where Self : Animal {
        if self.healthy {
            self.swallow(other)
        } else {
            self.swim_away();
        }
    }
}

I don't intend to use the same syntax as Rust, but it gives an idea of the structure in a language with type-classes. One key difference with JS as the target is that we have garbage collection, so we don't need lifetimes, which simplifies things (although Rust correctly infers all the lifetimes in this example without any annotations). For the same reason we don't need affine types, and JS always passes by value (objects are values which are references to the object).

@keean (Owner) commented Sep 17, 2016

The first design questions are how much type inference there should be, and what the consequences are for syntax. Considering the simplest polymorphic function, in a Rust-like style:

fn id<A>(x : A) -> A { x }

Or Haskell style:

id :: A -> A
id x = x

The Haskell style has the advantage of working with type inference, and the Rust style has the advantage of automatically introducing scoped type variables.

Obviously there will need to be type signatures in type-class definitions, and records / module definitions.

@SimonMeskens commented:

I think that if the goal of the language is to provide a very simple, but very expressive language, you should consider going for a very simple, but expressive syntax. In that case, I think I like the Haskell syntax more than the Rust syntax. I've never been a fan of how dense the Haskell syntax gets though, so I'd aim to sit somewhere in between. I think Haskell syntax, with words instead of symbols, would work best. If you want, I could give some samples of syntax that might be fun to play around with. I think I'm going to get out Elements of Programming and look up a few good examples and try to create a syntax that is made explicitly to express those problems.

Thanks for providing the example in Rust. I think there's a possibility to introduce some concepts to make it more powerful and concise. I assume you'd prefer me to open an issue with a proposal of any specific features? Btw, in the paper I linked, multiple dispatch is more or less equal to type classes, so, as the paper shows, introducing a concept of delegation improves upon the code semantics. I've been playing around with the idea of delegation in Haskell and I think I know where to go with it. I'll move that talk to the other issue then.

@shelby3 commented Sep 18, 2016

@keean the type signatures of all exported interfaces must be explicitly declared and not inferred, else modularity (separate compilation) is impossible; modularity is critically needed when targeting JavaScript and its dynamic runtime module importing. This also makes the interaction of certain higher-order first-class type features decidable when moving away from the origin of the Lambda Cube, and it also enables implementing typeclasses in more than one way (which Haskell can't do due to its global inference).

It also enables decidability with first-class anonymous (structural) unions (disjunctions), which is a mandatory feature of the language, else I won't contribute. TypeScript, N4JS, and Ceylon (which emits JavaScript) have this feature. Scala 3 (Dotty) also has it (and there is a Scala.JS compiler for Scala 2). Ceylon doesn't support typeclasses, but Scala 2 (and presumably Scala 3) does. Inferred structural unions are useful for many paradigms and are necessary to support the JavaScript || logical operator, which doesn't always return a boolean.

Declaration of type signatures within functions (also methods) should be optional and inferred if not specified. Declaration of non-exported function type signatures could perhaps also be optional and inferred.

@shelby3 commented Sep 18, 2016

@keean to add to your opening comment of this issue thread, TypeScript's typing is intentionally unsound and PureScript lacks first-class anonymous structural unions. N4JS appears to have sound typing, but it also lacks typeclasses.

@keean (Owner) commented Sep 18, 2016

@SimonMeskens I am interested to see what the code looks like with delegates. I provided the sample in Rust because I can syntax-check and run the example; there is always the problem of unchecked errors when posting hypothetical code samples.

@keean (Owner) commented Sep 18, 2016

@shelby3 I agree all exported interfaces must have type signatures. For higher-order features I quite like first-class-polymorphism. I think a reasonable assumption is that everything inferred is monomorphic. If non-exported function signatures can be inferred, then a syntax with optional type annotations would seem better.

@keean (Owner) commented Sep 18, 2016

@shelby3 regarding first class unions, I think one of the key features is supporting the kind of programming you want to do. I do want to suggest an idea though, and see what the consequences are. The idea is to have no concrete types at all, so all types are expressed by type variables constrained by type classes. Consider the identity function using a cross between Haskell and TypeScript notation:

id(x : a) : a = x

'a' can be any type, so we could constrain it with a type-class using a Prolog-like notation:

id(x : a) : a :- Integer a = x

This is already the union of all types that implement Integer, but we can go further and have a type-level 'or' operator:

id(x : a) : a :- Integer a | String a = x

This has a nice synergy with JavaScript and duck-typing as we are saying that we don't care what type is passed, as long as it provides the correct interface. We can type check this nominally in TraitScript, but it reflects the underlying JavaScript that allows any object to be passed as long as it provides the right methods and properties.

The question is, does that give all the functionality that you get with union-types?

@keean (Owner) commented Sep 18, 2016

@shelby3 I find myself re-thinking in the morning and realising I don't agree with what I posted yesterday. An 'or' operator for traits does not make sense, as it says the object may have either method A or method B, whereas what we want to know inside a function is whether we can, for example, safely call method A.

I don't think there is any need for an 'or' operator, or first class unions outside of dynamic types (aka trait-objects). I think I need to revisit the original expression problem with type-classes.

Starting with a collection of trait-objects (existential types):

c : [a] :- Integer(a) = [new A(), new B()]

This requires all objects inserted into 'c' to implement the Integer trait. Now let's say we want to print this as well; we somehow need to inject the 'Show(a)' trait into 'c'. This is actually an 'and' operation, but it occurs in an 'open' way, somewhere else in the source code, something like:

show(c : Coll) {
    foreach(c, a => {
        print a;
    })
}

negate(c : Coll) {
    foreach(c, a => {
        a = -a
    })
}

This requires that anything calling 'show' is able to prove that all elements in collection 'c' provide show. This suggests to me that what is needed is some kind of constraint inference. We can easily infer the set of constraints on each element of 'c' as being (Show a, Negate a) or something like that. The question is whether 'c' is heterogeneous or homogeneous; there are two possible type signatures:

data Coll = Coll [a]
data Coll = forall a . (Show a, Negate a) => Coll [a]

In the former case 'show' and 'negate' would need to each have the constraints on '[a]' in their type signatures directly (although they can be inferred). One option would be to allow:

data Coll = forall a => Coll [a]

To automatically accumulate the inferred type-class constraints. But what do we do at module boundaries?
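
For concreteness, here is a minimal Rust sketch of the second ("closed") signature, using trait objects as the existential. The Show, Negate, and Elem traits, IntCell, and Coll are all hypothetical names; the point is that the constraints are fixed in the collection's type up front rather than accumulated openly from the call sites:

trait Show { fn show(&self) -> String; }
trait Negate { fn negate(&mut self); }

struct IntCell(i32);

impl Show for IntCell { fn show(&self) -> String { self.0.to_string() } }
impl Negate for IntCell { fn negate(&mut self) { self.0 = -self.0; } }

// A trait object can only name one non-auto trait, so the usual workaround
// for "Show and Negate" is a combined helper trait with a blanket impl.
trait Elem: Show + Negate {}
impl<T: Show + Negate> Elem for T {}

// The "closed" collection: the element type is existential and the set of
// constraints is fixed here, not inferred from the uses elsewhere.
struct Coll(Vec<Box<dyn Elem>>);

fn show_all(c: &Coll) {
    for a in &c.0 {
        println!("{}", a.show());
    }
}

fn negate_all(c: &mut Coll) {
    for a in &mut c.0 {
        a.negate();
    }
}

fn main() {
    let mut c = Coll(vec![Box::new(IntCell(1)), Box::new(IntCell(2))]);
    negate_all(&mut c);
    show_all(&c); // prints -1 then -2
}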

@shelby3 commented Sep 18, 2016

@keean I explained to N4JS's @yoshiba why Joose and Object Algebras do not solve Wadler's Expression Problem and are not as orthogonally extensible as typeclasses.

@shelby3 commented Sep 18, 2016

@keean wrote:

@SimonMeskens wrote:

What will your objects look like? Have you studied Self? I highly recommend giving this short paper a read:
http://sauerbraten.org/lee/ecoop.pdf

With objects there is a problem with dispatch, in that "a.f(b, c, d)" gives special treatment to the receiving object 'a' and only dispatches on 'this'. With type-classes we start with a language that has no overloading; if we want to allow overloading we use a type-class, and a type-class can dispatch on multiple type parameters.

Also remember we discussed in May that typeclasses (as opposed to virtualized typeclasses, i.e. Rust's trait objects) can be monomorphized at the function-application use-site, because the data type is not lost (as it is in subclassing, where it is assigned to a supertype). This avoids the virtual-method jump-table lookup, which otherwise stalls the CPU pipeline and can reduce performance by as much as 5 to 6 times.
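
To illustrate the difference concretely, here is a small Rust sketch (the Speak trait and Dog type are hypothetical) showing the same trait consumed both ways; only the trait-object version pays for the indirect call through a vtable:

trait Speak { fn speak(&self) -> &'static str; }

struct Dog;
impl Speak for Dog { fn speak(&self) -> &'static str { "woof" } }

// Static dispatch: monomorphised per concrete type, so the call site
// compiles to a direct (inlinable) function call.
fn greet_static<T: Speak>(x: &T) -> &'static str { x.speak() }

// Dynamic dispatch: the trait object carries a vtable pointer and the
// call goes through an indirect jump resolved at runtime.
fn greet_dynamic(x: &dyn Speak) -> &'static str { x.speak() }

fn main() {
    let d = Dog;
    println!("{}", greet_static(&d));
    println!("{}", greet_dynamic(&d));
}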

@keean (Owner) commented Sep 18, 2016

@shelby3 I have created a second issue here #2 to discuss union types.

I am fully aware of monomorphisation, which is when the type is statically determined. There is also runtime polymorphism, which is what Haskell uses existential types for and Rust uses trait objects for. I prefer the existential approach for the type system, as it is actually sound as a logic.

@keean (Owner) commented Sep 18, 2016

Also note that the multi-method dispatch version of the Sharks code makes some other changes, introducing new types for HealthyShark and DyingShark. I will update the Rust code to reflect this way of implementing it.

@shelby3 commented Sep 18, 2016

@keean wrote:

JS always passes by value (objects are values which are references to the object).

ASM.js passes by value unboxed. For functions operating only on numbers, afaics we could support this seamlessly. Emulating other types (e.g. strings) in ArrayBuffers would require more complexity to compile and interop, and might require more C-like syntax. We could keep in mind these sorts of optimizations and whether we want to, and reasonably could, provide them without forcing the programmer to resort to an FFI.

No need to discuss this ASM.js tangent right now.

@SimonMeskens commented:

I think you can write delegation as a monadic structure, which might be the real ticket. I also look forward to your new implementation. I think it's a cool example as it covers such a broad range of language capabilities.

@shelby3 commented Sep 18, 2016

@SimonMeskens wrote:

Thanks for providing the example in Rust. I think there's a possibility to introduce some concepts to make it more powerful and concise. I assume you'd prefer me to open an issue with a proposal of any specific features? Btw, in the paper I linked, multiple dispatch is more or less equal to type classes, so, as the paper shows,

The huge salient difference is that multi-methods dynamically dispatch to the implementation of the subclasses of Animal at run-time, thus stalling the CPU pipeline and adversely impacting performance.

Whereas the Rust type-class example shows that type-classes can be resolved monomorphically at compile-time, so there is no jump-table lookup.

Edit: another huge difference is that type-classes don't bind multiple interfaces into the class inheritance hierarchy and instead keep interfaces and classes orthogonal.

introducing a concept of delegation improves upon the code semantics. I've been playing around with the idea of delegation in Haskell and I think I know where to go with it. I'll move that talk to the other issue then.

I think you can write delegation as a monadic structure, which might be the real ticket. I also look forward to your new implementation. I think it's a cool example as it covers such a broad range of language capabilities.

@keean wrote:

Also note that the multi-method dispatch version of the Sharks code makes some other changes, introducing new types for HealthyShark and DyingShark. I will update the Rust code to reflect this way of implementing it.

Changing the interface based on the state of the data type conflates the instance of the data type with the interface. This makes orthogonal extension (for example, with collections of instances) inefficient and boilerplate-prone.

I don't think this is a paradigm that should be used.

P.S. if we do our job exceedingly well (as in demonstrate that type-classes and sound typing kick ass on subclassing), there is a remote possibility we could be designing the future replacement for ECMAScript. So performance should be high on our list of priorities, because performance is very difficult to achieve in a dynamically typed language, and the virtual inheritance of subclassing ruins performance. Google is looking for inspiration for their SoundScript concept of a soundly typed JS-like language in the browser. The guy leading that effort (Andreas Rossberg) is familiar with Scala, ML, etc. Andreas commented that N4JS is "very interesting".

But I don't know if fast and incremental compilation will be achieved in the language we are contemplating. We can think about this in our design decisions if we want to.

@keean (Owner) commented Sep 18, 2016

SoundScript is described here: http://www.2ality.com/2015/02/soundscript.html, which sounds like JavaScript plus enforced classical objects and inheritance, and simple function typing. There's nothing about generics or code generation.

@SimonMeskens commented:

SoundScript's goal is to actually be somewhat compatible with TypeScript; I assume Google intends to make it so we don't have to compile TypeScript for Chrome anymore.

@shelby3 commented Sep 18, 2016

@keean wrote:

which sounds like JavaScript plus enforced classical objects and inheritance

Until they figure out that virtual inheritance stalls the CPU pipeline and type-classes don't. I think we have a very tiny (almost nil) chance to influence the future. Anyway, we should design for performance assuming one day our language will have its own VM rather than just being compiled to JS, if that doesn't incur any huge trade-offs in our design constraints. Also we can think about optimizing some parts with ASM.js. I am not sure how this thought process will play out. I am trying to keep my mind open and see where the design constraints lead us. I agree the priority is on having a simple, clean language without proliferation of duplicated paradigms.

Edit: battery life (i.e. performance) on mobile is important. I could see our language potentially becoming very popular for mobile apps.

There's nothing about generics or code generation.

In the link I provided, they mentioned TypeScript as a possible starting point, but aiming to make it sound. I think they will discover (or maybe already have discovered) that retaining interoperability with TypeScript code while also making TypeScript sound is probably not practical. Of course I could be wrong about this.

@shelby3 commented Sep 18, 2016

From now on, in all discussion where I mean to use type-classes, I will use the word trait to indicate the interface of the type-class, since that is the name Rust chose. We might choose a different keyword or not, yet until then I will use the term trait. Note this is not the same thing as trait objects.

@keean wrote:

I agree all exported interfaces must have type signatures. For higher-order features I quite like first-class-polymorphism.

So I assume by "first-class-polymorphism", you mean we can declare a (function, method, or class) type parameter constrained to a trait (or union of traits) bound, and then I can call functions and methods without knowing the actual type of the data which implements the trait(s)?

I presume by "first-class-polymorphism", you don't mean higher-ranked types.

I think a reasonable assumption is that everything inferred is monomorphic.

Unpacking this, I presume you mean that inference will never subsume to a subclassing subtype (with virtual inheritance) nor to a virtualized trait object (which also incurs dynamic dispatch and stalls the CPU pipeline)? And this is why we need anonymous structural union types, e.g.:

if (x) true
else 3

The inferred type is Boolean | Number.

I am presuming we will choose an "everything is an expression" grammar. Note that the following expression (not the return type of the containing function) has a type of undefined (or the bottom type):

if (true) return 1

If non-exported function signatures can be inferred, then a syntax with optional type annotations would seem better.

Agreed. Thus we need the suffixed : type and not the C/Java prefixed type declaration.

@shelby3 commented Sep 18, 2016

@SimonMeskens wrote:

I think that if the goal of the language is to provide a very simple, but very expressive language, you should consider going for a very simple, but expressive syntax. In that case, I think I like the Haskell syntax more than the Rust syntax. I've never been a fan of how dense the Haskell syntax gets though, so I'd aim to sit somewhere in between. I think Haskell syntax, with words instead of symbols, would work best.

I think I'd prefer to stay as close to JavaScript as possible in expression syntax while everything becomes an expression (optionally a statement as well). The point is to produce a language that is very easy for everyone to learn.

However, I hate noisy code (unless necessary to provide reasonable cognitive load), so I am open to adopting some function level grammar ideas from Haskell, such as:

  • mandatory block indenting instead of curly braces (Python is very popular)
  • function name overloading with default arguments (multiple versions of a function with same name)
  • everything is an expression (but some can also be used as statements with side-effects)
  • pattern matching with switch or match within function, but not on function arguments
  • guards within function, but not on function arguments
  • semicolons not allowed (don't allow cramming multiple statements onto one line, as it is not readable)

I prefer consistency of coding style because we are in the open source reality now, where code should be readable the same to everyone. So I would prefer to enforce a chosen style.

I realize we might not agree on all of these. Let's discuss. But syntax may not be the most important discussion to have now and first.

@SimonMeskens commented:

I agree with all of those list items 👍

@shelby3 commented Sep 18, 2016

@SimonMeskens cool we finally agree on something 😍

@SimonMeskens commented:

Yeah, sorry if I acted like an ass on the TypeScript place, not sure why I get defensive there, I'm a pretty nice guy usually. I also don't have personal beef with you, life is far too short for that 😄

@keean (Owner) commented Sep 18, 2016

I agree with most of that, with some restrictions.

  • function name overloading: I would want all overloads to have the same type, so the overloads are on data constructors.
  • as an imperative language with mutation, I think statements make more sense, but in general everything that makes sense should be an expression too.
  • function overloading on constructors implies pattern matching in function arguments, so I think these two are mutually exclusive. You cannot have both. I would rather have pattern matching and overloading in arguments as it leads to less indentation.
  • I quite like function guards on arguments.

A couple of neat examples:

Pattern matching in arguments:

append x Nil = x
append x (Cons h t) = Cons h (append x t)

Guards on arguments:

find_zero Nil = false
find_zero (Cons h t) | h == 0 = true
find_zero (Cons h t) | h /= 0 = find_zero t
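
For comparison, here is roughly what those two functions look like when the pattern match (with guards) lives inside the function body rather than on the arguments, which is the alternative being weighed above. This is a Rust sketch with a hypothetical List type:

enum List {
    Nil,
    Cons(i32, Box<List>),
}

// append x Nil = x ; append x (Cons h t) = Cons h (append x t)
fn append(x: List, ys: &List) -> List {
    match ys {
        List::Nil => x,
        List::Cons(h, t) => List::Cons(*h, Box::new(append(x, t))),
    }
}

// The guards become `if` clauses on the match arms.
fn find_zero(xs: &List) -> bool {
    match xs {
        List::Nil => false,
        List::Cons(h, _) if *h == 0 => true,
        List::Cons(_, t) => find_zero(t),
    }
}

fn main() {
    let xs = List::Cons(1, Box::new(List::Cons(0, Box::new(List::Nil))));
    println!("{}", find_zero(&xs)); // true
    let _ys = append(List::Nil, &xs);
}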

@shelby3 commented Mar 7, 2017

@keean I was re-reading your idea to unify the syntax for data, trait (interface or typeclass), and module.

Realizing that a typeclass is just an interface that is either monomorphized or an implicitly passed object instance of the interface caused me to see that there is nothing that special about typeclasses, nor about an instance of a class object which implements an interface.

Sum types are a bit weird for an interface or typeclass though. It means the type has to be tagged and the code that accesses the interface has to pattern match on each of the sum cases. But at least it unifies and perhaps there is even some utility to sum type interfaces.

I don't see how they unify with module, unless access restrictions become part of interfaces. Need to think more about that.

Thus data is not the correct unification keyword since typeclass interfaces can be monomorphized (i.e. no data object is ever created).

I don't like struct either because it has a meaning in C which I would like to reserve for a binary packed data structure. I don't think most mainstream programmers (at least not autodidact programmers who didn't study computer science at a university) are aware of the use of struct, for example, in ML.

I like interface or trait. My complaint against trait in the past has been that Scala conflates them with mixins containing variant implementation. ECMAScript has mixins now.

And for the typeclass instances of interface or trait, I like variant or reified better than instance because the latter implies to me an object.

@keean (Owner) commented Mar 7, 2017

It's not so much a sum type, as a record type. A record is like a JavaScript object, it's a collection of labelled properties. You can also think of a record as a "labelled product type".

You are correct, a type-class is really just a mechanism for resolving interfaces implicitly. Optional implicit parameters (which Scala has) provide a way of unifying runtime interfaces (passed by value) and monomorphisable interfaces (passed implicitly by type).

If we allow polymorphic recursion then type-classes can be non-monomorphisable, and we have to pass the correct instance at runtime, depending on the type at runtime. This is something I would rather not do, and I would suggest passing it as an argument instead.

Regarding modules, they are no different from records/objects, except that some languages extend them with additional features. Our records would already be parametrically polymorphic, which allows them to be instantiated with type parameters. If our language supports no other module features, then modules and records are already unified. You can see this in JavaScript where function closures returning an object give you simple modules. In this case, as module implementations can be passed as first-class values, we avoid the need for a separate module type system like ML has (and we don't want subclassing of modules).

The only other module features we probably want are associated types (or type families), but we want type-classes to have these too, so you could say a record is a module with no associated types.

There is one thing to be careful of here, and that is that passing modules by value can lead to types depending on values (dependent types). I think it is a bad idea to have just a little bit of type dependency; other languages explore dependent types, but here they would add unexpected complexity. However I think we can use union types here, to avoid dependent types and also avoid weird restrictions on passing modules that would confuse people.

If we consider a module with an associated type, say some container that contains only Ints or a different container that contains only Strings, when we pass a container like this and access the values they will be of type (Int /\ String). This is different from the case with parametric polymorphism because the type in the parametric case is in the signature and we can monomorphise the called function into two versions, one which handles strings and one which handles ints, and know which one to call at compile time.

So why not use parametric polymorphism everywhere? Well, type parameters are 'input' types and therefore need to be specified or inferred. Associated types are 'output' type parameters, and simply having a specific value of a module determines those types. Further, if the output types of a module are functionally dependent on the input types, we can monomorphise the output types wherever we can monomorphise the input types.
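
Here is a minimal Rust sketch of the "output type" point (the Container trait and the Ints and Strings types are hypothetical names): the element type is not supplied by the caller, it is determined by the concrete container value, yet the call still monomorphises wherever that container's type is statically known:

trait Container {
    // Associated ("output") type: fixed by the implementation, not the caller.
    type Item;
    fn get(&self, i: usize) -> Option<&Self::Item>;
}

struct Ints(Vec<i32>);
struct Strings(Vec<String>);

impl Container for Ints {
    type Item = i32;
    fn get(&self, i: usize) -> Option<&i32> { self.0.get(i) }
}

impl Container for Strings {
    type Item = String;
    fn get(&self, i: usize) -> Option<&String> { self.0.get(i) }
}

// The caller never names the element type; it is a function of C, so this
// still monomorphises into an Ints version and a Strings version.
fn first<C: Container>(c: &C) -> Option<&C::Item> {
    c.get(0)
}

fn main() {
    println!("{:?}", first(&Ints(vec![1, 2, 3])));
    println!("{:?}", first(&Strings(vec!["a".into()])));
}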

@shelby3 commented Mar 7, 2017

It's not so much a sum type, as a record type. A record is like a JavaScript object, it's a collection of labelled properties. You can also think of a record as a "labelled product type".

I know what a record type is, you don't need to waste the verbiage (perhaps you remember me writing numerous times in our discussions about record types). You didn't understand my point. I meant what I wrote. I meant a sum type. I presume you remember what a sum type is. If we unify data and interface/trait, then the sum types of data are unified and exist in the latter also. Please re-read my prior post in that light.

@shelby3 commented Mar 7, 2017

Optional implicit parameters (which Scala has) provide a way of unifying runtime interfaces (passed by value) and monomorphisable interfaces (passed implicitly by type).

I like that summary (in the context of what I had written). We should make sure we frame that and use it in our documentation.

@shelby3 commented Mar 7, 2017

If we allow polymorphic recursion then type-classes can be non-monomorphisable, and we have to pass the correct instance at runtime, depending on the type at runtime. This is something I would rather not do, and I would suggest passing it as an argument instead.

I am not grokking this (I don't know the meaning of the term polymorphic recursion, although I can guess), and I especially don't know what you mean in this context by "passing it as an argument", so I have insufficient foundation from which to easily deduce the meaning of the rest.

I do remember we had discussed higher-ranked typing and the issues of functions which require typeclasses being the argument of another function. Is that what you are referring to? I remember you wanted to force us to create a new record instance for each case and have to do boilerplate to integrate two instances which are otherwise equivalent other than their differing record name. I remember that was a complexity aspect that started to make me want to not attempt to add typeclasses at the first iteration of the language. Too much complexity needs to be carefully considered first.

@shelby3 commented Mar 8, 2017

You can see this in JavaScript where function closures returning an object give you simple modules.

You are saying that the local data and/or function(s) within a function can be returned wrapped in the publicly inaccessible closure of the returned public function, which serves as the means for creating private access restriction on the said local data and/or function(s). The said returned function is a constructor of an instance of the module.

But afaics by not mentioning a key distinction, you are implicitly conflating passing around type signatures with passing around values, i.e. instances of the data and/or function(s) in those type signatures. I think what you mean is that the public interface of the module is declared outside said constructor function. And the returned object of the constructor function conforms to the public interface type signature of the module. But then what enforces that only that constructor function is allowed to construct instances?

In this case, as module implementations can be passed as first-class values, we avoid the need for a separate module type system like ML has (and we don't want subclassing of modules).

You are saying we can pass around constructor functions as first-class values, so the access restrictions of the module system are a unified feature of first-class functions. But that seems to be incomplete as stated above.

@shelby3 commented Mar 8, 2017

The only other module features we probably want are associated types (or type families), but we want type-classes to have these too, so you could say a record is a module with no associated types.

There is one thing to be careful of here, and that is that passing modules by value can lead to types depending on values (dependent types). I think it is a bad idea to have just a little bit of type dependency; other languages explore dependent types, but here they would add unexpected complexity. However I think we can use union types here, to avoid dependent types and also avoid weird restrictions on passing modules that would confuse people.

If we consider a module with an associated type, say some container that contains only Ints or a different container that contains only Strings, when we pass a container like this and access the values they will be of type (Int /\ String). This is different from the case with parametric polymorphism because the type in the parametric case is in the signature and we can monomorphise the called function into two versions, one which handles strings and one which handles ints, and know which one to call at compile time. So why not use parametric polymorphism everywhere? Well type parameters are 'input' types and therefore need to be specified or inferred. Associated types are 'output' type parameters, and simply having a specific value of a module determines those types. Further if the output types of a module are functionally dependent on the input types, we can monomorphise the output types wherever we can monomorphise the input types.

I'm not grasping what the huge distinction or benefit of associated types is supposed to be.

I can conceptually (not actually in Rust) write this example as follows employing type parameters of the type constructor instead of associated type members:

// Without using associated types
fn difference<C>(container: &C) -> i32 where
    C: Contains<_, _> { ... }

And this one:

fn distance<N, G: Graph<N, _>>(graph: &G, start: &N, end: &N) -> u32 { ... }

And remember my proposal is that, except where required to prevent ambiguity, those will be written as follows, because all-uppercase names would be reserved for type parameters (a proposal about which it seemed you were not entirely enthused, or perhaps accepted begrudgingly):

fn difference(container: &C) -> i32 where C: Contains<_, _> { ... }
fn distance(graph: &G, start: &N, end: &N) -> u32 where G: Graph<N, _> { ... }
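
For reference, a compilable Rust rendering of the difference example with plain type parameters (the Contains trait and Pair type here are hypothetical, loosely following the Rust-by-example example being referenced): since _ is not accepted in item signatures, A and B have to be named explicitly, which is the "blowup of explicitly unknown parameters" discussed further down:

trait Contains<A, B> {
    fn contains(&self, a: &A, b: &B) -> bool;
    fn first(&self) -> i32;
    fn last(&self) -> i32;
}

struct Pair(i32, i32);

impl Contains<i32, i32> for Pair {
    fn contains(&self, a: &i32, b: &i32) -> bool { self.0 == *a && self.1 == *b }
    fn first(&self) -> i32 { self.0 }
    fn last(&self) -> i32 { self.1 }
}

// A and B leak into the signature even though the body never uses them.
fn difference<A, B, C: Contains<A, B>>(container: &C) -> i32 {
    container.last() - container.first()
}

fn main() {
    println!("{}", difference(&Pair(1, 4))); // 3
}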

One claimed benefit is not having to repeat yourself, and that the use-site provides local reasoning about the name of the type variable (instead of positional reasoning w.r.t. the definition site):

interface List { type T }
class Cons implements List
let x = new Cons{ T = Int }(head, tail)

instead of:

interface List<T>
class Cons<T> implements List<T>
let x = new Cons<Int>(head, tail)

Whether we express existentials (i.e. the type exists but “I don’t care” what the type is) with _ for type parameters or as unspecified associated type members, either way we have thrown away static knowledge of what the type is and therefore we can’t specialize on the unknown type. If we want to specialize without restricting to a specific type, then we have to restrict the (least upper and greatest lower) bounds of the type. That is expressed with subtyping. Conjunctions and disjunctions are one way of expressing a subtyping (i.e. subset) relationship.

Note there are some issues of decidability of type safety that I don't completely grok yet:

https://wiki.haskell.org/GHC/Type_families#Decidability
http://lampwww.epfl.ch/~amin/dot/fool.pdf#page=8 (c.f. §4. Counterexamples to Preservation)

Note regarding the EPFL paper, afaik the type-safety was somehow proved for a later iteration of the DOT calculus.

@shelby3 commented Mar 8, 2017

Scala Tour says:

Furthermore, there are cases where it is not possible to replace abstract types with type parameters.

Do you know of an example? The only justification I've found is to avoid an exponential blowup of _ explicitly unknown parameters.

Apparently the converse examples are variance annotations (on type parameters for subclassing) and path-dependent type-checking soundness/decidability, but our language won't have those features anyway.

Also this as a converse example:

there is also the problem of accidental name conflicts between abstract type names that emulate type parameters

@shelby3 commented Mar 14, 2017

Paul Graham wrote:

Most data structures exist because of speed. For example, many languages today have both strings and lists. Semantically, strings are more or less a subset of lists in which the elements are characters. So why do you need a separate data type? You don't, really. Strings only exist for efficiency. But it's lame to clutter up the semantics of the language with hacks to make programs run faster. Having strings in a language seems to be a case of premature optimization.

Strings are unboxed, store-by-value Array<Char>, not List<Char>, because they provide a random-access interface.

Might be useful to unify the interface of String with Array<Char>?
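
As a toy sketch of that unification (the CharSeq trait and last_char function are hypothetical), a typeclass-style interface can sit over both representations so generic code does not care which one it gets:

trait CharSeq {
    fn len(&self) -> usize;
    fn char_at(&self, i: usize) -> Option<char>;
}

impl CharSeq for String {
    fn len(&self) -> usize { self.chars().count() }
    fn char_at(&self, i: usize) -> Option<char> { self.chars().nth(i) }
}

impl CharSeq for Vec<char> {
    fn len(&self) -> usize { (*self).len() }  // calls the inherent Vec::len
    fn char_at(&self, i: usize) -> Option<char> { self.get(i).copied() }
}

// Generic over the representation: works for String and Vec<char> alike.
fn last_char<S: CharSeq>(s: &S) -> Option<char> {
    if s.len() == 0 { None } else { s.char_at(s.len() - 1) }
}

fn main() {
    println!("{:?}", last_char(&String::from("abc")));
    println!("{:?}", last_char(&vec!['a', 'b', 'c']));
}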

The following is the point I've been trying to make to you:

If we think of the core of a language as a set of axioms, surely it's gross to have additional axioms that add no expressive power, simply for the sake of efficiency. Efficiency is important, but I don't think that's the right way to get it.

The right way to solve that problem, I think, is to separate the meaning of a program from the implementation details.

Since speed doesn't matter in most of a program, you won't ordinarily need to bother with this sort of micromanagement. This will be more and more true as computers get faster.

Saying less about implementation should also make programs more flexible.

What programmers in a hundred years will be looking for, most of all, is a language where you can throw together an unbelievably inefficient version 1 of a program with the least possible effort. At least, that's how we'd describe it in present-day terms. What they'll say is that they want a language that's easy to program in.

Inefficient software isn't gross. What's gross is a language that makes programmers do needless work. Wasting programmer time is the true inefficiency, not wasting machine time. This will become ever more clear as computers get faster.

@shelby3 commented Mar 15, 2017

Thus data is not the correct unification keyword since typeclass interfaces can be monomorphized (i.e. no data object is ever created).

I don't like struct either because it has a meaning in C which I would like to reserve for a binary packed data structure. I don't think most mainstream programmers (at least not autodidact programmers who didn't study computer science at a university) are aware of the use of struct, for example, in ML.

I like interface or trait. My complaint against trait in the past has been that Scala conflates them with mixins containing variant implementation. ECMAScript has mixins now.

And for the typeclass instances of interface or trait, I like variant or reified better than instance because the latter implies to me an object.

I had some conceptual lapses above.

We are defining the signature of an optionally parametric polymorphic type which defines a sum and/or product type constructor. The type parameters (the operands of the sum and/or product type) may or may not be bound to the parent type.

We can instantiate that constructor supplying some or all of the operands and/or type parameters. Our instance can also define another said constructor. Repeat recursively.

Instantiations contain data but they are not an object that can be accessed until all operands and bound type parameters have been provided.

A typeclass is any such parent type which has bound type parameters. A typeclass instance implements a typeclass where all operands and bound type parameters have been provided.

Thus a typeclass instance is really an object that has an interface with bound type parameters.

Thus we are describing a mixin type and Scala's linearization isn't necessary because there is no subclassing relationship, only composition.

Our only subtyping relationship is the subset relationship created by conjunctions and disjunctions.

Thus I propose trait is the best keyword to describe our unified type, and object for accessible instantiations. This copies Scala's keywords but without the subclassing semantics.

Note I don't think we should have a block-indented shorthand syntax for sum types. For those, I think we should declare a base type, then implement that base type for each of the types in the sum. This makes the syntax much easier to express with indenting instead of brace-delimiting.

trait Option<T> = None | Some(T)
trait BloatedSum<T>
   ...
trait BloatNone extends BloatedSum<Nothing> // or trait BloatNone extends BloatedSum
trait BloatSome<T>(T) extends BloatedSum<T> // instead of trait BloatSome<T> extends BloatedSum<T> = BloatSome(T)

@keean (Owner) commented Mar 15, 2017

Another thing to consider is that interfaces are types, and their implementations are values of that type.

Remember an object can have many interfaces, so we need a keyword to indicate that an object implements a trait:

trait T[A]
    id : [A] -> A

object O 

implement T[O]
   id _ = O()

@shelby3 commented Mar 15, 2017

Another thing to consider is that interfaces are types, and their implementations are values of that type.

That is what I wrote:

We can instantiate that constructor supplying some or all of the operands and/or type parameters. Our instance can also define another said constructor. Repeat recursively.


Remember an object can have many interfaces, so we need a keyword to indicate that an object implements a trait

Not only objects, even interfaces can implement other interfaces. I propose the keyword is implements for both.

trait Y implements X ...
object O implements Y ... 

But when the typeclass instances are implicitly provided, we don't need the object O prefix, just the implement Y ... part. We could assign the implement Y ... to a reference if we need to access it explicitly.

So we could consider instead without an object keyword:

trait Y extends X ...
implement Y ... 

I remember you had, I think, proposed a requires or where clause for the base interfaces and possibly other invariants, so maybe that is more general than extends:

trait Y where X ...

@shelby3 commented Mar 15, 2017

implement Y ... 

That syntax is only necessary when we need block indenting. Otherwise we write that as a constructor call, e.g. Y( ... ).

Thus I think I prefer for the block indenting to use object instead of implement.

@keean (Owner) commented Mar 15, 2017

trait Y implements X ...
object O implements Y ... 

This is not right: a trait is a type, and it never implements anything. It's more like it extends the type. Also, objects are independent of implementations, so you would declare the object and then declare the traits it implements. These are not interfaces bound to the object like Java interfaces. So I would write the above:

trait X extends Y

object O

implement X(O)

Note: because X is a trait, not a type, we may want to use different brackets for the parameters to make it clear it is not a type.

@shelby3 commented Mar 16, 2017

I think it is important to remember that we want to keep implementation of an interface (i.e. typeclass) orthogonal to instantiation of data, otherwise we end up with the early binding of subclassing. For example, in subclassing both a rectangle and a square would be subclasses of an interface which provides width and height, yet a square only has one dimension, not two. Any interfaces which need to operate on the dimensions of rectangle and square need to be customized for each, which is what typeclasses do. We will still have subtyping, such as when an interface extends another interface, in which case we can subsume to the supertype (which is why we need read-only references and, for supersuming, write-only references). Yet references will not have the type of an interface if we don't provide typeclass objects, so then the only case of subtyping will be the subset relationships created by conjunctions and disjunctions of data types.

However, data types could have partial implementation, e.g. if one of a plurality of type parameters is specified and perhaps some instance data hard-coded for that case.

Also typeclasses could be partially implemented (aka mixins).
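
A small Rust sketch of the rectangle/square point (the Area trait is a hypothetical name): each data type stores only the dimensions it actually has, and the interface is implemented separately for each, instead of forcing Square to inherit an unused width/height pair:

struct Rectangle { width: f64, height: f64 }
struct Square { side: f64 }

trait Area {
    fn area(&self) -> f64;
}

impl Area for Rectangle {
    fn area(&self) -> f64 { self.width * self.height }
}

impl Area for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

fn main() {
    println!("{}", Rectangle { width: 2.0, height: 3.0 }.area());
    println!("{}", Square { side: 2.0 }.area());
}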

@keean (Owner) commented Mar 16, 2017

I'm not sure about this. Type-classes are not types, so no subtyping is involved (they are constraints on types). If we allow typeclasses to be passed as first-class values, then we do have a form of subtyping, but it can be expressed using row polymorphism.

In effect, if we use some functions from an interface inside a function, the argument passed to the function would be expressed as:

f({function1 : Type1 | E}, ...) =>
   ...

You would not often write the above signature because it can be inferred, but what it says is that an interface is a record which at least includes function1 and some other stuff we represent with a type variable E.

Implementations of an interface always have to be complete (things must comply with the specification if they say that they do), but users do not have to use all of an interface, so you can pass extended interfaces in place of unextended ones.

@shelby3 commented Mar 16, 2017

but it can be expressed using row polymorphism

I prefer not to get into a discussion about the merits and drawbacks of row polymorphism at this time (it is not a priority issue for me right now). We did mention/discuss row polymorphism in the past here, here, here, here, here, here, here, here, and here (this last link contains a post from @keean that links to a research paper claiming improvements for OCaml's polymorphic variants).

The answers on the following are probably relevant:

http://stackoverflow.com/questions/15237598/why-doesnt-ocaml-support-record-subtyping

More:

http://cs.stackexchange.com/questions/53998/what-are-the-major-differences-between-row-polymorphism-and-subtyping

http://cs.stackexchange.com/questions/71282/difference-between-row-polymorpism-and-bounded-polymorpishm

Quildreen Motta wrote:

Notice that in the second case the values in the list don't have the same shape, but they still work with the predicate correctly. In this case, this is a constraint from Haskell's type system, which wouldn't be a problem if Haskell supported row polymorphic records (like PureScript, Elm, MLPolyR, etc).

@shelby3 commented Aug 30, 2017

@shelby3 commented Aug 4, 2018

I wrote on Quora:

Some of the suggestions are good and reasonable. But these two are overzealous.

Functions should not have more than say 3 arguments. 3 maybe is already too much.

This doesn’t weigh the tradeoffs in every situation. For example, a generic sort with all the necessary dependency injection can’t adhere to that myopic rule.

Even if you claim that sort is a procedure and not a function, I doubt Clean Code was making that distinction. In math, a function only ever has one domain. However, it may be a tuple of multiple values.

If you put comments in the code (except Javadoc, fixme's, todo's) you are doing something wrong.

It’s not always possible to keep all the overarching documentation consistent with changing code. If a comment is local to a specific facet which is too detailed or current code structure specific to be in an overarching documentation, then it should be placed in the local context.

Greg Kemnitz wrote:

Documentation outside the code is often worse than useless. High-level design documentation is good - and API specs, DB schema specs, network protocol specs, and file format specs must exist - but anything deeper than that should be code comments, not un-maintained documentation that lives as “write once, forget about” docs in an in-house wiki system. As often as not, newbies find these, read them, and think the existing code actually does what the wiki doc says, and discover it’s crap only after introducing bugs.

@shelby3 commented Aug 4, 2018

I think it was in this thread where we originally talked about the importance of static typing.

There are numerous comments on Quora from programmers who think unit tests are counter-productive (thus implicitly agreeing that static typing is necessary for large scale development). Some even argue that Test Driven Development (TDD) is counter-productive…

Ryan Cook wrote:

Most code projects could be improved by taking the entire unit test suite... and throwing it in the trash.

When should you delete your tests? It depends on what kind of code change causes your tests to break:

  1. Try to refactor some code. Don't change the behavior, just the underlying implementation. Did you break any tests? If so, delete your tests.

  2. Try to break a test by actually changing the behavior. Swap < with >, or == with !=, or comment out a line. Did a test break? If not, delete your tests.

If your tests break when they aren't supposed to, or fail to break when they should, then answer this question: what are your tests really telling you? They sure as hell aren't alerting you to regressions.

Tomas Novella wrote:

TDD sucks, so do unit tests in many many cases. Dick measuring contests for unit test coverage are just BS. Integration tests are what you need instead.

In my experience TDD is mostly useless. When your project is starting out, TDD won’t help you. You’ll rewrite code 100x times and changing tests for everything is just painful. Unit tests for documentation purposes are almost laughable.

Tamer Aly wrote:

I have two that I can think of:

Integration tests are the only tests that matter.

Unit tests, for the most part, are not useful to ensure that your program is working like you want it to. Unit tests are only useful for situations like this…

Greg Kemnitz wrote:

I dislike internal unit tests, especially if they are used as a substitute for proper integration testing. Too often, unit tests have a “let’s look for the car keys under the streetlight because it’s brighter there” quality to them. This is especially true if the software makes extensive use of databases or networks (or both), which are hard to simulate in unit-test worlds.

Hovik Melikyan wrote:

Unit tests are for those who are unsure what they are doing. In almost 30 years in this industry I have never written unit tests even once, however my bug rates have been pretty low compared to the average. Though to be fair I do write integration tests and find them necessary.

Imtiaz Mohammad wrote:

I would not waste my time writing unit tests. I would rather invest that time in getting my design and code reviewed. It is OK if something breaks rarely. I can fix it in a tiny fraction of time people spend on unit testing.

Vivek Nagarajan wrote:

I also believe unit tests are an idiocy. Any non-trivial function needs a test with a complexity as much as the function itself. Trivial tests catch trivial bugs. Non trivial tests cause bugs themselves. You cant replace careful coding with automation.

Nikhil Nair wrote:

Unit tests don’t answer my questions:

I’m not convinced that unit tests actually solve anything. They test such a small part of the application that to me it’s a question of “whole > sum of parts”.

Much prefer functional tests. At the end of my change, I want to know that the whole application works as the user expected, not just the functions.

@shelby3 commented Aug 22, 2018

This is an important neat idea to make diff algorithms aware of higher-level syntax:

https://begriffs.com/posts/2014-04-08-pilgrimage-report-structural-merging.html

I think I had also suggested this idea in the past.
