Allow significant indentation syntax #7083

Closed
wants to merge 1 commit into from


Contributor

@odersky odersky commented Aug 22, 2019

As an experimental feature, allow indentation to be treated as significant.

What is supported?

To get a feel for the code, see the example below. For precise rules, see the doc page.

The old syntax using braces is also supported. Indentation is not significant inside pairs of braces (or other forms of parentheses), so that gives a natural mode without requiring an explicit switch: once you use a { in a toplevel definition, indentation is off in that code.

Why explore this option?

When Scala was first invented, braces ruled. Almost all widely used notations were brace-based. That's why we decided to follow: there were so many other things to innovate, so we explicitly chose to have very conventional syntax.

By now, things are quite different:

  • The most widely taught language is now (or will be soon, in any case) Python, which is indentation based.
  • Other popular functional languages are also indentation based (e.g. Haskell, F#, Elm, Agda, Idris).
  • Documentation and configuration files have shifted from HTML and XML to markdown and yaml, which are both indentation based.

So by now indentation is very natural, even obvious, to developers. There's a chance that anything else will increasingly be considered "crufty".

This PR demonstrates that indentation based syntax is quite a nice fit for Scala. So I believe that before finalizing Scala 3 we should seriously consider this syntax as an exploration. We might conclude that indentation has hidden problems, or that its benefits are not enough to balance the cost of change. But to be able to conclude this with confidence we need a serious exploration of this option first.

This PR is intended to enable the exploration. Besides supporting optional significant indentation,
it also provides automatic rewrites from old to new syntax and back. This enables one to migrate
a source quickly and cleanly to indentation based syntax, work with it, and migrate it back
to braces when needed (e.g. before merging it with the master branch). Rewrites maintain formatting and (depending on coding style) it is possible to get back exactly the same source file after going to indentation based and back.

This PR is based on #7024. In fact indentation only works well with new-style control syntax. And rewrites have to be done in two steps. To go from current Scala code to new style indented code one has to invoke the compiler twice, with options

 dotc -rewrite -new-syntax
 dotc -rewrite -indent

To go the other way, it's also two steps:

 dotc -rewrite -noindent
 dotc -rewrite -old-syntax

While indentation-based syntax requires #7024, the reverse is not true. #7024 is in my mind a definite improvement even if adopted alone. So the two should be considered independently.

Example:

enum IndentWidth:
    case Run(ch: Char, n: Int)
    case Conc(l: IndentWidth, r: Run)

    def <= (that: IndentWidth): Boolean =
        this match
        case Run(ch1, n1) =>
            that match
            case Run(ch2, n2) => n1 <= n2 && (ch1 == ch2 || n1 == 0)
            case Conc(l, r)   => this <= l
        case Conc(l1, r1) =>
            that match
            case Conc(l2, r2) => l1 == l2 && r1 <= r2
            case _            => false

    def < (that: IndentWidth): Boolean = this <= that && !(that <= this)

    override def toString: String =
        this match
        case Run(ch, n) =>
            val kind = ch match
                case ' '  => "space"
                case '\t' => "tab"
                case _    => s"'$ch'-character"
            val suffix = if n == 1 then "" else "s"
            s"$n $kind$suffix"
        case Conc(l, r) =>
            s"$l, $r"

object IndentWidth:
    private inline val MaxCached = 40

    private val spaces = IArray.tabulate(MaxCached + 1):
        new Run(' ', _)
    private val tabs = IArray.tabulate(MaxCached + 1):
        new Run('\t', _)

    def Run(ch: Char, n: Int): Run =
        if n <= MaxCached && ch == ' ' then
            spaces(n)
        else if n <= MaxCached && ch == '\t' then
            tabs(n)
        else
            new Run(ch, n)

    val Zero = Run(' ', 0)

For comparison, here's the current layout of this code, using braces:

  enum IndentWidth {
    case Run(ch: Char, n: Int)
    case Conc(l: IndentWidth, r: Run)

    def <= (that: IndentWidth): Boolean =
      this match {
        case Run(ch1, n1) =>
          that match {
            case Run(ch2, n2) => n1 <= n2 && (ch1 == ch2 || n1 == 0)
            case Conc(l, r)   => this <= l
          }
        case Conc(l1, r1) =>
          that match {
            case Conc(l2, r2) => l1 == l2 && r1 <= r2
            case _            => false
          }
      }

    def < (that: IndentWidth): Boolean = this <= that && !(that <= this)

    override def toString: String = {
      this match {
        case Run(ch, n) =>
          val kind = ch match {
           case ' '  => "space"
           case '\t' => "tab"
           case _    => s"'$ch'-character"
          }
          val suffix = if (n == 1) "" else "s"
          s"$n $kind$suffix"
        case Conc(l, r) =>
          s"$l, $r"
      }
    }
  }
  object IndentWidth {
    private inline val MaxCached = 40
    private val spaces = Array.tabulate(MaxCached + 1) {
      new Run(' ', _)
    }
    private val tabs = Array.tabulate(MaxCached + 1) {
      new Run('\t', _)
    }

    def Run(ch: Char, n: Int): Run =
      if (n <= MaxCached && ch == ' ') 
        spaces(n)
      else if (n <= MaxCached && ch == '\t') 
        tabs(n)
      else 
        new Run(ch, n)

    val Zero = Run(' ', 0)
  }

One difference between the two styles as shown here is that the indentation-based version uses 4 spaces per indentation level whereas the brace-based version uses only 2. This is an arbitrary choice in each case. One could also use 2 spaces for the indentation-based version (Edit: a comment from me below shows the example reworked in this way).

Member

@dottybot dottybot left a comment

Hello, and thank you for opening this PR! 🎉

All contributors have signed the CLA, thank you! ❤️

Commit Messages

We want to keep history, but for that to actually be useful we have
some rules on how to format our commit messages (relevant xkcd).

Please stick to these guidelines for commit messages:

  1. Separate subject from body with a blank line
  2. When fixing an issue, start your commit message with Fix #<ISSUE-NBR>:
  3. Limit the subject line to 72 characters
  4. Capitalize the subject line
  5. Do not end the subject line with a period
  6. Use the imperative mood in the subject line ("Add" instead of "Added")
  7. Wrap the body at 80 characters
  8. Use the body to explain what and why vs. how

adapted from https://chris.beams.io/posts/git-commit

Have an awesome day! ☀️

@odersky
Contributor Author

odersky commented Aug 22, 2019

Part of this PR is a reformat of the dotc compiler code base which makes the rewrite steps outlined above invariant on this code. I.e. if one starts with this codebase, rewriting to indentation based and then rewriting back to old syntax will yield exactly the code one started with.

@bblfish

bblfish commented Aug 22, 2019

A long time ago, as a kid in the early 1980s, I came across someone using a Lisp machine. (These had colour that could be turned by 90 degrees). They used 3 chars to indent their code. I asked why. They told me 2 is too little and 4 is too much.

It actually makes sense. With 2 you can't see the indentation, and 4 is a little too much.

@jducoeur
Contributor

Same comment as with the previous PR: I personally find this harder to read and less pleasant. Indentation-based languages make me crazy, particularly in how tweaks to code can often lead to unintended semantic changes if you miss changing some indentation. It's fine for scripting, but terrible for serious long-lived code. I'd forbid it in codebases I control.

And honestly, it feels like chasing a fad to me. Yes, Python is indentation-based -- but JavaScript and its cousins, which are the other languages I mainly see as feeders these days (and which seem to be an easier entree into Scala, since JS teaches a measure of FP) are largely brace-based. And we still get a lot of people coming in from Java -- I believe that all of the people I've been training up in Scala recently come from a Java background. Python gets all the hype, but braces are still utterly common in many people's habits.

I'll also note: I've spent much of the past year reassuring folks all over the place that Scala 3 isn't going to be that different -- that routine application code will look and feel pretty close to existing code -- to calm nerves. When I talk to people in the community, I still see a lot of terror that we're going to screw this up and split the community a la Python: the majority of people I talk to (especially engineering managers) care far more that Scala 3 remains decently similar to Scala 2, and care much less about the new and cool stuff.

IMO, this invalidates that argument: the resulting code looks dramatically different across the board. I think that's going to generally decrease folks' confidence in upgrading to Scala 3, so I think it's a bad idea strategically...

@aappddeevv

aappddeevv commented Aug 22, 2019

I'd like to try this. I use python (and js, and a bunch of other things) a lot and recently converted a python project to dotty. Can this get into 0.18 to be released in Aug (I hope)?

Can it be applied file by file? I don't have time to convert existing files to this syntax in my codebase.

@DavidGregory084

@odersky I have some sympathy for this plan but I strongly suggest this be opt-in via a different file extension or something similarly extreme.

It is after all effectively a totally different syntax for Scala in the same way that ReasonML is a different syntax for OCaml.

@non

non commented Aug 22, 2019

Some initial thoughts:

  1. I'm surprised at how much I like the idea of this. 😅
  2. The new syntax around colons seems a bit odd. For example, : is used in some new places (after enum and object) but not after match. The use for a second parameter group (as in private val spaces) is very hard to visually parse.
  3. With this change I found myself noticing where existing syntax conflicts with the new. For example the => after case stands out against : and then used elsewhere.
  4. Keeping the indent at 2 spaces reads better, even with significant indentation.
  5. The lack of an end token corresponding to a starting if, while, etc. unnerved me, but that may just be due to familiarity with existing syntax (e.g. sh, Ruby, etc.).

All in all, I'm not sure if this is a good idea or not, but it's intriguing. One worry I have is that a compromise syntax that tries to mix elements of old and new is likely worse than either.

@aappddeevv

I like the idea of a different file extension.

@soronpo
Contributor

soronpo commented Aug 22, 2019

I'm looking at this PR via mobile. The braces-free code is horrendous.

@TheElectronWill
Member

  • The most widely taught language is now (or will be soon, in any case) Python, which is indentation based.
  • Other popular [...]

I feel like the big picture requires more statistics. Here are the results of StackOverflow's 2019 survey.
I've colored indentation-based/highly whitespace-sensitive languages in red.

[Chart: Most Popular Technologies]

The majority clearly tends to prefer braces. But we may want to look at more recent trends and prefer more loved languages to legacy ones. In that case, look at this graph:

[Chart: Most Loved Languages]

Notice how Rust is the most loved one! Surely, it's because of its amazing brace-based syntax, isn't it? Just like Python's indentation makes it the "most widely taught language". 😃

In the end, the popularity of Python doesn't seem like a compelling argument in favor of indentation-based syntax.

@eed3si9n
Member

I personally find the indent-based more difficult to read:

    def Run(ch: Char, n: Int): Run =
        if n <= MaxCached && ch == ' ' then
            spaces(n)
        else if n <= MaxCached && ch == '\t' then
            tabs(n)
        else
            new Run(ch, n)

    val Zero = Run(' ', 0)

In the above, val Zero = Run(' ', 0) is seemingly levitating mid-air, and without scanning the code up and down multiple times, I can't really be sure if that's part of some if-expression, match-expression, object, or (new) top-level.

Maybe keep then, to be like other ML-family languages, but I don't see a huge win with the whitespace stuff.

@aappddeevv

aappddeevv commented Aug 22, 2019

I thought python was funky with its space significant syntax but I'm Ok with it now. I know for me, I have to try it on something for a bit to see if it works. I get tired of ))) or })} or whatever spread-out across rows or crammed together. And with editors, I can't always see everything in one window (some windows are big and some are small) so different parts of the function are not always visible and brace highlighting is unhelpful. At that point, braces are pretty useless for delimiting things visually. I'm not a big code-folder.

Perhaps we can get rid of "then" per the above comment.

I have a lot of folds, maps, flatMaps, for and other things on lists that generate a bunch of braces, so I'm curious if it helps with these constructs.

My only concern on trying this out is that if editors don't support it, how can I really try it?

@Daenyth

Daenyth commented Aug 22, 2019

I'm on a team of mixed-experience-level programmers (both generally and with scala). Earlier this year I did an informal survey on development pain points in our codebase, and one very common response was "It's hard to figure out what things do at a glance". I'm concerned that moving to whitespace-sensitivity, even optionally, would be a detrimental impact for scala.

I think indentation makes sense for languages with much simpler syntax than scala. Python and markdown are great examples: they both are incredibly regular, and the syntax has very few variations. For scala, we get a lot of flexibility in how code looks because of things like infix syntax and implicits. I think that even now, keeping the "novelty budget" down will benefit the community more.

@Philippus
Member

Philippus commented Aug 22, 2019

I was wondering what the impact (negative or positive) would be for visually impaired programmers? Maybe we can ask someone who worked on SCP-016.

@CucumisSativus
Contributor

One of the key points in making dotty adoption possible is the possibility to migrate existing code to scala3 syntax. Will it be possible to migrate current code to this new syntax using tools like scalafix?
Will it be possible to have 2 versions of syntax in one project?

@nafg

nafg commented Aug 22, 2019

What problem is this solving?

@odersky
Contributor Author

odersky commented Aug 22, 2019

squashed to a single commit on top of #7024.

@gabro
Contributor

gabro commented Aug 22, 2019

Chiming in just to mention that this will effectively make the parser in Scalameta useless, and with it we’ll also lose Scalafmt and Scalafix.

I dislike the syntax, but I can get over it, but I’m mostly worried about:

  • rewriting a bunch of tools (scalafix, Scalafmt, and friends) -> this is the best case scenario
  • not having those tools at all because there are no resources to rewrite them -> worst case scenario

@odersky
Contributor Author

odersky commented Aug 22, 2019

@non

The new syntax around colons seems a bit odd. For example, : is used in some new places (after enum and object) but not after match. The use for a second parameter group (as in private val spaces) is very hard to visually parse.

Yes, maybe we can still find a way to simplify this. The current syntax is stated in a way that is purely lexical: <indent> tokens are inserted after certain keywords if the next following line is indented. It's similar (but simpler) to what Haskell does in that respect. This means we need a : after a class or object header like

class C:

since without it the header ends in an identifier, which is not one of the tokens after which <indent>s are inserted. We could avoid this by opening a channel of communication from the parser to the scanner. The parser could tell the scanner "I expect the body of a class or object here", which would then prompt the Scanner to treat indentation as significant. That would avoid the need to write colons, at the price of complicating the scanner/parser interface.
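
To make the lexical rule described above concrete, here is a minimal toy sketch. It is not the dotty scanner: the set of "opener" tokens, the object name IndentSketch, and the helper names are all assumptions chosen for illustration, and it works on whole lines rather than real tokens. It inserts an <indent> marker when a line ends in an opener and the next line is indented further, and an <outdent> marker when indentation drops back.

```scala
// Toy model only: this is NOT the dotty scanner.
// The opener set and all names below are assumptions made for illustration.
object IndentSketch {
  // Hypothetical set of line-final tokens after which <indent> may be inserted.
  val openers: Set[String] = Set("=", "=>", "then", "else", "match")

  // Indentation width of a line, counting leading spaces only.
  private def width(line: String): Int = line.takeWhile(_ == ' ').length

  // True if the line's last token is an opener or ends in a colon ("class C:").
  private def endsWithOpener(line: String): Boolean = {
    val last = line.trim.split("\\s+").last
    openers.contains(last) || last.endsWith(":")
  }

  // Returns the trimmed lines with <indent>/<outdent> markers interleaved.
  def annotate(source: String): List[String] = {
    val lines = source.split("\n").toList.filter(_.trim.nonEmpty)
    val out = scala.collection.mutable.ListBuffer.empty[String]
    var stack = List(0)        // enclosing indentation widths, innermost first
    var prevOpens = false      // did the previous line end in an opener?
    for (line <- lines) {
      val w = width(line)
      if (w > stack.head && prevOpens) {
        out += "<indent>"
        stack = w :: stack
      } else {
        while (w < stack.head) {
          out += "<outdent>"
          stack = stack.tail
        }
      }
      out += line.trim
      prevOpens = endsWithOpener(line)
    }
    // Close any regions still open at the end of the input.
    while (stack.tail.nonEmpty) {
      out += "<outdent>"
      stack = stack.tail
    }
    out.toList
  }

  def main(args: Array[String]): Unit = {
    annotate("class C:\n  def f =\n    1\n  val x = 2").foreach(println)
  }
}
```

In this toy model, a header written as "class C" without the trailing colon ends in an identifier, so the indented line that follows is treated as a plain continuation and no <indent> marker is produced. That mirrors the reason given above for requiring the colon under a purely lexical rule.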

That leaves : in function applications such as the tabulate examples above. These examples were intended just to demonstrate that this syntax is possible. You'd normally pass the argument in parentheses instead. The : has nevertheless its place for operators that are essentially user-defined syntax. As an example, say you have an optimize macro that can be passed a large code block. If your program was indentation-based, you would not want to write

optimize {
  ...
}

since that would mix indents and braces, and indents would be turned off inside the outer braces. You'd want to write

optimize:
  ...

instead. But maybe we can restrict suffix : to that use case only.

Keeping the indent at 2 spaces reads better, even with significant indentation.

We definitely need more experimentation to come up with a recommendation here.

The lack of an end token corresponding to a starting if, while, etc. unnerved me, but that may just be due to familiarity with existing syntax (e.g. sh, Ruby, etc.).

In fact, the proposal allows for an optional end marker.

@odersky
Contributor Author

odersky commented Aug 22, 2019

I think indentation makes sense for languages with much simpler syntax than scala. Python and markdown are great examples: they both are incredibly regular, and the syntax has very few variations. For scala, we get a lot of flexibility in how code looks because of things like infix syntax and implicits. I think that even now, keeping the "novelty budget" down will benefit the community more.

I think you are going to find that with the new control syntax and significant indentation, Scala syntax is just as simple as Python syntax, if not simpler. In fact a significant part of the complexity in current Scala syntax comes from the over-use of parentheses and braces (e.g. how parens and braces are used in for expressions).

@odersky
Contributor Author

odersky commented Aug 22, 2019

One of the key points in making dotty adoption possible is the possibility to migrate existing code to scala3 syntax. Will it be possible to migrate current code to this new syntax using tools like scalafix?
Will it be possible to have 2 versions of syntax in one project?

This is already possible, using just the dotty compiler with the -rewrite option. Not only that, but you can also rewrite back to the old code, and get the same code you started with. See #7089.

@odersky
Contributor Author

odersky commented Aug 22, 2019

One argument in favor of indentation that I forgot to mention is that I expect it to be easier to teach. Here's a strong argument along these lines from Chris Okasaki: http://okasaki.blogspot.com/2008/02/in-praise-of-mandatory-indentation-for.html

In fact, the plan is that, starting next month we will teach a full course of functional programming to EPFL students using Scala 3 with the new syntax. That's basically the first two MOOCs in the Scala specialization. After that we'll have hopefully more data that helps us decide what to do with the idea.

@chaotic3quilibrium

There is so much emotionally laden bike shedding about the most arbitrary of things like keywords and/or code layout. Because code is data, shouldn't how to name particular language keywords (consider the recent debate around "given") and then how to format/layout the code itself (maybe even one way for reading/scanning, and another for editing) be a user-specific configured decoder/encoder applied to said data? Won't TASTY help ease some of this by allowing an intelligent editor display to the user whatever hell keywords and/or format the particular user wants via their own customized configuration, and then TASTY (as a form of a DOM) is what is actually stored? If not, then what would be needed to get Scala source code into a standardized serialized format which could work with a user-specific configured decoder/encoder?

We spend so freakin much time separating interface from implementation in two entirely different paradigms (OOP and FP), why don't we do it here? Attempting to force agreement (like on "given") versus enabling intelligent personalized customization (use whatever you like given you take responsibility for retaining parsing coherence) seems like it would eliminate so many of these issues...which are like from like 5 decades ago.

Do you know what would be fun? To bring back the requirement for semicolons. And don't even get me started about code required to be file system based versus using far more modern database schema tools...

/rant

Bike-shed away lots of time that could be productively spent MANY other far more valuable ways.

@odersky
Contributor Author

odersky commented Aug 22, 2019

For better comparability, here's the IndentWidth example again, this time with the same indentation scheme as the brace based version. I have thrown in an end marker to show how that one works.

enum IndentWidth:
  case Run(ch: Char, n: Int)
  case Conc(l: IndentWidth, r: Run)

  def <= (that: IndentWidth): Boolean =
    this match
      case Run(ch1, n1) =>
        that match
          case Run(ch2, n2) => n1 <= n2 && (ch1 == ch2 || n1 == 0)
          case Conc(l, r)   => this <= l
      case Conc(l1, r1) =>
        that match
          case Conc(l2, r2) => l1 == l2 && r1 <= r2
          case _            => false

  def < (that: IndentWidth): Boolean = this <= that && !(that <= this)

  override def toString: String =
    this match
      case Run(ch, n) =>
        val kind = ch match
          case ' '  => "space"
          case '\t' => "tab"
          case _    => s"'$ch'-character"
        val suffix = if n == 1 then "" else "s"
        s"$n $kind$suffix"
      case Conc(l, r) =>
        s"$l, $r"

object IndentWidth:
  private inline val MaxCached = 40

  private val spaces = IArray.tabulate(MaxCached + 1):
    new Run(' ', _)
  private val tabs = IArray.tabulate(MaxCached + 1):
    new Run('\t', _)

  def Run(ch: Char, n: Int): Run =
    if n <= MaxCached && ch == ' ' then
      spaces(n)
    else if n <= MaxCached && ch == '\t' then
      tabs(n)
    else
      new Run(ch, n)
  end Run

  val Zero = Run(' ', 0)

Of course that's just one first example, and as such does not tell us much. We'll do a lot more experimentation before coming up with a recommendation.

@TheElectronWill
Member

TheElectronWill commented Aug 23, 2019

Thank you Martin for the interesting article about the benefits of mandatory indentation. I see the value of enforcing cleaner code and removing the clutter of braces (and begin/end statements).

Some remarks to improve the learning experience:
As a student helping other students to learn new languages, I believe that simplicity and consistency are keys. Beginners need to have a few rules that are easy to remember. The fewer arbitrary rules to learn by heart, the better.

In my opinion, Scala would be easier to learn if parameters were all written between parentheses, like this:

private val spaces = Array.tabulate(MaxCached + 1)(
  new Run(' ', _)
)
optimize (
  ...
)

  • In addition to being more consistent with "usual" parameters, it allows switching between one and multiple lines just by updating the code, without taking care of an extra : or {.
  • It would solve the readability problem caused by :. Anyone reading the code would know that the block of code in parentheses is a parameter, even a non-scala programmer.
  • Students wouldn't forget the colon like they forget the semicolon in other languages.
  • Proper indentation would be enforced by the compiler inside multiline parameters.

Following this idea, I think it would be better for colons not to introduce blocks. That is, keep them for typing only. If a character is needed e.g. for declaring a class, = would be more logical (thus easier to learn) since it is already used for methods and variables. So we'd have something like:

declaration ::= [scope] {modifier} declarationKind name {parameterList} "=" definition
declarationKind ::= "def" | "class" | ...
definition ::= expression | "\n" indent {expression} (outdent | eof)

@TheElectronWill
Member

As an exploration, using = could allow small enums to be written in one line:

enum Color = case Red, Green, Blue

Compare that to:

enum Color {
  case Red, Green, Blue
}

@sharno

sharno commented Aug 23, 2019

A strong argument against significant indentation is that it's worse for the blind/visually impaired or people who use screen readers to program. Whitespace isn't easy to distinguish with screen readers: most screen readers either read out every single space individually, which is frustrating, or ignore whitespace entirely.

Relevant experiences:

edit: I think tooling could solve this problem, with editors/IDEs that integrate somehow with screen readers. Also, I'm not sure how screen readers deal with nested braces, but as far as I have read online, that seems to be an easier thing for them to deal with. If someone can reach someone with hands-on experience, it might give better insight.

@mmynsted

This is good innovative thinking. Significant indentation results in cleaner code. However, even if it were made the only way to define the grammar, it would not be worth the adoption friction.

@nogurenn
Contributor

This is understandably very controversial, but if I get what sir Odersky is aiming for, it must be the fact that Python is recognizable almost everywhere now. Perhaps his current end goal is to write a language that is significantly more accessible to anyone for the sake of its future and its community. More accessible in terms of people who have less time to devote to studying it, and in terms of the growing mindset in various domains that there's always 1-2 languages you could go to. When data science started becoming a thing, the knee-jerk reaction of people who wanted to get into it was to check if Python could do it since it was widely known even by non-developers. When educators realized that Java isn't necessary to be taught for introductory computer science courses, they turned to Python--it was a lot easier, simpler, and more understandable to aspiring engineers. Hell, even business, humanities, and social sciences people know about Python, even if they have only heard of it ("that's like Excel too, right? where you can write formulas?"). Frankly speaking, Scala would just usually come up when talking about big data and tools like Apache Spark. Even my peers are surprised when I tell them I use Scala for web development ("I didn't know you can use it there???").

A language that could be taught (1) to the most clueless students and (2) by teachers who don't need very proficient Scala-fu to do so (even scala's type system is crazy even for someone who uses scala for work) is very, very useful and impactful.

If we will say, for example, that JavaScript is pretty much at the top in terms of language popularity in the annual Stack Overflow survey, and it's using the braces syntax, we would be forgetting two essential elements to its popularity: (1) web development is basically everywhere, and (2) web development is THE most accessible domain to specialize in as an IT professional. I mean, why should I enter the realm of embedded software development or system administration, when I could already freelance as a web developer to dozens of potential clients who need their websites made, even just via Wordpress or Shopify?

While all the arguments for and against this proposal are all highly technical in nature (and safe to say, biased as people who already learned and currently code in Scala), I think we should remember that the learning curve of Scala as a language is much, much steeper compared to most high-level languages. Personally, it's hard to see Scala growing as a community when outsiders and potential newcomers only chalk it up as an unnecessarily complex (not flexible) language. People think Python is very flexible (not complex) because the domains it can cover vary widely, and people can pick up the learning quickly. If this was a simple popularity contest, Scala wouldn't come close (we got a huge boost in, like, big data because Spark. Do we count the Play framework, or was it because of piggybacking on Java Play?).

I would very much rather read this proposal as one's way to try to reach a wider audience for the language. It just so happens that Python, given the domains it has reached (data science, web development, physics, business intelligence, protein-folding, machine learning, etc), is the most popular one among developers and non-developers alike. At the end of the day, if only a niche community of developers talk about Scala, then how should we expect Scala to expand to domains that non-developers dominate, but could use a hell lot of tech help?

With all that said, I'm asking myself if this indentation-based syntax is going to help the community in the long term, or if there are better ways. I'm always at a loss for words whenever I'm stuck in a Scala programming issue, because there never seems to be clearer documentation and noticeable community activity that could reassure me a bit that someone could help me out. This is me as someone who's only been using Scala for 1 year, self-taught. Using Silhouette for your Scala app's authentication? Good luck trying to implement that if you don't have a good grasp of Scala interfaces/classes. That took me a month to get it working, on my own. Did I ask for help online? Yes.

Personally, I don't want Scala to grow old as just another domain-specific language (most likely in big data). It's a well-designed language for general purpose computing, and I think we should make sure it remains so. To me, this proposal seems like a jab at that pain point both as a language and as a community moving forward. We should remember that people saying something is just plain ugly isn't an argument at all. Scala was written with the brace-style syntax ingrained in it, and if the indentation-style syntax would help us get closer to lowering the learning curve, and therefore increasing accessibility, then the questions we should really be asking are whether this helps us achieve those goals and, if the answer is yes to a considerable degree, how to redesign the Scala syntax in terms of this other style without compromising on the power and flexibility of the language. Those mind-numbing underscores and long chains of methods shouldn't leave. Neither should the complex type system with all those [A <: B]s and stuff.

From a guy who uses both Scala and Python professionally

@godenji

godenji commented Aug 27, 2019

the even broader point ... is what problem is this solving?

@nafg it may open the door for eliding case in pattern matching and other syntactic improvements that can use significant whitespace as a delimiter, something we can't do with brace-based syntax. This could potentially bring Scala closer syntactically to the MLs (Haskell, OCaml, F#, SML, ...), which would be wonderful; syntactically less is more.

Really depends on how far one goes with the experiment. If it's "just" the following, then I'd say the experiment doesn't go far enough, and isn't really worth pursuing.

Basically all significant indentation should do is give one more choice: have no braces around multi-statement blocks.

I don't particularly care one way or the other about braces vs. whitespace, but I do care a lot about repeating keywords all over the place that the compiler could infer for me, provided the syntax gave the necessary clues.

service.update(entity).map:
  Right(x)  => Ok(toJson(x))
  Left(err) => BadRequest(err)

service.update(entity).map {
  case Right(x)  => Ok(toJson(x))
  case Left(err) => BadRequest(err)
}

@nafg

nafg commented Aug 27, 2019

it may open the door for eliding case in pattern matching and other syntactic improvements that can use significant whitespace as a delimiter, something we can't do with brace-based syntax. This could potentially bring Scala closer syntactically to the MLs (Haskell, OCaml, F#, SML, ...), which would be wonderful; syntactically less is more.

@godenji what problem would be solved by making scala's syntax nicer (granting the premise for the moment)

There are lots of things that are nice but I think we need to prioritize solving problems unless there aren't any big ones (which is not the case).

I thought someone might say: the problem of Scala's limited popularity. But either way,

maybeProblem match {
  case None =>
    println("Wouldn't the energy be better spent on solving problems that exist or are imminent?")
  case Some(problem) =>
    getAllPotentialSolutionsFor(problem)
      .sortBy(s => (s.likelihoodToActuallySolve, -s.effortNeeded))
      .foreach(s => println(s"Are we trying $s?"))
}

@Ichoran

Ichoran commented Aug 28, 2019

I am skeptical that this would be an improvement. It might be better for early learning in a protected environment, but it will be worse for learning to mastery because you'll have to train yourself to recognize two visual patterns for a bunch of common stuff (and they're not that visually distinct, which will make the task harder).

I also don't think that Python's popularity is terribly much affected by significant whitespace. The most loved language by a large margin is Rust, and it has braces all over the place. Kotlin's high and it's full of braces. TypeScript too. Furthermore, Python's popularity and ranking on most-loved-language has been growing year after year, and the whitespace thing hasn't changed a bit. I don't think it's the whitespace.

If it's not the whitespace, what is it in Python's case? Since the language hasn't changed much, I can only conclude that it must be mostly ecosystem. You can get things done with a minimum of fuss, mostly using libraries that are designed with that goal in mind.

Technically, I think the proposal introduces some syntactic ambiguity. For instance, this is currently valid Scala but would be ambiguous:

def foo(i: Int)(j: Int) = i + j
val partialAp = foo(5):
  Int => Int

I haven't thought of anything that would compile silently with altered behavior, but it does suggest that using : to open a block isn't a great idea.

@eed3si9n
Member

@nafg Take a look at the previous round of PR #2491.

As motivations it lists:

  • Cleaner typography
  • Regain control of vertical white space
  • Ease of learning
  • Less prone to errors
  • Easier to change
  • Better vertical alignment

and for impediments:

  • Cost of change
  • Provide visual clues where constructs end
  • Susceptibility to off-by-one indentation
  • Editor support

@godenji

This could potentially bring Scala closer syntactically to the MLs (Haskell, OCaml, F#, SML, ...), which would be wonderful; syntactically less is more.

#2491 does open with:

I was playing for a while now with ways to make Scala's syntax indentation-based. I always admired the neatness of Python syntax and also found that F# has benefited greatly from its optional indentation-based syntax, so much so that nobody seems to use the original syntax anymore.

Curiously, however, languages like OCaml, F#, and Haskell do not seem to use the offside rule for if expressions or for passing blocks and lambdas. Instead, what they do is use parentheses or curly braces.

let results = List.choose (fun elem ->
    match elem with
    | elem when isCapitalized elem -> Some(elem + "'s")
    | _ -> None) listWords

use begin..end

if GtkBase.Object.is_a obj cls then
  fun _ -> f obj
else begin
  eprintf "Glade-warning: %s expects a %s argument.\n" name cls;
  raise Not_found
end

or use let / where binding to name things

calcBmis :: (RealFloat a) => [(a, a)] -> [a]  
calcBmis xs = [bmi w h | (w, h) <- xs]  
    where bmi weight height = weight / height ^ 2  

There is an offside rule in Haskell, but it is triggered by the layout keywords let, where, of, and do. That might be because arbitrary expressions bunched together as a block indicate side effects, and the syntax is trying to nudge you toward a more expression-oriented style. So in this respect, I don't know if we are getting closer to the ML family.

Verbose Syntax lists a side-by-side comparison of lightweight vs verbose syntax for F#. In their case, it seems the lightweightness mostly comes from being able to omit the end markers like end and done, so the syntactical jump was relatively small.

@nafg

nafg commented Aug 28, 2019

@eed3si9n those are motivations and benefits; my question, however, is whether there is a problem that it is solving.

Because if not I wish we would prioritize solving problems over enhancements. And if yes, then we should frame it that way, because it helps to see where it fits into the bigger picture (cf. the 5 whys technique).

I can reframe some of the items on your list as problems; however, I'm not sure if it makes sense to:

Cleaner typography: Does the more noisy syntax cause any problems? If so what are they?
Ease of learning: Is learnability a problem? If so let's identify "difficulty to learn" as one of the problems we need to solve, and let's think about how this fits with other ways of solving that.
Less prone to errors: So prone-ness to syntax errors? Is it in fact a problem with scala in practice?
Easier to change: Is difficulty of changing code (at the syntax level) a problem people face?
Better vertical alignment: Do people find scanning code difficult?
Regain control of vertical white space: [not sure what this means]

Of course there are a lot of premises in this that are debatable, but that is not my objective. What I'm trying to do is to get the "pro" team to crystallize and communicate clearly their reasoning behind investing effort into this as it fits into the bigger picture.

Again, I don't want to make it sound like I'm demanding anything, no one owes me this of course. I just feel like I'm asking a simple question but I'm having a hard time getting it understood.

@odersky
Contributor Author

odersky commented Aug 28, 2019

@nafg The main problem this is potentially solving is that Scala is not as accessible as it could be for beginners and casual users. For me, that is an important aspect. I am saying potentially since right now this is just a hypothesis. We need a lot more experience to back that up, and the hypothesis might turn out to be wrong in the end. But now is not the right time for me to discuss this. Everything I write would be premature.

@Ichoran

I am skeptical that this would be an improvement. It might be better for early learning in a protected environment, but it will be worse for learning to mastery because you'll have to train yourself to recognize two visual patterns for a bunch of common stuff (and they're not that visually distinct, which will make the task harder).

I thought I had debunked that in my comment? People have no problem training themselves to recognize

if (x < 0) 
  x = -x 

and

if (x < 0) { 
  x = -x 
}

as equivalent. The core of significant indentation is to also establish that equivalence for multi-statement expressions.
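
A minimal sketch of that equivalence for a multi-statement body, assuming a compiler that accepts the indentation syntax (as Scala 3 does; the exact rules in this PR may differ). The function names `absBraced` and `absIndented` are hypothetical and exist only for this illustration:

```scala
// Braced form: the block after `if` holds two statements.
def absBraced(n: Int): Int = {
  var x = n
  if (x < 0) {
    println("negating")
    x = -x
  }
  x
}

// Indentation form: the same two statements, grouped by indentation alone.
def absIndented(n: Int): Int =
  var x = n
  if x < 0 then
    println("negating")
    x = -x
  x
```

Both definitions are meant to behave identically; only the way the block is delimited differs.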

I also don't think that Python's popularity is terribly much affected by significant whitespace. The most loved language by a large margin is Rust, and it has braces all over the place. Kotlin's high and it's full of braces. TypeScript too.

These are different audiences. Rust is for systems programmers and needs very high expertise. Kotlin and Typescript are also specialized languages. Python is great for beginners and casual users. That's the audience I am after. Indentation is not a panacea of course, but maybe it removes one impediment to easy access. There are other impediments for sure (a big one is over-complicated libraries), and we will have to work on all of them.

@nafg

nafg commented Aug 28, 2019

@odersky thanks.

Not sure where to ask since it's now perhaps a tangent but what do you think of having a high level google doc or something where we outline all the priorities and how they fit in with each other etc.?

@odersky
Contributor Author

odersky commented Aug 28, 2019

@nafg

Not sure where to ask since it's now perhaps a tangent but what do you think of having a high level google doc or something where we outline all the priorities and how they fit in with each other etc.?

I think that would be very useful. Right now the closest we have is

https://dotty.epfl.ch/docs/reference/overview.html

This needs to be periodically updated, of course. Do you think we should have another format as well?
If yes, maybe we should open a contributors thread to discuss this.

@diesalbla

diesalbla commented Aug 28, 2019

The new syntax around colons seems a bit odd. For example, : is used in some new places (after enum and object) but not after match. The use for a second parameter group (as in private val spaces) is very hard to visually parse.

I would strongly advise against using single colons as the delimiter between the declaration and the body of the class/object/enum implementation. In Scala, the single colon is mostly used as the symbol for type annotations, both in class fields, method parameters, method return types, etc. To use the same symbol for a different purpose (begin the body of the implementation) would complicate the language.

IMHO, the syntactic complexity of Scala does not stem from its using many diverse symbols, but from using the same symbols for different purposes. That is a cognitive burden on the reader, who has to "parse" a symbol by its surroundings. It is similar to the underscore _ meaning three different things in the following code:

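def fili[Kili[_]](x: Int): String = x match {  // 1: placeholder in a higher-kinded type parameter
  case b: Ori[_] => "yes"                      // 2: wildcard type argument
  case _  => "no"                              // 3: wildcard pattern
}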

Perhaps, it could make more sense to use the equals = to separate class declaration and subclasses from the body of the class. That would be coherent with the use of the = symbol when defining the value of a variable or the implementation of a function. This would be similar to the use of = when declaring data types in Haskell, or with the syntax for modules in OCaml.

@diesalbla

diesalbla commented Aug 28, 2019

A side-note regarding the use of colons:

In Kotlin, the colon is also used within a class declaration as the symbol to represent inheritance relations between classes and interfaces, with commas to separate multiple interfaces. This may be a reasonable overload of the colon, to denote both typing and subtyping, and may be easier than the extends and with keywords. However, this may be a matter for another conversation.

@odersky
Contributor Author

odersky commented Aug 28, 2019

@bishabosha Do you have code where the exception happens? I could not reproduce (I tried it with #7114).

@joan38
Contributor

joan38 commented Aug 28, 2019

@diesalbla Your point has been answered in #7083 (comment)

May I ask what your thoughts are on also having the tokens ( and , start an indentation region? In relation to #7083 (comment)

@Ichoran

Ichoran commented Aug 28, 2019

@odersky - It took me close to a year to be able to read

if (foo)
  x = -x

and

if (foo) {
  x = -x
}

equally quickly, and with the same ability to catch errors. Maybe other people are different, but I think the premise is incorrect that because someone can immediately understand that the two are equivalent, said individual has mastery at that point.

Rather, I think conceptual mastery is immediate, but the individual still has the skills of a beginner. Reading code then takes more effort and attention, and we only have so much effort and attention to give. Sometimes it's worth it, e.g. it takes time to learn to read xs.take(4) and xs take 4 as equivalent, and then it's a nice addition to visual expressiveness. On the other hand, people complain about even this, and there's a proposal to drop infix notation, so we seem to be pulling in two directions here.

@diesalbla

diesalbla commented Aug 28, 2019

= is wrong for classes and objects. The left hand side is not the same as the right hand side. Instead, a class or object gives a name to a set of definitions. You can think of it as tagging or labelling. IMO : is perfect for that.

@odersky Could you elaborate as to what those differences are? To give a simple example, I can see the following as being similar:

trait A { def x: Int }
// val-def
val b = new A { val x = 13 }
def c(y: Int): A = new A { val x = y + 1 }
// object-class
object B extends A { val x = 13 }
class C(y: Int) extends A { val x = y + 1 }

In other words, an object is like a label for an anonymous instance of the super-trait, and a class is like a label for a function that creates objects (or like a template for objects), apart from the difference that class C declares a new type, whereas def c does not.

One benefit of this syntax, which uses = rather than : to separate declaration from implementation, is that, together with the use of the colon for inheritance, it would make the two groups of definitions above look very similar:

trait A { def x: Int }

// val-def
val b = new A { val x = 13 }
def c(y: Int): A = new A { val x = y + 1 }

// object-class
object B: A = { val x = 13 } 
class C(y: Int): A = { val x = y + 1} 

@Ichoran

Ichoran commented Aug 28, 2019

If we're going to do this, I propose that the ellipsis be an indentation-based block opener, and that every case where you don't use it is a case of ellipsis elision.

This gives a nearly trivial translation between (unelided) braced and unbraced styles.

For example, here's a bit of code I wrote recently:

class Ps(_roost: Path) extends FourFlock[P, Ps](_roost) { self =>
  lazy val allEs = all.map(_.collect{ case e: E => e })

  lazy val allMs = all.map(_.collect{ case m: M => m })

  protected def justWrap(path: Path) = P(path)

  def filtered(pred: P => Boolean): Ps = new Ps(roost) with ProxyFlock[P, P, Ps] {
    protected def flocker = self
    protected def proxyOf(p: P): Ok[String, Option[P]] =
      Yes(if (pred(p)) Some(p.repath(p.path)) else None)
  }
}
object Ps {
  def apply(roost: Path) = new Ps(roost)
}

which would translate automatically to something like

class Ps(_roost: Path) extends FourFlock[P, Ps](_roost) ...
  self =>

  lazy val allEs = all map ...
    _ collect ...
      case e: E => e

  lazy val allMs = all map ...
    _ collect ...
      case m: M => m

  protected def justWrap(path: Path) = P(path)

  def filtered(pred: P => Boolean): Ps = new Ps(roost) with ProxyFlock[P, P, Ps] ...
    protected def flocker = self
    protected def proxyOf(p: P): Ok[String, Option[P]] = Yes ...
      if pred(p) then Some(p.repath(p.path))
      else None

object Ps ...
  def apply(roost: Path) = new Ps(roost)

In particular, ... would be completely general as a replacement, working anywhere and always. So, for example, if you wanted to

val x = {
  val myFoo = foo(fooArg)
  bar(myFoo, myFoo.bazzify)
}

you could

val x = ...
  val myFoo = foo(fooArg)
  bar(myFoo, myFoo.bazzify)

Maybe you just want a temporary val to not pollute your namespace. So

val permanent = foo()
{
  val temp = bar()
  baz(temp, permanent).runSideEffect
}

would become

val permanent = foo()
...
  val temp = bar()
  baz(temp, permanent).runSideEffect

For comprehensions, too:

for ...
  x <- opX
  y <- opY
yield x + y

There may be some places where ellipsis elision makes sense (e.g. after match or then). But if we make it a completely general language feature instead of special-cased to be triggered in particular contexts, I think it would be much easier to understand all the rules surrounding it (because there is only one rule: after ... at the end of a line, following lines indented more deeply are all part of the same block).
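
The single rule described above is simple enough to sketch as a line-based rewrite back to braces. This is only an illustration under stated assumptions: `...` is the hypothetical opener proposed in this comment (not existing Scala syntax), indentation uses spaces only, and string literals or comments containing `...` are not handled. The names `EllipsisToBraces` and `translate` are made up for the sketch.

```scala
object EllipsisToBraces {
  // Drop trailing whitespace from a line.
  private def rtrim(s: String): String =
    s.reverse.dropWhile(_.isWhitespace).reverse

  // Number of leading spaces (tabs are not handled in this sketch).
  private def indentOf(line: String): Int =
    line.takeWhile(_ == ' ').length

  def translate(source: String): String = {
    val out = scala.collection.mutable.ListBuffer.empty[String]
    // Indentation of each line that opened a still-unclosed block, innermost first.
    var open = List.empty[Int]

    for (raw <- source.linesIterator) {
      val line = rtrim(raw)
      if (line.isEmpty) out += line
      else {
        val indent = indentOf(line)
        // A line indented no deeper than an opener ends that opener's block.
        while (open.nonEmpty && indent <= open.head) {
          out += (" " * open.head) + "}"
          open = open.tail
        }
        if (line.endsWith("...")) {
          // Replace the trailing ellipsis with an opening brace.
          val head = rtrim(line.dropRight(3))
          out += (if (head.isEmpty) (" " * indent) + "{" else head + " {")
          open = indent :: open
        } else out += line
      }
    }
    // Close whatever is still open at the end of the input.
    open.foreach(i => out += (" " * i) + "}")
    out.mkString("\n")
  }
}
```

For instance, `EllipsisToBraces.translate("val x = ...\n  val a = 1\n  a + 1")` yields the braced block `val x = {`, the two body lines, and a closing `}`. Because the rewrite never looks at keywords, it reflects the point that one general rule could replace a list of context-specific triggers.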

@bmeesters

bmeesters commented Aug 29, 2019

In general, I am in favor of doing experiments. And given that Dotty is not Scala 3 yet, it seems like the best place to do these kinds of experiments. I think the blog post describes it well.

That said, I am not sure the comparison to Python is entirely correct. Scala has a lot more powerful tools which will be daunting to beginners regardless of the syntax. Python simply doesn't have these and thus it is much easier to learn the whole language.

I also think that we should look not only at potential new developers, but also at existing developers. In this thread most seem to be happy with the current state, and against whitespace-sensitive syntax in general. It would be a shame to lose those because they feel they are not being listened to.

Even if it turns out that whitespace-sensitive syntax is a clear win, then I am still not sure we should get this into Scala 3 rather than, say, Scala 4, so as not to make too many steps at once. Migration tools might help, but more developers will be hesitant since so many things change at the same time. Waiting would give us more time to experiment and smooth out the syntax, and would allow developers to get used to the idea, without endangering the Scala 3 release.

Member

@bishabosha bishabosha left a comment

This will crash under -rewrite -noindent if there is no blank line at the end

object testindent:
  if 1 > 0 then
    println
    println
  else
    println
    println

@nicolasstucki
Contributor

What should be the semantics of

  start match
    case _ => 5

  end match
    case _ => 5

According to the rules, end match closes the preceding start match, but here we clearly want to match on a variable named end.

@odersky
Contributor Author

odersky commented Aug 29, 2019

@bishabosha In the follow-up PR #7114 this seems to be fixed.

@odersky
Contributor Author

odersky commented Aug 29, 2019

This PR is now closed in favor of #7114, which implements updated indentation rules.

@odersky odersky closed this Aug 29, 2019
@Ichoran

Ichoran commented Aug 29, 2019

I am not entirely sure that we have thought through whether this proposal simplifies things enough. All of these can be used right now to open blocks: : = => <- if then else while do try catch finally for yield match. That's quite a handful, and the : is now overloaded with type ascription. Also : can be used everywhere. So, for instance, these should be legal according to spec:

for
  x <- foo
: 
  println(x)
val x = 2
: 
  val temp = x*x
  println(temp)
val y = 3

This should no longer compile:

def myVeryWordyJavaInspiredMethodNameThatPracticallyWrapsOnItsOwn():
  MyIncrediblyLongType[ThatIsParameterized, PossiblyNeedlessly] =
    foo

But this should:

def myVeryWordyJavaInspiredMethodNameThatPracticallyWrapsOnItsOwn():
MyIncrediblyLongType[ThatIsParameterized, PossiblyNeedlessly] =
  foo

This is valid (by spec; I haven't tried it):

def foo(i: Int)(j: Int)(k: Int) = i + j + k

val answer =
  foo:
    3
  :  
    4
  : 
    5

This should work:

var x = 2
x =
  println("Hi")
  foo()

This shouldn't work:

var x = 2
x +=
  println("Hi")
  foo

Fine:

x.pipe:
  _ + 1

Also fine:

x.pipe : 
  _ + 1

Broken:

x |>:
  _ + 1

Fine again:

x |> : 
  _ + 1

And this should work:

xs + : 
  ys

as should this:

xs +:
  ys

but they do totally different things.

And with so many keywords, how do you keep track of what's valid and what's not? Does this work?

while (x < 5) do
  println("Looping")
  x += 1

Does this?

return
  println("All done")
  x + 5

How about this?

import
  collection.mutable.ArrayBuffer
  collection.immutable.SortedSet

Does this still compile?

import collection.mutable.{
  ArrayBuffer =>
    cmArrayBuffer,
  AnyRefMap =>
    cmAnyRefMap
}

Does this work?

import collection.mutable.:
  ArrayBuffer,
  TreeMap

What is the content of this list (hint: you'll get warnings)?:

val xs = List:
  1
  2
  3
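
The hinted warnings may be easier to predict from a braced spelling. The sketch below assumes the colon block is read as one block argument, so that the snippet above is equivalent to `List { ... }`; that reading is an assumption, and `ColonBlockDemo` is a name introduced only for this illustration.

```scala
object ColonBlockDemo {
  // One block argument: its value is the last expression, so `1` and `2`
  // are evaluated and discarded (the compiler warns about pure expressions
  // in statement position).
  val xs: List[Int] = List {
    1
    2
    3
  }
}
```

Under that assumption the list holds a single element, the value of the last line, rather than three elements.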

Maybe teaching will reveal that these things aren't problems, or maybe that they are, or maybe it will reveal that they aren't problems when you teach it a particular way, but not reveal what happens when people aren't taught that way.

But, anyway, I object strenuously to the notion that this is simple. It can be pretty, but it's complicated and has a fair number of corner cases. People like pretty. And people can learn complicated stuff. But if we're doing it to be simple, it's not; and if we're doing it to simplify, that's not possible because we have to keep all the old stuff and the new syntax interacts with the old in nontrivial ways.

@anatoliykmetyuk anatoliykmetyuk removed this from the 0.18 Tech Preview milestone Aug 29, 2019
@odersky
Contributor Author

odersky commented Aug 30, 2019

@Ichoran These are all good points. I opened issue #7136 to continue the discussion.

#7114 has now been merged to enable exploration, but that should be the start of a serious discussion, not the end of it. Besides #7136, I also plan to open a thread on contributors to discuss the topic once we have gained more experience with it.

@fanf

fanf commented Sep 6, 2019

I won't argue whether the no-brace syntax is better or worse than the other one. But factually, it is extremely different and a huge shift from Scala 2. Whether or not you want to enforce only one of the two styles for Scala 3, I see only a huge waste of resources and disappointment coming with that:

  • if it is allowed to choose between braces and indentation, you totally destroy the consistency of the language. Everything from books to tweets and libs will need to be written twice, or only for a subset of the developers. And if there is something Scala really doesn't need, it is a new syntactic way to write the same thing. It seems to totally miss the "more consistency needed" goal. This is likely to cost infinite resources (in learning, arguing, teaching, internal company code style, documentation resources, etc). Given the Scala ecosystem's size, do we really want to pay that cost on that?

  • on the contrary, if the idea is to go the Python way, then there must be a quasi-dictatorial enforcement of what is correct syntax and what is not (the goal is homogeneity and consistency), and so everything must be rewritten for Scala 3 in the no-brace syntax. This seems to be totally at odds with the goal of Scala 3 not being a new language (if all source code must be totally rewritten, that's my definition of a new language, even if the concepts are similar). This is very likely to create a long-term rift between Scala 2 and Scala 3.

All of that is independent of the inherent qualities of each syntax.

EDIT: typo
EDIT: on the other hand, I believe that an excellent, supported linter/formatter with a default "standard" formatting style (included in sbt OR EVEN scalac by default) can bring a lot of homogeneity in the long term (because in that case, documentation/teaching is still valid for everyone, but those who don't want to follow the default format/syntax usage accept (and need) to pay the added cognitive cost of drifting from the default).
