Literate CoffeeScript #1786

Closed
revence27 opened this issue Oct 21, 2011 · 56 comments

Comments

@revence27
Contributor

This file, as it is, has been taken from tests/literate.literatecoffee in my fork.
Without making any changes to it whatsoever, it is valid literate CoffeeScript, if you just copy and paste it into an editor, and make my fork of CoffeeScript run on it. Make the file end in .literatecoffee to make it go into literate mode.
I wonder if Mr. Jeremy Ashkenas thinks it is a good idea. It would be good to know, before I send a pull request. (Oops! Earlier pull request for binary literals was pending, and it subsumed this one. Oh, well.)
The change is only one line in only one file, and it is bound to be faster than fast in use. It is in my commit revence27/coffee-script@132d306 with an accompanying modification (different commit) of the Cakefile, that it may look for .literatecoffee files in the tests.

This, of course, is inspired by the Haskell programming language, but it is neater than the Haskell version of literate programming.

Beautiful Literate Programming

If this file parses at all, that is the test, and it has passed.

Originally, literate programming did not mean heavily-commented code. However, owing to the System, that is what it evolved to mean.
Literate programming is supposed to be where code invades the commentary, not lots of comments invading the code.

  test "If this parses, literate coffee works.", ->
    eq 'So beautiful, I want to cry.', 'So beautiful, I want to cry.'

Under this scheme, everything is a comment.
Except the bits that are indented. If a line does not start with whitespace, it is a comment.
Everything else is code.

  test "I parse, therefore I work.", ->
    eq 4, 100 - 96

This is how we shall proceed with writing a string reverse in CoffeeScript.

  reverse = (str) ->
    rez = ''
    for chr in str
      rez = chr + rez
    rez

And then we will use it.

  test "Using a function defined within literate CoffeeScript.", ->
    eq 'So beautiful, I want to cry.', reverse '.yrc ot tnaw I ,lufituaeb oS'

See? Wasn't ugly, was it?
Work on a syntax mode should not be difficult, in my opinion. (Although the only
syntax things I know how to do are Vim, and even those, not too perfectly.)
That will be all.
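
The rule described in the comment above (a line that starts with whitespace is code, anything else is commentary) can be sketched as a small filter. This is only an illustration written in Python, not the one-line change from the commit; the name `extract_code` is made up for the example, and replacing prose with empty lines (so that line numbers survive) is an assumption.

```python
def extract_code(literate_source: str) -> str:
    """Keep indented lines as code; blank out everything else."""
    out = []
    for line in literate_source.splitlines():
        if line[:1] in (" ", "\t"):
            out.append(line)   # starts with whitespace: treated as code
        else:
            out.append("")     # prose or empty line: dropped, line number kept
    return "\n".join(out)


sample = "Some prose.\n\n  x = 1\n  y = x + 1\n\nMore prose.\n"
print(extract_code(sample))
```

Keeping one output line per input line means that any error position reported for the extracted code still points at the right line of the literate file.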

@jashkenas
Owner

Yes -- I think this is very interesting. Please separate out the pull requests into two separate branches so we can look at them in isolation.

@jashkenas
Owner

A couple things:

If this goes in to the core coffee command -- it would be great to not have to deal with a separate file extension. Ideally, it would be able to tell literate CoffeeScript files apart from ordinary ones.

Are you planning to tie this directly to Markdown? Or do you want to have the ability to use other markup languages?

How will multiple files be organized together? Are you planning to generate an index page, with some sort of navigation for browsing around?

Is the output format of the final document just HTML? Do you want to make it possible to generate a PDF of all your source code?

Would it be possible to add syntax highlighting for all the CoffeeScript snippets, in all output formats?

@revence27
Contributor Author

Hello;
I am trying to fight with git reset to de-couple these two commits. In the meantime, though, to answer your questions …

This thing has been inspired almost entirely by Haskell, which happens to be my mother tongue in programming. So most things are going to be just like in Literate Haskell, enabling what it does, and really little more than that.

Regarding the first one (magic combination): it is doable, and I think the idea is great because it thinks outside the Literate Haskell box in which I am. We would just have to agree on how literate programs should start. I guess using

Literate CoffeeScript

at the top of the file should help us detect magically.

I am not planning to tie this directly to markdown. They just accidentally happen to be perfect fits.

Regarding file concatenation, the files joined together had better have the same syntax. Mixing literate with illiterate and parsing it as one file would not work. If each file is treated on its own, even if they are sent into coffee at the same time, it would work, since the magic words or the file name would be preserved.

The output of the final format is what your Docco produces. The text transformations do not usurp the literate programming that has already been used to great (beautiful!) effect in your code; they just provide another way to write literate CoffeeScript. They just make it so that it reads as beautifully in source code as it does in your generated docs.

I like to think that literate CoffeeScript is what you have not. But when it is capitalised, it is what is in the commit: Literate CoffeeScript.
So Literate CoffeeScript transforms to literate CoffeeScript, which your tools can go to work on.

On syntax highlighting, since it doesn’t usurp your Docco system, it will do what Docco does.

@quackingduck

Awesome work @revence27!

@jashkenas one nice thing about having a distinct extension (.lcoffee?) is that it's probably the way a lot of syntax highlighting programs decide how to lex a file. Something like pygments would need to use a different lexer for small L literate coffee files and capital L literate coffee files.

@revence27
Contributor Author

@quackingduck We could have both; it is, after all, still one extra line of code. (Long line.) :-)

@steveklabnik

This is pretty cool.

@geraldalewis
Contributor

I love the idea.
Extensions don't strike me as the most elegant solution.
A "use literate" compiler directive could work (like ES5's "use strict").
"Crossing the streams" by concatenating lit and non-lit code is a valid concern (as it is with intermixing strict and non-strict; global and non-global). However, I think solving the broader issue of script concat'ing is not within the scope of this issue.

@michaelficarra
Collaborator

I like @geraldalewis's idea of using a "use literate" directive. It may make it more difficult to integrate with syntax highlighters, but shouldn't be impossible.

@jashkenas
Owner

I was actually hoping for auto-detection ... purely automatic > file extension > magic comments > string directives, in my book.

@michaelficarra
Collaborator

string directives > file extension > magic comments > automagic

@quackingduck

@jashkenas agree that purely automatic is nice, just concerned that this code might not show up highlighted on github pages. Is there a precedent where github starts using a lexer based on a file extension and then switches to a different one based on auto-detection?

@geraldalewis
Contributor

hoping for auto-detection

Hmm... Generating an AST for a snippet, and seeing if it generates an error?
Though I don't know how genuinely buggy code could be distinguished... Some kind of scoring system like @josh uses for language detection on GitHub?

@revence27
Contributor Author

@jashkenas
Purely-automatic cannot be done. Turing says so. (Actually, I think von Neumann is the one who says so.)
By purely-automatic, what do you mean? Does my suggestion of having “Literate CoffeeScript” at the top count? After all, since it is meant to be read, that is a very good preamble.

@revence27
Contributor Author

@geraldalewis you say “Some kind of scoring system like @josh uses for language detection on GitHub?”
No. This cannot be heuristic at all. It must be boolean and without any false negatives or false positives. And let us try for the simplest thing that is clean. Scoring systems are ugly.

@michaelficarra
Collaborator

Totally agree with @revence27 here.

@jashkenas
Owner

No worries -- let's continue to use .literatecoffee for the time being -- there's a ton of more important stuff to get figured out. The index page, navigation, and HTML generation + highlighting are much more pressing.

I don't think it should use the Docco format, as you'll have a much higher ratio of prose : code.

@autotelicum

What is the syntax of the proposed input format? Standard markdown?


I have been using sort-of-literate coffeescript in markdown with pandoc for some upcoming stuff. Pandoc can render to an epub or pdf ebook, to html, rtf, odt etc. Pandoc accepts both standard markdown code blocks indented with at least 4 spaces and code blocks delimited by bird tracks:

~~~~~~~~~~~~~~~~~~~~~~~{.coffeescript}
add = (a, b) -> a + b
show add 7, 8
# … 15
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I like the pandoc birdtracks because they make it possible to mix languages and to see that a document is literate without using a file extension; assuming that ~~~ is not valid as code. Pandoc has built-in support for coffeescript syntax highlighting for HTML. Extracting the code sections is simple in an editor with regex support (or possibly with sed/awk). I have been using it without any special support from the coffeescript compiler. In my editor (acme) it is Edit ,x/^~~+[ ]*{\.coffeescript.*}$/+,/^~~+$/-p to extract all code sections from a document, which can then be piped into coffee -s. I would guess the regex parts are similar in other editors.

It would be nice (but not required for me) if this was built into the compiler. For example: if a file contains birdtracks then it is literate and everything outside of the birdtrack sections is discarded before lexing/parsing.


As an example, here is a rendered version of Mark Hahn's A Beginner's Introduction to CoffeeKup. It isn't a literate document, but it shows what one can look like when augmented by a TeX template.
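
The extraction step described above (pull out every section delimited by tilde lines and tagged `{.coffeescript}`, then hand the result to the compiler) could be sketched outside an editor as well. The following is a Python approximation of the acme command, not a reimplementation of pandoc's parsing rules: the name `extract_blocks` is hypothetical, and the sketch does not check that the closing run of tildes is at least as long as the opening one.

```python
import re

# Opening fence: three or more tildes plus an attribute block that names the
# .coffeescript class. Closing fence: a line of three or more tildes.
BLOCK = re.compile(
    r"^~{3,}[ ]*\{\.coffeescript[^}\n]*\}[ ]*\n(.*?)^~{3,}[ ]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_blocks(document: str) -> str:
    """Concatenate the bodies of all tagged sections, in document order."""
    return "".join(BLOCK.findall(document))
```

The returned string is what would be piped into `coffee -s` in the workflow described in the comment.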

@revence27
Contributor Author

@jashkenas Well, then, it will use a modified Docco format that takes into consideration the potential for a prose:code ratio that is steeped hard in the direction of prose.
But these I believe can be solved in due time, without requiring that we do guesswork. The doc problem can be a thing of the future, since it is not hard to solve at all, even within the current Docco (perhaps with a few more lines).

I am leaning towards adding a test for “Literate CoffeeScript” at the top of the code to start the mode; I think it is excellent, considering that it is literate programming we are talking about. It is a fine preamble.
Also, let us remember that it doesn’t have to be this or that. Within reason, we can have both and a boolean or in the code.

@revence27
Contributor Author

@autotelicum It could be any syntax you are going to be using for your output. In the case of this issue, it was Markdown. But it could even be the syntax you speak of above.
The best bet is to keep it simple and not over-specify, so that there is room for innovation later on.

@revence27
Contributor Author

Literate CoffeeScript

I have added an alternative way to express that a script is Literate CoffeeScript. If the script starts with the line: “Literate CoffeeScript”, as does this comment, it is considered to be Literate CoffeeScript, regardless of the file extension.
The commit is revence27/coffee-script@53078ff
The meaning of this is that, as @jashkenas wants, one can write Literate CoffeeScript and, with the same file extension .coffee, still get it to be parsed as Literate CoffeeScript. The .literatecoffee extension should probably remain, because it helps other tools that do not check the content, and yet it does not seem to add very much of a cognitive burden.
@jashkenas will decide here.

I did this because there is no chance that these two tokens will start a valid CoffeeScript program, yet they add a lot of beauty and rhetorical fluidity (as preamble) to the code. In fact, I think that even if that particular test is rejected from the code, the convention should be that Literate CoffeeScript start out as did this comment. (And we all know that code conventions are just missing compiler features.)
It also helps tools like file(1) to make positive identifications of Literate CoffeeScript files.

So, there it is.

  console.log 'Merge me!'

I just had to put that there because I can, and you cannot stop me, and this comment will still compile. :-p
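
The two triggers discussed in this comment (the .literatecoffee extension, or a first line reading “Literate CoffeeScript”) amount to a boolean test with no heuristics. A minimal Python sketch of that check follows; it is not the code from the commit, the name `is_literate` is invented for illustration, and comparing the whole first line exactly (rather than only a prefix) is an assumption.

```python
import os

MARKER = "Literate CoffeeScript"   # preamble proposed in this thread
LITERATE_EXT = ".literatecoffee"   # extension used by the fork


def is_literate(filename: str, source: str) -> bool:
    """True if the extension or the first line marks the file as literate."""
    ext = os.path.splitext(filename)[1]
    lines = source.splitlines()
    first_line = lines[0].rstrip() if lines else ""
    return ext == LITERATE_EXT or first_line == MARKER
```

Because either condition alone is sufficient, a plain .coffee file with the preamble and a .literatecoffee file without it are both treated as literate.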

@ninjacato

Hmm, 3 months since there was any activity on this issue. Is this still up for debate or has this idea died with the issue? Would love to see this in the next version of CoffeeScript. :-)

@michaelficarra
Collaborator

It's not dead, there's just a lot of things to get done, and this isn't at the top of the list (or at least not mine).

@jashkenas
Owner

Hey folks. I've pushed an initial implementation of literate CoffeeScript to the literate branch, here:

master...literate

... I've initially tested it by formatting the src/scope.coffee source file as markdown, and it compiles beautifully.

Before merging it, there are a number of things that need to be done, the most important of which is ... It would be lovely if we can figure out a way to not have to add an additional file extension (.litcoffee, currently), in order to get proper compiles. In a perfect world, we would be able to compile both styles without an extension flag, or a special marker present somewhere in the file -- just by being able to detect either Markdown or CoffeeScript.

Any ideas?

@vendethiel
Collaborator

Try first to parse it through coffee and fall back to litcoffee? Add an annotation? (#LITERATE?)
I don't think there's really a "good" way, except trying to detect markdown in unindented lines.

@osuushi

osuushi commented Sep 26, 2012

What about a compromise solution, with a required marker that is also functional? For example, the literate.litcoffee example has a header at the top using the ------- markdown syntax. Requiring such a header (maybe it should be a ====== H1 header) seems like a reasonable convention for literate files, and it makes detection easy. More importantly, it would make accidental misinterpretation nearly impossible.

Given that a large number of lines of CoffeeScript are also valid lines of Markdown, it seems unlikely that unaided autodetection is going to work without significant and confusing edge cases. It would be a bad situation if, for example, a typo like a missing -> caused the compiler to suddenly think a non-literate file was supposed to be literate. In the worst case, such an error might even cause the file to compile without a complaint, leaving the programmer to dig through and figure out what caused the misinterpretation.

@jashkenas
Owner

... you'd think it wouldn't be too hard to detect CoffeeScript, and fall back to markdown ... but unfortunately code like this:

    This is valid coffeescript code. Does it compile?

Compiles: http://coffeescript.org/#try:%20%20%20%20This%20is%20valid%20coffeescript%20code.%20Does%20it%20compile%3F

@vendethiel
Collaborator

I don't think so http://is.gd/NuBxgb
I agree with a marker tho

@jashkenas
Owner

Whoa -- nice!

@osuushi

osuushi commented Sep 26, 2012

@superjoe30 literate CoffeeScript in general, or reusing the same extension and autodetecting?

@andrewrk

Literate CoffeeScript in general.

@osuushi

osuushi commented Sep 26, 2012

@superjoe30 The Wikipedia article on literate programming is decent reading for understanding the basic purpose, but the main point is better code documentation.

@bergie

bergie commented Sep 26, 2012

Really glad to see this happening. If it is of any help, here is how I implemented literate programming for PHP: http://bergie.iki.fi/blog/literate_programming_with_php/

@epidemian
Contributor

Why would it be undesirable to add a new file extension for Literate CS files? Isn't that the standard mechanism that text editors and syntax highlighting tools (like github/linguist) use to detect the language? It's my understanding that Haskell also uses this strategy to distinguish normal .hs files from literate .lhs files.

If a separate extension is indeed undesirable, the proposal of adding a Literate CoffeeScript preamble for Literate CS seems fine to me (kinda like adding a hashbang to a script file :)

@bergie

bergie commented Oct 3, 2012

From the doctest.js discussion on HN:

My main concern with it is that it forces you to write the document in the same order you want the code to be extracted, which may not be the best order for explaining things. That is why classic Literate Programming tools like noweb allow you to name the chunks of code and then arrange them into the generated files as you wish. You can see an example of this in action when I'm assembling noweb.php.

One way to work with named chunks of code in Literate CoffeeScript would be to use the fenced code blocks syntax from Github-flavored Markdown. This way we could do stuff like:

```coffeescript;somechunk
# Contents of the chunk here
```

We would only need to figure out an appropriate chunk inclusion syntax. Noweb uses <<chunkname>>, so you could call in that previous named chunk into an arbitrary location of your document with:

```coffeescript
# Some code, then the chunk:
<<somechunk>>
# Code continues
```

@jashkenas
Owner

... and replied on HN: http://news.ycombinator.com/item?id=4608260

@bergie

bergie commented Oct 3, 2012

@jashkenas well argued. I guess the way proposed in this issue makes Literate Programming a lot more approachable.

The next question with this approach is editor support, as then you'd have to understand both the Markdown syntax and CoffeeScript, and the relations between the two. Emacs probably makes this easy with major and minor modes, but with other editors this can be trickier.

@jashkenas
Owner

Yes -- I think that's actually not so hard. I'm going to attempt it for textmate / sublime. Those editors are already able to combine parse modes ... for example, JavaScript is correctly highlighted within an HTML document. Here it's easier -- any block that's indented more than four whitespace characters should be passed to the CoffeeScript highlighter.

@bergie

bergie commented Oct 3, 2012

For a different take on editors, I've been toying with the idea of combining collection support in Create.js, Hallo's Markdown mode and something like CodeMirror to make a web-based WYSIWYG editor for literate programming.

On longer term you could then utilize features like image insertion, or even connect it with a web-based graph editor so you could produce really nice documents while you code.

The less separation there is between your documentation and the code, the more likely they'll both remain up-to-date.

@niclashoyer

Is there any support in gedit / gtksourceview for this? That would be great!

@niclashoyer

I quickly wrote my own Literate Coffeescript syntax definition for gedit, see my gist. It is far from perfect, but at least Markdown / CoffeeScript highlighting works.

You need gedit-markdown (included by default) and gedit-coffeescript.

Place the file in ~/.local/share/gtksourceview-3.0/language-specs

@supersym

supersym commented Feb 2, 2013

Cool. I actually had the same idea earlier and wanted to actually start work on it but now I don't have to. Thanks!

@semperos

This looks like a great addition. However, after trying a few syntaxes, it appears that there isn't support for "fenced code blocks." Is this the case?

In an already white-space sensitive language like CoffeeScript, adding an extra layer of indentation for Markdown code blocks is problematic. Fenced code blocks, for me, would be a must before using this.

@jashkenas
Owner

You're correct. Fenced code blocks aren't a syntax of ordinary Markdown that I'm aware of.

http://daringfireball.net/projects/markdown/syntax#precode

Aren't they a special GitHub thing?

@semperos

They are not part of standard Markdown, that's correct, but they are a common addition by many Markdown libraries, including Showdown.js, Github-flavored markdown via Sundown, Ruby's Kramdown, etc.

For the purposes of literate CoffeeScript, adding more indentation makes things tougher to read and edit, especially if one is using something like RequireJS wherein most of the file is already indented to live inside the callback to define.

@satyr
Collaborator

satyr commented Feb 25, 2013

Right now the literate syntax is bare minimum--far from full-blown Markdown.

-   Lists
    like
    this
-   fail, for example.

@supersym

First of all, to see a little experimentation of mine with Literate CoffeeScript and the result rendered by GitHub using GitHub Flavored Markdown (GFM), go here. The raw source of this file can be found here; it is a .litcoffee file that I did a simple cp a.litcoffee b.md on. The text inside that last file can be copied and pasted into an editor like this live preview editor here, where you can easily see how it renders to HTML. There are browser extensions that do these things too, and there are some downloadable viewers/readers/editors (mostly Mac) that have some GUI interaction.

I too have been toying with some ideas I had on these, and related fields, around literate programming and UI design, as a response to bergie. I sent you an email actually to see if we can do something with it. Note that there are several tools, although few, that take very different approaches. Someone interested in LP should also check out [Codnar](https://github.com/orenbenkiki/codnar). There are

@supersym

supersym commented Mar 6, 2013

I'm not sure if this was considered, but it is possible to append a file name to the opening triple backticks of a fenced code block, as in `` ```coffeescript (filename) ``, so that the blocks can easily be parsed sequentially and concatenated into separate files.

I could imagine that the explanation of why you have the file/directory structure could partially live together in a 'story' and then, moving back to the main idea, we keep writing chunks of files as a coherent thread throughout, and branch off as the flow of thoughts comes along.

Donald Knuth and others usually also have an outline that goes from 'require' to the 'execution' etc., which is still a bit too much towards computers and not humans, but does make a lot more sense when they get explained.

```coffeescript
# file1
    # (1A) code goes here
```

Story line continues lorem ipsum...

```coffeescript
# path/to/file2

     # code goes here

```

Story line continues

```coffeescript
# file1

    # (1B) execute some code goes concat under (A)

```

And indeed, with functions and requirements being scanned before execution, we have all we need to write an entire book in 1 file. Just don't forget the module.exports.

Updated: example that does get parsed properly, having the file name on the first carriage return after coffeescript. It could even be easily done with additional tab for aesthetics
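
The scheme in this comment (a fenced coffeescript block whose first line is a comment naming the target file, with blocks for the same file concatenated in document order) can be sketched as a small splitter. This is a Python illustration under stated assumptions, not an existing tool: the name `split_into_files` is hypothetical, and the file name is assumed to be a single whitespace-free token after the `#`.

```python
import re

# Opening fence, then a "# path" comment line, then the body up to the
# closing fence. The `{3} form mirrors the regex quoted later in the thread.
FENCE = re.compile(
    r"^`{3}coffeescript[ ]*\n#[ ]*(\S+)[ ]*\n(.*?)^`{3}[ ]*$",
    re.MULTILINE | re.DOTALL,
)


def split_into_files(document: str) -> dict:
    """Map each file name to the concatenated bodies of its blocks."""
    files = {}
    for path, body in FENCE.findall(document):
        files[path] = files.get(path, "") + body
    return files
```

Writing each value of the returned dictionary to its key would reproduce the separate files; the prose between the blocks is simply ignored.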

@bergie

bergie commented Mar 6, 2013

@supersym I suggested something similar in this Hacker News thread: http://news.ycombinator.com/item?id=4608165

Here is the response from @jashkenas back then:

this is probably the main complaint raised when talking about tools like Docco or Markdown-as-source-code as "literate programming". But, it's a concern that I believe is entirely outdated. Modern dynamic languages make it easy to sequence your code as you like. The methods in a class may be listed in any logical order, helper functions can be listed in an appendix after the functions that make use of them, and so on. I find it hard to imagine an example where changing the order of the codebase would make the prose version more readable -- and wouldn't also make the code version more readable as well.

@supersym

supersym commented Mar 6, 2013

Right. And he's probably right :) But we're still talking 1 file? I didn't see the question of 1 .litcoffee file versus 10 or 20 .litcoffee files come up in the discussion. One file makes sense from a literary standpoint, and for easy handling / maintenance (debatable; plenty of authors make a separate file for each chapter because 200 pages ain't fun), but e.g. what they did with literate Clojure and such is pack it all in 1 file. And after that, the classes get extracted etc.
I should really look deeper into what function he used for that and apply it to CoffeeScript so I can extract files if I want to, or not.

Using the above suggestion I made, and the regex ^(`{3}coffeescript( \r|\r)()(.*)), I have enough to do this. Updated because a file name on the same line doesn't get parsed.

http://www.gridlinked.info/oop-with-coffeescript-javascript/ some kind of namespaces is something I should investigate a bit better too

@bergie

bergie commented Mar 6, 2013

@supersym good point. My noweb.php supports multiple files in very similar manner.

Defining a named chunk of code:

<<my cool chunk>>=
# some code
@

Defining a file and including code:

<<somefile.coffee>>=
  exports.foo = (somearg) ->
    <<my cool chunk>>
@
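
The noweb-style notation shown above (a `<<name>>=` line opens a chunk, a lone `@` closes it, and `<<name>>` on its own line pulls a chunk in) can be expanded with a short recursive substitution. The sketch below is in Python and is not how noweb.php is implemented; `parse_chunks` and `expand` are hypothetical names, re-indenting an included chunk to the column of its reference is an assumption made so that CoffeeScript's significant whitespace survives, and cyclic references are not guarded against.

```python
import re

CHUNK_DEF = re.compile(r"^<<([^>\n]+)>>=\n(.*?)^@[ ]*$", re.MULTILINE | re.DOTALL)
CHUNK_REF = re.compile(r"^([ \t]*)<<([^>\n]+)>>[ ]*$", re.MULTILINE)


def parse_chunks(document: str) -> dict:
    """Map each chunk name to its body; repeated names are concatenated."""
    chunks = {}
    for name, body in CHUNK_DEF.findall(document):
        chunks[name] = chunks.get(name, "") + body
    return chunks


def expand(name: str, chunks: dict) -> str:
    """Return a chunk with every reference replaced by the referenced body."""
    def substitute(match):
        indent, ref = match.group(1), match.group(2)
        body = expand(ref, chunks).rstrip("\n")
        # Prefix each non-empty line with the indentation of the reference.
        return "\n".join(indent + line if line else line
                         for line in body.split("\n"))
    return CHUNK_REF.sub(substitute, chunks[name])
```

Calling `expand("somefile.coffee", parse_chunks(text))` on the two definitions above would give the contents intended for that file, with the named chunk placed inside the function body.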

@supersym

supersym commented Mar 6, 2013

Yeah, exactly. I noticed something like that and, although I find the notation ugly, it's a simple and elegant way of working with this style. Personally I like that I can stay on the same line with coffeescript, since the fences are there to stay I guess, and the (path/filename) looks clean and clear enough to me. And since I know it to be parsable by coffee-script, I can just omit the .extension.

Last thing I noticed: they used inline #macro() more or less inside the paragraphs to do stuff.

Any thoughts if we could call our methods in a non-significant whitespace manner? Probably, the best candidates might be to use the markdown _ or * or backtick + another symbol to escape out of the texts like the pound, or perhaps 3x _. Not sure.

@supersym

supersym commented Mar 6, 2013

Btw, if you have a customer who knows exactly what they want, have them write down the story and put code in between :) Or just comments where you are unsure or want questions answered. My personal dialect will probably provide some conventions/enhancements in that order of communicating messages as footnotes etc. Also I would like to have some convention to autoindex and number illustrations etc. like TeX does.

Shame that Markdown/HTML doesn't handle 1., 1.1, 1.2 and that kind of enumerations. Or does it?

@supersym

supersym commented Mar 6, 2013

Well, can't clean up, so it seems... In short, perhaps implicit ```coffeescript blocks would be nice. That's the only thing bugging me still, but it's relatively easily done anyway.

Here is another example of a literate program on GitHub: a Clojure to C++ compiler

And Emacs org-mode example http://doc.norang.ca/org-mode.org.html and http://orgmode.org/worg/org-contrib/babel/intro.html

Another example of Lisp dialect LP (Scheme/Racket) here and a blog post here

Maybe some want to learn from other language implementations.

@deanh

deanh commented Mar 13, 2013

Should this have closed with 1.5?

@michaelficarra
Collaborator

Looks like it. Closing.
