Minimal Runtime #4

Closed
planetenkiller opened this Issue Jun 20, 2013 · 39 comments
@planetenkiller

A minimal JS runtime would let web developers use Scala.js right away, instead of waiting for a full-featured runtime that is not too large.

Advantages:

  • can use scala-js on existing web pages (not everyone can use a 16 MB/1-2 MB JS library)
  • use of the advantages of Scala (static typing, language features, etc.)

Disadvantages:

  • cannot use scala.collection.* and other uncommon or big scala-library packages
@SethTisue

+1, this is what I really want as well in my use scenario — much more than I want something more “complete” that allows using something enormous like the Scala collections API. I'd like to write small classes in pure, no-dependencies Scala and have them compiled to a reasonably small amount of pure, no- or minimal-dependencies JavaScript.

@mushtaq

+1. Not sure if it is possible, but an extremely lightweight runtime of around 100 KB would allow us to use it in our web apps.

@sjrd
Scala.js member

This is an excellent idea!

And it is pretty straightforward to obtain. All we need is a list of the top-level classes, traits and objects that we do want. Then it is only a matter of selecting only those files when packaging them together.

So ... what do we want? I'll start with the obvious:

  • The Java core classes: probably everything that is currently in the "javalib". (except maybe java.io?)
  • The Scala core classes: probably everything that is in the scala package (not in subpackages) - this includes function types, Option, Product and tuple types. (except maybe App, Application and Console?)
  • The Scala runtime support: scala.runtime and scala.reflect

Note that Predef.println relies on Console, and Console relies on java.io.
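
As a rough illustration of that selection step, here is a minimal sketch in JavaScript of an allowlist filter over fully qualified class names. The rules follow the list above. Treating the "except maybe" items as excluded is an assumption, and the helper names (packageOf, inTree, isSelected) and the sample class names are made up for the example; none of this is part of the Scala.js build.

```javascript
// Packages whose direct members are kept, but not their subpackages.
const exactPackages = new Set(['scala']);
// Package trees kept as a whole (assumption: javalib lives under "java").
const includedTrees = ['java', 'scala.runtime', 'scala.reflect'];
// Assumption: the "except maybe" items from the list are left out.
const excludedTrees = ['java.io'];
const excludedClasses = new Set(['scala.App', 'scala.Application', 'scala.Console']);

// Package part of a fully qualified name, e.g. "scala.Option" -> "scala".
function packageOf(className) {
  const i = className.lastIndexOf('.');
  return i < 0 ? '' : className.slice(0, i);
}

// True if pkg is root itself or one of its subpackages.
function inTree(pkg, root) {
  return pkg === root || pkg.startsWith(root + '.');
}

// Decide whether a class belongs in the minimal runtime.
function isSelected(className) {
  if (excludedClasses.has(className)) return false;
  const pkg = packageOf(className);
  if (excludedTrees.some((root) => inTree(pkg, root))) return false;
  return exactPackages.has(pkg) || includedTrees.some((root) => inTree(pkg, root));
}

const candidates = [
  'scala.Option',
  'scala.Console',
  'scala.collection.immutable.List',
  'scala.runtime.BoxesRunTime',
  'java.lang.String',
  'java.io.PrintStream',
];
const selected = candidates.filter(isSelected);
```

With these rules, scala.Option, scala.runtime.BoxesRunTime and java.lang.String are kept, while scala.Console, scala.collection.immutable.List and java.io.PrintStream are filtered out.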

@marcesquerra
@SethTisue

@sjrd: your list seems right to me. I would hope java.io could be left out.

The Scala sources will need patching here and there, e.g. to remove Option's dependency on Iterator and List, Product's dependency on Iterator, etc.

A few possible additions: much of scala.math (not BigDecimal and BigInt though), scala.util.Either, scala.util.Try, scala.util.control.Exception, scala.util.control.Breaks.

@TheBizzle

+1, this would be glorious.

@sjrd
Scala.js member

> The Scala sources will need patching here and there, e.g. to remove Option's dependency on Iterator and List, Product's dependency on Iterator, etc.

It is not necessary to patch the source, because dependencies are looked up on demand. So if you never actually call the methods that need a missing class, you won't get an error. Of course, you have to be careful not to call them. But even if you did patch the source, you would not get a compile-time error, because - at the moment - the Scala.js compiler compiles your code against the original (.class) Java and Scala libraries.
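
To make "looked up on demand" concrete, here is a toy sketch in JavaScript. It is not how Scala.js actually encodes or loads classes; the registry, its register/get methods and the fake Option object are all hypothetical. It only shows why a missing class causes no error until a method that needs it is called.

```javascript
// Hypothetical registry: each class is registered as a factory and is only
// initialised the first time something asks for it.
function createRegistry() {
  const factories = new Map();
  const loaded = new Map();
  return {
    register(name, factory) {
      factories.set(name, factory);
    },
    get(name) {
      if (!loaded.has(name)) {
        const factory = factories.get(name);
        if (!factory) throw new Error('Class not packaged: ' + name);
        loaded.set(name, factory(this));
      }
      return loaded.get(name);
    },
  };
}

const registry = createRegistry();
registry.register('Option', (reg) => ({
  isEmpty: (value) => value === null,
  // Only this method touches Iterator, which is deliberately not registered.
  iterator: (value) => reg.get('Iterator').single(value),
}));

const Option = registry.get('Option');
const emptyCheck = Option.isEmpty(null); // works, Iterator is never looked up

let failure = null;
try {
  Option.iterator(42); // the lookup of Iterator happens only now
} catch (e) {
  failure = e.message;
}
```

Here Option.isEmpty succeeds even though Iterator was never packaged, and the error only appears when Option.iterator is called.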

> A few possible additions: much of scala.math (not BigDecimal and BigInt though), scala.util.Either, scala.util.Try, scala.util.control.Exception, scala.util.control.Breaks.

Good point.

@SethTisue

Cool, thanks for the clarification.

@planetenkiller

I've started to implement a minimal runtime (current state: https://github.com/planetenkiller/scala-js/tree/minimal-runtime ).
The JavaScript code of some Scala library packages is very big:

  • scala.math: 541 KB (without .map/.jstype files, Google Closure Compiler and gzip)
    • 97 KB of the 541 KB belong to BigInt/BigDecimal
    • 190 KB to Ordering
    • 183 KB to Numeric
  • scala.runtime: 609 KB (without .map/.jstype files, Google Closure Compiler and gzip)

The size of the scalajs-runtime-minimal.js:

  • 3.7 MB raw
  • 2.7 MB after Google Closure Compiler
  • 221 KB after Google Closure Compiler & gzip
@sjrd
Scala.js member

Great job!
It's disappointing that we reach only 2.7 MB this way, but it's definitely better than 16.

@mushtaq
@sjrd
Scala.js member

Wouldn't it be even more confusing to have a different package?

Given the selected set, it is fairly easy to remember: you are allowed to use the scala and scala.math package, the Either hierarchy and Breaks.

@mdedetrich

I actually disagree with not having scala.collection.* as part of the minimal library. Collections would be a much sought-after feature for client-heavy web apps, which I believe is something that scala-js would want to target (and is arguably where the web is heading).

Wouldn't a better approach be to use an aggressive compiler for production? Since we are dealing with a static language (Scala), we can easily eliminate a lot of dead code and have much better reasoning for further optimizations.

Essentially, what seems to be done currently (correct me if I am wrong) is that we are trying to implement the JVM + Java/Scala standard libraries inside JavaScript, which is the right approach for development, but the wrong approach for production.

@sjrd
Scala.js member

Hi,

For production, we definitely want aggressive compilation, eventually. We will use the Google Closure compiler for that purpose. The present discussion is mostly trying to find an intermediate, transient solution.

@mdedetrich

In my opinion, removing a core library (Scala collections) from the runtime is completely the wrong direction, and shouldn't ever be done. The only libraries that should be removed are things like java.io.* and java.lang.Thread (for obvious reasons).

Regarding using google closure compiler, idealistically speaking there isn't any reason to use it if you are doing your own language (scala) -> JS compiler.

You will receive much better optimizations and output code reduction by going through the Scala AST and doing optimizations that way, before outputting any JS. And by doing this properly, there won't be any need for the google closure compiler.

As an interim solution, the best way to reduce file size (which is the main issue currently) is by dead code elimination and symbol renaming. Removing core libraries is definitely not the way to do it.

@sjrd
Scala.js member

I agree that removing the library is not a good idea. But apparently it did appeal to some people, so why not?

DCE and symbol renaming are the way to go. And Closure happens to do that already. Implementing the same thing again would be a poor decision, I think.

@mdedetrich

I think that removing collections from the runtime is a bad idea simply because if I was to use Scala for JavaScript, then I would be using those collections. It's like removing Hash from Ruby or typeclasses from Haskell.

The main point I am getting across is that it's the completely incorrect way to approach the problem. As has been said earlier, the collections runtime is around 500 KB. Right now we have a 16 MB file to deal with; removing 500 KB from that isn't going to achieve much.

If you want to actually get the runtime to a file size that is considered sane for production, it needs to be around 1 MB absolute max, and the only way to do this is by dead code elimination and symbol renaming. Removing collections by itself won't actually make people want to use scala-js in production; dead code elimination/symbol renaming, however, will.

Regarding GCC (Google Closure Compiler), yes, it does do symbol renaming. The thing is, it will never do it as well as the scala-js compiler, because it can't infer as much from JavaScript code as the Scala compiler can. For example, if you call stuff from JavaScript by string, i.e.

var a = {test: function() { alert('rawr'); }};
a["test"]();
a.test();

The second line will prevent GCC from doing any symbol renaming, whereas the Scala compiler can do symbol renaming in this situation because it has your Scala code in an AST that is statically typed. You can get more details here: https://developers.google.com/closure/compiler/docs/api-tutorial3

It may not make much of a difference for symbol renaming, since you can probably adjust the emitted JS code so that it always resembles the a.test() example, which GCC will always rename. The bigger difference, however, is in actual code optimization (and dead code elimination), as I have already explained.
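
The difference between the a["test"] and a.test forms can be shown with a toy model of property renaming. This is a sketch only, not the Closure Compiler: renameProperty is a hypothetical helper that mimics what a minifier does when it shortens a property it can see through dot syntax, while string keys stay untouched.

```javascript
// Toy model of property renaming (NOT the Closure Compiler itself).
// renameProperty is a hypothetical helper: it returns a copy of obj in which
// oldName has been shortened to newName, as a minifier would do.
function renameProperty(obj, oldName, newName) {
  const renamed = {};
  for (const key of Object.keys(obj)) {
    renamed[key === oldName ? newName : key] = obj[key];
  }
  return renamed;
}

const original = { test: function () { return 'rawr'; } };
const minified = renameProperty(original, 'test', 'a');

// Dot-style access is rewritten together with the definition, so the
// renamed call site still reaches the function.
const viaDot = minified.a();

// A string key is opaque to the renamer and still says "test", so after
// renaming it no longer finds anything.
const viaString = typeof minified['test'];
```

After the rename, the dot-style call still returns 'rawr', while the string lookup yields undefined. This is why a renamer either has to leave such a property alone or the emitted code has to use one access style consistently.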

As for the short term, there is no issue in using GCC for symbol renaming, as altering what the current JS would be is minimal work. But the main point I want to put across is that, by FAR, the best way to reduce the code size is by dead code elimination, and to do that you need to implement a good algorithm that the Scala compiler will run on the AST before it outputs JS. That should probably be the biggest priority (apart from obviously bug fixing) to get the code usable in production. Obviously symbol renaming can also be done after the dead code elimination phase by either GCC or the compiler itself.

Essentially, you need to do what ProGuard does.
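
What whole-program dead code elimination amounts to can be sketched as a reachability walk over a call graph. The JavaScript below is a minimal illustration under simplifying assumptions: the call graph, the entry point and all names in it are made up, and a real implementation would work on typed trees and also have to deal with virtual dispatch, reflection and exported symbols.

```javascript
// callGraph: hypothetical map from a definition to the definitions it references.
const callGraph = {
  'Main.main': ['Option.map', 'Predef.println'],
  'Option.map': ['Function1.apply'],
  'Predef.println': ['Console.println'],
  'Console.println': [],
  'Function1.apply': [],
  'List.map': ['Function1.apply'], // never referenced from Main.main
  'BigInt.add': [],                // never referenced from Main.main
};

// Collect everything transitively reachable from the entry points.
function reachable(graph, entryPoints) {
  const live = new Set();
  const worklist = [...entryPoints];
  while (worklist.length > 0) {
    const name = worklist.pop();
    if (live.has(name)) continue;
    live.add(name);
    for (const callee of graph[name] || []) {
      worklist.push(callee);
    }
  }
  return live;
}

const live = reachable(callGraph, ['Main.main']);
// Anything not reachable can be dropped from the output.
const dead = Object.keys(callGraph).filter((name) => !live.has(name));
```

In this example List.map and BigInt.add end up in dead: library code that the program never uses is removed without anyone having to cut packages out of the runtime by hand.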

@sjrd
Scala.js member

Currently, Scala.js does not have the complete typed AST when it outputs .js code. Since it does separate compilation, it translates libraries, and even files, individually and separately to JS. Hence, at that time, it cannot know the whole program graph.
Now, we could consider outputting a different file format for separate compilation, one that would retain all JVM-like type information. Then we'd have our own "linker" that reads back these files and exploits this type information for better DCE and symbol renaming. I am not sure it is worth it, considering that [see next paragraph].

During ECOOP last week, I met Ben Lickly, who is a developer of GCC, and I discussed with him how it could be put to use for Scala.js. It seems like it should be able to do a very good job with Scala.js-emitted code (i.e., it might be almost as effective as proguard). Some of the encoding, and the way classes are loaded, must be changed, of course. I will investigate this now.

@sjrd
Scala.js member

It goes without saying, but it's always better to say it anyway: if the GCC approach proves to be insufficient, then obviously I will do as you suggest, with a custom dce.

@mdedetrich

Idealistically speaking, it would be better if the scala-js compiler did have the full AST available when compiling JS code, because from another perspective it would eliminate the whole "which compressor and which combination of compressors should we use" problem that we currently have in web programming (gcc, uglifyjs etc etc)

Even better, with a full AST you could do nice stuff, like automatic asynchronous module loading on the client, something that is currently a massive pain in the JS client-side world. Generally speaking, there is always an advantage for a compiler having the full AST. Currently scala-js isn't really a compiler so much; it's more of a translator (similar to CoffeeScript). Nothing bad about that, but even CoffeeScript (and other to-JS languages) are ending up recoding their compilers to have proper ASTs due to the benefits it provides.

I still think for the long term, turning scala-js into a proper compiler should be done, but I definitely agree 100%, that for now, using GCC is preferable if possible, since it takes very little effort and will produce code that is actually usable for production.

My point is we should eventually move to an AST model, for the reasons stated earlier. My biggest issue imho, is doing stuff like removing scala collections in an attempt to reduce code size, which I consider a stupid idea

@sjrd
Scala.js member

We're starting to understand each other, I think :-)

I may have been imprecise when I said Scala.js did not have the full AST. It does have the full AST for one file, of course, and actually for all files being compiled by the present invocation of the compiler. But it does not have the AST for previously compiled files (say, libraries). Of these, it only has symbol and type information (just like scalac).

So I do not agree in saying that Scala.js is not a compiler. It is a compiler that supports separate compilation (just like scalac). But it does not perform global dce and symbol renaming. Note that javac/scalac do neither of these either, and yet you call them compilers, don't you?

But maybe we just don't have the same definition of "compiler" ;-)

@mdedetrich

Well, I guess it's more accurate to say that scala-js is a partial compiler then :)

In which case it may be worth exploring if a JVM bytecode -> javascript compiler exists, a google search however isn't providing anything...

@sjrd
Scala.js member

> In which case it may be worth exploring if a JVM bytecode -> javascript compiler exists, a google search however isn't providing anything...

That is GWT, and the Scala/GWT experiment was tried before, and abandoned. So no, it's not worth it.

@mdedetrich

Not sure if this is a reputable source, but GWT apparently is a java -> js compiler, not a JVM bytecode -> js compiler

https://groups.google.com/forum/#!topic/google-web-toolkit/SIUZRZyvEPg

EDIT:

In fact, according to that link, it creates its own AST of the java source to do its own transformations for optimization reasons, which is what I am arguing for

@mushtaq
@SethTisue

> I think that removing collections from the runtime is a bad idea simply because if I was to use scala for javascript, then I would be using those collections

Well sure, ideally I'd like to use them too, but if the cost is too great, I could also do without them. Just Scala-the-language, without the Scala collections API, still has enormous advantages over JavaScript-the-language.

If I can get the collections API without the resulting JS code being unusably huge, then great! Obviously that would be ideal for everyone. But if that ideal turns out to be difficult and/or time-consuming to achieve, I'm interested in alternatives.

@jackcviers
@mdedetrich

> Let's back up a bit here. We should not be attempting to separate the runtime from user-space code when compiling to js. Instead, the runtime should be bundled with the user code, and then minification and optimization can be performed with the knowledge of the entire runtime scope. This would preclude the ability to do on demand loading and evaluation in the browser at runtime, but stays true to the notion of Scala being a compiled language. I think we should probably rely on a separate, standalone, and heavyweight runtime for an in-browser repl only. Not for runtime applications.

Precisely, we need to run optimizations on the entire codebase, and not just on the runtime or the produced JS code. There is nothing wrong with having the entire runtime embedded in a JS file for development (heck, it's ideal for obvious reasons), but for production, you just want to take your entire Scala code (including the runtime) and perform your optimizations on the whole thing.

If there is any runtime in production, it would be a minimal ABI to allow Scala -> JS code to run other Scala -> JS code (async modules?), if we want to get to that stage.

> Well sure, ideally I'd like to use them too, but if the cost is too great, I could also do without them. Just Scala-the-language, without the Scala collections API, still has enormous advantages over JavaScript-the-language.

As I have stated before, this isn't a solution. The current runtime is 20+ MB, and the collections library is only 500 KB. The reason why the library is so large is that there is no dead code elimination, and guess what, dead code elimination will also remove Scala collections if you don't use them!

> If I can get the collections API without the resulting JS code being unusably huge, then great! Obviously that would be ideal for everyone. But if that ideal turns out to be difficult and/or time-consuming to achieve, I'm interested in alternatives.

We all are, but removing core libraries (unless they have nothing to do with client side JS, i.e. threading or file IO) is not an alternative

Let's get this straight: the only reason that the runtime library is so large is that there is no real optimization run on it (particularly dead code elimination and symbol renaming). It's the same reason we get 20 MB jars if we stuff hundreds of classes into the jars that never end up being used or called. Doing such optimizations is 101 when it comes to compilers, yet it's not being done at all.

The solution to this wasn't removing Java's Lists or anything else from the standard library; the solution is doing optimizations. That's what ProGuard does.

@sjrd
Scala.js member

Some progress on my branch be-closure-friendly. I changed the encoding of method names and JS dynamic calls (and other small things) to be a little bit more Closure-friendly.

On this branch, one can apply the Advanced Opts of Closure to the Scala.js-emitted code of a whole application (including user-space), including the "startup" code, but excluding any other JS library being used. Closure will then rename all Scala.js-specific symbols, and leave all JS symbols alone.

DCE is not performed yet, due to the way classes are encoded.
I also still have to automate the application of Closure in the sbt build.

Results: both the Hello World and Reversi examples work after Advanced Opts, and their .js code is 6.4 MB.

@sjrd
Scala.js member

Yeehaaaah! After a week of hardcore hacking, I have managed to output code that is friendly enough to Closure that it can perform its global dce, symbol renaming and namespace flattening.
I still need to introduce into the sbt plugin an automated way to apply it.

But big win! Hello World is now 2.63 KB, and Reversi is just under 1 MB at 1,004 KB!

In the output code for the Reversi, there are still things that should be dce'ed, but are not. It might be a limitation of Closure, so maybe I'll have to contribute some improvements to GCC ... But that also means that there is still room for some improvement :)

@mushtaq
@joneshf

Very cool!

@jackcviers
@mdedetrich

Congrats! Out of curiosity, which areas of the Reversi code are not being DCE'ed?

@sjrd
Scala.js member

It seems to be mostly the runtime type information (RTTI) about many types that are not used. In general, this RTTI is used for instance tests and ClassTag creation.

@sjrd
Scala.js member

I now have a complete, automated workflow in branch be-closure-friendly. It also goes along with the branch opt-with-closure of the example application :)

@sjrd
Scala.js member

Merged into master!
Time to close this issue, I believe. :)

@sjrd sjrd closed this Jul 19, 2013
@hrj

Nice work 👍

@SethTisue

happy happy, joy joy

@gzm0 gzm0 modified the milestone: v0.2, v0.1 Aug 9, 2016