fonsp (Owner) commented Nov 26, 2020

We turn a simple cell like s1 = s2 = [sqrt(i) for i in x] into a function so that Julia can 1) compile a type-specialized version, which can also be 2) called a second time without compilation!

The generated function takes the referenced globals as arguments, and returns the defined globals.

s1 = s2 = [sqrt(i) for i in x]

would become (skipping some details):

function(sqrt, x)
	result = begin
		s1 = s2 = [sqrt(i) for i in x]
	end
	(result, (s1,s2,))
end

We can do this because Pluto already knows the set of referenced globals (function arguments) and defined globals (function outputs) -- we use it to create reactivity!
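
A rough, runnable sketch of the idea, not Pluto's actual generated code: the names `cell_function` and `outputs` are made up for illustration, and `x` is given a sample value so the snippet runs on its own.

```julia
# Hypothetical stand-in for the function generated for the cell
# `s1 = s2 = [sqrt(i) for i in x]`. Referenced globals come in as arguments.
cell_function = function (sqrt, x)
    result = begin
        s1 = s2 = [sqrt(i) for i in x]
    end
    # Return the cell's value plus the globals that the cell defines.
    (result, (s1, s2))
end

x = [1.0, 4.0, 9.0]   # sample input, assumed for this sketch

# The first call compiles a version specialized on typeof(x); later calls with
# the same argument types reuse that compiled code.
result, outputs = cell_function(sqrt, x)

# Assign the returned values back to the notebook's globals.
s1, s2 = outputs
```

Inside the function `s1` and `s2` are ordinary locals, which is what lets the compiler specialize the loop; they only become globals again in the final assignment.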


Function generation is here:
https://github.com/fonsp/Pluto.jl/pull/720/files#diff-0cc97f3d6a0f647a05e5f913d416242ecb00f6e67e12004a7204b8846a3fa44cR96-R104

and it is called here:
https://github.com/fonsp/Pluto.jl/pull/720/files#diff-0cc97f3d6a0f647a05e5f913d416242ecb00f6e67e12004a7204b8846a3fa44cR112


Not every cell will be run as a function. Cells that import, define a function or type, or call a macro still run with the old method:
https://github.com/fonsp/Pluto.jl/pull/720/files#diff-04c7252262a0e6de2db2cded7aba9b1130c1e5ba6adff25bf8ec3a34e36a4788R928-R946
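
To make the rule concrete, here is a minimal sketch of such a check. It is an assumption-laden illustration, not the code behind the link above: `can_wrap`, `is_short_function_def` and `UNWRAPPABLE_HEADS` are hypothetical names, and the list of expression heads is a guess based on the categories mentioned in this thread (imports, function and type definitions, macro calls, `global`).

```julia
# Hypothetical helper (not Pluto's implementation): decide whether a cell's
# expression looks safe to wrap in a function, by searching for expression
# heads that only work, or behave differently, at top level.
const UNWRAPPABLE_HEADS = (:import, :using, :function, :struct, :abstract,
                           :primitive, :macro, :macrocall, :module, :global)

# Short-form definitions like `f(x) = x + 1` parse as `=` with a call on the left.
is_short_function_def(ex::Expr) =
    ex.head == :(=) && ex.args[1] isa Expr && ex.args[1].head in (:call, :where)

# Symbols, literals and LineNumberNodes are always fine.
can_wrap(ex) = true

# Expressions are fine if neither they nor any sub-expression is unwrappable.
function can_wrap(ex::Expr)
    ex.head in UNWRAPPABLE_HEADS && return false
    is_short_function_def(ex) && return false
    all(can_wrap, ex.args)
end
```

For example, `can_wrap(:(s1 = s2 = [sqrt(i) for i in x]))` is `true`, while `can_wrap(:(@show x))` and `can_wrap(:(f(x) = x + 1))` are `false`, so those cells would fall back to the old evaluation path.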


And the best part is: all of this is completely invisible! You get superfast code without having to rewrite anything ❤


fonsp commented Nov 26, 2020

Right now I did the rewrite but no cell is being function wrapped yet


fonsp commented Nov 26, 2020

Getting there!

image


fonsp commented Nov 26, 2020

Done! 🌈🍡💜💚🧡

TODO:

  • Remove the tip to wrap your code inside a function in the Interactivity notebook 😊
  • Measure performance boost 👍👍👍👍👍👍👍👍
  • Measure increase in RAM usage -- decrease 👍
  • Fix stack traces


fonsp commented Nov 26, 2020

Somewhat extreme example to make it more visible on video:

Before:

(200 ms)

After:

(5 μs)


fonsp commented Nov 26, 2020

@NHDaly this is what we talked about a long long time ago 😊


@gensym result
@gensym elapsed_ns
# we don't use `quote ... end` here to avoid the LineNumberNodes that it adds (these would taint the stack trace).
Contributor review comment:

You could use MacroTools.striplines(ex) to remove LineNumberNodes from expressions generated by quote (if this would be beneficial for you).
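
A small sketch of the difference being discussed. To keep it dependency-free it uses `Base.remove_linenums!` (an unexported Base helper) as a stand-in for `MacroTools.striplines`; `count_linenums` is a hypothetical helper written only for this example.

```julia
# Count LineNumberNodes anywhere inside an expression.
count_linenums(ex) = ex isa LineNumberNode ? 1 : 0
count_linenums(ex::Expr) = mapreduce(count_linenums, +, ex.args; init = 0)

# `quote ... end` records where each statement came from.
quoted = quote
    result = 1 + 1
end

# Building the block by hand adds no line information.
manual = Expr(:block, :(result = 1 + 1))

# Stripping the nodes afterwards gives the same expression as the manual one.
stripped = Base.remove_linenums!(deepcopy(quoted))

println(count_linenums(quoted), " ", count_linenums(manual), " ", count_linenums(stripped))
```

Either route produces an expression without line nodes; which one reads better is a matter of taste.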

lungben (Contributor) commented Nov 26, 2020

looks cool, I'll try it out tomorrow!

@fonsp fonsp marked this pull request as ready for review November 26, 2020 22:44

fonsp commented Nov 26, 2020

@lungben Let me know if you find anything, but I'm pretty sure that it's done! Yayyy


lungben commented Nov 27, 2020

looks good so far!

a)

function do_stuff(x)
	y = 0.0
	for i = 1:100000000
		y += x/i
	end
	y
end

do_stuff(x)

b)

begin
	y = 0.0
	for i = 1:100000000
		global y += x/i # global keyword may be omitted in this PR because of implicit function wrapping
	end
	y
end

Timings:
without this PR:
a) ~112 ms
b) 10s (global y is required)

with this PR:
a) ~112 ms
b) 10s (with global y) / 112 ms (without global y)

Observations:

  • I tried a few of my own notebooks and nothing crashed :-)
  • if just a function is executed in a cell, the timings are the same (in particular, no performance loss)
  • the scoping rules change for loops inside a cell because they are now implicitly inside a function - this may be unexpected, but is not bad imho; it is essentially the same as the new Julia 1.5 REPL scoping (which is currently not used in Pluto).
  • with this PR, removing the global keyword in the loop gives the same performance as manually wrapping the code in a function (roughly a factor of 90 better: 10 s vs. 112 ms!).
  • in general I think this PR gives (potential) benefits when there is more than one function call in a single cell -> then the compiler can do more optimizations.

Suggestions:

  • It would be good to have a way to define "local" variables, which are not returned to global scope. This could be helpful when dealing with large amounts of data, to allow the GC to clean up intermediate results. We could use the Python convention of variables starting with an underscore (e.g. _x) for private variables which are not exported. This would complement let blocks, where only the return values are exported and everything else is local.
  • A blog post / Discourse thread, etc. on this change would be helpful - understanding this change allows users to write even more efficient notebooks in some cases (e.g. by using function scoping instead of globals).


fonsp commented Nov 27, 2020

About local variables: this is why Julia has local: #379 (comment) and the let block. Or did you mean something else? And yes, we should write a sample notebook about all scoping rules and blocks!

About the new scoping rules: oops! Thank you for finding this, but like you mentioned, it's an accidental feature. 👍👍

About a blog post: maybe... I think that in the case of Pluto, documentation is a sign that I am doing something wrong. 🙃 For example, to me, the best of this PR is that I can remove this one line: https://github.com/fonsp/Pluto.jl/pull/720/files#diff-68739a2259e7362fd1fe549bef016303e9d476e26d9536797fce6cec7623484dL168-L172


Note that global is one of the blacklisted keywords that disable function wrapping: https://github.com/fonsp/Pluto.jl/pull/720/files#diff-04c7252262a0e6de2db2cded7aba9b1130c1e5ba6adff25bf8ec3a34e36a4788R928-R946 I had a reason why, I forgot it, but I'm sure that there was something


lungben commented Nov 27, 2020

Thanks!
Using local for y the code runs in 8s without this PR and 112 ms with this PR 😄
Furthermore, the local keyword prevents the variable from being exported to global scope, so my suggestion above is already possible without me realizing it.

@fonsp fonsp merged commit efdabfe into master Nov 27, 2020
@fonsp fonsp deleted the wrap-cell-in-function branch November 27, 2020 10:32

fonsp commented Nov 27, 2020

That's right, 40x faster than the REPL!

image

FYI here is Python 3.7:

>>> import time
>>> def hello():
...     x = 0.0
...     s = time.time()
...     for i in range(1,10000000):
...             x += y / i
...     return time.time() - s
...
>>> hello()
0.816871166229248

80x faster!


NHDaly commented Dec 15, 2020

🎉!! @fonsp this is awesome, I'm super glad to see this improvement! 🎉 Thanks for pinging me! :)

🤩 Amazing results!! 🤩


NHDaly commented Dec 15, 2020

Also, wow, it's so cool how much faster this got.

Hehe, technically this isn't including the compilation time in the timing result, but that's standard for what Julia's @time does. To time the whole thing including compilation, you can time @eval, which can be important if your code takes potentially several minutes to compile and then @time reports that it ran almost instantly:

julia> @time 2+2
  0.000000 seconds
4

julia> @time @eval 2+2
  0.000970 seconds (53 allocations: 3.516 KiB)
4
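
Another way to see the compilation cost, sketched here with a made-up workload (`harmonic_sum` is hypothetical, not from the PR), is to time the first and second call of a freshly defined function:

```julia
# Hypothetical workload, chosen only for illustration.
function harmonic_sum(x, n)
    y = 0.0
    for i in 1:n
        y += x / i
    end
    y
end

first_call  = @elapsed harmonic_sum(1.0, 1_000)   # includes JIT compilation
second_call = @elapsed harmonic_sum(1.0, 1_000)   # reuses the compiled method

println("first call: ", first_call, " s, second call: ", second_call, " s")
```

On a typical machine the first call is noticeably slower, but the exact numbers depend on the Julia version and hardware.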

But still, after this change, even if you include compilation time, things get much faster because you've changed a non-const global into a function argument. For the old case before this PR, julia had to check every iteration of the loop that y's value hadn't changed (because it's just a global, and it could have been affected by e.g. another thread somewhere or something), and in the new case, it can rely on the value being a constant, which matches the reactive notebook paradigm much better! :)
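
A minimal sketch of why the function argument helps, assuming a current Julia version: it asks the compiler what it can infer about two otherwise identical loops. The names `y_global`, `sum_with_global` and `sum_with_argument` are invented for this example, and `Base.return_types` is an internal reflection helper, used here only for demonstration.

```julia
y_global = 2.0   # non-const global: its value and type may change at any time

# Reads the global directly, like a cell evaluated the old way.
function sum_with_global(n)
    acc = 0.0
    for i in 1:n
        acc += y_global / i
    end
    acc
end

# Takes the value as an argument, like a function-wrapped cell.
function sum_with_argument(y, n)
    acc = 0.0
    for i in 1:n
        acc += y / i
    end
    acc
end

# Ask inference for the return type of each version.
global_types   = Base.return_types(sum_with_global, (Int,))
argument_types = Base.return_types(sum_with_argument, (Float64, Int))

println("global:   ", global_types)
println("argument: ", argument_types)
```

With the argument version the compiler knows `y` is a `Float64` for the whole call, so it can infer a concrete return type; with the untyped global it cannot, and has to fall back to dynamic dispatch inside the loop.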

So even if you include the compilation time, the new system is still faster than eval-ing the block the old way! 🎉
Screen Shot 2020-12-15 at 11 52 49 AM
Screen Shot 2020-12-15 at 11 53 49 AM


I guess Pluto could consider reporting the compilation time separately as well? Julia 1.6 has started doing that:
JuliaLang/julia#37678
(Although they are missing some "top-level" compilation time, as reported here: JuliaLang/julia#37938. Pluto is better positioned to get this report correct, if that's desirable, but I'm not sure whether it is or isn't. 🙂)

@fonsp fonsp added backend Concerning the julia server and runtime expression explorer Figuring out assignments and references in a cell labels Mar 12, 2021