add July knowledge engineering posts

input-output-hk · Aug 16, 2022 · b661319
1 parent b725e3b
commit b661319

Showing 2 changed files with 136 additions and 0 deletions.
diff --git a/blog/2022-07-18-ghcjs-threads.md b/blog/2022-07-18-ghcjs-threads.md
@@ -0,0 +1,49 @@
+---
+slug: 2022-07-18-lightweight-threads-on-JavaScript
+title: Lightweight Haskell Threads on JavaScript
+date: July 18, 2022
+authors: [ luite ]
+tags: [ ghc, javascript, concurrency, ffi ]
+---
+
+## Introduction
+
+I recently gave a short presentation on the topic of threads in GHCJS to the GHC team at IOG. This blog post is a summary of the content.
+
+## JavaScript and Threads
+
+JavaScript is fundamentally single-threaded. There are ways to share specific data between tasks, but it's not possible to run multiple threads that have access to a shared memory space of JavaScript data.
+
+The single JavaScript thread is often responsible for multiple tasks. For example, a node.js server handles multiple simultaneous connections, and a web application may be dealing with user input while downloading new data in the background.
+
+This means that any single task should take care never to block execution of the other tasks. JavaScript's canonical answer is to use asynchronous programming. A function reading a file returns immediately without waiting for the file data to be loaded in memory. When the data is ready, a user-supplied callback is called to continue processing the data.
+
+## Haskell Threads
+
+Concurrent Haskell supports lightweight threads through `forkIO`. These threads are scheduled on top of one or more operating system threads. A blocking foreign call blocks an OS thread, but other lightweight threads can still run on other OS threads if available.
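+
+As a minimal sketch of the `forkIO` model (plain GHC, using only `base`; the helper name `squaresConcurrently` is made up for this illustration), each piece of work runs in its own lightweight thread and hands its result back through an `MVar`:
+
+```haskell
+import Control.Concurrent (forkIO)
+import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
+import Control.Monad (forM)
+
+-- Hypothetical helper: square each number in its own lightweight thread.
+squaresConcurrently :: [Int] -> IO [Int]
+squaresConcurrently ns = do
+  -- One MVar per worker; each worker fills its MVar when it is done.
+  slots <- forM ns $ \n -> do
+    slot <- newEmptyMVar
+    _ <- forkIO (putMVar slot (n * n))
+    return slot
+  -- takeMVar blocks only the calling lightweight thread.
+  mapM takeMVar slots
+
+main :: IO ()
+main = squaresConcurrently [1, 2, 3] >>= print -- prints [1,4,9]
+```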
+
+There is no built-in support for foreign calls with a callback in the style of JavaScript. Functions imported with `foreign import ccall interruptible` can be interrupted by sending an asynchronous exception to the corresponding lightweight thread.
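+
+Interruption by asynchronous exception can be illustrated without any foreign call. The sketch below (plain GHC, `base` only; `interruptSleeper` is a name invented for this example) blocks a thread in `threadDelay`, which stands in for an interruptible foreign call, and then interrupts it with `killThread`:
+
+```haskell
+import Control.Concurrent (forkIO, killThread, threadDelay)
+import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
+import Control.Exception (AsyncException, catch)
+
+-- Hypothetical helper: start a sleeping thread, interrupt it, report what happened.
+interruptSleeper :: IO String
+interruptSleeper = do
+  started <- newEmptyMVar
+  outcome <- newEmptyMVar
+  tid <- forkIO $
+    -- The long threadDelay stands in for a blocking, interruptible call.
+    (putMVar started () >> threadDelay 10000000 >> putMVar outcome "finished")
+      `catch` \e -> putMVar outcome ("interrupted: " ++ show (e :: AsyncException))
+  -- Wait until the handler is installed, then deliver the asynchronous exception.
+  takeMVar started
+  killThread tid
+  takeMVar outcome
+
+main :: IO ()
+main = interruptSleeper >>= putStrLn
+```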
+
+## Lightweight Threads in JavaScript
+
+GHCJS implements lightweight threads on top of the single JavaScript thread. The scheduler switches between threads and handles synchronization through `MVar` and `STM` as expected from other Haskell platforms.
+
+Foreign calls that don't block can be handled in the usual way. We extend the foreign function interface with a new type `foreign import javascript interruptible` that conveniently supports the callback mechanism used by JavaScript frameworks. The foreign call is supplied with an additional argument `$c` representing a callback to be called with the result when ready. From the Haskell side the corresponding lightweight thread is blocked until `$c` is called. This type of foreign call can be interrupted with an asynchronous exception to the lightweight Haskell thread.
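+
+The blocking behaviour seen from the Haskell side can be modelled in plain GHC. This is only a sketch of the idea, not the GHCJS implementation: `readAsync` is a hypothetical callback-style function standing in for a JavaScript API, and the `putMVar result` callback plays roughly the role that `$c` plays in an interruptible import.
+
+```haskell
+import Control.Concurrent (forkIO, threadDelay)
+import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
+
+-- Hypothetical callback-style API: returns immediately, calls back later.
+readAsync :: String -> (String -> IO ()) -> IO ()
+readAsync name callback = do
+  _ <- forkIO (threadDelay 1000 >> callback ("contents of " ++ name))
+  return ()
+
+-- Wrap the callback API so that only the calling lightweight thread blocks.
+readBlocking :: String -> IO String
+readBlocking name = do
+  result <- newEmptyMVar
+  readAsync name (putMVar result) -- the callback fills the MVar
+  takeMVar result                 -- block this thread until it does
+
+main :: IO ()
+main = readBlocking "data.txt" >>= putStrLn -- prints "contents of data.txt"
+```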
+
+By default, Haskell threads in the JS environment run asynchronously. A call to `h$run` returns immediately and starts the thread in the background. This works for tasks that do not require immediate action. For situations that require more immediate action, such as dealing with event handler propagation, there is `h$runSync`. This starts a synchronous thread that is not interleaved with other tasks. If possible, the thread runs to completion before the call to `h$runSync` returns. If the thread blocks for any reason, such as waiting for an `MVar` or a `foreign import javascript interruptible` call, synchronous execution cannot complete. The blocking task is then either interrupted with an exception or the thread is "demoted" to a regular asynchronous thread.
+
+## Black Holes
+
+When a Haskell value is evaluated, its heap object is overwritten by a black hole. This black hole marks the value as being evaluated and prevents other threads from doing the same. Blackholing can be done either immediately or "lazily", when the garbage collector is run. GHCJS implements immediate blackholing.
+
+Black holes give rise to an interesting problem in the presence of synchronous and asynchronous threads. Typically, if we use `h$runSync`, we want to have some guarantee that at least part of the task will run successfully without blocking. For the most part, it's fairly clear which parts of our task depend on potentially blocking IO or thread synchronization. But black holes throw a spanner in the works: suddenly any "pure" data structure can be a source of blocking if it is under evaluation by another thread.
+
+To regain some predictability and usability of synchronous threads, the `h$runSync` scheduler can run other Haskell threads in order to "clear" a black hole. The process ends when all black holes have been cleared or when any of the black holes is impossible to clear because of a blocking situation.
+
+This all happens transparently to the caller of `h$runSync`: if the black holes could be cleared, it appears as if they were never there.
+
+## Conclusion
+
+We have lightweight Haskell threads in the single-threaded JavaScript environment and extend the foreign function interface to easily support foreign calls that depend on an asynchronous callback. This way, only the Haskell lightweight thread blocks.
+
+By default, Haskell threads are asynchronous and run in the background: the scheduler interleaves the tasks and handles synchronization between threads. For situations that require immediate results or actions there are synchronous threads. Synchronous threads cannot block and are not interleaved with other tasks except when a black hole is encountered.
diff --git a/blog/2022-07-26-ghcjs-linker.md b/blog/2022-07-26-ghcjs-linker.md
@@ -0,0 +1,87 @@
+---
+slug: 2022-07-26-the-ghcjs-linker
+title: The GHCJS Linker
+date: July 26, 2022
+authors: [ luite ]
+tags: [ ghc, javascript, linking ]
+---
+
+## Introduction
+
+I recently gave a short presentation on the workings of the GHCJS linker. This post is a summary of the content.
+
+## JavaScript "executables"
+
+The task of a linker is collecting and organizing object files and resources into a loadable library or executable program. JavaScript can be run in various environments, for example the browser or node.js, and the concept of an executable does not make sense in all of them.
+
+Therefore, when we link a Haskell program, we generate a `jsexe` directory filled with various files that allow us to run the JavaScript result:
+
+| File        | Description          |
+| :----:        | :---:             |
+| `out.js`      | compiled/linked Haskell code          |
+| `out.frefs.*` | list of foreign calls from `out.js` |
+| `out.stats`   | source code size origin statistics for `out.js` |
+| `lib.js`      | non-Haskell code, from `js-sources` in packages and the RTS, possibly preprocessed |
+| `rts.js`      | generated part of RTS (apply functions and similarly repetitive things) |
+| `runmain.js`  | single line that just starts `main` |
+| `all.js`      | complete runnable program, created by combining `out.js`, `lib.js`, `rts.js` and `runmain.js` |
+
+Most of the work done by the linker is producing `out.js`, and that's what we'll be focusing on in the next sections.
+
+## Building `out.js`
+
+The linker builds `out.js` by collecting all code reachable from `main` (and a few other symbols required by the RTS) and generating the required initialization code for all top-level data. The code is found in object files. These object files have the following structure:
+
+| Section        | Description          |
+| :----:        | :---:             |
+| Header       | version number and offsets of other sections       |
+| String table | shared string table, referred to by `Dependencies` and `Code`, to avoid duplication in file and memory |
+| Dependencies | Dependency data, internally between binding groups and externally to symbols in other object files |
+| Code         | Compiled Haskell code stored as serialized JavaScript AST and metadata. Code is organized in binding groups |
+
+The object files contain binding groups of mutually dependent bindings. These are the smallest units of code that can be linked. Each binding group has some associated metadata required for initialization of the heap objects in the group. The metadata contains, for example, constructor tags (e.g. 1 for `Nothing`, 2 for `Just`), the arity of functions and static reference tables.
+
+From a high level, the procedure that the linker follows is this:
+
+| Step |
+| :---: |
+| Read object files from dependencies into memory |
+| Decode dependency part of all object files in dependencies (includes reading the string tables) |
+| Using dependency data, find all code reachable from `main` |
+| Decode reachable binding groups |
+| Render AST to JavaScript |
+| Construct initializers from metadata | 
+
+We avoid decoding (deserializing) the binding groups that do not end up in the linked result to keep memory consumption lower. Still, the linker requires a lot of memory for larger programs, so we may need to make more improvements in the future.
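+
+The reachability step in the procedure above can be sketched as a simple graph traversal. This is an illustration only, not the linker's actual code: the `Deps` table, the symbol names and the `reachable` function are assumptions made for the example, and it relies on the `containers` package that ships with GHC.
+
+```haskell
+import qualified Data.Map.Strict as Map
+import qualified Data.Set as Set
+
+type Symbol = String
+
+-- Hypothetical dependency table: each symbol lists the symbols it refers to.
+type Deps = Map.Map Symbol [Symbol]
+
+-- Collect every symbol reachable from the given roots.
+reachable :: Deps -> [Symbol] -> Set.Set Symbol
+reachable deps = go Set.empty
+  where
+    go seen [] = seen
+    go seen (s : rest)
+      | s `Set.member` seen = go seen rest
+      | otherwise = go (Set.insert s seen) (Map.findWithDefault [] s deps ++ rest)
+
+main :: IO ()
+main = do
+  let deps = Map.fromList
+        [ ("main", ["putStrLn", "helper"])
+        , ("helper", ["putStrLn"])
+        , ("unused", ["helper"])
+        ]
+  -- "unused" is never visited, so its code would not need to be decoded.
+  print (Set.toList (reachable deps ["main"])) -- prints ["helper","main","putStrLn"]
+```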
+
+## The Compactor
+
+The compactor is an optional link-time transformation step that reduces code size. It consists of a lightweight (i.e. no expensive operations like dataflow analysis) rewrite of the code contained in the object files. The compactor is disabled when linking with the `-debug` flag. There are a few steps involved.
+
+### Renaming private symbols
+
+Haskell names are quite long by default: they need to be globally unique, hence they contain their defining unit-id and module name. For example: `mtl-2.2.2-somehash-Control.Monad.State.Lazy.execState_go1` (special characters would be z-encoded but it isn't shown here).
+
+Private symbols are only referred to from within the same module. It doesn't matter which JavaScript name we pick for them, as long as there is no overlap between the names from different modules. The compactor renames all the private symbols using a global sequence to ensure short names that do not overlap.
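+
+A global sequence of short names could look like the following sketch. The naming scheme (`a`, `b`, ..., `z`, `ba`, `bb`, ...) and the functions `shortName` and `renamePrivate` are assumptions for illustration and are not taken from the compactor itself; the second symbol name is invented.
+
+```haskell
+-- Short name for the n-th private symbol (n >= 0): a, b, ..., z, ba, bb, ...
+shortName :: Int -> String
+shortName n
+  | n < 26 = [letter n]
+  | otherwise = shortName (n `div` 26) ++ [letter (n `mod` 26)]
+  where
+    letter i = toEnum (fromEnum 'a' + i)
+
+-- Pair every private symbol with the next name from one global sequence,
+-- so names from different modules can never overlap.
+renamePrivate :: [String] -> [(String, String)]
+renamePrivate names = zip names (map shortName [0 ..])
+
+main :: IO ()
+main = mapM_ print (renamePrivate privateSymbols)
+  where
+    privateSymbols =
+      [ "mtl-2.2.2-somehash-Control.Monad.State.Lazy.execState_go1"
+      , "mtl-2.2.2-somehash-Control.Monad.State.Lazy.execState_go2"
+      ]
+```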
+
+### Block Initializer
+
+Without the compactor, the linker generates an `h$initObj` (or `h$o`) initialization call for each global Haskell heap value. The code for this can get quite big. The compactor collects all heap objects to be initialized in a single large array and encodes the metadata in a string. This makes the initialization code much more compact.
+
+### Deduplication
+
+An optional step in the compactor is deduplication of code. When deduplication is enabled with the `-dedupe` flag, the compactor looks for functionally equivalent pieces of JavaScript in the output and merges them. This can result in a significant reduction of code size.
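+
+The idea can be sketched as follows, under a strong simplification: two definitions are treated as equivalent only when their rendered bodies are textually identical, which is not necessarily how the compactor decides equivalence. The `dedupe` function and the sample definitions are hypothetical, and the sketch uses the `containers` package.
+
+```haskell
+import qualified Data.Map.Strict as Map
+
+-- Map every symbol to the canonical symbol that shares its body.
+-- Input: (symbol, rendered JavaScript body) pairs.
+dedupe :: [(String, String)] -> Map.Map String String
+dedupe defs = Map.fromList [ (name, canonical Map.! body) | (name, body) <- defs ]
+  where
+    -- The first symbol seen for each distinct body becomes the canonical one.
+    canonical = Map.fromListWith (\_new old -> old) [ (body, name) | (name, body) <- defs ]
+
+main :: IO ()
+main = print (Map.toList (dedupe defs))
+  where
+    defs =
+      [ ("f", "function(x){return x+1;}")
+      , ("g", "function(x){return x+1;}")
+      , ("h", "function(x){return x*2;}")
+      ]
+-- prints [("f","f"),("g","f"),("h","h")]
+```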
+
+## Incremental Linking
+
+The linker supports building programs that are loaded incrementally. This is used for example for Template Haskell. The process that runs the Template Haskell stays alive during compilation of a whole module. When the first Template Haskell expression is compiled, it is linked against all its dependencies (including the RTS) and the resulting JavaScript code is sent over to be run in the evaluator process.
+
+As subsequent Template Haskell expressions are evaluated in the same process, there is no need to load already loaded dependencies (including the RTS) again and it is much more efficient to avoid doing so. Therefore the linker keeps track of which dependencies have already been linked and each subsequent TH expression is only linked against dependencies that are not already loaded in the evaluator process.
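+
+The bookkeeping amounts to a set difference against the dependencies already loaded. The sketch below is a model of that idea, not the linker's real state representation; `linkIncremental` and the unit names are made up for the example, and it uses the `containers` package.
+
+```haskell
+import qualified Data.Set as Set
+
+type Unit = String
+
+-- Given the units already loaded and the units an expression needs,
+-- return the units that still have to be linked and the updated state.
+linkIncremental :: Set.Set Unit -> [Unit] -> ([Unit], Set.Set Unit)
+linkIncremental loaded needed = (new, loaded `Set.union` Set.fromList new)
+  where
+    new = filter (`Set.notMember` loaded) needed
+
+main :: IO ()
+main = do
+  let (firstLink, state1) = linkIncremental Set.empty ["rts", "base", "text"]
+      (secondLink, _) = linkIncremental state1 ["rts", "base", "containers"]
+  print firstLink  -- prints ["rts","base","text"]
+  print secondLink -- prints ["containers"]
+```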
+
+It's also possible for users to use this functionality directly, with the `-generate-base` flag to create a "linker state" file along with the regular `jsexe` files. Another program can then be linked with `-use-base=state_file`, resulting in a program which leaves out everything already present in the first program.
+
+## Future Improvements
+
+Memory consumption is the biggest problem in the linker at the moment. Possible ways to reduce it are compression, more efficient representation of the data structures or more incremental loading of the parts from the object files that we need.
+
+In terms of functionality, we don't take advantage of JavaScript modules yet. It would be good if we could improve the linker to support linking a library as a JavaScript module. We should also consider making use of `foreign export javascript` for this purpose.