Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
251 lines (161 sloc) 30.7 KB

Module Thinking

As discussed in the preface, complexity seems to be all around us while we’re working on software projects. So are abstractions, that keep complexity hidden away from us under rocks we don’t dare touching. These rocks are our interfaces to the rest of the world, so that we can get away with hardly thinking about it. JavaScript is no exception here — on the contrary: as powerful as dynamic languages are, it is also that much easier, and even tempting, to write complex programs when we’re using them.

To get started, let’s discuss how we can better apply abstractions, interfaces, and their underlying concepts to the work we do, so that we can minimize the amount of complexity we need to stare at when working on a project, a feature, a piece of functionality, down to the branches of a single function.

1.1 Introduction to Module Thinking

Embracing module thinking is understanding that complexity is, ultimately, inescapable. At the same time, that complexity can be swept under an interface, hardly to ever be seen or thought of again. But, and here’s one of the tricky parts, the interface needs to be well-designed so that its users don’t become frustrated. That frustration could even lead us to peering under the hood and finding that the implementation is vastly more complex than the poor interface we’re frustrated by, and maybe if the interface didn’t exist in the first place, we’d be better off in terms of maintainability and readability.

Systems can be organized granularly: we can split them into projects, made of multiple applications, containing several application-level layers, where we can have hundreds of modules, made up of thousands of functions, and so on. A granular approach helps us write code that’s easy to understand and maintain, by attaining a reasonable degree of modularity, while preserving our sanity. In section 1.4, we’ll discuss how we can best leverage this granularity to create modular applications.

Whenever we delineate a component, there’s going to be a public interface which other parts of the system can leverage to access our component. The interface — or API — is comprised of the set of methods or attributes that our component exposes. These methods or attributes, can also be referred to as touchpoints, that is, the aspects of the interface that can be publicly interacted with. The fewer touchpoints an interface has, the smaller its surface area is, and the simpler the interface becomes. An interface with a large surface area is highly flexible, but it might also be a lot harder to understand and use, given the high amount of functionality exposed by the interface.

This interface serves a dual purpose. It allows us to develop new bits and pieces of the component, only exposing functionality that’s ready for public consumption while keeping private everything that’s not meant to be shared with other components. At the same time, it allows consumers — that is, components or systems that are leveraging our interface — to reap the benefits of the functionality we expose, without concerning themselves with the details of how we implemented that functionality.

Robust, documented interfaces are one of the best ways of isolating a complicated piece of code so that others can consume its functionality without knowing any implementation details. A systematic arrangement of robust interfaces can be accrued to form a layer, — such as service or data layers in enterprise applications — and in doing so we might be able to largely isolate and circumscribe logic to one of those layers, while keeping presentational concerns separate from strictly business or persistence related concerns. Such a forceful separation is effective because it keeps individual components tidy and layers uniform. Uniform layers, comprised of components similar in pattern or shape, offer a sense of familiarity that makes them more straightforward to consume on a sustained basis from the point of view of a single developer, who over time becomes ever more used to familiar API shapes.

Relying on consistent API shapes is a great way of increasing productivity, given the difficulty of coming up with adequate interface designs. When we consistently leverage similar API shapes, we don’t have to come up with new designs every time, and consumers can rest assured that you haven’t reinvented the wheel every time. We’ll discuss API design at length over the coming chapters.

1.2 A Brief History of Modularity

When it comes to JavaScript, modularity is a modern concept. In this section we’ll quickly revisit and summarize the milestones in how modularity evolved in the world of JavaScript. This section isn’t meant to be a comprehensive list, by any means, but instead it’s meant to illustrate the major paradigm changes along the history of JavaScript.

1.2.1 Script Tags and Closures

In the early days, JavaScript was inlined in HTML <script> tags. At best, it was offloaded to dedicated script files, all of which shared a global scope.

Any variables or bindings declared in one of these files or inline scripts would be imprinted on the global window object, creating leaks across entirely unrelated scripts that might’ve lead to conflicts or even broken experiences, where a variable in one script might inadvertently replace a global that another script was relying on.

<script>
  var initialized = false

  if (!initialized) {
    init()
  }

  function init() {
    initialized = true
    console.log('init')
  }
</script>

<script>
  if (initialized) {
    console.log('was initialized!')
  }

  // even `init` has been implicitly made a global variable
  console.log('init' in window)
</script>

Eventually, as web applications started growing in size and complexity, the concept of scoping and the dangers of a global scope became evident and more well-known. Immediately-invoking function expressions (IIFE) were invented and became an instant mainstay. An IIFE worked by wrapping an entire file or portions of a file in a function that executed immediately after evaluation. Each function in JavaScript creates a new level of scoping, meaning var variable bindings would be contained by the IIFE. Even though variable declarations are hoisted to the top of their containing scope, they’d never become implicit globals, thanks to the IIFE wrapper, thus suppressing the brittleness of implicit JavaScript globals.

Several flavors of IIFE can be found in the next example snippet. The code in each IIFE is isolated and can only escape onto the global context via explicit statements such as window.fromIIFE = true.

(function() {
  console.log('IIFE using parenthesis')
})()

~function() {
  console.log('IIFE using a bitwise operator')
}()

void function() {
  console.log('IIFE using the void operator')
}()

Using the IIFE pattern, libraries would typically create modules by exposing and then reusing a single binding on the window object, thus minimizing global namespace pollution. The next snippet shows how we might create a mathlib component with a sum method in one of these IIFE-based libraries. If we wanted to add more modules to mathlib, we could place each of them in a separate IIFE which adds its own methods to the mathlib public interface, while anything else could stay private to the component that defined the new portion of functionality.

void function() {
  window.mathlib = window.mathlib || {}
  window.mathlib.sum = sum

  function sum(...values) {
    return values.reduce((a, b) => a + b, 0)
  }
}()

mathlib.sum(1, 2, 3)
// <- 6

This pattern was, coincidentally, an open invitation for JavaScript tooling to burgeon, allowing developers to — for the first time — concatenate every IIFE module into a single file, reducing the strain on the network. Provided the primitive bundling solutions that existed at the time were able to figure out their way around automatic semicolon insertion and minified content without breaking your application logic.

The problem in the IIFE approach was that there wasn’t an explicit dependency tree. This means developers had to manufacture component file lists in a precise order, so that dependencies would load before any modules that depended on them did — recursively.

1.2.2 RequireJS, AngularJS, and Dependency Injection

This is a problem we’ve hardly had to think about ever since the advent of module systems like RequireJS or the dependency injection mechanism in AngularJS, both of which allowed us to explicitly name the dependencies of each module.

The following example shows we might define the mathlib/sum.js library using RequireJS’s define function, which was added to the global scope. The returned value from the define callback is then used as the public interface for our module.

define(function() {
  return sum

  function sum(...values) {
    return values.reduce((a, b) => a + b, 0)
  }
})

We could then have a mathlib.js module which aggregates all functionality we wanted to include in our library. In our case, it’s just mathlib/sum, but we could list as many dependencies as we wanted in the same way. We’d list each dependency using their paths in an array, and we’d get their public interfaces as parameters passed into our callback, in the same order.

define(['mathlib/sum'], function(sum) {
  return { sum }
})

Now that we’ve defined a library, we can consume it using require. Notice how the dependency chain is resolved for us in the snippet below.

require(['mathlib'], function(mathlib) {
  mathlib.sum(1, 2, 3)
  // <- 6
})

This is the upside in RequireJS and its inherent dependency tree. Regardless of whether our application contained a hundred or thousands of modules, RequireJS would resolve the dependency tree without the need for a carefully maintained list. Given we’ve listed dependencies exactly where they were needed, we’ve eliminated the necessity for a long list of every component and how they’re related to one another, as well as the error-prone process of maintaining such a list. Eliminating such a large source of complexity is merely a side-effect, but not the main benefit.

This explicitness in dependency declaration, at a module level, made it obvious how a component was related to other parts of the application. That explicitness in turn fostered a greater degree of modularity, something that was ineffective before because of how hard it was to follow dependency chains.

RequireJS wasn’t without problems. The entire pattern revolved around its ability to asynchronously load modules, which was ill-advised for production deployments due to how poorly it performed. Using the asynchronous loading mechanism, you issued hundreds of networks requests in a waterfall fashion before much of your code was executed. A different tool would have to be used to optimize builds for production. Then there was the verbosity factor, where you’d end up with long lists of dependencies, a RequireJS function call, and the callback for your module. On that note, there were quite a few different RequireJS functions and several ways of invoking those functions, complicating its use. The API wasn’t the most intuitive, because there were so many ways of doing the same thing: declaring a module with dependencies.

The dependency injection system in AngularJS suffered from many of the same problems. It was an elegant solution at the time, relying on clever string parsing to avoid the dependency array, using function parameter names to resolve dependencies instead. This mechanism was incompatible with minifiers, which would rename parameters to single characters and thus break the injector.

Later in the lifetime of AngularJS v1, a build task was introduced that would transform code like the following:

module.factory('calculator', function(mathlib) {
  //
})

Into the format in the following bit of code, which was minification-safe because it included the explicit dependency list.

module.factory('calculator', ['mathlib', function(mathlib) {
  //
}])

Needless to say, the delay in introducing this little-known build tool, combined with the over-engineered aspect of having an extra build step to un-break something that shouldn’t have been broken, discouraged the use of a pattern that carried such a negligible benefit anyway. Developers mostly chose to stick with the familiar RequireJS-like hard-coded dependency array format.

1.2.3 Node.js and the Advent of CommonJS

Among the many innovations hailed by Node.js, one was the CommonJS module system — or CJS for short. Taking advantage of the fact that Node.js programs had access to the file system, the CommonJS standard is more in line with traditional module loading mechanisms. In CommonJS, each file is a module with its own scope and context. Dependencies are loaded using a synchronous require function that can be dynamically invoked at any time in the lifecycle of a module, as illustrated in the next snippet.

const mathlib = require('./mathlib')

Much like RequireJS and AngularJS, CommonJS dependencies are also referred to by a pathname. The main difference is that the boilerplate function and dependency array are now both gone, and the interface from a module could be assigned to a variable binding, or used anywhere a JavaScript expression could be used.

Unlike RequireJS or AngularJS, CommonJS was rather strict. In RequireJS and AngularJS you could have many dynamically-defined modules per file, whereas CommonJS had a one-to-one mapping between files and modules. At the same time, RequireJS had several ways of declaring a module and AngularJS had several kinds of factories, services, providers and so on — besides the fact that its dependency injection mechanism was tightly coupled to the AngularJS framework itself. CommonJS, in contrast, had a single way of declaring modules. Any JavaScript file was a module, calling require would load dependencies, and anything assigned to module.exports was its interface. This enabled better tooling and code introspection — making it easier for tools to learn the hierarchy of a CommonJS component system.

Eventually, Browserify was invented as a way of bridging the gap between CommonJS modules for Node.js servers and the browser. Using the browserify command-line interface program and providing it with the path to an entry-point module, one could combine an unthinkable amount of modules into a single browser-ready bundle. The killer feature of CommonJS, the npm package registry, was decisive in aiding its takeover of the module loading ecosystem.

Granted, npm wasn’t limited to CommonJS modules or even JavaScript packages, but that was and still is by and large its primary use case. The prospect of having thousands of packages (now over half a million and steadily growing) available in your web application at the press of a few fingertips, combined with the ability to reuse large portions of a system on both the Node.js web server and each client’s web browser, was too much of a competitive advantage for the other systems to keep up.

1.2.4 ES6, import, Babel, and Webpack

As ES6 became standardized in June of 2015, and with Babel transpiling ES6 into ES5 long before then, a new revolution was quickly approaching. The ES6 specification included a module syntax native to JavaScript, often referred to as ECMAScript Modules (ESM).

ESM is largely influenced by CJS and its predecessors, offering a static declarative API as well as a promise-based dynamic programmable API, as illustrated next.

import mathlib from './mathlib'
import('./mathlib').then(mathlib => {
  //
})

In ESM, too, every file is a module with its own scope and context. One major advantage in ESM over CJS is how ESM has — and encourages — a way of statically importing dependencies. Static imports vastly improve the introspection capabilities of module systems, given they can be analyzed statically and lexically extracted from the abstract syntax tree (AST) of each module in the system. Static imports in ESM are constrained to the topmost level of a module, further simplifying parsing and introspection. Another advantage of ESM over CommonJS require() is that ESM specifies a way of doing asynchronous module loading, which implies that parts of an application’s dependency graph could be loaded in response to specific events, concurrently, or lazily as needed. Although this feature is not yet implemented in most environments at the time of this writing, there is strong indication[1] that Node.js would incorporate it in the future.

In Node.js v8.5.0, ESM support was introduced behind an --experimental-modules flag — provided that we use the .mjs file extension for our modules. Most evergreen browsers already support ESM without flags.

Webpack is a successor to Browserify that largely took over in the role of universal module bundler thanks to a broader set of features. Just like in the case of Babel and ES6, Webpack has long supported ESM with both its static import and export statements as well as the dynamic import() function-like expression. It has made a particularly fruitful adoption of ESM, in no little part thanks to the introduction of a "code-splitting" mechanism[2] whereby it’s able to partition an application into different bundles to improve performance on first load experiences.

Given how ESM is native to the language, — as opposed to CJS — it can be expected to completely overtake the module ecosystem in a few years time.

1.3 The Perks of Modular Design

We’ve already addressed the fact that modularity, as opposed to a single shared global scope, helps avoid unexpected clashes in variable names thanks to the diversification of scoping across modules. Beyond a fix for clashes, modularity spread across files limits the amount of complexity we have to pay attention to when working on any one particular feature. In doing so, our team is able to focus on the task at hand and be more productive as a result.

Maintainability, or the ability to effect change in the codebase, also improves significantly because of this. When code is simple and modular, it’s easier to build upon and extend. Maintainability is valuable regardless of team size: even in a team of one, if we leave a piece of code untouched for a few months and then come back to it, it might be hard to improve upon or even understand if we didn’t consider writing maintainable code the first time around.

Modular code is meant to be highly maintainable by default. By keeping pieces of code simple and following the Single Responsibility Principle (SRP), whereby it only aims to fulfill one goal, and combining these simple pieces of code into more sophisticated components, we’re able to compose our way to larger components, and eventually an entire application. When each piece of code in a program is modular, the codebase appears to be simple when we’re looking at individual components, yet on the whole it is able to exhibit complex behaviors, just like the book publishing process we’ve discussed in the beginning of this chapter.

Components in modular applications are defined by their interfaces. The implementation of those components is not their essence, but their interfaces are. When interfaces are well-designed, they can be grown in non-breaking ways, augmenting the amount of use cases they can satisfy, without compromising existing usage. When we have a mindfully designed interface, the implementation behind that interface becomes easy to tweak or swap entirely. Strong interfaces are effective at hiding away weak implementations, that can be later refactored into more robust implementations provided the interface holds. Strong interfaces are also excellent for unit testing, because we won’t have to worry about the implementation and we can test the interface — the inputs and outputs of a component or function. If the interface is well-tested and robust, we can surely consider its implementation in a secondary plane.

Given those implementations are secondary to the foremost requirement of having intuitive interfaces, that aren’t coupled to their implementations, we can concern ourselves with the trade-off between flexibility and simplicity. Flexibility inevitably comes at the cost of added complexity, which is a good reason not to offer flexible interfaces. At the same time, flexibility is often a necessity, and thus we need to strike the right balance by deciding how much rigidity we can get away with in our interfaces. This balance would mean an interface appeases its consumers thanks to its ease of use, but that it also enables advanced or more uncommon use cases when needed, without too much of a detrimental effect on the ease of use or at the cost of greatly enhanced implementation complexity.

We’ll discuss the trade-offs between flexibility, simplicity, composability, and the right amount of future-proofing in the following couple of chapters.

1.4 Modular Granularity

We can apply modular design concepts on every level of a given system. If a project’s demands outgrow its initial scope, maybe we should consider splitting that project into several, smaller projects with smaller teams that are more manageable. The same can be said of applications: when they become large or complex enough, we might want to split them into differentiated products.

When we want to make an application more maintainable, we should consider creating explicitly defined layers of code, so that we can grow each layer horizontally while preventing the complexity of those additions from spreading to other, unrelated, layers. The same thought process can be applied to individual components, splitting them into two or more smaller components that are then tied together by yet another small component, which could act as a composition layer whose sole responsibility is knitting together several underlying components.

At the module level, we should strive to keep functions simple and expressive, with descriptive names and not too many responsibilities. Maybe we’ll have a function dedicated exclusively to pulling together a group of tasks under a particular asynchronous flow, while having other functions for each task that we need to perform within that control flow. The topmost flow controlling function could be exposed as a public interface method for our module, but the only part of it that should be treated as public interface are the parameters that we receive as inputs for that function and the output produced by that same topmost function. Everything else becomes an implementation detail and is, as such, to be considered swappable.

The internal functions of a module won’t have as rigid of an interface either: as long as the public interface holds, we can change the implementation — including the interfaces of functions that make up that implementation — however we want. This is not to say, however, that we should treat those interfaces any less deliberately. The key to proper modular design is in having an utmost respect for all interfaces, and that includes the interfaces exposed by internal functions.

Within functions, we’ll also note a need to componentize aspects of the implementation, giving those aspects a name in the way of function calls, deferring complexity that doesn’t need to be immediately dealt with in the main body of the function until later in the read-through of a given piece of code. We’re writing programs that are meant to be readable and writable for other humans and even ourselves in the future. Virtually everyone who has done any amount of programming has experienced a feeling of frustration when glancing at a piece of code they themselves wrote a few months prior, only to later realize that, with a fresh pair of eyes the design they had then come up with wasn’t as solid as they originally intended.

Remember, computer program development is largely a human and collaborative endeavor. We’re not optimizing for computers to run programs as fast as possible. If we were, we’d be writing binary or hard-coding logic into circuit boards. Instead, our focus is to empower an organization so that its developers can remain productive and able to quickly understand and even modify pieces of code they haven’t ran across before. Working under the soft embrace of conventions and practices — that place developers on an even keel — closes that cycle by making sure future development is consistent with how the application has taken shape up until the present.

Going back to performance, we should be treating it as a feature, where for the most part we don’t place a higher premium on it than we would for other features. Unless performance needs to be a defining feature of our system for business reasons, we shouldn’t worry about ensuring the system runs at top speed on all code paths. Doing so is bound to result in highly complex applications that are hard to maintain, debug, extend, and justify.

We, as developers, often over-do architecture as well, and a lot of the reasoning about performance optimization applies here as well. Laying out an all-encompassing architecture that has the potential to save us trouble as we scale to billions of transactions per second might cost us considerable time spent upfront and possibly also lock us into a series of abstractions that will be hard to keep up with, for no foreseeable gains in the near term. It’s a lot better when we focus on problems we’re already running into, or might soon run into, instead of trying to plan for a hockey-stick growth of infrastructure and throughput without any data to back up the hockey-stick growth we’re anticipating.

When we don’t plan in such a long-term form, an interesting thing occurs: our systems grow more naturally, adapting to the needs of the near-term, gradually progressing towards support for a larger application and larger set of requirements. When that progression is gradual, we notice a corrective behavior in how abstractions are picked up or discarded as we grow. If we settle on abstractions too early, and they end up being the wrong abstractions, we pay dearly for that mistake. Bad abstractions force us to bend entire applications to their will, and once we’ve realized that the abstraction is bad and ought to be removed, we might be so heavily invested in it that pulling out might be costly. This, paired with the sunk cost fallacy, whereby we’re tempted to keep the abstraction just because we’ve spent a lot of time, sweat, and blood on it, can be very hazardous indeed.

We’ll devote an important part of this book to understanding how we can identify and leverage the right abstractions at the right time, so that the risk they incur is minimized.

1.5 Modular JavaScript: A Necessity

Due to its history, JavaScript is particularly interesting when it comes to modular design. In the early days of the web, and for a long time, there weren’t any established practices and few people knew the language beyond showing alert boxes. As a highly dynamic language that wasn’t yet mature enough, JavaScript was at an odd place between statically typed languages like Java or C#, and more heavily used dynamic languages like Python or PHP.

The lack of native modularity in the web — due to the way a program is loaded in chunks using HTML <script> tags — is in stark contrast with any other execution environments where programs can be made up of any number of files and modular architectures are natively supported by the language, its compiler, and its file system based environment. On the web, we’re only now barely beginning to scratch the surface of native modules, something other programming environments have had since their inception. As we discussed in section 1.2, the lack of a native module loading mechanism, paired with the lack of native modules beyond just files that shared a global scope, forced the web community to get creative in its approach to modularity.

The native JavaScript modules specification that eventually landed into the language was heavily influenced by this community-led effort. Even as of this writing we’re still probably some 2 or 3 years away from being able to use the native module system effectively on the web. This shortcoming of the web is evidenced by how patterns that were adopted universally elsewhere, like layered or component-based architectures, weren’t even contemplated on the web for most of its lifetime thus far.

Up until the launch of a Gmail beta client in April, 2004, which demonstrated the power of asynchronous JavaScript HTTP requests to provide a single-page application experience, and then the initial release of jQuery in 2006, which provided a hassle-free cross-browser web development experience, JavaScript was seldom regarded as a serious modern development platform.

With the advent of frameworks like Backbone, Angular, Ember, and React, new techniques and breakthroughs also made an uptick on the web. Writing code under ES6 and beyond, but then transpiling parts of that code down to ES5 to attain broader browser support; shared rendering, using the same code on both server and client to render a page quickly on initial page load and continue to load pages quickly upon navigation; automated code bundling, packing the modules that comprise an application into a single bundle for optimized delivery; bundle-splitting along routes, so that there are several bundles outputted, each optimized for the initially visited route; CSS bundling at the JavaScript module level, so that CSS — which doesn’t feature a native module syntax — can also be split across bundles; and a myriad ways of optimizing assets like images at compile time, improving productivity during development while keeping production deployments highly performant, are all part of the iterative nature of innovation in the web.

This explosion of innovation doesn’t stem from sheer creativity alone but also out of necessity: web applications are getting increasingly complex, as is their scope, purpose, and requirements. It follows logically, then, that the ecosystem around them would grow to accommodate those expanded requirements, in terms of better tooling, better libraries, better coding practices, architectures, standards, patterns, and more choice in general.

In the next chapter we’ll break down the meaning of complexity, and start building fortifications against complexity in the programs we write. By following a few rules around how we encapsulate logic across layers upon layers of components, we’ll commence our journey to simpler program design.


1. You can dive into the specifics through this article from a member of the Node.js team, Myles Borins: https://mjavascript.com/out/esm-node.
2. Code-splitting lets you to split your application into several bundles based on different entry points, and also lets you extract dependencies shared across bundles into a single reusable bundle. Learn more at: https://mjavascript.com/out/code-splitting.