diff --git a/locale/en/docs/guides/blocking-vs-non-blocking.md b/locale/en/docs/guides/blocking-vs-non-blocking.md new file mode 100644 index 000000000000..002992b6988c --- /dev/null +++ b/locale/en/docs/guides/blocking-vs-non-blocking.md @@ -0,0 +1,148 @@ +--- +title: Overview of Blocking vs Non-Blocking +layout: docs.hbs +--- + +# Overview of Blocking vs Non-Blocking + +This overview covers the difference between **blocking** and **non-blocking** +calls in Node.js. This overview will refer to the event loop and libuv but no +prior knowledge of those topics is required. Readers are assumed to have a +basic understanding of the JavaScript language and Node.js callback pattern. + +> "I/O" refers primarily to interaction with the system's disk and +> network supported by [libuv](http://libuv.org/). + + +## Blocking + +**Blocking** is when the execution of additional JavaScript in the Node.js +process must wait until a non-JavaScript operation completes. This happens +because the event loop is unable to continue running JavaScript while a +**blocking** operation is occurring. + +In Node.js, JavaScript that exhibits poor performance due to being CPU intensive +rather than waiting on a non-JavaScript operation, such as I/O, isn't typically +referred to as **blocking**. Synchronous methods in the Node.js standard library +that use libuv are the most commonly used **blocking** operations. Native +modules may also have **blocking** methods. + +All of the I/O methods in the Node.js standard library provide asynchronous +versions, which are **non-blocking**, and accept callback functions. Some +methods also have **blocking** counterparts, which have names that end with +`Sync`. + + +## Comparing Code + +**Blocking** methods execute **synchronously** and **non-blocking** methods +execute **asynchronously**. 
+ +Using the File System module as an example, this is a **synchronous** file read: + +```js +const fs = require('fs'); +const data = fs.readFileSync('/file.md'); // blocks here until file is read +``` + +And here is an equivalent **asynchronous** example: + +```js +const fs = require('fs'); +fs.readFile('/file.md', (err, data) => { + if (err) throw err; +}); +``` + +The first example appears simpler than the second but has the disadvantage of +the second line **blocking** the execution of any additional JavaScript until +the entire file is read. Note that in the synchronous version if an error is +thrown it will need to be caught or the process will crash. In the asynchronous +version, it is up to the author to decide whether an error should throw as +shown. + +Let's expand our example a little bit: + +```js +const fs = require('fs'); +const data = fs.readFileSync('/file.md'); // blocks here until file is read +console.log(data); +// moreWork(); will run after console.log +``` + +And here is a similar, but not equivalent asynchronous example: + +```js +const fs = require('fs'); +fs.readFile('/file.md', (err, data) => { + if (err) throw err; + console.log(data); +}); +// moreWork(); will run before console.log +``` + +In the first example above, `console.log` will be called before `moreWork()`. In +the second example `fs.readFile()` is **non-blocking** so JavaScript execution +can continue and `moreWork()` will be called first. The ability to run +`moreWork()` without waiting for the file read to complete is a key design +choice that allows for higher throughput. + + +## Concurrency and Throughput + +JavaScript execution in Node.js is single threaded, so concurrency refers to the +event loop's capacity to execute JavaScript callback functions after completing +other work. Any code that is expected to run in a concurrent manner must allow +the event loop to continue running as non-JavaScript operations, like I/O, are +occurring. 
+
+As an example, let's consider a case where each request to a web server takes
+50ms to complete and 45ms of that 50ms is database I/O that can be done
+asynchronously. Choosing **non-blocking** asynchronous operations frees up that
+45ms per request to handle other requests. This is a significant difference in
+capacity just by choosing to use **non-blocking** methods instead of
+**blocking** methods.
+
+The event loop is different from models in many other languages, where
+additional threads may be created to handle concurrent work.
+
+
+## Dangers of Mixing Blocking and Non-Blocking Code
+
+There are some patterns that should be avoided when dealing with I/O. Let's look
+at an example:
+
+```js
+const fs = require('fs');
+fs.readFile('/file.md', (err, data) => {
+  if (err) throw err;
+  console.log(data);
+});
+fs.unlinkSync('/file.md');
+```
+
+In the above example, `fs.unlinkSync()` is likely to be run before
+`fs.readFile()` completes, which would delete `file.md` before it is actually
+read. A better way to write this that is completely **non-blocking** and
+guaranteed to execute in the correct order is:
+
+```js
+const fs = require('fs');
+fs.readFile('/file.md', (err, data) => {
+  if (err) throw err;
+  console.log(data);
+  fs.unlink('/file.md', (err) => {
+    if (err) throw err;
+  });
+});
+```
+
+The above places a **non-blocking** call to `fs.unlink()` within the callback
+of `fs.readFile()`, which guarantees the correct order of operations.
+
+
+## Additional Resources
+
+- [libuv](http://libuv.org/)
+- [About Node.js](https://nodejs.org/en/about/)
diff --git a/locale/en/docs/guides/domain-postmortem.md b/locale/en/docs/guides/domain-postmortem.md
new file mode 100644
index 000000000000..a2f9d1941ddf
--- /dev/null
+++ b/locale/en/docs/guides/domain-postmortem.md
@@ -0,0 +1,444 @@
+---
+title: Domain Module Postmortem
+layout: docs.hbs
+---
+
+# Domain Module Postmortem
+
+## Usability Issues
+
+### Implicit Behavior
+
+It's possible for a developer to create a new domain and then simply run
+`domain.enter()`, which then acts as a catch-all for any future exception
+that could not be observed by the thrower. This allows a module author to
+intercept the exceptions of unrelated code in a different module, preventing
+the originator of the code from knowing about its own exceptions.
+
+Here's an example of how one indirectly linked module can affect another:
+
+```js
+// module a.js
+const b = require('./b');
+const c = require('./c');
+
+
+// module b.js
+const d = require('domain').create();
+d.on('error', () => { /* silence everything */ });
+d.enter();
+
+
+// module c.js
+const dep = require('some-dep');
+dep.method(); // Uh-oh! This method doesn't actually exist.
+```
+
+Since module `b` enters the domain but never exits, any uncaught exception
+will be swallowed. This leaves module `c` in the dark as to why it didn't run
+the entire script, and leaves a potentially partially populated
+`module.exports`. Doing this is not the same as listening for
+`'uncaughtException'`, as the latter is explicitly meant to globally catch
+errors. The other issue is that domains are processed prior to any
+`'uncaughtException'` handlers, and prevent them from running.
+
+Another issue is that domains route errors automatically if no `'error'`
+handler was set on the event emitter. There is no opt-in mechanism for this,
+and it automatically propagates across the entire asynchronous chain.
This may
+seem useful at first, but once asynchronous calls are two or more modules deep
+and one of them doesn't include an error handler, the creator of the domain
+will suddenly be catching unexpected exceptions, and the thrower's exception
+will go unnoticed by the author.
+
+The following is a simple example of how a missing `'error'` handler allows
+the active domain to hijack the error:
+
+```js
+const domain = require('domain');
+const net = require('net');
+const d = domain.create();
+d.on('error', (err) => console.error(err.message));
+
+d.run(() => net.createServer((c) => {
+  c.end();
+  c.write('bye');
+}).listen(8000));
+```
+
+Even manually removing the connection via `d.remove(c)` does not prevent the
+connection's error from being automatically intercepted.
+
+A failure that plagues both error routing and exception handling is the
+inconsistency in how errors are bubbled. The following is an example of how
+nested domains will and won't bubble the exception based on when they happen:
+
+```js
+const domain = require('domain');
+const net = require('net');
+const d = domain.create();
+d.on('error', () => console.error('d intercepted an error'));
+
+d.run(() => {
+  const server = net.createServer((c) => {
+    const e = domain.create(); // No 'error' handler being set.
+    e.run(() => {
+      // This will not be caught by d's error handler.
+      setImmediate(() => {
+        throw new Error('thrown from setImmediate');
+      });
+      // Though this one will bubble to d's error handler.
+      throw new Error('immediately thrown');
+    });
+  }).listen(8080);
+});
+```
+
+It may be expected that nested domains always remain nested, and will always
+propagate the exception up the domain stack, or that exceptions will never
+automatically bubble. Unfortunately both of these situations occur, leading to
+potentially confusing behavior that may even be prone to difficult-to-debug
+timing conflicts.
+
+
+### API Gaps
+
+While APIs based on `EventEmitter` can use `bind()` and errback-style
+callbacks can use `intercept()`, alternative APIs that implicitly bind to the
+active domain must be executed inside of `run()`. This means that if module
+authors wanted to support domains using a mechanism other than those
+mentioned, they would have to implement domain support manually, instead of
+being able to leverage the implicit mechanisms already in place.
+
+
+### Error Propagation
+
+Propagating errors across nested domains is not straightforward, if even
+possible. Existing documentation shows a simple example of how to `close()` an
+`http` server if there is an error in the request handler. What it does not
+explain is how to close the server if the request handler creates another
+domain instance for another async request. Take the following as a simple
+example of the failure of error propagation:
+
+```js
+const domain = require('domain');
+
+const d1 = domain.create();
+d1.foo = true; // custom member to make more visible in console
+d1.on('error', (er) => { /* handle error */ });
+
+d1.run(() => setTimeout(() => {
+  const d2 = domain.create();
+  d2.bar = 43;
+  d2.on('error', (er) => console.error(er.message, domain._stack));
+  d2.run(() => {
+    setTimeout(() => {
+      setTimeout(() => {
+        throw new Error('outer');
+      });
+      throw new Error('inner');
+    });
+  });
+}));
+```
+
+Even in the case that the domain instances are being used for local storage,
+so that access to resources is made available, there is still no way to allow
+the error to continue propagating from `d2` back to `d1`. Quick inspection may
+suggest that simply throwing from `d2`'s `'error'` handler would allow `d1` to
+then catch the exception and execute its own error handler, but that is not
+the case. Upon inspection of `domain._stack` you'll see that the stack only
+contains `d2`.
+ +This may be considered a failing of the API, but even if it did operate in this +way there is still the issue of transmitting the fact that a branch in the +asynchronous execution has failed, and that all further operations in that +branch must cease. In the example of the http request handler, if we fire off +several asynchronous requests and each one then `write()`'s data back to the +client many more errors will arise from attempting to `write()` to a closed +handle. More on this in _Resource Cleanup on Exception_. + + +### Resource Cleanup on Exception + +The following script contains a more complex example of properly cleaning up +in a small resource dependency tree in the case that an exception occurs in a +given connection or any of its dependencies. Breaking down the script into its +basic operations: + +```js +'use strict'; + +const domain = require('domain'); +const EE = require('events'); +const fs = require('fs'); +const net = require('net'); +const util = require('util'); +const print = process._rawDebug; + +const pipeList = []; +const FILENAME = '/tmp/tmp.tmp'; +const PIPENAME = '/tmp/node-domain-example-'; +const FILESIZE = 1024; +var uid = 0; + +// Setting up temporary resources +const buf = Buffer(FILESIZE); +for (var i = 0; i < buf.length; i++) + buf[i] = ((Math.random() * 1e3) % 78) + 48; // Basic ASCII +fs.writeFileSync(FILENAME, buf); + +function ConnectionResource(c) { + EE.call(this); + this._connection = c; + this._alive = true; + this._domain = domain.create(); + this._id = Math.random().toString(32).substr(2).substr(0, 8) + (++uid); + + this._domain.add(c); + this._domain.on('error', () => { + this._alive = false; + }); +} +util.inherits(ConnectionResource, EE); + +ConnectionResource.prototype.end = function end(chunk) { + this._alive = false; + this._connection.end(chunk); + this.emit('end'); +}; + +ConnectionResource.prototype.isAlive = function isAlive() { + return this._alive; +}; + +ConnectionResource.prototype.id = function id() { 
+ return this._id; +}; + +ConnectionResource.prototype.write = function write(chunk) { + this.emit('data', chunk); + return this._connection.write(chunk); +}; + +// Example begin +net.createServer((c) => { + const cr = new ConnectionResource(c); + + const d1 = domain.create(); + fs.open(FILENAME, 'r', d1.intercept((fd) => { + streamInParts(fd, cr, 0); + })); + + pipeData(cr); + + c.on('close', () => cr.end()); +}).listen(8080); + +function streamInParts(fd, cr, pos) { + const d2 = domain.create(); + var alive = true; + d2.on('error', (er) => { + print('d2 error:', er.message) + cr.end(); + }); + fs.read(fd, new Buffer(10), 0, 10, pos, d2.intercept((bRead, buf) => { + if (!cr.isAlive()) { + return fs.close(fd); + } + if (cr._connection.bytesWritten < FILESIZE) { + // Documentation says callback is optional, but doesn't mention that if + // the write fails an exception will be thrown. + const goodtogo = cr.write(buf); + if (goodtogo) { + setTimeout(() => streamInParts(fd, cr, pos + bRead), 1000); + } else { + cr._connection.once('drain', () => streamInParts(fd, cr, pos + bRead)); + } + return; + } + cr.end(buf); + fs.close(fd); + })); +} + +function pipeData(cr) { + const pname = PIPENAME + cr.id(); + const ps = net.createServer(); + const d3 = domain.create(); + const connectionList = []; + d3.on('error', (er) => { + print('d3 error:', er.message); + cr.end(); + }); + d3.add(ps); + ps.on('connection', (conn) => { + connectionList.push(conn); + conn.on('data', () => {}); // don't care about incoming data. 
+    conn.on('close', () => {
+      connectionList.splice(connectionList.indexOf(conn), 1);
+    });
+  });
+  cr.on('data', (chunk) => {
+    for (var i = 0; i < connectionList.length; i++) {
+      connectionList[i].write(chunk);
+    }
+  });
+  cr.on('end', () => {
+    for (var i = 0; i < connectionList.length; i++) {
+      connectionList[i].end();
+    }
+    ps.close();
+  });
+  pipeList.push(pname);
+  ps.listen(pname);
+}
+
+process.on('SIGINT', () => process.exit());
+process.on('exit', () => {
+  try {
+    for (var i = 0; i < pipeList.length; i++) {
+      fs.unlinkSync(pipeList[i]);
+    }
+    fs.unlinkSync(FILENAME);
+  } catch (e) { }
+});
+```
+
+- When a new connection happens, concurrently:
+  - Open a file on the file system
+  - Open a pipe to a unique socket
+- Read a chunk of the file asynchronously
+- Write the chunk to both the TCP connection and any listening sockets
+- If any of these resources error, notify all other attached resources that
+  they need to clean up and shut down
+
+As we can see from this example, a lot more must be done to properly clean up
+resources when something fails than what can be done strictly through the
+domain API. All that domains offer is an exception aggregation mechanism. Even
+the potentially useful ability to propagate data with the domain is easily
+countered, in this example, by passing the needed resources as a function
+argument.
+
+One problem domains perpetuated was the supposed simplicity of being able to
+continue execution of the application despite an unexpected exception,
+contrary to what the documentation stated. This example demonstrates the
+fallacy behind that idea.
+
+Attempting proper resource cleanup on an unexpected exception becomes more
+complex as the application itself grows in complexity. This example only has
+3 basic resources in play, all of them with a clear dependency path. If an
+application uses something like shared resources or resource reuse, the
+ability to clean up, and to properly test that cleanup has been done, grows
+greatly.
+
+In the end, in terms of handling errors, domains aren't much more than a
+glorified `'uncaughtException'` handler, except with more implicit and
+unobservable behavior from third parties.
+
+
+### Resource Propagation
+
+Another use case for domains was to use them to propagate data along
+asynchronous data paths. One problematic point is the ambiguity of knowing
+which domain to expect when there are multiple in the stack (which must be
+assumed if the async stack works with other modules). There is also a conflict
+between being able to depend on a domain for error handling and having it
+available to retrieve the necessary data.
+
+The following is an involved example demonstrating the failure of using
+domains to propagate data along asynchronous stacks:
+
+```js
+const domain = require('domain');
+const net = require('net');
+
+const server = net.createServer((c) => {
+  // Use a domain to propagate data across events within the
+  // connection so that we don't have to pass arguments
+  // everywhere.
+  const d = domain.create();
+  d.data = { connection: c };
+  d.add(c);
+  // Mock class that does some useless async data transformation
+  // for demonstration purposes.
+  const ds = new DataStream(dataTransformed);
+  c.on('data', (chunk) => ds.data(chunk));
+}).listen(8080, () => console.log(`listening on 8080`));
+
+function dataTransformed(chunk) {
+  // FAIL! Because the DataStream instance also created a
+  // domain we have now lost the active domain we had
+  // hoped to use.
+  domain.active.data.connection.write(chunk);
+}
+
+function DataStream(cb) {
+  this.cb = cb;
+  // DataStream wants to use domains for data propagation too!
+  // Unfortunately this will conflict with any domain that
+  // already exists.
+  this.domain = domain.create();
+  this.domain.data = { inst: this };
+}
+
+DataStream.prototype.data = function data(chunk) {
+  // This code is self contained, but pretend it's a complex
+  // operation that crosses at least one other module. So
+  // passing along "this", etc., is not easy.
+  this.domain.run(function() {
+    // Simulate an async operation that does the data transform.
+    setImmediate(() => {
+      for (var i = 0; i < chunk.length; i++)
+        chunk[i] = ((chunk[i] + Math.random() * 100) % 96) + 33;
+      // Grab the instance from the active domain and use that
+      // to call the user's callback.
+      const self = domain.active.data.inst;
+      self.cb.call(self, chunk);
+    });
+  });
+};
+```
+
+The above shows that it is difficult to have more than one asynchronous API
+attempt to use domains to propagate data. This example could possibly be fixed
+by assigning `parent: domain.active` in the `DataStream` constructor, then
+restoring it via `domain.active = domain.active.data.parent` just before the
+user's callback is called. Also, the instantiation of `DataStream` in the
+`'connection'` callback must be run inside `d.run()`, instead of simply using
+`d.add(c)`; otherwise there will be no active domain.
+
+In short, for this to have a prayer of a chance, usage would need to strictly
+adhere to a set of guidelines that would be difficult to enforce or test.
+
+
+## Performance Issues
+
+A significant deterrent from using domains is the overhead. Using node's
+built-in http benchmark, `http_simple.js`, without domains it can handle over
+22,000 requests/second, whereas if it's run with `NODE_USE_DOMAINS=1` that
+number drops to under 17,000 requests/second. In this case there is only a
+single global domain. If we edit the benchmark so that the http request
+callback creates a new domain instance, performance drops further to 15,000
+requests/second.
+
+While this probably wouldn't affect a server only serving a few hundred or
+even a thousand requests per second, the amount of overhead is directly
+proportional to the number of asynchronous requests made.
So if a single connection needs to
+connect to several other services, all of those will contribute to the overall
+latency of delivering the final product to the client.
+
+Using `AsyncWrap` and tracking the number of times
+`init`/`pre`/`post`/`destroy` are called in the mentioned benchmark, we find
+that the sum of all events called is over 170,000 times per second. This means
+even adding 1 microsecond of overhead per call for any type of setup or tear
+down will result in a 17% performance loss. Granted, this is for the optimized
+scenario of the benchmark, but I believe this demonstrates the necessity for a
+mechanism such as domain to be as cheap to run as possible.
+
+
+## Looking Ahead
+
+The domain module has been soft deprecated since Dec 2014, but has not yet
+been removed because node offers no alternative functionality at the moment.
+As of this writing there is ongoing work building out the `AsyncWrap` API and
+a proposal for Zones being prepared for the TC39. When there is suitable
+functionality to replace domains, it will undergo the full deprecation cycle
+and eventually be removed from core.
diff --git a/locale/en/docs/guides/event-loop-timers-and-nexttick.md b/locale/en/docs/guides/event-loop-timers-and-nexttick.md
new file mode 100644
index 000000000000..a0a60735c67e
--- /dev/null
+++ b/locale/en/docs/guides/event-loop-timers-and-nexttick.md
@@ -0,0 +1,492 @@
+---
+title: The Node.js Event Loop, Timers, and process.nextTick()
+layout: docs.hbs
+---
+
+# The Node.js Event Loop, Timers, and `process.nextTick()`
+
+## What is the Event Loop?
+
+The event loop is what allows Node.js to perform non-blocking I/O
+operations — despite the fact that JavaScript is single-threaded — by
+offloading operations to the system kernel whenever possible.
+
+Since most modern kernels are multi-threaded, they can handle multiple
+operations executing in the background.
When one of these operations +completes, the kernel tells Node.js so that the appropriate callback +may be added to the **poll** queue to eventually be executed. We'll explain +this in further detail later in this topic. + +## Event Loop Explained + +When Node.js starts, it initializes the event loop, processes the +provided input script (or drops into the [REPL][], which is not covered in +this document) which may make async API calls, schedule timers, or call +`process.nextTick()`, then begins processing the event loop. + +The following diagram shows a simplified overview of the event loop's +order of operations. + +``` + ┌───────────────────────┐ +┌─>│ timers │ +│ └──────────┬────────────┘ +│ ┌──────────┴────────────┐ +│ │ I/O callbacks │ +│ └──────────┬────────────┘ +│ ┌──────────┴────────────┐ +│ │ idle, prepare │ +│ └──────────┬────────────┘ ┌───────────────┐ +│ ┌──────────┴────────────┐ │ incoming: │ +│ │ poll │<─────┤ connections, │ +│ └──────────┬────────────┘ │ data, etc. │ +│ ┌──────────┴────────────┐ └───────────────┘ +│ │ check │ +│ └──────────┬────────────┘ +│ ┌──────────┴────────────┐ +└──┤ close callbacks │ + └───────────────────────┘ +``` + +*note: each box will be referred to as a "phase" of the event loop.* + +Each phase has a FIFO queue of callbacks to execute. While each phase is +special in its own way, generally, when the event loop enters a given +phase, it will perform any operations specific to that phase, then +execute callbacks in that phase's queue until the queue has been +exhausted or the maximum number of callbacks has executed. When the +queue has been exhausted or the callback limit is reached, the event +loop will move to the next phase, and so on. + +Since any of these operations may schedule _more_ operations and new +events processed in the **poll** phase are queued by the kernel, poll +events can be queued while polling events are being processed. 
As a
+result, long-running callbacks can allow the poll phase to run much
+longer than a timer's threshold. See the [**timers**](#timers) and
+[**poll**](#poll) sections for more details.
+
+_**NOTE:** There is a slight discrepancy between the Windows and the
+Unix/Linux implementations, but that's not important for this
+demonstration. The most important parts are here. There are actually
+seven or eight steps, but the ones we care about, the ones that Node.js
+actually uses, are those above._
+
+
+## Phases Overview
+
+* **timers**: this phase executes callbacks scheduled by `setTimeout()`
+  and `setInterval()`.
+* **I/O callbacks**: executes almost all callbacks with the exception of
+  close callbacks, the ones scheduled by timers, and `setImmediate()`.
+* **idle, prepare**: only used internally.
+* **poll**: retrieve new I/O events; node will block here when appropriate.
+* **check**: `setImmediate()` callbacks are invoked here.
+* **close callbacks**: e.g. `socket.on('close', ...)`.
+
+Between each run of the event loop, Node.js checks if it is waiting for
+any asynchronous I/O or timers and shuts down cleanly if there are not
+any.
+
+## Phases in Detail
+
+### timers
+
+A timer specifies the **threshold** _after which_ a provided callback
+_may be executed_ rather than the **exact** time a person _wants it to
+be executed_. Timer callbacks will run as early as they can be
+scheduled after the specified amount of time has passed; however,
+operating system scheduling or the running of other callbacks may delay
+them.
+
+_**Note**: Technically, the [**poll** phase](#poll) controls when timers
+are executed._
+
+For example, say you schedule a timeout to execute after a 100 ms
+threshold, then your script starts asynchronously reading a file which
+takes 95 ms:
+
+```js
+var fs = require('fs');
+
+function someAsyncOperation (callback) {
+  // Assume this takes 95ms to complete
+  fs.readFile('/path/to/file', callback);
+}
+
+var timeoutScheduled = Date.now();
+
+setTimeout(function () {
+  var delay = Date.now() - timeoutScheduled;
+  console.log(delay + "ms have passed since I was scheduled");
+}, 100);
+
+// do someAsyncOperation which takes 95 ms to complete
+someAsyncOperation(function () {
+  var startCallback = Date.now();
+
+  // do something that will take 10ms...
+  while (Date.now() - startCallback < 10) {
+    ; // do nothing
+  }
+});
+```
+
+When the event loop enters the **poll** phase, it has an empty queue
+(`fs.readFile()` has not completed), so it will wait for the number of ms
+remaining until the soonest timer's threshold is reached. While it is
+waiting, 95 ms pass and `fs.readFile()` finishes reading the file; its
+callback, which takes 10 ms to complete, is added to the **poll** queue and
+executed. When the callback finishes, there are no more callbacks in the
+queue, so the event loop will see that the threshold of the soonest timer
+has been reached, then wrap back to the **timers** phase to execute the
+timer's callback. In this example, you will see that the total delay
+between the timer being scheduled and its callback being executed will
+be 105ms.
+
+Note: To prevent the **poll** phase from starving the event loop, [libuv][]
+(the C library that implements the Node.js event loop and all of the
+asynchronous behaviors of the platform) also has a hard maximum
+(system dependent) before it stops polling for more events.
+
+### I/O callbacks
+
+This phase executes callbacks for some system operations, such as certain
+types of TCP errors.
For example if a TCP socket receives `ECONNREFUSED` when +attempting to connect, some \*nix systems want to wait to report the +error. This will be queued to execute in the **I/O callbacks** phase. + +### poll + +The **poll** phase has two main functions: + +1. Executing scripts for timers whose threshold has elapsed, then +2. Processing events in the **poll** queue. + +When the event loop enters the **poll** phase _and there are no timers +scheduled_, one of two things will happen: + +* _If the **poll** queue **is not empty**_, the event loop will iterate +through its queue of callbacks executing them synchronously until +either the queue has been exhausted, or the system-dependent hard limit +is reached. + +* _If the **poll** queue **is empty**_, one of two more things will +happen: + * If scripts have been scheduled by `setImmediate()`, the event loop + will end the **poll** phase and continue to the **check** phase to + execute those scheduled scripts. + + * If scripts **have not** been scheduled by `setImmediate()`, the + event loop will wait for callbacks to be added to the queue, then + execute them immediately. + +Once the **poll** queue is empty the event loop will check for timers +_whose time thresholds have been reached_. If one or more timers are +ready, the event loop will wrap back to the **timers** phase to execute +those timers' callbacks. + +### check + +This phase allows a person to execute callbacks immediately after the +**poll** phase has completed. If the **poll** phase becomes idle and +scripts have been queued with `setImmediate()`, the event loop may +continue to the **check** phase rather than waiting. + +`setImmediate()` is actually a special timer that runs in a separate +phase of the event loop. It uses a libuv API that schedules callbacks to +execute after the **poll** phase has completed. 
+ +Generally, as the code is executed, the event loop will eventually hit +the **poll** phase where it will wait for an incoming connection, request, +etc. However, if a callback has been scheduled with `setImmediate()` +and the **poll** phase becomes idle, it will end and continue to the +**check** phase rather than waiting for **poll** events. + +### close callbacks + +If a socket or handle is closed abruptly (e.g. `socket.destroy()`), the +`'close'` event will be emitted in this phase. Otherwise it will be +emitted via `process.nextTick()`. + +## `setImmediate()` vs `setTimeout()` + +`setImmediate` and `setTimeout()` are similar, but behave in different +ways depending on when they are called. + +* `setImmediate()` is designed to execute a script once the current +**poll** phase completes. +* `setTimeout()` schedules a script to be run after a minimum threshold +in ms has elapsed. + +The order in which the timers are executed will vary depending on the +context in which they are called. If both are called from within the +main module, then timing will be bound by the performance of the process +(which can be impacted by other applications running on the machine). + +For example, if we run the following script which is not within an I/O +cycle (i.e. 
the main module), the order in which the two timers are +executed is non-deterministic, as it is bound by the performance of the +process: + + +```js +// timeout_vs_immediate.js +setTimeout(function timeout () { + console.log('timeout'); +},0); + +setImmediate(function immediate () { + console.log('immediate'); +}); +``` + +``` +$ node timeout_vs_immediate.js +timeout +immediate + +$ node timeout_vs_immediate.js +immediate +timeout +``` + +However, if you move the two calls within an I/O cycle, the immediate +callback is always executed first: + +```js +// timeout_vs_immediate.js +var fs = require('fs') + +fs.readFile(__filename, () => { + setTimeout(() => { + console.log('timeout') + }, 0) + setImmediate(() => { + console.log('immediate') + }) +}) +``` + +``` +$ node timeout_vs_immediate.js +immediate +timeout + +$ node timeout_vs_immediate.js +immediate +timeout +``` + +The main advantage to using `setImmediate()` over `setTimeout()` is +`setImmediate()` will always be executed before any timers if scheduled +within an I/O cycle, independently of how many timers are present. + +## `process.nextTick()` + +### Understanding `process.nextTick()` + +You may have noticed that `process.nextTick()` was not displayed in the +diagram, even though it's a part of the asynchronous API. This is because +`process.nextTick()` is not technically part of the event loop. Instead, +the `nextTickQueue` will be processed after the current operation +completes, regardless of the current phase of the event loop. + +Looking back at our diagram, any time you call `process.nextTick()` in a +given phase, all callbacks passed to `process.nextTick()` will be +resolved before the event loop continues. This can create some bad +situations because **it allows you to "starve" your I/O by making +recursive `process.nextTick()` calls**, which prevents the event loop +from reaching the **poll** phase. + +### Why would that be allowed? + +Why would something like this be included in Node.js? 
Part of it is a
design philosophy where an API should always be asynchronous even where
it doesn't have to be. Take this code snippet for example:

```js
function apiCall (arg, callback) {
  if (typeof arg !== 'string')
    return process.nextTick(callback,
      new TypeError('argument should be string'));
}
```

The snippet does an argument check; if the argument is not a string, it
passes an error to the callback. The API was updated fairly recently to
allow passing additional arguments to `process.nextTick()`: any arguments
after the callback are propagated to the callback as its arguments, so
you don't have to nest functions.

What we're doing is passing an error back to the user, but only *after*
we have allowed the rest of the user's code to execute. By using
`process.nextTick()` we guarantee that `apiCall()` always runs its
callback *after* the rest of the user's code and *before* the event loop
is allowed to proceed. To achieve this, the JS call stack is allowed to
unwind and then immediately execute the provided callback, which allows a
person to make recursive calls to `process.nextTick()` without hitting a
`RangeError: Maximum call stack size exceeded` from v8.

This philosophy can lead to some potentially problematic situations.
Take this snippet for example:

```js
// this has an asynchronous signature, but calls the callback synchronously
function someAsyncApiCall (callback) { callback(); }

// the callback is called before `someAsyncApiCall` completes.
someAsyncApiCall(() => {
  // `bar` is declared (hoisted) but has not been assigned yet,
  // because the script has not run to completion
  console.log('bar', bar); // undefined
});

var bar = 1;
```

The user defines `someAsyncApiCall()` to have an asynchronous signature,
but it actually operates synchronously. When it is called, the callback
provided to `someAsyncApiCall()` is called in the same phase of the
event loop because `someAsyncApiCall()` doesn't actually do anything
asynchronously.
As a result, the callback tries to reference `bar` before it has been
assigned a value, because the script has not been able to run to
completion.

By placing the callback in a `process.nextTick()`, the script still has the
ability to run to completion, allowing all the variables, functions,
etc., to be initialized prior to the callback being called. It also has
the advantage of not allowing the event loop to continue; it can be
useful to alert the user to an error before the event loop is allowed
to proceed. Here is the previous example using `process.nextTick()`:

```js
function someAsyncApiCall (callback) {
  process.nextTick(callback);
}

someAsyncApiCall(() => {
  console.log('bar', bar); // 1
});

var bar = 1;
```

Here's another real world example:

```js
const net = require('net');

const server = net.createServer(() => {}).listen(8080);

server.on('listening', () => {});
```

When only a port is passed, the port is bound immediately, so the
`'listening'` callback could be called immediately. The problem is that
the `.on('listening')` handler will not have been set by that time.

To get around this, the `'listening'` event is queued in a
`process.nextTick()` to allow the script to run to completion, which lets
the user set any event handlers they want.

## `process.nextTick()` vs `setImmediate()`

We have two calls that are similar as far as users are concerned, but
their names are confusing.

* `process.nextTick()` fires immediately on the same phase
* `setImmediate()` fires on the following iteration or 'tick' of the
event loop

In essence, the names should be swapped. `process.nextTick()` fires more
immediately than `setImmediate()`, but this is an artifact of the past
which is unlikely to change. Making this switch would break a large
percentage of the packages on npm. Every day more new modules are being
added, which means every day we wait, more potential breakages occur.
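The ordering described by the two bullets above is easy to observe from the main module. The sketch below is illustrative (the `order` array exists only to make the sequence visible); the `nextTickQueue` is drained before the event loop is allowed to continue, so the `process.nextTick()` callback runs first even though it is registered second:

```js
const order = [];

// registered first, but runs second: immediates wait for the
// check phase of the following event loop iteration
setImmediate(() => {
  order.push('setImmediate');
  console.log(order.join(' then ')); // nextTick then setImmediate
});

// registered second, but runs first: the nextTickQueue is processed
// as soon as the current operation (the main script) completes
process.nextTick(() => {
  order.push('nextTick');
});
```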
While they are confusing, the names themselves won't change.

*We recommend developers use `setImmediate()` in all cases because it's
easier to reason about (and it leads to code that's compatible with a
wider variety of environments, like browser JS).*

## Why use `process.nextTick()`?

There are two main reasons:

1. Allow users to handle errors, clean up any then-unneeded resources, or
perhaps try the request again before the event loop continues.

2. At times it's necessary to allow a callback to run after the call
stack has unwound but before the event loop continues.

One example is to match the user's expectations. Simple example:

```js
const net = require('net');

const server = net.createServer();
server.on('connection', function(conn) { });

server.listen(8080);
server.on('listening', function() { });
```

Say that `listen()` is run at the beginning of the event loop, but the
listening callback is placed in a `setImmediate()`. Unless a hostname is
passed, binding to the port will happen immediately. For the event loop
to proceed, it must hit the **poll** phase, which means there is a
non-zero chance that a connection could have been received, allowing the
connection event to be fired before the listening event.

Another example is a function constructor that, say, inherits from
`EventEmitter` and wants to emit an event from within the constructor:

```js
const EventEmitter = require('events');
const util = require('util');

function MyEmitter() {
  EventEmitter.call(this);
  this.emit('event');
}
util.inherits(MyEmitter, EventEmitter);

const myEmitter = new MyEmitter();
myEmitter.on('event', function() {
  console.log('an event occurred!');
});
```

You can't emit an event from the constructor immediately
because the script will not have processed to the point where the user
assigns a callback to that event.
So, within the constructor itself,
you can use `process.nextTick()` to set a callback to emit the event
after the constructor has finished, which provides the expected results:

```js
const EventEmitter = require('events');
const util = require('util');

function MyEmitter() {
  EventEmitter.call(this);

  // use nextTick to emit the event once a handler is assigned
  process.nextTick(function () {
    this.emit('event');
  }.bind(this));
}
util.inherits(MyEmitter, EventEmitter);

const myEmitter = new MyEmitter();
myEmitter.on('event', function() {
  console.log('an event occurred!');
});
```

[libuv]: http://libuv.org
[REPL]: https://nodejs.org/api/repl.html#repl_repl

diff --git a/locale/en/docs/guides/timers-in-node.md b/locale/en/docs/guides/timers-in-node.md
new file mode 100644
index 000000000000..3288eca20aae
--- /dev/null
+++ b/locale/en/docs/guides/timers-in-node.md
@@ -0,0 +1,192 @@
---
title: Timers in Node.js
layout: docs.hbs
---

# Timers in Node.js and beyond

The Timers module in Node.js contains functions that execute code after a set
period of time. Timers do not need to be imported via `require()`, since
all the methods are available globally to emulate the browser JavaScript API.
To fully understand when timer functions will be executed, it's a good idea to
read up on the Node.js
[Event Loop](../topics/event-loop-timers-and-nexttick.md).

## Controlling the Time Continuum with Node.js

The Node.js API provides several ways of scheduling code to execute at
some point after the present moment. The functions below may seem familiar,
since they are available in most browsers, but Node.js actually provides
its own implementation of these methods. Timers integrate very closely
with the system, and despite the fact that the API mirrors the browser
API, there are some differences in implementation.
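As a quick illustration of that last point, the timer functions are plain globals in Node.js, and one of them, `setImmediate()`, is a Node.js addition that is not part of the standard browser API:

```js
// no require() needed; these are provided as globals by Node.js
console.log(typeof setTimeout);   // 'function'
console.log(typeof setInterval);  // 'function'

// setImmediate() is Node.js-specific; in most browsers this
// would log 'undefined' instead
console.log(typeof setImmediate); // 'function'
```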
### "When I say so" Execution ~ *`setTimeout()`*

`setTimeout()` can be used to schedule code execution after a designated
number of milliseconds. This function is similar to
[`window.setTimeout()`](https://developer.mozilla.org/en-US/docs/Web/API/WindowTimers/setTimeout)
from the browser JavaScript API; however, a string of code cannot be passed
to be executed.

`setTimeout()` accepts a function to execute as its first argument and the
delay in milliseconds as its second argument. Additional arguments may also
be included, and these will be passed on to the function. Here is an example:

```js
function myFunc (arg) {
  console.log('arg was => ' + arg);
}

setTimeout(myFunc, 1500, 'funky');
```

The above function `myFunc()` will execute as close to 1500
milliseconds (or 1.5 seconds) from now as possible, due to the call to
`setTimeout()`.

The timeout interval that is set cannot be relied upon to execute after
that *exact* number of milliseconds. This is because other executing code that
blocks or holds onto the event loop will push the execution of the timeout
back. The *only* guarantee is that the timeout will not execute *sooner* than
the declared timeout interval.

`setTimeout()` returns a `Timeout` object that can be used to reference the
timeout that was set. This returned object can be used to cancel the timeout
(see `clearTimeout()` below) as well as change the execution behavior (see
`unref()` below).

### "Right after this" Execution ~ *`setImmediate()`*

`setImmediate()` will execute code at the end of the current event loop cycle.
This code will execute *after* any I/O operations in the current event loop and
*before* any timers scheduled for the next event loop. This code execution
could be thought of as happening "right after this", meaning any code following
the `setImmediate()` function call will execute before the `setImmediate()`
function argument.
The first argument to `setImmediate()` will be the function to execute. Any
subsequent arguments will be passed to the function when it is executed.
Here's an example:

```js
console.log('before immediate');

setImmediate((arg) => {
  console.log(`executing immediate: ${arg}`);
}, 'so immediate');

console.log('after immediate');
```

The above function passed to `setImmediate()` will execute after all runnable
code has executed, and the console output will be:

```
before immediate
after immediate
executing immediate: so immediate
```

`setImmediate()` returns an `Immediate` object, which can be used to cancel
the scheduled immediate (see `clearImmediate()` below).

Note: Don't confuse `setImmediate()` with `process.nextTick()`. There are
some major ways they differ. The first is that `process.nextTick()` will run
*before* any `Immediate`s that are set as well as before any scheduled I/O.
The second is that `process.nextTick()` is non-clearable, meaning once
code has been scheduled to execute with `process.nextTick()`, the execution
cannot be stopped, just like with a normal function. Refer to [this guide](../topics/event-loop-timers-and-nexttick.md#processnexttick)
to better understand the operation of `process.nextTick()`.

### "Infinite Loop" Execution ~ *`setInterval()`*

If there is a block of code that should execute multiple times, `setInterval()`
can be used to execute that code. `setInterval()` takes a function as its
first argument that will run an infinite number of times, with a millisecond
delay as the second argument. Just like `setTimeout()`, additional arguments
can be added beyond the delay, and these will be passed on to the function call.
Also like `setTimeout()`, the delay cannot be guaranteed because of operations
that may hold on to the event loop, and should therefore be treated as an
approximate delay.
See the below example:

```js
function intervalFunc () {
  console.log('Cant stop me now!');
}

setInterval(intervalFunc, 1500);
```

In the above example, `intervalFunc()` will execute about every 1500
milliseconds, or 1.5 seconds, until it is stopped (see below).

Just like `setTimeout()`, `setInterval()` also returns a `Timeout` object which
can be used to reference and modify the interval that was set.

## Clearing the Future

What can be done if a `Timeout` or `Immediate` object needs to be cancelled?
`setTimeout()`, `setImmediate()`, and `setInterval()` return a timer object
that can be used to reference the set `Timeout` or `Immediate` object.
By passing said object into the respective `clear` function, execution of
that object will be halted completely. The respective functions are
`clearTimeout()`, `clearImmediate()`, and `clearInterval()`. See below for
an example of each:

```js
let timeoutObj = setTimeout(() => {
  console.log('timeout beyond time');
}, 1500);

let immediateObj = setImmediate(() => {
  console.log('immediately executing immediate');
});

let intervalObj = setInterval(() => {
  console.log('interviewing the interval');
}, 500);

clearTimeout(timeoutObj);
clearImmediate(immediateObj);
clearInterval(intervalObj);
```

## Leaving Timeouts Behind

Remember that `Timeout` objects are returned by `setTimeout` and `setInterval`.
The `Timeout` object provides two functions, `unref()` and `ref()`, intended
to augment its behavior. If there is a `Timeout` object scheduled using a
`set` function, `unref()` can be called on that object. This changes the
behavior slightly: the `Timeout` will not fire *if it is the last
code left to execute*. The `Timeout` object will not keep the process alive,
waiting to execute.
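As a minimal sketch of that behavior (assuming nothing else is keeping the process alive), the following script exits immediately and the callback never runs:

```js
const timerObj = setTimeout(() => {
  // never reached: the unref()'d timeout is the only pending work,
  // so the process exits before the 1000 ms delay elapses
  console.log('will not be printed');
}, 1000);

timerObj.unref();
```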
In similar fashion, a `Timeout` object that has had `unref()` called on it
can remove that behavior by calling `ref()` on that same `Timeout` object,
which will then ensure its execution. Be aware, however, that this does
not *exactly* restore the initial behavior for performance reasons. See
below for examples of both:

```js
let timerObj = setTimeout(() => {
  console.log('will i run?');
});

// if left alone, this statement will keep the above
// timeout from running, since the timeout will be the only
// thing keeping the program from exiting
timerObj.unref();

// we can bring it back to life by calling ref() inside
// an immediate
setImmediate(() => {
  timerObj.ref();
});
```

## Further Down the Event Loop

There's much more to the Event Loop and Timers than this guide
has covered. To learn more about the internals of the Node.js
Event Loop and how Timers operate during execution, check out
this Node.js guide: [The Node.js Event Loop, Timers, and
process.nextTick()](../topics/event-loop-timers-and-nexttick.md).