Emilio G.C edited this page Oct 10, 2017 · 177 revisions

Osmosis is a utility for easily extracting data from HTML or XML documents.

Command reference

These are all of the "commands" that are available for chaining in an Osmosis instance.


( selector )

Click on nodes found by selector


( string )

Discard any nodes whose contents do not match string
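Conceptually, this keeps only the nodes whose text contains the given string. The sketch below models nodes as plain objects with a hypothetical `text` property and treats "match" as substring containment. Both are assumptions for illustration, not Osmosis internals.

```javascript
// Hypothetical node shape { text: string }; not an Osmosis type.
function keepContaining(nodes, needle) {
  // Keep a node only if its text contains the needle (substring match assumed).
  return nodes.filter((node) => node.text.includes(needle));
}

const nodes = [{ text: 'Price: $10' }, { text: 'Sold out' }, { text: 'Price: $25' }];
console.log(keepContaining(nodes, 'Price').map((n) => n.text));
// [ 'Price: $10', 'Price: $25' ]
```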


( opts )
( key, val )

Set HTTP options and configure Osmosis
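The two call forms can be read as one overloaded setter: an object merges several options at once, while a key/value pair sets a single option. This is a minimal sketch of that pattern, assuming options live in a plain object. `makeConfig` and the option names `timeout` and `retries` are hypothetical, not Osmosis's actual option list.

```javascript
// Hypothetical illustration of an ( opts ) / ( key, val ) overload.
function makeConfig() {
  const opts = {};
  return function config(keyOrOpts, val) {
    if (typeof keyOrOpts === 'object' && keyOrOpts !== null) {
      // config(opts): merge every key from the object
      Object.assign(opts, keyOrOpts);
    } else {
      // config(key, val): set a single key
      opts[keyOrOpts] = val;
    }
    return opts;
  };
}

const config = makeConfig();
config({ timeout: 5000, retries: 3 });
config('timeout', 10000); // the later call wins for 'timeout'
console.log(config({})); // { timeout: 10000, retries: 3 }
```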


( callback( data ) )

Calls callback with the current data object

( null )

Empty the data object

( object )

Add or replace each key in the data object with a new val
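All three forms act on the same data object. A minimal sketch of those semantics, assuming the data object is a plain JavaScript object; `applyData` is a hypothetical helper, not part of Osmosis.

```javascript
// Hypothetical model of the three call forms described above.
function applyData(data, arg) {
  if (typeof arg === 'function') {
    arg(data); // ( callback( data ) ): observe the current object
    return data;
  }
  if (arg === null) {
    return {}; // ( null ): start over with an empty object
  }
  return Object.assign({}, data, arg); // ( object ): add or replace keys
}

let data = { title: 'Old title', url: 'http://example.com' };
data = applyData(data, { title: 'New title', author: 'Ann' });
applyData(data, (d) => console.log(d.title)); // New title
data = applyData(data, null);
console.log(data); // {}
```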


( callback( msg ) )

Call callback when any debug messages are received


( seconds )

Delay starting the next promise by seconds (float or int)
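A seconds-based pause between steps is commonly built on a timer-backed promise. The sketch below uses a hypothetical `delay` helper (not the Osmosis implementation) and converts fractional seconds to the milliseconds that `setTimeout` expects.

```javascript
// Hypothetical helper: resolve after `seconds` (float or int).
function delay(seconds) {
  return new Promise((resolve) => setTimeout(resolve, seconds * 1000));
}

async function main() {
  const start = Date.now();
  await delay(0.25); // wait a quarter of a second before the next step
  console.log(`waited about ${Date.now() - start} ms`);
}

main();
```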


( osmosis..., osmosis... )

Call each Osmosis instance with the current context. This will always continue, even if an instance fails.
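The "always continue" behaviour resembles running several tasks against the same input and collecting every outcome instead of stopping at the first failure. This sketch models each Osmosis instance as a hypothetical async function of the context and uses the standard `Promise.allSettled`; it illustrates the semantics only.

```javascript
// Each "instance" is modeled as an async function taking the shared context.
async function runAll(context, ...instances) {
  const results = await Promise.allSettled(instances.map((fn) => fn(context)));
  // allSettled never rejects, so one failing instance does not stop the others.
  return results.map((r) => (r.status === 'fulfilled' ? r.value : `failed: ${r.reason.message}`));
}

const context = { url: 'http://example.com' };
runAll(
  context,
  async (ctx) => `title of ${ctx.url}`,
  async () => { throw new Error('selector not found'); },
  async (ctx) => `links of ${ctx.url}`
).then((out) => console.log(out));
```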


Reset the current context to the Document


( callback )

Create a DOM object from the current context.

The callback will be called with 3 arguments (window, data, and next). The next([context], [data]) function must be called at least once.


( callback )

Calls callback when parsing has completely finished


( callback( msg ) )

Call callback when any error messages are received


( selector )

Discard any nodes that match selector


( selector )

Discard any nodes that do not match selector
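The two entries above are complements: one drops nodes that match a selector, the other drops nodes that do not. In this sketch a hypothetical predicate function stands in for a real selector engine, since selector matching is beyond a short example.

```javascript
// `matches` is a hypothetical stand-in for "node matches selector".
const discardMatching = (nodes, matches) => nodes.filter((n) => !matches(n));
const discardNonMatching = (nodes, matches) => nodes.filter((n) => matches(n));

const nodes = [
  { tag: 'li', className: 'ad' },
  { tag: 'li', className: 'item' },
  { tag: 'li', className: 'item' },
];
const isAd = (n) => n.className === 'ad'; // plays the role of the selector 'li.ad'

console.log(discardMatching(nodes, isAd).length); // 2
console.log(discardNonMatching(nodes, isAd).length); // 1
```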


( selector )

Find elements based on selector anywhere within the current document


( [selector] )

Follow URLs found via selector. If selector isn't provided, follow will search the current element text or common URL attributes (href, src, etc.).
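Without a selector, the URL has to come from the element itself. This is a rough sketch of that fallback, assuming a hypothetical element shape `{ text, attribs }` and a guessed attribute order; the result is made absolute against the current page with the standard WHATWG `URL` class.

```javascript
// Hypothetical element shape: { text: string, attribs: { [name]: string } }.
const URL_ATTRIBUTES = ['href', 'src']; // assumed order; the real list may differ

function findFollowUrl(element, pageUrl) {
  // Prefer a common URL attribute, then fall back to the element text.
  const attr = URL_ATTRIBUTES.find((name) => element.attribs[name]);
  const raw = attr ? element.attribs[attr] : element.text.trim();
  if (!raw) return null;
  // Resolve relative URLs against the page the element came from.
  return new URL(raw, pageUrl).href;
}

const base = 'http://example.com/list/page1.html';
console.log(findFollowUrl({ text: 'Next', attribs: { href: 'page2.html' } }, base));
// http://example.com/list/page2.html
console.log(findFollowUrl({ text: ' /about ', attribs: {} }, base));
// http://example.com/about
```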



get / post

( url , [data] , [opts] )

Make an HTTP request

url - A string containing a URL, which can be relative to the current context.

data (optional) - An object containing GET query parameters or POST request data.

opts (optional) - An object containing HTTP request options.

Note: Query parameter values are URL-encoded by needle, so make sure your parameter values are not already URL-encoded.
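Encoding an already-encoded value a second time turns each `%` into `%25`, which is why the note matters. The sketch below uses the standard `URL` and `URLSearchParams` classes (not needle) to resolve a relative URL against the current page and append raw GET parameters; `buildGetUrl` and the URLs are placeholders.

```javascript
// Resolve a relative URL against the current page, then add raw query values.
function buildGetUrl(relativeUrl, pageUrl, data) {
  const url = new URL(relativeUrl, pageUrl);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, value); // URLSearchParams does the encoding
  }
  return url.href;
}

const page = 'http://example.com/search/index.html';
console.log(buildGetUrl('results', page, { q: 'a&b' }));
// http://example.com/search/results?q=a%26b

// Pre-encoding the value gets it encoded twice: '%' becomes '%25'.
console.log(buildGetUrl('results', page, { q: encodeURIComponent('a&b') }));
// http://example.com/search/results?q=a%2526b
```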


( callback( msg ) )

Call callback when any log messages are received


( user , pass , [success] , [fail] )

Submit a login form.


user - A string containing a username, email address, etc.

pass - A password string

success (optional) - A selector string determining if the login attempt succeeded

fail (optional) - A selector string determining if the login attempt failed

How it works

login finds the first form containing input[type="password"] and uses that input as the password field. It will use the preceding <input> element as the user field.
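The field-detection rule above can be sketched over a simplified form model. Here each form is a hypothetical array of `{ type, name }` input descriptors in document order; Osmosis itself works on parsed HTML, so treat this as an illustration of the rule only.

```javascript
// Hypothetical form model: an array of { type, name } inputs in document order.
function findLoginFields(forms) {
  for (const inputs of forms) {
    const passIndex = inputs.findIndex((input) => input.type === 'password');
    if (passIndex === -1) continue; // no password field: not a login form
    return {
      password: inputs[passIndex].name,
      // The input just before the password field is taken as the user field.
      user: passIndex > 0 ? inputs[passIndex - 1].name : null,
    };
  }
  return null; // no form contains a password input
}

const forms = [
  [{ type: 'text', name: 'q' }], // search form, skipped
  [
    { type: 'hidden', name: 'csrf' },
    { type: 'email', name: 'email' },
    { type: 'password', name: 'pw' },
  ],
];
console.log(findLoginFields(forms)); // { password: 'pw', user: 'email' }
```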


( [selector], RegExp )

Discard any nodes whose contents do not match RegExp
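Because the selector is optional, the command accepts either one or two arguments. A minimal sketch of that argument handling plus the RegExp test, assuming hypothetical nodes shaped `{ text, fields }`, where `fields` maps a selector to the text it would select, and assuming an omitted selector means "test the node's own text".

```javascript
// Hypothetical node shape: { text: string, fields: { [selector]: string } }.
function keepMatching(nodes, selectorOrRegExp, maybeRegExp) {
  // Support both ( RegExp ) and ( selector, RegExp ).
  const selector = maybeRegExp ? selectorOrRegExp : null;
  const re = maybeRegExp || selectorOrRegExp; // avoid the /g flag: test() would keep state
  return nodes.filter((node) => {
    const text = selector ? node.fields[selector] || '' : node.text;
    return re.test(text);
  });
}

const rows = [
  { text: 'Widget $10', fields: { '.price': '$10' } },
  { text: 'Gadget (call)', fields: { '.price': 'call for price' } },
];
console.log(keepMatching(rows, /\$\d+/).length); // 1
console.log(keepMatching(rows, '.price', /^\$\d+$/).length); // 1
```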

page / paginate

( selector , [limit] )

Paginate the previous request limit times based on selector.


selector (String) - A selector string for either:

  • an element with the next page URL in its inner text or in an attribute that commonly contains a URL (href, src, etc.)
  • an element whose name and value attributes will respectively be added or replaced in the next page query.

selector (Object) - An object where each key is a query parameter name and each value is either a selector string or an increment amount (+1, -1, etc.).


limit (Number) - Total number of "next page" requests to make.

limit (String) - A selector string for an element containing the total number of requests to make.

.paginate('a.nextPage') // go to `a.nextPage` `@href`
.paginate('link[rel="next"]@href') // go to `link` `@href`
.paginate('input[name="page"]') // update `page` parameter of the next query

// adds 20 to the `startIndex` query parameter
// sets `page` query parameter to `a.nextPage` content
// stops after 15 requests are made
.paginate({ startIndex: +20,  page: 'a.nextPage' }, 15)
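The object form can be read as a rule for deriving the next query from the current one: numeric values are added to the existing parameter, and string values are looked up as selectors. This sketch assumes a hypothetical `lookup(selector)` function in place of real page parsing and a fixed numeric limit; it is not the Osmosis implementation.

```javascript
// Derive the next page's query parameters from the current ones.
// `lookup` is a hypothetical stand-in for reading a selector from the page.
function nextQuery(query, spec, lookup) {
  const next = { ...query };
  for (const [param, rule] of Object.entries(spec)) {
    if (typeof rule === 'number') {
      next[param] = Number(query[param] || 0) + rule; // increment, e.g. +20 or -1
    } else {
      next[param] = lookup(rule); // selector string, e.g. 'a.nextPage'
    }
  }
  return next;
}

// Build the queries for `limit` "next page" requests.
function paginateQueries(firstQuery, spec, limit, lookup) {
  const queries = [];
  let query = firstQuery;
  for (let i = 0; i < limit; i++) {
    query = nextQuery(query, spec, lookup);
    queries.push(query);
  }
  return queries;
}

let pageNo = 1;
const fakeLookup = () => String(++pageNo); // pretend 'a.nextPage' reads "2", "3", ...
const queries = paginateQueries({ startIndex: 0 }, { startIndex: +20, page: 'a.nextPage' }, 3, fakeLookup);
console.log(queries);
```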

pause / resume / stop

Pause, resume or stop an Osmosis instance.
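Pause, resume and stop are typical controls on a queue of pending work. The class below is a hypothetical, synchronous model of those three states written for illustration; it is not how Osmosis schedules requests internally.

```javascript
// Hypothetical task queue with pause / resume / stop controls.
class TaskQueue {
  constructor(tasks) {
    this.tasks = [...tasks];
    this.paused = false;
    this.stopped = false;
    this.done = [];
  }
  run() {
    // Process tasks until the queue is empty, paused, or stopped.
    while (this.tasks.length && !this.paused && !this.stopped) {
      const task = this.tasks.shift();
      this.done.push(task(this));
    }
  }
  pause() { this.paused = true; }
  resume() { this.paused = false; this.run(); }
  stop() { this.stopped = true; this.tasks = []; } // pending work is dropped
}

const queue = new TaskQueue([
  () => 'page 1',
  (q) => { q.pause(); return 'page 2'; }, // pauses after this task
  () => 'page 3',
]);
queue.run();
console.log(queue.done); // [ 'page 1', 'page 2' ]
queue.resume();
console.log(queue.done); // [ 'page 1', 'page 2', 'page 3' ]
```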


( string )

Parse an HTML or XML string


string - A string or buffer containing the HTML/XML data


( name , selector)

Set name to the value of selector

( object )

Set each key to the value of each val selector.

.set('title') // set 'title' to current element text
.set('title', 'a.title') // set 'title' to text of 'a.title'
.set({
    title: 'a.title',
    description: 'p.description',
    url: 'a.permalink @href',
    images: ['img @src'],
    comments: [{
        'author': '.author',
        'content': 'p.content',
        'date': '.date'
    }]
})

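The object form describes the shape of the output rather than the steps to build it. This is a conceptual sketch of resolving such a spec, assuming a hypothetical `query(selector)` function that returns every matching text: a string takes the first match, and a one-element array `[selector]` collects all matches. Nested specs are deliberately left out, since they depend on Osmosis's own scoping rules.

```javascript
// `query` is a hypothetical stand-in: selector -> array of matching texts.
function resolveSpec(spec, query) {
  const out = {};
  for (const [key, value] of Object.entries(spec)) {
    if (typeof value === 'string') {
      out[key] = query(value)[0]; // single value: first match
    } else if (Array.isArray(value) && typeof value[0] === 'string') {
      out[key] = query(value[0]); // ['selector']: every match
    } else {
      throw new TypeError(`unsupported spec for ${key}`); // nested specs omitted here
    }
  }
  return out;
}

const page = {
  'a.title': ['Hello'],
  'p.description': ['A short post'],
  'img @src': ['a.png', 'b.png'],
};
const query = (selector) => page[selector] || [];

console.log(resolveSpec({ title: 'a.title', description: 'p.description', images: ['img @src'] }, query));
```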

( selector , [data] )

Submit a form


selector - A selector for the <form> element or submit button.

data (optional) - An object where each key and value represents a form input name and value
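Submitting a form typically means starting from the form's existing input values and overriding some of them. This sketch assumes a hypothetical list of `{ name, value }` inputs and shows the optional data object taking precedence; it encodes the result with the standard `URLSearchParams` class, not with Osmosis.

```javascript
// Hypothetical form inputs as { name, value } pairs in document order.
function buildFormBody(inputs, data = {}) {
  const fields = {};
  for (const { name, value } of inputs) {
    if (name) fields[name] = value; // inputs without a name are not submitted
  }
  Object.assign(fields, data); // the data argument overrides or adds fields
  return new URLSearchParams(fields).toString();
}

const inputs = [
  { name: 'csrf', value: 'abc123' },
  { name: 'q', value: '' },
  { name: 'sort', value: 'new' },
];
console.log(buildFormBody(inputs, { q: 'osmosis' }));
// csrf=abc123&q=osmosis&sort=new
```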


( callback( context, data, [next], [done] ) )

Calls callback with the context of the current element.


The context argument is the current context at that point in the command chain. If the previous command was get, post, follow, or parse then the context will be a Document. If the previous command was find then the current context will be one of the Elements that was found.


The data argument contains values set via osmosis.set. This object can be modified in any way.


The next argument is a function that will call the next command. It takes two arguments: context and data.


The done argument is a function to call once the callback will no longer call next. It is only required if the callback calls next asynchronously (any number of times).

Note: If the callback accepts done as an argument, it must always call done, even if next was never called.
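One way a runner could tell whether a callback "accepts done" is to check how many parameters it declares. The runner below is a hypothetical sketch that uses `Function.prototype.length` for that check; whether Osmosis decides it the same way is an assumption.

```javascript
// Hypothetical runner for a then-style callback(context, data, next, done).
function runThen(callback, context, data) {
  return new Promise((resolve) => {
    const emitted = [];
    const next = (ctx, d) => emitted.push({ ctx, d });
    const done = () => resolve(emitted);
    callback(context, data, next, done);
    // A callback declaring fewer than 4 parameters is treated as synchronous:
    // it is finished as soon as it returns.
    if (callback.length < 4) resolve(emitted);
  });
}

// Asynchronous callback: calls next twice later, then signals done.
runThen((ctx, data, next, done) => {
  setTimeout(() => {
    next(ctx, { ...data, part: 1 });
    next(ctx, { ...data, part: 2 });
    done();
  }, 10);
}, 'doc', {}).then((out) => console.log(out.length)); // 2
```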


The callback will have these functions bound to its this value:

  • this.request(method, url, [data], callback([err], context), [opts])
  • this.log(msg)
  • this.debug(msg)
  • this.error(msg)


Example 1: find every ul > li and pass it to the next command

.then(function(context, data, next) {
    var items = context.find('ul > li');
    items.forEach(function(item) {
        next(item, data);
    });
})

Example 2: set data.url to the current page URL

.then(function(context, data, next) {
    data.url = context.doc().request.url;
    next(context, data);
})

Example 3: only continue if lastname != undefined

.then(function(context, data, next) {
    if (data.lastname != undefined)
        next(context, data);
})

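Example 4: using the done function (db.save is a hypothetical asynchronous storage call)

.then(function(context, data, next, done) {
    if (db.connected == false) {
        this.error('database disconnected');
        return done(); // done is required even if next is never called
    }
    var remaining = data.someArray.length;
    data.someArray.forEach(function(obj) {
        db.save(obj, function() { // hypothetical async call; saves may finish in any order
            next(context, data);
            if (--remaining === 0) done(); // only after the last save completes
        });
    });
})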