Osmosis

HTML/XML parser and web scraper for NodeJS.

Features

  • Uses native libxml C bindings
  • Clean promise-like interface
  • Supports CSS 3.0 and XPath 1.0 selector hybrids
  • Sizzle selectors, Slick selectors, and more
  • No large dependencies like jQuery, cheerio, or jsdom
  • Compose deep and complex data structures

  • HTML parser features

    • Fast parsing
    • Very fast searching
    • Small memory footprint
  • HTML DOM features

    • Load and search AJAX content
    • DOM interaction and events
    • Execute embedded and remote scripts
    • Execute code in the DOM
  • HTTP request features

    • Logs URLs, redirects, and errors
    • Cookie jar and custom cookies/headers/user agent
    • Login/form submission, session cookies, and basic auth
    • Single or multiple proxies, with handling of proxy failure
    • Retries and redirect limits

Example

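var osmosis = require('osmosis');

osmosis
    // Start from the list of Craigslist sites
    .get('www.craigslist.org/about/sites')
    // Record each location link's text, then follow its href
    .find('h1 + div a')
    .set('location')
    .follow('@href')
    // Record each category link's text, then follow its href
    .find('header + div + div li > a')
    .set('category')
    .follow('@href')
    // Keep following the "next" button through the result pages
    .paginate('.totallink + a.button.next:first')
    // Open each listing
    .find('p > a')
    .follow('@href')
    // Extract fields from the listing page; 'selector@attr' reads an attribute
    .set({
        'title':        'section > h2',
        'description':  '#postingbody',
        'subcategory':  'div.breadbox > span[4]',
        'date':         'time@datetime',
        'latitude':     '#map@data-latitude',
        'longitude':    '#map@data-longitude',
        'images':       ['img@src'] // array form collects every match
    })
    .data(function(listing) {
        // do something with listing data
    })
    .log(console.log)
    .error(console.log)
    .debug(console.log);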
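
The `.data()` callback above leaves the handling of each listing open. As one possible sketch, a small helper could tidy the object before it is stored. `normalizeListing` is a hypothetical name and not part of Osmosis. The sketch assumes the fields arrive as strings (or are missing) under the keys used in the `.set()` call above, and the sample values are made up for illustration.

```javascript
// Hypothetical helper, not part of Osmosis: tidy one scraped listing object.
// Assumes field values are strings or undefined, keyed as in the example above.
function normalizeListing(listing) {
    // Convert a numeric string to a number; return null when missing or invalid.
    function toNumber(value) {
        var n = parseFloat(value);
        return Number.isFinite(n) ? n : null;
    }

    return {
        title: (listing.title || '').trim(),
        description: (listing.description || '').trim(),
        date: listing.date ? new Date(listing.date) : null,
        latitude: toNumber(listing.latitude),
        longitude: toNumber(listing.longitude),
        images: Array.isArray(listing.images) ? listing.images : []
    };
}

// Made-up sample input, shaped like the object built by .set() above.
var sample = normalizeListing({
    title: '  Road bike  ',
    description: 'Lightly used.',
    date: '2016-05-13T10:00:00Z',
    latitude: '40.7128',
    longitude: '-74.0060',
    images: ['https://example.org/1.jpg']
});

console.log(sample.title, sample.latitude, sample.longitude);
```

Inside the scraper, the helper would be called from the callback, for example `.data(function(listing) { save(normalizeListing(listing)); })`, where `save` stands in for whatever storage step the project uses.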

Documentation

For documentation and examples, see https://rchipka.github.com/node-osmosis/

Dependencies

Donate

Please consider a donation if you depend on web scraping and Osmosis makes your job a bit easier. Your contribution allows me to spend more time making this the best web scraper for Node.

Donation offers:

  • $25 - A custom Osmosis scraper to extract the data you need efficiently and in as few lines of code as possible.
  • $25/month - Become a sponsor. Your company will be listed on this page. Priority support and bug fixes.
