Extract Postal Addresses from Web Pages, in the browser
ziprip looks for postal addresses on webpages, either in the browser or in node.js. It also tries hard to geocode the addresses it finds, usually by searching the page for things like Google Maps elements.
It's intended especially for bookmarklets and browser plugins, and it once powered a website called PlaceSteal, which never really took off.
Currently it handles UK and US addresses. Adding other English-speaking countries should be trivial; other languages would be a bit more challenging but certainly doable.
You can download the latest browser version on the downloads page, or from the
For node.js, simply:
npm install ziprip
In the browser, once the script has had a second to load:
window.ziprip.extract( document, document.URL );
ziprip has one method you should be interested in:
extract, which accepts a DOM object and a URL. For node.js you can use jsdom for your DOM object, and in the browser you just pass in the page's own DOM, as in the browser call above.
var $ziprip = require('ziprip');
var addresses = $ziprip.extract( domObject, url );
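For node.js, building the DOM object with jsdom might look roughly like the sketch below. It assumes the current jsdom API (the JSDOM constructor) and that ziprip is happy with a jsdom document in place of a browser document; neither has been checked against a particular ziprip release, and extractFromHtml is a made-up helper name.

```javascript
// Sketch only: assumes `jsdom` and `ziprip` are installed from npm, and that
// ziprip accepts a jsdom document where a browser would supply `document`.
// `extractFromHtml` is a hypothetical helper, not part of ziprip.
function extractFromHtml(html, url) {
  // Loaded inside the function so nothing is required until it is called.
  const { JSDOM } = require('jsdom');
  const ziprip = require('ziprip');

  // Giving jsdom the page URL makes document.URL and relative links
  // resolve the way they would in a browser.
  const dom = new JSDOM(html, { url: url });

  // Hand over the document plus the same URL, mirroring the browser call.
  return ziprip.extract(dom.window.document, url);
}
```

You would call it with the page's HTML source and the URL it was fetched from, for example extractFromHtml(htmlString, 'http://example.com/contact').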
This will return several address objects:
Consider the address: Prime Minister's Residence, 10 Downing Street, SW1A 2AA, UK...
title- A title for the address: "Prime Minister's Residence" in the example above. This may be the same as the first item in atoms if we didn't find a more suitable title.
atoms- An array of strings, representing the street address. Could be empty. ["10 Downing Street"] in the example above.
postcode- The postcode or zipcode for the address. Will always be set. "SW1A 2AA" in the example above.
country- The country the address is in. Will always be set. Current possible values are 'US' and 'UK' - "UK" in the example above.
lon- Coordinates for the address. Some extractors will also be able to determine these, so they're included. Should be either undefined or a number. Undefined in the example above.
isGeocoded- Boolean - are
country as one flat list: ["10 Downing Street", "SW1A 2AA", "UK"]
formatForGeocode- Returns a string suitable for passing to a geocoder. Country is included only if it's not 'US', and other fields are comma-delimited, with title omitted. "10 Downing Street, SW1A 2AA, UK"
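To make the formatForGeocode rule concrete, here is a small stand-alone sketch that rebuilds the same string from a plain object with the fields described above. It is an illustration of the documented behaviour, not ziprip's own code: sketchFormatForGeocode is a hypothetical name, and the US address is invented sample data.

```javascript
// Illustrative re-implementation of the documented rule, not ziprip's code:
// join atoms and postcode with commas, leave out the title, and append the
// country only when it is not 'US'.
function sketchFormatForGeocode(address) {
  const parts = address.atoms.concat([address.postcode]);
  if (address.country !== 'US') {
    parts.push(address.country);
  }
  return parts.join(', ');
}

// The example address used in the field descriptions.
const downingStreet = {
  title: "Prime Minister's Residence",
  atoms: ['10 Downing Street'],
  postcode: 'SW1A 2AA',
  country: 'UK'
};

// Hypothetical US address, added only to show the country being dropped.
const whiteHouse = {
  title: 'The White House',
  atoms: ['1600 Pennsylvania Avenue NW'],
  postcode: '20500',
  country: 'US'
};

console.log(sketchFormatForGeocode(downingStreet)); // 10 Downing Street, SW1A 2AA, UK
console.log(sketchFormatForGeocode(whiteHouse));    // 1600 Pennsylvania Avenue NW, 20500
```

On a real address object returned by extract you would call its own formatForGeocode method instead; the sketch is only there to show what goes into the string.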
ziprip is released under the MIT license, because all of its external dependencies use it. That, kids, is the magic of open source, or something.