Geocoding North America on AWS Amazon EC2 Cloud

eruci edited this page Feb 26, 2016 · 63 revisions

Geocoder API on the Amazon Cloud

Overview

Geocoder.ca provides a geocoding engine for North America (USA, Canada, Mexico). Geocoding is an information retrieval process that is divided into Forward Geocoding (mapping a location to latitude,longitude) and Reverse Geocoding (mapping a latitude,longitude to a location).

The engine is now available as an AMI (Amazon Machine Image), which means you can start your own geocoding engine on EC2 (Amazon's server cloud) and forward or reverse geocode locations in the USA or Canada by sending HTTP requests to our Geocoding API.

AMI Instance Sizing

The AMI will run on any EC2 instance type; however, more memory and CPUs mean faster geocoding. When an instance has just started it may be slow (1-4 seconds per query) because it is building and optimizing indexes. Everything will be optimized for your instance type in about 5-25 minutes, so be patient. Performance also depends on the instance type: a micro instance with 1G of RAM and 1 CPU cannot deliver responses faster than 1 second per query, because it has to swap a 48G index in and out of just 0.5G of RAM. For best performance, use an instance with 24G of RAM or more.

Installation

Just start an instance from the AMI and you are ready to go.

Geocoder Data

The databases come from a variety of sources:

  • For the USA, geocoder uses data from "TIGER/Line Shapefiles - New 2015 Shapefiles" and openaddresses.io.
  • For Canada, it uses data from Statistics Canada Road Network Files, openaddresses.io and data.gc.ca
  • For Mexico it uses data from openaddresses.io

All the data sources we use are free/open data.

API

After your instance has launched, you can access the API internally at http://localhost or externally at http://myec2server.amazonaws.com/

The default output is XML; you may also request CSV, JSON or JSONP.
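
As a minimal sketch of how a request URL could be assembled for your own instance, the helper below percent-encodes a location and appends the format parameter. `build_forward_url` is a hypothetical name (not part of the API), `myec2server.amazonaws.com` is the placeholder host used above, and the sketch assumes `json=1` and `geoit=csv` behave as in the examples on this page.

```python
from urllib.parse import quote, urlencode

def build_forward_url(location, host="myec2server.amazonaws.com", fmt="xml"):
    """Build a forward geocoding URL; fmt is 'xml', 'json' or 'csv'."""
    # Keep commas readable, as in the example URLs on this page.
    path = quote(location, safe=",")
    params = {}
    if fmt == "json":
        params["json"] = 1
    elif fmt == "csv":
        params["geoit"] = "csv"
    elif fmt != "xml":
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode(params)
    return f"http://{host}/{path}" + (f"?{query}" if query else "")

print(build_forward_url("1600 pennsylvania avenue,Washington,DC", fmt="json"))
```

XML is the default, so no parameter is added for it in this sketch.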

Quick Start (for more examples, see http://geocoder.ca)

Examples

*(replace geocoder.ca with myec2server.amazonaws.com to run the example on your instance)

XML

http://geocoder.ca/1600%20pennsylvania%20avenue,Washington,DC

    <geodata>
    <latt>38.8745330000</latt>
    <longt>-76.9729480000</longt>
    <standard>
    <stnumber>1600</stnumber>
    <staddress>Pennsylvania Ave</staddress>
    <city>WASHINGTON</city>
    <prov>DC</prov>
    <confidence>0.8</confidence>
    </standard>
    <cid>-14375</cid>
    </geodata>
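
A response like the one above can be read with Python's standard `xml.etree.ElementTree` module. The following is only a sketch: `parse_forward_xml` is a hypothetical helper, and it assumes the response has the same shape as the sample.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the XML example above.
SAMPLE_XML = """<geodata>
<latt>38.8745330000</latt>
<longt>-76.9729480000</longt>
<standard>
<stnumber>1600</stnumber>
<staddress>Pennsylvania Ave</staddress>
<city>WASHINGTON</city>
<prov>DC</prov>
<confidence>0.8</confidence>
</standard>
<cid>-14375</cid>
</geodata>"""

def parse_forward_xml(xml_text):
    """Return the main fields as a dict, or None when no coordinates came back."""
    root = ET.fromstring(xml_text)
    latt = (root.findtext("latt") or "").strip()
    longt = (root.findtext("longt") or "").strip()
    if not latt or not longt:
        return None
    confidence = root.findtext("standard/confidence")
    return {
        "lat": float(latt),
        "lon": float(longt),
        "city": root.findtext("standard/city"),
        "prov": root.findtext("standard/prov"),
        "confidence": float(confidence) if confidence else None,
    }

print(parse_forward_xml(SAMPLE_XML))
```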

JSON

http://geocoder.ca/1600%20pennsylvania%20avenue,Washington,DC?json=1

    {"standard":{"staddress":"Pennsylvania Ave","stnumber":"1600","prov":"DC","city":"WASHINGTON","confidence":"0.8"},
    "longt":"-76.972948","cid":"-14377","latt":"38.874533"}

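Note that the JSON sample above carries every value as a string, including the coordinates and the confidence score. A minimal sketch of converting them to numbers, assuming the response has the same keys as the sample (`parse_forward_json` is a hypothetical helper):

```python
import json

# Sample response copied from the JSON example above.
SAMPLE_JSON = (
    '{"standard":{"staddress":"Pennsylvania Ave","stnumber":"1600","prov":"DC",'
    '"city":"WASHINGTON","confidence":"0.8"},'
    '"longt":"-76.972948","cid":"-14377","latt":"38.874533"}'
)

def parse_forward_json(text):
    """Return (latitude, longitude, confidence) as floats."""
    data = json.loads(text)
    standard = data.get("standard", {})
    return (
        float(data["latt"]),
        float(data["longt"]),
        float(standard["confidence"]),
    )

print(parse_forward_json(SAMPLE_JSON))
```
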
JSONp

http://geocoder.ca/1600%20pennsylvania%20avenue,Washington,DC?json=1&callback=test

    test({"standard":
    {"staddress":"Pennsylvania Ave","stnumber":"1600","prov":"DC","city":"WASHINGTON","confidence":"0.8"},
    "longt":"-76.972948","cid":"-14382","latt":"38.874533"});

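JSONP is meant for `<script>` tags in a browser, but if such a response is read on the server side, the callback wrapper has to be stripped before parsing. One possible sketch, assuming the wrapper has the `callback(...);` shape shown above (`parse_jsonp` is a hypothetical helper):

```python
import json
import re

# Sample response copied from the JSONp example above.
SAMPLE_JSONP = '''test({"standard":
{"staddress":"Pennsylvania Ave","stnumber":"1600","prov":"DC","city":"WASHINGTON","confidence":"0.8"},
"longt":"-76.972948","cid":"-14382","latt":"38.874533"});'''

def parse_jsonp(text, callback):
    """Strip the callback wrapper and parse the JSON payload inside it."""
    pattern = r"^\s*" + re.escape(callback) + r"\((.*)\)\s*;?\s*$"
    match = re.match(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))

print(parse_jsonp(SAMPLE_JSONP, "test")["standard"]["city"])
```
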
CSV

http://geocoder.ca/1600%20pennsylvania%20avenue,Washington,DC?geoit=csv

    200,8,38.8745330000,-76.9729480000

Reverse Geocoding

http://geocoder.ca/38.8745330000,-76.9729480000?geoit=xml

    <result>
    <geodata>
    <latt>38.874533</latt>
    <longt>-76.972948</longt>
    <city>Washington</city>
    <prov>DC</prov>
    <postal>20020</postal>
    <stnumber>2300</stnumber>
    <staddress>Fairlawn Ave</staddress>
    <inlatt>38.874533</inlatt>
    <inlongt>-76.972948</inlongt>
    <distance>0.000</distance>
    <NearRoad>Fairlawn Ave</NearRoad>
    <NearRoadDistance>0.000</NearRoadDistance>
    <neighborhood>Hill East</neighborhood>
    <TimeZone>America/New_York</TimeZone>
    <AreaCode>202</AreaCode>
    <confidence/>
    <intersection>
    <street1>Pennsylvania Ave</street1>
    <street2>Fairlawn Ave</street2>
    <lattx>38.874533</lattx>
    <longtx>-76.972948</longtx>
    <city>Washington</city>
    <prov>DC</prov>
    <distance>0.000</distance>
    </intersection>
    <major_intersection>
    <street1>Pennsylvania Ave</street1>
    <street2>Fairlawn Ave</street2>
    <lattx>38.8745330000</lattx>
    <longtx>-76.9729480000</longtx>
    <city>Washington</city>
    <prov>DC</prov>
    <distance>0.000</distance>
    </major_intersection>
    <usa>
    <latt>38.8745330000</latt>
    <longt>-76.9729480000</longt>
    <uscity>Washington</uscity>
    <state>DC</state>
    <zip>20020</zip>
    <usstnumber>2300</usstnumber>
    <usstaddress>Fairlawn Ave</usstaddress>
    <inlatt>38.874533</inlatt>
    <inlongt>-76.972948</inlongt>
    <distance>0.000</distance>
    </usa>
    </geodata>
    </result> 

Fulltext Geocoding - Extracting and Geocoding Locations from free form text

[This and That and The other Street Porters Lake NS]

http://geocoder.ca/?scantext=This%20and%20That%20and%20The%20other%20Street%20Porters%20Lake%20NS&geoit=csv

    "stnumber","staddress","city","prov","latitude","longitude","confidence"
    "","This St And That St","Porters Lake","NS","44.7383729970","-63.3054570622","1"
    "","The Other St And That St","Porters Lake","NS","44.7388921810","-63.3031862750","1"
    "","The Other St And This St","Porters Lake","NS","44.7378788277","-63.3051008290","1"
    "","The Other St","Porters Lake","NS","44.7381666313","-63.3046677934","0.6"
    "","That St","Porters Lake","NS","44.7386080551","-63.3038505835","0.4"
    "","This St","Porters Lake","NS","44.7383729810","-63.3054570770","0.3"

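Because the scantext CSV output starts with a header row, it can be read with Python's `csv.DictReader`. The sketch below uses three rows from the sample above; `parse_scantext_csv` and the 0.5 threshold are illustrative assumptions, not part of the API.

```python
import csv
import io

# Header and three rows copied from the scantext example above.
SAMPLE_CSV = '''"stnumber","staddress","city","prov","latitude","longitude","confidence"
"","This St And That St","Porters Lake","NS","44.7383729970","-63.3054570622","1"
"","The Other St","Porters Lake","NS","44.7381666313","-63.3046677934","0.6"
"","This St","Porters Lake","NS","44.7383729810","-63.3054570770","0.3"
'''

def parse_scantext_csv(text, min_confidence=0.0):
    """Return rows with numeric coordinates, keeping those at or above min_confidence."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        row["confidence"] = float(row["confidence"])
        if row["confidence"] >= min_confidence:
            rows.append(row)
    return rows

for match in parse_scantext_csv(SAMPLE_CSV, min_confidence=0.5):
    print(match["staddress"], match["confidence"])
```
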
US Zip+4 Code Info

http://geocoder.ca/?postal=75704-2933&geoit=xml&json=1&standard=1&showcountry=1&moreinfo=1

    {"country":"US","standard":
     {"staddress":"County Road 468","stnumber":"13501","prov":"TX","city":"Tyler","postal":"757042933","confidence":"0.8"},
    "longt":"-95.392415","TimeZone":"America/Chicago","county":"SMITH","AreaCode":"430,903","latt":"32.424704"}

Canadian Postal Code Info

http://geocoder.ca/?postal=k2c1n5&geoit=xml&json=1&standard=1&showcountry=1&moreinfo=1

    {"country":"CA","standard":{"staddress":{},"stnumber":"1","prov":"ON","city":"Ottawa","confidence":"0.9"},"longt":"-75.701939","TimeZone":"America/Toronto","postal":"K2C1N5","AreaCode":"613","latt":"45.368932"}
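
In the sample above, the empty street address arrives as an empty object (`"staddress":{}`) rather than an empty string. A small sketch of normalizing such values to `None` before use; `clean_standard` is a hypothetical helper, and treating `{}` as "no value" is an assumption drawn from this one sample.

```python
import json

# Sample response copied from the Canadian postal code example above.
SAMPLE_JSON = (
    '{"country":"CA","standard":{"staddress":{},"stnumber":"1","prov":"ON",'
    '"city":"Ottawa","confidence":"0.9"},"longt":"-75.701939",'
    '"TimeZone":"America/Toronto","postal":"K2C1N5","AreaCode":"613","latt":"45.368932"}'
)

def clean_standard(standard):
    """Replace empty containers and empty strings with None."""
    return {
        key: (None if value in ({}, "", None) else value)
        for key, value in standard.items()
    }

data = json.loads(SAMPLE_JSON)
print(clean_standard(data["standard"]))
```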

Get Canadian postal code polygon boundary

http://geocoder.ca/?postal=k2c1n5&geoit=xml&getpolygon=1

    <geodata>
    <latt>45.368932</latt>
    <longt>-75.701939</longt>
    <postal>K2C1N5</postal>
    <boundary>
     45.365633,-75.700757,45.369154,-75.700748,45.370316,-75.701609,45.370624,-75.704641,45.370624,-75.704641,45.365633,-75.700757,
    </boundary>
    <AreaCode>613</AreaCode>
    <TimeZone>America/Toronto</TimeZone>
    <standard>
     <stnumber>1</stnumber>
     <staddress/>
     <city>Ottawa</city>
     <prov>ON</prov>
     <confidence>0.9</confidence>
    </standard>
    </geodata>
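
The boundary element is a flat, comma-separated list of numbers with a trailing comma. Judging by the values in the sample (45.x next to -75.x), the numbers alternate latitude, longitude; that ordering is an inference from the sample, not a documented guarantee. A sketch of turning the list into coordinate pairs (`parse_boundary` is a hypothetical helper):

```python
# Boundary text copied from the sample above, including the trailing comma.
SAMPLE_BOUNDARY = """
 45.365633,-75.700757,45.369154,-75.700748,45.370316,-75.701609,45.370624,-75.704641,45.370624,-75.704641,45.365633,-75.700757,
"""

def parse_boundary(text):
    """Return a list of (latitude, longitude) tuples from the boundary text."""
    # Skip empty pieces, such as the one after the trailing comma.
    values = [float(piece) for piece in text.split(",") if piece.strip()]
    if len(values) % 2 != 0:
        raise ValueError("boundary has an odd number of coordinates")
    return list(zip(values[0::2], values[1::2]))

points = parse_boundary(SAMPLE_BOUNDARY)
print(len(points), points[0])
```

In this sample the first and last points are identical, so the polygon is already closed.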

Show Streets and Census Data for a Postal/Zip Code

http://geocoder.ca/R2V1K1?showaddrs=R2V1K1&geoit=xml

    <geodata>
    <latt>49.941366</latt>
    <longt>-97.116340</longt>
    <postal>R2V1K1</postal>
    <AreaCode>204</AreaCode>
    <TimeZone>America/Winnipeg</TimeZone>
    <standard>
     <stnumber>1</stnumber>
     <staddress/>
     <city>Winnipeg</city>
     <prov>MB</prov>
     <confidence>0.9</confidence>
    </standard>
    <SDNAME>Winnipeg</SDNAME>
    <CDUID>4611</CDUID>
    <CDNAME>Division No. 11</CDNAME>
    <ERNAME>Winnipeg</ERNAME>
    <CCSNAME/>
    <ERUID>4650</ERUID>
    <SACCODE>602</SACCODE>
    <SACTYPE>1</SACTYPE>
    <CDTYPE>CDR</CDTYPE>
    <CSDTYPE>CY</CSDTYPE>
    <CTUID>6020553.00</CTUID>
    <CTNAME>0553.00</CTNAME>
    <CMAUID>602</CMAUID>
    <CMANAME>Winnipeg</CMANAME>
    <CMATYPE>B</CMATYPE>
    <CMAPUID>46602</CMAPUID>
    <streets>
     <street>222,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>224,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>226,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>228,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>230,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>232,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>236,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>238,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>240,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>242,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>246,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>248,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>254,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>258,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>260,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>264,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>268,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>270,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>272,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>212,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>214,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>216,Forrest AVE,Winnipeg,MB R2V1K1</street>
     <street>220,Forrest AVE,Winnipeg,MB R2V1K1</street>
    </streets>
    </geodata>
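
Each street element above packs the number, street, city, and province plus postal code into one comma-separated string. A sketch of splitting them apart, assuming every entry has exactly the four comma-separated parts seen in the sample (`parse_streets` is a hypothetical helper, and the sample is trimmed to two entries):

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample response above.
SAMPLE_XML = """<geodata>
<postal>R2V1K1</postal>
<streets>
 <street>222,Forrest AVE,Winnipeg,MB R2V1K1</street>
 <street>224,Forrest AVE,Winnipeg,MB R2V1K1</street>
</streets>
</geodata>"""

def parse_streets(xml_text):
    """Return one dict per street entry."""
    root = ET.fromstring(xml_text)
    addresses = []
    for element in root.findall("streets/street"):
        # Raises ValueError if an entry does not have exactly four parts.
        number, street, city, prov_postal = [p.strip() for p in element.text.split(",")]
        prov, postal = prov_postal.split()
        addresses.append({
            "stnumber": number,
            "staddress": street,
            "city": city,
            "prov": prov,
            "postal": postal,
        })
    return addresses

print(parse_streets(SAMPLE_XML))
```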

Match Partial Street Names

http://geocoder.ca/?cityname=Ottawa&provname=on&streetname=metcalfe&matchonly=1&geoit=xml

    <geodata>
    <info>
    <code>222</code>
    <description>Matched results.</description>
    <total>105</total>
    </info>
    <match>
    <zip>K1S3P3</zip>
    <stno>508</stno>
    <addr>METCALFE ST</addr>
    <city>OTTAWA</city>
    <prov>ON</prov>
    <latt>45.4086980000</latt>
    <longt>75.6853830000</longt>
    <name>METCALFE</name>
    <type>ST</type>
    <direction/>
    </match>

....
    </geodata>

Geocoding Intersections

http://geocoder.ca/?locate=Highway+401+%26+Yonge+Toronto&geoit=xml

    <geodata>
     <latt>43.753504</latt>
     <longt>-79.408326</longt>
     <city>Toronto</city>
     <prov>ON</prov>
     <street1>Yonge ST</street1>
     <street2>401 HWY</street2>
    </geodata>
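
The `&` between the two street names must be percent-encoded (`%26`), otherwise it would be read as a query-string separator. Python's `urllib.parse.urlencode` does this automatically; the sketch below reproduces the request URL above (`build_locate_url` is a hypothetical helper):

```python
from urllib.parse import urlencode

def build_locate_url(query, host="geocoder.ca", geoit="xml"):
    """Build a locate URL, encoding spaces as '+' and '&' as '%26'."""
    return f"http://{host}/?" + urlencode({"locate": query, "geoit": geoit})

print(build_locate_url("Highway 401 & Yonge Toronto"))
```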

IP Address Geocoding

http://geocoder.ca/24.122.3.136?geoit=xml

    <result>
     <geodata>
     <latt>45.504138</latt>
     <longt>-75.811115</longt>
     <city>CHELSEA</city>
     <prov>QC</prov>
     <postal>J9B1R5</postal>
     <TimeZone>America/Montreal</TimeZone>
     <AreaCode>819</AreaCode>
     </geodata>
    </result>

Canadian Parcel Geocoding

http://geocoder.ca/W5M-07-53-30-NE?geoit=xml

    <result>
     <geodata>
      <latt>53.608060</latt>
      <longt>-115.011890</longt>
      <city>Entwistle</city>
      <prov>AB</prov>
      <postal>T0E0T0</postal>
      <stnumber>53421</stnumber>
      <staddress>Range Road 75A</staddress>
      <inlatt>53.6085064142001</inlatt>
      <inlongt>-115.011165788</inlongt>
      <distance>0.069</distance>
      <NearRoad>Range Road 75A</NearRoad>
      <NearRoadDistance>0.011</NearRoadDistance>
      <betweenRoad1>50</betweenRoad1>
      <betweenRoad2>Township Road 534c</betweenRoad2>
      <TimeZone>America/Edmonton</TimeZone>
      <LSD>LSD-09 SEC-30 TWP-053 RGE-07 MER-5</LSD>
      <AreaCode>780</AreaCode>
      <confidence>1</confidence>
    </geodata>
    </result>

API in Depth

Geocoder.ca REST API for geocoding comes with the following output options.

XML: The default output format is XML.

JSON: To obtain JSON output, specify json=1 as an input parameter; to obtain JSONP output, specify jsonp=1 together with a callback parameter.

CSV: You may also obtain output in CSV format by replacing geoit=XML with geoit=CSV

/{locate}

Specifications:

Input

The following parameters must be sent using either a GET or POST HTTP method on Port 80.

Required Parameters

/?{scantext}

  • scantext
  • The only required parameter for extracting and geocoding locations from free form text, scantext, may be a text input of any length containing locations described as street addresses, street intersections or city names.
  • scantext differs from locate by returning all locations that are matched in text (ordered by confidence score), while locate only returns the top result.
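
Since scantext input may be of any length, a POST request can be more practical than a long GET URL. The sketch below only builds the request object. It assumes the API accepts a standard form-encoded POST body, which this page does not spell out; `build_scantext_request` is a hypothetical helper and the host is the placeholder used above.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_scantext_request(text, host="myec2server.amazonaws.com"):
    """Build (but do not send) a POST request for scantext with CSV output."""
    # Assumption: parameters are sent as a form-encoded body.
    body = urlencode({"scantext": text, "geoit": "csv"}).encode("ascii")
    return Request(f"http://{host}/", data=body, method="POST")

request = build_scantext_request("This and That and The other Street Porters Lake NS")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request.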

More Examples

See http://geocoder.ca/?examples=1#recent

Optional Parameters (from Geocoder.ca Extended API )

  • strict Enables strict parsing of free-form location input. Allowed value: 1
  • decimal Limits the number of decimal places in the response (note that a small number will reduce accuracy). A positive integer.
  • geoit The desired output type. Allowed values: XML or CSV
  • json Returns the output in JSON format. Allowed value: 1
  • jsonp Returns the output in JSONP format. Allowed value: 1
  • callback The callback string to use when the output is in JSONP format. Any string value.
  • utm Returns the latitude/longitude pair in the Universal Transverse Mercator (UTM) geographic coordinate system. Allowed value: 1
  • id Your own transaction id. If provided, it will be returned along with the response.
  • strictmode Prevents geocoder from making guesses about your input. For example, if you enter just a city name without the state or province, geocoder will let you choose from a list of suggestions instead of determining the most likely city. Allowed values: 0 or 1
  • showpostal If you supply just a street address (or intersection), instructs the algorithm to return the postal code of the location along with the latitude/longitude pair. Allowed value: 1
  • showaddrs If you supply a postal or zip code with your query, attempts to return all addresses associated with that zip or postal code. Allowed value: 1
  • topmatches If you supply a partial street address and wish to obtain a fixed number of the most likely suggestions, sets the maximum number of suggestions desired in the response. A positive integer.
  • postal A six-character Canadian postal code, or a US Zip5 or Zip+4 code. Canadian postal codes should follow the format ANANAN, where A represents a letter and N a number.
  • getpolygon Returns the postal code polygon boundary. Allowed value: 1
  • addresst (Deprecated) The name of the street address. A string no longer than 220 bytes.

Output

The output may contain the following XML-formatted parameters.

Parameter Name Description Expected Output Values

  • latt The latitude. A decimal number.
  • longt The longitude. A decimal number.
  • cid Call id. (You may also supply your own id= to track queries.)
  • id Transaction id. If supplied in the input, the transaction id will be returned along with the output.
  • stnumber Returned if you requested a properly formatted address. The street number (integer value).
  • staddress Returned if you requested a properly formatted address. The formatted street address.
  • city Returned if you requested a properly formatted address. The city name.
  • prov Returned if you requested a properly formatted address. The province code.
  • confidence The Geocoding Confidence Score is a number representing our accuracy estimate for a geocoding request. It ranges from 0 to 1; a higher score indicates a closer match, with 1 meaning the best match accuracy. A result with a confidence score below 0.5 is never returned to the user (it will most likely result in a suggestion being returned instead), except for IP address geocoding, where rough approximations are allowed because in most cases we are looking at city-level accuracy. A number in [0..1]: 1 = best match, 0 = no response.

Error Codes

The output could contain an error code. If your query does not produce coordinates the latt and longt containers will be empty.

These are the error codes that could be returned upon an unsuccessful lookup:

  • 000 Licensing Problem.

  • 005 Postal Code is not in the proper Format.

  • 004 Specify either a Canadian province two letter code or a US two letter state abbreviation.

  • 007 Supply a valid query.

  • 008 Your request did not produce any results. Check your spelling and try again.

Sometimes when a location is not found a suggestion (or more) will be contained within xml tags.
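
As a sketch of how a client might react to a failed lookup, the code below maps the error codes listed above to their descriptions and checks whether the latt and longt containers are empty. The exact layout of an error response is not shown on this page, so the failed response in the demo is a made-up minimal example; `describe_error` and `has_coordinates` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Descriptions copied from the error code list above.
ERROR_CODES = {
    "000": "Licensing Problem.",
    "004": "Specify either a Canadian province two letter code or a US two letter state abbreviation.",
    "005": "Postal Code is not in the proper Format.",
    "007": "Supply a valid query.",
    "008": "Your request did not produce any results. Check your spelling and try again.",
}

def describe_error(code):
    """Return the description for an error code given as a string or an integer."""
    return ERROR_CODES.get(str(code).zfill(3), "Unknown error code")

def has_coordinates(xml_text):
    """Return True when both the latt and longt containers are non-empty."""
    root = ET.fromstring(xml_text)
    latt = (root.findtext("latt") or "").strip()
    longt = (root.findtext("longt") or "").strip()
    return bool(latt and longt)

# Hypothetical failed response: only the empty containers are documented.
FAILED = "<geodata><latt></latt><longt></longt></geodata>"
print(has_coordinates(FAILED), describe_error("008"))
```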

Reverse Geocoding

Reverse geocoding is the process in which you supply a point expressed as a latitude-longitude pair and get back the closest civic street address to that point, along with the postal/zip code, the nearest intersection, the nearest major intersection, and the road having the smallest vertical distance to the point.

Specifications:

Input

To perform reverse geocoding, you may either pass a latitude,longitude pair to the API as described above, or send the following parameters to http://geocoder.ca using either a GET or POST HTTP method.

Parameter Name Description Permitted Values

  • latt The latitude. A decimal number.
  • longt The longitude. A decimal number.
  • range The optional range to limit the reverse geocoding. A number.
  • exact Optional. If specified, geocoder will attempt to find an exact match (meaning the reverse of a rooftop geocoding request). Permitted value: 1
  • moreinfo Optional. Causes the XML port to return the county and metro area information. Permitted value: 1
  • allna Optional. Indicates that you wish to perform the reverse geocoding in both the USA and Canada, in some cases (for areas close to the border) obtaining two results instead of one. This parameter is also required for reverse geocoding in the US. Permitted value: 1
  • decimal Optional. Limits the number of decimal places in the response (note that a small number will reduce accuracy). A positive integer.
  • reverse An integer indicating your interface preference. Permitted value: 1

Input Requirements:

You must provide a latitude-longitude pair.
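
A minimal sketch of building a parameter-based reverse geocoding request from the parameters above. `build_reverse_url` is a hypothetical helper; it sends `allna=1` by default because this page says that parameter is required for US lookups, and it rejects coordinates outside the usual latitude (-90 to 90) and longitude (-180 to 180) ranges before any request is made.

```python
from urllib.parse import urlencode

def build_reverse_url(latt, longt, host="geocoder.ca", allna=True, **extra):
    """Build a reverse geocoding URL with XML output."""
    if not (-90 <= latt <= 90 and -180 <= longt <= 180):
        raise ValueError("latitude/longitude out of range")
    params = {"latt": latt, "longt": longt, "reverse": 1, "geoit": "xml"}
    if allna:
        params["allna"] = 1
    # Extra optional parameters, e.g. range=..., moreinfo=1, decimal=...
    params.update(extra)
    return f"http://{host}/?" + urlencode(params)

print(build_reverse_url(38.874533, -76.972948, moreinfo=1))
```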

Output

The output will contain these parameters in XML format.

Parameter Name Description Expected Output Values

  • latt The latitude of the result. A decimal number.
  • longt The longitude of the result. A decimal number.
  • city The city of the result set
  • prov The province or state
  • postal The postal or zip code.
  • stnumber The street number.
  • staddress The street address.
  • inlatt The input latitude.
  • inlongt The input longitude.
  • distance The distance of the result location from the input location.
  • NearRoad The nearest Road to the input point.
  • NearRoadDistance The distance of the nearest Road to the input point.
  • neighborhood The city neighborhood that the input point falls in.

If a US location is returned by the reverse geocoding engine, the response will be contained within <usa> tags, as in the reverse geocoding example above.

Optionally, the output may contain additional information about the closest street corner as well as the closest major intersection. These responses will be held inside <intersection> and <major_intersection> XML tags respectively, each with the following fields:

  • street1 The first street of the intersection. A string
  • street2 The second street of the intersection. A string
  • lattx The latitude of the intersection. A decimal number
  • longtx The longitude of the intersection. A decimal number
  • city The city of the intersection. A string
  • prov The province of the intersection. The two letter Canadian province code or US two letter state abbreviation.
  • distance The distance of the intersection from the input location. A decimal number expressed in kilometres
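
The intersection fields listed above can be pulled out of a reverse geocoding response as follows. This is only a sketch on a trimmed copy of the earlier sample, which wraps these fields in `<intersection>` and `<major_intersection>` elements; `parse_intersection` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the reverse geocoding sample shown earlier on this page.
SAMPLE_XML = """<result>
<geodata>
<latt>38.874533</latt>
<longt>-76.972948</longt>
<intersection>
<street1>Pennsylvania Ave</street1>
<street2>Fairlawn Ave</street2>
<lattx>38.874533</lattx>
<longtx>-76.972948</longtx>
<city>Washington</city>
<prov>DC</prov>
<distance>0.000</distance>
</intersection>
</geodata>
</result>"""

def parse_intersection(geodata, tag):
    """Return the intersection held in <tag>, or None if it is absent."""
    node = geodata.find(tag)
    if node is None:
        return None
    return {
        "streets": (node.findtext("street1"), node.findtext("street2")),
        "lat": float(node.findtext("lattx")),
        "lon": float(node.findtext("longtx")),
        "distance_km": float(node.findtext("distance")),
    }

geodata = ET.fromstring(SAMPLE_XML).find("geodata")
print(parse_intersection(geodata, "intersection"))
print(parse_intersection(geodata, "major_intersection"))
```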

Possible extra error codes related to reverse geocoding:

  • 009 The latitude and longitude you provided are not in the valid range. yy.yyy -xx.xxx

Extra parameters :

  • strict - Value: 1 - Perform a strict match

  • nopostal - Value: 1 - Ignore the postal code (i.e. return a street level match only)

More Reverse Geocoding:

You may also make reverse geocoding lookups using a Canadian postal code or a US zip code as input. The output may be provided in XML or JSONP.

Accuracy and Coverage

We currently have over 99% coverage in the USA and Canada, and over 94% accuracy (result within 100m of ground truth/actual location). Our testing methodology and results may be seen here: https://github.com/eruci/openaddresses/tree/master/test

What is the geocoding accuracy?

We generally return two types of geocoding results: Rooftop / Parcel Geocoding and Street Interpolation Geocoding. (The other types are based on postal/zip codes, city/neighborhood names, IP addresses, etc.)

Rooftop / Parcel Geocoding is the most precise Geocoding methodology because it returns an exact match to a single address point (normally placed at the center of the "roof" of the property/parcel, hence "Rooftop".)

Street Interpolation Geocoding works by estimating a location within a known address range, a method also known as address interpolation. The geocoding algorithm matches an address to a street and a specific segment, then interpolates the position within that segment. Since this is an approximation, this method is generally less accurate than the "Rooftop" method. Currently over 40% of the geocoding requests we receive return Rooftop results with Confidence Score = 1. Interpolated results have a maximum Confidence Score of 0.9. For example:

  1. 2525 Olympic ST, SPRINGFIELD, OR 97477 (Rooftop) Confidence Score=1

  2. 2524 Olympic ST, SPRINGFIELD, OR 97477 (Interpolated) Confidence Score=0.9
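
Based on the scores described above, a client could apply a rough heuristic to street-address results: a score of 1 suggests a rooftop match, and anything lower suggests an interpolated or coarser match. This is only a sketch of that reading; `match_type` is a hypothetical helper, and the API itself does not return these labels.

```python
def match_type(confidence):
    """Classify a street-address result by its confidence score (heuristic)."""
    score = float(confidence)
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Interpolated results are capped at 0.9, so only a full 1 suggests rooftop.
    return "rooftop" if score == 1.0 else "interpolated or coarser"

print(match_type("1"))    # 2525 Olympic ST in the example above
print(match_type("0.9"))  # 2524 Olympic ST in the example above
```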

More API Examples

See Geocoder.ca and Premium API Docs