This PostgreSQL extension implements a Foreign Data Wrapper (FDW) for the OpenCageData API.
Please note that this version of ocgeo_fdw works with PostgreSQL version 9.6, 10, 11, 12, 13, and 14.
To build on POSIX-compliant systems you need to ensure that the PostgreSQL "-dev" package in
installed and that the pg_config
executable is in your path when you run make. This executable
is typically in your PostgreSQL installation's bin directory. E.g. in a recent
Ubuntu Linux you can install
postgresql-server-dev-12
:
sudo apt-get install postgresql-server-dev-12
Also libcurl
should have been installed, for example this package for Ubuntu, and please make sure that curl-config
is in the PATH
.
Then it's a matter of running (GNU) make like so:
make && sudo make install
CREATE EXTENSION ocgeo_fdw;
The server provides global information about the OpenCageData API. The only configuration option required is the API endpoint. The server can be defined as follows:
CREATE SERVER ocdata_server FOREIGN DATA WRAPPER ocgeo_fdw
OPTIONS (uri 'https://api.opencagedata.com/geocode/v1/json');
The name of option for the endpoint should be uri
and the URL given should correspond to
the API returning JSON format. Other than that, the name
of the server can be anything you like, instead of ocdata_server
.
Different users of your database can have different keys for accessing the API. The following statement is necessary for configuring the API key used for the current user.
CREATE USER MAPPING FOR current_user SERVER ocdata_server
OPTIONS (api_key '6d0e711d72d74daeb2b0bfd2a5cdfdba',
max_reqs_sec '1');
Please make sure that you are using the correct server name you have defined if you specified
something different in the CREATE SERVER
command
For testing purposes the above SQL command will store the 6d0e711d72d74daeb2b0bfd2a5cdfdba
testing API key that returns successful, but always
the same, responses. The above can be used to check whether the FDW works properly but you can
register for the free plan to get a valid key for more
extensive testing purposes, or, better yet, become a customer!.
If you want to change the API key, you can use the following:
ALTER USER MAPPING FOR current_user SERVER ocdata_server OPTIONS (SET api_key '<your API key>');
The max_reqs_sec
option is the maximum number of requests per second that according to the
selected plan the user is allowed to make. This option is
optional and defaults to 1 which is the corresponding value for the free plan.
The foreign table is a "virtual" table allowing to make requests to the OpenCageData API when used
in SQL SELECT commands. The following is a typical definition that allows to all the information
returned by the JSON API in the components
and formatted
fields:
CREATE FOREIGN TABLE ocgdata_api (
json_response JSONB,
bounding_box BOX,
location POINT,
_category TEXT,
_type TEXT,
city TEXT,
city_district TEXT,
continent TEXT,
country TEXT,
country_code TEXT,
county TEXT,
house_number TEXT,
political_union TEXT,
neighbourhood TEXT,
postcode TEXT,
road TEXT,
road_type TEXT,
state TEXT,
state_code TEXT,
state_district TEXT,
suburb TEXT,
confidence INTEGER,
formatted TEXT,
-- The following is the input to the select
q text
) SERVER ocdata_server;
The name of the table can be anything you like (instead of ocgdata_api
) but, as
with the definition of the User Mapping, it is important to use the correct server name.
With the above definition in place, you can do single API calls like so:
SELECT confidence, _type, _category, location, formatted
FROM ocgdata_api
WHERE q='Trierer Straße 15, 99423, Weimar, Deutschland';
The important thing to remember is to use the q
attribute to provide the search query.
Providing an "equal restriction" on q
is required, otherwise no API call will be made
(but see below on 'More advanced queries' for JOIN queries). Giving a latitude - longitude
coordinates separated by ,
(comma) as a query will result in a
reverse geocoding request. Another attribute
that can be used as input is the confidence
: Putting a >=
condition on this, will make an API
request using the min_confidence
parameter.
Example:
> SELECT _type, formatted FROM ocgdata_api WHERE q='eiffel tower, france' AND confidence>=5;
+-------------+-----------------------------------------------------------------------------+
│ _type │ formatted │
+-------------+-----------------------------------------------------------------------------+
│ attraction │ Eiffel Tower, 5 Avenue Anatole France, 75007 Paris, France │
│ information │ Eiffel Tower, Esplanade des Ouvriers de la Tour Eiffel, 75007 Paris, France │
│ road │ Rue Gustave Eiffel, 13010 Marseille, France │
│ road │ Rue Gustave Eiffel, 45000 Orléans, France │
│ road │ Rue Gustave Eiffel, 34000 Montpellier, France │
│ road │ Pont Eiffel, 27000 Évreux, France │
│ road │ Rue Gustave Eiffel, 33100 Bordeaux, France │
│ road │ Rue Gustave Eiffel, 84000 Avignon, France │
│ road │ Rue Gustave Eiffel, 18000 Bourges, France │
│ road │ Avenue Gustave Eiffel, 21000 Dijon, France │
│ road │ Rue Gustave Eiffel, 72100 Le Mans, France │
│ road │ Rue Gustave Eiffel, 30000 Nîmes, France │
│ road │ Rue Gustave Eiffel, 38000 Grenoble, France │
│ road │ Impasse Eiffel, 44700 Orvault, France │
│ road │ Rue Gustave Eiffel, 82000 Montauban, France │
│ road │ Rue Gustave Eiffel, 81000 Albi, France │
│ road │ Rue Gustave Eiffel, 89000 Auxerre, France │
│ road │ Rue Gustave Eiffel, 79000 Niort, France │
│ road │ Rue Gustave Eiffel, 94000 Créteil, France │
│ road │ Rue Gustave Eiffel, 11000 Carcassonne, France │
│ road │ Rue Gustave Eiffel, 22000 Saint-Brieuc, France │
│ road │ Rue Gustave Eiffel, 86100 Châtellerault, France │
│ road │ Rue Gustave Eiffel, 85000 La Roche-sur-Yon, France │
│ road │ Rue Gustave Eiffel, 60000 Beauvais, France │
│ road │ Avenue Eiffel, 78420 Carrières-sur-Seine, France │
+-------------+-----------------------------------------------------------------------------+
> SELECT country, COUNT(*) FROM ocgdata_api WHERE q='paris' AND _type='city' GROUP BY country ORDER BY 2;
+--------------------------+-------+
| country | count |
+--------------------------+-------+
| France | 1 |
| Canada | 2 |
| United States of America | 7 |
+--------------------------+-------+
Assuming we have a (local) table with our users and there's an address
attribute with
the user supplied home address, we can join the two tables to get geocoding information
as follows:
SELECT user_id, ocgdata_api.*
FROM users LEFT JOIN ocgdata_api ON q=users.address
WHERE confidence >= 5;
For executing this, PostgreSQL will select a "nested loop" plan where for each row in the
users
table an API call will be performed through the FDW. So this will result in making
as many API requests as rows in the users
table.
Note: This may result in a large number of API requests sent in quick succession. The FDW
performs a rate limiting mechanism to make sure that the maximum number of API calls per
second do not exceed the user's plan, as configured by the USER MAPPING
. But it
currently does not perform any throttling with respect to the maximum number of calls
per day. To make sure that even this limitation is respected, you can use the
ocgeo_stats
function described bellow.
If such command is used frequently, it may be a good idea to create a materialized
view:
CREATE MATERIALIZED VIEW users_locations AS
SELECT user_id, ocgdata_api.*
FROM users LEFT JOIN ocgdata_api ON q=users.address
..and then perform any subsequent queries on the local materialized view. When the users
base table is updated you need to perform a REFRESH MATERIALIZED VIEW users_locations
to
recreate the contents of the view.
If the definition of the foreign table includes an attribute of type JSONB, ocgeo_fdw
will store the JSON result message there. This permits more information to be retrieved since there are plenty PostgreSQL operators and functions for processing JSON data. For example, we can get the currency
information from the annotations
field of the JSON result, as shown next:
WITH temp(js) AS
(SELECT json_response FROM ocgdata_api WHERE q='taj mahal, India')
SELECT jsonb_pretty(js->'annotations'->'currency') AS currency
FROM temp;
+---------------------------------+
│ currency │
+---------------------------------+
│ { │
│ "name": "Indian Rupee", │
│ "symbol": "₹", │
│ "subunit": "Paisa", │
│ "iso_code": "INR", │
│ "html_entity": "₹", │
│ "iso_numeric": "356", │
│ "decimal_mark": ".", │
│ "symbol_first": 1, │
│ "subunit_to_unit": 100, │
│ "alternate_symbols": [ │
│ "Rs", │
│ "৳", │
│ "૱", │
│ "௹", │
│ "रु", │
│ "₨" │
│ ], │
│ "thousands_separator": ",", │
│ "smallest_denomination": 50 │
│ } │
+---------------------------------+
Here I am using a "Common Table Expression" (CTE) (WITH
query) to define the subquery temp
to the foreign table which is then used in the main query.
ocgeo_fdw
uses the POINT
and BOX
geometric data types of PostgreSQL for the geometry
and bounds
JSON fields of the API response. This allows for more advanced queries using the various geometric functions and operators of PostgreSQL..
Support for PostGIS in on plan..
Using the following in the psql
command session you can debug information and the actual API
URL used:
SET client_min_messages TO DEBUG1;
For example, the following:
SELECT confidence, _type, _category, location, formatted
FROM ocgdata_api
WHERE q='Trierer Straße 15, 99423, Weimar, Deutschland';
will print:
DEBUG: ocgeoGetForeignRelSize: remote conds: 1, local conds: 0
DEBUG: ocgeoGetForeignPaths: param paths 0
DEBUG: printRestrictInfoList: (q = 'Trierer Straße 15, 99423, Weimar, Deutschland')
DEBUG: ocgeoGetForeignPlan, 6 column list, 1 scan clauses
DEBUG: function ocgeoBeginForeignScan qual: q='Trierer Straße 15, 99423, Weimar, Deutschland' and confidence>=0
DEBUG: API https://api.opencagedata.com/geocode/v1/json?q=Trierer%20Stra%C3%9Fe%2015%2C%2099423%2C%20Weimar%2C%20Deutschland&key=6d0e711d72d74daeb2b0bfd2a5cdfdba&limit=50&no_annotations=0 returned status: 200, results: 1, time: 1226.36 msec
This FDW provides an ocgeo_stats
function that can be used to retrieve information about the
usage of the API in the current session. The information returned contains:
- The total number of API calls (requests)
- The total number of failed API calls (e.g. because of network errors, rate limiting, etc)
- The total number of seconds taken in the submission of the API requests and the parsing of the JSON response, for all calls (successful and failed)
- The rate limit information as returned by the API in the most recent call. This includes the limit (i.e. the maximum number of requests per day), the remaining API calls in the current day, and the date time when the counter will be reset.
You can use this function as follows:
> SELECT * FROM ocgeo_stats();
+-----------+------------+------------+------------+----------------+------------------------+
| nbr_calls | nbr_failed | total_time | rate_limit | rate_remaining | rate_reset |
+-----------+------------+------------+------------+----------------+------------------------+
| 4 | 0 | 3.09 | 2500 | 2482 | 2020-07-15 03:00:00+03 |
+-----------+------------+------------+------------+----------------+------------------------+
The rate information is returned by the API server on each request so it is the most accurate, according to the most recent API call. On the other hand, the number of calls and total time are recorded per session so they are "local" to the current PostgreSQL connection. So, for example, based on the above we see that we have 2482 remaining API calls in the current "day", and therefore we have made 2500 - 2482 = 8 requests, but only 4 of them were performed in the current session.