DataAPI

Hiroshi Ichikawa edited this page Apr 22, 2016 · 5 revisions

The Person Finder application stores and exports records using the People Finder Interchange Format (PFIF), which is based on XML. Documentation on the format is available here:

If you'd like to automatically detect when Google Person Finder launches a new repository, see RepositoryFeed.

Requesting an API key for an existing Person Finder repository

To search, download data from, or upload data to an existing Google Person Finder repository, you need an API key. Note: You don't need an API key from Google for instances of the Person Finder application that you launch yourself. API keys from Google are only required for access to repositories hosted at google.org/personfinder that Google manages for major disaster events.

  • If you are developing an application, you can use our "test" repository with the API key 43HxMWGBijFaYEr5. When you upload data with this key, use the domain name testkey.personfinder.google.org as the prefix of record IDs.
  • For access to other repositories, request an API key here. The three types of access you can apply for are:
    • Search: Allows you to retrieve data based on a search query.
    • Write: Allows you to publish records to the database.
    • Read: Allows you to retrieve all records in the database.

When Google receives a request for an API key, we may evaluate the request based on factors including the motivation for the request and the likelihood that the request will meaningfully expand the usefulness of Person Finder and its accessibility to users. Google reserves the right to grant or deny a request for an API key for any reason, in its sole discretion. Any entity requesting an API key must agree to the Person Finder API Terms of Service.


Using the Person Finder search API

To use this API, you must apply for the "Search" access type described above.

You can access the search API here:

https://www.google.org/personfinder/repository/api/search?key=api_key&q=your query

It returns the matching results as an XML file in PFIF format. By default it returns up to 100 records. You can use a max_results=N parameter to restrict the number of result records.

If you know the PFIF person_record_id of a specific person, you can also fetch a single PFIF person record with notes at the following URL:

https://www.google.org/personfinder/repository/api/search?key=api_key&id=person_record_id
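The two URL forms above can be built programmatically. The following is a minimal sketch rather than an official client: the function name search_url and its arguments are hypothetical, and it only constructs the URL (with the query string percent-encoded) without sending a request.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.org/personfinder"


def search_url(repository, api_key, query=None, person_record_id=None,
               max_results=None):
    """Build a search API URL (hypothetical helper, not part of Person Finder).

    Pass either a free-text query (sent as q=) or a PFIF person_record_id
    (sent as id=), but not both.
    """
    if (query is None) == (person_record_id is None):
        raise ValueError("pass exactly one of query or person_record_id")
    params = {"key": api_key}
    if query is not None:
        params["q"] = query
    else:
        params["id"] = person_record_id
    if max_results is not None:
        params["max_results"] = max_results
    # urlencode percent-encodes spaces, slashes and other reserved characters.
    return "%s/%s/api/search?%s" % (BASE_URL, repository, urlencode(params))


if __name__ == "__main__":
    print(search_url("test", "43HxMWGBijFaYEr5", query="john smith",
                     max_results=10))
```

Encoding the parameters matters because a query usually contains spaces and a record ID always contains a slash.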

When displaying any content accessed via the API to a user, you must display a link to the original source of information to refer any enquiries in connection with that specific record back to the original source, and may not intermix content accessed via the API with other content in any way that makes it unclear which content came from which source.


Uploading data into Person Finder

To use this API, you must apply for the "Write" access type described above.

You can push one or more PFIF records to Person Finder by posting an XML file to the following URL:

https://www.google.org/personfinder/repository/api/write?key=api_key

PFIF 1.1, 1.2, 1.3, and 1.4 (Example) are accepted. Note that:

  • You will need an API key (see the section above for instructions on obtaining a key).
  • person_record_id and note_record_id must be in the form of domain_name/unique_string, e.g., example.com/113. domain_name must be the one you specified in the "Domain Name" field of the API key request form. If you use the "test" repository, it must be testkey.personfinder.google.org.
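The record ID rule above can be checked locally before uploading. This is a minimal sketch, assuming an exact, case-sensitive comparison of the domain part; the function name check_record_id is hypothetical and the server may apply additional rules.

```python
def check_record_id(record_id, authorized_domain):
    """Split a record ID into (domain_name, unique_string) or raise ValueError.

    Hypothetical pre-upload check; it only mirrors the
    domain_name/unique_string rule described above.
    """
    domain, separator, unique = record_id.partition("/")
    if not separator or not domain or not unique:
        raise ValueError("expected domain_name/unique_string: %r" % record_id)
    if domain != authorized_domain:
        raise ValueError("%r is not in domain %r" % (record_id, authorized_domain))
    return domain, unique


if __name__ == "__main__":
    print(check_record_id("testkey.personfinder.google.org/person.1",
                          "testkey.personfinder.google.org"))
```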

Once you have prepared an XML file, you can use the following command to upload it:

curl -X POST -H 'Content-type: application/xml' --data-binary @your_file.xml \
    'https://www.google.org/personfinder/repository/api/write?key=api_key'

NOTE: Make sure not to drop the "@" before the file name. Otherwise the literal string "your_file.xml" is sent as the POST body instead of the content of the file.

The XML document can contain <pfif:person> elements with nested <pfif:note> elements. To understand the proper XML format, see the PFIF example document. We recommend that you upload a single record or a small number of records as a test, retrieve the records using the Individual Person Record API (/api/read), and view the records on the site to verify that the results are what you expected. Pay careful attention to the handling of accented letters, note text, source URLs, and photo URLs (if you have them).

Due to the size limitation on POST requests, you should split up files into batches of 100 <pfif:person> elements. If you encounter an error, or need to correct problems in a previous upload, it is safe to upload the same records again. Records will replace existing records with the same person_record_id or note_record_id.
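Splitting a large file into batches can be automated. The sketch below assumes that all <pfif:person> elements are direct children of the root element and that notes are nested inside them; top-level notes would not be copied. The function names split_pfif and build_write_request are hypothetical. The second function only constructs the POST request and does not send it.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def split_pfif(xml_text, batch_size=100):
    """Return a list of XML documents (bytes), each with at most batch_size persons."""
    root = ET.fromstring(xml_text)
    # Reuse whatever namespace the input declares, e.g. "{http://.../pfif/1.4}".
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        ET.register_namespace("pfif", ns[1:-1])  # keep the pfif: prefix on output
    persons = root.findall(ns + "person")
    batches = []
    for start in range(0, len(persons), batch_size):
        batch_root = ET.Element(root.tag)
        batch_root.extend(persons[start:start + batch_size])
        batches.append(
            ET.tostring(batch_root, encoding="utf-8", xml_declaration=True))
    return batches


def build_write_request(batch, repository, api_key):
    """Build (but do not send) the POST request for one batch."""
    url = "https://www.google.org/personfinder/%s/api/write?%s" % (
        repository, urlencode({"key": api_key}))
    return urllib.request.Request(
        url, data=batch, headers={"Content-Type": "application/xml"},
        method="POST")
```

Each request could then be sent with urllib.request.urlopen(request), and the response body checked before moving on to the next batch.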

The response will be an XML document like this:

<?xml version="1.0"?>
<status:status>
<status:write>
<status:record_type>pfif:person</status:record_type>
<status:parsed>1</status:parsed>
<status:written>1</status:written>
<status:skipped>
</status:skipped>
</status:write>

<status:write>
<status:record_type>pfif:note</status:record_type>
<status:parsed>1</status:parsed>
<status:written>1</status:written>
<status:skipped>
</status:skipped>
</status:write>
</status:status>

Each <status:write> element describes one batch of writes. <status:record_type> indicates the type of the batch. <status:parsed> says how many XML records were successfully parsed. <status:written> says how many were written to the datastore. In the above example, 1 person and 1 note were successfully written.

When there are problems it will look like this:

<?xml version="1.0"?>
<status:status>
<status:write>
<status:record_type>pfif:person</status:record_type>
<status:parsed>1</status:parsed>
<status:written>0</status:written>
<status:skipped>
<pfif:person_record_id>google.com/person.4040</pfif:person_record_id>
<status:error>not in authorized domain: u'google.com/person.4040'</status:error>
</status:skipped>
</status:write>

<status:write>
<status:record_type>pfif:note</status:record_type>
<status:parsed>1</status:parsed>
<status:written>0</status:written>
<status:skipped>
<pfif:note_record_id>zesty.ca/note.53</pfif:note_record_id>
<status:error>ValueError: bad datetime: u'xyz'</status:error>
</status:skipped>
</status:write>
</status:status>

Each <status:skipped> entry describes the reason why a particular record was skipped, and includes the record ID if one was given.
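A script that uploads many batches may want to read this response automatically. The sketch below is one possible approach, not the project's own parser: the function name parse_write_status is hypothetical. It uses the low-level expat parser without namespace processing because the examples above show the status: and pfif: prefixes without namespace declarations, which a namespace-aware parser would reject. It pairs each <status:error> with the record ID that precedes it, if any.

```python
import xml.parsers.expat


def parse_write_status(xml_text):
    """Return one dict per <status:write> batch found in the response."""
    batches = []
    state = {"text": [], "pending_id": None}

    def start(name, attrs):
        state["text"] = []
        if name == "status:write":
            batches.append({"record_type": None, "parsed": 0,
                            "written": 0, "skipped": []})
        elif name == "status:skipped":
            state["pending_id"] = None

    def end(name):
        value = "".join(state["text"]).strip()
        state["text"] = []
        if not batches:
            return
        batch = batches[-1]
        if name == "status:record_type":
            batch["record_type"] = value
        elif name in ("status:parsed", "status:written"):
            batch[name.split(":")[1]] = int(value)
        elif name in ("pfif:person_record_id", "pfif:note_record_id"):
            state["pending_id"] = value  # remember until its error arrives
        elif name == "status:error":
            batch["skipped"].append(
                {"record_id": state["pending_id"], "error": value})
            state["pending_id"] = None

    def chars(data):
        state["text"].append(data)

    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return batches
```

A caller could compare parsed and written for each batch and log the skipped entries to decide which records need fixing before a retry.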

When you upload person or note records, you will be replacing any existing records with the same record ID. It should be safe to upload the same data multiple times while you fix formatting problems.

Google will treat all PFIF records submitted through Google Person Finder and through the API in conformance with the PFIF Data Expiry Mechanism (see http://zesty.ca/pfif/1.4).


Downloading data from Person Finder

To use this API, you must apply for the "Read" access type described above.

PFIF 1.4 person and note feeds are available here:

https://www.google.org/personfinder/repository/feeds/person?key=api_key
https://www.google.org/personfinder/repository/feeds/note?key=api_key

By default, these feeds return the most recently added person records or note records in reverse chronological order. These query parameters are supported:

  • max_results: Return up to the specified number of results (maximum 200).
  • skip: Skip the specified number of records before returning the next max_results results (maximum 800).
  • min_entry_date: Return only results with an entry_date greater than or equal to the specified timestamp, which should be in UTC in yyyy-mm-ddThh:mm:ssZ format. If this parameter is specified, results will be returned in forward chronological order.
  • person_record_id: Return only notes for this person record. This parameter is only valid for the note feed.

You can use the person_record_id parameter to subscribe to a feed of notes on a specific person.
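The feed parameters above can be assembled and validated in code. This is a minimal sketch with a hypothetical helper named feed_url. It enforces the documented limits (200 for max_results, 800 for skip), formats min_entry_date as a UTC timestamp, and only builds the URL.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://www.google.org/personfinder"


def feed_url(repository, api_key, feed="person", max_results=None, skip=None,
             min_entry_date=None, person_record_id=None):
    """Build a person or note feed URL (hypothetical helper)."""
    if feed not in ("person", "note"):
        raise ValueError("feed must be 'person' or 'note'")
    params = {"key": api_key}
    if max_results is not None:
        if not 0 < max_results <= 200:
            raise ValueError("max_results must be between 1 and 200")
        params["max_results"] = max_results
    if skip is not None:
        if not 0 <= skip <= 800:
            raise ValueError("skip must be between 0 and 800")
        params["skip"] = skip
    if min_entry_date is not None:
        if min_entry_date.tzinfo is None:
            raise ValueError("min_entry_date must be timezone-aware")
        # The API expects UTC in yyyy-mm-ddThh:mm:ssZ format.
        params["min_entry_date"] = min_entry_date.astimezone(
            timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if person_record_id is not None:
        if feed != "note":
            raise ValueError("person_record_id is only valid for the note feed")
        params["person_record_id"] = person_record_id
    return "%s/%s/feeds/%s?%s" % (BASE_URL, repository, feed, urlencode(params))


if __name__ == "__main__":
    print(feed_url("test", "43HxMWGBijFaYEr5", max_results=200,
                   min_entry_date=datetime(2010, 1, 1, tzinfo=timezone.utc)))
```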

The google/personfinder GitHub repository contains a command-line tool that calls this API: tools/download_feed.py. Usage example:

# Download the person feed:
./tools/download_feed.py --key=43HxMWGBijFaYEr5 --min_entry_date=2010-01-01 --out=persons.csv \
    https://www.google.org/personfinder/test/feeds/person 
# Download the note feed:
./tools/download_feed.py --notes --key=43HxMWGBijFaYEr5 --min_entry_date=2010-01-01 --out=notes.csv \
    https://www.google.org/personfinder/test/feeds/note

If you need to keep another database synchronized with the Google Person Finder database, use the min_entry_date and skip parameters to download incremental updates. Use the latest entry_date you have previously received from Google Person Finder as the min_entry_date for your first request. Then take the latest entry_date in the batch you receive, and use it as the min_entry_date for your next request. Use the skip parameter to skip the records you already received that have the same entry_date. This algorithm is implemented by tools/download_feed.py.
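The incremental algorithm described above can be sketched as follows. This is an illustration of the idea, not the code in tools/download_feed.py. The fetch_page callable is hypothetical: it stands for one feed request and is assumed to return a list of dicts, each with an "entry_date" string in yyyy-mm-ddThh:mm:ssZ format, in a stable forward chronological order.

```python
MAX_SKIP = 800  # documented upper limit of the skip parameter


def download_incrementally(fetch_page, min_entry_date, skip=0, max_results=200):
    """Yield every record with entry_date >= min_entry_date, page by page.

    fetch_page(min_entry_date, skip, max_results) is a hypothetical callable
    that performs a single feed request and returns a list of records.
    """
    while True:
        if skip > MAX_SKIP:
            raise RuntimeError(
                "more than %d records share entry_date %s"
                % (MAX_SKIP, min_entry_date))
        records = fetch_page(min_entry_date, skip, max_results)
        if not records:
            return
        yield from records
        latest = max(record["entry_date"] for record in records)
        # Records already received with the latest entry_date must be
        # skipped on the next request, which starts again at that date.
        seen_at_latest = sum(
            1 for record in records if record["entry_date"] == latest)
        if latest == min_entry_date:
            skip += seen_at_latest
        else:
            min_entry_date, skip = latest, seen_at_latest
```

When resuming a previous synchronization, the caller would pass the latest entry_date it has stored as min_entry_date and the number of stored records with that entry_date as the initial skip.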

We also recommend that you use the min_entry_date parameter in the above fashion whenever you need to scan a large number of records. (The skip parameter is limited to a maximum value of 800 due to limitations in the App Engine datastore, so using skip alone will not let you scan all the records.)

When displaying any content accessed via the API to a user, you must display a link to the original source of information to refer any enquiries in connection with that specific record back to the original source, and may not intermix content accessed via the API with other content in any way that makes it unclear which content came from which source.