# Accessing LMEC Collections via JSON API

This notebook provides some tips for using the portal's JSON API to query the LMEC collections portal and programmatically retrieve metadata about collection items.

### URL syntax

To retrieve any page as JSON, add `.json` to the path of the page URL, before the query string. For a search on the collections portal, that means placing it directly after `search`:

    # normal, return HTML
    https://collections.leventhalmap.org/search?utf8=%E2%9C%93&q=lowell&search_field=all_fields

    # return JSON
    https://collections.leventhalmap.org/search.json?utf8=%E2%9C%93&q=lowell&search_field=all_fields

### Increasing max items returned from search query

By default, this query returns at most 20 items, because the JSON reflects a single page of results. You can raise the limit to 100 by replacing `utf8=%E2%9C%93&` with `per_page=100&`:

    # normal, return HTML with up to 100 items per page
    https://collections.leventhalmap.org/search?per_page=100&q=lowell&search_field=all_fields

    # return JSON
    https://collections.leventhalmap.org/search.json?per_page=100&q=lowell&search_field=all_fields
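The same replacement can be scripted rather than done by hand. The following is a minimal sketch using only Python's standard library; the helper name `set_per_page` is hypothetical and not part of any LMEC tooling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_per_page(url, per_page=100):
    """Drop the utf8 marker and set per_page on a search URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every query parameter except the two being replaced.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("utf8", "per_page")
    ]
    params.insert(0, ("per_page", str(per_page)))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


url = "https://collections.leventhalmap.org/search.json?utf8=%E2%9C%93&q=lowell&search_field=all_fields"
new_url = set_per_page(url)
print(new_url)
```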

### Tweaking the query with other filters

You can also refine your search by adjusting filters such as "Place," "Topic," and "Date" on the collections portal itself before copying the URL. The following query combines two constraints: 1) maps that match the keyword "lowell" and 2) a date of 1850 or later (the range in the URL runs from 1850 to 1951). It also requests 100 items per page, although only 19 maps are returned:

    # normal, return HTML with up to 100 items and a date constraint
    https://collections.leventhalmap.org/search?per_page=100&q=lowell&range%5Bdate_facet_yearly_itim%5D%5Bbegin%5D=1850&range%5Bdate_facet_yearly_itim%5D%5Bend%5D=1951&search_field=dummy_range

    # return JSON
    https://collections.leventhalmap.org/search.json?per_page=100&q=lowell&range%5Bdate_facet_yearly_itim%5D%5Bbegin%5D=1850&range%5Bdate_facet_yearly_itim%5D%5Bend%5D=1951&search_field=dummy_range
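If you would rather build a date-constrained URL in code than copy it from the browser, something like the sketch below may help. The parameter names are taken from the URLs above; the function name `date_range_search_url` and its arguments are hypothetical. `urlencode` percent-encodes the square brackets, which is where the `%5B` and `%5D` come from.

```python
from urllib.parse import urlencode

BASE = "https://collections.leventhalmap.org/search"


def date_range_search_url(keyword, begin, end, per_page=100, as_json=True):
    """Build a keyword search URL limited to a range of years (hypothetical helper)."""
    params = {
        "per_page": per_page,
        "q": keyword,
        "range[date_facet_yearly_itim][begin]": begin,
        "range[date_facet_yearly_itim][end]": end,
        "search_field": "dummy_range",
    }
    suffix = ".json" if as_json else ""
    return f"{BASE}{suffix}?{urlencode(params)}"


range_url = date_range_search_url("lowell", 1850, 1951)
print(range_url)
```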

### Collections item-level syntax

At the item level, `.json` should be appended to the end of the item URL, directly after the commonwealth ID:

    # normal, return HTML
    https://collections.leventhalmap.org/search/commonwealth:3f463717c

    # return JSON
    https://collections.leventhalmap.org/search/commonwealth:3f463717c.json
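Search pages and item pages follow the same rule: `.json` goes at the end of the URL path, before any query string. A small sketch of that rule using the standard library is shown below; the helper name `to_json_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url):
    """Return the JSON variant of a portal URL by adding ".json" to its path."""
    parts = urlsplit(url)
    path = parts.path
    # Avoid doubling the suffix if the URL is already a JSON URL.
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


search_json = to_json_url(
    "https://collections.leventhalmap.org/search?per_page=100&q=lowell&search_field=all_fields"
)
item_json = to_json_url("https://collections.leventhalmap.org/search/commonwealth:3f463717c")
print(search_json)
print(item_json)
```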

### Parsing a single item

We can parse a single item by first reading JSON data from a given URL into a Python dictionary, and then printing it as a string:

In [4]:
import json
import requests
import pandas as pd

data = requests.get("https://collections.leventhalmap.org/search/commonwealth:3f463717c.json")

print(json.dumps(data.json(), indent=2))

{
  "response": {
    "document": {
      "id": "commonwealth:3f463717c",
      "system_create_dtsi": "2017-04-06T21:12:58Z",
      "system_modified_dtsi": "2021-11-15T10:30:53Z",
      "curator_model_ssi": "Curator::DigitalObject",
      "curator_model_suffix_ssi": "DigitalObject",
      "title_info_primary_tsi": "City of Lowell",
      "title_info_alternative_tsim": [
        "Latest map of the city of Lowell Massachusetts"
      ],
      "genre_basic_ssim": [
        "Maps"
      ],
      "date_tsim": [
        "1904"
      ],
      "date_type_ssm": [
        "dateCreated"
      ],
      "date_edtf_ssm": [
        "1904"
      ],
      "date_start_dtsi": "1904-01-01T00:00:00Z",
      "date_end_dtsi": "1904-12-31T23:59:59.999Z",
      "name_tsim": [
        "Geo. H. Walker & Co.",
        "Kearney, Stephen"
      ],
      "name_role_tsim": [
        "Creator",
        "Contributor"
      ],
      "name_facet_ssim": [
        "Geo. H. Walker & Co.",
        "Kearney, Stephen"
      ],
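Once the response is a dictionary, individual fields can be read by key. The sketch below assumes the item record sits under `response` and then `document`, as in the output above. The `sample` dictionary copies a few values from that output so the snippet runs without a network call.

```python
# A few fields copied from the item output above, so this runs offline.
sample = {
    "response": {
        "document": {
            "id": "commonwealth:3f463717c",
            "title_info_primary_tsi": "City of Lowell",
            "date_tsim": ["1904"],
            "name_tsim": ["Geo. H. Walker & Co.", "Kearney, Stephen"],
            "name_role_tsim": ["Creator", "Contributor"],
        }
    }
}

doc = sample["response"]["document"]

title = doc["title_info_primary_tsi"]
# .get() avoids a KeyError for fields that an item may not have.
dates = doc.get("date_tsim", [])
# Pair each name with its role. This assumes the two lists are parallel,
# which the output suggests but this notebook does not confirm.
names_with_roles = list(zip(doc.get("name_tsim", []), doc.get("name_role_tsim", [])))

print(title, dates, names_with_roles)
```

With a live request, `data.json()` would take the place of `sample`.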

### Retrieving a larger query

That was just JSON from one item. We can also retrieve and parse multiple items at once by redefining the `data` variable with a **search URL** instead of a single item.

For example, the URL

`https://collections.leventhalmap.org/search.json?per_page=100&q=lowell&search_field=all_fields`

will return a larger response. This search for "lowell" returned 35 items total:

In [5]:
data = requests.get("https://collections.leventhalmap.org/search.json?per_page=100&q=lowell&search_field=all_fields")

len(data.json()["response"]["docs"])

35

We printed the total number of items because printing the full JSON would take up way too much space.

### Architecture of the API response

Looping through the keys of the query's `response` shows its top-level sections, which correspond to parts of the collections portal's web page:

In [305]:
for a in data.json()["response"]:
    print(a)

docs
facets
pages


Here, `docs` contains collection items, `facets` contains filters (e.g., the "Date" or "Subject" filter), and `pages` contains actions for moving through pages. (Since this query returns 35 items and we've set the page size to 100, there's only 1 page here.)
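When a search matches more items than fit on one page, the results have to be collected page by page. The sketch below assumes the portal accepts a `page` query parameter, which is a common convention for search portals but is not confirmed in this notebook. It stops as soon as a page comes back with no `docs`. The `fetch_page` argument is a hypothetical callable that takes a page number and returns the parsed JSON, which keeps the loop testable without a network call.

```python
def collect_docs(fetch_page, max_pages=50):
    """Gather docs from consecutive result pages until one comes back empty."""
    all_docs = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        docs = payload["response"]["docs"]
        if not docs:
            break
        all_docs.extend(docs)
    return all_docs


# Stand-in for real requests: three made-up pages, the last one empty.
fake_pages = {
    1: {"response": {"docs": [{"id": "a"}, {"id": "b"}]}},
    2: {"response": {"docs": [{"id": "c"}]}},
    3: {"response": {"docs": []}},
}
collected = collect_docs(lambda page: fake_pages[page])
print(len(collected))
```

With `requests`, `fetch_page` could be something like `lambda p: requests.get(search_url, params={"page": p}).json()`, where `search_url` is one of the `search.json` URLs above.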

We mostly will want to interact with `docs`. Let's start by figuring out what kind of metadata each item contains.

### Accessing metadata fields

We can easily list metadata fields by:

1. putting our data into a data frame and
2. listing the data frame's columns

In [306]:
df = pd.DataFrame(data.json()['response']['docs'])

print(list(df.columns.values))

['id', 'system_create_dtsi', 'system_modified_dtsi', 'curator_model_ssi', 'curator_model_suffix_ssi', 'title_info_primary_tsi', 'title_info_alternative_tsim', 'genre_basic_ssim', 'date_tsim', 'date_type_ssm', 'date_edtf_ssm', 'date_start_dtsi', 'date_end_dtsi', 'name_tsim', 'name_role_tsim', 'name_facet_ssim', 'related_item_host_ssim', 'subject_topic_tsim', 'subject_facet_ssim', 'subject_coordinates_geospatial', 'subject_point_geospatial', 'subject_geojson_facet_ssim', 'subject_hiergeo_geojson_ssm', 'physical_location_ssim', 'sub_location_tsi', 'identifier_uri_ss', 'identifier_local_other_tsim', 'identifier_local_call_tsim', 'identifier_local_barcode_tsim', 'note_tsim', 'scale_tsim', 'rights_ss', 'license_ss', 'reuse_allowed_ssi', 'digital_origin_ssi', 'extent_tsi', 'publisher_tsi', 'pubplace_tsi', 'resource_type_manuscript_bsi', 'type_of_resource_ssim', 'lang_term_ssim', 'publishing_state_ssi', 'processing_state_ssi', 'destination_site_ssim', 'hosting_status_ssi', 'harvesting_status_b

There are a *lot* of metadata fields here (77!). You don't need to see all of them, since many contain irrelevant detail or null values, so our next step is to filter some fields out.

You could start by examining what each field contains by visiting the [BPL's field name reference guide](https://github.com/boston-library/solr-core-conf/wiki/SolrDocument-field-reference:-public-API).
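Another way to decide which fields are worth keeping is to count how many items actually populate each one. Here is a minimal sketch in plain Python; the `sample_docs` records are made up for illustration and are not real API output, and `field_coverage` is a hypothetical helper.

```python
from collections import Counter

# Illustrative records shaped like entries in `docs`.
sample_docs = [
    {"id": "item-1", "title_info_primary_tsi": "Map A", "georeferenced_bsi": True},
    {"id": "item-2", "title_info_primary_tsi": "Map B"},
    {"id": "item-3", "title_info_primary_tsi": "Map C", "note_tsim": []},
]


def field_coverage(docs):
    """Count, per field, how many docs hold a non-empty value."""
    counts = Counter()
    for doc in docs:
        for field, value in doc.items():
            # Treat None, empty lists, and empty strings as missing.
            if value is not None and value != [] and value != "":
                counts[field] += 1
    return counts


coverage = field_coverage(sample_docs)
for field, n in coverage.most_common():
    print(f"{field}: {n}/{len(sample_docs)}")
```

With the real response, you would pass `data.json()['response']['docs']` instead of `sample_docs`.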

### Filtering a data frame by columns

Still, this task can be pretty arduous. Below, we selected a few particularly useful fields and stored them as a list so that we only see selected columns in the resultant data frame. We also renamed the fields for readability.

In [381]:
# fields to keep, and more readable names for them
fields = ['title_info_primary_tsi', 'name_tsim', 'id', 'date_end_dtsi', 'georeferenced_bsi']
newFieldNames = {'title_info_primary_tsi':'title', 'name_tsim':'creator', 'id':'commonwealth_id', 'date_end_dtsi':'date', 'georeferenced_bsi':'georef'}

# keep only the selected columns, then rename them
df_fltr = pd.DataFrame(df[fields])
df_fltr.rename(columns = newFieldNames, inplace = True)
df_fltr

Unnamed: 0,title,creator,commonwealth_id,date,georef
0,City of Lowell,"[Geo. H. Walker & Co., Kearney, Stephen]",commonwealth:3f463717c,1904-12-31T23:59:59.999Z,
1,City of Lowell,"[Geo. H. Walker & Co., Bowers, George.]",commonwealth:3f463719x,1904-12-31T23:59:59.999Z,
2,"View of Lowell, Mass","[Farrar, E. A.]",commonwealth:x059c994q,1834-12-31T23:59:59.999Z,
3,Map of the city of Lowell,"[Beard, Ithamar A., Hoar, J., Boynton, George ...",commonwealth:js9577479,1842-12-31T23:59:59.999Z,True
4,"Birds eye view of Lowell, Mass","[Bailey, H. H. (Howard Heston), 1836-1878, Haz...",commonwealth:x633fc76x,1876-12-31T23:59:59.999Z,
5,"Plan of the city of Lowell, Massachusetts",[Sidney & Neff],commonwealth:wd3761962,1850-12-31T23:59:59.999Z,True
6,"Atlas of the city of Lowell, Massachusetts",[L.J. Richards & Co.],commonwealth:w3765v28c,1896-12-31T23:59:59.999Z,
7,A plan & profile of the Boston & Lowell Railroad,"[Baldwin, James Fowle, 1782-1862]",commonwealth:9s161b444,1836-12-31T23:59:59.999Z,
8,"Plan of land in Ayers' New-City, Lowell, Mass",[Butterfield & Swan],commonwealth:79408313b,1852-12-31T23:59:59.999Z,
9,"Richards standard atlas of the city of Lowell,...",[Richards Map Company],commonwealth:1z40ph262,1924-12-31T23:59:59.999Z,


### Commonwealth IDs

One thing to highlight here is the **commonwealth ID**. In our collections, a commonwealth ID is a stable item identifier. Prefixing any of these IDs with `https://collections.leventhalmap.org/search/` gives you the URL of the item's web page.
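As a small illustration, the prefixing step can be wrapped in a function. The name `item_url` is hypothetical; the two IDs are taken from the table above.

```python
ITEM_BASE = "https://collections.leventhalmap.org/search/"


def item_url(commonwealth_id, as_json=False):
    """Build the web page URL (or its JSON variant) for a commonwealth ID."""
    suffix = ".json" if as_json else ""
    return f"{ITEM_BASE}{commonwealth_id}{suffix}"


ids = ["commonwealth:3f463717c", "commonwealth:js9577479"]
urls = [item_url(i) for i in ids]
print(urls)
```

To add a URL column to the data frame, something like `df_fltr['commonwealth_id'].map(item_url)` should work.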

### Filtering data frame by column values

Now, let's say we want to filter our response according to certain metadata fields, for example retrieving only the IDs of maps that have been georeferenced.

The [`.loc` property](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) of pandas makes it easy to access rows and columns by label or by a boolean array.

In [380]:
df_fltr.loc[df_fltr['georef'] == True]

Unnamed: 0,title,creator,commonwealth_id,date,georef,year
3,Map of the city of Lowell,"[Beard, Ithamar A., Hoar, J., Boynton, George ...",commonwealth:js9577479,1842-12-31T23:59:59.999Z,True,1842
5,"Plan of the city of Lowell, Massachusetts",[Sidney & Neff],commonwealth:wd3761962,1850-12-31T23:59:59.999Z,True,1850
19,Massachusetts,"[Goldthwait, J. H.]",commonwealth:1257bb57k,1824-12-31T23:59:59.999Z,True,1824
21,New map of Massachusetts compiled from the lat...,[E.P. Dutton (Firm)],commonwealth:cj82kv99b,1863-12-31T23:59:59.999Z,True,1863
22,"City of Cambridge, Mass",,commonwealth:x633fc74c,1877-12-31T23:59:59.999Z,True,1877
24,Map of Boston and its vicinity from actual sur...,"[Hales, John Groves.]",commonwealth:js956j99r,1833-12-31T23:59:59.999Z,True,1833
25,New map of Massachusetts,"[Dearborn, Nathaniel, 1786-1852]",commonwealth:xg94j267x,1840-12-31T23:59:59.999Z,True,1840
26,New map of Massachusetts,,commonwealth:wd376558g,1836-12-31T23:59:59.999Z,True,1836
28,Plan of the Craigie Estate in Cambridge,"[White, Ferdinand E., d. 1853]",commonwealth:1257bc17t,1850-12-31T23:59:59.999Z,True,1850
30,"Map of Massachusetts, Rhode-Island & Connecticut","[Wells, J.]",commonwealth:7h14b147r,1843-12-31T23:59:59.999Z,True,1843


### Calculating a new column based on an existing one

You could also add more conditions, such as keeping only georeferenced maps dated 1850 or later. This requires creating a new field, since our `date` field is stored as an [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) string rather than a number.

Here, we extract the first 4 characters of the `date` field (the year) and convert them to an integer:

In [366]:
df_fltr['year'] = df_fltr['date'].str[:4].astype(int)
df_fltr

Unnamed: 0,title,creator,commonwealth_id,date,georef,year
0,City of Lowell,"[Geo. H. Walker & Co., Kearney, Stephen]",commonwealth:3f463717c,1904-12-31T23:59:59.999Z,,1904
1,City of Lowell,"[Geo. H. Walker & Co., Bowers, George.]",commonwealth:3f463719x,1904-12-31T23:59:59.999Z,,1904
2,"View of Lowell, Mass","[Farrar, E. A.]",commonwealth:x059c994q,1834-12-31T23:59:59.999Z,,1834
3,Map of the city of Lowell,"[Beard, Ithamar A., Hoar, J., Boynton, George ...",commonwealth:js9577479,1842-12-31T23:59:59.999Z,True,1842
4,"Birds eye view of Lowell, Mass","[Bailey, H. H. (Howard Heston), 1836-1878, Haz...",commonwealth:x633fc76x,1876-12-31T23:59:59.999Z,,1876
5,"Plan of the city of Lowell, Massachusetts",[Sidney & Neff],commonwealth:wd3761962,1850-12-31T23:59:59.999Z,True,1850
6,"Atlas of the city of Lowell, Massachusetts",[L.J. Richards & Co.],commonwealth:w3765v28c,1896-12-31T23:59:59.999Z,,1896
7,A plan & profile of the Boston & Lowell Railroad,"[Baldwin, James Fowle, 1782-1862]",commonwealth:9s161b444,1836-12-31T23:59:59.999Z,,1836
8,"Plan of land in Ayers' New-City, Lowell, Mass",[Butterfield & Swan],commonwealth:79408313b,1852-12-31T23:59:59.999Z,,1852
9,"Richards standard atlas of the city of Lowell,...",[Richards Map Company],commonwealth:1z40ph262,1924-12-31T23:59:59.999Z,,1924
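One caveat about the `.astype(int)` step above: it raises an error if any item lacks a date. A hedged alternative, assuming missing dates show up as empty values in the `date` column, is to coerce unparseable values and use pandas' nullable integer type. The two-row `demo` frame below is made up for illustration.

```python
import pandas as pd

# Illustrative frame: one ISO 8601 date and one missing value.
demo = pd.DataFrame({"date": ["1904-12-31T23:59:59.999Z", None]})

# Take the first four characters, turn anything unparseable into NaN,
# then store the result as nullable integers so missing years stay <NA>.
demo["year"] = pd.to_numeric(demo["date"].str[:4], errors="coerce").astype("Int64")

print(demo)
```

Comparisons such as `demo["year"] >= 1850` still work on this column; rows with a missing year simply do not match.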


### Filtering against multiple parameters

Now we can use the `.loc` property again, this time filtering on two conditions: 1) georeferenced maps that were 2) created in 1850 or later.

In [371]:
df_fltr.loc[(df_fltr['georef'] == True) & (df_fltr['year'] >= 1850)]

Unnamed: 0,title,creator,commonwealth_id,date,georef,year
5,"Plan of the city of Lowell, Massachusetts",[Sidney & Neff],commonwealth:wd3761962,1850-12-31T23:59:59.999Z,True,1850
21,New map of Massachusetts compiled from the lat...,[E.P. Dutton (Firm)],commonwealth:cj82kv99b,1863-12-31T23:59:59.999Z,True,1863
22,"City of Cambridge, Mass",,commonwealth:x633fc74c,1877-12-31T23:59:59.999Z,True,1877
28,Plan of the Craigie Estate in Cambridge,"[White, Ferdinand E., d. 1853]",commonwealth:1257bc17t,1850-12-31T23:59:59.999Z,True,1850
34,Map of the city of Boston and immediate neighb...,"[McIntyre, H. (Henry), Friend & Aub, Wagner & ...",commonwealth:3f4632536,1852-12-31T23:59:59.999Z,True,1852


### Filtering by date via the API request

A less programmatic way to filter by date is to just manually set filters to your desired search on the [LMEC collections portal](https://collections.leventhalmap.org), and then grab the resulting URL.

To do it this way, we'll first redefine our original request, and then we'll recreate the necessary data frames:

In [384]:
# request a search query that is pre-filtered by a date range

data_Date = requests.get("https://collections.leventhalmap.org/search.json?utf8=%E2%9C%93&q=lowell&search_field=dummy_range&range%5Bdate_facet_yearly_itim%5D%5Bbegin%5D=1850&range%5Bdate_facet_yearly_itim%5D%5Bend%5D=1951&commit=Apply")

# define the results of that query as a data frame

df_Date = pd.DataFrame(data_Date.json()['response']['docs'])

# filter the data frame so that it only shows relevant columns
# and rename the columns so they're human readable

df_Date_fltr = pd.DataFrame(df_Date[fields])
df_Date_fltr.rename(columns = newFieldNames, inplace = True)

# print data frame

df_Date_fltr

Unnamed: 0,title,creator,commonwealth_id,date,georef
0,City of Lowell,"[Geo. H. Walker & Co., Kearney, Stephen]",commonwealth:3f463717c,1904-12-31T23:59:59.999Z,
1,City of Lowell,"[Geo. H. Walker & Co., Bowers, George.]",commonwealth:3f463719x,1904-12-31T23:59:59.999Z,
2,"Birds eye view of Lowell, Mass","[Bailey, H. H. (Howard Heston), 1836-1878, Haz...",commonwealth:x633fc76x,1876-12-31T23:59:59.999Z,
3,"Plan of the city of Lowell, Massachusetts",[Sidney & Neff],commonwealth:wd3761962,1850-12-31T23:59:59.999Z,True
4,"Atlas of the city of Lowell, Massachusetts",[L.J. Richards & Co.],commonwealth:w3765v28c,1896-12-31T23:59:59.999Z,
5,"Plan of land in Ayers' New-City, Lowell, Mass",[Butterfield & Swan],commonwealth:79408313b,1852-12-31T23:59:59.999Z,
6,"Richards standard atlas of the city of Lowell,...",[Richards Map Company],commonwealth:1z40ph262,1924-12-31T23:59:59.999Z,
7,Belvidere Park,[Walker Lith. & Pub. Co],commonwealth:xg94j2618,1900-12-31T23:59:59.999Z,
8,Boston Clinton Fitchburg and Mansfield Framing...,,commonwealth:cj82kn91w,1879-12-31T23:59:59.999Z,
9,Atlas of Massachusetts,"[Walker, O. W. (Oscar W.), Geo. H. Walker & Co...",commonwealth:xw42qw91v,1904-12-31T23:59:59.999Z,


### Filtering by column again

Filtering by the `georef` column shows us the same 5 maps, all matching the keyword "lowell", that meet both conditions: 1) georeferenced and 2) created in 1850 or later.

In [385]:
df_Date_fltr.loc[df_Date_fltr['georef'] == True]

Unnamed: 0,title,creator,commonwealth_id,date,georef
3,"Plan of the city of Lowell, Massachusetts",[Sidney & Neff],commonwealth:wd3761962,1850-12-31T23:59:59.999Z,True
11,New map of Massachusetts compiled from the lat...,[E.P. Dutton (Firm)],commonwealth:cj82kv99b,1863-12-31T23:59:59.999Z,True
12,"City of Cambridge, Mass",,commonwealth:x633fc74c,1877-12-31T23:59:59.999Z,True
15,Plan of the Craigie Estate in Cambridge,"[White, Ferdinand E., d. 1853]",commonwealth:1257bc17t,1850-12-31T23:59:59.999Z,True
18,Map of the city of Boston and immediate neighb...,"[McIntyre, H. (Henry), Friend & Aub, Wagner & ...",commonwealth:3f4632536,1852-12-31T23:59:59.999Z,True
