# Accessing LMEC Collections

The LMEC digital collections live at <https://collections.leventhalmap.org>. Because we are a Digital Commonwealth partner institution, you can also find any of our items online at <https://digitalcommonwealth.org>, and all of our collections adhere to [Digital Commonwealth's API specifications](https://digitalcommonwealth.org). The more than 11,000 items in our digital collections can be queried by a few different, related methods: JSON API, IIIF API, and scraping.

This notebook provides some tips for using those APIs to query the LMEC collections portal and programmatically retrieve metadata about collections items.

### Via JSON API

To retrieve any page as JSON, append `.json` to the path portion of the page URL, before any query string. For search results on the collections portal, that means placing it directly after `search`:

    # normal, return HTML
    https://collections.leventhalmap.org/search?utf8=%E2%9C%93&q=lowell&search_field=all_fields

    # return JSON
    https://collections.leventhalmap.org/search.json?utf8=%E2%9C%93&q=Lowell&search_field=all_fields


By default this query will return 20 items. You can increase this to 100 by replacing `utf8=%E2%9C%93&` with `per_page=100&`:

    # normal, return HTML with up to 100 items per page
    https://collections.leventhalmap.org/search?per_page=100&q=lowell&search_field=all_fields

    # return JSON up to 100 items per page
    https://collections.leventhalmap.org/search.json?per_page=100&q=lowell&search_field=all_fields
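
Rather than editing these URLs by hand, they can be assembled in code. The following is a minimal sketch, not part of the portal's API: `build_search_url` is a hypothetical helper name, and it assumes only the `per_page`, `q`, and `search_field` parameters shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://collections.leventhalmap.org/search"


def build_search_url(query, per_page=100, as_json=True):
    """Return a search URL for the collections portal (hypothetical helper)."""
    # ".json" goes on the path, before the query string
    path = BASE_URL + (".json" if as_json else "")
    params = {"per_page": per_page, "q": query, "search_field": "all_fields"}
    # urlencode escapes special characters in the search term
    return path + "?" + urlencode(params)


print(build_search_url("lowell"))
print(build_search_url("lowell", as_json=False))
```

Letting `urlencode` build the query string means search terms containing spaces or punctuation are escaped automatically.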

The same syntax applies at the item level:

    # normal, return HTML
    https://collections.leventhalmap.org/search/commonwealth:3f463717c

    # return JSON
    https://collections.leventhalmap.org/search/commonwealth:3f463717c.json

We can parse one of these items with something like this, which reads JSON data from a given URL into a Python dictionary and prints it as a string:

In [19]:
import urllib.request  # the request submodule must be imported explicitly
import json

url = "https://collections.leventhalmap.org/search/commonwealth:3f463717c.json"
response = urllib.request.urlopen(url)
data = json.loads(response.read())

print(json.dumps(data, indent=2))

{
  "response": {
    "document": {
      "id": "commonwealth:3f463717c",
      "system_create_dtsi": "2017-04-06T21:12:58Z",
      "system_modified_dtsi": "2021-11-15T10:30:53Z",
      "curator_model_ssi": "Curator::DigitalObject",
      "curator_model_suffix_ssi": "DigitalObject",
      "title_info_primary_tsi": "City of Lowell",
      "title_info_alternative_tsim": [
        "Latest map of the city of Lowell Massachusetts"
      ],
      "genre_basic_ssim": [
        "Maps"
      ],
      "date_tsim": [
        "1904"
      ],
      "date_type_ssm": [
        "dateCreated"
      ],
      "date_edtf_ssm": [
        "1904"
      ],
      "date_start_dtsi": "1904-01-01T00:00:00Z",
      "date_end_dtsi": "1904-12-31T23:59:59.999Z",
      "name_tsim": [
        "Geo. H. Walker & Co.",
        "Kearney, Stephen"
      ],
      "name_role_tsim": [
        "Creator",
        "Contributor"
      ],
      "name_facet_ssim": [
        "Geo. H. Walker & Co.",
        "Kearney, Stephen"
      ],
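
Once the response is a dictionary, individual fields can be read by key. The sketch below is one possible way to wrap the fetch and pull out a few values; `fetch_json` and `summarize_item` are hypothetical helper names, and the sample dictionary is an abbreviated copy of the output above so that the example runs without a network connection.

```python
import json
import urllib.request


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON (hypothetical helper)."""
    # the context manager closes the connection once the body has been read
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_item(data):
    """Pick a few fields out of an item-level response (hypothetical helper)."""
    doc = data["response"]["document"]
    return {
        "id": doc["id"],
        "title": doc.get("title_info_primary_tsi"),
        # .get() with a default avoids a KeyError if a record lacks the field
        "dates": doc.get("date_tsim", []),
        "names": doc.get("name_tsim", []),
    }


# Abbreviated copy of the response printed above
sample = {
    "response": {
        "document": {
            "id": "commonwealth:3f463717c",
            "title_info_primary_tsi": "City of Lowell",
            "date_tsim": ["1904"],
            "name_tsim": ["Geo. H. Walker & Co.", "Kearney, Stephen"],
        }
    }
}

print(summarize_item(sample))
```

Against the live portal, the same summary would come from `summarize_item(fetch_json(url))`, with `url` pointing at an item-level `.json` address.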

Redefining the variable `url` with a query for multiple collections items, like

`https://collections.leventhalmap.org/search.json?per_page=100&q=lowell&search_field=all_fields`

will return a larger response:

In [20]:
url = "https://collections.leventhalmap.org/search.json?per_page=100&q=lowell&search_field=all_fields"
response = urllib.request.urlopen(url)
data = json.loads(response.read())

print(json.dumps(data, indent=2))

{
  "response": {
    "docs": [
      {
        "id": "commonwealth:3f463717c",
        "system_create_dtsi": "2017-04-06T21:12:58Z",
        "system_modified_dtsi": "2021-11-15T10:30:53Z",
        "curator_model_ssi": "Curator::DigitalObject",
        "curator_model_suffix_ssi": "DigitalObject",
        "title_info_primary_tsi": "City of Lowell",
        "title_info_alternative_tsim": [
          "Latest map of the city of Lowell Massachusetts"
        ],
        "genre_basic_ssim": [
          "Maps"
        ],
        "date_tsim": [
          "1904"
        ],
        "date_type_ssm": [
          "dateCreated"
        ],
        "date_edtf_ssm": [
          "1904"
        ],
        "date_start_dtsi": "1904-01-01T00:00:00Z",
        "date_end_dtsi": "1904-12-31T23:59:59.999Z",
        "name_tsim": [
          "Geo. H. Walker & Co.",
          "Kearney, Stephen"
        ],
        "name_role_tsim": [
          "Creator",
          "Contributor"
        ],
        "name_facet_ssim": [

This search for "lowell" (Lowell is a city in Massachusetts) returned 35 items:

In [22]:
len(data["response"]["docs"])

35

Looping through the query's `response` lists its top-level keys, which correspond to the sections of the collections portal's web page:

In [21]:
for a in data["response"]:
    print(a)

docs
facets
pages


where `docs` contains collection items, `facets` contains metadata filters, and `pages` contains actions for moving through pages. Since this query only contains 35 items and we've set our page view to 100, there's only 1 page here.
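
When a query matches more items than fit on one page, the results have to be collected page by page. The following sketch shows the general pattern only. It rests on two assumptions that are not confirmed by the output above: that `pages` includes a `next_page` entry which is null on the last page, and that the portal accepts a `page` query parameter. `iter_all_docs` and `fetch_page` are hypothetical names, and the fake pages exist purely to demonstrate the loop offline.

```python
def iter_all_docs(fetch_page):
    """Yield every doc across all result pages (hypothetical helper).

    fetch_page(page_number) must return the decoded JSON for that page.
    """
    page = 1
    while page is not None:
        response = fetch_page(page)["response"]
        yield from response["docs"]
        # ASSUMPTION: `pages` has a `next_page` key that is None on the last page
        page = response.get("pages", {}).get("next_page")


# Made-up two-page result set used only to exercise the loop
fake_pages = {
    1: {"response": {"docs": [{"id": "a"}, {"id": "b"}], "pages": {"next_page": 2}}},
    2: {"response": {"docs": [{"id": "c"}], "pages": {"next_page": None}}},
}

ids = [doc["id"] for doc in iter_all_docs(fake_pages.__getitem__)]
print(ids)
```

Passing the page-fetching function in as an argument keeps the loop independent of how requests are made, so it can be tried on fake data before pointing it at the portal.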

It's easy to retrieve a list of publicly accessible metadata field names, which will allow us to further refine the query, by looping through the keys of the first item in `docs`:

In [23]:
for a in data["response"]["docs"][0]:
    print(a)

id
system_create_dtsi
system_modified_dtsi
curator_model_ssi
curator_model_suffix_ssi
title_info_primary_tsi
title_info_alternative_tsim
genre_basic_ssim
date_tsim
date_type_ssm
date_edtf_ssm
date_start_dtsi
date_end_dtsi
name_tsim
name_role_tsim
name_facet_ssim
related_item_host_ssim
subject_topic_tsim
subject_facet_ssim
subject_coordinates_geospatial
subject_point_geospatial
subject_geojson_facet_ssim
subject_hiergeo_geojson_ssm
physical_location_ssim
sub_location_tsi
identifier_uri_ss
identifier_local_other_tsim
identifier_local_call_tsim
identifier_local_barcode_tsim
note_tsim
scale_tsim
rights_ss
license_ss
reuse_allowed_ssi
digital_origin_ssi
extent_tsi
publisher_tsi
pubplace_tsi
resource_type_manuscript_bsi
type_of_resource_ssim
lang_term_ssim
publishing_state_ssi
processing_state_ssi
destination_site_ssim
hosting_status_ssi
harvesting_status_bsi
exemplary_image_ssi
exemplary_image_key_base_ss
admin_set_name_ssi
admin_set_ark_id_ssi
institution_name_ssi
institution_ark_id_ssi
co

You can examine what each field contains by visiting the [BPL's field name reference guide](https://github.com/boston-library/solr-core-conf/wiki/SolrDocument-field-reference:-public-API).
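
As a quick complement to the reference guide, it can help to check the Python type of each value in a record, since some fields hold a single value and others hold a list. This is a small sketch with a hypothetical helper, `field_types`, run on an abbreviated copy of the first record shown earlier.

```python
def field_types(doc):
    """Map each field name in a record to the Python type of its value."""
    return {field: type(value).__name__ for field, value in doc.items()}


# Abbreviated copy of the first record printed earlier
sample_doc = {
    "id": "commonwealth:3f463717c",
    "title_info_primary_tsi": "City of Lowell",
    "date_tsim": ["1904"],
    "name_tsim": ["Geo. H. Walker & Co.", "Kearney, Stephen"],
}

for field, kind in field_types(sample_doc).items():
    print(f"{field}: {kind}")
```

In the output shown earlier, the fields whose suffix ends in `m` (such as `date_tsim`) hold lists, while fields such as `title_info_primary_tsi` hold single values. Treat that as an observation about this sample rather than a guarantee, and consult the reference guide for the authoritative definitions.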

We can now loop through `docs` to retrieve, for example, a list of Commonwealth IDs:

In [24]:
for a in data["response"]["docs"]:
    print(a["id"])

commonwealth:3f463717c
commonwealth:3f463719x
commonwealth:x059c994q
commonwealth:js9577479
commonwealth:x633fc76x
commonwealth:wd3761962
commonwealth:w3765v28c
commonwealth:9s161b444
commonwealth:79408313b
commonwealth:1z40ph262
commonwealth:9s161d16c
commonwealth:xg94j2618
commonwealth:25152k02m
commonwealth:cj82kn91w
commonwealth:j9604j44v
commonwealth:c534g3670
commonwealth:9g54xk44k
commonwealth:b5645h71s
commonwealth:xw42qw91v
commonwealth:1257bb57k
commonwealth:x633fc48r
commonwealth:cj82kv99b
commonwealth:x633fc74c
commonwealth:x633fc601
commonwealth:js956j99r
commonwealth:xg94j267x
commonwealth:wd376558g
commonwealth:wd376308v
commonwealth:1257bc17t
commonwealth:3f463872n
commonwealth:7h14b147r
commonwealth:7h14b143n
commonwealth:4m90f9062
commonwealth:7h14b1570
commonwealth:3f4632536


In our collections, Commonwealth IDs are stable item identifiers. Prefixing any of these IDs with `https://collections.leventhalmap.org/search/` will take you directly to the item's web page.
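
Turning IDs into links is a one-line string operation. The sketch below uses a hypothetical helper, `item_url`, and two of the IDs listed above.

```python
ITEM_URL_PREFIX = "https://collections.leventhalmap.org/search/"


def item_url(commonwealth_id, as_json=False):
    """Build the web (or JSON) address for one item (hypothetical helper)."""
    return ITEM_URL_PREFIX + commonwealth_id + (".json" if as_json else "")


ids = ["commonwealth:3f463717c", "commonwealth:3f463719x"]
for commonwealth_id in ids:
    print(item_url(commonwealth_id))
```

Passing `as_json=True` returns the item-level JSON address instead of the web page.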

Now, let's say we want to filter our response according to certain metadata fields, for example retrieving only the IDs of maps created after 1900.

We can start by loading `docs` into a pandas DataFrame and selecting the fields we are interested in:

In [50]:
import pandas as pd

newData = data['response']['docs']

dataframe = pd.DataFrame(newData)

fields = ['title_info_primary_tsi', 'id', 'date_facet_yearly_itim']

filtered_df = dataframe[fields]
pd.DataFrame(filtered_df['date_facet_yearly_itim'] > 1900)

TypeError: '>' not supported between instances of 'list' and 'int'
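
The `TypeError` arises because each value in `date_facet_yearly_itim` is a list of years rather than a single integer, so it cannot be compared with `1900` directly. One possible workaround, sketched below, is to test each list element by element. The records here are illustrative stand-ins, not real query results: only the first ID and title come from the output above, the year values are assumed, and `any_year_after` is a hypothetical helper.

```python
import pandas as pd


def any_year_after(years, cutoff):
    """True if `years` is a list with at least one year later than `cutoff`."""
    # records without the field show up as NaN, which is not a list
    return isinstance(years, list) and any(year > cutoff for year in years)


# Illustrative records only; the year lists are assumptions for this sketch
docs = [
    {"id": "commonwealth:3f463717c",
     "title_info_primary_tsi": "City of Lowell",
     "date_facet_yearly_itim": [1904]},
    {"id": "example:before-1900",
     "title_info_primary_tsi": "Hypothetical earlier map",
     "date_facet_yearly_itim": [1850, 1851]},
    {"id": "example:no-date",
     "title_info_primary_tsi": "Hypothetical undated map"},
]

dataframe = pd.DataFrame(docs)
fields = ["title_info_primary_tsi", "id", "date_facet_yearly_itim"]

# Build a boolean mask row by row, then keep only the matching rows
mask = dataframe["date_facet_yearly_itim"].apply(lambda years: any_year_after(years, 1900))
after_1900 = dataframe.loc[mask, fields]

print(after_1900["id"].tolist())
```

This version keeps an item if any of its years falls after the cutoff. Replacing `any` with `all` would instead require every year in the list to be later than 1900, which may suit items whose dates span a range.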