# DPLA similar-item-finder 🕵️

I wanted to play around with the idea of making connections between items from different providers in DPLA.  Because Texas currently has two pathways to DPLA, I wanted to try to show the connections between the collections from each partner, and experiment with new kinds of serendipitous discovery in DPLA.  What if there was a way to take an item from one DPLA partner (like the Texas Digital Library) and find similar items from another partner (such as the Portal to Texas History)?

This script uses the DPLA API to search for possibly similar items.  It asks for the ID for an item in DPLA, selects a random subject from that item, and then uses that subject term to search for possibly similar items.  I've scoped it to search specifically for items from the Portal to Texas History, but you can easily change that to another DPLA partner.

It will return three randomly selected items, and also give you a link to see more in the DPLA portal.

In [1]:
#import some packages we'll need
import random
from IPython import display

import sys
!{sys.executable} -m pip install dpla

from dpla.api import DPLA


In [2]:
#create DPLA object for querying API using API key
dpla = DPLA('9e772db07b96bf5971582a9e95d873ef')

The next step will ask you for the DPLA item id for the item you'd like to start with.  The DPLA id can be found in the url of an item in DPLA.  For example, for the item 'https://dp.la/item/1894f57e382d6c0c3e9e92ae302ef319', the id is '1894f57e382d6c0c3e9e92ae302ef319'.

In [3]:
#get DPLA id from user input
dplaid = input("DPLA item ID: ")

DPLA item ID: 0637e203dab5f450dc3ee529b1468b56
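
If you'd rather paste the whole item URL instead of picking the id out by hand, a small helper along these lines could do the extraction. This is only a sketch: `extract_dpla_id` is a hypothetical name, and the check for 32 lowercase hex characters is an assumption based on the example ids in this notebook, not a documented rule.

```python
import re
from urllib.parse import urlparse

def extract_dpla_id(value):
    """Return the item id from a dp.la item URL, or from a bare id."""
    value = value.strip()
    # If a full URL was pasted, keep only its path; a bare id is used as-is
    path = urlparse(value).path if "://" in value else value
    candidate = path.rstrip("/").split("/")[-1]
    # Assumption: ids look like the examples here (32 lowercase hex characters)
    if not re.fullmatch(r"[0-9a-f]{32}", candidate):
        raise ValueError(f"Not a DPLA item id: {candidate!r}")
    return candidate

print(extract_dpla_id("https://dp.la/item/1894f57e382d6c0c3e9e92ae302ef319"))
```

You could then wrap the prompt as `dplaid = extract_dpla_id(input("DPLA item ID or URL: "))`, so either form works.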


In [4]:
result = dpla.fetch_by_id([dplaid])

#print item title, to confirm that request worked
print(result.items[0]["sourceResource"]["title"])

['Young ladies surround a piano at the University of Tampa']


Okay, now we get to the good stuff! The next step is going to randomly extract one of the subjects from this item to use as a search query for similar items.

In [5]:
#create new object containing all of the subjects from fetched item
subjects = result.items[0]["sourceResource"]["subject"]

#pick one subject at random and use its name as the search term
random_subject = random.choice(subjects)["name"]
print("Searching for '"+random_subject+"'...")

Searching for 'Students'...
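
The cell above assumes the item has at least one subject and that every subject has a "name". Not every record is guaranteed to be that tidy, so here is a more defensive sketch of the same step. The function name `pick_random_subject` and the sample record are made up for illustration; the record only mimics the shape used above.

```python
import random

def pick_random_subject(item):
    """Return one subject name from a DPLA item dict, or None if there are none."""
    subjects = item.get("sourceResource", {}).get("subject", [])
    # Keep only entries that actually carry a non-empty "name"
    names = [s["name"] for s in subjects if isinstance(s, dict) and s.get("name")]
    return random.choice(names) if names else None

# Hypothetical record, shaped like result.items[0] above
sample_item = {"sourceResource": {"subject": [{"name": "Students"}, {"name": "Pianos"}, {}]}}
print(pick_random_subject(sample_item))
```

Returning None instead of raising an error lets you print a friendly "this item has no subjects" message before trying another item.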


Now, we're ready to search for similar items. The next cell will take the subject we selected above and search DPLA for items matching that subject that were contributed by the Portal to Texas History.

In [6]:
#query DPLA API for items matching the random subject selected
fields = {"provider" : "Portal to Texas History"}
result2 = dpla.search(random_subject, searchFields=fields, page_size=100)

print("Results found: "+str(result2.count))

Results found: 22399


Now that we've queried the DPLA API, we're going to extract three random items from the results, and see what we found.
(We only requested the first 100 items in the results, so if there are more than that, you'll need to go to DPLA to view the rest.)

In [7]:
from urllib.parse import quote_plus

#test to make sure that there were items found
if result2.count == 0:
    print("Sorry, no results found for '"+random_subject+"'.  Try again!")

else:
    print("Here are some similar items about '"+random_subject+"':")
    
    #pick up to three distinct items from the page of results we fetched
    #random.sample never repeats an index, so the same item can't show up twice
    fetched = len(result2.items)
    randItems = random.sample(range(fetched), min(3, fetched))
    
    #extract metadata for each of those randomly chosen items
    for i in randItems:
        item = result2.items[i]
        title = item["sourceResource"]["title"]
        title = str(title)[1:-1] #removes square brackets from title
        dataProvider = item["dataProvider"]
        url = "https://dp.la/item/" + item["id"]
        
        print(title)
        print(dataProvider)
        print(url)
        print()
    
    print("See all results in DPLA:")
    #quote_plus escapes spaces and any other special characters in the subject
    print('https://dp.la/search?q='+quote_plus(random_subject)+'&partner=%22The+Portal+to+Texas+History%22')

Here are some similar items about 'Students':
'Students being filmed'
[{'scheme': 'http://id.loc.gov/authorities#conceptscheme'}, {}, {}, {}, {}, {}, {}, {}, {}, {}]
UNT Libraries Special Collections
https://dp.la/item/028e2a44165e6f69315a09953c489b12

'Johnathan helping students'
[{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]
UNT Libraries Special Collections
https://dp.la/item/89baccf3dd32fc281576cee469ea0599

'Students at Statue'
[{'scheme': 'http://id.loc.gov/authorities#conceptscheme'}, {}, {}, {}, {}, {}, {}, {}]
UNT Libraries Special Collections
https://dp.la/item/00e3a0d4fea8688dccc68a80b27b2002

See all results in DPLA:
https://dp.la/search?q=Students&partner=%22The+Portal+to+Texas+History%22
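
The titles above still carry their quotation marks because the cell strips the brackets from the string form of a list. If you'd like cleaner titles, something like this sketch could replace that step. `format_title` is a hypothetical helper, and it assumes a title arrives either as a list of strings or as a single string.

```python
def format_title(title):
    """Join a list of titles into one string; pass a plain string through."""
    if isinstance(title, (list, tuple)):
        return "; ".join(str(t) for t in title)
    return str(title)

print(format_title(['Students being filmed']))
```

Joining with "; " keeps every title visible if a record happens to have more than one.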


And that's it!  Hope you found some interesting or unexpected results.

* To get different items from your search query, re-run the last code cell that generates random item numbers and displays results.
* If you want to get a different subject from your original item, re-run the cell that generates the "random_subject" variable.
* If you'd like to try with a different item, go back up to the cell with the DPLA id input.  

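To get results from a DPLA partner other than the Portal to Texas History, change {"provider" : "Portal to Texas History"} in the search cell to the name of another DPLA partner (https://dp.la/browse-by-partner).  The "See all results in DPLA" link in the last code cell has the partner name written into it as well, so update that to match.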
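
Since the partner name shows up in the portal link too, one way to keep things in sync is to build that link from a variable. Here is a rough sketch using the standard library's `urlencode`. The function name `dpla_search_url` is hypothetical, and the `q` and `partner` parameters are simply the ones that appear in the link printed earlier in this notebook.

```python
from urllib.parse import urlencode

def dpla_search_url(subject, partner):
    """Build a dp.la search link for a subject, limited to one partner."""
    # The partner name is wrapped in double quotes, as in the link above
    params = {"q": subject, "partner": f'"{partner}"'}
    return "https://dp.la/search?" + urlencode(params)

print(dpla_search_url("Students", "The Portal to Texas History"))
```

Note that the portal link in this notebook uses "The Portal to Texas History" while the API search field uses "Portal to Texas History", so the two names may need to be set separately for other partners too.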