[Smithsonian Open Access](https://www.si.edu/openaccess) allows us to download, share, and reuse millions of the Smithsonian’s images and data from across its 19 museums, nine research centers, libraries, archives, and the National Zoo.

This notebook shows how to explore the repository and build a CSV dataset from the search results.

The Open Access API requires an API key to access the endpoints. Please register with https://api.data.gov/signup/ to get a key.

## Setting things up

In [171]:
import requests, csv
import json
import pandas as pd

## Global configuration 

In this section we set the API key, the search text used to retrieve records, and the number of records to retrieve.

In [180]:
api_key = 'add_your_api_key' # add your own api_key
q = 'theodore roosevelt' # querystring
rows = '100' # number of records to retrieve
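
To keep the key out of the notebook itself, one option is to read it from an environment variable. The sketch below assumes a variable named `SI_API_KEY` (a hypothetical name chosen here, not something the API prescribes) and falls back to a placeholder when it is unset.

```python
import os

def load_api_key(env_var='SI_API_KEY', placeholder='add_your_api_key'):
    """Return the key stored in env_var, or the placeholder if it is unset."""
    # SI_API_KEY is an assumed name; use whichever variable you exported
    return os.environ.get(env_var, placeholder)

api_key = load_api_key()
```

If the placeholder comes back, the variable was not set in the environment that started the notebook.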

## Accessing the Smithsonian repository
Please visit https://edan.si.edu/openaccess/apidocs/#api-search-search for more information.

In [191]:
url = 'https://api.si.edu/openaccess/api/v1.0/search'
r = requests.get(url, params = {'q': q, 'start':'0', 'rows': rows, 'api_key': api_key })
print(r.url)
response = r.text

https://api.si.edu/openaccess/api/v1.0/search?q=theodore+roosevelt&start=0&rows=100&api_key=L20kTDAWj35bazo1Zhwx8wN5Ua0zKmhHz8PtIacX
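
The request above fetches a single page of results. To collect more records than one call returns, the `start` and `rows` parameters can be advanced page by page. The helper below is a sketch: `page_params` and its arguments are illustrative names, and it only builds the parameter dictionaries, leaving it to you to decide how and when to send them.

```python
def page_params(q, api_key, total, page_size=100):
    """Yield one params dict per page, covering `total` records."""
    for start in range(0, total, page_size):
        yield {
            'q': q,
            'start': str(start),
            # the last page may be shorter than page_size
            'rows': str(min(page_size, total - start)),
            'api_key': api_key,
        }

# Example: three requests cover 250 records
pages = list(page_params('theodore roosevelt', 'add_your_api_key', total=250))
print([(p['start'], p['rows']) for p in pages])
```

Each dictionary can be passed as `params` to `requests.get(url, params=...)`. Calling `raise_for_status()` on the response before parsing it helps surface an invalid key or a rate limit early.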


### Creating a csv file

In [192]:
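# newline='' lets the csv module handle line endings; call csv_file.close() once all rows are written
csv_file = open('si_records.csv', 'w', newline='', encoding='utf-8')
csv_out = csv.writer(csv_file, delimiter = ',', quotechar = '"', quoting = csv.QUOTE_MINIMAL)
csv_out.writerow(['title', 'date', 'media_usage', 'data_source', 'dimensions', 'sitter', 'type', 'medium', 'artist', 'link'])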

78
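
The `78` above is the return value of `writerow`, the number of characters written for the header line. Rows written through a file that stays open may sit in a buffer until it is flushed or closed. A `with` block, sketched below, closes the file automatically. The file name `si_records_demo.csv` and the sample row are placeholders chosen for illustration.

```python
import csv

header = ['title', 'date', 'media_usage', 'data_source', 'dimensions',
          'sitter', 'type', 'medium', 'artist', 'link']

# newline='' lets the csv module control line endings itself
with open('si_records_demo.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    # placeholder row; real rows would come from the manifests
    writer.writerow(['Theodore Roosevelt', '1884', 'CC0'] + [''] * 7)

# the file is closed at this point, so every row is on disk
with open('si_records_demo.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
```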

### Reading the results and retrieving the metadata

In [183]:
results = json.loads(response)

# IIIF metadata labels, in the same order as the CSV columns
labels = ['Title', 'Date', 'Media Usage', 'Data Source', 'Dimensions', 'Sitter', 'Type', 'Medium', 'Artist']

for row in results['response']['rows']:
    print(row['id'] + ' ' + row['title'])

    # getting the identifiers of the records to access the IIIF manifests
    try:
        for media in row['content']['descriptiveNonRepeating']['online_media']['media']:
            idsId = media['idsId']
            print(idsId)

            # retrieving the manifest
            iiifUrl = 'https://ids.si.edu/ids/manifest/' + idsId
            iiifItemResponse = requests.get(iiifUrl)
            iiifItem = json.loads(iiifItemResponse.text)

            # retrieving metadata; labels missing from the manifest stay empty
            metadata = dict.fromkeys(labels, '')
            for entry in iiifItem['metadata']:
                if entry['label'] in metadata:
                    metadata[entry['label']] = entry['value']
            print(metadata['Artist'])

            csv_out.writerow([metadata[label] for label in labels] + [iiifUrl])

    except Exception as e:
        # records without online media, or with an unreadable manifest, are skipped
        print("An exception occurred:", e)

edanmdm-npg_NPG.81.125 Theodore Roosevelt
NPG-NPG_81_125
Julius Ludovici, active 1884
edanmdm-npg_NPG.81.131 Theodore Roosevelt
NPG-NPG_81_131
Jessie Tarbox Beals, 1870 - 30 May 1942
edanmdm-npg_NPG.81.133 Theodore Roosevelt
NPG-NPG_81_133
Jessie Tarbox Beals, 1870 - 30 May 1942
edanmdm-npg_NPG.97.198 Theodore Roosevelt
NPG-NPG_97_198
James S. King, 1852 - 1925
edanmdm-npg_S_NPG.84.296 Theodore Roosevelt and Frank J. Hogan
NPG-S-NPG_84_296
Harris & Ewing Studio, active 1905 - 1977
edanmdm-npg_NPG.80.57 Theodore Roosevelt
NPG-8000114B_1
Sidney Lawton Smith, 1845 - 1929
edanmdm-npg_S_NPG.84.285 Theodore Roosevelt
NPG-S-NPG_84_285
Harris & Ewing Studio, active 1905 - 1977
edanmdm-npg_S_NPG.77.223 Theodore Roosevelt
NPG-S-NPG_77_223
Pirie MacDonald, 1867 - 1942
edanmdm-npg_NPG.91.21 Theodore Roosevelt
NPG-NPG_91_21
Albert Rosenthal, 1863 - 1939
edanmdm-npg_NPG.81.130 Theodore Roosevelt
NPG-NPG_81_130
Jessie Tarbox Beals, 1870 - 30 May 1942
edanmdm-npg_NPG.81.53 Theodore Roosevelt
NPG-NPG_8
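
The label-matching step in the loop can be tried offline. The sketch below runs the same kind of lookup against a small hand-made dictionary that imitates the `metadata` list of a manifest. The first three values are copied from the first record in this notebook, the `Credit Line` entry is a hypothetical extra label, and `extract_metadata` is an illustrative helper, not part of any library.

```python
def extract_metadata(manifest, labels):
    """Return {label: value} for the requested labels; missing labels map to ''."""
    found = dict.fromkeys(labels, '')
    for entry in manifest.get('metadata', []):
        if entry['label'] in found:
            found[entry['label']] = entry['value']
    return found

# hand-made stand-in for a manifest, not a real API response
sample_manifest = {
    'metadata': [
        {'label': 'Title', 'value': 'Theodore Roosevelt'},
        {'label': 'Date', 'value': '1884'},
        {'label': 'Artist', 'value': 'Julius Ludovici, active 1884'},
        # hypothetical label that is not requested, so it is ignored
        {'label': 'Credit Line', 'value': 'placeholder'},
    ]
}

record = extract_metadata(sample_manifest, ['Title', 'Date', 'Medium', 'Artist'])
print(record)
```

Labels that the manifest does not provide, such as `Medium` here, come back as empty strings, which matches how empty cells end up in the CSV.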

# Create some summary data

We can use Pandas to give us a quick overview of the dataset.

In [184]:
# Load the CSV file created above.
# This puts the data in a Pandas DataFrame
df = pd.read_csv('si_records.csv')

## Have a peek

In [185]:
df

Unnamed: 0,title,date,media_usage,data_source,dimensions,sitter,type,medium,artist,link
0,Theodore Roosevelt,1884,CC0,National Portrait Gallery,"15cm x 10.6cm (5 7/8"" x 4 3/16""), Image","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Photograph,Albumen silver print,"Julius Ludovici, active 1884",https://ids.si.edu/ids/manifest/NPG-NPG_81_125
1,Theodore Roosevelt,1904,CC0,National Portrait Gallery,"14.5cm x 18.9cm (5 11/16"" x 7 7/16""), Image","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Photograph,Gelatin silver print,"Jessie Tarbox Beals, 1870 - 30 May 1942",https://ids.si.edu/ids/manifest/NPG-NPG_81_131
2,Theodore Roosevelt,1904,CC0,National Portrait Gallery,"13.1cm x 7.5cm (5 3/16"" x 2 15/16""), Image","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Photograph,Gelatin silver print,"Jessie Tarbox Beals, 1870 - 30 May 1942",https://ids.si.edu/ids/manifest/NPG-NPG_81_133
3,Theodore Roosevelt,c. 1915,CC0,National Portrait Gallery,"Sheet: 45.7 × 35.3 cm (18 × 13 7/8"")","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Print,Etching on paper,"James S. King, 1852 - 1925",https://ids.si.edu/ids/manifest/NPG-NPG_97_198
4,Theodore Roosevelt and Frank J. Hogan,c. 1905,CC0,National Portrait Gallery,"22.2cm x 34.3cm (8 3/4"" x 13 1/2""), Image","Frank J. Hogan, 1877 - 1944",Photograph,Gelatin silver print,"Harris & Ewing Studio, active 1905 - 1977",https://ids.si.edu/ids/manifest/NPG-S-NPG_84_296
5,Theodore Roosevelt,1905,CC0,National Portrait Gallery,"Sheet: 61.2 x 45.5 cm (24 1/8 x 17 15/16"")","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Print,Etching on paper,"Sidney Lawton Smith, 1845 - 1929",https://ids.si.edu/ids/manifest/NPG-8000114B_1
6,Theodore Roosevelt,1907,CC0,National Portrait Gallery,"25.1cm x 20.5cm (9 7/8"" x 8 1/16""), Image","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Photograph,Gelatin silver print,"Harris & Ewing Studio, active 1905 - 1977",https://ids.si.edu/ids/manifest/NPG-S-NPG_84_285
7,Theodore Roosevelt,,CC0,National Portrait Gallery,"45.6cm x 30.6cm (17 15/16"" x 12 1/16""), Image","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Photograph,Photogravure,"Pirie MacDonald, 1867 - 1942",https://ids.si.edu/ids/manifest/NPG-S-NPG_77_223
8,Theodore Roosevelt,c. 1903,CC0,National Portrait Gallery,"Sheet: 63.5cm x 41.7cm (25"" x 16 7/16"")","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Print,Etching on paper,"Albert Rosenthal, 1863 - 1939",https://ids.si.edu/ids/manifest/NPG-NPG_91_21
9,Theodore Roosevelt,1904,CC0,National Portrait Gallery,"18.2cm x 23.1cm (7 3/16"" x 9 1/8""), Image","Theodore Roosevelt, 27 Oct 1858 - 6 Jan 1919",Photograph,Gelatin silver print,"Jessie Tarbox Beals, 1870 - 30 May 1942",https://ids.si.edu/ids/manifest/NPG-NPG_81_130


## How many items?

In [186]:
# How many items?
len(df)

31

## Exploring the artists

In [187]:
# Get unique values
artist = pd.unique(df['artist'].str.split('|', expand=True).stack()).tolist()
for a in sorted(artist):
    print(a)

Albert Rosenthal, 1863 - 1939
Anders Leonard Zorn, 18 Feb 1860 - 1920
Barnett McPhee Clinedinst, Jr., 1862 - 1953
Charles Davis Mitchell, 1887 - 1940
Eugene Zimmerman, 1862 - 1935
Frederich Graetz, c. 1840 - c. 1913
Harris & Ewing Studio, active 1905 - 1977
James S. King, 1852 - 1925
Jessie Tarbox Beals, 1870 - 30 May 1942
Julius Ludovici, active 1884
Leo Mielziner, 1869 - 1935
Oscar Edward Cesare, 1885 - 1948
Pach Brothers Studio, active 1867 - 1993
Pirie MacDonald, 1867 - 1942
Sidney Lawton Smith, 1845 - 1929
Underwood & Underwood, active 1880 - c. 1950
Unidentified Artist
Vincent, Brooks, Day & Son Lithography Company, active 1867 - c. 1905
Walter Joseph Enright, 1879 - 1969


## How often is each name used?

In [None]:
# Split the artist column on '|' and count how often each name appears
artist_counts = df['artist'].str.split('|').explode().value_counts().to_frame().reset_index()
# Add column names
artist_counts.columns = ['name', 'count']
# Display with horizontal bars
display(artist_counts.style.bar(subset=['count'], color='#d65f5f').set_properties(subset=['count'], **{'width': '300px'}))
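
The same tally can be cross-checked without pandas. The sketch below uses `collections.Counter` on a short hand-written list that stands in for the `artist` column. The `|` separator matches the split above, and the second value is made up to show a cell holding more than one name.

```python
from collections import Counter

# stand-in for df['artist']; None plays the role of a missing value
artist_column = [
    'Jessie Tarbox Beals, 1870 - 30 May 1942',
    'Pirie MacDonald, 1867 - 1942|Unidentified Artist',
    None,
    'Jessie Tarbox Beals, 1870 - 30 May 1942',
]

# skip empty cells, then count every name found after splitting on '|'
counts = Counter(
    name
    for cell in artist_column if cell
    for name in cell.split('|')
)

for name, count in counts.most_common():
    print(count, name)
```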

## Creating a list of unique types

In [195]:
# Get unique values
types = pd.unique(df['type'].dropna()).tolist()
# item_type avoids shadowing the built-in type()
for item_type in sorted(types, key=str.lower):
    print(item_type)

Drawing
Photograph
Print
