# Data Collection

This is an analysis of the Metropolitan Museum of Art (Met) collection, whose API is documented at https://metmuseum.github.io/.

Due to the size of the collection, I have decided to focus only on the Arms and Armor department.

This notebook provides the code for collecting the data from the API.

In [1]:
# Libraries used
import pandas as pd
import json
import requests
import os

The dataframe below holds the total number of objects in the whole museum, along with their IDs.

In [3]:
objects = pd.read_json('https://collectionapi.metmuseum.org/public/collection/v1/objects')

In [5]:
objects.loc[0]['total']

480546

It's a large dataset (480546 objects on 31/07/2022). For this small project I would like to look at only one department: Arms and Armor, which has department ID 4.

The request below returns the IDs of all objects in the Arms and Armor department, along with the total for that department.
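
The department ID does not have to be hard-coded. A minimal sketch, assuming the API's `departments` endpoint returns a JSON object of the form `{"departments": [{"departmentId": ..., "displayName": ...}, ...]}`. The helper name `find_department_id` and the sample payload are illustrative and not part of the original notebook:

```python
import json


def find_department_id(payload, display_name):
    """Return the departmentId whose displayName matches (case-insensitive), or None."""
    for department in payload.get("departments", []):
        if department.get("displayName", "").lower() == display_name.lower():
            return department.get("departmentId")
    return None


# Illustrative excerpt of what the endpoint
# https://collectionapi.metmuseum.org/public/collection/v1/departments
# is assumed to return; the real response lists every department.
sample_payload = json.loads("""
{"departments": [
    {"departmentId": 1, "displayName": "American Decorative Arts"},
    {"departmentId": 4, "displayName": "Arms and Armor"}
]}
""")

arms_department_id = find_department_id(sample_payload, "Arms and Armor")
print(arms_department_id)
```

Against the live API, the payload would come from `requests.get(...).json()` instead of the hard-coded string.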

In [8]:
arms_objects = pd.read_json('https://collectionapi.metmuseum.org/public/collection/v1/objects?departmentIds=4')

In [9]:
arms_objects.loc[0]['total']

13613

The Arms and Armor department has a total of 13613 objects. Let's look at the data we can get for one of these objects.

In [10]:
url = 'https://collectionapi.metmuseum.org/public/collection/v1/objects/22183'
download = requests.get(url=url)

In [11]:
example_arm = json.loads(download.content)

In [12]:
example_arm

{'objectID': 22183,
 'isHighlight': False,
 'accessionNumber': '14.25.823a, b',
 'accessionYear': '1914',
 'isPublicDomain': False,
 'primaryImage': '',
 'primaryImageSmall': '',
 'additionalImages': [],
 'constituents': None,
 'department': 'Arms and Armor',
 'objectName': 'Arm defense and left gauntlet',
 'title': 'Arm Defense and Left Gauntlet',
 'culture': 'possibly Italian, Savoy or Spanish',
 'period': '',
 'dynasty': '',
 'reign': '',
 'portfolio': '',
 'artistRole': '',
 'artistPrefix': '',
 'artistDisplayName': '',
 'artistDisplayBio': '',
 'artistSuffix': '',
 'artistAlphaSort': '',
 'artistNationality': '',
 'artistBeginDate': '',
 'artistEndDate': '',
 'artistGender': '',
 'artistWikidata_URL': '',
 'artistULAN_URL': '',
 'objectDate': 'ca. 1590',
 'objectBeginDate': 1565,
 'objectEndDate': 1615,
 'medium': 'Steel, gold',
 'dimensions': 'arm defense (a): H. 19 3/4 in. (50.2 cm); W. 11 3/4 in. (29.8 cm); D. 7 5/8 in. (19.4 cm); Wt. 4 lb. 11.3 oz. (2133 g); left gauntlet (b):

Only some of the fields in the data above are of interest to this project.
Some preliminary questions:

   * What percentage of the collection is highlighted?
   * What percentage is in the public domain?
   * How many individual artists are represented?
   * Who has the most works?
   * A timeline with the birth and death of the artists.
   * Which countries are most represented?
   * Which countries are most highlighted?
   * Which medium is the most used?
   * What is classified most often?
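
Several of these questions reduce to simple counts over the collected records. A minimal sketch using only the standard library, with four rows copied from the collected table shown later in this notebook. The helper names are illustrative, and splitting `medium` on commas is a simplifying assumption about how materials are listed:

```python
from collections import Counter

# Four rows taken from the collected table; only the fields needed here.
sample_records = [
    {"objectID": 21810, "isHighlight": False, "isPublicDomain": True,
     "medium": "Brass, gold", "country": ""},
    {"objectID": 21812, "isHighlight": False, "isPublicDomain": False,
     "medium": "Steel, gold, leather", "country": "France"},
    {"objectID": 21813, "isHighlight": False, "isPublicDomain": True,
     "medium": "Cloth", "country": "Japan"},
    {"objectID": 21814, "isHighlight": False, "isPublicDomain": True,
     "medium": "Iron, gold, silver", "country": "Japan"},
]


def percentage_true(records, field):
    """Percentage of records whose boolean `field` is True."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if r.get(field) is True) / len(records)


def count_materials(records):
    """Count how many records mention each material in the `medium` field."""
    counts = Counter()
    for record in records:
        for material in record.get("medium", "").split(","):
            material = material.strip().lower()
            if material:
                counts[material] += 1
    return counts


def count_countries(records):
    """Count records per country, skipping empty values."""
    return Counter(r["country"] for r in records if r.get("country"))


print(percentage_true(sample_records, "isHighlight"))
print(percentage_true(sample_records, "isPublicDomain"))
print(count_materials(sample_records).most_common(1))
print(count_countries(sample_records).most_common(1))
```

The same counts can be computed on the full dataframe with pandas once the data has been cleaned.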

Below I have constructed the data collecting dataframe along with the fields I want to use in this project.

In [6]:
arms_and_armor_df = pd.DataFrame(columns = ['objectID', 
                                                 'isHighlight', 
                                                 'isPublicDomain',
                                                 'objectName', 
                                                 'title', 
                                                 'artistRole',
                                                 'artistDisplayName',
                                                 'artistBeginDate',
                                                 'artistEndDate',
                                                 'objectBeginDate',
                                                 'objectEndDate', 
                                                 'medium',
                                                 'country', 
                                                 'classification'])

The data is then collected using the code below.

In [7]:
base_url = 'https://collectionapi.metmuseum.org/public/collection/v1/objects/'

In [8]:
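for objectID in arms_objects["objectIDs"]:
    # Request the full record for this object and parse the JSON body
    request = requests.get(url=base_url + str(objectID))
    artifact = json.loads(request.content)
    artifact_df = pd.DataFrame([artifact])
    # join='inner' keeps only the columns already defined in arms_and_armor_df
    arms_and_armor_df = pd.concat([arms_and_armor_df, artifact_df], join='inner', ignore_index=True)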

ConnectionError: HTTPSConnectionPool(host='collectionapi.metmuseum.org', port=443): Max retries exceeded with url: /public/collection/v1/objects/25038 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000143B3F4ED00>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
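
The `ConnectionError` above stops the loop partway through, so a collection run of this length benefits from retrying failed requests and recording the IDs that still fail. The following is a sketch under stated assumptions, not the method used in this notebook: `fetch_object` and `collect_records` are hypothetical helper names, and the retry count and pause are arbitrary choices.

```python
import time


def fetch_object(object_id, base_url, timeout=10):
    """Fetch one object record as a dict (hypothetical helper).

    requests is imported lazily so the retry logic below can be
    exercised without network access.
    """
    import requests

    response = requests.get(base_url + str(object_id), timeout=timeout)
    response.raise_for_status()
    return response.json()


def collect_records(object_ids, fetch, max_attempts=3, pause_seconds=1.0):
    """Call fetch(object_id) for each ID, retrying on connection errors.

    Returns (records, failed_ids) so that failed IDs can be retried later
    instead of aborting the whole run.
    """
    records, failed_ids = [], []
    for object_id in object_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                records.append(fetch(object_id))
                break
            except OSError:  # requests' exceptions derive from OSError
                if attempt == max_attempts:
                    failed_ids.append(object_id)
                else:
                    # Wait a little longer after each failed attempt
                    time.sleep(pause_seconds * attempt)
    return records, failed_ids


# Usage against the live API (not run here):
# base_url = 'https://collectionapi.metmuseum.org/public/collection/v1/objects/'
# records, failed = collect_records(arms_objects["objectIDs"],
#                                   lambda object_id: fetch_object(object_id, base_url))
```

Collecting the records in a list and building the dataframe once at the end also avoids calling `pd.concat` on every iteration.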

Below is a visual check that the correct amount of data has been collected. In the next section this will be cleaned.

In [14]:
arms_and_armor_df

| Unnamed: 0 | objectID | isHighlight | isPublicDomain | objectName | title | artistRole | artistDisplayName | artistBeginDate | artistEndDate | objectBeginDate | objectEndDate | medium | country | classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21810 | False | True | Gorget | Gorget of an Officer of the King's American Re... | | | | | 1781 | 1781 | Brass, gold | | Armor Parts-Gorgets |
| 1 | 21811 | False | True | Blade and mounting for a sword (Katana) | Blade and Mounting for a Sword (Katana) | | | | | 1801 | 1900 | Steel, wood, lacquer, iron, gold, copper-gold ... | Japan | Swords |
| 2 | 21812 | False | False | Thumb of a left gauntlet | Thumb of a Left Gauntlet | | | | | 1525 | 1575 | Steel, gold, leather | France | Armor Parts-Gauntlets |
| 3 | 21813 | False | True | Fencing shoulder and arm guard | Fencing Shoulder and Arm Guard | | | | | 1590 | 1900 | Cloth | Japan | Fencing Equipment |
| 4 | 21814 | False | True | Sword guard (Tsuba) | Sword guard (<i>Tsuba</i>) Depicting Bodhidhar... | | | | | 1701 | 1800 | Iron, gold, silver | Japan | Sword Furniture-Tsuba |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5288 | 27756 | False | False | Dagger with sheath | Dagger with Sheath | | | | | 1701 | 1900 | Steel, silver, gold, wood, ray skin | Tibet | Daggers |
| 5289 | 27757 | False | False | Ritual dagger (Phur pa) | Ritual Dagger (Phur Pa) | | | | | 1801 | 1900 | Wood | Tibet | Daggers |
| 5290 | 27758 | False | True | Ritual dagger (Phur pa) | Ritual Dagger (Phur Pa) | | | | | 1801 | 1900 | Brass | Tibet | Daggers |
| 5291 | 27759 | False | False | Ritual dagger (Phur pa) | Ritual Dagger (Phur Pa) | | | | | 1801 | 1900 | Brass | Tibet | Daggers |


Below I have exported the dataframe to a JSON file for easier use.

In [35]:
with open("met_arms_and_armor.json", "w") as f:
    print(f.write(arms_and_armor_df.to_json()))

3412041
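
With default settings, `DataFrame.to_json()` writes column-oriented JSON of the form `{column: {row label: value}}`. A minimal sketch of turning that layout back into one dict per object using only the standard library. The helper name `columns_to_records` and the two-row sample are illustrative:

```python
import json


def columns_to_records(column_data):
    """Convert {column: {row_label: value}} into a list of row dicts."""
    # Collect row labels in first-seen order across all columns
    row_labels = dict.fromkeys(
        label for values in column_data.values() for label in values
    )
    return [
        {column: values.get(label) for column, values in column_data.items()}
        for label in row_labels
    ]


# Illustrative two-row sample in the column-oriented layout
sample = json.loads(
    '{"objectID": {"0": 21810, "1": 21811}, "country": {"0": "", "1": "Japan"}}'
)
print(columns_to_records(sample))

# Usage with the exported file (not run here):
# with open("met_arms_and_armor.json") as f:
#     records = columns_to_records(json.load(f))
```

In pandas, `pd.read_json("met_arms_and_armor.json")` reloads the same file directly as a dataframe.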
