Key information about the project data:

1) This Python notebook processes the raw data files obtained from the repository:
https://github.com/NationalGalleryOfArt/opendata/tree/main/data
2) We process the files obtained from this repository and keep only the data we need to build our art inventory.
3) The raw input is in CSV format, and the processed output is written as CSV as well.
4) The source data is in the source folder and the processed data is stored in the processed folder, both within the same directory as this notebook.
5) All processing is done in Python.
6) The processed data will then be loaded into a MySQL database, which will be created later.
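
The folder layout from point 4 can be sketched as below. This is a minimal illustration and not part of the original notebook: `ensure_layout` is a hypothetical helper, and a temporary directory stands in for the notebook's own directory.

```python
from pathlib import Path
import tempfile


def ensure_layout(base_dir: Path) -> tuple[Path, Path]:
    """Return the (source, processed) folders under base_dir.

    Only the processed folder is created; the source folder is expected
    to already hold the downloaded CSV files.
    """
    source_dir = base_dir / "source"
    processed_dir = base_dir / "processed"
    processed_dir.mkdir(parents=True, exist_ok=True)
    return source_dir, processed_dir


# Assumption: a temporary directory stands in for the notebook's directory.
base_dir = Path(tempfile.mkdtemp())
source_dir, processed_dir = ensure_layout(base_dir)
```

Creating the processed folder up front means the final CSV export does not depend on the folder already existing.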

In [26]:
# importing necessary libraries
import pandas as pd
import numpy as np

In [27]:
# reading locations csv file
locations = pd.read_csv('source/locations.csv')
locations.head()

Unnamed: 0,locationid,site,room,publicaccess,description,unitposition
0,3591,East Building,EBL,1,East Bldg Lawn,
1,3592,East Building,EBL,1,East Bldg Lawn,E
2,3593,East Building,EBL,1,East Bldg Lawn,S
3,3594,East Building,EBL,1,East Bldg Lawn,W
4,3617,East Building,EC-003,1,"East Bldg, Auditorium Lobby",


In [28]:
# drop columns publicaccess and unitposition
locations = locations.drop(columns=['publicaccess','unitposition'])
locations.head()

Unnamed: 0,locationid,site,room,description
0,3591,East Building,EBL,East Bldg Lawn
1,3592,East Building,EBL,East Bldg Lawn
2,3593,East Building,EBL,East Bldg Lawn
3,3594,East Building,EBL,East Bldg Lawn
4,3617,East Building,EC-003,"East Bldg, Auditorium Lobby"


In [29]:
# renaming the columns
locations.columns = ["loc_id","loc_site","loc_room","loc_description"]
locations.head()

Unnamed: 0,loc_id,loc_site,loc_room,loc_description
0,3591,East Building,EBL,East Bldg Lawn
1,3592,East Building,EBL,East Bldg Lawn
2,3593,East Building,EBL,East Bldg Lawn
3,3594,East Building,EBL,East Bldg Lawn
4,3617,East Building,EC-003,"East Bldg, Auditorium Lobby"
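
Assigning a list to `locations.columns` relies on the columns being in exactly the expected order. One alternative, shown here only as a sketch on a toy frame (`locations_demo` and `location_names` are illustrative names, not from the notebook), is to rename through an explicit old-to-new mapping:

```python
import pandas as pd

# Toy frame with the same column names as the trimmed locations table.
locations_demo = pd.DataFrame({
    "locationid": [3591, 3617],
    "site": ["East Building", "East Building"],
    "room": ["EBL", "EC-003"],
    "description": ["East Bldg Lawn", "East Bldg, Auditorium Lobby"],
})

# Explicit old -> new mapping; the result does not depend on column order.
location_names = {
    "locationid": "loc_id",
    "site": "loc_site",
    "room": "loc_room",
    "description": "loc_description",
}
locations_demo = locations_demo.rename(columns=location_names)
```

With a mapping, a reordered or extended source file leaves unrelated columns untouched instead of silently mislabelling them.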


In [30]:
# reading objects csv file
objects = pd.read_csv('source/objects.csv')
objects.head()


Unnamed: 0,objectid,accessioned,accessionnum,locationid,title,displaydate,beginyear,endyear,visualbrowsertimespan,medium,...,visualbrowserclassification,parentid,isvirtual,departmentabbr,portfolio,series,volume,watermarks,lastdetectedmodification,customprinturl
0,112226,1,2000.41.14,,Untitled (Stack),1992,1992.0,1992.0,1976 to 2000,brush and black ink on Bodleian paper,...,drawing,,0,CG-W,,,,BODLEIAN,2019-10-28 22:01:34.883-04,
1,114386,1,2000.127.6.1-124,,"Lithographs, Volume 15",,1804.0,1866.0,1801 to 1825,book of lithographs,...,volume,,0,CG-E,,,,,2020-05-06 22:01:32.06-04,
2,118962,1,2001.13.2,,Travelers beside a Ruined Portico,c. 1650,1570.0,1710.0,1551 to 1600,etching on laid paper,...,print,,0,CG-E,Roman Ruins with Animals and Figures,Roman Ruins with Animals and Figures,,post horn with MS,2020-05-06 22:01:32.06-04,
3,119195,1,2001.67.203,,"At Home after World War II, November 13",1945,1945.0,1945.0,1926 to 1950,gelatin silver print,...,photograph,,0,CPH,,,,,2021-08-11 22:01:16.29-04,
4,124519,1,2003.39.1,,Niche with Falconry Gear,probably 1660s,1660.0,1669.0,1651 to 1700,oil on canvas,...,painting,,0,CNE-B,,,,,2022-06-10 22:01:21.357-04,


In [31]:
# selecting only the columns we need
objects = objects[["objectid","locationid","title","beginyear","endyear","medium","dimensions","inscription","attribution","visualbrowserclassification","parentid"]]
objects.head()

Unnamed: 0,objectid,locationid,title,beginyear,endyear,medium,dimensions,inscription,attribution,visualbrowserclassification,parentid
0,112226,,Untitled (Stack),1992.0,1992.0,brush and black ink on Bodleian paper,overall: 51.8 x 72.3 cm (20 3/8 x 28 7/16 in.),lower right in graphite: WK 92,Win Knowlton,drawing,
1,114386,,"Lithographs, Volume 15",1804.0,1866.0,book of lithographs,,,Paul Gavarni,volume,
2,118962,,Travelers beside a Ruined Portico,1570.0,1710.0,etching on laid paper,plate: 19.3 x 13.5 cm (7 5/8 x 5 5/16 in.)\r\n...,lower left in plate: Jonas Umbach del.; lower ...,Bernhard Zaech after Jonas Umbach,print,
3,119195,,"At Home after World War II, November 13",1945.0,1945.0,gelatin silver print,sheet (trimmed to image): 19.6 x 29.8 cm (7 11...,center right verso: artist's stamp,Anatoly Skurikhin,photograph,
4,124519,,Niche with Falconry Gear,1660.0,1669.0,oil on canvas,overall: 80.5 x 64.5 cm (31 11/16 x 25 3/8 in....,lower right: Chr.Pierson.f,Christoffel Pierson,painting,
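
Reading the full file and then selecting columns works, but the selection can also happen at load time. The sketch below uses a tiny in-memory CSV as a stand-in for source/objects.csv; the rows are toy data and `objects_demo` is an illustrative name.

```python
import io
import pandas as pd

# Toy stand-in for source/objects.csv with one column we do not need.
raw_csv = io.StringIO(
    "objectid,accessioned,locationid,title,beginyear\n"
    "112226,1,,Untitled (Stack),1992\n"
    "131894,1,6478,Minerva,1596\n"
)

# usecols limits parsing to the listed columns.
wanted = ["objectid", "locationid", "title", "beginyear"]
objects_demo = pd.read_csv(raw_csv, usecols=wanted)
```

Note that `usecols` keeps the columns in file order rather than in the order of the list.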


In [32]:
# renaming the columns
objects.columns = ["obj_id","loc_id","obj_title","obj_beginyear","obj_endyear","obj_medium","obj_dimensions","obj_inscription","obj_attribution","obj_class","obj_parentid"]
objects.head()

Unnamed: 0,obj_id,loc_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,obj_parentid
0,112226,,Untitled (Stack),1992.0,1992.0,brush and black ink on Bodleian paper,overall: 51.8 x 72.3 cm (20 3/8 x 28 7/16 in.),lower right in graphite: WK 92,Win Knowlton,drawing,
1,114386,,"Lithographs, Volume 15",1804.0,1866.0,book of lithographs,,,Paul Gavarni,volume,
2,118962,,Travelers beside a Ruined Portico,1570.0,1710.0,etching on laid paper,plate: 19.3 x 13.5 cm (7 5/8 x 5 5/16 in.)\r\n...,lower left in plate: Jonas Umbach del.; lower ...,Bernhard Zaech after Jonas Umbach,print,
3,119195,,"At Home after World War II, November 13",1945.0,1945.0,gelatin silver print,sheet (trimmed to image): 19.6 x 29.8 cm (7 11...,center right verso: artist's stamp,Anatoly Skurikhin,photograph,
4,124519,,Niche with Falconry Gear,1660.0,1669.0,oil on canvas,overall: 80.5 x 64.5 cm (31 11/16 x 25 3/8 in....,lower right: Chr.Pierson.f,Christoffel Pierson,painting,


In [33]:
# reading published_images csv file
published_images = pd.read_csv('source/published_images.csv')
published_images.head()

Unnamed: 0,uuid,iiifurl,iiifthumburl,viewtype,sequence,width,height,maxpixels,created,modified,depictstmsobjectid,assistivetext
0,00004dec-8300-4487-8d89-562d0126b6a1,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,primary,0.0,2623,4000,640.0,2010-09-07 15:08:48-04,2022-06-15 12:51:00-04,11975,
1,00007f61-4922-417b-8f27-893ea328206c,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,primary,0.0,3365,4332,,2013-07-05 15:41:08-04,2022-05-23 14:59:28-04,17387,
2,0000bd8c-39de-4453-b55d-5e28a9beed38,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,primary,0.0,3500,4688,,2013-08-05 14:31:59-04,2022-05-23 15:05:58-04,19245,
3,0000e5a4-7d32-4c2a-97c6-a6b571c9fd71,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,primary,0.0,2252,3000,,2013-03-18 14:39:55-04,2022-05-17 18:19:25-04,153987,
4,0001668a-dd1c-48e8-9267-b6d1697d43c8,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,primary,0.0,3446,4448,,2014-01-02 14:50:50-05,2022-05-23 15:39:38-04,23830,


In [34]:
# selecting only the columns we need
published_images = published_images[["depictstmsobjectid","iiifurl","iiifthumburl","sequence","viewtype"]]
published_images.head()

Unnamed: 0,depictstmsobjectid,iiifurl,iiifthumburl,sequence,viewtype
0,11975,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,0.0,primary
1,17387,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,0.0,primary
2,19245,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,0.0,primary
3,153987,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,0.0,primary
4,23830,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,0.0,primary


In [35]:
# selecting only viewtype=primary images
published_images = published_images[published_images.viewtype == 'primary']
published_images.head()

Unnamed: 0,depictstmsobjectid,iiifurl,iiifthumburl,sequence,viewtype
0,11975,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,0.0,primary
1,17387,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,0.0,primary
2,19245,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,0.0,primary
3,153987,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,0.0,primary
4,23830,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,0.0,primary


In [36]:
# dropping viewtype (now always 'primary') and renaming the columns
published_images = published_images[["depictstmsobjectid","iiifurl","iiifthumburl","sequence"]]
published_images.columns = ["obj_id","img_url","img_thumburl","img_sequence"]
published_images.head()

Unnamed: 0,obj_id,img_url,img_thumburl,img_sequence
0,11975,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,0.0
1,17387,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,0.0
2,19245,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,0.0
3,153987,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,0.0
4,23830,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,0.0


In [37]:
# filter data to only get the images with sequence=0
published_images = published_images[published_images["img_sequence"]==0]

# drop column img_sequence
published_images = published_images.drop(columns=['img_sequence'])
published_images.head()


Unnamed: 0,obj_id,img_url,img_thumburl
0,11975,https://api.nga.gov/iiif/00004dec-8300-4487-8d...,https://api.nga.gov/iiif/00004dec-8300-4487-8d...
1,17387,https://api.nga.gov/iiif/00007f61-4922-417b-8f...,https://api.nga.gov/iiif/00007f61-4922-417b-8f...
2,19245,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...,https://api.nga.gov/iiif/0000bd8c-39de-4453-b5...
3,153987,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...,https://api.nga.gov/iiif/0000e5a4-7d32-4c2a-97...
4,23830,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...,https://api.nga.gov/iiif/0001668a-dd1c-48e8-92...
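
The later join assumes at most one image row per object. Whether the filters above guarantee that is not shown in the notebook, so a check along the following lines may be useful. It is a sketch on toy data: `images_demo` and the placeholder thumbnail strings are invented for illustration.

```python
import pandas as pd

images_demo = pd.DataFrame({
    "obj_id": [11975, 11975, 17387],
    # Placeholder strings, not real URLs.
    "img_thumburl": ["thumb-a", "thumb-b", "thumb-c"],
})

# Every row whose obj_id occurs more than once.
duplicated_ids = images_demo[images_demo["obj_id"].duplicated(keep=False)]

# One possible policy (an assumption): keep the first image per object.
images_unique = images_demo.drop_duplicates(subset="obj_id", keep="first")
```

If `duplicated_ids` is empty on the real data, the deduplication step is unnecessary.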


In [38]:
objects.head()

Unnamed: 0,obj_id,loc_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,obj_parentid
0,112226,,Untitled (Stack),1992.0,1992.0,brush and black ink on Bodleian paper,overall: 51.8 x 72.3 cm (20 3/8 x 28 7/16 in.),lower right in graphite: WK 92,Win Knowlton,drawing,
1,114386,,"Lithographs, Volume 15",1804.0,1866.0,book of lithographs,,,Paul Gavarni,volume,
2,118962,,Travelers beside a Ruined Portico,1570.0,1710.0,etching on laid paper,plate: 19.3 x 13.5 cm (7 5/8 x 5 5/16 in.)\r\n...,lower left in plate: Jonas Umbach del.; lower ...,Bernhard Zaech after Jonas Umbach,print,
3,119195,,"At Home after World War II, November 13",1945.0,1945.0,gelatin silver print,sheet (trimmed to image): 19.6 x 29.8 cm (7 11...,center right verso: artist's stamp,Anatoly Skurikhin,photograph,
4,124519,,Niche with Falconry Gear,1660.0,1669.0,oil on canvas,overall: 80.5 x 64.5 cm (31 11/16 x 25 3/8 in....,lower right: Chr.Pierson.f,Christoffel Pierson,painting,


In [39]:
locations.head()

Unnamed: 0,loc_id,loc_site,loc_room,loc_description
0,3591,East Building,EBL,East Bldg Lawn
1,3592,East Building,EBL,East Bldg Lawn
2,3593,East Building,EBL,East Bldg Lawn
3,3594,East Building,EBL,East Bldg Lawn
4,3617,East Building,EC-003,"East Bldg, Auditorium Lobby"


In [40]:
# left join objects with locations on loc_id
objects = pd.merge(objects, locations, on='loc_id', how='left')
objects.head()

Unnamed: 0,obj_id,loc_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,obj_parentid,loc_site,loc_room,loc_description
0,112226,,Untitled (Stack),1992.0,1992.0,brush and black ink on Bodleian paper,overall: 51.8 x 72.3 cm (20 3/8 x 28 7/16 in.),lower right in graphite: WK 92,Win Knowlton,drawing,,,,
1,114386,,"Lithographs, Volume 15",1804.0,1866.0,book of lithographs,,,Paul Gavarni,volume,,,,
2,118962,,Travelers beside a Ruined Portico,1570.0,1710.0,etching on laid paper,plate: 19.3 x 13.5 cm (7 5/8 x 5 5/16 in.)\r\n...,lower left in plate: Jonas Umbach del.; lower ...,Bernhard Zaech after Jonas Umbach,print,,,,
3,119195,,"At Home after World War II, November 13",1945.0,1945.0,gelatin silver print,sheet (trimmed to image): 19.6 x 29.8 cm (7 11...,center right verso: artist's stamp,Anatoly Skurikhin,photograph,,,,
4,124519,,Niche with Falconry Gear,1660.0,1669.0,oil on canvas,overall: 80.5 x 64.5 cm (31 11/16 x 25 3/8 in....,lower right: Chr.Pierson.f,Christoffel Pierson,painting,,,,
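
A left join keeps every object, including those with no location, which is why the preview above shows empty location columns. The sketch below, on toy frames with illustrative names, uses the `validate` and `indicator` options of `pd.merge` to make that behaviour visible.

```python
import pandas as pd

objects_demo = pd.DataFrame({
    "obj_id": [112226, 131894],
    "loc_id": [float("nan"), 6478.0],
})
locations_demo = pd.DataFrame({
    "loc_id": [6478.0],
    "loc_site": ["West Building"],
})

merged = pd.merge(
    objects_demo,
    locations_demo,
    on="loc_id",
    how="left",
    validate="many_to_one",  # raises if a loc_id repeats in locations_demo
    indicator=True,          # adds a _merge column: 'left_only' or 'both'
)
```

Rows marked `left_only` are objects without a matching location.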


In [41]:
# keeping only rows with a non-null loc_id, i.e. objects that are on display in the museum
objects = objects[~objects["loc_id"].isnull()]
objects.head()

Unnamed: 0,obj_id,loc_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,obj_parentid,loc_site,loc_room,loc_description
11,131894,6478.0,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,,West Building,G-022-A,West Ground Floor Gallery 22A
16,136019,8692.0,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,,West Building,M-087,West Main Floor Gallery 87
46,166432,8562.0,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,,West Building,M-065,West Main Floor Gallery 65
47,166433,8581.0,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,,West Building,M-068,West Main Floor Gallery 68
48,166461,8564.0,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,,West Building,M-065,West Main Floor Gallery 65


In [42]:
# selecting only the records without a parent object (obj_parentid is null)
objects = objects[objects["obj_parentid"].isnull()]
# drop the column obj_parentid
objects = objects.drop(columns=['obj_parentid'])
objects.head()

Unnamed: 0,obj_id,loc_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,loc_site,loc_room,loc_description
11,131894,6478.0,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,West Building,G-022-A,West Ground Floor Gallery 22A
16,136019,8692.0,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,West Building,M-087,West Main Floor Gallery 87
46,166432,8562.0,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,West Building,M-065,West Main Floor Gallery 65
47,166433,8581.0,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,West Building,M-068,West Main Floor Gallery 68
48,166461,8564.0,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,West Building,M-065,West Main Floor Gallery 65


In [43]:
# keeping only objects whose begin year is 1200 or later
objects = objects[objects["obj_beginyear"]>=1200]
objects.head()

Unnamed: 0,obj_id,loc_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,loc_site,loc_room,loc_description
11,131894,6478.0,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,West Building,G-022-A,West Ground Floor Gallery 22A
16,136019,8692.0,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,West Building,M-087,West Main Floor Gallery 87
46,166432,8562.0,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,West Building,M-065,West Main Floor Gallery 65
47,166433,8581.0,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,West Building,M-068,West Main Floor Gallery 68
48,166461,8564.0,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,West Building,M-065,West Main Floor Gallery 65
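
A side effect of the `>= 1200` comparison is that rows with a missing begin year are dropped too, because a comparison against NaN evaluates to False. A small sketch with hypothetical ids and years:

```python
import pandas as pd

years = pd.DataFrame({
    "obj_id": [1, 2, 3],  # hypothetical ids
    "obj_beginyear": [1596.0, 1100.0, float("nan")],
})

# NaN >= 1200 is False, so the third row is filtered out as well.
kept = years[years["obj_beginyear"] >= 1200]

# Rows that were removed only because the year is missing.
dropped_for_missing_year = years[years["obj_beginyear"].isna()]
```

If undated objects should stay in the inventory, the condition would need an explicit `| years["obj_beginyear"].isna()`.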


In [44]:
# inner join of objects and published_images on obj_id
objects = pd.merge(objects, published_images, on='obj_id')
objects.head()

Unnamed: 0,obj_id,loc_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,loc_site,loc_room,loc_description,img_url,img_thumburl
0,131894,6478.0,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,West Building,G-022-A,West Ground Floor Gallery 22A,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...
1,136019,8692.0,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,West Building,M-087,West Main Floor Gallery 87,https://api.nga.gov/iiif/d285792e-0961-4900-af...,https://api.nga.gov/iiif/d285792e-0961-4900-af...
2,166432,8562.0,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...
3,166433,8581.0,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,West Building,M-068,West Main Floor Gallery 68,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...
4,166461,8564.0,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/58338188-a428-4069-85...,https://api.nga.gov/iiif/58338188-a428-4069-85...
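
Because `pd.merge` defaults to an inner join, objects without a matching image row disappear from the result. The sketch below shows this on toy frames; the thumbnail string is a placeholder and the frame names are illustrative.

```python
import pandas as pd

objects_demo = pd.DataFrame({
    "obj_id": [131894, 136019],
    "obj_title": ["Minerva", "George Moore in the Artist's Garden"],
})
images_demo = pd.DataFrame({
    "obj_id": [131894],
    "img_thumburl": ["thumb-placeholder"],  # not a real URL
})

# how="inner" is the default: only obj_ids present in both frames survive.
inner = pd.merge(objects_demo, images_demo, on="obj_id")

# Objects that the inner join would drop for lack of an image.
without_image = objects_demo[~objects_demo["obj_id"].isin(images_demo["obj_id"])]
```

Inspecting `without_image` before the merge shows how many on-display objects are lost at this step.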


In [45]:
# drop column loc_id
objects = objects.drop(columns=['loc_id'])

In [46]:
# converting the year fields to string
objects["obj_beginyear"] = objects["obj_beginyear"].astype(str)
objects["obj_endyear"] = objects["obj_endyear"].astype(str)
objects.head()

Unnamed: 0,obj_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,loc_site,loc_room,loc_description,img_url,img_thumburl
0,131894,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,West Building,G-022-A,West Ground Floor Gallery 22A,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...
1,136019,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,West Building,M-087,West Main Floor Gallery 87,https://api.nga.gov/iiif/d285792e-0961-4900-af...,https://api.nga.gov/iiif/d285792e-0961-4900-af...
2,166432,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...
3,166433,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,West Building,M-068,West Main Floor Gallery 68,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...
4,166461,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/58338188-a428-4069-85...,https://api.nga.gov/iiif/58338188-a428-4069-85...
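
As the preview shows, converting a float column directly to string keeps the trailing ".0" (for example "1596.0"). If plain year strings are preferred, one option is to pass through pandas' nullable integer type first. This is a sketch of an alternative, not what the notebook does:

```python
import pandas as pd

years = pd.Series([1596.0, 1879.0, float("nan")], name="obj_beginyear")

# Direct conversion keeps the float formatting.
as_float_text = years.astype(str)

# Nullable Int64 first, then the string dtype; missing years stay missing.
as_int_text = years.astype("Int64").astype("string")
```

The `Int64` cast only works when every non-missing value is a whole number, which holds for the years shown in the previews.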


In [47]:
# requesting larger thumbnails: replace the size value 200 in the thumbnail URL with 500
objects["img_thumburl"] = objects["img_thumburl"].str.replace("200","500",regex=False)
objects.head()

Unnamed: 0,obj_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,loc_site,loc_room,loc_description,img_url,img_thumburl
0,131894,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,West Building,G-022-A,West Ground Floor Gallery 22A,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...
1,136019,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,West Building,M-087,West Main Floor Gallery 87,https://api.nga.gov/iiif/d285792e-0961-4900-af...,https://api.nga.gov/iiif/d285792e-0961-4900-af...
2,166432,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...
3,166433,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,West Building,M-068,West Main Floor Gallery 68,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...
4,166461,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/58338188-a428-4069-85...,https://api.nga.gov/iiif/58338188-a428-4069-85...
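
Replacing every "200" in the URL also rewrites any "200" that happens to occur inside the image identifier. The sketch below contrasts that with a replacement anchored to the size segment. The URL is hypothetical: the `/full/!200,200/0/default.jpg` tail is an assumption about the thumbnail format (the previews above truncate the real URLs), and the identifier is made up.

```python
import pandas as pd

# Hypothetical URL; both the identifier and the size segment are assumptions.
urls = pd.Series(["https://example.org/iiif/ab200cd/full/!200,200/0/default.jpg"])

# Plain replacement also changes the identifier.
naive = urls.str.replace("200", "500", regex=False)

# Anchored replacement touches only the size segment.
anchored = urls.str.replace("/!200,200/", "/!500,500/", regex=False)
```

Before applying an anchored pattern to the real data, check the full URLs to confirm the actual size segment.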


In [48]:
# drop column img_url
objects = objects.drop(columns=['img_url'])
objects.head()

Unnamed: 0,obj_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,loc_site,loc_room,loc_description,img_thumburl
0,131894,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,West Building,G-022-A,West Ground Floor Gallery 22A,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...
1,136019,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,West Building,M-087,West Main Floor Gallery 87,https://api.nga.gov/iiif/d285792e-0961-4900-af...
2,166432,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...
3,166433,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,West Building,M-068,West Main Floor Gallery 68,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...
4,166461,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/58338188-a428-4069-85...


In [49]:
# rename column img_thumburl to img_url
objects = objects.rename(columns={"img_thumburl": "img_url"})
objects.head()

Unnamed: 0,obj_id,obj_title,obj_beginyear,obj_endyear,obj_medium,obj_dimensions,obj_inscription,obj_attribution,obj_class,loc_site,loc_room,loc_description,img_url
0,131894,Minerva,1596.0,1596.0,engraving on laid paper,sheet: 33.9 x 25 cm (13 3/8 x 9 13/16 in.) (tr...,"in image, at centre left: Cum privil. Sa. Cae....",Hendrick Goltzius,print,West Building,G-022-A,West Ground Floor Gallery 22A,https://api.nga.gov/iiif/5c7ab9ab-76f8-424a-9d...
1,136019,George Moore in the Artist's Garden,1879.0,1879.0,oil on canvas,overall: 54.6 x 45.1 cm (21 1/2 x 17 3/4 in.)\...,"lower right, atelier stamp in red: E.M",Edouard Manet,painting,West Building,M-087,West Main Floor Gallery 87,https://api.nga.gov/iiif/d285792e-0961-4900-af...
2,166432,A Pastoral Visit,1881.0,1881.0,oil on canvas,overall: 119.38 × 167.16 cm (47 × 65 13/16 in....,lower right: Richd. N. Brooke. 1881. / (ELEVE ...,Richard Norris Brooke,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/e5b06963-1e7c-43c7-89...
3,166433,The Longshoremen's Noon,1879.0,1879.0,oil on canvas,overall: 84 × 127.3 cm (33 1/16 × 50 1/8 in.)\...,lower left: J. G. Brown N.A. / N.Y. 1879,John George Brown,painting,West Building,M-068,West Main Floor Gallery 68,https://api.nga.gov/iiif/c38ed386-91a4-4a58-96...
4,166461,Leisure and Labor,1858.0,1858.0,oil on canvas,overall: 39.53 × 57.94 cm (15 9/16 × 22 13/16 ...,,Frank Blackwell Mayer,painting,West Building,M-065,West Main Floor Gallery 65,https://api.nga.gov/iiif/58338188-a428-4069-85...


In [50]:
# converting to csv file
objects.to_csv('processed/objects.csv', index=False)
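
A quick read-back can confirm that the export has the expected columns and that the year strings survive the round trip. This is a sketch on a one-row toy frame written to a temporary folder rather than to processed/objects.csv.

```python
import tempfile
from pathlib import Path

import pandas as pd

demo = pd.DataFrame({
    "obj_id": [131894],
    "obj_title": ["Minerva"],
    "obj_beginyear": ["1596.0"],
})

# Assumption: a temporary folder stands in for the processed folder.
out_path = Path(tempfile.mkdtemp()) / "objects.csv"
demo.to_csv(out_path, index=False)

# Read the year back as text so it is not re-parsed as a float.
reloaded = pd.read_csv(out_path, dtype={"obj_beginyear": str})
```

With `index=False`, the file has no extra index column, so the reloaded frame has the same columns as the one that was written.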