**MoMA Dataframes**

This notebook contains the code used to create dataframes from the available MoMA data.

Because the two datasets differ in size and in the period they cover, we split MoMA's artworks into modern (before 1980) and contemporary (after 1980). This year was chosen because the earliest artwork in Rhizome's database is from 1982. In the code below the split is applied at 1983, so works dated up to and including 1982 fall into the earlier subset.

We also split the artworks by department, both to visualize differences between departments and to obtain smaller subsets of MoMA's artworks data to compare with Rhizome.

For the artworks we kept only the columns most relevant to our work: 'Title', 'Artist', 'ConstituentID', 'Date', 'Medium', 'Classification', 'Department', 'DateAcquired', 'URL', 'ThumbnailURL'.

The resulting dataframes are:
- 1 x Artist DF (all artists)
- 1 x Artworks DF with selected columns (all artworks)
- 2 x Artworks DF, one per time period (modern, contemporary), covering all departments
- 2 x Artworks DF per department, one per time period (modern, contemporary), for a total of 16 DFs
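
As a rough illustration of the split described above, the following sketch labels each artwork with a period and then groups by department and period. The toy data, the `CUTOFF` value and the `period` column are assumptions made for this example only. The labels follow the `_mod`/`_cont` file suffixes used further down, where the earlier works are "modern" and the later ones "contemporary".

```python
import numpy as np
import pandas as pd

# Toy artworks table; the real data comes from Artworks.csv.
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Department': ['Film', 'Film', 'Photography', 'Photography'],
    'dateCreated': [1950, 1995, 1979, 2001],
})

CUTOFF = 1980  # assumed cut-off year for this illustration

# Earlier works are labelled 'modern', later ones 'contemporary'.
toy['period'] = np.where(toy['dateCreated'] < CUTOFF, 'modern', 'contemporary')

# One dataframe per (department, period) pair.
subsets = {
    key: group.drop(columns='period')
    for key, group in toy.groupby(['Department', 'period'])
}

for (department, period), frame in subsets.items():
    print(department, period, len(frame))
```

With eight departments and two periods, the same grouping would yield the 16 per-department dataframes listed above.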

In [2]:
#LIBRARIES
import re
import pandas as pd
#PATHS
path = '../MoMA_data/'

In [15]:
def normalizeDate(year):
    #find every four-digit sequence in the date (str() also handles NaN and numeric values)
    years = re.findall(r'\d{4}', str(year))
    #normalize the date using the first year in the list; 0 marks a missing or unparseable date
    return int(years[0]) if years else 0
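
The same normalization can be sketched on a whole column at once. This is only an illustrative alternative, not the function used in this notebook: `normalize_dates` and the sample values are made up for the example, and it relies on pandas' `Series.str.extract` to pick the first four-digit run.

```python
import pandas as pd

def normalize_dates(dates: pd.Series) -> pd.Series:
    """Return the first four-digit year found in each entry, or 0 if there is none."""
    years = dates.astype(str).str.extract(r'(\d{4})', expand=False)
    return pd.to_numeric(years, errors='coerce').fillna(0).astype(int)

# Hypothetical date strings in the style of a museum catalogue.
sample = pd.Series(['1896', 'c. 1903', '1976-77', 'n.d.', None])
print(normalize_dates(sample).tolist())  # [1896, 1903, 1976, 0, 0]
```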
    

In [21]:
def getGender(ids):
    #ConstituentID may hold several comma-separated artist IDs
    listIds = str(ids).split(', ')
    listGenders = list()
    for artistId in listIds:
        #look the artist up in the cleaned artists dataframe (IDs are stored as strings)
        match = artists[artists['ID'] == artistId]
        if len(match) > 0:
            listGenders.append(match['Gender'].values[0])
        else:
            listGenders.append('missing')
    return ", ".join(listGenders)

    

In [18]:
def getNationality(ids):
    #ConstituentID may hold several comma-separated artist IDs
    listIds = str(ids).split(', ')
    listNationality = list()
    for artistId in listIds:
        #look the artist up in the cleaned artists dataframe (IDs are stored as strings)
        match = artists[artists['ID'] == artistId]
        if len(match) > 0:
            listNationality.append(match['Nationality'].values[0])
        else:
            listNationality.append('missing')
    return ", ".join(listNationality)
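
Both lookup functions filter the artists dataframe once per ID, which can be slow on a large collection. A possible alternative, sketched here on toy data, builds a dictionary once and reuses it. `make_lookup`, `artists_toy` and the sample IDs are assumptions introduced for the example, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the cleaned artists dataframe (values are illustrative).
artists_toy = pd.DataFrame({
    'ID': ['6956', '6957', '7056'],
    'Gender': ['M', 'F', 'M'],
    'Nationality': ['Dutch', 'Dutch', 'missing'],
})

def make_lookup(table, column):
    """Return a function mapping an 'id1, id2' string to 'value1, value2'."""
    mapping = dict(zip(table['ID'], table[column]))  # built once, reused for every row

    def lookup(ids):
        # Unknown IDs (including NaN, which becomes 'nan') map to 'missing'.
        return ', '.join(mapping.get(i, 'missing') for i in str(ids).split(', '))

    return lookup

get_gender = make_lookup(artists_toy, 'Gender')
get_nationality = make_lookup(artists_toy, 'Nationality')

print(get_gender('6956, 6957'))  # M, F
print(get_nationality('9999'))   # missing
```

One difference to keep in mind: if an ID appeared more than once, the dictionary would keep the last row, whereas the functions above take the first match.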

    
    
        

In [4]:
#select and clean artist columns 
original_artists = pd.read_csv(path+'Artists.csv')
#.copy() gives an independent dataframe, which avoids pandas' SettingWithCopyWarning on the assignments below
artists = original_artists[['ConstituentID','DisplayName','Nationality','Gender','BeginDate','EndDate','Wiki QID','ULAN']].copy()
artists = artists.rename(columns={'ConstituentID': 'ID', 'DisplayName': 'Artist','BeginDate': 'Birth', 'EndDate': 'Death'})
#fill missing values and fix the column types
artists['Gender'] = artists['Gender'].fillna('missing').astype('str')
artists['Wiki QID'] = artists['Wiki QID'].fillna('missing').astype('str')
artists['ULAN'] = artists['ULAN'].fillna(0).astype('int')
artists['ID'] = artists['ID'].fillna('0').astype('str')
artists['Artist'] = artists['Artist'].fillna('Unknown').astype('str')
artists['Nationality'] = artists['Nationality'].fillna('missing').astype('str')
artists['Birth'] = artists['Birth'].fillna(0).astype('int')
artists['Death'] = artists['Death'].fillna(0).astype('int')
#normalize gender labels to M, F, NB for ease of comparison w/ Rhizome
for x, row in artists.iterrows():
    if re.match(r'[Mm]ale', row.Gender):
        artists.at[x, 'Gender'] = 'M'
    elif re.match(r'[Ff]emale',row.Gender):
        artists.at[x, 'Gender'] = 'F'
    elif re.match(r'Non-[Bb]inary', row.Gender):
        artists.at[x, 'Gender'] = 'NB'
# #pickle updated artists dataset
# artists.to_pickle(f'./MoMA_data/pickle/artists.pkl')
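
The gender normalization in the cell above can also be written without a row-by-row loop. The sketch below is one possible vectorised version on made-up values. The `labels` dictionary is an assumption for the example, and unlike `re.match` it only recognises exact (case-insensitive) labels rather than prefixes.

```python
import pandas as pd

# Hypothetical raw labels, including the 'missing' placeholder used in this notebook.
gender = pd.Series(['Male', 'female', 'Non-Binary', 'missing', 'male'])

labels = {'male': 'M', 'female': 'F', 'non-binary': 'NB'}

# Map known labels to their short form; anything unknown keeps its original value.
normalized = gender.str.lower().map(labels).fillna(gender)
print(normalized.tolist())  # ['M', 'F', 'NB', 'missing', 'M']
```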

In [24]:
artists.dtypes

ID             object
Artist         object
Nationality    object
Gender         object
Birth           int64
Death           int64
Wiki QID       object
ULAN            int64
dtype: object

In [19]:
#clean original artworks df
original_artworks = pd.read_csv(path+'Artworks.csv')
#.copy() avoids pandas' SettingWithCopyWarning on the assignments below
original_artworks = original_artworks[['Title','Artist','ConstituentID','Date','Medium','Department','DateAcquired','URL','ThumbnailURL']].copy()
original_artworks['Title'] = original_artworks['Title'].fillna('Unknown')
original_artworks['Date'] = original_artworks['Date'].apply(normalizeDate)
#keep only the leading four characters (the year) of DateAcquired; missing values become 0
original_artworks['DateAcquired'] = original_artworks['DateAcquired'].str[:4].fillna('0').astype('int')
original_artworks['Gender'] = original_artworks['ConstituentID'].apply(getGender)
original_artworks['Nationality'] = original_artworks['ConstituentID'].apply(getNationality)
#rename last, so that every step above can still refer to the original column names
original_artworks = original_artworks.rename(columns={"Date": "dateCreated", "DateAcquired": "dateAcquired"})
original_artworks.dtypes

Title            object
Artist           object
ConstituentID    object
dateCreated       int64
Medium           object
Department       object
dateAcquired      int64
URL              object
ThumbnailURL     object
Gender           object
Nationality      object
dtype: object
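
To illustrate the acquisition-date step in isolation, the following sketch reduces a few made-up values to their year. It assumes that the date strings start with a four-digit year; the sample values are not taken from the dataset.

```python
import pandas as pd

# Hypothetical acquisition dates: full dates, a bare year and a missing value.
acquired = pd.Series(['1996-04-09', '1995-01-17', None, '2020'])

# Keep the first four characters, then treat missing values as year 0.
years = acquired.str[:4].fillna('0').astype(int)
print(years.tolist())  # [1996, 1995, 0, 2020]
```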

In [34]:
original_artworks

Unnamed: 0,Title,Artist,ConstituentID,dateCreated,Medium,Department,dateAcquired,URL,ThumbnailURL,Gender,Nationality
0,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",Otto Wagner,6210,1896,Ink and cut-and-pasted painted pages on paper,Architecture & Design,1996,http://www.moma.org/collection/works/2,http://www.moma.org/media/W1siZiIsIjU5NDA1Il0s...,M,Austrian
1,"City of Music, National Superior Conservatory ...",Christian de Portzamparc,7470,1987,Paint and colored pencil on print,Architecture & Design,1995,http://www.moma.org/collection/works/3,http://www.moma.org/media/W1siZiIsIjk3Il0sWyJw...,M,French
2,"Villa near Vienna Project, Outside Vienna, Aus...",Emil Hoppe,7605,1903,"Graphite, pen, color pencil, ink, and gouache ...",Architecture & Design,1997,http://www.moma.org/collection/works/4,http://www.moma.org/media/W1siZiIsIjk4Il0sWyJw...,M,Austrian
3,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/5,http://www.moma.org/media/W1siZiIsIjEyNCJdLFsi...,M,missing
4,"Villa, project, outside Vienna, Austria, Exter...",Emil Hoppe,7605,1903,"Graphite, color pencil, ink, and gouache on tr...",Architecture & Design,1997,http://www.moma.org/collection/works/6,http://www.moma.org/media/W1siZiIsIjEyNiJdLFsi...,M,Austrian
...,...,...,...,...,...,...,...,...,...,...,...
138146,Untitled,"Chesnutt Brothers Studio, Andrew Chesnutt, Lew...","133005, 133006, 133007",1890,Gelatin silver print,Photography,2020,http://www.moma.org/collection/works/418928,http://www.moma.org/media/W1siZiIsIjQ5MjcyMiJd...,"missing, M, M","missing, American, American"
138147,Plate (folio 2 verso) from Muscheln und schirm...,Sophie Taeuber-Arp,5777,1939,One from an illustrated book with four line bl...,Drawings & Prints,2019,http://www.moma.org/collection/works/419286,http://www.moma.org/media/W1siZiIsIjQ4NTExNSJd...,F,Swiss
138148,Plate (folio 6) from Muscheln und schirme (She...,Sophie Taeuber-Arp,5777,1939,One from an illustrated book with four line bl...,Drawings & Prints,2019,http://www.moma.org/collection/works/419287,http://www.moma.org/media/W1siZiIsIjQ4NTExOCJd...,F,Swiss
138149,Plate (folio 12) from Muscheln und schirme (Sh...,Sophie Taeuber-Arp,5777,1939,One from an illustrated book with four line bl...,Drawings & Prints,2019,http://www.moma.org/collection/works/419288,http://www.moma.org/media/W1siZiIsIjQ4NTEyMCJd...,F,Swiss


In [44]:
artists.to_pickle('../MoMA_data/pickle/MoMAartists.pkl')
original_artworks.to_pickle('../MoMA_data/pickle/MoMAartworks.pkl')

In [39]:
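#CREATE THE NEW SUBSETS
#the cut-off is applied at 1983: works dated up to 1982 go into before80, later ones into after80
#artworks without a parseable date have dateCreated == 0 and therefore also end up in before80
before80 = original_artworks[original_artworks.dateCreated < 1983]
after80 = original_artworks[original_artworks.dateCreated >= 1983]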


In [40]:
before80.head()

Unnamed: 0,Title,Artist,ConstituentID,dateCreated,Medium,Department,dateAcquired,URL,ThumbnailURL,Gender,Nationality
0,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",Otto Wagner,6210,1896,Ink and cut-and-pasted painted pages on paper,Architecture & Design,1996,http://www.moma.org/collection/works/2,http://www.moma.org/media/W1siZiIsIjU5NDA1Il0s...,M,Austrian
2,"Villa near Vienna Project, Outside Vienna, Aus...",Emil Hoppe,7605,1903,"Graphite, pen, color pencil, ink, and gouache ...",Architecture & Design,1997,http://www.moma.org/collection/works/4,http://www.moma.org/media/W1siZiIsIjk4Il0sWyJw...,M,Austrian
3,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/5,http://www.moma.org/media/W1siZiIsIjEyNCJdLFsi...,M,missing
4,"Villa, project, outside Vienna, Austria, Exter...",Emil Hoppe,7605,1903,"Graphite, color pencil, ink, and gouache on tr...",Architecture & Design,1997,http://www.moma.org/collection/works/6,http://www.moma.org/media/W1siZiIsIjEyNiJdLFsi...,M,Austrian
5,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1976,Gelatin silver photograph,Architecture & Design,1995,http://www.moma.org/collection/works/7,http://www.moma.org/media/W1siZiIsIjE0OCJdLFsi...,M,missing


In [41]:
after80.head()

Unnamed: 0,Title,Artist,ConstituentID,dateCreated,Medium,Department,dateAcquired,URL,ThumbnailURL,Gender,Nationality
1,"City of Music, National Superior Conservatory ...",Christian de Portzamparc,7470,1987,Paint and colored pencil on print,Architecture & Design,1995,http://www.moma.org/collection/works/3,http://www.moma.org/media/W1siZiIsIjk3Il0sWyJw...,M,French
66,"Villa dall'Ava, Paris (Saint-Cloud), France, E...","Rem Koolhaas, Madelon Vriesendorp","6956, 6957",1987,Synthetic polymer paint and ink on paper,Architecture & Design,2000,http://www.moma.org/collection/works/82,http://www.moma.org/media/W1siZiIsIjYwMTEyIl0s...,"M, F","Dutch, Dutch"
68,"Parc de la Villette, Le Case Vide, Paris, Fran...",Bernard Tschumi,7056,1984,"Pen, ink, gouache, and airbrush on paper",Architecture & Design,2000,http://www.moma.org/collection/works/85,http://www.moma.org/media/W1siZiIsIjYwMTA5Il0s...,M,missing
70,"Parc de la Villette, Paris, France, Aerial per...",Bernard Tschumi,7056,1986,"Pen, ink, and gouache on paper",Architecture & Design,2000,http://www.moma.org/collection/works/88,http://www.moma.org/media/W1siZiIsIjYxNTQ5Il0s...,M,missing
122,"Autonomous Artisans' House, project, New York ...",Steven Holl,2702,1984,Graphite on vellum,Architecture & Design,1989,http://www.moma.org/collection/works/171,http://www.moma.org/media/W1siZiIsIjE2MjgiXSxb...,M,American


In [42]:

before80.to_pickle('../MoMA_data/pickle/old_artworks.pkl')
after80.to_pickle('../MoMA_data/pickle/new_artworks.pkl')

## Pickle departments 

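To make further analysis of MoMA's data easier, we additionally produce one subset per department and time period, alongside the two period-level subsets created in the previous step. Files ending in `_mod` hold the earlier artworks (`before80`) and files ending in `_cont` the later ones (`after80`).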

In [43]:
#split artwork datasets further based on department for ease of comparison

#department name as it appears in the 'Department' column -> file name stem
departments = {
    "Architecture & Design": 'architecture_design',
    "Architecture & Design - Image Archive": 'architecture_design_img',
    "Drawings & Prints": 'draws_prints',
    "Film": 'films',
    "Fluxus Collection": 'fluxus',
    "Media and Performance": 'media_perf',
    "Painting & Sculpture": 'paint_sculp',
    "Photography": 'photo',
}
#file suffix -> time-period subset (_mod: before80, _cont: after80)
periods = {'mod': before80, 'cont': after80}

#write one pickle per department and time period (8 x 2 = 16 files)
for suffix, subset in periods.items():
    for department, name in departments.items():
        department_df = subset[subset['Department'] == department]
        department_df.to_pickle(f'../MoMA_data/pickle/departments/{name}_{suffix}.pkl')
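
For later analysis the pickled subsets can be read back with `pd.read_pickle`. The sketch below shows the round trip on a one-row toy dataframe written to a temporary directory, so it does not touch the real files. The file name mirrors the naming scheme above but is otherwise just an example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# One-row toy dataframe standing in for a department subset.
frame = pd.DataFrame({
    'Title': ['Untitled'],
    'Department': ['Photography'],
    'dateCreated': [1890],
})

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'photo_mod.pkl'
    frame.to_pickle(target)            # same call as used for the real subsets
    restored = pd.read_pickle(target)  # dtypes and index are preserved

print(restored.equals(frame))  # True
```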