**MoMA Dataframes**

This notebook contains the code used to create dataframes from the available MoMA data.

Because the datasets differ in size and time span, we decided to split MoMA's artworks into modern (before 1980) and contemporary (1980 onwards). This year was chosen because the earliest artwork in Rhizome's database is from 1982.

We also split the artworks by department, which lets us visualize differences between departments and gives us smaller subsets of MoMA's overall artworks data to compare with Rhizome.
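
As a minimal sketch of this two-level split, the toy example below separates works at 1980 and then groups each period by department. The rows are invented for illustration, and the names `toy`, `period` and `subsets` are assumptions that do not appear in the notebook's own code.

```python
import pandas as pd

# Toy rows standing in for MoMA artworks (values are made up for illustration)
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Department': ['Film', 'Photography', 'Film', 'Photography'],
    'Date': [1965, 1979, 1980, 2001],
})

# Label each artwork by period, using 1980 as the cut-off year
toy['period'] = toy['Date'].apply(lambda y: 'modern' if y < 1980 else 'contemporary')

# One dataframe per (period, department) pair
subsets = {key: group for key, group in toy.groupby(['period', 'Department'])}

for (period, department), group in subsets.items():
    print(period, department, len(group))
```

With eight departments and two periods, the same idea yields the sixteen department dataframes described in this notebook.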

For the artworks, we kept only the columns most relevant to our work: 'Title', 'Artist', 'ConstituentID', 'Date', 'Medium', 'Classification', 'Department', 'DateAcquired', 'URL', 'ThumbnailURL'.
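
One possible way to restrict the columns is to do it while the file is being read, through the `usecols` parameter of `pd.read_csv`. The sketch below uses a made-up two-line CSV in place of `Artworks.csv`; the `keep` list and the `Dimensions` column are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical miniature of Artworks.csv; the real file has many more columns and rows
csv_text = io.StringIO(
    "Title,Artist,ConstituentID,Date,Medium,Dimensions,Department\n"
    "Untitled,Jane Doe,1,1990,Oil on canvas,10 x 10 cm,Painting & Sculpture\n"
)

keep = ['Title', 'Artist', 'ConstituentID', 'Date', 'Medium', 'Department']

# usecols drops every other column while the file is parsed
artworks_small = pd.read_csv(csv_text, usecols=keep)
print(artworks_small.columns.tolist())
```

Selecting the columns after loading, as the notebook does, gives the same result; `usecols` only saves memory on large files.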

The resulting dataframes are:
- 1 x Artist DF (all artists)
- 1 x Artworks DF with selected columns (all artworks)
- 2 x Artworks DFs per department, one per time period (modern, contemporary), for a total of 16 DFs

In [22]:
#LIBRARIES
import re
import pandas as pd
#PATHS
path = '../MoMA_data/'

In [23]:
#FUNCTIONS

#function to assign gender and nationality to artwork using artist dataset
def get_gender_nat(df, artists):
    #build the ID -> value lookups once instead of filtering the artists dataframe for every row
    by_id = artists.drop_duplicates('ID').set_index('ID')
    gender_of = by_id['Gender'].to_dict()
    nationality_of = by_id['Nationality'].to_dict()
    for index, row in df.iterrows():
        #an artwork can list several artists as 'ID1, ID2, ...'
        ids = str(row['ID']).split(', ')
        #IDs that are absent from the artist dataset are labelled 'missing'
        df.at[index, 'Gender'] = ", ".join(gender_of.get(x, 'missing') for x in ids)
        df.at[index, 'Nationality'] = ", ".join(nationality_of.get(x, 'missing') for x in ids)
    return df
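
The same lookup can also be expressed without an explicit row loop. The sketch below is an alternative under stated assumptions, not the method used in this notebook: it relies on `explode`, a left `merge` and a `groupby`, and the toy dataframes and the name `add_gender_nat` are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the artists and artworks dataframes (invented values)
artists_toy = pd.DataFrame({
    'ID': ['1', '2'],
    'Gender': ['M', 'F'],
    'Nationality': ['American', 'Spanish'],
})
artworks_toy = pd.DataFrame({
    'Title': ['Solo work', 'Joint work', 'Orphan work'],
    'ID': ['1', '1, 2', '99'],
})

def add_gender_nat(artworks, artists):
    # One row per (artwork, artist ID) pair; 'row' remembers the source artwork
    long = (artworks[['ID']]
            .assign(ID=artworks['ID'].str.split(', '))
            .explode('ID')
            .rename_axis('row')
            .reset_index())
    # Left join keeps artworks whose artist ID is not in the artist dataset
    long = long.merge(artists[['ID', 'Gender', 'Nationality']], on='ID', how='left')
    long[['Gender', 'Nationality']] = long[['Gender', 'Nationality']].fillna('missing')
    # Collapse back to one row per artwork, joining the values with ', '
    joined = long.groupby('row')[['Gender', 'Nationality']].agg(', '.join)
    return artworks.join(joined)

result = add_gender_nat(artworks_toy, artists_toy)
print(result)
```

On large dataframes this form is usually faster than iterating over rows, at the cost of being a little harder to read.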

In [24]:
#select and clean artist columns 
original_artists = pd.read_csv(path+'Artists.csv')
#.copy() gives an independent dataframe and avoids SettingWithCopyWarning on the assignments below
artists = original_artists[['ConstituentID','DisplayName','Nationality','Gender','BeginDate','EndDate','Wiki QID','ULAN']].copy()
artists = artists.rename(columns={'ConstituentID': 'ID', 'DisplayName': 'Artist','BeginDate': 'Birth', 'EndDate': 'Death'})
#normalize missing values and datatypes
artists['Gender'] = artists['Gender'].fillna('missing').astype('str')
artists['Wiki QID'] = artists['Wiki QID'].fillna('missing').astype('str')
artists['ULAN'] = artists['ULAN'].fillna(0).astype('int')
artists['ID'] = artists['ID'].fillna('0').astype('str')
artists['Artist'] = artists['Artist'].fillna('Unknown').astype('str')
artists['Nationality'] = artists['Nationality'].fillna('missing').astype('str')
artists['Birth'] = artists['Birth'].fillna(0).astype('int')
artists['Death'] = artists['Death'].fillna(0).astype('int')
#normalize gender labels to M, F, NB for ease of comparison w/ Rhizome; any other label is kept as it is
gender_labels = {'male': 'M', 'female': 'F', 'non-binary': 'NB'}
artists['Gender'] = artists['Gender'].str.lower().map(gender_labels).fillna(artists['Gender'])
# #pickle updated artists dataset
# artists.to_pickle(path + 'pickle/artists.pkl')

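
Selecting columns from a dataframe and then modifying the selection can trigger pandas' `SettingWithCopyWarning`. A minimal sketch of one common way to avoid it follows: take an explicit `.copy()` and reassign the result of `rename` instead of using `inplace=True`. The dataframe `raw` and its values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for Artists.csv (invented values)
raw = pd.DataFrame({
    'ConstituentID': [1, 2],
    'DisplayName': ['Jane Doe', None],
    'Bio': ['x', 'y'],
})

# .copy() makes the selection an independent dataframe, so later assignments are unambiguous
subset = raw[['ConstituentID', 'DisplayName']].copy()

# rename returns a new dataframe; reassigning avoids inplace=True on a selection
subset = subset.rename(columns={'ConstituentID': 'ID', 'DisplayName': 'Artist'})
subset['Artist'] = subset['Artist'].fillna('Unknown').astype('str')

print(subset)
print(raw['DisplayName'].isna().sum())  # the original dataframe is left untouched
```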

In [14]:
artists

| | ID | Artist | Nationality | Gender | Birth | Death | Wiki QID | ULAN |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Robert Arneson | American | M | 1930 | 1992 | missing | 0 |
| 1 | 2 | Doroteo Arnaiz | Spanish | M | 1936 | 0 | missing | 0 |
| 2 | 3 | Bill Arnold | American | M | 1941 | 0 | missing | 0 |
| 3 | 4 | Charles Arnoldi | American | M | 1946 | 0 | Q1063584 | 500027998 |
| 4 | 5 | Per Arnoldi | Danish | M | 1941 | 0 | missing | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15217 | 133006 | Andrew Chesnutt | American | M | 1861 | 1934 | missing | 0 |
| 15218 | 133007 | Lewis Chesnutt | American | M | 1860 | 1933 | missing | 0 |
| 15219 | 133026 | Alfred Tritschler | German | missing | 1905 | 1970 | missing | 0 |
| 15220 | 133027 | Studio of Dr. Paul Wolff & Tritschler | missing | missing | 0 | 0 | missing | 0 |


In [41]:
#select artwork columns
original_artworks = pd.read_csv(path+'Artworks.csv')
#.copy() avoids SettingWithCopyWarning on the assignments below
original_artworks = original_artworks[['Title','Artist','ConstituentID','Date','Medium','Department','DateAcquired','URL','ThumbnailURL']].copy()
#normalize the date using the first number having four digits, i.e. the year (0 if there is none)
#columns are updated directly because assigning to the rows yielded by iterrows() is not guaranteed to change the dataframe
original_artworks['Date'] = original_artworks['Date'].astype('str').str.extract(r'(\d{4})', expand=False).fillna('0').astype('int')
#keep only the year of acquisition
original_artworks['DateAcquired'] = original_artworks['DateAcquired'].astype('str').str.split('-').str[0]
#normalize missing values and datatypes 
original_artworks['Title'] = original_artworks['Title'].fillna('Unknown').astype('str')
original_artworks['Artist'] = original_artworks['Artist'].fillna('Unknown').astype('str')
original_artworks['ConstituentID'] = original_artworks['ConstituentID'].fillna('missing').astype('str')
original_artworks['Medium'] = original_artworks['Medium'].fillna('missing').astype('str')
original_artworks['URL'] = original_artworks['URL'].fillna('missing').astype('str')
original_artworks['ThumbnailURL'] = original_artworks['ThumbnailURL'].fillna('missing').astype('str')
#CREATE THE NEW SUBSETS
#Split artworks dataset into two parts, before/after 1980, for ease of comparison w/ Rhizome as well as internally
#artworks without a recognizable year (Date == 0) fall into before80
before80 = original_artworks[original_artworks.Date < 1980].copy()
after80 = original_artworks[original_artworks.Date >= 1980].copy()
#clean columns
after80 = after80.rename(columns={'ConstituentID': 'ID', 'Date': 'DateCreated'})
before80 = before80.rename(columns={'ConstituentID': 'ID', 'Date': 'DateCreated'})
#assign gender and nationality to artworks 
before80['Nationality'] = "to do"
before80['Gender'] = "to do"
before80 = get_gender_nat(before80, artists)
#exact match on empty strings; with regex=True the empty pattern would match inside every string
before80 = before80.replace('', 'missing')
after80['Nationality'] = "to do"
after80['Gender'] = "to do"
after80 = get_gender_nat(after80, artists)
after80 = after80.replace('', 'missing')


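
The year normalization can be checked in isolation. The sketch below applies `Series.str.extract` to a handful of invented date strings written in the free-text style of the 'Date' column; the names `toy` and `Year` are illustrative assumptions.

```python
import pandas as pd

# Invented date strings in the free-text style of the 'Date' column
toy = pd.DataFrame({'Date': ['1896', 'c. 1917', '1976-77', 'Unknown', None]})

# First run of four digits, or NaN when there is none
year = toy['Date'].astype('str').str.extract(r'(\d{4})', expand=False)

# 0 marks artworks without a recognizable year
toy['Year'] = year.fillna('0').astype('int')
print(toy)
```

Because 0 is smaller than 1980, artworks without a recognizable year end up in the "before 1980" subset when the comparison `Date < 1980` is applied.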

In [56]:
#pickle updated artwork datasets (the pickle directory must already exist)
before80.to_pickle(path + 'pickle/old_artworks.pkl')
after80.to_pickle(path + 'pickle/new_artworks.pkl')
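
`to_pickle` raises an `OSError` when the target directory does not exist, so the folders have to be created first. A minimal sketch using `pathlib` follows; a temporary folder stands in for '../MoMA_data/', and the names `base` and `pickle_dir` are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A temporary folder stands in for '../MoMA_data/' so the sketch runs anywhere
base = Path(tempfile.mkdtemp())
pickle_dir = base / 'pickle' / 'departments'

# parents=True also creates 'pickle'; exist_ok=True makes reruns harmless
pickle_dir.mkdir(parents=True, exist_ok=True)

toy = pd.DataFrame({'Title': ['A'], 'DateCreated': [1975]})
toy.to_pickle(base / 'pickle' / 'old_artworks.pkl')
print((base / 'pickle' / 'old_artworks.pkl').exists())
```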

## Pickle departments 

To make further analysis of MoMA's data easier, we additionally produced a subset for each department, split by creation date, alongside the two datasets created in the previous step.

In [57]:
#split artwork datasets further based on department for ease of comparison

#department name -> file name stem
departments = {
    "Architecture & Design": "architecture_design",
    "Architecture & Design - Image Archive": "architecture_design_img",
    "Drawings & Prints": "draws_prints",
    "Film": "films",
    "Fluxus Collection": "fluxus",
    "Media and Performance": "media_perf",
    "Painting & Sculpture": "paint_sculp",
    "Photography": "photo",
}
#'_mod' = modern (before 1980), '_cont' = contemporary (1980 onwards)
for artworks, suffix in [(before80, 'mod'), (after80, 'cont')]:
    for department, stem in departments.items():
        subset = artworks[artworks['Department'] == department]
        subset.to_pickle(f'{path}pickle/departments/{stem}_{suffix}.pkl')
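
A saved subset can be read back with `pd.read_pickle` to confirm that it survived the round trip. The sketch below does this for an invented "Film" subset written to a temporary folder; `out_dir` and the toy rows are assumptions and do not reflect the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

out_dir = Path(tempfile.mkdtemp())  # stand-in for '../MoMA_data/pickle/departments/'

toy = pd.DataFrame({
    'Title': ['A', 'B', 'C'],
    'Department': ['Film', 'Photography', 'Film'],
    'DateCreated': [1990, 1985, 2003],
})

films = toy[toy['Department'] == 'Film']
films.to_pickle(out_dir / 'films_cont.pkl')

# read_pickle restores the dataframe together with its index and dtypes
restored = pd.read_pickle(out_dir / 'films_cont.pkl')
print(restored.equals(films))
```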