**MoMA Dataframes**

This notebook contains the code used to create dataframes from the available MoMA data.

Because the two datasets differ in size and in the period they cover, we split MoMA's artworks into modern (before 1980) and contemporary (1980 onwards). This year was chosen because the earliest artwork in Rhizome's database is from 1982.

We also split the artworks by department, which lets us visualize differences between departments and gives us smaller subsets of MoMA's overall artworks data to compare with Rhizome.

For the artworks we kept only the columns most relevant to our work: 'Title', 'Artist', 'ConstituentID', 'Date', 'Medium', 'Department', 'DateAcquired', 'URL', 'ThumbnailURL'.
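
As a minimal sketch of this kind of column selection, `pd.read_csv` accepts a `usecols` argument so that only the listed columns are loaded. The inline CSV below is made-up sample data standing in for `Artworks.csv`, and `KEPT_COLUMNS` and `artworks_sample` are names introduced here purely for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for Artworks.csv (not real MoMA data)
sample_csv = io.StringIO(
    "Title,Artist,ConstituentID,Date,Medium,Dimensions,Department\n"
    "Work A,Artist A,1,1950,Oil on canvas,10 x 10,Painting & Sculpture\n"
    "Work B,Artist B,2,1990,Ink on paper,20 x 20,Drawings & Prints\n"
)

# Illustrative subset of the columns listed above
KEPT_COLUMNS = ['Title', 'Artist', 'ConstituentID', 'Date', 'Medium', 'Department']

# usecols skips every column that is not in the list while parsing
artworks_sample = pd.read_csv(sample_csv, usecols=KEPT_COLUMNS)
print(artworks_sample.columns.tolist())
```

The notebook itself reads the full file first and selects the columns afterwards; both approaches should give the same dataframe, and `usecols` mainly saves memory on large files.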

The resulting dataframes are:
- 1 x Artist DF (all artists)
- 1 x Artworks DF with selected columns (all artworks)
- 2 x Artworks DFs per department, one per time period (modern, contemporary), for a total of 16 DFs

In [14]:
#LIBRARIES
import re
import pandas as pd

In [15]:
#PATHS
path = 'MoMA_data/'

### Pickle originals

Before pickling and further manipulating the data inside each set, we need to perform some preliminary cleaning of the columns. 

In [16]:
#ARTISTS
original_artists = pd.read_csv(path+'Artists.csv')
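# .copy() makes `artists` an independent dataframe, so later column assignments do not act on a slice
artists = original_artists[['ConstituentID','DisplayName','Nationality','Gender','BeginDate','EndDate','Wiki QID','ULAN']].copy()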

In [None]:
artists = artists.rename(columns={'ConstituentID': 'ID', 'DisplayName': 'Artist', 'BeginDate': 'Birth', 'EndDate': 'Death'})

# fill missing values and enforce a consistent dtype for each column
artists['Gender'] = artists['Gender'].fillna('missing').astype('str')
artists['Wiki QID'] = artists['Wiki QID'].fillna('missing').astype('str')
artists['ULAN'] = artists['ULAN'].fillna(0).astype('int')
artists['ID'] = artists['ID'].fillna('0').astype('str')
artists['Artist'] = artists['Artist'].fillna('Unknown').astype('str')
artists['Nationality'] = artists['Nationality'].fillna('missing').astype('str')
artists['Birth'] = artists['Birth'].fillna(0).astype('int')
artists['Death'] = artists['Death'].fillna(0).astype('int')



#artists.to_pickle(f'./MoMA_data/artists.pkl')

Normalize Gender labels to 'M', 'F' and 'NB' to make them easier to compare with Rhizome's data.

In [18]:
for x, row in artists.iterrows():
    # re.match anchors at the start of the label, so 'Female' never matches the 'male' pattern
    if re.match(r'[Mm]ale', row.Gender):
        artists.at[x, 'Gender'] = 'M'

    elif re.match(r'[Ff]emale', row.Gender):
        artists.at[x, 'Gender'] = 'F'

    elif re.match(r'Non-[Bb]inary', row.Gender):
        artists.at[x, 'Gender'] = 'NB'
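
The same normalization can probably be done without a Python loop. The sketch below is an alternative, not the method used in this notebook: it assumes the labels differ only in letter case, and `GENDER_CODES`, `genders` and `normalized` are names made up for the example. Unlike the loop above, it matches whole labels rather than prefixes.

```python
import pandas as pd

# Made-up labels covering the spellings handled by the loop, plus the 'missing' placeholder
genders = pd.Series(['Male', 'male', 'Female', 'female', 'Non-Binary', 'Non-binary', 'missing'])

# Hypothetical look-up table from lower-cased label to short code
GENDER_CODES = {'male': 'M', 'female': 'F', 'non-binary': 'NB'}

# Known labels are mapped to their code; unknown labels become NaN
# and are then filled back in from the original series
normalized = genders.str.lower().map(GENDER_CODES).fillna(genders)
print(normalized.tolist())
```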

In [19]:
artists.to_pickle(f'./MoMA_data/pickle/MoMAartists.pkl')

In [20]:
#ARTWORKS
original_artworks = pd.read_csv(path+'Artworks.csv')
original_artworks =  original_artworks[['Title','Artist','ConstituentID','Date','Medium','Department','DateAcquired','URL','ThumbnailURL']]


### Divide artworks into two datasets

The original artworks set is split into two datasets by the short routine below.

In [21]:
#NORMALIZE DATES
for index, row in original_artworks.iterrows():
    # extract every four-digit number from the Date string; the first one is taken as the year
    years = re.findall(r'\d{4}', str(row['Date']))
    if years:
        # write through .at: assigning to `row` would only change a temporary copy
        original_artworks.at[index, 'Date'] = years[0]
        # keep only the year of the acquisition date
        original_artworks.at[index, 'DateAcquired'] = str(row['DateAcquired']).split('-')[0]
    else:
        # no recognizable year: use 0 as a placeholder
        original_artworks.at[index, 'Date'] = 0

original_artworks['Title'] = original_artworks['Title'].fillna('Unknown').astype('str')
original_artworks['Artist'] = original_artworks['Artist'].fillna('Unknown').astype('str')
original_artworks['ConstituentID'] = original_artworks['ConstituentID'].fillna('missing').astype('str')
original_artworks['Medium'] = original_artworks['Medium'].fillna('missing').astype('str')
original_artworks['Date'] = original_artworks['Date'].fillna('0').astype('int')
original_artworks['URL'] = original_artworks['URL'].fillna('missing').astype('str')
original_artworks['ThumbnailURL'] = original_artworks['ThumbnailURL'].fillna('missing').astype('str')
#CREATE THE NEW SUBSETS
# .copy() makes each subset independent of original_artworks
# note: undated works (Date == 0) fall into before80
before80 = original_artworks[original_artworks.Date < 1980].copy()
after80 = original_artworks[original_artworks.Date >= 1980].copy()

Before creating the before-1980 and after-1980 subsets, we look for missing values and replace them with the strings 'Unknown' or 'missing', depending on the kind of data. We also use the .astype command to make sure pandas stores each column with its correct data type. Artworks without a recognizable year get the date 0 and therefore end up in the before-1980 subset.
The subsets are pickled into the /pickle/ directory further below, once gender and nationality have been added.
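
The year extraction and the split can also be sketched in vectorized form. This is an illustrative alternative under stated assumptions, not the code used in this notebook: the date strings are made up, and `dates`, `Year`, `before` and `after` are hypothetical names.

```python
import pandas as pd

# Made-up Date strings in the styles that museum records tend to use
dates = pd.DataFrame({'Date': ['1987', 'c. 1917', '1976-77', 'n.d.', None]})

# str.extract returns the first match of the capture group, NaN where there is none
years = dates['Date'].str.extract(r'(\d{4})', expand=False)

# Missing years become 0, mirroring the placeholder used in this notebook
dates['Year'] = years.fillna('0').astype(int)

# Same thresholds as before80 / after80
before = dates[dates['Year'] < 1980]
after = dates[dates['Year'] >= 1980]
print(dates['Year'].tolist())
```

As in the notebook, rows without a recognizable year land in the earlier subset; filtering on `Year > 0` first would be one way to exclude them, if that is preferred.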

In [22]:
#old len 100692
after80

Unnamed: 0,Title,Artist,ConstituentID,Date,Medium,Department,DateAcquired,URL,ThumbnailURL
1,"City of Music, National Superior Conservatory ...",Christian de Portzamparc,7470,1987,Paint and colored pencil on print,Architecture & Design,1995,http://www.moma.org/collection/works/3,http://www.moma.org/media/W1siZiIsIjk3Il0sWyJw...
3,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/5,http://www.moma.org/media/W1siZiIsIjEyNCJdLFsi...
31,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/33,http://www.moma.org/media/W1siZiIsIjIwMCJdLFsi...
35,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/38,http://www.moma.org/media/W1siZiIsIjI2NyJdLFsi...
40,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Ink on tracing paper,Architecture & Design,1995,http://www.moma.org/collection/works/44,http://www.moma.org/media/W1siZiIsIjI5NiJdLFsi...
...,...,...,...,...,...,...,...,...,...
138114,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing
138115,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing
138116,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing
138117,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing


In [None]:
# rename returns a new dataframe, which avoids modifying a slice of original_artworks in place
after80 = after80.rename(columns={'ConstituentID': 'ID', 'Date': 'DateCreated'})
before80 = before80.rename(columns={'ConstituentID': 'ID', 'Date': 'DateCreated'})

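## Artist gender and nationality for modern and contemporary MoMA artworks

The function below looks up each artwork's artist IDs in the artists dataframe and adds the matching gender and nationality. It is applied to both datasets.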

In [24]:
def get_gender_nat(df, artists):
    def lookup(artist_id, column):
        # value of `column` for the artist with this ID, or 'missing' if the ID is not in `artists`
        values = artists.loc[artists['ID'] == artist_id, column].values
        return values[0] if len(values) else 'missing'

    for index, row in df.iterrows():
        # an artwork may list several artists as 'id1, id2, ...'
        ids = row['ID'].split(', ')
        df.at[index, 'Gender'] = ", ".join(lookup(x, 'Gender') for x in ids)
        df.at[index, 'Nationality'] = ", ".join(lookup(x, 'Nationality') for x in ids)

    return df
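
Most of the running time of a row-by-row look-up like this likely comes from scanning the artists dataframe once per artwork. A possible speed-up, sketched here on made-up data, is to build ID-to-value dictionaries once and reuse them. The names `artists_demo`, `artworks_demo` and `join_lookup` are hypothetical, and the sketch assumes every artist ID appears only once.

```python
import pandas as pd

# Made-up artists and artworks; IDs are strings, as in this notebook
artists_demo = pd.DataFrame({
    'ID': ['1', '2'],
    'Gender': ['M', 'F'],
    'Nationality': ['Italian', 'French'],
})
artworks_demo = pd.DataFrame({'ID': ['1', '1, 2', '99']})

# Build the ID -> value dictionaries once
gender_of = artists_demo.set_index('ID')['Gender'].to_dict()
nationality_of = artists_demo.set_index('ID')['Nationality'].to_dict()

def join_lookup(id_string, table):
    # Hypothetical helper: resolve each comma-separated ID, 'missing' if unknown
    return ", ".join(table.get(x, 'missing') for x in id_string.split(', '))

artworks_demo['Gender'] = artworks_demo['ID'].map(lambda s: join_lookup(s, gender_of))
artworks_demo['Nationality'] = artworks_demo['ID'].map(lambda s: join_lookup(s, nationality_of))
print(artworks_demo)
```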

In [None]:
# ADD GENDER AND NAT TO OLDER ARTWORKS
# VERY SLOW UP TO 6 MINS

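# assign returns a new dataframe with the two placeholder columns added
before80 = before80.assign(Nationality="to do", Gender="to do")
before80 = get_gender_nat(before80, artists)
# exact match: only cells that are entirely empty become 'missing'
before80 = before80.replace('', 'missing')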

In [None]:
# ADD GENDER AND NAT TO NEWER ARTWORKS
# VERY SLOW UP TO 6 MINS

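# assign returns a new dataframe with the two placeholder columns added
after80 = after80.assign(Nationality="to do", Gender="to do")
after80 = get_gender_nat(after80, artists)
# exact match: only cells that are entirely empty become 'missing'
after80 = after80.replace('', 'missing')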

In [None]:
before80

In [28]:
after80.to_pickle(f'./MoMA_data/pickle/new_artworks.pkl')
before80.to_pickle(f'./MoMA_data/pickle/old_artworks.pkl')

## Pickle departments 

To make further analysis of MoMA's data easier, we additionally produce a subset for each department, divided by creation date, alongside the two subsets created in the previous step.

In [34]:
#create pickles for each department in the before80 dataframe

architecture_design = before80[before80['Department'] == "Architecture & Design"]
architecture_design.to_pickle(f'./MoMA_data/pickle/departments/architecture_design_mod.pkl')

architecture_design_img  = before80[before80['Department'] == "Architecture & Design - Image Archive"]
architecture_design_img.to_pickle(f'./MoMA_data/pickle/departments/architecture_design_img_mod.pkl')

draws_prints= before80[before80['Department'] == "Drawings & Prints"]
draws_prints.to_pickle(f'./MoMA_data/pickle/departments/draws_prints_mod.pkl')

films= before80[before80['Department'] == "Film"]
films.to_pickle(f'./MoMA_data/pickle/departments/films_mod.pkl')

fluxus= before80[before80['Department'] == "Fluxus Collection"]
fluxus.to_pickle(f'./MoMA_data/pickle/departments/fluxus_mod.pkl')

media_perf= before80[before80['Department'] == "Media and Performance"]
media_perf.to_pickle(f'./MoMA_data/pickle/departments/media_perf_mod.pkl')

painting_sculp= before80[before80['Department'] == "Painting & Sculpture"]
painting_sculp.to_pickle(f'./MoMA_data/pickle/departments/paint_sculp_mod.pkl')

photo= before80[before80['Department'] == "Photography"]
photo.to_pickle(f'./MoMA_data/pickle/departments/photo_mod.pkl')

In [35]:
architecture_design = after80[after80['Department'] == "Architecture & Design"]
architecture_design.to_pickle(f'./MoMA_data/pickle/departments/architecture_design_cont.pkl')

architecture_design_img  = after80[after80['Department'] == "Architecture & Design - Image Archive"]
architecture_design_img.to_pickle(f'./MoMA_data/pickle/departments/architecture_design_img_cont.pkl')

draws_prints= after80[after80['Department'] == "Drawings & Prints"]
draws_prints.to_pickle(f'./MoMA_data/pickle/departments/draws_prints_cont.pkl')

films= after80[after80['Department'] == "Film"]
films.to_pickle(f'./MoMA_data/pickle/departments/films_cont.pkl')

fluxus= after80[after80['Department'] == "Fluxus Collection"]
fluxus.to_pickle(f'./MoMA_data/pickle/departments/fluxus_cont.pkl')

media_perf= after80[after80['Department'] == "Media and Performance"]
media_perf.to_pickle(f'./MoMA_data/pickle/departments/media_perf_cont.pkl')

painting_sculp= after80[after80['Department'] == "Painting & Sculpture"]
painting_sculp.to_pickle(f'./MoMA_data/pickle/departments/paint_sculp_cont.pkl')

photo= after80[after80['Department'] == "Photography"]
photo.to_pickle(f'./MoMA_data/pickle/departments/photo_cont.pkl')
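
The two cells above repeat the same filter-and-pickle step sixteen times. One way this could be condensed is a loop over a department-to-filename mapping, as in the rough sketch below. It runs on made-up data and writes to a temporary directory; `DEPARTMENT_FILES` (showing only two of the eight departments), `pickle_departments` and `demo` are hypothetical names, not part of this notebook.

```python
import tempfile
from pathlib import Path
import pandas as pd

# Made-up stand-in for before80 / after80
demo = pd.DataFrame({
    'Title': ['Work A', 'Work B', 'Work C'],
    'Department': ['Film', 'Photography', 'Film'],
})

# Hypothetical mapping from department name to file stem (two of the eight shown)
DEPARTMENT_FILES = {'Film': 'films', 'Photography': 'photo'}

def pickle_departments(df, suffix, out_dir):
    # Write one pickle per department and return the paths that were written
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for department, stem in DEPARTMENT_FILES.items():
        target = out_dir / f'{stem}_{suffix}.pkl'
        df[df['Department'] == department].to_pickle(target)
        written.append(target)
    return written

# A temporary directory keeps the sketch away from the real pickle folder
with tempfile.TemporaryDirectory() as tmp:
    paths = pickle_departments(demo, 'mod', tmp)
    films_demo = pd.read_pickle(paths[0])
    print(len(films_demo))
```

The explicit version used in the notebook keeps each subset available as a named variable (for example `photo`), which a loop like this does not.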

In [None]:
photo