[Link to our site](https://lindsayhardy17.github.io/)

# <center>Evaluating the changes in art museums </center>
### <center>Lindsay Hardy and Emily O'Connell </center>

### Milestone 1 Write-up

For our project, we will be analyzing a [dataset](https://github.com/MuseumofModernArt/collection/blob/master/Artworks.csv) concerning the collection of the Museum of Modern Art (MoMA). In late 2019, MoMA underwent [renovations](https://time.com/5688507/moma-reopening/) and reorganized its galleries to feature more recent contemporary artists and global art. Our overarching question is whether MoMA fulfilled its goal of expanding the diversity of its art collection. To see whether this shift is reflected in its collection and acquisitions, we will attempt to measure the diversity a work adds to the collection by analyzing relevant aspects of the artwork, such as the artist's nationality, the artist's gender, the year the work was created, the year it was acquired, and various other factors. Although it is impossible to truly quantify the abstract concept of diversity, we believe that analyzing overall patterns in acquisitions will help determine whether MoMA has been successful in its goal of introducing new viewpoints to its collection.

For example, we will answer questions such as: Is there an overwhelming number of male artists represented? Has MoMA recently been acquiring artwork from non-American artists at a higher frequency than in the past? Is the artwork acquired from non-American artists from Western European countries or elsewhere? How does the average age of a work of art change in different time periods? By answering all of these smaller questions, we will be able to make a judgement on our overall research question of assessing the diversity of the collection in recent years. We will also look at trends over time to see whether the kind of artwork MoMA typically collects has changed, and note any differences between time periods. We have found a rich and detailed dataset with 138,219 works of art owned by MoMA, acquired from 1929 to June of 2020, so we will have a lot of information about the type of artwork MoMA has acquired in different time periods.

In addition to the thorough analysis we will conduct on the MoMA dataset, we also plan to compare MoMA's collection to other large modern/contemporary art museums like the [Tate](https://github.com/tategallery/collection), a modern art museum in London. By comparing the Tate and MoMA, we will be able to see whether collections and trends differ between museums in two different countries. We will also investigate the collection of the [Metropolitan Museum of Art](https://github.com/metmuseum/openaccess), another famous art museum in New York City. Even though the Met is not specifically focused on modern art, we will be able to isolate its modern/contemporary art collection and focus only on those works. By comparing MoMA to these other museums, we will be able to see not only whether it reached its goals but also whether those goals make it a forward thinker in modern art rather than just part of the crowd.

In terms of collaboration, we have set up a [GitHub repository](https://github.com/lindsayhardy17/lindsayhardy17.github.io) and plan to use it for storing the datasets, sharing code, and version control. We plan to meet on Zoom once a week on Tuesdays at 3:10, right after class, and will be flexible about adding a second meeting on Thursday after class. During these meetings we plan to check our progress, solve problems, and ask questions, in addition to communicating outside of scheduled meetings as needed.







### Milestone - Data Extraction, Cleaning, and Loading

The first thing we needed to do was unzip our files. We originally tried to pull the data from the GitHub repo, but the MoMA has a no pull policy, so we downloaded the data from Kaggle instead.

In [180]:
import pandas as pd
import zipfile
import numpy as np

For this first milestone the only things we needed were zipfile and pandas (numpy is imported here as well and is used later for the Tate data). We unzipped the files and placed them in our working folder.

In [181]:
with zipfile.ZipFile('archive.zip', 'r') as zip_ref:
    zip_ref.extractall()
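Before extracting, it can help to list what the archive contains. The following is a minimal sketch, not the code we ran: it builds a tiny stand-in archive in a temporary folder (the file contents are made up) so that it runs anywhere, then lists the members and extracts them into a dedicated subfolder.

```python
import tempfile
import zipfile
from pathlib import Path

# Build a tiny stand-in archive so the sketch runs anywhere;
# in the notebook this would be the downloaded 'archive.zip'.
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "archive.zip"
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("Artworks.csv", "Artwork ID,Title\n2,Example\n")
    zf.writestr("Artists.csv", "Artist ID,Name\n6210,Otto Wagner\n")

# List the members first, then extract into a dedicated folder.
with zipfile.ZipFile(archive_path, "r") as zip_ref:
    members = zip_ref.namelist()
    zip_ref.extractall(workdir / "data")

extracted = sorted(p.name for p in (workdir / "data").iterdir())
print(members)
print(extracted)
```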

Next we read in the Artworks file and the Artists file and created a new dataframe called gender that holds the artist-level columns we need. We will be analyzing the gender and nationality of the artists, so we wanted those columns in our main dataframe. Originally we tried to merge the two dataframes on "Artist ID", but on closer inspection "Artist ID" sometimes holds multiple IDs in the artworks dataframe but not in the artists dataframe. Some works had up to 5 IDs while others had only 1. Because of this we decided to merge on "Name", which is consistent across both dataframes, and we named the merged dataframe "moma". The resulting dataframe is made up of the work's title, the artist and their background, information on the medium and different measurement types, when MoMA acquired the work, and a few other minor fields. We want to analyze how acquisition dates relate to when the works were made, and whether there were significant changes in the works acquired after a certain year. We think this dataset will let us do all of these things because of the columns below, but we will need to make sure the columns are in the correct format.

In [182]:
moma_artworks = pd.read_csv("Artworks.csv")
moma_artists = pd.read_csv("Artists.csv")
moma_artists.columns

Index(['Artist ID', 'Name', 'Nationality', 'Gender', 'Birth Year',
       'Death Year'],
      dtype='object')
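An alternative to merging on "Name" would be to split the multi-ID values and merge on "Artist ID" after all. The sketch below is only an illustration under an assumption we have not verified against the full file: that multiple IDs are stored as one comma-separated string. The two small dataframes are toy data, not rows from the MoMA files.

```python
import pandas as pd

# Toy stand-ins; the comma-separated "Artist ID" format is an assumption.
artworks = pd.DataFrame({
    "Artwork ID": [2, 3],
    "Artist ID": ["6210", "7470, 8000"],
})
artists = pd.DataFrame({
    "Artist ID": [6210, 7470, 8000],
    "Gender": ["Male", "Male", "Female"],
})

# Split each ID string into a list, then produce one row per (artwork, artist) pair.
exploded = artworks.assign(
    **{"Artist ID": artworks["Artist ID"].str.split(",")}
).explode("Artist ID")

# Strip stray spaces and convert to integers so the key matches the artists table.
exploded["Artist ID"] = exploded["Artist ID"].str.strip().astype("int64")

merged = exploded.merge(artists, on="Artist ID", how="left")
print(merged)
```

With this approach a work by several artists appears once per artist, which changes what a "row" means in later counts.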

In [183]:
gender = moma_artists[["Gender", 'Name', 'Nationality', 'Birth Year','Death Year']]
moma = moma_artworks.merge(gender, on = ['Name'], how = 'left')
moma.head()


Unnamed: 0,Artwork ID,Title,Artist ID,Name,Date,Medium,Dimensions,Acquisition Date,Credit,Catalogue,...,Height (cm),Length (cm),Width (cm),Depth (cm),Weight (kg),Duration (s),Gender,Nationality,Birth Year,Death Year
0,2,"Ferdinandsbrücke Project, Vienna, Austria, Ele...",6210,Otto Wagner,1896,Ink and cut-and-pasted painted pages on paper,"19 1/8 x 66 1/2"" (48.6 x 168.9 cm)",1996-04-09,Fractional and promised gift of Jo Carole and ...,Y,...,48.6,,168.9,,,,Male,Austrian,1841.0,1918.0
1,3,"City of Music, National Superior Conservatory ...",7470,Christian de Portzamparc,1987,Paint and colored pencil on print,"16 x 11 3/4"" (40.6 x 29.8 cm)",1995-01-17,Gift of the architect in honor of Lily Auchinc...,Y,...,40.6401,,29.8451,,,,Male,French,1944.0,
2,4,"Villa near Vienna Project, Outside Vienna, Aus...",7605,Emil Hoppe,1903,"Graphite, pen, color pencil, ink, and gouache ...","13 1/2 x 12 1/2"" (34.3 x 31.8 cm)",1997-01-15,Gift of Jo Carole and Ronald S. Lauder,Y,...,34.3,,31.8,,,,Male,Austrian,1876.0,1957.0
3,5,"The Manhattan Transcripts Project, New York, N...",7056,Bernard Tschumi,1980,Photographic reproduction with colored synthet...,"20 x 20"" (50.8 x 50.8 cm)",1995-01-17,Purchase and partial gift of the architect in ...,Y,...,50.8,,50.8,,,,Male,,1944.0,
4,6,"Villa, project, outside Vienna, Austria, Exter...",7605,Emil Hoppe,1903,"Graphite, color pencil, ink, and gouache on tr...","15 1/8 x 7 1/2"" (38.4 x 19.1 cm)",1997-01-15,Gift of Jo Carole and Ronald S. Lauder,Y,...,38.4,,19.1,,,,Male,Austrian,1876.0,1957.0
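One risk of merging on "Name" is that two different artists can share a name, in which case a left merge silently duplicates artwork rows. The sketch below shows one way this could be checked, using toy data with hypothetical names; `validate="many_to_one"` asks pandas to raise an error if the key is not unique on the artists side.

```python
import pandas as pd

# Toy data: two distinct artists share the name "A. Artist".
artists = pd.DataFrame({
    "Name": ["A. Artist", "A. Artist", "B. Artist"],
    "Gender": ["Male", "Female", "Female"],
})
artworks = pd.DataFrame({
    "Artwork ID": [1, 2],
    "Name": ["A. Artist", "B. Artist"],
})

# Names that occur more than once in the artists table.
dup_names = artists.loc[artists["Name"].duplicated(keep=False), "Name"].unique()

# Ask pandas to verify that each artwork matches at most one artist row.
try:
    artworks.merge(artists, on="Name", how="left", validate="many_to_one")
    merge_is_safe = True
except pd.errors.MergeError:
    merge_is_safe = False

# Without validation the merge succeeds but artwork 1 appears twice.
naive = artworks.merge(artists, on="Name", how="left")
print(dup_names, merge_is_safe, len(naive))
```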


Then we checked the datatypes. Everything looked like the right type except for the dates, so we first converted Acquisition Date.

In [184]:
display(moma.dtypes)
#checked to see if the values are correct, the date and the acquisition date aren't in date time so converting them
moma['Acquisition Date'] = pd.to_datetime(moma['Acquisition Date'], errors = 'coerce')
#changed to datetime
moma.dtypes

Artwork ID              int64
Title                  object
Artist ID              object
Name                   object
Date                   object
Medium                 object
Dimensions             object
Acquisition Date       object
Credit                 object
Catalogue              object
Department             object
Classification         object
Object Number          object
Diameter (cm)         float64
Circumference (cm)    float64
Height (cm)           float64
Length (cm)           float64
Width (cm)            float64
Depth (cm)            float64
Weight (kg)           float64
Duration (s)          float64
Gender                 object
Nationality            object
Birth Year            float64
Death Year            float64
dtype: object

Artwork ID                     int64
Title                         object
Artist ID                     object
Name                          object
Date                          object
Medium                        object
Dimensions                    object
Acquisition Date      datetime64[ns]
Credit                        object
Catalogue                     object
Department                    object
Classification                object
Object Number                 object
Diameter (cm)                float64
Circumference (cm)           float64
Height (cm)                  float64
Length (cm)                  float64
Width (cm)                   float64
Depth (cm)                   float64
Weight (kg)                  float64
Duration (s)                 float64
Gender                        object
Nationality                   object
Birth Year                   float64
Death Year                   float64
dtype: object
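Because `errors = 'coerce'` turns anything unparseable into a missing value without warning, it can be worth counting how many values were lost in the conversion. This is a minimal sketch on a toy Series rather than the real column:

```python
import pandas as pd

# Toy values: two valid dates, one unparseable string, one genuinely missing value.
raw = pd.Series(["1996-04-09", "1995-01-17", "not a date", None])
parsed = pd.to_datetime(raw, errors="coerce")

# Rows that had a value before parsing but are missing afterwards.
newly_missing = parsed.isna() & raw.notna()
print(int(newly_missing.sum()))
print(raw[newly_missing].tolist())
```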

While we were checking the different values, we realized that Date, the year the artwork was created, could not be analyzed as it was because it used many different representations, such as 1967-1977, c. 1989, or early 1992. Because of this we had to fix the Date column, which is shown below. Getting this column right is very important to us because the date of creation is crucial to our analysis.

In [185]:
moma['Date'].unique()[:100]
#a small sample of the unique types and their differences 

array(['1896', '1987', '1903', '1980', '1976-77', '1968', '1900', '1978',
       '1905', '1906', '1979', '1980-81', '1918', '1970', '1975', '1984',
       '1986', '1974', 'n.d.', 'c. 1917', '1917', '1923', 'Unknown',
       '1930', '1936', '1935', '1937', '1938', '1977', '1958', '1985',
       '1989', '1949', '1958–1964', 'c. 1935', '1991', '1941', '1965',
       '1981', '1983', '1985–1988', 'c. 1989-91', '1992', '1915-17',
       'c. 1915-17', '1953', '1910', 'c.1985', '1982–1986', '1982-86',
       '1945', '1923–1924', '.1-3 1987; .4 1990', '1990', '1976', '1995',
       '1927–1931', 'c. 1929-30', '1964', '1959', 'c. 1918-20',
       'c.1918-1920', '1939', 'c.1976', '1975-79', '1993', '1996', '1988',
       '1982-83', '1982–1983', '1952-53', '1921', '1957', '1972',
       '1956-57', '1924', '1962', '1925', '1960', '1969', '1963', '1994',
       '1961', '1960-61', '1952', 'c. 1978-84', '1927', '1979–1985',
       'before 1933', '1929', 'c. 1960-62', '1967', '1956', 'c. 1961',
       '

In [186]:
# drop placeholder labels; 'n.d.' (with the period) is caught later by the regex step
unknown_labels = ["nan", "n.d", "Unkn", "Unknown", "Various", "unknown"]
n_d = moma[moma["Date"].isin(unknown_labels)].index
moma.drop(n_d, inplace = True)

First we removed the obviously invalid values, but then we realized how widespread the formatting issues were. We therefore used a regular expression to extract a four-digit year, shown below, and then checked which remaining values did not contain one.

In [187]:
# raw string so that \d is passed to the regex engine unchanged
moma['date_edit'] = moma['Date'].str.extract(r'(\d{4})', expand = False)
#creating a new column to store the first four-digit year found in each Date
moma[moma['date_edit'].isnull() & moma['Date'].notnull()]['Date'].unique()
#checking where date_edit is null but Date is not - these are the remaining values
#that contain no four-digit year

array(['n.d.', '8th-9th century C.E.', '7th-8th century C.E.', 'Unkown',
       '(London?, published in aid of the Comforts Fund  for Women and Children of Sovie',
       '(n.d.)', 'New York', 'November 10', '(19)71', '(19)69',
       'date of publicati', 'nd', 'no date',
       '(newspaper published March 30)', 'n. d.', 'c. 196?', 'TBC', 'TBD'],
      dtype=object)
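To make the behaviour of the pattern concrete, here is a small sketch that applies the same regular expression to a handful of the date strings shown in the outputs above. It keeps only the first run of four digits, so a range such as 1976-77 is represented by its starting year, and values like (19)71 yield nothing because the digits are not consecutive.

```python
import pandas as pd

# A few representative strings taken from the Date column outputs above.
samples = pd.Series([
    "1976-77", "c. 1917", "1958–1964", ".1-3 1987; .4 1990",
    "before 1933", "n.d.", "(19)71", "8th-9th century C.E.",
])

# Extract the first four consecutive digits; non-matching rows become NaN.
first_year = samples.str.extract(r"(\d{4})", expand=False)
print(pd.DataFrame({"raw": samples, "year": first_year}))
```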

Because none of these remaining values gave us a usable year, we could make date_edit our official Date column, which we do below.

In [188]:
moma['Date'] = moma['date_edit']
moma.drop(columns=['date_edit'], inplace = True)
moma['Date'].unique()
#one last check to make sure that all of our corresponding values are actually dates

array(['1896', '1987', '1903', '1980', '1976', '1968', '1900', '1978',
       '1905', '1906', '1979', '1918', '1970', '1975', '1984', '1986',
       '1974', nan, '1917', '1923', '1930', '1936', '1935', '1937',
       '1938', '1977', '1958', '1985', '1989', '1949', '1991', '1941',
       '1965', '1981', '1983', '1992', '1915', '1953', '1910', '1982',
       '1945', '1990', '1995', '1927', '1929', '1964', '1959', '1939',
       '1993', '1996', '1988', '1952', '1921', '1957', '1972', '1956',
       '1924', '1962', '1925', '1960', '1969', '1963', '1994', '1961',
       '1933', '1967', '1934', '1940', '1946', '1955', '1997', '1922',
       '1942', '1954', '1916', '1973', '1926', '1932', '1947', '1943',
       '1944', '1966', '1971', '1999', '1951', '1913', '1928', '1886',
       '1920', '1950', '1931', '1901', '1948', '1912', '1908', '1902',
       '1904', '1998', '1898', '1875', '1880', '1909', '1501', '1897',
       '1907', '1895', '1914', '1885', '1768', '1878', '1808', '1865',
       '1

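Once every remaining value was a four-digit year, we converted the strings to numbers.

In [189]:
# convert the year strings to numbers; pd.to_numeric is used rather than pd.to_datetime
# because years before 1677 (such as 1501) fall outside the datetime64[ns] range
# and can be coerced to NaT
moma['Date'] = pd.to_numeric(moma['Date'], errors="coerce")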
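If whole-number years are preferred to floats, one option is pandas' nullable `Int64` dtype, which keeps missing values without forcing the column to float. This is a sketch on toy values, not the MoMA column itself:

```python
import pandas as pd

# Toy year strings, including a very old year and a missing value.
years = pd.Series(["1896", "1501", None, "1987"])

# to_numeric gives floats because of the missing value ...
as_numbers = pd.to_numeric(years, errors="coerce")

# ... and the nullable Int64 dtype stores them as integers with <NA> for the gap.
as_int = as_numbers.astype("Int64")
print(as_int)
```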

One of the main things we will be looking at is when MoMA acquired each work of art. Without this information we cannot tell whether acquisitions really changed following the renovation. Because of this we removed works without an acquisition date from the moma dataframe. For now we decided to keep works that are missing a creation date, because we believe they could still be useful.

In [190]:
no_date = moma[(moma['Acquisition Date'].isnull() == True)].index
moma.drop(no_date, inplace = True)
#dropped dates that didn't have an acquisition date

We also grouped the columns into smaller dataframes based on the categories we thought they fall under, because we think that might be helpful to us in the future!

Before doing that, we realized we needed to add the artist's age at the time the work was created, because that could be something interesting to analyze.

In [191]:
moma['age_made'] = moma['Date'] - moma['Birth Year']
moma['alive?'] = moma['Date']

In [192]:
def test(acquisition,death):
    if acquisition > death:
        return 'False'
    else:
        return 'True'
moma['Acquisition Year'] = moma['Acquisition Date'].dt.year
moma['alive?'] = moma.apply(lambda row: test(row['Acquisition Year'],row['Death Year']),axis=1)
#could end up doing analysis on how old the artist was etc.

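We used apply above to check, row by row, whether the artist was alive when the work was acquired, and stored the result in a column called alive?. The value is the string 'True' if the acquisition year is not later than the death year and 'False' otherwise. Because a comparison with a missing death year is always false, artists with no recorded death year are also marked 'True'.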
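The same check can be written without a row-by-row apply. The sketch below uses toy values and reproduces the logic with `np.where`. It also shows a hypothetical three-state column, `alive_status` (a name introduced here only for illustration), that separates a missing death year from a known one, since a missing value could mean either a living artist or an unrecorded date.

```python
import numpy as np
import pandas as pd

# Toy rows: died before acquisition, no death year, died after acquisition.
df = pd.DataFrame({
    "Acquisition Year": [1996, 1995, 1950],
    "Death Year": [1918.0, np.nan, 1957.0],
})

# Same logic as the apply version: comparisons with NaN are False, so
# a missing death year ends up as 'True'.
df["alive?"] = np.where(df["Acquisition Year"] > df["Death Year"], "False", "True")

# Hypothetical alternative that keeps missing death years separate.
df["alive_status"] = np.select(
    [df["Death Year"].isna(), df["Acquisition Year"] > df["Death Year"]],
    ["unknown", "deceased"],
    default="alive",
)
print(df)
```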

In [193]:
moma.drop(columns = ['Acquisition Year'], inplace = True)

In [194]:
moma.columns

Index(['Artwork ID', 'Title', 'Artist ID', 'Name', 'Date', 'Medium',
       'Dimensions', 'Acquisition Date', 'Credit', 'Catalogue', 'Department',
       'Classification', 'Object Number', 'Diameter (cm)',
       'Circumference (cm)', 'Height (cm)', 'Length (cm)', 'Width (cm)',
       'Depth (cm)', 'Weight (kg)', 'Duration (s)', 'Gender', 'Nationality',
       'Birth Year', 'Death Year', 'age_made', 'alive?'],
      dtype='object')

In [195]:
ids = moma[['Artwork ID', 'Artist ID']]
names_dates = moma[['Title', 'Name', 'Date', 'Acquisition Date', 'Artwork ID', 'Gender', 'Credit', 'Nationality','age_made'
                   , 'alive?']]
dimensions = moma[['Artwork ID','Dimensions', 'Diameter (cm)',
       'Circumference (cm)', 'Height (cm)', 'Length (cm)', 'Width (cm)',
       'Depth (cm)', 'Weight (kg)', 'Duration (s)']]
credit = moma[['Artwork ID','Credit']]
extra = moma[['Medium', 'Department', 'Classification', 'Artwork ID']]

## Milestone 2

In order to do a thorough analysis, we should compare MoMA's data to data from other museums. The first museum that we will examine is the Tate, which is located in London, England.

In [196]:
tate_artworks = pd.read_csv("artwork_data-tate.csv")
tate_artists = pd.read_csv("artist_data-tate.csv")


In [197]:
display(tate_artworks.head())

Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785.0,1922.0,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919.0,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [198]:
tate_newartist = tate_artists[['name','gender','placeOfBirth','yearOfBirth','yearOfDeath']]

In [199]:
tate_newartworks = tate_artworks[['id','accession_number','artist','artistRole','artistId','title','medium','year'
                                 ,'acquisitionYear','dimensions']]

In [200]:
tate = tate_newartworks.merge(tate_newartist, left_on = ['artist'],right_on = ['name'], how = 'left')

In [201]:
tate = tate.drop(columns = ['name'])

In [202]:
tate.dtypes

id                    int64
accession_number     object
artist               object
artistRole           object
artistId              int64
title                object
medium               object
year                 object
acquisitionYear     float64
dimensions           object
gender               object
placeOfBirth         object
yearOfBirth         float64
yearOfDeath         float64
dtype: object

In [203]:
no_date = tate[(tate['acquisitionYear'].isnull() == True)].index
tate.drop(no_date, inplace = True)

In [204]:
# assign the result back instead of using inplace on a selected column, which may not update tate
tate['year'] = tate['year'].replace({'no date':np.nan, 'c.1997-9':1997})
tate['age_made'] = pd.to_numeric(tate['year']) - tate['yearOfBirth']
tate['alive?'] = tate['year']
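The replacement above handles the two specific non-numeric values we found. A more general cleanup, sketched here on toy values rather than the real Tate column, could reuse the four-digit-year pattern from the MoMA cleaning so that any unexpected text becomes a missing value instead of raising an error:

```python
import numpy as np
import pandas as pd

# Toy values mixing strings, an integer, and a missing value.
year = pd.Series(["1785", "no date", "c.1997-9", 1826, np.nan], dtype="object")

# Convert everything to text, keep the first four-digit run, then make it numeric.
clean_year = pd.to_numeric(
    year.astype(str).str.extract(r"(\d{4})", expand=False),
    errors="coerce",
)
print(clean_year)
```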

In [205]:
def test(acquisition,death):
    if acquisition > death:
        return 'False'
    else:
        return 'True'
tate['alive?'] = tate.apply(lambda row: test(row['acquisitionYear'],row['yearOfDeath']),axis=1)

In [206]:
tate.head()

Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,medium,year,acquisitionYear,dimensions,gender,placeOfBirth,yearOfBirth,yearOfDeath,age_made,alive?
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,"Watercolour, ink, chalk and graphite on paper....",,1922.0,support: 394 x 419 mm,Male,"London, United Kingdom",1762.0,1787.0,,False
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",Graphite on paper,,1922.0,support: 311 x 213 mm,Male,"London, United Kingdom",1762.0,1787.0,,False
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,Graphite on paper. Verso: graphite on paper,1785.0,1922.0,support: 343 x 467 mm,Male,"London, United Kingdom",1762.0,1787.0,23.0,False
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,Graphite on paper,,1922.0,support: 318 x 394 mm,Male,"London, United Kingdom",1762.0,1787.0,,False
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,Line engraving on paper,1826.0,1919.0,image: 243 x 335 mm,Male,"London, United Kingdom",1757.0,1827.0,69.0,False
