# Museum of Modern Art (MOMA) Artwork Data Exploration
    
In this project I explore the artwork data published by [New York's Museum of Modern Art](https://github.com/MuseumofModernArt/collection) to answer the following questions:

### What does the makeup of the artwork look like in terms of artist nationality and gender?

### What other interesting information may be seen from the data collected?

#### To accomplish this I will do the following:

1. Explore the Data
2. Clean the Data
3. Analyze the Data
4. Draw Conclusions from my analysis

### 1. Explore the Data

In [1]:
from csv import reader

opened_file = open('MOMAArtworks.csv')
read_file = reader(opened_file)
moma = list(read_file)
moma_header = moma[0]
moma = moma[1:]

print(moma_header)
print('\n')
print(moma[0:3])

['\ufeffTitle', 'Artist', 'ConstituentID', 'ArtistBio', 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date', 'Medium', 'Dimensions', 'CreditLine', 'AccessionNumber', 'Classification', 'Department', 'DateAcquired', 'Cataloged', 'ObjectID', 'URL', 'ThumbnailURL', 'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)', 'Height (cm)', 'Length (cm)', 'Weight (kg)', 'Width (cm)', 'Seat Height (cm)', 'Duration (sec.)']


[['Ferdinandsbrücke Project, Vienna, Austria (Elevation, preliminary version)', 'Otto Wagner', '6210', '(Austrian, 1841–1918)', '(Austrian)', '(1841)', '(1918)', '(Male)', '1896', 'Ink and cut-and-pasted painted pages on paper', '19 1/8 x 66 1/2" (48.6 x 168.9 cm)', 'Fractional and promised gift of Jo Carole and Ronald S. Lauder', '885.1996', 'Architecture', 'Architecture & Design', '1996-04-09', 'Y', '2', 'http://www.moma.org/collection/works/2', 'http://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167', 
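The first header field above prints as `'\ufeffTitle'` because the file starts with a byte-order mark (BOM). As a minimal sketch, and not the approach used in this notebook, the file could be read with the `utf-8-sig` encoding so the BOM is dropped. The helper name `load_csv` is hypothetical.

```python
from csv import reader


def load_csv(path):
    """Return (header, rows) for a CSV file, dropping a leading BOM if present."""
    # 'utf-8-sig' decodes UTF-8 and silently removes a byte-order mark.
    # newline='' is what the csv module documentation recommends for file objects.
    # The with-block also closes the file once reading is done.
    with open(path, encoding='utf-8-sig', newline='') as csv_file:
        rows = list(reader(csv_file))
    return rows[0], rows[1:]
```

Under these assumptions, `moma_header, moma = load_csv('MOMAArtworks.csv')` would give a header whose first field is plain `'Title'`.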

In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(moma_header)
print('\n')
explore_data(moma, 0, 3, True)

['\ufeffTitle', 'Artist', 'ConstituentID', 'ArtistBio', 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date', 'Medium', 'Dimensions', 'CreditLine', 'AccessionNumber', 'Classification', 'Department', 'DateAcquired', 'Cataloged', 'ObjectID', 'URL', 'ThumbnailURL', 'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)', 'Height (cm)', 'Length (cm)', 'Weight (kg)', 'Width (cm)', 'Seat Height (cm)', 'Duration (sec.)']


['Ferdinandsbrücke Project, Vienna, Austria (Elevation, preliminary version)', 'Otto Wagner', '6210', '(Austrian, 1841–1918)', '(Austrian)', '(1841)', '(1918)', '(Male)', '1896', 'Ink and cut-and-pasted painted pages on paper', '19 1/8 x 66 1/2" (48.6 x 168.9 cm)', 'Fractional and promised gift of Jo Carole and Ronald S. Lauder', '885.1996', 'Architecture', 'Architecture & Design', '1996-04-09', 'Y', '2', 'http://www.moma.org/collection/works/2', 'http://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167', '

### 2. Clean the Data

In [3]:
n = []
for row in moma:
    n.append(row[4])
    
g = []
for row in moma:
    g.append(row[7])

print(n[:5])
print(g[:5])

['(Austrian)', '(French)', '(Austrian)', '()', '(Austrian)']
['(Male)', '(Male)', '(Male)', '(Male)', '(Male)']


In [4]:
for row in moma:
    nationality = row[4]
    nationality = nationality.replace("(","")
    nationality = nationality.replace(")","")
    row[4] = nationality
    gender = row[7]
    gender = gender.replace("(","")
    gender = gender.replace(")","")
    row[7] = gender
    
print(moma[:3])
print('\n')
print(moma[300][4])
print(moma[400][4])
print(moma[300][7])
print(moma[400][7])

[['Ferdinandsbrücke Project, Vienna, Austria (Elevation, preliminary version)', 'Otto Wagner', '6210', '(Austrian, 1841–1918)', 'Austrian', '(1841)', '(1918)', 'Male', '1896', 'Ink and cut-and-pasted painted pages on paper', '19 1/8 x 66 1/2" (48.6 x 168.9 cm)', 'Fractional and promised gift of Jo Carole and Ronald S. Lauder', '885.1996', 'Architecture', 'Architecture & Design', '1996-04-09', 'Y', '2', 'http://www.moma.org/collection/works/2', 'http://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167', '', '', '', '48.6', '', '', '168.9', '', ''], ['City of Music, National Superior Conservatory of Music and Dance, Paris, France, View from interior courtyard', 'Christian de Portzamparc', '7470', '(French, born 1944)', 'French', '(1944)', '(0)', 'Male', '1987', 'Paint and colored pencil on print', '16 x 11 3/4" (40.6 x 29.8 cm)', 'Gift of the architect in honor of Lily Auchincloss', '1.1995', 'Architecture', 'Architectu

In [5]:
for row in moma:
    nationality = row[4]
    nationality = nationality.title()
    if not nationality:
        nationality = "Unknown"
    row[4] = nationality
    gender = row[7]
    gender = gender.title()
    if not gender:
        gender = "Unknown"
    row[7] = gender
    
print(moma[:3])
print('\n')
print(moma[300][4])
print(moma[400][4])
print(moma[300][7])
print(moma[400][7])

[['Ferdinandsbrücke Project, Vienna, Austria (Elevation, preliminary version)', 'Otto Wagner', '6210', '(Austrian, 1841–1918)', 'Austrian', '(1841)', '(1918)', 'Male', '1896', 'Ink and cut-and-pasted painted pages on paper', '19 1/8 x 66 1/2" (48.6 x 168.9 cm)', 'Fractional and promised gift of Jo Carole and Ronald S. Lauder', '885.1996', 'Architecture', 'Architecture & Design', '1996-04-09', 'Y', '2', 'http://www.moma.org/collection/works/2', 'http://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167', '', '', '', '48.6', '', '', '168.9', '', ''], ['City of Music, National Superior Conservatory of Music and Dance, Paris, France, View from interior courtyard', 'Christian de Portzamparc', '7470', '(French, born 1944)', 'French', '(1944)', '(0)', 'Male', '1987', 'Paint and colored pencil on print', '16 x 11 3/4" (40.6 x 29.8 cm)', 'Gift of the architect in honor of Lily Auchincloss', '1.1995', 'Architecture', 'Architectu

In [6]:
nationality = []
for row in moma:
    nationality.append(row[4])

print(nationality[:500])

['Austrian', 'French', 'Austrian', 'Unknown', 'Austrian', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'American', 'Unknown', 'Austrian', 'Unknown', 'Austrian', 'Unknown', 'Austrian', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'German', 'American', 'American ', 'Dutch Dutch', 'American', 'Unknown', 'American', 'Unknown', 'American', 'American', 'American', 'American', 'American', 'Italian Italian Italian', 'American', 'Swedish', 'Swedish', 'Swedish', 'Swedish', 'Swedish', 'Swedish', 'Swedish', 'Swedish', 'Swedish', 'Swedi

In [7]:
def process_data(nationality):
    # Keep only the first space-separated word (e.g. 'Dutch Dutch' -> 'Dutch').
    # Multi-word nationalities such as 'South African' are cut to 'South' here
    # and restored in the next cell.
    return nationality.split(' ')[0]

for row in moma:
    nationality = row[4]
    nationality = process_data(nationality)
    row[4] = nationality
    
n = []
for row in moma:
    n.append(row[4])

print(n[:5])

['Austrian', 'French', 'Austrian', 'Unknown', 'Austrian']


In [8]:
# Restore the multi-word nationalities truncated by process_data
# and map blank or placeholder values to 'Unknown'.
nationality_fixes = {
    'South': 'South African',
    'Puerto': 'Puerto Rican',
    'New': 'New Zealander',
    'Native': 'Native American',
    ' ': 'Unknown',
    '': 'Unknown',
    'Nationality': 'Unknown',
}

for row in moma:
    row[4] = nationality_fixes.get(row[4], row[4])
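Splitting on spaces and then patching truncated values by hand is easy to leave incomplete: the frequency table in the next section still contains a `'Costa'` key, which suggests another multi-word nationality was cut. As a hedged alternative sketch, the first parenthesized group could be extracted from the raw column value, before the parentheses are stripped. This assumes that multi-artist fields look like `'(Dutch) (Dutch)'`, which the outputs above suggest but do not confirm. The helper names are hypothetical.

```python
import re


def split_parenthesized(raw):
    """Return the text inside each '(...)' group, e.g. '(Dutch) (Dutch)' -> ['Dutch', 'Dutch']."""
    return [value.strip() for value in re.findall(r'\(([^)]*)\)', raw)]


def first_value(raw, default='Unknown'):
    """Return the first parenthesized value, title-cased, or the default if it is missing or empty."""
    values = split_parenthesized(raw)
    if values and values[0]:
        return values[0].title()
    return default


print(first_value('(South African)'))
print(first_value('(Male) (Female)'))
print(first_value('()'))
```

Because the split happens on parentheses rather than spaces, a value such as `'(South African)'` stays intact and no manual patch list is needed.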

In [9]:
gender = []
for row in moma:
    gender.append(row[7])

print(gender[:500])

['Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male Male', 'Male Female', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male Male Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male Male', 'Male', 'Male', 'Male Male Male', 'Male Male Male', 'Male Male Male', 'Male Male Male', 'Male', 'Male', 'Mal

In [10]:
def process_data(gender):
    # Keep only the first space-separated value (e.g. 'Male Female' -> 'Male').
    return gender.split(' ')[0]

for row in moma:
    gender = row[7]
    gender = process_data(gender)
    row[7] = gender
    
g = []
for row in moma:
    g.append(row[7])

print(g[:5])

['Male', 'Male', 'Male', 'Male', 'Male']


In [11]:
for row in moma:
    gender = row[7]
    if gender == ' ':
        gender = 'Unknown'
    if gender == '':
        gender = 'Unknown'
    row[7] = gender

### 3. Analyze the Data

In [12]:
nationality_freq = {}

for row in moma:
    nationality = row[4]
    if nationality not in nationality_freq:
        nationality_freq[nationality] = 1
    else:
        nationality_freq[nationality] += 1

print(nationality_freq)

{'Austrian': 1021, 'French': 22904, 'Unknown': 8411, 'American': 60881, 'German': 9711, 'Dutch': 1717, 'Italian': 3086, 'Swedish': 290, 'British': 6111, 'Japanese': 2674, 'Argentine': 987, 'Brazilian': 847, 'Swiss': 2327, 'Luxembourgish': 44, 'Spanish': 3158, 'Russian': 2378, 'Iranian': 20, 'Finnish': 230, 'Danish': 517, 'Belgian': 1482, 'Czech': 773, 'Moroccan': 16, 'Coptic': 3, 'Persian': 1, 'Canadian': 945, 'Colombian': 745, 'Australian': 271, 'Chinese': 280, 'Mexican': 1344, 'Yugoslav': 164, 'Scottish': 62, 'Hungarian': 156, 'Polish': 552, 'Slovenian': 38, 'Chilean': 602, 'Latvian': 70, 'Greek': 55, 'Israeli': 362, 'Czechoslovakian': 5, 'Icelandic': 52, 'Croatian': 154, 'Norwegian': 188, 'Ukrainian': 83, 'Cuban': 213, 'Romanian': 73, 'Venezuelan': 489, 'Uruguayan': 106, 'Georgian': 11, 'Thai': 31, 'Algerian': 6, 'Guatemalan': 71, 'Indian': 197, 'Irish': 37, 'Costa': 64, 'Korean': 104, 'Ethiopian': 6, 'Kuwaiti': 1, 'Haitian': 21, 'South African': 431, 'Zimbabwean': 15, 'Ecuadorian':
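The manual counting loop above can also be written with `collections.Counter` from the standard library. The sketch below uses a few made-up rows rather than the real dataset, and the helper name `count_column` is hypothetical.

```python
from collections import Counter


def count_column(rows, index):
    """Count how often each value appears in the given column."""
    return Counter(row[index] for row in rows)


# Made-up rows for illustration only: [Nationality, Gender]
sample_rows = [
    ['American', 'Male'],
    ['French', 'Female'],
    ['American', 'Female'],
]

counts = count_column(sample_rows, 0)
# most_common() lists values from the most to the least frequent.
for value, count in counts.most_common():
    print(value, ':', count)
```

With the cleaned data, `count_column(moma, 4)` should hold the same counts as `nationality_freq`.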

In [13]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
display_table(moma, 4)

American : 43.793609460645385
French : 16.47556431541239
German : 6.985426347667209
Unknown : 6.0502956451682515
British : 4.395833633054712
Spanish : 2.2716482757628507
Italian : 2.219856421470601
Japanese : 1.923491921909393
Russian : 1.7105698542634766
Swiss : 1.673883957473133
Dutch : 1.2350918586082378
Belgian : 1.066049000848811
Mexican : 0.9667812801219986
Austrian : 0.7344372671164885
Argentine : 0.7099800025895927
Canadian : 0.6797680875857802
Brazilian : 0.6092736192435512
Czech : 0.5560431023320721
Colombian : 0.5359018256628638
Chilean : 0.4330374483879786
Polish : 0.39707088290724946
Danish : 0.37189428707073907
Venezuelan : 0.35175301040153073
Ivorian : 0.34743702254384323
South African : 0.31003179444388496
Israeli : 0.2603979340804788
Swedish : 0.20860607978822887
Chinese : 0.20141276669208302
Australian : 0.1949387849055518
Finnish : 0.16544620121135392
Cuban : 0.15321756894790603
Indian : 0.1417082679940727
Norwegian : 0.13523428620754147
Yugoslav : 0.1179703347767915

![](MOMA_Nationality_Chart.png)
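`display_table` sorts by building `(percentage, key)` tuples and prints the percentages at full floating-point precision. A possible variant, sketched here with made-up rows and a hypothetical name `percentage_table`, sorts with a `key` function and rounds the percentages for readability.

```python
def percentage_table(rows, index, digits=2):
    """Return (value, percentage) pairs for a column, largest share first."""
    counts = {}
    for row in rows:
        counts[row[index]] = counts.get(row[index], 0) + 1
    total = len(rows)
    # Sort on the count instead of swapping key and value into tuples.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(value, round(100 * count / total, digits)) for value, count in ranked]


# Made-up rows for illustration only.
sample_rows = [['Male'], ['Male'], ['Male'], ['Female'], ['Female'], ['Unknown']]

for value, percentage in percentage_table(sample_rows, 0):
    print(value, ':', percentage)
```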

In [14]:
gender_freq = {}

for row in moma:
    gender = row[7]
    if gender not in gender_freq:
        gender_freq[gender] = 1
    else:
        gender_freq[gender] += 1

print(gender_freq)
print('\n')
for gender, artworks in gender_freq.items():
    template = "There are {a:,} artworks by {g} artists"
    print(template.format(g=gender, a=artworks))


{'Male': 109722, 'Unknown': 10353, 'Female': 18942, 'Non-Binary': 1}


There are 109,722 artworks by Male artists
There are 10,353 artworks by Unknown artists
There are 18,942 artworks by Female artists
There are 1 artworks by Non-Binary artists


In [15]:
# freq_table and display_table are already defined in cell 13,
# so they are reused here for the Gender column.
display_table(moma, 7)

Male : 78.9264699535312
Female : 13.625573666719418
Unknown : 7.44723704843977
Non-Binary : 0.0007193313096145823


![](http://localhost:8888/view/DataQuest/MOMA_Gender_Chart.png)

### 4. Conclusion

I have briefly analyzed how nationality and gender are represented in the data on artworks from roughly the last 150 years in the collection of [New York's Museum of Modern Art](https://github.com/MuseumofModernArt/collection). My findings are the following:

#### What does the makeup of the artwork look like in terms of artist nationality and gender?

American artists are by far the most represented in this museum's data. American work makes up 44% of the artworks, more than the next four nationalities (French, German, British, and Spanish) combined. This may be because MoMA is an American museum, so an analysis of modern art museums across the world could show whether this is a global pattern, that is, whether museums in other countries feature their own nation's artwork as prominently as this one does. It is also important to note that, setting aside the Unknown category, four of the five most represented nationalities are European: French, German, British, and Spanish. In fact, outside Europe and the United States, only Japan makes the top ten. There are various reasons why this may be the case, and an important next step would be to look at how representation has changed over time. Some important questions to ask are:

- Has the representation of more nationalities increased or decreased throughout the years? 
- What are the methods for curating artwork, and where are the gaps in nationality representation? Is there enough interest, and are there enough submissions and selections, for different countries around the world, and how has that changed throughout the years?
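The claim above that the American share exceeds the next four nationalities combined can be checked with a few lines of arithmetic. The figures below are copied from the `display_table(moma, 4)` output and rounded to two decimals. Leaving out the Unknown category is a choice made for this sketch.

```python
# Shares in percent, copied from the nationality table and rounded to two decimals.
shares = {
    'American': 43.79,
    'French': 16.48,
    'German': 6.99,
    'Unknown': 6.05,
    'British': 4.40,
    'Spanish': 2.27,
}

# Leave out the 'Unknown' bucket, which is not a nationality.
named = {name: share for name, share in shares.items() if name != 'Unknown'}
ranked = sorted(named.items(), key=lambda item: item[1], reverse=True)

top_name, top_share = ranked[0]
next_four = sum(share for _, share in ranked[1:5])

print(f"{top_name}: {top_share:.2f}% vs next four combined: {next_four:.2f}%")
print(top_share > next_four)
```

The comparison also holds if Unknown is kept in the ranking, since French, German, Unknown, and British together come to about 33.9%.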

Similarly, male artists are predominantly represented in this museum. Work by male artists makes up 79% of the artworks in this data, more than three times the Female, Unknown, and Non-Binary categories combined. A comparative analysis of gender representation in modern art museums around the world could show whether this is the norm, or whether museums in other countries display artwork by artists of diverse gender identities in a more equitable and inclusive way. There are various reasons why male representation may be so predominant, and an important next step would be to look at how it has changed over the museum's history. Some important questions to ask are:

- Has the representation of gender identities other than male increased or decreased throughout the years?
- What are the methods for curating artwork, and where are the gaps in gender representation? Is there enough interest, and are there enough submissions and selections, for different gender identities around the world, and how has that changed throughout the years?
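The "more than three times" comparison above follows directly from the counts printed in cell 14. A short check, using those counts as given:

```python
# Counts copied from the gender_freq output above.
gender_counts = {'Male': 109722, 'Unknown': 10353, 'Female': 18942, 'Non-Binary': 1}

# Everything that is not 'Male', added together.
others = sum(count for gender, count in gender_counts.items() if gender != 'Male')
ratio = gender_counts['Male'] / others

print(f"Male: {gender_counts['Male']:,}; all other categories: {others:,}; ratio: {ratio:.2f}")
```

The ratio comes out between 3.7 and 3.8, which supports the statement in the text.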

#### What other interesting information may be seen from the data collected?

The extensive data collected by New York's Museum of Modern Art may help explain what has influenced the representation of diverse nationalities and gender identities in different periods. Looking at how representation has changed over time may show whether the museum's efforts have become more or less inclusive, what has been effective in attracting more diverse artists, and what the museum could do to achieve more diversity and inclusion among the artists represented.
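One way to begin looking at change over time is to group artworks by the decade in which they were acquired and compute each category's share per decade. The sketch below is only an outline of that idea. It assumes `DateAcquired` (index 15 in the header printed earlier) uses the `YYYY-MM-DD` format seen in the sample row, and it skips rows without a usable year. The function name and the sample rows are made up for illustration.

```python
from collections import Counter, defaultdict


def shares_by_decade(rows, value_index, date_index):
    """Return {decade: {value: percentage}} based on the year at the start of the date field."""
    counts = defaultdict(Counter)
    for row in rows:
        date = row[date_index]
        # Skip blank or malformed dates instead of guessing a year.
        if len(date) < 4 or not date[:4].isdigit():
            continue
        decade = int(date[:4]) // 10 * 10
        counts[decade][row[value_index]] += 1

    result = {}
    for decade in sorted(counts):
        total = sum(counts[decade].values())
        result[decade] = {value: 100 * n / total for value, n in counts[decade].items()}
    return result


# Made-up rows for illustration only: [Gender, DateAcquired]
sample_rows = [
    ['Male', '1996-04-09'],
    ['Female', '1998-01-01'],
    ['Male', '2005-06-30'],
    ['Male', ''],
]

print(shares_by_decade(sample_rows, 0, 1))
```

On the cleaned dataset the call would be `shares_by_decade(moma, 7, 15)` for gender or `shares_by_decade(moma, 4, 15)` for nationality, under the assumptions stated above.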