# ART CUT
This notebook was created by [Francesca Borriello](https://github.com/Fran-cesca), [Lorenza Pierucci](https://github.com/LorenzaPierucci) and [Laura Travaglini](https://github.com/lauratravaglini) as part of their final project for the [Digital Pubishing and Electronic Storytelling](https://www.unibo.it/it/didattica/insegnamenti/insegnamento/2021/443749) course at the University of Bologna (academic year 2021/2022).

# About 
Starting from the datasets made publicly available by the Museum of Modern Art of New York (**MoMA**) and by the **Tate galleries**, *Art cut* analyses artworks acquisitions throughout the years with the aim of understanding which criteria brought together the museums' collections in a historical and social perspective.

Art and history of art are no sealed compartments: they are heavily inter-dependent with social, political, economic factors, which in turn influence our very perception of what art is. 
Cultural institutions – museums in particular – play a fundamental role in this intertwined dynamics: through their selection, they have the potential to shape the public understanding of arts and its modifications throughout time.

In some way, what makes into museums makes into history of art and viceversa.

From these considerations stems our analysis: how do external (social, political, economic) factors influence the perception of art and its history? 
A way to investigate it is by looking at some representative and influential museums around the world, and at their acquisition policies and campaigns in particular. 

# 1. Creating dataframes.
After importing all the necessary libraries, we can read our museums' online CSV files containing information about artworks and artists as `Pandas Dataframes` in order to better manipulate and analyse them.

## Import

In [2]:
import pandas as pd
import csv
import re
from collections import defaultdict
from rdflib import Namespace , Literal , URIRef
from rdflib.namespace import RDF , RDFS
import ssl
from json import JSONDecodeError
from qwikidata.sparql import return_sparql_query_results 

For both Museums, we gather data directly from the remote files available on their Github pages ([MoMA](https://github.com/MuseumofModernArt/collection), [Tate](https://github.com/tategallery/collection)). 
In particular, we work on two separate datasets: one carries information about **artworks** (their title, date, acquisition year atc.), the other provides data on the **artists** (their name, nationality, gender etc.).
In addition, we merge these two dataframes on some selected columns to create a new one for analysing **acquisition**-related issues.

# MoMA

In [3]:
spreadsheet = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv')
MoMA_artworks = spreadsheet[['ConstituentID','Title','Date', 'DateAcquired']]
MoMA_artworks = MoMA_artworks.rename(columns = {'ConstituentID':'Id'})
MoMA_artists = pd.read_csv('https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artists.csv')
MoMA_artists = MoMA_artists[['ConstituentID', 'DisplayName', 'Nationality', 'Gender']]
MoMA_artists['ConstituentID'] = MoMA_artists['ConstituentID'].astype(str)
MoMA_artists = MoMA_artists.rename(columns = {'ConstituentID':'Id', 'DisplayName':'Name'})

In [4]:
MoMA_acquisitions = pd.merge(MoMA_artists, MoMA_artworks[['Id', 'DateAcquired']], on='Id', how='left')

In [5]:
display(MoMA_artworks)

Unnamed: 0,Id,Title,Date,DateAcquired
0,6210,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",1896,1996-04-09
1,7470,"City of Music, National Superior Conservatory ...",1987,1995-01-17
2,7605,"Villa near Vienna Project, Outside Vienna, Aus...",1903,1997-01-15
3,7056,"The Manhattan Transcripts Project, New York, N...",1980,1995-01-17
4,7605,"Villa, project, outside Vienna, Austria, Exter...",1903,1997-01-15
...,...,...,...,...
140843,3048,"Page from Sketchbook #24, New York City",1954-55,2020-12-09
140844,3048,"Page from Sketchbook #24, New York City",1954-55,2020-12-09
140845,3048,"Page from Sketchbook #24, New York City",1954-55,2020-12-09
140846,3048,"Front cover of Sketchbook #24, New York City",1954-55,2020-12-09


In [6]:
display(MoMA_artists)

Unnamed: 0,Id,Name,Nationality,Gender
0,1,Robert Arneson,American,Male
1,2,Doroteo Arnaiz,Spanish,Male
2,3,Bill Arnold,American,Male
3,4,Charles Arnoldi,American,Male
4,5,Per Arnoldi,Danish,Male
...,...,...,...,...
15238,135018,Abdoulaye Konaté,Malian,
15239,135032,Yolanda Lopez,American,Female
15240,135042,Arnt Jensen,Danish,Male
15241,135111,After Sophie Taeuber-Arp,,


In [7]:
display(MoMA_acquisitions)

Unnamed: 0,Id,Name,Nationality,Gender,DateAcquired
0,1,Robert Arneson,American,Male,1981-04-28
1,1,Robert Arneson,American,Male,1997-05-28
2,2,Doroteo Arnaiz,Spanish,Male,1965-03-09
3,3,Bill Arnold,American,Male,1972-03-07
4,3,Bill Arnold,American,Male,1972-03-07
...,...,...,...,...,...
135966,135018,Abdoulaye Konaté,Malian,,2022-09-20
135967,135032,Yolanda Lopez,American,Female,
135968,135042,Arnt Jensen,Danish,Male,
135969,135111,After Sophie Taeuber-Arp,,,1969-11-12


# Tate

In [8]:
spreadsheet = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artwork_data.csv')
Tate_artworks = spreadsheet[['artistId','title', 'year', 'acquisitionYear']]
Tate_artworks = Tate_artworks.rename(columns = {'artistId':'Id', 'acquisitionYear':'DateAcquired', 'year':'Date', 'title':'Title'})
Tate_artworks['Id'] = Tate_artworks['Id'].astype(str)
Tate_artists = pd.read_csv('https://raw.githubusercontent.com/tategallery/collection/master/artist_data.csv')
Tate_artists = Tate_artists[['id', 'name','placeOfBirth', 'gender']]
Tate_artists = Tate_artists.rename(columns = {'id':'Id', 'name':'Name', 'gender':'Gender'})
Tate_artists["Id"] = Tate_artists["Id"].astype(str)

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


In [9]:
display(Tate_artworks)

Unnamed: 0,Id,Title,Date,DateAcquired
0,38,A Figure Bowing before a Seated Old Man with h...,,1922.0
1,38,"Two Drawings of Frightened Figures, Probably f...",,1922.0
2,38,The Preaching of Warning. Verso: An Old Man En...,1785.0,1922.0
3,38,Six Drawings of Figures with Outstretched Arms,,1922.0
4,39,The Circle of the Lustful: Francesca da Rimini...,1826.0,1919.0
...,...,...,...,...
69196,16646,Larvae (from Tampax Romana),1975,2013.0
69197,16646,Living Womb (from Tampax Romana),1976,2013.0
69198,2365,Present Tense,1996,2013.0
69199,2760,Work No. 227: The lights going on and off,2000,2013.0


In [10]:
display(Tate_artists)

Unnamed: 0,Id,Name,placeOfBirth,Gender
0,10093,"Abakanowicz, Magdalena",Polska,Female
1,0,"Abbey, Edwin Austin","Philadelphia, United States",Male
2,2756,"Abbott, Berenice","Springfield, United States",Female
3,1,"Abbott, Lemuel Francis","Leicestershire, United Kingdom",Male
4,622,"Abrahams, Ivor","Wigan, United Kingdom",Male
...,...,...,...,...
3527,12542,"Zorio, Gilberto","Andorno Micca, Italia",Male
3528,2186,"Zox, Larry","Des Moines, United States",Male
3529,621,"Zuccarelli, Francesco",Italia,Male
3530,2187,"Zuloaga, Ignacio",España,Male


# 2. Cleaning data

Identifying and fixing incoherent, corrupt or defective data is an essential process for ensuring a satisfactory threshold of reliability to any further analysis. Let us delve into it.

## MoMA

### Missing values
First of all, let us deal with missing values, substituting them with zeros to better handle them.

In [11]:
MoMA_artists.fillna(value='0', inplace=True)
MoMA_artworks.fillna(value='0', inplace=True)
MoMA_acquisitions.fillna(value='0', inplace=True)

### Dates
#### Acquisition dates
Artworks acquisition dates are in the form `YYYY-MM-DD`.<br>
For the sake of our analysis, we extract only the year.

In [12]:
def cleanAcquisitionDatesMoMA(date):
    if '-' in date:
        date = date.split('-')[0]
        return date
    else:
        return date

In [13]:
MoMA_artworks["DateAcquired"] = MoMA_artworks["DateAcquired"].apply(cleanAcquisitionDatesMoMA)
MoMA_acquisitions["DateAcquired"] = MoMA_acquisitions["DateAcquired"].apply(cleanAcquisitionDatesMoMA)

#### Artworks' creation dates
In MoMA database, artworks' creation dates are mostly already represented by just one year.<br>
Nevertheless, there are some exceptions: years separated by a slash or a dash and strings of any kind, such as '(1950).  (Prints executed 1948', '(1883, published 1897)' or '(1911, dated 1912, published c. 1917)'.

We wrote a code to extract the year through a regex only matching the first sequence of four digits for each value.

In [14]:
def cleanDatesMoMA(date):
    if '-' in date:
        splitted = date.split('-')
        date = ' '.join(splitted) 
    if '/' in date:
        splitted = date.split('/')
        date = ' '.join(splitted) 
    if ',' in date:
        splitted = date.split(',')
        date = ' '.join(splitted) 
    if '.' in date:
        splitted = date.split('.')
        date = ' '.join(splitted) 
        
    x = re.search("\d{4}", date)
    if x:
        date = x.group()
    else:
        date = '0'
    return date

In [15]:
MoMA_artworks["Date"] = MoMA_artworks["Date"].astype(str)
MoMA_artworks["Date"] = MoMA_artworks["Date"].apply(cleanDatesMoMA)

In [16]:
display(MoMA_artworks)

Unnamed: 0,Id,Title,Date,DateAcquired
0,6210,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",1896,1996
1,7470,"City of Music, National Superior Conservatory ...",1987,1995
2,7605,"Villa near Vienna Project, Outside Vienna, Aus...",1903,1997
3,7056,"The Manhattan Transcripts Project, New York, N...",1980,1995
4,7605,"Villa, project, outside Vienna, Austria, Exter...",1903,1997
...,...,...,...,...
140843,3048,"Page from Sketchbook #24, New York City",1954,2020
140844,3048,"Page from Sketchbook #24, New York City",1954,2020
140845,3048,"Page from Sketchbook #24, New York City",1954,2020
140846,3048,"Front cover of Sketchbook #24, New York City",1954,2020


## Tate

### Missing values
Again, let us replace missing values and strings indicating lack of information with zeros, thus obtaining a dataframe filled with coherent data.

In [41]:
Tate_artists.fillna(value='0', inplace=True)
Tate_artworks.fillna(value='0', inplace=True)
Tate_artworks['Date'].replace(to_replace=['no date','c'], value='0', inplace= True)

### Dates
Artworks acquisition dates and artworks creation dates are represented as floats (e.g. 1997.0). We convert them to the YYYY format.

In [19]:
def cleanDatesTate(date):
    if '.' in date:
        date = date.split('.')[0] 
    return date

In [20]:
Tate_artworks["Date"] = Tate_artworks["Date"].astype(str)
Tate_artworks["Date"] = Tate_artworks["Date"].apply(cleanDatesTate)
Tate_artworks["DateAcquired"] = Tate_artworks["DateAcquired"].astype(str)
Tate_artworks["DateAcquired"] = Tate_artworks["DateAcquired"].apply(cleanDatesTate)

### Artists' names
In both Tate dataframes artists' names are in the form `Surname, Name`. For clarity purposes, we decided to normalise them as `Name Surname`, and wrote a code to do so. 

In [21]:
def cleanArtistsNames(name):
    if ',' in name:
        name= name.split(',')
        name[0], name[1] = name[1], name[0]
        name = ' '.join(name)
    return name.strip()

In [22]:
Tate_artists["Name"] = Tate_artists["Name"].apply(cleanArtistsNames)

### Nationalities
In Tate's dataframe about artists, nationalities are often in the form `city, country`(e.g. 'Philadelphia, United States') and sometimes just indicate a city (e.g. 'Wimbledon'). <br>
Finally, all countries' names are in their original form (e.g. 'Nihon' for 'Japan'). <br>
We normalised it indicating, for each artist, its country of origin, in English. <br>
We looked for all diverging values and replaced them one by one through a script.

In [23]:
def cleanNationalitiesTate(naz): 
    if ',' in naz: 
        naz = naz.split(',')[1] 
    if naz == 'Blackheath': 
        naz= naz.replace('Blackheath', 'United Kingdom') 
    if naz == 'London': 
        naz= naz.replace('London', 'United Kingdom') 
    if naz == 'Kensington': 
        naz= naz.replace('Kensington', 'United Kingdom') 
    if naz == 'Chung-hua Min-kuo': 
        naz= naz.replace('Chung-hua Min-kuo', 'Taiwan') 
    if naz == 'Solothurn': 
        naz= naz.replace('Solothurn', 'Schweiz') 
    if naz == 'Melmerby': 
        naz= naz.replace('Melmerby', 'United Kingdom') 
    if naz == 'Montserrat': 
        naz= naz.replace('Montserrat', 'España') 
    if naz == 'Canterbury': 
        naz= naz.replace('Canterbury', 'United Kingdom') 
    if naz == 'Staten Island': 
        naz= naz.replace('Staten Island', 'United States') 
    if naz == 'Epsom': 
        naz= naz.replace('Epsom', 'United Kingdom') 
    if naz == 'Plymouth': 
        naz= naz.replace('Plymouth', 'United Kingdom') 
    if naz == 'Wimbledon': 
        naz= naz.replace('Wimbledon', 'United Kingdom') 
    if naz == 'Edinburgh': 
        naz= naz.replace('Edinburgh', 'United Kingdom') 
    if naz == 'Beckington': 
        naz= naz.replace('Beckington', 'United Kingdom') 
    if naz == 'Hertfordshire': 
        naz= naz.replace('Hertfordshire', 'United Kingdom') 
    if naz == 'Isle of Man': 
        naz= naz.replace('Isle of Man', 'United Kingdom') 
    if naz == 'Bristol': 
        naz= naz.replace('Bristol', 'United Kingdom') 
    if naz == 'Liverpool': 
        naz= naz.replace('Liverpool', 'United Kingdom') 
    if naz == 'Braintree': 
        naz= naz.replace('Braintree', 'United Kingdom') 
    if naz == 'Stoke on Trent': 
        naz= naz.replace('Stoke on Trent', 'United Kingdom') 
    if naz == 'Rochdale': 
        naz= naz.replace('Rochdale', 'United Kingdom') 
    if 'D.C.' in naz: 
        naz= naz.replace('D.C.', 'Colombia') 
    if 'Otok' in naz: 
        naz= naz.replace('Otok', 'Hrvatska') 
    if 'Département de la' in naz: 
        naz= naz.replace('Département de la', 'France') 
    if naz == 'Niederschlesien': 
        naz= naz.replace('Niederschlesien', 'Polska') 
    if naz == 'Perth': 
        naz= naz.replace('Perth', 'Australia') 
    if naz == 'Bermondsey': 
        naz= naz.replace('Bermondsey', 'United Kingdom') 
    if naz == 'Egremont': 
        naz= naz.replace('Egremont', 'United Kingdom') 
    if naz == 'Charlotte Amalie': 
        naz= naz.replace('Charlotte Amalie', 'United States') 
    if naz == 'Charlieu': 
        naz= naz.replace('Charlieu', 'France') 
    if naz == 'Stockholm': 
        naz= naz.replace('Stockholm', 'Sverige') 
    if naz == 'Auteuil': 
        naz= naz.replace('Auteuil', 'France') 
 
    if 'Polska' in naz: 
        naz = naz.replace('Polska', 'Poland') 
    if "Yisra'el" in naz: 
        naz = naz.replace("Yisra'el", 'Israel') 
    if 'Deutschland' in naz: 
        naz = naz.replace('Deutschland', 'Germany') 
    if 'Schweiz' in naz: 
        naz = naz.replace('Schweiz', 'Switzerland') 
    if 'Suomi' in naz: 
        naz = naz.replace('Suomi', 'Finland') 
    if 'Zhonghua' in naz: 
        naz = naz.replace('Zhonghua', 'China') 
    if 'Türkiye' in naz: 
        naz = naz.replace('Türkiye', 'Turkey') 
    if 'Al-‘Iraq' in naz: 
        naz = naz.replace('Al-‘Iraq', 'Iraq') 
    if 'België' in naz: 
        naz = naz.replace('België', 'Belgium') 
    if 'Rossiya' in naz: 
        naz = naz.replace('Rossiya', 'Russia') 
    if 'Nihon' in naz: 
        naz = naz.replace('Nihon', 'Japan') 
    if 'Éire' in naz: 
        naz = naz.replace('Éire', 'Ireland') 
    if 'Österreich' in naz: 
        naz = naz.replace('Österreich', 'Austria') 
    if 'Saint Hélier' in naz: 
        naz = naz.replace('Saint Hélier', 'United Kingdom') 
    if 'Ceská Republik' in naz: 
        naz = naz.replace('Ceská Republik', 'Czech Republic') 
    if 'Ukrayina' in naz: 
        naz = naz.replace('Ukrayina', 'Ukraine') 
    if 'Ellás' in naz: 
        naz = naz.replace('Ellás', 'Greece') 
    if 'Latvija ' in naz: 
        naz = naz.replace('Latvija ', 'Latvia') 
    if 'Douglas' in naz: 
        naz = naz.replace('Douglas', 'United Kingdom') 
    if 'România' in naz: 
        naz = naz.replace('România', 'Romania') 
    if 'Sverige' in naz: 
        naz = naz.replace('Sverige', 'Sweden') 
    if 'Bharat' in naz: 
        naz = naz.replace('Bharat', 'India')     
    if 'España' in naz: 
        naz = naz.replace('España', 'Spain')   
    if 'Magyarország' in naz: 
        naz = naz.replace('Magyarország', 'Hungery')  
    if 'Slovenská Republika' in naz: 
        naz = naz.replace('Slovenská Republika', 'Slovenia')  
        
    return naz.strip()

In [24]:
Tate_artists["placeOfBirth"] = Tate_artists["placeOfBirth"].apply(cleanNationalitiesTate)

In [25]:
display(Tate_artists)

Unnamed: 0,Id,Name,placeOfBirth,Gender
0,10093,Magdalena Abakanowicz,Poland,Female
1,0,Edwin Austin Abbey,United States,Male
2,2756,Berenice Abbott,United States,Female
3,1,Lemuel Francis Abbott,United Kingdom,Male
4,622,Ivor Abrahams,United Kingdom,Male
...,...,...,...,...
3527,12542,Gilberto Zorio,Italia,Male
3528,2186,Larry Zox,United States,Male
3529,621,Francesco Zuccarelli,Italia,Male
3530,2187,Ignacio Zuloaga,Spain,Male


# Integration

## Tate

Some information is missing. In particular, missing information relates to **gender**, We proceed to integrate it via **Wikidata** (when possible) and manually.

1. We create a subset of the dataframe containing all and only the the rows in which the gender information is missing.

In [26]:
Tate_to_integrate = Tate_artists[Tate_artists['Gender']== '0']
display(Tate_to_integrate)

Unnamed: 0,Id,Name,placeOfBirth,Gender
70,5221,Anonymous,0,0
84,657,Shusaku Arakawa,Japan,0
105,2202,born 1945; Mel Ramsden Art & Language (Michael...,0,0
106,17138,born 1939; David Bainbridge Art & Language (Te...,0,0
107,668,born 1939; Michael Baldwin Art & Language (Ter...,0,0
...,...,...,...,...
3319,14825,Shelagh Wakely,0,0
3386,18071,Richard Westmacott,0,0
3495,11740,José Yalenti,0,0
3509,15539,Marc Voge) Young-Hae Chang Heavy Industries (Y...,0,0


2. We exclude some entities to which a gender cannot be attributed (e.g., collective or anonymous artists).

In [27]:
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Anonymous']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'born 1945; Mel Ramsden Art & Language (Michael Baldwin  born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'born 1939; David Bainbridge Art & Language (Terry Atkinson  born 1941; Michael Baldwin  born 1945; Harold Hurrell  born']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'born 1939; Michael Baldwin Art & Language (Terry Atkinson  born 1945)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != '1939-1993; Mel Ramsden Art & Language (Ian Burn  born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Atlas Group']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Black Audio Film Collective (John Akomfrah; Reece Auguis; Edward George; Lina Gopaul; Avril Johnson; David Lawson; Trevo']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Fionnuala and Leslie Boyd and Evans']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British (?) School']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British (?) School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 16th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 17th or 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'British School 20th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Chinese School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'French School 18th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'French School 19th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'International Local (Sarah Charlesworth; Joseph Kosuth; Anthony McCall)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Italian or German (?) School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Langlands and Bell, Ben and Nikki']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Italian or German (?) School 17th century']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Ben and Nikki Langlands and Bell']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Lucy and Eegyudluk']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'France) M/M (Paris']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'T R Uthco (Doug Hall born 1944, Diane Andrews Hall born 1945, Jody Procter 1944-1998)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Art & Language (Ian Burn, 1939-1993; Mel Ramsden, born 1944)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Unknown']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Marc Voge) Young-Hae Chang Heavy Industries (Young-Hae Chang']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'K.O.S.']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Diane Andrews Hall born 1945 T R Uthco (Doug Hall born 1944  Jody Procter 1944-1998)']
Tate_to_integrate = Tate_to_integrate[Tate_to_integrate['Name'] != 'Mel Ramsden Art & Language (Ian Burn  born 1944)']

3. We proceed to search the artists' **Wikidata entities**: this will then allow us to look up for their gender.
The SPARQL query you can read below searches for human individuals with a specific artistic occupation (photograpers, artists, graphic artists, painters, video artists, sculptors and visual artists). The `{}` placeholder will be replaced by the artist's name from the dataframe via python `format()` method.
We directly apply the query to our dataframe through a function taking advantage of `qwikidata` (a Python package allowing to interact with Wikidata) and we insert the result in a new colum named `Artist Entity` and created on the fly.<br>
Since Wikidata SPARQL endpoint does not support heavy queries, we search for one occupation at a time and create a CSV file for all the artists for which the corresponding wikidata entity was found (e.g., all photographers).<br>
We then continue the research thoughout the rest of the dataframe, from which we progressively remove the rows for which we obtain the wikidata entity. We work on copies to avoid compromising the original file.<br>
Finally, we integrate all profession-specific CSV files into one dataframe, which contains all the rows lacking gender information with the corresponding Wikidata entity.

In [28]:
#Define the SPARQL query.
artists_ids = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?artist
WHERE {{
    ?artist wdt:P31 wd:Q5 .
    ?artist wdt:P106 ?occupation
                  FILTER (?occupation IN (wd:Q1281618) ) 
    ?artist rdfs:label ?o
    FILTER regex(?o, \"^{}$\" )
            FILTER (langMatches(lang(?o), "EN")).
}}

"""

In [29]:
# Define the function for applying the query to the dataframe and returning the wanted results.
def find_artists_ids(name):
    query = artists_ids.format(name.strip())
    res = return_sparql_query_results(query_string=query)
   
    try:
        wdt_uri = res['results']['bindings'][0]['artist']['value']
    except (IndexError, KeyError):
        return ""
    return wdt_uri.split("/")[-1]

In [77]:
# Apply the query,
Tate_to_integrate["Artist Entity"] = Tate_to_integrate["name"].apply(find_artists_ids)

In [80]:
# Create a CSV file for profession, e.g., photographers, visual artists etc. 
copy = Tate_to_integrate.copy(deep=True) 
sculptors =  copy[copy['Artist Entity']!= ''] 
sculptors.to_csv('Sculptors.csv')

In [None]:
# Apply the query iteratively, changing the occupation
to_integrate =  copy[copy['Artist Entity']== ''] 
to_integrate["Artist Entity"] = to_integrate["name"].apply(find_artists_genders_from_ids)

In [30]:
# Integrate all CSV files in one dataframe
Tate_artists_integrated = pd.concat(map(pd.read_csv, ['Artists.csv', 'Photographers.csv', 'Videoartists.csv', 'Graphicartists.csv', 'Painters.csv', 'Integratedmanually.csv']), ignore_index=True)

In [31]:
display(Tate_artists_integrated)

Unnamed: 0,Id,Name,Gender,placeOfBirth,Artist Entity
0,657,Shusaku Arakawa,0,"Nagoya, Nihon",Q478264
1,14424,Kiyohiko Komura,0,0,Q64826662
2,16926,Len Lye,0,0,Q1288566
3,5672,Vladimir Mayakovsky,0,0,Q132964
4,18266,Mithu Sen,0,0,Q43136922
...,...,...,...,...,...
77,17129,Vara,0,0,0
78,2107,Dee Villers,0,0,0
79,17732,Francis Vivares,0,0,Q5493438
80,11825,Imke Wagener,0,0,0


4. Once we have all possible Wikidata entities, we interrogate Wikidata to retrieve artists' gender.<br>
We apply the following SPARQL query to our new dataframe and add the retrieved gender information in a new `gender` column, created on the fly (the information which could not be found on Wikidata was integrated manually).

In [32]:
#Define the SPARQL query.
artists_genders = """ 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX wd: <http://www.wikidata.org/entity/> 
SELECT DISTINCT (SAMPLE(?genderLabel) AS ?genderL)
WHERE {{ 
     wd:{}  wdt:P21 ?gender . 
     ?gender rdfs:label ?genderLabel
    FILTER (langMatches(lang(?genderLabel), "EN"))
}} 
"""

In [33]:
# Define the function for applying the query to the dataframe and returning the wanted results.
def find_artists_genders(wikiId): 
    query = artists_genders.format(wikiId.strip()) 
    res = return_sparql_query_results(query_string=query) 
    print(query) 
    try: 
        gender= res['results']['bindings'][0]['genderL']['value'] 
    except (IndexError, KeyError, JSONDecodeError, ChunkedEncodingError): 
        return "" 
    return gender

In [None]:
# Apply the query.
Tate_artists_integrated['gender'] = ArtistIntegrated['Artist Entity'].apply(find_artists_genders)

In [34]:
# Manually integrate on a CSV file the missing information.
Tate_artists_with_gender = pd.read_csv('ArtistIntegratedManually.csv')
Tate_artists_with_gender["Id"] = Tate_artists_with_gender['Id'].astype(str)
Tate_artists_with_gender = Tate_artists_with_gender[['Id', 'Name', 'placeOfBirth', 'Gender' ]]

5. Finally, we can add the dataframe with integrated information to the the dataframe already containing gender data (excluding collective and anonym artists). We will add the acquisition year from each artists by merging it to the arworks dataframe: this will allow us for further analyses.

In [35]:
Tate_gender = Tate_artists[Tate_artists['Gender'] != '0']

In [36]:
Tate_final = Tate_gender.append(Tate_artists_with_gender)
# Wikidata results are lowercase, let us capitalize them and count occurrences for each gender.
Tate_final['Gender'].replace(to_replace=['male'], value='Male', inplace= True)
Tate_final['Gender'].replace(to_replace=['female'], value='Female', inplace= True)

In [37]:
Tate_acquisitions = pd.merge(Tate_final, Tate_artworks[['Id', 'DateAcquired']], on='Id', how='left')
Tate_acquisitions.fillna(value='0', inplace= True)

In [38]:
display(Tate_acquisitions)

Unnamed: 0,Id,Name,placeOfBirth,Gender,DateAcquired
0,10093,Magdalena Abakanowicz,Poland,Female,2009
1,10093,Magdalena Abakanowicz,Poland,Female,2009
2,10093,Magdalena Abakanowicz,Poland,Female,2009
3,10093,Magdalena Abakanowicz,Poland,Female,2009
4,0,Edwin Austin Abbey,United States,Male,1924
...,...,...,...,...,...
68768,17129,Vara,0,0,0
68769,2107,Dee Villers,0,0,1975
68770,17732,Francis Vivares,0,Male,0
68771,11825,Imke Wagener,0,Female,0


# Exploration
We can now start exploring our Museums, to get to know them better through available data.

## How many artworks?

In [39]:
museums=[MoMA_artworks, Tate_artworks]
names = ['MoMA','Tate']
for museum in museums:
    selected_rows = museum[~museum['Title'].isnull()]
    name = names.pop(0)
    print("Total artworks at", name, ":", len(selected_rows.index))

Total artworks at MoMA : 140848
Total artworks at Tate : 69201


## When do artworks date back?

In [42]:
museums=[MoMA_artworks, Tate_artworks]
names = ['MoMA','Tate']
for museum in museums:
    museum["Date"] = museum["Date"].astype(int)
    museum.sort_values(by=['Date'], inplace=True)
    museumWithoutZeros = museum[museum['Date'] != 0]
    firstDate = museumWithoutZeros['Date'].iat[0]
    lastDate = museumWithoutZeros['Date'].iat[-1]
    name = names.pop(0)
    print("Most ancient artwork at", name, "dates back to",firstDate )
    print("Most recent artwork at", name, "dates back to",lastDate )    

Most ancient artwork at MoMA dates back to 1768
Most recent artwork at MoMA dates back to 2022
Most ancient artwork at Tate dates back to 1545
Most recent artwork at Tate dates back to 2012


## When were artworks acquired?

In [43]:
museums=[MoMA_artworks, Tate_artworks]
names = ['MoMA','Tate']
for museum in museums:
    museum["DateAcquired"] = museum["DateAcquired"].astype(int)
    museum.sort_values(by=['DateAcquired'], inplace=True)
    museumWithoutZeros = museum[museum['DateAcquired'] != 0]
    firstDate = museumWithoutZeros['DateAcquired'].iat[0]
    lastDate = museumWithoutZeros['DateAcquired'].iat[-1]
    name = names.pop(0)
    print("Most ancient acquisition at", name, "dates back to",firstDate )
    print("Most recent acquisition at", name, "dates back to",lastDate )    

Most ancient acquisition at MoMA dates back to 1929
Most recent acquisition at MoMA dates back to 2022
Most ancient acquisition at Tate dates back to 1823
Most recent acquisition at Tate dates back to 2013


## Which periods are represented at the Museums? In which proportion?

### MoMA

In [44]:
MoMA_artworks.to_csv('MoMA_artworks.csv') 
with open('MoMA_artworks.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    centuries_MoMa = {'18th': 0, '19th': 0, '20th': 0, '21th': 0}
    for item in reader: 
        if int(item['Date']) in range (1700,1800): 
            centuries_MoMa['18th'] += 1   
        if int(item['Date']) in range (1800,1900): 
            centuries_MoMa['19th'] += 1
        if int(item['Date']) in range (1900,2000): 
            centuries_MoMa['20th'] += 1  
        if int(item['Date']) in range (2000,2011): 
            centuries_MoMa['21th'] += 1    
 
print(centuries_MoMa)

{'18th': 87, '19th': 6902, '20th': 116283, '21th': 10656}


In [45]:
tot_centuries_MoMa = centuries_MoMa['18th'] + centuries_MoMa['19th'] + centuries_MoMa['20th'] + centuries_MoMa['21th'] 
for el in centuries_MoMa:
    percentage = (centuries_MoMa[el]/tot_centuries_MoMa)*100 
    print (el, percentage,'%')

18th 0.06496027716384924 %
19th 5.153515321665372 %
20th 86.82501045337794 %
21th 7.956513947792844 %


### Tate

In [46]:
Tate_artworks.to_csv('Tate_artworks.csv') 
with open('Tate_artworks.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    centuries_Tate = {'16th':0, '17th':0, '18th': 0, '19th': 0, '20th': 0, '21th': 0}
    for item in reader: 
        if int(item['Date']) in range (1500,1600): 
            centuries_Tate['16th'] += 1 
        if int(item['Date']) in range (1600,1700): 
            centuries_Tate['17th'] += 1 
        if int(item['Date']) in range (1700,1800): 
            centuries_Tate['18th'] += 1   
        if int(item['Date']) in range (1800,1900): 
            centuries_Tate['19th'] += 1
        if int(item['Date']) in range (1900,2000): 
            centuries_Tate['20th'] += 1  
        if int(item['Date']) in range (2000,2011): 
            centuries_Tate['21th'] += 1    
 
print(centuries_Tate)

{'16th': 15, '17th': 87, '18th': 4364, '19th': 39509, '20th': 18111, '21th': 1643}


In [47]:
tot_centuries_Tate = centuries_Tate['16th']+ centuries_Tate['17th'] + centuries_Tate['18th'] + centuries_Tate['19th'] + centuries_Tate['20th'] + centuries_Tate['21th'] 
for el in centuries_Tate:
    percentage = (centuries_Tate[el]/tot_centuries_Tate)*100 
    print (el, percentage,'%')

16th 0.02353716518382526 %
17th 0.1365155580661865 %
18th 6.847745924147562 %
19th 61.995323949850146 %
20th 28.418773242950618 %
21th 2.57810415980166 %


## Artists

For examining artist-related issues, we rely on the two CSV files from the Museums containing information about them, which we already transformed into dataframes (Tate_artists and MoMA_artists). <br> 
In doing so, we avoid duplicates (the same artists may have more than one artwork in the same museum).

### How many artists?

In [48]:
print('Total number of artists at MoMA', len(MoMA_artists))

Total number of artists at MoMA 15243


In [49]:
print('Total number of artists at Tate', len(Tate_artists))

Total number of artists at Tate 3532


### Artists' gender: which is the most represented overall?

### Tate

In [50]:
Tate_acquisitions_dropped = Tate_acquisitions.drop_duplicates(subset='Name', keep="first")
Tate_acquisitions_dropped['Gender'].value_counts()

Male      2943
Female     534
0           10
Name: Gender, dtype: int64

In [51]:
genderCount = {'Male': 2954, 'Female': 534}
tot = len(Tate_acquisitions_dropped)
print ("Male artists' percentage is:", round((genderCount['Male']/tot)*100),'%;', "Female artists' percentage is:", round((genderCount['Female']/tot)*100),'%')

Male artists' percentage is: 85 %; Female artists' percentage is: 15 %


### MoMA

As for Tate, let us analyse the frequency of male and female artists in the collection.

In [52]:
MoMA_artists['Gender'].replace(to_replace=['male'], value='Male', inplace= True)
MoMA_artists['Gender'].replace(to_replace=['female'], value='Female', inplace= True)

In [53]:
MoMA_artists['Gender'].value_counts()

Male          9732
0             3165
Female        2343
Non-Binary       2
Non-binary       1
Name: Gender, dtype: int64

In [54]:
genderCount = {'Male': 9732, 'Female': 2343}
tot = len(MoMA_artists)
print ("Male artists' percentage is:", round((genderCount['Male']/tot)*100),'%;', "Female artists' percentage is:", round((genderCount['Female']/tot)*100),'%')

Male artists' percentage is: 64 %; Female artists' percentage is: 15 %


## Focus: most represented women

In [55]:
Tate_Women= Tate_acquisitions[Tate_acquisitions['Gender']== 'Female']
Tate_Women['Name'].value_counts().to_csv('donneTate.csv')

In [58]:
MomA_Women= MoMA_acquisitions[MoMA_acquisitions['Gender']== 'Female']
MomA_Women['Name'].value_counts().to_csv('donneMoMA.csv')

### Nationalities: which are the most represented nationalities overall?
As for genres, we examine the distribution of artists' nationalities in out datasets.

### MoMa

In [59]:
MoMA_artists['Nationality'].value_counts()

American     5181
0            2472
German        965
British       860
French        847
             ... 
Coptic          1
Burkinabé       1
Kuwaiti         1
Cypriot         1
Ivorian         1
Name: Nationality, Length: 120, dtype: int64

### Tate

In [60]:
Tate_artists['placeOfBirth'].value_counts()

United Kingdom                        1522
0                                      492
United States                          341
France                                 160
Germany                                142
                                      ... 
Tunis                                    1
Choson Minjujuui In'min Konghwaguk       1
Samoa                                    1
As-Sudan                                 1
Eesti                                    1
Name: placeOfBirth, Length: 99, dtype: int64

# Acquisition criteria.

## 1.  In which years are artists' works mostly acquired?

### Year by year

In [61]:
MoMA_artworks['DateAcquired'].value_counts()

1964    12828
2008     7204
1968     6894
0        6682
2001     4170
        ...  
1933       93
1932       18
1929        9
1930        7
1931        3
Name: DateAcquired, Length: 95, dtype: int64

In [62]:
Tate_artworks['DateAcquired'].astype(str).astype(int)
Tate_artworks['DateAcquired'].value_counts()

1856    37893
1997     3706
1975     3046
2009     1364
1979     1166
        ...  
1873        1
1843        1
1863        1
1842        1
1855        1
Name: DateAcquired, Length: 179, dtype: int64

### Every ten years

Let us focus on the 20th century and analyse acquisitions from a larger perspective: not year by year, but for every ten years.

### MoMA

In [63]:
MoMA_artworks.to_csv('MoMA_artworks.csv') 
with open('MoMA_artworks.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    years= defaultdict(dict) 
    for item in reader: 
        if int(item['DateAcquired']) in range(1928,1940): 
            if '1930s' not in years: 
                years['1930s'] = 1
            else: 
                years['1930s'] += 1
            
        if int(item['DateAcquired']) in range(1940,1950): 
            if '1940s' not  in years: 
                years['1940s'] = 1
            else: 
                years['1940s'] += 1
     
        if int(item['DateAcquired']) in range(1950,1960): 
            if '1950s' not in years: 
                    years['1950s'] = 1
            else: 
                    years['1950s'] += 1

        if int(item['DateAcquired']) in range(1960,1970): 
            if '1960s' not in years: 
                    years['1960s'] = 1
            else: 
                    years['1960s'] += 1
     
        if int(item['DateAcquired']) in range(1970,1980): 
            if '1970s' not in years: 
                years['1970s'] = 1
            else: 
                years['1970s'] += 1 
        
        if int(item['DateAcquired']) in range(1980,1990): 
            if '1980s' not in years: 
                years['1980s'] = 1
            else: 
                years['1980s'] += 1 
    
        if int(item['DateAcquired']) in range(1990,2000): 
            if '1990s' not in years: 
                years['1990s'] = 1
            else: 
                years['1990s'] += 1

        if int(item['DateAcquired']) in range(2000,2011): 
            if '2000s' not in years: 
                   years['2000s'] = 1
            else: 
                years['2000s'] += 1

         
     
print('MoMA:', years)

MoMA: defaultdict(<class 'dict'>, {'1930s': 1794, '1940s': 7821, '1950s': 6396, '1960s': 30716, '1970s': 13196, '1980s': 10229, '1990s': 11446, '2000s': 26861})


In [72]:
Tate_artworks.to_csv('Tate_artworks.csv') 
with open('Tate_artworks.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    years= defaultdict(dict) 
    for item in reader: 
        if int(item['DateAcquired']) in range(1900,1910): 
            if '1900s' not in years: 
                years['1900s'] = 1
            else: 
                years['1900s'] += 1
    
        if int(item['DateAcquired']) in range(1910,1920): 
            if '1910s' not in years: 
                years['1910s'] = 1
            else: 
                years['1910s'] += 1
    
        if int(item['DateAcquired']) in range(1920,1930): 
            if '1920s' not in years: 
                years['1920s'] = 1
            else: 
                years['1920s'] += 1
         
        if int(item['DateAcquired']) in range(1930,1940): 
            if '1930s' not in years: 
                years['1930s'] = 1
            else: 
                years['1930s'] += 1
        
            
        if int(item['DateAcquired']) in range(1940,1950): 
            if '1940s' not  in years: 
                years['1940s'] = 1
            else: 
                years['1940s'] += 1
     
        if int(item['DateAcquired']) in range(1950,1960): 
            if '1950s' not in years: 
                    years['1950s'] = 1
            else: 
                    years['1950s'] += 1

        if int(item['DateAcquired']) in range(1960,1970): 
            if '1960s' not in years: 
                    years['1960s'] = 1
            else: 
                    years['1960s'] += 1
     
        if int(item['DateAcquired']) in range(1970,1980): 
            if '1970s' not in years: 
                years['1970s'] = 1
            else: 
                years['1970s'] += 1 
        
        if int(item['DateAcquired']) in range(1980,1990): 
            if '1980s' not in years: 
                years['1980s'] = 1
            else: 
                years['1980s'] += 1 
    
        if int(item['DateAcquired']) in range(1990,2000): 
            if '1990s' not in years: 
                years['1990s'] = 1
            else: 
                years['1990s'] += 1

        if int(item['DateAcquired']) in range(2000,2011): 
            if '2000s' not in years: 
                   years['2000s'] = 1
            else: 
                years['2000s'] += 1

         
     
print('Tate:', years)

Tate: defaultdict(<class 'dict'>, {'1900s': 414, '1910s': 617, '1920s': 1055, '1930s': 494, '1940s': 736, '1950s': 552, '1960s': 874, '1970s': 6357, '1980s': 5284, '1990s': 6449, '2000s': 5345})


## Gender Gap 
We have already analysed the total number of male and female artists. Let us now anlysed the gender of acquired artists in time. In this way, we will try to investigate the gender gap in our museum's collections and try to understand how it changes (if it does) throughout the years. 

### MoMa

Number of female and male artists acquired every ten years.

In [65]:
MoMA_acquisitions_dropped = MoMA_acquisitions.drop_duplicates(subset='Name', keep="first")
MoMA_acquisitions_dropped.to_csv('MoMA_acquisitions_dropped.csv')
with open('MoMA_acquisitions_dropped.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    gender={'1930s': {'Male': 0, 'Female': 0}, '1940s': {'Male': 0, 'Female': 0}, '1950s': {'Male': 0, 'Female': 0}, '1960s': {'Male': 0, 'Female': 0}, '1970s': {'Male': 0, 'Female': 0}, '1980s': {'Male': 0, 'Female': 0}, '1990s': {'Male': 0, 'Female': 0}, '2000s': {'Male': 0, 'Female': 0}} 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1940): 
            if (item['Gender'] == 'Female'):  
                gender['1930s']['Female'] += 1 
            else: 
                gender['1930s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1940,1950): 
            if (item['Gender'] == 'Female'):  
                gender['1940s']['Female'] += 1 
            else: 
                gender['1940s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1950,1960): 
            if (item['Gender'] == 'Female'):  
                gender['1950s']['Female'] += 1 
            else: 
                gender['1950s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1960,1970): 
            if (item['Gender'] == 'Female'):  
                gender['1960s']['Female'] += 1 
            else: 
                gender['1960s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1970,1980): 
            if (item['Gender'] == 'Female'):  
                gender['1970s']['Female'] += 1 
            else: 
                gender['1970s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1980,1990): 
            if (item['Gender'] == 'Female'):  
                gender['1980s']['Female'] += 1 
            else: 
                gender['1980s']['Male'] += 1 
        if int(item['DateAcquired']) in range (1990,2000): 
            if (item['Gender'] == 'Female'):  
                gender['1990s']['Female'] += 1 
            else: 
                gender['1990s']['Male'] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if (item['Gender'] == 'Female'):  
                gender['2000s']['Female'] += 1 
            else: 
                gender['2000s']['Male'] += 1 
    print(gender)

{'1930s': {'Male': 184, 'Female': 16}, '1940s': {'Male': 697, 'Female': 98}, '1950s': {'Male': 858, 'Female': 93}, '1960s': {'Male': 1463, 'Female': 145}, '1970s': {'Male': 889, 'Female': 158}, '1980s': {'Male': 1142, 'Female': 211}, '1990s': {'Male': 926, 'Female': 280}, '2000s': {'Male': 1771, 'Female': 503}}


Percentage of female-male artists every ten years

In [66]:
for el in gender: 
    tot = gender[el]['Male'] + gender[el]['Female'] 
    percentage = (gender[el]['Male']/tot)*100 
    print (el, 'Male', round(percentage),'%', 'Female', round(100-percentage),'%')

1930s Male 92 % Female 8 %
1940s Male 88 % Female 12 %
1950s Male 90 % Female 10 %
1960s Male 91 % Female 9 %
1970s Male 85 % Female 15 %
1980s Male 84 % Female 16 %
1990s Male 77 % Female 23 %
2000s Male 78 % Female 22 %


## Tate

Number of female and male artists acquired every ten years.

In [73]:
Tate_acquisitions_dropped.to_csv('Tate_acquisitions_dropped.csv')
with open('Tate_acquisitions_dropped.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    gender={'1900s':{'Male': 0, 'Female': 0}, '1910s': {'Male': 0, 'Female': 0}, '1920s':{'Male': 0, 'Female': 0}, '1930s': {'Male': 0, 'Female': 0}, '1940s': {'Male': 0, 'Female': 0}, '1950s': {'Male': 0, 'Female': 0}, '1960s': {'Male': 0, 'Female': 0}, '1970s': {'Male': 0, 'Female': 0}, '1980s': {'Male': 0, 'Female': 0}, '1990s': {'Male': 0, 'Female': 0}, '2000s': {'Male': 0, 'Female': 0}} 
    for item in reader: 
        if int(item['DateAcquired']) in range(1900,1910): 
            if (item['Gender'] == 'Female'):  
                gender['1900s']['Female'] += 1 
            else: 
                gender['1900s']['Male'] += 1 
    
        if int(item['DateAcquired']) in range(1910,1920): 
            if (item['Gender'] == 'Female'):  
                gender['1910s']['Female'] += 1 
            else: 
                gender['1910s']['Male'] += 1 
        
        if int(item['DateAcquired']) in range (1920,1930): 
            if (item['Gender'] == 'Female'):  
                gender['1920s']['Female'] += 1 
            else: 
                gender['1920s']['Male'] += 1 
                
        if int(item['DateAcquired']) in range (1930,1940): 
            if (item['Gender'] == 'Female'):  
                gender['1930s']['Female'] += 1 
            else: 
                gender['1930s']['Male'] += 1       
                
        if int(item['DateAcquired']) in range (1940,1950): 
            if (item['Gender'] == 'Female'):  
                gender['1940s']['Female'] += 1 
            else: 
                gender['1940s']['Male'] += 1 
                
        if int(item['DateAcquired']) in range (1950,1960): 
            if (item['Gender'] == 'Female'):  
                gender['1950s']['Female'] += 1 
            else: 
                gender['1950s']['Male'] += 1 
                
        if int(item['DateAcquired']) in range (1960,1970): 
            if (item['Gender'] == 'Female'):  
                gender['1960s']['Female'] += 1 
            else: 
                gender['1960s']['Male'] += 1 
                
        if int(item['DateAcquired']) in range (1970,1980): 
            if (item['Gender'] == 'Female'):  
                gender['1970s']['Female'] += 1 
            else: 
                gender['1970s']['Male'] += 1 
                
        if int(item['DateAcquired']) in range (1980,1990): 
            if (item['Gender'] == 'Female'):  
                gender['1980s']['Female'] += 1 
            else: 
                gender['1980s']['Male'] += 1 
                
        if int(item['DateAcquired']) in range (1990,2000): 
            if (item['Gender'] == 'Female'):  
                gender['1990s']['Female'] += 1 
            else: 
                gender['1990s']['Male'] += 1 
                
        if int(item['DateAcquired']) in range (2000,2011): 
            if (item['Gender'] == 'Female'):  
                gender['2000s']['Female'] += 1 
            else: 
                gender['2000s']['Male'] += 1 
                
    print(gender)

{'1900s': {'Male': 85, 'Female': 1}, '1910s': {'Male': 102, 'Female': 7}, '1920s': {'Male': 171, 'Female': 19}, '1930s': {'Male': 127, 'Female': 27}, '1940s': {'Male': 100, 'Female': 13}, '1950s': {'Male': 136, 'Female': 12}, '1960s': {'Male': 214, 'Female': 17}, '1970s': {'Male': 481, 'Female': 67}, '1980s': {'Male': 285, 'Female': 44}, '1990s': {'Male': 318, 'Female': 72}, '2000s': {'Male': 402, 'Female': 145}}


Percentage of female-male artists acquired every ten years

In [74]:
for el in gender: 
    tot = gender[el]['Male'] + gender[el]['Female'] 
    percentage = (gender[el]['Male']/tot)*100 
    print (el, 'Male', round(percentage),'%', 'Female', round(100-percentage),'%')

1900s Male 99 % Female 1 %
1910s Male 94 % Female 6 %
1920s Male 90 % Female 10 %
1930s Male 82 % Female 18 %
1940s Male 88 % Female 12 %
1950s Male 92 % Female 8 %
1960s Male 93 % Female 7 %
1970s Male 88 % Female 12 %
1980s Male 87 % Female 13 %
1990s Male 82 % Female 18 %
2000s Male 73 % Female 27 %


## Nationalities 

Does nationality affect the artists' selection? Does acquisition campaigns show different tendencies and patterns throughout the years, when it comes to the artists' nationality?

### MoMA
For every ten years, we count the nationalities' frequency.

In [75]:
from collections import defaultdict  
 
with open('MoMA_acquisitions_dropped.csv', mode='r', encoding='utf-8') as csvfile: 
    reader = csv.DictReader(csvfile) 
    nationalities = defaultdict(dict) 
    for item in reader: 
        if int(item['DateAcquired']) in range (1928,1941): 
            if item['Nationality'] not in nationalities['1930s']: 
                nationalities['1930s'][item['Nationality']] = 1 
            else: 
                nationalities['1930s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1940,1951): 
            if item['Nationality'] not in nationalities['1940s']: 
                nationalities['1940s'][item['Nationality']] = 1 
            else: 
                nationalities['1940s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1950,1961): 
            if item['Nationality'] not in nationalities['1950s']: 
                nationalities['1950s'][item['Nationality']] = 1 
            else: 
                nationalities['1950s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1960,1971): 
            if item['Nationality'] not in nationalities['1960s']: 
                nationalities['1960s'][item['Nationality']] = 1 
            else: 
                nationalities['1960s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1970,1981): 
            if item['Nationality'] not in nationalities['1970s']: 
                nationalities['1970s'][item['Nationality']] = 1 
            else: 
                nationalities['1970s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1980,1991): 
            if item['Nationality'] not in nationalities['1980s']: 
                nationalities['1980s'][item['Nationality']] = 1 
            else: 
                nationalities['1980s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (1990,2001): 
            if item['Nationality'] not in nationalities['1990s']: 
                nationalities['1990s'][item['Nationality']] = 1 
            else: 
                nationalities['1990s'][item['Nationality']] += 1 
        if int(item['DateAcquired']) in range (2000,2011): 
            if item['Nationality'] not in nationalities['2000s']: 
                nationalities['2000s'][item['Nationality']] = 1 
            else: 
                nationalities['2000s'][item['Nationality']] += 1 
         
                 
 
print(nationalities)

defaultdict(<class 'dict'>, {'1980s': {'American': 715, 'Danish': 14, 'Estonian': 2, 'Swedish': 11, 'Finnish': 8, 'British': 100, 'Romanian': 2, 'French': 100, 'Belgian': 12, 'Dutch': 34, 'Norwegian': 4, 'Czech': 14, 'Austrian': 24, 'Italian': 37, 'Japanese': 72, 'German': 112, 'Russian': 18, 'Swiss': 54, 'Canadian': 33, 'Congolese': 1, 'Brazilian': 2, 'Hungarian': 9, 'Polish': 17, 'Icelandic': 1, 'Australian': 6, 'Croatian': 3, 'Spanish': 13, 'Slovak': 1, 'Cuban': 4, 'Mexican': 10, 'Greek': 1, 'Chinese': 1, 'Native American': 2, 'Chilean': 3, 'Nationality unknown': 5, 'Israeli': 7, 'Argentine': 1, 'Colombian': 2, 'Luxembourger': 1, 'Venezuelan': 1, 'Portuguese': 1, 'Peruvian': 3, 'Indian': 1, 'Haitian': 1, 'Moroccan': 1, '0': 14, 'Latvian': 2, 'Georgian': 1, 'Irish': 1, 'Ukrainian': 1, 'Puerto Rican': 1}, '1960s': {'Spanish': 21, '0': 15, 'American': 645, 'Italian': 95, 'French': 174, 'Japanese': 68, 'British': 71, 'Guatemalan': 2, 'Finnish': 4, 'Argentine': 37, 'Kuwaiti': 1, 'German'

In [76]:
from collections import defaultdict 

with open('Tate_acquisitions_dropped.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    nationalities = defaultdict(dict)
    for item in reader:
        if int(item['DateAcquired']) in range (1900,1910):
            if item['placeOfBirth'] not in nationalities['1900s']:
                nationalities['1900s'][item['placeOfBirth']] = 1
            else:
                nationalities['1900s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1910,1920):
            if item['placeOfBirth'] not in nationalities['1910s']:
                nationalities['1910s'][item['placeOfBirth']] = 1
            else:
                nationalities['1910s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1920,1930):
            if item['placeOfBirth'] not in nationalities['1920s']:
                nationalities['1920s'][item['placeOfBirth']] = 1
            else:
                nationalities['1920s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1930,1940):
            if item['placeOfBirth'] not in nationalities['1930s']:
                nationalities['1930s'][item['placeOfBirth']] = 1
            else:
                nationalities['1930s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1940,1951):
            if item['placeOfBirth'] not in nationalities['1940s']:
                nationalities['1940s'][item['placeOfBirth']] = 1
            else:
                nationalities['1940s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1950,1961):
            if item['placeOfBirth'] not in nationalities['1950s']:
                nationalities['1950s'][item['placeOfBirth']] = 1
            else:
                nationalities['1950s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1960,1971):
            if item['placeOfBirth'] not in nationalities['1960s']:
                nationalities['1960s'][item['placeOfBirth']] = 1
            else:
                nationalities['1960s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1970,1981):
            if item['placeOfBirth'] not in nationalities['1970s']:
                nationalities['1970s'][item['placeOfBirth']] = 1
            else:
                nationalities['1970s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1980,1991):
            if item['placeOfBirth'] not in nationalities['1980s']:
                nationalities['1980s'][item['placeOfBirth']] = 1
            else:
                nationalities['1980s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (1990,2001):
            if item['placeOfBirth'] not in nationalities['1990s']:
                nationalities['1990s'][item['placeOfBirth']] = 1
            else:
                nationalities['1990s'][item['placeOfBirth']] += 1
        if int(item['DateAcquired']) in range (2000,2011): 
            if item['placeOfBirth'] not in nationalities['2000s']: 
                nationalities['2000s'][item['placeOfBirth']] = 1 
            else: 
                nationalities['2000s'][item['placeOfBirth']] += 1 
        
                

print(nationalities)

defaultdict(<class 'dict'>, {'2000s': {'Poland': 9, 'United States': 99, 'Germany': 39, 'Finland': 1, 'China': 10, 'Iraq': 1, 'Russia': 2, 'United Kingdom': 150, 'Belgium': 5, 'México': 7, 'Perú': 3, 'Ukraine': 3, 'Îran': 5, 'Italia': 12, 'Venezuela': 4, 'Turkey': 2, 'France': 14, 'Israel': 5, 'Brasil': 17, 'Jugoslavija': 2, 'Uganda': 1, 'Norge': 1, 'Nederland': 4, 'South Africa': 6, '0': 26, 'Romania': 2, 'Argentina': 8, 'Cuba': 4, 'Canada': 9, 'Ireland': 4, 'Greece': 2, 'Colombia': 5, 'Latvija': 1, 'Sweden': 1, 'Chile': 2, 'Czech Republica': 4, 'Danmark': 5, 'Spain': 5, 'Austria': 4, 'Switzerland': 6, 'Pakistan': 1, 'Mehoz': 2, 'India': 3, 'Japan': 10, 'Bahamas': 1, 'Hungery': 3, 'Bangladesh': 1, 'Hrvatska': 2, 'Slovenia': 2, "Taehan Min'guk": 1, 'Zimbabwe': 1, 'Sri Lanka': 1, 'New Zealand': 4, 'Luxembourg': 1, 'Ísland': 1, 'Pilipinas': 1, 'Lietuva': 2, 'Australia': 2, 'Al-Lubnan': 3, 'Kenya': 1, "Al-Jaza'ir": 1, 'Lao': 1, 'Malta': 1, 'Panamá': 1, 'Misr': 1, 'Portugal': 3, 'Shqipëria

## Centuries

### Moma

In [77]:
with open('MoMA_artworks.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    MoMA_centuries_acquired = defaultdict(dict)
    for item in reader:
        if int(item['DateAcquired']) in range (1900,1910):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1910s']:
                    MoMA_centuries_acquired['1910s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1910s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1910s']:
                    MoMA_centuries_acquired['1910s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1910s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1910s']:
                    MoMA_centuries_acquired['1910s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1910s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1910s']:
                    MoMA_centuries_acquired['1910s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1910s']['19th'] += 1
            if int(item['Date']) in range(1900,1911):
                if '20th' not in MoMA_centuries_acquired['1910s']:
                    MoMA_centuries_acquired['1910s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1910s']['20th'] += 1
            
            
        
        if int(item['DateAcquired']) in range (1910,1920):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1920s']:
                    MoMA_centuries_acquired['1920s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1920s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1920s']:
                    MoMA_centuries_acquired['1920s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1920s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1920s']:
                    MoMA_centuries_acquired['1920s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1920s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1920s']:
                    MoMA_centuries_acquired['1920s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1920s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1920s']:
                    MoMA_centuries_acquired['1920s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1920s']['20th'] += 1
        
        
        if int(item['DateAcquired']) in range (1920,1930):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1930s']:
                    MoMA_centuries_acquired['1930s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1930s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1930s']:
                    MoMA_centuries_acquired['1930s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1930s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1930s']:
                    MoMA_centuries_acquired['1930s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1930s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1930s']:
                    MoMA_centuries_acquired['1930s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1930s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1930s']:
                    MoMA_centuries_acquired['1930s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1930s']['20th'] += 1
   
   
   
        
        if int(item['DateAcquired']) in range (1940,1950):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1940s']:
                    MoMA_centuries_acquired['1940s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1940s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1940s']:
                    MoMA_centuries_acquired['1940s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1940s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1940s']:
                    MoMA_centuries_acquired['1940s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1940s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1940s']:
                    MoMA_centuries_acquired['1940s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1940s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1940s']:
                    MoMA_centuries_acquired['1940s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1940s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1950,1960):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1950s']:
                    MoMA_centuries_acquired['1950s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1950s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1950s']:
                    MoMA_centuries_acquired['1950s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1950s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1950s']:
                    MoMA_centuries_acquired['1950s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1950s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1950s']:
                    MoMA_centuries_acquired['1950s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1950s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1950s']:
                    MoMA_centuries_acquired['1950s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1950s']['20th'] += 1

                    
        if int(item['DateAcquired']) in range (1960,1970):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1960s']:
                    MoMA_centuries_acquired['1960s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1960s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1960s']:
                    MoMA_centuries_acquired['1960s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1960s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1960s']:
                    MoMA_centuries_acquired['1960s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1960s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1960s']:
                    MoMA_centuries_acquired['1960s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1960s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1960s']:
                    MoMA_centuries_acquired['1960s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1960s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1970,1980):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1970s']:
                    MoMA_centuries_acquired['1970s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1970s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1970s']:
                    MoMA_centuries_acquired['1970s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1970s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1970s']:
                    MoMA_centuries_acquired['1970s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1970s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1970s']:
                    MoMA_centuries_acquired['1970s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1970s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1970s']:
                    MoMA_centuries_acquired['1970s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1970s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1980,1990):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1980s']:
                    MoMA_centuries_acquired['1980s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1980s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1980s']:
                    MoMA_centuries_acquired['1980s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1980s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1980s']:
                    MoMA_centuries_acquired['1980s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1980s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1980s']:
                    MoMA_centuries_acquired['1980s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1980s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1980s']:
                    MoMA_centuries_acquired['1980s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1980s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1990,2000):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['1990s']:
                    MoMA_centuries_acquired['1990s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['1990s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['1990s']:
                    MoMA_centuries_acquired['1990s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['1990s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['1990s']:
                    MoMA_centuries_acquired['1990s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['1990s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['1990s']:
                    MoMA_centuries_acquired['1990s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['1990s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['1990s']:
                    MoMA_centuries_acquired['1990s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['1990s']['20th'] += 1

        
        
        if int(item['DateAcquired']) in range (2000,2011):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in MoMA_centuries_acquired['2000s']:
                    MoMA_centuries_acquired['2000s']['16th'] = 1
                else:
                    MoMA_centuries_acquired['2000s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in MoMA_centuries_acquired['2000s']:
                    MoMA_centuries_acquired['2000s']['17th'] = 1
                else:
                    MoMA_centuries_acquired['2000s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in MoMA_centuries_acquired['2000s']:
                    MoMA_centuries_acquired['2000s']['18th'] = 1
                else:
                    MoMA_centuries_acquired['2000s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in MoMA_centuries_acquired['2000s']:
                    MoMA_centuries_acquired['2000s']['19th'] = 1
                else:
                    MoMA_centuries_acquired['2000s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in MoMA_centuries_acquired['2000s']:
                    MoMA_centuries_acquired['2000s']['20th'] = 1
                else:
                    MoMA_centuries_acquired['2000s']['20th'] += 1
            if int(item['Date']) in range(2000,2011):
                if '21th' not in MoMA_centuries_acquired['2000s']:
                    MoMA_centuries_acquired['2000s']['21th'] = 1
                else:
                    MoMA_centuries_acquired['2000s']['21th'] += 1
        
        
print(MoMA_centuries_acquired)

defaultdict(<class 'dict'>, {'1930s': {'20th': 9}, '1940s': {'20th': 6973, '19th': 570}, '1950s': {'20th': 5784, '19th': 524, '18th': 5}, '1960s': {'20th': 27951, '19th': 2408, '18th': 81}, '1970s': {'20th': 12328, '19th': 739}, '1980s': {'20th': 8847, '19th': 1328, '18th': 1}, '1990s': {'20th': 11065, '19th': 236}, '2000s': {'20th': 18452, '19th': 296, '21th': 6817}})


## Tate

In [78]:
with open('Tate_artworks.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    Tate_centuries_acquired = defaultdict(dict)
    for item in reader:
        if int(item['DateAcquired']) in range (1900,1910):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1910s']:
                    Tate_centuries_acquired['1910s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1910s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1910s']:
                    Tate_centuries_acquired['1910s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1910s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1910s']:
                    Tate_centuries_acquired['1910s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1910s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1910s']:
                    Tate_centuries_acquired['1910s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1910s']['19th'] += 1
            if int(item['Date']) in range(1900,1911):
                if '20th' not in Tate_centuries_acquired['1910s']:
                    Tate_centuries_acquired['1910s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1910s']['20th'] += 1
            
            
        
        if int(item['DateAcquired']) in range (1910,1920):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1920s']:
                    Tate_centuries_acquired['1920s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1920s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1920s']:
                    Tate_centuries_acquired['1920s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1920s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1920s']:
                    Tate_centuries_acquired['1920s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1920s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1920s']:
                    Tate_centuries_acquired['1920s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1920s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1920s']:
                    Tate_centuries_acquired['1920s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1920s']['20th'] += 1
        
        
        
        
        if int(item['DateAcquired']) in range (1920,1930):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1930s']:
                    Tate_centuries_acquired['1930s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1930s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1930s']:
                    Tate_centuries_acquired['1930s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1930s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1930s']:
                    Tate_centuries_acquired['1930s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1930s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1930s']:
                    Tate_centuries_acquired['1930s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1930s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1930s']:
                    Tate_centuries_acquired['1930s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1930s']['20th'] += 1
   
   



        
        if int(item['DateAcquired']) in range (1940,1950):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1940s']:
                    Tate_centuries_acquired['1940s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1940s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1940s']:
                    Tate_centuries_acquired['1940s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1940s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1940s']:
                    Tate_centuries_acquired['1940s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1940s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1940s']:
                    Tate_centuries_acquired['1940s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1940s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1940s']:
                    Tate_centuries_acquired['1940s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1940s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1950,1960):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1950s']:
                    Tate_centuries_acquired['1950s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1950s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1950s']:
                    Tate_centuries_acquired['1950s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1950s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1950s']:
                    Tate_centuries_acquired['1950s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1950s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1950s']:
                    Tate_centuries_acquired['1950s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1950s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1950s']:
                    Tate_centuries_acquired['1950s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1950s']['20th'] += 1

                    
        if int(item['DateAcquired']) in range (1960,1970):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1960s']:
                    Tate_centuries_acquired['1960s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1960s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1960s']:
                    Tate_centuries_acquired['1960s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1960s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1960s']:
                    Tate_centuries_acquired['1960s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1960s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1960s']:
                    Tate_centuries_acquired['1960s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1960s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1960s']:
                    Tate_centuries_acquired['1960s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1960s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1970,1980):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1970s']:
                    Tate_centuries_acquired['1970s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1970s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1970s']:
                    Tate_centuries_acquired['1970s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1970s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1970s']:
                    Tate_centuries_acquired['1970s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1970s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1970s']:
                    Tate_centuries_acquired['1970s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1970s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1970s']:
                    Tate_centuries_acquired['1970s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1970s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1980,1990):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1980s']:
                    Tate_centuries_acquired['1980s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1980s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1980s']:
                    Tate_centuries_acquired['1980s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1980s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1980s']:
                    Tate_centuries_acquired['1980s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1980s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1980s']:
                    Tate_centuries_acquired['1980s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1980s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1980s']:
                    Tate_centuries_acquired['1980s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1980s']['20th'] += 1

        
        if int(item['DateAcquired']) in range (1990,2000):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['1990s']:
                    Tate_centuries_acquired['1990s']['16th'] = 1
                else:
                    Tate_centuries_acquired['1990s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['1990s']:
                    Tate_centuries_acquired['1990s']['17th'] = 1
                else:
                    Tate_centuries_acquired['1990s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['1990s']:
                    Tate_centuries_acquired['1990s']['18th'] = 1
                else:
                    Tate_centuries_acquired['1990s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['1990s']:
                    Tate_centuries_acquired['1990s']['19th'] = 1
                else:
                    Tate_centuries_acquired['1990s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['1990s']:
                    Tate_centuries_acquired['1990s']['20th'] = 1
                else:
                    Tate_centuries_acquired['1990s']['20th'] += 1

        
        
        if int(item['DateAcquired']) in range (2000,2011):
            if int(item['Date']) in range(1500,1600):
                if '16th' not in Tate_centuries_acquired['2000s']:
                    Tate_centuries_acquired['2000s']['16th'] = 1
                else:
                    Tate_centuries_acquired['2000s']['16th'] += 1
            if int(item['Date']) in range(1600,1700):
                if '17th' not in Tate_centuries_acquired['2000s']:
                    Tate_centuries_acquired['2000s']['17th'] = 1
                else:
                    Tate_centuries_acquired['2000s']['17th'] += 1
            if int(item['Date']) in range(1700,1800):
                if '18th' not in Tate_centuries_acquired['2000s']:
                    Tate_centuries_acquired['2000s']['18th'] = 1
                else:
                    Tate_centuries_acquired['2000s']['18th'] += 1
            if int(item['Date']) in range(1800,1900):
                if '19th' not in Tate_centuries_acquired['2000s']:
                    Tate_centuries_acquired['2000s']['19th'] = 1
                else:
                    Tate_centuries_acquired['2000s']['19th'] += 1 
            if int(item['Date']) in range(1900,2000):
                if '20th' not in Tate_centuries_acquired['2000s']:
                    Tate_centuries_acquired['2000s']['20th'] = 1
                else:
                    Tate_centuries_acquired['2000s']['20th'] += 1
            if int(item['Date']) in range(2000,2011):
                if '21th' not in Tate_centuries_acquired['2000s']:
                    Tate_centuries_acquired['2000s']['21th'] = 1
                else:
                    Tate_centuries_acquired['2000s']['21th'] += 1
        
        
print(Tate_centuries_acquired)

defaultdict(<class 'dict'>, {'1910s': {'19th': 245, '20th': 57, '18th': 18}, '1920s': {'19th': 267, '18th': 31, '20th': 210, '17th': 4}, '1930s': {'20th': 264, '19th': 560, '18th': 43, '17th': 2, '16th': 1}, '1940s': {'20th': 481, '19th': 155, '18th': 54, '17th': 2}, '1950s': {'20th': 439, '19th': 58, '17th': 12, '18th': 25, '16th': 2}, '1960s': {'20th': 691, '19th': 78, '18th': 64, '17th': 13, '16th': 3}, '1970s': {'20th': 5484, '18th': 96, '19th': 745, '16th': 2, '17th': 3}, '1980s': {'20th': 3028, '18th': 155, '19th': 1804, '16th': 2, '17th': 16}, '1990s': {'20th': 2448, '19th': 727, '17th': 19, '18th': 329, '16th': 2}, '2000s': {'20th': 3844, '21th': 1307, '18th': 38, '19th': 24, '16th': 1, '17th': 5}})
