# Is the Met becoming more Gender-inclusive?

Simona Sivkoff-Livneh

2022-12-21

https://public.paws.wmcloud.org/User:Kiki2976/MetInclusive.ipynb

## Overview

In this project, I explore themes of inclusion and representation in culture, as reflected in the collections and acquisitions of the Metropolitan Museum of Art (the Met). I do this by looking at the gender of the artists whose works are currently held by the museum, using data from Wikidata. By segmenting this data by acquisition date, I investigate whether there has been a shift toward more equitable representation.

## Implementation details

My data source for this analysis is [Wikidata](https://www.wikidata.org/). Wikidata is a knowledge base that contains statements about entities in the world. It has extensive data about works of art, and makes it possible to programmatically obtain information about the creator of a work, including their sex or gender and their country of citizenship.

For example, [Manneporte near Étretat (Q3820962)](https://www.wikidata.org/wiki/Q3820962) has the property ‘creator’ with the value [Claude Monet (Q296)](https://www.wikidata.org/wiki/Q296), which in turn has the properties ‘sex or gender’ (male) and ‘country of citizenship’ (France).
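
As a rough sketch of how such statements could be looked up, the helper below builds a SPARQL query for an item's creator (P170), the creator's sex or gender (P21), and the creator's country of citizenship (P27). The function name `creator_query` is hypothetical and not part of the notebook, and the query is only constructed here, not executed.

```python
def creator_query(item_id):
    """Build a SPARQL query for an item's creator, gender and citizenship.

    `item_id` is a Wikidata item identifier such as 'Q3820962'.
    This helper is illustrative only; it does not contact Wikidata.
    """
    # Doubled braces are literal braces inside an f-string.
    return f'''
SELECT ?creatorLabel ?genderLabel ?countryLabel WHERE {{
  wd:{item_id} wdt:P170 ?creator .            # creator
  OPTIONAL {{ ?creator wdt:P21 ?gender . }}    # sex or gender
  OPTIONAL {{ ?creator wdt:P27 ?country . }}   # country of citizenship
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
'''


monet_query = creator_query('Q3820962')
print(monet_query)
```

The resulting string can be pasted into the Wikidata Query Service to inspect the values for the painting.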

To query Wikidata, I am using the [Wikidata Query Service](https://query.wikidata.org/) (WDQS), which accepts queries written in a query language called SPARQL. To access WDQS and run SPARQL queries from Python, I am using [PAWS](https://wikitech.wikimedia.org/wiki/PAWS). PAWS lets people write and share code that accesses Wikidata via interactive notebooks, and provides a Python environment with pre-loaded libraries for accessing Wikidata APIs and generating visualizations.

In [34]:
import urllib.parse

from IPython.display import IFrame


def query_wikidata(query='', width=800, height=500):
  """Display the results of a Wikidata query as an iframe."""
  quoted_query = urllib.parse.quote(query)
  return IFrame('https://query.wikidata.org/embed.html#' + quoted_query,
                width=width, height=height)
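
To make the encoding step in `query_wikidata` more concrete, here is a minimal sketch that builds the embed URL as a plain string without displaying it. The name `embed_url` and the short sample query are assumptions made for illustration; the base URL is the one used in the helper above.

```python
import urllib.parse

EMBED_BASE = 'https://query.wikidata.org/embed.html#'


def embed_url(query):
    """Return the embed URL for a SPARQL query (hypothetical helper)."""
    # quote() percent-encodes spaces, braces, '?' and other reserved
    # characters so the query can be carried in the URL fragment.
    return EMBED_BASE + urllib.parse.quote(query)


sample_query = 'SELECT ?item WHERE { ?item wdt:P195 wd:Q160236 } LIMIT 5'
url = embed_url(sample_query)
print(url)

# Decoding the fragment recovers the original query text.
print(urllib.parse.unquote(url[len(EMBED_BASE):]))
```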

In [36]:
query = '''
  #defaultView:BarChart
  SELECT ?decade (COUNT(?gender) AS ?acquisitions) ?genderLabel WHERE {
    # Items in the collection (P195) of the Met (Q160236).
    ?item p:P195 ?node .
    ?node ps:P195 wd:Q160236 .
    ?node pq:P580 ?dateAcquired .
      
    # Extract the year from the acquisition date, cast it to a string,
    # and convert it to a decade by replacing the ones digit with a zero.
    # SPARQL's SUBSTR is 1-indexed, so (1, 3) takes the first three digits.
    BIND(YEAR(?dateAcquired) AS ?yearAcquired) .
    BIND(CONCAT(SUBSTR(STR(?yearAcquired), 1, 3), "0") AS ?decade) .

    ?item wdt:P170 ?artist .
    ?artist wdt:P21 ?gender . 
    VALUES ?gender {
      wd:Q6581097  # male
      wd:Q6581072  # female
      wd:Q48270    # non-binary
      wd:Q1097630  # intersex
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
  GROUP BY ?decade ?genderLabel
'''

In [35]:
query_wikidata(width=1000, height=1000, query=query)
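
The decade bucketing done in the query's `BIND` step can be mirrored in Python, which may make the string manipulation easier to follow. This is only an illustrative sketch; `decade_label` is a hypothetical name, and the notebook itself does this step in SPARQL.

```python
def decade_label(year):
    """Replace the last digit of a year with zero, e.g. 1987 -> '1980'."""
    # Drop the ones digit of the year's string form, then append "0".
    return str(year)[:-1] + '0'


for year in (1870, 1929, 2005):
    print(year, '->', decade_label(year))
```

Note that Python slices are 0-indexed, whereas character positions in SPARQL's `SUBSTR` start at 1.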

Next, I would like to calculate the percentage of acquired works that were created by non-male artists. To do this, I run the query again, this time requesting JSON output, which I process with Python code.

In [31]:
import requests

BASE_URL = 'https://query.wikidata.org/sparql'

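# Run the same SPARQL query against the endpoint, asking for JSON results.
r = requests.get(BASE_URL, params={'format': 'json', 'query': query})
r.raise_for_status()  # Fail early on an HTTP error.
data = r.json()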

male_acquisitions = {}
non_male_acquisitions = {}

for result in data['results']['bindings']:
    decade = int(result['decade']['value'])
    gender = result['genderLabel']['value']
    acquisitions = int(result['acquisitions']['value'])
    if gender == 'male':
        male_acquisitions[decade] = acquisitions
    else:
        non_male_acquisitions[decade] = non_male_acquisitions.get(decade, 0) + acquisitions

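# Note: the denominator is male acquisitions only, so each value is the
# number of non-male works per 100 male works, not a share of the total.
# Decades with no male acquisitions are skipped.
percent_non_male = {}
for decade in sorted(male_acquisitions):
    percent_non_male[decade] = 100 * non_male_acquisitions.get(decade, 0) / male_acquisitions[decade]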
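
The loop above divides by male acquisitions only, so each figure is a ratio of non-male to male works. If the share of all counted acquisitions is wanted instead, the denominator would be the sum of both groups. The sketch below shows that variant under stated assumptions: `share_non_male` is a hypothetical helper, and the counts are made up for illustration and are not Met data.

```python
def share_non_male(male, non_male):
    """Percentage of all counted acquisitions per decade by non-male creators.

    Both arguments map a decade (int) to a number of acquisitions.
    """
    shares = {}
    # Use the union of decades so that decades with no male
    # acquisitions are not silently dropped.
    for decade in sorted(set(male) | set(non_male)):
        m = male.get(decade, 0)
        n = non_male.get(decade, 0)
        total = m + n
        if total:  # Skip decades with no acquisitions at all.
            shares[decade] = 100 * n / total
    return shares


# Made-up counts for illustration only; not Met data.
example_male = {1900: 90, 1910: 75}
example_non_male = {1900: 10, 1910: 25, 1920: 4}
print(share_non_male(example_male, example_non_male))
# prints {1900: 10.0, 1910: 25.0, 1920: 100.0}
```

When non-male acquisitions are a small fraction of the total, the two definitions give similar numbers, but they diverge as that fraction grows.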

In [32]:
from IPython.display import HTML, display
    
html = '''
 <table>
  <tr> <th>Decade</th> <th>% Acquisitions from non-male creators</th> </tr>
'''
for decade in sorted(percent_non_male):
    html += ('<tr> <td> %ss </td> <td> %.2f%% </td> </tr>' % (decade, percent_non_male[decade]))

html += '</table>'

display(HTML(html))


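| Decade | % Acquisitions from non-male creators |
|--------|---------------------------------------|
| 1870s  | 0.94% |
| 1880s  | 7.46% |
| 1890s  | 0.00% |
| 1900s  | 3.01% |
| 1910s  | 3.42% |
| 1920s  | 3.98% |
| 1930s  | 4.65% |
| 1940s  | 6.79% |
| 1950s  | 6.52% |
| 1960s  | 6.39% |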


From this data, we can see that the percentage of newly acquired works that were created by non-male artists rose through most of the twentieth century, peaking in the 1980s and 1990s, but dropped in the first two decades of the current century.