# Looted Benin Artwork Distribution

Scrape <a href="https://digitalbenin.org/">the Benin site</a> to create a dataframe that contains the following scraped information about each institution:

- Museum name
- Country
- Number of disputed items

Export as `disputed-benin-artwork.csv`.

In [1]:
import requests 
import pandas as pd
from bs4 import BeautifulSoup

In [3]:
## store the url and request the page

url = "https://digitalbenin.org/institutions"
response = requests.get(url)

In [5]:
## is the link alive and well? 
response.status_code

200

In [6]:
## what type is the response object?

type(response)

requests.models.Response

In [7]:
## let's get a string object so that we can work on it with beautifulsoup later
response.text

'<!DOCTYPE html><html class="h-100"><head><title>Digital Benin</title><link rel="icon" type="image/x-icon" href="/data/digital_benin/media_global/favicon.png"><meta name="viewport" content="width=device-width, initial-scale=1"><link rel="stylesheet" href="/style_old.css"><script src="/libraries/jquery/jquery-1.12.2.min.js"></script><script src="/libraries/bootstrap/bootstrap.bundle.min.js">  </script><script src="/deploy.js"></script><script src="/modal.js"></script><script src="/global.js"></script><link rel="stylesheet" href="/libraries/bootstrap/bootstrap.min.css"><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.8.3/font/bootstrap-icons.css"><link rel="stylesheet" href="/style_global.css"><link rel="stylesheet" href="/style.css"></head><body class="d-flex flex-column h-100"><nav class="navbar navbar-expand-lg navbar-dark bg-black fixed-top shadow-sm" style="z-index:999"><div class="container-fluid"><a class="navbar-brand me-0 overflow-visible ps-2" href="/

In [8]:
type(response.text)

str

In [9]:
## bringing in beautifulsoup to parse our string into a searchable html tree

soup = BeautifulSoup(response.text,"html.parser")
soup

<!DOCTYPE html>
<html class="h-100"><head><title>Digital Benin</title><link href="/data/digital_benin/media_global/favicon.png" rel="icon" type="image/x-icon"/><meta content="width=device-width, initial-scale=1" name="viewport"/><link href="/style_old.css" rel="stylesheet"/><script src="/libraries/jquery/jquery-1.12.2.min.js"></script><script src="/libraries/bootstrap/bootstrap.bundle.min.js"> </script><script src="/deploy.js"></script><script src="/modal.js"></script><script src="/global.js"></script><link href="/libraries/bootstrap/bootstrap.min.css" rel="stylesheet"/><link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.8.3/font/bootstrap-icons.css" rel="stylesheet"/><link href="/style_global.css" rel="stylesheet"/><link href="/style.css" rel="stylesheet"/></head><body class="d-flex flex-column h-100"><nav class="navbar navbar-expand-lg navbar-dark bg-black fixed-top shadow-sm" style="z-index:999"><div class="container-fluid"><a class="navbar-brand me-0 overflow-visible ps-2" h

In [10]:
print(soup.prettify())

<!DOCTYPE html>
<html class="h-100">
 <head>
  <title>
   Digital Benin
  </title>
  <link href="/data/digital_benin/media_global/favicon.png" rel="icon" type="image/x-icon"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <link href="/style_old.css" rel="stylesheet"/>
  <script src="/libraries/jquery/jquery-1.12.2.min.js">
  </script>
  <script src="/libraries/bootstrap/bootstrap.bundle.min.js">
  </script>
  <script src="/deploy.js">
  </script>
  <script src="/modal.js">
  </script>
  <script src="/global.js">
  </script>
  <link href="/libraries/bootstrap/bootstrap.min.css" rel="stylesheet"/>
  <link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.8.3/font/bootstrap-icons.css" rel="stylesheet"/>
  <link href="/style_global.css" rel="stylesheet"/>
  <link href="/style.css" rel="stylesheet"/>
 </head>
 <body class="d-flex flex-column h-100">
  <nav class="navbar navbar-expand-lg navbar-dark bg-black fixed-top shadow-sm" style="z-index:999">
   <div cla

In [11]:
## is this a BeautifulSoup object?

type(soup)

bs4.BeautifulSoup

In [13]:
## finding every div whose sort attribute is "museum"

soup.find_all("div", sort ="museum")

[<div class="col-12 col-md-5" sort="museum" val="British Museum"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/5">British Museum</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Ethnologisches Museum, Staatliche Museen zu Berlin"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/13">Ethnologisches Museum, Staatliche Museen zu Berlin</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Field Museum"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/15">Field Museum</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Museum of Archaeology and Anthropology, University of Cambridge"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/28">Museum of Archaeology and Anthropology, University of Cambridge</a></div><

In [20]:
len(soup.find_all("div", sort ="museum"))

131

In [15]:
museums = soup.find_all("div", sort ="museum")
museums

[<div class="col-12 col-md-5" sort="museum" val="British Museum"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/5">British Museum</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Ethnologisches Museum, Staatliche Museen zu Berlin"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/13">Ethnologisches Museum, Staatliche Museen zu Berlin</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Field Museum"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/15">Field Museum</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Museum of Archaeology and Anthropology, University of Cambridge"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/28">Museum of Archaeology and Anthropology, University of Cambridge</a></div><

In [17]:
## trying to get museum names

for name in museums:
    print(name.get_text())
    print("*******")

British Museum
*******
Ethnologisches Museum, Staatliche Museen zu Berlin
*******
Field Museum
*******
Museum of Archaeology and Anthropology, University of Cambridge
*******
National Museum, Benin
*******
Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden
*******
Weltmuseum Wien
*******
University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum)
*******
MARKK Museum am Rothenbaum Kulturen und Künste der Welt
*******
Metropolitan Museum of Art
*******
Pitt Rivers Museum
*******
Nationaal Museum van Wereldculturen and Wereldmuseum
*******
Rautenstrauch-Joest-Museum
*******
National Museum, Lagos
*******
National Museums Scotland
*******
Horniman Museum and Gardens
*******
National Museums Liverpool, World Museum
*******
Linden-Museum Stuttgart, Staatliches Museum für Völkerkunde
*******
Fowler Museum at UCLA
*******
Weltkulturen Museum Frankfurt am Main
*******
Världskultur Museerna, National Museums of World Culture
*******
American

In [19]:
museum_names = [(name.get_text()) for name in museums]
museum_names

['British Museum',
 'Ethnologisches Museum, Staatliche Museen zu Berlin',
 'Field Museum',
 'Museum of Archaeology and Anthropology, University of Cambridge',
 'National Museum, Benin',
 'Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden',
 'Weltmuseum Wien',
 'University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum)',
 'MARKK Museum am Rothenbaum Kulturen und Künste der Welt',
 'Metropolitan Museum of Art',
 'Pitt Rivers Museum',
 'Nationaal Museum van Wereldculturen and Wereldmuseum',
 'Rautenstrauch-Joest-Museum',
 'National Museum, Lagos',
 'National Museums Scotland',
 'Horniman Museum and Gardens',
 'National Museums Liverpool, World Museum',
 'Linden-Museum Stuttgart, Staatliches Museum für Völkerkunde',
 'Fowler Museum at UCLA',
 'Weltkulturen Museum Frankfurt am Main',
 'Världskultur Museerna, National Museums of World Culture',
 'American Museum of Natural History',
 'National Museum of Ireland',
 'Peabody Museum of Ar

In [21]:
## did i get the length right?

len(museum_names)

131

In [24]:
## now for country name
soup.find_all("div", sort ="filter_main")

[<div class="col col-md-auto inst-col-filter" sort="filter_main" val="United Kingdom"><div class="py-md-1"><div class="d-flex flex-wrap"><span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedkingdom" role="button" style="max-width:100% margin: 2px 0">United Kingdom</span></div></div></div>,
 <div class="col col-md-auto inst-col-filter" sort="filter_main" val="Germany"><div class="py-md-1"><div class="d-flex flex-wrap"><span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="germany" role="button" style="max-width:100% margin: 2px 0">Germany</span></div></div></div>,
 <div class="col col-md-auto inst-col-filter" sort="filter_main" val="United States"><div class="py-md-1"><div class="d-flex flex-wrap"><span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedstates" role="button" style="max-width:100% margin: 2px 0">United States</span></div></div></div>,
 <div class="col col-md-auto inst-col-filter" sort="fi

In [28]:
countries = soup.find_all("div", sort ="filter_main")

In [29]:
## make it into a list

countries = [(country.get_text()) for country in countries]
countries

['United Kingdom',
 'Germany',
 'United States',
 'United Kingdom',
 'Nigeria',
 'Germany',
 'Austria',
 'United States',
 'Germany',
 'United States',
 'United Kingdom',
 'Netherlands',
 'Germany',
 'Nigeria',
 'United Kingdom',
 'United Kingdom',
 'United Kingdom',
 'Germany',
 'United States',
 'Germany',
 'Sweden',
 'United States',
 'Ireland',
 'United States',
 'United States',
 'France',
 'United Kingdom',
 'Germany',
 'Russia',
 'Germany',
 'United States',
 'United Kingdom',
 'Norway',
 'Switzerland',
 'United Kingdom',
 'Germany',
 'Switzerland',
 'United States',
 'New Zealand',
 'United States',
 'Switzerland',
 'United Kingdom',
 'United Kingdom',
 'United States',
 'United Kingdom',
 'Germany',
 'Israel',
 'Australia',
 'Germany',
 'United Kingdom',
 'United States',
 'United States',
 'United Kingdom',
 'Switzerland',
 'United States',
 'United States',
 'Switzerland',
 'United States',
 'United States',
 'United States',
 'United States',
 'United States',
 'Denmark',
 

In [30]:
len(countries)

131
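Both lists have 131 entries, which matters because `zip()` stops at the shortest input without warning. A minimal sketch of a length check before pairing the columns follows; the helper name `zip_aligned` and the three-row sample are assumptions for illustration, not part of the original notebook.

```python
def zip_aligned(*columns):
    """Zip parallel lists, raising ValueError if their lengths differ."""
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError(f"column lengths differ: {sorted(lengths)}")
    return list(zip(*columns))


# Hypothetical three-row sample taken from the lists shown above.
names = [
    "British Museum",
    "Ethnologisches Museum, Staatliche Museen zu Berlin",
    "Field Museum",
]
places = ["United Kingdom", "Germany", "United States"]

rows = zip_aligned(names, places)
```

If one scrape ever returns fewer elements than the others, this fails loudly instead of silently dropping institutions from the end of the table.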

In [108]:
objects = soup.find_all("div", sort="count")
objects

[<div class="col-auto col-md-auto inst-col-count" count_default="944" inst="British Museum" sort="count" val="944"><div class="py-md-2 small fw-semibold text-end text-md-center"><div class="d-inline object_count" count_default="944">944</div><div class="d-inline d-md-none"> Objects</div></div></div>,
 <div class="col-auto col-md-auto inst-col-count" count_default="518" inst="Ethnologisches Museum, Staatliche Museen zu Berlin" sort="count" val="518"><div class="py-md-2 small fw-semibold text-end text-md-center"><div class="d-inline object_count" count_default="518">518</div><div class="d-inline d-md-none"> Objects</div></div></div>,
 <div class="col-auto col-md-auto inst-col-count" count_default="393" inst="Field Museum" sort="count" val="393"><div class="py-md-2 small fw-semibold text-end text-md-center"><div class="d-inline object_count" count_default="393">393</div><div class="d-inline d-md-none"> Objects</div></div></div>,
 <div class="col-auto col-md-auto inst-col-count" count_

In [112]:
## find_all() already returns a list-like ResultSet; wrapping it in another list on the first try
## is what raised "ResultSet object has no attribute 'get_text'"
## get_text() on these divs would give "944 Objects" (two inner divs), so read the val attribute instead
## the values come back as strings
objects = [number["val"] for number in objects]
objects
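Calling `get_text()` on a `ResultSet` raises `AttributeError`, because only the individual `Tag` elements inside it have that method. A minimal sketch on one count div copied from the output above shows the two ways of reading a single element; the variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# One count div copied from the scraped output above.
snippet = (
    '<div class="col-auto col-md-auto inst-col-count" count_default="944" '
    'inst="British Museum" sort="count" val="944">'
    '<div class="py-md-2 small fw-semibold text-end text-md-center">'
    '<div class="d-inline object_count" count_default="944">944</div>'
    '<div class="d-inline d-md-none"> Objects</div></div></div>'
)

sample_soup = BeautifulSoup(snippet, "html.parser")

# find_all() returns a list-like ResultSet of Tag objects; loop over it directly.
count_divs = sample_soup.find_all("div", sort="count")

# get_text() joins the text of both inner divs.
text_values = [div.get_text() for div in count_divs]

# The val attribute holds only the number, as a string, so convert it.
attr_values = [int(div["val"]) for div in count_divs]
```

Reading the attribute avoids having to strip the trailing " Objects" label from the text.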

In [114]:
## pair each museum name with its country
museum_list = list(zip(museum_names, countries))

museum_list

[('British Museum', 'United Kingdom'),
 ('Ethnologisches Museum, Staatliche Museen zu Berlin', 'Germany'),
 ('Field Museum', 'United States'),
 ('Museum of Archaeology and Anthropology, University of Cambridge',
  'United Kingdom'),
 ('National Museum, Benin', 'Nigeria'),
 ('Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden',
  'Germany'),
 ('Weltmuseum Wien', 'Austria'),
 ('University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum)',
  'United States'),
 ('MARKK Museum am Rothenbaum Kulturen und Künste der Welt', 'Germany'),
 ('Metropolitan Museum of Art', 'United States'),
 ('Pitt Rivers Museum', 'United Kingdom'),
 ('Nationaal Museum van Wereldculturen and Wereldmuseum', 'Netherlands'),
 ('Rautenstrauch-Joest-Museum', 'Germany'),
 ('National Museum, Lagos', 'Nigeria'),
 ('National Museums Scotland', 'United Kingdom'),
 ('Horniman Museum and Gardens', 'United Kingdom'),
 ('National Museums Liverpool, World Museum', 'United Kingd

In [115]:
df = pd.DataFrame(museum_list)
df.columns = ["Museum_name", "Country"]
df

Unnamed: 0,Museum_name,Country
0,British Museum,United Kingdom
1,"Ethnologisches Museum, Staatliche Museen zu Be...",Germany
2,Field Museum,United States
3,"Museum of Archaeology and Anthropology, Univer...",United Kingdom
4,"National Museum, Benin",Nigeria
...,...,...
126,"Allen Memorial Art Museum, Oberlin College",United States
127,Newark Museum of Art,United States
128,LACMA The Los Angeles County Museum of Art,United States
129,Hood Museum of Art,United States


In [116]:
## index must be the boolean False; the string "False" is truthy, so the index column was still written
df.to_csv("disputed-benin-artwork.csv", encoding="UTF-8", index=False)
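The exported table still lacks the number of disputed items that the task asks for. Here is a sketch of how all three columns could be assembled from the `val` attributes and written without the index column. It runs on a trimmed-down, hard-coded HTML sample that keeps only the `sort` and `val` attributes visible in the outputs above. The helper `column`, the column name `Disputed_items`, and the in-memory buffer are assumptions for illustration.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Simplified stand-in for the institutions page (assumption: real markup has more nesting).
sample_html = """
<div sort="museum" val="British Museum"><a href="/institutions/5">British Museum</a></div>
<div sort="filter_main" val="United Kingdom"><span>United Kingdom</span></div>
<div sort="count" val="944"><div>944</div><div> Objects</div></div>
<div sort="museum" val="Field Museum"><a href="/institutions/15">Field Museum</a></div>
<div sort="filter_main" val="United States"><span>United States</span></div>
<div sort="count" val="393"><div>393</div><div> Objects</div></div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")


def column(soup, sort_key):
    """Return the val attribute of every div whose sort attribute equals sort_key."""
    return [div["val"] for div in soup.find_all("div", sort=sort_key)]


sample_df = pd.DataFrame(
    {
        "Museum_name": column(sample_soup, "museum"),
        "Country": column(sample_soup, "filter_main"),
        "Disputed_items": [int(value) for value in column(sample_soup, "count")],
    }
)

# Write to an in-memory buffer here; pass a filename instead to create the real CSV.
buffer = io.StringIO()
sample_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Building the frame from a dict of equal-length lists makes pandas raise an error if one column is shorter than the others, which guards against a misaligned scrape.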