# Looted Benin Artwork Distribution

Scrape the <a href="https://digitalbenin.org/">Digital Benin site</a> to create a dataframe that contains the following information about each institution:

- Museum name
- Country
- Number of disputed items

Export the result as `disputed-benin-artwork.csv`.

In [1]:
## import libraries
import pandas as pd
import requests # fetches the full HTML from the server
from bs4 import BeautifulSoup

In [2]:
## Request the web content
url = "https://digitalbenin.org/institutions"
## fetch the institutions page
response = requests.get(url)

In [3]:
## did it work?
response.status_code

200
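
Reading `response.status_code` by eye works in a notebook, but the same check can be made automatic so that a failed request stops the run. The following is a minimal sketch, not part of the original notebook: the helper name `fetch_html`, its `get` parameter, and the 10-second timeout are all assumptions chosen for illustration.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Return the page HTML, raising if the server answers with an error status.

    `fetch_html`, the `get` parameter and the 10-second timeout are
    illustrative assumptions, not part of the original notebook.
    """
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text
```

With this helper, `fetch_html("https://digitalbenin.org/institutions")` would return the same text that `response.text` holds above, or raise instead of silently continuing with an error page.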

In [4]:
## Create the soup
soup = BeautifulSoup(response.text, 'html.parser') # parses the response text back into an HTML tree that can be scraped
soup

<!DOCTYPE html>
<html class="h-100"><head><title>Digital Benin</title><link href="/data/digital_benin/media_global/favicon.png" rel="icon" type="image/x-icon"/><meta content="width=device-width, initial-scale=1" name="viewport"/><link href="/style_old.css" rel="stylesheet"/><script src="/libraries/jquery/jquery-1.12.2.min.js"></script><script src="/libraries/bootstrap/bootstrap.bundle.min.js"> </script><script src="/deploy.js"></script><script src="/modal.js"></script><script src="/global.js"></script><link href="/libraries/bootstrap/bootstrap.min.css" rel="stylesheet"/><link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.8.3/font/bootstrap-icons.css" rel="stylesheet"/><link href="/style_global.css" rel="stylesheet"/><link href="/style.css" rel="stylesheet"/></head><body class="d-flex flex-column h-100"><nav class="navbar navbar-expand-lg navbar-dark bg-black fixed-top shadow-sm" style="z-index:999"><div class="container-fluid"><a class="navbar-brand me-0 overflow-visible ps-2" h

In [5]:
## prettify our printout
print(soup.prettify())

<!DOCTYPE html>
<html class="h-100">
 <head>
  <title>
   Digital Benin
  </title>
  <link href="/data/digital_benin/media_global/favicon.png" rel="icon" type="image/x-icon"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <link href="/style_old.css" rel="stylesheet"/>
  <script src="/libraries/jquery/jquery-1.12.2.min.js">
  </script>
  <script src="/libraries/bootstrap/bootstrap.bundle.min.js">
  </script>
  <script src="/deploy.js">
  </script>
  <script src="/modal.js">
  </script>
  <script src="/global.js">
  </script>
  <link href="/libraries/bootstrap/bootstrap.min.css" rel="stylesheet"/>
  <link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.8.3/font/bootstrap-icons.css" rel="stylesheet"/>
  <link href="/style_global.css" rel="stylesheet"/>
  <link href="/style.css" rel="stylesheet"/>
 </head>
 <body class="d-flex flex-column h-100">
  <nav class="navbar navbar-expand-lg navbar-dark bg-black fixed-top shadow-sm" style="z-index:999">
   <div cla

In [6]:
## Find every institution link (name and URL)
institutions = soup.find_all("a", class_ = "fs-5") 
institutions

[<a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/5">British Museum</a>,
 <a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/13">Ethnologisches Museum, Staatliche Museen zu Berlin</a>,
 <a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/15">Field Museum</a>,
 <a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/28">Museum of Archaeology and Anthropology, University of Cambridge</a>,
 <a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/36">National Museum, Benin</a>,
 <a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/47">Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden</a>,
 <a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/50">Weltmuseum Wien</a>,
 <a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/49">University of Pennsylvania 

In [7]:
len(institutions)

131

In [8]:
## Extract the institution names
institution_name = [institution.get_text() for institution in institutions]
institution_name

['British Museum',
 'Ethnologisches Museum, Staatliche Museen zu Berlin',
 'Field Museum',
 'Museum of Archaeology and Anthropology, University of Cambridge',
 'National Museum, Benin',
 'Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden',
 'Weltmuseum Wien',
 'University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum)',
 'MARKK Museum am Rothenbaum Kulturen und Künste der Welt',
 'Metropolitan Museum of Art',
 'Pitt Rivers Museum',
 'Nationaal Museum van Wereldculturen and Wereldmuseum',
 'Rautenstrauch-Joest-Museum',
 'National Museum, Lagos',
 'National Museums Scotland',
 'Horniman Museum and Gardens',
 'National Museums Liverpool, World Museum',
 'Linden-Museum Stuttgart, Staatliches Museum für Völkerkunde',
 'Fowler Museum at UCLA',
 'Weltkulturen Museum Frankfurt am Main',
 'Världskultur Museerna, National Museums of World Culture',
 'American Museum of Natural History',
 'National Museum of Ireland',
 'Peabody Museum of Ar

In [9]:
## Build the absolute URL of each institution page
institutions_url = ["https://digitalbenin.org" + institution.get('href') for institution in institutions]
institutions_url

['https://digitalbenin.org/institutions/5',
 'https://digitalbenin.org/institutions/13',
 'https://digitalbenin.org/institutions/15',
 'https://digitalbenin.org/institutions/28',
 'https://digitalbenin.org/institutions/36',
 'https://digitalbenin.org/institutions/47',
 'https://digitalbenin.org/institutions/50',
 'https://digitalbenin.org/institutions/49',
 'https://digitalbenin.org/institutions/24',
 'https://digitalbenin.org/institutions/22',
 'https://digitalbenin.org/institutions/42',
 'https://digitalbenin.org/institutions/34',
 'https://digitalbenin.org/institutions/43',
 'https://digitalbenin.org/institutions/37',
 'https://digitalbenin.org/institutions/35',
 'https://digitalbenin.org/institutions/17',
 'https://digitalbenin.org/institutions/51',
 'https://digitalbenin.org/institutions/20',
 'https://digitalbenin.org/institutions/16',
 'https://digitalbenin.org/institutions/146',
 'https://digitalbenin.org/institutions/29',
 'https://digitalbenin.org/institutions/121',
 'https:/
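
String concatenation works here because every `href` on the page is a root-relative path. As a hedged alternative, the standard library's `urljoin` produces the same result and also leaves an already absolute link untouched. The sketch below uses three sample paths copied from the output above; the names `BASE_URL`, `hrefs`, and `absolute_urls` are illustrative.

```python
from urllib.parse import urljoin

BASE_URL = "https://digitalbenin.org"

# Sample relative links copied from the scraped output above.
hrefs = ["/institutions/5", "/institutions/13", "/institutions/15"]

# urljoin resolves each path against the base URL.
absolute_urls = [urljoin(BASE_URL, href) for href in hrefs]
print(absolute_urls)
```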

In [10]:
## Find every country badge
countries = soup.find_all("span", class_ = "badge")
countries

[<span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedkingdom" role="button" style="max-width:100% margin: 2px 0">United Kingdom</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="germany" role="button" style="max-width:100% margin: 2px 0">Germany</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedstates" role="button" style="max-width:100% margin: 2px 0">United States</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedkingdom" role="button" style="max-width:100% margin: 2px 0">United Kingdom</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="nigeria" role="button" style="max-width:100% margin: 2px 0">Nigeria</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="germany" role="button" style="max-width:100% margin: 2px 0">Germany</span>,
 <span class="text-truncate

In [11]:
## Extract the country names
countries_lc = [country.get_text() for country in countries]
countries_lc

['United Kingdom',
 'Germany',
 'United States',
 'United Kingdom',
 'Nigeria',
 'Germany',
 'Austria',
 'United States',
 'Germany',
 'United States',
 'United Kingdom',
 'Netherlands',
 'Germany',
 'Nigeria',
 'United Kingdom',
 'United Kingdom',
 'United Kingdom',
 'Germany',
 'United States',
 'Germany',
 'Sweden',
 'United States',
 'Ireland',
 'United States',
 'United States',
 'France',
 'Germany',
 'United Kingdom',
 'Russia',
 'Germany',
 'Norway',
 'United Kingdom',
 'United States',
 'Switzerland',
 'Germany',
 'United Kingdom',
 'Switzerland',
 'United States',
 'United States',
 'New Zealand',
 'United Kingdom',
 'Switzerland',
 'United Kingdom',
 'United States',
 'Germany',
 'United Kingdom',
 'Germany',
 'Australia',
 'Israel',
 'Switzerland',
 'United Kingdom',
 'United States',
 'United States',
 'United Kingdom',
 'United States',
 'United States',
 'Switzerland',
 'United States',
 'United States',
 'United States',
 'United States',
 'Germany',
 'Germany',
 'Canad

In [12]:
## Find every disputed-item count element
items = soup.find_all("div", class_ = "object_count")
items

[<div class="d-inline object_count" count_default="944">944</div>,
 <div class="d-inline object_count" count_default="518">518</div>,
 <div class="d-inline object_count" count_default="393">393</div>,
 <div class="d-inline object_count" count_default="350">350</div>,
 <div class="d-inline object_count" count_default="285">285</div>,
 <div class="d-inline object_count" count_default="283">283</div>,
 <div class="d-inline object_count" count_default="202">202</div>,
 <div class="d-inline object_count" count_default="188">188</div>,
 <div class="d-inline object_count" count_default="179">179</div>,
 <div class="d-inline object_count" count_default="154">154</div>,
 <div class="d-inline object_count" count_default="148">148</div>,
 <div class="d-inline object_count" count_default="122">122</div>,
 <div class="d-inline object_count" count_default="92">92</div>,
 <div class="d-inline object_count" count_default="81">81</div>,
 <div class="d-inline object_count" count_default="74">74</div>,
 

In [13]:
## Extract the disputed-item counts (as strings)
items_lc = [item.get_text() for item in items]
items_lc

['944',
 '518',
 '393',
 '350',
 '285',
 '283',
 '202',
 '188',
 '179',
 '154',
 '148',
 '122',
 '92',
 '81',
 '74',
 '72',
 '71',
 '69',
 '64',
 '55',
 '53',
 '48',
 '46',
 '43',
 '37',
 '35',
 '32',
 '32',
 '28',
 '24',
 '23',
 '23',
 '23',
 '20',
 '18',
 '18',
 '17',
 '16',
 '15',
 '15',
 '14',
 '14',
 '13',
 '12',
 '10',
 '10',
 '9',
 '9',
 '9',
 '8',
 '8',
 '8',
 '8',
 '8',
 '7',
 '7',
 '7',
 '7',
 '7',
 '6',
 '6',
 '5',
 '5',
 '5',
 '5',
 '5',
 '4',
 '4',
 '4',
 '4',
 '4',
 '4',
 '4',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1']

In [14]:
## Zip the four lists into one list of row tuples
disputed_items = list(zip(institution_name, countries_lc, items_lc, institutions_url))
disputed_items

[('British Museum',
  'United Kingdom',
  '944',
  'https://digitalbenin.org/institutions/5'),
 ('Ethnologisches Museum, Staatliche Museen zu Berlin',
  'Germany',
  '518',
  'https://digitalbenin.org/institutions/13'),
 ('Field Museum',
  'United States',
  '393',
  'https://digitalbenin.org/institutions/15'),
 ('Museum of Archaeology and Anthropology, University of Cambridge',
  'United Kingdom',
  '350',
  'https://digitalbenin.org/institutions/28'),
 ('National Museum, Benin',
  'Nigeria',
  '285',
  'https://digitalbenin.org/institutions/36'),
 ('Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden',
  'Germany',
  '283',
  'https://digitalbenin.org/institutions/47'),
 ('Weltmuseum Wien',
  'Austria',
  '202',
  'https://digitalbenin.org/institutions/50'),
 ('University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum)',
  'United States',
  '188',
  'https://digitalbenin.org/institutions/49'),
 ('MARKK Museum am Rothenbaum Kultur

In [15]:
len(disputed_items)

131
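
Checking `len(disputed_items)` matters because `zip` stops at the shortest input without complaint: if one selector had matched fewer elements, rows would be dropped silently. A possible guard is sketched below. The helper `zip_equal` is hypothetical (it is not a built-in or part of the original notebook), and the sample rows are copied from the output above.

```python
def zip_equal(**columns):
    """Zip named lists into row tuples, refusing lists of different lengths."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return list(zip(*columns.values()))


# Two sample rows taken from the scraped lists.
rows = zip_equal(
    institution=["British Museum", "Field Museum"],
    country=["United Kingdom", "United States"],
    items=["944", "393"],
)
print(rows)
```

On Python 3.10 or later, `zip(..., strict=True)` offers a similar check without a helper.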

In [16]:
df = pd.DataFrame(disputed_items)
df.columns = ["Institution", "Country", "Disputed items", "More Info"]
df

Unnamed: 0,Institution,Country,Disputed items,More Info
0,British Museum,United Kingdom,944,https://digitalbenin.org/institutions/5
1,"Ethnologisches Museum, Staatliche Museen zu Be...",Germany,518,https://digitalbenin.org/institutions/13
2,Field Museum,United States,393,https://digitalbenin.org/institutions/15
3,"Museum of Archaeology and Anthropology, Univer...",United Kingdom,350,https://digitalbenin.org/institutions/28
4,"National Museum, Benin",Nigeria,285,https://digitalbenin.org/institutions/36
...,...,...,...,...
126,Speed Art Museum,United States,1,https://digitalbenin.org/institutions/345
127,"Niedersächsische Landesmuseen Oldenburg, Lande...",Germany,1,https://digitalbenin.org/institutions/346
128,Hull Museums,United Kingdom,1,https://digitalbenin.org/institutions/69
129,Great North Museum: Hancock,United Kingdom,1,https://digitalbenin.org/institutions/84
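
The counts in the `Disputed items` column were scraped as text, so they would sort alphabetically and could not be summed. A minimal sketch of converting them and totalling by country follows. It uses a three-row sample copied from the table above rather than the full dataframe, and the names `sample` and `totals` are illustrative.

```python
import pandas as pd

# A few rows copied from the table above; the full frame has 131 rows.
sample = pd.DataFrame(
    {
        "Institution": ["British Museum", "Field Museum", "Hull Museums"],
        "Country": ["United Kingdom", "United States", "United Kingdom"],
        "Disputed items": ["944", "393", "1"],
    }
)

# The scraped counts are strings; convert them so they sort and sum as numbers.
sample["Disputed items"] = pd.to_numeric(sample["Disputed items"])

# Total items per country, largest first.
totals = (
    sample.groupby("Country")["Disputed items"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```

Applying the same `pd.to_numeric` call to `df["Disputed items"]` before exporting would be one way to store the counts as integers in the CSV.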


In [17]:
## use pandas to write to csv file
df.to_csv("disputed-benin-artwork.csv", index = False, encoding = "UTF-8")
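
Several institution names contain commas, so it can be worth confirming that the export survives a round trip. The sketch below is an assumption-laden illustration rather than part of the original workflow: it writes a one-row sample to an in-memory buffer instead of a file, and the names `sample`, `buffer`, and `restored` are made up for the example.

```python
import io

import pandas as pd

# One sample row whose name contains a comma.
sample = pd.DataFrame(
    {
        "Institution": ["Ethnologisches Museum, Staatliche Museen zu Berlin"],
        "Country": ["Germany"],
        "Disputed items": [518],
    }
)

# Write to an in-memory buffer, then read the CSV text back.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

By default `to_csv` quotes fields that contain the delimiter, which is why the comma inside the name does not split the column when the file is read back.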