# Looted Benin Artwork Distribution

Scrape <a href="https://digitalbenin.org/">the Benin site</a> to create a dataframe that contains the following scraped information about each institution:

- Museum name
- Country
- Number of disputed items

Export the dataframe as ```disputed-benin-artwork.csv```


In [1]:
## import libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [2]:
## request url
url = "https://digitalbenin.org/institutions"
response = requests.get(url)
response.status_code

200
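Printing `status_code` works for a one-off check, but a reusable fetch step can fail loudly instead. The sketch below is one possible way to do that; `fetch_html` and the 10-second default timeout are illustrative choices, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body, raising on HTTP errors or a stalled connection.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, `fetch_html("https://digitalbenin.org/institutions")` would return the HTML that the next cell passes to `BeautifulSoup`.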

In [3]:
## convert to soup
soup = BeautifulSoup(response.text, "html.parser")

In [9]:
## target the container holding museums, countries, and object counts
all_info = soup.find(id="sortable_elems")
all_info

<div class="bg-white" id="sortable_elems"><div class="row g-2 g-md-0 gx-md-3 py-3 py-md-0 border-bottom sortable_elem div-hover align-items-center cat_12 cat_1 cat_13 cat_5 cat_11 cat_4 cat_8 cat_6 cat_14 cat_19 cat_18 cat_7 cat_16 cat_3 cat_20 cat_2 cat_10 cat_24 cat_15 cat_0 main_unitedkingdom" row_id="5" test='[{"DB_name":"Palace Architecture &amp; Ornamentation","DB_id":12,"DB_count":210},{"DB_name":"Altar &amp; Shrine Objects","DB_id":1,"DB_count":208},{"DB_name":"Personal Adornment","DB_id":13,"DB_count":120},{"DB_name":"Ceremonial Regalia","DB_id":5,"DB_count":107},{"DB_name":"Musical Instruments","DB_id":11,"DB_count":97},{"DB_name":"Ceremonial Objects","DB_id":4,"DB_count":82},{"DB_name":"Figures","DB_id":8,"DB_count":74},{"DB_name":"Containers","DB_id":6,"DB_count":70},{"DB_name":"Staffs","DB_id":14,"DB_count":55},{"DB_name":"Heads","DB_id":19,"DB_count":51},{"DB_name":"Weaponry","DB_id":18,"DB_count":26},{"DB_name":"Household &amp; Everyday Objects","DB_id":7,"DB_count":21},

In [17]:
## get museums html
museums = all_info.find_all("div", class_="col-md-5")
museums

[<div class="col-12 col-md-5" sort="museum" val="British Museum"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/5">British Museum</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Ethnologisches Museum, Staatliche Museen zu Berlin"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/13">Ethnologisches Museum, Staatliche Museen zu Berlin</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Field Museum"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/15">Field Museum</a></div></div></div>,
 <div class="col-12 col-md-5" sort="museum" val="Museum of Archaeology and Anthropology, University of Cambridge"><div class="py-md-2"><div><a class="link-dark fs-5 fw-semibold text-decoration-none" href="/institutions/28">Museum of Archaeology and Anthropology, University of Cambridge</a></div><

In [18]:
## get museum names without html
museums_list = [museum.get_text() for museum in museums]
museums_list

['British Museum',
 'Ethnologisches Museum, Staatliche Museen zu Berlin',
 'Field Museum',
 'Museum of Archaeology and Anthropology, University of Cambridge',
 'National Museum, Benin',
 'Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden',
 'Weltmuseum Wien',
 'University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum)',
 'MARKK Museum am Rothenbaum Kulturen und Künste der Welt',
 'Metropolitan Museum of Art',
 'Pitt Rivers Museum',
 'Nationaal Museum van Wereldculturen and Wereldmuseum',
 'Rautenstrauch-Joest-Museum',
 'National Museum, Lagos',
 'National Museums Scotland',
 'Horniman Museum and Gardens',
 'National Museums Liverpool, World Museum',
 'Linden-Museum Stuttgart, Staatliches Museum für Völkerkunde',
 'Fowler Museum at UCLA',
 'Weltkulturen Museum Frankfurt am Main',
 'Världskultur Museerna, National Museums of World Culture',
 'American Museum of Natural History',
 'National Museum of Ireland',
 'Peabody Museum of Ar

In [19]:
## confirm the number of museums matches the number listed on the site
len(museums_list)

131

In [25]:
## get countries html
countries = all_info.find_all("span", class_="text-truncate")
countries

[<span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedkingdom" role="button" style="max-width:100% margin: 2px 0">United Kingdom</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="germany" role="button" style="max-width:100% margin: 2px 0">Germany</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedstates" role="button" style="max-width:100% margin: 2px 0">United States</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="unitedkingdom" role="button" style="max-width:100% margin: 2px 0">United Kingdom</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="nigeria" role="button" style="max-width:100% margin: 2px 0">Nigeria</span>,
 <span class="text-truncate nav-link filter_main badge text-bg-dark" filter_id="germany" role="button" style="max-width:100% margin: 2px 0">Germany</span>,
 <span class="text-truncate

In [26]:
## countries without html
countries_list = [country.get_text() for country in countries]
countries_list

['United Kingdom',
 'Germany',
 'United States',
 'United Kingdom',
 'Nigeria',
 'Germany',
 'Austria',
 'United States',
 'Germany',
 'United States',
 'United Kingdom',
 'Netherlands',
 'Germany',
 'Nigeria',
 'United Kingdom',
 'United Kingdom',
 'United Kingdom',
 'Germany',
 'United States',
 'Germany',
 'Sweden',
 'United States',
 'Ireland',
 'United States',
 'United States',
 'France',
 'United Kingdom',
 'Germany',
 'Russia',
 'Germany',
 'United States',
 'United Kingdom',
 'Norway',
 'Switzerland',
 'United Kingdom',
 'Germany',
 'Switzerland',
 'United States',
 'New Zealand',
 'United States',
 'Switzerland',
 'United Kingdom',
 'United Kingdom',
 'United States',
 'United Kingdom',
 'Germany',
 'Israel',
 'Australia',
 'Germany',
 'United Kingdom',
 'United States',
 'United States',
 'United Kingdom',
 'Switzerland',
 'United States',
 'United States',
 'Switzerland',
 'United States',
 'United States',
 'United States',
 'United States',
 'United States',
 'Denmark',
 

In [27]:
## object counts with html
objects = all_info.find_all("div", class_="object_count")
objects

[<div class="d-inline object_count" count_default="944">944</div>,
 <div class="d-inline object_count" count_default="518">518</div>,
 <div class="d-inline object_count" count_default="393">393</div>,
 <div class="d-inline object_count" count_default="350">350</div>,
 <div class="d-inline object_count" count_default="285">285</div>,
 <div class="d-inline object_count" count_default="283">283</div>,
 <div class="d-inline object_count" count_default="202">202</div>,
 <div class="d-inline object_count" count_default="188">188</div>,
 <div class="d-inline object_count" count_default="179">179</div>,
 <div class="d-inline object_count" count_default="154">154</div>,
 <div class="d-inline object_count" count_default="148">148</div>,
 <div class="d-inline object_count" count_default="122">122</div>,
 <div class="d-inline object_count" count_default="92">92</div>,
 <div class="d-inline object_count" count_default="81">81</div>,
 <div class="d-inline object_count" count_default="74">74</div>,
 

In [28]:
## objects without html
objects_list = [object_count.get_text() for object_count in objects]
objects_list

['944',
 '518',
 '393',
 '350',
 '285',
 '283',
 '202',
 '188',
 '179',
 '154',
 '148',
 '122',
 '92',
 '81',
 '74',
 '72',
 '71',
 '69',
 '64',
 '55',
 '53',
 '48',
 '46',
 '43',
 '37',
 '35',
 '32',
 '32',
 '28',
 '24',
 '23',
 '23',
 '23',
 '20',
 '18',
 '18',
 '17',
 '16',
 '15',
 '15',
 '14',
 '14',
 '13',
 '12',
 '10',
 '10',
 '9',
 '9',
 '9',
 '8',
 '8',
 '8',
 '8',
 '8',
 '7',
 '7',
 '7',
 '7',
 '7',
 '6',
 '6',
 '5',
 '5',
 '5',
 '5',
 '5',
 '4',
 '4',
 '4',
 '4',
 '4',
 '4',
 '4',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '3',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '2',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1',
 '1']
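The three lists are combined with `zip` in the next cell, and `zip` silently stops at the shortest input. If one selector ever matched an extra or a missing element, rows would be dropped or misaligned without any error. A minimal guard, assuming a hypothetical helper named `zip_checked` and a few sample values copied from the outputs above:

```python
def zip_checked(*columns):
    """Zip columns into row tuples, refusing lists of unequal length."""
    lengths = [len(column) for column in columns]
    if len(set(lengths)) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return list(zip(*columns))


# Sample values only; in the notebook these would be the full scraped lists.
sample_museums = ["British Museum", "Field Museum"]
sample_countries = ["United Kingdom", "United States"]
sample_counts = ["944", "393"]

sample_rows = zip_checked(sample_museums, sample_countries, sample_counts)
print(sample_rows)
```

On Python 3.10 and later, `zip(..., strict=True)` gives the same protection without a helper.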

In [29]:
## convert to list of tuples
disputed_artifacts = list(zip(museums_list, countries_list, objects_list))
    
disputed_artifacts

[('British Museum', 'United Kingdom', '944'),
 ('Ethnologisches Museum, Staatliche Museen zu Berlin', 'Germany', '518'),
 ('Field Museum', 'United States', '393'),
 ('Museum of Archaeology and Anthropology, University of Cambridge',
  'United Kingdom',
  '350'),
 ('National Museum, Benin', 'Nigeria', '285'),
 ('Staatliche Ethnographische Sammlungen Sachsen und Staatliche Kunstsammlungen Dresden',
  'Germany',
  '283'),
 ('Weltmuseum Wien', 'Austria', '202'),
 ('University of Pennsylvania Museum of Archaeology and Anthropology (Penn Museum)',
  'United States',
  '188'),
 ('MARKK Museum am Rothenbaum Kulturen und Künste der Welt', 'Germany', '179'),
 ('Metropolitan Museum of Art', 'United States', '154'),
 ('Pitt Rivers Museum', 'United Kingdom', '148'),
 ('Nationaal Museum van Wereldculturen and Wereldmuseum',
  'Netherlands',
  '122'),
 ('Rautenstrauch-Joest-Museum', 'Germany', '92'),
 ('National Museum, Lagos', 'Nigeria', '81'),
 ('National Museums Scotland', 'United Kingdom', '74'),
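Building three independent lists relies on every institution having exactly one museum, country, and count element. An alternative is to walk the page one row at a time so the three values always come from the same institution. The sketch below runs on a trimmed sample modelled on the markup shown earlier; that the country `span` and the `object_count` `div` sit inside each `sortable_elem` row is an assumption, and `parse_rows` and `sample_html` are illustrative names.

```python
from bs4 import BeautifulSoup

# Trimmed sample modelled on the markup shown above (two rows only).
# Assumption: country and count elements are nested inside each row div.
sample_html = """
<div id="sortable_elems">
  <div class="row sortable_elem">
    <div class="col-12 col-md-5" sort="museum" val="British Museum">
      <a href="/institutions/5">British Museum</a>
    </div>
    <span class="text-truncate">United Kingdom</span>
    <div class="d-inline object_count" count_default="944">944</div>
  </div>
  <div class="row sortable_elem">
    <div class="col-12 col-md-5" sort="museum" val="Field Museum">
      <a href="/institutions/15">Field Museum</a>
    </div>
    <span class="text-truncate">United States</span>
    <div class="d-inline object_count" count_default="393">393</div>
  </div>
</div>
"""


def parse_rows(container):
    """Return (museum, country, count) tuples, one per institution row."""
    records = []
    for row in container.find_all("div", class_="sortable_elem"):
        museum = row.find("div", class_="col-md-5")
        country = row.find("span", class_="text-truncate")
        count = row.find("div", class_="object_count")
        # Skip incomplete rows rather than letting later values shift up.
        if museum is None or country is None or count is None:
            continue
        records.append(
            (
                museum.get_text(strip=True),
                country.get_text(strip=True),
                int(count.get_text(strip=True)),
            )
        )
    return records


sample_soup = BeautifulSoup(sample_html, "html.parser")
rows = parse_rows(sample_soup.find(id="sortable_elems"))
print(rows)
```

The count is converted to `int` here, so the resulting column can be summed or sorted numerically later.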

In [30]:
## convert to dataframe, with column headers
df = pd.DataFrame(
    disputed_artifacts,
    columns=["Museum", "Country", "Count_Disputed_Artifacts"],
)

df

Unnamed: 0,Museum,Country,Count_Disputed_Artifacts
0,British Museum,United Kingdom,944
1,"Ethnologisches Museum, Staatliche Museen zu Be...",Germany,518
2,Field Museum,United States,393
3,"Museum of Archaeology and Anthropology, Univer...",United Kingdom,350
4,"National Museum, Benin",Nigeria,285
...,...,...,...
126,"Allen Memorial Art Museum, Oberlin College",United States,1
127,Newark Museum of Art,United States,1
128,LACMA The Los Angeles County Museum of Art,United States,1
129,Hood Museum of Art,United States,1
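The task statement asks for a CSV export, and the scraped counts are still strings at this point. A minimal sketch of both steps on a two-row sample follows; the `-sample` filename and the `sample_df` name are placeholders so the real output file is not overwritten. Passing `index=False` keeps the row index out of the file; otherwise it tends to reappear as an `Unnamed: 0` column when the CSV is read back.

```python
import pandas as pd

# Two sample rows with string counts, as produced by get_text().
sample_df = pd.DataFrame(
    [
        ("British Museum", "United Kingdom", "944"),
        ("Field Museum", "United States", "393"),
    ],
    columns=["Museum", "Country", "Count_Disputed_Artifacts"],
)

# Convert the counts from strings to integers so they sort and sum correctly.
sample_df["Count_Disputed_Artifacts"] = sample_df["Count_Disputed_Artifacts"].astype(int)

# Write without the index, then read back to confirm the columns round-trip.
sample_df.to_csv("disputed-benin-artwork-sample.csv", index=False)
round_trip = pd.read_csv("disputed-benin-artwork-sample.csv")
print(round_trip.columns.tolist())
```

Applied to the full `df`, the same two calls with the filename `disputed-benin-artwork.csv` would produce the requested export.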
