## Getting an artist's exhibitions (the announcements that mention them)

In [2]:
artist = "A.K. Burns"
url = f"https://www.e-flux.com/announcements/?c[]=Contemporary%20Art&p[]={artist}" #Could replace %20 with a space
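
The artist name contains a space, so it should be percent-encoded before it goes into the query string. A minimal sketch using only the standard library follows; the helper name `build_announcements_url` is hypothetical, and the `c[]` / `p[]` parameter names are simply copied from the URL above.

```python
from urllib.parse import quote

BASE_URL = "https://www.e-flux.com/announcements/"

def build_announcements_url(artist, contemporary=False):
    """Build the filtered announcements URL (hypothetical helper)."""
    params = []
    if contemporary:
        # Same category filter as in the hand-written URL above
        params.append("c[]=" + quote("Contemporary Art"))
    # quote() turns spaces into %20 and encodes non-ASCII characters as UTF-8
    params.append("p[]=" + quote(artist))
    return BASE_URL + "?" + "&".join(params)

url = build_announcements_url("A.K. Burns", contemporary=True)
print(url)
```

Building the URL in one place keeps the encoding consistent for names with accents or other special characters.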

In [3]:
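import requests
from bs4 import BeautifulSoup

def get_html_of_artist(artist, contemporary=False):
    """Return the raw HTML of the e-flux announcements page filtered by artist."""
    if contemporary:
        # Restrict the results to the "Contemporary Art" category
        url = f"https://www.e-flux.com/announcements/?c[]=Contemporary%20Art&p[]={artist}"
    else:
        url = f"https://www.e-flux.com/announcements/?p[]={artist}"

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text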

In [4]:
artist = ""
html = get_html_of_artist(artist)
soup = BeautifulSoup(html, 'html.parser')

The HTML we get back:

In [5]:
soup

<!DOCTYPE html>

<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1" name="viewport"/>
<title>Announcements - e-flux</title>
<meta content="en" name="DC.LANGUAGE"/>
<link href="/styles/cbplayer.css?v=20220809025938" media="all" rel="stylesheet" type="text/css"/>
<link href="/styles/cblightbox.css?v=20210802045344" media="all" rel="stylesheet" type="text/css"/>
<link href="/styles/daterangepicker.css" media="all" rel="stylesheet" type="text/css"/>
<link href="/styles/main.css?v=20240226171903" media="all" rel="stylesheet" type="text/css"/>
<meta content="article" property="og:type"/>
<meta content="Announcements - e-flux" property="og:title"/>
<meta content="summary_large_image" name="twitter:card"/>
<meta content="Announcements - e-flux" name="twitter:title"

--------------------------------------------------------------------------------------------

We need to find the exhibitions in this HTML.<br>
Luckily, they all seem to be stored in the same format:

Example of what we're looking for in the HTML code:

```HTML

<a class="preview-announcement__title" href="/announcements/165278/lia-gangitano-to-receive-the-2018-audrey-irmas-award-for-curatorial-excellence/">
Lia Gangitano to receive the 2018 Audrey Irmas Award for Curatorial Excellence
</a>


<!-- or, with an artist name followed by an italicized title: -->

<a class="preview-announcement__title" href="/announcements/528799/rirkrit-tiravanijawe-don-t-recognise-what-we-don-t-see/">
Rirkrit Tiravanija<br/><em>We Don’t Recognise What We Don’t See</em>
</a>
```
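
The two shapes above can be told apart by the presence of an `<em>` tag. The sketch below assumes that, when `<em>` is present, the text directly inside the `<a>` tag (before the `<br/>`) is the artist name; that is an assumption drawn from this one example, not something verified across the site. The helper name `split_artist_and_title` is hypothetical.

```python
from bs4 import BeautifulSoup

def split_artist_and_title(anchor):
    """Return (artist_name, title); artist_name is None when there is no <em>."""
    em = anchor.find("em")
    if em is None:
        # Plain announcement: the whole link text is the title
        return None, anchor.get_text(strip=True)
    title = em.get_text(strip=True)
    # Text nodes that are direct children of <a>, i.e. outside the <em> tag
    parts = [s.strip() for s in anchor.find_all(string=True, recursive=False)]
    artist = " ".join(p for p in parts if p) or None
    return artist, title

sample = """
<a class="preview-announcement__title" href="/announcements/165278/lia-gangitano-to-receive-the-2018-audrey-irmas-award-for-curatorial-excellence/">
Lia Gangitano to receive the 2018 Audrey Irmas Award for Curatorial Excellence
</a>
<a class="preview-announcement__title" href="/announcements/528799/rirkrit-tiravanijawe-don-t-recognise-what-we-don-t-see/">
Rirkrit Tiravanija<br/><em>We Don’t Recognise What We Don’t See</em>
</a>
"""
anchors = BeautifulSoup(sample, "html.parser").find_all("a", class_="preview-announcement__title")
for anchor in anchors:
    print(split_artist_and_title(anchor))
```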

### **Get announcements: easy case**

In [6]:
def get_announcements_of_artist(artist, contemporary=False):
    html = get_html_of_artist(artist, contemporary)
    soup = BeautifulSoup(html, "html.parser")
    # All announcements are <a> tags with this class (see the website's HTML)
    announcements = soup.find_all("a", class_="preview-announcement__title")
    return announcements

def process_announcements(announcements):
    announcements_list = []
    for announcement in announcements:
        announcement_dict = {}
        # The link has the form /announcements/123456/link-text.../
        announcement_dict['id'] = announcement["href"].split("/")[2]
        announcement_dict['link'] = announcement["href"]

        try:
            announcement_dict['id'] = int(announcement_dict['id'])
        except ValueError:
            print(f"'ID' problem with announcement: {announcement}, id: {announcement_dict['id']}")

        #TODO announcement_dict['artist_name'] = TODO (text after the <a> tag and before the <br/> tag; may be missing)

        # Prefer the italicized title in <em>; fall back to the full link text
        em_tag = announcement.find("em")
        if em_tag is not None:
            announcement_dict['title'] = em_tag.text.strip()
        else:
            announcement_dict['title'] = announcement.text.strip()

        announcements_list.append(announcement_dict)

    return announcements_list

[{'id': '515809', 'link': '/announcements/515809/picasso-sculptor-matter-and-body/', 'title': 'Picasso Sculptor: Matter and Body', 'artists': ['Picasso Sculptor', 'Announcements Picasso Sculptor', 'Body Share Email Facebook Twitter Copy Link Link Copied', 'Share Email Facebook Twitter Copy Link Link Copied', 'Subscribe Announcements Journal Architecture Criticism Film Notes Books Index Projects Education Podcasts Shop Events Bar Laika', 'App Contact About Directory Events Bar Laika', 'App Facebook Twitter Instagram We', 'Classon Avenue Brooklyn', 'Subscribe Contact About Privacy', 'Directory Facebook Twitter Instagram We', 'Classon Avenue Brooklyn', 'More Less Category Sculpture Modernism Subject Cubism Participants Pablo Picasso Share Email Facebook Twitter Copy Link Link Copied', 'Related See', 'Announcements Share Email Facebook Twitter Copy Link Link Copied', 'Subscribe Picasso Sculptor', 'Body Guggenheim Bilbao Pablo Picasso', 'Bernard Ruiz', 'Museo Picasso', 'Pablo Picasso', 'Sha

In [7]:
announcements = get_announcements_of_artist("Pablo Picasso")
process_announcements(announcements)

[{'id': 515809,
  'link': '/announcements/515809/picasso-sculptor-matter-and-body/',
  'title': 'Picasso Sculptor: Matter and Body'},
 {'id': 512737,
  'link': '/announcements/512737/chagall-matisse-mir-made-in-paris/',
  'title': 'Chagall, Matisse, Miró: Made in Paris'},
 {'id': 552019,
  'link': '/announcements/552019/picasso-ceramics/',
  'title': 'Picasso Ceramics'},
 {'id': 510210,
  'link': '/announcements/510210/jammie-holmesmake-the-revolution-irresistible/',
  'title': 'Make the Revolution Irresistible'},
 {'id': 509314,
  'link': '/announcements/509314/outstanding/',
  'title': 'Outstanding!'},
 {'id': 530487,
  'link': '/announcements/530487/picasso-untitled/',
  'title': 'Picasso: Untitled'},
 {'id': 534375,
  'link': '/announcements/534375/call-for-applications-artistic-director/',
  'title': 'Call for applications: Artistic Director'},
 {'id': 509744,
  'link': '/announcements/509744/no-feeling-is-final-the-skopje-solidarity-collection/',
  'title': 'No Feeling Is Final. 
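
The `link` values above are site-relative, and the `id` is embedded in the path. As a sketch, both can be derived from the `href` with the standard library alone; the helper name `parse_href` and the `slug` field are hypothetical, and the pattern assumes every link follows the `/announcements/<id>/<slug>/` shape seen in this output.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.e-flux.com"
# Assumed shape: /announcements/<numeric id>/<slug>/
HREF_PATTERN = re.compile(r"^/announcements/(\d+)/([^/]*)/?$")

def parse_href(href):
    """Return id, slug and absolute URL for an announcement link, or None."""
    match = HREF_PATTERN.match(href)
    if match is None:
        return None  # not an announcement link in the expected shape
    return {
        "id": int(match.group(1)),
        "slug": match.group(2),
        "url": urljoin(BASE_URL, href),
    }

print(parse_href("/announcements/552019/picasso-ceramics/"))
```

Returning `None` for unexpected links makes it easy to log and skip them rather than crash on an index error.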

<details><summary><u>Special cases that we need to consider:</u></summary>

```Python
t = """
<a class="preview-announcement__title" href="/announcements/528799/rirkrit-tiravanijawe-don-t-recognise-what-we-don-t-see/">
Rirkrit Tiravanija<br/><em>We Don’t Recognise What We Don’t See</em>
</a>
"""
soup = BeautifulSoup(t, "html.parser")

# Find the 'em' tag, which holds the exhibition title
em_tag = soup.find('em')
# If the 'em' tag exists, extract its text; otherwise there is no separate title
title = em_tag.get_text() if em_tag else None
```

</details>

### **Get announcements: hard case**

The problem is that this approach returns at most 30 exhibitions per artist, because the page only loads the first 30 results until we scroll. To get all of them we need a browser-automation tool such as Selenium or pyppeteer that scrolls the page until nothing new loads.



Here is what GPT-4 suggested (using pyppeteer); it works for Picasso:


In [32]:
import asyncio
from pyppeteer import launch
from bs4 import BeautifulSoup

async def get_artist_names_from_announcement(page, link):
    full_link = f"https://www.e-flux.com{link}"
    await page.goto(full_link)
    content = await page.content()
    soup = BeautifulSoup(content, 'html.parser')

    # Find the "Participants" heading, then extract the names linked in the list that follows it
    # (`string=` replaces the deprecated `text=` argument)
    participants_section = soup.find('h6', string='Participants')
    sidebar_list = participants_section.find_next_sibling('div', class_='sidebar-list') if participants_section else None
    if sidebar_list:
        artist_names = [element.text.strip() for element in sidebar_list.find_all('a')]
    else:
        artist_names = []

    return artist_names


async def process_announcements(page, announcements):
    announcements_data = []
    for announcement in announcements:
        announcement_data = {
            'id': announcement['href'].split('/')[2],
            'link': announcement['href'],
            'title': announcement.text.strip()
        }
        
        # Fetch and store artist names for this announcement
        announcement_data['artists'] = await get_artist_names_from_announcement(page, announcement['href'])
        
        announcements_data.append(announcement_data)
    
    return announcements_data

async def scrape_infinite_scroll_page(artist, contemporary=False):
    if contemporary:
        url = f"https://www.e-flux.com/announcements/?c[]=Contemporary%20Art&p[]={artist}"
    else:
        url = f"https://www.e-flux.com/announcements/?p[]={artist}"
    
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto(url)

    last_height = await page.evaluate('document.body.scrollHeight')

    while True:
        await page.evaluate('window.scrollTo(0, document.body.scrollHeight);')
        await asyncio.sleep(1)
        new_height = await page.evaluate('document.body.scrollHeight')
        if new_height == last_height:
            break
        last_height = new_height

    page_content = await page.content()
    soup = BeautifulSoup(page_content, 'html.parser')
    announcements = soup.find_all("a", class_="preview-announcement__title")
    
    processed_announcements = await process_announcements(page, announcements)
    
    await browser.close()
    return processed_announcements

# Assuming you're running in an async environment like Jupyter Notebook
artist_name = "Pablo Picasso"
announcements_data = await scrape_infinite_scroll_page(artist_name)
print(announcements_data)




[{'id': '515809', 'link': '/announcements/515809/picasso-sculptor-matter-and-body/', 'title': 'Picasso Sculptor: Matter and Body', 'artists': ['Pablo Picasso']}, {'id': '512737', 'link': '/announcements/512737/chagall-matisse-mir-made-in-paris/', 'title': 'Chagall, Matisse, Miró: Made in Paris', 'artists': ['Henri Matisse', 'Marc Chagall', 'Joan Miró', 'Pablo Picasso']}, {'id': '552019', 'link': '/announcements/552019/picasso-ceramics/', 'title': 'Picasso Ceramics', 'artists': ['Pablo Picasso']}, {'id': '510210', 'link': '/announcements/510210/jammie-holmesmake-the-revolution-irresistible/', 'title': 'Jammie HolmesMake the Revolution Irresistible', 'artists': ['Jammie Holmes', 'Mark Bradford', 'Anselm Kiefer', 'Emory Douglas', 'María Elena Ortiz', 'Martin Puryear', 'Agnes Martin', 'Njideka Akunyili Crosby', 'Pablo Picasso', 'National Gallery of Art', 'Dallas Museum of Art', 'Kehinde Wiley', 'Duke University', 'Philip Guston', 'Tadao Ando', 'Nasher Museum of Art', 'Teresita Fernández', 

With the announcements gathered, we need to open each announcement and collect all participating artists.
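
The participant extraction itself does not need a browser once the page HTML is available. The sketch below isolates it as a pure function so it can be checked offline; `parse_participants` is a hypothetical name, and the sample HTML is hand-written to mirror the `h6` "Participants" heading followed by a `div.sidebar-list` that the scraper code looks for, not copied from the site.

```python
from bs4 import BeautifulSoup

def parse_participants(announcement_html):
    """Return the names linked under the "Participants" heading, or []."""
    soup = BeautifulSoup(announcement_html, "html.parser")
    heading = soup.find("h6", string="Participants")
    if heading is None:
        return []  # announcement without a participants section
    sidebar_list = heading.find_next_sibling("div", class_="sidebar-list")
    if sidebar_list is None:
        return []  # heading present but no list after it
    return [a.get_text(strip=True) for a in sidebar_list.find_all("a")]

# Hand-written sample mirroring the assumed page structure
sample_html = """
<h6>Participants</h6>
<div class="sidebar-list">
  <a href="#">Henri Matisse</a>
  <a href="#">Pablo Picasso</a>
</div>
"""
print(parse_participants(sample_html))
```

The same function can be fed the HTML string of an announcement page however it was fetched.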

In [26]:


print(announcements)

[{'id': 515809, 'link': '/announcements/515809/picasso-sculptor-matter-and-body/', 'title': 'Picasso Sculptor: Matter and Body'}, {'id': 512737, 'link': '/announcements/512737/chagall-matisse-mir-made-in-paris/', 'title': 'Chagall, Matisse, Miró: Made in Paris'}, {'id': 552019, 'link': '/announcements/552019/picasso-ceramics/', 'title': 'Picasso Ceramics'}, {'id': 510210, 'link': '/announcements/510210/jammie-holmesmake-the-revolution-irresistible/', 'title': 'Make the Revolution Irresistible'}, {'id': 509314, 'link': '/announcements/509314/outstanding/', 'title': 'Outstanding!'}, {'id': 530487, 'link': '/announcements/530487/picasso-untitled/', 'title': 'Picasso: Untitled'}, {'id': 534375, 'link': '/announcements/534375/call-for-applications-artistic-director/', 'title': 'Call for applications: Artistic Director'}, {'id': 509744, 'link': '/announcements/509744/no-feeling-is-final-the-skopje-solidarity-collection/', 'title': 'No Feeling Is Final. The Skopje Solidarity Collection'}, {'i

In [None]:
len(soup.find_all("a", class_="preview-announcement__title"))

176

This works for Picasso: we got all 176 of his 176 exhibitions.

<details><summary><u> Case when we want to click "Load more" button (might need fixes)</u></summary>

```python

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to your page url
url_Picasso = "https://www.e-flux.com/announcements/?p[]=Pablo%20Picasso"
driver.get(url_Picasso)

# Wait for the "Load more" button to become clickable
wait = WebDriverWait(driver, 10)

while True:
    try:
        # Wait until the "Load more" button is clickable, and then click it
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//button[text()="Load more"]')))
        load_more_button.click()

        # Wait for the page to load
        time.sleep(2)
    except Exception as e:
        # If an error occurs, that probably means there's no more "Load more" button
        break

# Now you can parse the page source with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Remember to close the driver
driver.quit()

```

</details>

## Collect exhibitions of all artists



One option: limit the set of artists to those who appear among the contemporary ones, but collect all of their exhibitions rather than only their contemporary ones. This seems reasonable.
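
When several artists are scraped, the same announcement will show up more than once (group shows list many participants). One possible way to combine the per-artist results is to key them by announcement `id`. This is only a sketch: `merge_announcements` and the `matched_artists` field are hypothetical, and the sample data is illustrative (the Matisse entry is assumed, not scraped).

```python
def merge_announcements(per_artist):
    """Merge {artist: [announcement dicts]} into {id: announcement dict}."""
    merged = {}
    for artist, announcements in per_artist.items():
        for announcement in announcements:
            # Keep one record per announcement id, created on first sight
            record = merged.setdefault(
                announcement["id"], {**announcement, "matched_artists": []}
            )
            # Remember which artist queries returned this announcement
            if artist not in record["matched_artists"]:
                record["matched_artists"].append(artist)
    return merged

# Illustrative input; the second query result is an assumption
per_artist = {
    "Pablo Picasso": [
        {"id": 515809, "title": "Picasso Sculptor: Matter and Body"},
        {"id": 512737, "title": "Chagall, Matisse, Miró: Made in Paris"},
    ],
    "Henri Matisse": [
        {"id": 512737, "title": "Chagall, Matisse, Miró: Made in Paris"},
    ],
}
merged = merge_announcements(per_artist)
print(len(merged), merged[512737]["matched_artists"])
```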

In [None]:
 by Pablo Picasso. Important loans from other museums as well as from private collections complete the show. Selected paintings illustrate the connections between graphic art and painting. Also on display are lithographic posters by artists such as Théophile-Alexandre Steinlen, Henri de Toulouse-Lautrec, and Pablo Picasso, which were also created in Paris. Designed by the artists themselves, but produced in large numbers, they fulfilled the claim of art for everybody.