### Complete list of de Kooning's one-man exhibitions
In our research we also considered the number of exhibitions and their venues as parameters for checking whether the artist's reputation changed over the years. However, we found no complete dataset of artists' exhibitions. To estimate how many exhibitions are covered by catalogues and, in particular, how many are traced by bibliographic records in BnF and Gallica, we needed a "ground truth" against which to judge how comprehensive those sources are.

The case study is Willem de Kooning, since data on all of his exhibitions are published on the website of the Willem de Kooning Foundation.
The results of the web scraping are shown here: 131 exhibitions in total, 79 of which have a catalogue.

In bibliography_DK.ipynb, bibliographic records on de Kooning were extracted from the BnF SPARQL endpoint and the Google Books API; 31 of them are records of exhibitions. That corresponds to 39% of the exhibitions with a catalogue (31 of 79) and about 24% of all exhibitions (31 of 131). Note that the extracted bibliographic records are not limited to one-man shows: they include further shows that are not present in the dataset reported below, so these percentages should be read as upper bounds on the actual coverage.
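The coverage figures are simple ratios. A minimal sketch of the arithmetic, using only the counts quoted in this section (the variable and function names are illustrative and do not come from the notebook):

```python
# Counts quoted in the text of this section
total_exhibitions = 131   # one-man exhibitions listed by the Foundation
with_catalogue = 79       # exhibitions that have a catalogue
matched_records = 31      # exhibition records found via BnF / Google Books


def coverage(matched, total):
    """Return the share of `total` covered by `matched`, as a percentage."""
    return 100 * matched / total


print(f"Exhibitions with catalogue covered: {coverage(matched_records, with_catalogue):.1f}%")
print(f"All exhibitions covered: {coverage(matched_records, total_exhibitions):.1f}%")
```

This prints 39.2% and 23.7% respectively.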


In [17]:
import requests 
from bs4 import BeautifulSoup as bs 

URLs = [
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1940',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1950',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1960',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1970',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1980',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1990',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/2000',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/2010'
] 

titles_list = []

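for url in URLs:
    # Download the page listing one decade of one-man exhibitions
    req = requests.get(url, timeout=30)
    req.raise_for_status()  # stop early on HTTP errors
    soup = bs(req.text, 'html.parser')

    # Each exhibition entry is a <p> element carrying these two CSS classes
    titles = soup.find_all('p', class_="unit_title spacing_03")

    for title in titles:
        # Turn non-breaking spaces and line breaks into ';' separators
        titles_list.append(title.text.strip().replace("\xa0\n", ";").replace("\xa0", ";").replace('\n',';'))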


titles_list

['de Kooning;Charles Egan Gallery, New York, New York, (4/12/1948 to 5/12/1948), no catalogue.',
 'Willem de Kooning;Charles Egan Gallery, New York, New York, (4/1/1951 to 4/30/1951), no catalogue.',
 'Willem de Kooning:  Paintings on the Theme of the Woman;Sidney Janis Gallery, New York, New York, (3/16/1953 to 4/11/1953), no catalogue.',
 'Retrospective (de Kooning, 1935-53);Organized by School of the Museum of Fine Arts, Boston, Massachusetts, (4/21/1953 to 7/3/1953), catalogue.',
 'Recent Oils by Willem de Kooning;Martha Jackson Gallery, New York, New York, (11/9/1955 to 12/3/1955), catalogue.',
 'Willem de Kooning:  Recent Paintings;Sidney Janis Gallery, New York, New York, (4/2/1956 to 4/28/1956), no catalogue.',
 'Willem de Kooning;Sidney Janis Gallery, New York, New York, (5/4/1959 to 6/1/1959), no catalogue.',
 'Willem de Kooning;Paul Kantor Gallery, Beverly Hills, California, (4/3/1961 to 4/29/1961), catalogue.',
 'Recent Paintings by Willem de Kooning;Sidney Janis Gallery, N

In [18]:
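# Keywords that appear in venue names containing a comma; the ", " just before
# each keyword is dropped so that the venue is not split into two fields
replacements = {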
    "Inc.": "Inc.",
    "and": "and",
    "Science": "Science",
    "Ontario": "",
    "The": "The",
    "Palazzo": "Palazzo",
    "Droll": "Droll",
    "Fourcade": " Fourcade",
    "University": "University",
    "Ishibashi": "Ishibashi",
    "Smithsonian": "Smithsonian",
    "Millbrook": "Millbrook",
    "Seattle": " Seattle",
    "World": "World",
    "Carnegie": "Carnegie",
    "Akademie ": "Akademie ",
    "Berkeley": "Berkeley",
    "Wellesley": "Wellesley",
    "Mitchell-Innes": "Mitchell-Innes",
    "Art": "Art",
    "Colorado": "Colorado"
}

new_list = []

# Iterate through each string in the original list
for item in titles_list:
    # Replace ';(' with ' ('
    item = item.replace(';(', ' (')
    
    # Find the earliest occurrence of "catalogue." or "brochure." and cut the
    # string right after it, using the length of the marker actually matched
    markers = [(item.find(m), m) for m in ("catalogue.", "brochure.") if m in item]
    if markers:
        marker_index, marker = min(markers)
        item = item[:marker_index + len(marker)]
    
    parts = item.split(';')

    if len(parts) == 2:
        second_part = parts[1]
        # Iterate through each keyword in the replacements dictionary
        for keyword, replacement in replacements.items():
            if ", " in second_part and keyword in second_part:
                # Get the index of the keyword
                keyword_index = second_part.index(keyword)
                # Get the index of the last ", " before the keyword
                comma_index = second_part.rfind(", ", 0, keyword_index)
                # Replace ", " with " " before the keyword
                if comma_index != -1:  # Ensure ", " was found before the keyword
                    second_part = second_part[:comma_index] + " " + second_part[comma_index + 2:]
                # Replace the keyword with the corresponding replacement
                second_part = second_part.replace(keyword, replacement)
        # Split the second part (after ';') by ','
        second_parts = second_part.split(',')
        # Remove the second element if the length is greater than 5
        if len(second_parts) > 5:
            del second_parts[1]
        # Create a sublist with the first part and the second parts
        sublist = [parts[0]] + second_parts
        # Append the sublist to the new list
        new_list.append(sublist)

print(new_list)


[['de Kooning', 'Charles Egan Gallery', ' New York', ' New York', ' (4/12/1948 to 5/12/1948)', ' no catalogue.'], ['Willem de Kooning', 'Charles Egan Gallery', ' New York', ' New York', ' (4/1/1951 to 4/30/1951)', ' no catalogue.'], ['Willem de Kooning:  Paintings on the Theme of the Woman', 'Sidney Janis Gallery', ' New York', ' New York', ' (3/16/1953 to 4/11/1953)', ' no catalogue.'], ['Retrospective (de Kooning, 1935-53)', 'Organized by School of the Museum of Fine Arts', ' Boston', ' Massachusetts', ' (4/21/1953 to 7/3/1953)', ' catalogue.'], ['Recent Oils by Willem de Kooning', 'Martha Jackson Gallery', ' New York', ' New York', ' (11/9/1955 to 12/3/1955)', ' catalogue.'], ['Willem de Kooning:  Recent Paintings', 'Sidney Janis Gallery', ' New York', ' New York', ' (4/2/1956 to 4/28/1956)', ' no catalogue.'], ['Willem de Kooning', 'Sidney Janis Gallery', ' New York', ' New York', ' (5/4/1959 to 6/1/1959)', ' no catalogue.'], ['Willem de Kooning', 'Paul Kantor Gallery', ' Beverly H
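As an alternative to the keyword dictionary, the same fields could be pulled out with a regular expression anchored on the date range. The sketch below is not the notebook's method: `parse_entry` and `DATE_RANGE` are hypothetical names, and it assumes that every entry has the form `title;venue, city, state, (start to end), note`, with the last two comma-separated fields before the date always being city and state (or country).

```python
import re

# Matches "(m/d/yyyy to m/d/yyyy)"; assumed to appear once per entry
DATE_RANGE = re.compile(r"\((\d{1,2}/\d{1,2}/\d{4}) to (\d{1,2}/\d{1,2}/\d{4})\)")


def parse_entry(entry):
    """Split one scraped entry into a dict of fields, or return None if it does not fit."""
    # The title is everything before the first ';'
    title, _, rest = entry.partition(";")
    match = DATE_RANGE.search(rest)
    if match is None:
        return None
    # Text before the date holds "venue, city, state"; split from the right
    # so that commas inside the venue name are left untouched
    location = rest[:match.start()].strip().rstrip(",")
    parts = location.rsplit(", ", 2)
    if len(parts) != 3:
        return None
    venue, city, state = parts
    # Text after the date is the catalogue note
    note = rest[match.end():].strip(" ,")
    return {
        "exhibition": title,
        "venue": venue,
        "city": city,
        "state": state,
        "start": match.group(1),
        "end": match.group(2),
        "catalogue": note,
    }


sample = ("Retrospective (de Kooning, 1935-53);Organized by School of the Museum of "
          "Fine Arts, Boston, Massachusetts, (4/21/1953 to 7/3/1953), catalogue.")
print(parse_entry(sample))
```

Splitting the location from the right keeps a venue such as a hypothetical "Example Gallery, Inc." in one piece without listing keywords by hand, at the cost of skipping entries that lack a city or state.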

In [19]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Initialize empty lists for each column
exhibition = []
venue = []
city = []
state = []
date = []
catalogue = []

# Populate the lists from the data in new_list
for i in new_list:
    if len(i) >= 6:
        exhibition.append(i[0])
        venue.append(i[1])
        city.append(i[2])
        state.append(i[3])
        date.append(i[4])
        catalogue.append(i[5])


df = pd.DataFrame({
    "Exhibition_name": exhibition,
    "Venue": venue,
    "City": city,
    "State": state,
    "date": date,
    "catalogue": catalogue,
})

df

| | Exhibition_name | Venue | City | State | date | catalogue |
|---|---|---|---|---|---|---|
| 0 | de Kooning | Charles Egan Gallery | New York | New York | (4/12/1948 to 5/12/1948) | no catalogue. |
| 1 | Willem de Kooning | Charles Egan Gallery | New York | New York | (4/1/1951 to 4/30/1951) | no catalogue. |
| 2 | Willem de Kooning: Paintings on the Theme of ... | Sidney Janis Gallery | New York | New York | (3/16/1953 to 4/11/1953) | no catalogue. |
| 3 | Retrospective (de Kooning, 1935-53) | Organized by School of the Museum of Fine Arts | Boston | Massachusetts | (4/21/1953 to 7/3/1953) | catalogue. |
| 4 | Recent Oils by Willem de Kooning | Martha Jackson Gallery | New York | New York | (11/9/1955 to 12/3/1955) | catalogue. |
| 5 | Willem de Kooning: Recent Paintings | Sidney Janis Gallery | New York | New York | (4/2/1956 to 4/28/1956) | no catalogue. |
| 6 | Willem de Kooning | Sidney Janis Gallery | New York | New York | (5/4/1959 to 6/1/1959) | no catalogue. |
| 7 | Willem de Kooning | Paul Kantor Gallery | Beverly Hills | California | (4/3/1961 to 4/29/1961) | catalogue. |
| 8 | Recent Paintings by Willem de Kooning | Sidney Janis Gallery | New York | New York | (3/5/1962 to 3/31/1962) | catalogue. |
| 9 | 'Woman' Drawings by Willem de Kooning | James Goodman Gallery | Buffalo | New York | (1/10/1964 to 1/25/1964) | catalogue. |


In [20]:
count_dict = {}
for element in df['catalogue']:
    if element not in count_dict:
        count_dict[element] = 1  # the first occurrence counts as one
    else:
        count_dict[element] += 1

print(count_dict)

{' no catalogue.': 47, ' catalogue.': 79, ' catalogue': 2, ' brochure.': 4}
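The counts above keep `' catalogue'` and `' catalogue.'` as separate keys because the labels differ only in the trailing period. A minimal sketch of normalising the labels before counting, using `collections.Counter`; the helper name `normalise_label` and the sample list are made up for illustration and are not the scraped data:

```python
from collections import Counter


def normalise_label(label):
    """Strip surrounding spaces and a trailing period, then lower-case the label."""
    return label.strip().rstrip(".").lower()


# Illustrative labels only, mimicking the variants seen in the 'catalogue' column
sample_labels = [" no catalogue.", " catalogue.", " catalogue", " brochure.", " catalogue."]

counts = Counter(normalise_label(label) for label in sample_labels)
print(counts)
```

On the DataFrame itself, the same idea could be written as `df['catalogue'].map(normalise_label).value_counts()`.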


In [21]:
import pandas as pd

def extract_date_range(date_str):
    # Split the date range string
    dates = date_str.strip('()').split(' to ')
    # Convert each date to datetime format
    return [pd.to_datetime(date, errors='coerce') for date in dates]

# Apply the function to the 'date' column to create a new column with datetime objects
df['date_range'] = df['date'].apply(extract_date_range)

# Function to check if any part of the date range falls within the desired range (2004-2014)
def within_desired_range(date_range):
    start_date = pd.Timestamp('2004-01-01')
    end_date = pd.Timestamp('2014-12-31')
    return any((start_date <= date <= end_date) for date in date_range if pd.notnull(date))

# Filter rows based on whether any part of the date range falls within the desired range
# .copy() makes filtered_df an independent DataFrame, which avoids SettingWithCopyWarning
filtered_df = df[df['date_range'].apply(within_desired_range)].copy()

filtered_df.reset_index(drop=True, inplace=True)

filtered_df.drop('date_range', axis=1, inplace=True)

# Display the filtered DataFrame
filtered_df


| | Exhibition_name | Venue | City | State | date | catalogue |
|---|---|---|---|---|---|---|
| 0 | Willem de Kooning: Works on Paper and Selected... | Organized by Paul Thiebaud Gallery | San Francisco | California | (1/8/2002 to 6/18/2005) | catalogue. |
| 1 | Willem de Kooning: A Centennial Exhibition | Gagosian Gallery | New York | New York | (4/24/2004 to 6/19/2004) | catalogue. |
| 2 | de Kooning: Paintings from the Forties and Fi... | Richard Gray Gallery | New York | New York | (5/1/2004 to 5/29/2004) | no catalogue. |
| 3 | Garden in Delft: de Kooning Landscapes, 1928-... | Mitchell-Innes & Nash | New York | New York | (5/3/2004 to 6/26/2004) | catalogue. |
| 4 | Willem de Kooning | Organized by BA-CA Kunstforum | Vienna | Austria | (1/13/2005 to 7/3/2005) | catalogue. |
| 5 | Willem de Kooning: Zeichnungen, Aquarelle, Pas... | Galerie Fred Jahn | Munich | Germany | (9/9/2005 to 10/14/2005) | no catalogue. |
| 6 | Willem de Kooning: Paintings, 1975-1978 | L&M Arts | New York | New York | (4/20/2006 to 6/3/2006) | catalogue. |
| 7 | Willem de Kooning: Sketchbook | Matthew Marks Gallery | New York | New York | (5/5/2006 to 6/17/2006) | catalogue. |
| 8 | Willem de Kooning: Slipping Glimpses 1920s to... | Allan Stone Gallery Art Basel Miami Beach | Miami Beach | Florida | (12/7/2006 to 12/10/2006) | catalogue. |
| 9 | Willem de Kooning: Women | Craig F. Starr Associates | New York | New York | (4/12/2007 to 6/8/2007) | brochure. |
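The filter in this cell keeps a row when its start or end date lies inside 2004-2014, so an exhibition that began before 2004 and ended after 2014 would be left out. A minimal sketch of an interval-overlap test that would also catch that case is given below; it uses only the standard library, the helper names `parse_range` and `overlaps` are hypothetical, and the second example date range is invented for illustration.

```python
from datetime import date, datetime


def parse_range(date_str):
    """Parse '(m/d/yyyy to m/d/yyyy)' into a (start, end) pair of dates."""
    start_str, end_str = date_str.strip().strip("()").split(" to ")
    fmt = "%m/%d/%Y"
    return (datetime.strptime(start_str, fmt).date(),
            datetime.strptime(end_str, fmt).date())


def overlaps(start, end, window_start, window_end):
    """True if [start, end] and [window_start, window_end] share at least one day."""
    return start <= window_end and end >= window_start


WINDOW = (date(2004, 1, 1), date(2014, 12, 31))

examples = [
    "(1/8/2002 to 6/18/2005)",   # ends inside the window
    "(6/1/2003 to 3/1/2015)",    # invented: spans the whole window
    "(4/12/1948 to 5/12/1948)",  # entirely before the window
]

for text in examples:
    start, end = parse_range(text)
    print(text, overlaps(start, end, *WINDOW))
```

Two closed intervals overlap exactly when each one starts no later than the other ends, which is why a single two-part comparison is enough.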
