### Complete list of de Kooning's one-man exhibitions
In our research we also considered the number and the venue of exhibitions as parameters for checking whether the artist's reputation changed over the years. However, we found no complete dataset of artists' exhibitions. To estimate how many exhibitions are documented in catalogues and, in particular, how many can be traced through bibliographic records in BnF and Gallica, we needed a "ground truth" against which to judge how comprehensive those sources are.

The case study is Willem de Kooning, because the Willem de Kooning Foundation publishes data on all of his exhibitions on its website.
The results of the web scraping are shown here: 131 exhibitions in total, 81 of which have a catalogue.

In bibliography_DK.ipynb we queried the BnF SPARQL endpoint and the Google Books API to collect all bibliographic records on de Kooning; 31 of them are records of exhibitions. These records therefore cover at most about 38% of the exhibitions with a catalogue (31 of 81) and about 24% of all exhibitions (31 of 131). Note that the extracted bibliographic records are not limited to one-man shows, so they include shows that are absent from the dataset reported below, and the actual coverage of one-man shows may be lower.
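The coverage figures follow directly from the three counts quoted in this section. A minimal check, taking those counts as given (the `coverage` helper and the variable names are introduced here for illustration only):

```python
# Counts quoted in the text of this section
total_exhibitions = 131   # one-man exhibitions scraped from the Foundation's site
with_catalogue = 81       # exhibitions listed with a catalogue
matched_records = 31      # exhibition records found via BnF and Google Books


def coverage(matched: int, total: int) -> float:
    """Return the share of `total` covered by `matched`, as a percentage."""
    return 100 * matched / total


catalogue_coverage = coverage(matched_records, with_catalogue)
overall_coverage = coverage(matched_records, total_exhibitions)

print(f"Exhibitions with catalogue covered: {catalogue_coverage:.1f}%")  # 38.3%
print(f"All exhibitions covered: {overall_coverage:.1f}%")               # 23.7%
```

Because the 31 records may include shows that are not one-man exhibitions, both values are best read as upper bounds.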


In [1]:
import requests 
from bs4 import BeautifulSoup as bs 

URLs = [
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1940',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1950',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1960',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1970',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1980',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/1990',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/2000',
    'https://www.dekooning.org/the-artist/exhibitions/past/one-man/2010'
] 

titles_list = []

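for url in URLs:
    req = requests.get(url, timeout=30)
    req.raise_for_status()  # stop early if a page cannot be fetched
    soup = bs(req.text, 'html.parser')

    # Each exhibition record is a <p> element carrying these two CSS classes
    titles = soup.find_all('p', class_="unit_title spacing_03")

    for title in titles:
        # Turn non-breaking spaces and line breaks into ';' field separators
        titles_list.append(title.text.strip().replace("\xa0\n", ";").replace("\xa0", ";").replace('\n', ';'))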


titles_list

['de Kooning;Charles Egan Gallery, New York, New York, (4/12/1948 to 5/12/1948), no catalogue.',
 'Willem de Kooning;Charles Egan Gallery, New York, New York, (4/1/1951 to 4/30/1951), no catalogue.',
 'Willem de Kooning:  Paintings on the Theme of the Woman;Sidney Janis Gallery, New York, New York, (3/16/1953 to 4/11/1953), no catalogue.',
 'Retrospective (de Kooning, 1935-53);Organized by School of the Museum of Fine Arts, Boston, Massachusetts, (4/21/1953 to 7/3/1953), catalogue.',
 'Recent Oils by Willem de Kooning;Martha Jackson Gallery, New York, New York, (11/9/1955 to 12/3/1955), catalogue.',
 'Willem de Kooning:  Recent Paintings;Sidney Janis Gallery, New York, New York, (4/2/1956 to 4/28/1956), no catalogue.',
 'Willem de Kooning;Sidney Janis Gallery, New York, New York, (5/4/1959 to 6/1/1959), no catalogue.',
 'Willem de Kooning;Paul Kantor Gallery, Beverly Hills, California, (4/3/1961 to 4/29/1961), catalogue.',
 'Recent Paintings by Willem de Kooning;Sidney Janis Gallery, N
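The eight decade pages differ only in their final path segment, so the list can also be generated instead of typed out. A minimal sketch, assuming the site keeps one page per decade from 1940 to 2010 as in the list above; `BASE_URL` and `decade_urls` are names introduced here for illustration:

```python
BASE_URL = 'https://www.dekooning.org/the-artist/exhibitions/past/one-man'


def decade_urls(first: int = 1940, last: int = 2010) -> list:
    """Return one URL per decade between `first` and `last`, inclusive."""
    return [f'{BASE_URL}/{decade}' for decade in range(first, last + 10, 10)]


URLs = decade_urls()
print(len(URLs))  # 8
```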

In [2]:
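# Keywords that follow a comma inside a venue or place name. The comma before
# each keyword is removed so that the later split on ',' yields the expected
# number of fields. Most values equal their keys; "Ontario" is dropped.
replacements = {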
    "Inc.": "Inc.",
    "and": "and",
    "Science": "Science",
    "Ontario": "",
    "The": "The",
    "Palazzo": "Palazzo",
    "Droll": "Droll",
    "Fourcade": " Fourcade",
    "University": "University",
    "Ishibashi": "Ishibashi",
    "Smithsonian": "Smithsonian",
    "Millbrook": "Millbrook",
    "Seattle": " Seattle",
    "World": "World",
    "Carnegie": "Carnegie",
    "Akademie ": "Akademie ",
    "Berkeley": "Berkeley",
    "Wellesley": "Wellesley",
    "Mitchell-Innes": "Mitchell-Innes",
    "Art": "Art",
    "Colorado": "Colorado"
}

new_list = []

# Iterate through each string in the original list
for item in titles_list:
    # Replace ';(' with ' ('
    item = item.replace(';(', ' (')
    
    # Find the earliest occurrence of "catalogue." or "brochure."
    # ("no catalogue." is matched through "catalogue.")
    matches = [(item.find(marker), marker) for marker in ("catalogue.", "brochure.")]
    matches = [(index, marker) for index, marker in matches if index != -1]

    # Cut the string right after the matched marker, using that marker's own
    # length; if neither marker is present, keep the string unchanged
    if matches:
        first_occurrence_index, marker = min(matches)
        item = item[:first_occurrence_index + len(marker)]
    
    parts = item.split(';')

    if len(parts) == 2:
        second_part = parts[1]
        # Iterate through each keyword in the replacements dictionary
        for keyword, replacement in replacements.items():
            if ", " in second_part and keyword in second_part:
                # Get the index of the keyword
                keyword_index = second_part.index(keyword)
                # Get the index of the last ", " before the keyword
                comma_index = second_part.rfind(", ", 0, keyword_index)
                # Replace ", " with " " before the keyword
                if comma_index != -1:  # Ensure ", " was found before the keyword
                    second_part = second_part[:comma_index] + " " + second_part[comma_index + 2:]
                # Replace the keyword with the corresponding replacement
                second_part = second_part.replace(keyword, replacement)
        # Split the second part (after ';') by ','
        second_parts = second_part.split(',')
        # If the split produced more than 5 fields, drop the second one
        if len(second_parts) > 5:
            del second_parts[1]
        # Create a sublist with the first part and the second parts
        sublist = [parts[0]] + second_parts
        # Append the sublist to the new list
        new_list.append(sublist)

print(new_list)


[['de Kooning', 'Charles Egan Gallery', ' New York', ' New York', ' (4/12/1948 to 5/12/1948)', ' no catalogue.'], ['Willem de Kooning', 'Charles Egan Gallery', ' New York', ' New York', ' (4/1/1951 to 4/30/1951)', ' no catalogue.'], ['Willem de Kooning:  Paintings on the Theme of the Woman', 'Sidney Janis Gallery', ' New York', ' New York', ' (3/16/1953 to 4/11/1953)', ' no catalogue.'], ['Retrospective (de Kooning, 1935-53)', 'Organized by School of the Museum of Fine Arts', ' Boston', ' Massachusetts', ' (4/21/1953 to 7/3/1953)', ' catalogue.'], ['Recent Oils by Willem de Kooning', 'Martha Jackson Gallery', ' New York', ' New York', ' (11/9/1955 to 12/3/1955)', ' catalogue.'], ['Willem de Kooning:  Recent Paintings', 'Sidney Janis Gallery', ' New York', ' New York', ' (4/2/1956 to 4/28/1956)', ' no catalogue.'], ['Willem de Kooning', 'Sidney Janis Gallery', ' New York', ' New York', ' (5/4/1959 to 6/1/1959)', ' no catalogue.'], ['Willem de Kooning', 'Paul Kantor Gallery', ' Beverly H
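The keyword dictionary above exists because venue names can themselves contain commas. A possible alternative is to split each record from the right, since the last four fields have a fixed meaning. The sketch below assumes every record ends with exactly `city, state, (dates), status.` and that any text after the catalogue marker has already been cut off; records with a longer location (for example city, province and country) would still need special handling. `parse_record` is a hypothetical helper, and the second sample record is invented to show a comma inside a venue name.

```python
def parse_record(record: str):
    """Split 'title;venue, city, state, (dates), status.' from the right."""
    title, _, rest = record.partition(';')
    fields = rest.rsplit(', ', 4)  # at most 4 splits, counted from the right
    if len(fields) != 5:
        return None  # record does not follow the assumed layout
    venue, city, state, dates, status = fields
    return {
        'Exhibition_name': title.strip(),
        'Venue': venue.strip(),
        'City': city.strip(),
        'State': state.strip(),
        'date': dates.strip().strip('()'),
        'catalogue': status.strip().rstrip('.'),
    }


# First record from the scraped list above
real = 'de Kooning;Charles Egan Gallery, New York, New York, (4/12/1948 to 5/12/1948), no catalogue.'
# Hypothetical record with a comma inside the venue name
made_up = 'Example Show;Example Gallery, Inc., New York, New York, (1/1/1970 to 2/1/1970), catalogue.'

print(parse_record(real)['Venue'])     # Charles Egan Gallery
print(parse_record(made_up)['Venue'])  # Example Gallery, Inc.
```

Splitting from the right keeps any extra commas in the venue field, at the cost of assuming a fixed number of trailing fields.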

In [3]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Initialize empty lists for each column
exhibition = []
venue = []
city = []
state = []
date = []
catalogue = []

# Populate the lists from new_list, skipping records with fewer than 6 fields
for i in new_list:
    if len(i) >= 6:
        exhibition.append(i[0])
        venue.append(i[1])
        city.append(i[2])
        state.append(i[3])
        date.append(i[4])
        catalogue.append(i[5])


df = pd.DataFrame(columns=["Exhibition_name", "Venue", "City", 'State', 'date', 'catalogue'])

df['Exhibition_name'] = exhibition
df['Venue'] = venue
df['City'] = city
df['State'] = state
df['date'] = date
df['catalogue'] = catalogue

for i, item in enumerate(df['catalogue']):
    if "." in item:
        x = item.split('.')
        df.at[i, 'catalogue'] = x[0]
    
df.head()

| | Exhibition_name | Venue | City | State | date | catalogue |
|---|---|---|---|---|---|---|
| 0 | de Kooning | Charles Egan Gallery | New York | New York | (4/12/1948 to 5/12/1948) | no catalogue |
| 1 | Willem de Kooning | Charles Egan Gallery | New York | New York | (4/1/1951 to 4/30/1951) | no catalogue |
| 2 | Willem de Kooning: Paintings on the Theme of ... | Sidney Janis Gallery | New York | New York | (3/16/1953 to 4/11/1953) | no catalogue |
| 3 | Retrospective (de Kooning, 1935-53) | Organized by School of the Museum of Fine Arts | Boston | Massachusetts | (4/21/1953 to 7/3/1953) | catalogue |
| 4 | Recent Oils by Willem de Kooning | Martha Jackson Gallery | New York | New York | (11/9/1955 to 12/3/1955) | catalogue |


In [4]:
count_dict = {}
for element in df['catalogue']:
    if element not in count_dict:
        count_dict[element] = 1  
    else:
        count_dict[element] += 1  

print(count_dict)

{' no catalogue': 47, ' catalogue': 81, ' brochure': 4}
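The keys printed above still carry a leading space left over from splitting on `','`. A minimal sketch of the same tally with `collections.Counter`, stripping whitespace first; the short `statuses` list is a made-up stand-in for `df['catalogue']`:

```python
from collections import Counter

# Hypothetical sample standing in for the values of df['catalogue']
statuses = [' no catalogue', ' catalogue', ' catalogue', ' brochure', ' no catalogue']

# Strip surrounding whitespace so that ' catalogue' and 'catalogue' count as one key
counts = Counter(status.strip() for status in statuses)

print(dict(counts))
```

Cleaning the values once, before counting, also avoids mismatches when the column is later compared against plain strings such as `'catalogue'`.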


In [5]:
import pandas as pd

for i, item in enumerate(df['date']):

    parts = item.strip('()').split(' to ')
    modified_dates = []
    for date_str in parts:
        if "??/??" in date_str:
            month_and_year = date_str.split('/')[-1]  # Extract month and year
            modified_date = "01/01/" + month_and_year  # Replace day with "01"
            modified_dates.append(modified_date)
        elif '??' in date_str:
            month_and_year = date_str.split('/')[0] + '/01/' + date_str.split('/')[-1]
            modified_dates.append(month_and_year)
        else:
            modified_dates.append(date_str)
    df.at[i, 'date'] = ' to '.join(modified_dates)
    

def extract_starting_range(date_str):
    # Split the date range string on ' to '
    dates = date_str.strip('( )').split(' to ')
    if len(dates) == 1:  # If ' to ' is not found, try splitting on an en dash
        dates = date_str.strip('( )').split('\u2013')
    if len(dates) < 2:
        return None  
    
    starting_date = dates[0]
    
    # Convert the starting date to datetime format and extract the date part
    return pd.to_datetime(starting_date, errors='coerce').date()


def extract_ending_range(date_str):
    dates = date_str.strip('( )').split(' to ')
    if len(dates) == 1:  # If ' to ' is not found, try splitting on an en dash
        dates = date_str.strip('( )').split('\u2013')
    if len(dates) < 2:
        return None  
    ending_date = dates[1]
    
    # Convert the ending date to datetime format and extract the date part
    return pd.to_datetime(ending_date, errors='coerce').date()


# Apply both functions to the 'date' column to create start and end date columns
df['startingdate'] = df['date'].apply(extract_starting_range)
df['endingdate'] = df['date'].apply(extract_ending_range)



df.reset_index(drop=True, inplace=True)

df.drop('date', axis=1, inplace=True)

# Display the first rows of the resulting DataFrame
df.head()


| | Exhibition_name | Venue | City | State | catalogue | startingdate | endingdate |
|---|---|---|---|---|---|---|---|
| 0 | de Kooning | Charles Egan Gallery | New York | New York | no catalogue | 1948-04-12 | 1948-05-12 |
| 1 | Willem de Kooning | Charles Egan Gallery | New York | New York | no catalogue | 1951-04-01 | 1951-04-30 |
| 2 | Willem de Kooning: Paintings on the Theme of ... | Sidney Janis Gallery | New York | New York | no catalogue | 1953-03-16 | 1953-04-11 |
| 3 | Retrospective (de Kooning, 1935-53) | Organized by School of the Museum of Fine Arts | Boston | Massachusetts | catalogue | 1953-04-21 | 1953-07-03 |
| 4 | Recent Oils by Willem de Kooning | Martha Jackson Gallery | New York | New York | catalogue | 1955-11-09 | 1955-12-03 |
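The date handling above can also be expressed as a single helper that returns both ends of a range. The following is a sketch using only the standard library; it assumes dates are written as month/day/year, that unknown components appear as `??`, and that ranges are joined by `' to '` or an en dash, as in the cell above. `parse_one` and `parse_range` are illustrative names, and the second sample string is invented.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def parse_one(text: str) -> Optional[date]:
    """Parse 'm/d/yyyy'; an unknown month or day ('??') defaults to 1."""
    try:
        month, day, year = text.strip().split('/')
        month = '1' if '?' in month else month
        day = '1' if '?' in day else day
        return datetime.strptime(f'{month}/{day}/{year}', '%m/%d/%Y').date()
    except ValueError:
        return None  # wrong number of components or not a valid date


def parse_range(text: str) -> Tuple[Optional[date], Optional[date]]:
    """Return (start, end) for a string such as '(4/12/1948 to 5/12/1948)'."""
    cleaned = text.strip().strip('()')
    for separator in (' to ', '\u2013'):
        if separator in cleaned:
            start, end = cleaned.split(separator, 1)
            return parse_one(start), parse_one(end)
    return None, None


print(parse_range(' (4/12/1948 to 5/12/1948)'))
print(parse_range('(??/??/1975 to 6/??/1975)'))  # hypothetical incomplete dates
```

Returning `None` for unparseable input keeps incomplete records in the table instead of raising an error midway through the column.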


### Web scraping of books and articles

In [1]:
import requests 
from bs4 import BeautifulSoup as bs 

URLs = [
'https://www.dekooning.org/the-artist/bibliography'
] 

titles_l = []

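for url in URLs:
    req = requests.get(url, timeout=30)
    req.raise_for_status()  # stop early if the page cannot be fetched
    soup = bs(req.text, 'html.parser')

    # Each bibliography entry is a <div> element carrying these two CSS classes
    titles = soup.find_all('div', class_="unit_copy spacing_03")

    for title in titles:
        # Turn non-breaking spaces into ';' separators and flatten line breaks and tabs
        titles_l.append(title.text.strip().replace("\xa0\n", ";").replace("\xa0", ";").replace('\n', ' ').replace("\t", " ").replace(":;;", ": ").replace(";; With", " With"))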


titles_l

['Greenberg, Clement.;;de Kooning, 1935-53.;; Exh. cat.;;Boston: School of the Museum of Fine Arts,;;1953.',
 'Hess, Thomas B.;;Willem de Kooning.;; The Great American Artists Series.;;New York: George Braziller,;;1959.',
 'Ashton, Dore, and Willem de Kooning.;;Willem de Kooning.;; Exh. cat.;;Northampton: Smith College Museum of Art,;;1965.',
 'de Kooning Drawings. With a statement by Willem de Kooning.;;New York: Walker & Company for M. Knoedler & Co.,;;1967.',
 'Drudi, Gabriella.;;Willem de Kooning.;;Milan: Fratelli Fabbri Editori,;;1972.',
 'Hess, Thomas B.;;Willem de Kooning Drawings.;;Greenwich: New York Graphic Society,;;1972.',
 'Rosenberg, Harold.;;Willem de Kooning. With statements by and interview with Willem de Kooning.;;New York: Harry N. Abrams, Inc.,;;1973.',
 'Waldman, Diane.;;Willem de Kooning in East Hampton.;; Exh. cat.;;New York: The Solomon R. Guggenheim Foundation,;;1978.',
 'Willem de Kooning: Pittsburgh International Series. With preface by Leon Anthony Arkus, se

In [2]:


new_lis = []

# Split each bibliography entry into its ';;'-separated fields
for item in titles_l:
    parts = item.split(';;')
    new_lis.append(parts)

print(new_lis)


[['Greenberg, Clement.', 'de Kooning, 1935-53.', ' Exh. cat.', 'Boston: School of the Museum of Fine Arts,', '1953.'], ['Hess, Thomas B.', 'Willem de Kooning.', ' The Great American Artists Series.', 'New York: George Braziller,', '1959.'], ['Ashton, Dore, and Willem de Kooning.', 'Willem de Kooning.', ' Exh. cat.', 'Northampton: Smith College Museum of Art,', '1965.'], ['de Kooning Drawings. With a statement by Willem de Kooning.', 'New York: Walker & Company for M. Knoedler & Co.,', '1967.'], ['Drudi, Gabriella.', 'Willem de Kooning.', 'Milan: Fratelli Fabbri Editori,', '1972.'], ['Hess, Thomas B.', 'Willem de Kooning Drawings.', 'Greenwich: New York Graphic Society,', '1972.'], ['Rosenberg, Harold.', 'Willem de Kooning. With statements by and interview with Willem de Kooning.', 'New York: Harry N. Abrams, Inc.,', '1973.'], ['Waldman, Diane.', 'Willem de Kooning in East Hampton.', ' Exh. cat.', 'New York: The Solomon R. Guggenheim Foundation,', '1978.'], ['Willem de Kooning: Pittsbur

In [14]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Initialize empty lists for each column
author = []
title = []
doc_type = []  # named doc_type so the built-in type() is not shadowed
publisher = []
date = []

# Populate the lists from new_lis; entries with fewer than 4 fields are skipped
for i in new_lis:
    if len(i) >= 5:
        # Five fields: author, title, type, publisher, date
        author.append(i[0])
        title.append(i[1])
        if ' Exh. cat.' in i[2] or 'PhD ' in i[2] or 'Series' in i[2]:
            doc_type.append(i[2])
        else:
            doc_type.append(None)
        publisher.append(i[3])
        date.append(i[4])
    elif len(i) >= 4:
        # Four fields: author, title, publisher, date (treated as a book)
        author.append(i[0])
        title.append(i[1])
        doc_type.append('Book')
        publisher.append(i[2])
        date.append(i[3])


# Create DataFrame
df = pd.DataFrame({"author": author, "title": title, "type": doc_type, 'publisher': publisher, 'date': date})

    
df

| | author | title | type | publisher | date |
|---|---|---|---|---|---|
| 0 | Greenberg, Clement. | de Kooning, 1935-53. | Exh. cat. | Boston: School of the Museum of Fine Arts, | 1953. |
| 1 | Hess, Thomas B. | Willem de Kooning. | The Great American Artists Series. | New York: George Braziller, | 1959. |
| 2 | Ashton, Dore, and Willem de Kooning. | Willem de Kooning. | Exh. cat. | Northampton: Smith College Museum of Art, | 1965. |
| 3 | Drudi, Gabriella. | Willem de Kooning. | Book | Milan: Fratelli Fabbri Editori, | 1972. |
| 4 | Hess, Thomas B. | Willem de Kooning Drawings. | Book | Greenwich: New York Graphic Society, | 1972. |
| 5 | Rosenberg, Harold. | Willem de Kooning. With statements by and inte... | Book | New York: Harry N. Abrams, Inc., | 1973. |
| 6 | Waldman, Diane. | Willem de Kooning in East Hampton. | Exh. cat. | New York: The Solomon R. Guggenheim Foundation, | 1978. |
| 7 | Willem de Kooning: Pittsburgh International Se... | Exh. cat. | Book | Pittsburgh: Museum of Art, Carnegie Institute, | 1979. |
| 8 | Cummings, Paul, Jörn Merkert, and Claire Stoul... | Willem de Kooning: Drawings, Paintings, Sculpt... | Book | Exh. cat. | New York: ;Whitney Museum of American Art, 19... |
| 9 | Gaugh, Harry F. | Willem de Kooning. | Modern Masters Series, vol. 2. | New York: Abbeville Press, | 1983. |
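Rows 7 and 8 above are shifted because positional indexing assumes every entry starts with an author. One possible alternative is to read each entry from the right: the last two fields are taken as publisher and date, an optional type marker is removed, and whatever remains is author and title, or title alone. This is only a sketch; `parse_entry` and `TYPE_MARKERS` are hypothetical names, the `'Book'` default mirrors the four-field branch above, and the sample lists are copied from the scraped output.

```python
TYPE_MARKERS = ('Exh. cat.', 'PhD', 'Series')


def parse_entry(parts):
    """Map a ';;'-split bibliography entry to named fields, reading from the right."""
    parts = [p.strip() for p in parts]
    if len(parts) < 3:
        return None
    *head, publisher, year = parts  # assumed: the last two fields are publisher and date
    doc_type = 'Book'
    # A trailing field containing a known marker is treated as the document type
    if len(head) > 1 and any(marker in head[-1] for marker in TYPE_MARKERS):
        doc_type = head.pop()
    if len(head) == 2:
        author, title = head
    elif len(head) == 1:
        author, title = None, head[0]  # entry without an author
    else:
        return None
    return {'author': author, 'title': title, 'type': doc_type,
            'publisher': publisher.rstrip(','), 'date': year.rstrip('.')}


with_author = ['Greenberg, Clement.', 'de Kooning, 1935-53.', ' Exh. cat.',
               'Boston: School of the Museum of Fine Arts,', '1953.']
without_author = ['de Kooning Drawings. With a statement by Willem de Kooning.',
                  'New York: Walker & Company for M. Knoedler & Co.,', '1967.']

print(parse_entry(with_author))
print(parse_entry(without_author))
```

The marker test is a substring check, so a title that happens to contain "Series" and directly precedes the publisher would be misread as a type. Entries whose publisher field is itself split, such as row 8, would also still need manual correction.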
