Web scraping collections data and general bibliography


### query on data.bnf 

I could not run this query directly from Python, so it was performed on the <a href="https://data.bnf.fr/sparql/">data.bnf</a> SPARQL web endpoint:
```
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bnf-onto: <http://data.bnf.fr/ontology/bnf-onto/>
SELECT *
WHERE {
  ?work dct:title ?title ;
        dct:publisher ?publisher ;
        dct:date ?date ;
        rdfs:seeAlso ?uri ;
        bnf-onto:isbn ?isbn ;
        dct:creator ?creator .
  ?creator foaf:name ?name .
  FILTER (bif:contains(?title, "De_Kooning"))
}
```

The query was run on the web endpoint and the results were downloaded as CSV. The query works, but no author is returned.
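If the query ever needs to be rerun from a script, one possible route is the SPARQL protocol over plain HTTP. The sketch below only builds the request URL with the standard library and sends nothing. It assumes that the endpoint at `https://data.bnf.fr/sparql` reads the query from a `query` GET parameter and honours a `format` parameter; the `format` parameter and its `text/csv` value are assumptions about the endpoint that were not verified here, and `build_sparql_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the endpoint follows the SPARQL protocol and reads the query
# from a `query` GET parameter; `format` is an assumed, endpoint-specific extra.
ENDPOINT = "https://data.bnf.fr/sparql"

def build_sparql_url(query, endpoint=ENDPOINT, result_format="text/csv"):
    """Return a GET URL that carries the SPARQL query as a URL-encoded parameter."""
    params = {"query": query, "format": result_format}
    return endpoint + "?" + urlencode(params)

sample_query = 'SELECT ?work WHERE { ?work ?p ?o } LIMIT 1'
url = build_sparql_url(sample_query)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0] == sample_query)
```

The resulting URL could then be handed to `requests.get` or `pd.read_csv`; whether the endpoint actually answers with CSV for this parameter would need to be checked.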


```
PREFIX rdarelationships: <http://rdvocab.info/RDARelationshipsWEMI/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bnf-onto: <http://data.bnf.fr/ontology/bnf-onto/>
SELECT distinct ?work ?title ?creatorname ?date ?isbn
WHERE {
  ?work dct:title ?title ;
        dct:creator ?creator .
  ?creator foaf:name ?creatorname .
  ?work dct:date ?date .
  ?work rdarelationships:expressionOfWork ?expression .
  ?manifestation rdarelationships:expressionManifested ?expression .
  ?work bnf-onto:isbn ?isbn .
  FILTER (bif:contains(?title, "De_Kooning"))
}
```

This query also returns the author, together with the ISBN. The problem is that the results of the first and second queries do not match. I think this is because the second query works at the expression level (?)
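One way to see where the two result sets diverge is to compare the work URIs they return. The sketch below uses made-up URIs (placeholders, not rows from the actual CSV exports) and a hypothetical helper `compare_works`.

```python
def compare_works(first, second):
    """Split two collections of work URIs into shared and source-specific sets."""
    first, second = set(first), set(second)
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }

# Placeholder URIs for illustration only; real values would come from the
# `work` column of each downloaded CSV.
query1_works = ["ark:/example/a", "ark:/example/b", "ark:/example/c"]
query2_works = ["ark:/example/b", "ark:/example/c", "ark:/example/d"]

diff = compare_works(query1_works, query2_works)
for label, uris in diff.items():
    print(label, sorted(uris))
```

Works that appear only in the first set would be the ones to inspect for a missing expression or manifestation link.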

In [1]:
import pandas as pd
query1 = pd.read_csv("databnf_DK.csv")
# pd.set_option('display.max_rows', None)
query1


Unnamed: 0,work,title,publisher,date,uri,isbn
0,http://data.bnf.fr/ark:/12148/cb45692271m#about,Le rire de De Kooning,"Bordeaux : Olympique , 2019",2019,https://catalogue.bnf.fr/ark:/12148/cb45692271m,978-2-9557550-6-8
1,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,0714845817
2,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,0714873160
3,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,9780714845814
4,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,9780714873169
...,...,...,...,...,...,...
65,http://data.bnf.fr/ark:/12148/cb47124975c#about,Willem de Kooning,"Lyon : Fage éditions , DL 2022",2022,https://catalogue.bnf.fr/ark:/12148/cb47124975c,978-2-84975-687-4
66,http://data.bnf.fr/ark:/12148/cb399985699#about,"Willem de Kooning, een portret","Leiden : Menken Kasander & Wigman , 2005",2005,https://catalogue.bnf.fr/ark:/12148/cb399985699,90-74622-53-4
67,http://data.bnf.fr/ark:/12148/cb46970628b#about,"Chaïm Soutine, Willem de Kooning, La peinture ...","Paris : ""Beaux-arts"" éditions , DL 2021",2021,https://catalogue.bnf.fr/ark:/12148/cb46970628b,979-10-204-0639-2
68,http://data.bnf.fr/ark:/12148/cb41066822t#about,"Willem De Kooning : works, writings and interv...","Barcelona : Ed. Polígrafa , 2007",2007,https://catalogue.bnf.fr/ark:/12148/cb41066822t,978-84-343-1138-1


In [2]:
# Strip hyphens from every ISBN so that the values can be compared across sources
query1["isbn"] = query1["isbn"].str.replace("-", "", regex=False)
query1.head()

Unnamed: 0,work,title,publisher,date,uri,isbn
0,http://data.bnf.fr/ark:/12148/cb45692271m#about,Le rire de De Kooning,"Bordeaux : Olympique , 2019",2019,https://catalogue.bnf.fr/ark:/12148/cb45692271m,9782955755068
1,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,714845817
2,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,714873160
3,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,9780714845814
4,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,9780714873169
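
The export mixes ISBN-10 and ISBN-13 values for the same edition (for example `0714845817` and `9780714845814`), so matching on the raw string can miss pairs. A possible normalization, sketched here with a hypothetical helper `to_isbn13`, converts every ISBN-10 to its ISBN-13 form by prefixing `978` and recomputing the check digit.

```python
def to_isbn13(isbn):
    """Return the ISBN-13 form of an ISBN-10 or ISBN-13 string; None if neither."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        core = "978" + digits[:9]  # drop the old ISBN-10 check character
        # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None  # e.g. identifiers such as "OCLC:12250843"

print(to_isbn13("0714845817"))
print(to_isbn13("978-2-9557550-6-8"))
```

The helper does not validate the incoming ISBN-10 check character; it only re-derives the ISBN-13 one.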


### query on Google Books API

In [3]:
import requests
import json

def fetch_books(query, max_results=40):
    base_url = "https://www.googleapis.com/books/v1/volumes"
    start_index = 0
    all_results = []

    while True:
        params = {
            "q": query,
            "startIndex": start_index,
            "maxResults": max_results
        }
        response = requests.get(base_url, params=params, timeout=30)

        if response.status_code == 200:
            data = response.json()
            items = data.get("items", [])
            if not items:
                break
            all_results.extend(items)
            start_index += max_results
        else:
            print("Failed to retrieve data. Status code:", response.status_code)
            break

    return all_results

dk_books = fetch_books("De Kooning")

# Save the raw API results to a JSON file
with open("dkbooks.json", "w") as json_file:
    json.dump(dk_books, json_file, indent=4)

print("JSON data saved to dkbooks.json")

JSON data saved to dkbooks.json
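
The loop above stops only when a page comes back empty. A slightly more defensive variant, sketched below, separates the paging logic from the HTTP call so that it can be capped and tested offline. `paginate` and `fake_page` are hypothetical names; the page size of 40 mirrors the `max_results` value used above.

```python
def paginate(fetch_page, page_size=40, max_pages=25):
    """Collect items page by page until a page is empty or max_pages is reached.

    fetch_page(start_index, page_size) must return a list of items.
    """
    results = []
    for page in range(max_pages):
        items = fetch_page(page * page_size, page_size)
        if not items:
            break
        results.extend(items)
    return results

# Stand-in for the real HTTP call: serves 95 fake records in slices.
FAKE_RECORDS = [{"id": n} for n in range(95)]

def fake_page(start_index, page_size):
    return FAKE_RECORDS[start_index:start_index + page_size]

books = paginate(fake_page)
print(len(books))
```

In the notebook, `fetch_page` would wrap the `requests.get` call with `startIndex` and `maxResults` and return the `items` list of the response.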


In [4]:
import json
import pandas as pd

# Load JSON data from file
with open("dkbooks.json", "r") as json_file:
    dk_books_data = json.load(json_file)

# Extract relevant fields from each book item
books_list = []
for book in dk_books_data:
    volume = book["volumeInfo"]
    book_info = {
        "Title": volume.get("title", "N/A"),
        "Subtitle": volume.get("subtitle", "N/A"),
        "Authors": ", ".join(volume.get("authors", ["N/A"])),
        "Publisher": volume.get("publisher", "N/A"),
        "PublishedDate": volume.get("publishedDate", "N/A"),
        # First industry identifier only: usually an ISBN, but it can belong to
        # another scheme (see the "UOM:" and "OCLC:" values in the output)
        "isbn": volume.get("industryIdentifiers", [{}])[0].get("identifier", "N/A"),
    }
    books_list.append(book_info)

# Create DataFrame
books_df = pd.DataFrame(books_list)

# Display DataFrame
books_df.head()


Unnamed: 0,Title,Subtitle,Authors,Publisher,PublishedDate,isbn
0,"De Kooning, dipinti, disegni, sculture",,Willem De Kooning,,1985,UOM:39015015825683
1,De Kooning,,,,1985,OCLC:12250843
2,Willem De Kooning,late paintings,"Willem De Kooning, Museo Carlo Bilotti",Mondadori Electa,2006,UOM:39015066851935
3,De Kooning,A Retrospective,"Willem De Kooning, John Elderfield, Lauren Mah...",The Museum of Modern Art,2011,9780870707971
4,Willem de Kooning,,Carolyn Lanchner,The Museum of Modern Art,2011,9780870707889


In [5]:
# Count the titles that mention "de Kooning" or "De Kooning"
liss = [value for value in books_df["Title"] if "de Kooning" in value or "De Kooning" in value]
print(len(liss))

151


In [6]:
import pandas as pd

# Keep titles that mention "de Kooning" (any case) and drop those about Elaine de Kooning
new_dataframe = books_df[books_df["Title"].str.contains("de Kooning", case=False) & ~books_df["Title"].str.contains("Elaine de Kooning", case=False)].copy()

new_dataframe.reset_index(drop=True, inplace=True)

new_dataframe.head()



Unnamed: 0,Title,Subtitle,Authors,Publisher,PublishedDate,isbn
0,"De Kooning, dipinti, disegni, sculture",,Willem De Kooning,,1985,UOM:39015015825683
1,De Kooning,,,,1985,OCLC:12250843
2,Willem De Kooning,late paintings,"Willem De Kooning, Museo Carlo Bilotti",Mondadori Electa,2006,UOM:39015066851935
3,De Kooning,A Retrospective,"Willem De Kooning, John Elderfield, Lauren Mah...",The Museum of Modern Art,2011,9780870707971
4,Willem de Kooning,,Carolyn Lanchner,The Museum of Modern Art,2011,9780870707889


In [7]:
import pandas as pd

# Merge the two DataFrames on the 'isbn' column (an inner join keeps only ISBNs found in both)
df_combined = pd.merge(query1, new_dataframe, on='isbn', how='inner')

# Display the new DataFrame with rows where ISBN is found in both DataFrames
df_combined


Unnamed: 0,work,title,publisher,date,uri,isbn,Title,Subtitle,Authors,Publisher,PublishedDate
0,http://data.bnf.fr/ark:/12148/cb45292173b#about,A way of living : the art of Willem De Kooning,"London : Phaidon Press Limited , 2017",2014,https://catalogue.bnf.fr/ark:/12148/cb45292173b,0714873160,Willem de Kooning,A Way of Living,Judith Zilczer,Phaidon Press,2017-05-22
1,http://data.bnf.fr/ark:/12148/cb37526494b#about,"Willem de Kooning : drawings, paintings, sculp...",New York : Whitney museum of American art ; Mu...,1983,https://catalogue.bnf.fr/ark:/12148/cb37526494b,0393018407,Willem de Kooning,"Drawings, Paintings, Sculpture, [mostra Itiner...","Paul Cummings, Willem De Kooning",,1983
2,http://data.bnf.fr/ark:/12148/cb37526494b#about,"Willem de Kooning : drawings, paintings, sculp...",New York : Whitney museum of American art ; Mu...,1983,https://catalogue.bnf.fr/ark:/12148/cb37526494b,0393018407,Willem de Kooning,"Drawings, Paintings, Sculpture, [mostra Itiner...","Paul Cummings, Willem De Kooning",,1983
3,http://data.bnf.fr/ark:/12148/cb45288067c#about,"Willem De Kooning, Zao Wou-Ki : [exposition Lé...",New York : Lévy Gorvy,2017,https://catalogue.bnf.fr/ark:/12148/cb45288067c,1944379126,De Kooning - Zao Wou-KI,,,Dominique Levy Gallery,2017-03-28
4,http://data.bnf.fr/ark:/12148/cb42265321b#about,Willem de Kooning : the artist's materials,"Los Angeles : Getty conservation institute , c...",2010,https://catalogue.bnf.fr/ark:/12148/cb42265321b,9781606060216,Willem de Kooning,The Artist's Materials,Susan Lake,Getty Publications,2010
5,http://data.bnf.fr/ark:/12148/cb35710872j#about,"Willem De Kooning : paintings, [exhibition, Na...",Washington : National gallery of art ; New Hav...,1994,https://catalogue.bnf.fr/ark:/12148/cb35710872j,0894682040,Willem de Kooning,Paintings,"Marla Prather, Willem De Kooning, David Sylves...",,1994-01-01
6,http://data.bnf.fr/ark:/12148/cb34982520m#about,"Willem de Kooning, recent paintings, 1983-1986...","London : Anthony d'Offay gallery , 1986",1986,https://catalogue.bnf.fr/ark:/12148/cb34982520m,094756408X,Willem de Kooning,"Recent Paintings, 1983-1986",Willem De Kooning,Anthony D'Offay Gallery,1986-01-01
7,http://data.bnf.fr/ark:/12148/cb347686735#about,"De Kooning : petit journal de l'exposition, 28...","Paris : Centre Georges Pompidou , 1984",1984,https://catalogue.bnf.fr/ark:/12148/cb347686735,285850234X,De Kooning,petit journal de l'exposition : Musee national...,"Musée national d'art moderne (Paris), Whitney ...",,1984
8,http://data.bnf.fr/ark:/12148/cb38841654h#about,"Willem de Kooning : tracing the figure, [exhib...",Los Angeles : Museum of contemporary art ; Pri...,2002,https://catalogue.bnf.fr/ark:/12148/cb38841654h,069109618X,Willem de Kooning,Tracing the Figure,"Willem De Kooning, Cornelia H. Butler, Paul Sc...",Princeton University Press,2002
9,http://data.bnf.fr/ark:/12148/cb356992193#about,Willem De Kooning,"Paris : l'Échoppe , 1994",1994,https://catalogue.bnf.fr/ark:/12148/cb356992193,2840680297,Willem De Kooning,,Edwin Denby,,1994
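
An inner merge keeps only the ISBNs present in both sources and repeats rows when a key occurs more than once on one side (rows 1 and 2 above are identical). To inspect what falls outside the intersection, an outer merge with `indicator=True` can help. The frames below are toy data, not the real exports.

```python
import pandas as pd

# Toy stand-ins for the BnF and Google Books frames (values are illustrative).
bnf = pd.DataFrame({"isbn": ["111", "222", "222", "333"],
                    "title": ["A", "B", "B", "C"]})
google = pd.DataFrame({"isbn": ["222", "333", "444"],
                       "Title": ["B", "C", "D"]})

# Drop repeated keys first so that each ISBN yields at most one merged row.
bnf_unique = bnf.drop_duplicates(subset="isbn")

merged = pd.merge(bnf_unique, google, on="isbn", how="outer", indicator=True)

# `_merge` records where each ISBN was found: left_only, right_only or both.
counts = merged["_merge"].value_counts()
print(counts)
```

Filtering `merged` on `_merge == "left_only"` would list the BnF records that Google Books did not return under the same ISBN.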


In [8]:
import pandas as pd

# Align the Google Books column names with the data.bnf ones
new_dataframe = new_dataframe.rename(
    columns={'Title': 'title', 'PublishedDate': 'date', 'Publisher': 'publisher'}
)

# Concatenate the DataFrames vertically
combined_df = pd.concat([query1, new_dataframe], ignore_index=True)

# Drop duplicates based on title, publisher and date. This also collapses the
# several ISBN rows that data.bnf returns for a single edition.
new_df = combined_df.drop_duplicates(subset=['title', 'publisher', 'date'], keep='first')

# Reset the index and drop the data.bnf-only columns; assigning the result
# (instead of using inplace=True) avoids pandas' SettingWithCopyWarning
new_df = new_df.reset_index(drop=True).drop(columns=['work', 'uri'])

# Move 'Subtitle' to the second position, right after 'title'
new_column_order = list(new_df.columns)
new_column_order.insert(1, new_column_order.pop(new_df.columns.get_loc('Subtitle')))
new_df = new_df[new_column_order]

# Display the deduplicated DataFrame with consistent column names
new_df


Unnamed: 0,title,Subtitle,publisher,date,isbn,Authors
0,Le rire de De Kooning,,"Bordeaux : Olympique , 2019",2019,9782955755068,
1,A way of living : the art of Willem De Kooning,,"London : Phaidon Press Limited , 2017",2014,0714845817,
2,"Hartung et les peintres lyriques : Schneider, ...",,Landerneau : Fonds Hélène & Édouard Leclerc po...,2016,9791096209002,
3,Willem de Kooning : drawing seeing-seeing draw...,,"New York : Arena , 1998",1998,0965728080,
4,"Les irascibles : Pollock, De Kooning, Rothko e...",,"Paris : le Cherche midi , DL 2023",2023,9782749176703,
...,...,...,...,...,...,...
172,Écrits et propos de Willem De Kooning,Les Fiches de lecture d'Universalis,Encyclopaedia Universalis,2015-11-10,9782852296558,"Encyclopaedia Universalis,"
173,"Six Painters: Mondrian, Guston, Kline, De Koon...","Exhibition. Catalogue. Houston, Texas, Februar...",,1967,OCLC:1414760875,
174,"Piet Mondrian, Hans Hofmann, Willem de Kooning",europäische Künstler in den USA - amerikanisch...,,2003,3793093239,Tobias Lander
175,Willem De Kooning,"die späten Gemälde, die 80er Jahre",,1996,9069181681,
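
The two sources format dates and publishers differently (`2014` versus `2017-05-22`, `London : Phaidon Press Limited , 2017` versus `Phaidon Press`), so deduplicating on `title`, `publisher` and `date` is likely to remove mainly repeats within a single source. If cross-source matching were wanted, a first step could be to reduce every date to a four-digit year. `extract_year` is a hypothetical helper, and the accepted range 1500-2099 is an arbitrary choice.

```python
import re

# A four-digit year between 1500 and 2099 that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")

def extract_year(value):
    """Return the first four-digit year (1500-2099) found in value, or None."""
    match = YEAR.search(str(value))
    return match.group(1) if match else None

for raw in ["2017-05-22", "DL 2022", 1985, "n.d."]:
    print(repr(raw), "->", extract_year(raw))
```

A normalized year column could then replace `date` in the `subset` passed to `drop_duplicates`.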


Check how many bibliographic records are actually exhibition catalogues. 

In [9]:
import pandas as pd

# Keywords that suggest an exhibition catalogue (English, Italian, French, German)
pattern = r'exhibition|exhib\.|mostra|catalogue|catalogo|exposition|retrospective|Ausstellung'

# Count total rows of the DataFrame
total_rows = len(new_df)

# Rows whose 'title' or 'Subtitle' contains one of the keywords
keyword_rows = new_df[new_df['title'].str.contains(pattern, case=False, na=False) |
                      new_df['Subtitle'].str.contains(pattern, case=False, na=False)]

# Get the count of rows containing the specified keywords
keyword_rows_count = len(keyword_rows)

print("Total rows in DataFrame:", total_rows)
print("Rows containing specified keywords:", keyword_rows_count)


Total rows in DataFrame: 177
Rows containing specified keywords: 32
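
The same keyword test can be written once as a compiled pattern and reused for both columns. The sketch below uses plain `re` on hand-written titles; `is_exhibition_record` is a hypothetical helper and the keyword list is copied from the cell above.

```python
import re

KEYWORDS = re.compile(
    r"exhibition|exhib\.|mostra|catalogue|catalogo|exposition|retrospective|Ausstellung",
    re.IGNORECASE,
)

def is_exhibition_record(title, subtitle=None):
    """True if the title or subtitle contains one of the catalogue keywords."""
    # Non-string values (None, NaN) are skipped, like na=False in pandas.
    return any(KEYWORDS.search(text) for text in (title, subtitle) if isinstance(text, str))

# Hand-written examples, not rows from the DataFrame.
samples = [
    ("De Kooning", "A Retrospective"),
    ("Willem de Kooning", None),
    ("De Kooning : petit journal de l'exposition", float("nan")),
]
flags = [is_exhibition_record(t, s) for t, s in samples]
print(flags)
```

Note that the German spelling `Retrospektive` is not matched by `retrospective`; such a record is only caught if another keyword (for example `Ausstellung`) also appears.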


In [10]:
import pandas as pd

# Keywords that suggest an exhibition catalogue (English, Italian, French, German)
pattern = r'exhibition|exhib\.|mostra|catalogue|catalogo|exposition|retrospective|Ausstellung'

# Mask for rows whose 'title' or 'Subtitle' contains one of the keywords
mask = new_df['title'].str.contains(pattern, case=False, na=False) | \
       new_df['Subtitle'].str.contains(pattern, case=False, na=False)

# Rows that match the keywords go to the exhibitions DataFrame
exhibitions_dataframe = new_df[mask].reset_index(drop=True)

# All remaining rows are kept as books
dfbooks = new_df[~mask].reset_index(drop=True)


# Display the exhibitions DataFrame containing rows where keywords are present
print("\nExhibitions DataFrame:")
exhibitions_dataframe



Exhibitions DataFrame:


Unnamed: 0,title,Subtitle,publisher,date,isbn,Authors
0,"Hartung et les peintres lyriques : Schneider, ...",,Landerneau : Fonds Hélène & Édouard Leclerc po...,2016,9791096209002,
1,Willem de Kooning : drawing seeing-seeing draw...,,"New York : Arena , 1998",1998,0965728080,
2,École de New York : expressionnisme abstrait a...,,"[Nice] : Nice musées , impr. 2005",2005,2913548695,
3,"Action/abstraction : Pollock, de Kooning, and ...",,"New York : the Jewish museum , cop. 2008",2008,9780300122152,
4,Burri : lo spazio di materia - tra Europa e US...,,Città di Castello : Fondazione Palazzo Albizzi...,2016,8894063984,
5,"Willem De Kooning : the late paintings, the 19...",,San Francisco : San Francisco museum of modern...,1995,0935640479,
6,"American vanguards : Graham, Davis, Gorky, De ...",,Andover (Mass.) : Addison gallery of American ...,2011,0300121679,
7,The impact of Chaim Soutine (1893-1943) : de K...,,"Ostfildern-Ruit : Hatje Cantz , cop. 2002",2002,3775791035,
8,"Willem de Kooning : Retrospektive, Zeichnungen...",,"München : Prestel , cop. 1984",1984,3791306596,
9,"Willem de Kooning : drawings, paintings, sculp...",,New York : Whitney museum of American art ; Mu...,1983,0393018407,
