---
# ETL on French Painters

This notebook demonstrates an ETL (Extract, Transform, Load) process on a small dataset of French painters. It fetches painters and artworks they created from the Wikidata SPARQL endpoint, flattens the results, and saves them to a CSV file.

The notebook is organized as follows:

1. **Fetching French Painters Data**: A SPARQL query selects people whose occupation is painter and whose country of citizenship is France, together with artworks that list them as creator. The query returns 10 rows by default, and a painter appears once per matching artwork.

2. **Transforming the Data**: The raw JSON response is flattened into one record per painter/artwork pair, keeping the painter's name, the artwork's title, and the Wikidata entity URLs of both.

3. **Saving the Data to a CSV File**: The records are written to a CSV file for further analysis or processing.

---


In [None]:
import requests
import json
import pandas as pd

In [None]:
WIKIDATA_SPARQL_ENDPOINT = 'https://query.wikidata.org/sparql'

In [None]:
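def fetch_french_painters_data(limit=10):
    """Query Wikidata for French painters and artworks they created.

    Returns the parsed SPARQL JSON response, or None if the request fails.
    """
    # Doubled braces are literal braces in an f-string; {limit} is interpolated.
    # P106 = occupation, Q1028181 = painter, P27 = country of citizenship,
    # Q142 = France, P170 = creator.
    query = f'''
    SELECT DISTINCT ?painter ?painterLabel ?artwork ?artworkLabel
    WHERE {{
        ?painter wdt:P106 wd:Q1028181;
                 wdt:P27 wd:Q142.
        ?artwork wdt:P170 ?painter.
        SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    '''

    headers = {
        'Accept': 'application/sparql-results+json',
        # Wikimedia asks clients to send a descriptive User-Agent. This value
        # is a placeholder; replace it with your own project and contact details.
        'User-Agent': 'french-painters-etl-notebook/0.1 (placeholder contact)',
    }

    response = requests.get(
        WIKIDATA_SPARQL_ENDPOINT,
        headers=headers,
        params={'query': query},
        timeout=30,  # avoid hanging indefinitely on a slow endpoint
    )

    if response.status_code == 200:
        return json.loads(response.text)

    print(f'Error fetching data: {response.status_code}')
    return None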
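
The doubled braces in the query are an f-string detail: `{{` and `}}` render as the literal braces SPARQL needs, while `{limit}` is interpolated. The following is a minimal sketch that renders the query text without contacting the endpoint. `build_query` is a hypothetical helper introduced only for this illustration; it is not one of the notebook's functions.

```python
def build_query(limit=10):
    """Return the SPARQL text used by the fetch step (hypothetical helper)."""
    return f'''
    SELECT DISTINCT ?painter ?painterLabel ?artwork ?artworkLabel
    WHERE {{
        ?painter wdt:P106 wd:Q1028181;
                 wdt:P27 wd:Q142.
        ?artwork wdt:P170 ?painter.
        SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    '''


# Render the query with a custom limit and inspect it before sending anything.
rendered = build_query(limit=5)
print(rendered)
```

Printing the rendered text is a cheap way to check the query, for example by pasting it into the Wikidata Query Service web interface, before running the fetch cell.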


In [None]:
def transform_french_painters_data(data):
    """Flatten SPARQL JSON bindings into one dict per painter/artwork pair."""
    # Each binding maps a selected variable name to a {'type': ..., 'value': ...} dict.
    results = data['results']['bindings']
    transformed_data = []

    for result in results:
        transformed_data.append({
            'painter': result['painterLabel']['value'],
            'artwork': result['artworkLabel']['value'],
            'painter_url': result['painter']['value'],
            'artwork_url': result['artwork']['value'],
        })

    return transformed_data
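
To make the transform step concrete, here is a small sketch that runs it on a hand-written response in the SPARQL JSON results layout (`head.vars` plus `results.bindings`). The sample values are placeholders, not real endpoint output, and the function is repeated so the snippet runs on its own.

```python
def transform_french_painters_data(data):
    """Flatten SPARQL JSON bindings into one dict per painter/artwork pair."""
    transformed_data = []
    for result in data['results']['bindings']:
        transformed_data.append({
            'painter': result['painterLabel']['value'],
            'artwork': result['artworkLabel']['value'],
            'painter_url': result['painter']['value'],
            'artwork_url': result['artwork']['value'],
        })
    return transformed_data


# Placeholder response: one binding, with made-up labels and example.org URLs.
sample = {
    'head': {'vars': ['painter', 'painterLabel', 'artwork', 'artworkLabel']},
    'results': {
        'bindings': [
            {
                'painter': {'type': 'uri', 'value': 'http://example.org/painter-1'},
                'painterLabel': {'type': 'literal', 'xml:lang': 'en', 'value': 'Example Painter'},
                'artwork': {'type': 'uri', 'value': 'http://example.org/artwork-1'},
                'artworkLabel': {'type': 'literal', 'xml:lang': 'en', 'value': 'Example Artwork'},
            }
        ]
    },
}

rows = transform_french_painters_data(sample)
print(rows[0])
```

Each binding becomes one flat record, so a painter with several artworks yields several rows.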


In [None]:
def save_data_to_csv(data, file_name='french_painters_data.csv'):
    df = pd.DataFrame(data)
    df.to_csv(file_name, index=False)
    print(f'Successfully saved data in {file_name}')
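
A quick way to check the load step is to read the file back. The sketch below writes placeholder rows (not real query results) to a temporary directory with the same `to_csv` call, then reloads them with `pd.read_csv`. The temporary path is an assumption made for the example so that it does not overwrite the notebook's own output file.

```python
import os
import tempfile

import pandas as pd

# Placeholder rows in the shape the transform step produces.
rows = [
    {
        'painter': 'Example Painter',
        'artwork': 'Example Artwork',
        'painter_url': 'http://example.org/painter-1',
        'artwork_url': 'http://example.org/artwork-1',
    },
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'french_painters_data.csv')
    # Same call as in save_data_to_csv: index=False omits the row index column.
    pd.DataFrame(rows).to_csv(path, index=False)
    loaded = pd.read_csv(path)

print(list(loaded.columns))
print(len(loaded))
```

Because `index=False` is used when writing, the reloaded frame has exactly the four data columns and no extra index column.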


In [None]:
def main():
    raw_data = fetch_french_painters_data()
    if raw_data:
        transformed_data = transform_french_painters_data(raw_data)
        save_data_to_csv(transformed_data)

if __name__ == '__main__':
    main()


Successfully saved data in french_painters_data.csv
