# 1. Find URLs of artists' web pages in Google Arts & Culture
This will take a few minutes.

In [None]:
from pathlib import Path

# Directory in which the data is to be stored
output_dir = './data'
# Create the data directory
Path(output_dir).mkdir(exist_ok=True)

In [None]:
from artscraper import get_artist_links

# Get links for all artists, as a list. They are also written to output_file.
artist_urls = get_artist_links(min_wait_time=1, output_file=f'{output_dir}/artist_links.txt')
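
If the links were already saved by an earlier run, they can be read back from the file instead of being collected again. The following is a minimal sketch, assuming the file holds one URL per line; `read_links` is a hypothetical helper and not part of artscraper.

```python
from pathlib import Path


def read_links(path):
    """Return the non-empty lines of a text file as a list of URLs."""
    with open(path, 'r', encoding='utf-8') as file:
        return [line.strip() for line in file if line.strip()]


# Assumed location of the file written in the cell above
links_file = Path('./data') / 'artist_links.txt'
if links_file.exists():
    artist_urls = read_links(links_file)
    print(f'{len(artist_urls)} artist links loaded')
```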

# 2. Collect artworks and metadata for all artists

In [2]:
from artscraper import GoogleArtScraper, FindArtworks, random_wait_time, retry

# Maximum number of attempts to perform a task 
max_retries = 3
# Minimum time (in seconds) to wait before retrying
min_wait_time = 10
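
The `retry` helper imported above is used further down as `retry(function, max_retries, min_wait_time, *args)`. To illustrate the idea behind such a helper, here is a rough sketch. `retry_sketch` is a hypothetical stand-in written for this explanation; the actual implementation in artscraper may behave differently (for example, in how it chooses the wait time).

```python
import random
import time


def retry_sketch(function, max_retries, min_wait_time, *args, **kwargs):
    """Call function(*args, **kwargs), retrying when it raises an exception.

    Hypothetical illustration only, not artscraper's own retry.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return function(*args, **kwargs)
        except Exception as error:
            if attempt == max_retries:
                raise  # Out of attempts: let the caller see the error
            # Wait at least min_wait_time seconds, plus random jitter
            wait = min_wait_time * (1 + random.random())
            print(f'Attempt {attempt} failed ({error}); retrying in {wait:.1f} s')
            time.sleep(wait)
```

The random jitter is a common courtesy when scraping: it avoids sending requests at perfectly regular intervals.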

## 2.1 Example without retries

In [3]:
# Sample artist link, for illustration purposes
artist_urls = ['https://artsandculture.google.com/entity/jan-van-der-heyden/m05g5_1']

In [None]:
# Find artworks for each artist
for artist_url in artist_urls:
    with FindArtworks(artist_link=artist_url, output_dir=output_dir, 
                      min_wait_time=min_wait_time) as scraper:
        # Save the list of artworks, the description, and the metadata for an artist
        scraper.save_artist_information()

        # Find artist directory
        artist_dir = output_dir + '/' + scraper.get_artist_name()

    # Scrape artworks
    with GoogleArtScraper(artist_dir + '/' + 'works', min_wait=min_wait_time) as subscraper:
        # Get list of links to this artist's works 
        with open(artist_dir+'/'+'works.txt', 'r') as file:
            artwork_links = [line.rstrip() for line in file]  
            
        # Download all artworks (slow)
        for url in artwork_links:
            print(f'artwork URL: {url}')
            subscraper.save_artwork_information(url)

## 2.2 Example with retries (more robust)

In [4]:
# Find artworks for each artist
for artist_url in artist_urls:
    with FindArtworks(artist_link=artist_url, output_dir=output_dir, 
                      min_wait_time=min_wait_time) as scraper:
        # Save the list of artworks, the description, and the metadata for an artist
        retry(scraper.save_artist_information, max_retries, min_wait_time)

        # Find artist directory
        artist_dir = output_dir + '/' + scraper.get_artist_name()

    # Scrape artworks
    with GoogleArtScraper(artist_dir + '/' + 'works', min_wait=min_wait_time) as subscraper:
        # Get list of links to this artist's works 
        with open(artist_dir+'/'+'works.txt', 'r') as file:
            artwork_links = [line.rstrip() for line in file]  
            
        # Download all artworks (slow)
        for url in artwork_links:
            print(f'artwork URL: {url}')
            retry(subscraper.save_artwork_information, max_retries, min_wait_time, url)

# Final structure of results
- data (set up in `output_dir`)
  - artist_links.txt (All artists, with one URL per line)
  - Artist_1
    - description.txt (Description of artist, from Wikimedia)
    - metadata.json (Metadata of artist, from Wikimedia)
    - works.txt (All artworks, with one URL per line)
    - works 
      - work1
        - artwork.png (Artwork image)
        - metadata.json (Metadata of artwork, from Google Arts & Culture)
      - work2
        - ...
  - Artist_2
    - ... 
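
Given the layout above, a quick way to check progress is to count the work directories under each artist. This is a small sketch that relies only on the directory structure listed here; `summarize_results` is a hypothetical helper, not an artscraper function.

```python
from pathlib import Path


def summarize_results(output_dir):
    """Map each artist directory name to the number of work directories in it."""
    summary = {}
    for artist_dir in sorted(Path(output_dir).iterdir()):
        if not artist_dir.is_dir():
            continue  # Skip artist_links.txt and other plain files
        works_dir = artist_dir / 'works'
        if works_dir.is_dir():
            works = [work for work in works_dir.iterdir() if work.is_dir()]
        else:
            works = []  # Artist information saved, but no works scraped yet
        summary[artist_dir.name] = len(works)
    return summary


# Only run the summary if the data directory already exists
if Path('./data').is_dir():
    print(summarize_results('./data'))
```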

### Example of artist description (description.txt)

Jan van der Heyden (5 March 1637, Gorinchem – 28 March 1712, Amsterdam) was a Dutch Baroque-era painter, glass painter, draughtsman and printmaker. Van der Heyden was one of the first Dutch painters to specialize in townscapes and became one of the leading architectural painters of the Dutch Golden Age.  He painted a number of still lifes in the beginning and at the end of his career.Jan van der Heyden was also an engineer and inventor who made significant contributions to contemporary firefighting technology. Together  with his brother Nicolaes, who was a hydraulic engineer, he invented an improvement of the fire hose in 1672. He modified the manual fire engine, reorganised the volunteer fire brigade (1685) and wrote and illustrated the first firefighting manual (Brandspuiten-boek). A comprehensive street lighting scheme for Amsterdam, designed and implemented by van der Heyden, remained in operation from 1669 until 1840 and was adopted as a model by many other towns and abroad.

### Example of artist metadata (metadata.json)
{"family name": "Van der Heyden", "given name": "Jan", "pseudonym": "", "sex or gender": "male", "date of birth": "1637-03-05", "place of birth": "Gorinchem", "latitude of place of birth": "51.83652", "longitude of place of birth": "4.97243", "date of death": "1712-03-28", "place of death": "Amsterdam", "latitude of place of death": "52.372777777", "longitude of place of death": "4.893611111", "country of citizenship": "Netherlands", "residence": "", "work location": "Amsterdam", "genre": "landscape art", "movement": "", "occupation": ["firefighter", "inventor", "painter", "instrument maker", "printmaker"]}
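
As an illustration of how this metadata might be used, the sketch below computes an artist's age at death from the two date fields. It assumes both dates are full ISO dates (`YYYY-MM-DD`), as in the example above; other artists may have partial or missing dates, which this hypothetical helper handles only by returning `None` for empty values.

```python
import json
from datetime import date


def lifespan_in_years(metadata):
    """Return the age at death in whole years, or None if a date is missing."""
    born = metadata.get('date of birth')
    died = metadata.get('date of death')
    if not born or not died:
        return None
    born, died = date.fromisoformat(born), date.fromisoformat(died)
    # Subtract one year if the final birthday had not yet been reached
    before_birthday = (died.month, died.day) < (born.month, born.day)
    return died.year - born.year - int(before_birthday)


# Two fields copied from the example metadata above
sample = json.loads('{"date of birth": "1637-03-05", "date of death": "1712-03-28"}')
print(lifespan_in_years(sample))
```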

### Example of artwork's metadata (metadata.json within works)
{"main_text": "The country house in the right middle ground has been identified as one which used to lie on the river Vliet, running between Delft and The Hague. Though this is possible, the house does not seem sufficiently distinctive to permit such a specific identification. This scene, however, depicts a fashionable part of Holland in the seventeenth century: a navigable canal or river with a well-kept towpath and a considerable volume of freight traffic. Lining the water are houses with plots of land extending into the flat, low-lying, fertile, reclaimed land. There is an alternation of elegant farmhouses, like the one with a stepped gable and hayrick, and buitenplaatsen (country houses), like the one nearer to us, with its ionic pilasters and dormer windows with scroll surrounds (as opposed to the more traditional gables). This house has a stone gate and a topiary hedge with claire-vues and an avenue of trees. Audrey Lambert reproduces a 1770 map of Rijswijk, between Delft and The Hague, which still shows exactly this alternation of simple plots and formal gardens extending into the polders on either side of the Vliet and nearby roads. This image by Heyden (1637-1712) is notable for its restrained depiction of evening light, with more white than gold in the spectrum and just a hint of pink in some of the clouds. But it is the vivid naturalism of the scene, with its matter-of-fact viewpoint, recording a public thoroughfare with no deference to the country house, which so remarkably anticipates the landscapes of the Impressionists. 
It is also possible that Constable had seen this painting when he painted his Scene on a Navigable River in 1816-17 (Tate, London), with its sparkling pointillist touch and scrupulous record of a working inland waterway.", "title": "Country House on the Vliet near Delft", "creator": "Jan van de Heyden", "creator lifespan": "1637 - 1712", "date created": "1665", "type": "Painting", "rights": "Supplied by Royal Collection Trust / (c) HM Queen Elizabeth II 2012", "external link": " http://www.rct.uk/collection/405948", "medium": "Oil on panel", "provenance": "Acquired by George IV when Prince Regent, 1814", "object description": "Beside a canal runs a road on which a huntsman walks his dog, with a country house & an outbuilding on the right; a mother and her children are seated by the road; in the centre a barge is moored to a landing-stage.", "id": "3wEgj7D5Ld8nvg", "link": "https://artsandculture.google.com/asset/country-house-on-the-vliet-near-delft-jan-van-de-heyden/3wEgj7D5Ld8nvg"}
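
Some values in the artwork metadata can carry stray whitespace, such as the leading space in `"external link"` above. A minimal sketch for reading one work's metadata and trimming its string values follows; `load_artwork_metadata` is a hypothetical helper, and the file name is taken from the structure of results described earlier.

```python
import json
from pathlib import Path


def load_artwork_metadata(work_dir):
    """Read metadata.json from a work directory and trim its string values."""
    with open(Path(work_dir) / 'metadata.json', 'r', encoding='utf-8') as file:
        metadata = json.load(file)
    return {key: value.strip() if isinstance(value, str) else value
            for key, value in metadata.items()}
```

For example, `load_artwork_metadata(artist_dir + '/works/work1')['title']` would return the title of the first work, assuming that directory exists.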