# Network of Painters: building a dataset from paintings datasets, then creating links

The aim of this project is to build a dataset of painters from painting datasets such as WikiArt and Art500k: combining their features, filling in missing painter data by web scraping (Google and Wiki API), and then creating links between painters based on similarity of style, geography, and social interaction.

Note: One long-term goal is a single JSON file that combines everything hierarchically. For example, the top level could be art movements; each movement contains its artists with base data such as birthplace, birth and death years, and other geographical data; each artist contains their paintings with all available data. (Even better would be to include each painter's eras as a substructure, with the paintings inside them.) We could then use this file to build a network of art movements, artists, and paintings.
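
A minimal sketch of such a hierarchy, assuming a flat table with one row per painting. The rows are made up for illustration, and the key names (`paintings`, `birth_place`, and so on) are assumptions rather than a fixed schema.

```python
import json

# Flat, made-up rows standing in for the combined painting data.
rows = [
    {"movement": "Abstract Expressionism", "artist": "Ad Reinhardt",
     "birth_place": "Buffalo", "birth_year": 1913, "style": "Abstract Art"},
    {"movement": "Abstract Expressionism", "artist": "Ad Reinhardt",
     "birth_place": "Buffalo", "birth_year": 1913, "style": "Abstract Expressionism"},
    {"movement": "Abstract Art", "artist": "Akkitham Narayanan",
     "birth_place": "Kerala", "birth_year": 1939, "style": "Abstract Art"},
]


def build_hierarchy(rows):
    """Nest rows as movement -> artist -> list of paintings."""
    tree = {}
    for row in rows:
        artists_in_movement = tree.setdefault(row["movement"], {})
        artist = artists_in_movement.setdefault(row["artist"], {
            "birth_place": row["birth_place"],
            "birth_year": row["birth_year"],
            "paintings": [],
        })
        artist["paintings"].append({"style": row["style"]})
    return tree


hierarchy = build_hierarchy(rows)
print(json.dumps(hierarchy, indent=2, ensure_ascii=False))
```

An extra "era" level could be added the same way, with one more `setdefault` between the artist and the paintings.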

NEXT STEPS:<br>
-Add "Places" for Art500k datasets (+change datasets_notebook save.csv loads)<br>
-Add aliases for painters in Art500k datasets<br>
-Combine the datasets on authors<br>

FURTHER STEPS: <br>
-Define connections between painters<br>
-Create a network of painters<br>
-Analyze the network<br>

<details><summary><u> Update 11.06: Maximilian Schich </u></summary>
<p>
I e-mailed an art researcher that Elisa suggested, Maximilian Schich, asking about datasets for our project. He said: 

-We do not have a record of social interactions between artists at the corpus scale. The closest thing is co-exhibition networks, which you may already know from the work of Fraiberger et al. (incl. Laszlo Barabasi) (http://genetics.bwh.harvard.edu/courses/Biophysics205/Papers/All_papers/Fraiberger_2018.pdf, page 2). The issue there is that the network is short, circa 1985 to 2020.

-Hyperlink networks (I guess WikiLinks, PageRank and such), such as those found in Wikipedia, are obviously beset with all kinds of issues, even though they do recapitulate the evolution of conventional style periods pretty well (cf. the work of Doron Goldfarb et al., incl. myself). More locally speaking, it is a core topic in art history to shed light on the social network of artists and their patrons, but this does not lend itself to quantitative analysis.

-I personally have done a visualization for Max Planck, based on the social network of 5500 individuals related to the Roman Baroque (https://zuccaro.schich.info/), which did reveal another issue: for painters, art historians tend to research family relationships (more cliques), while for architects they focus on business relationships (more hubs). But here you got the inverse problem that there is not much information on the paintings.

-He raised a question/issue from this: "Should we really assume social interaction influencing the styles of artists? Note that this may substantially underestimate the plasticity of the human brain/mind! It is like assuming that cellists only hang out with cellists, when we all know that grunge bands in Seattle all did hang out together and missing a bassist. Meanwhile we do have evidence that artists such as Rubens did routinely hang out with different(!) artists, who could serve clients with different genres and if necessary styles. Bramante did build Gothic in Milan and Renaissance style in Rome at the same time. Rubens would call in Elsheimer to do miniatures, etc. And since the mid 19th century, all artists in the Western scene were essentially familiar, not only with the same corpus of classic artists and their works, but also with the contemporary production. Large art exhibitions in Paris literally drew millions of people each year in the mid 19th century (think Burning Man or SXSW today). So it is safe to say that most artists of note were familiar with a great number of styles. Styles may bifurcate. For artists the opposite may be true (cf. Run-DMC meets Aerosmith => https://www.youtube.com/watch?v=4B_UYYPb-Gk). If I were you, I'd turn the question around, pointing into the opposite direction: **If two artists have similar style, can we find traces that they (eventually) knew each other**?" He said influence is B.S. (literally) and there's 100 times more evidence for similarity than influence between two artworks, and suggested answering "does style lead to social interaction?"

-"Here is how this question can be attacked with the available data: The standard "corpus" for artists is their "catalog raisonne", i.e. the catalog of all their works, which does not exist for all artists and is typically a lot of work, sold in expensive books. We are a long way from a comprehensive dataset like this. Yet, for the purpose of a more limited project, you could use general conventional style similarity from the usual suspect databases (Wikiart, Art500k, etc.). As a proxy of social interaction, you could use the hyperlink and/or wikidata links connected to the same artists. Even though these two sources are limited, you could still compare the two graphs as in "Wikipedia connection" vs. "visual similarity".

We have recently published a paper on general similarity using compression ensembles, using a subset of art500k/Wikiart, which is essentially 65k paintings with a reliable year as data. We have also used the first 100 days of the hic et nunc NFT art platform (where, coincidentally, you get both social interaction and painting information). See "Availability of data and materials" in https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-023-00397-3#Sec21"

So this could be an interesting direction to think about.
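
The suggested comparison of a "Wikipedia connection" graph with a "visual similarity" graph could be sketched as follows. This is only an illustration under stated assumptions: both graphs are reduced to undirected edge sets, the toy edges and the names `wiki_edges` and `style_edges` are made up, and Jaccard overlap is one possible measure among many.

```python
def normalize(edges):
    """Treat edges as undirected: store each pair as a frozenset, dropping self-loops."""
    return {frozenset(pair) for pair in edges if len(set(pair)) == 2}


def jaccard(a, b):
    """Share of edges common to both graphs, relative to all edges in either graph."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up toy edges between placeholder artists A-D.
wiki_edges = normalize([("A", "B"), ("B", "C"), ("C", "D")])
style_edges = normalize([("B", "A"), ("C", "D"), ("A", "D")])

shared = wiki_edges & style_edges
overlap = jaccard(wiki_edges, style_edges)

# The reversed question: of the style-similar pairs, how many also have a link?
linked_share = len(shared) / len(style_edges)

print(len(shared), overlap, round(linked_share, 2))
```

Here `linked_share` corresponds most directly to the reversed question: given similar style, is there a trace of a connection?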
</p>
</details>

In [7]:
import pandas as pd
import numpy as np

<details><summary><u>National Gallery of Art (US) dataset (unused)</u></summary>
<p>
    
```python

df1 = pd.read_csv('datasets/originals/nga_constituents.csv') # From their website
df1.head()

```
    
</p>
</details>

## WikiArt data

Load the cleaned paintings data

In [8]:
wa_paintings = pd.read_csv('datasets/wikiart_paintings_refined.csv')
print("Length:", len(wa_paintings))
wa_paintings.head() #Consider dropping style: "Unknown" 

Length: 175313


Unnamed: 0,artist,style,genre,movement,tags
0,Andrei Rublev,Moscow school of icon painting,religious painting,Byzantine Art,"['Christianity', 'saints-and-apostles', 'angel..."
1,Andrei Rublev,Moscow school of icon painting,religious painting,Byzantine Art,"['Christianity', 'Old-Testament', 'Daniel', 'p..."
2,Andrei Rublev,Moscow school of icon painting,miniature,Byzantine Art,"['Christianity', 'saints-and-apostles', 'Khitr..."
3,Andrei Rublev,Moscow school of icon painting,religious painting,Byzantine Art,"['Christianity', 'saints-and-apostles', 'St.-L..."
4,Andrei Rublev,Moscow school of icon painting,miniature,Byzantine Art,"['Christianity', 'arts-and-crafts', 'saints-an..."


Load the grouped data: artists grouped by style

In [9]:
wa_grouped = pd.read_csv('datasets/wikiart_artists_styles_grouped.csv')
print("Length:", len(wa_grouped), "\n", "Number of groups with only 1 count:", (wa_grouped['count'] == 1).sum())
wa_grouped[wa_grouped['artist'].str.contains("Monet")].sort_values(by=['count'], ascending=False)

Length: 7647 
 Number of groups with only 1 count: 1115


Unnamed: 0,style,artist,movement,count
2963,Impressionism,Claude Monet,Impressionism,1341
5468,Realism,Claude Monet,Impressionism,12
7042,Unknown,Claude Monet,Impressionism,12
462,Academicism,Claude Monet,Impressionism,1
3339,Japonism,Claude Monet,Impressionism,1


### Birthplaces, birth years

In [13]:
artists_A = pd.read_csv('datasets/wikiart_artists.csv')
artists_A

Unnamed: 0,artist,styles,movements,birth_place,birth_year
0,Ad Reinhardt,"Abstract Art, Abstract Expressionism, Color Fi...","Abstract Expressionism, Abstract Expressionism...",Buffalo,1913
1,Akkitham Narayanan,Abstract Art,Abstract Art,Kerala,1939
2,Alberto Magnelli,"Abstract Art, Art Nouveau (Modern), Cubism, Ex...","Abstract Art, Abstract Art, Abstract Art, Abst...",Florence,1888
3,Alekos Kontopoulos,"Abstract Art, Cubism, Expressionism, Post-Impr...","Social Realism, Social Realism, Social Realism...",Lamia,1904
4,Alexander Calder,"Abstract Art, Abstract Expressionism, American...","Kinetic art, Kinetic art, Kinetic art, Kinetic...",Philadelphia,1898
...,...,...,...,...,...
2931,Reem Al Faisal,Unknown,Contemporary,Jeddah,2000
2932,Robert Demachy,Unknown,Pictorialism,Saint-Germain-en-Laye,1859
2933,Sašo Vrabič,Unknown,Contemporary Realism,Slovenj Gradec,1974
2934,Wolfgang Tillmans,Unknown,Contemporary,Remscheid,1968


## Art500K

First dataset (from official website)

In [10]:
art500k = pd.read_csv('datasets/art500k_cleaned.csv')
art500k[4:10]



Unnamed: 0,author_name,Genre,Style,Nationality,PaintingSchool,ArtMovement,Date,Influencedby,Influencedon,Tag,Pupils,Location,Teachers,FriendsandCoworkers
4,El Greco,,,,,,ca. 1610-1614,,,,,,,
5,El Greco,,,,,,,,,,,,,
6,Diego Rivera,,,,,,,,,,,,,
7,Claude Monet,,,,,,,,,,,,,
8,Francisco Goya,,,,,,,,,,,,,
9,Francisco Goya,,,,,,,,,,,,,


In [137]:
art500k_artists = pd.read_csv('datasets/art500k_artists.csv')
art500k_artists[0:7]

Unnamed: 0,artist,Nationality,PaintingSchool,ArtMovement,Influencedby,Influencedon,Pupils,Teachers,FriendsandCoworkers,FirstYear,LastYear,Places,PlacesYears,StylesYears,StylesCount,PlacesCount
0,Gustave Courbet,French,,"{Realism:272},","Rembrandt,Caravaggio,Diego Velazquez,Peter Pau...","Edouard Manet,Claude Monet,Pierre-Auguste Reno...",,,,1830.0,1877.0,"London, Montpellier, Moscow, CA, UK, Norway, D...","France:1841-1876,,Switzerland:1844-1874,,Lille...","Realism:1835-1877,,Romanticism:1830-1849,","{Realism:257}, {Romanticism:13}","{France:88},{Switzerland:7},{Lille:8},{Paris:4..."
1,Auguste Rodin,French,,"{Modern art:3},{Impressionism:91},","Michelangelo,Donatello,","Georgia O'Keeffe,Man Ray,Aristide Maillol,Olex...","Constantin Brancusi,",,,1865.0,1985.0,"London, CA, UK, Switzerland, Lisbon, US, Germa...","France:1865-1889,,Paris:1865-1898,,CA:1891-189...","Impressionism:1865-1905,",{Impressionism:90},"{France:52},{Paris:15},{Brussels:2},{Belgium:1..."
2,Frida Kahlo,Mexican,,"{Naïve Art (Primitivism),Surrealism:99},","Amedeo Modigliani,Diego Rivera,Jose Clemente O...","Judy Chicago,Georgia O'Keeffe,Feminist Art,",,,,1922.0,1954.0,"CA, LA, New York, US, New Orleans, Washington,...","Mexico:1927-1954,,San Francisco:1931-1933,,Mex...","Naïve Art (Primitivism):1922-1954,,Surrealism:...","{Naïve Art (Primitivism):99}, {Surrealism:15}","{Mexico:50},{San Francisco:6},{New York:4},{Me..."
3,Banksy,,,,,,,,,2011.0,2011.0,"Los Angeles, London, UK, Palestine, California...","London:2011-2011,,UK:2011-2011,",,,"{Palestine:1},{Los Angeles:3},{California:3},{..."
4,El Greco,"Spanish,Greek",Cretan School,"{Spanish Renaissance:1},{Renaissance:2},{Manne...","Byzantine Art,","Expressionism,Cubism,Eugene Delacroix,Edouard ...",,"Titian,","Giulio Clovio,",1568.0,1614.0,"Seville, London, Illescas, Romania, Moscow, Gr...","Spain:1577-1599,,London:1600-1600,,UK:1600-160...","Mannerism (Late Renaissance):1568-1600,","{Renaissance:2}, {XVI CenturySpanish Painting:...","{Spain:75},{Boston:1},{MA:1},{US:27},{Museo de..."
5,Diego Rivera,Mexican,"Mexican Mural Renaissance,La Ruche","{Social Realism,Muralism:146},","Marc Chagall,Robert Delaunay,","Frida Kahlo,Pedro Coronel,Vlady,",,,"Amedeo Modigliani,Saturnino Herran,Roberto Mon...",1904.0,1956.0,"Moscow, CA, Acapulco, New York, Spain, Northam...","Acapulco:1956-1956,,Mexico:1905-1956,,Guerrero...","Cubism:1912-1916,,Muralism:1922-1956,,Art Deco...","{Post-impressionism:1}, {Cubism:19}, {Mexican ...","{France:1},{Paris:1},{Moscow:1},{Acapulco:2},{..."
6,Claude Monet,French,,"{Modern art:3},{Impressionism:1340},","Gustave Courbet,Charles-Francois Daubigny,John...","Childe Hassam,Robert Delaunay,Wassily Kandinsk...",,"Eugene Boudin,Charles Gleyre,","Alfred Sisley,Pierre-Auguste Renoir,Camille Pi...",1858.0,1926.0,"London, Main, Moscow, Rotterdam, Giverny, CA, ...","France:1861-1924,,London:1869-1889,,UK:1869-19...","Impressionist:1879-1904,,Impressionism:1864-19...",{Nineteenth-Century European PaintingImpressio...,"{France:79},{Giverny:1},{London:6},{UK:15},{Bo..."


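As the output shows, this table needs further cleaning: for example, the doubled commas in `PlacesYears` and `StylesYears`, near-duplicate labels such as `Impressionist` and `Impressionism`, and implausible years such as a `LastYear` of 1985 for Auguste Rodin.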
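
As a starting point for that cleaning, here is a minimal sketch that parses the two string formats visible in the output above (`{name:count}` items and `name:first-last` year ranges). The helper names `parse_counts` and `parse_year_ranges` are hypothetical, and the sketch assumes every item is well formed.

```python
import re

# Matches items such as "{Realism:257}" or "{Naïve Art (Primitivism):99}".
COUNT_ITEM = re.compile(r"\{([^{}]*):(\d+)\}")


def parse_counts(text):
    """Turn '{Realism:257}, {Romanticism:13}' into {'Realism': 257, 'Romanticism': 13}."""
    if not isinstance(text, str):  # NaN cells arrive as floats
        return {}
    return {name.strip(): int(n) for name, n in COUNT_ITEM.findall(text)}


def parse_year_ranges(text):
    """Turn 'Realism:1835-1877,,Romanticism:1830-1849,' into {name: (first, last)}."""
    if not isinstance(text, str):
        return {}
    ranges = {}
    for item in text.split(",,"):
        item = item.strip().strip(",")
        if not item:
            continue
        name, _, years = item.rpartition(":")
        first, _, last = years.partition("-")
        ranges[name] = (int(first), int(last))
    return ranges


counts = parse_counts("{Realism:257}, {Romanticism:13}")
ranges = parse_year_ranges("Realism:1835-1877,,Romanticism:1830-1849,")
print(counts)
print(ranges)
```

Applied column-wise, for example with `art500k_artists['StylesCount'].apply(parse_counts)`, this would give one dictionary per artist.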

Second Art500k dataset: from Rasta <br>

<details><summary><u>Details:</u></summary>

https://github.com/nphilou/rasta/tree/d22b34d5ac1aee9c1f80b4a73ad6792fd465c605/data/art500k

```python

# Raw string so that \s reaches the regex engine unchanged (avoids an invalid-escape warning)
rasta = pd.read_table('datasets/originals/art500k_rasta370k.txt', header=0, engine='python', sep=r'\t|\s{4,}')
rasta[0:5]

```

Every painting has either an East or a West origin (or none given), so we may simply filter to one of them.
</details>

From these, we can create networks.
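
As one hedged illustration of how links could be read off these columns, the sketch below turns a comma-separated relation column such as `FriendsandCoworkers` into an edge list. The `demo` frame is an abridged stand-in for `art500k_artists`, and `relation_edges` is a hypothetical helper, not the method used in the networks notebook.

```python
import pandas as pd

# Abridged stand-in for art500k_artists (only two columns, three artists).
demo = pd.DataFrame({
    "artist": ["Ad Reinhardt", "Claude Monet", "Adnan Coker"],
    "FriendsandCoworkers": [
        "Jackson Pollock,",
        "Alfred Sisley,Pierre-Auguste Renoir,",
        None,  # no recorded relations
    ],
})


def relation_edges(df, column):
    """Return (artist, related_artist) pairs from a comma-separated column."""
    edges = []
    for artist, cell in zip(df["artist"], df[column]):
        if not isinstance(cell, str):
            continue  # skip missing values
        for other in cell.split(","):
            other = other.strip()
            if other:  # trailing commas produce empty strings
                edges.append((artist, other))
    return edges


edges = relation_edges(demo, "FriendsandCoworkers")
print(edges)
```

The same helper could be pointed at `Influencedby`, `Pupils`, or `Teachers`, though those columns also contain movement names (for example `Byzantine Art`) that would need filtering.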

<details><summary><u>Something further:</u></summary>
<p>

https://en.wikipedia.org/wiki/Renaissance (at the bottom)
https://en.wikipedia.org/wiki/Periods_in_Western_art_history
    
</p>
</details>

## A) 0) Combine the two datasets

Take the artists from WikiArt and, if they also appear in Art500k, add their Art500k attributes.

#### Version 2023.12.02: Take everything from WikiArt, add from Art500k if possible

In [95]:
artist_A = pd.read_csv('datasets/wikiart_artists.csv')
# Keep only the WikiArt artists that also appear in Art500k
artists = artist_A[artist_A['artist'].isin(art500k_artists['artist'])].reset_index(drop=True)
print("Artists remaining:", len(artists))

Artists remaining: 2457


In [96]:
# Merge the filtered WikiArt artists with art500k_artists (left join on the artist name)

artists = artists.merge(art500k_artists, on='artist', how='left')
artists

Unnamed: 0,artist,styles,movement,styles_extended,pictures_count,birth_place,birth_year,Nationality,PaintingSchool,ArtMovement,...,Influencedon,Pupils,Teachers,FriendsandCoworkers,FirstYear,LastYear,Places,PlacesYears,StylesYears,StylesCount
0,Ad Reinhardt,"Abstract Art, Abstract Expressionism, Color Fi...",Abstract Expressionism,"{Abstract Art:15},{Abstract Expressionism:5},{...",52,Buffalo,1913.0,American,"New York School,American Abstract Artists,Iras...","{Abstract Expressionism,Minimalism:52},",...,"Donald Judd,Barnett Newman,Mark Rothko,Frank S...",,,"Jackson Pollock,",1937.0,1966.0,"US, NY, Canberra, Fort Worth, Buffalo, Austral...","New York City:1938-1966,,NY:1938-1966,,US:1938...","Expressionism:1944-1946,,Abstract Art:1937-194...","{Expressionism:7}, {Abstract Art:15}, {Color F..."
1,Adnan Coker,"Abstract Art, Abstract Expressionism",Abstract Art,"{Abstract Art:25},{Abstract Expressionism:3}",28,,,Turkish,,"{Abstract Art:28},",...,,,,,1968.0,2008.0,,,"Abstract Art:1992-2008,,Abstract Expressionism...","{Abstract Art:25}, {Abstract Expressionism:3}"
2,Akkitham Narayanan,Abstract Art,Abstract Art,{Abstract Art:17},17,Kerala,1939.0,Indian,,"{Abstract Art:17},",...,,,,,1974.0,1974.0,,,"Abstract Art:1974-1974,",{Abstract Art:17}
3,Alberto Magnelli,"Abstract Art, Art Nouveau (Modern), Cubism, Ex...",Abstract Art,"{Abstract Art:19},{Art Nouveau (Modern):2},{Cu...",35,Florence,1888.0,"Italian,French",Abstraction-Création,"{Abstract Art,Cubo-Futurism,Concrete Art (Conc...",...,,,,,1909.0,1971.0,,,"Abstract Art:1916-1971,,Cubism:1914-1935,,Meta...","{Abstract Art:21}, {Cubism:10}, {Metaphysical ..."
4,Alekos Kontopoulos,"Abstract Art, Cubism, Expressionism, Post-Impr...",Social Realism,"{Abstract Art:26},{Cubism:5},{Expressionism:10...",79,Lamia,1904.0,Greek,,"{Abstract Art,Social Realism:79},",...,,,,,1931.0,1974.0,,,"Post-Impressionism:1932-1955,,Expressionism:19...","{Post-Impressionism:8}, {Expressionism:11}, {R..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2452,Marianne von Werefkin,Unknown,Expressionism,{Unknown:61},61,Tula,1860.0,,,"{Der Blaue Reiter:1},",...,,,,,,,,,,
2453,Robert Demachy,Unknown,Pictorialism,{Unknown:24},24,Saint-Germain-en-Laye,1859.0,French,,"{Pictorialism:24},",...,,,,,1900.0,1914.0,France,,,
2454,Wolfgang Tillmans,Unknown,Contemporary,{Unknown:9},9,Remscheid,1968.0,,,,...,,,,,2001.0,2001.0,"London, United Kingdom",,,
2455,Wu Daozi,Unknown,Tang Dynasty (618–907),{Unknown:8},8,Chang'an,680.0,Chinese,Four fathers of Chinese painting,"{Tang Dynasty (618–907):8},",...,,,,,,,,,,


Later, extend this list with the artists skipped from both datasets (those present in only one of them).

In [97]:
artist_AnotB = artist_A[~artist_A['artist'].isin(art500k_artists['artist'])].reset_index(drop=True).sort_values(by=['pictures_count'], ascending=False)
artist_AnotB.head(10)

Unnamed: 0,artist,styles,movement,styles_extended,pictures_count,birth_place,birth_year
0,Alfred Freddy Krupa,"Abstract Art, Abstract Expressionism, Academic...",New Ink Art,"{Abstract Art:1},{Abstract Expressionism:1},{A...",735,Karlovac,1971.0
720,Zdzislaw Beksinski,Surrealism,Magic Realism,{Surrealism:707},707,Sanok,1929.0
737,Oleksandr Aksinin,Unknown,Soviet Nonconformist Art,{Unknown:480},480,Kiev,1930.0
140,M.C. Escher,"Art Deco, Art Nouveau (Modern), Cubism, Expres...",Surrealism,"{Art Deco:1},{Art Nouveau (Modern):1},{Cubism:...",470,Leeuwarden,1898.0
121,Oleg Holosiy,"Academicism, Cubism, Expressionism, Naïve Art ...",Neo-Expressionism,"{Academicism:1},{Cubism:5},{Expressionism:30},...",372,Dnipro,1965.0
308,Alexander Roitburd,"Cubism, Transavantgarde",Transavantgarde,"{Cubism:1},{Transavantgarde:263}",264,Odesa,1961.0
377,Maria Bozoky,"Expressionism, Impressionism",Expressionism,"{Expressionism:252},{Impressionism:4}",256,Oradea,1909.0
606,Konstantin Gorbatov,Post-Impressionism,Post-Impressionism,{Post-Impressionism:254},254,Tolyatti,1876.0
590,Felix Nadar,Pictorialism,Pictorialism,{Pictorialism:245},245,rue Saint-Honoré,1820.0
436,J.M.W. Turner,"Impressionism, Romanticism, Unknown",Romanticism,"{Impressionism:1},{Romanticism:243},{Unknown:1}",245,London,1775.0
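
One way to track artists that appear in only one source is an outer merge with `indicator=True`, sketched below. The two small frames are made-up stand-ins for `artist_A` and `art500k_artists`, and whether each example artist really is missing from the other dataset is an assumption of the example.

```python
import pandas as pd

# Made-up stand-ins for the WikiArt and Art500k artist tables.
wiki_demo = pd.DataFrame({
    "artist": ["Ad Reinhardt", "M.C. Escher"],
    "birth_year": [1913, 1898],
})
a5_demo = pd.DataFrame({
    "artist": ["Ad Reinhardt", "Banksy"],
    "Nationality": ["American", None],
})

# how="outer" keeps every artist; the _merge column records where each came from.
combined = wiki_demo.merge(a5_demo, on="artist", how="outer", indicator=True)
source = dict(zip(combined["artist"], combined["_merge"].astype(str)))
print(source)
```

Rows marked `left_only` correspond to `artist_AnotB` above, and `right_only` rows are the Art500k artists missing from WikiArt.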


In [98]:
cols = artists.columns.tolist()
cols

['artist',
 'styles',
 'movement',
 'styles_extended',
 'pictures_count',
 'birth_place',
 'birth_year',
 'Nationality',
 'PaintingSchool',
 'ArtMovement',
 'Influencedby',
 'Influencedon',
 'Pupils',
 'Teachers',
 'FriendsandCoworkers',
 'FirstYear',
 'LastYear',
 'Places',
 'PlacesYears',
 'StylesYears',
 'StylesCount']

In [99]:
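# Reorder by position: artist, nationality, birth data, style columns, movement columns,
# picture count, years and places, painting school, then the relation columns
cols = cols[0:1]+cols[7:8]+cols[5:7]+cols[1:2]+cols[3:4]+cols[19:]+cols[2:3]+cols[9:10]+cols[4:5]+cols[15:19]+cols[8:9]+cols[10:15]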
artists = artists[cols]
artists

Unnamed: 0,artist,Nationality,birth_place,birth_year,styles,styles_extended,StylesYears,StylesCount,movement,ArtMovement,...,FirstYear,LastYear,Places,PlacesYears,PaintingSchool,Influencedby,Influencedon,Pupils,Teachers,FriendsandCoworkers
0,Ad Reinhardt,American,Buffalo,1913.0,"Abstract Art, Abstract Expressionism, Color Fi...","{Abstract Art:15},{Abstract Expressionism:5},{...","Expressionism:1944-1946,,Abstract Art:1937-194...","{Expressionism:7}, {Abstract Art:15}, {Color F...",Abstract Expressionism,"{Abstract Expressionism,Minimalism:52},",...,1937.0,1966.0,"US, NY, Canberra, Fort Worth, Buffalo, Austral...","New York City:1938-1966,,NY:1938-1966,,US:1938...","New York School,American Abstract Artists,Iras...","Piet Mondrian,Kazimir Malevich,Josef Albers,","Donald Judd,Barnett Newman,Mark Rothko,Frank S...",,,"Jackson Pollock,"
1,Adnan Coker,Turkish,,,"Abstract Art, Abstract Expressionism","{Abstract Art:25},{Abstract Expressionism:3}","Abstract Art:1992-2008,,Abstract Expressionism...","{Abstract Art:25}, {Abstract Expressionism:3}",Abstract Art,"{Abstract Art:28},",...,1968.0,2008.0,,,,,,,,
2,Akkitham Narayanan,Indian,Kerala,1939.0,Abstract Art,{Abstract Art:17},"Abstract Art:1974-1974,",{Abstract Art:17},Abstract Art,"{Abstract Art:17},",...,1974.0,1974.0,,,,,,,,
3,Alberto Magnelli,"Italian,French",Florence,1888.0,"Abstract Art, Art Nouveau (Modern), Cubism, Ex...","{Abstract Art:19},{Art Nouveau (Modern):2},{Cu...","Abstract Art:1916-1971,,Cubism:1914-1935,,Meta...","{Abstract Art:21}, {Cubism:10}, {Metaphysical ...",Abstract Art,"{Abstract Art,Cubo-Futurism,Concrete Art (Conc...",...,1909.0,1971.0,,,Abstraction-Création,,,,,
4,Alekos Kontopoulos,Greek,Lamia,1904.0,"Abstract Art, Cubism, Expressionism, Post-Impr...","{Abstract Art:26},{Cubism:5},{Expressionism:10...","Post-Impressionism:1932-1955,,Expressionism:19...","{Post-Impressionism:8}, {Expressionism:11}, {R...",Social Realism,"{Abstract Art,Social Realism:79},",...,1931.0,1974.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2452,Marianne von Werefkin,,Tula,1860.0,Unknown,{Unknown:61},,,Expressionism,"{Der Blaue Reiter:1},",...,,,,,,,,,,
2453,Robert Demachy,French,Saint-Germain-en-Laye,1859.0,Unknown,{Unknown:24},,,Pictorialism,"{Pictorialism:24},",...,1900.0,1914.0,France,,,,,,,
2454,Wolfgang Tillmans,,Remscheid,1968.0,Unknown,{Unknown:9},,,Contemporary,,...,2001.0,2001.0,"London, United Kingdom",,,,,,,
2455,Wu Daozi,Chinese,Chang'an,680.0,Unknown,{Unknown:8},,,Tang Dynasty (618–907),"{Tang Dynasty (618–907):8},",...,,,,,Four fathers of Chinese painting,,,,,
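
Positional slices like the ones above break silently if a column is added or removed. A minimal sketch of a name-based alternative follows; the `front` list is only an example ordering and `demo` is an empty stand-in frame.

```python
import pandas as pd

# Empty stand-in frame with a few of the real column names.
demo = pd.DataFrame(columns=[
    "artist", "styles", "movement", "birth_place", "birth_year", "Nationality",
])

# Columns to pull to the front, in the desired order; the rest keep their order.
front = ["artist", "Nationality", "birth_place", "birth_year"]
ordered = front + [c for c in demo.columns if c not in front]
demo = demo[ordered]
print(demo.columns.tolist())
```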


In [100]:
artists.to_csv('datasets/artists.csv', index=False)

In [125]:
artists = pd.read_csv('datasets/artists.csv')

In [None]:
# Flag artists whose active period (LastYear - FirstYear) exceeds 90 years
span = artists['LastYear'] - artists['FirstYear']
year_mistake = artists.loc[span > 90, 'artist'].tolist()
print(year_mistake)

In [None]:
artists[artists['artist'].isin(year_mistake)][['artist','birth_year','FirstYear','LastYear']]

In [129]:
too_early_years = ["Huang Yongyu","Joe Goode","Theodoros Stamos","Pablo Picasso", "Modest Cuixart","Giovanni Paolo Panini", "Guido Reni", "John Riley", "Marcello Bacciarelli","Rembrandt","Alfredo Volpi", "Henry Ossawa Tanner", "Pierre Soulages","Hieronymus Bosch","Agnes Lawrence Pelton","George Morland", "Jean-Baptiste Carpeaux"]
too_latest_years = ["Rupert Bunny", "Vasily Polenov", "Giovanni Paolo Panini", "Guido Reni","John Riley", "Luca Giordano", "Matthias Stom","Rembrandt", "Giovanni Bellini", "Alfredo Volpi", "Francesco Melzi", "Auguste Rodin", "Edgar Degas", "Henry Ossawa Tanner", "John Frederick Kensett","Giorgio de Chirico", "Maria Sibylla Merian", "Hieronymus Bosch","Jan Provoost","Jean Fouquet","Anton Azbe", "Jean-Baptiste Carpeaux"]
second_batch=['Hieronymus Bosch',
 'Jan Provoost',
 'George Lambert',
 'Charles Turner',
 'Thomas Jones',
 'William Morris']


In [128]:
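# Reset each implausibly early FirstYear to birth_year + 18 (i.e. assume work starts at age 18)
for artist in too_early_years:
    mask = artists['artist'] == artist
    artists.loc[mask, 'FirstYear'] = artists.loc[mask, 'birth_year'] + 18
# The too_latest_years artists are corrected manually in the next cell.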

In [130]:
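# Manually set LastYear for the artists in too_latest_years and second_batch
# NOTE: their_last_year has 23 entries while too_latest_years has 22, so the positional
# pairing may be misaligned toward the end of the list; these values should be re-checked.
their_last_year = [1947, 1898, 1765, 1642, 1641, 1705, 1649, 1669, 1516, 1988, 1570, 1917, 1917, 1937, 1872, 1978, 1705, 1705, 1705, 1529, 1460, 1900, 1875]
last_years = [1516, 1460, 1802, 1832, 1803, 1892]
# zip stops at the shorter list, matching the original loop over len(too_latest_years)
for artist, year in zip(too_latest_years, their_last_year):
    artists.loc[artists['artist'] == artist, 'LastYear'] = year
# second_batch is applied afterwards and overrides earlier values for the same artist
for artist, year in zip(second_batch, last_years):
    artists.loc[artists['artist'] == artist, 'LastYear'] = year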

In [None]:
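# NOTE: `subset` is not defined anywhere in this notebook; judging by the PlacesCount_x /
# PlacesCount_y columns handled in the next cell, it presumably holds 'artist' and 'PlacesCount'.
artists = artists.merge(subset, on='artist', how='left')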

In [145]:

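cols = artists.columns.to_list()
# Move the last column (presumably PlacesCount_y, from the merge) to position 15, after PlacesYears
cols = cols[0:15] + cols[-1:] + cols[15:-1]
cols.remove('PlacesCount_x')
# Rename on the new frame instead of inplace to avoid modifying a slice
artists = artists[cols].rename(columns={'PlacesCount_y': 'PlacesCount'})
artists.columns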

Index(['artist', 'Nationality', 'birth_place', 'birth_year', 'styles',
       'styles_extended', 'StylesYears', 'StylesCount', 'movement',
       'ArtMovement', 'pictures_count', 'FirstYear', 'LastYear', 'Places',
       'PlacesYears', 'PlacesCount', 'PaintingSchool', 'Influencedby',
       'Influencedon', 'Pupils', 'Teachers', 'FriendsandCoworkers'],
      dtype='object')

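Last step: in the .csv file, replace float .0 values with integers<br>
*A NumPy-backed integer column (Series) cannot hold NaNs, so year columns with missing values are stored as floats. The cell below works around this with a placeholder string that is then deleted by hand; pandas' nullable `Int64` dtype would be an alternative that keeps the missing values.*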


In [63]:
# Turn the non-NaN years into integers
t1 = artists['FirstYear'].fillna(0).astype(int).replace(0, "remove_hrgldg")
t2 = artists['LastYear'].fillna(0).astype(int).replace(0, "remove_hrgldg")
t3 = artists['birth_year'].fillna(0).astype(int).replace(0, "remove_hrgldg")

artists['FirstYear'] = t1
artists['LastYear'] = t2
artists['birth_year'] = t3

artists.to_csv('datasets/artists.csv', index=False)
#Manually delete the cells with "remove_hrgldg"

NOTE: manually deleted the cells containing "remove_hrgldg" from the csv file.
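
The manual deletion could likely be avoided with pandas' nullable integer dtype. The sketch below is an untested-on-the-real-data illustration: `demo` is a made-up stand-in for `artists`, and it writes to an in-memory buffer instead of `datasets/artists.csv`.

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-in for the artists table, with one artist lacking year data.
demo = pd.DataFrame({
    "artist": ["Claude Monet", "Wu Daozi"],
    "FirstYear": [1858.0, np.nan],
    "LastYear": [1926.0, np.nan],
})

# "Int64" (capital I) is the nullable integer dtype: NaN becomes <NA>.
year_cols = ["FirstYear", "LastYear"]
demo[year_cols] = demo[year_cols].astype("Int64")

# Missing values are written as empty cells, so no placeholder is needed.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading back with an explicit dtype keeps the years as integers.
back = pd.read_csv(io.StringIO(csv_text), dtype={c: "Int64" for c in year_cols})
print(back.dtypes)
```

Without the `dtype` argument, `pd.read_csv` would load columns containing empty cells as floats again.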

In [64]:
artists = pd.read_csv('datasets/artists.csv')

## B) Create networks

The network code is in the networks folder, mostly in the networks.ipynb notebook.