## 0. Imports, initial procedures
(Note: for the style and movement networks, I used only the WikiArt dataset, grouped by artist and style, not PainterPalette.)

In [18]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import networkx as nx

In [20]:
import httpimport

with httpimport.remote_repo('https://raw.githubusercontent.com/me9hanics/ArtProject/main/'):
    import analysis_functions

In [19]:
wikiart_artists_styles = pd.read_csv('https://raw.githubusercontent.com/me9hanics/PainterPalette/main/datasets/wikiart_artists_styles_grouped.csv')
wikiart_artists_styles[::300]

Unnamed: 0,style,artist,movement,count
0,Abstract Art,Ad Reinhardt,Abstract Expressionism,15
300,Abstract Expressionism,Joe Goode,Abstract Expressionism,20
600,Art Brut,Gaston Chaissac,Outsider art (Art brut),22
900,Art Nouveau (Modern),Mikhail Nesterov,Symbolism,9
1200,Color Field Painting,Claude Viallat,Contemporary,21
1500,Constructivism,Kazimir Malevich,Abstract Art,25
1800,Cubism,Mario Comensoli,Social Realism,2
2100,Expressionism,Alexander Bazhbeuk-Melikyan,Expressionism,5
2400,Expressionism,Marevna (Marie Vorobieff),Cubism,9
2700,Feminist Art,Boushra Yahya Almutawakel,Feminist Art,13


Styles of a painter, e.g., Picasso:

In [7]:
wikiart_artists_styles[wikiart_artists_styles['artist'] == 'Pablo Picasso'].sort_values(by='count', ascending=False)

Unnamed: 0,style,artist,movement,count
6599,Surrealism,Pablo Picasso,Post-Impressionism,356
2468,Expressionism,Pablo Picasso,Post-Impressionism,190
1829,Cubism,Pablo Picasso,Post-Impressionism,148
5153,Post-Impressionism,Pablo Picasso,Post-Impressionism,116
4371,Neoclassicism,Pablo Picasso,Post-Impressionism,74
3976,Naïve Art (Primitivism),Pablo Picasso,Post-Impressionism,72
6820,Synthetic Cubism,Pablo Picasso,Post-Impressionism,60
591,Analytical Cubism,Pablo Picasso,Post-Impressionism,48
6783,Symbolism,Pablo Picasso,Post-Impressionism,31
5672,Realism,Pablo Picasso,Post-Impressionism,26


In [8]:
wikiart_styles = wikiart_artists_styles['style'].unique()
wikiart_artists = wikiart_artists_styles['artist'].unique()
wikiart_movements = wikiart_artists_styles['movement'].unique()

G_styles = nx.Graph()
G_styles.add_nodes_from(wikiart_styles)
G_movements = nx.Graph()
G_movements.add_nodes_from(wikiart_movements)
G_artists = nx.Graph() #Not the final graph as there are more artist graphs, but this is the base
G_artists.add_nodes_from(wikiart_artists)

#This loop is not very efficient, but the dataset is small enough that it does not matter
for i in range(len(wikiart_artists)): #Through all artists
    #First get all styles of the artist
    artist_styles = (wikiart_artists_styles[wikiart_artists_styles['artist'] == wikiart_artists[i]][['style', 'count']]).reset_index(drop=True)
    #Iterate through all styles of the artist
    for j in range(len(artist_styles)):
        for k in range(j+1, len(artist_styles)):
            #Create an edge between two styles
            if not G_styles.has_edge(artist_styles['style'].iloc[j], artist_styles['style'].iloc[k]):
                G_styles.add_edge(artist_styles['style'].iloc[j], artist_styles['style'].iloc[k], weight=min(artist_styles['count'].iloc[j], artist_styles['count'].iloc[k]))
            else:
                G_styles[artist_styles['style'].iloc[j]][artist_styles['style'].iloc[k]]['weight'] += min(artist_styles['count'].iloc[j], artist_styles['count'].iloc[k])

#Drop style "Unknown"
G_styles.remove_node('Unknown')

#Threshold graph: remove edges with weight less than 100
threshold = 100
G_styles_threshold_100 = G_styles.copy()
for edge in list(G_styles_threshold_100.edges()): #Iterate over a snapshot, because edges are removed inside the loop
    if G_styles_threshold_100[edge[0]][edge[1]]['weight'] < threshold:
        G_styles_threshold_100.remove_edge(edge[0], edge[1])

#Same construction for movements: link two movements whenever they share a style
for i in range(len(wikiart_styles)):
    styles_movements = (wikiart_artists_styles[wikiart_artists_styles['style'] == wikiart_styles[i]][['movement', 'count']]).reset_index(drop=True)
    for j in range(len(styles_movements)):
        for k in range(j+1, len(styles_movements)):
            if not G_movements.has_edge(styles_movements['movement'].iloc[j], styles_movements['movement'].iloc[k]):
                if not styles_movements['movement'].iloc[j] == styles_movements['movement'].iloc[k]:
                    G_movements.add_edge(styles_movements['movement'].iloc[j], styles_movements['movement'].iloc[k], weight= min(styles_movements['count'].iloc[j], styles_movements['count'].iloc[k]))
            else:
                G_movements[styles_movements['movement'].iloc[j]][styles_movements['movement'].iloc[k]]['weight'] += min(styles_movements['count'].iloc[j], styles_movements['count'].iloc[k])

#Threshold graph: remove edges with weight less than 100 (reuses `threshold` from above)
G_movements_threshold_100 = G_movements.copy()
for edge in list(G_movements_threshold_100.edges()): #Iterate over a snapshot, because edges are removed inside the loop
    if G_movements_threshold_100[edge[0]][edge[1]]['weight'] < threshold:
        G_movements_threshold_100.remove_edge(edge[0], edge[1])

(Note on the code: it is not very tidy. Leaving "threshold" out of the variable names would probably be a good idea, as suggested by Marton Posfai.)
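
For reference, the same construction can be written more compactly. The block below is only a sketch under assumptions: the helper names `build_cooccurrence_graph` and `threshold_graph` are hypothetical (they are not part of the project's code), and the toy DataFrame with made-up counts stands in for `wikiart_artists_styles`. The weighting follows the loops above: every group (e.g. an artist) that contains two nodes (e.g. two styles) adds the smaller of the two counts to their edge.

```python
from itertools import combinations

import networkx as nx
import pandas as pd


def build_cooccurrence_graph(df, group_col, node_col, count_col="count"):
    """Link every pair of `node_col` values that share a `group_col` value.

    Each shared group adds min(count_a, count_b) to the edge weight,
    mirroring the weighting used in the loops above.
    """
    G = nx.Graph()
    G.add_nodes_from(df[node_col].unique())
    for _, group in df.groupby(group_col):
        rows = list(zip(group[node_col], group[count_col]))
        for (a, count_a), (b, count_b) in combinations(rows, 2):
            if a == b:  # skip self-loops (e.g. the same movement listed twice)
                continue
            w = min(count_a, count_b)
            if G.has_edge(a, b):
                G[a][b]["weight"] += w
            else:
                G.add_edge(a, b, weight=w)
    return G


def threshold_graph(G, threshold):
    """Return a copy of G without the edges whose weight is below `threshold`."""
    H = G.copy()
    weak = [(u, v) for u, v, w in H.edges(data="weight") if w < threshold]
    H.remove_edges_from(weak)
    return H


# Toy stand-in for wikiart_artists_styles (made-up numbers)
toy = pd.DataFrame({
    "artist": ["A", "A", "A", "B", "B"],
    "style": ["Cubism", "Surrealism", "Realism", "Cubism", "Surrealism"],
    "count": [10, 30, 5, 200, 150],
})
G_toy = build_cooccurrence_graph(toy, group_col="artist", node_col="style")
G_toy_thresholded = threshold_graph(G_toy, threshold=100)
print(G_toy["Cubism"]["Surrealism"]["weight"])
print(list(G_toy_thresholded.edges()))
```

With the real data, the style network would correspond to `group_col='artist', node_col='style'` and the movement network to `group_col='style', node_col='movement'`. Collecting the weak edges first and removing them afterwards avoids modifying the graph while iterating over it.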

Since a style network is built on top of painters, and a movement network is built on top of styles, we can just plot them as a multi-layer graph. <br>
<details><summary><u>Code and viz</u></summary>

```python
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.patheffects as path_effects
from mpl_toolkits.mplot3d.art3d import Line3DCollection
%matplotlib inline

np.random.seed(42) #Seed before computing the layouts so that they are reproducible
pos1 = nx.spring_layout(G_artists, k=0.9, iterations=2)
pos2 = nx.spring_layout(G_styles, k=2.4, iterations=70)
pos3 = nx.spring_layout(G_movements_threshold_100, k=5.4, iterations=7) #The unthresholded movement graph is too dense

#Setup for the figure
cols = ['mediumseagreen', 'darksalmon', 'steelblue'] #Colors
graphs = [G_artists,G_styles, G_movements_threshold_100]
w = 8; h = 6
fig, ax = plt.subplots(1, 1, figsize=(w,h), dpi=200, subplot_kw={'projection':'3d'})

for gi, G in enumerate(graphs):
    # node positions
    if gi == 0:
        pos = pos1
    if gi == 1:
        pos = pos2
    if gi == 2:
        pos = pos3
    
    xs = list(list(zip(*list(pos.values())))[0])
    ys = list(list(zip(*list(pos.values())))[1])
    zs = [gi]*len(xs) # set a common z-position of the nodes 
    
    # add within-layer edges (One could add between-layer edges here as well, see GitHub: jkbren/matplotlib_multilayer_network)
    lines3d = [(list(pos[i])+[gi],list(pos[j])+[gi]) for i,j in G.edges()]
    line_collection = Line3DCollection(lines3d, color="grey", zorder=gi, alpha=0.2)
    ax.add_collection3d(line_collection)

    # nodes
    cs = [cols[gi]]*len(xs) #Color
    ax.scatter(xs, ys, zs, c=cs, edgecolors='.2',  alpha=0.5, zorder=gi+1) #Add nodes
    
    # add a plane to designate the layer
    xdiff = max(xs)-min(xs)
    ydiff = max(ys)-min(ys)
    ymin = min(ys)-ydiff*0.1
    ymax = max(ys)+ydiff*0.1
    xmin = min(xs)-xdiff*0.1 * (w/h)
    xmax = max(xs)+xdiff*0.1 * (w/h)
    xx, yy = np.meshgrid([xmin, xmax],[ymin, ymax])
    zz = np.zeros(xx.shape)+gi
    ax.plot_surface(xx, yy, zz, color=cols[gi], alpha=0.1, zorder=gi)

    # add label
    if gi == 0:
        text = "Painters"
    if gi == 1:
        text = "Styles"
    if gi == 2:
        text = "Movements"
    layertext = ax.text(0.0, 0.85, gi*0.95+0.5, text,
                        color='.95', fontsize='large', zorder=1e5, ha='left', va='center',
                        path_effects=[path_effects.Stroke(linewidth=3, foreground=cols[gi]),
                                      path_effects.Normal()])

# set them all at the same x,y,zlims
ax.set_ylim(min(ys)-ydiff*0.1,max(ys)+ydiff*0.1)
ax.set_xlim(min(xs)-xdiff*0.1,max(xs)+xdiff*0.1)
ax.set_zlim(-0.1, len(graphs) - 1 + 0.1)

# select viewing angle
angle = 30
height_angle = 20
ax.view_init(height_angle, angle)

# how much do you want to zoom into the fig
ax.dist = 9.5

ax.set_axis_off()

# plt.savefig('multilayer_network.png',dpi=425,bbox_inches='tight')
plt.show()


```
</details>

To show the effect of thresholding, compare the centralities of the movement network with and without it:

No threshold:

In [9]:
degree_centrality_styles = nx.degree_centrality(G_movements)
closeness_centrality_styles = nx.closeness_centrality(G_movements)
betweenness_centrality_styles = nx.betweenness_centrality(G_movements)
eigenvector_centrality_styles = nx.eigenvector_centrality(G_movements, max_iter=1000)

centralities_styles_df = pd.DataFrame({
    'Degree Centrality': degree_centrality_styles,
    'Closeness Centrality': closeness_centrality_styles,
    'Betweenness Centrality': betweenness_centrality_styles,
    'Eigenvector Centrality': eigenvector_centrality_styles
})
centralities_styles_df.head(5)

Unnamed: 0,Degree Centrality,Closeness Centrality,Betweenness Centrality,Eigenvector Centrality
Abstract Expressionism,0.918033,0.920412,0.02968,0.107795
Abstract Art,0.877049,0.884738,0.008157,0.1072
Social Realism,0.852459,0.86463,0.006798,0.106051
Kinetic art,0.844262,0.858129,0.00206,0.105759
Avant-garde,0.811475,0.833074,0.001162,0.103726


In [10]:
print('Top 7 movements by degree centrality in the unthresholded network:')
centralities_styles_df['Degree Centrality'].sort_values(ascending=False).head(7)

Top 7 movements by degree centrality in the unthresholded network:


Abstract Expressionism    0.918033
New Ink Art               0.918033
Expressionism             0.901639
Contemporary              0.885246
Pop Art                   0.885246
Impressionism             0.885246
Symbolism                 0.885246
Name: Degree Centrality, dtype: float64

I show degree centrality here, and the ranking is already quite "surrealistic": it is hard to justify placing a movement such as New Ink Art this high. Across all four centrality measures, contemporary movements come out as more central than an art enthusiast would expect.
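
Because the same four measures are computed again for the thresholded network, the calls could be wrapped in a small helper. This is a sketch, not the project's code: `centrality_table` is a hypothetical name and the toy graph is made up purely to exercise it.

```python
import networkx as nx
import pandas as pd


def centrality_table(G, max_iter=1000):
    """Collect four unweighted centrality measures of G in one DataFrame."""
    return pd.DataFrame({
        "Degree Centrality": nx.degree_centrality(G),
        "Closeness Centrality": nx.closeness_centrality(G),
        "Betweenness Centrality": nx.betweenness_centrality(G),
        "Eigenvector Centrality": nx.eigenvector_centrality(G, max_iter=max_iter),
    })


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2
G_demo = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
table = centrality_table(G_demo)
print(table.sort_values("Degree Centrality", ascending=False))
```

With the notebook's graphs, `centrality_table(G_movements)` and `centrality_table(G_movements_threshold_100)` would give the two tables being compared.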

Thresholded:

In [11]:
degree_centrality_styles_2 = nx.degree_centrality(G_movements_threshold_100)
closeness_centrality_styles_2 = nx.closeness_centrality(G_movements_threshold_100)
betweenness_centrality_styles_2 = nx.betweenness_centrality(G_movements_threshold_100)
eigenvector_centrality_styles_2 = nx.eigenvector_centrality(G_movements_threshold_100, max_iter=1000)

centralities_styles_df_2 = pd.DataFrame({
    'Degree Centrality': degree_centrality_styles_2,
    'Closeness Centrality': closeness_centrality_styles_2,
    'Betweenness Centrality': betweenness_centrality_styles_2,
    'Eigenvector Centrality': eigenvector_centrality_styles_2
})
print('Top 7 movements by degree centrality in the thresholded network:')
centralities_styles_df_2['Degree Centrality'].sort_values(ascending=False).head(7)

Top 7 movements by degree centrality in the thresholded network:


Abstract Art     0.729508
Expressionism    0.721311
Contemporary     0.688525
Surrealism       0.680328
Realism          0.631148
Romanticism      0.614754
Impressionism    0.614754
Name: Degree Centrality, dtype: float64

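This seems fairer. If the centralities also accounted for edge weights, the results would be even more convincing. NetworkX supports weights as connection strengths only for eigenvector centrality; for betweenness centrality the weights act as distances, so one may use the negated or reciprocal edge weights instead. Note that weighted shortest paths in NetworkX are computed with Dijkstra's algorithm, which assumes non-negative weights, so results from negated weights should be read with caution and reciprocals are the safer choice.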

In [12]:
G = G_movements_threshold_100.copy()
for edge in G.edges():
    G[edge[0]][edge[1]]['weight'] = -G[edge[0]][edge[1]]['weight']
betweenness_centrality_styles_2_2 = nx.betweenness_centrality(G, weight='weight')
centralities_styles_2_df_3 = pd.DataFrame({
    'Betweenness Centrality': betweenness_centrality_styles_2_2
})
del G #Only needed it here 
centralities_styles_2_df_3['Betweenness Centrality'].sort_values(ascending=False).head(10)

Expressionism         0.717518
Impressionism         0.716434
Post-Impressionism    0.715486
Realism               0.712912
Romanticism           0.697331
Rococo                0.670844
Abstract Art          0.670709
Neoclassicism         0.668270
Baroque               0.663325
Surrealism            0.645238
Name: Betweenness Centrality, dtype: float64

For betweenness centrality, these results seem to make sense. The 19th-century movements act as the "dividing line" between older and newer ones, and they are strongly connected to each other, which reinforces each other's strength. Realism and Impressionism are generally considered to be among the most influential movements, and this is better represented here (both are near the top, with values around 0.71).
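
As a cross-check on the negated weights, one could use reciprocal weights as distances, which keeps every distance positive. The block below is a minimal sketch under assumptions: `add_reciprocal_distance` and the `distance` attribute name are hypothetical, the toy graph uses made-up weights, and it is not claimed to reproduce the numbers above.

```python
import networkx as nx


def add_reciprocal_distance(G, weight="weight", distance="distance"):
    """Return a copy of G where each edge also stores distance = 1 / weight.

    Assumes all weights are strictly positive (true after thresholding).
    """
    H = G.copy()
    for u, v, w in list(H.edges(data=weight)):
        H[u][v][distance] = 1.0 / w  # strong tie -> short distance
    return H


# Toy graph (made-up weights): a-b and b-c are strong ties, a-c is weak
G_toy = nx.Graph()
G_toy.add_weighted_edges_from([("a", "b", 200), ("b", "c", 150), ("a", "c", 2)])

H = add_reciprocal_distance(G_toy)
unweighted = nx.betweenness_centrality(G_toy)
weighted = nx.betweenness_centrality(H, weight="distance")
print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In the unweighted triangle no node lies between the other two, while with reciprocal distances the path from `a` to `c` through `b` is shorter than the weak direct edge, so `b` becomes the bridge. For the notebook's data, the analogous call would pass `G_movements_threshold_100` to the helper.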

The movement network:

![movements](https://github.com/me9hanics/ArtProject/assets/82604073/039688be-16f0-4432-bae2-acba9688914b)

*Note:* The degree distributions of the two networks were also analysed, but nothing particularly interesting was found (the style network seems to follow a power-law distribution "ruined" by a small Poisson-distributed part in the middle).
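
For readers who want to repeat that check, here is a minimal sketch of how a degree distribution could be tabulated. The helpers `degree_distribution` and `ccdf` are hypothetical (not code from the project), and the random graph is only a stand-in for `G_styles` or `G_movements`.

```python
from collections import Counter

import networkx as nx


def degree_distribution(G):
    """Return {degree: fraction of nodes with that degree}, sorted by degree."""
    counts = Counter(d for _, d in G.degree())
    n = G.number_of_nodes()
    return {k: counts[k] / n for k in sorted(counts)}


def ccdf(dist):
    """Complementary CDF: P(degree >= k) for every observed degree k."""
    remaining = 1.0
    out = {}
    for k, p in dist.items():  # dist is sorted by increasing degree
        out[k] = remaining
        remaining -= p
    return out


# Stand-in graph; with the real data one would pass G_styles or G_movements
G_demo = nx.barabasi_albert_graph(200, 2, seed=42)
dist = degree_distribution(G_demo)
tail = ccdf(dist)
for k in list(dist)[:5]:
    print(k, round(dist[k], 3), round(tail[k], 3))
```

Plotting `tail` on log-log axes is a common way to judge by eye whether a distribution is roughly power-law: an approximately straight line is suggestive, though not proof.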