# Paintings by Artists/Genre - Multi Category

Modified from [Kelly's good work here](https://github.com/kellyslpang/unpackAIworkbooks/blob/main/Kelly_Paintings_multicat.ipynb)

[Dataset from kaggle](https://www.kaggle.com/ikarus777/best-artworks-of-all-time)

The dataset is organised in directories, where each directory contains the paintings of one artist.

The aim is to train the model to identify whose style a painting most resembles, along with the genre(s) associated with that artist.

### Imports and Setup

In [None]:
!pip install -Uqq fastbook
!pip install -q forgebox

In [None]:
# from fastbook import *
from forgebox.imports import *
from IPython.display import Image, display
from fastai.vision.widgets import *
from fastai.vision.all import *

In [None]:
%%html
<style>
    pre {
        white-space: pre-wrap;
    }
</style>

In [None]:
#tell colab to wrap text in cells

# from IPython.display import HTML, display

# def set_css():
#   display(HTML('''
#   <style>
#     pre {
#         white-space: pre-wrap;
#     }
#   </style>
#   '''))
# get_ipython().events.register('pre_run_cell', set_css)


# MULTI LABEL MODEL

### Preparing the data

In [None]:
!ls /kaggle/input/best-artworks-of-all-time

In [None]:
path = Path('/kaggle/input/best-artworks-of-all-time')

Preparing the CSV to look up genre by artist. Spaces in artist names are replaced with underscores so that they match the image directory names:

In [None]:
csvDF = pd.read_csv(path/'artists.csv')
csvDF["name"] = csvDF["name"].apply(lambda x:x.replace(" ","_"))
csvDF.head()

In [None]:
artists = (path/"images"/"images").ls()
artists

Group by artist name and collect each artist's genre values into a list, giving a name-to-genre lookup:

In [None]:
name_to_genre = dict(csvDF[["name","genre"]].groupby("name").agg(list).reset_index().values)
name_to_genre
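
The lookup above keeps each artist's `genre` value exactly as it appears in the CSV. If that column holds several genres in one comma-separated string (an assumption about the file, worth checking against `csvDF.head()`), each genre would need to be split out to become its own label. A minimal pandas-free sketch of that idea, using hypothetical rows and the hypothetical helper name `build_lookup`:

```python
# Hypothetical (name, genre) rows; the real values come from artists.csv.
toy_rows = [
    ("Artist_A", "Genre_X"),
    ("Artist_B", "Genre_X,Genre_Y"),
]

def build_lookup(rows):
    """Map each artist name to a list of individual genre labels."""
    lookup = {}
    for name, genre in rows:
        # Split on commas so every genre becomes a separate label.
        parts = [g.strip() for g in genre.split(",") if g.strip()]
        lookup.setdefault(name, []).extend(parts)
    return lookup

toy_lookup = build_lookup(toy_rows)
print(toy_lookup)
```

Without the split, a multi-genre string would be treated as one combined label.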

Collecting the list of image files:

In [None]:
files = get_image_files(path/"images"/"images")
files

Putting together the datablock:

In [None]:
def get_x(f): return str(f)

def get_y(f): 
    artist = f.parent.name
    genres = name_to_genre.get(artist)
    genres = [] if genres is None else genres
    return [artist]+genres
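
To illustrate what `get_y` returns, here is a small standalone sketch that mirrors its logic on made-up paths. The lookup table, the paths, and the name `toy_get_y` are hypothetical; only the rule (artist taken from the parent directory, genres appended when known) comes from the cell above.

```python
from pathlib import Path

# Hypothetical lookup; the notebook builds the real one from artists.csv.
toy_name_to_genre = {"Artist_A": ["Genre_X"]}

def toy_get_y(f, lookup):
    """Return the artist (parent directory name) plus any known genres."""
    artist = Path(f).parent.name
    return [artist] + lookup.get(artist, [])

known = toy_get_y("images/images/Artist_A/Artist_A_1.jpg", toy_name_to_genre)
unknown = toy_get_y("images/images/Artist_B/Artist_B_1.jpg", toy_name_to_genre)
print(known)    # artist with a genre entry
print(unknown)  # artist missing from the lookup keeps only the artist label
```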

In [None]:
for i in files[1230:1250]:
    print([get_x(i), get_y(i)])

In [None]:
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=RandomSplitter(valid_pct=0.3, seed=42),
                   get_x=get_x,
                   get_y=get_y,
                   item_tfms=RandomResizedCrop(128, min_scale=0.35))
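
With `MultiCategoryBlock`, each list of labels ends up as a multi-hot target vector over a shared vocabulary. The sketch below shows roughly what that encoding looks like in plain Python. It is an illustration of the idea rather than fastai's implementation, and the helper names and sample labels are hypothetical.

```python
def build_vocab(label_lists):
    """Collect every distinct label into a sorted vocabulary."""
    return sorted({label for labels in label_lists for label in labels})

def multi_hot(labels, vocab):
    """Encode one sample's labels as a 0/1 vector aligned with vocab."""
    present = set(labels)
    return [1.0 if v in present else 0.0 for v in vocab]

# Hypothetical outputs of get_y for two images.
samples = [
    ["Artist_A", "Genre_X"],
    ["Artist_B", "Genre_X", "Genre_Y"],
]

vocab = build_vocab(samples)
encoded = [multi_hot(s, vocab) for s in samples]
print(vocab)
print(encoded)
```

Because artists and genres share one vocabulary, a single target vector can have several positions set at once, which is why the model is trained as a multi-label classifier.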

In [None]:
dsets = dblock.datasets(files)
dsets.train[1000]

In [None]:
dls = dblock.dataloaders(files)
dls.show_batch(nrows=5, ncols=2)

Train:

In [None]:
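# Probability above which a label counts as predicted
THRESHOLD = .2

# Pass the threshold to each metric; otherwise they fall back to the default of 0.5
learn = cnn_learner(dls, resnet50, metrics=[
    partial(accuracy_multi, thresh=THRESHOLD),
    RecallMulti(thresh=THRESHOLD),
    PrecisionMulti(thresh=THRESHOLD)])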
learn.fine_tune(2, base_lr=3e-3, freeze_epochs=3)
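
The threshold matters because multi-label metrics first turn each output into a probability with a sigmoid and then count a label as predicted only if that probability exceeds the threshold. The following plain-Python sketch shows that calculation on made-up numbers. `toy_accuracy_multi` is a hypothetical stand-in written for illustration, not fastai's function.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def toy_accuracy_multi(logits, targets, thresh=0.5):
    """Fraction of (sample, label) pairs where the thresholded prediction matches the target."""
    correct = 0
    total = 0
    for row_logits, row_targets in zip(logits, targets):
        for z, t in zip(row_logits, row_targets):
            predicted = sigmoid(z) > thresh
            correct += int(predicted == bool(t))
            total += 1
    return correct / total

# One made-up sample with three labels; the second label is a weak positive.
logits = [[2.0, -1.0, -3.0]]
targets = [[1, 1, 0]]

acc_default = toy_accuracy_multi(logits, targets, thresh=0.5)
acc_low = toy_accuracy_multi(logits, targets, thresh=0.2)
print(acc_default, acc_low)
```

Lowering the threshold lets the weakly predicted second label count as positive, which raises recall at the possible cost of precision.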

In [None]:
final_layer = learn.model[-1][-1].weight.data.cpu().numpy()

In [None]:
from sklearn.manifold import TSNE
# from sklearn.decomposition import PCA
lessdim=TSNE(n_components=2)
# lessdim = PCA(n_components=2)
result = lessdim.fit_transform(final_layer)

In [None]:
X = result[:,0]
Y = result[:,1]

plt.figure(figsize=(32,32))
plt.scatter(X, Y)
for t, x, y in zip(dsets.train.vocab, X, Y):
    genres = name_to_genre.get(t)
    if genres is not None:
        t=f"{t}({genres})"
    plt.text(x,y,t, color=np.random.rand(3)*0.7, fontsize=15)
plt.show()

In [None]:
learn.export('./gdrive/MyDrive/ai/Artists/PaintingGenreRanResizeCrop128.pkl')



Test images (from the 1st model, with artists only as labels):