<a href="https://colab.research.google.com/github/giordanovitale/Prado-Museum-CNN/blob/main/Prado_Artists.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



1. [Data Augmentation Techniques](https://medium.com/ymedialabs-innovation/data-augmentation-techniques-in-cnn-using-tensorflow-371ae43d5be9#8be0)
2. [Model Architectures](https://medium.com/@navarai/unveiling-the-diversity-a-comprehensive-guide-to-types-of-cnn-architectures-9d70da0b4521)
3. [EfficientNet](https://towardsdatascience.com/complete-architectural-details-of-all-efficientnet-models-5fd5b736142)

# 0 - Load the necessary libraries

Dataset Source: https://www.kaggle.com/datasets/maparla/prado-museum-pictures

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import tensorflow as tf

import os
import requests

from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool
# import visualkeras as vk

from scipy.optimize import fsolve
from math import exp
import matplotlib.pyplot as plt

from collections import defaultdict

from google.colab import userdata

import keras.backend as K
from keras.layers import Layer
from tensorflow.keras import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, \
    AveragePooling2D, BatchNormalization, ReLU, PReLU, ZeroPadding2D, \
    GlobalAveragePooling2D, Input, DepthwiseConv2D, Add, Activation, Lambda, RandomFlip
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.callbacks import CSVLogger
from tensorflow.keras.applications.resnet_v2 import ResNet50V2
from tensorflow.keras.applications.resnet_v2 import preprocess_input as resnet_v2_preproccessing
from tensorflow.keras.applications.efficientnet_v2 import preprocess_input as efficientnet_preproccessing
from tensorflow.keras.applications.efficientnet_v2 import EfficientNetV2B3
from tensorflow.keras.applications.vgg19 import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input as vgg_preproccessing
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobilenet_preprocessing

# 1 - Helper Functions

# 2 - Load the dataset using Kaggle API

My username and key are kept secret. Replace `userdata.get('KAGGLE_USERNAME')` and `userdata.get('KAGGLE_KEY')` with your username and key, respectively.

In [42]:
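# Read the credentials from Colab secrets; substitute your own username and key if you do not use secrets.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')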
!kaggle datasets download maparla/prado-museum-pictures -f prado.csv
!unzip prado.csv.zip

Dataset URL: https://www.kaggle.com/datasets/maparla/prado-museum-pictures
License(s): MIT
Downloading prado.csv.zip to /content
 49% 9.00M/18.3M [00:00<00:00, 40.1MB/s]
100% 18.3M/18.3M [00:00<00:00, 66.1MB/s]
Archive:  prado.csv.zip
  inflating: prado.csv               
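
The credential setup above can be wrapped in a small helper that fails early when a value is missing. This is only a sketch under stated assumptions, not the notebook's own code: `load_kaggle_credentials` and its `get_secret` parameter are hypothetical names, and the dictionary of dummy values stands in for a real secret store. The one external fact it relies on is that the Kaggle CLI reads `KAGGLE_USERNAME` and `KAGGLE_KEY` from the environment.

```python
import os


def load_kaggle_credentials(get_secret=None):
    """Copy the Kaggle credentials into os.environ, failing early if one is missing.

    get_secret is a hypothetical parameter: any callable that maps a secret
    name to its value. When it is None, the current environment is checked.
    """
    for name in ("KAGGLE_USERNAME", "KAGGLE_KEY"):
        value = get_secret(name) if get_secret else os.environ.get(name)
        if not value:
            raise RuntimeError(f"Missing credential: {name}")
        os.environ[name] = value


# Dummy values for illustration only; never hard-code real credentials.
fake_secrets = {"KAGGLE_USERNAME": "demo-user", "KAGGLE_KEY": "demo-key"}
load_kaggle_credentials(fake_secrets.get)
print(os.environ["KAGGLE_USERNAME"])
```

In Colab, the same helper could presumably be called as `load_kaggle_credentials(userdata.get)` before running the `!kaggle` command.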


Create the dataframe from the unzipped CSV file.

In [3]:
df = pd.read_csv("prado.csv")

Since the project assignment does not define a target class, I have to choose one. After inspecting the columns, the two most suitable candidates are `author` and `technical_sheet_tecnica`. The latter looks more promising because its most frequent classes contain more observations each, which makes it better suited to data-hungry algorithms.

In [5]:
df['author'].value_counts()

author
Anónimo                                                                       2698
Goya y Lucientes, Francisco de                                                1080
Bayeu y Subías, Francisco                                                      446
Haes, Carlos de                                                                326
Pizarro y Librado, Cecilio                                                     290
                                                                              ... 
Malombra, Pietro                                                                 1
Taller de Bellini, Giovanni                                                      1
Mattioli, Ludovico -Dibujante- (Autor de la obra original: Cignani, Carlo)       1
Ricci, Marco                                                                     1
García, Sergio                                                                   1
Name: count, Length: 2560, dtype: int64

In [37]:
# value_counts() already sorts in descending order, so no extra sort is needed.
df['technical_sheet_tecnica'].value_counts().head(10)

technical_sheet_tecnica
Óleo                    4156
Acuñación               1118
Esculpido                550
Lápiz compuesto          476
Clarión; Lápiz negro     396
Albúmina                 395
Sanguina                 372
Lápiz                    259
Lápiz negro              237
Pluma; Tinta parda       214
Name: count, dtype: int64
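
One way to make the comparison between the two candidate columns more concrete is to count how many classes reach a minimum number of samples and what share of the data they cover. The sketch below is an illustration only: `class_coverage` and `min_count` are hypothetical names, and the toy labels are made up rather than taken from `prado.csv`.

```python
from collections import Counter


def class_coverage(labels, min_count):
    """Return (number of classes with >= min_count samples, share of samples they cover)."""
    counts = Counter(labels)
    kept = {label: n for label, n in counts.items() if n >= min_count}
    total = sum(counts.values())
    share = sum(kept.values()) / total if total else 0.0
    return len(kept), share


# Toy labels for illustration only; not taken from prado.csv.
toy_labels = ["Óleo"] * 6 + ["Sanguina"] * 3 + ["Lápiz"]
print(class_coverage(toy_labels, min_count=3))  # (2, 0.9)
```

On the real data, the same function could be applied to each column in turn, for example `class_coverage(df['author'].dropna(), 100)` and `class_coverage(df['technical_sheet_tecnica'].dropna(), 100)`; the threshold of 100 is an arbitrary choice for the example.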

To obtain the JPG images, we start from the URL column `work_image_url`. The last path segment of each URL is the image file name, which we store in a new column, `work_id`.

In [47]:
df["work_id"] = df["work_image_url"].apply(lambda x: x.split("/")[-1])

In [48]:
df['work_id']

0        404387d6-a52c-4477-b598-de2a2d5a3d55.jpg
1        589ee4a3-28fa-4977-a84d-7326f5c9aeb3.jpg
2        4a8bab74-ca91-450a-b5b7-39dd61e2d7f3.jpg
3        9af5b176-b4d3-4930-854b-5b5f252829f1.jpg
4        4c494f0a-d5ae-45ca-826b-59f4b5fd4398.jpg
                           ...                   
13482    c62f7f3e-3ad3-4d9e-9586-b0b389b2d032.jpg
13483    6c28accf-e0c0-4bc0-b4c6-3fbb282bcbd8.jpg
13484    b4126fb6-c5ac-40e3-89a1-d1578914c09b.jpg
13485    e7bf2481-522c-4071-9d97-7908aca45831.jpg
13486    e8d785a8-1407-4203-b837-2d01e82a36cb.jpg
Name: work_id, Length: 13487, dtype: object
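
Splitting on `/` works as long as the URLs end with the file name. If some URLs carried a query string or fragment, a slightly more defensive variant could parse the URL first. The sketch below uses only the standard library; `image_filename` is a hypothetical helper name, and the example URL is invented for illustration (only the file name is borrowed from the output above).

```python
from posixpath import basename
from urllib.parse import urlsplit


def image_filename(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string or fragment."""
    return basename(urlsplit(url).path)


# Hypothetical URL for illustration only; the host and query string are made up.
example_url = "https://example.com/images/404387d6-a52c-4477-b598-de2a2d5a3d55.jpg?width=800"
print(image_filename(example_url))  # 404387d6-a52c-4477-b598-de2a2d5a3d55.jpg
```

With pandas, it could be applied as `df["work_image_url"].apply(image_filename)`, which would give the same result as the `split("/")[-1]` version whenever the URLs have no query string.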