# Recommendation System development for paintings through Image Classification - *Part 1*

The purpose of this assignment was to develop a recommendation system that classifies paintings based on specific target labels and suggests similar artworks to enhance user engagement. Below are the key objectives and steps taken during the project:
- Deep-learning classification model: The primary goal was to create a multi-output classifier to predict five target labels—artist, subject, style, materials, and period of the painting.
- Recommendation system: The second goal was to develop a system that recommends similar paintings based on the predicted labels and extracted image features, to maximize user engagement.
- Model architectures: Two deep learning approaches were used—Convolutional Neural Networks (EfficientNetB3, VGG16, VGG19) and Vision Transformer (Google's ViT model).
- Performance comparison: After evaluating the models, the custom Vision Transformer (ViT) classifier outperformed the CNN-based models in terms of accuracy.
- Embedding-based recommendation: The selected ViT model was used to extract both visual and label features as embeddings, which were then utilized to calculate the similarity between paintings for the recommendation system.

<br/>

The initial dataset comes from the "Best Artworks of All Time" collection (the "data.csv" file), which is available on [Kaggle](https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time).

<br/>

In this part (Part 1), each of the 400 selected paintings is matched, via its image ID, to the corresponding file in the images folder for use in the project.

<br/>

For file management and data manipulation, the following libraries are used:

- `pandas`
- `shutil`
- `os`

---
> MSc Business Analytics   <br/>
> Athens University of Economics and Business  <br/>
> Machine Learning and Content Analytics <br/>
>  <br/>
> Papadimitriou Anna, Registration number: f2822311  <br/>
> Ralli Eleni, Registration number: f2822312  <br/>
> Lakkas-Pyknis Evangelos, Registration number: f2822306 <br/>
> Mesolora Stamatoula Gerasimoula, Registration number: f2822308  <br/>

In [2]:
import os
import shutil
import pandas as pd

### Paintings Dataset

We load the Excel file containing paintings data into a pandas DataFrame.

<br/>

The dataset consists of 400 rows and 8 columns:
- Image ID: The ID of each painting
- Title: The title of each painting
- Artist: The artist of each painting  
- Subject: The subject of each painting
- Style: The style of each painting
- Materials: The materials used for each painting  
- Start: The first year of the period during which the painting was created (years are counted in steps of 10)
- End: The last year of the period during which the painting was created (years are counted in steps of 10)
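
The column list above can be turned into a quick sanity check before any further processing. The sketch below is only illustrative: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, and apart from `Image ID` (which the code in this notebook uses) the exact header spellings are assumed from the list rather than confirmed against the spreadsheet.

```python
# Column names as listed in this notebook (assumed spellings; adjust if the
# spreadsheet headers differ).
EXPECTED_COLUMNS = [
    "Image ID", "Title", "Artist", "Subject",
    "Style", "Materials", "Start", "End",
]


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `columns`."""
    # Normalise to stripped strings so stray whitespace in headers is ignored
    present = {str(c).strip() for c in columns}
    return [name for name in expected if name not in present]
```

Once the DataFrame is loaded, `missing_columns(df.columns)` would be expected to return an empty list; any names it returns point to headers that are spelled differently in the Excel file.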


In [3]:
excel_path = r"C:\Users\Μπαμπης\Documents\MSc Business Analytics\ML and content analytics\pinakes teliko.xlsx"

df = pd.read_excel(excel_path)
print(df.shape)
df.head()

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Μπαμπης\\Documents\\MSc Business Analytics\\ML and content analytics\\pinakes teliko.xlsx'

### Directory for Filtered Paintings
The code below creates a new directory to store the filtered paintings, but only if it does not already exist.

In [None]:
filtered_paintings_path = r"C:\Users\Μπαμπης\Documents\MSc Business Analytics\ML and content analytics\filtered_paintings"

if not os.path.exists(filtered_paintings_path):
    os.makedirs(filtered_paintings_path)
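
Because the paths in this notebook are absolute and tied to one machine, it may help to derive them from a single base folder. The following is a minimal sketch, not the approach used here: `project_paths` and `base_dir` are hypothetical names, while the file and folder names are taken from the cells in this notebook.

```python
from pathlib import Path


def project_paths(base_dir):
    """Build the project's paths from one base folder (hypothetical helper)."""
    base = Path(base_dir)
    paths = {
        "excel": base / "pinakes teliko.xlsx",    # Excel file with the paintings data
        "source": base / "resized1",              # folder with the resized images
        "filtered": base / "filtered_paintings",  # destination for the selected images
    }
    # Create the destination folder; exist_ok avoids an error if it already exists
    paths["filtered"].mkdir(parents=True, exist_ok=True)
    return paths
```

With this layout, only the `base_dir` argument would need to change when the notebook is run on another machine.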

### Filtering and Copying Valid Images
In this section, we:

1. Specify the source directory containing the resized images
2. Extract the list of valid image IDs from the Excel file
3. Iterate through the images in the source directory and:
    - Check whether the file name (without extension) matches any of the valid image IDs
    - If a match is found, copy the image to the destination folder (`filtered_paintings_path`) using `shutil.copy()`

In [None]:
source_dir = r"C:\Users\Μπαμπης\Documents\MSc Business Analytics\ML and content analytics\resized1"

# Use a set so that each membership test is fast
valid_image_ids = set(df['Image ID'].tolist())

for image_file in os.listdir(source_dir):
    # File name without its extension
    image_title = os.path.splitext(image_file)[0]

    # Copy only the images whose name matches an ID from the Excel file
    if image_title in valid_image_ids:
        source_file = os.path.join(source_dir, image_file)
        destination_file = os.path.join(filtered_paintings_path, image_file)

        shutil.copy(source_file, destination_file)

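As we can see, the images have been copied and the `filtered_paintings` directory contains 399 images, one fewer than the 400 rows in the dataset. This may mean that one image ID has no matching file in the source directory, or that an ID appears more than once in the Excel file.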

In [None]:
file_list = os.listdir(filtered_paintings_path)
len(file_list)

399
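
Since the count above is one short of the 400 rows in the dataset, a small check can list the image IDs that have no corresponding file. This is a sketch under assumptions: `find_unmatched_ids` is a hypothetical helper, and it compares IDs as strings, which the copy loop in this notebook does not do explicitly.

```python
import os


def find_unmatched_ids(valid_ids, filenames):
    """Return the IDs in `valid_ids` that have no file named after them."""
    # File names without extension
    stems = {os.path.splitext(name)[0] for name in filenames}
    # Compare as stripped strings (assumption: IDs may be stored as numbers)
    return [i for i in valid_ids if str(i).strip() not in stems]
```

For example, `find_unmatched_ids(df['Image ID'], os.listdir(filtered_paintings_path))` would return the IDs that were not copied. An empty result would instead point to a duplicated ID in the Excel file.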