# Analyzing Artistic Styles with Multimodal Embeddings

Visual data is information-rich, but its unstructured nature makes it difficult to analyze.

In this example, we will explore multimodal embeddings and computed attributes to analyze artistic styles in images.

We will use the [`wikiart`](https://huggingface.co/datasets/huggan/wikiart) dataset and the [FiftyOne](https://docs.voxel51.com/index.html) library for data analysis and visualization.

## Setup

In [None]:
!pip install -qU transformers huggingface_hub fiftyone umap-learn hf-transfer

In [None]:
# HF Transfer makes downloads fast
import os
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = '1'

In [None]:
import fiftyone as fo # base library and app
import fiftyone.zoo as foz # zoo datasets and models
import fiftyone.brain as fob # ML routines
from fiftyone import ViewField as F # for defining custom views
import fiftyone.utils.huggingface as fouh # for loading datasets from HuggingFace

## Dataset

We start by loading the WikiArt dataset from the HuggingFace Hub into FiftyOne. This dataset can also be loaded through HuggingFace's `datasets` library, but here we use FiftyOne's HF Hub integration to get the data directly from the Datasets server.

In [None]:
dataset = fouh.load_from_hub(
    'huggan/wikiart', # repo_id
    format='parquet',
    classification_fields=['artist', 'style', 'genre'], # columns to store as classification fields
    max_samples=1000, # number of samples to load
    name='wikiart', # name of the dataset in FiftyOne
)

In [None]:
dataset

We can visualize the dataset in the FiftyOne App:

In [None]:
session = fo.launch_app(dataset)

We can list the names of the artists whose styles we will be analyzing:

In [None]:
artists = dataset.distinct('artist.label')
artists

## Find similar artwork

Multimodal embeddings let us find paintings that closely resemble a given text query, which could be a description of a painting or even a poem.

To generate multimodal embeddings for the images, we will use a pretrained CLIP Vision Transformer (ViT) model from HuggingFace Transformers. We will run `compute_similarity` from the [FiftyOne Brain](https://docs.voxel51.com/brain.html) to compute these embeddings and use them to generate a similarity index on the dataset.

In [None]:
# fiftyone brain
fob.compute_similarity(
    dataset,
    model='zero-shot-classification-transformer-torch', # model to load from model zoo
    name_or_path='openai/clip-vit-base-patch32', # repo_id of checkpoint
    embeddings='clip_embeddings', # name of the field to store embeddings
    brain_key='clip_sim', # key to store similarity index info
    batch_size=32, # batch_size for inference
)

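Alternatively, we can load the model directly from the HuggingFace Transformers library and pass it to `compute_similarity`. This cell is an alternative to the previous one, not an additional step, since both write to the same `clip_sim` brain key: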

In [None]:
from transformers import CLIPModel

model = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')

fob.compute_similarity(
    dataset,
    model=model,
    embeddings='clip_embeddings',
    brain_key='clip_sim'
)

Once the embeddings are generated, we can refresh the FiftyOne App, select the checkbox for an image in the sample grid, and click the photo icon to see the most similar images in the dataset.

Clicking this button triggers a query to the similarity index to find the most similar images to the selected image, based on the pre-computed embeddings, and displays them in the App.

This is useful for finding similar art pieces (to recommend to users or add to a collection) or getting inspiration for a new piece.

Because CLIP is multimodal, we can also use it to perform semantic searches, which means that we can search for images based on text queries. For example, we can search for "pastel trees" and see the images in the dataset that are most similar to that query.

Behind the scenes, the text is tokenized, embedded with CLIP’s text encoder, and then used to query the similarity index to find the most similar images in the dataset.
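To make the query step concrete, here is a minimal sketch of ranking images against a query vector. It assumes cosine similarity as the metric and uses made-up 3-dimensional vectors in place of real CLIP embeddings; the names `image_index`, `text_query`, and `rank_by_similarity` are illustrative and not part of FiftyOne.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, index, k=2):
    """Return the k keys of `index` whose vectors are most similar to `query`."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy "embeddings"; real CLIP vectors have hundreds of dimensions
image_index = {
    "pastel_orchard": [0.9, 0.1, 0.2],
    "stormy_sea": [0.1, 0.9, 0.3],
    "spring_garden": [0.8, 0.2, 0.1],
}
text_query = [1.0, 0.0, 0.1]  # stand-in for the embedded text "pastel trees"

top_matches = rank_by_similarity(text_query, image_index, k=2)
print(top_matches)
```

In FiftyOne itself, a comparable query can be issued from Python with the dataset's `sort_by_similarity` method, for example `dataset.sort_by_similarity("pastel trees", k=25, brain_key="clip_sim")`, where `k=25` is an arbitrary choice.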

## Uncover artistic motifs with clustering and visualization

By performing similarity and semantic searches, we can begin to interact with the data more effectively.

We can also apply unsupervised learning to help us identify artistic patterns in the WikiArt dataset, in two steps:
1. **Dimensionality reduction**: We will use UMAP to reduce the dimensionality of the embeddings to 2D and visualize the data in a scatter plot. This will allow us to see how the images cluster based on their style, genre, and artist.
2. **Clustering**: We will use K-Means clustering to cluster the images based on their embeddings and see what groups emerge.

In [None]:
fob.compute_visualization(
    dataset,
    embeddings='clip_embeddings',
    method='umap',
    brain_key='clip_vis'
)

Here we pass in the previously computed embeddings `'clip_embeddings'` and specify `method='umap'` to use UMAP for dimensionality reduction.

We can then open the embeddings panel in the FiftyOne App, where we will see one 2D point for each image in the dataset.

We can also run clustering on the embeddings to group similar images together. To cluster our data, we will need to download the FiftyOne Clustering Plugin:

In [None]:
!fiftyone plugins download https://github.com/jacobmarks/clustering-plugin

Once installed, we need to refresh the app again, and then we can access the clustering functionality via an operator in the app.
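For intuition about what K-Means does with the embeddings, here is a minimal sketch of the algorithm on toy 2D points. It is not the plugin's implementation: the `kmeans` helper, the fixed initial centroids, and the sample points are all assumptions made for illustration, and real runs would operate on the full-dimensional CLIP embeddings.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain K-Means (Lloyd's algorithm) with caller-supplied initial centroids."""
    centroids = [list(c) for c in initial_centroids]  # copy so inputs are untouched
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Two visually obvious groups of hypothetical 2D points
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Each image ends up with a cluster label, which is the kind of per-sample value that can then be used to color or filter the points in the App.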

## Identify the most unique works of art

Our image embeddings allow us to quantitatively assign each sample a uniqueness score based on how similar it is to other samples in the dataset. Explicitly, the FiftyOne Brain's `compute_uniqueness()` function looks at the distance between each sample's embedding and its nearest neighbors, and computes a score between 0 and 1 based on this distance. A score near 0 means the sample is nondescript or very similar to others, while a score near 1 means it stands apart from the rest of the dataset.

In [None]:
fob.compute_uniqueness(
    dataset,
    embeddings='clip_embeddings'
)
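The idea of a nearest-neighbor-based uniqueness score can be sketched in a few lines. The version below is only an illustration under stated assumptions: it averages the Euclidean distance to the `k` nearest neighbors and min-max scales the result to [0, 1]. FiftyOne's actual weighting and normalization may differ, and `uniqueness_scores` is a hypothetical helper, not a library function.

```python
import math

def uniqueness_scores(embeddings, k=2):
    """Mean distance to the k nearest neighbors, min-max scaled to [0, 1]."""
    raw = []
    for i, emb in enumerate(embeddings):
        # Distances from this sample to every other sample, nearest first
        dists = sorted(
            math.dist(emb, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        raw.append(sum(dists[:k]) / k)
    lo, hi = min(raw), max(raw)
    if hi == lo:  # all samples equally spaced: nothing stands out
        return [0.0] * len(raw)
    return [(r - lo) / (hi - lo) for r in raw]

# Three tightly packed toy points and one outlier
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
scores = uniqueness_scores(toy_embeddings, k=2)
print(scores)
```

The outlier receives the highest score and the point in the middle of the tight group receives the lowest, which matches the intuition that unique samples sit far from their neighbors in embedding space.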

We can then color the points in the embeddings panel by uniqueness, filter by uniqueness score, or sort by it to see the most unique images in the dataset.

In [None]:
most_unique_view = dataset.sort_by('uniqueness', reverse=True)
session.view = most_unique_view.view()  # Most unique images

In [None]:
least_unique_view = dataset.sort_by("uniqueness", reverse=False)
session.view = least_unique_view.view()  # Least unique images

We can also answer the question of which artist tends to produce the most unique works. We can compute the average uniqueness score for each artist across all of their works of art:

In [None]:
artist_unique_scores = {
    artist: dataset.match(F('artist.label') == artist).mean('uniqueness')
    for artist in artists
}

sorted_artists = sorted(
    artist_unique_scores,
    key=artist_unique_scores.get,
    reverse=True
)

for artist in sorted_artists:
    print(f"{artist}: {artist_unique_scores[artist]}")

In [None]:
kustodiev_view = dataset.match(F("artist.label") == "boris-kustodiev")
session.view = kustodiev_view.view()

## Characterize art with visual qualities

We will compute standard metrics like brightness, contrast, and saturation for each image and see how these metrics correlate with the artistic style and genre of the art pieces.

We will need to download the FiftyOne Image Quality Plugin:

In [None]:
!fiftyone plugins download https://github.com/jacobmarks/image-quality-issues/

Refresh the app and open the operators list again. This time, type `compute` and select one of the image quality operators.
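To give a sense of what such metrics measure, here is a minimal sketch that computes brightness, contrast, and saturation from raw RGB pixels using only the standard library. The definitions are assumptions chosen for illustration (mean luma, standard deviation of luma, and mean HSV saturation); the plugin may define these quantities differently, and `image_stats` is a hypothetical helper.

```python
import colorsys
import statistics

def image_stats(pixels):
    """Simple brightness, contrast, and saturation for RGB pixels in 0-255."""
    # Rec. 601 luma weights approximate perceived brightness
    luma = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    saturation = [
        colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels
    ]
    return {
        "brightness": statistics.fmean(luma) / 255,  # mean luma, scaled to [0, 1]
        "contrast": statistics.pstdev(luma) / 255,   # spread of luma values
        "saturation": statistics.fmean(saturation),  # mean HSV saturation
    }

# Hypothetical two-pixel "images": a muted gray patch and a vivid red patch
gray_patch = [(120, 120, 120), (130, 130, 130)]
red_patch = [(255, 0, 0), (200, 0, 0)]

gray_stats = image_stats(gray_patch)
red_stats = image_stats(red_patch)
print(gray_stats)
print(red_stats)
```

Once values like these are stored per image, they can be compared across styles and genres in the same way as the uniqueness scores.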