# Example code for `clip_gaze`

A notebook that demonstrates the `gaze` function.

First we import the `clip_gaze` library. This in turn imports the machine learning libraries that will run CLIP for us.

In [1]:
import clip_gaze


## Preparing the image

We use `pillow` (imported in Python as `PIL`) to load and process images. `pillow` is not a listed dependency of the module, so you may need to install it with `pip`.

`pprint`, short for "pretty print", just formats Python objects neatly.

In [2]:
import PIL
import PIL.Image
import pprint

Having saved the 2,175 × 1,713 pixel version of ["Brücke über die Marne bei Creteil" by Cézanne](https://commons.wikimedia.org/wiki/File:Cezanne_bruecke-ueber-die-marne-bei-creteil.jpg) as "cezanne.jpg", we open the file and load it into `image`.

In [3]:

with open("cezanne.jpg", "rb") as file:
    image = PIL.Image.open(file)
    image.load()

## Running gaze

In [4]:
pprint.pprint(
    {
        "artist": clip_gaze.gaze(image, clip_gaze.ARTISTS_BY_TRAINING_PREVALENCE[:200]),
        "surface": clip_gaze.gaze(image, clip_gaze.SURFACES),
        "movement": clip_gaze.gaze(image, clip_gaze.MOVEMENTS)
    }
)

{'artist': ['by paul cézanne (82%)',
            'by clyfford still (07%)',
            'by arnold böcklin (04%)',
            'by franz kline (01%)',
            'by giorgio de chirico (01%)'],
 'movement': ['tonalism movement (16%)',
              'impressionism movement (09%)',
              'american scene painting movement (09%)',
              'modern european ink painting movement (09%)',
              'post-impressionism movement (09%)'],
 'surface': ['on canvas (86%)',
             'on paperboard (11%)',
             'on vellum (01%)',
             'on wood (01%)',
             'on card stock (00%)']}


As you can see, CLIP suggests that, of the options provided, the terms "by paul cézanne", "tonalism movement", and "on canvas" are the most likely to describe the input image.

## Adding more categories

The categories are just handy lists of many options (see the code itself for the lists and their sources).

In [5]:
all_categories = {
    "artist_by_name": clip_gaze.ARTISTS_BY_NAME,
    "artist_by_prevalence": clip_gaze.ARTISTS_BY_TRAINING_PREVALENCE,
    "movement": clip_gaze.MOVEMENTS,
    "painting_materials": clip_gaze.PAINTING_MATERIALS,
    "quality": clip_gaze.QUALITIES,
    "sculpture_materials": clip_gaze.SCULTPURE_MATERIALS,
    "site": clip_gaze.SITES,
    "surface": clip_gaze.SURFACES,
    "tool": clip_gaze.TOOLS
}

In [6]:
pprint.pprint({
    category: clip_gaze.gaze(image, all_categories[category], only_show_best=1) for category in all_categories
})  # see below


{'artist_by_name': ['andré derain (09%)'],
 'artist_by_prevalence': ['by paul cézanne (19%)'],
 'movement': ['tonalism movement (16%)'],
 'painting_materials': ['gouache medium (25%)'],
 'quality': ['good quality (46%)'],
 'sculpture_materials': ['sculpted from polychrome (20%)'],
 'site': ['opengameart (74%)'],
 'surface': ['on canvas (86%)'],
 'tool': ['using brush (43%)']}


Many of these prompts aren't appropriate for our input image; CLIP is simply trying to find the best options from the list provided, even if none of the options are actually very apt.

## Probability analysis

Why has the probability of it being by Cézanne gone down, from 82% to 19%? Because this tool shows *relative* probabilities: each percentage is a share of 100% divided among the prompts provided.

Of the first 200 artists (`clip_gaze.ARTISTS_BY_TRAINING_PREVALENCE[0:200]`, ordered by prevalence in the dataset) the tool ascribed Cézanne 82% and the other 199 artists *shared* the final 18%.

Of the full list of 500 or so artists (`clip_gaze.ARTISTS_BY_TRAINING_PREVALENCE`, ordered by prevalence in the dataset) the tool ascribed Cézanne 19% and the remaining artists shared the other 81%.

When given 6000 artists (`clip_gaze.ARTISTS_BY_NAME`) it ascribed Cézanne only 4%, and gave even its first choice (Derain) just 9%. Although the list contains both `cézanne` and `paul cézanne`, we should treat those as separate options (i.e. "by *a* Cézanne, but we don't know which" and "by Paul Cézanne in particular").

The probability is not the confidence that the artwork is by that artist.

The probability is **the confidence that the prompt is the correct one out of the prompts provided**.
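
To make the dilution effect concrete, here is a minimal sketch that does not use `clip_gaze` at all. It assumes the percentages come from a softmax over per-prompt similarity scores, which is the usual way CLIP scores are normalized; whether `clip_gaze` does exactly this internally is an assumption. The `softmax` helper and the score values `strong` and `weak` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into relative probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarity scores (not real CLIP output):
# one prompt that matches well, and many that match poorly.
strong = 5.0
weak = 1.0

few = softmax([strong] + [weak] * 9)     # 10 prompts in total
many = softmax([strong] + [weak] * 499)  # 500 prompts in total

# The strong prompt's own score never changes, but its share shrinks
# because many more weak prompts now compete for the same 100%.
print(f"best of 10 prompts:  {few[0]:.0%}")
print(f"best of 500 prompts: {many[0]:.0%}")
```

The best prompt stays the best prompt in both runs; only its share of the total changes. This is why percentages from lists of different lengths should not be compared directly.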