# Colorful Vectors Analysis

This notebook explores color representation in 3D vector space: each color is treated as a point (r, g, b), and colors are compared by the Euclidean distance between those points.
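To make the idea concrete, here is a minimal sketch of comparing colors as points in RGB space. It assumes Euclidean distance as the measure, which is what the notebook computes later with `np.linalg.norm`. The helper name `rgb_distance` is hypothetical, and the three colors are taken from outputs shown further down in this notebook.

```python
import math

def rgb_distance(a, b):
    """Euclidean distance between two (r, g, b) triples (hypothetical helper)."""
    return math.dist(a, b)

red = (255, 0, 0)
candy_apple_red = (255, 8, 0)
alice_blue = (240, 248, 255)

near = rgb_distance(red, candy_apple_red)  # differs only in the green channel
far = rgb_distance(red, alice_blue)        # differs strongly in green and blue

print(near)           # 8.0
print(round(far, 1))  # 356.0
```

A small distance means the two colors sit close together in the RGB cube, and a large one means they are far apart, whatever their names say.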

In [9]:
import pandas as pd
import numpy as np
import plotly.express as px
from functools import lru_cache
import ipywidgets as widgets
from IPython.display import display, clear_output

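# Constants
SIZES = [1, 20, 30]  # marker sizes: default point, matched color, query color
CACHE_SIZE = 32  # maximum number of query vectors kept by lru_cache
DEFAULT_COLOR = 'rgb(200,200,200)'  # fallback color for the "No matches" placeholder point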

## Data Loading and Preprocessing

First, let's load and prepare our color data.

In [2]:
def load_data():
    """Load and preprocess color data."""
    path = "https://frenzy86.s3.eu-west-2.amazonaws.com/python/data/colors.csv"
    df = pd.read_csv(path, names=['simple_name', 'name', 'hex', 'r', 'g', 'b'])
    df['rgb'] = 'rgb(' + df['r'].astype(str) + ',' + df['g'].astype(str) + ',' + df['b'].astype(str) + ')'
    df['category'] = df['simple_name'].str.split('_').str[-1]
    df['size'] = SIZES[0]
    return df

df = load_data()
df.head()

| | simple_name | name | hex | r | g | b | rgb | category | size |
|---|---|---|---|---|---|---|---|---|---|
| 0 | air_force_blue_raf | Air Force Blue (Raf) | #5d8aa8 | 93 | 138 | 168 | rgb(93,138,168) | raf | 1 |
| 1 | air_force_blue_usaf | Air Force Blue (Usaf) | #00308f | 0 | 48 | 143 | rgb(0,48,143) | usaf | 1 |
| 2 | air_superiority_blue | Air Superiority Blue | #72a0c1 | 114 | 160 | 193 | rgb(114,160,193) | blue | 1 |
| 3 | alabama_crimson | Alabama Crimson | #a32638 | 163 | 38 | 56 | rgb(163,38,56) | crimson | 1 |
| 4 | alice_blue | Alice Blue | #f0f8ff | 240 | 248 | 255 | rgb(240,248,255) | blue | 1 |
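The preprocessing can be tried without network access. The sketch below applies the same two derived columns as `load_data()` to three rows copied from the preview above. The `SAMPLE_CSV` string and the `sample` variable are assumptions made for illustration; the real data comes from the URL in `load_data()`.

```python
import io
import pandas as pd

# Three rows copied from the preview above (illustrative stand-in for the real file).
SAMPLE_CSV = """\
air_force_blue_raf,Air Force Blue (Raf),#5d8aa8,93,138,168
alabama_crimson,Alabama Crimson,#a32638,163,38,56
alice_blue,Alice Blue,#f0f8ff,240,248,255
"""

sample = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    names=['simple_name', 'name', 'hex', 'r', 'g', 'b'],
)

# Same derived columns as load_data(): a CSS-style color string and a
# category taken from the last underscore-separated token of the name.
sample['rgb'] = (
    'rgb(' + sample['r'].astype(str) + ',' + sample['g'].astype(str)
    + ',' + sample['b'].astype(str) + ')'
)
sample['category'] = sample['simple_name'].str.split('_').str[-1]

print(sample[['simple_name', 'rgb', 'category']])
```

Note that the last-token rule labels `air_force_blue_raf` as `raf` rather than `blue`, so the category is only a rough grouping.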


## Helper Functions

Let's define our utility functions for analysis.

In [3]:
def get_top_colors(df):
    """Get most common color categories."""
    return [c for c in df['category'].value_counts()[:15].index.tolist()
            if c in df.simple_name.values]

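@lru_cache(maxsize=CACHE_SIZE)
def calculate_distances(vector_tuple):
    """Return the Euclidean distance from every color in the global `df` to `vector_tuple`.

    The argument must be a hashable (r, g, b) tuple so lru_cache can key on it.
    Results are cached per query vector only, so call
    calculate_distances.cache_clear() if `df` is replaced with different data.
    """
    vector = np.array(vector_tuple, dtype=float)
    coords = df[['r', 'g', 'b']].to_numpy(dtype=float)
    return np.linalg.norm(coords - vector, axis=1)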

def build_chart(df_plot):
    """Create 3D scatter plot with optimized layout."""
    if len(df_plot) == 0:
        df_plot = pd.DataFrame({
            'r': [0], 'g': [0], 'b': [0],
            'simple_name': ['No matches'],
            'name': ['No matches'],
            'rgb': [DEFAULT_COLOR],
            'size': [1]
        })

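    fig = px.scatter_3d(
        df_plot,
        x='r', y='g', z='b',
        template='plotly_white',
        color='simple_name',
        # Map each name to its own RGB string so every point is drawn in its true color
        color_discrete_map=dict(zip(df_plot['simple_name'], df_plot['rgb'])),
        size='size',
        hover_data=['name']
    )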

    fig.update_layout(
        showlegend=False,
        margin=dict(l=5, r=5, t=20, b=5),
        scene=dict(
            xaxis_title="Red",
            yaxis_title="Green",
            zaxis_title="Blue",
            xaxis=dict(range=[0, 255]),
            yaxis=dict(range=[0, 255]),
            zaxis=dict(range=[0, 255])
        )
    )
    return fig
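The distance helper relies on NumPy broadcasting: subtracting a length-3 vector from an (n, 3) array subtracts it from every row, and `axis=1` reduces each row to a single number. The sketch below shows this on four colors from this notebook's output. The function name `distances_to` is hypothetical; unlike `calculate_distances`, it takes the coordinates as an argument instead of reading the global `df`.

```python
import numpy as np

def distances_to(coords, vector):
    """Distance from every row of `coords` (shape (n, 3)) to `vector` (shape (3,))."""
    coords = np.asarray(coords, dtype=float)
    vector = np.asarray(vector, dtype=float)
    # Broadcasting subtracts `vector` from each row; axis=1 reduces over r, g, b.
    return np.linalg.norm(coords - vector, axis=1)

coords = np.array([
    [255, 0, 0],    # red
    [255, 8, 0],    # candy_apple_red
    [255, 0, 56],   # carmine_red
    [124, 10, 2],   # barn_red
])

d = distances_to(coords, (255, 0, 0))
print(np.round(d, 1))  # approximately 0, 8, 56 and 131.4
```

Passing the coordinates explicitly is one way to avoid stale results: a function cached with `lru_cache` on the query vector alone keeps returning old distances if the underlying table changes, unless its `cache_clear()` method is called.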

## Initial Visualization

Let's visualize all colors in 3D space.

In [4]:
fig = build_chart(df)
fig.show()

## Color Search Comparison

Let's compare text-based search vs. vector-based search for colors.

In [5]:
# Get top colors for selection
top_colors = get_top_colors(df)
print("Available colors for analysis:", top_colors)

Available colors for analysis: ['blue', 'pink', 'red', 'yellow', 'rose', 'gray', 'magenta', 'violet']


In [6]:
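def compare_search_methods(query, thresh_sel):
    """Compare substring search on names with distance search in RGB space."""
    exact = df[df.simple_name == query]
    if exact.empty:
        raise ValueError(f"Unknown color name: {query!r}")
    match = exact.iloc[0]

    # Text-based search: literal, case-insensitive substring match on the name
    text_matches = df[df.simple_name.str.contains(query, case=False, regex=False)].copy()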
    if len(text_matches) > 0:
        text_matches['size'] = SIZES[1]
        text_matches.loc[text_matches.simple_name == query, 'size'] = SIZES[2]

    # Vector-based search
    distances = calculate_distances(tuple(match[['r', 'g', 'b']].values))
    vector_matches = df[distances < thresh_sel].copy()
    if len(vector_matches) > 0:
        vector_matches['size'] = SIZES[1]
        vector_matches.loc[vector_matches.simple_name == query, 'size'] = SIZES[2]

    # Display results
    fig1 = build_chart(text_matches)
    fig2 = build_chart(vector_matches)

    clear_output(wait=True)
    print(f"\nResults for color: {query} ({match.r}, {match.g}, {match.b})")
    print(f"Text-based matches: {len(text_matches)}")
    print(f"Vector-based matches: {len(vector_matches)}")
    display(fig1)
    display(fig2)

    return text_matches, vector_matches
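The difference between the two searches is easiest to see on a handful of rows. The sketch below repeats the core of `compare_search_methods` on six colors copied from this notebook's outputs, without plotting. The `mini` frame and the threshold of 120 are assumptions chosen for illustration; the notebook itself defaults to 80.

```python
import numpy as np
import pandas as pd

# Rows copied from outputs shown in this notebook.
mini = pd.DataFrame(
    [
        ('red', 255, 0, 0),
        ('candy_apple_red', 255, 8, 0),
        ('carmine_red', 255, 0, 56),
        ('barn_red', 124, 10, 2),
        ('alabama_crimson', 163, 38, 56),
        ('alice_blue', 240, 248, 255),
    ],
    columns=['simple_name', 'r', 'g', 'b'],
)

query = 'red'
threshold = 120  # illustrative value, not the notebook default of 80

# Text search: literal substring match on the name.
text_hits = mini[mini['simple_name'].str.contains(query, case=False, regex=False)]

# Vector search: Euclidean distance to the query color's RGB point.
target = mini.loc[mini['simple_name'] == query, ['r', 'g', 'b']].iloc[0].to_numpy(dtype=float)
dist = np.linalg.norm(mini[['r', 'g', 'b']].to_numpy(dtype=float) - target, axis=1)
vector_hits = mini[dist < threshold]

print(sorted(text_hits['simple_name']))
# ['barn_red', 'candy_apple_red', 'carmine_red', 'red']
print(sorted(vector_hits['simple_name']))
# ['alabama_crimson', 'candy_apple_red', 'carmine_red', 'red']
```

Here `barn_red` matches by name but lies about 131 units from red, so the vector search drops it, while `alabama_crimson` has no "red" in its name yet falls inside the radius at about 114 units.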

In [12]:
# Reload data and drop distances cached against the previous df
df = load_data()
calculate_distances.cache_clear()
DEFAULT_THRESHOLD = 80

# Create interactive widgets
threshold_slider = widgets.IntSlider(
                                    value=DEFAULT_THRESHOLD,
                                    min=20,
                                    max=100,
                                    step=1,
                                    description='Threshold:',
                                    style={'description_width': 'initial'}
                                    )
threshold_slider

IntSlider(value=80, description='Threshold:', min=20, style=SliderStyle(description_width='initial'))

In [13]:
# Example analysis with 'red'
def on_threshold_change(change):
    compare_search_methods('red', change.new)

threshold_slider.observe(on_threshold_change, names='value')
display(threshold_slider)

# Initial comparison
compare_search_methods('red', DEFAULT_THRESHOLD)


Results for color: red (255, 0, 0)
Text-based matches: 51
Vector-based matches: 37


(                simple_name                    name      hex    r    g    b  \
 42                 barn_red                Barn Red  #7c0a02  124   10    2   
 77    boston_university_red   Boston University Red     #c00  204    0    0   
 82                brick_red               Brick Red  #cb4154  203   65   84   
 113             cadmium_red             Cadmium Red  #e30022  227    0   34   
 123         candy_apple_red         Candy Apple Red  #ff0800  255    8    0   
 132             carmine_red             Carmine Red  #ff0038  255    0   56   
 150                  cg_red                  Cg Red  #e03c31  224   60   49   
 162             chinese_red             Chinese Red  #aa381e  170   56   30   
 181              copper_red              Copper Red  #cb6d51  203  109   81   
 186               coral_red               Coral Red  #ff4040  255   64   64   
 189             cornell_red             Cornell Red  #b31b1b  179   27   27   
 204    dark_candy_apple_red    Dark Can

## Analysis and Conclusions

The vector-based search demonstrates several advantages over text-based search:

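1. It finds colors that look similar even when their names share no text with the query
2. Results depend on measured RGB values rather than naming conventions, so a name that merely contains the query string is not a match unless the color is also close to the query
3. The distance threshold controls how strict the match is: for `red` at a threshold of 80, the vector search returned 37 colors, while the substring search returned 51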
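The effect of the threshold can be sketched by counting matches at several radii. The example uses four colors from this notebook's output and measures their distance from red (255, 0, 0). The list of thresholds is an assumption for illustration; the slider above only covers 20 to 100.

```python
import numpy as np

# candy_apple_red, boston_university_red, carmine_red, barn_red
coords = np.array([[255, 8, 0], [204, 0, 0], [255, 0, 56], [124, 10, 2]], dtype=float)
dist = np.linalg.norm(coords - np.array([255.0, 0.0, 0.0]), axis=1)

# Count matches at each threshold; a larger radius can only keep or add matches.
thresholds = [20, 60, 100, 140]
counts = [int((dist < t).sum()) for t in thresholds]

print(dict(zip(thresholds, counts)))  # {20: 1, 60: 3, 100: 3, 140: 4}
```

The counts never decrease as the radius grows, which is why a single slider is enough to trade precision for recall.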

This simple example illustrates why vector representations are powerful: similarity is computed from measurable properties, here the distance between RGB coordinates, rather than inferred from arbitrary labels.