# Image Search and Color Clustering Based on the Unsplash API

This project combines the Unsplash API with the K-Means clustering algorithm in an interactive web application that lets users filter pattern and texture images by color. The Unsplash API permits a maximum of 50 requests per hour, so the application fetches results in pages on demand. Frontend interaction and backend image processing are combined so that users can intuitively browse images that match their color criteria.

## Project Execution

### Activate the Environment
```bash
conda activate aim
```

### Run the Application
```bash
streamlit run week2-APIs/App_SplashPatterns.py
```



## 1. Dependencies and Project Structure

The project relies on the following key libraries:   

- **Streamlit**: For rapid development of interactive web applications.   
- **requests**: To handle HTTP requests for interacting with the Unsplash API.
- **numpy, cv2, sklearn.cluster (KMeans), skimage**: For image processing and clustering analysis.
- **PIL**: For reading images and basic image processing.

```python
import streamlit as st
import requests
import numpy as np
import cv2
from sklearn.cluster import KMeans
from skimage import color
from io import BytesIO
from PIL import Image
```


## 2. Custom Frontend Styling and Main Title 

To enhance the page’s visual appeal and improve data presentation clarity, custom CSS was incorporated into Streamlit. Key optimizations include:   

- **Sidebar**: Added background color and shadow effects for better structural clarity.    

- **Main Title**: Center-aligned, bold, enlarged font, and increased letter spacing for emphasis.    

- **Buttons and Dropdowns**: Adjusted spacing and styling for smoother interactions.   

```python
st.markdown("""
    <style>
    /* Add a background color and subtle shadow to the left sidebar to enhance layering */
    [data-testid="stSidebar"] {
        background-color: #f8f9fa;
        box-shadow: 2px 0 5px rgba(0, 0, 0, 0.1);
    }

    /* Main title style: larger font size, increased letter spacing, and reduced bottom margin */
    .main-title {
        font-size: 3rem;
        letter-spacing: 2px;
        font-weight: 700;
        margin-bottom: 0.5rem;
        text-align: center;
    }

    /* Sidebar title style */
    [data-testid="stSidebar"] h2 {
        font-size: 1.25rem;
        font-weight: 700;
        margin-bottom: 0.5rem;
        letter-spacing: 1px;
    }

    /* Default spacing between dropdown and button */
    .stSelectbox, .stButton {
        margin-bottom: 1rem;
    }

    /* Optimize button appearance and size */
    .stButton > button {
        font-size: 16px;
        padding: 0.5rem 1.5rem;
        border-radius: 4px;
        background-color: #cccccc;
        color: #333;
        border: none;
        transition: background-color 0.3s ease;
    }
    .stButton > button:hover {
        background-color: #bbbbbb;
    }
    </style>
""", unsafe_allow_html=True)

st.markdown("<h1 class='main-title'>SplashPatterns</h1>", unsafe_allow_html=True)
```

**Output:**

![Example image](image/custom_css.png)

When you visit **localhost:8501** in your browser, the page clearly demonstrates the enhancements: the main title is centered and bold, the sidebar uses a light gray background with added shadow for depth, and the buttons change color on hover for improved interactivity.

## 3. Unsplash API Integration and Data Retrieval

A function is created to search for images using the Unsplash API. This function fetches the search results for a single page and returns the URLs of the images in their 'regular' size. It also supports a **color** parameter, enabling users to filter images based on color using Unsplash’s built-in color filtering feature.

```python
import os

# Read the access key from the environment instead of hard-coding it in the source
UNSPLASH_ACCESS_KEY = os.environ.get("UNSPLASH_ACCESS_KEY", "")
UNSPLASH_SEARCH_URL = "https://api.unsplash.com/search/photos"

def search_unsplash_single_page(query, per_page=24, page=1, color=None):
    """
    Searches for images on Unsplash for a single page and returns a list of 'regular' image URLs.
    If a color parameter is provided, Unsplash's built-in color filter is applied.
    """
    headers = {"Authorization": f"Client-ID {UNSPLASH_ACCESS_KEY}"}
    params = {
        "query": query,
        "page": page,
        "per_page": per_page
    }
    if color:
        params["color"] = color

    try:
        # A timeout keeps the app from hanging if the API does not respond
        response = requests.get(UNSPLASH_SEARCH_URL, headers=headers, params=params, timeout=10)
        if response.status_code != 200:
            st.warning(f"Unsplash API returned an error: {response.status_code}")
            return []
        data = response.json()
    except Exception as e:
        st.error(f"Error requesting the Unsplash API: {e}")
        return []

    # Collect the 'regular'-size URL of every result that has one
    image_urls = []
    for item in data.get("results", []):
        url = item.get("urls", {}).get("regular")
        if url:
            image_urls.append(url)
    return image_urls

print("search_unsplash_single_page function defined.")
```
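Because the hourly quota is small, it can help to check how many requests remain after each call. The sketch below assumes the API reports the remaining quota in an `X-Ratelimit-Remaining` response header, as described in Unsplash's documentation; `remaining_requests` is a hypothetical helper and is not part of the original application.

```python
def remaining_requests(headers):
    """Return the remaining hourly quota from response headers, or None if unavailable.

    Assumption: the quota is reported in an `X-Ratelimit-Remaining` header.
    """
    # HTTP header names are case-insensitive, so normalise the keys before the lookup
    lowered = {str(key).lower(): value for key, value in headers.items()}
    value = lowered.get("x-ratelimit-remaining")
    try:
        return int(value)
    except (TypeError, ValueError):
        # Header missing or not a number
        return None


print(remaining_requests({"X-Ratelimit-Limit": "50", "X-Ratelimit-Remaining": "47"}))  # 47
```

Inside `search_unsplash_single_page`, such a helper could be called with `response.headers` to show a warning when the quota runs low.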


## 4. Sidebar Input and Session State

A color selection dropdown is added to the sidebar, allowing users to filter images by color based on Unsplash’s built-in options. Simultaneously, Streamlit's session state is initialized to store image data and the current page number.

![Example image](image/search_white.png)

> Note: In actual testing, I found that even when setting **color=white**, the returned images still include blue, multicolored, and even black content. This may be because Unsplash’s image tagging relies on human or algorithmic recognition, which can introduce certain inaccuracies.

```python
# st.sidebar.header("Tone Selector")
# Users can only choose a color and whether to apply K-Means for secondary filtering.
colors = [
    "None", "black_and_white", "black", "white", "yellow", "orange",
    "red", "purple", "magenta", "green", "teal", "blue"
]
selected_color = st.sidebar.selectbox("Select a color (Unsplash filter)", colors)

# Initialize Session State
if "images" not in st.session_state:
    st.session_state["images"] = []
if "page" not in st.session_state:
    st.session_state["page"] = 1

print("Sidebar inputs and session state initialized.")
```


## 5. Image Loading Function and "Show Patterns" Button Logic

A **load_images** function is defined to fetch image URLs from the Unsplash API based on the current page stored in the session state. The search keyword is fixed to "pattern", and the function supports pagination. When the "Show Patterns" button in the sidebar is clicked, the session state is reset and the function loads the first three pages (up to 72 images at 24 per page).

```python
def load_images(color=None):
    """
    Uses a fixed search keyword "pattern" and retrieves images based on st.session_state["page"],
    appending the results to st.session_state["images"].
    """
    query = "pattern"
    new_urls = search_unsplash_single_page(
        query,
        per_page=24,
        page=st.session_state["page"],
        color=color
    )
    st.session_state["images"].extend(new_urls)

if st.sidebar.button("Show Patterns"):
    # Reset session state for a new search
    st.session_state["page"] = 1
    st.session_state["images"] = []

    c = None if selected_color == "None" else selected_color
    # Load data from the first 3 pages
    for page_idx in range(1, 4):
        st.session_state["page"] = page_idx
        load_images(color=c)
```



## 6. K-Means Clustering for Secondary Color Filtering

After the images have been loaded, if the user checks the "Refine with K-Means" box and selects a color other than the default "None", the system uses the K-Means algorithm to further filter and sort images by their dominant color. The code defines a mapping from color names to RGB values, a function for converting RGB to the Lab color space, and a function for calculating Euclidean distance. Distances are measured in Lab rather than RGB because Lab is designed to be approximately perceptually uniform. Together, these pieces identify the images whose dominant colors best match the user's selected color.

> In the provided example, the initial number of clusters is set to **k=3**. In subsequent experiments, values such as **k=1** and **k=8** were tested to observe the effect of different parameters on the filtering results. Although adjusting **k** theoretically allows the algorithm to capture more color information, focusing solely on the largest cluster's center may not fully leverage this information.

```python
refine_color = st.sidebar.checkbox("Refine with K-Means")

if st.session_state["images"]:
    if refine_color and selected_color != "None":
        # Color mapping (RGB values are in the range 0-1)
        color_map = {
            "black": (0, 0, 0),
            "white": (1, 1, 1),
            "yellow": (1, 1, 0),
            "orange": (1, 0.65, 0),
            "red": (1, 0, 0),
            "purple": (0.5, 0, 0.5),
            "magenta": (1, 0, 1),
            "green": (0, 1, 0),
            "teal": (0, 0.5, 0.5),
            "blue": (0, 0, 1),
            "black_and_white": (0.5, 0.5, 0.5)
        }
        target_rgb = color_map.get(selected_color, (0.5, 0.5, 0.5))

        def rgb_to_lab(rgb):
            """Convert (r, g, b) in [0, 1] to the Lab color space."""
            # Cast to float first: an all-integer tuple such as (1, 1, 1) would
            # otherwise be treated as an integer image and rescaled towards black.
            arr = np.array(rgb, dtype=np.float64).reshape(1, 1, 3)
            lab = color.rgb2lab(arr)
            return lab[0, 0, :]

        target_lab = rgb_to_lab(target_rgb)

        def get_dominant_color(img, k=3):
            """
            Uses K-Means to extract the dominant color from an image (choosing the cluster center with the most pixels).
            Returns (r, g, b) with values in the range 0-1.
            """
            # Downscale to 100x100 to keep clustering fast, then flatten to a list of pixels
            img = np.array(img.resize((100, 100)))
            img = img.reshape(-1, 3).astype(np.float32) / 255.0
            # n_init is set explicitly so behavior does not depend on the scikit-learn version
            km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(img)
            labels, counts = np.unique(km.labels_, return_counts=True)
            major_cluster = labels[np.argmax(counts)]
            dominant_rgb = km.cluster_centers_[major_cluster]
            return dominant_rgb

        def color_distance(c1, c2):
            """Calculates the Euclidean distance between two Lab colors."""
            return np.sqrt(np.sum((c1 - c2) ** 2))

        refined_data = []
        for url in st.session_state["images"]:
            try:
                resp = requests.get(url, timeout=5)
                if resp.status_code == 200:
                    img_pil = Image.open(BytesIO(resp.content)).convert("RGB")
                    dom_rgb = get_dominant_color(img_pil, k=3)
                    dist = color_distance(rgb_to_lab(dom_rgb), target_lab)
                    refined_data.append((url, dist))
            except Exception:
                # Skip this image if downloading or processing fails
                continue

        # Sort images by increasing color distance and keep only the top N images
        refined_data.sort(key=lambda x: x[1])
        top_n = 200
        refined_data = refined_data[:top_n]
        st.session_state["images"] = [x[0] for x in refined_data]
```
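To make the Lab conversion and distance step more concrete, here is a dependency-free sketch of the same idea. It is a standalone re-implementation of the standard sRGB to CIE Lab formulas under an assumed D65 white point, not the code that `skimage.color.rgb2lab` runs internally, and the names `srgb_to_lab` and `lab_distance` are hypothetical.

```python
import math

# Reference white for the D65 illuminant (assumption: the same white point is wanted here)
D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_lab(rgb):
    """Convert an sRGB triple with components in [0, 1] to CIE Lab."""
    def to_linear(c):
        # Undo the sRGB gamma encoding
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(float(c)) for c in rgb)

    # Linear RGB -> XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / n) for v, n in zip((x, y, z), D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_distance(lab1, lab2):
    """Euclidean distance between two Lab colors."""
    return math.dist(lab1, lab2)


orange = srgb_to_lab((1, 0.65, 0))
yellow = srgb_to_lab((1, 1, 0))
blue = srgb_to_lab((0, 0, 1))

# Orange should sit much closer to yellow than to blue
print(lab_distance(orange, yellow) < lab_distance(orange, blue))  # True
```

This is the property the refinement step relies on: a smaller Lab distance means the dominant color is closer to the selected color.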


### Example Comparison

![Example Comparison](image/comparison_k.png)

Experimental results indicate that while K-Means can analyze the dominant colors in an image, the Unsplash API’s built-in color filtering is already quite accurate. Consequently, additional clustering-based refinements do not significantly enhance the overall image display. Moreover, although increasing the number of clusters (e.g., k=8) can theoretically capture more color information, simply taking the center of the largest cluster may not fully utilize that additional data.
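One possible way to use all clusters instead of only the largest one is to weight each cluster's distance by its share of pixels. The sketch below illustrates that idea under assumptions of my own: `weighted_palette_distance` is a hypothetical function that was not part of the experiments described here, and the cluster centers are assumed to be already converted to Lab.

```python
import math


def weighted_palette_distance(clusters, target_lab):
    """Pixel-weighted mean Lab distance between a palette and a target color.

    clusters: list of (lab_center, pixel_count) pairs, one per K-Means cluster.
    """
    total = sum(count for _, count in clusters)
    if total == 0:
        raise ValueError("clusters must contain at least one pixel")
    weighted = sum(count * math.dist(center, target_lab) for center, count in clusters)
    return weighted / total


# Two clusters: 60% of the pixels match the target exactly, 40% are 10 units away
clusters = [((50.0, 0.0, 0.0), 60), ((50.0, 10.0, 0.0), 40)]
target = (50.0, 0.0, 0.0)

largest_center, _ = max(clusters, key=lambda pair: pair[1])
print(math.dist(largest_center, target))            # 0.0 (largest cluster only)
print(weighted_palette_distance(clusters, target))  # 4.0 (all clusters)
```

In the application, the centers would come from `km.cluster_centers_` (converted to Lab) and the counts from `np.unique(km.labels_, return_counts=True)`. Whether this scoring improves the results would need to be tested.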

## 7. Displaying Images and the "Load More" Button

Using Streamlit’s **st.columns()** layout, images are displayed in a grid with four columns. When the "Load More" button is clicked, the page number increments by 1 and additional images are loaded.

```python
num_cols = 4
cols = st.columns(num_cols)
# Distribute images across the columns in round-robin order
for idx, url in enumerate(st.session_state["images"]):
    with cols[idx % num_cols]:
        st.image(url, use_container_width=True)

if st.button("Load More"):
    st.session_state["page"] += 1
    c = None if selected_color == "None" else selected_color
    load_images(color=c)
    # Rerun so the newly appended images are drawn in the grid right away
    st.rerun()
elif not st.session_state["images"]:
    # Show the hint whenever nothing has been loaded yet
    st.write("No images to display. Please click 'Show Patterns' on the sidebar.")
```
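The `idx % num_cols` expression is what spreads images evenly across the grid. The following standalone sketch shows the resulting assignment without Streamlit; `assign_to_columns` is a hypothetical helper used only for illustration.

```python
def assign_to_columns(items, num_cols=4):
    """Distribute items across columns in round-robin order."""
    columns = [[] for _ in range(num_cols)]
    for idx, item in enumerate(items):
        # Same rule as in the app: item idx goes to column idx % num_cols
        columns[idx % num_cols].append(item)
    return columns


urls = [f"img{i}" for i in range(1, 7)]
print(assign_to_columns(urls))
# [['img1', 'img5'], ['img2', 'img6'], ['img3'], ['img4']]
```

Each column fills from top to bottom, so the fifth image appears directly below the first one.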


## 8. Understanding the Data and Statistics

This project has provided a clearer understanding of the limitations of color clustering as an unsupervised learning method. Despite multiple parameter adjustments, the accuracy of extracting a dominant color from complex images remains limited. This observation aligns with the viewpoints expressed by Gugelmann Galaxy (http://www.mathiasbernhard.ch/gugelmann/) and FeatureInsight (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/09/FeatureInsight.VAST2015.pdf): even advanced machine recognition techniques can struggle to accurately classify the varied painting techniques and mixed color palettes present in artworks.

Additionally, I discovered a new insight during this experiment. Sometimes, websites do not insist on achieving absolute color accuracy by rigidly enforcing exact color matching. Instead, they incorporate a certain degree of tolerance—meaning that even if a color isn’t perfectly matched, images with hues that are close enough will still be displayed to users. This approach helps users explore a broader range of potential images.
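The tolerance idea can be expressed as a distance threshold rather than an exact match. The sketch below is a minimal illustration: `within_tolerance` is a hypothetical helper, and the threshold of 30 Lab units is an arbitrary value chosen for the example, not one measured in this project.

```python
def within_tolerance(scored_images, max_distance):
    """Keep images whose color distance is within the tolerance, closest first.

    scored_images: list of (url, distance) pairs, e.g. Lab distances to the target color.
    """
    kept = [(url, dist) for url, dist in scored_images if dist <= max_distance]
    kept.sort(key=lambda pair: pair[1])
    return [url for url, _ in kept]


scored = [("a.jpg", 12.0), ("b.jpg", 55.0), ("c.jpg", 3.5)]
print(within_tolerance(scored, max_distance=30))  # ['c.jpg', 'a.jpg']
```

A larger threshold shows more loosely matching images, which favors exploration over strict accuracy.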

## 9. Ethical Considerations and LLM Disclaimer

Throughout the development of this project, strict adherence to Unsplash’s terms of use was maintained, ensuring respect for copyright and compliance with API rate limits. Data processing and storage practices were implemented with a focus on privacy and compliance, avoiding excessive data scraping or misuse.

It is also noted that the project’s code was generated with the assistance of a language model, with subsequent modifications made through specific instructions. 