# Natural Language Photo Editing: Transforming Words into Visual Enhancements

## Problem Statement

Traditional photo editing interfaces rely on technical sliders, complex terminology, and specialized knowledge that can be intimidating for casual users. Even with simplified consumer apps, users must:

1. Understand what each control does (exposure, contrast, saturation, etc.)
2. Know which adjustments to make for specific visual goals
3. Make multiple trial-and-error attempts to achieve desired results
4. Remember which combinations of settings create specific looks

This technical barrier prevents many people from effectively enhancing their photos, leading to either unedited images or reliance on one-size-fits-all filters that don't address the specific needs of each photo.

## How Generative AI Solves This Problem

Combining natural language processing (NLP) with computer vision offers a different approach to photo editing. Instead of manipulating technical controls, users describe what they want in plain English:

- "Make the sunset colors more vibrant"
- "Add more contrast and warmth to the portrait"
- "Give this landscape a dramatic cinematic look"
- "Fix the lighting in this dark indoor photo"

The AI system:
1. **Interprets natural language** to understand the user's intent
2. **Maps descriptions to technical operations** using function calling
3. **Applies appropriate adjustments** to achieve the described effect
4. **Provides transparency** by showing which operations were performed
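To make steps 1 to 3 concrete, here is a deliberately simple rule-based sketch that maps keywords to operations. It is not how `NLProcessor` works (the notebook does not show its internals); a function-calling system would instead hand the instruction and the function descriptions to a language model. The keyword table and its default values are hypothetical.

```python
import re

# Hypothetical keyword table: word prefix -> (function name, argument name, default value)
KEYWORD_RULES = {
    "warm": ("adjust_temperature", "adjustment", 0.15),
    "cool": ("adjust_temperature", "adjustment", -0.15),
    "contrast": ("adjust_contrast", "multiplier", 1.2),
    "vibrant": ("adjust_saturation", "adjustment", 0.2),
    "bright": ("adjust_exposure", "amount", 0.15),
    "dark": ("adjust_exposure", "amount", -0.15),
    "sharp": ("adjust_sharpness", "strength", 0.4),
    "noise": ("reduce_noise", "strength", 0.5),
}

def map_instruction(instruction):
    """Return a list of {'name', 'args'} calls whose keywords appear in the instruction."""
    words = re.findall(r"[a-z]+", instruction.lower())
    calls = []
    for keyword, (name, arg, value) in KEYWORD_RULES.items():
        # Prefix match, so "warmer" matches "warm" and "brighten" matches "bright"
        if any(word.startswith(keyword) for word in words):
            calls.append({"name": name, "args": {arg: value}})
    return calls

print(map_instruction("Make the image warmer and increase the contrast slightly"))
```

The weakness of this approach shows up quickly: "Brighten the dark areas" triggers both the `bright` and `dark` rules, producing two contradictory exposure calls. Resolving that kind of ambiguity from context is what a language model brings to the mapping step.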

This notebook demonstrates how our Natural Language Photo Editor bridges the gap between human creative intent and technical execution, making photo editing accessible to users without technical training.


## Setup and Imports

Let's start by importing the necessary libraries and setting up our environment.


In [None]:
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import cv2

# Import our photo editing package
from photoedit_mvp import (
    load_image, 
    save_image
)

# Import the natural language processor
from photoedit_mvp.nl_processor import NLProcessor

# Set up matplotlib for displaying images
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)


## Helper Functions

Let's define some helper functions to display images and results.


In [None]:
def display_image(image, title=None):
    """Display an image with an optional title."""
    plt.figure(figsize=(10, 8))
    if isinstance(image, str):
        # If image is a file path, load it
        image = load_image(image)
    
    plt.imshow(image)
    plt.axis('off')
    if title:
        plt.title(title, fontsize=14)
    plt.show()

def display_before_after(before, after, titles=None):
    """Display before and after images side by side."""
    if titles is None:
        titles = ['Before', 'After']
    
    plt.figure(figsize=(20, 10))
    
    plt.subplot(1, 2, 1)
    if isinstance(before, str):
        before = load_image(before)
    plt.imshow(before)
    plt.axis('off')
    plt.title(titles[0], fontsize=14)
    
    plt.subplot(1, 2, 2)
    if isinstance(after, str):
        after = load_image(after)
    plt.imshow(after)
    plt.axis('off')
    plt.title(titles[1], fontsize=14)
    
    plt.tight_layout()
    plt.show()

def display_multiple(images, titles=None, cols=3):
    """Display multiple images in a grid."""
    n = len(images)
    rows = (n + cols - 1) // cols
    
    plt.figure(figsize=(5*cols, 5*rows))
    
    for i, image in enumerate(images):
        plt.subplot(rows, cols, i+1)
        if isinstance(image, str):
            image = load_image(image)
        plt.imshow(image)
        plt.axis('off')
        if titles and i < len(titles):
            plt.title(titles[i], fontsize=12)
    
    plt.tight_layout()
    plt.show()


## 1. Understanding Natural Language Processing for Photo Editing

Natural language processing for photo editing involves translating human descriptions into specific technical operations. This requires:

1. **Understanding intent**: Parsing what the user wants to achieve
2. **Parameter extraction**: Determining the magnitude and direction of adjustments
3. **Function mapping**: Selecting the appropriate editing operations
4. **Execution**: Applying the operations in the right sequence
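Step 2, parameter extraction, is where qualifiers such as "slightly" or "dramatically" become numbers. The sketch below assumes each parameter has a neutral value, a default step, and a valid range (the ranges follow the function docstrings in this notebook); the qualifier words and their scale factors are hypothetical choices, not values taken from `NLProcessor`.

```python
import re

# Hypothetical intensity qualifiers and how much each scales the default step
INTENSITY = {"a touch": 0.3, "slightly": 0.5, "a bit": 0.5,
             "much": 2.0, "very": 2.0, "dramatically": 2.5, "extremely": 5.0}

# (neutral value, default step, min, max) per parameter; ranges follow the docstrings
PARAM_SPECS = {
    "amount": (0.0, 0.15, -1.0, 1.0),      # adjust_exposure
    "multiplier": (1.0, 0.2, 0.5, 2.0),    # adjust_contrast
    "adjustment": (0.0, 0.15, -0.5, 0.5),  # adjust_temperature
}

def extract_value(instruction, param, direction=1):
    """Map an instruction and a direction (+1 increase, -1 decrease) to a clamped value."""
    neutral, step, lo, hi = PARAM_SPECS[param]
    text = instruction.lower()
    scale = 1.0
    for phrase, factor in INTENSITY.items():
        # Whole-word match, so "very" does not fire inside "every"
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            scale = factor
            break
    value = neutral + direction * step * scale
    # Clamp to the documented range so extreme wording cannot produce invalid arguments
    return max(lo, min(hi, value))

print(extract_value("Increase contrast slightly", "multiplier"))
```

Deciding the direction (warmer vs. cooler, more vs. less) is left to the caller here; in a full system that choice comes from step 1, intent understanding.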

Let's implement the image processing functions that our natural language processor will use:


In [None]:
def adjust_exposure(image, amount):
    """Adjust image exposure/brightness.
    
    Args:
        image: Input image
        amount: Adjustment amount (-1.0 to 1.0)
        
    Returns:
        Adjusted image
    """
    # Simple linear gain: scales every pixel value by (1 + amount), so -1.0 gives black
    result = image.astype(float) * (1 + amount)
    return np.clip(result, 0, 255).astype(np.uint8)

def adjust_contrast(image, multiplier):
    """Adjust image contrast.
    
    Args:
        image: Input image
        multiplier: Contrast multiplier (0.5 to 2.0)
        
    Returns:
        Adjusted image
    """
    # Stretch each channel around its own mean; mean has shape (3,) and broadcasts over H x W
    mean = np.mean(image, axis=(0, 1))
    result = (image.astype(float) - mean) * multiplier + mean
    return np.clip(result, 0, 255).astype(np.uint8)

def adjust_saturation(image, adjustment):
    """Adjust image saturation/vibrance.
    
    Args:
        image: Input image
        adjustment: Saturation adjustment (-1.0 to 1.0)
        
    Returns:
        Adjusted image
    """
    # Convert to HSV, adjust S channel, convert back to RGB
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(float)
    hsv[:,:,1] = hsv[:,:,1] * (1 + adjustment)
    hsv[:,:,1] = np.clip(hsv[:,:,1], 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

def adjust_temperature(image, adjustment):
    """Adjust image color temperature (warmth/coolness).
    
    Args:
        image: Input image
        adjustment: Temperature adjustment (-0.5 to 0.5)
        
    Returns:
        Adjusted image
    """
    # Simple implementation - increase red for warmth, blue for coolness
    result = image.copy().astype(float)
    if adjustment > 0:  # Warm
        result[:,:,0] = np.clip(result[:,:,0] * (1 + adjustment), 0, 255)  # Red
        result[:,:,2] = np.clip(result[:,:,2] * (1 - adjustment/2), 0, 255)  # Blue
    else:  # Cool
        result[:,:,2] = np.clip(result[:,:,2] * (1 - adjustment), 0, 255)  # Blue
        result[:,:,0] = np.clip(result[:,:,0] * (1 + adjustment/2), 0, 255)  # Red
    return result.astype(np.uint8)

def adjust_sharpness(image, strength):
    """Adjust image sharpness.
    
    Args:
        image: Input image
        strength: Sharpness strength (0.0 to 1.0)
        
    Returns:
        Adjusted image
    """
    # Simple implementation using unsharp masking
    blur = cv2.GaussianBlur(image, (0, 0), 3)
    result = image.copy().astype(float)
    result = result + strength * (image.astype(float) - blur)
    return np.clip(result, 0, 255).astype(np.uint8)

def reduce_noise(image, strength):
    """Reduce noise in the image.
    
    Args:
        image: Input image
        strength: Noise reduction strength (0.0 to 1.0)
        
    Returns:
        Adjusted image
    """
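    # Bilateral filter: smooths flat regions while preserving edges.
    # A strength of 0 (or less) means no noise reduction, so return an unchanged copy.
    if strength <= 0:
        return image.copy()
    # Larger neighborhoods and sigmas give stronger smoothing (d > 5 is noticeably slower)
    d = int(5 + strength * 10)  # Diameter of each pixel neighborhood
    sigma_color = 50 + strength * 100  # Filter sigma in the color space
    sigma_space = 50 + strength * 100  # Filter sigma in the coordinate space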
    
    return cv2.bilateralFilter(image, d, sigma_color, sigma_space)
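`adjust_exposure` applies a linear gain of `1 + amount` directly to the gamma-encoded pixel values. Photographers usually think of exposure in stops, where +1 stop doubles the captured light. A closer approximation decodes the sRGB gamma, multiplies by `2 ** stops` in linear light, and re-encodes, using the standard sRGB transfer functions from IEC 61966-2-1. This is a sketch assuming uint8 RGB input like the functions above; `adjust_exposure_stops` is a hypothetical name, not part of `photoedit_mvp`.

```python
import numpy as np

def srgb_to_linear(x):
    """Standard sRGB decoding for values in [0, 1]."""
    return np.where(x <= 0.04045, x / 12.92, ((x + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(x):
    """Standard sRGB encoding for values in [0, 1]."""
    return np.where(x <= 0.0031308, x * 12.92, 1.055 * np.power(x, 1 / 2.4) - 0.055)

def adjust_exposure_stops(image, stops):
    """Scale exposure by 2**stops in linear light (uint8 RGB in, uint8 RGB out)."""
    linear = srgb_to_linear(image.astype(np.float64) / 255.0)
    # Brightening past full white clips, just as an overexposed sensor would
    linear = np.clip(linear * (2.0 ** stops), 0.0, 1.0)
    return np.round(linear_to_srgb(linear) * 255.0).astype(np.uint8)
```

Working in linear light brightens shadows and midtones more naturally than a raw multiplier on encoded values, which tends to wash out highlights first.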


## 2. Setting Up the Natural Language Processor

Now let's set up our natural language processor and register the image processing functions we defined above. This will allow the processor to map natural language instructions to specific operations.
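The notebook does not show how `NLProcessor.register_function` stores what it receives. One plausible design, sketched here purely as an assumption, keeps a dictionary of callables and converts each entry's description and parameter dict into a JSON-Schema "tool" definition of the kind function-calling language-model APIs accept. `ToolRegistry` and its methods are hypothetical, and a placeholder lambda stands in for a real editing function so the cell runs on its own.

```python
import json

class ToolRegistry:
    """Hypothetical stand-in for NLProcessor's function registry."""

    def __init__(self):
        self._tools = {}

    def register_function(self, name, func, description, parameters):
        # Same argument order as the nl_processor.register_function calls in this notebook
        self._tools[name] = {"func": func, "description": description, "parameters": parameters}

    def tool_specs(self):
        """Build JSON-Schema tool definitions; every parameter is marked required."""
        return [
            {
                "type": "function",
                "function": {
                    "name": name,
                    "description": entry["description"],
                    "parameters": {
                        "type": "object",
                        "properties": entry["parameters"],
                        "required": list(entry["parameters"]),
                    },
                },
            }
            for name, entry in self._tools.items()
        ]

    def call(self, image, name, args):
        """Dispatch a model-chosen call to the registered Python function."""
        return self._tools[name]["func"](image, **args)

registry = ToolRegistry()
registry.register_function(
    "adjust_contrast", lambda img, multiplier: img, "Adjust the contrast of the image",
    {"multiplier": {"type": "number", "description": "Contrast multiplier (0.5 to 2.0)"}},
)
print(json.dumps(registry.tool_specs(), indent=2))
```

The model only ever sees the names, descriptions, and parameter schemas, which is why clear descriptions with explicit ranges matter as much as the functions themselves.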


In [None]:
# Initialize the natural language processor
nl_processor = NLProcessor()

# Register our image processing functions
nl_processor.register_function(
    "adjust_exposure",
    adjust_exposure,
    "Adjust the brightness/exposure of the image",
    {"amount": {"type": "number", "description": "Amount to adjust exposure (-1.0 to 1.0)"}}
)

nl_processor.register_function(
    "adjust_contrast",
    adjust_contrast,
    "Adjust the contrast of the image",
    {"multiplier": {"type": "number", "description": "Contrast multiplier (0.5 to 2.0)"}}
)

nl_processor.register_function(
    "adjust_saturation",
    adjust_saturation,
    "Adjust the color saturation/vibrance of the image",
    {"adjustment": {"type": "number", "description": "Saturation adjustment (-1.0 to 1.0)"}}
)

nl_processor.register_function(
    "adjust_temperature",
    adjust_temperature,
    "Adjust the color temperature (warmth/coolness) of the image",
    {"adjustment": {"type": "number", "description": "Temperature adjustment (-0.5 to 0.5)"}}
)

nl_processor.register_function(
    "adjust_sharpness",
    adjust_sharpness,
    "Adjust the sharpness/clarity of the image",
    {"strength": {"type": "number", "description": "Sharpness strength (0.0 to 1.0)"}}
)

nl_processor.register_function(
    "reduce_noise",
    reduce_noise,
    "Reduce noise/grain in the image",
    {"strength": {"type": "number", "description": "Noise reduction strength (0.0 to 1.0)"}}
)
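The parameter descriptions state valid ranges, but nothing in the registration itself enforces them, so a model could return something like `multiplier=5`. Whether `NLProcessor` already validates arguments is not shown; a small guard such as the hypothetical `clamp_args` below could clamp values before dispatch. The ranges are copied from the descriptions registered above.

```python
# Valid ranges copied from the parameter descriptions registered above
PARAM_RANGES = {
    "adjust_exposure": {"amount": (-1.0, 1.0)},
    "adjust_contrast": {"multiplier": (0.5, 2.0)},
    "adjust_saturation": {"adjustment": (-1.0, 1.0)},
    "adjust_temperature": {"adjustment": (-0.5, 0.5)},
    "adjust_sharpness": {"strength": (0.0, 1.0)},
    "reduce_noise": {"strength": (0.0, 1.0)},
}

def clamp_args(name, args):
    """Return a copy of args with each known numeric parameter clamped to its range."""
    ranges = PARAM_RANGES.get(name, {})
    clamped = {}
    for key, value in args.items():
        if key in ranges:
            lo, hi = ranges[key]
            # float() also accepts numeric strings, which some model outputs contain
            value = min(hi, max(lo, float(value)))
        clamped[key] = value
    return clamped

print(clamp_args("adjust_contrast", {"multiplier": 5}))
```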


## 3. Processing Natural Language Instructions

Let's load a test image and try processing some natural language instructions.


In [None]:
# Load a test image
test_image_path = '../test_images/test_image.jpg'
image = load_image(test_image_path)

# Display the original image
display_image(image, "Original Image")
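The path `../test_images/test_image.jpg` assumes the repository's directory layout. If the file is missing, a synthetic image lets the remaining cells run. This fallback assumes `load_image` returns a uint8 RGB NumPy array (the notebook's functions treat it that way, but it is not confirmed); `make_synthetic_image` is a hypothetical helper.

```python
import os
import numpy as np

def make_synthetic_image(height=256, width=384, seed=0):
    """Create a uint8 RGB test image: color gradients plus mild Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, width)
    y = np.linspace(0.0, 1.0, height)[:, None]
    red = np.tile(x, (height, 1))          # brightens left to right
    green = np.repeat(y, width, axis=1)    # brightens top to bottom
    blue = 1.0 - red * 0.5                 # fades slightly left to right
    base = np.stack([red, green, blue], axis=-1) * 255.0
    noisy = base + rng.normal(0.0, 8.0, base.shape)  # noise gives reduce_noise something to do
    return np.clip(noisy, 0, 255).astype(np.uint8)

test_image_path = '../test_images/test_image.jpg'
if not os.path.exists(test_image_path):
    print(f"{test_image_path} not found; using a synthetic image instead")
    image = make_synthetic_image()
```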


Now let's try some natural language instructions and see how the system interprets and applies them.


In [None]:
# Define a function to process and display results
def process_instruction(image, instruction):
    """Process a natural language instruction and display results."""
    print(f"Instruction: '{instruction}'")
    
    # Process the instruction
    processed_image, metadata = nl_processor.process(image, instruction)
    
    # Display the functions that were called
    print("\nFunctions called:")
    for func_call in metadata['functions_called']:
        print(f"- {func_call['name']}({', '.join([f'{k}={v}' for k, v in func_call['args'].items()])})")
    
    # Display before and after
    display_before_after(image, processed_image, ["Original Image", f"After: '{instruction}'"])
    
    return processed_image

# Try a simple instruction
result1 = process_instruction(image, "Make the image warmer and increase the contrast slightly")
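The call-printing expression in `process_instruction` reappears in several later cells. A small helper keeps the formatting in one place. It assumes, as the code above implies, that each entry in `metadata['functions_called']` is a dict with `name` and `args` keys; `format_call` and `print_calls` are hypothetical names, and the metadata in the example line is made up.

```python
def format_call(func_call):
    """Render {'name': ..., 'args': {...}} as 'name(k=v, ...)'."""
    args = ", ".join(f"{k}={v}" for k, v in func_call["args"].items())
    return f"{func_call['name']}({args})"

def print_calls(metadata, header="Functions called:"):
    """Print every function call recorded in an NLProcessor metadata dict."""
    print(header)
    for func_call in metadata.get("functions_called", []):
        print(f"- {format_call(func_call)}")

# Made-up metadata, only to show the output format
print_calls({"functions_called": [{"name": "adjust_contrast", "args": {"multiplier": 1.1}}]})
```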


Let's try some more complex instructions to see how the system handles them.


In [None]:
# Try more complex instructions
instructions = [
    "Make the colors more vibrant and add some warmth",
    "Increase contrast dramatically and make it cooler",
    "Brighten the dark areas and add clarity",
    "Give it a soft, dreamy look with reduced contrast",
    "Sharpen the details and make colors pop"
]

results = []

for instruction in instructions:
    print(f"\n{'='*50}\n")
    result = process_instruction(image, instruction)
    results.append(result)
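Instructions like "Make the colors more vibrant and add some warmth" contain several requests. A function-calling model can return several calls at once, and the notebook does not show whether `NLProcessor` splits instructions first. A rule-based system might start by splitting on conjunctions; the hypothetical `split_clauses` below does exactly that.

```python
import re

# Split on commas and on the whole words "and", "then", "but"
CLAUSE_SPLIT = re.compile(r",|\b(?:and|then|but)\b", flags=re.IGNORECASE)

def split_clauses(instruction):
    """Split a compound editing instruction into trimmed, non-empty clauses."""
    return [part.strip() for part in CLAUSE_SPLIT.split(instruction) if part.strip()]

for text in ["Make the colors more vibrant and add some warmth",
             "Give it a soft, dreamy look with reduced contrast"]:
    print(split_clauses(text))
```

The second example shows the limit of this approach: the comma in "a soft, dreamy look" separates adjectives, not requests, yet it is split into two fragments. Similarly, "maintain contrast" in "Reduce saturation slightly but maintain contrast" is a constraint rather than an edit. Cases like these are why interpretation is better left to a model than to hand-written rules.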


## 4. Understanding How Instructions Are Parsed

Let's take a closer look at how the natural language processor interprets different types of instructions. This will help us understand the relationship between language and editing operations.


In [None]:
# Define some instruction categories
instruction_categories = {
    "Brightness/Exposure": [
        "Brighten the image",
        "Make the image darker",
        "Increase exposure slightly",
        "Fix the underexposed areas"
    ],
    "Color Temperature": [
        "Make it warmer",
        "Add a cool blue tone",
        "Give it a warmer feel",
        "Cool down the highlights"
    ],
    "Contrast": [
        "Increase contrast",
        "Make it more dramatic with higher contrast",
        "Reduce contrast for a softer look",
        "Add just a touch more contrast"
    ],
    "Saturation/Vibrance": [
        "Make colors more vibrant",
        "Increase saturation",
        "Tone down the colors",
        "Make it slightly less saturated"
    ],
    "Clarity/Sharpness": [
        "Sharpen the details",
        "Add more clarity",
        "Make it slightly softer",
        "Enhance the fine details"
    ],
    "Noise Reduction": [
        "Reduce the noise",
        "Remove grain",
        "Smooth out the noisy areas",
        "Clean up the image"
    ],
    "Combined Effects": [
        "Make it warmer and increase contrast",
        "Brighten and add clarity",
        "Cool it down and make colors pop",
        "Give it a vintage look with warm tones and reduced contrast"
    ]
}

# Let's analyze how each category of instructions is interpreted
for category, instructions_list in instruction_categories.items():
    print(f"\n{'='*80}\n{category} Instructions\n{'='*80}")
    
    for instruction in instructions_list:
        print(f"\nInstruction: '{instruction}'")
        
        # Run the full pipeline; the edited image is discarded and only the metadata is inspected
        _, metadata = nl_processor.process(image, instruction)
        
        # Display the functions the processor chose
        print("Functions called:")
        for func_call in metadata['functions_called']:
            print(f"- {func_call['name']}({', '.join([f'{k}={v}' for k, v in func_call['args'].items()])})")
        
        print("-" * 50)
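Reading the printed calls category by category is easier with a summary. The hypothetical helper below counts how often each function appears across a list of metadata dicts; to use it, append each `metadata` from the loop above to a per-category list (the loop as written discards them). The example input is made up to show the shape of the data and is not output from `NLProcessor`.

```python
from collections import Counter

def summarize_calls(metadata_list):
    """Count how often each function name appears across NLProcessor metadata dicts."""
    counts = Counter()
    for metadata in metadata_list:
        counts.update(call["name"] for call in metadata.get("functions_called", []))
    return counts

# Made-up metadata in the shape used above, for illustration only
example = [
    {"functions_called": [{"name": "adjust_temperature", "args": {"adjustment": 0.15}}]},
    {"functions_called": [{"name": "adjust_temperature", "args": {"adjustment": 0.2}},
                          {"name": "adjust_contrast", "args": {"multiplier": 1.2}}]},
]
print(summarize_calls(example).most_common())
```

A category in which the expected function rarely appears (for example, "Color Temperature" instructions that never trigger `adjust_temperature`) points directly at instructions the processor misreads.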


## 5. Innovative Use Case: Guided Photo Editing Through Conversation

One useful application of natural language photo editing is guiding users through an iterative editing process, similar to working with a professional photo editor. Let's demonstrate this conversational editing workflow:


In [None]:
def conversational_editing_workflow(image):
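    """Demonstrate a conversational editing workflow.

    The "AI:" replies below are scripted narration for illustration; the
    operations actually performed are printed from the returned metadata
    and may differ from what the narration describes.
    """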
    print("=== Conversational Photo Editing Workflow ===\n")
    print("Starting with the original image:")
    display_image(image, "Original Image")
    
    # Step 1: Initial assessment and basic enhancement
    print("\nStep 1: Initial assessment and basic enhancement")
    print("User: \"Enhance this photo to make it look better overall\"")
    
    current_image, metadata = nl_processor.process(image, "Enhance this photo to make it look better overall")
    
    print("\nAI: \"I've made some basic enhancements. I've slightly increased the exposure, added a bit of contrast, and made the colors more vibrant. Here's the result:\"")
    print("\nOperations performed:")
    for func_call in metadata['functions_called']:
        print(f"- {func_call['name']}({', '.join([f'{k}={v}' for k, v in func_call['args'].items()])})")
    
    display_image(current_image, "After Basic Enhancement")
    
    # Step 2: Specific adjustment based on user feedback
    print("\nStep 2: Specific adjustment based on user feedback")
    print("User: \"It looks better, but I'd like it to be a bit warmer and more dramatic\"")
    
    previous_image = current_image.copy()
    current_image, metadata = nl_processor.process(current_image, "Make it warmer and more dramatic")
    
    print("\nAI: \"I've added warmth by adjusting the color temperature and increased the contrast for a more dramatic look. Here's the updated image:\"")
    print("\nOperations performed:")
    for func_call in metadata['functions_called']:
        print(f"- {func_call['name']}({', '.join([f'{k}={v}' for k, v in func_call['args'].items()])})")
    
    display_before_after(previous_image, current_image, ["After Basic Enhancement", "Warmer and More Dramatic"])
    
    # Step 3: Fine-tuning
    print("\nStep 3: Fine-tuning")
    print("User: \"That's closer to what I want, but now the colors are a bit too intense. Can you tone down the saturation slightly but keep the contrast?\"")
    
    previous_image = current_image.copy()
    current_image, metadata = nl_processor.process(current_image, "Reduce saturation slightly but maintain contrast")
    
    print("\nAI: \"I've reduced the color saturation while maintaining the contrast levels. Here's the result:\"")
    print("\nOperations performed:")
    for func_call in metadata['functions_called']:
        print(f"- {func_call['name']}({', '.join([f'{k}={v}' for k, v in func_call['args'].items()])})")
    
    display_before_after(previous_image, current_image, ["Warmer and More Dramatic", "Fine-tuned"])
    
    # Step 4: Final touches
    print("\nStep 4: Final touches")
    print("User: \"That's looking good! As a final touch, can you sharpen it a bit to bring out the details?\"")
    
    previous_image = current_image.copy()
    current_image, metadata = nl_processor.process(current_image, "Sharpen to bring out details")
    
    print("\nAI: \"I've applied sharpening to enhance the details. Here's your final image:\"")
    print("\nOperations performed:")
    for func_call in metadata['functions_called']:
        print(f"- {func_call['name']}({', '.join([f'{k}={v}' for k, v in func_call['args'].items()])})")
    
    display_before_after(previous_image, current_image, ["Fine-tuned", "Final Image"])
    
    # Show the complete transformation
    print("\nComplete Transformation:")
    display_before_after(image, current_image, ["Original Image", "Final Edited Image"])
    
    return current_image

# Run the conversational editing workflow
final_image = conversational_editing_workflow(image)
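The workflow above keeps only `previous_image`, so a user cannot step back more than one edit or review the history. The hypothetical `EditSession` below wraps any `process(image, instruction) -> (image, metadata)` callable, matching how `nl_processor.process` is used in this notebook, and keeps a stack of (instruction, image) pairs for undo. A stand-in processor replaces `NLProcessor` so the cell runs on its own.

```python
import numpy as np

class EditSession:
    """Hypothetical conversational editing session with an undo history."""

    def __init__(self, image, process):
        self.process = process                 # e.g. nl_processor.process
        self.history = [("original", image)]   # (instruction, image) pairs

    @property
    def current(self):
        return self.history[-1][1]

    def apply(self, instruction):
        """Run one instruction on the current image and record the result."""
        new_image, metadata = self.process(self.current, instruction)
        self.history.append((instruction, new_image))
        return new_image, metadata

    def undo(self):
        """Drop the latest edit; the original image is never removed."""
        if len(self.history) > 1:
            self.history.pop()
        return self.current

    def steps(self):
        """Instructions applied so far, oldest first."""
        return [instruction for instruction, _ in self.history[1:]]

def fake_process(img, instruction):
    """Stand-in for nl_processor.process: brightens every pixel by 10 levels."""
    return np.clip(img.astype(int) + 10, 0, 255).astype(np.uint8), {"functions_called": []}

session = EditSession(np.zeros((2, 2, 3), dtype=np.uint8), fake_process)
session.apply("Brighten a bit")
session.apply("Brighten again")
session.undo()
print(session.steps(), session.current.max())
```

Storing full images is the simplest design but costs memory on large photos; storing only the instructions and replaying them from the original is a lighter alternative when the processor is deterministic.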


## 6. Comparing Natural Language Editing to Traditional Methods

Let's apply the same five adjustments twice, once as explicit slider-style operations with hand-picked values and once as plain-English instructions, and compare the results.


In [None]:
def compare_approaches():
    """Compare traditional editing vs. natural language editing."""
    print("=== Traditional Editing vs. Natural Language Editing ===\n")
    
    # Define a set of traditional editing steps
    traditional_steps = [
        {"operation": "Adjust exposure", "value": 0.2, "function": adjust_exposure, "args": {"amount": 0.2}},
        {"operation": "Increase contrast", "value": 1.2, "function": adjust_contrast, "args": {"multiplier": 1.2}},
        {"operation": "Add warmth", "value": 0.15, "function": adjust_temperature, "args": {"adjustment": 0.15}},
        {"operation": "Increase saturation", "value": 0.1, "function": adjust_saturation, "args": {"adjustment": 0.1}},
        {"operation": "Sharpen", "value": 0.3, "function": adjust_sharpness, "args": {"strength": 0.3}}
    ]
    
    # Define equivalent natural language instructions
    nl_instructions = [
        "Brighten the image slightly",
        "Add a bit more contrast",
        "Make it slightly warmer",
        "Make the colors a bit more vibrant",
        "Sharpen the details"
    ]
    
    # Apply traditional editing steps
    traditional_result = image.copy()
    print("Traditional Editing Approach:")
    print("1. User needs to know which technical adjustments to make")
    print("2. User must understand what each control does")
    print("3. User must determine appropriate values for each adjustment")
    print("\nTraditional Editing Steps:")
    
    for step in traditional_steps:
        print(f"- {step['operation']}: {step['value']}")
        traditional_result = step["function"](traditional_result, **step["args"])
    
    # Apply natural language editing
    nl_result = image.copy()
    print("\nNatural Language Editing Approach:")
    print("1. User simply describes the desired changes in plain English")
    print("2. AI interprets instructions and applies appropriate adjustments")
    print("3. User can refine results with additional natural language feedback")
    print("\nNatural Language Instructions:")
    
    for instruction in nl_instructions:
        print(f"- \"{instruction}\"")
        nl_result, _ = nl_processor.process(nl_result, instruction)
    
    # Display the results side by side
    print("\nComparison of Results:")
    display_multiple([image, traditional_result, nl_result], 
                    ["Original Image", "Traditional Editing Result", "Natural Language Editing Result"])
    
    print("\nKey Advantages of Natural Language Editing:")
    print("1. Accessibility: No technical knowledge required")
    print("2. Efficiency: No need to find the right control or tune numeric values by hand")
    print("3. Intuitiveness: Edit using familiar language instead of technical controls")
    print("4. Flexibility: Can handle complex combined operations with a single instruction")
    print("5. Learnability: Easier for beginners to get started with photo editing")

# Run the comparison
compare_approaches()
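The grid above supports only a visual comparison. A numeric check of how close the two results are can complement it: the hypothetical `image_difference` helper below reports the mean absolute pixel difference and the PSNR, computed as 10·log10(255² / MSE). To use it, have `compare_approaches` return `traditional_result, nl_result` (it currently returns nothing).

```python
import numpy as np

def image_difference(a, b):
    """Return (mean absolute difference, PSNR in dB) between two uint8 images of equal shape."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = a.astype(np.float64) - b.astype(np.float64)
    mad = float(np.mean(np.abs(diff)))
    mse = float(np.mean(diff ** 2))
    # Identical images have zero error, so PSNR is infinite
    psnr = float("inf") if mse == 0 else float(10.0 * np.log10(255.0 ** 2 / mse))
    return mad, psnr
```

Both numbers measure similarity, not quality: a small difference shows that the instructions reproduced the hand-picked adjustments, not that either result looks better.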


## Conclusion

Natural language photo editing represents a significant advancement in making creative tools more accessible and intuitive. By bridging the gap between human intent and technical execution, this approach:

1. **Democratizes photo editing** by removing technical barriers
2. **Accelerates the editing workflow** by removing the need to find and tune individual controls
3. **Makes editing more intuitive** by allowing users to express their creative vision directly
4. **Provides educational value** by showing which technical operations correspond to natural language descriptions
5. **Enables iterative refinement** through conversational interaction

Natural language interfaces that focus on what users want to achieve rather than how to achieve it are a promising direction for creative software. This shift from technical controls to intent-based editing should continue as AI models become more capable of understanding both language and visual aesthetics.

As demonstrated in this notebook, combining natural language processing with standard image-processing operations lets users get the results they describe without knowing which controls produce them, regardless of their technical expertise.