<center>
  <h1>Skalu - Horizontal Line Detection</h1>
</center>

This notebook detects horizontal lines in images with OpenCV, using adaptive thresholding followed by morphological opening with a wide horizontal kernel.

## Instructions

1. Upload your image files using the file browser on the left sidebar
2. Run the cells below, in order, to process all images
3. View and download the results

**Note**: Images will be processed automatically and results will be displayed below.
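
The detection code in this notebook relies on morphological opening with a wide, one-pixel-high kernel (`cv2.morphologyEx` with `cv2.MORPH_OPEN`). As a rough illustration only, the pure-Python sketch below applies the same idea to a single binary row: erosion followed by dilation keeps runs of foreground pixels that are at least as wide as the kernel and removes shorter ones. The helper names `erode_row`, `dilate_row`, and `open_row` are hypothetical and are not part of OpenCV or of this notebook. Border handling is simplified (windows are clipped at the row ends), and an odd kernel width is assumed.

```python
def erode_row(row, k):
    """A pixel stays set only if every pixel in its k-wide window is set."""
    half = k // 2
    return [min(row[max(0, i - half): i + half + 1]) for i in range(len(row))]


def dilate_row(row, k):
    """A pixel becomes set if any pixel in its k-wide window is set."""
    half = k // 2
    return [max(row[max(0, i - half): i + half + 1]) for i in range(len(row))]


def open_row(row, k):
    """Morphological opening on one row: erosion, then dilation."""
    return dilate_row(erode_row(row, k), k)


# One row containing a 2-pixel run, a 5-pixel run, and a single pixel.
row = [0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0]
print(open_row(row, 3))
# -> [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

Only the 5-pixel run survives a kernel of width 3. The notebook applies the same principle in 2-D with a much wider kernel, so that text strokes are removed and long rules remain.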

In [None]:
#@title ## Setup Environment
# Install or update required packages
!pip install -q opencv-python numpy tqdm ipywidgets

# Create directory for output
!mkdir -p output

print("✅ Environment setup complete!")

In [None]:
# Skalu - Horizontal Line Detection Tool
import cv2
import os
import json
import glob
import numpy as np
from tqdm.notebook import tqdm
from IPython.display import display, HTML, Image
from google.colab import files

def detect_horizontal_lines(image, min_line_width_ratio=0.2, max_line_height=10):
    """
    Detects horizontal lines in an image.
    
    Args:
        image: The input image (BGR format)
        min_line_width_ratio: Minimum width ratio compared to image width (default: 0.2)
        max_line_height: Maximum height of a line in pixels (default: 10)
        
    Returns:
        A list of dictionaries with keys "x", "y", "width", and "height"
        (bounding box in pixels), sorted from top to bottom
    """
    height, width = image.shape[:2]
    
    # Convert to grayscale if needed
    if len(image.shape) == 3:
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    else:
        gray = image.copy()
    
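    # Apply adaptive thresholding for better results in varied lighting.
    # With THRESH_BINARY_INV, a positive C marks pixels darker than their local
    # weighted mean as foreground (white); flat background stays black.
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY_INV, 11, 2)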
    
    # Define horizontal kernel size based on image dimensions
    kernel_width = max(50, int(width * 0.05))
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_width, 1))
    
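    # Detect horizontal lines: opening removes foreground runs narrower than the
    # kernel (two iterations act like one pass with a kernel about twice as wide)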
    detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    
    # Find contours
    contours, _ = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    
    horizontal_lines = []
    min_width = int(min_line_width_ratio * width)
    
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w >= min_width and h <= max_line_height:
            horizontal_lines.append({
                "x": int(x),
                "y": int(y),
                "width": int(w),
                "height": int(h)
            })
    
    # Sort lines by vertical position
    horizontal_lines.sort(key=lambda line: line["y"])
    
    return horizontal_lines

def draw_detections(image, horizontal_lines):
    """
    Draw detected horizontal lines on a copy of the image.
    
    Args:
        image: The original image
        horizontal_lines: List of detected horizontal line data
        
    Returns:
        Image with visualized detections
    """
    debug_image = image.copy()
    
    # Draw horizontal lines in green
    for i, line in enumerate(horizontal_lines):
        x, y, w, h = line["x"], line["y"], line["width"], line["height"]
        cv2.rectangle(debug_image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        
        # Add line number label, clamped so it stays visible for lines near the top edge
        cv2.putText(debug_image, f"#{i+1}", (x, max(y - 5, 12)), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 0, 255), 1, cv2.LINE_AA)
    
    return debug_image

def create_download_button(file_path, button_text=None):
    """Return an HTML download button that embeds the file as a base64 data URI"""
    import base64
    
    if button_text is None:
        button_text = f"Download {os.path.basename(file_path)}"
    
    with open(file_path, 'rb') as f:
        data = f.read()
    b64 = base64.b64encode(data).decode()
    
    button_html = f'''
    <a href="data:application/octet-stream;base64,{b64}" download="{os.path.basename(file_path)}">
        <button style="font-size: 14px; padding: 5px 15px; background-color: #4CAF50; color: white; 
                 border: none; border-radius: 4px; cursor: pointer;">
            {button_text}
        </button>
    </a>
    '''
    
    return HTML(button_html)

def process_image(image_path, min_line_width_ratio=0.2, max_line_height=10):
    """Process a single image.

    Returns (result dict, original image, annotated image),
    or (None, None, None) if the image cannot be loaded.
    """
    # Load image
    image = cv2.imread(image_path)
    if image is None:
        print(f"Warning: Unable to load image at {image_path}")
        return None, None, None
    
    # Detect horizontal lines
    horizontal_lines = detect_horizontal_lines(
        image, 
        min_line_width_ratio=min_line_width_ratio,
        max_line_height=max_line_height
    )
    
    # Prepare result data
    result = {
        "image_info": {
            "path": image_path,
            "width": image.shape[1],
            "height": image.shape[0],
        },
        "detection_params": {
            "min_line_width_ratio": min_line_width_ratio,
            "max_line_height": max_line_height
        },
        "horizontal_lines": horizontal_lines,
        "line_count": len(horizontal_lines)
    }
    
    # Generate visualization
    debug_image = draw_detections(image, horizontal_lines)
    
    return result, image, debug_image

In [None]:
#@title ## Process Images
#@markdown Run this cell to process all images in the current directory

#@markdown ### Line Detection Parameters
min_line_width_ratio = 0.2 #@param {type:"slider", min:0.1, max:0.5, step:0.05}
#@markdown Minimum width of line as a ratio of image width

max_line_height = 10 #@param {type:"slider", min:1, max:20, step:1}
#@markdown Maximum height of a line in pixels

# Find all image files in current directory
supported_extensions = [".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"]
image_files = []

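for ext in supported_extensions:
    image_files.extend(glob.glob(f"*{ext}"))
    image_files.extend(glob.glob(f"*{ext.upper()}"))

# Remove duplicates (possible on case-insensitive filesystems)
image_files = sorted(set(image_files))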

# If no files found, prompt user to upload
if not image_files:
    print("No image files found. Please upload some images:")
    uploaded = files.upload()
    image_files = list(uploaded.keys())

if not image_files:
    print("❌ No images to process. Please upload images first.")
else:
    print(f"🔍 Found {len(image_files)} images to process")
    
    # Create output directory if it doesn't exist
    os.makedirs('output', exist_ok=True)
    
    # Process each image with progress bar
    all_results = {}
    
    for filename in tqdm(sorted(image_files), desc="Processing images"):
        # Process the image
        result, original, debug_image = process_image(
            filename,
            min_line_width_ratio=min_line_width_ratio,
            max_line_height=max_line_height
        )
        
        if result is None:
            continue
        
        # Save the result
        base_name = os.path.splitext(filename)[0]
        
        # Save detection visualization
        debug_image_path = f"output/{base_name}_detected.jpg"
        cv2.imwrite(debug_image_path, debug_image)
        
        # Add to results dictionary
        all_results[filename] = result
    
    # Add summary information
    all_results["_summary"] = {
        "total_images": len(all_results),  # Evaluated before the _summary key is added
        "detection_params": {
            "min_line_width_ratio": min_line_width_ratio,
            "max_line_height": max_line_height
        }
    }
    
    # Save combined JSON
    output_json_path = "output/structures.json"
    with open(output_json_path, "w", encoding="utf-8") as f:
        json.dump(all_results, f, indent=4, ensure_ascii=False)
    
    # Sum line counts over image entries only (the _summary entry has no line_count)
    total_lines = sum(
        entry["line_count"] for name, entry in all_results.items() if name != "_summary"
    )

    print(f"\n✅ Processing complete! Results saved to {output_json_path}")
    print(f"📊 Total images processed: {len(all_results) - 1}")
    print(f"📋 Total lines detected: {total_lines}")
    
    # Display download buttons
    display(HTML("<h3>Download Results</h3>"))
    display(create_download_button(output_json_path, "Download JSON Results"))
    
    # Display image previews
    display(HTML("<h3>Detection Previews</h3>"))
    
    # Show up to 5 images with results
    preview_count = min(5, len(image_files))
    for i, filename in enumerate(sorted(image_files)[:preview_count]):
        base_name = os.path.splitext(filename)[0]
        debug_image_path = f"output/{base_name}_detected.jpg"
        
        if os.path.exists(debug_image_path):
            display(HTML(f"<h4>Image {i+1}: {filename}</h4>"))
            display(Image(debug_image_path, width=600))
            display(create_download_button(debug_image_path, f"Download Detection for {filename}"))
    
    # If there are more images than previewed
    if len(image_files) > preview_count:
        display(HTML(f"<p>...and {len(image_files) - preview_count} more images processed. Check the output folder for all results.</p>"))
    
    # Create a zip file of all results for easy download
    !zip -q -r output/skalu_results.zip output/
    display(create_download_button("output/skalu_results.zip", "Download All Results (ZIP)"))

## How to Use the Results

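The `structures.json` file contains one entry per processed image, keyed by filename, plus a `_summary` entry. Each detected line is stored as a bounding box in pixels, where `x` and `y` are the top-left corner: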

```json
{
  "image_filename.jpg": {
    "image_info": {
      "path": "image_filename.jpg",
      "width": 1240,
      "height": 1754
    },
    "detection_params": {
      "min_line_width_ratio": 0.2,
      "max_line_height": 10
    },
    "horizontal_lines": [
      {
        "x": 120,
        "y": 350,
        "width": 1000,
        "height": 2
      },
      ...
    ],
    "line_count": 5
  },
  "_summary": {
    "total_images": 3,
    "detection_params": {
      "min_line_width_ratio": 0.2,
      "max_line_height": 10
    }
  }
}
```
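
As a minimal sketch of reading this structure, the snippet below walks the per-image entries and skips the `_summary` key. The helper `iter_image_results` and the inline sample are hypothetical and only mirror the layout shown above. In the notebook you would load `output/structures.json` with `json.load` instead.

```python
import json

# Inline sample mirroring the structure above (values are illustrative only).
SAMPLE_JSON = """
{
  "page1.jpg": {
    "image_info": {"path": "page1.jpg", "width": 1240, "height": 1754},
    "detection_params": {"min_line_width_ratio": 0.2, "max_line_height": 10},
    "horizontal_lines": [
      {"x": 120, "y": 350, "width": 1000, "height": 2},
      {"x": 120, "y": 900, "width": 1000, "height": 3}
    ],
    "line_count": 2
  },
  "_summary": {
    "total_images": 1,
    "detection_params": {"min_line_width_ratio": 0.2, "max_line_height": 10}
  }
}
"""


def iter_image_results(results):
    """Yield (filename, entry) pairs, skipping the "_summary" bookkeeping key."""
    for name, entry in results.items():
        if name == "_summary":
            continue
        yield name, entry


results = json.loads(SAMPLE_JSON)
for name, entry in iter_image_results(results):
    ys = [line["y"] for line in entry["horizontal_lines"]]
    print(f"{name}: {entry['line_count']} lines at y = {ys}")
```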

You can use this data for:
- Document structure analysis
- Form field detection
- Table structure extraction
- OCR pre-processing
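
For table structure extraction, one possible starting point is to treat the space between consecutive lines as a candidate row. The sketch below is an assumption about how you might do that, not part of Skalu: `row_bands` and its `min_gap` parameter are hypothetical names, and the input is a list shaped like the `horizontal_lines` array.

```python
def row_bands(horizontal_lines, min_gap=5):
    """Return (top, bottom) pixel ranges between consecutive detected lines.

    Gaps smaller than min_gap pixels are skipped; they usually come from
    a single thick rule that was detected as two nearby lines.
    """
    lines = sorted(horizontal_lines, key=lambda line: line["y"])
    bands = []
    for upper, lower in zip(lines, lines[1:]):
        top = upper["y"] + upper["height"]  # first pixel row below the upper line
        bottom = lower["y"]                 # first pixel row of the lower line
        if bottom - top >= min_gap:
            bands.append((top, bottom))
    return bands


lines = [
    {"x": 120, "y": 100, "width": 1000, "height": 2},
    {"x": 120, "y": 150, "width": 1000, "height": 2},
    {"x": 120, "y": 153, "width": 1000, "height": 2},
    {"x": 120, "y": 300, "width": 1000, "height": 3},
]
print(row_bands(lines))
# -> [(102, 150), (155, 300)]
```

Each band can then be cropped from the original image (rows `top` to `bottom`) and passed to an OCR step.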