# Tutorial 1: Multi-threaded Image Downloading

This notebook demonstrates how to use `landlens_db`'s multi-threaded downloading capabilities to efficiently download Mapillary images.

## Prerequisites

You'll need the same prerequisites as the previous tutorial, including:
- A Mapillary API token
- A `.env` file with `MLY_TOKEN` and `DOWNLOAD_DIR` defined
- `landlens_db` installed

## Setup

First, let's import the necessary modules and load environment variables:

```python
from landlens_db.handlers.cloud import Mapillary
from landlens_db.geoclasses.geoimageframe import GeoImageFrame
from dotenv import load_dotenv
import os
import time

# Read MLY_TOKEN and DOWNLOAD_DIR from the .env file
load_dotenv()

MLY_TOKEN = os.environ.get("MLY_TOKEN")
DOWNLOAD_DIR = os.environ.get("DOWNLOAD_DIR")

# Create download directory if it doesn't exist
os.makedirs(DOWNLOAD_DIR, exist_ok=True)
```
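
`os.environ.get` returns `None` when a variable is missing, and `os.makedirs(None, ...)` then fails with a `TypeError` that says nothing about the `.env` file. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper written for this tutorial, not part of `landlens_db` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for this tutorial; not a landlens_db function.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file"
        )
    return value
```

After `load_dotenv()`, you could write `MLY_TOKEN = require_env("MLY_TOKEN")` and `DOWNLOAD_DIR = require_env("DOWNLOAD_DIR")` so a missing entry is reported by name.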

## Fetching Images with Multi-threading

The `download_images_to_local` method supports multi-threaded downloads through its `max_workers` parameter. Let's fetch a small sample of images first:

```python
# Initialize Mapillary connection
importer = Mapillary(MLY_TOKEN)

# Small area in Shibuya: [min_lon, min_lat, max_lon, max_lat]
bbox = [139.699, 35.658, 139.7, 35.659]
fields = ['id', 'captured_at', 'compass_angle', 'thumb_1024_url', 'geometry']

# Fetch image metadata within the bounding box
gdf = importer.fetch_within_bbox(
    bbox,
    fields=fields,
    max_images=50  # Limit to 50 images for this tutorial
)

# Convert to GeoImageFrame
images = GeoImageFrame(gdf)

print(f"Found {len(images)} images")
print("\nSample of names that will be used for filenames:")
print(images['name'].head())
```

### Single-threaded vs Multi-threaded Download

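Let's compare download speeds with different numbers of worker threads. In testing, multi-threading provided large speedups:

- Single thread: ~1 image/second (baseline)
- 5 workers: ~42 images/second (38x faster)
- 10 workers: ~74 images/second (67x faster)

Speedups larger than the worker count cannot come from concurrency alone, so part of the gap probably reflects run conditions (for example, connections that are already warm in later runs). Treat these figures as indicative and measure on your own network.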

```python
# Test with different numbers of workers
for workers in [1, 5, 10]:
    # perf_counter is a monotonic clock intended for measuring elapsed time
    start_time = time.perf_counter()

    # Download images using name column for filenames
    local_images = images.download_images_to_local(
        DOWNLOAD_DIR,
        filename_column='name',  # This uses the mly|{id} format
        max_workers=workers
    )

    duration = time.perf_counter() - start_time
    print(f"\nResults with {workers} worker{'s' if workers > 1 else ''}:")
    print(f"Time taken: {duration:.2f} seconds")
    print(f"Speed: {len(images) / duration:.2f} images/second")

    # Show sample of downloaded files
    print("\nSample of downloaded files:")
    for file in sorted(os.listdir(DOWNLOAD_DIR))[:3]:
        print(f"- {file}")
```
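
The comparison above writes every run into the same directory, so later runs may see files left by earlier ones. One way to keep runs independent is to give each worker count its own fresh directory. The sketch below is a generic timing helper under that assumption; `benchmark_workers` and its `download` callback are hypothetical names, not part of `landlens_db`.

```python
import tempfile
import time
from typing import Callable, Dict, Iterable


def benchmark_workers(
    download: Callable[[str, int], object],
    worker_counts: Iterable[int],
) -> Dict[int, float]:
    """Time download(target_dir, workers) once per worker count.

    Each run gets a new temporary directory, which is deleted afterwards,
    so no run can benefit from files written by a previous one.
    """
    timings: Dict[int, float] = {}
    for workers in worker_counts:
        with tempfile.TemporaryDirectory() as target_dir:
            start = time.perf_counter()
            download(target_dir, workers)
            timings[workers] = time.perf_counter() - start
    return timings


# Possible use with the objects from this notebook (not executed here):
# timings = benchmark_workers(
#     lambda d, w: images.download_images_to_local(
#         d, filename_column='name', max_workers=w
#     ),
#     [1, 5, 10],
# )
```

Because the temporary directories are removed after each run, this is only suitable for measuring speed, not for keeping the images.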

### Important Notes About Multi-threaded Downloads

1. **Number of Workers**:
   - The default is 10 workers, a reasonable starting point for most cases
   - In the tests above, going from 5 to 10 workers raised throughput from ~42 to ~74 images/second (about 1.8x)
   - 10 workers were about 67x faster than a single thread

2. **Error Handling**:
   - Built-in retry mechanism (3 attempts per image)
   - 1-second delay between retries
   - Failed downloads are logged but don't stop the process

3. **Progress Tracking**:
   - Progress bar shows overall download progress
   - Shows current download speed
   - Indicates any failed downloads

4. **Filename Handling**:
   - Uses the `name` column for filenames (format: `mly|{id}`)
   - The `|` is written as `_` on disk, giving clean filenames like `mly_123456789.jpg`
   - Avoids the problems of using long URLs as filenames
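
The retry and failure behavior described in the notes above can be illustrated with the standard library. This is a minimal sketch of the general pattern (thread pool, 3 attempts, 1-second delay, failures logged without stopping the batch), not the actual `landlens_db` implementation; `fetch_with_retry`, `download_all`, and the `fetch` callback are hypothetical names.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def fetch_with_retry(fetch: Callable[[str], bytes], url: str,
                     attempts: int = 3, delay: float = 1.0) -> bytes:
    """Call fetch(url), retrying on any exception with a fixed delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            logger.warning("Attempt %d/%d failed for %s: %s",
                           attempt, attempts, url, exc)
            if attempt == attempts:
                raise  # out of attempts: let the caller record the failure
            time.sleep(delay)
    raise AssertionError("unreachable")


def download_all(fetch: Callable[[str], bytes], urls: Iterable[str],
                 max_workers: int = 10, attempts: int = 3,
                 delay: float = 1.0) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch all URLs concurrently; return (results by URL, failed URLs)."""
    results: Dict[str, bytes] = {}
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fetch_with_retry, fetch, url, attempts, delay): url
            for url in urls
        }
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                failed.append(url)  # already logged; the batch keeps going
    return results, failed
```

Here `fetch` is any function that takes a URL and returns bytes, for example a thin wrapper around `urllib.request.urlopen(url).read()`. Passing it in as a parameter keeps the retry logic easy to test without network access.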

### Example Usage

```python
# Default usage (10 workers)
local_images = images.download_images_to_local(
    DOWNLOAD_DIR,
    filename_column='name'  # Use name column for proper filenames
)

# With custom settings
local_images = images.download_images_to_local(
    DOWNLOAD_DIR,
    filename_column='name',
    max_workers=15  # Adjust workers based on your system
)
```

### Best Practices

1. **Choose Worker Count Wisely**:
   - Start with the default of 10 workers
   - Change the count only after measuring; extra threads stop helping once your bandwidth is saturated
   - Consider your network bandwidth and system resources

2. **Manage Download Directory**:
   - Use a dedicated directory for downloads
   - Directory is created automatically if it doesn't exist
   - Files are named using Mapillary ID format (mly_123456789.jpg)

3. **Monitor Performance**:
   - Watch the progress bar for download speed
   - Check for failed downloads in the output
   - Adjust workers if needed based on performance
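
To check for failed downloads after the fact, you can compare the names in the frame with the files on disk. The sketch below assumes the naming convention described earlier (`mly|{id}` saved as `mly_{id}.jpg`); both helper functions are hypothetical and the `.jpg` extension is an assumption, so adjust them if your files are named differently.

```python
import os
from typing import Iterable, List


def expected_filename(name: str, extension: str = ".jpg") -> str:
    """Map a name like 'mly|123' to 'mly_123.jpg' (assumed convention)."""
    return name.replace("|", "_") + extension


def missing_downloads(names: Iterable[str], directory: str) -> List[str]:
    """Return the names whose expected file is absent from directory."""
    present = set(os.listdir(directory))
    return [name for name in names if expected_filename(name) not in present]
```

With the objects from this notebook, `missing_downloads(images['name'], DOWNLOAD_DIR)` would list the names that still need to be fetched.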

## Example: Processing Larger Areas

When working with larger areas, use the `max_images` parameter to cap the number of downloads:

```python
# Larger area in Shibuya: [min_lon, min_lat, max_lon, max_lat]
large_bbox = [139.69, 35.65, 139.71, 35.67]

# Limit to 100 images
gdf = importer.fetch_within_bbox(
    large_bbox,
    fields=fields,
    max_images=100  # Limit number of images
)

# Convert to GeoImageFrame
images = GeoImageFrame(gdf)

print(f"Found {len(images)} images")

# Download with the default worker count
local_images = images.download_images_to_local(
    DOWNLOAD_DIR,
    filename_column='name',  # Use proper filename format
    max_workers=10  # Fastest setting in the tests above
)
```