# Temporal Analysis

This notebook analyzes the temporal patterns of image downloads:
- Download timeline and patterns
- Rate analysis and efficiency metrics
- Time-based distributions
- Performance optimization insights

In [1]:
# Import required modules
import sys
import os

# Add the parent directory to the import path so the local
# `visualizations` package can be found
sys.path.append('..')

from visualizations.data_loader import (
    load_all_metadata,
    create_image_details_dataframe,
    get_temporal_stats
)

from visualizations.plotters import (
    plot_temporal_analysis
)

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from datetime import datetime, timedelta

print("✅ Modules imported successfully")

✅ Modules imported successfully


In [2]:
# Load data
metadata_list = load_all_metadata()
image_df = create_image_details_dataframe(metadata_list)

# Filter to images with valid timestamps
temporal_df = image_df[
    (image_df['has_download_data'] == True) & 
    (image_df['download_timestamp'].notna())
].copy()

print(f"📊 Total images: {len(image_df)}")
print(f"⏱️ Images with timestamps: {len(temporal_df)}")

if len(temporal_df) == 0:
    print("❌ No temporal data available for analysis")
else:
    print("✅ Ready for temporal analysis")

Found 100 metadata files
✅ Successfully loaded 100 metadata files
📊 Total images: 49212
⏱️ Images with timestamps: 49008
✅ Ready for temporal analysis


In [3]:
# Get temporal statistics
temporal_stats = get_temporal_stats(image_df)

if temporal_stats['has_temporal_data']:
    print("⏱️ TEMPORAL ANALYSIS RESULTS:")
    print("=" * 40)
    print(f"Images with timestamps: {temporal_stats['total_images']:,}")
    print(f"Earliest download: {temporal_stats['earliest_download']}")
    print(f"Latest download: {temporal_stats['latest_download']}")
    print(f"Total duration: {temporal_stats['duration_hours']:.1f} hours")
    print(f"Average interval: {temporal_stats['avg_interval_seconds']:.1f} seconds")
    print(f"Median interval: {temporal_stats['median_interval_seconds']:.1f} seconds")
else:
    print("❌ No temporal data available")

⏱️ TEMPORAL ANALYSIS RESULTS:
========================================
Images with timestamps: 49,008
Earliest download: 2025-05-24 09:57:21.152048
Latest download: 2025-05-30 08:52:48.010403
Total duration: 142.9 hours
Average interval: 10.5 seconds
Median interval: 1.0 seconds


In [4]:
# Basic temporal visualization
if len(temporal_df) > 0:
    plot_temporal_analysis(image_df, use_plotly=True)

## Temporal Analysis Summary

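This analysis summarizes the timing and efficiency of the image download process. In this run, 49,008 timestamped images were downloaded over 142.9 hours, or roughly 343 images per hour. The average interval between downloads (10.5 seconds) is about ten times the median (1.0 second), which suggests that a minority of long gaps, rather than uniformly slow downloads, accounts for most of the elapsed time.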

### Key Temporal Metrics
- **Download Duration**: Total time from first to last download
- **Download Rate**: Images downloaded per hour/minute/second
- **Interval Analysis**: Time between consecutive downloads
- **Pattern Recognition**: Peak hours and days; the roughly six-day window of this dataset is too short to reveal seasonal trends
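
The metrics listed above can be derived from a single timestamp column. The following is a minimal sketch, not the actual implementation of `get_temporal_stats` (which is not shown in this notebook); the function name `summarize_download_times` and the example timestamps are assumptions made for illustration.

```python
import pandas as pd


def summarize_download_times(timestamps: pd.Series) -> dict:
    """Return duration, rate, and interval statistics for download timestamps.

    Hypothetical helper for illustration; it only mirrors the keys printed
    earlier in this notebook and is not the project's own code.
    """
    ts = pd.to_datetime(timestamps).dropna().sort_values()
    if len(ts) < 2:
        # Intervals and rates are undefined for fewer than two downloads.
        return {"count": len(ts)}

    duration_h = (ts.iloc[-1] - ts.iloc[0]).total_seconds() / 3600
    # Seconds between each download and the one before it.
    intervals = ts.diff().dropna().dt.total_seconds()
    return {
        "count": len(ts),
        "duration_hours": duration_h,
        "images_per_hour": len(ts) / duration_h if duration_h > 0 else float("nan"),
        "avg_interval_seconds": intervals.mean(),
        "median_interval_seconds": intervals.median(),
    }


# Made-up timestamps: three quick downloads followed by one long pause.
example = pd.Series(pd.to_datetime([
    "2025-05-24 09:00:00",
    "2025-05-24 09:00:01",
    "2025-05-24 09:00:02",
    "2025-05-24 10:00:00",
]))
stats = summarize_download_times(example)
print(stats)
```

In this toy input a single long pause pulls the average interval far above the median, which is the same pattern seen in the real results (10.5 seconds average versus 1.0 second median).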

### Performance Insights

#### Download Efficiency
- **Rate Consistency**: How stable the download rate is over time
- **Peak Performance**: Identifying optimal download windows
- **Bottleneck Detection**: Finding slow periods or issues
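
One possible way to quantify these three points is to count downloads per hour, measure how much that count varies, and list the gaps that exceed a threshold. The sketch below assumes a five-minute stall threshold and uses the coefficient of variation as the consistency measure; both are illustrative choices, and `hourly_rate_profile` is a hypothetical name.

```python
import pandas as pd


def hourly_rate_profile(timestamps: pd.Series, gap_threshold_s: float = 300.0):
    """Return (downloads per hour, coefficient of variation, stalls).

    Hypothetical helper: the 300-second default threshold is an assumption.
    """
    ts = pd.to_datetime(timestamps).dropna().sort_values().reset_index(drop=True)

    # Downloads per clock hour; hours with no downloads appear as 0.
    per_hour = pd.Series(1, index=pd.DatetimeIndex(ts)).resample("h").sum()

    # Lower values mean a steadier rate; 0 would be perfectly constant.
    cv = per_hour.std(ddof=0) / per_hour.mean()

    # Gaps longer than the threshold are treated as candidate bottlenecks.
    gaps = ts.diff().dt.total_seconds()
    mask = gaps > gap_threshold_s
    stalls = pd.DataFrame(
        {"resumed_at": ts[mask], "gap_seconds": gaps[mask]}
    ).reset_index(drop=True)
    return per_hour, cv, stalls


# Made-up timestamps with one long stall between 09:00:02 and 11:30:00.
example = pd.Series(pd.to_datetime([
    "2025-05-24 09:00:00",
    "2025-05-24 09:00:01",
    "2025-05-24 09:00:02",
    "2025-05-24 11:30:00",
]))
per_hour, cv, stalls = hourly_rate_profile(example)
print(per_hour)
print(f"Coefficient of variation: {cv:.2f}")
print(stalls)
```

The hour with the highest count in `per_hour` is a candidate for the peak window, while rows in `stalls` point to periods worth investigating.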

#### Timing Patterns
- **Hour of Day**: When downloads are most/least active
- **Day of Week**: Weekly patterns in download activity
- **Category Timing**: How different categories are distributed over time
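
These groupings can be sketched with pandas datetime accessors. The `download_timestamp` column appears earlier in this notebook; the `category` column name, the helper `activity_by_period`, and the example rows are assumptions for illustration only.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def activity_by_period(df, ts_col="download_timestamp", category_col=None):
    """Count downloads by hour of day, day of week, and (optionally) category per day."""
    valid = df.assign(**{ts_col: pd.to_datetime(df[ts_col])}).dropna(subset=[ts_col])
    ts = valid[ts_col]

    result = {
        # All 24 hours are listed, including hours with no downloads.
        "by_hour": ts.dt.hour.value_counts().reindex(range(24), fill_value=0),
        # Days are kept in calendar order rather than sorted by count.
        "by_day": ts.dt.day_name().value_counts().reindex(DAY_ORDER, fill_value=0),
    }
    if category_col is not None:
        # Rows are calendar days, columns are categories.
        result["by_category"] = pd.crosstab(ts.dt.floor("D"), valid[category_col])
    return result


# Made-up rows; "category" is a hypothetical column name.
example_df = pd.DataFrame({
    "download_timestamp": [
        "2025-05-24 09:57:21",
        "2025-05-24 09:58:00",
        "2025-05-26 14:00:00",
    ],
    "category": ["nature", "nature", "urban"],
})
patterns = activity_by_period(example_df, category_col="category")
print(patterns["by_day"])
```

With only about six days of data, each weekday is observed at most once, so day-of-week counts from this dataset should be read as a description of this run rather than as a recurring weekly pattern.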

### Optimization Opportunities

#### Speed Improvements
1. **Parallel Downloads**: Implementing concurrent downloads
2. **Reduced Delays**: Optimizing sleep times between requests
3. **Batch Processing**: Grouping similar requests
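
As an illustration of the first item, concurrent downloads can be sketched with a thread pool from the standard library. The downloader used to produce this dataset is not shown in the notebook, so `download_all`, the `fetch` callable, and the example URLs are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    max_workers: int = 8,
) -> Dict[str, object]:
    """Run `fetch` for each URL on a thread pool and collect the outcomes.

    Failures are stored as exception objects so that one bad URL does not
    abort the rest of the batch.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results


def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP request; returns a fake byte count."""
    if "broken" in url:
        raise ValueError(f"cannot fetch {url}")
    return len(url)


# Hypothetical URLs used only to exercise the function.
example_urls = [
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
    "https://example.com/broken.jpg",
]
outcomes = download_all(example_urls, fake_fetch, max_workers=2)
print(outcomes)
```

Threads suit this workload because downloads spend most of their time waiting on the network. `max_workers` should stay small enough to respect the limits of the remote server.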

#### Timing Optimization
1. **Load Distribution**: Spreading downloads across optimal hours
2. **Rate Limiting**: Avoiding server overload while maximizing speed
3. **Error Recovery**: Handling timeouts and retries efficiently
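
Rate limiting and error recovery can be combined in a small amount of code. The sketch below enforces a minimum spacing between requests and retries failures with exponential backoff; the class and function names, the delays, and the jitter fraction are all assumptions rather than settings taken from the original downloader.

```python
import random
import time


class MinIntervalLimiter:
    """Ensure at least `min_interval_s` seconds between consecutive calls to wait()."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        if self._last is not None:
            remaining = self._min_interval_s - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def fetch_with_retry(fetch, url, max_attempts=4, base_delay_s=1.0,
                     jitter=0.1, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            delay = base_delay_s * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * jitter))


# Example with a fake fetch that fails twice, then succeeds.
# Sleeps are recorded instead of executed so the example runs instantly.
attempts = {"count": 0}
recorded_sleeps = []


def flaky_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return f"ok:{url}"


result = fetch_with_retry(flaky_fetch, "https://example.com/a.jpg",
                          sleep=recorded_sleeps.append)
print(result, recorded_sleeps)
```

In a real download loop, `limiter.wait()` would be called before each request, and the injected `sleep` and `clock` arguments would be left at their defaults.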

### Next Steps
1. **Review Bottlenecks**: Address slow periods identified in the analysis
2. **Implement Optimizations**: Apply recommended timing improvements
3. **Monitor Progress**: Track performance metrics over time
4. **Adjust Parameters**: Fine-tune based on temporal patterns