In [1]:
# FAST SCRAPING NOTEBOOK - Single Cell Solution
# Import and use the optimized fast scraping function

import sys
import os
import time

# Add current directory to path to import our module
sys.path.append(os.getcwd())

# Import the fast scraping function
from scraping.fast_scraping import fast_scrape_multiple_dogs

def run_fast_scraping(start_id=637322, end_id=637332, output_file="dogs3.csv"):
    """
    Run fast scraping for a range of dog IDs
    
    Args:
        start_id: Starting dog ID
        end_id: Ending dog ID (exclusive)
        output_file: Output CSV file name
    """
    
    # Generate dog ID list
    dog_ids = [str(i) for i in range(start_id, end_id)]
    
    print(f"🚀 Starting fast scraping for {len(dog_ids)} dogs (IDs {start_id}-{end_id-1})")
    print(f"📂 Output file: {output_file}")
    print("=" * 50)
    
    start_time = time.time()
    
    try:
        # Run the fast scraping
        total_records = fast_scrape_multiple_dogs(
            dog_ids=dog_ids,
            output_file=output_file,
            batch_size=25  # Save progress every 25 races
        )
        
        elapsed_time = time.time() - start_time
        
        print("=" * 50)
        print("✅ SCRAPING COMPLETED!")
        print(f"📊 Total records scraped: {total_records}")
        print(f"⏱️ Total time: {elapsed_time:.1f} seconds")
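        # Guard the rate calculations against an empty ID range or zero elapsed time
        if dog_ids and elapsed_time > 0:
            print(f"🐕 Average per dog: {elapsed_time/len(dog_ids):.1f} seconds")
            print(f"📈 Speed: {len(dog_ids)/elapsed_time*60:.1f} dogs per minute")
        print(f"💾 Data saved to: {output_file}")
        
        # Show some statistics
        if total_records > 0 and dog_ids: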
            avg_races_per_dog = total_records / len(dog_ids)
            print(f"🏁 Average races per dog: {avg_races_per_dog:.1f}")
        
        return total_records
        
    except Exception as e:
        print(f"❌ Error during scraping: {str(e)}")
        return 0

# CONFIGURATION - Edit these values as needed
START_DOG_ID = 637322  # Starting dog ID
END_DOG_ID = 637332    # Ending dog ID (will scrape 10 dogs)
OUTPUT_FILE = "dogs3.csv"  # Output file name

# RUN THE SCRAPING
print("🔧 Fast Scraping Configuration:")
print(f"   Start ID: {START_DOG_ID}")
print(f"   End ID: {END_DOG_ID}")
print(f"   Total dogs: {END_DOG_ID - START_DOG_ID}")
print(f"   Output: {OUTPUT_FILE}")
print()

# Execute the scraping
total_scraped = run_fast_scraping(
    start_id=START_DOG_ID,
    end_id=END_DOG_ID,
    output_file=OUTPUT_FILE
)

# Final summary
print(f"\n🎯 FINAL RESULT: {total_scraped} total records scraped")
if os.path.exists(OUTPUT_FILE):
    file_size = os.path.getsize(OUTPUT_FILE) / 1024  # KB
    print(f"📁 File size: {file_size:.1f} KB")

🔧 Fast Scraping Configuration:
   Start ID: 637322
   End ID: 637332
   Total dogs: 10
   Output: dogs3.csv

🚀 Starting fast scraping for 10 dogs (IDs 637322-637331)
📂 Output file: dogs3.csv
Loading existing data from dogs3.csv...
Loaded 10038 existing records, 1786 unique races
Collecting race URLs from all dogs...
Processing dog 1/10: 637322
Processing dog 2/10: 637323
Processing dog 3/10: 637324
Processing dog 4/10: 637325
Processing dog 5/10: 637326
Processing dog 6/10: 637327
Processing dog 7/10: 637328
Processing dog 8/10: 637329
Processing dog 9/10: 637330
Processing dog 10/10: 637331
Found 100 unique races to process
Need to scrape 75 new races

# Greyhound Racing Dataset - Column Explanations

## Overview
This CSV file contains greyhound racing data in 46 columns, covering race information, dog characteristics, performance metrics, and derived features for machine learning analysis.

## Column Descriptions

### Basic Race Information
- **Columns 1-2**: `meeting_id`, `race_id` - Unique identifiers for the racing meeting and the specific race
- **Column 3**: `date` - Race date (e.g., "Thursday 6th February 2025")
- **Column 4**: `track` - Racing venue (Newcastle, Towcester, Doncaster, etc.)
- **Column 5**: `time` - Race start time
- **Column 6**: `grade` - Race grade/class (A3, A4, B4, etc.)
- **Column 7**: `distance` - Race distance in meters (450m, 480m, 500m, etc.)
- **Column 8**: `prize_info` - Prize money breakdown for winners and placed dogs

### Race Performance
- **Column 9**: `finishing_position_text` - Finishing position as text (1st, 2nd, 3rd, etc.)
- **Column 10**: `trap_number` - Starting trap number (1-6)
- **Column 11**: `dog_id` - Unique identifier for the dog
- **Column 12**: `dog_name` - Name of the greyhound
- **Column 13**: `trainer` - Trainer's name
- **Column 14**: `comment` - Race commentary describing the dog's performance
- **Column 15**: `odds` - Betting odds (e.g., "3/1", "11/8F" where F = favorite)
- **Column 16**: `sectional_time` - Split time at specific distance point
- **Column 17**: `finish_time` - Final race time with margin behind winner
- **Column 18**: `date_of_birth` - Dog's birth date
- **Column 19**: `weight` - Dog's racing weight in kg
- **Column 20**: `color_sex` - Color and sex code (e.g., "b - bk" = bitch - black)
- **Column 21**: `sire` - Father's name
- **Column 22**: `dam` - Mother's name
- **Column 23**: `breeding_info` - Combined breeding information
- **Column 24**: `url` - Link to race details

### Processed Features
- **Column 25**: `distance_numeric` - Race distance as numeric value
- **Column 26**: `finishing_position` - Finishing position as number (1.0, 2.0, etc.)
- **Column 27**: `weight_numeric` - Weight as numeric value
- **Column 28**: `trap_number_numeric` - Trap number as numeric
- **Column 29**: `sectional_time_numeric` - Sectional time as number
- **Column 30**: `won_race` - Binary indicator (1 if won, 0 if not)
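
The processed columns can be derived from their raw text counterparts with simple number extraction. The following is a minimal sketch, not the dataset's actual pipeline: the helper `leading_number` and the sample row values are hypothetical, and only the column names come from the list above.

```python
import re

def leading_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", str(text))
    return float(match.group()) if match else None

# Illustrative raw values (assumed, not taken from the real CSV)
row = {
    "distance": "480m",
    "finishing_position_text": "1st",
    "weight": "32.5",
    "trap_number": "3",
    "sectional_time": "4.52",
}

processed = {
    "distance_numeric": leading_number(row["distance"]),
    "finishing_position": leading_number(row["finishing_position_text"]),
    "weight_numeric": leading_number(row["weight"]),
    "trap_number_numeric": leading_number(row["trap_number"]),
    "sectional_time_numeric": leading_number(row["sectional_time"]),
}
# won_race is 1 only when the dog finished first
processed["won_race"] = int(processed["finishing_position"] == 1.0)
```

Returning `None` for unparseable text keeps missing values distinguishable from a genuine zero.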

### Performance Analysis Features
- **Column 31**: `margin_lengths` - Margin behind winner in lengths
- **Column 32**: `odds_numeric` - Converted odds as decimal number
- **Column 33**: `color_code` - Simplified color code (bk, bd, be, f, etc.)
- **Column 34**: `is_favorite` - Binary indicator if dog was favorite
- **Column 35**: `early_pace` - Indicator of early speed/position
- **Column 36**: `led_at_some_point` - Whether dog led during race
- **Column 37**: `bumped_or_crowded` - Indicator of racing interference
- **Column 38**: `clear_run` - Whether dog had unimpeded run
- **Column 39**: `ran_on` - Whether dog finished strongly
- **Column 40**: `checked_or_blocked` - Racing trouble indicators
- **Column 41**: `wide_run` - Whether dog raced wide
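
As an illustration of how `odds_numeric` and `is_favorite` might be derived from the raw `odds` text, the sketch below converts fractional odds to decimal odds. It assumes that "decimal" means the usual decimal-odds convention (fraction plus 1) and that any trailing letters ending in `F` mark a favorite; neither assumption is confirmed by the dataset, and `parse_odds` is a hypothetical helper.

```python
import re

def parse_odds(odds_text):
    """Parse fractional odds such as '3/1' or '11/8F'.

    Returns (decimal_odds, is_favorite); decimal_odds is None when the text
    does not look like fractional odds.
    """
    match = re.fullmatch(r"\s*(\d+)/(\d+)\s*([A-Za-z]*)\s*", odds_text)
    if not match:
        return None, 0
    numerator, denominator, suffix = match.groups()
    if int(denominator) == 0:
        return None, 0
    # Assumption: decimal odds = fractional odds + 1 (the returned stake)
    decimal_odds = 1.0 + int(numerator) / int(denominator)
    # Assumption: a suffix ending in 'F' marks a favorite
    is_favorite = int(suffix.upper().endswith("F"))
    return decimal_odds, is_favorite

print(parse_odds("3/1"))    # (4.0, 0)
print(parse_odds("11/8F"))  # (2.375, 1)
```

If the dataset stores the bare fraction instead (3.0 for "3/1"), drop the `1.0 +` term.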

### Statistical Features
- **Column 42**: `performance_score` - Calculated performance metric
- **Column 43**: `is_short_distance` - Binary indicator for sprint races
- **Column 44**: `is_long_distance` - Binary indicator for distance races  
- **Column 45**: `is_middle_distance` - Binary indicator for middle distance
- **Column 46**: `track_type` - Numeric track type classification

## Key Insights from the Data

### Race Grades
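- **A grades**: Graded races over a track's standard distance; within a letter, a lower number means a higher class (A2 ranks above A3 and A4)
- **B grades**: Typically graded races over a track's alternative standard distance
- **D grades**: Typically sprint races, not a lower class
- **HP/OR**: Handicap/Open races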
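
For modeling, a grade string can be split into its letter prefix and optional numeric level. This is a small sketch under the assumption that grades are letters followed by optional digits (as in `A3` or `OR`); `split_grade` is a hypothetical helper, not part of the scraper.

```python
import re

def split_grade(grade):
    """Split a grade such as 'A3' into ('A', 3); grades without a number give (prefix, None)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d*)", grade.strip())
    if not match:
        return None, None
    prefix, level = match.groups()
    return prefix.upper(), (int(level) if level else None)

print(split_grade("A3"))  # ('A', 3)
print(split_grade("OR"))  # ('OR', None)
```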

### Performance Indicators
The comment field contains valuable racing information:
- **"ALed"** = Always led
- **"QAw"** = Quick away from traps
- **"Crd"** = Crowded during race
- **"Bmp"** = Bumped by other dogs
- **"RnOn"** = Ran on (finished strongly)
- **"SAw"** = Slow away from traps
- **"Wide"** = Raced wide around bends
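
The abbreviations above can be turned into binary features with simple substring checks. The sketch below is one possible mapping, not the dataset's confirmed logic: in particular, tying `early_pace` to "QAw" and matching any token containing "Led" are assumptions, and `comment_flags` is a hypothetical helper.

```python
def comment_flags(comment):
    """Derive binary indicators from a race comment using case-insensitive substring checks."""
    text = comment.lower()
    return {
        "early_pace": int("qaw" in text),         # assumption: quick away implies early pace
        "led_at_some_point": int("led" in text),  # matches "ALed" and other led-type tokens
        "bumped_or_crowded": int("crd" in text or "bmp" in text),
        "ran_on": int("rnon" in text),
        "wide_run": int("wide" in text),
    }

flags = comment_flags("SAw,Crd,RnOn")
# bumped_or_crowded and ran_on are set; the other flags stay 0
```

Substring matching tolerates suffixes such as bend numbers appended to a token, at the cost of occasional false positives.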

### Distance Categories
- **Short**: 245m-285m (sprint races)
- **Middle**: 400m-450m (standard distances)  
- **Long**: 480m-500m+ (staying races)
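
These ranges map directly onto the `is_short_distance`, `is_middle_distance`, and `is_long_distance` flags. The sketch below applies the boundaries exactly as listed; how the dataset treats distances that fall between the ranges (for example 460m) is not stated, so here they simply get all three flags set to 0. `distance_flags` is a hypothetical helper.

```python
def distance_flags(distance_m):
    """Return the three distance-category indicators for a race distance in meters."""
    return {
        "is_short_distance": int(245 <= distance_m <= 285),
        "is_middle_distance": int(400 <= distance_m <= 450),
        "is_long_distance": int(distance_m >= 480),
    }

print(distance_flags(450))
# {'is_short_distance': 0, 'is_middle_distance': 1, 'is_long_distance': 0}
```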

## Racing Commentary Explanation

### "Ran On" (RnOn) - Detailed Meaning:
In greyhound racing, **"Ran On"** is a **positive performance indicator** that means:

1. **Strong Finish**: The dog accelerated or maintained strong pace in the final portion of the race
2. **Late Speed**: Shows the dog has stamina and finishing kick
3. **Closing Ground**: Often indicates the dog was gaining on leaders or maintaining position strongly
4. **Good Fitness**: Suggests the dog is in good racing condition
5. **Distance Suitability**: May indicate the dog suits longer distances where stamina matters

**This is GOOD performance** - it shows the dog finished the race strongly rather than tiring. Dogs that "run on" are often considered to have good racing fitness and potential for improvement at longer distances.

**Contrast with negative terms**:
- "Tired" = Dog slowed significantly in final stretch
- "Faded" = Dog lost position/pace late in race
- "Weakened" = Dog showed lack of stamina

This dataset appears designed for predictive modeling of greyhound race outcomes, with features capturing both historical performance and race-day factors that influence results.