# 🌊🦈 BlueCloud Mini Pipeline - Interactive Analysis

This notebook runs the BlueCloud Mini Pipeline interactively: it processes skate tracking data, plankton distributions, and elasmobranch capture data, trains a small regression model, and generates maps and plots.

## 📋 Table of Contents
1. [Setup and Configuration](#setup)
2. [Data Processing](#data-processing)
   - [Skate Data Processing](#skate-data)
   - [Plankton Data Processing](#plankton-data)
   - [Elasmobranch Data Processing](#elasmobranch-data)
3. [Machine Learning Model](#ml-model)
4. [Visualizations](#visualizations)
5. [Results and Analysis](#results)

---


## 🔧 Setup and Configuration {#setup}

First, let's set up the environment and configure the pipeline parameters.


In [1]:
# Import required libraries
import sys
import os
import pandas as pd
import numpy as np
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

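# Make the pipeline modules importable.
# The fallback below is a machine-specific absolute path; adjust it to your local checkout.
current_dir = os.getcwd()
if 'deliverable4' not in current_dir:
    os.chdir('/home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4')
if os.getcwd() not in sys.path:
    sys.path.insert(0, os.getcwd())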

from pipeline.main_pipeline import MainPipeline
from modules import (
    SkateProcessor,
    PlanktonProcessor,
    ElasmobranchProcessor,
    TinyModel,
    Visualizer
)

print("🌊🦈 BlueCloud Mini Pipeline - Interactive Analysis")
print("=" * 50)
print(f"📅 Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")


🌊🦈 BlueCloud Mini Pipeline - Interactive Analysis
📅 Started at: 2025-10-01 10:13:46


### Configuration Parameters

Configure the pipeline with your data paths and parameters:


In [2]:
# Configuration
base_dir = os.getcwd()
project_root = os.path.dirname(base_dir)

config = {
    'skate_csv_path': os.path.join(base_dir, 'data/Skates_Track.csv'),
    'plankton_netcdf_dir': os.path.join(base_dir, 'data/planktonoutputs'),
    'use_sample_plankton': True,  # Set to False when you have real NetCDF files
    'output_dir': os.path.join(base_dir, 'data'),
    'model_type': 'random_forest',  # Options: 'random_forest', 'linear_regression', 'gradient_boosting'
    'random_state': 42,
    'test_size': 0.2,
    # Elasmobranch data paths
    'capture_csv_path': os.path.join(project_root, 'Capture_2025.1.0', 'Capture_Quantity.csv'),
    'species_metadata_path': os.path.join(project_root, 'Capture_2025.1.0', 'CL_FI_SPECIES_GROUPS.csv'),
    'water_area_path': os.path.join(project_root, 'Capture_2025.1.0', 'CL_FI_WATERAREA_GROUPS.csv')
}

print("📋 Configuration:")
for key, value in config.items():
    # Only keys ending in '_path' or '_dir' are filesystem locations;
    # other strings (e.g. model_type) are plain settings.
    is_path = key.endswith(('_path', '_dir'))
    if is_path and os.path.exists(value):
        print(f"  ✅ {key}: {value}")
    elif is_path:
        print(f"  ⚠️  {key}: {value} (file not found)")
    else:
        print(f"  📊 {key}: {value}")


📋 Configuration:
  ✅ skate_csv_path: /home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4/data/Skates_Track.csv
  ✅ plankton_netcdf_dir: /home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4/data/planktonoutputs
  📊 use_sample_plankton: True
  ✅ output_dir: /home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4/data
  📊 model_type: random_forest
  📊 random_state: 42
  📊 test_size: 0.2
  ✅ capture_csv_path: /home/samwork/Documents/coding/bluecloud-hackathon-2025/Capture_2025.1.0/Capture_Quantity.csv
  ✅ species_metadata_path: /home/samwork/Documents/coding/bluecloud-hackathon-2025/Capture_2025.1.0/CL_FI_SPECIES_GROUPS.csv
  ✅ water_area_path: /home/samwork/Documents/coding/bluecloud-hackathon-2025/Capture_2025.1.0/CL_FI_WATERAREA_GROUPS.csv
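
The configuration mixes filesystem locations with model settings, so a quick sanity check before running the pipeline can catch typos early. The sketch below is a hypothetical helper (`validate_config` is not part of the pipeline modules). It assumes that the three model types listed in the configuration comment are the only valid ones and that `test_size` must lie strictly between 0 and 1.

```python
# Hypothetical helper, not part of the pipeline modules.
ALLOWED_MODEL_TYPES = {'random_forest', 'linear_regression', 'gradient_boosting'}


def validate_config(cfg):
    """Return a list of human-readable problems found in cfg (empty if none)."""
    problems = []
    if cfg.get('model_type') not in ALLOWED_MODEL_TYPES:
        problems.append(f"model_type must be one of {sorted(ALLOWED_MODEL_TYPES)}")
    test_size = cfg.get('test_size')
    if not isinstance(test_size, (int, float)) or not 0 < test_size < 1:
        problems.append("test_size must be a number strictly between 0 and 1")
    if not isinstance(cfg.get('use_sample_plankton'), bool):
        problems.append("use_sample_plankton must be True or False")
    return problems


example = {'model_type': 'random_forest', 'test_size': 0.2, 'use_sample_plankton': True}
print(validate_config(example))  # an empty list means no problems were found
```

Returning a list instead of raising on the first problem lets the notebook report every misconfigured key in one pass.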


### Data Validation

Let's check if all required data files are available:


In [3]:
# Validate input files
print("🔍 Validating input files...")
print("-" * 30)

# Check skate data
if os.path.exists(config['skate_csv_path']):
    print(f"✅ Skate data found: {config['skate_csv_path']}")
    skate_df = pd.read_csv(config['skate_csv_path'])
    print(f"   📊 Records: {len(skate_df):,}")
    print(f"   📅 Date range: {skate_df['Date'].min()} to {skate_df['Date'].max()}")
else:
    print(f"❌ Skate data not found: {config['skate_csv_path']}")

# Check plankton data
if os.path.exists(config['plankton_netcdf_dir']):
    netcdf_files = [f for f in os.listdir(config['plankton_netcdf_dir']) if f.endswith('.nc')]
    if netcdf_files:
        print(f"✅ Plankton NetCDF files found: {len(netcdf_files)} files")
        config['use_sample_plankton'] = False
    else:
        print(f"⚠️  Plankton directory exists but no .nc files found")
        print(f"   Will use sample plankton data")
else:
    print(f"⚠️  Plankton directory not found: {config['plankton_netcdf_dir']}")
    print(f"   Will use sample plankton data")

# Check elasmobranch data
capture_files = [
    config['capture_csv_path'],
    config['species_metadata_path'],
    config['water_area_path']
]

missing_files = [f for f in capture_files if not os.path.exists(f)]
if missing_files:
    print(f"⚠️  Missing elasmobranch files: {len(missing_files)} files")
    for f in missing_files:
        print(f"   - {os.path.basename(f)}")
else:
    print(f"✅ All elasmobranch data files found")

print(f"\n📁 Output directory: {config['output_dir']}")
os.makedirs(config['output_dir'], exist_ok=True)


🔍 Validating input files...
------------------------------
✅ Skate data found: /home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4/data/Skates_Track.csv
   📊 Records: 4,925
   📅 Date range: 2021-08-05 to 2024-08-20
⚠️  Plankton directory exists but no .nc files found
   Will use sample plankton data
✅ All elasmobranch data files found

📁 Output directory: /home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4/data


---

## 📊 Data Processing {#data-processing}

Now let's process each dataset step by step.


### 🦈 Skate Data Processing {#skate-data}

Process the skate tracking data to derive temporal features, movement metrics (distance, speed, direction), and spatial features, then compute summary statistics.


In [4]:
print("🦈 Processing Skate Data")
print("=" * 30)

# Initialize skate processor
skate_processor = SkateProcessor(config['skate_csv_path'])

# Process the data
skate_data = skate_processor.process()

# Display summary statistics
skate_stats = skate_processor.get_summary()
print(f"\n📊 Skate Data Summary:")
print(f"  Total Records: {skate_stats['stats']['total_records']:,}")
print(f"  Individual Skates: {skate_stats['stats']['unique_skates']}")
print(f"  Date Range: {skate_stats['stats']['date_range'][0].strftime('%Y-%m-%d')} to {skate_stats['stats']['date_range'][1].strftime('%Y-%m-%d')}")
print(f"  Total Distance: {skate_stats['stats']['total_distance']:.2f} degrees")
print(f"  Average Speed: {skate_stats['stats']['avg_speed']:.4f} degrees/day")

# Export processed data
skate_output = os.path.join(config['output_dir'], 'skate_processed.csv')
skate_processor.export_processed_data(skate_output)
print(f"\n💾 Exported processed skate data to: {skate_output}")

# Display first few rows
print(f"\n📋 First 5 rows of processed skate data:")
display(skate_data.head())


🦈 Processing Skate Data
🦈🌊 Starting Skate Data Processing
🦈 Loading skate tracking data...
✅ Loaded 4,925 skate tracking records
📊 Data spans from 2021-08-05 to 2024-08-20
🦈 Tracking 52 individual skates
📅 Adding temporal features...
✅ Temporal features added
🏃‍♂️ Calculating movement metrics...
✅ Movement metrics calculated
🗺️ Adding spatial features...
✅ Spatial features added
🧹 Filtering and cleaning data...
✅ Cleaned data: removed 52 records (1.1%)
📊 Final dataset: 4,873 records
📈 Calculating summary statistics...
✅ Summary statistics calculated

🎯 Skate Processing Summary:
  📊 Total records: 4,873
  🦈 Individual skates: 52
  📅 Date range: 2021-08-05 to 2024-08-20
  🏃‍♂️ Total distance: 388.20 degrees
  ⚡ Average speed: 0.0797 degrees/day

📊 Skate Data Summary:
  Total Records: 4,925
  Individual Skates: 52
  Date Range: 2021-08-05 to 2024-08-20
  Total Distance: 388.20 degrees
  Average Speed: 0.0797 degrees/day
💾 Exported processed data to: /home/samwork/Documents/coding/blueclou

| | id | Date | Common_name | Latitude | Longitude | year | month | day_of_year | week | day_of_week | ... | distance | time_diff | speed | direction | cumulative_distance | center_lat | center_lon | dist_from_center | lat_bin | lon_bin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A16746 | 2021-08-06 | Spotted Skate | 52.040958 | 3.280237 | 2021 | 8 | 218 | 31 | 4 | ... | 0.061733 | 1.0 | 0.061733 | -144.374741 | 0.061733 | 52.384421 | 1.546 | 1.767921 | 52.0 | 3.3 |
| 2 | A16746 | 2021-08-07 | Spotted Skate | 52.071792 | 3.182861 | 2021 | 8 | 219 | 31 | 5 | ... | 0.102141 | 1.0 | 0.102141 | -72.42985 | 0.163875 | 52.384421 | 1.546 | 1.666448 | 52.1 | 3.2 |
| 3 | A16746 | 2021-08-08 | Spotted Skate | 52.119983 | 3.070554 | 2021 | 8 | 220 | 31 | 6 | ... | 0.12221 | 1.0 | 0.12221 | -66.775917 | 0.286084 | 52.384421 | 1.546 | 1.547318 | 52.1 | 3.1 |
| 4 | A16746 | 2021-08-09 | Spotted Skate | 52.146748 | 3.001152 | 2021 | 8 | 221 | 32 | 0 | ... | 0.074385 | 1.0 | 0.074385 | -68.910433 | 0.360469 | 52.384421 | 1.546 | 1.474434 | 52.1 | 3.0 |
| 5 | A16746 | 2021-08-10 | Spotted Skate | 52.166211 | 2.945024 | 2021 | 8 | 222 | 32 | 1 | ... | 0.059407 | 1.0 | 0.059407 | -70.875037 | 0.419875 | 52.384421 | 1.546 | 1.415939 | 52.2 | 2.9 |
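
The processed columns shown above are consistent with simple planar geometry in degree units: for consecutive fixes of one animal, `distance` matches the Euclidean norm of the latitude and longitude differences, and `direction` matches `atan2(Δlon, Δlat)` expressed in degrees. The sketch below reproduces those values with pandas. It is an illustration under that assumption, not the `SkateProcessor` source, and `add_movement_metrics` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def add_movement_metrics(df):
    """Add per-animal step distance, speed, heading and cumulative distance (degree units)."""
    df = df.sort_values(['id', 'Date']).copy()
    grouped = df.groupby('id')
    dlat = grouped['Latitude'].diff()
    dlon = grouped['Longitude'].diff()
    df['time_diff'] = grouped['Date'].diff().dt.total_seconds() / 86400.0  # days
    df['distance'] = np.hypot(dlat, dlon)            # planar distance in degrees
    df['speed'] = df['distance'] / df['time_diff']   # degrees per day
    df['direction'] = np.degrees(np.arctan2(dlon, dlat))
    df['cumulative_distance'] = df.groupby('id')['distance'].cumsum()
    return df


# Three consecutive fixes of skate A16746, taken from the table above
track = pd.DataFrame({
    'id': ['A16746'] * 3,
    'Date': pd.to_datetime(['2021-08-06', '2021-08-07', '2021-08-08']),
    'Latitude': [52.040958, 52.071792, 52.119983],
    'Longitude': [3.280237, 3.182861, 3.070554],
})
result = add_movement_metrics(track)
print(result[['Date', 'distance', 'speed', 'direction']])
```

Because the result is in degrees, it is not a true ground distance: at 52°N one degree of longitude covers only about 0.62 times the ground distance of one degree of latitude (cos 52° ≈ 0.616). A great-circle formula such as haversine would be needed for values in kilometres.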


### 🦐 Plankton Data Processing {#plankton-data}

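Process plankton distribution data from NetCDF files, or fall back to generated sample data when none are available. In this run no `.nc` files were found, so the values below are synthetic (note the negative abundances) and should not be interpreted biologically.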


In [5]:
print("🦐 Processing Plankton Data")
print("=" * 30)

# Initialize plankton processor
plankton_processor = PlanktonProcessor(
    netcdf_dir=config['plankton_netcdf_dir'],
    netcdf_files=config.get('plankton_netcdf_files', None)
)

# Process the data
plankton_data = plankton_processor.process(
    use_sample_data=config['use_sample_plankton']
)

# Display summary statistics
if plankton_data is not None:
    plankton_stats = plankton_processor.get_summary()
    print(f"\n📊 Plankton Data Summary:")
    print(f"  Total Records: {plankton_stats['stats']['total_records']:,}")
    print(f"  Source Files: {plankton_stats['stats'].get('unique_files', 1)}")
    
    if 'plankton_summary' in plankton_stats['stats']:
        print(f"  Plankton Variables: {len(plankton_stats['stats']['plankton_summary'])}")
    if 'environmental_summary' in plankton_stats['stats']:
        print(f"  Environmental Variables: {len(plankton_stats['stats']['environmental_summary'])}")
    
    # Export processed data
    plankton_output = os.path.join(config['output_dir'], 'plankton_processed.csv')
    plankton_processor.export_processed_data(plankton_output)
    print(f"\n💾 Exported processed plankton data to: {plankton_output}")
    
    # Display first few rows
    print(f"\n📋 First 5 rows of processed plankton data:")
    display(plankton_data.head())
else:
    print("⚠️  No plankton data available")
    plankton_stats = None


🦐 Processing Plankton Data
🦐🌊 Starting Plankton Data Processing
🧪 Creating sample plankton data for testing...
✅ Created sample data: 24,400 records
📅 Adding temporal features...
✅ Added temporal features using column: time
🗺️ Adding spatial features...
✅ Added spatial features using columns: latitude, longitude
📈 Calculating summary statistics...
✅ Calculated statistics for 3 plankton variables
✅ Calculated statistics for 3 environmental variables

🎯 Plankton Processing Summary:
  📊 Total records: 24,400
  📁 Source files: 1
  🗺️ Spatial bounds: Lat 50.0-55.0°N, Lon 1.0-5.0°E
  🦐 Plankton variables: 3
  🌡️ Environmental variables: 3

📊 Plankton Data Summary:
  Total Records: 24,400
  Source Files: 1
  Plankton Variables: 3
  Environmental Variables: 3


💾 Exported processed data to: /home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4/data/plankton_processed.csv

💾 Exported processed plankton data to: /home/samwork/Documents/coding/bluecloud-hackathon-2025/deliverable4/data/plankton_processed.csv

📋 First 5 rows of processed plankton data:


| | latitude | longitude | time | acartia_abundance | calanus_abundance | metridia_abundance | temperature | salinity | nitrate | file_source | year | month | day_of_year | season | lat_bin | lon_bin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 50.0 | 1.0 | 2021-08-01 | -18.018651 | -301.713785 | 116.261543 | 11.381911 | 35.159451 | 0.847981 | sample_data.nc | 2021 | 8 | 213 | Summer | 50.0 | 1.0 |
| 1 | 50.0 | 1.0 | 2021-08-02 | -2.365586 | -207.552506 | 81.149997 | 11.83881 | 34.737415 | 1.013057 | sample_data.nc | 2021 | 8 | 214 | Summer | 50.0 | 1.0 |
| 2 | 50.0 | 1.0 | 2021-08-03 | -14.748217 | -78.952944 | 49.933873 | 13.162795 | 34.238062 | 0.751173 | sample_data.nc | 2021 | 8 | 215 | Summer | 50.0 | 1.0 |
| 3 | 50.0 | 1.0 | 2021-08-04 | -14.431437 | -49.74858 | 53.765649 | 11.644323 | 35.29886 | 3.610146 | sample_data.nc | 2021 | 8 | 216 | Summer | 50.0 | 1.0 |
| 4 | 50.0 | 1.0 | 2021-08-05 | -38.473357 | -5.322433 | 82.584861 | 11.6018 | 35.47372 | 0.513887 | sample_data.nc | 2021 | 8 | 217 | Summer | 50.0 | 1.0 |
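
The added columns (`year`, `month`, `day_of_year`, `season`, `lat_bin`, `lon_bin`) can be derived directly from `time`, `latitude` and `longitude`. A minimal sketch follows, assuming meteorological seasons for the Northern Hemisphere and bins rounded to one decimal place. The actual rules in `PlanktonProcessor` are not shown in this notebook, and `add_time_and_space_features` is a hypothetical name.

```python
import pandas as pd

# Assumed mapping: meteorological seasons, Northern Hemisphere
SEASON_BY_MONTH = {
    12: 'Winter', 1: 'Winter', 2: 'Winter',
    3: 'Spring', 4: 'Spring', 5: 'Spring',
    6: 'Summer', 7: 'Summer', 8: 'Summer',
    9: 'Autumn', 10: 'Autumn', 11: 'Autumn',
}


def add_time_and_space_features(df, bin_decimals=1):
    """Return a copy of df with calendar columns and rounded lat/lon bins."""
    df = df.copy()
    timestamps = pd.to_datetime(df['time'])
    df['year'] = timestamps.dt.year
    df['month'] = timestamps.dt.month
    df['day_of_year'] = timestamps.dt.dayofyear
    df['season'] = df['month'].map(SEASON_BY_MONTH)
    df['lat_bin'] = df['latitude'].round(bin_decimals)
    df['lon_bin'] = df['longitude'].round(bin_decimals)
    return df


# Two grid-point records with the same coordinates and dates as the sample output
sample = pd.DataFrame({
    'time': ['2021-08-01', '2021-08-05'],
    'latitude': [50.0, 50.0],
    'longitude': [1.0, 1.0],
})
features = add_time_and_space_features(sample)
print(features)
```

Shared `lat_bin` and `lon_bin` columns give the skate and plankton tables a common spatial key, which is one plausible way to align the two datasets later.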


### 🦈 Elasmobranch Data Processing {#elasmobranch-data}

Process elasmobranch capture data from fisheries databases.


In [6]:
print("🦈 Processing Elasmobranch Data")
print("=" * 35)

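# Defaults so later cells can still run if this step is skipped or fails
elasmobranch_data = None
elasmobranch_stats = None

# Check if capture data files exist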
capture_files = [
    config['capture_csv_path'],
    config['species_metadata_path'],
    config['water_area_path']
]

missing_files = [f for f in capture_files if not os.path.exists(f)]
if missing_files:
    print("⚠️ Missing capture data files:")
    for f in missing_files:
        print(f"  - {f}")
    print("   Skipping elasmobranch processing")
    elasmobranch_data = None
    elasmobranch_stats = None
else:
    # Initialize elasmobranch processor
    elasmobranch_processor = ElasmobranchProcessor(
        capture_csv_path=config['capture_csv_path'],
        species_metadata_path=config['species_metadata_path'],
        water_area_path=config['water_area_path']
    )
    
    # Process elasmobranch data (FAO major fishing area 27, Northeast Atlantic, which includes the North Sea)
    results = elasmobranch_processor.process(
        target_area='27', resolution=1.0
    )
    
    elasmobranch_data = results['elasmobranch_data']
    
    # Display summary statistics
    elasmobranch_stats = elasmobranch_processor.get_summary()
    print(f"\n📊 Elasmobranch Data Summary:")
    print(f"  Total Records: {len(elasmobranch_data):,}")
    print(f"  Species: {elasmobranch_stats['stats'].get('unique_species', 'N/A')}")
    print(f"  Water Areas: {elasmobranch_stats['stats'].get('unique_areas', 'N/A')}")
    
    # Export processed data
    elasmobranch_processor.export_raster_data(config['output_dir'])
    print(f"\n💾 Exported elasmobranch raster data to: {config['output_dir']}")
    
    # Display first few rows
    print(f"\n📋 First 5 rows of processed elasmobranch data:")
    display(elasmobranch_data.head())


🦈 Processing Elasmobranch Data
🦈🌊 Starting Elasmobranch Processing
📊 Loading FAO datasets...
  Loading capture quantity data...
    Loaded 1,055,015 capture records
  Loading species metadata...
    Loaded 13,596 species records
  Loading water area metadata...
    Loaded 29 water area records
✅ Data loading complete!
🦈 Identifying elasmobranch species...
  Found 1144 elasmobranch species
  Species codes: ['SBL', 'NTC', 'BSK', 'CCT', 'THR', 'ALV', 'PTH', 'MAK', 'SMA', 'LMD']...
🔗 Joining capture data with metadata...
  Joined data: 102,553 elasmobranch capture records
  Time range: 1950 - 2023
  Total catch: 74,720,072 tonnes
🌊 Filtering for North Sea region...
  North Sea elasmobranch records: 17,165
  Time range: 1950 - 2023
  Total catch: 6,237,310 tonnes
  Top elasmobranch species in North Sea:
    Rays and skates NEI: 1,852,820 tonnes
    Picked dogfish: 1,773,612 tonnes
    Sharks, rays, skates, etc. NEI: 362,406 tonnes
    Blue shark: 324,222 tonnes
    Various sharks NEI: 304,7

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
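
The traceback is cut off here, so the failing line cannot be identified from this output. This `ValueError` is what pandas raises whenever a DataFrame is used where Python expects a single boolean, for example `if df:` or `df_a or df_b`. The sketch below reproduces the error on a toy frame and shows the usual explicit checks. `has_rows` is a hypothetical helper, and whether this pattern is the actual cause inside `ElasmobranchProcessor` is an assumption.

```python
import pandas as pd


def has_rows(df):
    """Explicit replacement for the ambiguous `if df:` test."""
    return df is not None and not df.empty


# Toy frame with placeholder values, for illustration only
catches = pd.DataFrame({'species': ['Blue shark'], 'tonnes': [1.0]})

try:
    if catches:  # ambiguous: pandas refuses to guess a single truth value
        pass
except ValueError as err:
    print(f"Reproduced: {err}")

print(has_rows(catches))         # explicit check on a non-empty frame
print(has_rows(pd.DataFrame()))  # explicit check on an empty frame
print(has_rows(None))            # explicit check when nothing was loaded
```

Searching the processor module for `if` or `or` expressions applied directly to a DataFrame, and replacing them with `is not None` and `.empty` checks, is a reasonable first debugging step.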

---

## 🤖 Machine Learning Model {#ml-model}

Train a machine learning model to predict skate behavior based on environmental conditions.


In [None]:
print("🤖 Training Tiny Model")
print("=" * 25)

if skate_data is None:
    print("❌ Cannot train model: Skate data not available")
    model = None
else:
    # Initialize model
    model = TinyModel(
        model_type=config['model_type'],
        random_state=config['random_state']
    )
    
    # Train the model
    print(f"Training {config['model_type']} model...")
    model.fit(skate_data, plankton_data)
    
    # Display model performance
    model_summary = model.get_summary()
    if 'metrics' in model_summary:
        metrics = model_summary['metrics']
        print(f"\n📊 Model Performance:")
        print(f"  Model Type: {type(model_summary['model']).__name__}")
        print(f"  R² Score: {metrics['r2']:.4f}")
        print(f"  RMSE: {metrics['rmse']:.4f}")
        print(f"  MAE: {metrics['mae']:.4f}")
        print(f"  Features Used: {len(model_summary['feature_names'])}")
    
    # Create performance plots
    if hasattr(model, 'test_data') and model.test_data is not None:
        model.create_performance_plots(
            model.test_data['y'],
            model.predictions,
            output_dir=config['output_dir']
        )
        print(f"\n📈 Model performance plots saved to: {config['output_dir']}")
    
    print(f"\n✅ Model training completed!")
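
`TinyModel` hides the training details, so the following is only a sketch of what a `random_forest` run with `test_size=0.2` and `random_state=42` typically looks like in scikit-learn, evaluated with the same three metrics printed by the cell above. The synthetic features and target are placeholders, and the assumption that `TinyModel` wraps scikit-learn estimators is not confirmed by this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Placeholder data: three synthetic features and a noisy linear target
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=500)

# Hold out 20% of the rows for evaluation, mirroring config['test_size']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

scores = {
    'r2': r2_score(y_test, pred),
    'rmse': float(np.sqrt(mean_squared_error(y_test, pred))),
    'mae': mean_absolute_error(y_test, pred),
}
print({name: round(value, 4) for name, value in scores.items()})
```

Evaluating on held-out rows matters here: a random forest can fit its training data almost perfectly, so metrics computed on the training set would overstate performance.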


---

## 📊 Visualizations {#visualizations}

Create comprehensive visualizations of the data and analysis results.


In [None]:
print("📊 Creating Visualizations")
print("=" * 30)

# Initialize visualizer
visualizer = Visualizer(output_dir=config['output_dir'])

# Create all visualizations
viz_results = {}

print("Creating skate movement map...")
skate_map = visualizer.create_skate_movement_map(skate_data)
viz_results['skate_map'] = skate_map

if plankton_data is not None:
    print("Creating plankton distribution map...")
    plankton_map = visualizer.create_plankton_distribution_map(plankton_data)
    viz_results['plankton_map'] = plankton_map

print("Creating time series plots...")
time_series = visualizer.create_time_series_plots(skate_data, plankton_data)
viz_results['time_series'] = time_series

print("Creating correlation heatmap...")
correlation = visualizer.create_correlation_heatmap(skate_data, plankton_data)
viz_results['correlation'] = correlation

print("Creating interactive dashboard...")
dashboard = visualizer.create_dashboard(skate_data, plankton_data, model)
viz_results['dashboard'] = dashboard

print("Creating summary report...")
summary_report = visualizer.create_summary_report(skate_data, plankton_data, model)
viz_results['summary_report'] = summary_report

print("Creating enhanced skate map with PNG overlays...")
enhanced_map = visualizer.create_enhanced_skate_map_with_png(skate_data)
viz_results['enhanced_map'] = enhanced_map

print("Creating PNG dashboard...")
png_dashboard = visualizer.create_png_dashboard(skate_data, plankton_data)
viz_results['png_dashboard'] = png_dashboard

print("Creating final integrated map...")
final_map = visualizer.create_integrated_final_map(skate_data, plankton_data)
viz_results['final_integrated_map'] = final_map

print(f"\n✅ All visualizations created and saved to: {config['output_dir']}")
print("\n📋 Generated visualizations:")
for viz_name, viz_path in viz_results.items():
    if viz_path:
        print(f"  📊 {viz_name}: {os.path.basename(viz_path)}")


---

## 📋 Results and Analysis {#results}

Generate a comprehensive analysis report and summary of all results.


In [None]:
print("📋 Generating Analysis Report")
print("=" * 35)

# Generate comprehensive report
report_path = os.path.join(config['output_dir'], 'analysis_report.txt')

with open(report_path, 'w') as f:
    f.write("SKATE-PLANKTON ECOSYSTEM ANALYSIS REPORT\n")
    f.write("=" * 50 + "\n\n")
    f.write(f"Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")
    
    # Skate data summary
    if skate_data is not None:
        f.write("SKATE DATA SUMMARY:\n")
        f.write(f"  Total Records: {len(skate_data):,}\n")
        # Column names follow the processed skate table (id, Date, Latitude, Longitude)
        f.write(f"  Individual Skates: {skate_data['id'].nunique()}\n")
        f.write(f"  Date Range: {skate_data['Date'].min()} to {skate_data['Date'].max()}\n")
        f.write(f"  Geographic Range: Lat {skate_data['Latitude'].min():.2f} to {skate_data['Latitude'].max():.2f}, ")
        f.write(f"Lon {skate_data['Longitude'].min():.2f} to {skate_data['Longitude'].max():.2f}\n\n")
    
    # Plankton data summary
    if plankton_data is not None:
        f.write("PLANKTON DATA SUMMARY:\n")
        f.write(f"  Total Records: {len(plankton_data):,}\n")
        f.write(f"  Variables: {len(plankton_data.columns)}\n")
        f.write(f"  Geographic Coverage: Lat {plankton_data['latitude'].min():.2f} to {plankton_data['latitude'].max():.2f}, ")
        f.write(f"Lon {plankton_data['longitude'].min():.2f} to {plankton_data['longitude'].max():.2f}\n\n")
    
    # Elasmobranch data summary
    if elasmobranch_data is not None:
        f.write("ELASMOBRANCH DATA SUMMARY:\n")
        f.write(f"  Total Records: {len(elasmobranch_data):,}\n")
        f.write(f"  Species: {elasmobranch_data['species'].nunique() if 'species' in elasmobranch_data.columns else 'N/A'}\n\n")
    
    # Model performance
    if model is not None:
        model_summary = model.get_summary()
        if 'metrics' in model_summary:
            metrics = model_summary['metrics']
            f.write("MODEL PERFORMANCE:\n")
            f.write(f"  Model Type: {type(model_summary['model']).__name__}\n")
            f.write(f"  R² Score: {metrics['r2']:.4f}\n")
            f.write(f"  RMSE: {metrics['rmse']:.4f}\n")
            f.write(f"  MAE: {metrics['mae']:.4f}\n")
            f.write(f"  Features Used: {len(model_summary['feature_names'])}\n\n")
    
    # Output files
    f.write("OUTPUT FILES:\n")
    f.write("  Data Files:\n")
    f.write("    - skate_processed.csv\n")
    if plankton_data is not None:
        f.write("    - plankton_processed.csv\n")
    f.write("  Visualizations:\n")
    for viz_name, viz_path in viz_results.items():
        if viz_path:
            f.write(f"    - {os.path.basename(viz_path)}\n")

print(f"✅ Analysis report saved to: {report_path}")

# Display the report content
print("\n📄 Analysis Report Content:")
print("-" * 40)
with open(report_path, 'r') as f:
    print(f.read())


### 🎯 Final Results

Display the final results and next steps:


In [None]:
print("🎯 PIPELINE COMPLETED SUCCESSFULLY!")
print("=" * 40)
print(f"📁 Output directory: {config['output_dir']}")
print(f"📊 Skate records processed: {len(skate_data):,}" if skate_data is not None else "📊 Skate records: N/A")
plankton_count = len(plankton_data) if plankton_data is not None else 'N/A'
elasmobranch_count = len(elasmobranch_data) if elasmobranch_data is not None else 'N/A'
print(f"🦐 Plankton records processed: {plankton_count:,}" if isinstance(plankton_count, int) else f"🦐 Plankton records processed: {plankton_count}")
print(f"🦈 Elasmobranch records processed: {elasmobranch_count:,}" if isinstance(elasmobranch_count, int) else f"🦈 Elasmobranch records processed: {elasmobranch_count}")

if model is not None and hasattr(model, 'metrics'):
    print(f"🤖 Model R² score: {model.metrics['r2']:.4f}")

print("\n📋 Generated Files:")
print("  🗺️ Interactive maps and dashboards")
print("  📈 Time series and correlation plots")
print("  📊 Model performance visualizations")
print("  📄 Comprehensive analysis report")

print("\n🔍 Next Steps:")
print("  1. Review the generated visualizations in the output directory")
print("  2. Open the HTML files in a web browser for interactive exploration")
print("  3. Analyze the model performance and consider parameter tuning")
print("  4. Use the processed data for further analysis or modeling")

print(f"\n⏱️ Analysis completed at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")


---

## 📚 Additional Information

### 🔗 Useful Links
- [BlueCloud VLab](https://vlab.blue-cloud.org/)
- [DIVAnd.jl Documentation](https://github.com/gher-ulg/DIVAnd.jl)
- [EcoTaxa Platform](https://ecotaxa.obs-vlfr.fr/)

### 📖 Data Sources
- **Skate Data**: Individual tracking data from tagged skates
- **Plankton Data**: NetCDF files from BlueCloud VLab or sample data
- **Elasmobranch Data**: Fisheries capture data from FAO

### 🛠️ Technical Details
- **Model Types**: Random Forest, Linear Regression, Gradient Boosting
- **Visualization**: Interactive maps using Folium and Plotly
- **Data Processing**: Pandas for data manipulation, NumPy for numerical operations

---

*This notebook was generated as part of the BlueCloud Hackathon 2025 project.*
