# Test: WHISP Concurrent & Non-Concurrent Processing

This notebook tests the new concurrent and non-concurrent statistics-processing functions, including their logging, progress tracking, and endpoint validation.

**Test Coverage:**
- Concurrent processing (high-volume endpoint)
- Non-concurrent processing (standard endpoint)
- Logging and progress display
- Endpoint validation
- Client-side metadata extraction
- Error handling and retries

## Part 1: Setup

Reset Earth Engine so that it can be initialized against a specific endpoint in the next part.

In [1]:
import ee

# Reset Earth Engine completely
ee.Reset()
print("✅ Earth Engine reset")

✅ Earth Engine reset


## Part 2: CONCURRENT PROCESSING (High-Volume Endpoint)

Test concurrent processing against the high-volume endpoint.

In [2]:
# Earth Engine initialization with HIGH-VOLUME endpoint
import ee
from pathlib import Path

try:
    ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')
    print("✅ Initialized with high-volume endpoint")
except Exception:
    ee.Authenticate()
    ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')
    print("✅ Authenticated and initialized with high-volume endpoint")

✅ Initialized with high-volume endpoint


In [None]:
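# Verify endpoint is high-volume.
# Note: _cloud_api_base_url is a private attribute and may change between earthengine-api releases.
api_url = str(ee.data._cloud_api_base_url)
print(f"EE Cloud API Base URL: {api_url}")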
if 'highvolume' in api_url:
    print("✅ Using HIGH-VOLUME endpoint")
else:
    print("❌ WARNING: Not using high-volume endpoint!")

EE Cloud API Base URL: https://earthengine-highvolume.googleapis.com
EE API Base URL: https://earthengine-highvolume.googleapis.com/api
✅ Using HIGH-VOLUME endpoint
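
The substring test above can be made a little stricter by parsing the URL and comparing its hostname. The following is a minimal sketch, assuming the high-volume host is the one shown in the cell output; `is_high_volume_endpoint` is a hypothetical helper for illustration and is not part of `ee` or `openforis_whisp`.

```python
from urllib.parse import urlparse

# Hostname taken from the cell output above.
HIGH_VOLUME_HOST = "earthengine-highvolume.googleapis.com"

def is_high_volume_endpoint(api_url: str) -> bool:
    """Return True when the URL's hostname is the high-volume endpoint."""
    host = urlparse(api_url).hostname or ""
    return host.lower() == HIGH_VOLUME_HOST

print(is_high_volume_endpoint("https://earthengine-highvolume.googleapis.com/api"))
print(is_high_volume_endpoint("https://earthengine.googleapis.com"))
```

Comparing the parsed hostname avoids a false positive when the word `highvolume` appears elsewhere in the URL, for example in a path or query string.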


In [4]:
import openforis_whisp as whisp
import logging
from openforis_whisp.concurrent_stats import (
    setup_concurrent_logger,
    validate_ee_endpoint,
    whisp_concurrent_stats_geojson_to_df,
    check_ee_endpoint,
)

print("✅ Imported concurrent stats module")

✅ Imported concurrent stats module


In [None]:
# Setup logging for concurrent processing
logger = setup_concurrent_logger(level=logging.INFO)
logger.info("Logging configured")

INFO: Logging configured for concurrent processing (DEBUG level)
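
To see how a logger's level filters messages, here is a stand-in built only on the standard `logging` module. `make_logger` is a hypothetical helper for illustration, not how `setup_concurrent_logger` is implemented; the `LEVEL: message` format simply mirrors the log lines shown in this notebook.

```python
import io
import logging

def make_logger(name: str, level: int, stream) -> logging.Logger:
    """Build a logger that writes 'LEVEL: message' lines to the given stream."""
    new_logger = logging.getLogger(name)
    new_logger.setLevel(level)
    new_logger.handlers.clear()    # avoid duplicate lines when a cell is re-run
    new_logger.propagate = False   # keep messages out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    new_logger.addHandler(handler)
    return new_logger

buffer = io.StringIO()
demo_logger = make_logger("whisp_demo", logging.INFO, buffer)
demo_logger.debug("hidden at INFO level")   # filtered out
demo_logger.info("Logging configured")      # written to the buffer
print(buffer.getvalue(), end="")
```

Clearing existing handlers before adding a new one matters in notebooks, where re-running a setup cell would otherwise attach a second handler and print every message twice.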


In [7]:
num_polygons = 10  # Smaller dataset for testing
min_area_ha = 10
max_area_ha = 10
min_number_vert = 10
max_number_vert = 10

In [8]:
# Generate test data (or use your own GeoJSON)
import geopandas as gpd
import json
import tempfile
import os
import io
from contextlib import redirect_stdout

# Generate test polygons
geom = (ee.FeatureCollection("projects/sat-io/open-datasets/FAO/GAUL/GAUL_2024_L1")
    .filter(ee.Filter.eq('gaul0_name', 'Austria')).geometry().bounds()
)

# Suppress GeoJSON generation messages
with redirect_stdout(io.StringIO()):
    random_geojson = whisp.generate_test_polygons(
        bounds=geom,
        num_polygons=num_polygons,
        min_area_ha=min_area_ha,
        max_area_ha=max_area_ha,
        min_number_vert=min_number_vert,
        max_number_vert=max_number_vert,
    )

# Save to temporary file
temp_fd, concurrent_geojson_path = tempfile.mkstemp(suffix='.geojson', text=True)
os.close(temp_fd)
with open(concurrent_geojson_path, 'w') as f:
    json.dump(random_geojson, f)

print(f"✅ Generated test GeoJSON with {len(random_geojson['features'])} features")
print(f"   Saved to: {concurrent_geojson_path}")


[utils.py | generate_test_polygons() | l.378] INFO: Extracting bounds from Earth Engine Geometry...
[utils.py | generate_test_polygons() | l.391] INFO: Bounds: [9.53, 46.37, 17.16, 49.02]
[utils.py | generate_test_polygons() | l.419] INFO: Generating 10 test polygons with 10-10 vertices...
[utils.py | generate_test_polygons() | l.467] INFO: Generated 10 polygons!
[utils.py | generate_test_polygons() | l.473] INFO: Vertex count - Requested: 10-10, Actual: 10-10
[utils.py | generate_test_polygons() | l.481] INFO: Area (ha) - Requested: 10.0-10.0, Actual: 8.6-10.3
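
The temporary-file step can be checked without Earth Engine. The sketch below uses a hand-written FeatureCollection as a stand-in for the output of `generate_test_polygons` (an assumption for illustration only), writes it to a temporary file, reads it back, and then deletes the file.

```python
import json
import os
import tempfile

# Hand-written stand-in for the generated FeatureCollection (assumption).
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[15.0, 47.0], [15.01, 47.0], [15.01, 47.01], [15.0, 47.0]]],
            },
        }
    ],
}

fd, path = tempfile.mkstemp(suffix=".geojson", text=True)
try:
    # Wrap the descriptor directly instead of closing and reopening the file.
    with os.fdopen(fd, "w") as f:
        json.dump(sample_geojson, f)
    with open(path) as f:
        loaded = json.load(f)
    print(f"Round-tripped {len(loaded['features'])} feature(s)")
finally:
    os.remove(path)  # mkstemp never deletes the file for you
```

Because `mkstemp` leaves cleanup to the caller, removing the file in a `finally` block keeps repeated test runs from accumulating files in the temp directory.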

In [None]:
# Create Whisp image with national codes
iso2_codes = ['br','co','ci']

whisp_image = whisp.combine_datasets(national_codes=iso2_codes)
band_names = whisp_image.bandNames().getInfo()
print(f"✅ Created Whisp image with {len(band_names)} bands")
preview = ', '.join(band_names[:5]) + ('...' if len(band_names) > 5 else '')
print(f"   Bands: {preview}")

plot_id_column = 'plotId' (type: <class 'str'>)


In [None]:
# Test concurrent: GeoJSON → DataFrame with automatic formatting
print("\n" + "="*70)
print("TEST 1: Concurrent GeoJSON → DataFrame (Formatted)")
print("="*70 + "\n")

try:
    df_concurrent = whisp.whisp_concurrent_formatted_stats_geojson_to_df(
        input_geojson_filepath=concurrent_geojson_path,
        national_codes=iso2_codes,
        batch_size=10,
        max_concurrent=20,
        validate_geometries=True,
        add_metadata_server=False,
        logger=logger,
    )
    
    print(f"\n✅ SUCCESS: Concurrent processing complete!")
    print(f"   Processed: {df_concurrent.shape[0]} features")
    print(f"   Output columns: {df_concurrent.shape[1]}")
    print(f"\n   First row sample:")
    print(df_concurrent.iloc[0, :8])
    
except Exception as e:
    print(f"❌ ERROR: {str(e)}")
    import traceback
    traceback.print_exc()


TEST 1: Concurrent GeoJSON → DataFrame

DEBUG: Using decimal_places=3 from config
DEBUG: Step 1/2: Extracting statistics (concurrent)...
DEBUG: Using decimal_places=3 from config
INFO: Loading GeoJSON: C:\Users\Arnell\AppData\Local\Temp\tmp3ufbylpb.geojson
INFO: Loaded 10 features
DEBUG: Validation complete: 10 geometries ready
DEBUG: Creating Whisp image...
INFO: Processing 10 features in 1 batches


2025-10-30 17:44:14,673 - INFO - Created 10 records


Reading GeoJSON file from: C:\Users\Arnell\AppData\Local\Temp\tmpcwf_kxzv.geojson
DEBUG: Batch 0: EE returned columns: ['geo', 'Area_median', 'Area_sum', 'Cocoa_2023_FDaP_median', 'Cocoa_2023_FDaP_sum', 'Cocoa_ETH_median', 'Cocoa_ETH_sum', 'Cocoa_FDaP_median', 'Cocoa_FDaP_sum', 'Coffee_FDaP_2023_median', 'Coffee_FDaP_2023_sum', 'Coffee_FDaP_median', 'Coffee_FDaP_sum', 'ESA_TC_2020_median', 'ESA_TC_2020_sum', 'ESA_fire_2001_median', 'ESA_fire_2001_sum', 'ESA_fire_2002_median', 'ESA_fire_2002_sum', 'ESA_fire_2003_median', 'ESA_fire_2003_sum', 'ESA_fire_2004_median', 'ESA_fire_2004_sum', 'ESA_fire_2005_median', 'ESA_fire_2005_sum', 'ESA_fire_2006_median', 'ESA_fire_2006_sum', 'ESA_fire_2007_median', 'ESA_fire_2007_sum', 'ESA_fire_2008_median', 'ESA_fire_2008_sum', 'ESA_fire_2009_median', 'ESA_fire_2009_sum', 'ESA_fire_2010_median', 'ESA_fire_2010_sum', 'ESA_fire_2011_median', 'ESA_fire_2011_sum', 'ESA_fire_2012_median', 'ESA_fire_2012_sum', 'ESA_fire_2013_median', 'ESA_fire_2013_sum', 'ES
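
The log reports 10 features in 1 batch, which follows from `batch_size=10`. The general pattern of splitting features into batches and running them through a bounded thread pool can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `make_batches`, `process_batch`, and `run_concurrently` are hypothetical names, and `process_batch` is a placeholder for the real per-batch Earth Engine request.

```python
from concurrent.futures import ThreadPoolExecutor

def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def process_batch(batch):
    """Placeholder for the per-batch Earth Engine request (assumption)."""
    return [{"plotId": str(plot_id)} for plot_id in batch]

def run_concurrently(items, batch_size, max_concurrent):
    """Process all batches with at most max_concurrent worker threads."""
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() yields results in submission order, so row order is preserved.
        results = list(pool.map(process_batch, batches))
    return [row for batch_rows in results for row in batch_rows]

features = list(range(1, 11))
print(len(make_batches(features, 10)))  # number of batches for batch_size=10
rows = run_concurrently(features, batch_size=3, max_concurrent=20)
print([row["plotId"] for row in rows])
```

With only one batch, `max_concurrent=20` has no effect on throughput; concurrency only helps once the input is large enough to produce several batches.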

In [15]:
df_concurrent

|  | plotId | external_id | Area | Geometry_type | Country | ProducerCountry | Admin_Level_1 | Centroid_lon | Centroid_lat | Unit | ... | nBR_MapBiomas_col9_palmoil_2020 | nBR_MapBiomas_col9_pc_2020 | nBR_INPE_TCamz_cer_annual_2020 | nBR_MapBiomas_col9_soy_2020 | nBR_MapBiomas_col9_annual_crops_2020 | nBR_INPE_TCamz_pasture_2020 | nBR_INPE_TCcer_pasture_2020 | nBR_MapBiomas_col9_pasture_2020 | nCI_Cocoa_bnetd | geo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 |  | 12.637 | Polygon | AUT | AT | Steiermark | 15.364266 | 46.905772 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[15.36196... |
| 1 | 2 |  | 13.23 | Polygon | DEU | DE | Baden-Württemberg | 9.998846 | 48.557983 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[9.996565... |
| 2 | 3 |  | 14.215 | Polygon | DEU | DE | Bayern | 10.580957 | 47.699715 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[10.57850... |
| 3 | 4 |  | 13.104 | Polygon | ITA | IT | Lombardia | 10.272653 | 46.403059 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[10.27017... |
| 4 | 5 |  | 15.069 | Polygon | AUT | AT | Kärnten | 13.22301 | 46.86169 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[13.22038... |
| 5 | 6 |  | 13.861 | Polygon | HUN | HU | Zala | 17.078263 | 46.47277 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[17.07589... |
| 6 | 7 |  | 15.034 | Polygon | DEU | DE | Bayern | 12.302862 | 48.348745 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[12.30039... |
| 7 | 8 |  | 15.106 | Polygon | AUT | AT | Steiermark | 15.671636 | 47.124828 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[15.66909... |
| 8 | 9 |  | 14.283 | Polygon | ITA | IT | Trentino-Alto Adige | 11.131155 | 46.693512 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[11.12882... |
| 9 | 10 |  | 14.512 | Polygon | CHE | CH | Graubünden | 9.704624 | 46.948802 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[9.702355... |


In [None]:
# Summary of concurrent processing results
print("\n" + "="*70)
print("RESULTS SUMMARY")
print("="*70)
print(f"\nDataFrame shape: {df_concurrent.shape}")
print(f"  • Rows (features): {df_concurrent.shape[0]}")
print(f"  • Columns: {df_concurrent.shape[1]}")
print(f"\nKey columns: {', '.join(df_concurrent.columns.tolist()[:10])}")
print(f"\nplotId sequence: {df_concurrent['plotId'].tolist()}")
print(f"\n✅ Concurrent processing pipeline working correctly!")

CONCURRENT PROCESSING RESULTS

DataFrame shape: (10, 207)
Rows: 10 | Columns: 207

First few columns: ['plotId', 'external_id', 'Area', 'Geometry_type', 'Country', 'ProducerCountry', 'Admin_Level_1', 'Centroid_lon', 'Centroid_lat', 'Unit']

plotId values: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

✅ SUCCESS: Concurrent processing completed without errors!
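
The `plotId` check can be turned from a visual inspection into an assertion. This is a minimal sketch, assuming that a correct result carries the string IDs `'1'` through `'n'` in order, as in the output above; `plot_ids_are_sequential` is a hypothetical helper, not part of `openforis_whisp`.

```python
def plot_ids_are_sequential(plot_ids):
    """Return True when plot_ids equals ['1', '2', ..., 'n'] in order."""
    expected = [str(i) for i in range(1, len(plot_ids) + 1)]
    return [str(p) for p in plot_ids] == expected

# Values copied from the cell output above.
observed = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(plot_ids_are_sequential(observed))
```

In the notebook itself, the same helper could be applied to `df_concurrent['plotId'].tolist()` to catch rows that were dropped or reordered during concurrent processing.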
