# Potential Interest Zone (PIZ) Identification and Scoring

This notebook implements an initial system for identifying and scoring Potential Interest Zones (PIZs). It follows `SITE_PREDICTION_VERIFICATION_STRATEGY.md` and uses simulated outputs from the Phase 3 EDA notebooks.

**Key Steps:**
1. Load (placeholder/simulated) EDA outputs for LiDAR, Satellite, and Textual data.
2. Define initial PIZs based on spatial proximity/overlap of anomalies from different sources.
3. Implement a heuristic scoring system to rank PIZs.
4. Demonstrate conceptual OpenAI prompt formulation for plausibility assessment of top PIZs.
5. Visualize PIZs on a map.

In [None]:
import configparser
from pathlib import Path
import geopandas
import pandas as pd
from shapely.geometry import Point, Polygon, box
from shapely.ops import unary_union
import matplotlib.pyplot as plt
import numpy as np
import json # For OpenAI prompt structuring

# Helper for pretty printing JSON
def print_json(data):
    print(json.dumps(data, indent=2))

## 1. Configuration and Setup

In [None]:
CONFIG_FILE_PATH = "../scripts/satellite_pipeline/config.ini" # Adjust if your config is elsewhere
SCRIPT_DIR = Path(".").resolve().parent # Project root (despite the name); assumes the notebook runs from the 'notebooks' dir
EDA_OUTPUT_DIR_PIZ = SCRIPT_DIR / "eda_outputs" / "piz"
EDA_OUTPUT_DIR_PIZ.mkdir(parents=True, exist_ok=True)

def load_config(config_path):
    config = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    if not Path(config_path).exists():
        raise FileNotFoundError(f"Configuration file '{config_path}' not found.")
    config.read(config_path)
    return config

config = load_config(CONFIG_FILE_PATH)

# Get AOI (used for context and potential gridding)
aoi_bbox_str = config['DEFAULT'].get('aoi_bbox')
aoi_geojson_path_str = config['DEFAULT'].get('aoi_geojson_path')
aoi_boundary_gdf = None # This will be a GeoDataFrame representing the AOI
TARGET_PROJECTED_CRS = config.get('LIDAR', 'target_projected_crs', fallback='EPSG:32620') # Falls back to UTM zone 20N if the section or key is missing

if aoi_geojson_path_str:
    # Construct path relative to project root, assuming config paths are relative to `scripts/satellite_pipeline`
    # Config path: scripts/satellite_pipeline/config.ini -> ../../ -> project root
    # Notebook path: notebooks/piz_identification_scoring.ipynb -> ../ -> project root
    # So, if aoi_geojson_path is like 'path/to/aoi.geojson' it means 'project_root/path/to/aoi.geojson'
    # This logic needs to be robust based on where the config file expects paths to be relative to.
    # For simplicity, assume aoi_geojson_path is relative to project root if not absolute.
    potential_aoi_path = Path(aoi_geojson_path_str)
    if not potential_aoi_path.is_absolute():
        potential_aoi_path = SCRIPT_DIR / potential_aoi_path
    
    if potential_aoi_path.exists():
        aoi_boundary_gdf = geopandas.read_file(potential_aoi_path)
        print(f"Loaded AOI from GeoJSON: {potential_aoi_path}")
    else:
        print(f"AOI GeoJSON path in config not found: {potential_aoi_path}")

if aoi_boundary_gdf is None and aoi_bbox_str: # Fallback to BBOX if GeoJSON not loaded
    coords = [float(c.strip()) for c in aoi_bbox_str.split(',')]
    minx, miny, maxx, maxy = coords
    aoi_boundary_gdf = geopandas.GeoDataFrame([{'geometry': box(minx, miny, maxx, maxy)}], crs="EPSG:4326")
    print(f"Using AOI from BBOX (EPSG:4326): {coords}")
elif aoi_boundary_gdf is None:
    print("CRITICAL: No AOI geometry defined in config. Using a placeholder AOI.")
    # Placeholder AOI (e.g., a 10km x 10km square in a projected CRS for simplicity if no config AOI)
    # This is just for the notebook to run; real analysis needs a proper AOI.
    aoi_boundary_gdf = geopandas.GeoDataFrame([{'geometry': box(0, 0, 10000, 10000)}], crs=TARGET_PROJECTED_CRS)

# Ensure AOI is in the target projected CRS for spatial operations
if aoi_boundary_gdf.crs.to_string().lower() != TARGET_PROJECTED_CRS.lower():
    print(f"Reprojecting AOI from {aoi_boundary_gdf.crs} to {TARGET_PROJECTED_CRS}")
    aoi_boundary_gdf = aoi_boundary_gdf.to_crs(TARGET_PROJECTED_CRS)

print(f"AOI CRS: {aoi_boundary_gdf.crs}")
aoi_total_bounds = aoi_boundary_gdf.total_bounds # (minx, miny, maxx, maxy)
print(f"AOI total bounds (in {aoi_boundary_gdf.crs}): {aoi_total_bounds}")

## 2. Load (Placeholder) EDA Outputs

In a real workflow, these would be outputs from the Phase 3 EDA notebooks (`lidar_eda.ipynb`, `satellite_eda.ipynb`, `textual_eda_openai.ipynb`). For this implementation, we'll use manually curated placeholder data. Each anomaly will be represented as a point or polygon with some basic attributes.
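
When real EDA outputs replace the placeholders, each source will need to carry the score attributes this notebook relies on. As a rough sketch, assuming each source is exported as a GeoJSON FeatureCollection whose properties use the same score fields as the placeholder data (`lidar_clarity`, `satellite_significance`, `textual_reliability`), a small check like the one below could catch missing or out-of-range scores before PIZ definition. The helper name `check_anomaly_features` and the sample features are hypothetical.

```python
import json

# Score attribute expected for each source, mirroring the placeholder columns.
SCORE_FIELDS = {
    "lidar": "lidar_clarity",
    "satellite": "satellite_significance",
    "textual": "textual_reliability",
}

def check_anomaly_features(feature_collection, source):
    """Return a list of problems found in a GeoJSON FeatureCollection (hypothetical helper)."""
    field = SCORE_FIELDS[source]
    problems = []
    for i, feature in enumerate(feature_collection.get("features", [])):
        if feature.get("geometry") is None:
            problems.append(f"feature {i}: missing geometry")
        score = (feature.get("properties") or {}).get(field)
        if not isinstance(score, (int, float)) or not 1 <= score <= 5:
            problems.append(f"feature {i}: '{field}' must be a number in 1-5, got {score!r}")
    return problems

# Made-up sample standing in for a real EDA export.
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1000, 1000]},
   "properties": {"lidar_clarity": 4, "lidar_feature_type": "point_mound"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2000, 2000]},
   "properties": {"lidar_feature_type": "linear_earthwork"}}
]}
""")
for problem in check_anomaly_features(sample, "lidar"):
    print(problem)
```

Here the second feature is reported because it has no `lidar_clarity` value; the first passes.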

In [None]:
# Placeholder EDA Outputs - Assume these are GeoDataFrames in the TARGET_PROJECTED_CRS

# LiDAR Anomalies (e.g., mounds, linear features)
# Attributes: 'lidar_clarity' (1-5), 'lidar_feature_type' (e.g., 'point_mound', 'linear_earthwork')
lidar_anomalies_data = {
    'geometry': [
        Point(aoi_total_bounds[0] + 1000, aoi_total_bounds[1] + 1000), # Anomaly 1
        box(aoi_total_bounds[0] + 2000, aoi_total_bounds[1] + 2000, 
            aoi_total_bounds[0] + 2100, aoi_total_bounds[1] + 2300)  # Anomaly 2 (linear/rectangular)
    ],
    'lidar_clarity': [4, 5], # Score 1-5
    'lidar_feature_type': ['point_mound', 'linear_earthwork']
}
lidar_anomalies_gdf = geopandas.GeoDataFrame(lidar_anomalies_data, crs=TARGET_PROJECTED_CRS)
print(f"Loaded {len(lidar_anomalies_gdf)} LiDAR anomalies.")

# Satellite Anomalies (e.g., unusual NDVI, geometric vegetation pattern)
# Attributes: 'satellite_significance' (1-5), 'satellite_anomaly_type' (e.g., 'ndvi_low_circular', 'veg_pattern_geometric')
satellite_anomalies_data = {
    'geometry': [
        Point(aoi_total_bounds[0] + 1050, aoi_total_bounds[1] + 1050), # Near LiDAR Anomaly 1
        Point(aoi_total_bounds[0] + 3000, aoi_total_bounds[1] + 3000)  # Standalone satellite anomaly
    ],
    'satellite_significance': [3, 4],
    'satellite_anomaly_type': ['ndvi_low_circular', 'veg_pattern_geometric']
}
satellite_anomalies_gdf = geopandas.GeoDataFrame(satellite_anomalies_data, crs=TARGET_PROJECTED_CRS)
print(f"Loaded {len(satellite_anomalies_gdf)} Satellite anomalies.")

# Textual Mentions (Geocoded points of interest from text)
# Attributes: 'textual_reliability' (1-5), 'textual_mention_type' (e.g., 'settlement_possible_ruins', 'general_region_activity_X')
textual_mentions_data = {
    'geometry': [
        Point(aoi_total_bounds[0] + 950, aoi_total_bounds[1] + 950),   # Near LiDAR Anomaly 1 & Sat Anomaly 1
        Point(aoi_total_bounds[0] + 4000, aoi_total_bounds[1] + 4000)  # Standalone textual mention
    ],
    'textual_reliability': [4, 2],
    'textual_mention_type': ['settlement_possible_ruins', 'general_region_activity_X']
}
textual_mentions_gdf = geopandas.GeoDataFrame(textual_mentions_data, crs=TARGET_PROJECTED_CRS)
print(f"Loaded {len(textual_mentions_gdf)} Textual mentions.")

# --- (Optional) Load other relevant features like water sources ---
# Example: rivers_gdf = geopandas.read_file("path/to/rivers.shp").to_crs(TARGET_PROJECTED_CRS)
# For now, we'll simulate this if needed in scoring.
water_sources_data = {
    'geometry': [Point(aoi_total_bounds[0] + 1500, aoi_total_bounds[1] + 1500)],
    'water_type': ['river_segment']
}
water_sources_gdf = geopandas.GeoDataFrame(water_sources_data, crs=TARGET_PROJECTED_CRS)
print(f"Loaded {len(water_sources_gdf)} water sources (placeholder).")

## 3. Define Potential Interest Zones (PIZs)

We'll use a simple approach: buffer each anomaly, merge overlapping buffers into candidate zones, and then record which sources contribute to each zone. An alternative, gridding the AOI, is discussed in Section 3.2.

### 3.1. PIZ Definition by Proximity/Overlap of Anomalies

In [None]:
BUFFER_DISTANCE_METERS = 200 # Define a buffer distance (e.g., 200 meters)

# Create buffers
lidar_buffers = lidar_anomalies_gdf.copy()
lidar_buffers['geometry'] = lidar_anomalies_gdf.geometry.buffer(BUFFER_DISTANCE_METERS)
lidar_buffers['source'] = 'lidar'

satellite_buffers = satellite_anomalies_gdf.copy()
satellite_buffers['geometry'] = satellite_anomalies_gdf.geometry.buffer(BUFFER_DISTANCE_METERS)
satellite_buffers['source'] = 'satellite'

textual_buffers = textual_mentions_gdf.copy()
textual_buffers['geometry'] = textual_mentions_gdf.geometry.buffer(BUFFER_DISTANCE_METERS * 1.5) # Larger buffer for less precise textual data
textual_buffers['source'] = 'textual'

# Combine all buffered anomalies into one GeoDataFrame
all_buffers = pd.concat([lidar_buffers, satellite_buffers, textual_buffers], ignore_index=True)

# --- Create candidate zones by merging overlapping buffers ---
# This is a simplified approach. A more robust way would be to intersect buffers pairwise,
# or to cluster the original points spatially and then evaluate the sources within each cluster.
# Here, overlapping buffers are merged into zones; the loop below records which sources fall in each zone.

# Union all buffers, then split the result into one polygon per connected zone.
# (all_buffers.dissolve(by=None) on its own would return a single MultiPolygon row covering every buffer.)
merged_buffers = unary_union(list(all_buffers.geometry))
zone_geometries = list(merged_buffers.geoms) if merged_buffers.geom_type == 'MultiPolygon' else [merged_buffers]
dissolved_zones = geopandas.GeoDataFrame(geometry=zone_geometries, crs=TARGET_PROJECTED_CRS)
dissolved_zones['piz_id'] = range(len(dissolved_zones))

piz_gdf = geopandas.GeoDataFrame(columns=['piz_id', 'geometry', 'contributing_sources', 'lidar_features', 'satellite_features', 'textual_features'], crs=TARGET_PROJECTED_CRS)

temp_piz_list = []

for idx, dissolved_zone in dissolved_zones.iterrows():
    intersecting_original_anomalies = []
    sources_in_zone = set()
    
    # Check original anomalies (not buffers) within this dissolved zone
    # LiDAR
    lidar_intersect = lidar_anomalies_gdf[lidar_anomalies_gdf.geometry.intersects(dissolved_zone.geometry)]
    if not lidar_intersect.empty:
        sources_in_zone.add('lidar')
        intersecting_original_anomalies.extend(lidar_intersect.apply(lambda row: f"L: {row.lidar_feature_type} (Clarity: {row.lidar_clarity})", axis=1).tolist())
        
    # Satellite
    satellite_intersect = satellite_anomalies_gdf[satellite_anomalies_gdf.geometry.intersects(dissolved_zone.geometry)]
    if not satellite_intersect.empty:
        sources_in_zone.add('satellite')
        intersecting_original_anomalies.extend(satellite_intersect.apply(lambda row: f"S: {row.satellite_anomaly_type} (Sig: {row.satellite_significance})", axis=1).tolist())

    # Textual
    textual_intersect = textual_mentions_gdf[textual_mentions_gdf.geometry.intersects(dissolved_zone.geometry)]
    if not textual_intersect.empty:
        sources_in_zone.add('textual')
        intersecting_original_anomalies.extend(textual_intersect.apply(lambda row: f"T: {row.textual_mention_type} (Rel: {row.textual_reliability})", axis=1).tolist())

    # A stricter filter would require evidence from at least two sources (or one very strong single source).
    if len(sources_in_zone) >= 1: # For this demo, keep every zone with at least one source; scoring will differentiate
        temp_piz_list.append({
            'piz_id': dissolved_zone.piz_id,
            'geometry': dissolved_zone.geometry,
            'num_sources': len(sources_in_zone),
            'contributing_sources': ", ".join(sorted(list(sources_in_zone))),
            'lidar_features': [f['lidar_feature_type'] for i,f in lidar_intersect.iterrows()] if not lidar_intersect.empty else [],
            'lidar_clarity_max': lidar_intersect['lidar_clarity'].max() if not lidar_intersect.empty else 0,
            'satellite_features': [f['satellite_anomaly_type'] for i,f in satellite_intersect.iterrows()] if not satellite_intersect.empty else [],
            'satellite_significance_max': satellite_intersect['satellite_significance'].max() if not satellite_intersect.empty else 0,
            'textual_features': [f['textual_mention_type'] for i,f in textual_intersect.iterrows()] if not textual_intersect.empty else [],
            'textual_reliability_max': textual_intersect['textual_reliability'].max() if not textual_intersect.empty else 0,
            'all_intersecting_features_desc': "; ".join(intersecting_original_anomalies)
        })

piz_gdf = geopandas.GeoDataFrame(temp_piz_list, crs=TARGET_PROJECTED_CRS)
piz_gdf = piz_gdf[piz_gdf.num_sources >=1] # Filter for PIZs with at least 1 source for this demo
piz_gdf['piz_id'] = range(len(piz_gdf)) # Re-ID after filtering

print(f"Identified {len(piz_gdf)} initial PIZs with >=1 source types.")
if not piz_gdf.empty:
    display(piz_gdf[['piz_id', 'num_sources', 'contributing_sources', 'all_intersecting_features_desc']].head())

### 3.2. PIZ Definition by Gridding (Alternative/Conceptual)

1.  Create a grid over the AOI (e.g., 100m x 100m cells).
2.  For each cell, count LiDAR anomalies, satellite anomalies, and textual mentions falling within it.
3.  Cells with high counts or a mix of anomaly types become PIZs.

```python
# # Conceptual Gridding Example (not fully implemented for brevity)
# GRID_CELL_SIZE = 100 # meters
# minx, miny, maxx, maxy = aoi_total_bounds
# grid_cells = []
# for x in np.arange(minx, maxx, GRID_CELL_SIZE):
#     for y in np.arange(miny, maxy, GRID_CELL_SIZE):
#         cell = box(x, y, x + GRID_CELL_SIZE, y + GRID_CELL_SIZE)
#         grid_cells.append({'geometry': cell})
# grid_gdf = geopandas.GeoDataFrame(grid_cells, crs=TARGET_PROJECTED_CRS)
# grid_gdf['cell_id'] = range(len(grid_gdf))
# 
# # Perform spatial joins to count anomalies per cell
# # lidar_in_grid = geopandas.sjoin(grid_gdf, lidar_anomalies_gdf, how='left', predicate='intersects')
# # ... and so on for other sources ...
# # Then group by cell_id and count/score.
```
The buffer/overlap method is generally more direct for feature-driven PIZ creation.
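
To make the three gridding steps concrete, here is a minimal, dependency-free sketch that bins point coordinates into square cells and flags cells hit by more than one source type. It is illustrative only: the coordinates are made up, `cell_index` and `grid_candidates` are hypothetical helpers, and it handles points only (polygon anomalies would need a spatial join, as in the conceptual snippet above).

```python
import math
from collections import defaultdict

GRID_CELL_SIZE = 100  # meters; assumes coordinates are in a projected CRS

def cell_index(x, y, origin_x, origin_y, size=GRID_CELL_SIZE):
    """Map a coordinate to the (column, row) of the grid cell containing it."""
    return (math.floor((x - origin_x) / size), math.floor((y - origin_y) / size))

def grid_candidates(points_by_source, origin_x=0.0, origin_y=0.0, min_sources=2):
    """Return {cell: {source: count}} for cells hit by at least `min_sources` source types."""
    counts = defaultdict(lambda: defaultdict(int))
    for source, points in points_by_source.items():
        for x, y in points:
            counts[cell_index(x, y, origin_x, origin_y)][source] += 1
    return {cell: dict(per_source) for cell, per_source in counts.items()
            if len(per_source) >= min_sources}

# Made-up coordinates (meters from the grid origin).
points_by_source = {
    "lidar": [(1010, 1020), (2050, 2150)],
    "satellite": [(1050, 1050), (3000, 3000)],
    "textual": [(950, 950), (4000, 4000)],
}
print(grid_candidates(points_by_source))
# → {(10, 10): {'lidar': 1, 'satellite': 1}}
```

Note that the textual mention at (950, 950) lands in the neighboring cell (9, 9) and is not counted with the other two, which shows how sensitive a grid is to cell boundaries. Counting neighboring cells as well, or using overlapping windows, would be one way to soften that.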

## 4. Implement Heuristic Scoring System

In [None]:
# Define weights for scoring factors (can be tuned)
weights = {
    'num_sources': 3.0,        # Multiplied by the number of distinct source types in the PIZ (1-3)
    'lidar_clarity': 2.0,      # Max score from LiDAR anomalies in PIZ (1-5 scale)
    'satellite_significance': 2.0, # Max score from Satellite anomalies (1-5 scale)
    'textual_reliability': 1.5, # Max score from Textual mentions (1-5 scale)
    'proximity_to_water': 0.5, # Bonus if PIZ is near water (binary 0 or 1, or scaled by distance)
    'uniqueness_factor': 1.0 # Placeholder for future feature, e.g. how rare the pattern is
}

def calculate_piz_score(row):
    score = 0
    
    # Score from number of unique sources
    score += row['num_sources'] * weights['num_sources']
    
    # Score from LiDAR (using max clarity of features within PIZ)
    if 'lidar' in row['contributing_sources']:
        score += row['lidar_clarity_max'] * weights['lidar_clarity']
        
    # Score from Satellite (using max significance)
    if 'satellite' in row['contributing_sources']:
        score += row['satellite_significance_max'] * weights['satellite_significance']
        
    # Score from Textual (using max reliability)
    if 'textual' in row['contributing_sources']:
        score += row['textual_reliability_max'] * weights['textual_reliability']
        
    # (Optional) Proximity to water (example)
    # This requires water_sources_gdf to be defined and populated
    if not water_sources_gdf.empty:
        # Check if the PIZ (its geometry: row.geometry) intersects with any buffered water source
        # A more sophisticated approach would be distance-based scaling.
        WATER_BUFFER_METERS = 100 # Considered 'near water' if within this distance
        if any(row.geometry.intersects(water_buffer) for water_buffer in water_sources_gdf.geometry.buffer(WATER_BUFFER_METERS)):
            score += 1.0 * weights['proximity_to_water'] # Binary bonus
            
    # Add uniqueness factor (placeholder)
    # score += (row.get('uniqueness_score', 0) * weights['uniqueness_factor'])
            
    return score

if not piz_gdf.empty:
    piz_gdf['score'] = piz_gdf.apply(calculate_piz_score, axis=1)
    piz_gdf_sorted = piz_gdf.sort_values(by='score', ascending=False)
    
    print("\n--- Top Scored PIZs ---")
    display(piz_gdf_sorted[['piz_id', 'score', 'num_sources', 'contributing_sources', 'all_intersecting_features_desc']].head())
else:
    print("No PIZs identified to score.")
    piz_gdf_sorted = piz_gdf # Keep it as an empty GeoDataFrame
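
As a quick sanity check on the scoring arithmetic, the standalone sketch below recomputes scores by hand for two hypothetical PIZs, using the same weights as above and scores taken from the placeholder data. It assumes neither PIZ falls within the water buffer; `score_from_parts` is an illustrative helper, not part of the pipeline.

```python
# Same values as the `weights` dict above (only the terms currently used in scoring).
WEIGHTS = {
    "num_sources": 3.0,
    "lidar_clarity": 2.0,
    "satellite_significance": 2.0,
    "textual_reliability": 1.5,
    "proximity_to_water": 0.5,
}

def score_from_parts(num_sources, lidar=0, satellite=0, textual=0, near_water=False):
    """Weighted sum mirroring calculate_piz_score; a missing source contributes 0."""
    return (
        num_sources * WEIGHTS["num_sources"]
        + lidar * WEIGHTS["lidar_clarity"]
        + satellite * WEIGHTS["satellite_significance"]
        + textual * WEIGHTS["textual_reliability"]
        + (1.0 if near_water else 0.0) * WEIGHTS["proximity_to_water"]
    )

# Three sources converge: 3*3.0 + 4*2.0 + 3*2.0 + 4*1.5 = 9 + 8 + 6 + 6 = 29.0
three_source = score_from_parts(3, lidar=4, satellite=3, textual=4)
# A lone satellite anomaly with significance 4: 1*3.0 + 4*2.0 = 11.0
single_source = score_from_parts(1, satellite=4)
print(three_source, single_source)  # → 29.0 11.0
```

With these weights, a zone with three converging sources scores well above any single-source zone, whose maximum is 3.0 + 5 * 2.0 + 0.5 = 13.5.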

## 5. OpenAI for Plausibility Assessment (Conceptual Integration)

For the top-scoring PIZs, we can formulate a prompt to send to an OpenAI model to get a qualitative assessment of archaeological plausibility, as outlined in `SITE_PREDICTION_VERIFICATION_STRATEGY.md`.

In [None]:
def formulate_openai_plausibility_prompt(piz_row):
    evidence_summary = []
    if 'lidar' in piz_row['contributing_sources']:
        evidence_summary.append(f"- LiDAR evidence: Max clarity score {piz_row['lidar_clarity_max']}. Features include: {', '.join(piz_row['lidar_features'])}.")
    if 'satellite' in piz_row['contributing_sources']:
        evidence_summary.append(f"- Satellite imagery evidence: Max significance score {piz_row['satellite_significance_max']}. Features include: {', '.join(piz_row['satellite_features'])}.")
    if 'textual' in piz_row['contributing_sources']:
        evidence_summary.append(f"- Textual evidence: Max reliability score {piz_row['textual_reliability_max']}. Mentions include: {', '.join(piz_row['textual_features'])}.")
    
    # Get centroid for representative coordinates (in original projected CRS for now)
    # For actual reporting, might want to convert to WGS84 (Lat/Lon)
    centroid = piz_row.geometry.centroid
    coords_str = f"{centroid.x:.2f}, {centroid.y:.2f} (CRS: {piz_gdf.crs.to_string()})"

    prompt = f"""Assess the archaeological plausibility of the following Potential Interest Zone (PIZ):

PIZ ID: {piz_row['piz_id']}
Approximate Coordinates: {coords_str}
Total Score (heuristic): {piz_row['score']:.2f}
Number of contributing data source types: {piz_row['num_sources']}

Summary of Evidence:
"""
    prompt += "\n".join(evidence_summary)
    prompt += """

Based on this combined evidence:
1. What type of archaeological site or features might this represent in an Amazonian context?
2. What are common characteristics of such sites in the Amazon?
3. Are there any alternative (non-archaeological) explanations for these combined features?
4. What specific aspects of the evidence make this PIZ more or less plausible?
5. What further investigation steps (using available data types or suggesting new ones) would you recommend to clarify the nature of this PIZ?

Provide a concise assessment.
"""
    return prompt

if not piz_gdf_sorted.empty:
    print("\n--- Example OpenAI Plausibility Prompts for Top PIZs ---")
    for idx, row in piz_gdf_sorted.head(2).iterrows(): # Show for top 2
        print(f"\n--- Prompt for PIZ ID: {row['piz_id']} ---")
        example_prompt = formulate_openai_plausibility_prompt(row)
        print(example_prompt)
        # In a real scenario, this prompt would be sent to an OpenAI model:
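        # from openai import OpenAI # Requires the openai package (v1+)
        # client = OpenAI() # Reads the API key from the OPENAI_API_KEY environment variable by default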
        # response = client.chat.completions.create(
        #     model="gpt-4-turbo-preview", # Or other suitable model
        #     messages=[
        #         {"role": "system", "content": "You are an AI assistant with expertise in Amazonian archaeology and multi-source data interpretation."},
        #         {"role": "user", "content": example_prompt}
        #     ]
        # )
        # plausibility_assessment = response.choices[0].message.content
        # print(f"\nOpenAI Plausibility Assessment for PIZ {row['piz_id']}:\n{plausibility_assessment}")
else:
    print("No PIZs available to generate OpenAI prompts.")

## 6. Visualization of PIZs

In [None]:
if not piz_gdf_sorted.empty:
    fig, ax = plt.subplots(1, 1, figsize=(15, 15))
    
    # Plot AOI boundary
    aoi_boundary_gdf.plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2, label='AOI Boundary', zorder=1)
    
    # Plot original anomalies for context (optional, can make plot busy)
    lidar_anomalies_gdf.plot(ax=ax, marker='o', color='blue', markersize=50, label='LiDAR Anomalies', alpha=0.7, zorder=2)
    satellite_anomalies_gdf.plot(ax=ax, marker='^', color='green', markersize=50, label='Satellite Anomalies', alpha=0.7, zorder=2)
    textual_mentions_gdf.plot(ax=ax, marker='s', color='purple', markersize=50, label='Textual Mentions', alpha=0.7, zorder=2)
    
    # Plot PIZs, color-coded by score (or number of sources)
    # Ensure 'score' column exists and has numeric data
    if 'score' in piz_gdf_sorted.columns and pd.api.types.is_numeric_dtype(piz_gdf_sorted['score']):
        piz_gdf_sorted.plot(ax=ax, column='score', cmap='OrRd', alpha=0.6, 
                            legend=True, legend_kwds={'label': "PIZ Score"}, zorder=3)
    else:
        piz_gdf_sorted.plot(ax=ax, facecolor='red', alpha=0.5, label='PIZs (Unscored)', zorder=3) # Fallback if no score
        
    # Add labels for PIZ IDs
    for idx, row in piz_gdf_sorted.iterrows():
        if row.geometry.centroid.is_valid:
            plt.text(row.geometry.centroid.x, row.geometry.centroid.y, 
                     s=f"ID:{row.piz_id}\nS:{row.score:.1f}", 
                     fontsize=8, ha='center', va='center', color='black')
        
    plt.title(f'Potential Interest Zones (PIZs) within AOI (CRS: {piz_gdf.crs.to_string()})')
    plt.xlabel("Easting")
    plt.ylabel("Northing")
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    fig.tight_layout()
    plt.savefig(EDA_OUTPUT_DIR_PIZ / "piz_visualization_with_scores.png")
    plt.show()
else:
    print("No PIZs to visualize.")

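# Save PIZ GeoDataFrame to a file (e.g., GeoJSON)
if not piz_gdf_sorted.empty:
    piz_output_path = EDA_OUTPUT_DIR_PIZ / "piz_ranked_list.geojson"
    try:
        # Join list-valued columns into strings first, since file drivers may reject Python lists.
        piz_to_save = piz_gdf_sorted.copy()
        for col in ['lidar_features', 'satellite_features', 'textual_features']:
            piz_to_save[col] = piz_to_save[col].apply(", ".join)
        piz_to_save.to_file(piz_output_path, driver="GeoJSON")
        print(f"Saved ranked PIZs to: {piz_output_path}")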
    except Exception as e:
        print(f"Error saving PIZs to GeoJSON: {e}")
        # Fallback to CSV if GeoJSON fails for some reason (e.g. complex geometries not handled by driver)
        try:
            piz_df_for_csv = pd.DataFrame(piz_gdf_sorted.drop(columns='geometry'))
            piz_df_for_csv['wkt_geometry'] = piz_gdf_sorted.geometry.to_wkt()
            csv_output_path = EDA_OUTPUT_DIR_PIZ / "piz_ranked_list.csv"
            piz_df_for_csv.to_csv(csv_output_path, index=False)
            print(f"Saved ranked PIZs (geometry as WKT) to CSV: {csv_output_path}")
        except Exception as e_csv:
            print(f"Error saving PIZs to CSV: {e_csv}")

## 7. Summary and Next Steps

This notebook outlined an initial system for PIZ identification and scoring:
1.  **PIZ Definition:** Used a buffer-and-merge approach on placeholder EDA outputs (LiDAR, satellite, and textual anomalies) to define PIZs. For this demo, zones backed by a single source were kept as well; convergence of multiple sources is rewarded in the scoring step. A gridding approach was also outlined conceptually.
2.  **Heuristic Scoring:** Implemented a scoring system based on weights and factors like the number of confirming data sources, clarity/significance of anomalies from each source, and (optionally) proximity to features like water. This produced a ranked list of PIZs.
3.  **OpenAI Integration (Conceptual):** Showed how prompts could be formulated for top-ranked PIZs to leverage LLMs for plausibility assessment and hypothesis refinement. Actual API calls were not made in this notebook but the structure is provided.
4.  **Visualization:** PIZs were visualized on a map, color-coded by score, along with the original anomalies and AOI boundary.

**Next Steps & Refinements:**
*   **Integrate Real EDA Outputs:** Replace placeholder anomaly data with actual outputs from the Phase 3 EDA notebooks. This will involve standardizing the format of those outputs (e.g., GeoJSON files for detected features with relevant attributes).
*   **Refine PIZ Definition Logic:** Explore more sophisticated spatial clustering algorithms (e.g., DBSCAN on anomaly points) or the gridding approach in more detail for PIZ definition.
*   **Tune Scoring System:** The weights and scoring parameters are initial estimates. They should be iteratively tuned based on domain expertise and feedback from verification efforts. Consider adding more nuanced parameters (e.g., size/shape of anomalies, specific feature types from text like 'earthwork' vs 'general settlement').
*   **Automate OpenAI Interaction:** For a larger number of PIZs, the OpenAI prompt generation and API calls could be automated, with results stored alongside PIZ data.
*   **Incorporate More Data Layers:** Integrate other relevant spatial data if available (e.g., geological maps, soil type maps, historical maps, known archaeological site distributions for context if permitted).
*   **Verification Feedback Loop:** As top PIZs are verified (Phase 4 verification strategies), use the results to validate and improve the scoring model and PIZ identification criteria.
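
As one illustration of the clustering refinement listed above, the sketch below groups anomaly points that lie within a distance threshold of each other, using a small union-find. This is equivalent to DBSCAN with `min_samples=1` (every point is then a core point, so clusters are the connected components of the distance graph); a real implementation would more likely use a library such as scikit-learn and a higher `min_samples`. The coordinates and the `cluster_points` helper are made up for illustration.

```python
import math

def cluster_points(points, eps):
    """Group points into connected components where neighbors are within `eps` meters.

    `points` is a list of (x, y, source) tuples; returns a list of clusters,
    each a list of indices into `points`.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i][:2], points[j][:2]) <= eps:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Made-up anomaly points in a projected CRS (meters).
points = [
    (1000, 1000, "lidar"),
    (1050, 1050, "satellite"),
    (950, 950, "textual"),
    (3000, 3000, "satellite"),
    (4000, 4000, "textual"),
]
for members in cluster_points(points, eps=200):
    sources = sorted({points[i][2] for i in members})
    print(members, sources)
```

With `eps=200`, the first three points form one cluster with all three source types, while the two distant points remain singletons. The pairwise loop is quadratic in the number of points, so a spatial index would be needed for large anomaly sets.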