# Base Graph Creation from SpatiaLite

This notebook demonstrates the end-to-end process of creating a basic maritime navigation graph using S-57 data stored in a local SpatiaLite database file.

The workflow covers:
1.  Defining an area of interest (AOI) between two ports.
2.  Filtering Electronic Navigational Charts (ENCs) that cover the AOI.
3.  Generating a navigable sea grid from the ENC data.
4.  Constructing a `networkx` graph from the grid.
5.  Performing a basic pathfinding operation on the resulting graph.

---

## Required Data

This notebook requires:
1. **ENC Data**: S-57 charts converted to SpatiaLite format
2. **Data File**:
   - File location: `output/us_enc_all.sqlite` (or your custom path)
   - Required layers: `seaare`, `lndare`, `fairwy`, `drgare`, `tsslpt`, `prcare`
3. **Port Data**: Standard or custom port definitions (included with package)

**Setup Instructions:**
See `docs/SETUP.md` for instructions on converting S-57 charts to the SpatiaLite format.

**Troubleshooting:**
If you encounter issues, see `docs/TROUBLESHOOTING.md` for common problems and solutions.
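
Before running the pipeline, it can help to confirm that the data file actually contains the required layers. The sketch below is not part of the toolkit: `missing_layers` is a hypothetical helper, and it assumes each S-57 layer is stored as a table of the same name, which is typical for GDAL-converted files but may differ in your setup.

```python
import sqlite3

# Layers listed under "Required Data" above.
REQUIRED_LAYERS = ["seaare", "lndare", "fairwy", "drgare", "tsslpt", "prcare"]

def missing_layers(conn, required=REQUIRED_LAYERS):
    """Return required layer names with no matching table (case-insensitive).

    Hypothetical helper; assumes one table per S-57 layer.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name.lower() for (name,) in rows}
    return [layer for layer in required if layer not in present]

# Demo on an in-memory database that holds only two of the six layers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seaare (id INTEGER)")
conn.execute("CREATE TABLE lndare (id INTEGER)")
demo_missing = missing_layers(conn)
print(demo_missing)
conn.close()
```

For a real file, open it with `sqlite3.connect("output/us_enc_all.sqlite")` (or your custom path) and pass that connection instead.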

In [None]:
import sys
import os
import time
from pathlib import Path
from dotenv import load_dotenv
import plotly.io as pio
import pandas as pd
import plotly.express as px

# --- Setup Python Environment ---
# Add the project root (two levels above the notebook's working directory)
# to the Python path so that the `src.` package imports below resolve
project_root = Path.cwd().parent.parent
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Load environment variables from .env file at the project root
# This loads configuration settings (optional for SpatiaLite backend)
load_dotenv(project_root / ".env")
pio.renderers.default = "notebook_connected"

# Import maritime module components
from src.nautical_graph_toolkit.core.s57_data import ENCDataFactory
from src.nautical_graph_toolkit.utils.port_utils import Boundaries, PortData
from src.nautical_graph_toolkit.utils.plot_utils import PlotlyChart

# --- Define Output Directory ---
# Create output directory for saving results
output_dir = Path.cwd() / 'output'
output_dir.mkdir(exist_ok=True)

# --- Define Data Source (SpatiaLite File) ---
# For file-based backends, we use a Path object instead of a connection dict
# Note: Can be .sqlite or .gpkg - ENCDataFactory auto-detects the format
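data_file = output_dir / "us_enc_all.sqlite"
if not data_file.exists():
    # Warn early; later cells will fail without the converted ENC data
    print(f"Warning: {data_file} not found. See docs/SETUP.md to create it.")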

print(f"Output directory: {output_dir}")
print(f"Data source: {data_file.name}")

# --- Performance Tracking ---
# Dictionary to store timing metrics for each pipeline step
performance_metrics = {}

## 1. Define Area of Interest (AOI)

This first step defines the geographic scope for our graph. We select two ports and create an expanded bounding box around them to ensure all relevant navigational data is included.
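
To make the idea of an "expanded bounding box" concrete, here is a minimal pure-Python sketch. It is not how `Boundaries.create_geo_boundary` is implemented: `expanded_bbox` is a hypothetical function, the port coordinates are rough illustrative values, and the sketch ignores the International Date Line. It relies on one degree of latitude being about 60 nautical miles.

```python
import math

NM_PER_DEGREE_LAT = 60.0  # one degree of latitude is roughly 60 NM

def expanded_bbox(points, expansion_nm):
    """Return (min_lon, min_lat, max_lon, max_lat) padded by expansion_nm.

    points: iterable of (lon, lat) pairs in decimal degrees.
    Hypothetical helper; does not handle the date line or the poles.
    """
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    dlat = expansion_nm / NM_PER_DEGREE_LAT
    # A degree of longitude shrinks with cos(latitude). Use the padded
    # latitude farthest from the equator so the pad is never too small.
    widest_lat = max(abs(min(lats) - dlat), abs(max(lats) + dlat))
    dlon = expansion_nm / (NM_PER_DEGREE_LAT * math.cos(math.radians(widest_lat)))
    return (min(lons) - dlon, min(lats) - dlat, max(lons) + dlon, max(lats) + dlat)

# Approximate (lon, lat) positions, for illustration only.
los_angeles = (-118.27, 33.73)
san_francisco = (-122.42, 37.80)
demo_bbox = expanded_bbox([los_angeles, san_francisco], expansion_nm=24)
print(demo_bbox)
```

The longitude padding comes out larger than the latitude padding in degrees, because meridians converge away from the equator.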

In [None]:
# --- Define Area of Interest by Selecting Two Ports ---
# Get port data and create a bounding box between Los Angeles and San Francisco
# The expansion parameter adds a buffer around the ports to include surrounding navigable areas
start_time = time.perf_counter()

port = PortData()
bbox = Boundaries()
port1 = port.get_port_by_name('Los Angeles')
port2 = port.get_port_by_name('San Francisco')

# --- Validate Port Selection ---
# Ensure both ports were found in the database before proceeding
if port1.empty or port2.empty:
    raise ValueError("Could not find one or both ports. Please check the names.")
else:
    print(port.format_port_string(port1))
    print(port.format_port_string(port2))
    # Create expanded boundary (24 nautical miles) around the two ports
    # date_line=True handles cases where routes cross the International Date Line
    port_bbox = bbox.create_geo_boundary(geometries=[port1.geometry, port2.geometry],
                                         expansion=24,
                                         date_line=True)

end_time = time.perf_counter()
performance_metrics['Port Selection & Boundary'] = end_time - start_time
print(f"\nPort selection and boundary creation took: {end_time - start_time:.2f}s")
port_bbox

## 2. Visualize the Area of Interest

Here, we plot the selected ports and the calculated boundary on a map to visually confirm our area of interest.

In [None]:
# --- Visualize Ports on Interactive Map ---
# Create a Plotly map and add both ports as markers
# This helps verify port locations before proceeding with graph creation
ply = PlotlyChart()
ply_fig = ply.create_base_map(mapbox_token=os.getenv('MAPBOX_TOKEN'))
ply.plotly_base_config(ply_fig)
port1_df = port.get_port_details_df(port1)
port2_df = port.get_port_details_df(port2)
# Add departure port (Los Angeles) in blue
ply.add_single_port_trace(ply_fig, port1, name=port1['PORT_NAME'], color='blue')
# Add arrival port (San Francisco) in red
ply.add_single_port_trace(ply_fig, port2, name=port2['PORT_NAME'], color='red')
ply_fig.show()

In [None]:
# --- Add Boundary to Map Visualization ---
# Display the expanded boundary box on the map to show our area of interest
ply.add_boundary_trace(ply_fig, port_bbox)
ply_fig.show()

## 3. ENC Data Preparation

With the AOI defined, we now query the SpatiaLite file to find all Electronic Navigational Charts (ENCs) that intersect with our boundary. This ensures we only process relevant chart data, which is critical for performance.
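
The filtering idea can be illustrated with a plain bounding-box overlap test. This is only a sketch: the toolkit's `get_encs_by_boundary` works on the boundary geometry and may use a more precise spatial query, and the chart names and extents below are made up for illustration.

```python
def boxes_intersect(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def charts_in_aoi(chart_extents, aoi):
    """Return the sorted names of charts whose extent intersects the AOI box."""
    return sorted(
        name for name, extent in chart_extents.items()
        if boxes_intersect(extent, aoi)
    )

# Hypothetical AOI and chart extents, not real ENC cells.
aoi = (-123.0, 33.3, -117.7, 38.2)
chart_extents = {
    "CHART_A": (-124.0, 36.0, -121.0, 39.0),  # overlaps the northern AOI
    "CHART_B": (-119.0, 33.0, -117.0, 34.5),  # overlaps the southern AOI
    "CHART_C": (-81.0, 24.0, -79.0, 26.0),    # far outside the AOI
}
selected = charts_in_aoi(chart_extents, aoi)
print(selected)
```

Charts that fail this test are never loaded, which is where the performance benefit comes from.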


In [None]:
# --- Initialize ENC Data Factory for SpatiaLite Backend ---
# The factory provides a unified interface for accessing ENC data
# regardless of backend (PostGIS/GeoPackage/SpatiaLite)
start_time = time.perf_counter()

sqlite_factory = ENCDataFactory(source=data_file)

# --- Filter ENCs by Boundary ---
# Step 1: Get the list of ENC names that intersect with our area of interest
# This ensures we only process relevant chart data, critical for performance
enc_names_in_boundary = sqlite_factory.get_encs_by_boundary(port_bbox.geometry.iloc[0])

# Step 2: Get the bounding box GeoDataFrame for only those filtered ENCs
# This provides geographic extents for visualization
enc_bbox_gdf = sqlite_factory.get_enc_bounding_boxes(enc_names_in_boundary)

# --- Visualize ENC Coverage on Map ---
# Step 3: Add the ENC boundaries to our map to verify coverage
# Different usage bands (1-6) represent different chart scales/detail levels
ply.add_boundary_trace(ply_fig, port_bbox)
ply.add_enc_bbox_trace(figure=ply_fig, bbox_df=enc_bbox_gdf, usage_bands=[1,2,3,4,5,6])

end_time = time.perf_counter()
performance_metrics['ENC Filtering'] = end_time - start_time
print(f"ENC filtering took: {end_time - start_time:.2f}s")
ply_fig.show()

## 4. Graph Generation and Pathfinding

#### 4.1 Create Navigable Grid

In [None]:
# --- Initialize BaseGraph for SpatiaLite Backend ---
# BaseGraph handles graph creation operations for any backend
# It queries the S-57 'seaare' (Sea Area) layer and other navigable layers
# (like fairways) to create a single polygon representing all navigable water
from src.nautical_graph_toolkit.core.graph import BaseGraph

start_time = time.perf_counter()

sqlite_bg = BaseGraph(data_factory=ENCDataFactory(source=data_file),
                  graph_schema_name="graph")

# --- Create Navigable Grid ---
# This step extracts navigable areas from the ENC data and merges them into a
# combined polygon. The reduce_distance_nm parameter shrinks the navigable area
# by the given distance (3 NM here) so that the graph keeps clear of hazards.
grid = sqlite_bg.create_base_grid(port_boundary=port_bbox,
                              departure_port=port1,
                              arrival_port=port2,
                              layer_table="seaare",
                              reduce_distance_nm=3)

end_time = time.perf_counter()
performance_metrics['Grid Creation'] = end_time - start_time
print(f"Grid creation took: {end_time - start_time:.2f}s")
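# Inspect the returned grid container and its component keys
print(f"Grid components: {len(grid)}")
print(f"Grid type: {type(grid).__name__}")
print(f"Grid keys: {list(grid.keys())}")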

#### 4.2 Visualize Grid Components

In [None]:
# --- Visualize Grid Components ---
# We plot the different components of the generated grid to understand coverage:
# - main_grid (red): Primary sea area polygons from 'seaare' layer
# - extra_grids (green): Additional navigable areas (fairways, channels, etc.)
# - combined_grid (blue): Final merged navigable polygon used for graph creation
ply_grid = ply.create_base_map(mapbox_token=os.getenv('MAPBOX_TOKEN'))
ply.plotly_base_config(ply_grid)
ply.add_grid_trace(ply_grid, grid_geojson=grid["main_grid"], color="red")
ply.add_grid_trace(ply_grid, grid_geojson=grid["extra_grids"], color="green")
ply.add_grid_trace(ply_grid, grid_geojson=grid["combined_grid"], color="blue")
ply_grid.show()

#### 4.3 Construct Graph from Grid
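
As a rough mental model of this step, the sketch below builds a graph from a set of navigable grid cells, connects each cell to its eight neighbours, and keeps only the largest connected component. It uses plain dictionaries rather than `networkx`, and `build_grid_graph` and `largest_component` are hypothetical names; the toolkit's `create_base_graph` may work quite differently internally.

```python
from collections import deque

def build_grid_graph(cells):
    """Build an adjacency dict linking each cell to its 8 neighbours.

    cells: set of (row, col) tuples marking navigable grid cells.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in offsets if (r + dr, c + dc) in cells]
        for (r, c) in cells
    }

def largest_component(graph):
    """Return the subgraph induced by the largest connected component."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return {n: [m for m in graph[n] if m in best] for n in best}

# A 3x3 block of water plus an isolated two-cell pond.
water = {(r, c) for r in range(3) for c in range(3)} | {(10, 10), (10, 11)}
full_graph = build_grid_graph(water)
main_graph = largest_component(full_graph)
print(len(full_graph), len(main_graph))  # 11 9
```

Dropping the pond is what guarantees that any two remaining nodes can be joined by a path.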

In [None]:
# --- Construct Graph from Grid ---
# This is the core graph creation step. It populates the navigable grid polygon 
# with a dense network of nodes (0.3 NM spacing) and edges connecting adjacent nodes.
# keep_largest_component=True ensures the graph is a single connected network,
# which prevents pathfinding errors by removing isolated node clusters.
start_time = time.perf_counter()

G = sqlite_bg.create_base_graph(grid["combined_grid"], 0.3,
                              keep_largest_component=True)

end_time = time.perf_counter()
performance_metrics['Graph Creation'] = end_time - start_time
print(f"Graph creation took: {end_time - start_time:.2f}s")

#### 4.4 Save Graph to File

In [None]:
# --- Save Graph to GeoPackage File ---
# For portability and use in other tools (like QGIS), we save the 
# in-memory graph to a GeoPackage file with separate layers for nodes and edges.
# This allows for visualization, analysis, and sharing outside of Python.
# Note: Even though our source data is SpatiaLite, we save to GPKG for better compatibility.
start_time = time.perf_counter()

output_file = output_dir / "base_graph_SQLITE.gpkg"
sqlite_bg.save_graph_to_gpkg(G, output_file)

end_time = time.perf_counter()
performance_metrics['Save to GPKG'] = end_time - start_time
print(f"Saving to GeoPackage took: {end_time - start_time:.2f}s")

#### 4.5 Perform Base Routing
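
For readers unfamiliar with A*, the sketch below shows the algorithm on a tiny grid using only the standard library. It is an illustration under simple assumptions (nodes are `(x, y)` tuples, edge cost and heuristic are both straight-line distance); the `astar` function here is hypothetical and is not the code behind `Route.base_route`.

```python
import heapq
import math

def astar(neighbors, start, goal):
    """Return (path, cost) for the cheapest path, or (None, inf) if none exists.

    neighbors: dict mapping each (x, y) node to a list of adjacent nodes.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    open_heap = [(dist(start, goal), 0.0, start)]  # (estimate, cost so far, node)
    best_cost = {start: 0.0}
    came_from = {}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in came_from:  # walk back to the start
                node = came_from[node]
                path.append(node)
            return path[::-1], cost
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry; a cheaper route was found already
        for nxt in neighbors.get(node, []):
            new_cost = cost + dist(node, nxt)
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                came_from[nxt] = node
                heapq.heappush(open_heap, (new_cost + dist(nxt, goal), new_cost, nxt))
    return None, math.inf

# 3x3 grid with the centre cell blocked, 8-neighbour connectivity.
cells = {(x, y) for x in range(3) for y in range(3)} - {(1, 1)}
neighbors = {
    (x, y): [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0) and (x + dx, y + dy) in cells]
    for (x, y) in cells
}
path, cost = astar(neighbors, (0, 0), (2, 2))
print(path, round(cost, 3))
```

The straight-line heuristic never overestimates the remaining distance, which is what lets A* return a shortest path while exploring fewer nodes than an uninformed search.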

In [None]:
# --- Perform Base Routing (Shortest Path Calculation) ---
# With the graph created, we can now run a simple shortest-path calculation
# between our two ports using the A* algorithm. The Route class handles:
# - Mapping port coordinates to nearest graph nodes
# - Computing optimal path considering only distance (no weights yet)
# - Calculating total route distance in nautical miles
from src.nautical_graph_toolkit.core.pathfinding_lite import Route

start_time = time.perf_counter()

route = Route(graph=G, data_manager=sqlite_factory.manager)
route_geometry, distance = route.base_route(
    departure_point=port1.geometry,
    arrival_point=port2.geometry
)

end_time = time.perf_counter()
performance_metrics['Pathfinding'] = end_time - start_time
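print(f"Pathfinding took: {end_time - start_time:.2f}s")
# Report the total route distance returned by base_route (nautical miles)
print(f"Route distance: {distance} NM")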

#### 4.6 Visualize and Save Route

In [None]:
# --- Visualize Computed Route on Map ---
# Display the calculated route as a line on the map along with both ports.
# This provides a visual verification that the routing algorithm produced
# a sensible maritime path between the two locations.
ply_route = ply.create_base_map(mapbox_token=os.getenv('MAPBOX_TOKEN'))
ply.plotly_base_config(ply_route)
# Add the route line
ply.add_route_trace(figure=ply_route,
                    line=route_geometry,
                    name="Base Route")
# Add departure port marker
ply.add_single_port_trace(ply_route, port1, name=port1['PORT_NAME'], color='blue')
# Add arrival port marker
ply.add_single_port_trace(ply_route, port2, name=port2['PORT_NAME'], color='red')
ply_route.show()

In [None]:
# --- Save Route to GeoPackage File ---
# Store the computed route geometry in a GeoPackage file for future reference.
# Routes are saved with metadata (route name, distance, etc.) in a dedicated file.
# This allows for route comparison, analysis, and sharing with other applications.
sqlite_factory.save_route(route_geom=route_geometry,
                          route_name="base_route_name_sql",
                          table_name="base_route_table_sql",
                          overwrite=True)

## 5. Performance Summary
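
The `start_time` / `end_time` pattern repeated in the cells above could also be wrapped in a small context manager. The sketch below is one possible way to do that with the standard library; `timed` and `demo_metrics` are hypothetical names and not part of the toolkit.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(step, metrics):
    """Record the wall-clock duration of the enclosed block under metrics[step]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed steps are still timed.
        metrics[step] = time.perf_counter() - start

demo_metrics = {}
with timed("Sleep 10 ms", demo_metrics):
    time.sleep(0.01)
print(demo_metrics)
```

A dictionary filled this way has the same shape as `performance_metrics`, so it can be plotted by the cell below without changes.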

In [None]:
# --- Visualize Pipeline Performance Metrics ---
# Create an interactive bar chart showing time taken for each pipeline step.
# This helps identify bottlenecks and optimize the workflow for larger areas.
if performance_metrics:
    # Convert the dictionary to a pandas DataFrame for easy plotting
    perf_df = pd.DataFrame(list(performance_metrics.items()), columns=['Step', 'Time (seconds)'])
    perf_df = perf_df.sort_values(by='Time (seconds)', ascending=False)

    # Create an interactive bar chart with time values displayed
    fig = px.bar(
         perf_df,
         x='Step',
         y='Time (seconds)',
         title='Base Graph Creation Pipeline Performance (SpatiaLite)',
         text_auto='.2f',
         labels={'Step': 'Pipeline Step', 'Time (seconds)': 'Time Taken (seconds)'}
    )
    fig.update_traces(textposition='outside')
    fig.show()
else:
    print("No performance metrics were recorded. Run the notebook cells to generate the summary.")