# Notebook - Fractopo – KB7 Trace Data Validation

In [None]:
import warnings

warnings.filterwarnings("ignore")

In [None]:
import geopandas as gpd

In [None]:
# This cell's contents are only for development purposes.
from importlib.util import find_spec

if find_spec("fractopo") is None:
    import sys

    sys.path.append("../../")

In [None]:
from fractopo import Validation
import matplotlib.pyplot as plt

plt.close()

## Data (KB7)

In [None]:
# Trace and target area data available on GitHub
trace_data_url = "https://raw.githubusercontent.com/nialov/fractopo/master/tests/sample_data/KB7/KB7_traces.geojson"
area_data_url = "https://raw.githubusercontent.com/nialov/fractopo/master/tests/sample_data/KB7/KB7_area.geojson"

# Use geopandas to load the data from the URLs
traces = gpd.read_file(trace_data_url)
area = gpd.read_file(area_data_url)

# Name the dataset
name = "KB7"

## Validation (KB7)

In [None]:
# Create a validation object with fixing (i.e. modification of data) allowed.
# AREA_EDGE_SNAP_MULTIPLIER is set explicitly so that the area-edge error shown
# later in this notebook is still caught even if the default value changes.
kb7_validation = Validation(
    traces, area, name=name, allow_fix=True, AREA_EDGE_SNAP_MULTIPLIER=2.5
)

In [None]:
# Run the validation and capture the validated trace GeoDataFrame
kb7_validated = kb7_validation.run_validation()

## Validation results (KB7)

In [None]:
# Normal DataFrame methods are available for data inspection
kb7_validated.columns

In [None]:
# Convert column data to string to allow hashing and return all unique
# validation errors.
kb7_validated["VALIDATION_ERRORS"].astype(str).unique()
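
The unique values above show which combinations of errors occur, but not how often each individual error appears. A per-error tally can be sketched as below, assuming the error column holds one list of error strings per trace (the need to cast to `str` for hashing suggests this, but it is an assumption). The `error_lists` data and the `count_errors` helper are hypothetical and not part of fractopo.

```python
from collections import Counter

# Hypothetical stand-in for the error column of the validated GeoDataFrame:
# one list of error strings per trace (assumed layout, not confirmed).
error_lists = [
    [],
    ["MULTI JUNCTION"],
    ["MULTI JUNCTION", "TRACE UNDERLAPS TARGET AREA"],
]


def count_errors(rows):
    """Count how many traces carry each individual validation error."""
    return Counter(err for errs in rows for err in errs)


counts = count_errors(error_lists)
print(counts.most_common())
```

If the assumption holds, the same helper could be applied to the real data with `count_errors(kb7_validated[kb7_validation.ERROR_COLUMN])`.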

In [None]:
# A more descriptive summary function is available in fractopo.cli
from fractopo.cli import describe_results

describe_results(kb7_validated, kb7_validation.ERROR_COLUMN)

The KB7 dataset contains the errors listed above, of which `MULTI JUNCTION` and `TRACE UNDERLAPS TARGET AREA` disrupt further analysis.

See documentation: https://fractopo.readthedocs.io/en/latest/validation/errors.html

## Visualization of errors in notebook

Though visualization is possible here, GIS software (e.g. QGIS, ArcGIS) is much more interactive and is recommended for actual fixing and further error inspection.

### MULTI JUNCTION

In [None]:
# Find MULTI JUNCTION erroneous traces in GeoDataFrame
kb7_multijunctions = kb7_validated.loc[
    ["MULTI JUNCTION" in err for err in kb7_validated[kb7_validation.ERROR_COLUMN]]
]
kb7_multijunctions

In [None]:
kb7_multijunctions.plot(colors=["red", "black", "blue", "orange", "green"])

The plot shows that the green and blue traces abut at their endpoints,
which is not a valid topology for traces.
The fix is to merge the green and blue traces into a single trace.

Additionally, the orange trace has a dangling end instead of being accurately snapped to the black trace.

In [None]:
# Example fix for blue and green traces
from shapely.ops import linemerge

gpd.GeoSeries(
    [
        linemerge(
            [kb7_multijunctions.geometry.iloc[4], kb7_multijunctions.geometry.iloc[2]]
        ),
        kb7_multijunctions.geometry.iloc[0],
    ]
).plot(colors=["green", "red"])
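
To show what `linemerge` does in the fix above, here is a minimal sketch on toy geometries. The coordinates are made up for illustration and are not taken from the KB7 data. Two lines that share an endpoint are joined into one `LineString`, whereas lines that do not touch are returned as a `MultiLineString`.

```python
from shapely.geometry import LineString
from shapely.ops import linemerge

# Toy stand-ins for the green and blue traces (hypothetical coordinates):
# the end of `green` coincides exactly with the start of `blue`.
green = LineString([(0, 0), (1, 1)])
blue = LineString([(1, 1), (2, 1)])

# Lines that abut at an endpoint merge into a single LineString.
merged = linemerge([green, blue])
print(merged.geom_type, len(merged.coords))

# Lines that do not share an endpoint cannot be merged and come back
# as a MultiLineString, so the result type is worth checking.
separate = linemerge([green, LineString([(5, 5), (6, 6)])])
print(separate.geom_type)
```

Checking `geom_type` after merging is a cheap way to confirm that the endpoints really coincided. Endpoints that are only nearly equal are not joined.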

### TRACE UNDERLAPS TARGET AREA

In [None]:
# Find TRACE UNDERLAPS TARGET AREA erroneous traces in GeoDataFrame
kb7_underlaps = kb7_validated.loc[
    [
        "TRACE UNDERLAPS TARGET AREA" in err
        for err in kb7_validated[kb7_validation.ERROR_COLUMN]
    ]
]
kb7_underlaps

In [None]:
# Create figure, ax base
fig, ax = plt.subplots()

# Plot the underlapping trace along with the target area boundary
kb7_underlaps.plot(ax=ax, color="red")
area.boundary.plot(ax=ax, color="black")

# Get the trace bounds and zoom the axes to them with a small margin
minx, miny, maxx, maxy = kb7_underlaps.total_bounds

ax.set_xlim(minx - 0.5, maxx + 0.5)
ax.set_ylim(miny - 0.5, maxy + 0.5)

The plot shows that the trace underlaps the target area at least at its northern end and possibly at its southern end. The fix is to extend the trace to meet the target area boundary.
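
Such an extension is usually done by hand in GIS software, but the idea can be sketched programmatically. The example below is an illustration under stated assumptions, not fractopo's own method. It uses toy 2D geometries, and the helper `extend_end_to_boundary` and its `max_extension` parameter are hypothetical. The last segment of the trace is continued in a straight line, and the nearest point where it crosses the area boundary is appended to the trace.

```python
from shapely.geometry import LineString, Point, Polygon


def extend_end_to_boundary(trace, area, max_extension=10.0):
    """Extend the last segment of a 2D trace until it meets the area boundary.

    Hypothetical helper: returns the trace unchanged if no boundary point is
    found within max_extension along the direction of the last segment.
    """
    (x0, y0), (x1, y1) = trace.coords[-2], trace.coords[-1]
    seg_len = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    dx, dy = (x1 - x0) / seg_len, (y1 - y0) / seg_len

    # Straight continuation of the last segment, starting at the endpoint.
    ray = LineString([(x1, y1), (x1 + dx * max_extension, y1 + dy * max_extension)])
    hits = ray.intersection(area.boundary)

    # The intersection may be a single Point or a collection; keep points only.
    parts = getattr(hits, "geoms", [hits])
    points = [g for g in parts if not g.is_empty and g.geom_type == "Point"]
    if not points:
        return trace

    end = Point(x1, y1)
    nearest = min(points, key=end.distance)
    if nearest.distance(end) == 0:
        # The endpoint already lies on the boundary.
        return trace
    return LineString([*trace.coords, (nearest.x, nearest.y)])


# Toy data: a square target area and a trace that stops short of its top edge.
area_polygon = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
short_trace = LineString([(5, 2), (5, 9.5)])

extended = extend_end_to_boundary(short_trace, area_polygon)
print(list(extended.coords))
```

The helper only handles the end of the trace. The start can be treated the same way by reversing the coordinate order first, e.g. `LineString(list(trace.coords)[::-1])`. Any automated extension should be checked visually, because a straight continuation may not match the true fracture geometry.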