## OATpy Examples
Some basic analysis using OATpy. This is heavily based on Greg's example notebooks, but uses our own datasets and constructed networks to make sure I know how to work with the library.

We use three examples:
1. A 2D point cloud
2. A 3D point cloud
3. A network represented by a sparse matrix

For each example we:
1. Compute homology
2. Solve for representative cycles

### Preliminaries

In [151]:
# load some packages
from plotly.subplots import make_subplots
import plotly.graph_objects as go # OAT plots with plotly rather than matplotlib; there's a bit of a learning curve, but it works fine
from scipy import sparse
import networkx as nx
import pandas as pd
import oatpy as oat
import numpy as np

# configuration
DATA_PATH = 'datasets/CCMathTopologyScavengerHunt/'


### Example 1: 2D Point Cloud

We use `points2.csv` from the CCMath Topology Scavenger Hunt.

In [2]:
# Pull the cloud
FILE = 'points2.csv'

dta = np.array(pd.read_csv(DATA_PATH + FILE, header=None)) # oat uses np arrays as coordinate inputs

# plotly plotting
trace = go.Scatter(x=dta[:, 0], y=dta[:, 1], mode='markers')
fig = go.Figure(trace)
fig.update_layout(
        width=500, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

#### Solve Homology
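OATpy needs a few setup steps before we can compute persistent homology: compute the enclosing radius, build a sparse dissimilarity matrix, and factor the boundary matrix. The final `homology` call then computes homology, including cycle representatives and bounding chains. (For a bar that dies, the bounding chain is a chain one dimension up whose boundary is the cycle representative, i.e. the simplices that fill in the cycle when it dies.)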

In [3]:
# setup problem
enclosing = oat.dissimilarity.enclosing_from_cloud(dta) # enclosing radius: past this the Rips complex is a cone, so it's the max filtration radius we need
dissimilarity_matrix = oat.dissimilarity.matrix_from_cloud( # distance matrix
        cloud=dta,
        dissimilarity_max = enclosing + 1e-10 # I believe entries above this are dropped (returns a sparse matrix); the 1e-10 guards against numerical error (Greg's recommendation)
    )
factored = oat.rust.FactoredBoundaryMatrixVr( # factored boundary matrix of the Vietoris-Rips complex (there's another function that does this; not explored here)
        dissimilarity_matrix=dissimilarity_matrix,
        homology_dimension_max=1
    )

# solve homology
homology = factored.homology( # solve homology
        return_cycle_representatives=True, # These need to be true to be able to make a barcode
        return_bounding_chains=True
    )
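For intuition about the two setup calls, here is a minimal sketch of what I understand them to compute, using only NumPy/SciPy. It assumes `enclosing_from_cloud` returns the standard enclosing radius (the smallest r at which some point is within r of every other point) and that `matrix_from_cloud` keeps only pairs within `dissimilarity_max`, plus an explicit zero diagonal. `enclosing_radius` and `thresholded_matrix` are hypothetical helpers, not OATpy functions.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import pdist, squareform

def enclosing_radius(cloud):
    """Smallest r such that some point is within r of every other point
    (assumed to match oat.dissimilarity.enclosing_from_cloud)."""
    dist = squareform(pdist(cloud))   # dense pairwise Euclidean distances
    return dist.max(axis=1).min()     # min over points of the farthest distance

def thresholded_matrix(cloud, dissimilarity_max):
    """Hypothetical stand-in for oat.dissimilarity.matrix_from_cloud: store only
    pairs with distance <= dissimilarity_max; the diagonal zeros are stored explicitly."""
    dist = squareform(pdist(cloud))
    rows, cols = np.nonzero(dist <= dissimilarity_max)  # includes the diagonal
    return sparse.csr_array((dist[rows, cols], (rows, cols)), shape=dist.shape)

# Usage on a tiny hypothetical cloud (three collinear points)
cloud = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
r = enclosing_radius(cloud)               # point 1 is within 2 of both others
M = thresholded_matrix(cloud, r + 1e-10)  # pair (0, 2) at distance 3 is dropped
print(r, M.nnz)                           # 2.0 7
```

Stopping at the enclosing radius is safe because, past it, one point is connected to every other point, so the complex is a cone and has no further homology.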

#### Visualize Homology
See a persistence diagram and barcode for the homology.

Based on the data input (`points2.csv`), and ignoring noise, we expect one connected component (dimension 0), one large loop (dimension 1) that lasts the longest, and four smaller loops (dimension 1) with shorter lifetimes.

In [4]:
# Persistence diagram
fig = oat.plot.pd(homology)
fig.update_layout(
        width=600, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

In [5]:
# Barcode diagram
fig = oat.plot.barcode(homology)
fig.update_layout(
        width=1000, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

#### Find Representative Cycles
To find representative cycles, we use the `optimize_cycle` method of the factored matrix. It returns a lot of information, including the initial unoptimized cycle and the simplices removed to optimize it, but I don't show these here since I think they're irrelevant for our project. If interested, check out `vietoris_rips_bounding.ipynb` in Greg's example notebooks.

In [6]:
# cycle to optimize
i = homology['cycle nnz'].idxmax() # pick the cycle representative with the most nonzero entries (any row that forms an actual cycle would work; this one is just easy)

# optimization problem
optimal = factored.optimize_cycle(
        birth_simplex=homology["birth simplex"][i], 
        problem_type="preserve PH basis"
    )
optimal_edges = optimal["chain"]["optimal cycle"]["simplex"].tolist() # edges (1-simplices) of the optimal cycle

# plotting
trace_dta = go.Scatter( # data plot
        x=dta[:, 0],
        y=dta[:, 1],
        mode='markers',
        showlegend=True,
        name='Data',
        opacity=0.5
    )
traces_optimal = [oat.plot.edge__trace2d(edge, dta) for edge in optimal_edges] # optimal cycle plot
for n, trace in enumerate(traces_optimal): # plot optimal cycle
    trace.update(
            showlegend=(n==0),
            legendgroup="opti",
            opacity=0.5,
            name="Optimal cycle",
            line=dict(color="black")
        )
fig = go.Figure(data=[trace_dta]+traces_optimal)
fig.update_layout(
        width=600, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()


Finished construcing L1 optimization program.
Constraint matrix has 9546 nonzero entries.
Passing program to solver.

Done solving.
MINILP solution: Solution { direction: Minimize, num_vars: 4029, num_constraints: 4384, objective: 30.282127520653727 }
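The comment in the cell above notes that any row forming an actual cycle could be optimized. A quick sanity check on a returned edge list (such as `optimal_edges`) is that every vertex must appear in an even number of edges. This is a minimal sketch assuming coefficients of ±1 (or working mod 2), where the even-degree condition is necessary; `is_1_cycle` is a hypothetical helper, not part of OATpy.

```python
from collections import Counter

def is_1_cycle(edges):
    """Mod-2 check: every vertex must be touched an even number of times
    (necessary for a 1-cycle whose coefficients are +/-1)."""
    degree = Counter(v for edge in edges for v in edge)  # vertex -> incident edge count
    return all(count % 2 == 0 for count in degree.values())

square = [[0, 1], [1, 2], [2, 3], [0, 3]]  # closed loop
path = [[0, 1], [1, 2]]                    # open path: endpoints have degree 1
print(is_1_cycle(square), is_1_cycle(path))  # True False
```

In the notebook this could be run as `is_1_cycle(optimal_edges)`.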


### Example 2: 3D Point Cloud

We use `points13.csv` from the CCMath Topology Scavenger Hunt.

In [7]:
# Pull the cloud
FILE = 'points13.csv'

dta = np.array(pd.read_csv(DATA_PATH + FILE, header=None)) # oat uses np arrays as coordinate inputs

# plotly plotting
trace = go.Scatter3d(x=dta[:, 0], y=dta[:, 1], z=dta[:, 2], mode='markers')
fig = go.Figure(trace)
fig.update_layout(
        width=300, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

#### Solve Homology
We set `homology_dimension_max=2` here so that dimension-2 homology is also computed, which is what detects enclosed voids. This takes much longer than the 2D example.

In [8]:
# setup problem
enclosing = oat.dissimilarity.enclosing_from_cloud(dta) # enclosing radius: past this the Rips complex is a cone, so it's the max filtration radius we need
dissimilarity_matrix = oat.dissimilarity.matrix_from_cloud( # distance matrix
        cloud=dta,
        dissimilarity_max = enclosing + 1e-10 # I believe entries above this are dropped (returns a sparse matrix); the 1e-10 guards against numerical error (Greg's recommendation)
    )
factored = oat.rust.FactoredBoundaryMatrixVr( # factored boundary matrix of the Vietoris-Rips complex (there's another function that does this; not explored here)
        dissimilarity_matrix=dissimilarity_matrix,
        homology_dimension_max=2
    )

# solve homology
homology = factored.homology( # solve homology
        return_cycle_representatives=True, # these need to be True to make a barcode; computing them makes this take ~30% longer (about 1.5 min total)
        return_bounding_chains=True
    )
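To see roughly why raising `homology_dimension_max` from 1 to 2 is so much slower: homology in dimension d needs simplices up to dimension d + 1, so we now also need tetrahedra, and there are C(n, k + 1) possible k-simplices on n points. The sketch below computes these worst-case counts. `n_points = 200` is a hypothetical size, not the actual size of `points13.csv`, and the enclosing-radius threshold means the real complex is smaller.

```python
from math import comb

def max_simplex_counts(n_points, max_dim):
    """Worst-case number of k-simplices (k = 0..max_dim) in a Vietoris-Rips
    complex on n_points, reached when every pair of points is connected."""
    return {k: comb(n_points, k + 1) for k in range(max_dim + 1)}

n_points = 200  # hypothetical point count, for illustration only
for max_hom_dim in (1, 2):
    # homology in dimension d needs simplices up to dimension d + 1
    counts = max_simplex_counts(n_points, max_hom_dim + 1)
    print(max_hom_dim, counts)
# 1 {0: 200, 1: 19900, 2: 1313400}
# 2 {0: 200, 1: 19900, 2: 1313400, 3: 64684950}
```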

#### Visualize Homology
See a persistence diagram and barcode for the homology.

Based on the data input (`points13.csv`), we expect one connected component (dimension 0), one large loop (dimension 1), and two large enclosed voids (dimension 2).


In [9]:
# Persistence diagram
fig = oat.plot.pd(homology)
fig.update_layout(
        width=600, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

In [10]:
# Barcode diagram
fig = oat.plot.barcode(homology)
fig.update_layout(
        width=1000, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

#### Find Representative Cycles
We find representative cycles for the largest loop (dimension 1, a "2D hole") and the largest void (dimension 2, a "3D void").

In [11]:
## Representative 2D Hole
# cycle to optimize
i = homology[homology['dimension'] == 1]['cycle nnz'].idxmax() # largest 1d cycle (2d hole)

# optimization problem
optimal = factored.optimize_cycle(
        birth_simplex=homology["birth simplex"][i], 
        problem_type="preserve PH basis"
    )
optimal_edges = optimal["chain"]["optimal cycle"]["simplex"].tolist() # edges (1-simplices) of the optimal cycle

# plotting
trace_dta = go.Scatter3d( # data plot
        x=dta[:, 0],
        y=dta[:, 1],
        z=dta[:, 2],
        mode='markers',
        showlegend=True,
        name='Data',
        opacity=0.5
    )
traces_optimal = [oat.plot.edge__trace3d(edge, dta) for edge in optimal_edges] # optimal cycle plot
for n, trace in enumerate(traces_optimal): # plot optimal cycle
    trace.update(
            showlegend=(n==0),
            legendgroup="opti",
            opacity=0.5,
            name="Optimal cycle",
            line=dict(color="black")
        )
fig = go.Figure(data=[trace_dta]+traces_optimal)
fig.update_layout(
        width=400, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()


Finished construcing L1 optimization program.
Constraint matrix has 7245 nonzero entries.
Passing program to solver.

Done solving.
MINILP solution: Solution { direction: Minimize, num_vars: 4040, num_constraints: 4754, objective: 58.45581204266295 }


I couldn't get an optimized representative for the dimension-2 cycle (the 3D void) to work (the optimization call breaks the notebook), so I did the next best thing: plot the unoptimized cycle representative that `homology` returns.

In [12]:
## Representative 3D Hole
# cycle to optimize
i = homology[homology['dimension'] == 2]['cycle nnz'].idxmax() # largest 2d cycle (3d void)

# # optimization problem (this has been having issues)
# optimal = factored.optimize_cycle(
#         birth_simplex=homology["birth simplex"][i], 
#         problem_type="preserve PH basis"
#     )
# optimal_faces = optimal["chain"]["optimal cycle"]["simplex"].tolist() # triangles (2-simplices) of the optimal cycle

# Unoptimized cycle representative returned by `homology` (not optimal)
rep_faces = homology['cycle representative'][i]['simplex'].tolist()

# plotting
trace_dta = go.Scatter3d( # data plot
        x=dta[:, 0],
        y=dta[:, 1],
        z=dta[:, 2],
        mode='markers',
        showlegend=True,
        name='Data',
        opacity=0.5
    )
traces_optimal = [oat.plot.triangle__trace3d(triangle=tri, coo=dta) for tri in rep_faces] # one trace per triangle of the cycle representative
for n, trace in enumerate(traces_optimal): # plot cycle representative
    trace.update(
            showlegend=(n==0),
            legendgroup="diff",
            opacity=0.5,
            name="Cycle representative"
        )
        )
fig = go.Figure(data=[trace_dta]+traces_optimal)
fig.update_layout(
        width=400, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()
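Because this representative is not optimized, it's worth checking that the plotted triangles really close up into a surface. A minimal sketch, assuming ±1 or mod-2 coefficients: a set of triangles can only be a 2-cycle if every edge is shared by an even number of them. `is_2_cycle` is a hypothetical helper, not an OATpy function; it could be run on the list of triangles plotted in the cell above.

```python
from collections import Counter
from itertools import combinations

def is_2_cycle(triangles):
    """Mod-2 check: every edge of the given triangles must appear an even
    number of times (necessary for a 2-cycle with coefficients +/-1)."""
    edge_counts = Counter(
        tuple(sorted(edge))                # undirected edge as a sorted pair
        for tri in triangles
        for edge in combinations(tri, 2)   # the three edges of each triangle
    )
    return all(count % 2 == 0 for count in edge_counts.values())

# Boundary of a tetrahedron: a closed surface, so every edge is shared twice
tetra_boundary = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]
print(is_2_cycle(tetra_boundary))      # True
print(is_2_cycle(tetra_boundary[:3]))  # False: removing a face opens a hole
```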

### Example 3: Network

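We use the weighted network adjacency matrix from `ripserer_exs.ipynb`: entry (i, j) is the weight of the edge between nodes i and j, and 0 means there is no edge. Each panel below shows the graph containing only the edges with weight at most the threshold.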

In [160]:
# graph adjacency matrix
adj = np.array([[0, 1, 0, 0, 0, 6],
                [1, 0, 5, 8, 7, 0],
                [0, 5, 0, 4, 0, 0],
                [0, 8, 4, 0, 2, 0],
                [0, 7, 0, 2, 0, 3],
                [6, 0, 0, 0, 3, 0]])

def viz_graph(G):
    pos = nx.circular_layout(G) # setup graph layout

    # edge locations
    e_x = [] # edge x
    e_y = [] # edge y
    for e in G.edges:
        u, v = e # edge goes from u to v
        u_x, u_y = pos[u] # u position
        v_x, v_y = pos[v] # v position
        e_x += [u_x, v_x, None]
        e_y += [u_y, v_y, None]

    edge_trace = go.Scatter(
            x=e_x, y=e_y,
            hoverinfo='none',
            mode='lines',
            line=dict(width=5, color='#888')
        )

    # node locations
    n_x = [] # node x
    n_y = [] # node y
    n_t = [] # node text (labels)
    for i, n in enumerate(G.nodes):
        x, y = pos[n]
        n_x.append(x)
        n_y.append(y)
        n_t.append(i)
        
    node_trace = go.Scatter(
            x=n_x, y=n_y,
            hoverinfo='none',
            mode='markers+text',
            text=n_t,
            marker=dict(
                    size=25,
                    line_width=2
                )
        )
    
    return edge_trace, node_trace

# Initialize figure with subplots
fig = make_subplots(
        rows=3,
        cols=3,
        subplot_titles=(
                'Threshold=0.0', 'Threshold=1.0', 'Threshold=2.0',
                'Threshold=3.0','Threshold=4.0', 'Threshold=5.0',
                'Threshold=6.0', 'Threshold=7.0', 'Threshold=8.0'
            ),
        shared_xaxes=True,
        shared_yaxes=True,
        vertical_spacing=0.05
    )

# Add traces
for thresh in range(9):
    # find info for plots
    r = 1 + (thresh) // 3 # row (plotly subplot rows/columns are 1-indexed)
    c = 1 + (thresh) % 3 # column
    G = nx.from_numpy_array(adj * (adj <= thresh)) # keep only edges with weight <= thresh
    edge_trace, node_trace = viz_graph(G) # viz objects

    # edit fig
    fig.add_trace(edge_trace, row=r, col=c)
    fig.add_trace(node_trace, row=r, col=c)
    fig.update_xaxes(showgrid=False, zeroline=False, showticklabels=False, row=r, col=c)
    fig.update_yaxes(showgrid=False, zeroline=False, showticklabels=False, row=r, col=c)

# Update title and height
fig.update_layout(
        showlegend=False,
        width=900, 
        height=900,
        margin=dict(l=10, r=20, t=30, b=20)
    )
fig.show()

In [218]:
# graph adjacency matrix (animation)
adj = np.array([[0, 1, 0, 0, 0, 6],
                [1, 0, 5, 8, 7, 0],
                [0, 5, 0, 4, 0, 0],
                [0, 8, 4, 0, 2, 0],
                [0, 7, 0, 2, 0, 3],
                [6, 0, 0, 0, 3, 0]])

def viz_graph(G):
    pos = nx.circular_layout(G) # setup graph layout

    # edge locations
    e_x = [] # edge x
    e_y = [] # edge y
    for e in G.edges:
        u, v = e # edge goes from u to v
        u_x, u_y = pos[u] # u position
        v_x, v_y = pos[v] # v position
        e_x += [u_x, v_x, None]
        e_y += [u_y, v_y, None]

    edge_trace = go.Scatter(
            x=e_x, y=e_y,
            hoverinfo='none',
            mode='lines',
            line=dict(width=5, color='#888')
        )

    # node locations
    n_x = [] # node x
    n_y = [] # node y
    n_t = [] # node text (labels)
    for i, n in enumerate(G.nodes):
        x, y = pos[n]
        n_x.append(x)
        n_y.append(y)
        n_t.append(i)
        
    node_trace = go.Scatter(
            x=n_x, y=n_y,
            hoverinfo='none',
            mode='markers+text',
            text=n_t,
            marker=dict(
                    size=25,
                    line_width=2
                )
        )
    
    return edge_trace, node_trace

# Add traces
frames = []
for thresh in range(9):
    # graph with only the edges of weight <= thresh
    G = nx.from_numpy_array(adj * (adj <= thresh))
    edge_trace, node_trace = viz_graph(G) # viz objects

    # add as frame (plotly frame names are strings; the slider below refers to frames by name)
    frames.append(go.Frame(
            data=[edge_trace, node_trace],
            name=str(thresh)
        ))
    if thresh == 0:
        edge_0_trace, node_0_trace = edge_trace, node_trace

# create figure
fig = go.Figure(data=[edge_0_trace, node_0_trace], frames=frames)

## the rest is copied from the plotly documentation example on MRI volume slices
def frame_args(duration):
    return {
            "frame": {"duration": duration},
            "mode": "immediate",
            "fromcurrent": True,
            "transition": {"duration": 0, "easing": "linear"},
        }
fig.update_layout(
        showlegend=False,
        width=500, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20),
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        updatemenus = [dict(
                buttons=[
                        dict(
                                args=[None, frame_args(200)],
                                label='&#9654;', # play symbol
                                method='animate'
                            ),
                        dict(
                                args=[[None], frame_args(0)], # [[None]] pauses the animation
                                label='&#9724;', # pause symbol
                                method='animate'
                            )
                    ],
                direction='left',
                pad=dict(l=0, r=0, t=10, b=10),
                type='buttons',
                x=0.1,
                y=0
            )],
        sliders=[
                dict(
                        pad=dict(l=15, r=0, t=10, b=10),
                        len=0.9,
                        x=0.1,
                        y=0,
                        steps=[dict(
                                args=[[f.name], frame_args(0)],
                                label=str(k),
                                method='animate'
                            ) for k, f in enumerate(fig.frames)],
                    )
            ]
)
fig.show()

#### Solve for Homology
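We convert the adjacency matrix to a sparse matrix, where entries that are not stored (the 0s, i.e. no edge) are treated as an infinite dissimilarity. We also store explicit 0s on the diagonal so that every node enters the filtration at 0. Then we factor the boundary matrix and compute homology.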

In [13]:
## setup problems
adj = sparse.csr_array(adj) # adj => dissimilarity matrix; entries that aren't stored (the 0s) mean "no edge"
adj.setdiag(0) # store explicit 0s on the diagonal so every node is born at filtration value 0

# setup solver
factored = oat.rust.FactoredBoundaryMatrixVr( # factored boundary matrix of the Vietoris-Rips (clique) complex
        dissimilarity_matrix=adj,
        homology_dimension_max=1
    )

# solve homology
homology = factored.homology( # solve homology
        return_cycle_representatives=True, # These need to be true to be able to make a barcode
        return_bounding_chains=True
    )
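The `setdiag(0)` line matters because SciPy distinguishes entries that are not stored from explicitly stored zeros. As I understand it, OATpy reads unstored entries as "no edge" and stored entries as filtration values, so the stored diagonal zeros make every node present from the start. A small check of the stored-entry counts, using only SciPy and the adjacency matrix from above (SciPy may emit a `SparseEfficiencyWarning` here):

```python
import numpy as np
from scipy import sparse

# same weighted adjacency matrix as above (0 = no edge)
adj_dense = np.array([[0, 1, 0, 0, 0, 6],
                      [1, 0, 5, 8, 7, 0],
                      [0, 5, 0, 4, 0, 0],
                      [0, 8, 4, 0, 2, 0],
                      [0, 7, 0, 2, 0, 3],
                      [6, 0, 0, 0, 3, 0]])

A = sparse.csr_array(adj_dense)
nnz_edges_only = A.nnz     # zeros are not stored: 8 edges x 2 directions = 16

A.setdiag(0)               # inserts *explicit* zeros on the diagonal
nnz_with_diagonal = A.nnz  # 16 edge entries + 6 stored diagonal zeros = 22

print(nnz_edges_only, nnz_with_diagonal)  # 16 22
print(A.diagonal())                       # [0 0 0 0 0 0]
```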

#### Visualize Homology
See a persistence diagram and barcode for the homology.

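Based on the data input (our network), we expect one connected component (dimension 0) that lasts to the end, one loop (dimension 1) that appears late and then dies, and one loop (dimension 1) that lasts to the end. Concretely, the loop 0-1-2-3-4-5 closes when edge (0, 5) enters at threshold 6 and is never filled in, while the loop 1-2-3-4 appears when edge (1, 4) enters at 7 and is filled in at 8 by the triangles (1, 2, 3) and (1, 3, 4).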
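The dimension-0 part of this expectation can be checked by hand: with every node born at 0, each edge (in order of increasing weight) that joins two separate components kills one dimension-0 class, which is Kruskal's algorithm with a union-find. A minimal sketch in plain Python; `h0_deaths` is a hypothetical helper, and it says nothing about dimension 1, which also needs the triangles.

```python
# same weighted adjacency matrix as above (0 = no edge)
ADJ = [[0, 1, 0, 0, 0, 6],
       [1, 0, 5, 8, 7, 0],
       [0, 5, 0, 4, 0, 0],
       [0, 8, 4, 0, 2, 0],
       [0, 7, 0, 2, 0, 3],
       [6, 0, 0, 0, 3, 0]]

def h0_deaths(weights):
    """Death times of dimension-0 classes when all nodes are born at 0.

    Edges are processed by increasing weight; an edge that merges two
    components kills one class. One class (the final component) never dies.
    """
    n = len(weights)
    parent = list(range(n))

    def find(x):
        # follow parent pointers to the component's root, halving the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (weights[u][v], u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if weights[u][v] != 0
    )
    deaths = []
    for w, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components
            parent[root_u] = root_v
            deaths.append(w)
    return deaths

print(h0_deaths(ADJ))  # [1, 2, 3, 4, 5]: connected once the threshold reaches 5
```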

In [14]:
# Persistence diagram
fig = oat.plot.pd(homology)
fig.update_layout(
        width=600, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

In [156]:
# Barcode diagram
fig = oat.plot.barcode(homology)
fig.update_layout(
        width=1000, 
        height=500,
        hovermode='closest',
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

In [16]:
## Representative 2D Hole
# cycle to optimize
i = homology['cycle nnz'].idxmax() # largest 1d cycle (2d hole)

# optimization problem
optimal = factored.optimize_cycle(
        birth_simplex=homology["birth simplex"][i], 
        problem_type="preserve PH basis"
    )
optimal_edges = optimal["chain"]["optimal cycle"]["simplex"].tolist() # edges (1-simplices) of the optimal cycle

optimal_edges


Finished construcing L1 optimization program.
Constraint matrix has 0 nonzero entries.
Passing program to solver.

Done solving.
MINILP solution: Solution { direction: Minimize, num_vars: 6, num_constraints: 12, objective: 21.0 }


[[0, 5], [1, 2], [2, 3], [4, 5], [3, 4], [0, 1]]
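The solver's objective (21.0) appears to equal the total weight of the returned cycle, which suggests the L1 objective weights each edge by its filtration value; that reading is an assumption, not something the output states. A quick check in plain Python, with `weighted_length` as a hypothetical helper:

```python
# same weighted adjacency matrix as above (0 = no edge)
ADJ = [[0, 1, 0, 0, 0, 6],
       [1, 0, 5, 8, 7, 0],
       [0, 5, 0, 4, 0, 0],
       [0, 8, 4, 0, 2, 0],
       [0, 7, 0, 2, 0, 3],
       [6, 0, 0, 0, 3, 0]]

def weighted_length(edges, weights):
    """Sum of the edge weights (filtration values) over a list of edges."""
    return sum(weights[u][v] for u, v in edges)

optimal_edges = [[0, 5], [1, 2], [2, 3], [4, 5], [3, 4], [0, 1]]  # output above
print(weighted_length(optimal_edges, ADJ))                     # 21
print(weighted_length([[0, 1], [1, 4], [4, 5], [0, 5]], ADJ))  # 17
```

The lighter loop 0-1-4-5 (weight 17) is presumably not chosen because edge (1, 4) only appears at threshold 7, after this class is born at 6, and the "preserve PH basis" problem presumably only uses simplices that exist by the cycle's birth.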