## My First Cycle Rep
Getting an early cycle representative out of the concept network!

The parameters are mostly arbitrary, and this isn't final in any way, but it's cool.

#### Preliminaries
The network threshold is 10x what Jingyi suggests in her paper.

In [1]:
# load some packages
import Gavin.utils.make_network as mn
import plotly.graph_objects as go
from time import time
import networkx as nx
import pandas as pd
import oatpy as oat
import numpy as np

# config
DATA_PATH = 'datasets/concept_network/'
CONCEPT_FILE = 'articles_category_for_2l_abstracts_concepts_processed_v1_EX_102.csv.gz' # Applied Mathematics
article_concept_df = mn.filter_article_concept_file(
        DATA_PATH+CONCEPT_FILE,
        relevance_cutoff=0.7,
        min_article_freq=0.0006, # 0.06%
        max_article_freq=0.005, # 0.5%
        normalize_year=True,
        year_min=1920
    ) # use a filtered data file to make the samples

In [3]:
article_concept_df['concept'].nunique()

284

#### Problem Setup
Take the concept-article dataframe and turn it into a network, then into a weighted adjacency matrix that serves as the dissimilarity matrix.

In [49]:
G = mn.gen_concept_network(article_concept_df) # make the graph
adj = nx.adjacency_matrix(G, weight='norm_year') # adjacency matrix
# node_births = np.array(list(nx.get_node_attributes(G, 'norm_year').values())) # node origin years; these break the cycle reps (idk why)
# adj.setdiag(node_births)
adj.setdiag(0)
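
The network step can be pictured with a small stand-alone sketch. It is not the code behind `mn.gen_concept_network`, which isn't shown here. It assumes that two concepts are linked when they appear in the same article and that the edge weight is the earliest year of co-occurrence. The toy rows and the helpers `first_cooccurrence` and `to_matrix` are made up for illustration, and raw years are used instead of `norm_year`.

```python
from itertools import combinations

# Hypothetical toy rows (article_id, concept, year); not taken from the real dataset.
rows = [
    ("a1", "gradient method", 1950),
    ("a1", "state equation", 1950),
    ("a2", "gradient method", 1940),
    ("a2", "state equation", 1940),
    ("a2", "algebraic equation", 1940),
    ("a3", "algebraic equation", 1960),
]

def first_cooccurrence(rows):
    """Map each concept pair to the earliest year the pair shares an article (assumed weighting)."""
    articles = {}
    for article, concept, year in rows:
        articles.setdefault(article, (year, set()))[1].add(concept)
    edge_year = {}
    for year, concepts in articles.values():
        for u, v in combinations(sorted(concepts), 2):
            edge_year[(u, v)] = min(year, edge_year.get((u, v), year))
    return edge_year

def to_matrix(edge_year):
    """Build a symmetric matrix; None marks a missing edge, the diagonal is 0."""
    nodes = sorted({c for pair in edge_year for c in pair})
    index = {c: k for k, c in enumerate(nodes)}
    matrix = [[None] * len(nodes) for _ in nodes]
    for (u, v), year in edge_year.items():
        matrix[index[u]][index[v]] = matrix[index[v]][index[u]] = year
    for k in range(len(nodes)):
        matrix[k][k] = 0  # same role as adj.setdiag(0) in the cell above
    return nodes, matrix

edge_year = first_cooccurrence(rows)
nodes, matrix = to_matrix(edge_year)
```

The zero diagonal means every node is present from the start of the filtration, which is what `adj.setdiag(0)` does in the cell above.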

#### Homology Calculation
We set up and compute homology, then make some basic visualizations.

In [50]:
start = time()

# setup the problem
factored = oat.rust.FactoredBoundaryMatrixVr( # two functions that do this, idk what the other one is
        dissimilarity_matrix=adj,
        homology_dimension_max=2
    )

# solve homology
homology = factored.homology(
        return_cycle_representatives=True, # both flags need to be True to make a barcode; they make the problem take ~30% longer (1:30ish)
        return_bounding_chains=True
    )

f'Homology calculation took {time() - start} secs'

'Homology calculation took 5.411086082458496 secs'
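
For intuition about what a Vietoris-Rips construction filters, here is a minimal pure-Python sketch. It is not how `oatpy` works internally. It assumes a simplex enters the filtration at the largest pairwise dissimilarity among its vertices, that vertices enter at 0, and that pairs with no recorded weight never connect. The helper `rips_filtration` and the toy weights are made up for illustration.

```python
from itertools import combinations

def rips_filtration(n_points, edge_weight, max_dim=2):
    """Return (birth, simplex) pairs sorted by birth value.

    edge_weight maps (i, j) with i < j to a dissimilarity.
    Pairs that are absent from edge_weight are assumed to never connect.
    """
    simplices = [(0.0, (i,)) for i in range(n_points)]  # vertices are born at 0
    for dim in range(1, max_dim + 1):
        for simplex in combinations(range(n_points), dim + 1):
            pairs = list(combinations(simplex, 2))
            if all(p in edge_weight for p in pairs):
                # a simplex appears once its longest edge has appeared
                birth = max(edge_weight[p] for p in pairs)
                simplices.append((birth, simplex))
    # ties are broken by dimension so faces come before the simplices they bound
    return sorted(simplices, key=lambda s: (s[0], len(s[1]), s[1]))

weights = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 3.0, (2, 3): 1.5}
filtration = rips_filtration(4, weights)
for birth, simplex in filtration:
    print(birth, simplex)
```

In this toy example the triangle `(0, 1, 2)` enters at 3.0, the moment its last edge `(0, 2)` appears, which is when the loop through nodes 0, 1 and 2 gets filled in.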

In [51]:
# persistence diagram
fig = oat.plot.pd(homology)
fig.update_layout(
        width=600, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

In [52]:
# Barcode diagram
fig = oat.plot.barcode(homology)
fig.update_layout(
        width=1000, 
        height=500,
        margin=dict(l=20, r=20, t=20, b=20)
    )
fig.show()

#### Cycle Rep
Find a cycle rep!

In [101]:
## Representative 2D Hole
# index of cycle to optimize in homology dataframe
i = 1117 # i think this is a cool one

# optimization problem
start = time()
optimal = factored.optimize_cycle(
        birth_simplex=homology['birth simplex'][i], 
        problem_type='preserve PH basis'
    )
print(f'Optimization took {time() - start}')
optimal_edge_indexes = optimal['chain']['optimal cycle']['simplex'].tolist() # edges of the optimal cycle, as pairs of node indexes
optimal_node_indexes = pd.to_numeric(optimal['chain']['optimal cycle']['simplex'].explode().drop_duplicates()).tolist() # nodes in optimal cycle
optimal_edges = list(optimal['chain']['optimal cycle']['simplex'].apply(lambda simplex: list(np.array(G.nodes)[simplex]))) # same edges, as pairs of concept names
optimal_nodes = list(np.array(G.nodes)[optimal_node_indexes]) # concept names of the nodes in the cycle
optimal_edges, optimal_nodes


Finished construcing L1 optimization program.
Constraint matrix has 176 nonzero entries.
Passing program to solver.
Optimization took 0.018126964569091797

Done solving.
MINILP solution: Solution { direction: Minimize, num_vars: 129, num_constraints: 182, objective: 5.316831683168317 }


([['algebraic equation', 'first order partial differential equation'],
  ['discrete time dynamical system', 'state equation'],
  ['discrete time dynamical system', 'stochastic approximation algorithm'],
  ['finite difference approximation', 'state equation'],
  ['gradient method', 'stochastic approximation algorithm'],
  ['algebraic equation', 'matrix riccati equation'],
  ['finite difference approximation', 'order partial differential equation'],
  ['distribute parameter systems', 'matrix riccati equation'],
  ['distribute parameter systems', 'gradient method'],
  ['first order partial differential equation',
   'order partial differential equation']],
 ['algebraic equation',
  'first order partial differential equation',
  'discrete time dynamical system',
  'state equation',
  'stochastic approximation algorithm',
  'finite difference approximation',
  'gradient method',
  'matrix riccati equation',
  'order partial differential equation',
  'distribute parameter systems'])
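
As a quick sanity check, the edge list above should trace a single closed loop, so every concept should touch exactly two edges. The sketch below copies the ten edges from the output and walks them in order. `order_cycle` is a made-up helper for this check, not part of `oatpy`.

```python
from collections import defaultdict

# Edges copied from the optimal_edges output above.
optimal_edges = [
    ['algebraic equation', 'first order partial differential equation'],
    ['discrete time dynamical system', 'state equation'],
    ['discrete time dynamical system', 'stochastic approximation algorithm'],
    ['finite difference approximation', 'state equation'],
    ['gradient method', 'stochastic approximation algorithm'],
    ['algebraic equation', 'matrix riccati equation'],
    ['finite difference approximation', 'order partial differential equation'],
    ['distribute parameter systems', 'matrix riccati equation'],
    ['distribute parameter systems', 'gradient method'],
    ['first order partial differential equation', 'order partial differential equation'],
]

def order_cycle(edges):
    """Return the nodes of a simple cycle in traversal order, or raise if the edges are not one loop."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    if any(len(adjacent) != 2 for adjacent in neighbors.values()):
        raise ValueError('not a simple cycle: some node does not have degree 2')
    start = edges[0][0]
    loop, prev, node = [start], None, start
    while True:
        a, b = neighbors[node]
        nxt = b if a == prev else a  # step away from the node we just came from
        if nxt == start:
            break
        loop.append(nxt)
        prev, node = node, nxt
    if len(loop) != len(neighbors):
        raise ValueError('edges form more than one loop')
    return loop

loop = order_cycle(optimal_edges)
print(' -> '.join(loop + [loop[0]]))
```

Walking the loop this way also gives an ordering of the concepts around the cycle, which is easier to read than the unordered edge list.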