## Reading custom datasets and building a Simplicial Complex

In this tutorial, we show how to read data stored in several formats and build a simplicial complex from it.

### CSV and TNTP formats

The code below reads a CSV file containing the edges of a graph, identifies the source and target node of each edge, extracts an edge feature (distance), and converts the data into a simplicial complex. The summary gives a quick overview of the resulting simplicial complex: the number of nodes, edges, and triangles (2-simplices), its shape, and its maximum dimension.

```python
from pytspl import read_csv

PAPER_DATA_FOLDER = "pytspl/data/paper_data"

filename = f"{PAPER_DATA_FOLDER}/edges.csv"
delimiter = " "
src_col = "Source"
dest_col = "Target"
feature_cols = ["Distance"]

# read the edge list from the CSV file and build the SC from all triangles
sc = read_csv(
    filename=filename,
    delimiter=delimiter,
    src_col=src_col,
    dest_col=dest_col,
    feature_cols=feature_cols,
).to_simplicial_complex(condition="all")

sc.print_summary()
```
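$\texttt{read\_csv}$ hides the parsing step. As a rough illustration of what "identifying source and target nodes and extracting a feature" means, here is a minimal standard-library sketch. It is not PyTSPL's implementation: the inline sample, the helper name $\texttt{read\_edges}$, and the assumption that node ids are integers are all hypothetical.

```python
import csv
import io

# Hypothetical sample in the layout that edges.csv is assumed to have:
# space-delimited, with "Source", "Target" and "Distance" columns.
SAMPLE = """Source Target Distance
0 1 1.0
0 2 2.0
1 2 1.0
"""


def read_edges(text, delimiter=" ", src_col="Source", dest_col="Target",
               feature_cols=("Distance",)):
    """Return the edge list and a dict of per-edge features."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    edges = []
    features = {}
    for row in reader:
        # Node ids are assumed to be integers.
        edge = (int(row[src_col]), int(row[dest_col]))
        edges.append(edge)
        # Keep only the requested feature columns, converted to float.
        features[edge] = {col: float(row[col]) for col in feature_cols}
    return edges, features


edges, features = read_edges(SAMPLE)
# The node set is the union of all edge endpoints.
nodes = sorted({node for edge in edges for node in edge})
print(nodes)             # [0, 1, 2]
print(edges)             # [(0, 1), (0, 2), (1, 2)]
print(features[(0, 2)])  # {'Distance': 2.0}
```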


Similarly, data in TNTP format can be read with the $\texttt{read\_tntp}$ function.
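For orientation, TNTP network files (as distributed for transportation research benchmarks) typically start with metadata tags such as `<NUMBER OF NODES>`, end the metadata with `<END OF METADATA>`, and then list one link per line, with a column header line starting with `~` and each row terminated by `;`. The sketch below parses such a link table with the standard library only. The file layout is our assumption about the common TNTP convention, the sample is hypothetical, and $\texttt{parse\_tntp\_links}$ is not how $\texttt{read\_tntp}$ is implemented.

```python
# Hypothetical, shortened excerpt of a TNTP network file; real files carry
# more metadata tags and more link columns.
SAMPLE_TNTP = """<NUMBER OF NODES> 3
<NUMBER OF LINKS> 3
<END OF METADATA>

~ init_node term_node capacity length ;
\t1\t2\t100.0\t6.0\t;
\t1\t3\t200.0\t4.0\t;
\t2\t3\t150.0\t5.0\t;
"""


def parse_tntp_links(text):
    """Parse the link table of a TNTP network file into a list of dicts."""
    lines = text.splitlines()
    # Everything up to and including <END OF METADATA> is metadata.
    end = next(i for i, line in enumerate(lines)
               if line.strip() == "<END OF METADATA>")
    header = []
    links = []
    for line in lines[end + 1:]:
        line = line.strip()
        if not line:
            continue
        if line.startswith("~"):
            # Column header: drop the leading "~" and the trailing ";".
            header = line.lstrip("~").rstrip(";").split()
            continue
        # Data row: whitespace-separated values terminated by ";".
        values = line.rstrip(";").split()
        links.append(dict(zip(header, values)))
    return links


links = parse_tntp_links(SAMPLE_TNTP)
edges = [(int(link["init_node"]), int(link["term_node"])) for link in links]
print(edges)  # [(1, 2), (1, 3), (2, 3)]
```

Values are kept as strings so the caller decides which columns become numeric features.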

### Incidence matrices

Data can also be read directly from the node-to-edge incidence matrix $\textbf{B}_1$ and the edge-to-triangle incidence matrix $\textbf{B}_2$. The triangles (2-simplices) are extracted from $\textbf{B}_2$.

```python
from pytspl import read_B1_B2

B1_filename = "pytspl/data/paper_data/B1.csv"
B2_filename = "pytspl/data/paper_data/B2t.csv"

# read the incidence matrices and extract the triangles from B2
scbuilder, triangles = read_B1_B2(
    B1_filename=B1_filename,
    B2_filename=B2_filename,
)
```
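To see where the triangles come from: each column of $\textbf{B}_2$ marks the three edges on the boundary of one triangle, and the endpoints of those edges, read off the corresponding columns of $\textbf{B}_1$, are the triangle's nodes. The plain-Python sketch below assumes $\textbf{B}_1$ has one row per node and one column per edge, and $\textbf{B}_2$ one row per edge and one column per triangle (we have not checked whether $\texttt{B2t.csv}$ stores the transpose). The toy matrices and helper names are hypothetical, and this is not PyTSPL's implementation. The product $\textbf{B}_1\textbf{B}_2 = \textbf{0}$ is the standard consistency check that the boundary of a boundary vanishes.

```python
# Toy oriented complex: nodes 0, 1, 2; edges e0=(0,1), e1=(0,2), e2=(1,2);
# one triangle [0, 1, 2]. Rows of B1 are nodes, columns are edges.
B1 = [
    [-1, -1, 0],  # node 0
    [1, 0, -1],   # node 1
    [0, 1, 1],    # node 2
]
# Rows of B2 are edges, columns are triangles.
# boundary([0, 1, 2]) = e(1, 2) - e(0, 2) + e(0, 1)
B2 = [
    [1],   # e0 = (0, 1)
    [-1],  # e1 = (0, 2)
    [1],   # e2 = (1, 2)
]


def triangles_from_incidence(B1, B2):
    """Recover each triangle's node set from the incidence matrices."""
    n_nodes, n_edges, n_tris = len(B1), len(B2), len(B2[0])
    triangles = []
    for t in range(n_tris):
        # Edges on the boundary of triangle t are non-zero in column t of B2.
        tri_edges = [e for e in range(n_edges) if B2[e][t] != 0]
        # The endpoints of an edge are the non-zero rows of its B1 column.
        nodes = {v for e in tri_edges for v in range(n_nodes) if B1[v][e] != 0}
        triangles.append(tuple(sorted(nodes)))
    return triangles


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


print(triangles_from_incidence(B1, B2))  # [(0, 1, 2)]
print(matmul(B1, B2))                    # [[0], [0], [0]]
```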

Now we can build the simplicial complex (SC) from the extracted triangles.

```python
# build the SC using the triangles
sc = scbuilder.to_simplicial_complex(triangles=triangles)
sc.print_summary()
```

```
Num. of nodes: 7
Num. of edges: 10
Num. of triangles: 3
Shape: (7, 10, 3)
Max Dimension: 2
```
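In the summary, $\texttt{Shape: (7, 10, 3)}$ repeats the counts of 0-, 1- and 2-simplices (7 nodes, 10 edges, 3 triangles), and $\texttt{Max Dimension: 2}$ reflects that triangles are the largest simplices present. The sketch below computes the same fields for a hypothetical toy complex and adds a closure check: in a simplicial complex, every edge of a triangle must itself be an edge of the complex. The helper names are ours, not PyTSPL's.

```python
from itertools import combinations


def summarize(nodes, edges, triangles):
    """Count k-simplices for k = 0, 1, 2 and find the maximum dimension."""
    shape = (len(nodes), len(edges), len(triangles))
    # The maximum dimension is the largest k with at least one k-simplex.
    max_dim = max(k for k, count in enumerate(shape) if count > 0)
    return shape, max_dim


def is_closed(edges, triangles):
    """Check that every edge of every triangle is itself in the complex."""
    edge_set = {tuple(sorted(edge)) for edge in edges}
    return all(pair in edge_set
               for tri in triangles
               for pair in combinations(sorted(tri), 2))


# Hypothetical toy complex: a triangle [0, 1, 2] plus a dangling edge (2, 3).
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]

print(summarize(nodes, edges, triangles))  # ((4, 4, 1), 2)
print(is_closed(edges, triangles))         # True
print(is_closed(edges, [(1, 2, 3)]))       # False: edge (1, 3) is missing
```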


### Building an SC

This section covers two ways to build an SC with $\texttt{PyTSPL}$. The triangle-based method finds all the triangles in the graph and treats each of them as a 2-simplex. The distance-based method also finds all the triangles, but keeps only those in which the distances between the nodes are below a threshold $\epsilon$. In $\texttt{to\_simplicial\_complex}$, these correspond to $\texttt{condition="all"}$ and $\texttt{condition="distance"}$, respectively. By default, when a dataset is loaded with the $\texttt{load\_dataset}$ function, the SC is built with the triangle-based method.
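The triangle-based method amounts to enumerating the 3-cliques of the graph. A minimal sketch of one common way to do that, assuming an undirected edge list: for each edge $(u, v)$, every common neighbour $w$ of $u$ and $v$ closes a triangle. The edge list and the helper $\texttt{find\_triangles}$ are hypothetical and not PyTSPL's code.

```python
def find_triangles(edges):
    """Return every 3-clique of an undirected graph as a sorted node tuple."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # Any common neighbour w of u and v closes the triangle (u, v, w).
        for w in adjacency[u] & adjacency[v]:
            triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)


# Hypothetical graph: two triangles sharing edge (1, 2), plus edge (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
print(find_triangles(edges))  # [(0, 1, 2), (1, 2, 3)]
```

Storing each triangle as a sorted tuple in a set removes the duplicates that arise because every triangle is discovered once per edge.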

In this first example, we build the simplicial complex by finding all the triangles ($\texttt{condition="all"}$).

```python
sc = read_csv(
    filename=filename,
    delimiter=delimiter,
    src_col=src_col,
    dest_col=dest_col,
    feature_cols=feature_cols,
).to_simplicial_complex(condition="all")

sc.print_summary()
```

```
Num. of nodes: 7
Num. of edges: 10
Num. of triangles: 3
Shape: (7, 10, 3)
Max Dimension: 2
```


In this second example, we build the simplicial complex with the distance-based method, using the $\texttt{Distance}$ column and a threshold of $\epsilon = 1.5$ ($\texttt{dist\_threshold}$). The distance-based method yields one fewer triangle (2-simplex): 2 instead of 3.

```python
sc = read_csv(
    filename=filename,
    delimiter=delimiter,
    src_col=src_col,
    dest_col=dest_col,
    feature_cols=feature_cols,
).to_simplicial_complex(
    condition="distance",
    dist_col_name="Distance",
    dist_threshold=1.5,
)

sc.print_summary()
```

```
Num. of nodes: 7
Num. of edges: 10
Num. of triangles: 2
Shape: (7, 10, 2)
Max Dimension: 2
```
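One way to read the distance-based condition is as a filter applied after triangle enumeration. The sketch below keeps a triangle only if all three of its edge distances are strictly below the threshold. That rule is our assumption for illustration; PyTSPL's $\texttt{condition="distance"}$ may combine the distances differently (for example, with $\le$ or a sum). The distances and the helper name are hypothetical.

```python
from itertools import combinations


def filter_by_distance(triangles, distances, threshold):
    """Keep triangles whose three edge distances are all below the threshold.

    The "all edges below the threshold" rule is an assumption made for
    illustration, not necessarily PyTSPL's definition.
    """
    kept = []
    for tri in triangles:
        # Edges of the triangle as sorted node pairs, e.g. (1, 2, 3) gives
        # (1, 2), (1, 3) and (2, 3).
        tri_edges = combinations(sorted(tri), 2)
        if all(distances[edge] < threshold for edge in tri_edges):
            kept.append(tri)
    return kept


# Hypothetical edge distances, keyed by sorted node pairs.
distances = {
    (0, 1): 1.0,
    (0, 2): 1.0,
    (1, 2): 1.0,
    (1, 3): 2.0,
    (2, 3): 1.0,
}
triangles = [(0, 1, 2), (1, 2, 3)]

print(filter_by_distance(triangles, distances, threshold=1.5))  # [(0, 1, 2)]
```

Here triangle $(1, 2, 3)$ is dropped because its edge $(1, 3)$ has distance $2.0 \ge 1.5$.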


### Reading coordinates and edge flow from data

Node coordinates and edge flows can also be read from custom datasets with $\texttt{read\_coordinates}$ and $\texttt{read\_flow}$.

```python
from pytspl.io.network_reader import read_coordinates, read_flow

# load coordinates
coordinates_path = "pytspl/data/paper_data/coordinates.csv"

coordinates = read_coordinates(
    filename=coordinates_path,
    node_id_col="Id",
    x_col="X",
    y_col="Y",
    delimiter=" ",
)

print(coordinates)
```

```
{0: (0, 0.0), 1: (1, -0.5), 2: (0, -1.0), 3: (-1, -0.5), 4: (-1, -2.5), 5: (0, -2.0), 6: (1, -2.5)}
```
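The result maps each node id to an $(x, y)$ position. A minimal standard-library sketch of the same mapping, assuming a space-delimited file with the $\texttt{Id}$, $\texttt{X}$ and $\texttt{Y}$ columns named in the call above; the sample contents are hypothetical, built from the first three entries of the printed output, and $\texttt{read\_coords}$ is our own helper. Unlike the output above, which keeps integer x-values as integers, this sketch converts both coordinates to floats.

```python
import csv
import io

# Hypothetical file contents using the column names from the call above.
SAMPLE = """Id X Y
0 0 0.0
1 1 -0.5
2 0 -1.0
"""


def read_coords(text, node_id_col="Id", x_col="X", y_col="Y", delimiter=" "):
    """Map each node id to its (x, y) position."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {
        int(row[node_id_col]): (float(row[x_col]), float(row[y_col]))
        for row in reader
    }


coords = read_coords(SAMPLE)
print(coords)  # {0: (0.0, 0.0), 1: (1.0, -0.5), 2: (0.0, -1.0)}
```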


```python
# load flow
flow_path = "pytspl/data/paper_data/flow.csv"
flow = read_flow(filename=flow_path, header=None)

print(flow.to_dict())
```

```
{0: {0: 2.25, 1: 0.13, 2: 1.72, 3: -2.12, 4: 1.59, 5: 1.08, 6: -0.3, 7: -0.21, 8: 1.25, 9: 1.45}}
```
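The flow has one value per row, and there are 10 rows, matching the 10 edges of the SC. A minimal sketch of attaching such values to edges, under two assumptions we have not verified against PyTSPL: row $i$ of the flow file belongs to edge $i$ of the edge list, and, as is usual for edge flows on oriented edges, a negative value means flow against the edge's reference orientation. The flow values, edge list and helper names are hypothetical.

```python
# Hypothetical headerless flow file: one value per line, one line per edge.
SAMPLE_FLOW = "2.0\n0.5\n-1.0\n"
# Hypothetical edge list; row i of the flow file is assumed to be edge i.
EDGES = [(0, 1), (0, 2), (1, 2)]


def read_flow_values(text):
    """Parse one float per non-empty line."""
    return [float(line) for line in text.splitlines() if line.strip()]


def flows_by_edge(edges, values):
    """Pair each edge with its flow value, by position."""
    if len(edges) != len(values):
        raise ValueError("expected exactly one flow value per edge")
    return dict(zip(edges, values))


def to_positive_direction(flows):
    """Flip edges with negative flow so every value is non-negative.

    A flow of -f on edge (u, v) describes the same flow as +f on (v, u).
    """
    return {((v, u) if f < 0 else (u, v)): abs(f) for (u, v), f in flows.items()}


flows = flows_by_edge(EDGES, read_flow_values(SAMPLE_FLOW))
print(flows)                         # {(0, 1): 2.0, (0, 2): 0.5, (1, 2): -1.0}
print(to_positive_direction(flows))  # {(0, 1): 2.0, (0, 2): 0.5, (2, 1): 1.0}
```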
