**GraphReader:**

- `read_and_clean_graph`: Reads a graph from a GraphML file, assigns node IDs, cleans up attribute names, and prints summary information about the graph.

**LayoutUtility:**

- `fr_layout_nx`: Performs a Fruchterman-Reingold layout on the graph using NetworkX. It allows for customization of layout parameters and prints information about the layout process.

**ZCoordinateAdder:**

- `add_z_coordinate_to_nodes`: Adds a z-coordinate to each node in the graph based on its centrality value. It calculates the bounds of the x and y coordinates, normalizes centrality values, scales the z-coordinates, and adds them to the nodes.

**GraphBundler2d:**

- `prune_edges_by_percentile_weight`: Removes edges with weights below a specified percentile threshold.
  bundle_edges: Performs edge bundling using the Hammer Bundle algorithm. It first prunes edges and then performs bundling with user-defined parameters. It also groups the bundled edges by edge ID and includes source and target positions.

**EdgeZInterpolator:**

- `interpolate_z_to_edges`: Interpolates z-coordinates for each edge in a DataFrame using cubic spline interpolation. It considers the source and target node z-coordinates and the edge path to assign z-coordinates to all points along the bundled edge.

**Apply3DEdgeBundling:**

- `apply_3d_bundling`: Applies 3D edge bundling to a graph's edges. It utilizes neighbor information, forces, and smoothing to create a bundled representation of edges in 3D space. It allows for customization of various parameters like the number of iterations, step size, smoothing iterations, and neighbor radius.

**GraphSaver:**

- `save_igraph_nodes_to_json`: Saves node data from an igraph graph to a JSON file. It offers options to return the JSON string as well as specify which node attributes to include.


# Imports


In [3]:
# Standard library imports
import sys
import json
import random
import time
from collections import defaultdict
from multiprocessing import Pool, cpu_count
from typing import Dict, List, Optional, Tuple, Union
import logging

# Data manipulation and analysis
import numpy as np
import pandas as pd

# Disable SettingWithCopyWarning
pd.options.mode.chained_assignment = None

# Graphs and networks
import networkx as nx
import igraph as ig

# Visualization
import matplotlib.pyplot as plt
import colorcet as cc
from matplotlib.colors import to_hex, to_rgb

# Data visualization and processing
import datashader as ds
import datashader.transfer_functions as tf
from datashader.bundling import hammer_bundle

# Scientific computing
from scipy.spatial import cKDTree
from scipy.interpolate import CubicSpline, interp1d


# Progress bars
from tqdm.notebook import tqdm

# Performance optimization
from numba import jit, prange

# Custom modules
from fa2_modified import ForceAtlas2

# Warnings
import warnings

In [4]:
# 2. Constants and configuration
INPUT_GRAPH_PATH = "../data/07-clustered-graphs/alpha0.3_k10_res0.002.graphml"
CLUSTER_INFO_LABEL_TREE = "../output/cluster-qualifications/ClusterInfoLabelTree.xlsx"
CLUSTER_LABEL_DICT_PATH = "../data/99-testdata/cluster_label_dict.json"
CLUSTER_TREE_PATH = "../output/cluster-qualifications/ClusterHierachy_noComments.json"
OUTPUT_DIR = "../data/99-testdata/"
THREEJS_OUTPUT_DIR = (
    "/Users/jlq293/Projects/Random Projects/LW-ThreeJS/2d_ssrinetworkviz/src/data/"
)
CLUSTER_HIERACHY_FOR_LEGEND_PATH = (
    "../output/cluster-qualifications/ClusterHierachy_noComments.json"
)

# utility functions


## Graph Reader


In [5]:
class GraphReader:
    @staticmethod
    def read_and_clean_graph(path: str) -> ig.Graph:
        g = ig.Graph.Read_GraphML(path)
        g.vs["node_id"] = [int(i) for i in range(g.vcount())]

        if "id" in g.vs.attribute_names():
            g.vs["node_name"] = g.vs["id"]
            del g.vs["id"]

        if "cluster" in g.vs.attribute_names():
            g.vs["cluster"] = [int(cluster) for cluster in g.vs["cluster"]]

        if "year" in g.vs.attribute_names():
            g.vs["year"] = [int(year) for year in g.vs["year"]]

        if "eid" in g.vs.attribute_names():
            del g.vs["eid"]

        if "centrality_alpha0.3_k10_res0.006" in g.vs.attribute_names():
            del g.vs["centrality_alpha0.3_k10_res0.006"]

        if "centrality_alpha0.3_k10_res0.002" in g.vs.attribute_names():
            g.vs["centrality"] = g.vs["centrality_alpha0.3_k10_res0.002"]
            del g.vs["centrality_alpha0.3_k10_res0.002"]

        g.es["edge_id"] = list(range(g.ecount()))
        print("Node Attributes:", g.vs.attribute_names())
        print("Edge Attributes:", g.es.attribute_names())
        # print number of nodes and edges
        print(f"Number of nodes: {g.vcount()}")
        print(f"Number of edges: {g.ecount()}")
        return g

    @staticmethod
    def subgraph_of_clusters(G, clusters):
        if isinstance(G, nx.Graph):
            nodes = [
                node for node in G.nodes if G.nodes[node].get("cluster") in clusters
            ]
            return G.subgraph(nodes)
        elif isinstance(G, ig.Graph):
            nodes = [v.index for v in G.vs if v["cluster"] in clusters]
            return G.subgraph(nodes)
        else:
            raise TypeError("Input must be a NetworkX Graph or an igraph Graph")

    @staticmethod
    def add_cluster_labels(
        G: Union[nx.Graph, ig.Graph],
        labels_file_path: str = "../output/cluster-qualifications/raw_cluster_labels.json",
    ) -> Tuple[Union[nx.Graph, ig.Graph], Dict[float, str]]:
        """
        Add cluster labels to the graph nodes.

        Args:
            G (Union[nx.Graph, ig.Graph]): The input graph (NetworkX or igraph).
            labels_file_path (str): Path to the JSON file containing cluster labels.

        Returns:
            Tuple[Union[nx.Graph, ig.Graph], Dict[float, str]]:
                The graph with added cluster labels and the cluster label dictionary.
        """
        with open(labels_file_path) as file:
            cluster_label_dict = json.load(file)
        cluster_label_dict = {float(k): v[0] for k, v in cluster_label_dict.items()}

        if isinstance(G, nx.Graph):
            for node in G.nodes:
                cluster = G.nodes[node]["cluster"]
                G.nodes[node]["cluster_label"] = cluster_label_dict.get(
                    cluster, "Unknown"
                )
        elif isinstance(G, ig.Graph):
            G.vs["cluster_label"] = [
                cluster_label_dict.get(v["cluster"], "Unknown") for v in G.vs
            ]
        else:
            raise TypeError("Input must be a NetworkX Graph or an igraph Graph")

        return G, cluster_label_dict

## LayoutUtility


In [6]:
import time
import networkx as nx
import igraph as ig
from typing import Union, Dict, Tuple, Optional


class LayoutUtility:
    """
    Layout utility class for igraph layout operations. made for fruchterman-reingold layout.

    Args:
        g (Union[nx.Graph, ig.Graph]): The input graph (NetworkX or igraph).
        layout_params (Optional[Dict]): The layout parameters.

    Returns:
        Tuple[nx.Graph, Dict]: The graph with assigned coordinates and the layout dictionary.
    """

    @staticmethod
    def fr_layout_nx(
        g: Union[nx.Graph, ig.Graph], layout_params: Optional[Dict] = None
    ) -> Tuple[nx.Graph, Dict]:
        print("Starting Fruchterman-Reingold layout process...")
        start_time = time.time()

        if layout_params is None:
            layout_params = {
                "iterations": 100,
                "threshold": 0.00001,
                "weight": "weight",
                "scale": 1,
                "center": (0, 0),
                "dim": 2,
                "seed": 1887,
            }
        print(f"Layout parameters: {layout_params}")

        if not isinstance(g, nx.Graph):
            print("Converting to NetworkX Graph...")
            G = g.to_networkx()
            print("Conversion complete.")
        else:
            G = g

        print(f"Graph has {G.number_of_nodes()} nodes and {G.number_of_edges()} edges.")

        print("Calculating layout...")
        layout_start_time = time.time()
        pos = nx.spring_layout(G, **layout_params)
        layout_end_time = time.time()
        print(
            f"Layout calculation completed in {layout_end_time - layout_start_time:.2f} seconds."
        )

        print("Processing layout results...")
        node_xy_dict = {node: pos[node] for node in G.nodes}

        x_values, y_values = zip(*node_xy_dict.values())
        min_x, max_x = min(x_values), max(x_values)
        min_y, max_y = min(y_values), max(y_values)

        print(f"Layout boundaries:")
        print(f"X-axis: Min = {min_x:.2f}, Max = {max_x:.2f}")
        print(f"Y-axis: Min = {min_y:.2f}, Max = {max_y:.2f}")

        print("Assigning coordinates to nodes...")
        for node in G.nodes:
            G.nodes[node]["x"] = node_xy_dict[node][0]
            G.nodes[node]["y"] = node_xy_dict[node][1]

        end_time = time.time()
        total_time = end_time - start_time
        print(f"Layout process completed in {total_time:.2f} seconds.")

        return G, pos


# Usage example:
# g = ... # your graph object
# G, pos = LayoutUtility.fr_layout_nx(g)

## Z Coordinate Adder

adds a z coodrinate bases on the centrality of the node


In [7]:
class ZCoordinateAdder:
    def __init__(self, g, scale_factor=0.15):
        self.g = g
        self.scale_factor = scale_factor

    def add_z_coordinate_to_nodes(self):
        """
        Add a z-coordinate to the nodes of the graph based on their centrality values.

        Args:
            g (nx.Graph): The input graph.
            scale_factor (float): The scaling factor for the z-coordinates. (they should not be as spread out as x and y)

        Returns:
            nx.Graph: The graph with the z-coordinate added to the nodes.
        """
        # Calculate the bounds of x and y coordinates
        # Assuming self.g is a NetworkX graph
        xvalues = [attributes["x"] for _, attributes in self.g.nodes(data=True)]
        yvalues = [attributes["y"] for _, attributes in self.g.nodes(data=True)]
        min_x, max_x = min(xvalues), max(xvalues)
        min_y, max_y = min(yvalues), max(yvalues)

        print("Bounds of the layout:")
        print(f"Min x: {min_x}, Max x: {max_x}")
        print(f"Min y: {min_y}, Max y: {max_y}")

        # Extract centrality values from nodes
        centralities = np.array(
            [self.g.nodes[node]["centrality"] for node in self.g.nodes]
        )

        # Normalize centrality values to range [0, 1]
        centrality_min = centralities.min()
        centrality_max = centralities.max()
        centralities_normalized = (centralities - centrality_min) / (
            centrality_max - centrality_min
        )

        # Adjust normalized centrality values to range [-1, 1]
        centralities_adjusted = centralities_normalized * 2 - 1

        # Scale down the z-values to make them less pronounced
        z_coordinates = centralities_adjusted * self.scale_factor

        # Add z-coordinate to nodes
        for i, node in enumerate(self.g.nodes):
            self.g.nodes[node]["z"] = z_coordinates[i]

        # Describe the distribution of z values
        print("Description of the Z coordinate values:")
        print(pd.Series(z_coordinates).describe())

        print("Z coordinate added to nodes")
        return self.g

## Pruning and 2D Bundling


In [45]:
class GraphBundler2d:
    """
    A class for bundling edges in graphs using the Hammer Bundle algorithm.

    This class supports both igraph and NetworkX graph objects as input.
    """

    def __init__(
        self,
        graph: Union[ig.Graph, nx.Graph],
        pruning_weight_percentile: float = 50,
        bundle_kwargs: Optional[Dict] = None,
    ):
        """
        Initialize the GraphBundler.

        Args:
            graph (Union[ig.Graph, nx.Graph]): The input graph.
            pruning_weight_percentile (float): The percentile to use for pruning edges (default is 50).
            bundle_kwargs (Optional[Dict]): Optional parameters for the bundling algorithm.
        """
        self.graph = self._ensure_igraph(graph)
        self.pruning_weight_percentile = pruning_weight_percentile
        self.bundle_kwargs = bundle_kwargs or {
            "decay": 0.90,
            "initial_bandwidth": 0.10,
            "iterations": 15,
            "include_edge_id": True,
        }
        self.bundled_edges = None

    def _ensure_igraph(self, graph: Union[ig.Graph, nx.Graph]) -> ig.Graph:
        """
        Ensure the input graph is an igraph object.

        Args:
            graph (Union[ig.Graph, nx.Graph]): The input graph.

        Returns:
            ig.Graph: The graph as an igraph object.

        Raises:
            ValueError: If the input is neither an igraph nor a NetworkX graph object.
        """
        if isinstance(graph, ig.Graph):
            return graph
        if isinstance(graph, nx.Graph):
            return ig.Graph.from_networkx(graph)
        raise ValueError(
            "Input graph must be either an igraph or NetworkX graph object."
        )

    def prune_edges_by_percentile_weight(
        self, g: ig.Graph, percentile: float
    ) -> ig.Graph:
        """
        Remove edges from the graph that have weight less than or equal to the specified percentile weight.

        Args:
            g (ig.Graph): The input graph. Must have a 'weight' attribute for edges.
            percentile (float): The percentile to use as the threshold for pruning edges.

        Returns:
            ig.Graph: A new graph with edges removed based on the specified percentile.

        Raises:
            ValueError: If the input graph has no 'weight' attribute for edges.
        """
        # Check if 'weight' attribute exists
        if "weight" not in g.es.attributes():
            raise ValueError("Input graph must have a 'weight' attribute for edges.")

        # Get initial number of edges and isolates
        initial_edge_count = g.ecount()
        initial_isolates = len(g.vs.select(_degree=0))

        # Get all weights and calculate the specified percentile
        weights = g.es["weight"]
        weight_threshold = np.percentile(weights, percentile)

        # Identify edges to keep
        edges_to_keep = [
            edge.index for edge in g.es if edge["weight"] > weight_threshold
        ]
        threshold_edges = [
            edge.index for edge in g.es if edge["weight"] == weight_threshold
        ]

        # Randomly select from threshold edges to reach target number of edges
        target_edge_count = int(initial_edge_count * (1 - percentile / 100))
        edges_to_add = target_edge_count - len(edges_to_keep)
        if edges_to_add > 0:
            random.shuffle(threshold_edges)
            edges_to_keep.extend(threshold_edges[:edges_to_add])

        # Create a new graph with only the selected edges
        g_pruned = g.subgraph_edges(edges_to_keep, delete_vertices=False)

        # Get final number of edges and isolates
        final_edge_count = g_pruned.ecount()
        final_isolates = len(g_pruned.vs.select(_degree=0))

        # Print results
        print(f"Pruning edges by weight percentile: {percentile}%")
        print("-" * 20)
        print(f"Number of edges before: {initial_edge_count}")
        print(f"Number of edges after: {final_edge_count}")
        print(f"Number of isolates before: {initial_isolates}")
        print(f"Number of isolates after: {final_isolates}")

        return g_pruned

    def bundle_edges(self) -> Optional[pd.DataFrame]:
        """
        Perform edge bundling on the graph.

        Returns:
            Optional[pd.DataFrame]: A DataFrame containing the bundled edges,
            or None if an error occurs.
        """
        g_pruned = self.prune_edges_by_percentile_weight(
            self.graph, self.pruning_weight_percentile
        )
        self.graph = g_pruned

        print("Starting edge bundling process...")

        try:
            df_nodes = pd.DataFrame(
                {
                    "x": self.graph.vs["x"],
                    "y": self.graph.vs["y"],
                    "z": self.graph.vs["z"],
                    "cluster": self.graph.vs["cluster"],
                }
            )
            edges_df = pd.DataFrame(
                {
                    "source": [e.source for e in self.graph.es],
                    "target": [e.target for e in self.graph.es],
                    "edge_id": self.graph.es["edge_id"],
                    "weight": self.graph.es["weight"],
                }
            )
            bundled_edges = hammer_bundle(df_nodes, edges_df, **self.bundle_kwargs)
            bundled_edges = pd.DataFrame(
                bundled_edges, columns=["x", "y", "edge_id", "weight"]
            )
            self.bundled_edges = self._group_bundled_edges(bundled_edges)
            return self.bundled_edges, g_pruned
        except Exception as e:
            print(f"An error occurred during edge bundling: {e}")
            return None

    def _group_bundled_edges(self, bundled_edges: pd.DataFrame) -> pd.DataFrame:
        """
        Group the bundled edges by edge_id and include source and target positions.

        Args:
            bundled_edges (pd.DataFrame): DataFrame containing the bundled edges.

        Returns:
            pd.DataFrame: A DataFrame with grouped bundled edges including source and target positions.
        """

        def _get_node_positions(node_id):
            return {
                "x": self.graph.vs[node_id]["x"],
                "y": self.graph.vs[node_id]["y"],
                "z": self.graph.vs[node_id]["z"],
            }

        grouped = bundled_edges.groupby("edge_id")
        result = pd.DataFrame(
            {
                "source": [
                    self.graph.es.find(edge_id=eid).source for eid in grouped.groups
                ],
                "target": [
                    self.graph.es.find(edge_id=eid).target for eid in grouped.groups
                ],
                "x": [group["x"].values for _, group in grouped],
                "y": [group["y"].values for _, group in grouped],
                "weight": grouped["weight"].first(),
            },
            index=grouped.groups.keys(),
        )

        # Convert the DataFrame's index into a column named 'edge_id'
        result.reset_index(inplace=True)
        result.rename(columns={"index": "edge_id"}, inplace=True)

        # Add source and target positions
        result["source_position"] = result["source"].apply(_get_node_positions)
        result["target_position"] = result["target"].apply(_get_node_positions)

        # change source and target values to respective the node_id
        result["source"] = result["source"].apply(lambda x: self.graph.vs[x]["node_id"])
        result["target"] = result["target"].apply(lambda x: self.graph.vs[x]["node_id"])

        return result

## Interpolation of Z coordinates

add z coordinates to the positions of bundled edges using cubic spline interpolation.


In [9]:
import numpy as np
import pandas as pd


class SimpleEdgeZInterpolator:
    def __init__(self, bundled_edges):
        self.bundled_edges = bundled_edges

    def assign_same_z_coordinate_to_all_edge_points(self):
        # Calculate minimum z value more efficiently
        min_z = min(
            self.bundled_edges["source_position"].apply(lambda x: x["z"]).min(),
            self.bundled_edges["target_position"].apply(lambda x: x["z"]).min(),
        )
        print("Minimum z value:", min_z)

        # Vectorized operation to assign z values
        self.bundled_edges["z"] = self.bundled_edges.apply(
            lambda row: self._assign_z(row, min_z), axis=1
        )

        # Reorder columns
        column_order = [
            "edge_id",
            "source",
            "target",
            "x",
            "y",
            "z",
            "weight",
            "source_position",
            "target_position",
        ]
        self.bundled_edges = self.bundled_edges.reindex(columns=column_order)

        return self.bundled_edges

    @staticmethod
    def _assign_z(row, min_z):
        z_values = np.full(len(row["x"]), min_z)
        z_values[0] = row["source_position"]["z"]
        z_values[-1] = row["target_position"]["z"]
        return z_values.tolist()


# example usage
# bundled_edges = ... # your bundled edges DataFrame
# interpolator = SimpleEdgeZInterpolator(bundled_edges)
# bundled_edges = interpolator.assign_same_z_coordinate_to_all_edge_points()

In [10]:
class EdgeZInterpolator:
    def __init__(self, pruned_bundled_edges_2d, graph, centrality_scale_factor=0.2):
        """
        Initialize the EdgeZInterpolator.

        Args:
            pruned_bundled_edges_2d (pd.DataFrame): The DataFrame containing bundled edges.
            graph (ig.Graph): The original graph object.
        """
        self.pruned_bundled_edges_2d = pruned_bundled_edges_2d
        self.graph = graph
        self.adjusted_edges_3d = None

    def interpolate_z_to_edges(self):
        """
        Interpolate z-coordinates for each edge in the pruned_bundled_edges_2d DataFrame.
        """
        self.adjusted_edges_3d = self.pruned_bundled_edges_2d.apply(
            self._interpolate_z, axis=1
        )
        self.adjusted_edges_3d = pd.concat(
            [self.pruned_bundled_edges_2d, self.adjusted_edges_3d], axis=1
        )
        print("Initial Z Coordinates added to edges")
        return self.adjusted_edges_3d

    def _interpolate_z(self, row):
        """
        Interpolate z-coordinates for a single edge.

        Args:
            row (pd.Series): A pandas Series representing a single row in the pruned_bundled_edges_2d DataFrame.

        Returns:
            pd.Series: A pandas Series containing the x, y, and interpolated z-coordinates.
        """
        x = np.array(row["x"])
        y = np.array(row["y"])

        source_z = row["source_position"]["z"]
        target_z = row["target_position"]["z"]

        distances = np.sqrt(np.diff(x) ** 2 + np.diff(y) ** 2)
        cumulative_distances = np.cumsum(distances)

        if cumulative_distances.size == 0 or cumulative_distances[-1] == 0:
            num_points = len(x)
            z = np.linspace(source_z, target_z, num_points)
        else:
            t = np.insert(cumulative_distances, 0, 0) / cumulative_distances[-1]
            cs = CubicSpline([0, 1], [source_z, target_z])
            z = cs(t)

        return pd.Series(
            {"interpolated_x": x, "interpolated_y": y, "interpolated_z": z}
        )

## 3D Bundling


In [11]:
class Apply3DEdgeBundling:
    """Applies 3D edge bundling to a graph's edges.
    bundled_edges: DataFrame containing edge data with interpolated 3D coordinates.
    bundling_iterations: Number of iterations for the bundling algorithm.More iterations result in more bundling.
    step_size: Controls the magnitude of force application in each iteration. Smaller steps provide more stable bundling but require more iterations for the same effect.
    compatibility_threshold: Threshold for determining edge compatibility (not used in current implementation). High: More strict compatibility, fewer edges are bundled together.
    smoothing_iterations: Number of smoothing passes applied to each edge. High: More smoothing iterations create smoother curves but may lose some detail. Low: Fewer smoothing iterations preserve more original path details but may result in jagged edges.
    neighbor_radius: Radius for finding neighboring points, 'auto' for automatic inference. High: Larger radius considers more distant points, potentially leading to more global bundling. Low: Smaller radius only considers nearby points, resulting in more local bundling.
    radius_multiplier: Used to adjust the automatically inferred radius. High: Increases the automatically inferred neighbor radius, considering more distant points. Low: Decreases the automatically inferred neighbor radius, focusing on more local interactions.
    n_jobs: Number of CPU cores to use for parallel processing.
    points: Array of 3D coordinates for all edge points.
    neighbor_indices: Array of indices of neighboring points for each point.
    neighbor_counts: Array of neighbor counts for each point.
    point_to_edge: Array mapping each point to its corresponding edge index.
    """

    def __init__(
        self,
        bundled_edges,
        bundling_iterations=20,
        step_size=0.3,
        compatibility_threshold=0.3,
        smoothing_iterations=5,
        neighbor_radius="auto",
        radius_multiplier=0.2,
    ):
        self.bundled_edges = bundled_edges
        self.bundling_iterations = bundling_iterations
        self.step_size = step_size
        self.compatibility_threshold = compatibility_threshold
        self.smoothing_iterations = smoothing_iterations
        self.neighbor_radius = neighbor_radius
        self.radius_multiplier = radius_multiplier
        self.n_jobs = min(4, max(1, cpu_count() - 2))  # Limit to 4 processes
        self.adjusted_edges = None

        # Set up logging
        logging.basicConfig(
            level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
        )
        self.logger = logging.getLogger(__name__)

    def _infer_neighbor_radius(self, points):
        try:
            min_coords = np.min(points, axis=0)
            max_coords = np.max(points, axis=0)
            diagonal = np.linalg.norm(max_coords - min_coords)

            base_percentage = 0.05
            point_count_factor = np.log10(len(points)) / 10
            adjusted_percentage = base_percentage / (1 + point_count_factor)

            inferred_radius = diagonal * adjusted_percentage * self.radius_multiplier

            self.logger.info(f"Diagonal of bounding box: {diagonal}")
            self.logger.info(f"Adjusted percentage: {adjusted_percentage:.6f}")
            self.logger.info(f"Inferred radius: {inferred_radius:.6f}")

            return inferred_radius
        except Exception as e:
            self.logger.error(f"Error in _infer_neighbor_radius: {str(e)}")
            raise

    @staticmethod
    @jit(nopython=True, parallel=True)
    def _apply_forces(
        points, neighbor_indices, neighbor_counts, point_to_edge, step_size
    ):
        new_points = points.copy()
        for i in prange(1, len(points) - 1):
            edge_index = point_to_edge[i]
            force = np.zeros(3)
            count = 0
            for j in range(neighbor_counts[i]):
                n = neighbor_indices[i, j]
                if n != -1 and point_to_edge[n] != edge_index:
                    direction = points[n] - points[i]
                    distance = np.linalg.norm(direction)
                    if distance > 0:
                        force += direction / distance
                    count += 1
            if count > 0:
                force_magnitude = np.linalg.norm(force)
                if force_magnitude > 0:
                    new_points[i] += step_size * (force / force_magnitude)
        return new_points

    @staticmethod
    @jit(nopython=True)
    def _smooth_edge(edge_points, smoothing_iterations):
        for _ in range(smoothing_iterations):
            new_points = edge_points.copy()
            new_points[1:-1] = 0.5 * edge_points[1:-1] + 0.25 * (
                edge_points[:-2] + edge_points[2:]
            )
            edge_points = new_points
        return edge_points

    def apply_3d_bundling(self):
        try:
            self.adjusted_edges = self.bundled_edges.copy()

            # Convert interpolated x, y, z to points
            self.adjusted_edges["points"] = self.adjusted_edges.apply(
                lambda row: np.column_stack(
                    (
                        row["interpolated_x"],
                        row["interpolated_y"],
                        row["interpolated_z"],
                    )
                ),
                axis=1,
            )

            all_points = np.vstack(self.adjusted_edges["points"].values)

            point_to_edge = np.repeat(
                np.arange(len(self.adjusted_edges)),
                self.adjusted_edges["points"].apply(len),
            )

            if self.neighbor_radius == "auto":
                self.neighbor_radius = self._infer_neighbor_radius(all_points)
                self.logger.info(
                    f"Inferred neighbor radius: {self.neighbor_radius:.4f}"
                )

            tree = cKDTree(all_points)
            neighbors_list = tree.query_ball_point(all_points, r=self.neighbor_radius)

            max_neighbors = max(len(n) for n in neighbors_list)
            neighbor_indices = np.full(
                (len(all_points), max_neighbors), -1, dtype=np.int64
            )
            neighbor_counts = np.zeros(len(all_points), dtype=np.int64)

            for i, neighbors in enumerate(neighbors_list):
                neighbor_counts[i] = len(neighbors)
                neighbor_indices[i, : len(neighbors)] = neighbors

            self.logger.info(
                f"Starting 3D edge bundling with {self.bundling_iterations} iterations"
            )
            self.logger.info(f"Using {self.n_jobs} CPU cores for parallel processing")

            with tqdm(
                total=self.bundling_iterations, desc="3D Bundling Progress"
            ) as pbar:
                for iteration in range(self.bundling_iterations):
                    iteration_start_time = time.time()

                    all_points = self._apply_forces(
                        all_points,
                        neighbor_indices,
                        neighbor_counts,
                        point_to_edge,
                        self.step_size,
                    )

                    edge_points = np.split(
                        all_points,
                        np.cumsum(self.adjusted_edges["points"].apply(len))[:-1],
                    )
                    with Pool(self.n_jobs) as pool:
                        smoothed_edges = pool.starmap(
                            self._smooth_edge,
                            [(edge, self.smoothing_iterations) for edge in edge_points],
                        )
                    all_points = np.concatenate(smoothed_edges)

                    iteration_time = time.time() - iteration_start_time
                    self.logger.info(
                        f"Iteration {iteration + 1}/{self.bundling_iterations} completed in {iteration_time:.2f}s"
                    )
                    pbar.update(1)

            # Create new columns for bundled coordinates
            self.adjusted_edges["bundled_x"] = None
            self.adjusted_edges["bundled_y"] = None
            self.adjusted_edges["bundled_z"] = None

            # Update the adjusted_edges with new bundled coordinates
            start = 0
            for i, length in enumerate(self.adjusted_edges["points"].apply(len)):
                self.adjusted_edges.at[i, "bundled_x"] = all_points[
                    start : start + length, 0
                ].tolist()
                self.adjusted_edges.at[i, "bundled_y"] = all_points[
                    start : start + length, 1
                ].tolist()
                self.adjusted_edges.at[i, "bundled_z"] = all_points[
                    start : start + length, 2
                ].tolist()
                start += length

            self.logger.info("3D edge bundling applied successfully")
        except Exception as e:
            self.logger.error(f"Error in apply_3d_bundling: {str(e)}")
            raise

    def get_bundled_edges(self):
        return self.adjusted_edges

## EdgesSaver


In [81]:
class EdgesSaver:
    """
    A utility class for saving graph data to JSON format, particularly for use in JavaScript applications.
    """

    @staticmethod
    def add_color_bool_to_edges(
        bundled_edges_3d: pd.DataFrame, g: ig.Graph
    ) -> pd.DataFrame:
        """
        Add color boolean if source and target node are of the same cluster.
        """
        color_bool = []

        for _, row in bundled_edges_3d.iterrows():
            source_cluster = g.vs.find(node_id=row["source"])["cluster"]
            target_cluster = g.vs.find(node_id=row["target"])["cluster"]
            if source_cluster == target_cluster:
                color_bool.append(True)
            else:
                color_bool.append(False)

        bundled_edges_3d["color"] = color_bool

        print(
            f"{sum(color_bool)} out of {len(bundled_edges_3d)} edges have the same source and target cluster."
        )
        return bundled_edges_3d

    @staticmethod
    def inspect_nan_edges(bundled_edges_3d: pd.DataFrame, x_col, y_col, z_col):
        """
        Inspect and print information about edges containing NaN values.
        Args:
            bundled_edges_3d (pd.DataFrame): DataFrame containing adjusted edge data.
        """

        # Function to check if any element in a list is NaN
        def has_nan(lst):
            return any(pd.isna(x) for x in lst)

        # Filter rows where any of bundled_x, bundled_y, or bundled_z contains a NaN
        nan_edges = bundled_edges_3d[
            bundled_edges_3d[x_col].apply(has_nan)
            | bundled_edges_3d[y_col].apply(has_nan)
            | bundled_edges_3d[z_col].apply(has_nan)
            | pd.isna(bundled_edges_3d["weight"])
        ]

        if nan_edges.empty:
            print("No edges with NaN values found.")
        else:
            print(f"Found {len(nan_edges)} edges with NaN values:")
            for idx, edge in nan_edges.iterrows():
                print(f"Edge ID: {edge['edge_id']}")
                print(f"  Source: {edge['source']}, Target: {edge['target']}")
                print(f"  Weight: {edge['weight']}")
                print("  NaN positions:")
                for i, (x, y, z) in enumerate(
                    zip(edge["x"], edge["y"], edge["bundled_z"])
                ):
                    if pd.isna(x) or pd.isna(y) or pd.isna(z):
                        print(f"    Point {i}: x={x}, y={y}, z={z}")
                print()

    @staticmethod
    def prepare_edges_for_js(
        bundled_edges_3d: pd.DataFrame, x_col, y_col, z_col
    ) -> List[Dict]:
        """
        Prepare adjusted edges data for efficient use in JavaScript.
        Args:
            bundled_edges_3d (pd.DataFrame): DataFrame containing adjusted edge data.
        Returns:
            List[Dict]: List of edge objects ready for JSON serialization.
        Raises:
            ValueError: If bundled_edges_3d is None.
        """
        if bundled_edges_3d is None:
            raise ValueError("Adjusted edges data is not available.")

        # First, inspect edges with NaN values
        EdgesSaver.inspect_nan_edges(bundled_edges_3d, x_col, y_col, z_col)

        # Then proceed with the rest of the method
        edges_for_js = []
        for _, edge in bundled_edges_3d.iterrows():
            edge_object = {
                "id": int(edge["edge_id"]),
                "source": int(edge["source"]),
                "target": int(edge["target"]),
                "weight": (
                    float(edge["weight"]) if not pd.isna(edge["weight"]) else None
                ),
                "colored": bool(edge["color"]) if "color" in edge else False,
                "points": [
                    {"x": float(x), "y": float(y), "z": float(z)}
                    for x, y, z in zip(edge[x_col], edge[y_col], edge[z_col])
                    if not (pd.isna(x) or pd.isna(y) or pd.isna(z))
                ],
            }
            edges_for_js.append(edge_object)
        return edges_for_js

    @staticmethod
    def save_edges_for_js(
        bundled_edges_3d: pd.DataFrame,
        output_files: Union[str, List[str]],
        add_color_bool: bool = False,
        g: ig.Graph = None,
        return_json: bool = False,
        x_col: str = "x",
        y_col: str = "y",
        z_col: str = "z",
    ) -> Optional[List[Dict]]:
        """
        Save adjusted edges to one or more JSON files optimized for JavaScript use.
        Args:
            bundled_edges_3d (pd.DataFrame): DataFrame containing adjusted edge data.
            output_files (Union[str, List[str]]): Path or list of paths to the output JSON file(s).
            add_color_bool (bool): If True, add color boolean based on cluster information.
            g (ig.Graph): Graph object required if add_color_bool is True.
            return_json (bool): If True, return the JSON data as well as saving it.
        Returns:
            Optional[List[Dict]]: List of edge objects if return_json is True, else None.
        """
        if add_color_bool:
            if not g:
                raise ValueError("Graph object is required to add color boolean.")
            bundled_edges_3d = EdgesSaver.add_color_bool_to_edges(bundled_edges_3d, g)

        edges_data = EdgesSaver.prepare_edges_for_js(
            bundled_edges_3d, x_col, y_col, z_col
        )

        # Convert single path to list for consistent processing
        if isinstance(output_files, str):
            output_files = [output_files]

        # Save to all specified paths
        for output_file in output_files:
            with open(output_file, "w") as f:
                json.dump(edges_data, f)
            print(f"Edges data saved to {output_file}")

        return edges_data if return_json else None

## Nodes Saver


In [82]:
import json
import igraph as ig
from typing import List, Dict, Optional, Union


class NodesSaver:
    """
    A utility class for saving graph data to JSON format, particularly for use in JavaScript applications.
    """

    @staticmethod
    def save_igraph_nodes_to_json(
        g: ig.Graph,
        paths: Union[str, List[str]],
        return_json: bool = False,
        attributes: List[str] = None,
    ) -> Optional[List[Dict]]:
        """
        Save the igraph nodes to one or more JSON files.
        Args:
            g (ig.Graph): The input graph.
            paths (Union[str, List[str]]): Path or list of paths to save the JSON file(s).
            return_json (bool): If True, return the JSON data as well as saving it.
            attributes (List[str]): List of node attributes to include in the JSON.
        Returns:
            Optional[List[Dict]]: List of node dictionaries if return_json is True, else None.
        Raises:
            ValueError: If a specified attribute is missing from a node.
        """
        if attributes is None:
            attributes = [
                "node_id",
                "node_name",
                "doi",
                "year",
                "title",
                "cluster",
                "centrality",
                "x",
                "y",
                "z",
            ]

        # Fix encoding of titles
        g.vs["title"] = [NodesSaver.fix_encoding(title) for title in g.vs["title"]]

        nodes_json = []
        for node in g.vs:
            if not all(attr in node.attributes() for attr in attributes):
                raise ValueError(f"Missing attribute in node: {node.attributes()}")
            node_dict = {attr: node[attr] for attr in attributes}
            nodes_json.append(node_dict)

        # Convert single path to list for consistent processing
        if isinstance(paths, str):
            paths = [paths]

        # Save to all specified paths
        for path in paths:
            with open(path, "w") as f:
                json.dump(nodes_json, f)
            print(f"Graph nodes saved to {path}")

        return nodes_json if return_json else None

    @staticmethod
    def fix_encoding(title: str) -> str:
        """
        Fix the encoding of a string.
        Args:
            title (str): The input string to fix.
        Returns:
            str: The fixed string.
        """
        try:
            decoded_title = title.encode("utf-8").decode("unicode_escape")
            return decoded_title.encode("latin1").decode("utf-8")
        except UnicodeEncodeError:
            # If the above method fails, return the original title
            return title

## VisualizationUtility


In [14]:
class VisualizationUtility:
    @staticmethod
    def plot_graph_with_bundled_edges(g, bundled_edges, **kwargs):
        """
        Plot the graph with bundled edges.
        Args:
        g (igraph.Graph or networkx.Graph): The graph object containing node positions and cluster information.
        bundled_edges (pd.DataFrame): DataFrame containing the bundled edge coordinates.
        **kwargs: Additional keyword arguments for customizing the plot.
            figsize (tuple): Figure size in inches. Default is (10, 10).
            node_size (int): Size of the nodes in the scatter plot. Default is 10.
            edge_alpha (float): Alpha (transparency) of the edges. Default is 0.2.
            edge_width (float): Width of the edge lines. Default is 0.2.
            node_alpha (float): Alpha (transparency) of the nodes. Default is 0.7.
            edge_color (str): Color of the edges. Default is "black".
            cmap (str): Colormap for the nodes. Default is "tab20".
        Returns:
        None: Displays the plot.
        """
        # Default values
        defaults = {
            "figsize": (10, 10),
            "node_size": 10,
            "edge_alpha": 0.2,
            "edge_width": 0.2,
            "node_alpha": 0.7,
            "edge_color": "black",
            "cmap": "tab20",
        }
        # transform the graph if not igraph
        if not isinstance(g, ig.Graph):
            g = ig.Graph.from_networkx(g)
            print("Converted to igraph Graph")

        # Update defaults with any provided kwargs
        defaults.update(kwargs)

        plt.figure(figsize=defaults["figsize"])

        # Plot edges
        plt.plot(
            bundled_edges["x"],
            bundled_edges["y"],
            color=defaults["edge_color"],
            alpha=defaults["edge_alpha"],
            linewidth=defaults["edge_width"],
        )

        # Get unique clusters and map them to consecutive integers
        unique_clusters = sorted(set(g.vs["cluster"]))
        cluster_map = {c: i for i, c in enumerate(unique_clusters)}

        # Map cluster values to consecutive integers
        cluster_colors = [cluster_map[c] for c in g.vs["cluster"]]

        # Create a custom colormap
        cmap = plt.get_cmap(defaults["cmap"])
        n_colors = len(unique_clusters)
        custom_cmap = cmap(np.linspace(0, 1, n_colors))

        # Plot nodes
        scatter = plt.scatter(
            g.vs["x"],
            g.vs["y"],
            s=defaults["node_size"],
            c=cluster_colors,
            cmap=cmap,
            alpha=defaults["node_alpha"],
        )

        # Add a colorbar
        # plt.colorbar(scatter, label="Cluster", ticks=range(len(unique_clusters)))
        # plt.clim(-0.5, len(unique_clusters) - 0.5)

        plt.axis("off")
        plt.tight_layout()
        plt.show()
        # Draw the network

# FULL RUN

1. read graph
2. layout
3. prune edges
4. bundle edges
5. visualize
6. adjust for 3d plotting
   1. add z coordinate to nodes
   2. add z coordinate to bundled edges
7. use extra 3d bundling step
8. save nodes and edges for 3d plotting


## 2D Steps


In [11]:
g = GraphReader.read_and_clean_graph(INPUT_GRAPH_PATH)

cluster_list = list(range(40, 51))

# subset to only cluster 0 to 100
g = GraphReader.subgraph_of_clusters(g, cluster_list)

total_nodes = len(g.vs)
################################################################################################
layout_params = {
    # "k": 0.5, # distance between nodes; best to leave it to algo
    "iterations": 20,  # (default=50) use 100
    "threshold": 0.0001,  # default 0.0001
    "weight": "weight",
    "scale": 1,
    "center": (0, 0),
    "dim": 2,
    "seed": 1887,
}

g_fr, pos = LayoutUtility.fr_layout_nx(g, layout_params)

print("#" * 100)
print("Layout done")
print("#" * 100)

################################################################################################

g_fr_z = ZCoordinateAdder(g_fr, scale_factor=0.15).add_z_coordinate_to_nodes()
print("#" * 100)
print("Z coordinate added to nodes")
print("#" * 100)
################################################################################################
bundle_kwargs = {
    "decay": 0.90,
    "initial_bandwidth": 0.10,
    "iterations": 15,
    "include_edge_id": True,
}

bundler = GraphBundler2d(
    g_fr_z, pruning_weight_percentile=75, bundle_kwargs=bundle_kwargs
)
pruned_bundled_edges_2d, g_fr_z_bundled_pruned = bundler.bundle_edges()

print("#" * 100)
print("Edge bundling done")
print("#" * 100)

################################################################################################
# if total_nodes < 5000:
#    VisualizationUtility.plot_graph_with_bundled_edges(
#        g_fr_z_bundled_pruned, pruned_bundled_edges_2d
#    )

################################################################################################


# interpolator = EdgeZInterpolator(pruned_bundled_edges_2d, g_fr_z_bundled_pruned)
# adjusted_edges_3d = interpolator.interpolate_z_to_edges()

print("#" * 100)
print("Z interpolation done")
print("#" * 100)
################################################################################################

  g = ig.Graph.Read_GraphML(path)


Node Attributes: ['doi', 'year', 'title', 'cluster', 'node_id', 'node_name', 'centrality']
Edge Attributes: ['weight', 'edge_id']
Number of nodes: 40643
Number of edges: 602779
Starting Fruchterman-Reingold layout process...
Layout parameters: {'iterations': 20, 'threshold': 0.0001, 'weight': 'weight', 'scale': 1, 'center': (0, 0), 'dim': 2, 'seed': 1887}
Converting to NetworkX Graph...
Conversion complete.
Graph has 3744 nodes and 35279 edges.
Calculating layout...
Layout calculation completed in 11.20 seconds.
Processing layout results...
Layout boundaries:
X-axis: Min = -1.00, Max = 0.97
Y-axis: Min = -0.89, Max = 0.87
Assigning coordinates to nodes...
Layout process completed in 11.43 seconds.
####################################################################################################
Layout done
####################################################################################################
Bounds of the layout:
Min x: -1.0, Max x: 0.9690223336219788
Min y: -0.89410936

NameError: name 'EdgeZInterpolator' is not defined

In [13]:
adjusted_edges_3d.head(2)

Unnamed: 0,edge_id,source,target,x,y,weight,source_position,target_position,interpolated_x,interpolated_y,interpolated_z
0,50.0,0,1,"[0.3307619094848633, 0.34947332739830017]","[0.19845418632030487, 0.30729520320892334]",0.663864,"{'x': 0.3307619094848633, 'y': 0.1984541863203...","{'x': 0.34947332739830017, 'y': 0.307295203208...","[0.3307619094848633, 0.34947332739830017]","[0.19845418632030487, 0.30729520320892334]","[-0.13823016870468868, -0.1340834649785878]"
1,102.0,2,35,"[-0.4124625027179718, -0.39503922612022846, -0...","[-0.4567071497440338, -0.40076967835872734, -0...",0.80623,"{'x': -0.4124625027179718, 'y': -0.45670714974...","{'x': -0.36125415563583374, 'y': -0.4384048879...","[-0.4124625027179718, -0.39503922612022846, -0...","[-0.4567071497440338, -0.40076967835872734, -0...","[-0.1310213731838211, -0.13087355803861248, -0..."


## 3D edge bundling


In [14]:
# Usage
params = {
    "bundling_iterations": 10,
    "step_size": 0.3,
    "compatibility_threshold": 0.6,
    "smoothing_iterations": 5,
    "neighbor_radius": "auto",
    "radius_multiplier": 0.2,
}

try:
    bundler_3d = Apply3DEdgeBundling(adjusted_edges_3d, **params)
    bundler_3d.apply_3d_bundling()
    bundled_edges_3d = bundler_3d.get_bundled_edges()
except Exception as e:
    logging.error(f"An error occurred during 3D bundling: {str(e)}")

2024-07-24 14:49:36,102 - INFO - Diagonal of bounding box: 2.6338850009501926
2024-07-24 14:49:36,102 - INFO - Adjusted percentage: 0.031992
2024-07-24 14:49:36,102 - INFO - Inferred radius: 0.016853
2024-07-24 14:49:36,103 - INFO - Inferred neighbor radius: 0.0169
2024-07-24 14:50:20,344 - INFO - Starting 3D edge bundling with 10 iterations
2024-07-24 14:50:20,349 - INFO - Using 4 CPU cores for parallel processing
3D Bundling Progress:   0%|          | 0/10 [00:00<?, ?it/s]2024-07-24 14:51:08,626 - INFO - Iteration 1/10 completed in 48.19s
3D Bundling Progress:  10%|█         | 1/10 [00:48<07:13, 48.19s/it]2024-07-24 14:51:40,528 - INFO - Iteration 2/10 completed in 31.90s
3D Bundling Progress:  20%|██        | 2/10 [01:20<05:08, 38.61s/it]2024-07-24 14:52:17,253 - INFO - Iteration 3/10 completed in 36.72s
3D Bundling Progress:  30%|███       | 3/10 [01:56<04:24, 37.75s/it]2024-07-24 14:52:52,526 - INFO - Iteration 4/10 completed in 35.27s
3D Bundling Progress:  40%|████      | 4/10 [

In [178]:
# Save to multiple paths
nodes_json = NodesSaver.save_igraph_nodes_to_json(
    g_fr_z_bundled_pruned,
    [
        OUTPUT_DIR + f"nodes_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
        THREEJS_OUTPUT_DIR
        + f"nodes_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
    ],
    return_json=True,
)

# print first 2 nodes
nodes_json[:2]

Graph nodes saved to ../data/99-testdata/nodes_3d_clusters45to50.json
Graph nodes saved to /Users/jlq293/Projects/Random Projects/LW-ThreeJS/2d_ssrinetworkviz/src/data/nodes_3d_clusters45to50.json


[{'node_id': 6,
  'node_name': 'Horita_1982',
  'doi': '',
  'year': 1982,
  'title': 'Centrally administered thyrotropin-releasing hormone (TRH) stimulates colonic transit and diarrhea production by a vagally mediated serotonergic mechanism in the rabbit',
  'cluster': 48,
  'centrality': 0.136328538692653,
  'x': 0.6405702233314514,
  'y': -0.3191435635089874,
  'z': -0.10927152627688637},
 {'node_id': 8,
  'node_name': 'Mcelroy_1982',
  'doi': '10.1007/BF00432770',
  'year': 1982,
  'title': 'The effects of fenfluramine and fluoxetine on the acquisition of a conditioned avoidance response in rats',
  'cluster': 50,
  'centrality': 0.0151485334617365,
  'x': 0.5508721470832825,
  'y': -0.4072713851928711,
  'z': -0.14564939253508402}]

In [16]:
edges_json = EdgesSaver.save_edges_for_js(
    bundled_edges_3d,
    [
        OUTPUT_DIR
        + f"bundled_edges_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
        THREEJS_OUTPUT_DIR
        + f"bundled_edges_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
    ],
    add_color_bool=True,
    g=g_fr_z_bundled_pruned,
    return_json=True,
)

edges_json[99]

31719 out of 33095 edges have the same source and target cluster.
No edges with NaN values found.
Edges data saved to ../data/99-testdata/bundled_edges_3d_clusters20to50.json
Edges data saved to /Users/jlq293/Projects/Random Projects/LW-ThreeJS/2d_ssrinetworkviz/src/data/bundled_edges_3d_clusters20to50.json


{'id': 9282,
 'source': 55,
 'target': 628,
 'weight': 0.809315085411072,
 'colored': True,
 'points': [{'x': 0.1797104785509199,
   'y': 0.08640227269980194,
   'z': -0.11921884171978157},
  {'x': 0.16762777694242395,
   'y': 0.07244703881857949,
   'z': -0.13564365842741866},
  {'x': 0.15903036949242866,
   'y': 0.06342433086241353,
   'z': -0.15484917390021652},
  {'x': 0.15385792925439534,
   'y': 0.061205861293863566,
   'z': -0.1751324490140186},
  {'x': 0.15009907378869408,
   'y': 0.0664796134829248,
   'z': -0.19236211035300899},
  {'x': 0.14717861149056716,
   'y': 0.08213241384961667,
   'z': -0.20430724700450514},
  {'x': 0.14578842711877277,
   'y': 0.11159344556639125,
   'z': -0.21088572610528544},
  {'x': 0.14567282979197266,
   'y': 0.1536696058033668,
   'z': -0.21200172298069808},
  {'x': 0.14628638132851618,
   'y': 0.20140796591181168,
   'z': -0.20667695328035182},
  {'x': 0.1497297300315264,
   'y': 0.24635118507521314,
   'z': -0.19244827127885808},
  {'x': 0.15

# FAILED QUALITY CHECK


In [90]:
correct_int = 0
false_int = 0
correct_bund = 0
false_bund = 0
correct_both = 0
false_both = 0
for i, row in bundled_edges_3d.iterrows():
    s_x = round(row["source_position"]["x"], 7)
    x_int = round(row["interpolated_x"][0], 7)
    x_bund = round(row["bundled_x"][0], 7)
    s_y = round(row["source_position"]["y"], 7)
    y_int = round(row["interpolated_y"][0], 7)
    y_bund = round(row["bundled_y"][0], 7)
    s_z = round(row["source_position"]["z"], 7)
    z_int = round(row["interpolated_z"][0], 7)
    z_bund = round(row["bundled_z"][0], 7)

    if s_x == x_int and s_y == y_int and s_z == z_int:
        correct_int += 1
    else:
        false_int += 1

    if s_x == x_bund and s_y == y_bund and s_z == z_bund:
        correct_bund += 1
    else:
        false_bund += 1

    if (
        s_x == x_int
        and s_y == y_int
        and s_z == z_int
        and s_x == x_bund
        and s_y == y_bund
        and s_z == z_bund
    ):
        correct_both += 1
    else:
        false_both += 1


print("correct_int: ", correct_int)
print("false_int: ", false_int)

print("correct_bund: ", correct_bund)
print("false_bund: ", false_bund)

print("correct_both: ", correct_both)
print("false_both: ", false_both)

correct_int:  33095
false_int:  0
correct_bund:  5810
false_bund:  27285
correct_both:  5810
false_both:  27285


In [7]:
g = GraphReader.read_and_clean_graph(INPUT_GRAPH_PATH)

cluster_list = list(range(45, 51))

# subset to only cluster 0 to 100
g = GraphReader.subgraph_of_clusters(g, cluster_list)

total_nodes = len(g.vs)
################################################################################################
layout_params = {
    # "k": 0.5, # distance between nodes; best to leave it to algo
    "iterations": 20,  # (default=50) use 100
    "threshold": 0.0001,  # default 0.0001
    "weight": "weight",
    "scale": 1,
    "center": (0, 0),
    "dim": 2,
    "seed": 1887,
}

g_fr, pos = LayoutUtility.fr_layout_nx(g, layout_params)

print("#" * 100)
print("Layout done")
print("#" * 100)

################################################################################################

g_fr_z = ZCoordinateAdder(g_fr, scale_factor=0.15).add_z_coordinate_to_nodes()
print("#" * 100)
print("Z coordinate added to nodes")
print("#" * 100)
################################################################################################

bundle_kwargs = {
    "decay": 0.95,
    "initial_bandwidth": 0.05,
    "iterations": 50,
    "include_edge_id": True,
}
bundler = GraphBundler2d(
    g_fr_z, pruning_weight_percentile=25, bundle_kwargs=bundle_kwargs
)
pruned_bundled_edges_2d, g_fr_z_bundled_pruned = bundler.bundle_edges()

print("#" * 100)
print("Edge bundling done")
print("#" * 100)

################################################################################################
# if total_nodes < 5000:
#    VisualizationUtility.plot_graph_with_bundled_edges(
#        g_fr_z_bundled_pruned, pruned_bundled_edges_2d
#    )

################################################################################################
#
#
# interpolator = EdgeZInterpolator(pruned_bundled_edges_2d, g_fr_z_bundled_pruned)
# adjusted_edges_3d = interpolator.interpolate_z_to_edges()
#
# print("#" * 100)
# print("Z interpolation done")
# print("#" * 100)
#################################################################################################

  g = ig.Graph.Read_GraphML(path)


Node Attributes: ['doi', 'year', 'title', 'cluster', 'node_id', 'node_name', 'centrality']
Edge Attributes: ['weight', 'edge_id']
Number of nodes: 40643
Number of edges: 602779
Starting Fruchterman-Reingold layout process...
Layout parameters: {'iterations': 20, 'threshold': 0.0001, 'weight': 'weight', 'scale': 1, 'center': (0, 0), 'dim': 2, 'seed': 1887}
Converting to NetworkX Graph...
Conversion complete.
Graph has 1976 nodes and 17666 edges.
Calculating layout...
Layout calculation completed in 6.90 seconds.
Processing layout results...
Layout boundaries:
X-axis: Min = -0.82, Max = 1.00
Y-axis: Min = -0.88, Max = 0.80
Assigning coordinates to nodes...
Layout process completed in 6.96 seconds.
####################################################################################################
Layout done
####################################################################################################
Bounds of the layout:
Min x: -0.819740355014801, Max x: 1.0
Min y: -0.88231164216

NameError: name 'GraphBundler2d' is not defined

In [None]:
pruned_bundled_edges_2d.head(1)  # , g_fr_z_bundled_pruned

Unnamed: 0,edge_id,source,target,x,y,weight,source_position,target_position
0,149.0,1,280,"[0.5508721470832825, 0.55393686033313, 0.55205...","[-0.4072713851928711, -0.3896731439439783, -0....",0.936576,"{'x': 0.5508721470832825, 'y': -0.407271385192...","{'x': 0.445087194442749, 'y': -0.2985744476318..."


In [284]:
pruned_bundled_edges_2d.source_position.apply(lambda x: x["z"])

0      -0.145649
1      -0.144170
2      -0.120307
3      -0.120307
4      -0.120307
          ...   
4412   -0.130054
4413   -0.034908
4414   -0.034908
4415   -0.000343
4416   -0.090827
Name: source_position, Length: 4417, dtype: float64

In [285]:
import numpy as np
import pandas as pd


class BundleQualityChecker:
    @staticmethod
    def connection_quality_check(
        df,
        source_col="source_position",
        target_col="target_position",
        x_col="x",
        y_col="y",
        z_col="z",
        tolerance=1e-6,
    ):
        """
        Perform quality checks on the bundled edges, accounting for positive and negative coordinates.

        Parameters:
        df (pd.DataFrame): The dataframe containing edge data
        source_col (str): Name of the column containing source node positions
        target_col (str): Name of the column containing target node positions
        x_col (str): Name of the column containing bundled x coordinates
        y_col (str): Name of the column containing bundled y coordinates
        z_col (str): Name of the column containing bundled z coordinates
        tolerance (float): Tolerance for floating point comparisons

        Returns:
        dict: A dictionary containing the results of various checks
        """
        results = {
            "total_edges": len(df),
            "start_point_mismatches": 0,
            "end_point_mismatches": 0,
            "invalid_z_interpolations": 0,
            "high_z_connection_count": df["is_high_z_connection"].sum(),
        }

        for _, row in df.iterrows():
            # Check start point
            if (
                abs(row[x_col][0] - row[source_col]["x"]) > tolerance
                or abs(row[y_col][0] - row[source_col]["y"]) > tolerance
                or abs(row[z_col][0] - row[source_col]["z"]) > tolerance
            ):
                results["start_point_mismatches"] += 1

            # Check end point
            if (
                abs(row[x_col][-1] - row[target_col]["x"]) > tolerance
                or abs(row[y_col][-1] - row[target_col]["y"]) > tolerance
                or abs(row[z_col][-1] - row[target_col]["z"]) > tolerance
            ):
                results["end_point_mismatches"] += 1

            # Check z interpolation
            z_min, z_max = min(row[source_col]["z"], row[target_col]["z"]), max(
                row[source_col]["z"], row[target_col]["z"]
            )
            if any(z < z_min - tolerance or z > z_max + tolerance for z in row[z_col]):
                results["invalid_z_interpolations"] += 1

        # Calculate percentages
        total = results["total_edges"]
        results["start_point_mismatch_percentage"] = (
            results["start_point_mismatches"] / total
        ) * 100
        results["end_point_mismatch_percentage"] = (
            results["end_point_mismatches"] / total
        ) * 100
        results["invalid_z_interpolation_percentage"] = (
            results["invalid_z_interpolations"] / total
        ) * 100
        results["high_z_connection_percentage"] = (
            results["high_z_connection_count"] / total
        ) * 100

        return results

    @staticmethod
    def perform_quality_check(bundled_edges_3d):
        if bundled_edges_3d is None:
            print("No bundled edges available.")
            return None

        results = BundleQualityChecker.connection_quality_check(bundled_edges_3d)

        print("Quality Check Results:")
        print(f"Total edges: {results['total_edges']}")
        print(
            f"Start point mismatches: {results['start_point_mismatches']} ({results['start_point_mismatch_percentage']:.2f}%)"
        )
        print(
            f"End point mismatches: {results['end_point_mismatches']} ({results['end_point_mismatch_percentage']:.2f}%)"
        )
        print(
            f"Invalid z interpolations: {results['invalid_z_interpolations']} ({results['invalid_z_interpolation_percentage']:.2f}%)"
        )
        print(
            f"High z connections: {results['high_z_connection_count']} ({results['high_z_connection_percentage']:.2f}%)"
        )

        return results

    @staticmethod
    def analyze_edge_points(bundled_edges_3d):
        if bundled_edges_3d is None:
            print("No bundled edges available. Run adjust_bundling_for_3d first.")
            return None

        point_counts = bundled_edges_3d["x"].apply(len)

        analysis = {
            "min_points": point_counts.min(),
            "max_points": point_counts.max(),
            "mean_points": point_counts.mean(),
            "median_points": point_counts.median(),
        }

        print("Edge Point Analysis:")
        print(f"Minimum points per edge: {analysis['min_points']}")
        print(f"Maximum points per edge: {analysis['max_points']}")
        print(f"Mean points per edge: {analysis['mean_points']:.2f}")
        print(f"Median points per edge: {analysis['median_points']}")

        return analysis

In [286]:
class EdgeBundler3d:
    def __init__(
        self,
        bundled_edges_2d,
        z_threshold_percentile=75,
        vertical_influence=0.8,
    ):
        self.bundled_edges_2d = bundled_edges_2d
        self.z_threshold_percentile = z_threshold_percentile
        self.vertical_influence = vertical_influence
        self.bundled_edges_3d = None

    def define_z_threshold(self, z_values):
        z_threshold = np.percentile(z_values, self.z_threshold_percentile)
        print(f"Z threshold set to {z_threshold:.4f}")
        return z_threshold

    def adjust_bundling_for_3d(self):
        z_values = pd.concat(
            [
                self.bundled_edges_2d.source_position.apply(lambda x: x["z"]),
                self.bundled_edges_2d.target_position.apply(lambda x: x["z"]),
            ]
        )
        z_threshold = self.define_z_threshold(z_values)

        adjusted_edges = []
        for _, row in tqdm(
            self.bundled_edges_2d.iterrows(), total=len(self.bundled_edges_2d)
        ):
            edge_id = row["edge_id"]
            source = row["source_position"]
            target = row["target_position"]
            source_z = source["z"]
            target_z = target["z"]
            x = np.array(row["x"])
            y = np.array(row["y"])

            num_points = len(x)
            is_high_z_connection = (source_z > z_threshold and source_z > target_z) or (
                target_z > z_threshold and target_z > source_z
            )

            # Determine high and low nodes
            if source_z > target_z:
                high_node, low_node = source, target
            else:
                high_node, low_node = target, source

            # Create control points for cubic spline
            if is_high_z_connection:
                # For high-z connections, create a smooth vertical path
                mid_x = (source["x"] + target["x"]) / 2
                mid_y = (source["y"] + target["y"]) / 2
                control_points_t = [0, 0.25, 0.75, 1]
                control_points_x = [source["x"], mid_x, mid_x, target["x"]]
                control_points_y = [source["y"], mid_y, mid_y, target["y"]]
            else:
                # For non-high-z connections, use more of the original path
                control_points_t = [0, 1 / 3, 2 / 3, 1]
                control_points_x = [
                    source["x"],
                    x[num_points // 3],
                    x[2 * num_points // 3],
                    target["x"],
                ]
                control_points_y = [
                    source["y"],
                    y[num_points // 3],
                    y[2 * num_points // 3],
                    target["y"],
                ]

            # Apply cubic spline interpolation with natural boundary conditions
            cs_x = CubicSpline(control_points_t, control_points_x, bc_type="natural")
            cs_y = CubicSpline(control_points_t, control_points_y, bc_type="natural")

            # Use the original number of points, but ensure it's at least 20 for smoother curves
            t = np.linspace(0, 1, num_points)
            x_adj = cs_x(t)
            y_adj = cs_y(t)

            # Create a smooth z-coordinate transition

            z_min, z_max = min(source_z, target_z), max(source_z, target_z)

            if is_high_z_connection:
                # For high-z connections, create a more pronounced vertical effect
                z_mid = (z_min + z_max) / 2
                z_control = [z_min, z_max, z_max, z_max]
                cs_z = CubicSpline(control_points_t, z_control, bc_type="clamped")
                z = np.clip(cs_z(t), z_min, z_max)
            else:
                # For non-high-z connections, use linear interpolation
                z = np.interp(t, [0, 1], [z_min, z_max])

            adjusted_edges.append(
                {
                    "edge_id": edge_id,
                    "bundled_x": x_adj.tolist(),
                    "bundled_y": y_adj.tolist(),
                    "bundled_z": z.tolist(),
                    "is_high_z_connection": is_high_z_connection,
                }
            )

        # Create a new dataframe with the adjusted edges
        adjusted_df = pd.DataFrame(adjusted_edges)

        # Merge the new dataframe with the original one
        merged_df = pd.merge(
            self.bundled_edges_2d, adjusted_df, on="edge_id", how="left"
        )

        # Rename columns to avoid confusion
        merged_df = merged_df.rename(
            columns={
                "x": "original_x",
                "y": "original_y",
                "bundled_x": "x",
                "bundled_y": "y",
                "bundled_z": "z",
            }
        )

        self.bundled_edges_3d = merged_df
        return self.bundled_edges_3d

    # Add this method to your GraphBundler3d class

In [287]:
bundler_3d = EdgeBundler3d(pruned_bundled_edges_2d)
adjusted_edges_3d = bundler_3d.adjust_bundling_for_3d()

Z threshold set to -0.0193


100%|██████████| 4417/4417 [00:01<00:00, 2466.46it/s]


In [288]:
adjusted_edges_3d

Unnamed: 0,edge_id,source,target,original_x,original_y,weight,source_position,target_position,x,y,z,is_high_z_connection
0,149.0,1,280,"[0.5508721470832825, 0.55393686033313, 0.55205...","[-0.4072713851928711, -0.3896731439439783, -0....",0.936576,"{'x': 0.5508721470832825, 'y': -0.407271385192...","{'x': 0.445087194442749, 'y': -0.2985744476318...","[0.5508721470832825, 0.5482098857147208, 0.544...","[-0.4072713851928711, -0.38927841208412234, -0...","[-0.1460439847414049, -0.14600811272264846, -0...",False
1,271.0,2,7,"[0.49992355704307556, 0.5130205154418945]","[-0.5801289677619934, -0.6044682264328003]",0.816939,"{'x': 0.49992355704307556, 'y': -0.58012896776...","{'x': 0.5130205154418945, 'y': -0.604468226432...","[0.49992355704307556, 0.5130205154418945]","[-0.5801289677619934, -0.6044682264328003]","[-0.1448336408037463, -0.1441700028383177]",False
2,971.0,3,16,"[0.5524348020553589, 0.5497690907852348, 0.544...","[-0.34138861298561096, -0.339813561933337, -0....",0.795334,"{'x': 0.5524348020553589, 'y': -0.341388612985...","{'x': 0.45631155371665955, 'y': -0.40058809518...","[0.5524348020553589, 0.5496931334538281, 0.545...","[-0.34138861298561096, -0.3414993881606116, -0...","[-0.13980900556810782, -0.13785882390783016, -...",False
3,974.0,3,44,"[0.5524348020553589, 0.5585797046720011, 0.566...","[-0.34138861298561096, -0.3394194623550637, -0...",0.806165,"{'x': 0.5524348020553589, 'y': -0.341388612985...","{'x': 0.662406861782074, 'y': -0.4320113360881...","[0.5524348020553589, 0.5597469018915837, 0.567...","[-0.34138861298561096, -0.3394893537229342, -0...","[-0.1436823801075662, -0.14108513664731787, -0...",False
4,981.0,3,549,"[0.5524348020553589, 0.5483290342278686, 0.541...","[-0.34138861298561096, -0.3477689670685117, -0...",0.792766,"{'x': 0.5524348020553589, 'y': -0.341388612985...","{'x': 0.49700018763542175, 'y': -0.46284896135...","[0.5524348020553589, 0.5482820949181224, 0.541...","[-0.34138861298561096, -0.34846635889273353, -...","[-0.12947644557359098, -0.12794823613888104, -...",False
...,...,...,...,...,...,...,...,...,...,...,...,...
4412,601809.0,1931,1970,"[-0.34231740236282343, -0.354872689493168, -0....","[-0.21710354089736938, -0.21300441973668005, -...",0.800884,"{'x': -0.3423174023628235, 'y': -0.21710354089...","{'x': -0.43638426065444946, 'y': -0.2356306910...","[-0.3423174023628235, -0.35194551355679315, -0...","[-0.21710354089736938, -0.2147058240132259, -0...","[-0.13220869745589156, -0.13190087264187106, -...",False
4413,602031.0,1936,1954,"[0.5319066643714905, 0.5140461921691895]","[0.2123633772134781, 0.2381635308265686]",0.663164,"{'x': 0.5319066643714905, 'y': 0.2123633772134...","{'x': 0.5140461921691895, 'y': 0.2381635308265...","[0.5319066643714905, 0.5140461921691895]","[0.2123633772134781, 0.23816353082656858]","[-0.10930934920714531, -0.034908049376908085]",False
4414,602032.0,1936,1975,"[0.5319066643714905, 0.5406255788234311, 0.548...","[0.2123633772134781, 0.2014645253000802, 0.191...",0.666620,"{'x': 0.5319066643714905, 'y': 0.2123633772134...","{'x': 0.5727753043174744, 'y': 0.3141528069972...","[0.5319066643714905, 0.5406255788234311, 0.548...","[0.2123633772134781, 0.2014645253000802, 0.191...","[-0.10072348897801676, -0.07878500911098053, -...",False
4415,602314.0,1949,1959,"[0.48814910650253296, 0.46257609128952026]","[-0.20887938141822815, -0.11553271114826202]",0.672519,"{'x': 0.48814910650253296, 'y': -0.20887938141...","{'x': 0.46257609128952026, 'y': -0.11553271114...","[0.48814910650253296, 0.46257609128952026]","[-0.20887938141822815, -0.11553271114826202]","[-0.08850872864036753, -0.0003430996737725345]",True


In [289]:
checker = BundleQualityChecker()
checker.perform_quality_check(adjusted_edges_3d)

checker.analyze_edge_points(adjusted_edges_3d)

Quality Check Results:
Total edges: 4417
Start point mismatches: 2500 (56.60%)
End point mismatches: 2500 (56.60%)
Invalid z interpolations: 0 (0.00%)
High z connections: 1744 (39.48%)
Edge Point Analysis:
Minimum points per edge: 2
Maximum points per edge: 75
Mean points per edge: 8.30
Median points per edge: 7.0


{'min_points': 2,
 'max_points': 75,
 'mean_points': 8.300656554222323,
 'median_points': 7.0}

# new attempt 1000


In [115]:
g = GraphReader.read_and_clean_graph(INPUT_GRAPH_PATH)

cluster_list = list(range(50, 70))

# subset to only cluster 0 to 100
g = GraphReader.subgraph_of_clusters(g, cluster_list)

total_nodes = len(g.vs)
################################################################################################
layout_params = {
    # "k": 0.5, # distance between nodes; best to leave it to algo
    "iterations": 20,  # (default=50) use 100
    "threshold": 0.0001,  # default 0.0001
    "weight": "weight",
    "scale": 1,
    "center": (0, 0),
    "dim": 2,
    "seed": 1887,
}

g_fr, pos = LayoutUtility.fr_layout_nx(g, layout_params)

print("#" * 100)
print("Layout done")
print("#" * 100)

################################################################################################

g_fr_z = ZCoordinateAdder(g_fr, scale_factor=0.15).add_z_coordinate_to_nodes()
print("#" * 100)
print("Z coordinate added to nodes")
print("#" * 100)
################################################################################################

bundle_kwargs = {
    "decay": 0.95,
    "initial_bandwidth": 0.05,
    "iterations": 10,
    "include_edge_id": True,
}
bundler = GraphBundler2d(
    g_fr_z, pruning_weight_percentile=80, bundle_kwargs=bundle_kwargs
)
pruned_bundled_edges_2d, g_fr_z_bundled_pruned = bundler.bundle_edges()

print("#" * 100)
print("Edge bundling done")
print("#" * 100)

################################################################################################
# if total_nodes < 5000:
#    VisualizationUtility.plot_graph_with_bundled_edges(
#        g_fr_z_bundled_pruned, pruned_bundled_edges_2d
#    )

################################################################################################
#
#
# interpolator = EdgeZInterpolator(pruned_bundled_edges_2d, g_fr_z_bundled_pruned)
# adjusted_edges_3d = interpolator.interpolate_z_to_edges()
#
# print("#" * 100)
# print("Z interpolation done")
# print("#" * 100)
#################################################################################################

  g = ig.Graph.Read_GraphML(path)


Node Attributes: ['doi', 'year', 'title', 'cluster', 'node_id', 'node_name', 'centrality']
Edge Attributes: ['weight', 'edge_id']
Number of nodes: 40643
Number of edges: 602779
Starting Fruchterman-Reingold layout process...
Layout parameters: {'iterations': 20, 'threshold': 0.0001, 'weight': 'weight', 'scale': 1, 'center': (0, 0), 'dim': 2, 'seed': 1887}
Converting to NetworkX Graph...
Conversion complete.
Graph has 5524 nodes and 46997 edges.
Calculating layout...
Layout calculation completed in 22.74 seconds.
Processing layout results...
Layout boundaries:
X-axis: Min = -0.85, Max = 0.95
Y-axis: Min = -0.77, Max = 1.00
Assigning coordinates to nodes...
Layout process completed in 22.91 seconds.
####################################################################################################
Layout done
####################################################################################################
Bounds of the layout:
Min x: -0.8468527793884277, Max x: 0.9507213234901428
Min

In [116]:
import numpy as np
import pandas as pd


class EdgeZInterpolator:
    def __init__(self, bundled_edges):
        self.bundled_edges = bundled_edges

    def assign_same_z_coordinate_to_all_edge_points(self):
        # Calculate minimum z value more efficiently
        min_z = min(
            self.bundled_edges["source_position"].apply(lambda x: x["z"]).min(),
            self.bundled_edges["target_position"].apply(lambda x: x["z"]).min(),
        )
        print("Minimum z value:", min_z)

        # Vectorized operation to assign z values
        self.bundled_edges["z"] = self.bundled_edges.apply(
            lambda row: self._assign_z(row, min_z), axis=1
        )

        # Reorder columns
        column_order = [
            "edge_id",
            "source",
            "target",
            "x",
            "y",
            "z",
            "weight",
            "source_position",
            "target_position",
        ]
        self.bundled_edges = self.bundled_edges.reindex(columns=column_order)

        return self.bundled_edges

    @staticmethod
    def _assign_z(row, min_z):
        z_values = np.full(len(row["x"]), min_z)
        z_values[0] = row["source_position"]["z"]
        z_values[-1] = row["target_position"]["z"]
        return z_values.tolist()

In [13]:
def find_common_coordinates(pruned_bundled_edges_2d, tolerance=1e-7, precision=7):
    # Extract x and y coordinates, excluding first and last points
    xs = np.concatenate([x[1:-1] for x in pruned_bundled_edges_2d["x"].values])
    ys = np.concatenate([y[1:-1] for y in pruned_bundled_edges_2d["y"].values])
    # Combine all points into a single array
    all_points = np.column_stack((xs, ys))
    # Create arrays for edge indices and point indices
    edge_lengths = [len(x) - 2 for x in pruned_bundled_edges_2d["x"].values]
    edge_indices = np.repeat(np.arange(len(pruned_bundled_edges_2d)), edge_lengths)
    point_indices = np.concatenate(
        [np.arange(1, length + 1) for length in edge_lengths]
    )
    # Build a KD-tree for efficient nearest neighbor search
    tree = cKDTree(all_points)
    # Find pairs of points within the tolerance
    pairs = tree.query_pairs(r=tolerance)
    # Create a dictionary to store coordinates and their corresponding edges and indices
    coord_to_edges = defaultdict(lambda: defaultdict(set))
    # Process pairs and populate the dictionary
    for p1, p2 in tqdm(pairs, desc="Processing point pairs", unit="pair"):
        i, j = edge_indices[p1], edge_indices[p2]
        if i != j:  # Ensure points are from different edges
            coord = tuple(np.round(all_points[p1], precision))
            coord_to_edges[coord][i].add(point_indices[p1])
            coord_to_edges[coord][j].add(point_indices[p2])
    # Create a dictionary of DataFrames
    bundle_points_dict = {}
    for coord, edges in tqdm(
        coord_to_edges.items(), desc="Creating DataFrames", unit="coord"
    ):
        df_data = []
        for edge_index, indices in edges.items():
            edge_data = pruned_bundled_edges_2d.iloc[edge_index]
            for idx in indices:
                df_data.append(
                    {
                        "edge_index": edge_index,
                        "edge_id": edge_data["edge_id"],
                        "coordinate_idx": idx,
                        "x": edge_data["x"][idx],
                        "y": edge_data["y"][idx],
                        "source": edge_data["source"],
                        "target": edge_data["target"],
                    }
                )
        # Create DataFrame and remove duplicates
        df = pd.DataFrame(df_data).drop_duplicates(
            subset=["edge_index", "coordinate_idx"], keep="first"
        )
        bundle_points_dict[coord] = df
    # Print statistics
    print(f"Nr of Edges: {len(pruned_bundled_edges_2d)}")
    print(f"Tolerance {tolerance}")
    print(f"Nr of Bundled Points: {len(bundle_points_dict)}")
    print(f"Total Nr of points: {len(all_points)}")
    return bundle_points_dict

In [117]:
bundled_edges_df_basic = EdgeZInterpolator(
    pruned_bundled_edges_2d
).assign_same_z_coordinate_to_all_edge_points()
bundled_edges_df.head(2)

Minimum z value: -0.14998970187326197


Unnamed: 0,edge_id,source,target,x,y,z,weight,source_position,target_position
0,2934.0,3,222,"[0.32710835337638855, 0.34829094948036254, 0.3...","[0.09232964366674423, 0.07370856381931867, 0.0...","[-0.14538083599954943, -0.15, -0.15, -0.15, -0...",0.799535,"{'x': 0.32710835337638855, 'y': 0.092329643666...","{'x': 0.2604182958602905, 'y': 0.0255446005612..."
1,4359.0,4,5,"[-0.005170490127056837, -0.030738407745957375]","[-0.5241622924804688, -0.5212947130203247]","[-0.14182552012417127, -0.1433996689668787]",0.818326,"{'x': -0.005170490127056837, 'y': -0.524162292...","{'x': -0.030738407745957375, 'y': -0.521294713..."


In [15]:
bundle_points_dict_s = find_common_coordinates(
    bundled_edges_df, tolerance=1e-7, precision=7
)

# bundle_points_dict_m = find_common_coordinates(
#    bundled_edges_df, tolerance=1e-4, precision=7
# )
#
# bundle_points_dict_l = find_common_coordinates(
#    bundled_edges_df, tolerance=1e-3, precision=7
# )

Processing point pairs:   0%|          | 0/35291 [00:00<?, ?pair/s]

Creating DataFrames:   0%|          | 0/14021 [00:00<?, ?coord/s]

Nr of Edges: 42237
Tolerance 1e-07
Nr of Bundled Points: 14021
Total Nr of points: 359822


In [16]:
class BundlePointProcessor:
    def __init__(self, bundled_edges_df, log_to_file=False, log_level=logging.INFO):
        self.bundled_edges_df = bundled_edges_df
        self.bundled_edges_df_old = bundled_edges_df.copy()
        self.logger = self.setup_logger(log_to_file, log_level)

    def setup_logger(self, log_to_file, log_level):
        logger = logging.getLogger(__name__)
        logger.setLevel(log_level)

        # Remove any existing handlers
        for handler in logger.handlers[:]:
            logger.removeHandler(handler)

        formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")

        if log_to_file:
            file_handler = logging.FileHandler("bundle_point_processor.log")
            file_handler.setFormatter(formatter)
            logger.addHandler(file_handler)
        else:
            # Use NullHandler to prevent logging to console
            logger.addHandler(logging.NullHandler())
        # else:
        # console_handler = logging.StreamHandler(sys.stdout)
        # console_handler.setFormatter(formatter)
        # logger.addHandler(console_handler)

        return logger

    def determine_z_value(self, bundle_point_df):
        # Get the most frequent node: source or target
        combined_counts = pd.concat(
            [bundle_point_df["source"], bundle_point_df["target"]]
        ).value_counts()

        # Find the maximum count and its corresponding node
        max_count = combined_counts.max()
        max_count_node = combined_counts.idxmax()

        # Check if the max_count_node is in source or target
        source_match = self.bundled_edges_df[
            self.bundled_edges_df["source"] == max_count_node
        ]
        target_match = self.bundled_edges_df[
            self.bundled_edges_df["target"] == max_count_node
        ]

        if not source_match.empty:
            z_max_node = source_match.iloc[0]["source_position"]["z"]
        elif not target_match.empty:
            z_max_node = target_match.iloc[0]["target_position"]["z"]
        else:
            self.logger.warning(f"Node {max_count_node} not found in bundled_edges_df")
            # return error
            return AttributeError

        z_for_bundlepoint = z_max_node * 0.9

        # Log the z values
        self.logger.info(
            f"Node with max count: {max_count_node} with count: {max_count} and z value: {z_max_node:.4f}"
        )
        return z_for_bundlepoint

    def process_bundle_points(self, bundle_points_dict):
        # Create a progress bar with tqdm
        pbar = tqdm(
            bundle_points_dict.items(), desc="Processing Bundle Points", unit="bundle"
        )

        for bundle_coords, bundle_point_df in pbar:
            try:
                z_for_point = self.determine_z_value(bundle_point_df)
            except Exception as e:
                self.logger.error(f"Error determining z-value for {bundle_coords}: {e}")
                continue

            for _, row in bundle_point_df.iterrows():
                edge_id = row["edge_id"]
                coordinate_idx_to_update = int(row["coordinate_idx"])

                # Find the single row in bundled_edges_df that matches the current edge_id
                matching_row = self.bundled_edges_df[
                    self.bundled_edges_df["edge_id"] == edge_id
                ]

                if matching_row.empty:
                    self.logger.warning(
                        f"No matching row found in bundled_edges for edge_id {edge_id}"
                    )
                    continue

                # Get the index of the matching row
                index = matching_row.index[0]

                # Update z value for the matching row
                z_values = self.bundled_edges_df.at[index, "z"]

                if coordinate_idx_to_update < len(z_values):
                    # Log which values are being changed
                    log_msg = f"Updating z value at index {index}, coordinate {coordinate_idx_to_update}/{len(z_values)}: from {z_values[coordinate_idx_to_update]} to {z_for_point:.4f}"
                    # self.logger.info(log_msg)
                    pbar.set_postfix_str(log_msg)  # Display log in progress bar

                    z_values[coordinate_idx_to_update] = z_for_point
                    self.bundled_edges_df.at[index, "z"] = z_values
                else:
                    log_msg = f"Coordinate index {coordinate_idx_to_update} out of bounds for z_values length {len(z_values)}."
                    # self.logger.warning(log_msg)
                    pbar.set_postfix_str(log_msg)  # Display log in progress bar
                # Log completion
                check_indexing = True
                if check_indexing:
                    x_values = self.bundled_edges_df.at[index, "x"]
                    x_at_idx = round(x_values[coordinate_idx_to_update], 5)
                    x_coord = round(bundle_coords[0], 5)
                    if x_at_idx != x_coord:
                        log_msg = f"FAILED!!!Indexing check: x value at index {coordinate_idx_to_update} is {x_at_idx} and should be {x_coord}"
                    self.logger.info(
                        f"Indexing check: x value at index {coordinate_idx_to_update} is {x_at_idx} and should be {x_coord}"
                    )
        self.logger.info("Processing complete.")

In [17]:
# Example usage with logging to a file
processor = BundlePointProcessor(
    bundled_edges_df, log_to_file=True, log_level=logging.INFO
)
processor.process_bundle_points(bundle_points_dict_s)

Processing Bundle Points:   0%|          | 0/14021 [00:00<?, ?bundle/s]

In [18]:
bundled_edges_df_interpo = processor.bundled_edges_df

In [21]:
old_b = processor.bundled_edges_df_old

In [41]:
import numpy as np
import pandas as pd


def adjust_edge_heights(df, low_z=-0.10):
    def modify_edge(row):
        x, y, z = map(np.array, (row["x"], row["y"], row["z"]))
        source_z = row["source_position"]["z"]
        target_z = row["target_position"]["z"]

        # Add point for source if it's above low_z
        if source_z > low_z:
            x = np.insert(x, 1, row["source_position"]["x"])
            y = np.insert(y, 1, row["source_position"]["y"])
            z = np.insert(z, 1, low_z)

        # Add point for target if it's above low_z
        if target_z > low_z:
            x = np.insert(x, -1, row["target_position"]["x"])
            y = np.insert(y, -1, row["target_position"]["y"])
            z = np.insert(z, -1, low_z)

        return pd.Series({"x": x, "y": y, "z": z})

    return df.apply(modify_edge, axis=1)


# Usage
old_b[["x", "y", "z"]] = adjust_edge_heights(bundled_edges_df_basic)

In [71]:
bundled_edges_df_basic["z_diff"] = bundled_edges_df_basic.apply(
    lambda x: abs(x["z"][-1] - x["z"][0]), axis=1
)

bundled_edges_df_basic[
    (bundled_edges_df_basic["z_diff"] > 0.27) & (len(bundled_edges_df_basic["x"]) > 3)
]

Unnamed: 0,edge_id,source,target,x,y,z,weight,source_position,target_position,color,z_diff
504,115608.0,210,2621,"[0.05712436884641647, 0.076321294463916, 0.092...","[-0.5912988781929016, -0.5899160105348072, -0....","[-0.12089909153190367, -0.15, -0.15, -0.15, 0.15]",0.668332,"{'x': 0.05712436884641647, 'y': -0.59129887819...","{'x': 0.09977206587791443, 'y': -0.59439557790...",True,0.270899
3713,491542.0,1655,2621,"[-0.045594144612550735, -0.0401460388633037, -...","[-0.47651049494743347, -0.4803644721423699, -0...","[-0.12030554688365759, -0.15, -0.15, -0.15, -0...",0.667766,"{'x': -0.045594144612550735, 'y': -0.476510494...","{'x': 0.09977206587791443, 'y': -0.59439557790...",True,0.270306


In [48]:
bundled_edges_df_basic.head(2).drop("color", axis=1).to_csv(
    "bundled_edges_df_basic.csv", index=False
)

In [112]:
import numpy as np


def adjust_edge_z_coordinates_exponential(z, source_z, target_z, steepness=10):
    num_points = len(z)

    # Calculate min_z within the function
    min_z = np.min(z)
    edge_min_z = min(source_z, target_z, min_z)

    # Create exponential curve
    t = np.linspace(0, 1, num_points)
    curve = np.exp(steepness * t) - 1
    curve = curve / curve.max()  # Normalize to [0, 1]

    # Determine which end needs to rise
    if source_z > target_z:
        curve = 1 - curve  # Flip the curve if source is higher
        rise_target = source_z
    else:
        rise_target = target_z

    # Scale and shift the curve
    z_diff = rise_target - edge_min_z
    new_z = edge_min_z + curve * z_diff

    # Ensure exact source and target z-values
    new_z[0] = source_z
    new_z[-1] = target_z

    return new_z


# Example usage:
z = np.array([-0.12089909153190367, -0.15, -0.15, -0.15, 0.15])
source_z = z[0]
target_z = z[-1]

# Try different steepness values
for steepness in [5, 10, 20]:
    new_z = adjust_edge_z_coordinates_exponential(z, source_z, target_z, steepness=0.5)
    print(f"\nSteepness: {steepness}")
    print("Original z:", z)
    print("New z:", new_z)


Steepness: 5
Original z: [-0.12089909 -0.15       -0.15       -0.15        0.15      ]
New z: [-0.12089909 -0.08842573 -0.01865295  0.06040997  0.15      ]

Steepness: 10
Original z: [-0.12089909 -0.15       -0.15       -0.15        0.15      ]
New z: [-0.12089909 -0.08842573 -0.01865295  0.06040997  0.15      ]

Steepness: 20
Original z: [-0.12089909 -0.15       -0.15       -0.15        0.15      ]
New z: [-0.12089909 -0.08842573 -0.01865295  0.06040997  0.15      ]


In [123]:
bundled_edges_df_basic
all_nodes = pd.Series(
    bundled_edges_df_basic["source"].tolist()
    + bundled_edges_df_basic["target"].tolist()
)

In [124]:
all_nodes.value_counts()

4023    45
2212    37
3992    37
3279    35
3924    34
        ..
1920     1
1925     1
1928     1
1665     1
5445     1
Name: count, Length: 4113, dtype: int64

In [204]:
edges_df = pruned_bundled_edges_2d[["source", "target"]]
edge_list = edges_df.values.tolist()

In [192]:
# Assuming g_fr_z_bundled_pruned is your igraph Graph object
node_positions_x = [node["x"] for node in g_fr_z_bundled_pruned.vs]
node_positions_y = [node["y"] for node in g_fr_z_bundled_pruned.vs]
node_ids = g_fr_z_bundled_pruned.vs.indices

node_positions_tuples = list(zip(node_positions_x, node_positions_y))


# Create a dictionary with node IDs as keys and positions (x, y) as values
node_positions_dict = dict(zip(node_ids, node_positions_tuples))

print(node_positions_dict)

{0: (0.2746517062187195, 0.45784616470336914), 1: (0.01469799317419529, 0.014342580921947956), 2: (-0.09740978479385376, 0.4621775448322296), 3: (0.2300095111131668, 0.18453529477119446), 4: (0.2564321458339691, -0.14262549579143524), 5: (-0.29685482382774353, 0.20775803923606873), 6: (0.5454504489898682, -0.027031797915697098), 7: (-0.21311798691749573, 0.1837420016527176), 8: (0.19109883904457092, 0.43846002221107483), 9: (0.18282705545425415, 0.48243024945259094), 10: (-0.2575904130935669, 0.17567496001720428), 11: (0.11515644192695618, 0.10593082755804062), 12: (-0.18967297673225403, 0.2205783873796463), 13: (-0.23726747930049896, 0.2639611065387726), 14: (0.2720964550971985, 0.2829279899597168), 15: (-0.5140379667282104, 0.28309643268585205), 16: (0.20922598242759705, 0.49176785349845886), 17: (-0.36739182472229004, -0.16183343529701233), 18: (0.04989243671298027, -0.42265456914901733), 19: (-0.026112766936421394, -0.13746881484985352), 20: (-0.21293410658836365, 0.224489301443099

In [238]:
import numpy as np
import netgraph

edges_df = pruned_bundled_edges_2d[["source", "target"]]
edge_list = edges_df.values.tolist()

# Assuming g_fr_z_bundled_pruned is your igraph Graph object
node_positions_x = [node["x"] for node in g_fr_z_bundled_pruned.vs]
node_positions_y = [node["y"] for node in g_fr_z_bundled_pruned.vs]
node_ids = g_fr_z_bundled_pruned.vs.indices

# Create a dictionary with node IDs as keys and positions as numpy arrays
node_positions_dict = {
    node_id: np.array([x, y])
    for node_id, x, y in zip(node_ids, node_positions_x, node_positions_y)
}

print(node_positions_dict)

bundled_edges_netgraph = netgraph.get_bundled_edge_paths(
    edge_list,
    node_positions_dict,
    k=800.0,
    compatibility_threshold=0.2,
    total_cycles=2,
    total_iterations=2,
    step_size=0.2,
    straighten_by=0.3,
)

{0: array([0.27465171, 0.45784616]), 1: array([0.01469799, 0.01434258]), 2: array([-0.09740978,  0.46217754]), 3: array([0.23000951, 0.18453529]), 4: array([ 0.25643215, -0.1426255 ]), 5: array([-0.29685482,  0.20775804]), 6: array([ 0.54545045, -0.0270318 ]), 7: array([-0.21311799,  0.183742  ]), 8: array([0.19109884, 0.43846002]), 9: array([0.18282706, 0.48243025]), 10: array([-0.25759041,  0.17567496]), 11: array([0.11515644, 0.10593083]), 12: array([-0.18967298,  0.22057839]), 13: array([-0.23726748,  0.26396111]), 14: array([0.27209646, 0.28292799]), 15: array([-0.51403797,  0.28309643]), 16: array([0.20922598, 0.49176785]), 17: array([-0.36739182, -0.16183344]), 18: array([ 0.04989244, -0.42265457]), 19: array([-0.02611277, -0.13746881]), 20: array([-0.21293411,  0.2244893 ]), 21: array([-0.145447  ,  0.30416423]), 22: array([-0.5631972 , -0.14197923]), 23: array([-0.39991391, -0.13871677]), 24: array([-0.27311137, -0.24170022]), 25: array([ 0.24019188, -0.1168144 ]), 26: array([

  displacement = compatibility * delta / distance_squared[..., None]


k=1000.0:

Spring constant for the force-directed algorithm.
Higher values make the bundling more rigid, lower values make it more flexible.
Adjust this to control how aggressively edges are bundled.

compatibility_threshold=0.05:

Threshold for determining if two edges are compatible for bundling.
Range is typically between 0 and 1.
Lower values result in more aggressive bundling, higher values in less bundling.

total_cycles=5:

Number of cycles in the bundling process.
Each cycle applies the force-directed algorithm to refine the bundling.
More cycles can lead to better bundling but increase computation time.

total_iterations=50:

Number of iterations per cycle.
Higher values can lead to more refined results but increase computation time.

step_size=0.04:

Size of each step in the force-directed algorithm.
Smaller values lead to more precise but slower convergence.
Larger values are faster but might overshoot optimal positions.

straighten_by=0.0:

Factor to straighten bundled edges after bundling.
Range is typically between 0 and 1.
0 means no straightening, 1 means fully straight edges.
Can help in making the final result look cleaner.


In [239]:
source, target, x, y = [], [], [], []
for nodes, coords in bundled_edges_netgraph.items():
    source.append(nodes[0])
    target.append(nodes[1])
    # Extract x and y coordinates using list comprehensions
    xs = [coord[0] for coord in node_positions_tuples]
    ys = [coord[1] for coord in node_positions_tuples]
    x.append(xs)
    y.append(ys)

df = pd.DataFrame({"source": source, "target": target, "x": x, "y": y})
df["z"] = df.apply(lambda x: np.zeros(len(x["x"])), axis=1)

In [240]:
# merge and target and source with bundled_edges_df_basic
new_bundled_edges_df_basic = bundled_edges_df_basic.merge(
    df[["source", "target", "x", "y", "z"]],
    on=["source", "target"],
    how="left",
    suffixes=("", "_new"),
)

# drop the old x, y, z columns
new_bundled_edges_df_basic.drop(["x", "y", "z"], axis=1, inplace=True)
# rename the new columns
new_bundled_edges_df_basic.rename(
    columns={"x_new": "x", "y_new": "y", "z_new": "z"}, inplace=True
)

In [241]:
new_bundled_edges_df_basic = new_bundled_edges_df_basic.drop("z", axis=1)
eip = EdgeZInterpolator(new_bundled_edges_df_basic)
new_bundled_edges_3d = eip.assign_same_z_coordinate_to_all_edge_points()

Minimum z value: -0.14998970187326197


In [242]:
# get everage number of points
points = new_bundled_edges_3d["x"].apply(len)
points.mean()

5524.0

In [236]:
# Check the reduction in number of points
original_points = sum(
    len(edge_data[1]) for edge_data in bundled_edges_netgraph.values()
)
simplified_points = sum(len(path) for path in simplified_bundled_edges.values())

print(f"Original total points: {original_points}")
print(f"Simplified total points: {simplified_points}")
print(
    f"Reduction: {(original_points - simplified_points) / original_points * 100:.2f}%"
)

# Check the structure of the simplified edges
print("\nStructure of simplified_bundled_edges:")
for key, value in list(simplified_bundled_edges.items())[:5]:  # Print first 5 entries
    print(f"Key: {key}")
    print(f"Shape of simplified path: {value.shape}")
    print(f"First few coordinates: {value[:3]}")
    print()

NameError: name 'simplified_bundled_edges' is not defined

In [237]:
from shapely.geometry import LineString
import numpy as np


def simplify_path(edge_data, tolerance=0.01):
    print(f"Edge data type: {type(edge_data)}")
    print(f"Edge data: {edge_data}")

    # Check if edge_data is already a numpy array
    if isinstance(edge_data, np.ndarray):
        path = edge_data
    elif isinstance(edge_data, tuple) and len(edge_data) > 1:
        path = edge_data[1]
    else:
        print(f"Unexpected edge_data format: {edge_data}")
        return None

    print(f"Path type: {type(path)}")
    print(
        f"Path shape: {path.shape if isinstance(path, np.ndarray) else 'Not an array'}"
    )

    # If path is 1D, reshape it to 2D
    if path.ndim == 1:
        path = path.reshape(-1, 2)

    # Convert the numpy array to a list of tuples
    path_coords = [tuple(coord) for coord in path]

    # Create LineString and simplify
    line = LineString(path_coords)
    simplified_line = line.simplify(tolerance, preserve_topology=False)

    # Return as a numpy array to maintain consistency with input format
    return np.array(simplified_line.coords)


# Example usage with bundled edge paths
simplified_bundled_edges = {}
for edge_key, edge_data in list(bundled_edges_netgraph.items())[
    :5
]:  # Process only first 5 items for testing
    print(f"\nProcessing edge: {edge_key}")
    simplified_path = simplify_path(edge_data, tolerance=0.1)
    if simplified_path is not None:
        simplified_bundled_edges[edge_key] = simplified_path

# Print the first simplified edge path to check the result
if simplified_bundled_edges:
    first_key = next(iter(simplified_bundled_edges))
    print(f"\nFirst key: {first_key}")
    print(f"Simplified path: {simplified_bundled_edges[first_key]}")
else:
    print("No edges were successfully simplified.")


Processing edge: (1707, 3732)
Edge data type: <class 'numpy.ndarray'>
Edge data: [[0.41200054 0.2671905 ]
 [0.41063885 0.27017868]
 [0.40937536 0.27306618]
 [0.40820866 0.27585483]
 [0.4071373  0.27854643]
 [0.40615985 0.28114279]
 [0.4052749  0.28364571]
 [0.40448101 0.286057  ]
 [0.40377676 0.28837848]
 [0.4031607  0.29061195]
 [0.40263142 0.29275922]
 [0.40218749 0.29482209]
 [0.40182748 0.29680237]
 [0.40154996 0.29870188]
 [0.40135349 0.30052241]
 [0.40123666 0.30226578]
 [0.40119803 0.3039338 ]
 [0.40123618 0.30552827]
 [0.40134967 0.307051  ]
 [0.40153708 0.30850379]
 [0.40179698 0.30988847]
 [0.40212794 0.31120683]
 [0.40252852 0.31246068]
 [0.40299732 0.31365183]
 [0.40353288 0.31478209]
 [0.40413379 0.31585327]
 [0.40479862 0.31686717]
 [0.40552593 0.3178256 ]
 [0.40631431 0.31873037]
 [0.40716231 0.31958329]
 [0.40806852 0.32038616]
 [0.4090315  0.3211408 ]
 [0.41004982 0.32184901]
 [0.41112206 0.3225126 ]
 [0.41224679 0.32313338]
 [0.41342257 0.32371315]
 [0.41464799 0.324

In [228]:
import numpy as np
import pandas as pd


class EdgeZInterpolator:
    def __init__(self, bundled_edges):
        self.bundled_edges = bundled_edges

    def assign_same_z_coordinate_to_all_edge_points(self):
        # Calculate minimum z value more efficiently
        min_z = min(
            self.bundled_edges["source_position"].apply(lambda x: x["z"]).min(),
            self.bundled_edges["target_position"].apply(lambda x: x["z"]).min(),
        )
        print("Minimum z value:", min_z)

        # Vectorized operation to assign z values
        self.bundled_edges["z"] = self.bundled_edges.apply(
            lambda row: self._assign_z(row, min_z), axis=1
        )

        # Reorder columns
        column_order = [
            "edge_id",
            "source",
            "target",
            "x",
            "y",
            "z",
            "weight",
            "source_position",
            "target_position",
        ]
        self.bundled_edges = self.bundled_edges.reindex(columns=column_order)

        return self.bundled_edges

    @staticmethod
    def _assign_z(row, min_z):
        z_values = np.full(len(row["x"]), min_z)
        z_values[0] = row["source_position"]["z"]
        z_values[-1] = row["target_position"]["z"]
        return z_values.tolist()

In [229]:
edges_json = EdgesSaver.save_edges_for_js(
    new_bundled_edges_3d,
    [
        OUTPUT_DIR
        + f"bundled_edges_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
        THREEJS_OUTPUT_DIR
        + f"bundled_edges_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
    ],
    add_color_bool=True,
    g=g_fr_z_bundled_pruned,
    return_json=True,
)

9337 out of 9400 edges have the same source and target cluster.
No edges with NaN values found.
Edges data saved to ../data/99-testdata/bundled_edges_3d_clusters50to69.json


KeyboardInterrupt: 

In [120]:
# save to multiple paths
nodes_json = NodesSaver.save_igraph_nodes_to_json(
    g_fr_z_bundled_pruned,
    [
        OUTPUT_DIR + f"nodes_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
        THREEJS_OUTPUT_DIR
        + f"nodes_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
    ],
    return_json=True,
)

Graph nodes saved to ../data/99-testdata/nodes_3d_clusters50to69.json
Graph nodes saved to /Users/jlq293/Projects/Random Projects/LW-ThreeJS/2d_ssrinetworkviz/src/data/nodes_3d_clusters50to69.json


# With simple Interpolation / Good enough for now


In [87]:
g = GraphReader.read_and_clean_graph(INPUT_GRAPH_PATH)

cluster_list = list(range(0, 101))

# subset to only cluster 0 to 100
g = GraphReader.subgraph_of_clusters(g, cluster_list)

total_nodes = len(g.vs)
################################################################################################
layout_params = {
    # "k": 0.5, # distance between nodes; best to leave it to algo
    "iterations": 50,  # (default=50) use 100
    "threshold": 0.0001,  # default 0.0001
    "weight": "weight",
    "scale": 1,
    "center": (0, 0),
    "dim": 2,
    "seed": 1887,
}

g_fr, pos = LayoutUtility.fr_layout_nx(g, layout_params)

print("#" * 100)
print("Layout done")
print("#" * 100)

################################################################################################

g_fr_z = ZCoordinateAdder(g_fr, scale_factor=0.15).add_z_coordinate_to_nodes()
print("#" * 100)
print("Z coordinate added to nodes")
print("#" * 100)
################################################################################################

bundle_kwargs = {
    "decay": 0.95,
    "initial_bandwidth": 0.05,
    "iterations": 10,
    "include_edge_id": True,
}
bundler = GraphBundler2d(
    g_fr_z, pruning_weight_percentile=80, bundle_kwargs=bundle_kwargs
)
pruned_bundled_edges_2d, g_fr_z_bundled_pruned = bundler.bundle_edges()

print("#" * 100)
print("Edge bundling done")
print("#" * 100)

  g = ig.Graph.Read_GraphML(path)


Node Attributes: ['doi', 'year', 'title', 'cluster', 'node_id', 'node_name', 'centrality']
Edge Attributes: ['weight', 'edge_id']
Number of nodes: 40643
Number of edges: 602779
Starting Fruchterman-Reingold layout process...
Layout parameters: {'iterations': 50, 'threshold': 0.0001, 'weight': 'weight', 'scale': 1, 'center': (0, 0), 'dim': 2, 'seed': 1887}
Converting to NetworkX Graph...
Conversion complete.
Graph has 37804 nodes and 564246 edges.
Calculating layout...
Layout calculation completed in 1960.35 seconds.
Processing layout results...
Layout boundaries:
X-axis: Min = -1.00, Max = 0.75
Y-axis: Min = -0.75, Max = 0.77
Assigning coordinates to nodes...
Layout process completed in 1962.34 seconds.
####################################################################################################
Layout done
####################################################################################################
Bounds of the layout:
Min x: -1.0, Max x: 0.746563196182251
Min y: -0.753

In [88]:
# save as graphml
g_fr_z_bundled_pruned.write(
    OUTPUT_DIR
    + f"intermediate_graph_bundled_clusters{min(cluster_list)}to{max(cluster_list)}.graphml"
)

# save as csv
pruned_bundled_edges_2d.to_csv(
    OUTPUT_DIR
    + f"intermediate_bundled_edges_2d_clusters{min(cluster_list)}to{max(cluster_list)}.csv",
    index=False,
)

In [89]:
interpolator = SimpleEdgeZInterpolator(pruned_bundled_edges_2d)
z_interp_bundled_edges_df = interpolator.assign_same_z_coordinate_to_all_edge_points()

# print("#" * 100)
print("Z interpolation done")
# print("#" * 100)
#################################################################################################

Minimum z value: -0.14998970187326197
Z interpolation done


In [90]:
# save to multiple paths
nodes_json = NodesSaver.save_igraph_nodes_to_json(
    g_fr_z_bundled_pruned,
    [
        OUTPUT_DIR + f"nodes_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
        THREEJS_OUTPUT_DIR
        + f"nodes_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
    ],
    return_json=True,
)

Graph nodes saved to ../data/99-testdata/nodes_3d_clusters0to100.json
Graph nodes saved to /Users/jlq293/Projects/Random Projects/LW-ThreeJS/2d_ssrinetworkviz/src/data/nodes_3d_clusters0to100.json


In [91]:
edges_json = EdgesSaver.save_edges_for_js(
    z_interp_bundled_edges_df,
    [
        OUTPUT_DIR
        + f"bundled_edges_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
        THREEJS_OUTPUT_DIR
        + f"bundled_edges_3d_clusters{min(cluster_list)}to{max(cluster_list)}.json",
    ],
    add_color_bool=True,
    g=g_fr_z_bundled_pruned,
    return_json=True,
)

99148 out of 112849 edges have the same source and target cluster.
No edges with NaN values found.
Edges data saved to ../data/99-testdata/bundled_edges_3d_clusters0to100.json
Edges data saved to /Users/jlq293/Projects/Random Projects/LW-ThreeJS/2d_ssrinetworkviz/src/data/bundled_edges_3d_clusters0to100.json


In [93]:
edges_json[0]

{'id': 18,
 'source': 1,
 'target': 3,
 'weight': 0.661441922187805,
 'colored': False,
 'points': [{'x': 0.1221037283539772,
   'y': 0.3766651451587677,
   'z': -0.14679787401605288},
  {'x': 0.10276992717022626,
   'y': 0.3633664946274533,
   'z': -0.14998970187326197},
  {'x': 0.082411802142482,
   'y': 0.35216182396024354,
   'z': -0.14998970187326197},
  {'x': 0.0611634848047633,
   'y': 0.3437742463348492,
   'z': -0.14998970187326197},
  {'x': 0.040672941587518, 'y': 0.3380450598572611, 'z': -0.14998970187326197},
  {'x': 0.023999530111220357,
   'y': 0.3345025088668898,
   'z': -0.14998970187326197},
  {'x': 0.01451600507685491,
   'y': 0.33271084531796014,
   'z': -0.14998970187326197},
  {'x': 0.014143231410115975,
   'y': 0.33237554740265174,
   'z': -0.14998970187326197},
  {'x': 0.022334358011036004,
   'y': 0.3333135174425177,
   'z': -0.14998970187326197},
  {'x': 0.03690488162827399,
   'y': 0.33534083357152333,
   'z': -0.14998970187326197},
  {'x': 0.05562574744511939