Merged
4 changes: 2 additions & 2 deletions docs/community_resources/existing_plugins.rst
Expand Up @@ -16,5 +16,5 @@ plugin, see the :ref:`plugin author guide<plugin_author_guide>`.
Plugins we know about
---------------------

- metagraph-cuda
- metagraph-igraph
- `metagraph-cuda <https://github.com/metagraph-dev/metagraph-cuda>`__
- `metagraph-igraph <https://github.com/metagraph-dev/metagraph-igraph>`__
2 changes: 1 addition & 1 deletion docs/getting_started/installation.rst
Expand Up @@ -18,7 +18,7 @@ Installing using conda

::

conda install -c conda-forge metagraph
conda install -c metagraph metagraph


Installing from PyPI
108 changes: 103 additions & 5 deletions docs/user_guide/algorithm_list.rst
Expand Up @@ -12,7 +12,7 @@ Graphs often have natural structure which can be discovered, allowing them to be

.. py:function:: clustering.connected_components(graph: Graph(is_directed=False)) -> NodeMap

The connected components algorithm groups nodes of an **undirected** graph into subgraphs where all subgraph nodes
The connected components algorithm groups nodes of an undirected graph into subgraphs where all subgraph nodes
are reachable within a component.

:rtype: a dense NodeMap where each node is assigned an integer indicating the component.
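
As an illustration of the labeling described above, here is a minimal pure-Python sketch using breadth-first search. The adjacency-dict input and the function name are assumptions made for this example; they are not the metagraph API.

```python
from collections import deque


def connected_components(adjacency):
    """Label each node of an undirected graph with an integer component id.

    `adjacency` is a hypothetical dict mapping node -> iterable of neighbors.
    """
    labels = {}
    next_label = 0
    for start in adjacency:
        if start in labels:
            continue  # already reached from an earlier start node
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels


# Two multi-node components and one isolated node
example = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
components = connected_components(example)
```

Nodes sharing a label are mutually reachable; the label values themselves carry no meaning beyond grouping.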
Expand Down Expand Up @@ -47,6 +47,14 @@ Graphs often have natural structure which can be discovered, allowing them to be
This algorithm returns the total number of triangles in the graph.


.. py:function:: clustering.coloring.greedy(graph: Graph(is_directed=False)) -> Tuple[NodeMap, int]

Attempts to find the minimum number of colors required to label the graph such that no connected nodes have the
same color. Color is represented as a value from 0..n.

:rtype: (color for each node, number of unique colors)
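
A greedy strategy does not guarantee the true minimum; it only gives a valid coloring. The sketch below shows one common greedy variant (first free color, nodes visited in input order). The adjacency-dict input and names are assumptions for illustration and may differ from any concrete plugin implementation.

```python
def greedy_coloring(adjacency):
    """Assign each node the smallest color not used by its colored neighbors.

    `adjacency` is a hypothetical dict mapping node -> iterable of neighbors
    for an undirected graph. Returns (color per node, number of colors used).
    """
    colors = {}
    for node in adjacency:
        used = {colors[n] for n in adjacency[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors, len(set(colors.values()))


# A 4-cycle (0-1-2-3) with one chord (0-2)
square = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
node_colors, num_colors = greedy_coloring(square)
```

The result depends on visiting order, which is why the algorithm only "attempts" to minimize the color count.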


Traversal
---------

Expand Down Expand Up @@ -123,6 +131,22 @@ Many algorithms assign a ranking or value to each vertex/node in the graph based
This algorithm determines the importance of a given node in the network based on links between important nodes.


.. py:function:: centrality.closeness(graph: Graph(edge_type="map", edge_dtype={"int", "float"}), nodes: Optional[NodeSet] = None) -> NodeMap

Calculates the closeness centrality metric, which estimates the average distance from a node to all other nodes.
A high closeness score indicates a small average distance to other nodes.
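
To make the metric concrete, the sketch below computes closeness for one node as the number of reachable nodes divided by the sum of shortest-path distances, using Dijkstra's algorithm. This is one common definition; the exact normalization used by a given implementation is an assumption here, as are the nested-dict input and the function name.

```python
import heapq


def closeness(weighted_adjacency, node):
    """Closeness of `node`: reachable node count / sum of shortest distances.

    `weighted_adjacency` is a hypothetical dict: node -> {neighbor: weight},
    with non-negative weights.
    """
    dist = {node: 0.0}
    heap = [(0.0, node)]
    while heap:
        d, current = heapq.heappop(heap)
        if d > dist[current]:
            continue  # stale heap entry
        for neighbor, weight in weighted_adjacency[current].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Path graph a - b - c with unit weights
path = {"a": {"b": 1}, "b": {"a": 1, "c": 1}, "c": {"b": 1}}
scores = {n: closeness(path, n) for n in path}
```

The middle node ``b`` gets the highest score because its average distance to the others is smallest.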

.. py:function:: centrality.eigenvector(graph: Graph(edge_type="map", edge_dtype={"int", "float"}), maxiter: int = 50, tolerance: float = 1e-05) -> NodeMap

Calculates the eigenvector centrality, which estimates the importance of a node in the graph.

.. py:function:: centrality.hits(graph: Graph(edge_type="map", edge_dtype={"int", "float"}, is_directed=True), maxiter: int = 50, tolerance: float = 1e-05, normalize: bool = True) -> Tuple[NodeMap, NodeMap]

Hyperlink-Induced Topic Search (HITS) centrality ranks nodes based on incoming and outgoing edges.

:rtype: (hubs, authority)
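
The hub/authority iteration can be sketched in a few lines. The version below ignores edge weights, always normalizes scores to sum to 1, and takes a plain edge list; those choices and the function name are assumptions for illustration, not the behavior of any specific plugin.

```python
def hits(edges, maxiter=50, tolerance=1e-5):
    """Unweighted HITS sketch on a directed edge list of (source, target)."""
    nodes = sorted({n for edge in edges for n in edge})
    hubs = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(maxiter):
        # Authority: sum of hub scores of nodes pointing at this node
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hubs[src]
        # Hub: sum of authority scores of nodes this node points at
        new_hubs = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hubs[src] += new_auth[dst]
        a_total = sum(new_auth.values()) or 1.0
        h_total = sum(new_hubs.values()) or 1.0
        new_auth = {n: v / a_total for n, v in new_auth.items()}
        new_hubs = {n: v / h_total for n, v in new_hubs.items()}
        change = sum(abs(new_hubs[n] - hubs[n]) for n in nodes)
        hubs, auth = new_hubs, new_auth
        if change < tolerance:
            break
    return hubs, auth


links = [("a", "c"), ("b", "c"), ("c", "d")]
hub_scores, authority_scores = hits(links)
```

Here ``c`` ends up as the strongest authority (two incoming links), while ``a`` and ``b`` are the strongest hubs because they point at it.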


Subgraph
--------

Expand All @@ -136,7 +160,50 @@ Graphs are often too large to handle, so a portion of the graph is extracted. Of

.. py:function:: subgraph.k_core(graph: Graph(is_directed=False), k: int) -> Graph

This algorithm finds a maximal subgraph that contains nodes of at least degree *k*.
This algorithm finds a maximal subgraph that contains nodes of at least degree ``k``.
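
A k-core can be found by repeatedly peeling off nodes whose degree falls below ``k``. The following is a minimal sketch of that peeling idea; the adjacency-dict input and function name are assumptions for illustration only.

```python
def k_core(adjacency, k):
    """Return the k-core as a dict node -> set of remaining neighbors.

    `adjacency` is a hypothetical dict mapping node -> iterable of neighbors
    for an undirected graph.
    """
    remaining = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Remove the node and detach it from its neighbors
                for neighbor in remaining.pop(node):
                    if neighbor in remaining:
                        remaining[neighbor].discard(node)
                changed = True
    return remaining


# Triangle 0-1-2 with a pendant node 3 attached to node 2
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
core = k_core(graph, 2)
```

With ``k = 2`` the pendant node is removed and the triangle remains.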


.. py:function:: subgraph.k_truss(graph: Graph(is_directed=False), k: int) -> Graph

Finds the maximal subgraph in which every edge is part of at least ``k`` - 2 triangles.


.. py:function:: subgraph.maximal_independent_set(graph: Graph) -> NodeSet

Finds a maximal set of independent nodes, meaning the nodes in the set share no edges with each other
and no additional node in the graph can be added while still satisfying this criterion.
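
One simple way to build such a set is a single greedy pass: take a node, then block all of its neighbors. The sketch below assumes an undirected adjacency dict without self-loops; the names are hypothetical and not part of the metagraph API.

```python
def maximal_independent_set(adjacency):
    """Greedy maximal independent set over a dict node -> neighbors."""
    chosen = set()
    blocked = set()
    for node in adjacency:
        if node not in blocked:
            chosen.add(node)
            blocked.add(node)
            blocked.update(adjacency[node])  # neighbors can no longer be chosen
    return chosen


# Path graph 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
independent = maximal_independent_set(path)
```

The result is maximal (nothing more can be added) but not necessarily the largest possible independent set.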


.. py:function:: subgraph.subisomorphic(graph: Graph, subgraph: Graph) -> bool

Indicates whether ``subgraph`` is an isomorphic subcomponent of ``graph``.


.. py:function:: subgraph.sample.node_sampling(graph: Graph, p: float = 0.20) -> Graph

Returns a subgraph created by randomly sampling nodes and including edges which exist between sampled
nodes in the original graph.


.. py:function:: subgraph.sample.edge_sampling(graph: Graph, p: float = 0.20) -> Graph

Returns a subgraph created by randomly sampling edges and including both node endpoints.


.. py:function:: subgraph.sample.ties(graph: Graph, p: float = 0.20) -> Graph

Totally Induced Edge Sampling extends edge sampling by also including any edges between the nodes
which exist in the original graph. See the `paper <https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2743&context=cstech>`__
for more details.
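
The difference between plain edge sampling and TIES is the final induction step. The sketch below treats ``p`` as the target fraction of nodes, which is an assumption made for this example (the documented signature does not say how ``p`` is interpreted); the edge-list input, ``seed`` parameter, and function name are likewise hypothetical.

```python
import random


def ties_sample(edges, p=0.20, seed=None):
    """Totally Induced Edge Sampling sketch on a list of (u, v) edges."""
    if not edges:
        return set(), []
    rng = random.Random(seed)
    all_nodes = {node for edge in edges for node in edge}
    # Assumption: p is the fraction of nodes to collect
    target = min(len(all_nodes), max(2, int(p * len(all_nodes))))
    sampled_nodes = set()
    # Step 1: sample edges uniformly, keeping both endpoints
    while len(sampled_nodes) < target:
        u, v = rng.choice(edges)
        sampled_nodes.update((u, v))
    # Step 2 (total induction): keep every original edge between sampled nodes
    induced = [(u, v) for u, v in edges if u in sampled_nodes and v in sampled_nodes]
    return sampled_nodes, induced


ring = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
sampled_nodes, sampled_edges = ties_sample(ring, p=0.5, seed=0)
```

Step 2 is what distinguishes this from edge sampling: edges that were never drawn are still included when both endpoints were sampled.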


.. py:function:: subgraph.sample.random_walk(graph: Graph, num_steps: Optional[int] = None, num_nodes: Optional[int] = None, num_edges: Optional[int] = None, jump_probability: float = 0.15, start_node: Optional[NodeID] = None) -> Graph

Samples the graph using a random walk. At each step, the walk resets with probability ``jump_probability``.
When resetting, the walk returns to ``start_node`` if it is specified. Otherwise a random
node is chosen for each reset. Sampling stops as soon as any of ``num_steps``, ``num_nodes``, or ``num_edges`` is
reached.
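
The stopping and reset rules can be sketched as follows. This is a simplified illustration: the adjacency-dict input, the ``seed`` parameter, the decision to count a reset as a step, and the returned (nodes, edges) pair are all assumptions, not the metagraph behavior.

```python
import random


def random_walk_sample(adjacency, num_steps=None, num_nodes=None, num_edges=None,
                       jump_probability=0.15, start_node=None, seed=None):
    """Random-walk sampling sketch over a dict node -> list of neighbors."""
    if num_steps is None and num_nodes is None and num_edges is None:
        raise ValueError("at least one stopping criterion is required")
    rng = random.Random(seed)
    all_nodes = list(adjacency)
    current = start_node if start_node is not None else rng.choice(all_nodes)
    nodes, edges, steps = {current}, set(), 0
    while True:
        # Stop at whichever limit is hit first
        if num_steps is not None and steps >= num_steps:
            break
        if num_nodes is not None and len(nodes) >= num_nodes:
            break
        if num_edges is not None and len(edges) >= num_edges:
            break
        steps += 1
        neighbors = list(adjacency[current])
        if not neighbors or rng.random() < jump_probability:
            # Reset: back to start_node if given, else a fresh random node
            current = start_node if start_node is not None else rng.choice(all_nodes)
            nodes.add(current)
            continue
        nxt = rng.choice(neighbors)
        edges.add((current, nxt))  # stored as a directed (from, to) pair
        nodes.add(nxt)
        current = nxt
    return nodes, edges


ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
walk_nodes, walk_edges = random_walk_sample(
    ring, num_steps=20, num_nodes=3, start_node=0, seed=1
)
```

Passing ``num_steps`` alongside the other limits guarantees termination even when the requested node or edge count cannot be reached.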



Bipartite
Expand All @@ -156,9 +223,18 @@ Algorithms pertaining to the flow capacity of edges.

.. py:function:: flow.max_flow(graph: Graph(edge_type="map", edge_dtype={"int", "float"}), source_node: NodeID, target_node: NodeID) -> Tuple[float, Graph]

Compute the maximum flow possible from source_node to target_node
Compute the maximum flow possible from ``source_node`` to ``target_node``.

:rtype: (max flow rate, computed flow graph)


.. py:function:: flow.min_cut(graph: Graph(edge_type="map", edge_dtype={"int", "float"}), source_node: NodeID, target_node: NodeID) -> Tuple[float, Graph]

Compute the minimum cut separating ``source_node`` from ``target_node``. This is the set of edges whose removal
disconnects the target from the source and whose weights sum to the minimum possible total.
By the max-flow min-cut theorem, this total equals the maximum flow.

:rtype: (max_flow_rate, compute_flow_graph)
:rtype: (max flow rate, graph containing cut edges)
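
The relationship between maximum flow and minimum cut can be checked with a small Edmonds-Karp sketch: once no augmenting path remains, the cut edges are those leaving the set of nodes still reachable from the source in the residual graph. The nested-dict capacity input and the function name are assumptions for this example.

```python
from collections import deque


def min_cut(capacity, source, target):
    """Return (max flow value, {(u, v): capacity} for edges in a minimum cut).

    `capacity` is a hypothetical dict: node -> {neighbor: edge capacity}.
    """
    residual = {source: {}, target: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})[v] = residual.get(u, {}).get(v, 0) + c
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edge

    def bfs_parents():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    parents = bfs_parents()
    while target in parents:
        # Find the bottleneck along the augmenting path, then push flow
        bottleneck, v = float("inf"), target
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        v = target
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
        parents = bfs_parents()

    source_side = set(parents)  # nodes still reachable from the source
    cut = {(u, v): c for u, nbrs in capacity.items() for v, c in nbrs.items()
           if u in source_side and v not in source_side}
    return flow, cut


network = {"s": {"a": 3, "b": 2}, "a": {"t": 2, "b": 1}, "b": {"t": 3}}
max_flow_value, cut_edges = min_cut(network, "s", "t")
```

In this example the cut consists of the two edges leaving ``s``, and their capacities sum to the maximum flow.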


Utility
Expand All @@ -168,7 +244,7 @@ These algorithms are small utility functions which perform common operations nee

.. py:function:: util.nodeset.choose_random(x: NodeSet, k: int) -> NodeSet

Given a set of nodes, choose k random nodes (no duplicates).
Given a set of nodes, choose ``k`` random nodes (no duplicates).

.. py:function:: util.nodeset.from_vector(x: Vector) -> NodeSet

Expand Down Expand Up @@ -198,6 +274,10 @@ These algorithms are small utility functions which perform common operations nee

Converts an EdgeSet into an EdgeMap by giving each edge a default value.

.. py:function:: util.graph.degree(graph: Graph, in_edges: bool = False, out_edges: bool = True) -> NodeMap

Computes the degree of each node. ``in_edges`` and ``out_edges`` can be used to control which degree is computed.
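
The effect of the two flags can be illustrated on a directed edge list. In this sketch, setting both flags counts in-edges plus out-edges; that interpretation, the edge-list input, and the function name are assumptions for illustration.

```python
def degree(edges, in_edges=False, out_edges=True):
    """Count edges per node for a directed edge list of (source, target)."""
    counts = {node: 0 for edge in edges for node in edge}
    for src, dst in edges:
        if out_edges:
            counts[src] += 1
        if in_edges:
            counts[dst] += 1
    return counts


arcs = [(0, 1), (0, 2), (1, 2)]
out_degree = degree(arcs)
in_degree = degree(arcs, in_edges=True, out_edges=False)
total_degree = degree(arcs, in_edges=True, out_edges=True)
```

With the defaults (``in_edges=False``, ``out_edges=True``) the result is the out-degree.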

.. py:function:: util.graph.aggregate_edges(graph: Graph(edge_type="map"), func: Callable[[Any, Any], Any], initial_value: Any, in_edges: bool = False, out_edges: bool = True) -> NodeMap

Aggregates the edge weights around a node, returning a single value per node.
Expand Down Expand Up @@ -225,3 +305,21 @@ These algorithms are small utility functions which perform common operations nee

Collapse a Graph into a smaller Graph by combining clusters of nodes into a single node.
``labels`` indicates the node groupings. ``aggregator`` indicates how to combine edge weights.

.. py:function:: util.graph.isomorphic(g1: Graph, g2: Graph) -> bool

Indicates whether ``g1`` and ``g2`` are isomorphic.

.. py:function:: util.node_embedding.apply(embedding: NodeEmbedding, nodes: Vector) -> Matrix

Returns a dense matrix given an embedding and a vector of NodeIDs.
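
Conceptually, applying an embedding is a row lookup. The sketch below uses plain lists instead of the NumPy-backed types; the ``node_ids`` argument, which maps each row to a NodeID and falls back to sequential NodeIDs when omitted, is a hypothetical stand-in for the embedding's optional node mapping.

```python
def apply_embedding(matrix, nodes, node_ids=None):
    """Return one embedding row per requested NodeID.

    matrix   -- list of rows, one per embedded node
    nodes    -- NodeIDs to look up
    node_ids -- optional list giving the NodeID of each row; when None,
                row i is assumed to belong to NodeID i
    """
    if node_ids is None:
        row_of = {node: node for node in range(len(matrix))}
    else:
        row_of = {node: row for row, node in enumerate(node_ids)}
    return [list(matrix[row_of[node]]) for node in nodes]


embedding_rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
sequential = apply_embedding(embedding_rows, [2, 0])
labelled = apply_embedding(embedding_rows, [30, 10], node_ids=[10, 20, 30])
```

Both calls return the third and first rows, in the order the NodeIDs were requested.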


Embedding
---------

Embeddings convert graph nodes or whole graphs into dense vector representations.

.. py:function:: embedding.train.node2vec(graph: Graph, p: float, q: float, walks_per_node: int, walk_length: int, embedding_size: int, epochs: int, learning_rate: float) -> NodeEmbedding

Computes the `node2vec <https://snap.stanford.edu/node2vec/>`__ embedding.
24 changes: 24 additions & 0 deletions docs/user_guide/type_list.rst
Expand Up @@ -449,3 +449,27 @@ If any node has a weight, all nodes must have a weight. This includes nodes from
both node sets 0 and 1.

If any edge has a weight, all edges must have a weight.


NodeEmbedding
-------------

Holds an embedding for each node, extracted from a graph.
Conceptually, this can be thought of as a dense matrix with each row applying to a single NodeID.

Abstract Properties:

- matrix_dtype: ["str", "float", "int", "bool"]

→ NumpyNodeEmbedding
~~~~~~~~~~~~~~~~~~~~

:ConcreteType: ``NumpyNodeEmbedding.Type``
:value_type: ``NumpyNodeEmbedding``
:data objects:
``.matrix``: ``NumpyMatrix``

``.nodes``: optional ``NumpyNodeMap``

If ``nodes`` is None, the nodes are assumed to be fully sequential, corresponding to the height
of the matrix.
29 changes: 29 additions & 0 deletions metagraph/algorithms/centrality.py
@@ -1,6 +1,7 @@
import metagraph as mg
from metagraph import abstract_algorithm
from metagraph.types import Graph, NodeMap, NodeSet, NodeID
from typing import Tuple


@abstract_algorithm("centrality.betweenness")
Expand Down Expand Up @@ -31,3 +32,31 @@ def pagerank(
    tolerance: float = 1e-05,
) -> NodeMap:
    pass  # pragma: no cover


@abstract_algorithm("centrality.closeness")
def closeness_centrality(
    graph: Graph(edge_type="map", edge_dtype={"int", "float"}),
    nodes: mg.Optional[NodeSet] = None,
) -> NodeMap:
    pass  # pragma: no cover


@abstract_algorithm("centrality.eigenvector")
def eigenvector_centrality(
    graph: Graph(edge_type="map", edge_dtype={"int", "float"}),
    maxiter: int = 50,
    tolerance: float = 1e-05,
) -> NodeMap:
    pass  # pragma: no cover


@abstract_algorithm("centrality.hits")
def hits_centrality(
    graph: Graph(edge_type="map", edge_dtype={"int", "float"}, is_directed=True),
    maxiter: int = 50,
    tolerance: float = 1e-05,
    normalize: bool = True,
) -> Tuple[NodeMap, NodeMap]:
    """Return (hubs, authority)"""
    pass  # pragma: no cover
12 changes: 12 additions & 0 deletions metagraph/algorithms/clustering.py
Expand Up @@ -26,7 +26,19 @@ def louvain_community_step(
    pass  # pragma: no cover


# TODO: why is this "cluster" instead of "clustering"?
@abstract_algorithm("cluster.triangle_count")
def triangle_count(graph: Graph(is_directed=False)) -> int:
    """Counts the number of unique triangles in an undirected graph"""
    pass  # pragma: no cover


@abstract_algorithm("clustering.coloring.greedy")
def greedy_coloring(graph: Graph(is_directed=False)) -> Tuple[NodeMap, int]:
    """
    Attempts to find the minimum number of colors required to color the graph such that no connected
    nodes have the same color. Color is simply represented as a value from 0..n

    Returns color for each node and # of colors required
    """
    pass  # pragma: no cover
18 changes: 17 additions & 1 deletion metagraph/algorithms/flow.py
Expand Up @@ -10,5 +10,21 @@ def max_flow(
    source_node: NodeID,
    target_node: NodeID,
) -> Tuple[float, Graph]:
    """The returned graph is a graph whose edge weights represent the outward flow. It contains all the nodes of the input graph"""
    """
    Returns the maximum flow and a graph whose edge weights represent the flow.
    It contains all the nodes of the input graph
    """
    pass  # pragma: no cover


@abstract_algorithm("flow.min_cut")
def min_cut(
    graph: Graph(edge_type="map", edge_dtype={"int", "float"}),
    source_node: NodeID,
    target_node: NodeID,
) -> Tuple[float, Graph]:
    """
    Returns the sum of the minimum cut weights and a graph containing only those edges
    which are part of the minimum cut.
    """
    pass  # pragma: no cover
58 changes: 57 additions & 1 deletion metagraph/algorithms/subgraph.py
@@ -1,5 +1,6 @@
import metagraph as mg
from metagraph import abstract_algorithm
from metagraph.types import NodeSet, Graph
from metagraph.types import NodeSet, Graph, NodeID


@abstract_algorithm("subgraph.extract_subgraph")
Expand All @@ -10,3 +11,58 @@ def extract_subgraph(graph: Graph, nodes: NodeSet) -> Graph:
@abstract_algorithm("subgraph.k_core")
def k_core(graph: Graph(is_directed=False), k: int) -> Graph:
    pass  # pragma: no cover


@abstract_algorithm("subgraph.k_truss")
def k_truss(graph: Graph(is_directed=False), k: int) -> Graph:
    pass  # pragma: no cover


@abstract_algorithm("subgraph.maximal_independent_set")
def maximal_independent_set(graph: Graph) -> NodeSet:
    pass  # pragma: no cover


@abstract_algorithm("subgraph.subisomorphic")
def subisomorphic(graph: Graph, subgraph: Graph) -> bool:
    pass  # pragma: no cover


@abstract_algorithm("subgraph.sample.node_sampling")
def node_sampling(graph: Graph, p: float = 0.20) -> Graph:
    pass  # pragma: no cover


@abstract_algorithm("subgraph.sample.edge_sampling")
def edge_sampling(graph: Graph, p: float = 0.20) -> Graph:
    pass  # pragma: no cover


@abstract_algorithm("subgraph.sample.ties")
def totally_induced_edge_sampling(graph: Graph, p: float = 0.20) -> Graph:
    """
    Totally Induced Edge Sampling method
    https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2743&context=cstech
    """
    pass  # pragma: no cover


@abstract_algorithm("subgraph.sample.random_walk")
def random_walk_sampling(
    graph: Graph,
    num_steps: mg.Optional[int] = None,
    num_nodes: mg.Optional[int] = None,
    num_edges: mg.Optional[int] = None,
    jump_probability: float = 0.15,
    start_node: mg.Optional[NodeID] = None,
) -> Graph:
    """
    Sample using random walks

    Sampling ends when number of steps, nodes, or edges are reached (first to occur if multiple are specified).
    For each step, there is a jump_probability to reset the walk.
    When resetting the walk, if start_node is specified, always reset to this node. If not specified, every reset
    picks a new node in the graph at random.
    """
    # TODO: check that `num_*` variables aren't all `None`
    pass  # pragma: no cover
12 changes: 12 additions & 0 deletions metagraph/algorithms/utility.py
Expand Up @@ -56,6 +56,13 @@ def edgemap_from_edgeset(edgeset: EdgeSet, default_value: Any) -> EdgeMap:
    pass  # pragma: no cover


@abstract_algorithm("util.graph.degree")
def graph_degree(
    graph: Graph, in_edges: bool = False, out_edges: bool = True,
) -> NodeMap:
    pass  # pragma: no cover


@abstract_algorithm("util.graph.aggregate_edges")
def graph_aggregate_edges(
    graph: Graph(edge_type="map"),
Expand Down Expand Up @@ -110,6 +117,11 @@ def graph_collapse_by_label(
    pass  # pragma: no cover


@abstract_algorithm("util.graph.isomorphic")
def graph_isomorphic(g1: Graph, g2: Graph) -> bool:
    pass  # pragma: no cover


@abstract_algorithm("util.node_embedding.apply")
def node_embedding_apply(embedding: NodeEmbedding, nodes: Vector) -> Matrix:
    pass  # pragma: no cover