In [74]:
import torch
from torch_geometric.utils import cumsum, coalesce, degree, sort_edge_index
import pathpyG as pp
from torch_geometric.data import Data

Explanation for order lifting


NB: It turns out that everything depends on the initial sorting (even more than it seems from just looking at the code).
Sorting the edge index by source id means that the ho source indices come out **sorted** as well.
Notice that each node's outgoing edges (the ho-nodes) get consecutive indices, in a block whose length equals the node's outdegree.


In [104]:
# num_nodes = 3
# edge_index = torch.tensor([[2,1,1],[1,0,2]])
num_nodes = 6
edge_index = torch.tensor([[0,1,3,4,2,2,5],[2,2,5,5,3,4,0]])

In [105]:
data = Data(edge_index = edge_index)

g = pp.Graph(data,pp.IndexMap(list("012345")))
pp.plot(g,  node_label=g.mapping.node_ids.tolist())

<pathpyG.visualisations.network_plots.StaticNetworkPlot at 0x7fb91dba23e0>

In [138]:
pp.plot(ho_graph)

<pathpyG.visualisations.network_plots.StaticNetworkPlot at 0x7fb91dcf54b0>

First, we need to sort the edge index by source index.

In [106]:
edge_index = sort_edge_index(edge_index, num_nodes=num_nodes)
edge_index

tensor([[0, 1, 2, 2, 3, 4, 5],
        [2, 2, 3, 4, 5, 5, 0]])
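The sorting step can be reproduced with plain torch, which makes the "consecutive indices" property easy to inspect. This is a minimal sketch, assuming a lexicographic (source, target) order, which is what the output above shows; the names `key`, `perm` and `sorted_ei` are illustrative and not part of pathpyG or PyG.

```python
import torch

edge_index = torch.tensor([[0, 1, 3, 4, 2, 2, 5],
                           [2, 2, 5, 5, 3, 4, 0]])
num_nodes = 6

# Encode each edge (source, target) as a single integer so that sorting the
# keys sorts the edges by source first and by target second.
key = edge_index[0] * num_nodes + edge_index[1]
perm = torch.argsort(key)
sorted_ei = edge_index[:, perm]

# After sorting, the edges leaving node v occupy one contiguous block.
blocks = {}
for v in range(num_nodes):
    blocks[v] = (sorted_ei[0] == v).nonzero().flatten().tolist()
    print(f"node {v}: edge positions {blocks[v]}")
```

Node 2 is the only node with two outgoing edges in this example, so it is the only one whose block spans two positions.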

Then we compute the outdegrees.

In [107]:
outdegree = degree(edge_index[0], dtype = torch.long, num_nodes=num_nodes)
outdegree

tensor([1, 1, 2, 1, 1, 1])

'outdegree_per_dst' gives, for each edge, the number of out-edges we find when we arrive at its target node.
Indeed, edge_index[1] is a list of all the in-stubs.
Associating each in-stub with the outdegree of its node gives all 2-paths, grouped by in-edge. (An in-edge is an edge x-v, where v is the node whose outdegree we look up. Note that at this point the node x of the x-v edge is not considered.)

In [108]:
outdegree_per_dst = outdegree[edge_index[1]]
outdegree_per_dst

# one entry per edge x-v: the outdegree of its target v

tensor([2, 2, 1, 1, 1, 1, 1])
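To make the per-edge reading of this tensor concrete, here is a small sketch that prints, for every edge, how many times it can be continued. It uses `torch.bincount` in place of PyG's `degree`, purely so that the snippet depends on torch alone; that substitution is an assumption of this sketch, not what the notebook does.

```python
import torch

# Sorted edge index from the cell above.
edge_index = torch.tensor([[0, 1, 2, 2, 3, 4, 5],
                           [2, 2, 3, 4, 5, 5, 0]])
num_nodes = 6

# Outdegree of every node: how often it appears as a source.
outdegree = torch.bincount(edge_index[0], minlength=num_nodes)

# Look up the outdegree of each edge's target node.
outdegree_per_dst = outdegree[edge_index[1]]

for e, (x, v) in enumerate(edge_index.t().tolist()):
    n = int(outdegree_per_dst[e])
    print(f"edge {e}: {x}->{v} can be continued {n} time(s)")
```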

From 'outdegree_per_dst' we can unpack the per-in-edge information using the torch function 'repeat_interleave'.
Called with a single tensor of repeats, 'repeat_interleave' works like this:
If repeats is tensor([n1, n2, n3, …]), then the output is tensor([0, 0, …, 1, 1, …, 2, 2, …, …]), where 0 appears n1 times, 1 appears n2 times, 2 appears n3 times, etc.
From the line graph perspective, we can see it as creating the indices of the (lifted, new) higher-order source nodes.
In other words, each edge x-v (with x still unknown) gets a higher-order node index, which is repeated as many times as the outdegree of v, i.e. the number of times x-v appears as the source of a higher-order edge.


In [109]:
torch.repeat_interleave(torch.tensor([1,2,3]))

tensor([0, 1, 1, 2, 2, 2])

In [110]:
ho_edge_srcs = torch.repeat_interleave(outdegree_per_dst)
ho_edge_srcs

# 2-paths: 023 024 - 123 124 - 235 - 245 - 350 - 450 - 502


tensor([0, 0, 1, 1, 2, 3, 4, 5, 6])

Finally, we define 'num_new_edges' as the total number of paths x-v-w.
It is obtained by taking each incident edge x-v and counting all the nodes w it can reach (given by the outdegree of v), i.e., the number of two-paths.

In [111]:
num_new_edges = outdegree_per_dst.sum()
num_new_edges

tensor(9)
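As a sanity check on this count, the number of two-paths can also be obtained as the sum over nodes of indegree times outdegree, or by brute-force enumeration. The sketch below compares the three; it uses `torch.bincount` and illustrative variable names that do not appear in the notebook.

```python
import torch

edge_index = torch.tensor([[0, 1, 2, 2, 3, 4, 5],
                           [2, 2, 3, 4, 5, 5, 0]])
num_nodes = 6

indegree = torch.bincount(edge_index[1], minlength=num_nodes)
outdegree = torch.bincount(edge_index[0], minlength=num_nodes)

# 1) As in the notebook: sum the outdegree of every edge's target.
count_per_edge = int(outdegree[edge_index[1]].sum())

# 2) Per node: every in-edge of v pairs with every out-edge of v.
count_per_node = int((indegree * outdegree).sum())

# 3) Brute force: pairs of edges (x, v), (v, w) that share the middle node.
edges = edge_index.t().tolist()
count_brute = sum(1 for (_, v) in edges for (s, _) in edges if s == v)

print(count_per_edge, count_per_node, count_brute)
```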

At the end of this first part of the process, we have taken the edges incident to each node
and repeated their index a number of times equal to the outdegree of that node.

Then,
we create the destination nodes, whose indexing for each node starts at the cumulative sum of the outdegrees
of all previous nodes in the ordered sequence of nodes.



This cumsum tells us how many indices each node needs (among the out-indices), i.e.,
if the first node has outdegree two, we need to allocate two indices for it;
the block of the next node starts where the previous one ends and extends by the outdegree of that next node.

**!!!**

For each node v, ptrs stores the starting index of the edges outgoing from v (i.e. the destination ho-nodes). Note that it is computed from the outdegree vector, which has one entry per node (|V| entries).
Notice that the indices of the edges outgoing from the node at position i end where those of the next node (at position i+1) start.

In [112]:


# Position in the sorted first-order edge index where the edges starting from each node begin.
# PyG's cumsum prepends a 0, so dropping the last entry leaves one start pointer per node.
ptrs = cumsum(outdegree, dim=0)[:-1]
# sorted sources 0 1 2 2 3 4 5 -> ptrs 0,1,2,4,5,6  (first-order edge indices)

# node 2 owns positions 2 and 3; every other node owns a single position

ptrs


# ptrs[edge_index[1]]
# Next, we index ptrs by the first-order target nodes, edge_index[1].
# This gives the position at which each target node starts as a source node in the sorted fo-edge index,
# i.e., target node w, taken as source, occupies positions starting from ptrs[w] in edge_index.
# e.g., ptrs[edge_index[1]] -> [2,2,4,5,6,6,0] means that the continuations of the first two edges start at position 2
# Notice that this would be impossible if edge_index weren't sorted.


# torch.repeat_interleave(ptrs[edge_index[1]], outdegree_per_dst)
# Then, we repeat that starting index (of the target node w as source) a number of times equal to w's outdegree,
# i.e., once for every edge w->y in which w is used as source in the line graph.
# e.g., [[2,2],[2,2],[4],...] means that 0-2 and 1-2 can each be continued twice, and 2-3 once (using fo-edge indices starting at 2, 2 and 4, respectively)
# This gives an initial 'ho_edge_dsts' (pointers to the start of the block of edges w-y that are the ho-edge destinations).

# --- Index correction ---
# Up to this point we only have, for each ho-edge, the starting index of the block of fo-edges that continue an edge with w as target.
# We need to move from the starting index to the specific ho-node index of each ho-edge.
#
# First, we generate a tensor with values from 0 to num_new_edges - 1.
# Then, we compute 'cumsum(outdegree_per_dst, dim=0)', which gives where the group of each fo-edge x-w starts in 'ho_edge_dsts'
# e.g., [[2,2],[2,2],[4],...] -> [0,2,4,5,...] (one entry per fo-edge, plus the final total)
#
# Then, we select these group starts with ho_edge_srcs:
# ho_edge_srcs = [0,0,1,1,2,...] selects [0,0,2,2,4,...]
# Subtracting this from the tensor with values from 0 to num_new_edges - 1
# leaves the offset of each ho-edge within its group: [0,1,0,1,0,...]

tensor([0, 1, 2, 4, 5, 6])
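The role of `ptrs` as a block-start pointer can be checked directly. The sketch below recomputes it with plain torch as an exclusive prefix sum (an assumption of this sketch; the notebook uses PyG's `cumsum`, which prepends a zero, and then drops the last entry) and slices the sorted edge index with it.

```python
import torch

edge_index = torch.tensor([[0, 1, 2, 2, 3, 4, 5],
                           [2, 2, 3, 4, 5, 5, 0]])
num_nodes = 6

outdegree = torch.bincount(edge_index[0], minlength=num_nodes)

# Exclusive prefix sum: ptrs[v] = number of edges whose source is smaller than v.
ptrs = torch.cumsum(outdegree, dim=0) - outdegree

# The edges leaving v are exactly edge_index[:, ptrs[v] : ptrs[v] + outdegree[v]].
for v in range(num_nodes):
    start = int(ptrs[v])
    stop = start + int(outdegree[v])
    block = edge_index[:, start:stop]
    assert bool((block[0] == v).all())
    print(f"node {v}: positions [{start}, {stop}) -> targets {block[1].tolist()}")
```

This is the same layout as the row pointer of a CSR sparse matrix, which is why the whole construction relies on the edge index being sorted by source.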

Next, we select the number of indices (repetitions) needed by each ho-target index.
This determines from which entry to which entry we will find the index of a given ho-node.

Since the edge index has been sorted by source index, ptrs now holds the position at which each node's outgoing edges start.

In [113]:
torch.repeat_interleave(torch.tensor([5,6,7]),torch.tensor([1,2,3]))

tensor([5, 6, 6, 7, 7, 7])

In [114]:
edge_index[1]

tensor([2, 2, 3, 4, 5, 5, 0])

In [115]:
# starting position of each target node's outgoing edges in the sorted edge index
ptrs[edge_index[1]]

tensor([2, 2, 4, 5, 6, 6, 0])

In [116]:
outdegree_per_dst

tensor([2, 2, 1, 1, 1, 1, 1])

ptrs[edge_index[1]]: we use 'edge_index[1]' to couple each target node with its starting-index pointer (as obtained in ptrs).

Then, we use 'repeat_interleave' to repeat the pointer (containing the starting index) a number of times given by 'outdegree_per_dst'.
Remember that 'outdegree_per_dst' maps each (target) node v to its outdegree as a source.
Therefore, at this point, ho_edge_dsts maps each node v to its starting-index pointer.
Subsequently, the pointers are corrected so that they contain the actual index rather than the starting-index pointer.


In [117]:
# Focusing on the target indices gives the number of times we need to repeat the index of each out-stub.


# Building the new edge_index
# a node in the ho-graph is an edge of the fo-graph
# the degree of the fo-nodes tells how many new outgoing edges we need to allocate for each ho-node
# diff from before... (where using outdegree...)
# ...cause 

# ptrs gives where the edges of each fo-node start. Indexing by destination, I get the start of each edge's continuations.


ho_edge_dsts = torch.repeat_interleave(ptrs[edge_index[1]], outdegree_per_dst)
ho_edge_dsts

tensor([2, 2, 2, 2, 4, 5, 6, 6, 0])

The cell above gives each (target) node v a list of pointers.
Here we compute an index correction to adjust the indices (to go beyond the starting-index pointer).



*(outdegree_per_dst gave the outdegree of each target node v in 'edge_index[1]')*
'cumsum(outdegree_per_dst, dim=0)' stores, for each edge incoming into a node v, the starting index of its group of ho-edges.
Notice that this tensor has one entry per edge (one for each time we take the outdegree of a target node in 'edge_index[1]'), plus a final entry holding the total.



Then we select the group start of each ho-edge by indexing with 'ho_edge_srcs'.

In [118]:
# num_new_edges = outdegree_per_dst.sum()
idx_correction = torch.arange(num_new_edges, dtype=torch.long, device=edge_index.device)
idx_correction

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [119]:
edge_index

tensor([[0, 1, 2, 2, 3, 4, 5],
        [2, 2, 3, 4, 5, 5, 0]])

In [120]:
outdegree_per_dst

tensor([2, 2, 1, 1, 1, 1, 1])

In [121]:
cumsum(outdegree_per_dst, dim=0)

tensor([0, 2, 4, 5, 6, 7, 8, 9])

In [122]:
idx_correction - cumsum(outdegree_per_dst, dim=0)[ho_edge_srcs]

tensor([0, 1, 0, 1, 0, 0, 0, 0, 0])

In [123]:
ho_edge_srcs

tensor([0, 0, 1, 1, 2, 3, 4, 5, 6])

In [124]:
ho_edge_dsts

tensor([2, 2, 2, 2, 4, 5, 6, 6, 0])

In [125]:
# cumsum(outdegree_per_dst, dim=0) gives the position at which the ho-edges of each fo-edge start
# selecting with ho_edge_srcs gives that start position for every ho-edge
cumsum(outdegree_per_dst, dim=0)[ho_edge_srcs]

tensor([0, 0, 2, 2, 4, 5, 6, 7, 8])

In [126]:
idx_correction -= cumsum(outdegree_per_dst, dim=0)[ho_edge_srcs]
idx_correction

tensor([0, 1, 0, 1, 0, 0, 0, 0, 0])

In [127]:
ho_edge_dsts += idx_correction
ho_edge_dsts

tensor([2, 3, 2, 3, 4, 5, 6, 6, 0])
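One way to read `idx_correction` is as the offset of each ho-edge within the group of ho-edges that share the same source. The sketch below recomputes it with plain torch and compares it against an explicit per-group counter; `group_start` and `expected` are hypothetical names introduced for illustration.

```python
import torch

outdegree_per_dst = torch.tensor([2, 2, 1, 1, 1, 1, 1])
ho_edge_srcs = torch.repeat_interleave(outdegree_per_dst)
num_new_edges = int(outdegree_per_dst.sum())

# Exclusive prefix sum: position at which the ho-edges of each fo-edge start.
group_start = torch.cumsum(outdegree_per_dst, dim=0) - outdegree_per_dst

# Global position minus the start of the own group = offset within the group.
idx_correction = torch.arange(num_new_edges) - group_start[ho_edge_srcs]

# Reference: count 0, 1, ..., n-1 inside every group of size n.
expected = [k for n in outdegree_per_dst.tolist() for k in range(n)]

print(idx_correction.tolist())
print(expected)
```

Adding this offset to the block-start pointer `ptrs[v]` moves from "first edge leaving v" to "k-th edge leaving v", which is the index of the actual destination ho-node.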

All in all, we initialize the 'ho_edge_dsts' entries with the minimum index the target ho-node v-t can get (these indices are also sorted by source index).


Then, we build upon this.
We initialize the correction as a tensor 'idx_correction' holding the index of each position ('torch.arange(num_new_edges)').
Then we compute, for each ho-edge, the starting index of the group of ho-edges that share its source edge x-v, and we subtract this from 'idx_correction'.
This gives the index correction.
Thus, the index correction is the offset of each ho-edge within its group (0 for the first continuation of x-v, 1 for the second, and so on); adding it to the starting pointer yields the index of the actual target ho-node v-t.


In [130]:
ho_edge_dsts

tensor([2, 3, 2, 3, 4, 5, 6, 6, 0])

In [137]:
data_ho = Data(edge_index=torch.stack([ho_edge_srcs, ho_edge_dsts], dim=0))
ho_graph = pp.Graph(data_ho)
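Putting the steps together, the whole lifting can be written as one function and checked against a brute-force enumeration of two-paths. This is a minimal sketch using only torch; `lift_order_sketch` is a hypothetical name and is not claimed to be pathpyG's implementation.

```python
import torch

def lift_order_sketch(edge_index, num_nodes):
    """Return the sorted edge index and the edge index of its line graph."""
    # Sort edges lexicographically by (source, target).
    key = edge_index[0] * num_nodes + edge_index[1]
    edge_index = edge_index[:, torch.argsort(key)]

    outdegree = torch.bincount(edge_index[0], minlength=num_nodes)
    outdegree_per_dst = outdegree[edge_index[1]]

    # Source ho-node: the index of edge x-v, once per continuation v-w.
    ho_src = torch.repeat_interleave(outdegree_per_dst)

    # Destination ho-node: start of v's block of out-edges ...
    ptrs = torch.cumsum(outdegree, dim=0) - outdegree
    ho_dst = torch.repeat_interleave(ptrs[edge_index[1]], outdegree_per_dst)

    # ... plus the offset of this ho-edge within its group.
    group_start = torch.cumsum(outdegree_per_dst, dim=0) - outdegree_per_dst
    ho_dst = ho_dst + torch.arange(ho_src.numel()) - group_start[ho_src]
    return edge_index, torch.stack([ho_src, ho_dst], dim=0)

edge_index = torch.tensor([[0, 1, 3, 4, 2, 2, 5],
                           [2, 2, 5, 5, 3, 4, 0]])
sorted_ei, ho_edge_index = lift_order_sketch(edge_index, num_nodes=6)

# Brute force: edge i = (x, v) connects to edge j = (v, w).
edges = sorted_ei.t().tolist()
brute = [[i, j] for i, (_, v) in enumerate(edges)
         for j, (s, _) in enumerate(edges) if s == v]

# Decode every ho-edge back into the two-path x-v-w it represents.
two_paths = [f"{edges[s][0]}-{edges[s][1]}-{edges[d][1]}"
             for s, d in ho_edge_index.t().tolist()]
print(two_paths)
```

The brute-force version is quadratic in the number of edges and only serves as a reference; the vectorised version does a sort followed by a fixed number of linear-time tensor operations.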