# WikiData Relation

In [21]:
import numpy as np

In [22]:
# Set the stock market (NASDAQ or NYSE)
market = "NYSE"

In [23]:
a = np.load("../Temporal_Relational_Stock_Ranking/data/relation/wikidata/{}_wiki_relation.npy".format(market))

In [24]:
a.shape

(1737, 1737, 33)

## Consistency Tests

Let R(A, B) denote the relation vector between stocks A and B. It has size `num_relations`, with a 1 at the index of each relation type that holds. Unlike the sector-industry relation, a vector here can have several types set at once, so a vector such as `[1 0 ... 1 0]` is possible. For that reason, `edge_index` will contain duplicate edges, each with a different relation type. Also keep in mind that this graph is directed.

For some reason, the relation vector of a stock with itself always has its last value set to 1, so R(A, A) might in principle look like `[1 0 ... 1 1]`. The tests below check whether that last entry is in fact the only one set in R(A, A).
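To make the layout concrete, here is a minimal sketch on a toy tensor. The sizes and values are hypothetical (3 stocks, 4 relation types) and are not taken from the real NYSE file; the sketch only illustrates multi-type vectors, directedness, and the self-relation convention described above.

```python
import numpy as np

# Hypothetical toy tensor: 3 stocks, 4 relation types (not real data).
num_stocks, num_relations = 3, 4
toy = np.zeros((num_stocks, num_stocks, num_relations), dtype=int)

# Self-relations: here only the last relation type is set.
for k in range(num_stocks):
    toy[k, k, -1] = 1

# R(0, 1) carries two relation types at once (types 0 and 2).
toy[0, 1, [0, 2]] = 1

# The graph is directed: R(1, 0) stays empty unless set explicitly.
print(toy[0, 1])  # [1 0 1 0]
print(toy[1, 0])  # [0 0 0 0]
print(toy[0, 0])  # [0 0 0 1]

# Each active type in R(0, 1) becomes its own edge 0 -> 1,
# which is why edge_index ends up with duplicate columns.
active_types = np.flatnonzero(toy[0, 1])
print(active_types)  # [0 2]
```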

### Apparently, the last relation type is always 0 in R(A, B) when A and B differ

In [25]:
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        if a[i][j][-1] == 1 and i != j:
            print("ERROR: last relation type value 1 in R(A,B)")

### The last relation type is always 1 in R(A,A) relations

In [26]:
for i in range(a.shape[0]):
    if a[i][i][-1] == 0:
        print("ERROR: last relation type value 0 in R(A,A)")

### R(A, A) needs to have only one value 1

In [27]:
for i in range(a.shape[0]):
    # Avoid shadowing the built-in sum()
    total = np.sum(a[i][i])
    if total != 1:
        print("ERROR: condition not met!")
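The three checks above can also be written without Python loops. The following is a sketch, not part of the original notebook: `check_relation_tensor` is a hypothetical helper name, and the toy tensor is made-up data used only to exercise it.

```python
import numpy as np

def check_relation_tensor(rel):
    """Run the three consistency checks on an (n, n, num_relations) tensor."""
    n = rel.shape[0]
    last = rel[:, :, -1]                             # last relation type, shape (n, n)
    off_diagonal = ~np.eye(n, dtype=bool)            # True wherever i != j
    self_vectors = rel[np.arange(n), np.arange(n)]   # R(A, A) rows, shape (n, num_relations)
    return {
        "last_type_is_0_off_diagonal": bool(np.all(last[off_diagonal] == 0)),
        "last_type_is_1_on_diagonal": bool(np.all(np.diag(last) == 1)),
        "self_relation_has_single_1": bool(np.all(self_vectors.sum(axis=1) == 1)),
    }

# Hypothetical toy tensor: 3 stocks, 4 relation types.
toy = np.zeros((3, 3, 4), dtype=int)
toy[np.arange(3), np.arange(3), -1] = 1   # self-relations use only the last type
toy[0, 1, [0, 2]] = 1                     # one directed, multi-type relation

results = check_relation_tensor(toy)
print(results)
```

On the real data this would be called as `check_relation_tensor(a)`; any `False` value points to the check that failed.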

## Conversion to edge_index

The Wikidata graph is directed, so every ordered pair (i, j) has to be checked separately, and the conversion might take longer than for an undirected graph.

In [28]:
# function to add edge to edge_index
def add_edge(edge_index, orig_node, dest_node, undirected=True):
    array_to_add = np.array([[orig_node], [dest_node]])
    if edge_index is None:
        new_edge_index = array_to_add.copy()
    else:
        new_edge_index = np.hstack((edge_index, array_to_add))
    if undirected:
        array_to_add = np.array([[dest_node], [orig_node]])
        new_edge_index = np.hstack((new_edge_index, array_to_add))
    return new_edge_index

In [29]:
# Edge counter and accumulators for the conversion
edges = 0
edge_index = None
edge_type = []

# loop through all relations.
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        if i != j and a[i][j].sum() > 0:
            types = np.where(a[i][j] == 1)[0]
            for type_ in types:
                edge_index = add_edge(edge_index, i, j, undirected=False)
                edge_type.append(type_)
                edges += 1

edge_type = np.array(edge_type)

print(edges)
print(edge_index.shape)
print(edge_type.shape)
print(np.max(edge_index))

8765
(2, 8765)
(8765,)
1735
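Because `add_edge` calls `np.hstack` once per edge, the loop above copies the growing array on every step. A possible alternative, sketched below under the assumption that the tensor holds only 0/1 values, collects all edges in one pass with `np.nonzero`. The helper name `relation_tensor_to_edges` and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np

def relation_tensor_to_edges(rel):
    """Convert an (n, n, num_relations) 0/1 tensor to (edge_index, edge_type).

    Self-relations (i == j) are skipped, and each active relation type
    yields its own directed edge, as in the loop-based version.
    """
    n = rel.shape[0]
    mask = rel == 1                               # new boolean array, rel is untouched
    mask[np.arange(n), np.arange(n), :] = False   # drop self-relations
    # np.nonzero returns indices in row-major order: by i, then j, then type.
    src, dst, edge_type = np.nonzero(mask)
    edge_index = np.vstack((src, dst))            # shape (2, num_edges)
    return edge_index, edge_type

# Hypothetical toy tensor: 3 stocks, 4 relation types.
toy = np.zeros((3, 3, 4), dtype=int)
toy[np.arange(3), np.arange(3), -1] = 1   # self-relations, ignored by the conversion
toy[0, 1, [0, 2]] = 1                     # two edges 0 -> 1 (types 0 and 2)
toy[2, 0, 1] = 1                          # one edge 2 -> 0 (type 1)

toy_edge_index, toy_edge_type = relation_tensor_to_edges(toy)
print(toy_edge_index)  # [[0 0 2]
                       #  [1 1 0]]
print(toy_edge_type)   # [0 2 1]
```

Since the indices come back in the same order the nested loops visit them, the result should match the loop-based `edge_index` and `edge_type` column for column, which can be verified with `np.array_equal` on the real data.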


## Save

In [30]:
np.save("../relational_data/edge_indexes/{}_wikidata_edge_index.npy".format(market), edge_index)
np.save("../relational_data/edge_types/{}_wikidata_edge_type.npy".format(market), edge_type)

## Visualization

In [17]:
import torch
from torch_geometric.data import Data

edge_index = torch.tensor(edge_index, dtype=torch.long)
# Placeholder node features with shape [num_nodes, num_features]
x = torch.zeros(a.shape[0], 1)

data = Data(x=x, edge_index=edge_index, num_nodes=a.shape[0])

In [18]:
import networkx as nx
from torch_geometric.utils import to_networkx

g = to_networkx(data, to_undirected=False)
nx.write_gexf(g, "../relational_data/gephi_visualizations/{}_wikidata.gexf".format(market))