<img src='harrypotter_image.png' width='200px' style="float:left;margin-right:10pt"></img>
# Illustration of Static Hypergraphs using Kaggle's Harry Potter dataset

In this tutorial we introduce `hypernetx.StaticEntity` and `hypernetx.StaticEntitySet` and the new `static=True` attribute in the `hypernetx.Hypergraph` class. 

Harry Potter Data is available here: https://www.kaggle.com/gulsahdemiryurek/harry-potter-dataset.

Python code for parsing the dataset is in `harrypotter.py` in this tutorial's directory.

In [None]:
import hypernetx as hnx
import networkx as nx
import matplotlib.pyplot as plt
from collections import OrderedDict, defaultdict
import scipy
from scipy.sparse import coo_matrix, csr_matrix, issparse
import pandas as pd
import numpy as np
import itertools as it
import sys
from harrypotter import HarryPotter

## The Harry Potter Dataset: 
To use a CSV file for a Static Hypergraph, every cell must be filled with a label.
We have edited the Harry Potter dataset so that it has 5 categories and every cell is filled. Where a value is unknown, we marked it as "Unknown *category_name*". 
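
As a rough illustration of the filling rule described above, the sketch below replaces empty cells with "Unknown *category_name*". It is not the code in `harrypotter.py`; the rows, the column names, and the helper `fill_unknown` are hypothetical and chosen only for the example.

```python
# Toy rows standing in for the CSV; None or "" marks a missing cell.
# Column names and values are illustrative, not taken from harrypotter.py.
rows = [
    {"House": "Gryffindor", "Blood status": "Half-blood"},
    {"House": None, "Blood status": "Pure-blood"},
    {"House": "Slytherin", "Blood status": ""},
]


def fill_unknown(rows):
    """Return copies of the rows with empty cells replaced by 'Unknown <column>'."""
    filled = []
    for row in rows:
        filled.append({
            column: value if value not in (None, "") else f"Unknown {column}"
            for column, value in row.items()
        })
    return filled


filled_rows = fill_unknown(rows)
for row in filled_rows:
    print(row)
```

Naming the placeholder after its column keeps unknown values in different categories distinct from one another.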

In [None]:
hogwarts = HarryPotter()

In [None]:
hogwarts.dataframe

### We define a labeling based on the categories and store it in an Ordered Dictionary.
The ordering of labels is determined by their order of appearance in the table with the exception of Unknown labels, which are always listed first.
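
One way to picture this ordering rule is the plain-Python sketch below. The helper `build_labels` and the sample rows are assumptions made for illustration; the `HarryPotter` class may construct its labels differently.

```python
from collections import OrderedDict

# Illustrative table: one tuple per row, one entry per category.
columns = ["House", "Blood status"]
rows = [
    ("Gryffindor", "Half-blood"),
    ("Unknown House", "Pure-blood"),
    ("Slytherin", "Unknown Blood status"),
    ("Gryffindor", "Pure-blood"),
]


def build_labels(columns, rows):
    """Map each category to its labels: 'Unknown ...' first, then first-appearance order."""
    labels = OrderedDict()
    for position, column in enumerate(columns):
        seen = []
        for row in rows:
            value = row[position]
            if value not in seen:
                seen.append(value)
        unknown = [v for v in seen if v.startswith("Unknown")]
        known = [v for v in seen if not v.startswith("Unknown")]
        labels[column] = unknown + known
    return labels


labels = build_labels(columns, rows)
for category, names in labels.items():
    print(category, names)
```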

In [None]:
hogwarts.labels

### We next create a tensor with dimension equal to the number of categories and indexed by the labels. 
We encode the data in each column as a sequence of integers and store the coded data along with translator functions that retrieve the original names as needed. Here we remove duplicate rows, but counts could instead be collected for a weighting scheme.
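
The encoding step can be sketched in plain Python as follows. This is a simplified two-category illustration using nested lists in place of a NumPy array; the names `encoders`, `data`, `arr`, and `decode` are chosen for the example and are not guaranteed to match the attributes of `hogwarts`.

```python
# Labels per category, in the order that defines the integer codes (illustrative).
labels = {
    "House": ["Unknown House", "Gryffindor", "Slytherin"],
    "Blood status": ["Unknown Blood status", "Half-blood", "Pure-blood"],
}
rows = [
    ("Gryffindor", "Half-blood"),
    ("Slytherin", "Pure-blood"),
    ("Gryffindor", "Half-blood"),  # duplicate row, dropped below
    ("Unknown House", "Pure-blood"),
]

columns = list(labels)
# One encoder per column: label -> integer code.
encoders = [{name: code for code, name in enumerate(labels[c])} for c in columns]

# Encode each row and drop duplicates while keeping first-appearance order.
data = list(dict.fromkeys(
    tuple(encoders[i][value] for i, value in enumerate(row)) for row in rows
))

# Dense 0/1 tensor with one dimension per category.
shape = tuple(len(labels[c]) for c in columns)
arr = [[0] * shape[1] for _ in range(shape[0])]
for house_code, blood_code in data:
    arr[house_code][blood_code] = 1


def decode(index):
    """Translate a tuple of integer codes back to the original labels."""
    return tuple(labels[c][code] for c, code in zip(columns, index))


print(data)
print(arr)
print(decode(data[0]))
```

To build a weighting scheme instead of dropping duplicates, the coded rows could be tallied (for instance with `collections.Counter`) and the counts stored in `arr` in place of the 1s.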

In [None]:
## List of nonzero indices
hogwarts.data

In [None]:
hogwarts.data.shape

In [None]:
hogwarts.arr

In [None]:
hogwarts.arr.shape

## StaticEntity and StaticEntitySet

The entire dataset is now represented by a data array (or, equivalently, a tensor array) together with a dictionary that maps positions in the tensor array and values in the data array back to values in the original data.

The basic object in HyperNetX, which holds the data and label dictionary for a static hypergraph, is a `StaticEntity`.

Each dimension of the array is considered a **level** in the StaticEntity. A level's order corresponds to its position in the data table. In terms of the original Entity structure in HyperNetX, levels loosely reference an order of containment: elements of the 2nd level belong to elements of the 1st level and, more generally, the elements of one level belong to the elements of its predecessor. The order of levels is given by the order of keys in the labels:

In [None]:
E = hnx.StaticEntity(arr=hogwarts.arr, labels=hogwarts.labels)
E.keys

### A StaticEntitySet is a StaticEntity restricted to two levels. 
By default, a StaticEntitySet will grab the first two dimensions of the array and the first two keys of the labels, but any pair of levels may be specified.
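
Conceptually, restricting to two levels amounts to keeping two columns of the coded data and discarding repeated pairs. The sketch below shows that idea in plain Python; the function `restrict_levels` is hypothetical and does not describe how HyperNetX implements the restriction.

```python
# Coded rows over three levels (illustrative values only).
data = [
    (0, 1, 0),
    (0, 1, 1),
    (1, 0, 0),
    (1, 2, 1),
]


def restrict_levels(data, level1=0, level2=1):
    """Keep two columns of the coded rows and drop repeated pairs."""
    pairs = ((row[level1], row[level2]) for row in data)
    return list(dict.fromkeys(pairs))


first_two = restrict_levels(data)        # default: levels 0 and 1
other_pair = restrict_levels(data, 0, 2)  # any other pair of levels

print(first_two)
print(other_pair)
```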

In [None]:
ES = hnx.StaticEntitySet(E)
ES.labels

## Static Hypergraph
A static hypergraph is one where all nodes and edges are known at the time of construction. This permits an internal ordering and uid structure for easy reference and faster computation of metrics.
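
To see why knowing every node and edge in advance helps, consider the minimal sketch below. It fixes an ordering once, uses integer positions as internal ids, and fills an incidence matrix from which simple metrics can be read directly. The data and variable names are illustrative, and the rows-as-nodes, columns-as-edges layout is a choice made for this sketch rather than a statement about HyperNetX internals.

```python
# Edge -> member nodes, all known up front (illustrative labels).
edges = {
    "Gryffindor": ["Half-blood", "Pure-blood"],
    "Slytherin": ["Pure-blood"],
}

# Fix an ordering once; integer positions then serve as internal ids.
edge_names = list(edges)
node_names = list(dict.fromkeys(n for members in edges.values() for n in members))
node_index = {name: i for i, name in enumerate(node_names)}

# Incidence matrix as nested lists: rows are nodes, columns are edges.
incidence = [[0] * len(edge_names) for _ in node_names]
for col, edge in enumerate(edge_names):
    for node in edges[edge]:
        incidence[node_index[node]][col] = 1

# Edge sizes are column sums; node degrees are row sums.
edge_sizes = [sum(row[col] for row in incidence) for col in range(len(edge_names))]
node_degrees = [sum(row) for row in incidence]

print(incidence)
print(dict(zip(edge_names, edge_sizes)))
print(dict(zip(node_names, node_degrees)))
```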




In [None]:
H = hnx.Hypergraph(ES,static=True,use_nwhy=True)
H.edges

In [None]:
H.nodes

In [None]:
H.incidence_matrix().todense()

In [None]:
H.state_dict

In [None]:
H.dataframe()

### Restrict to specific edges and nodes

In [None]:
HF = H.restrict_to_edges(['Gryffindor','Ravenclaw','Slytherin','Hufflepuff'])
HF.dataframe()

In [None]:
fig,ax = plt.subplots(1,2,figsize=(15,6))
hnx.draw(H,ax=ax[0]);
hnx.draw(H.dual(),ax=ax[1])
H.edges

## Collapse identical elements
Collapsing merges identical elements, for example edges that contain exactly the same nodes, into a single representative. Methods such as `collapse_edges` are implemented for dynamic hypergraphs.
We wish to do the same for large, unwieldy hypergraphs stored as static.
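
The idea behind collapsing edges can be sketched independently of HyperNetX: group edges by their node set and keep one representative per group. The helper `edge_equivalence_classes` and the choice of the first name as representative are assumptions for this example; the names HyperNetX assigns to collapsed edges may differ.

```python
from collections import defaultdict

# Edge -> set of member nodes (illustrative labels).
edges = {
    "Gryffindor": {"Half-blood", "Pure-blood"},
    "Ravenclaw": {"Half-blood", "Pure-blood"},
    "Slytherin": {"Pure-blood"},
}


def edge_equivalence_classes(edges):
    """Group edge names that share exactly the same node set."""
    classes = defaultdict(list)
    for name, members in edges.items():
        classes[frozenset(members)].append(name)
    return list(classes.values())


classes = edge_equivalence_classes(edges)

# Keep one representative per class (here simply the first name encountered).
collapsed = {names[0]: edges[names[0]] for names in classes}

print(classes)
print(sorted(collapsed))
```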

In [None]:
pos = {'Unknown House': np.array([-0.10670759,  0.39625995]), 'Gryffindor': np.array([-0.32244912,  0.27409625]), 'Ravenclaw': np.array([0.57391404, 0.27217292]), 'Hufflepuff': np.array([-0.02345858,  0.16025175]), 'Slytherin': np.array([-0.02249078, -0.50964294]), 'Durmstrang Institute': np.array([-0.08558045, -1.        ]), 'Unknown Blood status': np.array([0.15155363, 0.65523899]), 'Half-blood': np.array([0.24364009, 0.04186756]), 'Pure-blood': np.array([-0.45423213, -0.07752039]), 'Pure-blood or half-blood': np.array([ 0.04581088, -0.21272409])}

In [None]:
nodes = ['Pure-blood or half-blood',  'Unknown Blood status', 'Pure-blood', 'Half-blood',  ]
Hn = H.restrict_to_nodes(nodes)
hnx.draw(Hn,pos=pos)

In [None]:
Hc,clses = Hn.collapse_edges(return_equivalence_classes=True)

## now draw the dynamic versions
fig,ax = plt.subplots(1,2,figsize=(15,6))
hnx.draw(Hn.remove_static(),ax=ax[0],pos=pos);
ax[0].set_title('original')
hnx.draw(Hc,ax=ax[1],pos=pos);
ax[1].set_title('collapsed');
clses

### Hypergraph methods apply to both static and dynamic hypergraphs

In [None]:
H.isstatic

In [None]:
G = H.bipartite()
cmap = ['r' if G.nodes[n]['bipartite']==0 else 'b' for n in G.nodes ]
nx.draw(G,node_color=cmap,with_labels=True)

In [None]:
print(hnx.info(H))

In [None]:
## Once the dist stats are computed, they are stored in the state dict for fast recall and reference
hnx.dist_stats(H)

In [None]:
H.state_dict

In [None]:
fig,ax = plt.subplots(1,2,figsize=(15,6))
pos = hnx.draw(H,ax=ax[0],return_pos=True)
hnx.draw(H.toplexes(),ax=ax[1],pos=pos)

In [None]:
H.collapse_edges()

In [None]:
H.state_dict