<img src='harrypotter_image.png' width='200px' style="float:left;margin-right:10pt"></img>
## Illustration of NWHypergraph <-> HyperNetX exchange using PyBind and Kaggle's Harry Potter dataset

In this tutorial we introduce `hypernetx.StaticEntity` and `hypernetx.StaticEntitySet` and the new `static=True` attribute in the `hypernetx.Hypergraph` class. 

Harry Potter Data is available here: https://www.kaggle.com/gulsahdemiryurek/harry-potter-dataset.

Python code for parsing the dataset is in `harrypotter.py` in this tutorial's directory.

In [1]:
import hypernetx as hnx
import matplotlib.pyplot as plt
from collections import OrderedDict, defaultdict
import scipy
from scipy.sparse import coo_matrix, csr_matrix, issparse
import pandas as pd
import numpy as np
import itertools as it
import sys
from harrypotter import HarryPotter  # parser class from harrypotter.py in this tutorial's directory

In [2]:
hogwarts = HarryPotter()

### The Harry Potter Dataset:
We have edited the dataset so that it has 5 categories and every cell is filled. Where a value is unknown, we marked it as "Unknown *category_name*". 

In [None]:
hogwarts.dataframe

**We define a labeling based on the categories and store it in an OrderedDict.**   
The ordering of labels is determined by their order of appearance in the table with the exception of Unknown labels, which are always listed first.

In [None]:
hogwarts.labels
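
The ordering rule described above can be sketched on a toy table. This is only an illustration under stated assumptions: the rows are made up, and `build_labels` is a hypothetical helper, not a function from `harrypotter.py`.

```python
from collections import OrderedDict


def build_labels(rows, columns):
    """Hypothetical helper: list each column's values in order of appearance,
    with any "Unknown ..." value moved to the front."""
    labels = OrderedDict()
    for j, col in enumerate(columns):
        seen = []
        for row in rows:
            if row[j] not in seen:
                seen.append(row[j])
        unknown = [v for v in seen if v.startswith("Unknown")]
        known = [v for v in seen if not v.startswith("Unknown")]
        labels[col] = unknown + known
    return labels


# Made-up records, not taken from the Kaggle dataset.
columns = ["House", "Blood Status"]
rows = [
    ("Gryffindor", "Half-blood"),
    ("Slytherin", "Pure-blood"),
    ("Unknown House", "Half-blood"),
    ("Gryffindor", "Unknown Blood Status"),
]
labels = build_labels(rows, columns)
```

Each value's position in its list is the integer used to index the corresponding dimension of the tensor.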

**We next create a tensor with dimension equal to the number of categories and indexed by the labels.**  
The tensor is a 0-1 boolean tensor. With 5 categories, a 1 in entry $(x_0,x_1,x_2,x_3,x_4)$ indicates that there is a record in the data corresponding to:

$
\text{'House':hogwarts.labels['House']}[x_0],\text{'Blood Status':hogwarts.labels['Blood Status']}[x_1]\text{...etc...}
$

The tensor may be stored as a list of its nonzero indices, or in a sparse format taking the transpose of that list, or as a `numpy.ndarray`.

In [None]:
## List of nonzero indices
hogwarts.data

In [None]:
## sparse tensor format
hogwarts.data.transpose()

In [None]:
## numpy.ndarray - note this will not be the way we store the data for our eventual release. 
## Rather, we will use a sparse format as in the last cell.
hogwarts.arr

In [None]:
hogwarts.arr.shape
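
The relationship between the three storage forms can be sketched with a toy two-category example. The names `toy_labels`, `toy_data`, `toy_coords`, and `toy_arr` are illustrative assumptions and are not attributes of the `HarryPotter` class.

```python
import numpy as np
from collections import OrderedDict

# Made-up labels for two categories.
toy_labels = OrderedDict([
    ("House", ["Unknown House", "Gryffindor", "Slytherin"]),
    ("Blood Status", ["Unknown Blood Status", "Half-blood", "Pure-blood"]),
])

# List of nonzero indices: one row per record, one column per category.
toy_data = np.array([
    [1, 1],  # Gryffindor, Half-blood
    [2, 2],  # Slytherin, Pure-blood
    [0, 1],  # Unknown House, Half-blood
])

# Sparse-style view: one coordinate array per category.
toy_coords = toy_data.transpose()

# Dense 0-1 tensor with one dimension per category.
shape = tuple(len(values) for values in toy_labels.values())
toy_arr = np.zeros(shape, dtype=int)
toy_arr[tuple(toy_coords)] = 1
```

The dense form grows with the product of the label counts, which is why a sparse format is preferable for larger data.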

### StaticEntity and StaticEntitySet

The entire dataset has now been represented using an array and a dictionary associating positions in the array with values in the data.

The basic object in HyperNetX, which holds an array and label dictionary, is a `StaticEntity`.

Each dimension of the array is considered a "level" in the StaticEntity. A level's order corresponds to its column position in the data table. In terms of the original Entity structure in HyperNetX, levels reference an order of containment: elements of the 2nd level belong to elements of the 1st level, and so on. The order of levels is given by the order of keys in the labels:

In [None]:
E = hnx.StaticEntity(arr=hogwarts.arr, labels=hogwarts.labels)
E.keys

**Using the same nomenclature as `hypernetx.Entity`, the elements of a StaticEntity refer to the first level and its children refer to the second level.**

In [None]:
E = hnx.StaticEntity(arr=hogwarts.arr, labels=hogwarts.labels)
E.elements,E.children

**Levels can be reordered and any pair of columns may be organized to be elements and children.**

In [None]:
E.elements_by_level(level1=2,level2=3,translate=True)
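
The idea of pairing two levels as elements and children can be sketched without HyperNetX. The function below is a hypothetical stand-in written for this illustration; it shares a name with the method above but does not reproduce its implementation or return type, and the toy labels and records are made up.

```python
from collections import OrderedDict


def elements_by_level(data, labels, level1=0, level2=1):
    """Hypothetical stand-in: group the level2 labels under the level1
    labels they co-occur with in at least one record."""
    keys = list(labels)
    names1, names2 = labels[keys[level1]], labels[keys[level2]]
    elements = OrderedDict()
    for record in data:
        parent = names1[record[level1]]
        child = names2[record[level2]]
        members = elements.setdefault(parent, [])
        if child not in members:
            members.append(child)
    return elements


labels = OrderedDict([
    ("House", ["Gryffindor", "Slytherin"]),
    ("Blood Status", ["Half-blood", "Pure-blood"]),
    ("Species", ["Human", "Ghost"]),
])
# Each record holds one label position per level.
data = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]

by_house = elements_by_level(data, labels)        # levels 0 and 1 (the default pairing)
by_blood = elements_by_level(data, labels, 1, 2)  # levels 1 and 2
```

With the default arguments the keys play the role of the elements (first level) and the grouped values play the role of the children (second level).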

In [None]:
F = E.restrict_to_levels([2,3,1]) ## This generates a new StaticEntity
F.labels

In [None]:
F.elements
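
Conceptually, restricting to a list of levels selects and reorders columns of the index list and the matching label keys. The sketch below assumes that reading; `restrict_levels` is a hypothetical helper, and the real method may handle labels and duplicates differently.

```python
from collections import OrderedDict


def restrict_levels(data, labels, levels):
    """Hypothetical helper: keep only the given levels, in the given order,
    and drop records that become duplicates after projection."""
    keys = list(labels)
    new_labels = OrderedDict((keys[i], labels[keys[i]]) for i in levels)
    seen = set()
    new_data = []
    for record in data:
        projected = tuple(record[i] for i in levels)
        if projected not in seen:
            seen.add(projected)
            new_data.append(projected)
    return new_data, new_labels


# Made-up labels and records for three levels.
labels = OrderedDict([
    ("House", ["Gryffindor", "Slytherin"]),
    ("Blood Status", ["Half-blood", "Pure-blood"]),
    ("Species", ["Human", "Ghost"]),
])
data = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]

# Make "Species" the first level and "House" the second.
new_data, new_labels = restrict_levels(data, labels, [2, 0])
```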

### A StaticEntitySet is a StaticEntity restricted to two levels. 
By default, a StaticEntitySet will grab the first two dimensions of the array and the first two keys of the labels, but any pair of levels may be specified.

In [None]:
ES = hnx.StaticEntitySet(arr=hogwarts.arr, labels=hogwarts.labels, level1=0, level2=1)

In [None]:
ES.labels

## Static Hypergraph
A static hypergraph is one where all nodes and edges are known at the time of construction. This permits an internal ordering and uid structure for easy reference and faster computation of metrics.

The nodes and edges of a static Hypergraph are stored using the StaticEntitySet structure. 

A static Hypergraph may be instantiated by adding the keyword argument `static=True`. Hypergraphs have `static=False` by default. A static Hypergraph may be created with a set system of the form:
1. *dict* with `static=True`,
2. *iterable of iterables* with `static=True`,
3. `hypernetx.EntitySet` with `static=True`,
4. `hypernetx.StaticEntity` (static is automatically set to True),
5. `hypernetx.StaticEntitySet` (static is automatically set to True).


In [None]:
## example, instantiate from a dictionary
d = E.elements
Hdict = hnx.Hypergraph(d,static=True)

d,Hdict.edges.keys,Hdict.isstatic,Hdict.nodes,Hdict.edges
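
To see why knowing all nodes and edges up front helps, the sketch below fixes an ordering once and derives integer ids and an incidence matrix from a dictionary. It is a plain-Python illustration with made-up data, not the internal representation used by HyperNetX.

```python
# Made-up set system: edge name -> list of member nodes.
edges = {
    "Gryffindor": ["Half-blood", "Pure-blood", "Muggle-born"],
    "Slytherin": ["Pure-blood", "Half-blood"],
    "Hufflepuff": ["Muggle-born"],
}

# Fix the ordering once, at construction time.
edge_names = list(edges)
node_names = []
for members in edges.values():
    for node in members:
        if node not in node_names:
            node_names.append(node)

# Integer ids give constant-time lookup in both directions.
edge_index = {name: i for i, name in enumerate(edge_names)}
node_index = {name: i for i, name in enumerate(node_names)}

# Node-by-edge 0-1 incidence matrix as nested lists.
incidence = [[0] * len(edge_names) for _ in node_names]
for edge, members in edges.items():
    for node in members:
        incidence[node_index[node]][edge_index[edge]] = 1
```

Because the ordering never changes, row and column positions can be used as stable identifiers in later matrix computations.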

In [None]:
## example, instantiate from a StaticEntity
H = hnx.Hypergraph(E)

H.edges.keys,H.isstatic,H.nodes,H.edges

**Restrict to specific edges and nodes as before**

In [None]:
H = H.restrict_to_edges(['Gryffindor','Ravenclaw','Slytherin','Hufflepuff'])
H.edges

**A Hypergraph with static=True may be converted to a static=False hypergraph if dynamic properties or visualizations are needed.**  
(Similarly any hypergraph may be converted to a static hypergraph using its incidence_dict.)

In [None]:
Hd = H.remove_static()
hnx.draw(Hd)

## NWHypergraph (NWHy) has fast methods to apply to Hypergraphs stored as sparse arrays

### All s-metric calculations in HNX use an s-line graph representation of the Hypergraph
For each integer s>0, a hypergraph generates a unique s-line graph. The vertices in the line graph correspond to hypergraph edges. There exists a line graph edge between two line graph vertices if the hypergraph edges they represent intersect in at least s nodes in the hypergraph.
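
The definition can be checked by hand on a small hypergraph. The following is a brute-force sketch with made-up edges; `s_line_graph` is a hypothetical helper written for this illustration and is not the routine used by HNX or NWHy.

```python
from itertools import combinations


def s_line_graph(edges, s=1):
    """Hypothetical helper: return (edge, edge, overlap) triples for every
    pair of hypergraph edges that share at least s nodes."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    links = []
    for (name_a, nodes_a), (name_b, nodes_b) in combinations(edges.items(), 2):
        overlap = len(set(nodes_a) & set(nodes_b))
        if overlap >= s:
            links.append((name_a, name_b, overlap))
    return links


# Made-up hypergraph: edge name -> set of nodes.
toy_edges = {
    "e1": {"a", "b", "c"},
    "e2": {"b", "c", "d"},
    "e3": {"c", "e"},
}

line_1 = s_line_graph(toy_edges, s=1)
line_2 = s_line_graph(toy_edges, s=2)
line_3 = s_line_graph(toy_edges, s=3)
```

Raising s can only remove line graph edges, so each s-line graph is a subgraph of the one for s-1.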

Because this line graph representation is so important for all s-metrics, our proof-of-concept demonstration has NWHy calculating the s-line graphs of hypergraphs passed from their Python representation.

**`nwhy.convert_to_s_overlap` returns the adjacency matrix and index corresponding to the s-line graph.**

In [None]:
H.incidence_matrix()  ## the edge data for H is stored in the array associated with its edges, a StaticEntitySet

In [None]:
H.edges.elements

**HNX exchanges matrices with NWHypergraph using the sparse format of three arrays: row_indices, column_indices, data**

In [None]:
hpcoo = coo_matrix(H.incidence_matrix().transpose())
hpcoo.row,hpcoo.col,hpcoo.data
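
As a minimal sketch of the three-array format, the example below converts a made-up edge-by-node incidence matrix to COO form and rebuilds it from the arrays alone. It only uses `scipy.sparse.coo_matrix` and does not call NWHy.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up edge-by-node incidence matrix: 2 edges (rows), 3 nodes (columns).
dense = np.array([
    [1, 1, 0],
    [0, 1, 1],
])

coo = coo_matrix(dense)

# Each nonzero entry is described by (row index, column index, value).
triples = sorted(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))

# The three arrays plus the shape are enough to rebuild the matrix
# on the receiving side of the exchange.
rebuilt = coo_matrix((coo.data, (coo.row, coo.col)), shape=coo.shape).toarray()
```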

In [None]:
mat = H.incidence_matrix(index=True)
mat[0].transpose().todense(),mat[1],mat[2]

In [None]:
H.edge_adjacency_matrix(s=2).todense()
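
One way to obtain an edge s-adjacency matrix from an incidence matrix is to count shared nodes with a matrix product and then threshold at s. The sketch below assumes a node-by-edge 0-1 matrix with made-up entries; `edge_s_adjacency` is a hypothetical helper and may differ from how `edge_adjacency_matrix` is implemented.

```python
import numpy as np

# Made-up node-by-edge incidence matrix: 4 nodes (rows), 3 edges (columns).
M = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
])


def edge_s_adjacency(incidence, s=1):
    """Hypothetical helper: 0-1 matrix marking pairs of distinct edges
    that share at least s nodes."""
    overlaps = incidence.T @ incidence  # entry (i, j) counts nodes shared by edges i and j
    np.fill_diagonal(overlaps, 0)       # an edge is not adjacent to itself
    return (overlaps >= s).astype(int)


A1 = edge_s_adjacency(M, s=1)
A2 = edge_s_adjacency(M, s=2)
```

The result is symmetric, so only the entries above the diagonal carry independent information.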

**We retrieve the s-line graph for multiple values of s and return the s-metrics.**

The `newrow`, `newcol`, `newdata` arrays are indices generated by NWHy to reference the edges in H.   
These provide coordinates for the upper triangular portion of the s-adjacency matrix.  

The `oldrow`, `oldcol`, `olddata` arrays return a mapping of these indices to the same ones used by HNX.
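
The idea of upper triangular coordinates and of mapping indices back to edge names can be illustrated with NumPy alone. This sketch uses a made-up symmetric adjacency matrix; it does not reproduce NWHy's return values or its internal index scheme.

```python
import numpy as np

# Made-up edge names and symmetric s-adjacency matrix.
edge_names = ["Gryffindor", "Ravenclaw", "Slytherin"]
adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Keep only entries strictly above the diagonal; symmetry makes the rest redundant.
rows, cols = np.nonzero(np.triu(adjacency, k=1))
values = adjacency[rows, cols]

# Translate integer indices back to the names used on the Python side.
pairs = [(edge_names[i], edge_names[j]) for i, j in zip(rows.tolist(), cols.tolist())]
```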

## Looking ahead:

Once NWHy has stored the s-graph adjacency matrix, all of the s-metrics we currently compute using Python libraries can instead be computed using the NWGraph and NWHypergraph libraries. This will let us keep using the HNX Python interface while optimizing for large datasets.