# Graph Similarity Measures

Goals of today's class:

1. Understand the basic concept of graph similarity
2. Implement and analyze different similarity measures
3. Develop an intuition for what some similarity measures are telling us and when they might be appropriate to use
4. 

### We want your final projects to showcase your ability to distill the following:
1. Your understanding of a topic not covered in class
2. Your accuracy in creating *correct* code around your topic, along with a strong reference list
3. Your ability to convey your understanding via:
    - A) Chunking the information into a single lesson that builds on itself (e.g. breaking down complex functions into parts, putting the pieces together, using different examples to illustrate your methods)
    - B) Good examples, visuals, and write-ups of the results produced during the lesson and of the datasets used to illustrate your concept
    - C) At least one interactive component (i.e., the "Your Turn!" in our lessons)

Much like in our lessons, it's okay to include more than you imagine covering in a (hypothetical) 90-minute class period. Be sure to indicate which sections of your chapter are "advanced topics", and always include relevant citations or motivation.

## What is graph distance?

Graphs are complex, high-dimensional objects. However, sometimes we just want to ask something like: '_How similar are these two graphs?_' Graph distances are a family of methods that attempt to answer this question with a function that takes two graphs as inputs and returns a single number representing how similar or different they are. In other words, given $G_1$ and $G_2$, we seek a function $f$ such that

$$f(G_1, G_2) = d$$ 

where $d$ is the distance between the two graphs. Of course, reducing a pair of graphs to a single number loses information, no matter how ingenious the graph distance function is. Many graph distance measures have been proposed, each claiming its own strengths.

There are two broad categories of problems when it comes to comparing graphs. First, we may want to compare graphs where each node in $G_1$ maps onto a node in $G_2$ (**Known Node Correspondence**). For example, we may be interested in the different types of social interaction that occur between the same set of people, or in how the flight patterns between the same set of airports change over time. Second, we may want to compare graphs where there is no precise mapping of nodes between the two graphs (**Unknown Node Correspondence**). For example, we may want to know how similar the commuting patterns in Boston and San Diego are, or we may want to compare graphs of different sizes. There are different measures for tackling each of these problems. In general, measures for comparing graphs with unknown node correspondence can also be used for graphs with known node correspondence, but not vice versa.
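
To make the distinction concrete, here is a minimal sketch using only the Python standard library. The helper names (`edge_set`, `edit_distance_known`, `degree_sequence`) and the toy path graph are illustrative assumptions, not part of any library. The sketch compares a graph with a relabeled copy of itself: a measure that assumes known node correspondence reports a large difference, while a label-independent summary does not.

```python
def edge_set(edges):
    """Store an undirected simple graph as a set of frozenset node pairs."""
    return {frozenset(e) for e in edges}

def edit_distance_known(edges_1, edges_2):
    """Number of edges present in exactly one graph (nodes matched by label)."""
    return len(edge_set(edges_1) ^ edge_set(edges_2))

def degree_sequence(edges, nodes):
    """Sorted degree sequence: unchanged by any relabeling of the nodes."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())

nodes = [0, 1, 2, 3]
path = [(0, 1), (1, 2), (2, 3)]          # the path 0-1-2-3
relabel = {0: 2, 1: 0, 2: 3, 3: 1}       # hypothetical permutation of the labels
path_relabeled = [(relabel[u], relabel[v]) for u, v in path]

# Same structure, different labels: the label-aware measure sees no shared edges
d_known = edit_distance_known(path, path_relabeled)

# The label-independent summary cannot tell the two graphs apart
same_degrees = degree_sequence(path, nodes) == degree_sequence(path_relabeled, nodes)

print(d_known, same_degrees)
```

Note that the degree sequence is only a coarse summary: two graphs with equal degree sequences are not necessarily the same graph up to relabeling, so it stands in here for the richer unknown-correspondence measures listed below.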
________

Types of distance measures:
1. Known node correspondence
    1. Adjacency matrix difference
    2. DeltaCon
    3. Cut distance
2. Unknown node correspondence
    1. Spectral distances
    2. Global statistics
    3. Mesoscopic response functions
    4. Graphlet-based methods
    5. Alignment-based methods
    6. Portrait divergence
    7. Graph kernels
    8. Persistent homology
    9. Bayesian methods

Important concepts:
1. Known vs unknown node correspondence
2. Comparison within vs between graph ensembles
3. Local vs meso vs macro structure

## 1. Frobenius norm
- Explain isomorphism
- Idea that a distance should have the property that isomorphic (or minimally edited) graphs ought to have a small distance
- Compute $d$ for various instances of ER graph with different densities
- Test sensitivity to removing or randomizing edges
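
The last two bullets could be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it uses only the standard library, the helper names (`er_adjacency`, `frobenius_distance`, `remove_edges`) are hypothetical, and the graph size, densities, and seed are arbitrary choices. The distance is taken to be the Frobenius norm of the difference of the two adjacency matrices, which presumes known node correspondence.

```python
import math
import random

def er_adjacency(n, p, rng):
    """Adjacency matrix (list of lists) of an undirected G(n, p) random graph."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

def frobenius_distance(A, B):
    """Frobenius norm of A - B for two equally sized square matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

def remove_edges(A, k, rng):
    """Copy of A with k existing edges removed uniformly at random."""
    B = [row[:] for row in A]
    n = len(B)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if B[i][j]]
    for i, j in rng.sample(edges, k):
        B[i][j] = B[j][i] = 0
    return B

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # illustrative size, not taken from the lesson

# Distance between two independent ER graphs at each density p
for p in (0.1, 0.3, 0.5):
    d = frobenius_distance(er_adjacency(n, p, rng), er_adjacency(n, p, rng))
    print(f"p={p}: d={d:.2f}")

# Sensitivity: removing k edges flips 2k matrix entries, so d = sqrt(2k)
A = er_adjacency(n, 0.3, rng)
for k in (1, 5, 20):
    d = frobenius_distance(A, remove_edges(A, k, rng))
    print(f"removed {k} edges: d={d:.2f}")
```

For two independent $G(n, p)$ graphs, each node pair differs with probability $2p(1-p)$ and contributes 2 to $d^2$, so the expected value of $d^2$ is $n(n-1) \cdot 2p(1-p)$. The distance between independent samples therefore grows with density up to $p = 0.5$, even though the graphs come from the same ensemble. With NumPy arrays, `numpy.linalg.norm(A - B)` computes the same quantity.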

## 2. DeltaCon 
- Set up problem that we expect graphs with 

## 3. Spectral distances

## 4. Portrait divergence



_______

## References and further resources:

1. Hartle, H., Klein, B., McCabe, S., Daniels, A., St-Onge, G., Murphy, C., Hébert-Dufresne, L., 2020. Network comparison and the within-ensemble graph distance. Proc. R. Soc. A. 476, 20190744. https://doi.org/10.1098/rspa.2019.0744
2. Soundarajan, S., Eliassi-Rad, T., Gallagher, B., 2014. A Guide to Selecting a Network Similarity Method, in: Proceedings of the 2014 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 1037–1045. https://doi.org/10.1137/1.9781611973440.118
3. Tantardini, M., Ieva, F., Tajoli, L., Piccardi, C., 2019. Comparing methods for comparing networks. Sci Rep 9, 17557. https://doi.org/10.1038/s41598-019-53708-y
4. Wills, P., Meyer, F.G., 2020. Metrics for graph comparison: A practitioner’s guide. PLoS ONE 15, e0228728. https://doi.org/10.1371/journal.pone.0228728


(Aim for 10+ useful citations)