# Front Matter

## Abstract

This thesis is an overview of spectral methods on networks, and of how you can use tools from a network's eigenspace to understand and explain the network more deeply. Why are networks an interesting thing to learn about, and why should you care?

Well, at some level, every aspect of reality seems to be made of interconnected parts. Atoms and molecules are connected to each other by chemical bonds. Your neurons connect to each other through synapses, and the different parts of your brain connect to each other through interacting groups of neurons. At a larger scale, you are connected to other humans through social networks, and our economy is a global trade network. The Earth's food chain is an ecological network, and larger still, every object with mass in the universe is connected to every other object through a gravitational network.

So if you can understand networks, you can understand a little something about everything!

We'll cover the fundamentals of spectral methods in network data science, focusing on developing intuition for networks as statistical objects and pairing each idea with relevant Python code. By the end of this thesis, you will be able to use efficient, easy-to-use tools to analyze networks. You will also have a new range of statistical techniques in your toolbox, including representations, theory, and algorithms for networks.

We'll learn about network algorithms by seeing how they're implemented in production-ready Python frameworks:
- NumPy and SciPy are used for scientific programming. They give you access to array objects, which are the main way we'll represent networks computationally.
- Scikit-learn is very easy to use, yet it implements many machine learning algorithms efficiently, so it makes a great entry point for downstream analysis of networks.
- Graspologic is an open-source Python package developed by Microsoft and the NeuroData lab at Johns Hopkins University, which gives you utilities and algorithms for doing statistical analyses on network-valued data.
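
To give a flavor of what representing a network as an array can look like, here is a minimal sketch using only NumPy. The four-node edge list is a made-up toy example chosen for illustration, not data from the thesis, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical toy network: 4 nodes and 4 undirected edges (illustration only).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n_nodes = 4

# Adjacency matrix: A[i, j] = 1 if nodes i and j share an edge, else 0.
A = np.zeros((n_nodes, n_nodes), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # the network is undirected, so A is symmetric

# Each node's degree is the sum of its row in the adjacency matrix.
degrees = A.sum(axis=1)

print(A)
print(degrees)
```

Once a network lives in an array like `A`, the usual array operations (row sums, matrix products, decompositions) become ways of asking questions about the network itself.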

The thesis favors a hands-on approach, growing an intuitive understanding of networks through concrete working examples and a bit of theory. While you can read this thesis without picking up your laptop, I highly recommend you experiment with the code examples available online as Jupyter notebooks at [http://docs.neurodata.io/graph-stats-book/index.html](http://docs.neurodata.io/graph-stats-book/index.html).

**Primary Reader and Advisor**: Joshua Vogelstein  
**Secondary Reader**: Avanti Athreya

## Acknowledgements

Big thanks to everybody who has been reading the thesis as I write and giving feedback. This list includes Dax Pryce, Ross Lawrence, Geoff Loftus, Alexandra McCoy, Olivia Taylor, Peter Brown, Sambit Panda, Eric Bridgeford, Josh Vogelstein, and Ali Sad-Aldin.

I am grateful to my advisor, Joshua Vogelstein, for his insights and strong feedback. The value he puts on clarity and simplicity in any mathematical model has been an enormous help throughout this process.

I am also especially grateful to Eric Bridgeford, who has been giving me constant feedback throughout the writing process. I would be lost in a sea of papers without his help.

Lastly, I'm grateful to my father, Geoffrey Loftus, for teaching me the value of rigor in science and for being a resoundingly positive role model throughout my life.

## Dedication

This thesis is dedicated to my father, Geoffrey Loftus, for teaching me the value of rigor in science and for being a resoundingly positive role model throughout my life, and to my mother, Susan Loftus, for teaching me to never give up in the face of adversity.

## Contents

**Abstract**  
**Acknowledgements**  
**List of Figures**  
**Matrix Representations of Networks**  
  The Adjacency Matrix  
  The Incidence Matrix  
  The Oriented Incidence Matrix  
  The Degree Matrix  
  The Laplacian Matrix  
**Why Embed Networks?**  
  High Dimensionality of Network Data  
  Latent Estimation  
  The Latent Position Matrix  
  Edge Probability Estimation  
  Block Probability Matrices  
  Geometry of Latent Positions  
**Spectral Embedding Methods**  
  Singular Vectors and Singular Value Decomposition  
  Breaking Down the Laplacian  
  Matrix Rank  
  Sums of Rank One Matrices  
  Laplacian Approximation Through Summation  
  Increased Usefulness of Approximation with Larger Networks  
  Matrix Rank and Spectral Embedding  
  Dimensionality Estimation  
  The Two-Truths Phenomenon  
**Multiple-Network Representation Learning**  
  Data Generation  
  Simple Embedding Methods on Multiple Networks  
  Averaging Separately  
  Averaging Together  
  Different Types of Multi-Network Representation Learning  
  Network Combination: Together  
  Network Combination: Separate  
  Embedding Combination  
  Multiple-Adjacency Spectral Embedding  
  Overview of MASE  
  Data Generation  
  Embedding  
  Combining Embeddings  
  Joint Embedding of Combinations  
  Score Matrices  
  Omnibus Embedding  
  OMNI on Four Networks  
  Overview of OMNI  
  The Omnibus Matrix  
  Embedding the Omnibus Matrix  
  Using the Omnibus Embedding  
**Joint Representation Learning**  
  Data Generation  
  Covariates  
  Covariate-Assisted Spectral Embedding  
  Weight Exploration  
  Weight Estimation  
  Omnibus Joint Embedding  
  MASE Joint Embedding  
**Single-Network Vertex Nomination**  
  Spectral Vertex Nomination  
  Finding a Single Set of Nominations  
  Nominations for Each Node  
**Out-of-Sample Embedding**  
  Data Generation  
  Probability Vector Estimation  
  Inversion of Probability Vector Estimation  
  The Moore-Penrose Pseudoinverse  
  Using the Pseudoinverse for Out-of-Sample Estimation  
**Anomaly Detection for Timeseries of Networks**  
  Simulating Timeseries Data  
  Approaches for Anomaly Detection  
  Detecting if the First Time-Point is an Anomaly  
  Hypothesis Testing a Test Statistic  
  Bootstrapped Distribution Estimation  
  P-Value Estimation  
  Testing the Remaining Time-Points  
  The Distribution of the Bootstrapped Test Statistic  

## List of Figures