Lateral movement is a key stage of system compromise used by advanced persistent threats. Detecting it is no simple task. When network host logs are abstracted into discrete temporal graphs, the problem can be reframed as anomalous edge detection in an evolving network. Research in modern deep graph learning techniques has produced many creative and complicated models for this task. However, as is the case in many machine learning fields, generality of models is of paramount importance to achieve good accuracy and scalability in training and inference.
In this paper, we propose a formalized version of this approach in a framework we call Euler. It consists of a model-agnostic graph neural network stacked upon a model-agnostic sequence encoding layer such as a recurrent neural network. Models built according to the Euler framework can easily distribute their graph convolutional layers across multiple machines for large performance improvements. We demonstrate that Euler-based models are competitive, or better than many state-of-the-art approaches to anomalous link detection and prediction. As anomaly-based intrusion detection systems, Euler models can efficiently identify anomalous connections between entities with high precision and outperform other unsupervised techniques for anomalous lateral movement detection.
Most temporal link prediction models which utilize graph neural networks do not scale well. Almost every model in this field uses a GNN combined with an RNN in such a way where the GNN input is dependant on the RNN output, as shown below:
Our framework separates the two such that GNNs are free to run independantly. In this way, they can be distributed across multiple machines for large performance improvements
By reframing lateral movement detection in a network as anomalous edge detection over a temporal graph, models following this technique can be used for intrusion detection systems. Below, we present a toy example of anomalous network activity that could only be captured if a network were abstracted in this manner:
Here, C0 and C1 should only communicate with the shared drive after Alice and Bob have authenticated with them, respectively. However in time step 5, C1 connects to the shared drive without Bob's prior authorization. Pattern would be difficult to detect with other anomaly-based IDS's--all of which either do not consider the spatial relationship between Bob and the shared drive, or do not consider the temporal distinction between the edge (C1, SD) at times 1 and 5.
This framework is scalable to large datasets, and is much faster than every other temporal link prediction method we tried
This allows us to run experiments on real-world enterprise data sets quickly, and with highly accurate results. We show that while Euler-based models have equivilant precision and accuracy on smaller data sets, on larger ones, the simplicity of the model allows it to outperform state-of-the-art temporal link predictors, as well as other graph-based approaches to anomaly-based IDS's.