# Persistent Graph Semantics in Azure Data Explorer & Graphistry Integration

## Microsoft Azure Data Explorer Update

In a significant step forward for graph analytics, **Microsoft has introduced Persistent Graph Semantics in Azure Data Explorer (ADX)**.  
This feature, announced in [this official Azure update](https://azure.microsoft.com/en-us/updates?id=495985), allows users to define **graph projections as persistent database objects**. 

### Key Benefits:
- Define graph relationships using persisted `graph semantics` that live alongside your ADX tables.
- Simplify repeatable graph analysis workflows.
- Enable performance optimizations by reusing defined graph topologies.

This new capability is especially powerful for users working with large-scale telemetry, security, and relational datasets natively in Kusto Query Language (KQL).

---

## Graphistry Adopts ADX Persistent Graph Semantics

The graph visualization and investigation platform **Graphistry** has embraced this enhancement in its latest release.

### What’s New in Graphistry:
- **Native KQL Support**: You can now use KQL directly within Graphistry to run ADX queries and return dataframes.
- **Persistent Graph Queries**: Pass the name of your persistent graph and perform investigations in Graphistry's GPU-accelerated visual interface.


### Why It Matters:
With persistent graph semantics and Graphistry’s GPU-accelerated visual interface, analysts can now:
- Query, explore, and visualize ADX graph data with minimal friction.
- Quickly pivot between raw table views and high-context relationship maps.
- Investigate threats, fraud, IoT, and more—faster and more collaboratively.


# Getting started

Getting started with Graphistry is as easy as registering for a free account on [Graphistry Hub](https://hub.graphistry.com) and installing Graphistry in your environment.

Install Graphistry and start your Kusto graph journey.
```bash
pip install graphistry
```
Python:
```python
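import graphistry

# Placeholder connection settings; replace with your own cluster and database
KUSTO_CONF = {
    "cluster": "https://<clustername>.<region>.kusto.windows.net",
    "database": "YourDatabase"
}

# Supply your Graphistry credentials here (see the full example in the next section)
graphistry.register()
graphistry.configure_kusto(cluster=KUSTO_CONF['cluster'], database=KUSTO_CONF['database'])

# kql() returns a list of dataframes, one per result table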

dfs = graphistry.kql("YourTable | take 5")

for df in dfs:
    print(df)
```

# Taking it for a spin

In [2]:
import graphistry

In [None]:
KUSTO_CONF = {
    "cluster": "https://<clustername>.<region>.kusto.windows.net",
    "database": "YourDatabase"
}

GRAPHISTRY_CONF = {
    "personal_key_id": "YOUR_KEY_ID",
    "personal_key_secret": "YOUR_SECRET",
    "server": "hub.graphistry.com"
}

In [None]:
graphistry.register(api=3, 
                    personal_key_id=GRAPHISTRY_CONF['personal_key_id'], 
                    personal_key_secret=GRAPHISTRY_CONF['personal_key_secret'], 
                    server=GRAPHISTRY_CONF['server'])

graphistry.configure_kusto(cluster=KUSTO_CONF['cluster'], 
                           database=KUSTO_CONF['database'])
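
Hard-coded keys in a notebook are easy to leak. As a minimal sketch, the same settings could be read from environment variables instead. The variable names (`GRAPHISTRY_KEY_ID`, `GRAPHISTRY_KEY_SECRET`, `GRAPHISTRY_SERVER`) and the `load_graphistry_conf` helper are hypothetical conventions chosen for this example, not part of PyGraphistry:

```python
import os


def load_graphistry_conf(env=None):
    """Build the GRAPHISTRY_CONF dict from environment variables.

    The variable names are an illustrative convention, not a PyGraphistry API.
    """
    env = os.environ if env is None else env
    required = ("GRAPHISTRY_KEY_ID", "GRAPHISTRY_KEY_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "personal_key_id": env["GRAPHISTRY_KEY_ID"],
        "personal_key_secret": env["GRAPHISTRY_KEY_SECRET"],
        # Fall back to the hosted Hub server used in this walkthrough.
        "server": env.get("GRAPHISTRY_SERVER", "hub.graphistry.com"),
    }
```

The returned dict has the same keys as `GRAPHISTRY_CONF`, so it can be passed to `graphistry.register(...)` in the same way.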

## Ingest data into your Azure Data Explorer cluster

Import the RedTeam50k dataset used in our [UMAP cyber demo notebook](https://github.com/graphistry/pygraphistry/blob/master/demos/ai/cyber/cyber-redteam-umap-demo.ipynb) into your Azure Data Explorer cluster.

The dataset is a massaged version of the dataset published by Alexander D. Kent.


Data citation:
```
A. D. Kent, “Comprehensive, Multi-Source Cybersecurity Events,”
Los Alamos National Laboratory, http://dx.doi.org/10.17021/1179829, 2015.

@Misc{kent-2015-cyberdata1,
  author =     {Alexander D. Kent},
  title =      {{Comprehensive, Multi-Source Cyber-Security Events}},
  year =       {2015},
  howpublished = {Los Alamos National Laboratory},
  doi = {10.17021/1179829}
}
```


### Building the Kusto query

In [None]:
query = """.execute script <|
.create-or-alter function graphistryRedTeam50k () {
    externaldata(index:long, event_time:long, src_domain:string, dst_domain:string, src_computer:string, dst_computer:string, auth_type:string, logontype:string, authentication_orientation:string, success_or_failure:string, RED:int, feats:string, feats2:string)
    [
        h@"https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/graphistry_redteam50k.csv"
    ]
    with(format="csv", ignoreFirstRecord=true)
    | extend event_time = datetime(2024-01-01) + event_time * 1s
}
"""

### Executing using graphistry


With PyGraphistry registered and configured, executing the query we built takes a single call.

The `kql` function returns a list of dataframes.

In [None]:
dfs = graphistry.kql(query)


# Simple check that the query returned exactly one dataframe
if len(dfs) == 1:
    print(dfs[0])
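
Because the result is a list, a small guard can make the "exactly one dataframe" expectation explicit rather than silently printing nothing. The `single_result` helper below is a hypothetical convenience for this walkthrough, not a PyGraphistry function:

```python
def single_result(results):
    """Return the only item of `results`, or raise if the count is not one."""
    if len(results) != 1:
        raise ValueError(f"Expected exactly one result, got {len(results)}")
    return results[0]
```

With it, the cell above could be written as `df = single_result(graphistry.kql(query))`.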

### Grabbing a sample of data

In [None]:
# Grabbing the first dataframe

graphistry.kql("graphistryRedTeam50k | take 10")[0]
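
The `event_time` values in the sample come from the `extend` line in the Kusto function, which treats the original integer as a count of seconds and adds it to a fixed base date. A rough Python equivalent for a single value, as a sketch only (the `to_event_time` name is made up for this example):

```python
from datetime import datetime, timedelta

# Same base date as the `extend` line in the Kusto function.
BASE_TIME = datetime(2024, 1, 1)


def to_event_time(offset_seconds: int) -> datetime:
    """Mirror `datetime(2024-01-01) + event_time * 1s` for a single value."""
    return BASE_TIME + timedelta(seconds=offset_seconds)


print(to_event_time(90))  # 2024-01-01 00:01:30
```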

## Building the schema and persisting the graph

A graph model defines the specifications of a graph stored in your database metadata. It includes:

* Schema definition: Node and edge types with their properties
* Data source mappings: Instructions for building the graph from tabular data
* Labels: Both static (predefined) and dynamic (generated at runtime) labels for nodes and edges

Graph models contain the blueprint for creating graph snapshots, not the actual graph data.

Read more: [Kusto Graph models](https://learn.microsoft.com/en-us/kusto/management/graph/graph-persistent-overview?view=microsoft-fabric#graph-models)
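
Since the model is plain JSON, it can be worth checking locally before sending it to the cluster. The sketch below parses a model definition and reports steps that lack the fields used in this walkthrough, plus static labels that do not appear in the schema. The rules are an assumption based on this example's model, not an exhaustive statement of what Kusto validates, and `check_graph_model` is a hypothetical helper:

```python
import json


def check_graph_model(model_json: str) -> list:
    """Return a list of problems found in a graph model definition (empty if none)."""
    model = json.loads(model_json)
    schema = model.get("Schema", {})
    known_labels = set(schema.get("Nodes", {})) | set(schema.get("Edges", {}))
    # Fields each step kind uses in this walkthrough (assumed, not exhaustive).
    required = {
        "AddNodes": ["Query", "NodeIdColumn"],
        "AddEdges": ["Query", "SourceColumn", "TargetColumn"],
    }
    problems = []
    for i, step in enumerate(model.get("Definition", {}).get("Steps", [])):
        kind = step.get("Kind")
        if kind not in required:
            problems.append(f"step {i}: unknown Kind {kind!r}")
            continue
        for key in required[kind]:
            if key not in step:
                problems.append(f"step {i}: missing {key}")
        for label in step.get("Labels", []):
            if label not in known_labels:
                problems.append(f"step {i}: label {label!r} not in Schema")
    return problems
```

An empty list means the definition passed these local checks; the cluster remains the final authority on whether the model is valid.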

In [None]:
query = """
{
    "Schema": {
        "Nodes": {
            "Computer": {"computerName": "string", "RED":"int"},
            "Domain": {"domainName": "string"}
        },
        "Edges": {
            "AUTHENTICATES": {
                "event_time": "datetime",
                "src_computer": "string",
                "dst_computer": "string",
                "src_domain": "string",
                "dst_domain": "string",
                "auth_type": "string",
                "logontype": "string",
                "authentication_orientation": "string",
                "success_or_failure": "string",
                "RED": "int"
            }
        }
    },
    "Definition": {
        "Steps": [
            {
                "Kind": "AddNodes",
                "Query": "graphistryRedTeam50k | project computerName = src_computer, RED, nodeType = 'Computer'",
                "NodeIdColumn": "computerName",
                "Labels": ["Computer"],
                "LabelsColumn": "nodeType"
            },
            {
                "Kind": "AddNodes",
                "Query": "graphistryRedTeam50k | project computerName = dst_computer, RED, nodeType = 'Computer'",
                "NodeIdColumn": "computerName",
                "Labels": ["Computer"],
                "LabelsColumn": "nodeType"
            },
            {
                "Kind": "AddNodes",
                "Query": "graphistryRedTeam50k | project domainName = src_domain, nodeType = 'Domain'",
                "NodeIdColumn": "domainName",
                "Labels": ["Domain"],
                "LabelsColumn": "nodeType"
            },
            {
                "Kind": "AddNodes",
                "Query": "graphistryRedTeam50k | project domainName = dst_domain, nodeType = 'Domain'",
                "NodeIdColumn": "domainName",
                "Labels": ["Domain"],
                "LabelsColumn": "nodeType"
            },
            {
                "Kind": "AddEdges",
                "Query": "graphistryRedTeam50k | project event_time, src_computer, dst_computer, src_domain, dst_domain, auth_type, logontype, authentication_orientation, success_or_failure, RED",
                "SourceColumn": "src_computer",
                "TargetColumn": "dst_computer",
                "Labels": ["AUTHENTICATES"]
            }
        ]
    }
}
"""


graph_name = "graphistryRedTeamGraph"
persist_query = f".create-or-alter graph_model {graph_name} {query}"

In [None]:
graphistry.kql(persist_query)

## Making the snapshot

A graph snapshot is the actual graph instance materialized from a graph model. It represents:

* A specific point-in-time view of the data as defined by the model
* The nodes, edges, and their properties in a queryable format
* A self-contained entity that persists until explicitly removed

Snapshots are the entities you query when working with persistent graphs. 
Read more: [Kusto Graph snapshot](https://learn.microsoft.com/en-us/kusto/management/graph/graph-persistent-overview?view=microsoft-fabric#graph-snapshots)


In [None]:
snapshot_name = "InitialSnap"
graph_snapshot_query = f".make graph_snapshot {snapshot_name} from {graph_name}"

graphistry.kql(graph_snapshot_query)
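
Because each snapshot is a point-in-time view, one option is to put the creation time in the snapshot name so that later snapshots are easy to tell apart. The naming scheme and the `snapshot_command` helper below are illustrative assumptions; only the `.make graph_snapshot ... from ...` command shape is taken from the cell above:

```python
from datetime import datetime


def snapshot_command(graph_name: str, when: datetime, prefix: str = "Snap") -> str:
    """Build a `.make graph_snapshot` command with a timestamped snapshot name."""
    snapshot_name = f"{prefix}_{when:%Y%m%d_%H%M%S}"
    return f".make graph_snapshot {snapshot_name} from {graph_name}"


print(snapshot_command("graphistryRedTeamGraph", datetime(2024, 1, 2, 3, 4, 5)))
# .make graph_snapshot Snap_20240102_030405 from graphistryRedTeamGraph
```

In a notebook, `datetime.now()` can be passed as `when`, and the resulting string handed to `graphistry.kql(...)`.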

# Fetching the graph and plotting


Once your **data**, **persistent graph** and **snapshot** are created in your Azure Data Explorer cluster, it is time to see the power of Graphistry's GPU-accelerated visual interface.

The `kusto_graph` function takes the name of the graph and, optionally, the name of a snapshot **(snap_name="name")**. If you don't provide a snapshot, it grabs the latest one.

The function returns a Graphistry plottable object.

You can inspect the nodes and edges, add customizations or .plot() it as is.

In [None]:
g = graphistry.kusto_graph(graph_name, snap_name=snapshot_name)
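
Before plotting, a quick size check helps confirm that the snapshot loaded what you expect. The sketch below assumes the returned object exposes its bound tables as `_nodes` and `_edges`, as PyGraphistry plottables do; `summarize` itself is a hypothetical helper:

```python
def summarize(g):
    """Return node and edge counts for a plottable-like object.

    Assumes `g` exposes `_nodes` and `_edges` (dataframes on a PyGraphistry plottable).
    """
    return {"nodes": len(g._nodes), "edges": len(g._edges)}
```

Calling `summarize(g)` on the object above returns a small dict of counts; `g._nodes.head()` and `g._edges.head()` show the first rows of each table.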

## Plotting your object

In [None]:
g.plot()

### Changing colors, icons and more

#### Encode point color


Our data combines two datasets, one of which contains verified red team activity. These records are tagged with the value 1 in the **RED** column.

Let's make the red team nodes stand out in our visualization.

In [None]:
g2 = g.encode_point_color(
    "RED",
    categorical_mapping={
        1: "red"
    },
    default_mapping='blue'
)

g2.plot()
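
Conceptually, a categorical mapping with a default is a dictionary lookup with a fallback: values listed in the mapping get their assigned color, and everything else gets the default. The plain-Python sketch below illustrates that idea only; it is not how Graphistry implements the encoding, and `color_for` is a made-up name:

```python
def color_for(value, categorical_mapping, default_mapping):
    """Pick the mapped color for `value`, falling back to the default."""
    return categorical_mapping.get(value, default_mapping)


red_flags = [0, 1, 0]
colors = [color_for(v, {1: "red"}, "blue") for v in red_flags]
print(colors)  # ['blue', 'red', 'blue']
```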

#### Encode icons


Our data is split into two types of nodes, **"Computer"** and **"Domain"**.

Let's give each node type its own icon.

In [None]:
g3 = g2.encode_point_icon(
    'nodeType',
    shape="circle",
    categorical_mapping={
        "Computer": "laptop", 
        "Domain": "server"
    },
    default_mapping="question")

g3.plot()