# Microsoft Sentinel Security Analysis with Graphistry

This notebook demonstrates how to use Graphistry with Microsoft Sentinel (Log Analytics) to perform security analysis and visualization using KQL queries.

## Prerequisites

1. **Azure Access**: You need access to a Microsoft Sentinel workspace
2. **Authentication**: Azure CLI (`az login`), device code sign-in, or service principal credentials
3. **Dependencies**: Install required packages

```bash
pip install "graphistry[sentinel]" python-dotenv  # quotes keep shells such as zsh from expanding the brackets
```

## Environment Setup

1. Copy `example.env` to `.env` in the same directory as this notebook
2. Edit `.env` with your actual credentials:

```bash
cp example.env .env
# Then edit .env with your credentials
```

The `.env` file should contain:

```env
# Graphistry credentials (register at https://www.graphistry.com)
GRAPHISTRY_PERSONAL_KEY_ID=your_personal_key_id
GRAPHISTRY_PERSONAL_KEY_SECRET=your_personal_key_secret

# Microsoft Sentinel workspace
SENTINEL_WORKSPACE_ID=12345678-1234-1234-1234-123456789abc

# Optional: Service Principal authentication (if not using Azure CLI)
# AZURE_TENANT_ID=your-tenant-id
# AZURE_CLIENT_ID=your-client-id
# AZURE_CLIENT_SECRET=your-client-secret
```

**Important**: The `.env` file is gitignored to avoid committing secrets. Never commit actual credentials!
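
Before registering, it can help to confirm that the keys from the template above are actually set. The following is a minimal sketch; the helper name `missing_env_keys` and the `REQUIRED_KEYS` list are hypothetical and are not part of Graphistry or python-dotenv.

```python
import os

# Keys taken from the .env template above (illustrative list, adjust as needed)
REQUIRED_KEYS = [
    "GRAPHISTRY_PERSONAL_KEY_ID",
    "GRAPHISTRY_PERSONAL_KEY_SECRET",
    "SENTINEL_WORKSPACE_ID",
]

def missing_env_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [key for key in required if not (env.get(key) or "").strip()]

# Example with an in-memory mapping instead of the real environment
example_env = {
    "GRAPHISTRY_PERSONAL_KEY_ID": "your_personal_key_id",
    "SENTINEL_WORKSPACE_ID": "",
}
print(missing_env_keys(example_env))
```

Calling `missing_env_keys()` with no arguments after `load_dotenv()` checks the real environment; an empty list means all three keys are present.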

## Getting Started

### Option 1: Azure CLI Authentication (Recommended for Development)

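First, log in with the Azure CLI (the cell that follows uses device code authentication, which also prompts you to sign in through the browser):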
```bash
az login
```

In [None]:
import graphistry
from datetime import datetime, timedelta
import pandas as pd
import os
from dotenv import load_dotenv

# Load environment variables from .env file
# Option 1: Load from current directory (default)
load_dotenv()

# Option 2: Load from a custom location (uncomment and modify as needed)
# load_dotenv('/path/to/your/.env')  # Load from an absolute path
# load_dotenv(os.path.expanduser('~/sentinel-credentials.env'))  # load_dotenv does not expand ~, so expand it first

# Register for free at https://www.graphistry.com
# Credentials loaded from .env file
graphistry.register(
    api=3,
    protocol="https",
    server="hub.graphistry.com",
    personal_key_id=os.getenv('GRAPHISTRY_PERSONAL_KEY_ID'),
    personal_key_secret=os.getenv('GRAPHISTRY_PERSONAL_KEY_SECRET')
)

# Configure Sentinel connection
# Workspace ID loaded from .env file
WORKSPACE_ID = os.getenv('SENTINEL_WORKSPACE_ID')

if not WORKSPACE_ID:
    raise ValueError("SENTINEL_WORKSPACE_ID not found in environment variables. Please check your .env file.")

g = graphistry.configure_sentinel(
    workspace_id=WORKSPACE_ID,
    use_device_auth=True  # Use device code authentication
)
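
The example workspace ID in the `.env` template is a GUID, so a quick format check can catch a placeholder or truncated value before any network call. This is a sketch using only the standard library; `is_guid` is a hypothetical helper name, and it assumes workspace IDs use the canonical lowercase 8-4-4-4-12 form shown in the template.

```python
import uuid

def is_guid(value):
    """Return True if `value` is a canonical 8-4-4-4-12 GUID string."""
    try:
        # uuid.UUID accepts several spellings; comparing the round-trip
        # keeps only the dashed form
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False

print(is_guid("12345678-1234-1234-1234-123456789abc"))
print(is_guid("your-workspace-id"))
```

A check such as `if not is_guid(WORKSPACE_ID): raise ValueError(...)` could sit next to the existing empty-value check.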

### Option 2: Service Principal Authentication (Recommended for Production)

In [None]:
# Alternative: Service Principal authentication from .env file
# Uncomment the lines below if you prefer Service Principal over device authentication
# g = graphistry.configure_sentinel(
#     workspace_id=os.getenv('SENTINEL_WORKSPACE_ID'),
#     tenant_id=os.getenv('AZURE_TENANT_ID'),
#     client_id=os.getenv('AZURE_CLIENT_ID'),
#     client_secret=os.getenv('AZURE_CLIENT_SECRET')
# )

# Alternative: Use DefaultAzureCredential (tries Azure CLI, Managed Identity, etc.)
# g = graphistry.configure_sentinel(
#     workspace_id=os.getenv('SENTINEL_WORKSPACE_ID')
# )
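
The two options can also be selected automatically from whatever is present in the environment. The sketch below only builds the keyword arguments; it uses the parameter names shown in the cells above (`workspace_id`, `tenant_id`, `client_id`, `client_secret`, `use_device_auth`), while the helper name `sentinel_auth_kwargs` and the fallback policy are assumptions made for illustration.

```python
import os

def sentinel_auth_kwargs(env=None):
    """Build keyword arguments for configure_sentinel from environment values."""
    env = os.environ if env is None else env
    kwargs = {"workspace_id": env.get("SENTINEL_WORKSPACE_ID")}
    service_principal = {
        "tenant_id": env.get("AZURE_TENANT_ID"),
        "client_id": env.get("AZURE_CLIENT_ID"),
        "client_secret": env.get("AZURE_CLIENT_SECRET"),
    }
    if all(service_principal.values()):
        # All three service principal values are set: use them
        kwargs.update(service_principal)
    else:
        # Otherwise fall back to interactive device code sign-in
        kwargs["use_device_auth"] = True
    return kwargs

# Usage (not executed here):
# g = graphistry.configure_sentinel(**sentinel_auth_kwargs())
```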

## Test Connection

In [None]:
# Test the connection
# Note: If using device authentication, you'll see a code and URL to visit for authentication
try:
    g.sentinel_health_check()
    print("✅ Successfully connected to Microsoft Sentinel!")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("💡 If using device auth, make sure to complete the authentication in your browser first.")

## Explore Available Data

Let's start by exploring what tables are available in your Sentinel workspace:

In [None]:
# List all available tables
try:
    tables_df = g.sentinel_tables()
    print(f"Found {len(tables_df)} tables in workspace")
    print("\nSecurity-related tables:")
    security_tables = tables_df[tables_df['DataType'].str.contains('Security|Alert|Incident', case=False, na=False)]
    if not security_tables.empty:
        print(security_tables['DataType'].tolist())
    else:
        print("No security-related tables found")
    print(f"\nAll tables: {tables_df['DataType'].tolist()}")
except Exception as e:
    print(f"Failed to list tables: {e}")
    print("This might happen if the workspace has no data or insufficient permissions")

In [None]:
# Get schema for SecurityEvent table (if available)
try:
    if 'SecurityEvent' in tables_df['DataType'].values:
        schema = g.sentinel_schema('SecurityEvent')
        print("SecurityEvent table schema:")
        print(schema[['ColumnName', 'DataType']].head(10))
    else:
        print("SecurityEvent table not found in workspace")
        print("Available tables for schema inspection:", tables_df['DataType'].head(5).tolist())
except Exception as e:
    print(f"Failed to get schema: {e}")
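
The examples in this notebook query `SecurityEvent`, `SigninLogs`, `SecurityAlert`, `CommonSecurityLog`, and `SecurityIncident`. A small check against the table listing shows up front which sections can run in your workspace. This is a sketch; `split_by_availability` is a hypothetical helper, and the sample table names below are made up for the example.

```python
# Tables referenced by the queries in this notebook
NOTEBOOK_TABLES = [
    "SecurityEvent",
    "SigninLogs",
    "SecurityAlert",
    "CommonSecurityLog",
    "SecurityIncident",
]

def split_by_availability(available, wanted=NOTEBOOK_TABLES):
    """Split `wanted` into (present, missing) given the available table names."""
    available = set(available)
    present = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return present, missing

# Example with a made-up listing; in the notebook, pass tables_df['DataType']
present, missing = split_by_availability(["SigninLogs", "Heartbeat", "SecurityAlert"])
print("Present:", present)
print("Missing:", missing)
```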

## Security Analysis Examples

### 1. Failed Login Analysis

In [16]:
# Query failed login attempts (last 7 days)
failed_logins_query = """
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != "0"  // 0 = success
| project TimeGenerated, UserPrincipalName, IPAddress, Location, ResultType, ResultDescription
| summarize 
    FailureCount = count(),
    UniqueIPs = dcount(IPAddress),
    LatestFailure = max(TimeGenerated)
    by UserPrincipalName
| where FailureCount > 5
| order by FailureCount desc
| take 50
"""

try:
    failed_logins = g.kql(failed_logins_query, timespan=timedelta(days=7))
    print(f"Found {len(failed_logins)} users with multiple failed logins")
    print(failed_logins.head())
except Exception as e:
    print(f"Query failed: {e}")
    print("This might happen if SigninLogs table is not available in your workspace")

Found 1 users with multiple failed logins
       UserPrincipalName  FailureCount  UniqueIPs  \
0  sindre@graphistry.com            12          3   

                     LatestFailure  
0 2025-09-22 10:14:57.559331+00:00  
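
The query above hard-codes the window (`ago(7d)`) and the threshold (`FailureCount > 5`), and the same window is repeated in `timespan=timedelta(days=7)`. A small builder keeps the two in sync when you change them. This is a sketch: `failed_logins_query` is a hypothetical helper, and the `int(...)` coercion is a simple guard so that only numbers are interpolated into the KQL text.

```python
from datetime import timedelta

def failed_logins_query(days=7, min_failures=5, limit=50):
    """Return (kql_text, timespan) for the failed-login summary."""
    # int() raises on non-numeric input, so only numbers reach the query text
    days, min_failures, limit = int(days), int(min_failures), int(limit)
    query = f"""
SigninLogs
| where TimeGenerated > ago({days}d)
| where ResultType != "0"  // 0 = success
| summarize
    FailureCount = count(),
    UniqueIPs = dcount(IPAddress),
    LatestFailure = max(TimeGenerated)
    by UserPrincipalName
| where FailureCount > {min_failures}
| order by FailureCount desc
| take {limit}
"""
    return query, timedelta(days=days)

query, span = failed_logins_query(days=14, min_failures=10)
# Usage (not executed here):
# failed_logins = g.kql(query, timespan=span)
```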


### 2. Security Alerts Analysis

In [9]:
# Query recent security alerts
alerts_query = """
SecurityAlert
| where TimeGenerated > ago(24h)
| project 
    TimeGenerated,
    AlertName,
    AlertSeverity,
    CompromisedEntity,
    Tactics,
    Techniques,
    Status
| order by TimeGenerated desc
"""

try:
    alerts = g.kql_last(alerts_query, hours=24)
    print(f"Found {len(alerts)} security alerts in the last 24 hours")
    if len(alerts) > 0:
        print("\nAlert severity distribution:")
        print(alerts['AlertSeverity'].value_counts())
        print("\nSample alerts:")
        print(alerts[['TimeGenerated', 'AlertName', 'AlertSeverity']].head())
    else:
        print("No alerts found (this is good!)")
except Exception as e:
    print(f"Query failed: {e}")
    print("This might happen if SecurityAlert table is not available")

Found 0 security alerts in the last 24 hours
No alerts found (this is good!)
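
`value_counts()` orders severities by frequency, which can bury a single high-severity alert under many informational ones. The sketch below orders the counts by severity instead. It assumes the `AlertSeverity` values are `High`, `Medium`, `Low`, and `Informational`; any other value is listed afterwards. `severity_counts` is a hypothetical helper name.

```python
from collections import Counter

# Assumed severity levels, most severe first
SEVERITY_ORDER = ["High", "Medium", "Low", "Informational"]

def severity_counts(severities):
    """Return (severity, count) pairs ordered from most to least severe."""
    counts = Counter(severities)
    known = [(level, counts[level]) for level in SEVERITY_ORDER if counts[level]]
    # Unexpected values go last, sorted by name (assumes they are strings)
    other = sorted((level, n) for level, n in counts.items() if level not in SEVERITY_ORDER)
    return known + other

print(severity_counts(["Low", "High", "Low", "Medium"]))
```

In the notebook this could be called as `severity_counts(alerts['AlertSeverity'])`.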


### 3. Network Traffic Analysis

In [None]:
# Query network connections (example with CommonSecurityLog)
network_query = """
CommonSecurityLog
| where TimeGenerated > ago(1h)
| where isnotempty(SourceIP) and isnotempty(DestinationIP)
| project 
    TimeGenerated,
    SourceIP,
    DestinationIP,
    DestinationPort,
    Protocol,
    Activity,
    DeviceVendor
| summarize 
    ConnectionCount = count(),
    UniquePorts = dcount(DestinationPort)
    by SourceIP, DestinationIP
| where ConnectionCount > 10
| order by ConnectionCount desc
| take 100
"""

try:
    network_data = g.kql_last(network_query, hours=1)
    print(f"Found {len(network_data)} significant network connections")
    if len(network_data) > 0:
        print(network_data.head())
except Exception as e:
    print(f"Query failed: {e}")
    print("This might happen if CommonSecurityLog table is not available")
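
Once `SourceIP` and `DestinationIP` pairs are in a DataFrame, one common follow-up is to separate internal from external addresses. The sketch below does this with the standard `ipaddress` module. `ip_scope` is a hypothetical helper, and "private" here follows Python's `is_private` definition, which also covers loopback and link-local ranges.

```python
import ipaddress

def ip_scope(value):
    """Classify an address string as 'private', 'public', or 'invalid'."""
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return "invalid"
    return "private" if ip.is_private else "public"

for address in ["10.0.0.5", "8.8.8.8", "not-an-ip"]:
    print(address, ip_scope(address))
```

Applied to the query result, this could look like `network_data['DestinationScope'] = network_data['DestinationIP'].map(ip_scope)`.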

## Graph Visualization

Now let's create some graph visualizations from the security data:

### 1. User-IP Relationship Graph

In [17]:
# Query for user-IP relationships
user_ip_query = """
SigninLogs
| where TimeGenerated > ago(24h)
| where isnotempty(UserPrincipalName) and isnotempty(IPAddress)
| project UserPrincipalName, IPAddress, TimeGenerated, ResultType, Location
| summarize 
    LoginCount = count(),
    FailureCount = countif(ResultType != "0"),
    LatestLogin = max(TimeGenerated),
    Locations = make_set(Location)
    by UserPrincipalName, IPAddress
| extend RiskScore = FailureCount * 2 + iff(LoginCount == 1, 1, 0)
| take 500
"""

try:
    user_ip_data = g.kql_last(user_ip_query, hours=24)
    
    if len(user_ip_data) > 0:
        # Create nodes and edges for graph visualization
        
        # Create user nodes
        users = user_ip_data[['UserPrincipalName']].drop_duplicates()
        users['node_type'] = 'user'
        users['node_id'] = users['UserPrincipalName']
        users['node_label'] = users['UserPrincipalName']
        
        # Create IP nodes  
        ips = user_ip_data[['IPAddress']].drop_duplicates()
        ips['node_type'] = 'ip'
        ips['node_id'] = ips['IPAddress']
        ips['node_label'] = ips['IPAddress']
        
        # Combine nodes
        nodes = pd.concat([
            users[['node_id', 'node_label', 'node_type']],
            ips[['node_id', 'node_label', 'node_type']]
        ], ignore_index=True)
        
        # Create edges
        edges = user_ip_data.copy()
        edges['source'] = edges['UserPrincipalName']
        edges['target'] = edges['IPAddress']
        edges['edge_weight'] = edges['LoginCount']
        edges['edge_color'] = edges['RiskScore'].apply(
            lambda x: 'red' if x > 5 else 'orange' if x > 2 else 'green'
        )
        
        # Create and plot graph
        graph = g.nodes(nodes, node='node_id')\
                 .edges(edges, source='source', destination='target')\
                 .encode_point_color('node_type')\
                 .encode_edge_color('edge_color', categorical_mapping={'red': 'red', 'orange': 'orange', 'green': 'green'}, default_mapping='gray')\
                 .settings(url_params={'splashAfter': 'false'})
        
        print(f"Created graph with {len(nodes)} nodes and {len(edges)} edges")
        
        # Plot the graph
        graph.plot()
    else:
        print("No data available for user-IP graph")
        
except Exception as e:
    print(f"Graph creation failed: {e}")

Created graph with 7 nodes and 4 edges


Failed memoization speedup attempt due to Pandas internal hash function failing. Continuing without memoization speedups.This is fine, but for speedups around skipping re-uploads of previously seen tables, try identifying which columns have types that Pandas cannot hash, and convert them to hashable types like strings.
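
The warning above asks for unhashable column values to be converted to hashable types such as strings. In this query, the likely candidate is `Locations`, which is built with `make_set` and may arrive as a Python list. The following sketch converts container values to JSON strings and leaves everything else untouched. `to_hashable` is a hypothetical helper, and whether it silences the warning in your environment is an assumption that has not been verified here.

```python
import json

def to_hashable(value):
    """Convert list/tuple/set/dict values to a JSON string; pass others through."""
    if isinstance(value, set):
        # Sets have no stable order, so sort before serializing
        value = sorted(value, key=str)
    if isinstance(value, (list, tuple, dict)):
        return json.dumps(value, sort_keys=True, default=str)
    return value

print(to_hashable(["US", "NO"]))
print(to_hashable("already a string"))

# Usage before building the edges (not executed here):
# user_ip_data['Locations'] = user_ip_data['Locations'].map(to_hashable)
```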


### 2. Alert Correlation Graph

In [None]:
# Query for alert correlations
alert_correlation_query = """
SecurityAlert
| where TimeGenerated > ago(7d)
| project 
    AlertName,
    CompromisedEntity,
    Tactics,
    AlertSeverity,
    TimeGenerated
| extend EntityType = case(
    CompromisedEntity contains "@", "User",
    CompromisedEntity matches regex @"\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b", "IP",
    "Host"
)
| summarize 
    AlertCount = count(),
    Severities = make_set(AlertSeverity),
    TacticsList = make_set(Tactics)
    by AlertName, CompromisedEntity, EntityType
| where AlertCount > 1
| take 200
"""

try:
    alert_data = g.kql(alert_correlation_query, timespan=timedelta(days=7))
    
    if len(alert_data) > 0:
        # Create alert type nodes
        alert_types = alert_data[['AlertName']].drop_duplicates()
        alert_types['node_type'] = 'alert'
        alert_types['node_id'] = alert_types['AlertName']
        alert_types['node_label'] = alert_types['AlertName']
        
        # Create entity nodes
        entities = alert_data[['CompromisedEntity', 'EntityType']].drop_duplicates()
        entities['node_type'] = entities['EntityType'].str.lower()
        entities['node_id'] = entities['CompromisedEntity']
        entities['node_label'] = entities['CompromisedEntity']
        
        # Combine nodes
        alert_nodes = pd.concat([
            alert_types[['node_id', 'node_label', 'node_type']],
            entities[['node_id', 'node_label', 'node_type']]
        ], ignore_index=True)
        
        # Create edges (alert -> entity)
        alert_edges = alert_data.copy()
        alert_edges['source'] = alert_edges['AlertName']
        alert_edges['target'] = alert_edges['CompromisedEntity']
        alert_edges['edge_weight'] = alert_edges['AlertCount']
        
        # Create and plot graph
        alert_graph = g.nodes(alert_nodes, node='node_id')\
                       .edges(alert_edges, source='source', destination='target')\
                       .encode_point_color('node_type')\
                       .encode_edge_size('edge_weight')\
                       .settings(url_params={'splashAfter': 'false'})
        
        print(f"Created alert correlation graph with {len(alert_nodes)} nodes and {len(alert_edges)} edges")
        
        # Plot the graph
        alert_graph.plot()
    else:
        print("No alert correlation data available")
        
except Exception as e:
    print(f"Alert correlation graph failed: {e}")
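
The `EntityType` logic in the KQL `case(...)` expression can be mirrored in Python, which is handy for checking how a given `CompromisedEntity` value would be classified without running a query. This is a sketch of the same three rules; `entity_type` is a hypothetical helper. As in the query, the IPv4 pattern only checks the shape of the address and does not validate octet ranges.

```python
import re

# Same pattern as in the KQL query above
IPV4_PATTERN = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")

def entity_type(entity):
    """Classify an entity string as 'User', 'IP', or 'Host'."""
    if "@" in entity:
        return "User"
    if IPV4_PATTERN.search(entity):
        return "IP"
    return "Host"

for sample in ["alice@example.com", "10.1.2.3", "web-server-01"]:
    print(sample, "->", entity_type(sample))
```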

## Advanced Analysis

### Multi-table Correlation

In [None]:
# Complex query joining multiple data sources
correlation_query = """
// Get security incidents
let incidents = SecurityIncident
| where TimeGenerated > ago(30d)
| project IncidentNumber, Title, Severity, Status, Owner;

// Get related alerts  
let alerts = SecurityAlert
| where TimeGenerated > ago(30d)
| project AlertName, CompromisedEntity, AlertSeverity, Tactics;

// Join and analyze
incidents
| join kind=inner (alerts) on $left.Title == $right.AlertName
| summarize 
    IncidentCount = dcount(IncidentNumber),
    AffectedEntities = dcount(CompromisedEntity),
    TacticsUsed = make_set(Tactics)
    by Title, Severity
| order by IncidentCount desc
"""

try:
    correlation_data = g.kql(correlation_query, timespan=timedelta(days=30))
    print(f"Found {len(correlation_data)} incident-alert correlations")
    if len(correlation_data) > 0:
        print(correlation_data.head())
except Exception as e:
    print(f"Correlation query failed: {e}")
    print("This requires both SecurityIncident and SecurityAlert tables")
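
`TacticsUsed` is built with `make_set(Tactics)`, so each entry is whatever string the alert carried. Assuming `Tactics` holds a comma-separated list per alert (check this against your own data), the entries can be flattened into individual tactic names. `flatten_tactics` is a hypothetical helper, and the sample values are made up for the example.

```python
def flatten_tactics(values):
    """Split comma-separated tactic strings into a de-duplicated, ordered list."""
    seen = []
    for value in values:
        if not value:
            continue  # skip None and empty strings
        for tactic in str(value).split(","):
            tactic = tactic.strip()
            if tactic and tactic not in seen:
                seen.append(tactic)
    return seen

print(flatten_tactics(["InitialAccess, Persistence", "Persistence", ""]))
```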

## Summary

This notebook demonstrated:

1. **Connecting to Microsoft Sentinel** using Azure authentication (device code, service principal, or DefaultAzureCredential)
2. **Exploring available data** with `sentinel_tables()` and `sentinel_schema()`
3. **Security analysis** using KQL queries for:
   - Failed login analysis
   - Security alerts monitoring
   - Network traffic analysis
4. **Graph visualization** of:
   - User-IP relationships
   - Alert correlations
5. **Advanced correlation** across multiple data sources

## Next Steps

- **Customize queries** for your specific security use cases and available data tables
- **Create automated dashboards** by scheduling notebook execution
- **Integrate with threat intelligence** feeds using additional KQL joins
- **Build detection rules** based on graph patterns you discover
- **Scale analysis** by adjusting time windows and data volumes

## Troubleshooting Tips

- **No data found**: Some workspaces may not have SecurityEvent, SigninLogs, or SecurityAlert tables; check what exists with `g.sentinel_tables()`
- **Authentication issues**: Try `az login` first, or check your service principal credentials
- **Permission errors**: Ensure your account has the Log Analytics Reader role (or equivalent read access) on the workspace
- **Empty results**: Adjust time ranges - some workspaces have limited data retention
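
For the empty-results case, one approach is to retry the same query over progressively wider windows and stop at the first one that returns rows. The sketch below takes a callable so it can be tried without a workspace. `first_window_with_rows` and `run_query` are hypothetical names. Because the queries in this notebook also hard-code `ago(...)`, a real `run_query` would need to build the KQL text for the same window it passes to `g.kql_last(query, hours=hours)`.

```python
def first_window_with_rows(run_query, hours_options=(1, 24, 168, 720)):
    """Call run_query(hours) for each window and return the first non-empty result."""
    for hours in hours_options:
        rows = run_query(hours)
        if len(rows) > 0:
            return hours, rows
    return None, None

# Stand-in for a real query: pretends data only exists in windows of 7+ days
def fake_query(hours):
    return ["row"] if hours >= 168 else []

print(first_window_with_rows(fake_query))
```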

## Resources

- [Microsoft Sentinel KQL Reference](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/)
- [Graphistry Documentation](https://pygraphistry.readthedocs.io/)
- [Azure Monitor Query Documentation](https://docs.microsoft.com/en-us/python/api/azure-monitor-query/)
- [Sentinel Data Connectors](https://docs.microsoft.com/en-us/azure/sentinel/connect-data-sources)