# 🛡️ Device Security Analysis - Microsoft Sentinel Data Lake

**Hunt for threats across your endpoint infrastructure using Microsoft Defender for Endpoint data.**

## 🎯 Security Scenarios Covered

| Scenario | Detection | Impact |
|----------|-----------|---------|
| **🔓 Credential Dumping** | LSASS access, Mimikatz, memory dumps | Critical |
| **💾 Data Exfiltration** | USB activity, large file transfers | High |
| **🌐 Lateral Movement** | Internal network scanning, admin shares | High |
| **⚡ Living off the Land** | PowerShell, WMI, legitimate tool abuse | Medium |
| **🖥️ Persistence Mechanisms** | Scheduled tasks, services, startup items | Medium |

## ⚙️ Quick Setup
1. Update `PRIMARY_WORKSPACE` (and `ENTRA_WORKSPACE`, if it differs) in the configuration cell below
2. Run all cells
3. Analyze the security findings

---

In [None]:
# 🔧 CONFIGURATION - UPDATE THESE WORKSPACE NAMES!
# =================================================================
# ⚠️  IMPORTANT: Update these workspace names to match YOUR environment
# =================================================================

PRIMARY_WORKSPACE = "ak-SecOps"    # 👈 UPDATE THIS to your primary Sentinel workspace name
ENTRA_WORKSPACE = "default"       # 👈 UPDATE THIS to your Entra workspace name (often "default")

# Workspace mapping for automatic fallback
workspace_mapping = {
    "DeviceEvents": PRIMARY_WORKSPACE,        # Microsoft Defender device events
    "DeviceProcessEvents": PRIMARY_WORKSPACE, # Process execution events
    "DeviceNetworkEvents": PRIMARY_WORKSPACE, # Network connection events
    "DeviceFileEvents": PRIMARY_WORKSPACE,    # File system events
    "DeviceRegistryEvents": PRIMARY_WORKSPACE, # Registry modification events
    "DeviceLogonEvents": PRIMARY_WORKSPACE,   # Device logon events
    "DeviceImageLoadEvents": PRIMARY_WORKSPACE, # DLL/image load events
    "DeviceInfo": PRIMARY_WORKSPACE,          # Device information
    "SecurityEvent": PRIMARY_WORKSPACE,       # Windows security events
    "Syslog": PRIMARY_WORKSPACE,             # Linux/Unix system logs
}

print("✅ Configuration loaded successfully!")
print(f"Primary Workspace: {PRIMARY_WORKSPACE}")
print(f"Entra Workspace: {ENTRA_WORKSPACE}")
print("\n🔍 Workspace Mapping:")
for table, workspace in workspace_mapping.items():
    print(f"  📋 {table} → {workspace}")

print("\n⚠️  Remember: Update the workspace names above to match YOUR environment!")
print("📚 Workspace names are set only in this cell; later cells read them from workspace_mapping.")

In [None]:
# 📊 DATA LOADER
# =================================================================
# Simple data loading with fallback handling
# =================================================================

from sentinel_lake.providers import MicrosoftSentinelProvider
# Import only the functions this notebook uses, so builtins such as sum and max are not shadowed
from pyspark.sql.functions import (
    col, count, countDistinct, current_timestamp, desc, expr, lower, when
)
import warnings
warnings.filterwarnings('ignore')

# Initialize data provider
data_provider = MicrosoftSentinelProvider(spark)

def load_security_data():
    """Load endpoint security data with smart fallbacks"""
    data = {}
    
    # Core endpoint tables for security analysis
    tables = {
        "DeviceProcessEvents": "Process execution events",
        "DeviceEvents": "Device security events", 
        "DeviceNetworkEvents": "Network connections",
        "DeviceFileEvents": "File system activity"
    }
    
    print("🔄 Loading endpoint security data...\n")
    
    for table, description in tables.items():
        try:
            workspace = workspace_mapping.get(table, PRIMARY_WORKSPACE)
            df = data_provider.read_table(table, workspace)
            # Load last 24 hours for performance
            df = df.filter(col("Timestamp") >= (current_timestamp() - expr("INTERVAL 24 HOURS")))
            data[table] = df
            print(f"✅ {table}: {description}")
        except Exception as exc:
            # Catch Exception instead of using a bare except, so a keyboard interrupt still stops the cell
            print(f"⚠️ {table}: Not available ({type(exc).__name__})")
            data[table] = None
    
    print(f"\n🚀 Loaded {len([v for v in data.values() if v is not None])}/{len(tables)} tables")
    return data

# Load the data
security_data = load_security_data()

## 1. Device Data Loading and Analysis

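The loader above reads four Defender for Endpoint tables (`DeviceProcessEvents`, `DeviceEvents`, `DeviceNetworkEvents`, `DeviceFileEvents`), keeps the last 24 hours of each, and stores `None` for any table that cannot be read. The scenarios below query these DataFrames.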
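
The per-table fallback used by the loader can be tried without a Spark session. The following is a minimal sketch, not the notebook's loader: `load_tables` and `fake_reader` are hypothetical names, and `fake_reader` merely stands in for a real `read_table(table, workspace)` call.

```python
from typing import Callable, Dict, List, Optional


def load_tables(read_table: Callable[[str, str], object],
                mapping: Dict[str, str],
                default_workspace: str,
                tables: List[str]) -> Dict[str, Optional[object]]:
    """Read each table from its mapped workspace; record None on failure."""
    loaded: Dict[str, Optional[object]] = {}
    for table in tables:
        # Fall back to the default workspace when the table has no mapping
        workspace = mapping.get(table, default_workspace)
        try:
            loaded[table] = read_table(table, workspace)
        except Exception as exc:
            # One missing table should not stop the remaining loads
            print(f"{table}: not available in {workspace} ({exc})")
            loaded[table] = None
    return loaded


def fake_reader(table: str, workspace: str) -> str:
    """Hypothetical reader: pretends DeviceFileEvents does not exist."""
    if table == "DeviceFileEvents":
        raise LookupError("table not found")
    return f"{workspace}.{table}"


result = load_tables(
    fake_reader,
    {"DeviceEvents": "ws-a"},
    "ws-default",
    ["DeviceEvents", "DeviceProcessEvents", "DeviceFileEvents"],
)
print(result)
```

Downstream cells can then test each entry against `None` before querying it, which is the pattern every scenario below follows.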

## üîì Scenario 1: Credential Dumping Detection

**Hunt for tools and techniques used to extract credentials from memory (LSASS, SAM, etc.)**

**Common Techniques:**
- Mimikatz, ProcDump, Comsvcs.dll 
- PowerShell credential extraction
- Direct LSASS process access
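
Before running the hunt on real data, it can help to check the patterns against a few known command lines. This sketch reuses the regular expressions from the detection cell with Python's `re` module, assuming it behaves like Spark's `rlike` (a substring search) for these simple patterns. The sample command lines and the helper name `looks_like_credential_dumping` are illustrative, not taken from real telemetry.

```python
import re

# Same patterns as the detection cell, applied to lowercased input
TOOL_NAMES = re.compile(r"mimikatz|procdump|lsassy|nanodump|sekurlsa|dumpert")
COMMAND_LINE = re.compile(
    r"lsass.*dump|comsvcs.*dll.*minidump|invoke-mimikatz|sekurlsa|logonpasswords"
)


def looks_like_credential_dumping(file_name: str, command_line: str) -> bool:
    """True when either the file name or the command line matches a pattern."""
    return bool(
        TOOL_NAMES.search(file_name.lower())
        or COMMAND_LINE.search(command_line.lower())
    )


# Illustrative samples: two suspicious, one benign
samples = [
    ("rundll32.exe",
     r"rundll32.exe C:\Windows\System32\comsvcs.dll, MiniDump 624 C:\temp\out.dmp full"),
    ("procdump64.exe", "procdump64.exe -ma lsass.exe lsass.dmp"),
    ("notepad.exe", "notepad.exe report.txt"),
]

flags = [looks_like_credential_dumping(name, cmd) for name, cmd in samples]
print(flags)
```

Substring patterns like these are broad by design, so expect matches from legitimate administrative use of ProcDump and review each hit in context.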

In [None]:
# 🔓 CREDENTIAL DUMPING DETECTION
# =================================================================

process_events = security_data.get("DeviceProcessEvents")

if process_events is not None:
    print("🔍 Hunting for credential dumping activities...")
    
    # Known credential extraction patterns
    credential_dumping = process_events.filter(
        # Known tools
        lower(col("FileName")).rlike("mimikatz|procdump|lsassy|nanodump|sekurlsa|dumpert") |
        # LSASS targeting
        lower(col("ProcessCommandLine")).rlike("lsass.*dump|comsvcs.*dll.*minidump") |
        # PowerShell techniques
        lower(col("ProcessCommandLine")).rlike("invoke-mimikatz|sekurlsa|logonpasswords")
    )
    
    cred_count = credential_dumping.count()
    
    if cred_count > 0:
        print(f"🚨 {cred_count} POTENTIAL CREDENTIAL DUMPING ATTEMPTS")
        
        # Show the attacks
        attacks = credential_dumping.groupBy(
            "DeviceName", "AccountName", "FileName", "ProcessCommandLine"
        ).agg(count("*").alias("Attempts")).orderBy(desc("Attempts"))
        
        attacks.show(20, truncate=False)
        
        print("\n⚡ IMMEDIATE ACTIONS:")
        print("• Isolate affected devices")
        print("• Reset credentials for impacted accounts")
        print("• Hunt for lateral movement")
        
    else:
        print("✅ No credential dumping detected")
        
else:
    print("⚠️ DeviceProcessEvents not available - credential dumping detection requires Defender for Endpoint")

## 💾 Scenario 2: Data Exfiltration via USB/External Storage

**Detect potential data theft through removable media and large file transfers**

**Detection Focus:**
- USB device connections during off-hours
- Access to sensitive file types (documents, databases)
- Large file transfers to external storage
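
The off-hours criterion in the list above can be expressed as a small predicate. This is a sketch under stated assumptions: business hours of 08:00 to 18:00 on weekdays are placeholders, timestamps are assumed to be already converted to local time, and the helper names are hypothetical. The drive-letter pattern is the same heuristic used in the detection cell, so it can also match fixed or network drives.

```python
import re
from datetime import datetime

# Drive letters D: through Z:, a heuristic for non-system drives
REMOVABLE_PATH = re.compile(r"^[D-Z]:\\")


def is_off_hours(ts: datetime, start_hour: int = 8, end_hour: int = 18) -> bool:
    """True on weekends, or outside [start_hour, end_hour) on weekdays."""
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    return is_weekend or not (start_hour <= ts.hour < end_hour)


def on_candidate_removable_drive(folder_path: str) -> bool:
    """True when the path starts with a drive letter from D: to Z:."""
    return bool(REMOVABLE_PATH.match(folder_path))


print(is_off_hours(datetime(2024, 1, 3, 10, 0)))   # Wednesday morning
print(is_off_hours(datetime(2024, 1, 3, 23, 0)))   # Wednesday late evening
print(is_off_hours(datetime(2024, 1, 6, 12, 0)))   # Saturday noon
print(on_candidate_removable_drive(r"E:\Backups"))
print(on_candidate_removable_drive(r"C:\Users\Public"))
```

In Spark, the same idea could be applied with `hour()` and `dayofweek()` on the `Timestamp` column, after adjusting for the time zone your users work in.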

In [None]:
# 💾 DATA EXFILTRATION DETECTION
# =================================================================

device_events = security_data.get("DeviceEvents")
file_events = security_data.get("DeviceFileEvents")

print("🔍 Hunting for data exfiltration...")

if device_events is not None:
    # First, list the columns relevant to this detection
    print("📋 Available DeviceEvents columns:")
    print([c for c in device_events.columns if 'device' in c.lower() or 'action' in c.lower() or 'file' in c.lower()])
    
    # USB and external storage events. Confirm these ActionType names against the
    # distinct ActionType values in your own data; the "...Mounted"/"...Unmounted"
    # spellings are included because they appear in Defender hunting queries.
    usb_activity = device_events.filter(
        col("ActionType").isin([
            "PnpDeviceConnected", 
            "PnpDeviceDisconnected",
            "UsbDriveMount", 
            "UsbDriveUnmount",
            "UsbDriveMounted",
            "UsbDriveUnmounted",
            "RemovableStorageDeviceEvent"
        ]) |
        # Also check for file operations on drive letters D: through Z:
        # (a heuristic: these can be fixed or network drives, not only removable media)
        (col("FolderPath").isNotNull() & col("FolderPath").rlike("^[D-Z]:\\\\"))
    )
    
    usb_count = usb_activity.count()
    
    if usb_count > 0:
        print(f"📱 {usb_count} USB/external storage events detected")
        
        # Show USB activity by user and device
        usb_summary = usb_activity.groupBy("DeviceName", "AccountName", "ActionType") \
            .agg(count("*").alias("USBEvents")) \
            .orderBy(desc("USBEvents"))
        
        usb_summary.show(10, truncate=False)
        
        # Show recent USB events
        print("\n📅 Recent USB activity:")
        usb_activity.select("Timestamp", "DeviceName", "AccountName", "ActionType", "FileName") \
                   .orderBy(desc("Timestamp")) \
                   .show(5, truncate=False)
        
    else:
        print("✅ No USB/external storage activity detected")
        print("💡 Tip: This detection looks for ActionType values like PnpDeviceConnected, UsbDriveMount")
        
    # File-based exfiltration detection
    if file_events is not None:
        print("\n📁 Analyzing sensitive file access...")
        
        sensitive_files = file_events.filter(
            # Sensitive file extensions
            lower(col("FileName")).rlike(r"\.(docx?|xlsx?|pdf|csv|zip|7z|rar|db|bak|pst)$") |
            # Sensitive folder paths
            lower(col("FolderPath")).rlike("documents|finance|hr|confidential|sensitive|secret")
        )
        
        sensitive_count = sensitive_files.count()
        if sensitive_count > 0:
            print(f"📊 {sensitive_count} sensitive file access events")
            
            file_summary = sensitive_files.groupBy("DeviceName", "ActionType") \
                .agg(count("*").alias("FileEvents"),
                     countDistinct("FileName").alias("UniqueFiles")) \
                .orderBy(desc("FileEvents"))
            
            file_summary.show(10, truncate=False)
        else:
            print("✅ No sensitive file access detected")
    else:
        print("⚠️ DeviceFileEvents not available for file access correlation")
        
else:
    print("⚠️ DeviceEvents not available - USB detection requires device monitoring")

## 🌐 Scenario 3: Lateral Movement Detection

**Identify attackers moving through your network using compromised credentials**

**Detection Patterns:**
- Internal network scanning (ports 445, 3389, 135)
- Remote admin tool usage (PSExec, WMI, PowerShell Remoting)  
- Multiple failed authentication attempts
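
The scanning heuristic (one device reaching many distinct internal hosts on admin ports) can be prototyped in plain Python before tuning thresholds in Spark. This sketch uses the standard `ipaddress` module for the RFC 1918 ranges instead of a regular expression; the function names, the sample connections, and the default threshold of six targets (matching "more than 5") are illustrative assumptions.

```python
import ipaddress
from collections import defaultdict

ADMIN_PORTS = {135, 139, 445, 3389, 5985, 5986}
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(ip: str) -> bool:
    """True when ip parses and falls inside an RFC 1918 private range."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in RFC1918)


def scanning_devices(connections, min_targets: int = 6):
    """Map device -> number of distinct internal targets on admin ports."""
    targets = defaultdict(set)
    for device, remote_ip, remote_port in connections:
        if remote_port in ADMIN_PORTS and is_internal(remote_ip):
            targets[device].add(remote_ip)
    return {d: len(ips) for d, ips in targets.items() if len(ips) >= min_targets}


# Illustrative data: ws-01 touches seven hosts over SMB, ws-02 only one
sample = [("ws-01", f"10.0.0.{i}", 445) for i in range(1, 8)]
sample += [("ws-02", "10.0.0.1", 3389), ("ws-02", "8.8.8.8", 445)]

flagged = scanning_devices(sample)
print(flagged)
```

Servers such as domain controllers, patch management hosts, and vulnerability scanners legitimately contact many peers, so an allowlist is usually needed alongside the threshold.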

In [None]:
# 🌐 LATERAL MOVEMENT DETECTION
# =================================================================

network_events = security_data.get("DeviceNetworkEvents")

if network_events is not None:
    print("🔍 Hunting for lateral movement...")
    
    # Internal network connections to admin ports
    internal_ip_regex = r"^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[0-1])\.).*$"
    
    lateral_movement = network_events.filter(
        col("RemoteIP").rlike(internal_ip_regex) &
        col("RemotePort").isin([135, 139, 445, 3389, 5985, 5986])  # Admin ports
    )
    
    lateral_count = lateral_movement.count()
    
    if lateral_count > 0:
        print(f"🔄 {lateral_count} internal admin connections detected")
        
        # Devices making multiple internal connections (scanning behavior)
        scanning_devices = lateral_movement.groupBy("DeviceName") \
            .agg(
                countDistinct("RemoteIP").alias("UniqueTargets"),
                count("*").alias("TotalConnections")
            ) \
            .filter(col("UniqueTargets") > 5) \
            .orderBy(desc("UniqueTargets"))
        
        scan_count = scanning_devices.count()
        
        if scan_count > 0:
            print(f"\n🚨 {scan_count} devices showing scanning behavior:")
            scanning_devices.show(10, truncate=False)
            
            print("\n⚡ RECOMMENDED ACTIONS:")
            print("• Investigate high-activity devices")
            print("• Check for credential compromise")
            print("• Review privileged account usage")
        else:
            print("✅ No suspicious scanning behavior detected")
            
        # Show top connections by port
        port_analysis = lateral_movement.groupBy("RemotePort") \
            .agg(count("*").alias("Connections")) \
            .orderBy(desc("Connections"))
        
        print("\nConnections by port:")
        port_analysis.show()
        
    else:
        print("✅ No internal admin connections detected")
        
else:
    print("⚠️ DeviceNetworkEvents not available - lateral movement detection requires network monitoring")

## ⚡ Scenario 4: Living off the Land Attacks

**Detect abuse of legitimate system tools for malicious purposes**

**Common Techniques:**
- PowerShell with obfuscation/encoding
- WMI for remote execution
- BITSAdmin for file downloads
- Certutil for encoding/decoding
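
When a hit involves an encoded PowerShell command, decoding the payload is usually the first triage step. PowerShell's `-EncodedCommand` value is Base64 over UTF-16LE text, so it can be reversed with the standard library. The sketch below is an illustration only: `decode_encoded_command` is a hypothetical helper, it treats any prefix of `-EncodedCommand` (such as `-e` or `-enc`) as the flag, and it does not handle other aliases or quoted arguments.

```python
import base64
from typing import Optional


def decode_encoded_command(command_line: str) -> Optional[str]:
    """Return the decoded script if an -EncodedCommand style flag is present."""
    tokens = command_line.split()
    # Walk (flag, value) pairs of adjacent tokens
    for flag, value in zip(tokens, tokens[1:]):
        if flag[0] not in "-/":
            continue
        name = flag.lstrip("-/").lower()
        if name and "encodedcommand".startswith(name):
            try:
                raw = base64.b64decode(value, validate=True)
                return raw.decode("utf-16-le")
            except (ValueError, UnicodeDecodeError):
                # Not valid Base64 or not UTF-16LE text
                return None
    return None


# Build an illustrative command line from a harmless script
payload = "Write-Output 'hello'"
encoded = base64.b64encode(payload.encode("utf-16-le")).decode("ascii")
command = f"powershell.exe -NoProfile -EncodedCommand {encoded}"

print(decode_encoded_command(command))
print(decode_encoded_command("powershell.exe -ExecutionPolicy Bypass -File a.ps1"))
```

The decoded text can then be checked against the same suspicious keywords the detection cell looks for, such as `downloadstring` or `iex`.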

In [None]:
# ⚡ LIVING OFF THE LAND DETECTION
# =================================================================

process_events = security_data.get("DeviceProcessEvents")

if process_events is not None:
    print("🔍 Hunting for living off the land techniques...")
    
    # Suspicious use of legitimate tools
    lolbins = process_events.filter(
        # PowerShell with suspicious patterns
        (
            lower(col("FileName")).rlike("powershell|pwsh") &
            lower(col("ProcessCommandLine")).rlike(
                "encodedcommand|bypass|hidden|downloadstring|iex|" +
                r"invoke-expression|frombase64|reflection\.assembly"  # raw string keeps the regex escape \. valid
            )
        ) |
        # WMI for remote execution
        lower(col("ProcessCommandLine")).rlike("wmic.*process.*call.*create") |
        # BITSAdmin for downloads
        lower(col("ProcessCommandLine")).rlike("bitsadmin.*transfer") |
        # Certutil abuse
        lower(col("ProcessCommandLine")).rlike("certutil.*-decode|certutil.*-urlcache") |
        # Rundll32 abuse
        lower(col("ProcessCommandLine")).rlike("rundll32.*javascript|rundll32.*vbscript")
    )
    
    lolbin_count = lolbins.count()
    
    if lolbin_count > 0:
        print(f"⚠️ {lolbin_count} suspicious legitimate tool usage detected")
        
        # Group by technique
        techniques = lolbins.groupBy("FileName") \
            .agg(
                count("*").alias("Count"),
                countDistinct("DeviceName").alias("UniqueDevices"),
                countDistinct("AccountName").alias("UniqueUsers")
            ) \
            .orderBy(desc("Count"))
        
        print("\nSuspicious tool usage:")
        techniques.show(truncate=False)
        
        # Show recent examples
        print("\nRecent examples:")
        lolbins.select("Timestamp", "DeviceName", "FileName", "ProcessCommandLine") \
               .orderBy(desc("Timestamp")) \
               .show(5, truncate=False)
        
    else:
        print("✅ No suspicious legitimate tool abuse detected")
        
else:
    print("⚠️ DeviceProcessEvents not available")

## 🔒 Scenario 5: Persistence Mechanisms

**Hunt for ways attackers maintain access to compromised systems**

**Common Persistence Methods:**
- Scheduled tasks and services
- Registry run keys
- WMI event subscriptions
- File system modifications
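
The mapping from a command line to a persistence category can be kept as an ordered rule list, which makes it easy to add organization-specific rules. The sketch below mirrors the patterns used in the detection cell with Python's `re` module; the rule list name, the helper name, and the sample commands are illustrative assumptions. Rules are checked in order, so the first match wins.

```python
import re
from typing import Optional

# (label, pattern) pairs, evaluated top to bottom on lowercased input
PERSISTENCE_RULES = [
    ("Scheduled Task", re.compile(r"schtasks.*create.*tn")),
    ("Service/User Creation", re.compile(r"sc.*create.*binpath|net.*user.*add")),
    ("Registry Run Key", re.compile(r"reg.*add.*run")),
    ("WMI Event", re.compile(r"wmic.*eventfilter|register-wmievent")),
]


def classify_persistence(command_line: str) -> Optional[str]:
    """Return the first matching persistence label, or None."""
    lowered = command_line.lower()
    for label, pattern in PERSISTENCE_RULES:
        if pattern.search(lowered):
            return label
    return None


# Illustrative command lines
samples = [
    r'schtasks /create /tn "Updater" /tr C:\Users\Public\u.exe /sc onlogon',
    r"sc create EvilSvc binPath= C:\temp\svc.exe",
    r"reg add HKCU\Software\Microsoft\Windows\CurrentVersion\Run /v Upd /d C:\temp\u.exe",
    "ping 10.0.0.1",
]

labels = [classify_persistence(cmd) for cmd in samples]
print(labels)
```

Because patterns such as `sc.*create` are loose, tightening them with word boundaries or matching on `FileName` as well would reduce false positives.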

In [None]:
# 🔒 PERSISTENCE MECHANISMS DETECTION
# =================================================================

process_events = security_data.get("DeviceProcessEvents")

if process_events is not None:
    print("🔍 Hunting for persistence mechanisms...")
    
    # Persistence-related activities
    persistence = process_events.filter(
        # Scheduled tasks
        lower(col("ProcessCommandLine")).rlike("schtasks.*create.*tn") |
        # Service creation/modification
        lower(col("ProcessCommandLine")).rlike("sc.*create.*binpath|net.*user.*add") |
        # Registry persistence
        lower(col("ProcessCommandLine")).rlike("reg.*add.*run|reg.*add.*runonce") |
        # WMI event subscriptions
        lower(col("ProcessCommandLine")).rlike("wmic.*eventfilter|register-wmievent") |
        # Startup folder modifications
        (
            lower(col("FileName")).rlike("explorer|cmd|powershell") &
            lower(col("ProcessCommandLine")).rlike("startup|appdata.*roaming.*microsoft.*windows.*start")
        )
    )
    
    persistence_count = persistence.count()
    
    if persistence_count > 0:
        print(f"⚠️ {persistence_count} potential persistence mechanisms detected")
        
        # Group by persistence type
        persistence_types = persistence.withColumn(
            "PersistenceType",
            when(lower(col("ProcessCommandLine")).rlike("schtasks"), "Scheduled Task")
            .when(lower(col("ProcessCommandLine")).rlike("sc.*create|net.*user"), "Service/User Creation")
            .when(lower(col("ProcessCommandLine")).rlike("reg.*add.*run"), "Registry Run Key")
            # register-wmievent is matched by the filter above, so label it here too
            .when(lower(col("ProcessCommandLine")).rlike("wmic.*eventfilter|register-wmievent"), "WMI Event")
            .otherwise("Other")
        )
        
        type_summary = persistence_types.groupBy("PersistenceType") \
            .agg(count("*").alias("Count")) \
            .orderBy(desc("Count"))
        
        print("\nPersistence mechanisms by type:")
        type_summary.show(truncate=False)
        
        # Show recent persistence attempts
        print("\nRecent persistence attempts:")
        persistence.select("Timestamp", "DeviceName", "AccountName", "ProcessCommandLine") \
                  .orderBy(desc("Timestamp")) \
                  .show(5, truncate=False)
        
    else:
        print("✅ No suspicious persistence mechanisms detected")
        
else:
    print("⚠️ DeviceProcessEvents not available")

## 📋 Security Assessment Summary

**Review the findings above and take action based on detected threats:**

### 🚨 High Priority Actions
- **Credential Dumping**: Immediately isolate devices and reset credentials
- **Data Exfiltration**: Review file access logs and USB policy compliance
- **Lateral Movement**: Hunt for additional compromised accounts

### 📊 Investigation Recommendations
- Correlate findings across multiple scenarios
- Review user behavior analytics for affected accounts
- Check for indicators of compromise (IoCs) in SIEM
- Validate detections with endpoint response tools
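
One simple way to correlate across scenarios is to count how many scenarios flagged each device. The sketch below assumes you have collected the flagged device names per scenario into Python sets (for example from the `DeviceName` column of each result); the `correlate` helper and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def correlate(findings: Dict[str, Set[str]],
              min_scenarios: int = 2) -> List[Tuple[str, int]]:
    """Return (device, scenario_count) for devices seen in several scenarios."""
    hits = Counter()
    for devices in findings.values():
        hits.update(set(devices))  # each scenario counts a device once
    ranked = [(d, n) for d, n in hits.items() if n >= min_scenarios]
    # Most scenarios first, then device name for a stable order
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Illustrative findings
findings = {
    "Credential Dumping": {"ws-01"},
    "Lateral Movement": {"ws-01", "ws-02"},
    "Persistence Mechanisms": {"ws-01", "ws-03"},
}

priority = correlate(findings)
print(priority)
```

A device that appears in several scenarios within the same window is generally a stronger lead than a single isolated match.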

### 🔧 Detection Tuning
- Adjust time ranges for different scenarios
- Customize detection rules for your environment
- Add organization-specific IoCs and patterns
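
Per-scenario time ranges can be kept in one place and turned into the interval string that the loader's `expr("INTERVAL 24 HOURS")` filter expects. The values below are placeholders rather than recommendations, and `LOOKBACK_HOURS` and `lookback_interval` are hypothetical names.

```python
# Placeholder lookback windows, in hours, per scenario
LOOKBACK_HOURS = {
    "Credential Dumping": 72,
    "Data Exfiltration": 48,
    "Lateral Movement": 24,
    "Living off the Land": 24,
    "Persistence Mechanisms": 168,
}


def lookback_interval(scenario: str, default_hours: int = 24) -> str:
    """Build a Spark SQL interval literal for the given scenario."""
    hours = LOOKBACK_HOURS.get(scenario, default_hours)
    if hours <= 0:
        raise ValueError("lookback must be a positive number of hours")
    return f"INTERVAL {hours} HOURS"


print(lookback_interval("Persistence Mechanisms"))
print(lookback_interval("Unknown Scenario"))
```

The returned string could be passed to `expr()` in place of the fixed 24-hour literal, for example `current_timestamp() - expr(lookback_interval("Lateral Movement"))`.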

In [None]:
# 📊 THREAT INTELLIGENCE ENRICHMENT
# =================================================================
# Summarize the event counts from the scenario cells above
# (no external threat-intelligence lookup is performed here)
# =================================================================

print("🔍 THREAT INTELLIGENCE SUMMARY")
print("=" * 40)

# Counts set by the scenario cells; None means the cell has not run
# or its source table was unavailable
scenario_counts = {
    "Credential Dumping": globals().get("cred_count"),
    "Data Exfiltration": globals().get("usb_count"),
    "Lateral Movement": globals().get("lateral_count"),
    "Living off the Land": globals().get("lolbin_count"),
    "Persistence Mechanisms": globals().get("persistence_count"),
}

# Count findings across all scenarios
total_findings = 0
critical_findings = scenario_counts["Credential Dumping"] or 0  # rated Critical in the scenario table

print("DETECTION SUMMARY:")

for scenario, events in scenario_counts.items():
    if events is None:
        print(f"   {scenario}: Run analysis cells above")
    else:
        total_findings += events
        print(f"   {scenario}: {events} events")

print(f"   Total: {total_findings} events ({critical_findings} critical)")

print("\n🎯 NEXT STEPS:")
print("1. 📋 Review all detected activities above")
print("2. 🔍 Investigate high-confidence detections first")
print("3. 📊 Correlate findings with other security tools")
print("4. Escalate critical findings to incident response")
print("5. Update detection rules based on findings")

print("\nTIP: Rerun this notebook periodically to track new threats")
print("⚡ Consider automating high-confidence detections as alerts")