# Hello, Common Security Logs

This simple notebook ensures we can access the CommonSecurityLog (CSL) table in our workspace.

## 1. Setup & Config

First, we'll make sure everything is set up.

In [None]:
# Import required libraries
from sentinel_lake.providers import MicrosoftSentinelProvider
from pyspark.sql.functions import (
    col, count, countDistinct, when, lit, expr,
    current_timestamp, avg, coalesce
)

# Configuration - Update this with your workspace name
WORKSPACE_NAME = "<YOUR_WORKSPACE_NAME>"

# Analysis window
ANALYSIS_DAYS = 2

# Initialize Sentinel provider
sentinel_provider = MicrosoftSentinelProvider(spark)

print("="*60)
print("CONFIGURATION")
print("="*60)
print(f"Analysis Window: {ANALYSIS_DAYS} days")
print(f"Workspace: {WORKSPACE_NAME}")
print("="*60)

## 2. Load Signin Logs

Next, we load sign-in logs, just to ensure that *some* table will load.

In [None]:
print("📊 Loading SigninLogs...")

signin_df = (
    sentinel_provider.read_table('SigninLogs', WORKSPACE_NAME)
    .persist()
)

signin_count = signin_df.count()
print(f"✅ Loaded {signin_count} sign-in events")

print("\n📋 Sample of sign-in data (first 5 rows):")
signin_df.show(5, truncate=False)

## 3. Load CommonSecurityLog (CSL) data

Finally, we load and display the CSL data.

**IMPORTANT:** Scope the CSL table down to only the columns you need. Reading the `DeviceCustomFloatingPoint1` column fails on Sentinel data lake with a Parquet type mismatch; section 4 shows the error.

In [None]:
print("📊 Loading CSL (limited columns)...")

csl_df = (
    sentinel_provider.read_table('CommonSecurityLog', WORKSPACE_NAME)
    .filter(
        (col("TimeGenerated") >= expr(f"current_timestamp() - INTERVAL {ANALYSIS_DAYS} DAYS"))
    )
    .select(
        'TimeGenerated',
        'Activity',
        'DeviceVendor',
        'DeviceProduct',
        'DeviceEventClassId',
        'SourceIP',
    )
    .persist()
)

csl_count = csl_df.count()
print(f"✅ Loaded {csl_count} CSL events")

print("\n📋 Sample of CSL data (first 5 rows):")
csl_df.show(5, truncate=False)
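If more columns are needed than the handful selected above, one alternative is to start from the full column list and drop only the column known to fail. The sketch below is a plain-Python helper for that. `columns_without` and `PROBLEM_COLUMNS` are hypothetical names, not part of `sentinel_lake`. It assumes `DeviceCustomFloatingPoint1` is the only affected column, which this notebook does not verify.

```python
# Hypothetical helper; these names are illustrative and not part of sentinel_lake.
# Assumption: DeviceCustomFloatingPoint1 is the only column that fails to read.
PROBLEM_COLUMNS = ("DeviceCustomFloatingPoint1",)


def columns_without(all_columns, excluded=PROBLEM_COLUMNS):
    """Return all_columns in their original order, minus any excluded names."""
    excluded_set = set(excluded)
    return [name for name in all_columns if name not in excluded_set]


# Small stand-in for a real column list such as a DataFrame's `columns` attribute.
example_columns = ["TimeGenerated", "DeviceCustomFloatingPoint1", "SourceIP"]
print(columns_without(example_columns))
```

In the notebook, this could be applied as `df.select(*columns_without(df.columns))` on the DataFrame returned by `read_table`, before `.persist()`. Reading `df.columns` only inspects the schema, so it should not trigger the failing Parquet read.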

## 4. Fail to load **all** CSL columns

Expected result: reading the table without a column selection fails with an error like the following:

```
Py4JJavaError: An error occurred while calling o5193.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 2662) (vm-c9979623 executor 6): org.apache.spark.SparkException: Parquet column cannot be converted in file abfss://[REDACTED]@lakewus2vqlngfe.dfs.core.windows.net/dir1/dir2/20251227/14/[REDACTED]-zstd.parquet. Column: [DeviceCustomFloatingPoint1], Expected: float, Found: DOUBLE.
```

In [None]:
print("📊 Loading CSL...")

csl_df = (
    sentinel_provider.read_table('CommonSecurityLog', WORKSPACE_NAME)
    .persist()
)

csl_count = csl_df.count()
print(f"✅ Loaded {csl_count} CSL events")

print("\n📋 Sample of CSL data (first 5 rows):")
csl_df.show(5, truncate=False)
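The expected error quoted in this section names the offending column and the two conflicting types. As a rough aid for triage, the sketch below pulls those three fields out of the message text with a regular expression. `parse_parquet_mismatch` is a hypothetical helper, and the pattern assumes the message keeps the `Column: [...], Expected: ..., Found: ...` wording shown above, which may differ across Spark versions.

```python
import re

# Assumption: the message ends with the same wording as the error quoted above:
# "Column: [<name>], Expected: <type>, Found: <type>"
MISMATCH_RE = re.compile(
    r"Column: \[(?P<column>[^\]]+)\], "
    r"Expected: (?P<expected>\w+), Found: (?P<found>\w+)"
)


def parse_parquet_mismatch(message):
    """Return a dict with 'column', 'expected' and 'found', or None if no match."""
    match = MISMATCH_RE.search(message)
    return match.groupdict() if match else None


# Shortened copy of the error text from this section.
sample = (
    "Parquet column cannot be converted in file [REDACTED]. "
    "Column: [DeviceCustomFloatingPoint1], Expected: float, Found: DOUBLE."
)
print(parse_parquet_mismatch(sample))
```

One way to use this would be to wrap the failing `csl_df.show(...)` call in a `try`/`except Exception as exc:` block and pass `str(exc)` to the helper. The exact exception type raised depends on the environment.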