# 🤖 AI-Assisted Notebook Development

This template is designed to help you create Microsoft Sentinel notebooks using an AI assistant.

## 📚 Reference Documentation

Share these links with your AI assistant for context:

* [Notebook examples](https://learn.microsoft.com/en-us/azure/sentinel/datalake/notebook-examples)
* [Microsoft Sentinel Provider class](https://learn.microsoft.com/en-us/azure/sentinel/datalake/sentinel-provider-class-reference)
* [Available system tables](https://learn.microsoft.com/en-us/azure/sentinel/datalake/enable-data-connectors)
* [Available workspace tables](https://learn.microsoft.com/en-us/azure/azure-monitor/reference/tables-index)

---

## 🎯 Your Task Description

**Instructions:** In the space below, describe in detail what you want this notebook to accomplish. Be specific about:

- **Goal**: What security analysis or operation do you want to perform?
- **Data Sources**: Which Sentinel tables do you need to query?
- **Analysis**: What metrics, calculations, or transformations are needed?
- **Output**: What results do you want to see? (Visualizations, tables, alerts, etc.)
- **Use Case**: Who will use this notebook and how?

**Example:**
> "I want to analyze failed sign-in attempts from the SigninLogs table over the last 30 days. Calculate the number of failed attempts per user, identify patterns by time of day and location, and flag users with more than 10 failed attempts from different IP addresses. Visualize the results with charts and save high-risk users to a custom table."

---

### ✍️ YOUR DESCRIPTION HERE:

*Replace this text with your detailed description of what the notebook should do.*

---

## 🔄 Recommended Workflow

Follow this three-phase approach with your AI assistant:

### **Phase 1: Design** 📋
1. Ask your AI assistant: *"Based on my description above, create a detailed design document for this notebook."*
2. Review the design for:
   - Data sources and table schemas
   - Analysis methodology
   - Output schema
   - Edge cases and limitations
3. Iterate: Provide feedback and refine the design until you're satisfied
4. Save the final design in a `DESIGN.md` file

### **Phase 2: Implementation Plan** 🗺️
1. Ask your AI assistant: *"Create a step-by-step implementation plan with specific cell-by-cell instructions."*
2. Review the plan for:
   - Logical flow and structure
   - Required libraries and imports
   - Error handling strategies
   - Testing approach
3. Iterate: Adjust the plan based on your environment and requirements
4. Save the final plan in an `IMPLEMENTATION.md` file

### **Phase 3: Cell-by-Cell Implementation** 💻
1. Ask your AI assistant: *"Create the first cell according to the implementation plan."*
2. Review and test each cell:
   - Run the cell and verify the output
   - Check for errors or unexpected results
   - Validate data quality and transformations
3. Iterate: If a cell doesn't work as expected:
   - Share the error message with your AI assistant
   - Describe what you expected vs. what happened
   - Ask for corrections or alternative approaches
4. Move to the next cell only after the current cell works correctly
5. Repeat until the notebook is complete

---

## 💡 Tips for Working with AI

**Be Specific:**
- Provide exact table names and column names
- Specify date ranges and time windows
- Define thresholds and scoring criteria clearly

**Iterate Incrementally:**
- Test each cell before moving to the next
- Don't try to build the entire notebook at once
- Validate intermediate results

**Share Context:**
- Show error messages in full
- Describe your data characteristics (volume, freshness, etc.)
- Explain your security objectives

**Ask Questions:**
- "Why did you choose this approach?"
- "What are the performance implications?"
- "How can I optimize this query?"
- "What edge cases should I consider?"

---

## 📝 Next Steps

1. **Fill in your task description** in the section above
2. **Share the reference links** with your AI assistant
3. **Start with Phase 1**: Request a design document
4. **Follow the workflow**: Design → Plan → Implement
5. **Document as you go**: Save DESIGN.md and IMPLEMENTATION.md files

---

## 🎓 Example Prompts

**For Design Phase:**
```
Based on my description above and using the Microsoft Sentinel documentation 
links provided, create a detailed design document for this notebook. Include:
- Required tables and their schemas
- Analysis methodology
- Expected output format
- Assumptions and limitations
```

**For Implementation Phase:**
```
Create a step-by-step implementation plan for the notebook. Break it down
into numbered cells with:
- Cell purpose and description
- Required PySpark operations
- Expected output
- Validation checks
```

**For Code Generation:**
```
Implement cell #3 from the implementation plan. Include:
- Complete working code
- Inline comments explaining the logic
- Error handling
- Print statements showing progress
```

---

**Ready to begin? Start by describing your task above, then engage your AI assistant!**

---

## Cell 1: Setup & Configuration

*This cell will be created by your AI assistant based on your design and implementation plan.*

**Typical contents:**
- Import required libraries (PySpark functions, Sentinel provider, etc.)
- Initialize Microsoft Sentinel provider
- Define configuration parameters (workspace name, time windows, etc.)
- Set analysis parameters and thresholds
- Display configuration summary

---

**Ask your AI assistant:** *"Create cell 1 for setup and configuration according to the implementation plan."*

In [None]:
# Your AI assistant will generate code here
# Example structure:
#
# from sentinel_lake.providers import MicrosoftSentinelProvider
# from pyspark.sql.functions import col, count, countDistinct, expr, when, lit
#
# WORKSPACE_NAME = "<YOUR_WORKSPACE_NAME>"
# ANALYSIS_DAYS = 30
#
# sentinel_provider = MicrosoftSentinelProvider(spark)
# print(f"Configuration loaded: {ANALYSIS_DAYS} day analysis window")
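
One way to keep the configuration parameters and the summary display together is a small validated object. The sketch below is only an illustration: `NotebookConfig`, its field names, and the default threshold are hypothetical and are not part of the Sentinel provider library. It uses only the standard library, so it can be tested without a Spark session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotebookConfig:
    """Hypothetical container for the notebook's tunable parameters."""

    workspace_name: str
    analysis_days: int = 30
    failed_attempt_threshold: int = 10

    def __post_init__(self):
        # Fail early on values that would make later cells misbehave.
        if not self.workspace_name or self.workspace_name.startswith("<"):
            raise ValueError("Set workspace_name to a real workspace before running.")
        if self.analysis_days <= 0:
            raise ValueError("analysis_days must be a positive integer.")

    def summary(self) -> str:
        """Return a one-line, human-readable configuration summary."""
        return (
            f"workspace={self.workspace_name}, "
            f"window={self.analysis_days} days, "
            f"failed_attempt_threshold={self.failed_attempt_threshold}"
        )


# "my-workspace" is a placeholder value used only for this example.
config = NotebookConfig(workspace_name="my-workspace")
print(config.summary())
```

Leaving the `<YOUR_WORKSPACE_NAME>` placeholder unchanged raises a `ValueError` here, which surfaces the mistake in cell 1 rather than in a later query.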

## Cell 2: Load Data from Sentinel Tables

*This cell loads the required data from Microsoft Sentinel tables.*

**Typical contents:**
- Read data from one or more Sentinel tables using `sentinel_provider.read_table()`
- Apply initial filters (date range, user types, status codes, etc.)
- Select relevant columns
- Cache the data with `.persist()` for performance
- Display row count and sample data

**Common tables:**
- `SigninLogs` - User authentication events
- `AuditLogs` - Administrative actions
- `SecurityAlert` - Security alerts
- `EntraUsers` - User profile information

---

**Ask your AI assistant:** *"Create cell 2 to load data from the required Sentinel tables."*

In [None]:
# Your AI assistant will generate code here
# Example structure:
#
# print("Loading data from SigninLogs...")
# 
# data_df = (
#     sentinel_provider.read_table('SigninLogs', WORKSPACE_NAME)
#     .filter(
#         (col("TimeGenerated") >= expr(f"current_timestamp() - INTERVAL {ANALYSIS_DAYS} DAYS"))
#     )
#     .select("UserId", "UserPrincipalName", "IPAddress", "ResultType")
#     .persist()
# )
# 
# print(f"Loaded {data_df.count()} records")
# data_df.show(5)
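
If you want explicit window boundaries instead of a SQL interval expression (for example, to print them in the notebook or reuse them across cells), they can be computed in plain Python first. This is a minimal sketch; `analysis_window` is a hypothetical helper name, and the fixed `now` value exists only to make the example reproducible.

```python
from datetime import datetime, timedelta, timezone


def analysis_window(days, now=None):
    """Return (start, end) UTC datetimes covering the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    # Use the supplied reference time when given, otherwise the current UTC time.
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return start, end


# A fixed reference time keeps this example deterministic.
start, end = analysis_window(30, now=datetime(2024, 5, 31, tzinfo=timezone.utc))
print(start.isoformat(), "->", end.isoformat())
```

The resulting datetimes should then be usable in a DataFrame filter, for instance by comparing `col("TimeGenerated")` against `lit(start)`. Verify the time zone handling against your own data before relying on it.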

## Cells 3-N: Process and Analyze Data

*These cells transform, analyze, and visualize your data.*

**Common processing operations:**

### Data Transformation
- Filter and clean data
- Group and aggregate (`.groupBy()`, `.agg()`)
- Join multiple data sources
- Calculate metrics and scores (`.withColumn()`, `when().otherwise()`)
- Create derived fields

### Analysis
- Statistical calculations (counts, averages, percentiles)
- Pattern detection (time-based, location-based, behavioral)
- Risk scoring and classification
- Anomaly identification
- Correlation analysis

### Visualization
- Bar charts and histograms
- Time series plots
- Distribution charts
- Heatmaps and pivot tables

---

**Ask your AI assistant:** *"Create cell 3 to [describe specific operation]."*

**Repeat for each processing step, testing each cell before moving to the next.**

In [None]:
# Processing cell 1
# Your AI assistant will generate code for the first processing step
#
# Example: Aggregation
# aggregated_df = (
#     data_df
#     .groupBy("UserId", "UserPrincipalName")
#     .agg(
#         count("*").alias("event_count"),
#         countDistinct("IPAddress").alias("unique_ips")
#     )
# )
# aggregated_df.show(10)
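
To sanity-check an aggregation like the one above, it can help to work out the expected result on a few hand-written rows first. The sketch below reproduces the `event_count` and `unique_ips` logic in plain Python. The sample rows and the `aggregate_events` helper are made up for illustration, and only `UserId` is used as the grouping key, to keep it short.

```python
from collections import defaultdict

# Hypothetical sample rows: (UserId, IPAddress)
sample_events = [
    ("u1", "10.0.0.1"),
    ("u1", "10.0.0.1"),
    ("u1", "10.0.0.2"),
    ("u2", "10.0.0.9"),
]


def aggregate_events(events):
    """Return {user_id: {"event_count": int, "unique_ips": int}}."""
    counts = defaultdict(int)
    ips = defaultdict(set)
    for user_id, ip in events:
        counts[user_id] += 1   # every row counts toward event_count
        ips[user_id].add(ip)   # the set keeps each IP address once
    return {
        user_id: {"event_count": counts[user_id], "unique_ips": len(ips[user_id])}
        for user_id in counts
    }


expected = aggregate_events(sample_events)
print(expected)
```

Comparing these hand-computed numbers with what the PySpark cell returns for the same users is a quick way to catch a wrong grouping key or an unintended duplicate join.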

In [None]:
# Processing cell 2
# Your AI assistant will generate code for the next processing step
#
# Example: Scoring
# scored_df = (
#     aggregated_df
#     .withColumn("risk_score",
#         when(col("unique_ips") > 10, 20)
#         .when(col("unique_ips") > 5, 10)
#         .otherwise(0)
#     )
# )
# scored_df.orderBy(col("risk_score").desc()).show(10)
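
Threshold chains are easy to get wrong at the boundaries, so it can be worth mirroring the scoring rule in plain Python and checking a few edge values. The function below is a sketch of the same rule as the `when().otherwise()` example (more than 10 distinct IPs scores 20, more than 5 scores 10, otherwise 0). The name `risk_score` is chosen for illustration.

```python
def risk_score(unique_ips):
    """Mirror of the when/otherwise chain: the first matching branch wins."""
    if unique_ips > 10:
        return 20
    if unique_ips > 5:
        return 10
    return 0


# Probe the values on either side of each threshold.
for n in (5, 6, 10, 11):
    print(n, risk_score(n))
```

Because the comparisons are strict, exactly 10 distinct IPs scores 10 and exactly 5 scores 0. If your design intends "10 or more", change the operators in both the Python check and the PySpark expression.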

In [None]:
# Processing cell 3 (and more as needed)
# Continue adding cells for additional processing steps
#
# Example: Visualization
# import matplotlib.pyplot as plt
#
# dist_df = scored_df.groupBy("risk_score").count().orderBy("risk_score").toPandas()
# plt.bar(dist_df["risk_score"].astype(str), dist_df["count"])
# plt.title("Risk Score Distribution")
# plt.show()

## Final Cell: Save Results

*This cell saves your analysis results to a custom Sentinel table.*

**Typical contents:**
- Prepare final output DataFrame with all required columns
- Add metadata columns (TimeGenerated, calculation timestamp, etc.)
- Order results appropriately
- Write to custom table using `sentinel_provider.save_as_table()`
- Handle permission errors gracefully
- Provide KQL query examples for accessing the saved data

**Custom Table Naming:**
- Use suffix `_SPRK` for Spark-generated tables
- Example: `FailedSigninAnalysis_SPRK`
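
The naming convention above can be enforced with a small helper before the write call. This is a sketch: `ensure_sprk_suffix` is a hypothetical name, and the character check (letters, digits, and underscores, starting with a letter) is an assumption made for the example, not a documented Sentinel rule.

```python
import re

SPARK_SUFFIX = "_SPRK"


def ensure_sprk_suffix(table_name):
    """Return table_name with the _SPRK suffix, adding it if missing."""
    # Assumption: letters, digits and underscores only, starting with a letter.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", table_name):
        raise ValueError(f"Unexpected table name: {table_name!r}")
    if table_name.endswith(SPARK_SUFFIX):
        return table_name
    return table_name + SPARK_SUFFIX


print(ensure_sprk_suffix("FailedSigninAnalysis"))
print(ensure_sprk_suffix("FailedSigninAnalysis_SPRK"))
```

Both calls print `FailedSigninAnalysis_SPRK`, so the helper is safe to apply to names that already follow the convention.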

**Required Permissions:**
- Microsoft Sentinel Contributor role, OR
- Storage Blob Data Contributor role

---

**Ask your AI assistant:** *"Create the final cell to save results to a custom Sentinel table."*

In [None]:
# Your AI assistant will generate code here
# Example structure:
#
# from pyspark.sql.functions import current_timestamp
#
# CUSTOM_TABLE_NAME = "YourAnalysisResults_SPRK"
#
# output_df = (
#     scored_df
#     .withColumn("TimeGenerated", current_timestamp())
#     .withColumn("calculation_date", current_timestamp())
#     .orderBy(col("risk_score").desc())
# )
#
# try:
#     sentinel_provider.save_as_table(
#         output_df,
#         CUSTOM_TABLE_NAME,
#         write_options={"mode": "overwrite", "mergeSchema": "true"}
#     )
#     print(f"✅ Successfully saved {output_df.count()} records to {CUSTOM_TABLE_NAME}")
#     print(f"\nQuery in KQL: {CUSTOM_TABLE_NAME} | take 10")
# except Exception as e:
#     print(f"❌ Could not write to table: {e}")
#     print("Results remain available in the output_df variable.")
#     print("Export with: output_df.toPandas().to_csv('results.csv', index=False)")