sashank/malware-detection

Module 06 Exercise Playbooks - README

🎓 New to These Notebooks? Start Here!

📘 Complete Student Self-Study Guide

  • Learning path recommendations
  • Setup instructions
  • Time estimates
  • Tips & troubleshooting
  • Assessment guidelines

Overview

This directory contains Python playbooks (Jupyter notebooks) for hands-on exercises in Module 06: Malware Detection and Classification.

All notebooks are self-contained and include:

  • ✅ Detailed learning objectives
  • ✅ Background concepts and theory
  • ✅ Step-by-step explanations
  • ✅ Inline code comments
  • ✅ Expected outputs and interpretations
  • ✅ Key insights and takeaways
  • ✅ Discussion questions for deeper learning

Available Playbooks

✓ Exercise 6.1.3A: Static vs Dynamic Analysis Triage Playbook

File: Exercise_6_1_3A_Static_Dynamic_Triage.ipynb
Focus: Design a two-stage malware triage pipeline (static → dynamic escalation)
Dataset: Simulated EMBER-style static features
Key Concepts: Escalation thresholds, SOC capacity planning, TPR@FPR metrics
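
As a rough illustration of the TPR@FPR and escalation-threshold ideas, the sketch below picks a static-score cutoff that keeps the false-positive rate within a budget and counts how many samples would be escalated to dynamic analysis. It is not taken from the notebook: the function name `threshold_at_fpr`, the scores, and the 25% budget are all hypothetical, and it uses only the standard library.

```python
def threshold_at_fpr(scores, labels, max_fpr):
    """Pick a score cutoff whose false-positive rate stays within max_fpr.

    labels: 1 = malicious, 0 = benign. A sample is escalated when score > cutoff.
    Returns (threshold, tpr, fpr).
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Number of benign samples we can afford to escalate by mistake.
    allowed_fp = int(max_fpr * len(benign))
    # Using the (allowed_fp + 1)-th highest benign score as a strict cutoff
    # lets at most allowed_fp benign samples through.
    threshold = benign[allowed_fp] if allowed_fp < len(benign) else float("-inf")

    tpr = sum(s > threshold for s in malicious) / len(malicious)
    fpr = sum(s > threshold for s in benign) / len(benign)
    return threshold, tpr, fpr


# Hypothetical static-model scores for 4 benign and 4 malicious files.
scores = [0.10, 0.20, 0.30, 0.90, 0.25, 0.35, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, tpr, fpr = threshold_at_fpr(scores, labels, max_fpr=0.25)
escalated = [s for s in scores if s > threshold]  # queue sent to dynamic analysis
print(f"threshold={threshold}, TPR={tpr:.2f} at FPR={fpr:.2f}, escalations={len(escalated)}")
```

The size of `escalated` is the number that matters for SOC capacity planning: lowering the FPR budget shrinks the dynamic-analysis queue at the cost of missed detections.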

✓ Exercise 6.2.2A: Byte N-grams vs Strings Feature Engineering

File: Exercise_6_2_2A_Byte_Ngrams_vs_Strings.ipynb
Focus: Compare byte n-grams vs printable strings for static detection
Dataset: Simulated PE byte sequences and strings
Key Concepts: Feature engineering, TF-IDF, error analysis, Precision@k
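
The two feature families and the Precision@k metric can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not the notebook's code: the helper names and the byte blob are made up, and the TF-IDF weighting step is left out.

```python
import re
from collections import Counter


def byte_ngrams(data, n=2):
    """Count overlapping byte n-grams in a bytes object."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes, like the `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring samples that are truly malicious (label 1)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Made-up byte blob standing in for a PE file; not a real sample.
blob = b"\x00\x01MZ\x90\x00cmd.exe /c del\x00\x00http://x\x00ab"

ngram_counts = byte_ngrams(blob, n=2)
strings_found = printable_strings(blob)
p_at_2 = precision_at_k([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 1], k=2)
print(ngram_counts.most_common(3), strings_found, p_at_2)
```

Byte n-grams need no parsing and survive partial corruption, while strings are easier for an analyst to interpret during error analysis.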

✓ Exercise 6.3.1A: Sandbox Report Parsing → Behavioral Features

File: Exercise_6_3_1A_Sandbox_Report_Parsing.ipynb
Focus: Parse JSON sandbox reports into ML-ready behavioral features
Dataset: Simulated sandbox reports (file/registry/network/API operations)
Key Concepts: Behavioral detection, sandbox evasion, feature importance
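
To make the parsing step concrete, here is a minimal sketch that turns one JSON report into a flat feature dictionary. The report layout (`file_ops`, `registry_ops`, `network`, `api_calls`) and the feature names are hypothetical; real sandboxes and the notebook's simulated reports may use a different schema.

```python
import json
from collections import Counter

# Hypothetical report layout; real sandboxes use their own schemas.
raw_report = json.dumps({
    "file_ops": [
        {"action": "write", "path": "C:\\Users\\demo\\AppData\\a.tmp"},
        {"action": "read", "path": "C:\\Windows\\System32\\kernel32.dll"},
        {"action": "write", "path": "C:\\Users\\demo\\b.exe"},
    ],
    "registry_ops": [
        {"action": "set", "key": "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run"},
    ],
    "network": [
        {"host": "198.51.100.7", "port": 443},
        {"host": "198.51.100.7", "port": 8080},
        {"host": "203.0.113.9", "port": 80},
    ],
    "api_calls": ["CreateFileW", "WriteFile", "RegSetValueExW", "connect", "WriteFile"],
})


def extract_features(report_json):
    """Flatten one sandbox report (JSON text) into numeric behavioral features."""
    report = json.loads(report_json)
    file_ops = report.get("file_ops", [])
    registry_ops = report.get("registry_ops", [])
    network = report.get("network", [])
    api_counts = Counter(report.get("api_calls", []))
    return {
        "files_written": sum(op.get("action") == "write" for op in file_ops),
        "registry_sets": sum(op.get("action") == "set" for op in registry_ops),
        # Writing to a Run key is a common persistence indicator.
        "touches_run_key": int(any(op.get("key", "").endswith("\\Run") for op in registry_ops)),
        "unique_hosts": len({conn.get("host") for conn in network}),
        "api_WriteFile": api_counts["WriteFile"],
        "api_CreateRemoteThread": api_counts["CreateRemoteThread"],  # 0 when absent
    }


features = extract_features(raw_report)
print(features)
```

Using `.get` with defaults keeps the extractor from failing on sparse reports, which matters because an evasive sample may produce almost no recorded activity.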

✓ Exercise 6.4.2A: LSTM/Sequence Model for Behavioral Classification

File: Exercise_6_4_2A_LSTM_Behavioral_Sequences.ipynb
Focus: Build an LSTM model for API call sequence classification
Dataset: API call sequences from malware/benign samples
Key Concepts: Sequential modeling, embeddings, temporal patterns, deep vs classical comparison
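
Before an embedding layer can consume API call sequences, the call names have to become fixed-length integer arrays. The sketch below shows one common way to do that in plain Python; it is an illustration of the preprocessing idea only, not the notebook's pipeline, and the reserved indices, helper names, and `max_len` value are assumptions.

```python
PAD, UNK = 0, 1  # reserved indices: padding and API names unseen at training time


def build_vocab(sequences):
    """Map each distinct API name to an integer id, starting after the reserved ids."""
    names = sorted({api for seq in sequences for api in seq})
    return {name: idx for idx, name in enumerate(names, start=2)}


def encode_and_pad(sequence, vocab, max_len):
    """Truncate to max_len, convert names to ids, and right-pad with PAD."""
    ids = [vocab.get(api, UNK) for api in sequence[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical training traces.
train_sequences = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegSetValueExW", "CreateFileW"],
]

vocab = build_vocab(train_sequences)
encoded = [encode_and_pad(seq, vocab, max_len=4) for seq in train_sequences]
unseen = encode_and_pad(["Sleep", "CreateFileW"], vocab, max_len=4)
print(vocab, encoded, unseen)
```

Reserving index 0 for padding is what allows a sequence model to mask padded positions, and mapping unknown names to a single id keeps inference from failing on APIs that never appeared in training.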

✓ Exercise 6.5.1A: Obfuscation/Packing Stress Test

File: Exercise_6_5_1A_Obfuscation_Stress_Test.ipynb
Focus: Test static model robustness against obfuscation/packing
Dataset: EMBER-style features with simulated packing effects
Key Concepts: Model robustness, adversarial conditions, feature sensitivity, mitigation strategies
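
One way to see why packing hurts static models is to compress a payload and compare what a static feature extractor would observe before and after. The sketch below uses `zlib` compression as a stand-in for a packer, which is an assumption made for illustration (real packers also add an unpacking stub, and the notebook simulates packing on EMBER-style features instead). The marker list and detector are deliberately naive and hypothetical.

```python
import math
import zlib
from collections import Counter

# Hypothetical markers a string-based static detector might look for.
SUSPICIOUS = (b"CreateRemoteThread", b"VirtualAllocEx")


def shannon_entropy(data):
    """Entropy in bits per byte (0 to 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def naive_static_detector(data):
    """Flag the blob if any suspicious marker appears verbatim."""
    return any(marker in data for marker in SUSPICIOUS)


# Made-up payload: readable strings repeated to mimic low-entropy content.
original = b"CreateRemoteThread\x00VirtualAllocEx\x00http://example.invalid/payload\x00" * 20
packed = zlib.compress(original)  # stand-in for a packer

for name, blob in (("original", original), ("packed", packed)):
    print(f"{name}: entropy={shannon_entropy(blob):.2f} bits/byte, "
          f"detected={naive_static_detector(blob)}")
```

The packed blob hides the markers and raises byte entropy, which suggests one mitigation: treat high entropy itself as a signal to escalate the sample to dynamic analysis rather than trusting the static verdict.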

✓ Exercise 6.5.2A: Polymorphism → Similarity-by-Behavior

File: Exercise_6_5_2A_Polymorphic_Clustering.ipynb
Focus: Group polymorphic variants using behavioral similarity
Dataset: API call sequences from polymorphic malware families
Key Concepts: Clustering, Jaccard/cosine similarity, variant detection, threat intelligence
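
Grouping variants by behavioral similarity can be sketched with Jaccard similarity over the set of API calls each sample makes, followed by single-link grouping with a union-find structure. The sample names, traces, and the 0.5 threshold below are hypothetical, and the notebook may use a different clustering algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two API-call collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_similarity(samples, threshold):
    """Single-link clustering: link any two samples with similarity >= threshold."""
    parent = {name: name for name in samples}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right in combinations(samples, 2):
        if jaccard(samples[left], samples[right]) >= threshold:
            parent[find(left)] = find(right)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical API traces from two polymorphic families.
samples = {
    "variant_a1": ["CreateFileW", "WriteFile", "RegSetValueExW", "connect"],
    "variant_a2": ["CreateFileW", "WriteFile", "RegSetValueExW", "send"],
    "variant_b1": ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    "variant_b2": ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
                   "CreateRemoteThread", "CloseHandle"],
}

clusters = cluster_by_similarity(samples, threshold=0.5)
print(clusters)
```

Because the comparison ignores file bytes entirely, variants that differ in every hash can still land in the same cluster as long as their behavior overlaps.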

Usage Instructions

Prerequisites

pip install pandas numpy scikit-learn matplotlib seaborn tensorflow keras notebook

Running Notebooks

# Option 1: Local Jupyter
jupyter notebook

# Option 2: Google Colab
# Upload .ipynb files to Colab and run in browser

# Option 3: VS Code
# Open .ipynb files with Jupyter extension

Dataset Notes

  • All playbooks use simulated/synthetic data for safety
  • The real EMBER dataset can be substituted where noted
  • No live malware execution required
  • Sandbox reports are JSON simulations

Learning Path

Recommended Order:

  1. 6.1.3A (Beginner) → Understand triage pipeline design
  2. 6.2.2A (Intermediate) → Master static feature engineering
  3. 6.3.1A (Intermediate) → Learn behavioral feature extraction
  4. 6.4.2A (Advanced) → Deep learning for sequences
  5. 6.5.1A (Intermediate) → Test model robustness
  6. 6.5.2A (Intermediate) → Handle polymorphic threats

Key Takeaways

Each playbook demonstrates:

  • ✓ Realistic security scenarios and operational constraints
  • ✓ End-to-end pipelines from data → features → models → evaluation
  • ✓ SOC-aligned metrics (Precision@k, TPR@FPR, alert volume)
  • ✓ Error analysis and failure mode understanding
  • ✓ Integration with NIST/MITRE frameworks

Additional Resources

Support

For questions or issues:

  1. Check inline documentation in each notebook
  2. Review the main Exercises06.md for exercise specifications
  3. Consult Chapter06_Malware_Detection_Classification.md for theory

Generated for AI/ML Cybersecurity Course - Module 06
