📘 Complete Student Self-Study Guide
- Learning path recommendations
- Setup instructions
- Time estimates
- Tips & troubleshooting
- Assessment guidelines
This directory contains Python playbooks (Jupyter notebooks) for hands-on exercises in Module 06: Malware Detection and Classification.
Each notebook is self-explanatory and includes:
- ✅ Detailed learning objectives
- ✅ Background concepts and theory
- ✅ Step-by-step explanations
- ✅ Inline code comments
- ✅ Expected outputs and interpretations
- ✅ Key insights and takeaways
- ✅ Discussion questions for deeper learning
File: Exercise_6_1_3A_Static_Dynamic_Triage.ipynb
Focus: Design a two-stage malware triage pipeline (static → dynamic escalation)
Dataset: Simulated EMBER-style static features
Key Concepts: Escalation thresholds, SOC capacity planning, TPR@FPR metrics
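The escalation-threshold idea in 6.1.3A can be sketched as follows. This is a minimal illustration, not the notebook's code. It assumes a static model that outputs one score in [0, 1] per sample, treats sandbox capacity as a fixed fraction of daily volume, and uses hypothetical helper names (`choose_escalation_threshold`, `triage_report`) and made-up score distributions.

```python
import numpy as np


def choose_escalation_threshold(static_scores, capacity_fraction):
    """Pick the static-score cutoff that sends roughly the top
    `capacity_fraction` of samples on to dynamic (sandbox) analysis."""
    scores = np.asarray(static_scores, dtype=float)
    return float(np.quantile(scores, 1.0 - capacity_fraction))


def triage_report(static_scores, labels, threshold):
    """Summarize escalation volume, TPR and FPR at a given cutoff.
    labels: 1 = malware, 0 = benign."""
    scores = np.asarray(static_scores, dtype=float)
    labels = np.asarray(labels)
    escalated = scores >= threshold  # these go to the sandbox
    return {
        "escalated": int(escalated.sum()),
        "tpr": float(escalated[labels == 1].mean()),
        "fpr": float(escalated[labels == 0].mean()),
    }


# Synthetic example: ~10% malware prevalence, hypothetical score distributions.
rng = np.random.default_rng(0)
labels = (rng.random(10_000) < 0.10).astype(int)
scores = np.where(labels == 1, rng.beta(5, 2, 10_000), rng.beta(2, 5, 10_000))
# Assumption: the sandbox can only detonate 15% of daily volume.
threshold = choose_escalation_threshold(scores, capacity_fraction=0.15)
print(f"threshold={threshold:.3f}", triage_report(scores, labels, threshold))
```

Tying the cutoff to capacity rather than to a fixed score makes the trade-off explicit: shrinking the sandbox budget raises the threshold, and the report shows how much TPR that costs.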
File: Exercise_6_2_2A_Byte_Ngrams_vs_Strings.ipynb
Focus: Compare byte n-grams vs printable strings for static detection
Dataset: Simulated PE byte sequences and strings
Key Concepts: Feature engineering, TF-IDF, error analysis, Precision@k
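A minimal sketch of the two feature families compared in 6.2.2A, using only the standard library. The helper names, the hex encoding of n-grams, and the 4-byte / 4-character defaults are illustrative assumptions; the notebook's own extraction may differ.

```python
import re
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count overlapping byte n-grams; keys are hex strings, e.g. '4d5a9000'."""
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of >= min_len printable ASCII bytes (similar in spirit to `strings`)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# Synthetic byte blob standing in for a PE file (not a real sample).
blob = b"MZ\x90\x00\x03\x00http://example.test/payload\x00\x00kernel32.dll\x00"
print(printable_strings(blob))  # ['http://example.test/payload', 'kernel32.dll']
print(byte_ngrams(blob, n=4).most_common(3))
```

Either output can then be vectorized for a classifier; for example, scikit-learn's `DictVectorizer` accepts per-sample count dictionaries, and `TfidfVectorizer` accepts a callable `analyzer` that returns a token list.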
File: Exercise_6_3_1A_Sandbox_Report_Parsing.ipynb
Focus: Parse JSON sandbox reports into ML-ready behavioral features
Dataset: Simulated sandbox reports (file/registry/network/API operations)
Key Concepts: Behavioral detection, sandbox evasion, feature importance
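Parsing a report into behavioral features might look like the sketch below. The JSON layout (`behavior`, `file_ops`, `registry_ops`, `network`, `api_calls`) and the feature names are hypothetical simplifications invented for this example; real sandbox reports, including Cuckoo's, use their own richer schemas.

```python
import json
from collections import Counter

# Hypothetical, simplified report layout for illustration only.
SAMPLE_REPORT = r"""
{
  "sample_id": "demo-001",
  "behavior": {
    "file_ops": [
      {"op": "write", "path": "C:\\Users\\Public\\update.exe"},
      {"op": "delete", "path": "C:\\Users\\Public\\dropper.tmp"}
    ],
    "registry_ops": [
      {"op": "set", "key": "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\updater"}
    ],
    "network": [
      {"dst_ip": "203.0.113.10", "dst_port": 443},
      {"dst_ip": "203.0.113.10", "dst_port": 8080}
    ],
    "api_calls": ["CreateFileW", "WriteFile", "VirtualAlloc", "VirtualAlloc",
                  "RegSetValueExW", "connect"]
  }
}
"""


def extract_features(report: dict) -> dict:
    """Flatten one sandbox report into a fixed set of numeric features.
    Missing sections default to empty, so partial reports still parse."""
    beh = report.get("behavior", {})
    files = beh.get("file_ops", [])
    registry = beh.get("registry_ops", [])
    network = beh.get("network", [])
    api_counts = Counter(beh.get("api_calls", []))
    return {
        "n_file_writes": sum(f.get("op") == "write" for f in files),
        "n_file_deletes": sum(f.get("op") == "delete" for f in files),
        "n_registry_sets": sum(r.get("op") == "set" for r in registry),
        # Autorun persistence: any write under ...\CurrentVersion\Run
        "touches_run_key": int(any("currentversion\\run" in r.get("key", "").lower()
                                   for r in registry)),
        "n_unique_dst_ips": len({c["dst_ip"] for c in network if "dst_ip" in c}),
        "n_api_calls": sum(api_counts.values()),
        "n_unique_apis": len(api_counts),
        "n_virtualalloc": api_counts["VirtualAlloc"],  # Counter returns 0 if absent
    }


features = extract_features(json.loads(SAMPLE_REPORT))
print(features)
```

One dictionary per report can be collected into a list and passed to `pandas.DataFrame` to obtain a feature matrix.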
File: Exercise_6_4_2A_LSTM_Behavioral_Sequences.ipynb
Focus: Build an LSTM model for API call sequence classification
Dataset: API call sequences from malware/benign samples
Key Concepts: Sequential modeling, embeddings, temporal patterns, deep vs classical comparison
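Before an API call sequence reaches an LSTM, it has to be turned into a fixed-length integer array. A minimal preprocessing sketch, assuming traces are already lists of API names and using hypothetical helper names; reserving id 0 for padding and id 1 for unseen APIs is a common convention, not necessarily the notebook's.

```python
from collections import Counter

import numpy as np

PAD_ID, OOV_ID = 0, 1  # reserved: 0 = padding, 1 = API name unseen in training


def build_vocab(train_sequences, min_count=1):
    """Map each API name seen at least `min_count` times to an integer id >= 2."""
    counts = Counter(api for seq in train_sequences for api in seq)
    vocab = {"<pad>": PAD_ID, "<oov>": OOV_ID}
    for api in sorted(counts):  # sorted for reproducible ids
        if counts[api] >= min_count:
            vocab[api] = len(vocab)
    return vocab


def encode_sequences(sequences, vocab, max_len):
    """Return an int32 array of shape (n_samples, max_len): each trace is
    truncated to its first max_len calls and post-padded with PAD_ID."""
    out = np.full((len(sequences), max_len), PAD_ID, dtype=np.int32)
    for i, seq in enumerate(sequences):
        ids = [vocab.get(api, OOV_ID) for api in seq[:max_len]]
        out[i, :len(ids)] = ids
    return out


train = [["NtCreateFile", "NtWriteFile", "NtClose"],
         ["RegOpenKeyExW", "RegSetValueExW", "NtClose"]]
vocab = build_vocab(train)
X = encode_sequences([["NtCreateFile", "InternetOpenA", "NtClose"]], vocab, max_len=4)
print(X)  # [[3 1 2 0]]
```

The resulting array could be fed to a Keras `Embedding` layer created with `mask_zero=True` followed by an `LSTM` layer, so that the padding id 0 is masked out. Building the vocabulary on training traces only keeps test-time APIs from leaking into the id space.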
File: Exercise_6_5_1A_Obfuscation_Stress_Test.ipynb
Focus: Test static model robustness against obfuscation/packing
Dataset: EMBER-style features with simulated packing effects
Key Concepts: Model robustness, adversarial conditions, feature sensitivity, mitigation strategies
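The stress-test idea in 6.5.1A can be illustrated with a deterministic, deliberately crude packing transform. Everything here is an assumption for illustration: the three-column feature layout is not EMBER's, `toy_score` stands in for a trained static model, and the packing effects (entropy pushed toward 7.9 bits/byte, a ~4-entry import stub, 90% of strings hidden) are rough stand-ins for what real packers do.

```python
import numpy as np

# Hypothetical 3-column static feature layout (NOT EMBER's real layout):
# [mean section entropy (bits/byte), number of imports, number of printable strings]


def simulate_packing(X, strength=1.0):
    """Crude stand-in for packing: push entropy toward 7.9, shrink imports
    toward a ~4-entry stub, and hide most printable strings."""
    Xp = np.asarray(X, dtype=float).copy()
    Xp[:, 0] += strength * (7.9 - Xp[:, 0])
    Xp[:, 1] = np.round(Xp[:, 1] * (1 - strength) + 4 * strength)
    Xp[:, 2] = np.round(Xp[:, 2] * (1 - 0.9 * strength))
    return Xp


def toy_score(X):
    """Hypothetical static scorer: high entropy and few imports look malicious."""
    X = np.asarray(X, dtype=float)
    z = 1.5 * (X[:, 0] - 6.5) - 0.05 * (X[:, 1] - 30)
    return 1.0 / (1.0 + np.exp(-z))


def stress_test(score_fn, X, y, threshold=0.5, perturb=simulate_packing):
    """Compare TPR/FPR on the original features vs. perturbed (packed) features."""
    y = np.asarray(y)
    before = score_fn(X) >= threshold
    after = score_fn(perturb(X)) >= threshold
    return {
        "tpr_before": float(before[y == 1].mean()),
        "tpr_after": float(after[y == 1].mean()),
        "fpr_before": float(before[y == 0].mean()),
        "fpr_after": float(after[y == 0].mean()),
    }


# Synthetic population: 500 benign + 500 malware rows with made-up distributions.
rng = np.random.default_rng(0)
benign = np.column_stack([rng.normal(5.5, 0.6, 500), rng.integers(40, 200, 500),
                          rng.integers(200, 2000, 500)])
malware = np.column_stack([rng.normal(6.8, 0.5, 500), rng.integers(5, 60, 500),
                           rng.integers(50, 800, 500)])
X = np.vstack([benign, malware])
y = np.array([0] * 500 + [1] * 500)
for s in (0.0, 0.5, 1.0):
    print(s, stress_test(toy_score, X, y, perturb=lambda Z, s=s: simulate_packing(Z, s)))
```

Reporting both directions matters: a model that leans on entropy can start flagging packed benign software (FPR rises), while one that leans on imports and strings can miss packed malware (TPR falls). Sweeping `strength` shows how quickly each degrades.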
File: Exercise_6_5_2A_Polymorphic_Clustering.ipynb
Focus: Group polymorphic variants using behavioral similarity
Dataset: API call sequences from polymorphic malware families
Key Concepts: Clustering, Jaccard/cosine similarity, variant detection, threat intelligence
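Grouping variants by behavioral similarity can be sketched with API-call shingles, Jaccard similarity, and single-linkage merging. This is an illustrative sketch with hypothetical traces and helper names; the bigram shingle size and the 0.4 threshold are arbitrary choices, and the notebook may use a different clustering algorithm.

```python
from itertools import combinations


def api_shingles(calls, n=2):
    """Set of consecutive API-call n-grams, which keeps some ordering information."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def jaccard(a, b):
    """Size of intersection / size of union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_jaccard(samples, threshold):
    """Single-linkage grouping with union-find: any pair with Jaccard >= threshold
    is merged, so clusters form transitively."""
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if jaccard(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


traces = {
    "variant_a": ["LoadLibraryA", "GetProcAddress", "VirtualAlloc",
                  "WriteProcessMemory", "CreateRemoteThread"],
    "variant_b": ["LoadLibraryA", "GetProcAddress", "Sleep", "VirtualAlloc",
                  "WriteProcessMemory", "CreateRemoteThread"],
    "benign_app": ["CreateFileW", "ReadFile", "CloseHandle"],
}
shingled = {name: api_shingles(calls) for name, calls in traces.items()}
print(jaccard(shingled["variant_a"], shingled["variant_b"]))  # 0.5
print(cluster_by_jaccard(shingled, threshold=0.4))
```

Inserting a junk `Sleep` call, a simple form of variant mutation, breaks two bigrams but leaves the two variants at Jaccard 0.5, still above the threshold. Set-based Jaccard ignores how often a call occurs; cosine similarity on call-count vectors is the usual alternative when frequency matters.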
pip install pandas numpy scikit-learn matplotlib seaborn tensorflow keras
# Option 1: Local Jupyter
jupyter notebook
# Option 2: Google Colab
# Upload .ipynb files to Colab and run in browser
# Option 3: VS Code
# Open .ipynb files with Jupyter extension
- All playbooks use simulated/synthetic data for safety
- Real EMBER dataset can be substituted where noted
- No live malware execution required
- Sandbox reports are JSON simulations
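After running the install command, a quick check that every package is importable can save time before opening the notebooks. A small standard-library sketch with a hypothetical helper name; note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# pip package name -> top-level import name
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "tensorflow": "tensorflow",
    "keras": "keras",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import name cannot be located."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All packages found" if not missing else "Missing: " + ", ".join(missing))
```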
Recommended Order:
- 6.1.3A (Beginner) → Understand triage pipeline design
- 6.2.2A (Intermediate) → Master static feature engineering
- 6.3.1A (Intermediate) → Learn behavioral feature extraction
- 6.4.2A (Advanced) → Deep learning for sequences
- 6.5.1A (Intermediate) → Test model robustness
- 6.5.2A (Intermediate) → Handle polymorphic threats
Each playbook demonstrates:
- ✓ Realistic security scenarios and operational constraints
- ✓ End-to-end pipelines from data → features → models → evaluation
- ✓ SOC-aligned metrics (Precision@k, TPR@FPR, alert volume)
- ✓ Error analysis and failure mode understanding
- ✓ Integration with NIST/MITRE frameworks
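The SOC-aligned metrics listed above can be computed with a few lines of NumPy. These are illustrative definitions under stated assumptions (labels 1 = malware, 0 = benign; higher score means more suspicious) with hypothetical function names; the notebooks may compute them differently, for example via scikit-learn's `roc_curve`.

```python
import numpy as np


def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """TPR at the most permissive cutoff whose FPR does not exceed max_fpr.
    Only scores strictly above the cutoff are flagged, so ties can make the
    realized FPR lower than max_fpr, never higher. Returns (tpr, cutoff)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    benign_desc = np.sort(scores[labels == 0])[::-1]
    k = int(np.floor(max_fpr * len(benign_desc)))  # benign alerts we can afford
    cutoff = benign_desc[k] if k < len(benign_desc) else -np.inf
    flagged = scores > cutoff
    return float(flagged[labels == 1].mean()), float(cutoff)


def precision_at_k(scores, labels, k):
    """Fraction of true malware among the k highest-scoring samples (an analyst's queue)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return float(np.asarray(labels)[order[:k]].mean())


def expected_daily_alerts(daily_volume, prevalence, tpr, fpr):
    """Alerts per day = true positives + false positives at the given operating point."""
    return daily_volume * (prevalence * tpr + (1 - prevalence) * fpr)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(tpr_at_fpr(scores, labels, max_fpr=0.2))  # (0.75, 0.5)
print(precision_at_k(scores, labels, k=3))
print(expected_daily_alerts(100_000, prevalence=0.001, tpr=0.9, fpr=0.01))
```

The alert-volume example shows why FPR dominates at low prevalence: with 100,000 files per day, 0.1% malware, 90% TPR and 1% FPR, about 90 alerts are true and about 999 are false, roughly 11 false alerts per true one.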
- EMBER Dataset: https://github.com/elastic/ember
- Cuckoo Sandbox: https://cuckoosandbox.org/
- MITRE ATT&CK: https://attack.mitre.org/
- Malware Analysis Course: https://malwareunicorn.org/
For questions or issues:
- Check inline documentation in each notebook
- Review the main Exercises06.md for exercise specifications
- Consult Chapter06_Malware_Detection_Classification.md for theory
Generated for AI/ML Cybersecurity Course - Module 06