lponik/phishTriage

PhishTriage

PhishTriage is an explainable, offline phishing triage CLI for raw .eml files. It’s built to surface suspicious patterns quickly and produce analyst-readable findings with a simple risk score.

At A Glance

  • Offline CLI that triages raw emails and explains why something looks suspicious.
  • Focused on clear, readable findings instead of black-box scoring.

What It Does

  • Takes raw .eml emails and surfaces the signals an analyst would look for.
  • Checks for common phishing patterns and outputs evidence-backed findings.
  • Produces a simple risk score and a triage verdict to guide next steps.
  • Generates human-readable and machine-readable reports for review.
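
The score-and-verdict step described above could look roughly like the sketch below. The source does not specify the weights, thresholds, or verdict labels, so SEVERITY_WEIGHTS, VERDICT_THRESHOLDS, and the Finding class are all assumptions made for illustration, not the project's actual values.

```python
from dataclasses import dataclass

# Hypothetical weights and cut-offs, chosen only for illustration.
SEVERITY_WEIGHTS = {"low": 1, "medium": 3, "high": 5}
# Ordered from highest minimum score to lowest.
VERDICT_THRESHOLDS = [
    (8, "likely phishing"),
    (4, "suspicious"),
    (0, "no strong signals"),
]


@dataclass(frozen=True)
class Finding:
    detector: str  # e.g. "REPLY_TO_MISMATCH"
    severity: str  # one of the SEVERITY_WEIGHTS keys
    evidence: str  # human-readable explanation shown to the analyst


def score_findings(findings):
    """Sum the assumed weight of every finding."""
    return sum(SEVERITY_WEIGHTS[f.severity] for f in findings)


def verdict_for(score):
    """Return the label of the first threshold the score reaches."""
    for minimum, label in VERDICT_THRESHOLDS:
        if score >= minimum:
            return label
    return VERDICT_THRESHOLDS[-1][1]


findings = [
    Finding("REPLY_TO_MISMATCH", "high", "From and Reply-To domains differ"),
    Finding("URGENCY_KEYWORDS", "medium", "Subject contains 'immediately'"),
]
print(score_findings(findings), verdict_for(score_findings(findings)))
```

Keeping the evidence string on each finding is what lets a report explain the score rather than only state it.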

Core Functions

  • parse_eml: Turns a raw email into a consistent record for inspection.
  • extract_urls: Surfaces destinations so links can be reviewed quickly.
  • run_detectors: Applies targeted checks that map to known phishing behaviors.
  • compute_risk_score: Summarizes how many high-risk signals were found.
  • verdict_for_score: Translates the score into an action-oriented verdict.
  • render_terminal / render_json: Delivers results for humans or automation.
  • write_findings_csv / write_summary_md: Supports bulk triage and trend spotting.
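
As a rough illustration of the first two steps (parsing and URL extraction), the sketch below uses only Python's standard library. The function names parse_raw_email and find_urls, the record keys, and the URL regex are hypothetical stand-ins; the project's own parse_eml and extract_urls may be structured quite differently.

```python
import re
from email import message_from_bytes, policy
from email.utils import parseaddr

# Deliberately simple pattern; an assumption, not the project's regex.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def parse_raw_email(raw: bytes) -> dict:
    """Turn raw .eml bytes into a flat record (hypothetical shape)."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_parts, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename:
            attachments.append(filename)
        elif part.get_content_type() in ("text/plain", "text/html"):
            body_parts.append(part.get_content())
    return {
        "from_addr": parseaddr(str(msg.get("From", "")))[1].lower(),
        "reply_to_addr": parseaddr(str(msg.get("Reply-To", "")))[1].lower(),
        "subject": str(msg.get("Subject", "")),
        "body": "\n".join(body_parts),
        "attachments": attachments,
    }


def find_urls(text: str) -> list:
    """Return unique http(s) URLs in order of first appearance."""
    urls = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


SAMPLE = (
    b"From: IT Support <it@example.com>\r\n"
    b"Reply-To: helpdesk@evil.example\r\n"
    b"Subject: Password reset\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Verify at http://192.0.2.10/login now.\r\n"
)
record = parse_raw_email(SAMPLE)
print(record["from_addr"], find_urls(record["body"]))
```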

Detectors And What They Signal

Each detector represents a known phishing pattern. The goal is explainability over cleverness.

  • REPLY_TO_MISMATCH: Flags when the Reply-To domain doesn’t match the From domain. This is a classic indicator of sender impersonation and reply hijacking.
  • URL_SHORTENER: Shortened links can hide the true destination and bypass casual inspection.
  • URL_IP_HOST: URLs that use a raw IP address as the host are often used to avoid domain-based reputation checks.
  • URL_PUNYCODE: Punycode domains can enable homograph attacks (lookalike characters).
  • SUSPICIOUS_TLD: Certain TLDs are disproportionately abused in phishing campaigns.
  • URGENCY_KEYWORDS: Urgent language is a social-engineering pressure tactic.
  • SENDER_LINK_DOMAIN_MISMATCH: If a credential-themed email links off-domain, that is suspicious even without a known brand.
  • HTML_LOGIN_FORM: Embedded password fields in HTML emails strongly suggest credential harvesting, since legitimate senders rarely ask for a password inside the message itself.
  • EML_ATTACHMENT_CREDENTIAL_PHISH: Attached EMLs containing credential language and the recipient address are common in phishing kits.
  • SHORT_BODY_WITH_EML_ATTACHMENT: Minimal bodies with EML attachments often hide the real lure in the attachment.
  • SUSPICIOUS_ATTACHMENT: Executables, scripts, macro-enabled documents, and archives are common malware delivery methods.
  • DOUBLE_EXTENSION: Double extensions (for example, invoice.pdf.exe) are used to disguise executables as safe documents.
  • DOMAIN_TYPOSQUAT: A small edit distance from a known brand domain suggests typosquatting.
  • BRAND_LINK_MISMATCH: Brand mention plus off-brand links is a strong impersonation signal.
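
To make a few of these checks concrete, here is a minimal sketch of four of them, assuming simple string inputs. Every function name, both extension sets, and the max_distance default are hypothetical; the detector IDs are the only names taken from the list above, and the real rules in detectors.py may differ.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed extension sets, for illustration only.
RISKY_EXTENSIONS = {"exe", "scr", "js", "vbs", "bat", "cmd"}
DECOY_EXTENSIONS = {"pdf", "doc", "docx", "xls", "xlsx", "txt", "jpg", "png"}


def reply_to_mismatch(from_addr, reply_to_addr):
    """Return evidence text if the Reply-To domain differs from From."""
    if not reply_to_addr:
        return None
    from_domain = from_addr.rpartition("@")[2].lower()
    reply_domain = reply_to_addr.rpartition("@")[2].lower()
    if from_domain != reply_domain:
        return f"From domain {from_domain} != Reply-To domain {reply_domain}"
    return None


def url_host_flags(url):
    """Return detector IDs triggered by the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    flags = []
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        flags.append("URL_IP_HOST")
    except ValueError:
        pass
    if any(label.startswith("xn--") for label in host.split(".")):
        flags.append("URL_PUNYCODE")
    return flags


def has_double_extension(filename):
    """True for names like invoice.pdf.exe (decoy then risky extension)."""
    parts = filename.lower().split(".")
    return (
        len(parts) >= 3
        and parts[-1] in RISKY_EXTENSIONS
        and parts[-2] in DECOY_EXTENSIONS
    )


def edit_distance(a, b):
    """Levenshtein distance using a rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def typosquat_target(domain, brand_domains, max_distance=1):
    """Return the brand domain that `domain` nearly matches, if any."""
    for brand in brand_domains:
        if 0 < edit_distance(domain.lower(), brand) <= max_distance:
            return brand
    return None
```

Each function returns evidence or a detector ID rather than a bare boolean where practical, which fits the stated goal of explainability over cleverness.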

Quick Start

pip install -e .
phishtriage analyze tests/fixtures/phish_1.eml
phishtriage analyze tests/fixtures/phish_1.eml --json
phishtriage analyze-dir tests/fixtures --out reports/

Output Notes

  • Terminal output includes verdict, score, and evidence for each finding.
  • JSON output is structured for automation.
  • findings.csv supports bulk triage.
  • summary.md provides batch-level statistics.

Potential Bypasses And Drawbacks

This tool is intentionally lightweight and offline. That makes it portable and explainable, but it also creates gaps attackers can exploit.

  • Heuristic-based only. Well-crafted phishing that avoids these signals can pass.
  • No SPF/DKIM/DMARC checks. Sender authentication signals aren’t evaluated.
  • No live enrichment. Without reputation or domain-age lookups, compromised legitimate domains and recently registered lookalikes are not flagged.
  • Naive registrable-domain parsing (no Public Suffix List). Some subdomain tricks may evade checks or cause false positives.
  • Limited brand list. Phishers impersonating lesser-known brands may not trigger brand-specific rules.
  • HTML and URL tricks like encoded/obfuscated links, redirection chains, or JavaScript-based redirects are only partially addressed.
  • Social engineering without links or attachments can evade many checks.
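
The registrable-domain limitation is easy to demonstrate. The sketch below assumes the naive approach is "keep the last two labels"; the source does not say exactly how the project parses domains, so treat naive_registrable_domain as a hypothetical stand-in.

```python
def naive_registrable_domain(host: str) -> str:
    """Assumed heuristic: keep only the last two dot-separated labels."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


# Works for simple suffixes such as .com:
print(naive_registrable_domain("login.example.com"))

# Breaks for multi-label public suffixes such as .co.uk: two unrelated
# organizations collapse to the same value, so a sender/link domain
# mismatch between them would be missed.
sender = naive_registrable_domain("mail.bank.co.uk")
link = naive_registrable_domain("phish.attacker.co.uk")
print(sender, link, sender == link)
```

A Public Suffix List lookup would resolve these hosts to bank.co.uk and attacker.co.uk respectively, at the cost of shipping and updating the list.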

Bottom line: this is useful for general phishing triage and educational analysis, but it is not a comprehensive anti-phishing system.

Project Structure

  • phishtriage/core/parse_email.py: Email parsing and attachment extraction.
  • phishtriage/core/extract_urls.py: URL parsing and normalization.
  • phishtriage/core/detectors.py: Rule-based phishing detectors.
  • phishtriage/core/scoring.py: Risk scoring and verdict mapping.
  • phishtriage/core/report.py: Terminal, JSON, CSV, and summary outputs.
  • phishtriage/cli.py: CLI entrypoint.
  • tests/: Unit tests and .eml fixtures.
