IntelExtractor uses SecureBERT 2.0, Cisco's domain-specific language model for cybersecurity, to automatically extract Indicators of Compromise (IOCs) and related entities from threat intelligence reports, DFIR documents, and web pages. Extracted entities are sorted into five categories:

| Category | Examples |
|---|---|
| Indicators | IP addresses, domains, hashes, CVEs, emails |
| Malware | Malware families (Emotet, LockBit, Cobalt Strike) |
| Vulnerabilities | CVE IDs (CVE-2024-12345) |
| Organizations | Threat groups, security companies |
| Systems | Software, platforms, services |
Multiple Input Sources
- 📝 Text input - Paste threat intelligence directly
- 📁 File upload - PDF, TXT, CSV, JSON, MD files
- 🌐 URL scraping - Fetch content from web pages
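Whatever the source, the NER model only sees plain text. Below is a minimal sketch of normalizing an uploaded file into text. `load_text` is a hypothetical helper, not IntelExtractor's actual code. It assumes the upload arrives as a filename plus raw bytes, which is what Streamlit's uploaded-file object provides through `.name` and `.getvalue()`. PDF handling uses pdfplumber from the dependency list.

```python
from __future__ import annotations

import csv
import io
import json
from pathlib import Path


def load_text(filename: str, data: bytes) -> str:
    """Turn an uploaded file's raw bytes into plain text for the NER model.

    Hypothetical helper: the per-extension handling is an illustrative assumption.
    """
    suffix = Path(filename).suffix.lower()

    if suffix == ".pdf":
        # Imported lazily so text formats work even without pdfplumber installed.
        import pdfplumber

        with pdfplumber.open(io.BytesIO(data)) as pdf:
            # extract_text() returns None for pages with no text layer (e.g. scans).
            return "\n".join(page.extract_text() or "" for page in pdf.pages)

    # utf-8-sig also strips a leading byte-order mark if one is present.
    text = data.decode("utf-8-sig", errors="replace")

    if suffix == ".json":
        # Pretty-print so nested values land on separate lines; keep non-ASCII as-is.
        return json.dumps(json.loads(text), indent=2, ensure_ascii=False)

    if suffix == ".csv":
        # Join cells with spaces so each row reads like a line of prose.
        return "\n".join(" ".join(row) for row in csv.reader(io.StringIO(text)))

    if suffix in {".txt", ".md"}:
        return text

    raise ValueError(f"Unsupported file type: {suffix or filename}")
```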
Smart Processing
- Chunked processing for large documents
- Deduplication of extracted IOCs
- Extraction history with local storage
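Transformer encoders accept a bounded input length, so long reports have to be split before inference. Below is a minimal sketch of overlapping chunking plus case-insensitive deduplication. Window sizes are counted in characters as a rough stand-in for tokens. The sizes and the function names are assumptions, not the project's real settings.

```python
from __future__ import annotations


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[tuple[int, str]]:
    """Split text into overlapping windows, returning (start_offset, chunk) pairs."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append((start, text[start:end]))
        if end == len(text):
            break
        # Step back by `overlap` so text near the boundary is seen twice.
        start = end - overlap
    return chunks


def dedupe(entities: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop repeated (category, value) pairs, ignoring case and outer whitespace.

    Keeps the first-seen spelling and the original order.
    """
    seen: set[tuple[str, str]] = set()
    unique: list[tuple[str, str]] = []
    for category, value in entities:
        key = (category, value.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append((category, value.strip()))
    return unique
```

Consider an entity of up to 200 characters that straddles a chunk boundary. With the defaults, it still appears whole in the following chunk. The start offsets let spans found in a chunk be mapped back to the full document. The overlap also produces duplicate hits, which is one reason deduplication is needed.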
Easy Export
- Download extracted IOCs as a text file
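Below is a sketch of the export step. It assumes IOCs are held as a mapping from category name to a list of values. The `# Category` header format and the `format_iocs` name are illustrative assumptions, not the app's actual file layout.

```python
from __future__ import annotations


def format_iocs(iocs: dict[str, list[str]]) -> str:
    """Render IOCs as plain text: a header per non-empty category, one value per line."""
    sections = [
        "\n".join([f"# {category}", *values])
        for category, values in iocs.items()
        if values  # skip categories with no hits
    ]
    # Blank line between categories; empty string when nothing was found.
    return "\n\n".join(sections) + "\n" if sections else ""
```

In a Streamlit UI, a string like this can be passed as the data argument of `st.download_button`. For example: `st.download_button("Download IOCs", format_iocs(iocs), file_name="iocs.txt")`.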
Requirements: Python 3.10+ and the following packages:
torch>=2.1.0
transformers>=4.36.0
streamlit
pdfplumber
beautifulsoup4
requests
pandas
- Clone the repository
git clone https://github.com/stimway9-ops/IntelExtractor.git
cd IntelExtractor
- Install dependencies
pip install -r requirements.txt
- Run the application
python -m streamlit run app.py
- Open a browser and navigate to:
http://localhost:8501
Run the Streamlit app and use the tabs:
- Text Input Tab - Paste any threat report text
- File Upload Tab - Upload PDF reports, log files, or IOC exports
- URL Tab - Enter a URL to scrape and extract IOCs
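The dependency list includes requests and beautifulsoup4 for scraping. The sketch below instead uses only the standard library's `html.parser`, so it runs without extra packages. It assumes the page HTML has already been fetched, for example with `requests.get(url, timeout=10).text`. The class and function names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of script/style/noscript tags."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # >0 while inside a tag whose text should be ignored
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML page, one text fragment per line."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```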
For command-line extraction, run:
python extract_iocs.py
Then paste or type your threat intelligence text when prompted.
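How the script handles the prompt isn't documented here. One plausible shape reads everything until end of input, then prints the non-empty categories. `run_cli` and the injected `extract` callable are hypothetical. Injecting `extract` also lets the loop be exercised without loading the model.

```python
from __future__ import annotations

import sys
from typing import Callable, TextIO


def run_cli(
    extract: Callable[[str], dict[str, list[str]]],
    stdin: TextIO | None = None,
    stdout: TextIO | None = None,
) -> None:
    """Read a pasted report until end of input, then print each non-empty IOC category."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    print("Paste threat intelligence text, then press Ctrl-D (Ctrl-Z, Enter on Windows):", file=stdout)
    text = stdin.read()
    if not text.strip():
        print("No input received.", file=stdout)
        return
    for category, values in extract(text).items():
        if values:
            print(f"{category}: {', '.join(values)}", file=stdout)
```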
┌──────────────────┐     ┌──────────────────────┐     ┌──────────────────┐
│   Input Source   │ ──▶ │  SecureBERT 2.0 NER  │ ──▶ │  IOC Categories  │
│ (Text/File/URL)  │     │       AI Model       │     │ (5 entity types) │
└──────────────────┘     └──────────────────────┘     └──────────────────┘
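This assumes the model works with the Hugging Face `transformers` token-classification pipeline. Under that assumption, entities could come from `pipeline("token-classification", model="cisco-ai/SecureBERT2.0-NER", aggregation_strategy="simple")`, which returns dicts with `entity_group`, `score`, `word`, `start` and `end` keys.

The sketch below covers only the last stage of the diagram: sorting those dicts into the five categories. The label names in `LABEL_TO_CATEGORY` and the 0.5 score cutoff are guesses. Check the model's `config.id2label` for the real tag set.

```python
from __future__ import annotations

# Hypothetical label names; check the model's config.id2label for the real ones.
LABEL_TO_CATEGORY = {
    "Indicator": "Indicators",
    "Malware": "Malware",
    "Vulnerability": "Vulnerabilities",
    "Organization": "Organizations",
    "System": "Systems",
}


def group_entities(text: str, entities: list[dict], min_score: float = 0.5) -> dict[str, list[str]]:
    """Sort aggregated NER entities into IOC categories, dropping low-confidence hits."""
    grouped: dict[str, list[str]] = {c: [] for c in LABEL_TO_CATEGORY.values()}
    for ent in entities:
        category = LABEL_TO_CATEGORY.get(ent["entity_group"])
        if category is None or ent["score"] < min_score:
            continue  # unknown label or below the confidence cutoff
        start, end = ent.get("start"), ent.get("end")
        # Prefer slicing the source text: "word" is decoded from tokens and may differ in spacing.
        value = text[start:end] if start is not None and end is not None else ent["word"]
        value = value.strip()
        if value and value not in grouped[category]:
            grouped[category].append(value)
    return grouped
```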
The SecureBERT 2.0 NER model (cisco-ai/SecureBERT2.0-NER) is:
- Fine-tuned on a cybersecurity corpus
- Reported to achieve a 94.5% F1 score on NER tasks
- Able to recognize 5 entity types specific to threat intelligence
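Token-classification models like this one typically emit one tag per token in a BIO scheme: B- begins an entity, I- continues it, and O marks a token outside any entity. The transformers pipeline's aggregation strategies already merge these tags. The hand-rolled sketch below just makes the decoding explicit. It assumes whole-word tokens and tag names such as B-Malware, which may not match the model's actual labels.

```python
from __future__ import annotations


def merge_bio(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Merge per-token BIO tags (e.g. B-Malware, I-Malware, O) into (label, text) spans."""
    spans: list[tuple[str, list[str]]] = []
    current_label = None  # label of the span currently open, if any
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current_label = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            spans[-1][1].append(token)  # continue the open span
        else:
            # A B- tag, or an I- tag with no matching open span, starts a new entity.
            spans.append((label, [token]))
            current_label = label
    return [(label, " ".join(words)) for label, words in spans]
```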
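After the initial model download from Hugging Face, all text processing happens locally on your machine. To stop the app from contacting the Hugging Face Hub at all, enable offline mode (the URL tab still fetches the pages you ask it to):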
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

Project structure:
IntelExtractor/
├── app.py # Streamlit web UI
├── extract_iocs.py # CLI extraction script
├── requirements.txt # Python dependencies
└── README.md # This file
Input text:
The Emotet malware is being distributed via malicious documents.
Researchers at Cisco Talos identified the campaign targeting financial
institutions. The attack used Cobalt Strike beacon at 10.0.0.25 communicating
with evil.example.net. Vulnerability CVE-2021-44228 was exploited.
Output:
- Indicators: 10.0.0.25, evil.example.net, CVE-2021-44228
- Malware: Emotet, Cobalt Strike beacon
- Organizations: Cisco Talos, financial institutions
Licensed under the Apache License 2.0. See the LICENSE file for details.
- Cisco AI - SecureBERT 2.0 model
- Hugging Face - Model hosting
- SecureBERT 2.0 Paper - Research paper