# Threat Intelligence Data Collection for LLM-TIKG

This notebook collects and preprocesses threat intelligence reports for building a Threat Intelligence Knowledge Graph (TIKG) with Large Language Models, following the methodology of the LLM-TIKG paper.

## Overview

The notebook covers:
1. **Web scraping setup** for major threat intelligence platforms
2. **Data cleaning and preprocessing** functions
3. **Text extraction** with paragraph structure preservation
4. **Data validation** and quality checks
5. **Structured storage** for downstream processing

## Target Platforms
- Symantec Security Center
- Fortinet Threat Intelligence
- TrendMicro Security Blog
- CISA Advisories

## Methodology
Following the LLM-TIKG paper, we focus on collecting high-quality threat intelligence data that preserves:
- Contextual relationships between entities
- Temporal information
- Technical details and indicators
- Structured narrative flow


## 1. Environment Setup and Imports


In [1]:
# Core libraries
import requests
from bs4 import BeautifulSoup, Tag
from typing import cast
import pandas as pd
import numpy as np
import json
import re
import time
import random
import logging
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Any
from urllib.parse import urljoin, urlparse
import warnings

# NLP libraries
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.chunk import ne_chunk
from nltk.tag import pos_tag

# Web scraping utilities
from fake_useragent import UserAgent
import lxml

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress warnings
warnings.filterwarnings('ignore')

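# Download required NLTK data (each resource lives under its own nltk_data subdirectory)
nltk_downloads = {
    'punkt': 'tokenizers/punkt',
    'stopwords': 'corpora/stopwords',
    'averaged_perceptron_tagger': 'taggers/averaged_perceptron_tagger',
    'maxent_ne_chunker': 'chunkers/maxent_ne_chunker',
    'words': 'corpora/words',
}
for item, resource_path in nltk_downloads.items():
    try:
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(item, quiet=True)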

print("✅ All required libraries imported successfully")


✅ All required libraries imported successfully


## 2. Configuration and Setup

This section sets up logging, the data directories, and the scraping configuration (request rate limit, timeout, retry count, batch size, and default request headers).


In [2]:
# Data directories
DATA_DIR = Path('../data')
RAW_DATA_DIR = DATA_DIR / 'raw'
PROCESSED_DATA_DIR = DATA_DIR / 'processed'

# Create directories first: logging.FileHandler fails if the log directory is missing
for dir_path in [RAW_DATA_DIR, PROCESSED_DATA_DIR]:
    dir_path.mkdir(parents=True, exist_ok=True)

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(RAW_DATA_DIR / 'scraping.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

# Scraping configuration
SCRAPING_CONFIG = {
    'rate_limit': 2.0,  # seconds between requests
    'timeout': 30,      # request timeout
    'max_retries': 3,   # maximum retry attempts
    'batch_size': 50,   # articles per batch
    'headers': {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
    }
}

# Initialize user agent rotator
ua = UserAgent()

print("✅ Configuration setup complete")
print(f"📁 Raw data directory: {RAW_DATA_DIR}")
print(f"📁 Processed data directory: {PROCESSED_DATA_DIR}")


✅ Configuration setup complete
📁 Raw data directory: ..\data\raw
📁 Processed data directory: ..\data\processed


## 3. Core Web Scraping Infrastructure

Building robust web scraping infrastructure with rate limiting, error handling, and retry mechanisms.
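The retry mechanism relies on exponential backoff with jitter. As a minimal sketch of that delay schedule in isolation: the helper name `backoff_delay` and the `cap` parameter are assumptions introduced here for illustration, while the `2 ** attempt` base and the jitter of up to 2 seconds mirror the scraper's retry branch.

```python
import random


def backoff_delay(attempt: int, base: float = 2.0, jitter: float = 2.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based).

    `cap` is a hypothetical upper bound added for this sketch; the scraper
    itself does not cap the delay.
    """
    delay = base ** attempt + random.random() * jitter
    return min(delay, cap)


# Print one possible schedule; values vary between runs because of the jitter.
for attempt in range(4):
    print(f"after failed attempt {attempt + 1}: sleep about {backoff_delay(attempt):.2f}s")
```

The jitter spreads out retries from concurrent clients so they do not hit the server at the same instant, and a cap keeps late retries from waiting unreasonably long.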


In [3]:
class ThreatIntelligenceScraper:
    """
    A robust web scraper for threat intelligence data collection.
    Implements rate limiting, error handling, and data validation.
    """
    
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.session = requests.Session()
        self.session.headers.update(config['headers'])
        self.last_request_time = 0
        self.ua = UserAgent()
        
    def _rate_limit(self):
        """Implement rate limiting between requests with random delay."""
        elapsed = time.time() - self.last_request_time
        base_sleep_time = self.config['rate_limit'] - elapsed
        
        # Add random delay if configured
        if 'random_delay' in self.config:
            min_delay, max_delay = self.config['random_delay']
            random_delay = min_delay + (max_delay - min_delay) * np.random.random()
            base_sleep_time += random_delay
        
        if base_sleep_time > 0:
            time.sleep(base_sleep_time)
        self.last_request_time = time.time()
    
    def _get_page(self, url: str, max_retries: int = 0) -> Optional[BeautifulSoup]:
        """Fetch a web page with enhanced retry logic and error handling."""
        if max_retries == 0:
            max_retries = self.config['max_retries'] or 1
        
        for attempt in range(max_retries + 1):
            try:
                self._rate_limit()
                
                # Rotate user agent and add additional headers
                self.session.headers.update({
                    'User-Agent': self.ua.random,
                    'Cache-Control': 'no-cache',
                    'Pragma': 'no-cache',
                    'DNT': '1'  # Do Not Track
                })
                
                # Log request attempt
                logger.info(f"Attempting to fetch {url} (attempt {attempt + 1}/{max_retries + 1})")
                
                response = self.session.get(
                    url, 
                    timeout=self.config['timeout'],
                    allow_redirects=True,
                    verify=True  # Enforce SSL verification
                )
                
                # Check response before processing
                if response.status_code == 403:
                    logger.warning(f"Access forbidden (403) for {url}")
                    if attempt < max_retries:
                        time.sleep(5 ** attempt)  # Longer delay for 403s
                        continue
                    return None  # Out of retries: skip the pointless final sleep
                    
                response.raise_for_status()
                
                # Check content type
                content_type = response.headers.get('content-type', '').lower()
                if 'text/html' not in content_type:
                    logger.warning(f"Unexpected content type: {content_type}")
                
                # Parse with BeautifulSoup with error handling
                try:
                    soup = BeautifulSoup(response.content, 'lxml')
                    
                    # Basic validation of parsed content
                    if not soup.find('body'):
                        logger.warning("No <body> tag found in response")
                        if attempt < max_retries:
                            continue
                    
                    logger.info(f"Successfully scraped: {url}")
                    return soup
                    
                except Exception as parse_error:
                    logger.error(f"Failed to parse HTML: {str(parse_error)}")
                    if attempt < max_retries:
                        continue
                    return None
                
            except requests.exceptions.RequestException as e:
                logger.warning(f"Attempt {attempt + 1} failed for {url}: {e}")
                
                # Classify by exception type rather than message text;
                # SSLError is checked first because it subclasses ConnectionError
                if isinstance(e, requests.exceptions.SSLError):
                    logger.error("SSL verification failed")
                elif isinstance(e, requests.exceptions.Timeout):
                    logger.error("Request timed out")
                elif isinstance(e, requests.exceptions.ConnectionError):
                    logger.error("Failed to establish connection")
                
                if attempt < max_retries:
                    # Exponential backoff with jitter
                    delay = (2 ** attempt) + (random.random() * 2)
                    time.sleep(delay)
                else:
                    logger.error(f"Failed to scrape {url} after {max_retries + 1} attempts")
                    return None
        
        return None  # All attempts exhausted without a usable page
    
    def extract_text_content(self, soup: BeautifulSoup, selectors: Dict[str, str]) -> Dict[str, Any]:
        """Extract structured text content from HTML using CSS selectors."""
        content = {}
        
        for field, selector in selectors.items():
            try:
                elements = soup.select(selector)
                if elements:
                    if field in ['title', 'date', 'author']:
                        content[field] = elements[0].get_text(strip=True)
                    else:
                        # Preserve paragraph structure for main content
                        content[field] = [elem.get_text(strip=True) for elem in elements if elem.get_text(strip=True)]
                else:
                    content[field] = None
            except Exception as e:
                logger.warning(f"Failed to extract {field}: {str(e)}")
                content[field] = None
        
        return content
    
    def validate_content(self, content: Dict[str, Any]) -> bool:
        """Validate extracted content quality."""
        # Check if essential fields are present
        required_fields = ['title', 'content']
        for field in required_fields:
            if not content.get(field):
                return False
        
        # Check content length (avoid very short articles)
        if isinstance(content['content'], list):
            total_length = sum(len(p) for p in content['content'])
        else:
            total_length = len(content['content']) if content['content'] else 0
        
        return total_length > 50  # Reduced from 100 to 50 for testing

# Initialize scraper
scraper = ThreatIntelligenceScraper(SCRAPING_CONFIG)
print("✅ Threat Intelligence Scraper initialized")

✅ Threat Intelligence Scraper initialized
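Note that `_rate_limit` only adds a random delay when the configuration contains a `random_delay` key, and `SCRAPING_CONFIG` as defined above has none, so that branch stays inactive. The sketch below isolates the sleep-time calculation as a pure function so it can be checked without sleeping. The function name `compute_sleep` and the example range `(0.5, 1.5)` are assumptions for illustration, not values from the notebook.

```python
import random
from typing import Optional, Tuple


def compute_sleep(elapsed: float, rate_limit: float,
                  random_delay: Optional[Tuple[float, float]] = None) -> float:
    """Return how long to sleep so requests are at least `rate_limit` seconds apart.

    `random_delay` is an optional (min, max) range in seconds that is added
    on top of the base wait, mirroring the 'random_delay' config key.
    """
    sleep_for = rate_limit - elapsed
    if random_delay is not None:
        low, high = random_delay
        sleep_for += random.uniform(low, high)
    # A negative value means enough time has already passed.
    return max(sleep_for, 0.0)


print(compute_sleep(elapsed=0.5, rate_limit=2.0))                           # base wait only
print(compute_sleep(elapsed=0.5, rate_limit=2.0, random_delay=(0.5, 1.5)))  # base wait plus jitter
```

To enable the jitter in the scraper itself, one option would be to add an entry such as `'random_delay': (0.5, 1.5)` to `SCRAPING_CONFIG` before constructing `ThreatIntelligenceScraper`.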


## 4. Platform-Specific Scrapers

Implementing specialized scrapers for each threat intelligence platform with their unique HTML structures.


In [4]:
# Platform-specific scrapers
class CISAScraper:
    """Enhanced scraper for CISA (Cybersecurity and Infrastructure Security Agency) advisories."""
    
    def __init__(self, base_scraper: ThreatIntelligenceScraper):
        self.scraper = base_scraper
        self.base_url = "https://www.cisa.gov"
        self.advisories_url = "https://www.cisa.gov/news-events/cybersecurity-advisories"
        
        self.selectors = {
            'title': '.title, .advisory-title, .usa-accordion__heading, h1.usa-prose',
            'date': '.published-date, .date-published, time, .usa-prose time',
            'content': '.usa-prose p, .field--type-text-with-summary p, .field--name-body p, article p',
            'severity': '.severity, .risk-level, .tlp-label',
            'advisory_id': '.advisory-id, .alert-code, .reference-number'
        }
    
    def get_article_links(self, max_pages: int = 5) -> List[str]:
        """Extract article links from CISA advisories with pagination support."""
        links = []
        
        try:
            for page in range(1, max_pages + 1):
                # Assumes a zero-based ?page= parameter (page=1 is the second listing page)
                page_url = f"{self.advisories_url}?page={page - 1}" if page > 1 else self.advisories_url
                print(f"   Fetching CISA page {page}/{max_pages}: {page_url}")
                
                soup = self.scraper._get_page(page_url)
                if not soup:
                    print(f"   ⚠️ Failed to fetch page {page}")
                    break
                
                # Try multiple selector patterns
                selectors = [
                    'a[href*="/advisory/"]',
                    'a[href*="/alert/"]',
                    '.views-row a',
                    '.usa-collection__item a',
                    'article a'
                ]
                
                page_links_count = 0
                for selector in selectors:
                    article_links = soup.select(selector)
                    for link in article_links:
                        href = link.get('href')
                        if href and ('/advisory/' in href or '/alert/' in href):
                            full_url = urljoin(self.base_url, href)
                            if full_url not in links:
                                links.append(full_url)
                                page_links_count += 1
                
                print(f"      Found {page_links_count} new articles on page {page}")
                if page_links_count == 0:  # No more articles found
                    break
                            
        except Exception as e:
            print(f"   ❌ Error getting CISA article links: {str(e)}")
            logger.error(f"CISA link extraction failed: {str(e)}")
        
        print(f"   Total CISA articles found: {len(links)}")
        return links

print("✅ CISA scrapers initialized")


✅ CISA scrapers initialized
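The link-collection step (filter by path marker, resolve relative URLs, drop duplicates) can be exercised without any network access. The following is a rough sketch under stated assumptions: `normalize_links` is a hypothetical helper, and the sample hrefs are made up for illustration rather than taken from the CISA site.

```python
from typing import Iterable, List, Optional
from urllib.parse import urljoin


def normalize_links(base_url: str, hrefs: Iterable[Optional[str]],
                    markers: Iterable[str]) -> List[str]:
    """Keep hrefs containing any marker, resolve them against base_url, and deduplicate in order."""
    marker_list = tuple(markers)
    seen = set()
    links: List[str] = []
    for href in hrefs:
        if not href or not any(marker in href for marker in marker_list):
            continue
        full_url = urljoin(base_url, href)  # absolute hrefs pass through unchanged
        if full_url not in seen:
            seen.add(full_url)
            links.append(full_url)
    return links


# Hypothetical hrefs, as they might come from link.get('href')
sample_hrefs = [
    "/advisory/example-1",
    "/about",
    "https://www.cisa.gov/advisory/example-1",  # duplicate once resolved
    "/alert/example-2",
    None,
]
print(normalize_links("https://www.cisa.gov", sample_hrefs, ["/advisory/", "/alert/"]))
```

Tracking seen URLs in a set keeps the duplicate check constant-time, which matters more as the number of collected links grows than the `full_url not in links` list scan does.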


In [11]:
class FortinetScraper:
    """Corrected scraper for Fortinet Threat Research blog using URL path extensions."""
    
    def __init__(self, base_scraper):
        self.scraper = base_scraper
        self.base_url = "https://www.fortinet.com"
        self.blog_url = "https://www.fortinet.com/blog/threat-research"
        self.load_more_url = "https://www.fortinet.com/content/fortinet-blog/us/en/threat-research/jcr:content/root/bloglist"
        
        self.selectors = {
            'title': 'h1, h2, .title, .headline',
            'date': '.date, .time, .published, time',
            'content': 'p, .content, .text, .body',
            'author': '.author, .byline, .writer',
            'category': '.category, .tags, .topic'
        }
    
    def get_article_links(self, max_pages: int = 30) -> List[str]:
        """Extract article links from Fortinet blog using URL path extensions for pagination."""
        links = []
        
        try:
            # First get the initial page (page 0)
            print(f"   Fetching initial page: {self.blog_url}")
            soup = self.scraper._get_page(self.blog_url)
            if soup:
                # Find article links on the first page
                article_links = soup.find_all('a', href=True)
                for link in article_links:
                    if isinstance(link, Tag):
                        href = link.get('href')
                        if isinstance(href, str):
                            href_lower = href.lower()
                            if '/blog/' in href_lower and 'threat' in href_lower:
                                if href.startswith('/'):
                                    full_url = urljoin(self.base_url, href)
                                else:
                                    full_url = href
                                if full_url not in links and 'threat-research' in full_url:
                                    links.append(full_url)
                
                print(f"      Found {len(links)} articles on initial page")
                
                # Now use the URL path extension approach for pagination
                for page in range(1, max_pages + 1):
                    print(f"   Loading page {page}/{max_pages}")
                    
                    # Use URL path extension approach
                    page_url = f"{self.load_more_url}.{page}"
                    
                    try:
                        self.scraper._rate_limit()
                        response = self.scraper.session.get(
                            page_url,
                            timeout=self.scraper.config['timeout']
                        )
                        response.raise_for_status()
                        
                        # Parse HTML content from response
                        soup = BeautifulSoup(response.text, 'lxml')
                        article_links = soup.find_all('a', href=True)
                        
                        new_links = 0
                        for link in article_links:
                            if isinstance(link, Tag):
                                href = link.get('href')
                                if isinstance(href, str):
                                    href_lower = href.lower()
                                    if '/blog/' in href_lower and 'threat' in href_lower:
                                        if href.startswith('/'):
                                            full_url = urljoin(self.base_url, href)
                                        else:
                                            full_url = href
                                        if full_url not in links and 'threat-research' in full_url:
                                            links.append(full_url)
                                            new_links += 1
                        
                        print(f"      Found {new_links} new articles on page {page}")
                        
                        if new_links == 0:
                            print("      No more new articles found")
                            break
                        
                        # Stop if we've collected enough articles
                        if len(links) >= 300:  # Safety limit
                            print(f"      Reached article limit (300)")
                            break
                            
                    except Exception as e:
                        print(f"      Error loading page {page}: {str(e)}")
                        break
                            
        except Exception as e:
            print(f"   ❌ Error getting Fortinet article links: {str(e)}")
            logger.error(f"Fortinet link extraction failed: {str(e)}")
        
        print(f"   Total Fortinet articles found: {len(links)}")
        return links
    
    def scrape_article(self, url: str) -> Optional[Dict[str, Any]]:
        """Scrape a single Fortinet blog post with improved validation."""
        print(f"\n🔍 Scraping Fortinet article: {url}")
        
        soup = self.scraper._get_page(url)
        if not soup:
            print("   ❌ Failed to fetch article")
            return None
        
        content = self.scraper.extract_text_content(soup, self.selectors)
        content['url'] = url
        content['source'] = 'Fortinet'
        content['scraped_at'] = datetime.now().isoformat()
        
        if self.scraper.validate_content(content):
            print("   ✅ Successfully scraped article")
            print(f"      Title: {content.get('title', 'N/A')[:100]}")
            print(f"      Content length: {len(str(content.get('content', '')))}")
            return content
            
        print("   ❌ Content validation failed")
        return None

print("✅  Fortinet scraper initialized")


✅  Fortinet scraper initialized


In [6]:
class SymantecScraper:
    """Enhanced scraper for Symantec Security Center blog posts with pagination."""
    
    def __init__(self, base_scraper: ThreatIntelligenceScraper):
        self.scraper = base_scraper
        self.base_url = "https://symantec-enterprise-blogs.security.com"
        self.blog_url = "https://symantec-enterprise-blogs.security.com/blogs/threat-research"
        
        self.selectors = {
            'title': 'h1, .post-title, .entry-title',
            'date': '.date, .published, .post-date',
            'content': '.post-content p, .entry-content p, .content p',
            'author': '.author, .byline',
            'tags': '.tags, .categories'
        }
    
    def get_article_links(self, max_pages: int = 30) -> List[str]:
        """Extract article links from Symantec blog with pagination support."""
        links = []
        
        try:
            for page in range(1, max_pages + 1):
                # Try different pagination URL patterns (page 1 is always the base URL,
                # so it is listed once to avoid fetching the same page three times)
                if page == 1:
                    page_urls = [self.blog_url]
                else:
                    page_urls = [
                        f"{self.blog_url}?page={page - 1}",
                        f"{self.blog_url}/page/{page}",
                        f"{self.blog_url}?p={page}"
                    ]
                
                page_found = False
                for page_url in page_urls:
                    print(f"   Trying page {page}/{max_pages}: {page_url}")
                    
                    soup = self.scraper._get_page(page_url)
                    if not soup:
                        continue
                    
                    # Try multiple selector patterns
                    selectors = [
                        'article a',
                        '.post-title a',
                        '.entry-title a',
                        'a[href*="/blogs/threat-research/"]'
                    ]
                    
                    page_links_count = 0
                    for selector in selectors:
                        article_links = soup.select(selector)
                        for link in article_links:
                            if isinstance(link, Tag):
                                href = link.get('href')
                                if isinstance(href, str) and ('/blogs/' in href or '/threat-research/' in href):
                                    if href.startswith('/'):
                                        full_url = urljoin(self.base_url, href)
                                    else:
                                        full_url = href
                                    if full_url not in links and 'threat-research' in full_url:
                                        links.append(full_url)
                                        page_links_count += 1
                    
                    if page_links_count > 0:
                        print(f"      Found {page_links_count} new articles")
                        page_found = True
                        break  # Found articles on this page, no need to try other URL patterns
                    
                if not page_found:
                    print(f"      No articles found on page {page}, stopping pagination")
                    break  # No articles found with any URL pattern, likely reached the end
                
                # Stop if we've collected enough articles
                if len(links) >= 300:  # Safety limit
                    print(f"      Reached article limit (300)")
                    break
                            
        except Exception as e:
            print(f"   ❌ Error getting Symantec article links: {str(e)}")
            logger.error(f"Symantec link extraction failed: {str(e)}")
        
        print(f"   Total Symantec articles found: {len(links)}")
        return links
    
    def scrape_article(self, url: str) -> Optional[Dict[str, Any]]:
        """Scrape a single Symantec blog post with improved validation."""
        print(f"\n🔍 Scraping Symantec article: {url}")
        
        soup = self.scraper._get_page(url)
        if not soup:
            print("   ❌ Failed to fetch article")
            return None
        
        content = self.scraper.extract_text_content(soup, self.selectors)
        content['url'] = url
        content['source'] = 'Symantec'
        content['scraped_at'] = datetime.now().isoformat()
        
        if self.scraper.validate_content(content):
            print("   ✅ Successfully scraped article")
            print(f"      Title: {content.get('title', 'N/A')[:100]}")
            print(f"      Content length: {len(str(content.get('content', '')))}")
            return content
            
        print("   ❌ Content validation failed")
        return None

print("✅ Symantec scrapers initialized")

✅ Symantec scrapers initialized
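When the pagination scheme of a site is not known in advance, several URL patterns can be tried per page. A small sketch of generating those candidates follows. The function name `candidate_page_urls` and the `example.com` URL are assumptions for illustration, while the three patterns are the ones the Symantec scraper tries.

```python
from typing import List


def candidate_page_urls(blog_url: str, page: int) -> List[str]:
    """Return the distinct URLs worth trying for a 1-based page number."""
    if page <= 1:
        # Every pattern collapses to the listing URL itself on the first page.
        return [blog_url]
    candidates = [
        f"{blog_url}?page={page - 1}",  # zero-based query parameter
        f"{blog_url}/page/{page}",      # path-style pagination
        f"{blog_url}?p={page}",         # one-based query parameter
    ]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(candidates))


for page in (1, 3):
    print(page, candidate_page_urls("https://example.com/blog", page))
```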


## 5. Text Processing and Preprocessing

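Text processing functions that clean and normalize the collected articles, extract indicators (IP addresses, domains, hashes, CVE IDs, email addresses, URLs), and compute a keyword-based threat relevance score.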


In [7]:
class ThreatIntelligenceProcessor:
    """
    Advanced text processor for threat intelligence data.
    Handles cleaning, normalization, and structure preservation.
    """
    
    def __init__(self):
        self.stop_words = set(stopwords.words('english'))
        
        # Threat intelligence specific patterns
        self.patterns = {
            'ip_address': re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b'),
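            # Non-capturing groups and repeated labels, so multi-label names such as a.b.example match whole
            'domain': re.compile(r'\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}\b'),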
            'hash_md5': re.compile(r'\b[a-fA-F0-9]{32}\b'),
            'hash_sha1': re.compile(r'\b[a-fA-F0-9]{40}\b'),
            'hash_sha256': re.compile(r'\b[a-fA-F0-9]{64}\b'),
            'cve': re.compile(r'CVE-\d{4}-\d{4,7}', re.IGNORECASE),
            'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
            'url': re.compile(r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w)*)?)?'),
        }
        
        # Threat keywords for relevance scoring
        self.threat_keywords = [
            'apt', 'advanced persistent threat', 'ransomware', 'trojan', 'backdoor',
            'botnet', 'malware', 'phishing', 'spear phishing', 'zero-day',
            'exploit', 'vulnerability', 'attack', 'campaign', 'threat actor'
        ]
    
    def clean_text(self, text: str) -> str:
        """Clean and normalize text while preserving important technical details."""
        if not text:
            return ""
        
        # Remove HTML entities and extra whitespace
        text = re.sub(r'&[a-zA-Z0-9#]+;', ' ', text)
        text = re.sub(r'\s+', ' ', text)
        
        # Re-insert the space lost when sentences are glued together (e.g. "end.Next");
        # note that this also splits tokens such as "U.S.A" or "cmd.EXE"
        text = re.sub(r'([.!?])([A-Z])', r'\1 \2', text)
        
        return text.strip()
    
    def extract_indicators(self, text: str) -> Dict[str, List[str]]:
        """Extract cybersecurity indicators from text."""
        indicators = {}
        
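        for indicator_type, pattern in self.patterns.items():
            # finditer + group(0) yields the full match even if a pattern contains capture groups
            matches = {m.group(0) for m in pattern.finditer(text)}
            indicators[indicator_type] = sorted(matches)  # Deduplicated, stable order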
        
        return indicators
    
    def calculate_threat_relevance_score(self, text: str) -> float:
        """Calculate how relevant the text is to threat intelligence."""
        if not text:
            return 0.0
        
        text_lower = text.lower()
        score = 0.0
        
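        # Count threat-related keywords at word starts, so 'apt' no longer matches
        # 'adapt' or 'capture' while plurals such as 'attacks' still count
        for keyword in self.threat_keywords:
            count = len(re.findall(rf'\b{re.escape(keyword)}', text_lower))
            score += count * 0.1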
        
        # Bonus for technical indicators
        indicators = self.extract_indicators(text)
        for indicator_type, matches in indicators.items():
            if matches:
                score += len(matches) * 0.2
        
        # Normalize by text length
        text_length = len(text.split())
        if text_length > 0:
            score = score / (text_length / 100)  # Per 100 words
        
        return min(score, 10.0)  # Cap at 10.0
    
    def process_article(self, article_data: Dict[str, Any]) -> Dict[str, Any]:
        """Process a complete article with all preprocessing steps."""
        processed = article_data.copy()
        
        # Clean title
        if processed.get('title'):
            processed['title'] = self.clean_text(processed['title'])
        
        # Process content
        if processed.get('content'):
            if isinstance(processed['content'], list):
                # Preserve paragraph structure
                cleaned_paragraphs = []
                for paragraph in processed['content']:
                    cleaned = self.clean_text(paragraph)
                    if len(cleaned) > 10:  # Filter out very short paragraphs
                        cleaned_paragraphs.append(cleaned)
                
                processed['paragraphs'] = cleaned_paragraphs
                processed['full_text'] = ' '.join(cleaned_paragraphs)
            else:
                # Single string content
                clean_content = self.clean_text(processed['content'])
                processed['full_text'] = clean_content
                processed['paragraphs'] = [clean_content]
        
        # Extract indicators and calculate relevance
        if processed.get('full_text'):
            processed['indicators'] = self.extract_indicators(processed['full_text'])
            processed['threat_relevance_score'] = self.calculate_threat_relevance_score(processed['full_text'])
        
        # Add processing metadata
        processed['processed_at'] = datetime.now().isoformat()
        processed['processing_version'] = '1.0'
        
        return processed

# Initialize processor
processor = ThreatIntelligenceProcessor()
print("✅ Threat Intelligence Processor initialized")
print(f"🔍 Monitoring {len(processor.patterns)} indicator types")
print(f"📝 Tracking {len(processor.threat_keywords)} threat keywords")


✅ Threat Intelligence Processor initialized
🔍 Monitoring 8 indicator types
📝 Tracking 15 threat keywords
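One subtlety of regex-based indicator extraction is that `re.findall` returns the contents of the capturing group, not the whole match, when a pattern contains exactly one capturing group. The sketch below illustrates this and shows extraction via `finditer`. The `PATTERNS` subset, the standalone `extract_indicators` function, and the sample sentence are simplified assumptions for illustration (the IP address and domain are reserved documentation values).

```python
import re
from typing import Dict, List, Pattern

# A reduced, illustrative subset of indicator patterns.
PATTERNS: Dict[str, Pattern[str]] = {
    "ip_address": re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"),
    "cve": re.compile(r"CVE-\d{4}-\d{4,7}", re.IGNORECASE),
    "hash_md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "domain": re.compile(r"\b[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,}\b"),
}


def extract_indicators(text: str) -> Dict[str, List[str]]:
    """Return sorted, deduplicated full matches for every pattern."""
    return {
        name: sorted({m.group(0) for m in pattern.finditer(text)})
        for name, pattern in PATTERNS.items()
    }


sample = (
    "The loader contacted 203.0.113.7 and malicious-example.org, exploiting "
    "CVE-2021-44228; payload MD5 d41d8cd98f00b204e9800998ecf8427e was dropped."
)
print(extract_indicators(sample))

# The pitfall: the same domain pattern with a capturing group.
capturing = re.compile(r"\b[a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,}\b")
print(capturing.findall("see example.org"))                         # group contents only
print([m.group(0) for m in capturing.finditer("see example.org")])  # full matches
```

Sorting the deduplicated matches also makes the output order reproducible, which a plain `list(set(...))` does not guarantee.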


## 6. Data Validation and Quality Checks

Validation checks for required fields, title and content length, threat relevance, and indicator presence, combined into a 10-point quality score that decides whether an article is approved.


In [8]:
class DataValidator:
    """
    Comprehensive data validation for threat intelligence articles.
    Ensures data quality and consistency for downstream processing.
    """
    
    def __init__(self):
        self.validation_rules = {
            'min_content_length': 100,
            'max_content_length': 50000,
            'min_title_length': 5,
            'max_title_length': 200,
            'min_threat_score': 0.1,
            'required_fields': ['title', 'content', 'source', 'url']
        }
    
    def validate_structure(self, article: Dict[str, Any]) -> Tuple[bool, List[str]]:
        """Validate article structure and required fields."""
        errors = []
        
        # Check required fields
        for field in self.validation_rules['required_fields']:
            if not article.get(field):
                errors.append(f"Missing required field: {field}")
        
        # Validate title
        title = article.get('title') or ''  # Title may be None when extraction failed
        if len(title) < self.validation_rules['min_title_length']:
            errors.append(f"Title too short: {len(title)} chars")
        elif len(title) > self.validation_rules['max_title_length']:
            errors.append(f"Title too long: {len(title)} chars")
        
        # Validate content length
        content_length = 0
        if article.get('full_text'):
            content_length = len(article['full_text'])
        elif article.get('content'):
            if isinstance(article['content'], list):
                content_length = sum(len(p) for p in article['content'])
            else:
                content_length = len(article['content'])
        
        if content_length < self.validation_rules['min_content_length']:
            errors.append(f"Content too short: {content_length} chars")
        elif content_length > self.validation_rules['max_content_length']:
            errors.append(f"Content too long: {content_length} chars")
        
        return len(errors) == 0, errors
    
    def generate_quality_report(self, article: Dict[str, Any]) -> Dict[str, Any]:
        """Generate comprehensive quality report for an article."""
        report = {
            'article_id': article.get('url', 'unknown'),
            'source': article.get('source', 'unknown'),
            'validation_timestamp': datetime.now().isoformat()
        }
        
        # Structure validation
        structure_valid, structure_errors = self.validate_structure(article)
        report['structure'] = {
            'valid': structure_valid,
            'errors': structure_errors
        }
        
        # Calculate overall quality score
        quality_score = 0
        if structure_valid:
            quality_score += 4
        
        # Check threat relevance
        threat_score = article.get('threat_relevance_score', 0.0)
        if threat_score >= self.validation_rules['min_threat_score']:
            quality_score += 3
        
        # Check for technical indicators
        indicators = article.get('indicators', {})
        total_indicators = sum(len(inds) for inds in indicators.values())
        if total_indicators > 0:
            quality_score += 3
        
        report['overall'] = {
            'score': quality_score,
            'max_score': 10,
            'grade': self._calculate_grade(quality_score, 10),
            'approved': quality_score >= 7
        }
        
        return report
    
    def _calculate_grade(self, score: float, max_score: float) -> str:
        """Calculate letter grade based on score."""
        percentage = (score / max_score) * 100
        if percentage >= 90:
            return 'A'
        elif percentage >= 80:
            return 'B'
        elif percentage >= 70:
            return 'C'
        elif percentage >= 60:
            return 'D'
        else:
            return 'F'

# Initialize validator
validator = DataValidator()
print("✅ Data Validator initialized")
print(f"📋 Validation rules: {len(validator.validation_rules)} criteria")


✅ Data Validator initialized
📋 Validation rules: 6 criteria
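
The quality score is the sum of three all-or-nothing components (4 for a valid structure, 3 for meeting the threat-score threshold, 3 for at least one extracted indicator), so only a few totals are reachable. The sketch below enumerates them using a standalone copy of the grade thresholds. `score_article` and `grade` are hypothetical helper names introduced for illustration; they are not part of `DataValidator`.

```python
from itertools import product

def grade(score: float, max_score: float = 10) -> str:
    """Standalone copy of the letter-grade thresholds used by the validator."""
    percentage = (score / max_score) * 100
    for cutoff, letter in ((90, 'A'), (80, 'B'), (70, 'C'), (60, 'D')):
        if percentage >= cutoff:
            return letter
    return 'F'

def score_article(structure_ok: bool, threat_ok: bool, has_indicators: bool) -> int:
    """Sum the three all-or-nothing components: 4 + 3 + 3."""
    return 4 * structure_ok + 3 * threat_ok + 3 * has_indicators

# Enumerate every combination of the three checks.
outcomes = {}
for flags in product([False, True], repeat=3):
    total = score_article(*flags)
    outcomes[flags] = (total, grade(total), total >= 7)

# Print combinations ordered by total score.
for flags, (total, letter, approved) in sorted(outcomes.items(), key=lambda kv: kv[1][0]):
    print(flags, total, letter, 'approved' if approved else 'rejected')
```

Under these weights an article is approved only when its structure is valid and at least one of the other two checks passes, and a grade of B is never produced.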


In [9]:
# Export functions for data processing
import hashlib


def _stable_suffix(value: Any, length: int = 12) -> str:
    """Return a short hex digest that is identical across interpreter runs.

    The built-in hash() is salted per process for strings, so IDs derived
    from it would change every time the notebook kernel restarts.
    """
    return hashlib.sha1(str(value).encode('utf-8')).hexdigest()[:length]


def export_for_llm_training(articles: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Export data in formats suitable for LLM training and knowledge graph construction."""
    
    if not articles:
        print("❌ No articles to export")
        return {}
    
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    
    # 1. Training text format (for LLM fine-tuning)
    training_texts = []
    for article in articles:
        text_entry = {
            # Stable, run-independent ID derived from the article URL
            'id': f"{article['source']}_{_stable_suffix(article['url'])}",
            'source': article['source'],
            'title': article['title'],
            'text': article['full_text'],
            'metadata': {
                'url': article['url'],
                'scraped_at': article['scraped_at'],
                'indicators': article.get('indicators', {}),
                'threat_score': article.get('threat_relevance_score', 0)
            }
        }
        training_texts.append(text_entry)
    
    training_file = PROCESSED_DATA_DIR / f'llm_training_data_{timestamp}.json'
    with open(training_file, 'w', encoding='utf-8') as f:
        json.dump(training_texts, f, indent=2, ensure_ascii=False)
    
    # 2. Entity-Relationship format (for knowledge graph)
    entities_and_relations = {
        'entities': [],
        'relations': [],
        'documents': []
    }
    
    for idx, article in enumerate(articles):
        doc_id = f"doc_{idx}"
        
        # Document node
        entities_and_relations['documents'].append({
            'id': doc_id,
            'title': article['title'],
            'source': article['source'],
            'url': article['url'],
            'threat_score': article.get('threat_relevance_score', 0)
        })
        
        # Extract entities from indicators
        indicators = article.get('indicators', {})
        for indicator_type, values in indicators.items():
            for value in values:
                # Same value always maps to the same entity ID, in any run
                entity_id = f"{indicator_type}_{_stable_suffix(value)}"
                
                # Entity
                entities_and_relations['entities'].append({
                    'id': entity_id,
                    'type': indicator_type,
                    'value': value
                })
                
                # Relation
                entities_and_relations['relations'].append({
                    'source': doc_id,
                    'target': entity_id,
                    'relation': 'mentions',
                    'type': indicator_type
                })
    
    # Remove duplicate entities
    seen_entities = set()
    unique_entities = []
    for entity in entities_and_relations['entities']:
        entity_key = (entity['type'], entity['value'])
        if entity_key not in seen_entities:
            seen_entities.add(entity_key)
            unique_entities.append(entity)
    entities_and_relations['entities'] = unique_entities
    
    kg_file = PROCESSED_DATA_DIR / f'knowledge_graph_data_{timestamp}.json'
    with open(kg_file, 'w', encoding='utf-8') as f:
        json.dump(entities_and_relations, f, indent=2, ensure_ascii=False)
    
    # 3. JSONL format (for streaming/batch processing)
    jsonl_file = PROCESSED_DATA_DIR / f'threat_intelligence_{timestamp}.jsonl'
    with open(jsonl_file, 'w', encoding='utf-8') as f:
        for article in articles:
            simplified_article = {
                'title': article['title'],
                'content': article['full_text'],
                'source': article['source'],
                'indicators': article.get('indicators', {}),
                'threat_score': article.get('threat_relevance_score', 0)
            }
            f.write(json.dumps(simplified_article, ensure_ascii=False) + '\n')
    
    # 4. Summary statistics
    export_summary = {
        'export_timestamp': datetime.now().isoformat(),
        'total_articles': len(articles),
        'sources': list(set(article['source'] for article in articles)),
        'total_entities': len(entities_and_relations['entities']),
        'total_relations': len(entities_and_relations['relations']),
        'files_created': {
            'llm_training': str(training_file),
            'knowledge_graph': str(kg_file),
            'jsonl_format': str(jsonl_file)
        },
        'statistics': {
            'avg_threat_score': sum(article.get('threat_relevance_score', 0) for article in articles) / len(articles),
            'total_technical_indicators': sum(sum(len(inds) for inds in article.get('indicators', {}).values()) for article in articles)
        }
    }
    
    summary_file = PROCESSED_DATA_DIR / f'export_summary_{timestamp}.json'
    with open(summary_file, 'w', encoding='utf-8') as f:
        json.dump(export_summary, f, indent=2, ensure_ascii=False)
    
    print("\n✅ Export completed successfully!")
    print("="*60)
    print(f"📊 Exported {len(articles)} articles")
    print(f"🎯 {len(entities_and_relations['entities'])} unique entities")
    print(f"🔗 {len(entities_and_relations['relations'])} relations")
    print(f"📈 Avg threat score: {export_summary['statistics']['avg_threat_score']:.2f}")
    
    print("\n📁 Files created:")
    for file_type, file_path in export_summary['files_created'].items():
        print(f"  {file_type}: {Path(file_path).name}")
    
    return export_summary

print("✅ Export functions initialized")


✅ Export functions initialized
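
The JSONL export writes one article per line, which lets downstream steps stream records instead of loading the whole file. A minimal sketch of reading such a file back follows. It assumes the record fields written above (`title`, `content`, `source`, `indicators`, `threat_score`); `iter_jsonl`, the sample records, and the 0.5 cut-off are hypothetical and exist only for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator

def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line without loading the whole file."""
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Illustrative records only; the values are made up for this example.
sample_records = [
    {'title': 'Example A', 'content': 'text a', 'source': 'demo',
     'indicators': {'cve': ['CVE-0000-0001']}, 'threat_score': 0.8},
    {'title': 'Example B', 'content': 'text b', 'source': 'demo',
     'indicators': {}, 'threat_score': 0.2},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / 'threat_intelligence_demo.jsonl'
    with open(demo_file, 'w', encoding='utf-8') as f:
        for record in sample_records:
            f.write(json.dumps(record, ensure_ascii=False) + '\n')

    # Stream the file back and keep only the higher-scoring records.
    high_scoring = [r['title'] for r in iter_jsonl(demo_file) if r['threat_score'] >= 0.5]

print(high_scoring)
```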


## 7. Data Collection Execution

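Run the collection across the configured scrapers. Each article is scraped, processed, and scored by the validator; only articles whose quality report is approved (a score of at least 7 out of 10) are saved and passed to the export step. Errors are caught per article and per source so that a single failure does not stop the run.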
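
One way to keep a per-article failure from aborting a run, while still pausing between requests, is a `try`/`except`/`finally` loop. The sketch below is illustrative only: `collect_with_isolation` and `fake_fetch` are hypothetical names, the URLs are placeholders, and the `sleep` parameter is injected so the example runs without actually waiting.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

def collect_with_isolation(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[dict]],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Fetch each URL, recording failures without aborting the loop."""
    results: Dict[str, List[str]] = {'ok': [], 'failed': []}
    for url in urls:
        try:
            item = fetch(url)
            if item:
                results['ok'].append(url)
            else:
                results['failed'].append(url)
        except Exception:
            # One bad article should not stop the rest of the source.
            results['failed'].append(url)
        finally:
            # Runs on success and on failure, so the delay is never skipped.
            sleep(delay_seconds)
    return results

def fake_fetch(url: str) -> Optional[dict]:
    """Stand-in for a scraper call; raises for one URL to simulate an error."""
    if url.endswith('/broken'):
        raise RuntimeError('simulated network error')
    if url.endswith('/empty'):
        return None
    return {'url': url}

pauses: List[float] = []
summary = collect_with_isolation(
    ['https://example.com/a', 'https://example.com/broken', 'https://example.com/empty'],
    fake_fetch,
    delay_seconds=6.0,
    sleep=pauses.append,  # record the pauses instead of actually waiting
)
print(summary, pauses)
```

Placing the pause in `finally` means a failing site is not hit again immediately after an error.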


In [12]:
# Enhanced configuration for multi-source data collection
COLLECTION_CONFIG = {
    'max_articles_per_platform': 300,    # Maximum articles to collect per platform
    'enable_progress_tracking': True,
    'save_intermediate_results': True,
    'rate_limit_multiplier': 3.0,      # Conservative rate limiting
    'max_retries': 5,                  # More retries for reliability
    'timeout': 45,                     # Longer timeout for slow sites
    'random_delay': (1, 3)             # Random delay between requests
}

print("🌐 Multi-Platform Threat Intelligence Collection")
print("="*60)

# Initialize scrapers with enhanced configuration
live_scraping_config = SCRAPING_CONFIG.copy()
live_scraping_config.update({
    'rate_limit': SCRAPING_CONFIG['rate_limit'] * COLLECTION_CONFIG['rate_limit_multiplier'],
    'max_retries': COLLECTION_CONFIG['max_retries'],
    'timeout': COLLECTION_CONFIG['timeout'],
    'random_delay': COLLECTION_CONFIG['random_delay']
})

live_scraper = ThreatIntelligenceScraper(live_scraping_config)

# Initialize the platform scrapers used in this run (CISA, Fortinet, Symantec)
scrapers = {
    'CISA': CISAScraper(live_scraper),
    'Fortinet': FortinetScraper(live_scraper),
    'Symantec': SymantecScraper(live_scraper)
}

print("\n⚙️ Scraping Configuration:")
print(f"  Rate Limit: {live_scraping_config['rate_limit']} seconds")
print(f"  Max Retries: {live_scraping_config['max_retries']}")
print(f"  Timeout: {live_scraping_config['timeout']} seconds")
print(f"  Sources: {', '.join(scrapers.keys())}\n")

def collect_threat_intelligence_data() -> List[Dict[str, Any]]:
    """Scrape, process, validate, and export articles from every configured source.

    Returns the list of processed articles whose quality report was approved.
    """
    
    start_time = datetime.now()
    logger.info(f"Collection started at {start_time}")
    
    all_collected = []
    sources_status = {}
    
    try:
        # Process each source
        for source_name, source_scraper in scrapers.items():
            print(f"\n📡 Collecting from {source_name}...")
            
            try:
                # Get article links with increased max_pages
                links = source_scraper.get_article_links(max_pages=30)  # Try up to 30 pages
                print(f"   Found {len(links)} potential articles")
                
                source_collected = 0
                source_attempted = min(len(links), COLLECTION_CONFIG['max_articles_per_platform'])
                
                # Process articles (limited by max_articles_per_platform)
                for i, url in enumerate(links[:COLLECTION_CONFIG['max_articles_per_platform']], 1):
                    print(f"   Processing article {i}/{source_attempted}: {url[:60]}...")
                    
                    try:
                        # Scrape and process article
                        article = source_scraper.scrape_article(url)
                        
                        if article:
                            processed = processor.process_article(article)
                            quality_report = validator.generate_quality_report(processed)
                            processed['quality_report'] = quality_report
                            
                            if quality_report['overall']['approved']:
                                all_collected.append(processed)
                                source_collected += 1
                                print(f"   ✅ Approved (Grade: {quality_report['overall']['grade']})")
                                
                                # Show extracted indicators
                                indicators = processed.get('indicators', {})
                                total_iocs = sum(len(inds) for inds in indicators.values())
                                if total_iocs > 0:
                                    print(f"   🔍 Extracted {total_iocs} technical indicators")
                                    for ioc_type, ioc_list in indicators.items():
                                        if ioc_list:
                                            print(f"      {ioc_type}: {len(ioc_list)} found")
                            else:
                                print(f"   ❌ Rejected (Grade: {quality_report['overall']['grade']})")
                        else:
                            print("   ❌ Failed to scrape")
                            
                    except Exception as e:
                        print(f"   ❌ Error processing article: {str(e)[:100]}")
                    
                    # Rate limiting between articles (also applied after a failed article)
                    time.sleep(live_scraping_config['rate_limit'])
                
                # Record source statistics
                sources_status[source_name] = {
                    'attempted': source_attempted,
                    'collected': source_collected,
                    'success_rate': source_collected / source_attempted if source_attempted > 0 else 0
                }
                
            except Exception as e:
                print(f"   ❌ Source failed: {str(e)}")
                sources_status[source_name] = {'error': str(e)}
        
        # Save collected data
        if all_collected:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = RAW_DATA_DIR / f'threat_intelligence_multi_source_{timestamp}.json'
            
            with open(output_file, 'w', encoding='utf-8') as f:
                json.dump(all_collected, f, indent=2, ensure_ascii=False)
            
            print(f"\n💾 Saved {len(all_collected)} articles from multiple sources")
            
            # Export for LLM training
            export_summary = export_for_llm_training(all_collected)
        else:
            print("\n⚠️  No articles were successfully collected")
        
        # Display summary
        end_time = datetime.now()
        duration = end_time - start_time
        
        print("\n" + "="*70)
        print("📊 COLLECTION SUMMARY")
        print("="*70)
        print(f"⏱️  Duration: {duration}")
        
        # Source-specific statistics
        for source, status in sources_status.items():
            if 'error' in status:
                print(f"{source}: ❌ {status['error']}")
            else:
                success_rate = status['success_rate'] * 100
                print(f"{source}: {status['collected']}/{status['attempted']} ({success_rate:.1f}%)")
        
        if all_collected:
            print(f"\n📄 Total Articles Collected: {len(all_collected)}")
            
            # Calculate overall statistics
            avg_threat_score = sum(article.get('threat_relevance_score', 0) for article in all_collected) / len(all_collected)
            print(f"🎯 Average Threat Score: {avg_threat_score:.2f}")
            
            # Show total indicators
            total_indicators = sum(
                sum(len(inds) for inds in article.get('indicators', {}).values())
                for article in all_collected
            )
            print(f"🔍 Total Technical Indicators: {total_indicators}")
            
            # Show sources distribution
            sources_count = {}
            for article in all_collected:
                source = article.get('source', 'Unknown')
                sources_count[source] = sources_count.get(source, 0) + 1
            print("\n📊 Articles per source:")
            for source, count in sources_count.items():
                print(f"   {source}: {count}")
        
        # Only report success when at least one article was approved
        if all_collected:
            print("\n🎉 Data collection completed successfully!")
        return all_collected
        
    except Exception as e:
        logger.error(f"Collection failed: {str(e)}")
        print(f"❌ Collection failed: {str(e)}")
        raise

# Execute collection
collected_data = collect_threat_intelligence_data()


2025-07-20 13:37:21,955 - INFO - Collection started at 2025-07-20 13:37:21.955543
2025-07-20 13:37:21,979 - INFO - Attempting to fetch https://www.cisa.gov/news-events/cybersecurity-advisories (attempt 1/6)


🌐 Multi-Platform Threat Intelligence Collection

⚙️ Scraping Configuration:
  Rate Limit: 6.0 seconds
  Max Retries: 5
  Timeout: 45 seconds
  Sources: CISA, Fortinet, Symantec


📡 Collecting from CISA...
   Fetching CISA page 1/30: https://www.cisa.gov/news-events/cybersecurity-advisories


2025-07-20 13:37:22,469 - INFO - Successfully scraped: https://www.cisa.gov/news-events/cybersecurity-advisories


      Found 0 new articles on page 1
   Total CISA articles found: 0
   Found 0 potential articles

📡 Collecting from Fortinet...
   Fetching initial page: https://www.fortinet.com/blog/threat-research


2025-07-20 13:37:29,627 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research (attempt 1/6)
2025-07-20 13:37:29,913 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research


      Found 13 articles on initial page
   Loading page 1/30
      Found 9 new articles on page 1
   Loading page 2/30
      Found 9 new articles on page 2
   Loading page 3/30
      Found 10 new articles on page 3
   Loading page 4/30
      Found 10 new articles on page 4
   Loading page 5/30
      Found 10 new articles on page 5
   Loading page 6/30
      Found 10 new articles on page 6
   Loading page 7/30
      Found 10 new articles on page 7
   Loading page 8/30
      Found 10 new articles on page 8
   Loading page 9/30
      Found 10 new articles on page 9
   Loading page 10/30
      Found 10 new articles on page 10
   Loading page 11/30
      Found 10 new articles on page 11
   Loading page 12/30
      Found 10 new articles on page 12
   Loading page 13/30
      Found 10 new articles on page 13
   Loading page 14/30
      Found 10 new articles on page 14
   Loading page 15/30
      Found 10 new articles on page 15
   Loading page 16/30
      Found 10 new articles on page 16

2025-07-20 13:39:52,023 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research (attempt 1/6)
2025-07-20 13:39:52,176 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research


   ✅ Successfully scraped article
      Title: FortiGuard Labs Threat Research
      Content length: 2577
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      cve: 2 found
   Processing article 2/171: https://www.fortinet.com/blog/threat-research/nailaolocker-r...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/nailaolocker-ransomware-cheese


2025-07-20 13:40:00,988 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/nailaolocker-ransomware-cheese (attempt 1/6)
2025-07-20 13:40:01,151 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/nailaolocker-ransomware-cheese


   ✅ Successfully scraped article
      Title: NailaoLocker Ransomware’s “Cheese”
      Content length: 11392
   ✅ Approved (Grade: A)
   🔍 Extracted 3 technical indicators
      domain: 3 found
   Processing article 3/171: https://www.fortinet.com/blog/threat-research/improving-clou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/improving-cloud-intrusion-detection-and-triage-with-forticnapp


2025-07-20 13:40:08,417 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/improving-cloud-intrusion-detection-and-triage-with-forticnapp (attempt 1/6)
2025-07-20 13:40:09,256 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/improving-cloud-intrusion-detection-and-triage-with-forticnapp


   ✅ Successfully scraped article
      Title: Improving Cloud Intrusion Detection and Triage with FortiCNAPP Composite Alerts
      Content length: 10643
   ✅ Approved (Grade: A)
   🔍 Extracted 4 technical indicators
      ip_address: 1 found
      domain: 2 found
      email: 1 found
   Processing article 4/171: https://www.fortinet.com/blog/threat-research/old-miner-new-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/old-miner-new-tricks


2025-07-20 13:40:17,343 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/old-miner-new-tricks (attempt 1/6)
2025-07-20 13:40:17,528 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/old-miner-new-tricks


   ✅ Successfully scraped article
      Title: Old Miner, New Tricks
      Content length: 21272
   ✅ Approved (Grade: A)
   🔍 Extracted 46 technical indicators
      domain: 18 found
      hash_md5: 16 found
      cve: 3 found
      url: 9 found
   Processing article 5/171: https://www.fortinet.com/blog/threat-research/fortisandbox-d...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortisandbox-detects-dark-101-ransomware-despite-evasion-techniques


2025-07-20 13:40:24,504 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortisandbox-detects-dark-101-ransomware-despite-evasion-techniques (attempt 1/6)
2025-07-20 13:40:25,339 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortisandbox-detects-dark-101-ransomware-despite-evasion-techniques


   ✅ Successfully scraped article
      Title: How FortiSandbox 5.0 Detects Dark 101 Ransomware Despite Evasion Techniques
      Content length: 7626
   ✅ Approved (Grade: A)
   🔍 Extracted 3 technical indicators
      domain: 2 found
      hash_md5: 1 found
   Processing article 6/171: https://www.fortinet.com/blog/threat-research/catching-smart...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/catching-smarter-mice-with-even-smarter-cats


2025-07-20 13:40:32,908 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/catching-smarter-mice-with-even-smarter-cats (attempt 1/6)
2025-07-20 13:40:33,414 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/catching-smarter-mice-with-even-smarter-cats


   ✅ Successfully scraped article
      Title: Catching Smarter Mice with Even Smarter Cats
      Content length: 5290
   ✅ Approved (Grade: A)
   🔍 Extracted 11 technical indicators
      domain: 5 found
      hash_sha256: 1 found
      url: 5 found
   Processing article 7/171: https://www.fortinet.com/blog/threat-research/norddragonscan...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/norddragonscan-quiet-data-harvester-on-windows


2025-07-20 13:40:40,907 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/norddragonscan-quiet-data-harvester-on-windows (attempt 1/6)
2025-07-20 13:40:41,744 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/norddragonscan-quiet-data-harvester-on-windows


   ✅ Successfully scraped article
      Title: NordDragonScan: Quiet Data-Harvester on Windows
      Content length: 7400
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 9 found
      hash_sha256: 1 found
   Processing article 8/171: https://www.fortinet.com/blog/threat-research/rondobox-unvei...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/rondobox-unveiled-breaking-down-a-botnet-threat


2025-07-20 13:40:48,169 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/rondobox-unveiled-breaking-down-a-botnet-threat (attempt 1/6)
2025-07-20 13:40:49,018 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/rondobox-unveiled-breaking-down-a-botnet-threat


   ✅ Successfully scraped article
      Title: RondoDox Unveiled: Breaking Down a New Botnet Threat
      Content length: 11308
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      domain: 4 found
      cve: 2 found
   Processing article 9/171: https://www.fortinet.com/blog/threat-research/dcrat-imperson...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/dcrat-impersonating-the-columbian-government


2025-07-20 13:40:55,634 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/dcrat-impersonating-the-columbian-government (attempt 1/6)
2025-07-20 13:40:56,477 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/dcrat-impersonating-the-columbian-government


   ✅ Successfully scraped article
      Title: DCRAT Impersonating the Colombian Government
      Content length: 12519
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      domain: 2 found
   Processing article 10/171: https://www.fortinet.com/blog/threat-research/dissecting-a-m...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/dissecting-a-malicious-havoc-sample


2025-07-20 13:41:03,105 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/dissecting-a-malicious-havoc-sample (attempt 1/6)
2025-07-20 13:41:03,973 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/dissecting-a-malicious-havoc-sample


   ✅ Successfully scraped article
      Title: Dissecting a Malicious Havoc Sample
      Content length: 19970
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      domain: 4 found
      hash_sha256: 2 found
   Processing article 11/171: https://www.fortinet.com/blog/threat-research/threat-group-t...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/threat-group-targets-companies-in-taiwan


2025-07-20 13:41:11,449 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/threat-group-targets-companies-in-taiwan (attempt 1/6)
2025-07-20 13:41:12,498 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/threat-group-targets-companies-in-taiwan


   ✅ Successfully scraped article
      Title: Threat Group Targets Companies in Taiwan
      Content length: 13650
   ✅ Approved (Grade: A)
   🔍 Extracted 36 technical indicators
      domain: 36 found
   Processing article 12/171: https://www.fortinet.com/blog/threat-research/rolandskimmer-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/rolandskimmer-silent-credit-card-thief-uncovered


2025-07-20 13:41:19,864 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/rolandskimmer-silent-credit-card-thief-uncovered (attempt 1/6)
2025-07-20 13:41:20,707 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/rolandskimmer-silent-credit-card-thief-uncovered


   ✅ Successfully scraped article
      Title: RolandSkimmer: Silent Credit Card Thief Uncovered
      Content length: 12277
   ✅ Approved (Grade: A)
   🔍 Extracted 19 technical indicators
      domain: 19 found
   Processing article 13/171: https://www.fortinet.com/blog/threat-research/how-a-maliciou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/how-a-malicious-excel-file-cve-2017-0199-delivers-the-formbook-payload


2025-07-20 13:41:28,597 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/how-a-malicious-excel-file-cve-2017-0199-delivers-the-formbook-payload (attempt 1/6)
2025-07-20 13:41:29,448 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/how-a-malicious-excel-file-cve-2017-0199-delivers-the-formbook-payload


   ✅ Successfully scraped article
      Title: How a Malicious Excel File (CVE-2017-0199) Delivers the FormBook Payload
      Content length: 8332
   ✅ Approved (Grade: A)
   🔍 Extracted 12 technical indicators
      domain: 6 found
      hash_sha256: 5 found
      cve: 1 found
   Processing article 14/171: https://www.fortinet.com/blog/threat-research/deep-dive-into...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/deep-dive-into-a-dumped-malware-without-a-pe-header


2025-07-20 13:41:36,034 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/deep-dive-into-a-dumped-malware-without-a-pe-header (attempt 1/6)
2025-07-20 13:41:36,886 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/deep-dive-into-a-dumped-malware-without-a-pe-header


   ✅ Successfully scraped article
      Title: Deep Dive into a Dumped Malware without a PE Header
      Content length: 15952
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 14 found
      hash_sha256: 1 found
   Processing article 15/171: https://www.fortinet.com/blog/threat-research/infostealer-ma...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/infostealer-malware-formbook-spread-via-phishing-campaign


2025-07-20 13:41:44,943 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/infostealer-malware-formbook-spread-via-phishing-campaign (attempt 1/6)
2025-07-20 13:41:45,974 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/infostealer-malware-formbook-spread-via-phishing-campaign


   ✅ Successfully scraped article
      Title: Infostealer Malware FormBook Spread via Phishing Campaign – Part II
      Content length: 30116
   ✅ Approved (Grade: A)
   🔍 Extracted 42 technical indicators
      domain: 39 found
      cve: 1 found
      url: 2 found
   Processing article 16/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-vanhelsing


2025-07-20 13:41:53,610 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-vanhelsing (attempt 1/6)
2025-07-20 13:41:54,461 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-vanhelsing


   ✅ Successfully scraped article
      Title: Ransomware Roundup – VanHelsing
      Content length: 8640
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 8 found
      hash_sha256: 2 found
   Processing article 17/171: https://www.fortinet.com/blog/threat-research/horabot-unleas...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/horabot-unleashed-a-stealthy-phishing-threat


2025-07-20 13:42:02,127 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/horabot-unleashed-a-stealthy-phishing-threat (attempt 1/6)
2025-07-20 13:42:02,987 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/horabot-unleashed-a-stealthy-phishing-threat


   ✅ Successfully scraped article
      Title: Horabot Unleashed: A Stealthy Phishing Threat
      Content length: 11862
   ✅ Approved (Grade: A)
   🔍 Extracted 7 technical indicators
      domain: 7 found


2025-07-20 13:42:09,185 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/multilayered-email-attack-how-a-pdf-invoice-and-geofencing-led-to-rat-malware (attempt 1/6)


   Processing article 18/171: https://www.fortinet.com/blog/threat-research/multilayered-e...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/multilayered-email-attack-how-a-pdf-invoice-and-geofencing-led-to-rat-malware


2025-07-20 13:42:10,026 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/multilayered-email-attack-how-a-pdf-invoice-and-geofencing-led-to-rat-malware


   ✅ Successfully scraped article
      Title: Multilayered Email Attack: How a PDF Invoice and Geo-Fencing Led to RAT Malware
      Content length: 11293
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      domain: 2 found
   Processing article 19/171: https://www.fortinet.com/blog/threat-research/fortiguard-inc...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-incident-response-team-detects-intrusion-into-middle-east-critical-national-infrastructure


2025-07-20 13:42:17,178 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-incident-response-team-detects-intrusion-into-middle-east-critical-national-infrastructure (attempt 1/6)
2025-07-20 13:42:17,680 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-incident-response-team-detects-intrusion-into-middle-east-critical-national-infrastructure


   ✅ Successfully scraped article
      Title: FortiGuard Incident Response Team Detects Intrusion into Middle East Critical National Infrastructur
      Content length: 4160
   ✅ Approved (Grade: C)
   Processing article 20/171: https://www.fortinet.com/blog/threat-research/key-takeaways-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/key-takeaways-from-the-2025-global-threat-landscape-report


2025-07-20 13:42:25,771 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/key-takeaways-from-the-2025-global-threat-landscape-report (attempt 1/6)
2025-07-20 13:42:26,605 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/key-takeaways-from-the-2025-global-threat-landscape-report


   ✅ Successfully scraped article
      Title: Key Takeaways from the 2025 Global Threat Landscape Report
      Content length: 6751
   ✅ Approved (Grade: A)
   🔍 Extracted 3 technical indicators
      cve: 3 found
   Processing article 21/171: https://www.fortinet.com/blog/threat-research/ingressnightma...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ingressnightmare-understanding-cve-2025-1974-in-kubernetes-ingress-nginx


2025-07-20 13:42:34,282 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ingressnightmare-understanding-cve-2025-1974-in-kubernetes-ingress-nginx (attempt 1/6)
2025-07-20 13:42:34,619 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ingressnightmare-understanding-cve-2025-1974-in-kubernetes-ingress-nginx


   ✅ Successfully scraped article
      Title: IngressNightmare: Understanding CVE‑2025‑1974 in Kubernetes Ingress-NGINX
      Content length: 4628
   ✅ Approved (Grade: A)
   🔍 Extracted 1 technical indicators
      cve: 1 found
   Processing article 22/171: https://www.fortinet.com/blog/threat-research/infostealer-ma...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/infostealer-malware-formbook-spread-via-phishing-campaign-part-i


2025-07-20 13:42:43,080 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/infostealer-malware-formbook-spread-via-phishing-campaign-part-i (attempt 1/6)
2025-07-20 13:42:43,920 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/infostealer-malware-formbook-spread-via-phishing-campaign-part-i


   ✅ Successfully scraped article
      Title: Infostealer Malware FormBook Spread via Phishing Campaign – Part I
      Content length: 11879
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 10 found
      hash_sha256: 4 found
      cve: 1 found
   Processing article 23/171: https://www.fortinet.com/blog/threat-research/new-rust-botne...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/new-rust-botnet-rustobot-is-routed-via-routers


2025-07-20 13:42:51,545 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/new-rust-botnet-rustobot-is-routed-via-routers (attempt 1/6)
2025-07-20 13:42:52,379 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/new-rust-botnet-rustobot-is-routed-via-routers


   ✅ Successfully scraped article
      Title: New Rust Botnet "RustoBot" is Routed via Routers
      Content length: 8373
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      ip_address: 1 found
      domain: 8 found
      cve: 6 found
   Processing article 24/171: https://www.fortinet.com/blog/threat-research/malicious-npm-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/malicious-npm-packages-targeting-paypal-users


2025-07-20 13:42:59,854 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/malicious-npm-packages-targeting-paypal-users (attempt 1/6)
2025-07-20 13:43:00,698 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/malicious-npm-packages-targeting-paypal-users


   ✅ Successfully scraped article
      Title: Malicious NPM Packages Targeting PayPal Users
      Content length: 7831
   ✅ Approved (Grade: A)
   🔍 Extracted 23 technical indicators
      hash_sha256: 23 found
   Processing article 25/171: https://www.fortinet.com/blog/threat-research/real-time-anti...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/real-time-anti-phishing-essential-defense-against-evolving-cyber-threats


2025-07-20 13:43:07,518 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/real-time-anti-phishing-essential-defense-against-evolving-cyber-threats (attempt 1/6)
2025-07-20 13:43:08,361 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/real-time-anti-phishing-essential-defense-against-evolving-cyber-threats


   ✅ Successfully scraped article
      Title: Real-Time Anti-Phishing: Essential Defense Against Evolving Cyber Threats
      Content length: 5744
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 8 found
      url: 1 found
   Processing article 26/171: https://www.fortinet.com/blog/threat-research/fortinet-ident...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortinet-identifies-malicious-packages-in-the-wild-insights-and-trends


2025-07-20 13:43:15,484 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortinet-identifies-malicious-packages-in-the-wild-insights-and-trends (attempt 1/6)
2025-07-20 13:43:16,489 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortinet-identifies-malicious-packages-in-the-wild-insights-and-trends


   ✅ Successfully scraped article
      Title: Fortinet Identifies Malicious Packages in the Wild: Insights and Trends from November 2024 Onward
      Content length: 11966
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 4 found
      hash_sha256: 11 found


   Processing article 27/171: https://www.fortinet.com/blog/threat-research/havoc-sharepoi...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/havoc-sharepoint-with-microsoft-graph-api-turns-into-fud-c2


2025-07-20 13:43:22,603 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/havoc-sharepoint-with-microsoft-graph-api-turns-into-fud-c2 (attempt 1/6)
2025-07-20 13:43:23,445 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/havoc-sharepoint-with-microsoft-graph-api-turns-into-fud-c2


   ✅ Successfully scraped article
      Title: Havoc: SharePoint with Microsoft Graph API turns into FUD C2
      Content length: 8873
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      domain: 5 found


   Processing article 28/171: https://www.fortinet.com/blog/threat-research/winos-spreads-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/winos-spreads-via-impersonation-of-official-email-to-target-users-in-taiwan


2025-07-20 13:43:29,639 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/winos-spreads-via-impersonation-of-official-email-to-target-users-in-taiwan (attempt 1/6)
2025-07-20 13:43:30,485 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/winos-spreads-via-impersonation-of-official-email-to-target-users-in-taiwan


   ✅ Successfully scraped article
      Title: Winos 4.0 Spreads via Impersonation of Official Email to Target Users in Taiwan
      Content length: 12928
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 9 found
      hash_sha256: 1 found
   Processing article 29/171: https://www.fortinet.com/blog/threat-research/fortisandbox-d...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortisandbox-detects-evolving-snake-keylogger-variant


2025-07-20 13:43:37,909 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortisandbox-detects-evolving-snake-keylogger-variant (attempt 1/6)
2025-07-20 13:43:38,744 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortisandbox-detects-evolving-snake-keylogger-variant


   ✅ Successfully scraped article
      Title: FortiSandbox 5.0 Detects Evolving Snake Keylogger Variant
      Content length: 10355
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      domain: 2 found
      hash_md5: 2 found
      url: 1 found


   Processing article 30/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-lynx


2025-07-20 13:43:44,947 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-lynx (attempt 1/6)
2025-07-20 13:43:45,785 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-lynx


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Lynx
      Content length: 10707
   ✅ Approved (Grade: A)
   🔍 Extracted 17 technical indicators
      domain: 1 found
      hash_sha256: 16 found
   Processing article 31/171: https://www.fortinet.com/blog/threat-research/analyzing-elf-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/analyzing-elf-sshdinjector-with-a-human-and-artificial-analyst


2025-07-20 13:43:53,101 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/analyzing-elf-sshdinjector-with-a-human-and-artificial-analyst (attempt 1/6)
2025-07-20 13:43:54,015 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/analyzing-elf-sshdinjector-with-a-human-and-artificial-analyst


   ✅ Successfully scraped article
      Title: Analyzing ELF/Sshdinjector.A!tr with a Human and Artificial Analyst
      Content length: 7691
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      domain: 3 found
      hash_sha256: 3 found
   Processing article 32/171: https://www.fortinet.com/blog/threat-research/coyote-banking...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/coyote-banking-trojan-a-stealthy-attack-via-lnk-files


2025-07-20 13:44:01,595 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/coyote-banking-trojan-a-stealthy-attack-via-lnk-files (attempt 1/6)
2025-07-20 13:44:02,442 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/coyote-banking-trojan-a-stealthy-attack-via-lnk-files


   ✅ Successfully scraped article
      Title: Coyote Banking Trojan: A Stealthy Attack via LNK Files
      Content length: 9849
   ✅ Approved (Grade: A)
   🔍 Extracted 16 technical indicators
      domain: 16 found
   Processing article 33/171: https://www.fortinet.com/blog/threat-research/deep-dive-into...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/deep-dive-into-a-linux-rootkit-malware


2025-07-20 13:44:08,896 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/deep-dive-into-a-linux-rootkit-malware (attempt 1/6)
2025-07-20 13:44:09,749 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/deep-dive-into-a-linux-rootkit-malware


   ✅ Successfully scraped article
      Title: Deep Dive Into a Linux Rootkit Malware
      Content length: 16124
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 7 found
      hash_sha256: 3 found
   Processing article 34/171: https://www.fortinet.com/blog/threat-research/phish-free-pay...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/phish-free-paypal-phishing


2025-07-20 13:44:17,359 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/phish-free-paypal-phishing (attempt 1/6)
2025-07-20 13:44:17,899 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/phish-free-paypal-phishing


   ✅ Successfully scraped article
      Title: Phish-free PayPal Phishing
      Content length: 3131
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      domain: 2 found
   Processing article 35/171: https://www.fortinet.com/blog/threat-research/catching-ec2-g...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/catching-ec2-grouper-no-indicators-required


2025-07-20 13:44:25,995 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/catching-ec2-grouper-no-indicators-required (attempt 1/6)
2025-07-20 13:44:26,492 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/catching-ec2-grouper-no-indicators-required


   ✅ Successfully scraped article
      Title: Catching "EC2 Grouper"- no indicators required!
      Content length: 7155
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      ip_address: 2 found
   Processing article 36/171: https://www.fortinet.com/blog/threat-research/botnets-contin...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/botnets-continue-to-target-aging-d-link-vulnerabilities


2025-07-20 13:44:33,671 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/botnets-continue-to-target-aging-d-link-vulnerabilities (attempt 1/6)
2025-07-20 13:44:34,515 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/botnets-continue-to-target-aging-d-link-vulnerabilities


   ✅ Successfully scraped article
      Title: Botnets Continue to Target Aging D-Link Vulnerabilities
      Content length: 11369
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 4 found
      cve: 4 found
   Processing article 37/171: https://www.fortinet.com/blog/threat-research/analyzing-mali...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/analyzing-malicious-intent-in-python-code


2025-07-20 13:44:41,078 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/analyzing-malicious-intent-in-python-code (attempt 1/6)
2025-07-20 13:44:41,410 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/analyzing-malicious-intent-in-python-code


   ✅ Successfully scraped article
      Title: Analyzing Malicious Intent in Python Code: A Case Study
      Content length: 11657
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 7 found
      hash_sha256: 3 found
   Processing article 38/171: https://www.fortinet.com/blog/threat-research/fortinet-contr...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortinet-contributes-to-major-cybercrime-operation-arrests


2025-07-20 13:44:48,554 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortinet-contributes-to-major-cybercrime-operation-arrests (attempt 1/6)
2025-07-20 13:44:49,388 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortinet-contributes-to-major-cybercrime-operation-arrests


   ✅ Successfully scraped article
      Title: Fortinet Contributes to Major Cybercrime Operation Arrests
      Content length: 5466
   ✅ Approved (Grade: C)
   Processing article 39/171: https://www.fortinet.com/blog/threat-research/sophisticated-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/sophisticated-attack-targets-taiwan-with-smokeloader


2025-07-20 13:44:56,844 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/sophisticated-attack-targets-taiwan-with-smokeloader (attempt 1/6)
2025-07-20 13:44:57,686 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/sophisticated-attack-targets-taiwan-with-smokeloader


   ✅ Successfully scraped article
      Title: SmokeLoader Attack Targets Companies in Taiwan
      Content length: 13307
   ✅ Approved (Grade: A)
   🔍 Extracted 27 technical indicators
      domain: 25 found
      hash_sha256: 1 found
      cve: 1 found
   Processing article 40/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-interlock


2025-07-20 13:45:05,267 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-interlock (attempt 1/6)
2025-07-20 13:45:06,113 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-interlock


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Interlock
      Content length: 9126
   ✅ Approved (Grade: A)
   🔍 Extracted 7 technical indicators
      hash_sha256: 7 found
   Processing article 41/171: https://www.fortinet.com/blog/threat-research/advanced-cyber...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/advanced-cyberthreats-targeting-holiday-shoppers


2025-07-20 13:45:13,595 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/advanced-cyberthreats-targeting-holiday-shoppers (attempt 1/6)
2025-07-20 13:45:14,429 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/advanced-cyberthreats-targeting-holiday-shoppers


   ✅ Successfully scraped article
      Title: Advanced Cyberthreats Targeting Holiday Shoppers
      Content length: 3548
   ✅ Approved (Grade: C)
   Processing article 42/171: https://www.fortinet.com/blog/threat-research/threat-predict...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/threat-predictions-for-2025-get-ready-for-bigger-bolder-attacks


2025-07-20 13:45:21,961 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/threat-predictions-for-2025-get-ready-for-bigger-bolder-attacks (attempt 1/6)
2025-07-20 13:45:22,789 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/threat-predictions-for-2025-get-ready-for-bigger-bolder-attacks


   ✅ Successfully scraped article
      Title: Threat Predictions for 2025: Get Ready for Bigger, Bolder Attacks
      Content length: 2808
   ✅ Approved (Grade: C)
   Processing article 43/171: https://www.fortinet.com/blog/threat-research/new-campaign-u...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/new-campaign-uses-remcos-rat-to-exploit-victims


2025-07-20 13:45:30,021 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/new-campaign-uses-remcos-rat-to-exploit-victims (attempt 1/6)
2025-07-20 13:45:30,859 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/new-campaign-uses-remcos-rat-to-exploit-victims


   ✅ Successfully scraped article
      Title: New Campaign Uses Remcos RAT to Exploit Victims
      Content length: 17880
   ✅ Approved (Grade: A)
   🔍 Extracted 19 technical indicators
      domain: 12 found
      hash_sha256: 6 found
      cve: 1 found
   Processing article 44/171: https://www.fortinet.com/blog/threat-research/threat-campaig...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/threat-campaign-spreads-winos4-through-game-application


2025-07-20 13:45:38,059 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/threat-campaign-spreads-winos4-through-game-application (attempt 1/6)
2025-07-20 13:45:38,898 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/threat-campaign-spreads-winos4-through-game-application


   ✅ Successfully scraped article
      Title: Threat Campaign Spreads Winos4.0 Through Game Application
      Content length: 9536
   ✅ Approved (Grade: A)
   🔍 Extracted 28 technical indicators
      domain: 16 found
      hash_md5: 1 found
      hash_sha256: 11 found
   Processing article 45/171: https://www.fortinet.com/blog/threat-research/burning-zero-d...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/burning-zero-days-suspected-nation-state-adversary-targets-ivanti-csa


2025-07-20 13:45:46,390 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/burning-zero-days-suspected-nation-state-adversary-targets-ivanti-csa (attempt 1/6)
2025-07-20 13:45:47,418 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/burning-zero-days-suspected-nation-state-adversary-targets-ivanti-csa


   ✅ Successfully scraped article
      Title: Burning Zero Days: Suspected Nation-State Adversary Targets Ivanti CSA
      Content length: 24270
   ✅ Approved (Grade: A)
   🔍 Extracted 70 technical indicators
      ip_address: 1 found
      domain: 55 found
      hash_md5: 1 found
      hash_sha1: 2 found
      hash_sha256: 3 found
      cve: 3 found
      url: 5 found
   Processing article 46/171: https://www.fortinet.com/blog/threat-research/threat-actors-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/threat-actors-exploit-geoserver-vulnerability-cve-2024-36401


2025-07-20 13:45:55,151 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/threat-actors-exploit-geoserver-vulnerability-cve-2024-36401 (attempt 1/6)
2025-07-20 13:45:55,990 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/threat-actors-exploit-geoserver-vulnerability-cve-2024-36401


   ✅ Successfully scraped article
      Title: Threat Actors Exploit GeoServer Vulnerability CVE-2024-36401
      Content length: 19928
   ✅ Approved (Grade: A)
   🔍 Extracted 20 technical indicators
      domain: 18 found
      cve: 2 found
   Processing article 47/171: https://www.fortinet.com/blog/threat-research/emansrepo-stea...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/emansrepo-stealer-multi-vector-attack-chains


2025-07-20 13:46:02,686 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/emansrepo-stealer-multi-vector-attack-chains (attempt 1/6)
2025-07-20 13:46:03,540 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/emansrepo-stealer-multi-vector-attack-chains


   ✅ Successfully scraped article
      Title: Emansrepo Stealer: Multi-Vector Attack Chains
      Content length: 10874
   ✅ Approved (Grade: A)
   🔍 Extracted 7 technical indicators
      domain: 4 found
      hash_sha256: 2 found
      url: 1 found
   Processing article 48/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-underground


2025-07-20 13:46:09,804 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-underground (attempt 1/6)
2025-07-20 13:46:10,646 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-underground


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Underground
      Content length: 8727
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 1 found
      hash_sha256: 6 found
      cve: 1 found
   Processing article 49/171: https://www.fortinet.com/blog/threat-research/deep-analysis-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/deep-analysis-of-snake-keylogger-new-variant


2025-07-20 13:46:17,783 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/deep-analysis-of-snake-keylogger-new-variant (attempt 1/6)
2025-07-20 13:46:18,287 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/deep-analysis-of-snake-keylogger-new-variant


   ✅ Successfully scraped article
      Title: Deep Analysis of Snake Keylogger’s New Variant
      Content length: 14283
   ✅ Approved (Grade: A)
   🔍 Extracted 19 technical indicators
      domain: 14 found
      hash_sha256: 4 found
      cve: 1 found
   Processing article 50/171: https://www.fortinet.com/blog/threat-research/valleyrat-camp...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/valleyrat-campaign-targeting-chinese-speakers


2025-07-20 13:46:24,899 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/valleyrat-campaign-targeting-chinese-speakers (attempt 1/6)
2025-07-20 13:46:25,744 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/valleyrat-campaign-targeting-chinese-speakers


   ✅ Successfully scraped article
      Title: A Deep Dive into a New ValleyRAT Campaign Targeting Chinese Speakers
      Content length: 19255
   ✅ Approved (Grade: A)
   🔍 Extracted 22 technical indicators
      ip_address: 1 found
      domain: 19 found
      url: 2 found
   Processing article 51/171: https://www.fortinet.com/blog/threat-research/preparation-is...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/preparation-is-not-optional-10-incident-response-readiness-considerations


2025-07-20 13:46:33,619 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/preparation-is-not-optional-10-incident-response-readiness-considerations (attempt 1/6)
2025-07-20 13:46:34,594 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/preparation-is-not-optional-10-incident-response-readiness-considerations


   ✅ Successfully scraped article
      Title: Preparation Is Not Optional: 10 Incident Response Readiness Considerations for Any Organization
      Content length: 16391
   ✅ Approved (Grade: C)


   Processing article 52/171: https://www.fortinet.com/blog/threat-research/purehvnc-deplo...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/purehvnc-deployed-via-python-multi-stage-loader


2025-07-20 13:46:40,732 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/purehvnc-deployed-via-python-multi-stage-loader (attempt 1/6)
2025-07-20 13:46:41,569 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/purehvnc-deployed-via-python-multi-stage-loader


   ✅ Successfully scraped article
      Title: PureHVNC Deployed via Python Multi-stage Loader
      Content length: 15152
   ✅ Approved (Grade: A)
   🔍 Extracted 25 technical indicators
      domain: 3 found
      hash_sha256: 22 found
   Processing article 53/171: https://www.fortinet.com/blog/threat-research/malicious-pack...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/malicious-packages-hidden-in-pypl


2025-07-20 13:46:49,238 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/malicious-packages-hidden-in-pypl (attempt 1/6)
2025-07-20 13:46:50,083 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/malicious-packages-hidden-in-pypl


   ✅ Successfully scraped article
      Title: Malicious Packages Hidden in PyPI
      Content length: 7673
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 4 found
      hash_sha256: 4 found
   Processing article 54/171: https://www.fortinet.com/blog/threat-research/phishing-campa...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/phishing-campaign-targeting-mobile-users-in-india-using-india-post-lures


2025-07-20 13:46:57,026 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/phishing-campaign-targeting-mobile-users-in-india-using-india-post-lures (attempt 1/6)
2025-07-20 13:46:58,073 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/phishing-campaign-targeting-mobile-users-in-india-using-india-post-lures


   ✅ Successfully scraped article
      Title: Phishing Campaign Targeting Mobile Users in India Using India Post Lures
      Content length: 17550
   ✅ Approved (Grade: C)
   Processing article 55/171: https://www.fortinet.com/blog/threat-research/dark-web-shows...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/dark-web-shows-cybercriminals-ready-for-olympics


2025-07-20 13:47:04,943 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/dark-web-shows-cybercriminals-ready-for-olympics (attempt 1/6)
2025-07-20 13:47:05,428 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/dark-web-shows-cybercriminals-ready-for-olympics


   ✅ Successfully scraped article
      Title: Dark Web Shows Cybercriminals Ready for Olympics. Are You?
      Content length: 11943
   ✅ Approved (Grade: C)
   Processing article 56/171: https://www.fortinet.com/blog/threat-research/merkspy-exploi...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/merkspy-exploiting-cve-2021-40444-to-infiltrate-systems


2025-07-20 13:47:13,058 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/merkspy-exploiting-cve-2021-40444-to-infiltrate-systems (attempt 1/6)
2025-07-20 13:47:13,893 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/merkspy-exploiting-cve-2021-40444-to-infiltrate-systems


   ✅ Successfully scraped article
      Title: MerkSpy: Exploiting CVE-2021-40444 to Infiltrate Systems
      Content length: 8108
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 3 found
      hash_sha256: 6 found
      cve: 1 found
   Processing article 57/171: https://www.fortinet.com/blog/threat-research/growing-threat...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/growing-threat-of-malware-concealed-behind-cloud-services


2025-07-20 13:47:20,962 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/growing-threat-of-malware-concealed-behind-cloud-services (attempt 1/6)
2025-07-20 13:47:21,812 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/growing-threat-of-malware-concealed-behind-cloud-services


   ✅ Successfully scraped article
      Title: The Growing Threat of Malware Concealed Behind Cloud Services
      Content length: 15073
   ✅ Approved (Grade: A)
   🔍 Extracted 64 technical indicators
      ip_address: 1 found
      domain: 4 found
      hash_sha256: 53 found
      cve: 6 found
   Processing article 58/171: https://www.fortinet.com/blog/threat-research/fickle-stealer...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fickle-stealer-distributed-via-multiple-attack-chain


2025-07-20 13:47:28,953 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fickle-stealer-distributed-via-multiple-attack-chain (attempt 1/6)
2025-07-20 13:47:29,294 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fickle-stealer-distributed-via-multiple-attack-chain


   ✅ Successfully scraped article
      Title: Fickle Stealer Distributed via Multiple Attack Chain
      Content length: 18193
   ✅ Approved (Grade: A)
   🔍 Extracted 57 technical indicators
      domain: 8 found
      hash_sha256: 49 found
   Processing article 59/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-shinra-and-limpopo-ransomware


2025-07-20 13:47:37,492 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-shinra-and-limpopo-ransomware (attempt 1/6)
2025-07-20 13:47:38,509 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-shinra-and-limpopo-ransomware


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Shinra and Limpopo Ransomware
      Content length: 13774
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 7 found
      hash_sha256: 6 found
      cve: 2 found
   Processing article 60/171: https://www.fortinet.com/blog/threat-research/new-agent-tesl...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/new-agent-tesla-campaign-targeting-spanish-speaking-people


2025-07-20 13:47:45,954 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/new-agent-tesla-campaign-targeting-spanish-speaking-people (attempt 1/6)
2025-07-20 13:47:46,800 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/new-agent-tesla-campaign-targeting-spanish-speaking-people


   ✅ Successfully scraped article
      Title: New Agent Tesla Campaign Targeting Spanish-Speaking People
      Content length: 14975
   ✅ Approved (Grade: A)
   🔍 Extracted 19 technical indicators
      domain: 13 found
      hash_sha256: 4 found
      cve: 2 found
   Processing article 61/171: https://www.fortinet.com/blog/threat-research/menace-unleash...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/menace-unleashed-excel-file-deploys-cobalt-strike-at-ukraine


2025-07-20 13:47:54,669 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/menace-unleashed-excel-file-deploys-cobalt-strike-at-ukraine (attempt 1/6)
2025-07-20 13:47:55,508 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/menace-unleashed-excel-file-deploys-cobalt-strike-at-ukraine


   ✅ Successfully scraped article
      Title: Menace Unleashed: Excel File Deploys Cobalt Strike at Ukraine
      Content length: 8426
   ✅ Approved (Grade: A)
   🔍 Extracted 14 technical indicators
      domain: 6 found
      hash_sha256: 8 found
   Processing article 62/171: https://www.fortinet.com/blog/threat-research/zeus-stealer-d...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/zeus-stealer-distributed-via-crafted-minecraft-source-pack


2025-07-20 13:48:02,655 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/zeus-stealer-distributed-via-crafted-minecraft-source-pack (attempt 1/6)
2025-07-20 13:48:03,664 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/zeus-stealer-distributed-via-crafted-minecraft-source-pack


   ✅ Successfully scraped article
      Title: zEus Stealer Distributed via Crafted Minecraft Source Pack
      Content length: 14903
   ✅ Approved (Grade: A)
   🔍 Extracted 38 technical indicators
      domain: 15 found
      hash_sha256: 23 found
   Processing article 63/171: https://www.fortinet.com/blog/threat-research/key-findings-2...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/key-findings-2h-2023-fortiguard-labs-threat-report


2025-07-20 13:48:10,738 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/key-findings-2h-2023-fortiguard-labs-threat-report (attempt 1/6)
2025-07-20 13:48:11,574 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/key-findings-2h-2023-fortiguard-labs-threat-report


   ✅ Successfully scraped article
      Title: Key Findings from the 2H 2023 FortiGuard Labs Threat Report
      Content length: 7924
   ✅ Approved (Grade: C)
   Processing article 64/171: https://www.fortinet.com/blog/threat-research/new-goldoon-bo...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/new-goldoon-botnet-targeting-d-link-devices


2025-07-20 13:48:18,940 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/new-goldoon-botnet-targeting-d-link-devices (attempt 1/6)
2025-07-20 13:48:19,295 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/new-goldoon-botnet-targeting-d-link-devices


   ✅ Successfully scraped article
      Title: New “Goldoon” Botnet Targeting D-Link Devices
      Content length: 9080
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      ip_address: 2 found
      domain: 3 found
      cve: 1 found
   Processing article 65/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-keganohitobito-and-donex


2025-07-20 13:48:26,673 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-keganohitobito-and-donex (attempt 1/6)
2025-07-20 13:48:27,512 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-keganohitobito-and-donex


   ✅ Successfully scraped article
      Title: Ransomware Roundup - KageNoHitobito and DoNex
      Content length: 11301
   ✅ Approved (Grade: A)
   🔍 Extracted 19 technical indicators
      domain: 9 found
      hash_sha256: 10 found
   Processing article 66/171: https://www.fortinet.com/blog/threat-research/unraveling-cyb...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/unraveling-cyber-threats-insights-from-code-analysis


2025-07-20 13:48:34,900 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/unraveling-cyber-threats-insights-from-code-analysis (attempt 1/6)
2025-07-20 13:48:35,905 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/unraveling-cyber-threats-insights-from-code-analysis


   ✅ Successfully scraped article
      Title: Unraveling Cyber Threats: Insights from Code Analysis
      Content length: 9294
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 2 found
      hash_sha256: 13 found
   Processing article 67/171: https://www.fortinet.com/blog/threat-research/botnets-contin...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/botnets-continue-exploiting-cve-2023-1389-for-wide-scale-spread


2025-07-20 13:48:42,901 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/botnets-continue-exploiting-cve-2023-1389-for-wide-scale-spread (attempt 1/6)
2025-07-20 13:48:43,246 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/botnets-continue-exploiting-cve-2023-1389-for-wide-scale-spread


   ✅ Successfully scraped article
      Title: Botnets Continue Exploiting CVE-2023-1389 for Wide-Scale Spread
      Content length: 13147
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      domain: 5 found
      cve: 1 found
   Processing article 68/171: https://www.fortinet.com/blog/threat-research/scrubcrypt-dep...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/scrubcrypt-deploys-venomrat-with-arsenal-of-plugins


2025-07-20 13:48:50,160 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/scrubcrypt-deploys-venomrat-with-arsenal-of-plugins (attempt 1/6)
2025-07-20 13:48:51,024 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/scrubcrypt-deploys-venomrat-with-arsenal-of-plugins


   ✅ Successfully scraped article
      Title: ScrubCrypt Deploys VenomRAT with an Arsenal of Plugins
      Content length: 16808
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 15 found
   Processing article 69/171: https://www.fortinet.com/blog/threat-research/byakugan-malwa...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/byakugan-malware-behind-a-phishing-attack


2025-07-20 13:48:59,080 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/byakugan-malware-behind-a-phishing-attack (attempt 1/6)
2025-07-20 13:49:00,033 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/byakugan-malware-behind-a-phishing-attack


   ✅ Successfully scraped article
      Title: Byakugan – The Malware Behind a Phishing Attack
      Content length: 4785
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      domain: 5 found
      hash_sha256: 1 found


   Processing article 70/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-ra-world


2025-07-20 13:49:06,163 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-ra-world (attempt 1/6)
2025-07-20 13:49:06,838 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-ra-world


   ✅ Successfully scraped article
      Title: Ransomware Roundup – RA World
      Content length: 10179
   ✅ Approved (Grade: A)
   🔍 Extracted 33 technical indicators
      domain: 29 found
      hash_sha256: 4 found
   Processing article 71/171: https://www.fortinet.com/blog/threat-research/vcurms-a-simpl...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/vcurms-a-simple-and-functional-weapon


2025-07-20 13:49:14,312 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/vcurms-a-simple-and-functional-weapon (attempt 1/6)
2025-07-20 13:49:14,664 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/vcurms-a-simple-and-functional-weapon


   ✅ Successfully scraped article
      Title: VCURMS: A Simple and Functional Weapon
      Content length: 9082
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      domain: 6 found
   Processing article 72/171: https://www.fortinet.com/blog/threat-research/banking-trojan...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/banking-trojan-chavecloak-targets-brazil


2025-07-20 13:49:22,309 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/banking-trojan-chavecloak-targets-brazil (attempt 1/6)
2025-07-20 13:49:23,156 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/banking-trojan-chavecloak-targets-brazil


   ✅ Successfully scraped article
      Title: New Banking Trojan “CHAVECLOAK” Targets Brazil
      Content length: 9139
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      ip_address: 1 found
      domain: 9 found


   Processing article 73/171: https://www.fortinet.com/blog/threat-research/fortiguard-lab...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-labs-outbreak-alerts-report-2023


2025-07-20 13:49:29,407 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-labs-outbreak-alerts-report-2023 (attempt 1/6)
2025-07-20 13:49:30,363 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-labs-outbreak-alerts-report-2023


   ✅ Successfully scraped article
      Title: FortiGuard Labs Outbreak Alerts Annual Report 2023: A Glimpse into the Evolving Threat Landscape
      Content length: 1266
   ✅ Approved (Grade: C)
   Processing article 74/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-abyss-locker


2025-07-20 13:49:36,690 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-abyss-locker (attempt 1/6)
2025-07-20 13:49:37,653 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-abyss-locker


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Abyss Locker
      Content length: 16374
   ✅ Approved (Grade: A)
   🔍 Extracted 139 technical indicators
      domain: 124 found
      hash_sha256: 15 found
   Processing article 75/171: https://www.fortinet.com/blog/threat-research/android-spynot...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/android-spynote-moves-to-crypto-currencies


2025-07-20 13:49:44,778 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/android-spynote-moves-to-crypto-currencies (attempt 1/6)
2025-07-20 13:50:13,288 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/android-spynote-moves-to-crypto-currencies


   ✅ Successfully scraped article
      Title: Android/SpyNote Moves to Crypto Currencies
      Content length: 5241
   ✅ Approved (Grade: A)
   🔍 Extracted 4 technical indicators
      domain: 3 found
      hash_sha256: 1 found


   Processing article 76/171: https://www.fortinet.com/blog/threat-research/tictactoe-drop...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/tictactoe-dropper


2025-07-20 13:50:19,456 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/tictactoe-dropper (attempt 1/6)
2025-07-20 13:50:20,333 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/tictactoe-dropper


   ✅ Successfully scraped article
      Title: TicTacToe Dropper
      Content length: 15229
   ✅ Approved (Grade: A)
   🔍 Extracted 22 technical indicators
      domain: 10 found
      hash_sha1: 5 found
      hash_sha256: 6 found
      url: 1 found
   Processing article 77/171: https://www.fortinet.com/blog/threat-research/python-info-st...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/python-info-stealer-malicious-excel-document


2025-07-20 13:50:27,781 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/python-info-stealer-malicious-excel-document (attempt 1/6)
2025-07-20 13:50:28,639 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/python-info-stealer-malicious-excel-document


   ✅ Successfully scraped article
      Title: Python Info-stealer Distributed by Malicious Excel Document
      Content length: 7137
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 8 found
   Processing article 78/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-albabat


2025-07-20 13:50:35,409 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-albabat (attempt 1/6)
2025-07-20 13:50:36,250 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-albabat


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Albabat
      Content length: 10048
   ✅ Approved (Grade: A)
   🔍 Extracted 18 technical indicators
      domain: 13 found
      hash_sha256: 5 found
   Processing article 79/171: https://www.fortinet.com/blog/threat-research/phobos-ransomw...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/phobos-ransomware-variant-launches-attack-faust


2025-07-20 13:50:44,339 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/phobos-ransomware-variant-launches-attack-faust (attempt 1/6)
2025-07-20 13:50:45,192 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/phobos-ransomware-variant-launches-attack-faust


   ✅ Successfully scraped article
      Title: Another Phobos Ransomware Variant Launches Attack – FAUST
      Content length: 8323
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      domain: 4 found
      email: 1 found
   Processing article 80/171: https://www.fortinet.com/blog/threat-research/info-stealing-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/info-stealing-packages-hidden-in-pypi


2025-07-20 13:50:52,791 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/info-stealing-packages-hidden-in-pypi (attempt 1/6)
2025-07-20 13:50:52,964 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/info-stealing-packages-hidden-in-pypi


   ✅ Successfully scraped article
      Title: Info Stealing Packages Hidden in PyPI
      Content length: 13503
   ✅ Approved (Grade: A)
   🔍 Extracted 33 technical indicators
      domain: 7 found
      hash_sha256: 26 found
   Processing article 81/171: https://www.fortinet.com/blog/threat-research/lumma-variant-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/lumma-variant-on-youtube


2025-07-20 13:51:01,613 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/lumma-variant-on-youtube (attempt 1/6)
2025-07-20 13:51:01,769 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/lumma-variant-on-youtube


   ✅ Successfully scraped article
      Title: Deceptive Cracked Software Spreads Lumma Variant on YouTube
      Content length: 8331
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      ip_address: 1 found
      domain: 1 found
   Processing article 82/171: https://www.fortinet.com/blog/threat-research/malicious-pypi...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/malicious-pypi-packages-deploy-coinminer-on-linux-devices


2025-07-20 13:51:09,751 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/malicious-pypi-packages-deploy-coinminer-on-linux-devices (attempt 1/6)
2025-07-20 13:51:10,595 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/malicious-pypi-packages-deploy-coinminer-on-linux-devices


   ✅ Successfully scraped article
      Title: Three New Malicious PyPI Packages Deploy CoinMiner on Linux Devices
      Content length: 7072
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 4 found
      hash_sha256: 5 found
   Processing article 83/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-8base


2025-07-20 13:51:18,242 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-8base (attempt 1/6)
2025-07-20 13:51:18,763 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-8base


   ✅ Successfully scraped article
      Title: Ransomware Roundup - 8base
      Content length: 15584
   ✅ Approved (Grade: A)
   🔍 Extracted 127 technical indicators
      domain: 32 found
      hash_sha256: 94 found
      email: 1 found
   Processing article 84/171: https://www.fortinet.com/blog/threat-research/bandook-persis...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/bandook-persistent-threat-that-keeps-evolving


2025-07-20 13:51:26,518 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/bandook-persistent-threat-that-keeps-evolving (attempt 1/6)
2025-07-20 13:51:26,681 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/bandook-persistent-threat-that-keeps-evolving


   ✅ Successfully scraped article
      Title: Bandook - A Persistent Threat That Keeps Evolving
      Content length: 10987
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 8 found
      hash_sha256: 1 found
   Processing article 85/171: https://www.fortinet.com/blog/threat-research/teamcity-intru...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/teamcity-intrusion-saga-apt29-suspected-exploiting-cve-2023-42793


2025-07-20 13:51:35,127 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/teamcity-intrusion-saga-apt29-suspected-exploiting-cve-2023-42793 (attempt 1/6)
2025-07-20 13:51:35,319 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/teamcity-intrusion-saga-apt29-suspected-exploiting-cve-2023-42793


   ✅ Successfully scraped article
      Title: TeamCity Intrusion Saga: APT29 Suspected Among the Attackers Exploiting CVE-2023-42793
      Content length: 53392
   ❌ Rejected (Grade: D)
   Processing article 86/171: https://www.fortinet.com/blog/threat-research/mranon-stealer...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/mranon-stealer-spreads-via-email-with-fake-hotel-booking-pdf


2025-07-20 13:51:42,940 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/mranon-stealer-spreads-via-email-with-fake-hotel-booking-pdf (attempt 1/6)
2025-07-20 13:51:43,118 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/mranon-stealer-spreads-via-email-with-fake-hotel-booking-pdf


   ✅ Successfully scraped article
      Title: MrAnon Stealer Spreads via Email with Fake Hotel Booking PDF
      Content length: 9047
   ✅ Approved (Grade: A)
   🔍 Extracted 36 technical indicators
      domain: 36 found
   Processing article 87/171: https://www.fortinet.com/blog/threat-research/gotitan-botnet...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/gotitan-botnet-exploitation-on-apache-activemq


2025-07-20 13:51:50,235 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/gotitan-botnet-exploitation-on-apache-activemq (attempt 1/6)
2025-07-20 13:51:50,394 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/gotitan-botnet-exploitation-on-apache-activemq


   ✅ Successfully scraped article
      Title: GoTitan Botnet - Ongoing Exploitation on Apache ActiveMQ
      Content length: 10587
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      ip_address: 1 found
      domain: 7 found
      cve: 1 found
   Processing article 88/171: https://www.fortinet.com/blog/threat-research/konni-campaign...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/konni-campaign-distributed-via-malicious-document


2025-07-20 13:51:58,801 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/konni-campaign-distributed-via-malicious-document (attempt 1/6)
2025-07-20 13:51:58,977 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/konni-campaign-distributed-via-malicious-document


   ✅ Successfully scraped article
      Title: Konni Campaign Distributed Via Malicious Document
      Content length: 9528
   ✅ Approved (Grade: A)
   🔍 Extracted 7 technical indicators
      domain: 7 found
   Processing article 89/171: https://www.fortinet.com/blog/threat-research/investigating-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/investigating-the-new-rhysida-ransomware


2025-07-20 13:52:06,479 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/investigating-the-new-rhysida-ransomware (attempt 1/6)
2025-07-20 13:52:07,442 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/investigating-the-new-rhysida-ransomware


   ✅ Successfully scraped article
      Title: Investigating the New Rhysida Ransomware
      Content length: 2745
   ✅ Approved (Grade: C)
   Processing article 90/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-noescape


2025-07-20 13:52:14,233 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-noescape (attempt 1/6)
2025-07-20 13:52:15,070 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-noescape


   ✅ Successfully scraped article
      Title: Ransomware Roundup – NoEscape
      Content length: 8746
   ✅ Approved (Grade: A)
   🔍 Extracted 18 technical indicators
      hash_sha256: 18 found
   Processing article 91/171: https://www.fortinet.com/blog/threat-research/2024-threat-pr...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/2024-threat-predictions-chained-ai-and-caas-operations


2025-07-20 13:52:21,717 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/2024-threat-predictions-chained-ai-and-caas-operations (attempt 1/6)
2025-07-20 13:52:22,274 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/2024-threat-predictions-chained-ai-and-caas-operations


   ✅ Successfully scraped article
      Title: Threat Predictions for 2024: Chained AI and CaaS Operations Give Attackers More “Easy” Buttons Than 
      Content length: 7811
   ✅ Approved (Grade: C)
   Processing article 92/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-knight


2025-07-20 13:52:30,282 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-knight (attempt 1/6)
2025-07-20 13:52:31,296 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-knight


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Knight
      Content length: 11086
   ✅ Approved (Grade: A)
   🔍 Extracted 50 technical indicators
      ip_address: 1 found
      domain: 1 found
      hash_sha256: 48 found
   Processing article 93/171: https://www.fortinet.com/blog/threat-research/exelastealer-i...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/exelastealer-infostealer-enters-the-field


2025-07-20 13:52:38,767 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/exelastealer-infostealer-enters-the-field (attempt 1/6)
2025-07-20 13:52:39,797 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/exelastealer-infostealer-enters-the-field


   ✅ Successfully scraped article
      Title: Another InfoStealer Enters the Field, ExelaStealer
      Content length: 7998
   ✅ Approved (Grade: A)
   🔍 Extracted 17 technical indicators
      domain: 9 found
      hash_sha256: 5 found
      url: 3 found
   Processing article 94/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-akira


2025-07-20 13:52:46,196 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-akira (attempt 1/6)
2025-07-20 13:52:46,390 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-akira


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Akira
      Content length: 18152
   ✅ Approved (Grade: A)
   🔍 Extracted 46 technical indicators
      domain: 5 found
      hash_sha256: 39 found
      url: 2 found
   Processing article 95/171: https://www.fortinet.com/blog/threat-research/Iz1h9-campaign...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/Iz1h9-campaign-enhances-arsenal-with-scores-of-exploits


2025-07-20 13:52:54,303 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/Iz1h9-campaign-enhances-arsenal-with-scores-of-exploits (attempt 1/6)
2025-07-20 13:52:55,153 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/Iz1h9-campaign-enhances-arsenal-with-scores-of-exploits


   ✅ Successfully scraped article
      Title: IZ1H9 Campaign Enhances Its Arsenal with Scores of Exploits
      Content length: 7864
   ✅ Approved (Grade: A)
   🔍 Extracted 17 technical indicators
      ip_address: 2 found
      domain: 4 found
      cve: 11 found
   Processing article 96/171: https://www.fortinet.com/blog/threat-research/malicious-pack...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/malicious-packages-hiddin-in-npm


2025-07-20 13:53:03,006 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/malicious-packages-hiddin-in-npm (attempt 1/6)
2025-07-20 13:53:03,875 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/malicious-packages-hiddin-in-npm


   ✅ Successfully scraped article
      Title: Malicious Packages Hidden in NPM
      Content length: 11826
   ✅ Approved (Grade: A)
   🔍 Extracted 40 technical indicators
      domain: 7 found
      hash_md5: 33 found
   Processing article 97/171: https://www.fortinet.com/blog/threat-research/threat-Actors-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/threat-Actors-exploit-the-tensions-between-azerbaijan-and-armenia


2025-07-20 13:53:11,228 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/threat-Actors-exploit-the-tensions-between-azerbaijan-and-armenia (attempt 1/6)
2025-07-20 13:53:11,744 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/threat-Actors-exploit-the-tensions-between-azerbaijan-and-armenia


   ✅ Successfully scraped article
      Title: Threat Actors Exploit the Tensions Between Azerbaijan and Armenia
      Content length: 8118
   ✅ Approved (Grade: A)
   🔍 Extracted 13 technical indicators
      domain: 7 found
      hash_sha256: 5 found
      url: 1 found
   Processing article 98/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-retch-and-sho


2025-07-20 13:53:18,243 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-retch-and-sho (attempt 1/6)
2025-07-20 13:53:19,248 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-retch-and-sho


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Retch and S.H.O.
      Content length: 12712
   ✅ Approved (Grade: A)
   🔍 Extracted 12 technical indicators
      domain: 5 found
      hash_sha256: 7 found
   Processing article 99/171: https://www.fortinet.com/blog/threat-research/new-midgedropp...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/new-midgedropper-variant


2025-07-20 13:53:26,383 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/new-midgedropper-variant (attempt 1/6)
2025-07-20 13:53:27,552 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/new-midgedropper-variant


   ✅ Successfully scraped article
      Title: New MidgeDropper Variant
      Content length: 7938
   ✅ Approved (Grade: A)
   🔍 Extracted 20 technical indicators
      domain: 10 found
      hash_sha256: 9 found
      url: 1 found
   Processing article 100/171: https://www.fortinet.com/blog/threat-research/originbotnet-s...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/originbotnet-spreads-via-malicious-word-document


2025-07-20 13:53:34,313 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/originbotnet-spreads-via-malicious-word-document (attempt 1/6)
2025-07-20 13:53:34,478 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/originbotnet-spreads-via-malicious-word-document


   ✅ Successfully scraped article
      Title: OriginBotnet Spreads via Malicious Word Document
      Content length: 12943
   ✅ Approved (Grade: A)
   🔍 Extracted 24 technical indicators
      domain: 16 found
      hash_sha256: 5 found
      url: 3 found
   Processing article 101/171: https://www.fortinet.com/blog/threat-research/agent-tesla-va...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/agent-tesla-variant-spread-by-crafted-excel-document


2025-07-20 13:53:41,993 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/agent-tesla-variant-spread-by-crafted-excel-document (attempt 1/6)
2025-07-20 13:53:42,177 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/agent-tesla-variant-spread-by-crafted-excel-document


   ✅ Successfully scraped article
      Title: New Agent Tesla Variant Being Spread by Crafted Excel Document
      Content length: 14358
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 11 found
      hash_sha256: 2 found
      cve: 2 found
   Processing article 102/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-rhysida


2025-07-20 13:53:49,978 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-rhysida (attempt 1/6)
2025-07-20 13:53:50,819 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-rhysida


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Rhysida
      Content length: 8785
   ✅ Approved (Grade: A)
   🔍 Extracted 11 technical indicators
      domain: 1 found
      hash_sha256: 10 found
   Processing article 103/171: https://www.fortinet.com/blog/threat-research/multiple-threa...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/multiple-threats-target-adobe-coldfusion-vulnerabilities


2025-07-20 13:53:58,228 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/multiple-threats-target-adobe-coldfusion-vulnerabilities (attempt 1/6)
2025-07-20 13:53:59,064 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/multiple-threats-target-adobe-coldfusion-vulnerabilities


   ✅ Successfully scraped article
      Title: Multiple Threats Target Adobe ColdFusion Vulnerabilities
      Content length: 7256
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      domain: 2 found
      cve: 3 found
   Processing article 104/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-trash-panda-and-nocry-variant


2025-07-20 13:54:06,515 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-trash-panda-and-nocry-variant (attempt 1/6)
2025-07-20 13:54:07,373 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-trash-panda-and-nocry-variant


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Trash Panda and A New Minor Variant of NoCry
      Content length: 8604
   ✅ Approved (Grade: A)
   🔍 Extracted 3 technical indicators
      domain: 1 found
      hash_sha256: 2 found
   Processing article 105/171: https://www.fortinet.com/blog/threat-research/fortiguard-ai-...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-ai-detects-malicious-packages-in-pypi


2025-07-20 13:54:15,402 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-ai-detects-malicious-packages-in-pypi (attempt 1/6)
2025-07-20 13:54:16,234 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-ai-detects-malicious-packages-in-pypi


   ✅ Successfully scraped article
      Title: FortiGuard AI Detects Malicious Packages Hidden in the Python Package Index
      Content length: 4631
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 3 found
      hash_md5: 6 found
      email: 1 found
   Processing article 106/171: https://www.fortinet.com/blog/threat-research/malware-distri...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/malware-distributed-via-freezers-and-syk-crypter


2025-07-20 13:54:22,831 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/malware-distributed-via-freezers-and-syk-crypter (attempt 1/6)
2025-07-20 13:54:22,991 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/malware-distributed-via-freezers-and-syk-crypter


   ✅ Successfully scraped article
      Title: Attackers Distribute Malware via Freeze.rs And SYK Crypter
      Content length: 12072
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 8 found
      url: 1 found
   Processing article 107/171: https://www.fortinet.com/blog/threat-research/fortiguard-lab...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-labs-threat-report-key-findings-1h-2023


2025-07-20 13:54:30,594 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-labs-threat-report-key-findings-1h-2023 (attempt 1/6)
2025-07-20 13:54:31,093 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-labs-threat-report-key-findings-1h-2023


   ✅ Successfully scraped article
      Title: Key Findings from the 1H 2023 FortiGuard Labs Threat Report
      Content length: 6182
   ✅ Approved (Grade: C)
   Processing article 108/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-dodo-and-proton


2025-07-20 13:54:38,754 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-dodo-and-proton (attempt 1/6)
2025-07-20 13:54:39,595 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-dodo-and-proton


   ✅ Successfully scraped article
      Title: Ransomware Roundup - DoDo and Proton
      Content length: 11722
   ✅ Approved (Grade: A)
   🔍 Extracted 18 technical indicators
      domain: 1 found
      hash_sha256: 17 found
   Processing article 109/171: https://www.fortinet.com/blog/threat-research/microsoft-mess...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/microsoft-message-queuing-service-vulnerabilities


2025-07-20 13:54:46,838 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/microsoft-message-queuing-service-vulnerabilities (attempt 1/6)
2025-07-20 13:54:47,347 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/microsoft-message-queuing-service-vulnerabilities


   ✅ Successfully scraped article
      Title: FortiGuard Labs Discovers Multiple Vulnerabilities in Microsoft Message Queuing Service
      Content length: 12825
   ✅ Approved (Grade: A)
   🔍 Extracted 4 technical indicators
      domain: 2 found
      cve: 2 found
   Processing article 110/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-cl0p


2025-07-20 13:54:54,910 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-cl0p (attempt 1/6)
2025-07-20 13:54:55,752 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-cl0p


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Cl0p
      Content length: 10745
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 1 found
      hash_sha256: 8 found
      cve: 1 found
   Processing article 111/171: https://www.fortinet.com/blog/threat-research/ddos-botnets-t...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ddos-botnets-target-zyxel-vulnerability-cve-2023-28771


2025-07-20 13:55:03,874 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ddos-botnets-target-zyxel-vulnerability-cve-2023-28771 (attempt 1/6)
2025-07-20 13:55:04,034 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ddos-botnets-target-zyxel-vulnerability-cve-2023-28771


   ✅ Successfully scraped article
      Title: DDoS Botnets Target Zyxel Vulnerability CVE-2023-28771
      Content length: 11587
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      ip_address: 1 found
      domain: 3 found
      cve: 1 found
   Processing article 112/171: https://www.fortinet.com/blog/threat-research/adobe-indesign...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/adobe-indesign-zero-day-vulnerabilities


2025-07-20 13:55:11,646 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/adobe-indesign-zero-day-vulnerabilities (attempt 1/6)
2025-07-20 13:55:12,629 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/adobe-indesign-zero-day-vulnerabilities


   ✅ Successfully scraped article
      Title: FortiGuard Labs Discovers Multiple Vulnerabilities in Adobe InDesign
      Content length: 9165
   ✅ Approved (Grade: A)
   🔍 Extracted 12 technical indicators
      cve: 12 found
   Processing article 113/171: https://www.fortinet.com/blog/threat-research/lokibot-target...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/lokibot-targets-microsoft-office-document-using-vulnerabilities-and-macros


2025-07-20 13:55:18,908 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/lokibot-targets-microsoft-office-document-using-vulnerabilities-and-macros (attempt 1/6)
2025-07-20 13:55:19,090 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/lokibot-targets-microsoft-office-document-using-vulnerabilities-and-macros


   ✅ Successfully scraped article
      Title: LokiBot Campaign Targets Microsoft Office Document Using Vulnerabilities and Macros
      Content length: 8785
   ✅ Approved (Grade: A)
   🔍 Extracted 12 technical indicators
      domain: 9 found
      hash_sha256: 1 found
      cve: 2 found
   Processing article 114/171: https://www.fortinet.com/blog/threat-research/lockbit-most-p...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/lockbit-most-prevalent-ransomware


2025-07-20 13:55:26,852 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/lockbit-most-prevalent-ransomware (attempt 1/6)
2025-07-20 13:55:27,872 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/lockbit-most-prevalent-ransomware


   ✅ Successfully scraped article
      Title: Meet LockBit: The Most Prevalent Ransomware in 2022
      Content length: 28832
   ✅ Approved (Grade: A)
   🔍 Extracted 64 technical indicators
      domain: 5 found
      hash_sha256: 58 found
      cve: 1 found


   Processing article 115/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-rancoz


2025-07-20 13:55:33,964 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-rancoz (attempt 1/6)
2025-07-20 13:55:34,962 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-rancoz


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Rancoz
      Content length: 9195
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      domain: 2 found
      hash_sha256: 3 found
   Processing article 116/171: https://www.fortinet.com/blog/threat-research/new-fast-devel...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/new-fast-developing-thirdeye-infostealer-pries-open-system-information


2025-07-20 13:55:42,138 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/new-fast-developing-thirdeye-infostealer-pries-open-system-information (attempt 1/6)
2025-07-20 13:55:42,981 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/new-fast-developing-thirdeye-infostealer-pries-open-system-information


   ✅ Successfully scraped article
      Title: New Fast-Developing ThirdEye Infostealer Pries Open System Information
      Content length: 7660
   ✅ Approved (Grade: A)
   🔍 Extracted 18 technical indicators
      domain: 2 found
      hash_sha256: 16 found
   Processing article 117/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-black-basta


2025-07-20 13:55:51,017 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-black-basta (attempt 1/6)
2025-07-20 13:55:52,030 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-black-basta


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Black Basta
      Content length: 20217
   ✅ Approved (Grade: A)
   🔍 Extracted 118 technical indicators
      domain: 2 found
      hash_sha256: 112 found
      cve: 2 found
      url: 2 found
   Processing article 118/171: https://www.fortinet.com/blog/threat-research/fortinet-rever...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortinet-reverses-flutter-based-android-malware-fluhorse


2025-07-20 13:55:59,436 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortinet-reverses-flutter-based-android-malware-fluhorse (attempt 1/6)
2025-07-20 13:56:00,431 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortinet-reverses-flutter-based-android-malware-fluhorse


   ✅ Successfully scraped article
      Title: Fortinet Reverses Flutter-based Android Malware “Fluhorse”
      Content length: 9683
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 8 found
      hash_sha256: 1 found
   Processing article 119/171: https://www.fortinet.com/blog/threat-research/condi-ddos-bot...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/condi-ddos-botnet-spreads-via-tp-links-cve-2023-1389


2025-07-20 13:56:06,737 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/condi-ddos-botnet-spreads-via-tp-links-cve-2023-1389 (attempt 1/6)
2025-07-20 13:56:07,278 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/condi-ddos-botnet-spreads-via-tp-links-cve-2023-1389


   ✅ Successfully scraped article
      Title: Condi DDoS Botnet Spreads via TP-Link's CVE-2023-1389
      Content length: 10144
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      domain: 1 found
      cve: 1 found
   Processing article 120/171: https://www.fortinet.com/blog/threat-research/fortiguard-lab...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-labs-ransomware-roundup-big-head


2025-07-20 13:56:14,424 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-labs-ransomware-roundup-big-head (attempt 1/6)
2025-07-20 13:56:15,275 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-labs-ransomware-roundup-big-head


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Big Head
      Content length: 9198
   ✅ Approved (Grade: A)
   🔍 Extracted 11 technical indicators
      hash_sha256: 11 found
   Processing article 121/171: https://www.fortinet.com/blog/threat-research/moveit-transfe...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/moveit-transfer-critical-vulnerability-cve-2023-34362-exploited-as-a-0-day


2025-07-20 13:56:22,145 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/moveit-transfer-critical-vulnerability-cve-2023-34362-exploited-as-a-0-day (attempt 1/6)
2025-07-20 13:56:23,639 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/moveit-transfer-critical-vulnerability-cve-2023-34362-exploited-as-a-0-day


   ✅ Successfully scraped article
      Title: MOVEit Transfer Critical Vulnerability (CVE-2023-34362) Exploited as a 0-day
      Content length: 10377
   ✅ Approved (Grade: A)
   🔍 Extracted 35 technical indicators
      ip_address: 1 found
      hash_sha256: 32 found
      cve: 2 found
   Processing article 122/171: https://www.fortinet.com/blog/threat-research/youtube-pirate...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/youtube-pirated-software-videos-deliver-triple-threat-vidar-stealer-laplas-clipper-xmrig-miner


2025-07-20 13:56:30,734 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/youtube-pirated-software-videos-deliver-triple-threat-vidar-stealer-laplas-clipper-xmrig-miner (attempt 1/6)
2025-07-20 13:56:31,614 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/youtube-pirated-software-videos-deliver-triple-threat-vidar-stealer-laplas-clipper-xmrig-miner


   ✅ Successfully scraped article
      Title: YouTube Pirated Software Videos Deliver Triple Threat: Vidar Stealer, Laplas Clipper, XMRig Miner
      Content length: 16826
   ✅ Approved (Grade: A)
   🔍 Extracted 35 technical indicators
      ip_address: 1 found
      domain: 22 found
      hash_sha256: 12 found
   Processing article 123/171: https://www.fortinet.com/blog/threat-research/wintapix-kerna...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/wintapix-kernal-driver-middle-east-countries


2025-07-20 13:56:38,969 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/wintapix-kernal-driver-middle-east-countries (attempt 1/6)
2025-07-20 13:56:39,364 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/wintapix-kernal-driver-middle-east-countries


   ✅ Successfully scraped article
      Title: WINTAPIX: A New Kernel Driver Targeting Countries in The Middle East
      Content length: 13496
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 3 found
      hash_sha256: 5 found
   Processing article 124/171: https://www.fortinet.com/blog/threat-research/more-supply-ch...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/more-supply-chain-attacks-via-malicious-python-packages


2025-07-20 13:56:46,964 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/more-supply-chain-attacks-via-malicious-python-packages (attempt 1/6)
2025-07-20 13:56:47,970 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/more-supply-chain-attacks-via-malicious-python-packages


   ✅ Successfully scraped article
      Title: More Supply Chain Attacks via Malicious Python Packages
      Content length: 8799
   ✅ Approved (Grade: A)
   🔍 Extracted 41 technical indicators
      domain: 3 found
      hash_md5: 37 found
      url: 1 found


   Processing article 125/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-maori


2025-07-20 13:56:54,050 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-maori (attempt 1/6)
2025-07-20 13:56:54,898 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-maori


   ✅ Successfully scraped article
      Title: Ransomware Roundup - Maori
      Content length: 6491
   ✅ Approved (Grade: A)
   🔍 Extracted 1 technical indicators
      hash_sha256: 1 found
   Processing article 126/171: https://www.fortinet.com/blog/threat-research/rapperbot-ddos...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/rapperbot-ddos-botnet-expands-into-cryptojacking


2025-07-20 13:57:01,402 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/rapperbot-ddos-botnet-expands-into-cryptojacking (attempt 1/6)
2025-07-20 13:57:01,576 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/rapperbot-ddos-botnet-expands-into-cryptojacking


   ✅ Successfully scraped article
      Title: RapperBot DDoS Botnet Expands into Cryptojacking
      Content length: 16237
   ✅ Approved (Grade: A)
   🔍 Extracted 13 technical indicators
      domain: 2 found
      hash_sha256: 10 found
      url: 1 found
   Processing article 127/171: https://www.fortinet.com/blog/threat-research/andoryubot-new...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/andoryubot-new-botnet-campaign-targets-ruckus-wireless-admin-remote-code-execution-vulnerability-cve-2023-25717


2025-07-20 13:57:10,212 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/andoryubot-new-botnet-campaign-targets-ruckus-wireless-admin-remote-code-execution-vulnerability-cve-2023-25717 (attempt 1/6)
2025-07-20 13:57:11,201 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/andoryubot-new-botnet-campaign-targets-ruckus-wireless-admin-remote-code-execution-vulnerability-cve-2023-25717


   ✅ Successfully scraped article
      Title: AndoryuBot – New Botnet Campaign Targets Ruckus Wireless Admin Remote Code Execution Vulnerability (
      Content length: 5841
   ✅ Approved (Grade: A)
   🔍 Extracted 12 technical indicators
      domain: 1 found
      hash_sha256: 10 found
      cve: 1 found
   Processing article 128/171: https://www.fortinet.com/blog/threat-research/clean-rooms-nu...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/clean-rooms-nuclear-missiles-and-sidecopy


2025-07-20 13:57:18,986 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/clean-rooms-nuclear-missiles-and-sidecopy (attempt 1/6)
2025-07-20 13:57:20,034 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/clean-rooms-nuclear-missiles-and-sidecopy


   ✅ Successfully scraped article
      Title: Clean Rooms, Nuclear Missiles, and SideCopy, Oh My!
      Content length: 15083
   ✅ Approved (Grade: A)
   🔍 Extracted 35 technical indicators
      domain: 20 found
      hash_sha256: 12 found
      url: 3 found
   Processing article 129/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-uniza-coverage


2025-07-20 13:57:26,648 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-uniza-coverage (attempt 1/6)
2025-07-20 13:57:28,170 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-uniza-coverage


   ✅ Successfully scraped article
      Title: Ransomware Roundup - UNIZA
      Content length: 6529
   ✅ Approved (Grade: A)
   🔍 Extracted 4 technical indicators
      domain: 1 found
      hash_sha256: 3 found
   Processing article 130/171: https://www.fortinet.com/blog/threat-research/evil-extractor...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/evil-extractor-all-in-one-stealer


2025-07-20 13:57:35,233 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/evil-extractor-all-in-one-stealer (attempt 1/6)
2025-07-20 13:57:36,067 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/evil-extractor-all-in-one-stealer


   ✅ Successfully scraped article
      Title: EvilExtractor – All-in-One Stealer
      Content length: 7277
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 7 found
      url: 1 found
   Processing article 131/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-kadavro-vector-ransomware


2025-07-20 13:57:44,078 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-kadavro-vector-ransomware (attempt 1/6)
2025-07-20 13:57:44,930 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-kadavro-vector-ransomware


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Kadavro Vector Ransomware
      Content length: 8190
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      hash_sha256: 2 found
   Processing article 132/171: https://www.fortinet.com/blog/threat-research/tax-scammers-a...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/tax-scammers-at-large


2025-07-20 13:57:51,845 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/tax-scammers-at-large (attempt 1/6)
2025-07-20 13:57:52,685 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/tax-scammers-at-large


   ✅ Successfully scraped article
      Title: Deja Vu All Over Again: Tax Scammers at Large
      Content length: 11412
   ✅ Approved (Grade: A)
   🔍 Extracted 31 technical indicators
      domain: 20 found
      hash_sha256: 11 found
   Processing article 133/171: https://www.fortinet.com/blog/threat-research/exploring-rece...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/exploring-recent-microsoft-outlook-vulnerability-cve-2023-23397


2025-07-20 13:58:00,698 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/exploring-recent-microsoft-outlook-vulnerability-cve-2023-23397 (attempt 1/6)
2025-07-20 13:58:01,886 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/exploring-recent-microsoft-outlook-vulnerability-cve-2023-23397


   ✅ Successfully scraped article
      Title: Exploring a Recent Microsoft Outlook Vulnerability: CVE-2023-23397
      Content length: 3065
   ✅ Approved (Grade: A)
   🔍 Extracted 1 technical indicators
      cve: 1 found
   Processing article 134/171: https://www.fortinet.com/blog/threat-research/are-internet-m...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/are-internet-macros-dead-or-alive


2025-07-20 13:58:08,724 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/are-internet-macros-dead-or-alive (attempt 1/6)
2025-07-20 13:58:09,725 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/are-internet-macros-dead-or-alive


   ✅ Successfully scraped article
      Title: Are Internet Macros Dead or Alive?
      Content length: 13361
   ✅ Approved (Grade: A)
   🔍 Extracted 4 technical indicators
      domain: 4 found


   Processing article 135/171: https://www.fortinet.com/blog/threat-research/malware-disgui...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/malware-disguised-as-document-ukraine-energoatom-delivers-havoc-demon-backdoor


2025-07-20 13:58:15,896 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/malware-disguised-as-document-ukraine-energoatom-delivers-havoc-demon-backdoor (attempt 1/6)
2025-07-20 13:58:16,234 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/malware-disguised-as-document-ukraine-energoatom-delivers-havoc-demon-backdoor


   ✅ Successfully scraped article
      Title: Malware Disguised as Document from Ukraine's Energoatom Delivers Havoc Demon Backdoor
      Content length: 17873
   ✅ Approved (Grade: A)
   🔍 Extracted 26 technical indicators
      domain: 21 found
      hash_sha256: 4 found
      url: 1 found
   Processing article 136/171: https://www.fortinet.com/blog/threat-research/3cx-desktop-ap...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/3cx-desktop-app-compromised


2025-07-20 13:58:23,717 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/3cx-desktop-app-compromised (attempt 1/6)
2025-07-20 13:58:24,561 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/3cx-desktop-app-compromised


   ✅ Successfully scraped article
      Title: 3CX Desktop App Compromised (CVE-2023-29059)
      Content length: 7605
   ✅ Approved (Grade: A)
   🔍 Extracted 24 technical indicators
      domain: 4 found
      hash_md5: 2 found
      hash_sha1: 2 found
      hash_sha256: 15 found
      cve: 1 found
   Processing article 137/171: https://www.fortinet.com/blog/threat-research/dark-power-and...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/dark-power-and-payme100usd-ransomware


2025-07-20 13:58:31,227 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/dark-power-and-payme100usd-ransomware (attempt 1/6)
2025-07-20 13:58:32,080 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/dark-power-and-payme100usd-ransomware


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Dark Power and PayMe100USD Ransomware
      Content length: 9462
   ✅ Approved (Grade: A)
   🔍 Extracted 49 technical indicators
      domain: 45 found
      hash_sha256: 4 found
   Processing article 138/171: https://www.fortinet.com/blog/threat-research/moobot-strikes...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/moobot-strikes-again-targeting-cacti-and-realtek-vulnerabilities


2025-07-20 13:58:39,287 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/moobot-strikes-again-targeting-cacti-and-realtek-vulnerabilities (attempt 1/6)
2025-07-20 13:58:40,301 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/moobot-strikes-again-targeting-cacti-and-realtek-vulnerabilities


   ✅ Successfully scraped article
      Title: Moobot Strikes Again - Targeting Cacti And RealTek Vulnerabilities
      Content length: 9175
   ✅ Approved (Grade: A)
   🔍 Extracted 42 technical indicators
      domain: 2 found
      hash_sha256: 38 found
      cve: 2 found
   Processing article 139/171: https://www.fortinet.com/blog/threat-research/supply-chain-a...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages


2025-07-20 13:58:47,233 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages (attempt 1/6)
2025-07-20 13:58:48,090 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages


   ✅ Successfully scraped article
      Title: Supply Chain Attack via New Malicious Python Packages
      Content length: 12167
   ✅ Approved (Grade: A)
   🔍 Extracted 70 technical indicators
      domain: 5 found
      hash_md5: 64 found
      url: 1 found


   Processing article 140/171: https://www.fortinet.com/blog/threat-research/intel-on-wiper...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/intel-on-wiper-malware


2025-07-20 13:58:54,284 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/intel-on-wiper-malware (attempt 1/6)
2025-07-20 13:58:55,168 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/intel-on-wiper-malware


   ✅ Successfully scraped article
      Title: The Latest Intel on Wipers
      Content length: 4424
   ✅ Approved (Grade: C)
   Processing article 141/171: https://www.fortinet.com/blog/threat-research/fortiguard-lab...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-labs-ransomware-roundup


2025-07-20 13:59:02,497 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-labs-ransomware-roundup (attempt 1/6)
2025-07-20 13:59:03,382 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-labs-ransomware-roundup


   ✅ Successfully scraped article
      Title: Ransomware Roundup — HardBit 2.0
      Content length: 7357
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 3 found
      hash_sha256: 4 found
      url: 1 found
   Processing article 142/171: https://www.fortinet.com/blog/threat-research/microsoft-onen...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/microsoft-onenote-file-being-leveraged-by-phishing-campaigns-to-spread-malware


2025-07-20 13:59:11,432 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/microsoft-onenote-file-being-leveraged-by-phishing-campaigns-to-spread-malware (attempt 1/6)
2025-07-20 13:59:12,452 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/microsoft-onenote-file-being-leveraged-by-phishing-campaigns-to-spread-malware


   ✅ Successfully scraped article
      Title: Microsoft OneNote File Being Leveraged by Phishing Campaigns to Spread Malware
      Content length: 16268
   ✅ Approved (Grade: A)
   🔍 Extracted 19 technical indicators
      ip_address: 1 found
      domain: 13 found
      hash_sha256: 3 found
      url: 2 found
   Processing article 143/171: https://www.fortinet.com/blog/threat-research/bad-actors-res...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/bad-actors-resurrecting-old-tactics


2025-07-20 13:59:20,224 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/bad-actors-resurrecting-old-tactics (attempt 1/6)
2025-07-20 13:59:21,142 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/bad-actors-resurrecting-old-tactics


   ✅ Successfully scraped article
      Title: Reduce, Reuse, Recycle: Bad Actors Practicing the Three Rs
      Content length: 5486
   ✅ Approved (Grade: C)
   Processing article 144/171: https://www.fortinet.com/blog/threat-research/old-cyber-gang...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/old-cyber-gang-uses-new-crypter-scrubcrypt


2025-07-20 13:59:28,985 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/old-cyber-gang-uses-new-crypter-scrubcrypt (attempt 1/6)
2025-07-20 13:59:29,986 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/old-cyber-gang-uses-new-crypter-scrubcrypt


   ✅ Successfully scraped article
      Title: Old Cyber Gang Uses New Crypter – ScrubCrypt
      Content length: 9537
   ✅ Approved (Grade: A)
   🔍 Extracted 30 technical indicators
      domain: 7 found
      hash_sha256: 23 found
   Processing article 145/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-sirattacker-acl


2025-07-20 13:59:36,536 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-sirattacker-acl (attempt 1/6)
2025-07-20 13:59:37,703 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-sirattacker-acl


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Sirattacker and ALC
      Content length: 11524
   ✅ Approved (Grade: A)
   🔍 Extracted 17 technical indicators
      domain: 2 found
      hash_sha256: 15 found
   Processing article 146/171: https://www.fortinet.com/blog/threat-research/just-because-i...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/just-because-its-old-doesnt-mean-you-throw-it-away-including-malware


2025-07-20 13:59:44,594 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/just-because-its-old-doesnt-mean-you-throw-it-away-including-malware (attempt 1/6)
2025-07-20 13:59:44,758 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/just-because-its-old-doesnt-mean-you-throw-it-away-including-malware


   ✅ Successfully scraped article
      Title: Just Because It’s Old Doesn’t Mean You Throw It Away (Including Malware!)
      Content length: 11304
   ✅ Approved (Grade: A)
   🔍 Extracted 108 technical indicators
      ip_address: 47 found
      domain: 10 found
      hash_sha256: 46 found
      email: 4 found
      url: 1 found
   Processing article 147/171: https://www.fortinet.com/blog/threat-research/emerging-lockb...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/emerging-lockbit-campaign


2025-07-20 13:59:53,552 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/emerging-lockbit-campaign (attempt 1/6)
2025-07-20 13:59:54,540 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/emerging-lockbit-campaign


   ✅ Successfully scraped article
      Title: Can You See It Now? An Emerging LockBit Campaign
      Content length: 8514
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 7 found
      hash_sha256: 7 found
      url: 1 found
   Processing article 148/171: https://www.fortinet.com/blog/threat-research/fortiguard-lab...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-labs-threat-report-key-findings-2h-2022


2025-07-20 14:00:01,886 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-labs-threat-report-key-findings-2h-2022 (attempt 1/6)
2025-07-20 14:00:02,868 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-labs-threat-report-key-findings-2h-2022


   ✅ Successfully scraped article
      Title: Key Findings from the 2H 2022 FortiGuard Labs Threat Report
      Content length: 7235
   ✅ Approved (Grade: C)
   Processing article 149/171: https://www.fortinet.com/blog/threat-research/royal-ransomwa...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/royal-ransomware-targets-linux-esxi-servers


2025-07-20 14:00:09,098 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/royal-ransomware-targets-linux-esxi-servers (attempt 1/6)
2025-07-20 14:00:09,950 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/royal-ransomware-targets-linux-esxi-servers


   ✅ Successfully scraped article
      Title: Royal Ransomware Targets Linux ESXi Servers
      Content length: 9060
   ✅ Approved (Grade: A)
   🔍 Extracted 1 technical indicators
      hash_sha256: 1 found
   Processing article 150/171: https://www.fortinet.com/blog/threat-research/more-supply-ch...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/more-supply-chain-attacks-via-new-malicious-python-packages-in-pypi


2025-07-20 14:00:16,513 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/more-supply-chain-attacks-via-new-malicious-python-packages-in-pypi (attempt 1/6)
2025-07-20 14:00:17,507 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/more-supply-chain-attacks-via-new-malicious-python-packages-in-pypi


   ✅ Successfully scraped article
      Title: More Supply Chain Attacks via New Malicious Python Packages in PyPi
      Content length: 4303
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 6 found
      hash_sha256: 2 found
      url: 1 found
   Processing article 151/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-catb-ransomware


2025-07-20 14:00:24,913 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-catb-ransomware (attempt 1/6)
2025-07-20 14:00:25,425 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-catb-ransomware


   ✅ Successfully scraped article
      Title: Ransomware Roundup – CatB
      Content length: 9649
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 2 found
      hash_sha256: 6 found
   Processing article 152/171: https://www.fortinet.com/blog/threat-research/supply-chain-a...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages-by-malware-author-core1337


2025-07-20 14:00:32,102 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages-by-malware-author-core1337 (attempt 1/6)
2025-07-20 14:00:32,750 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages-by-malware-author-core1337


   ✅ Successfully scraped article
      Title: Supply Chain Attack via New Malicious Python Packages by Malware Author Core1337
      Content length: 3904
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      domain: 3 found
      hash_sha256: 1 found
      url: 1 found
   Processing article 153/171: https://www.fortinet.com/blog/threat-research/supply-chain-a...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/supply-chain-attack-by-new-malicious-python-package-web3-essential


2025-07-20 14:00:40,606 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/supply-chain-attack-by-new-malicious-python-package-web3-essential (attempt 1/6)
2025-07-20 14:00:41,438 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/supply-chain-attack-by-new-malicious-python-package-web3-essential


   ✅ Successfully scraped article
      Title: Supply Chain Attack by New Malicious Python Package, “web3-essential”
      Content length: 4451
   ✅ Approved (Grade: A)
   🔍 Extracted 3 technical indicators
      domain: 2 found
      hash_sha256: 1 found
   Processing article 154/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-trigona-ransomware


2025-07-20 14:00:49,015 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-trigona-ransomware (attempt 1/6)
2025-07-20 14:00:49,850 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-trigona-ransomware


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Trigona
      Content length: 7357
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      hash_sha256: 15 found
   Processing article 155/171: https://www.fortinet.com/blog/threat-research/malicious-code...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/malicious-code-cryptojacks-device-to-mine-for-monero-crypto


2025-07-20 14:00:57,595 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/malicious-code-cryptojacks-device-to-mine-for-monero-crypto (attempt 1/6)
2025-07-20 14:00:58,602 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/malicious-code-cryptojacks-device-to-mine-for-monero-crypto


   ✅ Successfully scraped article
      Title: Analyzing Malware Code that Cryptojacks System to Mine for Monero Crypto
      Content length: 13497
   ✅ Approved (Grade: A)
   🔍 Extracted 19 technical indicators
      domain: 12 found
      hash_md5: 3 found
      hash_sha256: 4 found
   Processing article 156/171: https://www.fortinet.com/blog/threat-research/fortiguard-out...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/fortiguard-outbreak-alerts-2022-annual-report


2025-07-20 14:01:05,264 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/fortiguard-outbreak-alerts-2022-annual-report (attempt 1/6)
2025-07-20 14:01:05,737 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/fortiguard-outbreak-alerts-2022-annual-report


   ✅ Successfully scraped article
      Title: FortiGuard Outbreak Alerts - 2022 Annual Report
      Content length: 2136
   ✅ Approved (Grade: C)
   Processing article 157/171: https://www.fortinet.com/blog/threat-research/the-year-of-th...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/the-year-of-the-wiper


2025-07-20 14:01:14,262 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/the-year-of-the-wiper (attempt 1/6)
2025-07-20 14:01:14,592 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/the-year-of-the-wiper


   ✅ Successfully scraped article
      Title: The Year of the Wiper
      Content length: 12219
   ✅ Approved (Grade: A)
   🔍 Extracted 18 technical indicators
      domain: 4 found
      hash_sha256: 14 found
   Processing article 158/171: https://www.fortinet.com/blog/threat-research/qr-code-phishi...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/qr-code-phishing-attempts-to-steal-credentials-from-chinese-language-users


2025-07-20 14:01:21,774 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/qr-code-phishing-attempts-to-steal-credentials-from-chinese-language-users (attempt 1/6)
2025-07-20 14:01:22,619 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/qr-code-phishing-attempts-to-steal-credentials-from-chinese-language-users


   ✅ Successfully scraped article
      Title: QR Code Phishing Attempts to Steal Credentials from Chinese Language Users
      Content length: 6174
   ✅ Approved (Grade: A)
   🔍 Extracted 8 technical indicators
      domain: 5 found
      hash_sha256: 3 found
   Processing article 159/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-playing-whack-a-mole-with-new-crysis-dharma-variants


2025-07-20 14:01:29,670 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-playing-whack-a-mole-with-new-crysis-dharma-variants (attempt 1/6)
2025-07-20 14:01:30,505 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-playing-whack-a-mole-with-new-crysis-dharma-variants


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Playing Whack-a-Mole with New CrySIS/Dharma Variants
      Content length: 7142
   ✅ Approved (Grade: A)
   🔍 Extracted 7 technical indicators
      domain: 1 found
      hash_sha256: 6 found
   Processing article 160/171: https://www.fortinet.com/blog/threat-research/supply-chain-a...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/supply-chain-attack-using-identical-pypi-packages-colorslib-httpslib-libhttps


2025-07-20 14:01:38,474 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/supply-chain-attack-using-identical-pypi-packages-colorslib-httpslib-libhttps (attempt 1/6)
2025-07-20 14:01:38,805 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/supply-chain-attack-using-identical-pypi-packages-colorslib-httpslib-libhttps


   ✅ Successfully scraped article
      Title: Supply Chain Attack Using Identical PyPI Packages, “colorslib”, “httpslib”, and “libhttps”
      Content length: 3692
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 5 found
      hash_sha256: 3 found
      url: 1 found
   Processing article 161/171: https://www.fortinet.com/blog/threat-research/2022-iot-threa...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/2022-iot-threat-review


2025-07-20 14:01:45,869 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/2022-iot-threat-review (attempt 1/6)
2025-07-20 14:01:46,705 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/2022-iot-threat-review


   ✅ Successfully scraped article
      Title: 2022 IoT Threat Review
      Content length: 10626
   ✅ Approved (Grade: A)
   🔍 Extracted 16 technical indicators
      domain: 1 found
      hash_sha256: 11 found
      cve: 4 found
   Processing article 162/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-monti-blackhunt-and-more


2025-07-20 14:01:54,678 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-monti-blackhunt-and-more (attempt 1/6)
2025-07-20 14:01:55,514 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-monti-blackhunt-and-more


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Monti, BlackHunt, and Putin
      Content length: 7343
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      domain: 2 found
   Processing article 163/171: https://www.fortinet.com/blog/threat-research/trying-to-stea...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/trying-to-steal-christmas-again


2025-07-20 14:02:02,850 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/trying-to-steal-christmas-again (attempt 1/6)
2025-07-20 14:02:03,695 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/trying-to-steal-christmas-again


   ✅ Successfully scraped article
      Title: Trying to Steal Christmas (Again!)
      Content length: 6364
   ✅ Approved (Grade: A)
   🔍 Extracted 10 technical indicators
      domain: 8 found
      hash_sha256: 2 found
   Processing article 164/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-play-ransomware


2025-07-20 14:02:11,088 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-play-ransomware (attempt 1/6)
2025-07-20 14:02:12,094 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-play-ransomware


   ✅ Successfully scraped article
      Title: Ransomware Roundup – Play
      Content length: 9447
   ✅ Approved (Grade: A)
   🔍 Extracted 39 technical indicators
      domain: 1 found
      hash_sha256: 38 found
   Processing article 165/171: https://www.fortinet.com/blog/threat-research/the-taxman-nev...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/the-taxman-never-sleeps


2025-07-20 14:02:18,710 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/the-taxman-never-sleeps (attempt 1/6)
2025-07-20 14:02:19,542 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/the-taxman-never-sleeps


   ✅ Successfully scraped article
      Title: The Taxman Never Sleeps
      Content length: 8036
   ✅ Approved (Grade: A)
   🔍 Extracted 9 technical indicators
      domain: 6 found
      hash_sha256: 3 found
   Processing article 166/171: https://www.fortinet.com/blog/threat-research/new-supply-cha...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/new-supply-chain-attack-uses-python-package-index-aioconsol


2025-07-20 14:02:25,886 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/new-supply-chain-attack-uses-python-package-index-aioconsol (attempt 1/6)
2025-07-20 14:02:30,057 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/new-supply-chain-attack-uses-python-package-index-aioconsol


   ✅ Successfully scraped article
      Title: New Supply Chain Attack Uses Python Package Index “aioconsol”
      Content length: 3093
   ✅ Approved (Grade: A)
   🔍 Extracted 7 technical indicators
      domain: 5 found
      hash_sha256: 2 found


2025-07-20 14:02:36,098 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-package-shaderz-part-2 (attempt 1/6)


   Processing article 167/171: https://www.fortinet.com/blog/threat-research/supply-chain-a...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-package-shaderz-part-2


2025-07-20 14:02:36,783 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-package-shaderz-part-2


   ✅ Successfully scraped article
      Title: Supply Chain Attack via New Malicious Python Package, “shaderz” (Part 2)
      Content length: 3724
   ✅ Approved (Grade: A)
   🔍 Extracted 6 technical indicators
      domain: 4 found
      hash_sha256: 1 found
      url: 1 found
   Processing article 168/171: https://www.fortinet.com/blog/threat-research/want-to-know-w...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/want-to-know-whats-in-that-online-mystery-box-nothing-at-all


2025-07-20 14:02:43,432 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/want-to-know-whats-in-that-online-mystery-box-nothing-at-all (attempt 1/6)
2025-07-20 14:02:43,931 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/want-to-know-whats-in-that-online-mystery-box-nothing-at-all


   ✅ Successfully scraped article
      Title: Want to Know What’s in That Online Mystery Box? NOTHING AT ALL
      Content length: 10972
   ✅ Approved (Grade: C)
   🔍 Extracted 3 technical indicators
      domain: 2 found
      url: 1 found
   Processing article 169/171: https://www.fortinet.com/blog/threat-research/gotrim-go-base...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/gotrim-go-based-botnet-actively-brute-forces-wordpress-websites


2025-07-20 14:02:51,900 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/gotrim-go-based-botnet-actively-brute-forces-wordpress-websites (attempt 1/6)
2025-07-20 14:02:52,740 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/gotrim-go-based-botnet-actively-brute-forces-wordpress-websites


   ✅ Successfully scraped article
      Title: GoTrim: Go-based Botnet Actively Brute Forces WordPress Websites
      Content length: 17823
   ✅ Approved (Grade: A)
   🔍 Extracted 15 technical indicators
      domain: 6 found
      hash_sha256: 8 found
      url: 1 found
   Processing article 170/171: https://www.fortinet.com/blog/threat-research/supply-chain-a...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/supply-chain-attack-new-malicious-python-package-shaderz


2025-07-20 14:03:00,573 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/supply-chain-attack-new-malicious-python-package-shaderz (attempt 1/6)
2025-07-20 14:03:01,415 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/supply-chain-attack-new-malicious-python-package-shaderz


   ✅ Successfully scraped article
      Title: Supply Chain Attack via New Malicious Python Package, “shaderz” (Part 1)
      Content length: 2725
   ✅ Approved (Grade: A)
   🔍 Extracted 5 technical indicators
      domain: 2 found
      hash_sha256: 2 found
      url: 1 found
   Processing article 171/171: https://www.fortinet.com/blog/threat-research/ransomware-rou...

🔍 Scraping Fortinet article: https://www.fortinet.com/blog/threat-research/ransomware-roundup-new-vohuk-scarecrow-and-aerst-variants


2025-07-20 14:03:09,406 - INFO - Attempting to fetch https://www.fortinet.com/blog/threat-research/ransomware-roundup-new-vohuk-scarecrow-and-aerst-variants (attempt 1/6)
2025-07-20 14:03:10,395 - INFO - Successfully scraped: https://www.fortinet.com/blog/threat-research/ransomware-roundup-new-vohuk-scarecrow-and-aerst-variants


   ✅ Successfully scraped article
      Title: Ransomware Roundup – New Vohuk, ScareCrow, and AERST Variants
      Content length: 7362
   ✅ Approved (Grade: A)
   🔍 Extracted 2 technical indicators
      domain: 2 found

📡 Collecting from Symantec...
   Trying page 1/30: https://symantec-enterprise-blogs.security.com/blogs/threat-research


2025-07-20 14:03:17,033 - INFO - Attempting to fetch https://symantec-enterprise-blogs.security.com/blogs/threat-research (attempt 1/6)
2025-07-20 14:03:24,328 - INFO - Successfully scraped: https://symantec-enterprise-blogs.security.com/blogs/threat-research


   Trying page 1/30: https://symantec-enterprise-blogs.security.com/blogs/threat-research


2025-07-20 14:03:25,272 - INFO - Attempting to fetch https://symantec-enterprise-blogs.security.com/blogs/threat-research (attempt 1/6)
2025-07-20 14:03:25,434 - INFO - Successfully scraped: https://symantec-enterprise-blogs.security.com/blogs/threat-research


   Trying page 1/30: https://symantec-enterprise-blogs.security.com/blogs/threat-research


2025-07-20 14:03:33,761 - INFO - Attempting to fetch https://symantec-enterprise-blogs.security.com/blogs/threat-research (attempt 1/6)
2025-07-20 14:03:33,927 - INFO - Successfully scraped: https://symantec-enterprise-blogs.security.com/blogs/threat-research


      No articles found on page 1, stopping pagination
   Total Symantec articles found: 0
   Found 0 potential articles

💾 Saved 170 articles from multiple sources

✅ Export completed successfully!
📊 Exported 170 articles
🎯 2511 unique entities
🔗 2881 relations
📈 Avg threat score: 0.55

📁 Files created:
  llm_training: llm_training_data_20250720_140334.json
  knowledge_graph: knowledge_graph_data_20250720_140334.json
  jsonl_format: threat_intelligence_20250720_140334.jsonl

📊 COLLECTION SUMMARY
⏱️  Duration: 0:26:12.157983
CISA: 0/0 (0.0%)
Fortinet: 170/171 (99.4%)
Symantec: 0/0 (0.0%)

📄 Total Articles Collected: 170
🎯 Average Threat Score: 0.55
🔍 Total Technical Indicators: 2881

📊 Articles per source:
   Fortinet: 170

🎉 Data collection completed successfully!



## 8. Data Export and Format Conversion

Export the collected and validated data in various formats for downstream processing in the LLM-TIKG pipeline.


In [None]:
import hashlib


def stable_id(prefix: str, value: str) -> str:
    """Build a deterministic ID from a truncated SHA-256 digest.

    The built-in hash() is salted per interpreter session for strings, so IDs
    derived from it change between runs, and `% 100000` invites collisions
    that would make relations point at the wrong entity.
    """
    digest = hashlib.sha256(str(value).encode('utf-8')).hexdigest()[:16]
    return f"{prefix}_{digest}"


def export_for_llm_training(articles: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Export data in formats suitable for LLM training and knowledge graph construction."""

    if not articles:
        print("❌ No articles to export")
        return None

    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

    # 1. Training text format (for LLM fine-tuning)
    training_texts = []
    for article in articles:
        text_entry = {
            # Deterministic ID: the same URL yields the same ID on every run
            'id': stable_id(article['source'], article['url']),
            'source': article['source'],
            'title': article['title'],
            'text': article['full_text'],
            'metadata': {
                'url': article['url'],
                'scraped_at': article['scraped_at'],
                'indicators': article.get('indicators', {}),
                'threat_score': article.get('threat_relevance_score', 0)
            }
        }
        training_texts.append(text_entry)

    training_file = PROCESSED_DATA_DIR / f'llm_training_data_{timestamp}.json'
    with open(training_file, 'w', encoding='utf-8') as f:
        json.dump(training_texts, f, indent=2, ensure_ascii=False)

    # 2. Entity-Relationship format (for knowledge graph)
    entities_and_relations = {
        'entities': [],
        'relations': [],
        'documents': []
    }

    for idx, article in enumerate(articles):
        doc_id = f"doc_{idx}"

        # Document node
        entities_and_relations['documents'].append({
            'id': doc_id,
            'title': article['title'],
            'source': article['source'],
            'url': article['url'],
            'threat_score': article.get('threat_relevance_score', 0)
        })

        # Extract entities from indicators
        indicators = article.get('indicators', {})
        for indicator_type, values in indicators.items():
            for value in values:
                # Same (type, value) pair always maps to the same entity ID
                entity_id = stable_id(indicator_type, value)
                
                # Entity
                entities_and_relations['entities'].append({
                    'id': entity_id,
                    'type': indicator_type,
                    'value': value
                })
                
                # Relation
                entities_and_relations['relations'].append({
                    'source': doc_id,
                    'target': entity_id,
                    'relation': 'mentions',
                    'type': indicator_type
                })
    
    # Remove duplicate entities
    seen_entities = set()
    unique_entities = []
    for entity in entities_and_relations['entities']:
        entity_key = (entity['type'], entity['value'])
        if entity_key not in seen_entities:
            seen_entities.add(entity_key)
            unique_entities.append(entity)
    entities_and_relations['entities'] = unique_entities
    
    kg_file = PROCESSED_DATA_DIR / f'knowledge_graph_data_{timestamp}.json'
    with open(kg_file, 'w', encoding='utf-8') as f:
        json.dump(entities_and_relations, f, indent=2, ensure_ascii=False)
    
    # 3. JSONL format (for streaming/batch processing)
    jsonl_file = PROCESSED_DATA_DIR / f'threat_intelligence_{timestamp}.jsonl'
    with open(jsonl_file, 'w', encoding='utf-8') as f:
        for article in articles:
            simplified_article = {
                'title': article['title'],
                'content': article['full_text'],
                'source': article['source'],
                'indicators': article.get('indicators', {}),
                'threat_score': article.get('threat_relevance_score', 0)
            }
            f.write(json.dumps(simplified_article, ensure_ascii=False) + '\n')
    
    # 4. Summary statistics
    export_summary = {
        'export_timestamp': datetime.now().isoformat(),
        'total_articles': len(articles),
        'sources': list(set(article['source'] for article in articles)),
        'total_entities': len(entities_and_relations['entities']),
        'total_relations': len(entities_and_relations['relations']),
        'files_created': {
            'llm_training': str(training_file),
            'knowledge_graph': str(kg_file),
            'jsonl_format': str(jsonl_file)
        },
        'statistics': {
            'avg_threat_score': sum(article.get('threat_relevance_score', 0) for article in articles) / len(articles),
            'total_technical_indicators': sum(sum(len(inds) for inds in article.get('indicators', {}).values()) for article in articles)
        }
    }
    
    summary_file = PROCESSED_DATA_DIR / f'export_summary_{timestamp}.json'
    with open(summary_file, 'w', encoding='utf-8') as f:
        json.dump(export_summary, f, indent=2, ensure_ascii=False)
    
    print("\n✅ Export completed successfully!")
    print("="*60)
    print(f"📊 Exported {len(articles)} articles")
    print(f"🎯 {len(entities_and_relations['entities'])} unique entities")
    print(f"🔗 {len(entities_and_relations['relations'])} relations")
    print(f"📈 Avg threat score: {export_summary['statistics']['avg_threat_score']:.2f}")
    
    print("\n📁 Files created:")
    for file_type, file_path in export_summary['files_created'].items():
        print(f"  {file_type}: {Path(file_path).name}")
    
    return export_summary

# Export the collected data
if 'collected_data' in locals() and collected_data:
    export_summary = export_for_llm_training(collected_data)
    print("\n🎉 Data collection and export pipeline completed!")
else:
    print("⚠️  No data collected. Run the collection cell first.")
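
One way to sanity-check the JSONL export is to read it back and tally indicators per type. The sketch below assumes only the record layout written by the export cell (an `indicators` mapping from type name to a list of values); the helper names `iter_jsonl` and `count_indicator_types` are illustrative and not part of the notebook.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open(encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_indicator_types(path: Path) -> Counter:
    """Count indicator values per type across all records in the file."""
    counts: Counter = Counter()
    for record in iter_jsonl(path):
        for indicator_type, values in record.get('indicators', {}).items():
            counts[indicator_type] += len(values)
    return counts
```

If the export ran cleanly, the sum of these counts should equal the `total_technical_indicators` value stored in the export summary, since both count every indicator occurrence per article.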


## Conclusion and Next Steps

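This notebook implements a threat intelligence data collection pipeline following the LLM-TIKG methodology. In the run logged above it collected 170 of 171 Fortinet articles in about 26 minutes, while the CISA and Symantec collectors returned no articles. The implementation includes: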

### ✅ Key Components

1. **Robust Web Scraping Infrastructure**
   - Rate-limited, error-resilient scrapers with random delays
   - Platform-specific implementations for CISA, Fortinet, and Symantec
   - User agent rotation and retry mechanisms
   - Comprehensive error handling and logging

2. **Advanced Text Processing**
   - Paragraph structure preservation
   - Technical indicator extraction (IPs, domains, hashes, CVEs)
   - Threat relevance scoring
   - Content validation and quality checks

3. **Multiple Export Formats**
   - LLM training-ready JSON format
   - Knowledge graph entities and relations
   - JSONL for streaming processing
   - Statistical summaries and reports
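
The technical indicator extraction listed above can be approximated with a handful of regular expressions. This is a minimal sketch rather than the notebook's actual extractor: the pattern set, the `INDICATOR_PATTERNS` name, and the `extract_indicators` function are assumptions chosen for illustration, and the type names simply mirror the ones that appear in the collection log.

```python
import re
from typing import Dict, List

# Illustrative patterns only; the notebook's own extractor may differ.
_OCTET = r'(?:25[0-5]|2[0-4]\d|1?\d?\d)'
INDICATOR_PATTERNS = {
    'ip_address': re.compile(rf'\b(?:{_OCTET}\.){{3}}{_OCTET}\b'),
    'hash_sha256': re.compile(r'\b[a-fA-F0-9]{64}\b'),
    'hash_md5': re.compile(r'\b[a-fA-F0-9]{32}\b'),
    'cve': re.compile(r'\bCVE-\d{4}-\d{4,7}\b'),
}


def extract_indicators(text: str) -> Dict[str, List[str]]:
    """Return sorted, de-duplicated matches per indicator type."""
    found: Dict[str, List[str]] = {}
    for name, pattern in INDICATOR_PATTERNS.items():
        matches = sorted({m.group(0) for m in pattern.finditer(text)})
        if matches:
            found[name] = matches
    return found
```

The word boundaries keep a 64-character SHA-256 digest from also being reported as an MD5. Defanged indicators such as `hxxp://` or `example[.]com`, which are common in vendor write-ups, are not handled by this sketch and would need a normalization pass first.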

### 🔍 Key Features

- **Quality Assurance**: Automated validation assigns each article a grade (for example A or C in the logged run) before it is retained
- **Scalability**: Modular design allows easy addition of new sources
- **Reproducibility**: Comprehensive logging and configuration management
- **Error Handling**: Robust error recovery and fallback mechanisms
- **Rate Limiting**: Random delays between requests (roughly 6 to 9 seconds between Fortinet fetches in the logged run) and up to six fetch attempts per URL to avoid blocking
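
The retry and random-delay behaviour described above can be sketched as exponential backoff with jitter. The code below is an assumption-laden illustration, not the notebook's scraper: `fetch_with_retry` and `default_fetch` are hypothetical names, the standard library's `urllib` stands in for `requests` to keep the block dependency-free, and the default of six attempts only mirrors the `attempt 1/6` messages in the log.

```python
import random
import time
import urllib.request
from typing import Callable, Optional


def default_fetch(url: str) -> str:
    """Fetch a URL with the standard library; failures raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=15) as response:
        return response.read().decode('utf-8', errors='replace')


def fetch_with_retry(
    url: str,
    fetch: Callable[[str], str] = default_fetch,
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try `fetch` up to `max_attempts` times; return None if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == max_attempts:
                return None
            # Double the wait on each failure, cap it, then add random jitter
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(backoff + random.uniform(0, backoff))
    return None
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without network access; a `requests`-based fetcher can be swapped in as long as its exceptions are translated to, or caught as, `OSError` (`requests.RequestException` already derives from it).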

### 🚀 Next Steps

1. **Expand Sources**:
   - Fix the CISA and Symantec collectors, which returned no articles in the logged run, then add more threat intelligence platforms
   - Implement RSS feed monitoring
   - Add support for API-based sources

2. **Enhance Processing**:
   - Improve entity extraction
   - Add relationship extraction beyond the document-to-indicator `mentions` relations exported today
   - Implement cross-source validation

3. **Optimize Performance**:
   - Implement parallel processing
   - Add caching mechanisms
   - Optimize network requests

4. **Data Quality**:
   - Add more validation rules
   - Implement duplicate detection
   - Refine the existing threat relevance scoring
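
As a starting point for the duplicate detection item above, exact and near-exact reposts can be caught by hashing whitespace- and case-normalized text. This is a sketch under the assumption that each article dict carries the `full_text` field used by the export cell; `content_fingerprint` and `drop_duplicate_articles` are hypothetical helper names.

```python
import hashlib
import re
from typing import Any, Dict, List


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r'\s+', ' ', text).strip().lower()
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()


def drop_duplicate_articles(articles: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first article for each distinct content fingerprint."""
    seen = set()
    unique = []
    for article in articles:
        fingerprint = content_fingerprint(article.get('full_text', ''))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(article)
    return unique
```

This only removes articles whose text is identical after normalization. Detecting lightly edited copies across sources would need a similarity measure such as shingling with MinHash, which is outside the scope of this sketch.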

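The exported files give the LLM-TIKG pipeline structured input: article text, indicator entities, and document-to-indicator `mentions` relations. In the logged run that amounted to 170 articles, 2511 unique entities, and 2881 relations, all from Fortinet; broader coverage depends on getting the CISA and Symantec collectors to return results.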
