SmartCrawler 0.4.1

github-actions released this 09 Jul 03:38
· 3 commits to main since this release
  • chore: bump version to 0.4.1
  • Merge pull request #65 from pixlie/chore/update-readme-docs
  • docs: move development and contributing info to separate guide
  • chore: restructure documentation with docs/ folder and OS-specific guides
  • Merge pull request #63 from pixlie/feature/template-detection
  • fix: resolve clippy warnings for uninlined format args
  • chore: apply code formatting with cargo fmt
  • feat: disable domain duplicate filtering in template mode
  • feat: apply template detection before domain duplicate analysis
  • feat: implement template detection for variable and text concatenation patterns
  • Merge pull request #60 from pixlie/feature/real-world-tests
  • fix: resolve clippy warnings in real-world tests
  • style: apply cargo fmt formatting
  • feat: enhance real-world tests with complete SmartCrawler pipeline
  • fix: configure real-world tests to run serially
  • feat: add real-world integration tests (#56)
  • Merge pull request #59 from pixlie/fix/duplicate-root-url-loading
  • fix: prevent duplicate root URL loading (#58)
  • Merge pull request #57 from pixlie/feature/filter-domain-duplicate-html-nodes
  • feat: prioritize root URLs and improve element ID handling
  • fix: remove overly aggressive page-level duplicate filtering
  • feat: improve duplicate detection to include complete element structure
  • fix: improve domain duplicate filtering logic
  • feat: add domain-level duplicate HTML node filtering
  • Merge pull request #55 from pixlie/feature/project-restart-step-one
  • Formatting fixes
  • chore: fix formatting and linting issues
  • fix: change default WebDriver port from 9515 to 4444
  • fix: initialize rustls crypto provider to prevent runtime error
  • feat: project restart with WebDriver-based crawler
  • feat: include single-item groups in grouped data detection
  • Merge pull request #50 from pixlie/feature/grouped-data-extraction
  • fix: show complete path including grouped elements in path display
  • refactor: improve grouped data path display with full CSS selectors
  • fix: eliminate duplicates in grouped data detection
  • feat: implement grouped data detection with --grouped CLI option
  • Merge pull request #48 from brainless/feature/data-extraction
  • fix: resolve clippy warning in extractor module
  • feat: implement data extraction mode with --extract CLI option
  • Merge pull request #47 from brainless/chore/remove-html-cleaning
  • chore: remove HTML cleaning functionality
  • Merge pull request #46 from brainless/chore/remove-structured-content
  • Formatting fixes
  • chore: remove StructuredContent and extract_structured_data
  • Minor changes to Claude.md
  • Merge pull request #40 from brainless/feature/improve-html-cleaner
  • feat: extend image filtering to handle SVG elements (issue #39)
  • feat: improve HTML cleaner with image filtering and comment removal (fixes #39)
  • Merge pull request #38 from brainless/feature/clean-html-url-support
  • fix: initialize crypto provider in browser URL test
  • feat: use browser/webdriver for URL fetching in --clean-html mode
  • feat: add URL support to --clean-html mode (fixes #37)