Description
I am using the MIT community version of Onyx, self-hosted on my own server.
Onyx version: v0.27.0-beta.1
When using the Web Connector with sitemap as the Scrape Method, the system reindexes all pages on each refresh attempt instead of only processing new or changed documents. This leads to unnecessary embedding token usage.
- The connector is configured to refresh once per day (1440 minutes)
- Each indexing attempt processes all documents (~5,400+) instead of only new or changed ones; see the Total Docs count for each attempt
- Other connector types (Google Drive, Wikipedia) properly perform incremental indexing
- This behavior significantly increases embedding token consumption, since I call Cohere embedding models via an external API

Questions
- Is this behavior specific to the sitemap Scrape Method, or does it affect all Web Connector scrape methods (recursive, single, sitemap)?
- Is there a way to configure the Web Connector to perform true incremental indexing? (A rough sketch of what I mean is included below.)
- If this is not currently possible, could such a feature be added to reduce token usage?
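
To illustrate what I mean by incremental indexing for the sitemap method: if the connector recorded the timestamp of its last successful run, it could skip pages whose `<lastmod>` is older than that timestamp. The snippet below is only a hypothetical sketch of that filtering step, written against the plain Python standard library rather than Onyx's actual connector code; the sitemap URL, cutoff value, and function name are placeholders, and it assumes a flat sitemap (not a sitemap index).

```python
# Hypothetical sketch: select only sitemap URLs changed since the last successful run.
# This is NOT Onyx's API; sitemap URL, cutoff, and function name are placeholders.
from datetime import datetime, timezone
from urllib.request import urlopen
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_changed_since(sitemap_url: str, last_success: datetime) -> list[str]:
    """Return sitemap URLs whose <lastmod> is newer than the last successful run.

    URLs without a <lastmod> tag are kept, since their freshness is unknown.
    """
    with urlopen(sitemap_url) as resp:
        tree = ET.parse(resp)

    changed = []
    for url_el in tree.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url_el.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if not loc:
            continue
        if lastmod is None:
            # No timestamp available -> re-fetch to be safe.
            changed.append(loc)
            continue
        # Normalize "Z" suffix so fromisoformat works on older Python versions.
        modified = datetime.fromisoformat(lastmod.strip().replace("Z", "+00:00"))
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)
        if modified > last_success:
            changed.append(loc)
    return changed


if __name__ == "__main__":
    # Placeholder cutoff for illustration only.
    cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for url in urls_changed_since("https://example.com/sitemap.xml", cutoff):
        print(url)
```

Even for sites whose sitemaps omit `<lastmod>`, comparing a hash of the fetched page content against the previously indexed version before re-embedding would avoid most of the token cost.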