[0.4] Fix excluded tags lookup to use correct key type (#417) #418

Merged
seanstory merged 1 commit into 0.4 from backport/0.4/pr-417
Feb 6, 2026


github-actions bot commented Feb 6, 2026

Backports the following commits to 0.4:

### Closes #416

The `exclude_tags` configuration was not being applied correctly. The
config stores `exclude_tags` keyed by domain URL strings (e.g.,
`"https://example.com"`), but the lookup in `get_body_tag` used the
URL object itself as the hash key instead of `url.site`, so it never matched.

This fix changes the lookup to use `url.site` (which returns the
scheme + host as a string) to match how the config stores the keys.
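
The key-type mismatch can be reproduced with a minimal sketch. `SketchURL`, `EXCLUDE_TAGS`, `buggy_lookup`, and `fixed_lookup` are hypothetical names for illustration only, not the crawler's real classes or methods, and this `site` ignores ports for brevity:

```ruby
require 'uri'

# Hypothetical stand-in for the crawler's URL object (assumption, not the real class).
class SketchURL
  def initialize(raw)
    @uri = URI.parse(raw)
  end

  # Scheme + host as a string, e.g. "https://example.com" (port handling omitted).
  def site
    "#{@uri.scheme}://#{@uri.host}"
  end
end

# Config keyed by domain URL strings, as the description states (example values).
EXCLUDE_TAGS = { 'https://example.com' => %w[nav footer] }.freeze

# Before the fix: the URL object is the key, which never equals a String key.
def buggy_lookup(config, url)
  config[url]
end

# After the fix: the scheme + host string matches the config's keys.
def fixed_lookup(config, url)
  config[url.site]
end

url = SketchURL.new('https://example.com/blog/post')
p buggy_lookup(EXCLUDE_TAGS, url)
p fixed_lookup(EXCLUDE_TAGS, url)
```

Because Ruby hashes compare keys with `hash` and `eql?`, an object key and a string key are distinct even when they describe the same site, which is why the lookup silently returned nothing instead of raising.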

### Checklists

#### Pre-Review Checklist
- [x] This PR does NOT contain credentials of any kind, such as API keys
or username/passwords (double check `crawler.yml.example` and
`elasticsearch.yml.example`)
- [x] This PR has a meaningful title
- [x] This PR links to all relevant GitHub issues that it fixes or
partially addresses
    - Fixes #416
- [x] This PR has a thorough description
- [x] Covered the changes with automated tests
- [ ] Tested the changes locally
- [x] Added a label for each target release version (example: `v0.1.0`)
- [x] Considered corresponding documentation changes
    - N/A - this is a bug fix, no documentation changes needed
- [x] Contributed any configuration settings changes to the
configuration reference
    - N/A - no configuration changes
- [x] Ran `make notice` if any dependencies have been added
    - N/A - no dependencies added

#### Changes Requiring Extra Attention

N/A - This is a straightforward bug fix with no security implications or
new dependencies.

### Release Note

Fixed `exclude_tags` domain configuration not being applied during
crawl. Tags specified in `exclude_tags` for a domain are now correctly
excluded from the document body.

- github-actions bot requested a review from a team as a code owner on February 6, 2026 20:48
- seanstory enabled auto-merge (squash) on February 6, 2026 21:12
- seanstory merged commit 23bffb5 into 0.4 on Feb 6, 2026 (2 checks passed)
- seanstory deleted the backport/0.4/pr-417 branch on February 6, 2026 21:13