fix: WestSuffolkCouncil — User-Agent + renamed h4 heading #1978
InertiaUK wants to merge 1 commit into robbrad:master from
Conversation
Two small upstream changes have been breaking the West Suffolk scraper:

1. westsuffolk.gov.uk's IIS now 404s requests with no `User-Agent`. Send a realistic browser UA so the bin-day page returns 200 with the collection panel.
2. The council renamed the collection panel heading from "Bin collection days" to "Bin collection days current", which broke the exact-string `find_all(..., string="Bin collection days")` guard and caused an empty result. Match as a substring via a lambda so either label works.

Verified against the live site for UPRN 10009739960 (IP28 6DR) — returns Black / Blue / Brown collection dates.

Supersedes the earlier WestSuffolk fix in robbrad#1955 (that branch also picked up unrelated commits from stacked council fixes and only addressed part of the problem).
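The substring match described in point 2 can be sketched with BeautifulSoup. This is an illustrative standalone snippet, not the scraper's actual code; `panel_search` and the sample HTML are assumptions:

```python
from bs4 import BeautifulSoup

# Sample markup using the council's *renamed* heading (illustrative).
HTML = """
<body>
  <h4>Bin collection days current</h4>
  <div>Black Bins: 29/04/2026</div>
</body>
"""

def panel_search(tag):
    # Relaxed match: accept any <h4> whose text *contains* the old label,
    # so both "Bin collection days" and "Bin collection days current" work.
    return tag.name == "h4" and "Bin collection days" in tag.get_text()

soup = BeautifulSoup(HTML, "html.parser")
matches = soup.body.find_all(panel_search)
print(len(matches))  # 1
```

An exact-string `find_all("h4", string="Bin collection days")` would return nothing here, which is the silent-empty failure this PR fixes.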
📝 Walkthrough
Updated the WestSuffolkCouncil scraper to include a realistic User-Agent header in HTTP requests and relaxed the HTML matching logic to handle heading label variations, enabling the parser to correctly retrieve bin collection data from the endpoint.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py (1)
45-53: ⚠️ Potential issue | 🟠 Major: Raise when the collection panel is missing.

The relaxed heading match is good, but if the panel still is not found, this returns `{"bins": []}` and reintroduces the silent-empty failure mode this PR is fixing.

Proposed fix:

```diff
-        collection_tag = soup.body.find_all(panel_search)
+        if soup.body is None:
+            raise ValueError("West Suffolk response did not contain a <body> element")
+
+        collection_tag = soup.body.find_all(panel_search)
+        if not collection_tag:
+            raise ValueError("West Suffolk bin collection panel not found")
```

Based on learnings, parsing council bin collection data should prefer explicit failures over silent defaults or swallowed errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py` around lines 45 - 53, The panel search currently falls back to returning no data when the collection panel isn't found; after computing collection_tag = soup.body.find_all(panel_search) you must detect an empty result and raise an explicit exception (e.g., RuntimeError or a custom ParseError) instead of letting the code continue and return {"bins": []}; update the code around collection_tag and the panel_search helper so that if not collection_tag you raise a clear error indicating the WestSuffolkCouncil collection panel is missing, referencing panel_search and collection_tag so reviewers can find the change.
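As a standalone illustration of the failure-mode change this comment asks for, the sketch below raises on a missing panel instead of returning an empty result. Names like `find_collection_panel` are hypothetical, not from the repo:

```python
from bs4 import BeautifulSoup

def find_collection_panel(html: str):
    # Hypothetical helper: raise instead of silently yielding {"bins": []}
    # when the heading that anchors the collection panel is missing.
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        raise ValueError("response did not contain a <body> element")
    tags = soup.body.find_all(
        lambda t: t.name == "h4" and "Bin collection days" in t.get_text()
    )
    if not tags:
        raise ValueError("bin collection panel not found")
    return tags

# A page without the expected heading now fails loudly:
try:
    find_collection_panel("<body><h4>Something else</h4></body>")
    outcome = "no error"
except ValueError as exc:
    outcome = str(exc)
print(outcome)  # bin collection panel not found
```

An explicit exception surfaces upstream page changes immediately, rather than reporting "no bins" to end users.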
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py`:
- Line 26: The HTTP fetch in WestSuffolkCouncil (where requests.get(api_url,
headers=headers) is called) must include a timeout and validate the response
before parsing: add a timeout argument to requests.get (e.g. timeout=10) and
call response.raise_for_status() immediately after the request and before
creating the BeautifulSoup object; ensure this mirrors the pattern used in
BelfastCityCouncil (use requests.get, response.raise_for_status(), then
BeautifulSoup) so 4xx/5xx responses raise and are not parsed silently.
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py
```python
        "(KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
    )
}
response = requests.get(api_url, headers=headers)
```
🧩 Analysis chain

🏁 Scripts executed (repository: robbrad/UKBinCollectionData):

```shell
rg -nP --type=python -C3 'requests\.get\(api_url, headers=headers|raise_for_status\(|timeout='

cat -n uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py

rg -nA5 -B5 'requests\.get.*api_url' uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py

# Check what other council parsers do with HTTP error handling
rg -l 'raise_for_status' uk_bin_collection/uk_bin_collection/councils/ | head -5

# Check timeout usage patterns in council parsers
rg 'timeout=' uk_bin_collection/uk_bin_collection/councils/ | head -10

rg -B5 -A5 'raise_for_status' uk_bin_collection/uk_bin_collection/councils/BelfastCityCouncil.py | head -20

# Check if there's a base class or common pattern for HTTP requests
rg -B3 -A3 'requests\.get.*timeout' uk_bin_collection/uk_bin_collection/councils/BlackpoolCouncil.py | head -20
```
Add timeout and HTTP status validation before parsing the response.
The User-Agent header fixes the immediate 404, but this still parses any 4xx/5xx/error response silently and returns empty results. Add both a timeout and raise_for_status() before BeautifulSoup, consistent with patterns used in other council parsers like BelfastCityCouncil.
Proposed fix:

```diff
-        response = requests.get(api_url, headers=headers)
+        response = requests.get(api_url, headers=headers, timeout=30)
+        response.raise_for_status()
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
response = requests.get(api_url, headers=headers, timeout=30)
response.raise_for_status()
```
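The suggested pattern can be exercised without touching the network. The 30-second timeout mirrors the suggestion above; `fetch_bin_page` is an illustrative name, not a function in the repo:

```python
import requests

def fetch_bin_page(url: str, headers: dict, timeout: int = 30) -> str:
    # Fail fast: a 4xx/5xx now raises requests.HTTPError instead of
    # letting an error page flow into BeautifulSoup.
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text

# raise_for_status() behaviour, shown on a synthetic response (no network):
error_response = requests.models.Response()
error_response.status_code = 404
try:
    error_response.raise_for_status()
    outcome = "no error"
except requests.HTTPError:
    outcome = "HTTPError raised"
print(outcome)  # HTTPError raised
```

Without `raise_for_status()`, a 404 body would be parsed as HTML and quietly yield an empty bin list; with it, the error propagates to the caller.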
Works for me.
Included in May 2026 Release PR #1992. Closing. |
…esolved conflict with robbrad#1955)
Two upstream changes on the West Suffolk bin-day page had left the scraper silently failing (empty result, no exception):
- maps.westsuffolk.gov.uk's IIS returns 404 to requests with no `User-Agent`. Sending a realistic browser UA fixes the 404.
- The `<h4>` that sits above the bin panel was renamed from `Bin collection days` to `Bin collection days current`. The existing scraper used `find_all("h4", string="Bin collection days")`, which is an exact match, so the panel filter never matched and the result came back empty. Matching as a substring via a lambda covers either label.

Verification
Tested live against UPRN `10009739960` (1 The Drift, Culford, IP28 6DR):

```json
{"bins": [
  {"type": "Black Bins", "collectionDate": "29/04/2026"},
  {"type": "Blue Bins", "collectionDate": "22/04/2026"},
  {"type": "Brown Bins", "collectionDate": "22/04/2026"}
]}
```

Also confirmed via the Kepthouse `/resolve-v2` API (lat/lng → UPRN → council → scraper) — returned `status: ok` with the bin list in ~22 s on a cold Selenium grid.

Supersedes #1955
#1955 only addressed the User-Agent side of this and was based on a branch that accidentally picked up four unrelated stacked council fixes. This PR is a clean standalone against current `master` with both the UA fix and the h4 rename fix in one commit. Happy to close #1955 once this merges.

Summary by CodeRabbit
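The collection dates in the verified output above are day-first strings. A minimal parsing sketch; the `%d/%m/%Y` format is inferred from that sample, not taken from the scraper source:

```python
from datetime import datetime

entry = {"type": "Black Bins", "collectionDate": "29/04/2026"}

# Assumed day/month/year layout, based on the sample output in this PR.
parsed = datetime.strptime(entry["collectionDate"], "%d/%m/%Y").date()
print(parsed.isoformat())  # 2026-04-29
```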