Add news scraper for Santa Clara county #64

Closed
Mr0grog opened this issue May 27, 2020 · 0 comments · Fixed by #70
Labels: enhancement (New feature or request), news (Related to scraping news rather than data)

Mr0grog commented May 27, 2020

News scrapers live in the news directory. You can follow the San Francisco scraper as an example.

Santa Clara County

  1. The COVID-19 homepage has a list of announcements near the bottom that we can scrape (I couldn’t find an equivalent RSS feed).

  2. The Public Health Department has a newsroom we can scrape. Can’t find any RSS or Atom feeds for it. :\

  3. The Office of Public Affairs also has a newsroom of the same format with slightly broader coverage. As far as I can tell, though, the Public Health Department one pretty well covers all the coronavirus-related stuff.

  4. There are some SOAP services linked from the COVID-19 page, but they seem to require authentication to access.
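Items 1 and 2 both come down to pulling a list of links out of an HTML page. A minimal sketch of that step, using only Python’s standard library, is below. It assumes, hypothetically, that the announcements sit in a `<ul>` with the class `announcements`; the real county markup hasn’t been checked here, and `Announcement` and `parse_announcements` are illustrative names, not part of the project. If a page turns out to be filled in by JavaScript, the HTML would first have to come from a rendered browser session rather than a plain HTTP request.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class Announcement:
    title: str
    url: str


class AnnouncementListParser(HTMLParser):
    """Collect every <a href> inside a <ul> whose class list contains `list_class`.

    The default class name is a hypothetical placeholder, not the county's real markup.
    """

    def __init__(self, base_url, list_class="announcements"):
        super().__init__()
        self.base_url = base_url
        self.list_class = list_class
        self.list_depth = 0        # > 0 while inside the target <ul> (counts nested lists)
        self.current_href = None   # href of the <a> currently being read, if any
        self.current_text = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            if self.list_depth:
                self.list_depth += 1
            elif self.list_class in (attrs.get("class") or "").split():
                self.list_depth = 1
        elif tag == "a" and self.list_depth and attrs.get("href"):
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "ul" and self.list_depth:
            self.list_depth -= 1
        elif tag == "a" and self.current_href is not None:
            # Collapse runs of whitespace in the link text into single spaces.
            title = " ".join("".join(self.current_text).split())
            url = urljoin(self.base_url, self.current_href)  # resolve relative links
            self.items.append(Announcement(title, url))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


def parse_announcements(html, base_url):
    """Return the announcements found in `html`, with links made absolute."""
    parser = AnnouncementListParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.items
```

The same parser would work for both the homepage list and the newsroom as long as each is a list of links; only the class name (or a similar marker) would need to change.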

Mr0grog added the enhancement and news labels May 27, 2020
Mr0grog added a commit that referenced this issue May 30, 2020
This news page is populated at runtime via JavaScript, so we are using Selenium to load it.

Fixes #64.
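Because the news items only exist after the page’s JavaScript runs, the scraper has to wait for the rendered elements before reading the HTML. A minimal sketch of that wait is below, assuming a Selenium WebDriver instance and a hypothetical CSS selector for the news items; it polls `driver.find_elements` until something matches, then returns `driver.page_source`. `load_rendered_html` is an illustrative name and the project’s actual code may differ.

```python
import time


def load_rendered_html(driver, url, css_selector, timeout=30.0, poll_interval=0.5):
    """Load `url` in a Selenium WebDriver and return the page source once at
    least one element matching `css_selector` exists (i.e. the JS has rendered).

    Raises TimeoutError if nothing matches within `timeout` seconds.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css_selector!r} within {timeout}s")
        time.sleep(poll_interval)
```

In practice `driver` could be a headless `selenium.webdriver.Chrome(options=options)` (with `options.add_argument("--headless")`), closed with `driver.quit()` in a `finally` block. Selenium’s own `WebDriverWait(driver, timeout).until(expected_conditions.presence_of_element_located(...))` performs the same polling; the hand-written loop just keeps the sketch independent of those imports.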
Mr0grog added a commit that referenced this issue Jun 1, 2020
Unfortunately, the page we're scraping gets populated at runtime via JavaScript, so, like Alameda, I wound up using Selenium. This also fixes missing support for tags in our RSS output (the news items here have “categories,” like “press release” or “announcement,” and this code sets those as tags on each news item).

Fixes #64.
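RSS 2.0 allows an `<item>` to carry any number of `<category>` elements, which is a natural home for the county’s categories such as “press release” or “announcement.” A hypothetical sketch using the standard library’s `xml.etree.ElementTree` is below; `NewsItem` and `rss_item` are illustrative names, since the project’s real feed code isn’t shown in this issue.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewsItem:
    title: str
    url: str
    categories: List[str] = field(default_factory=list)  # e.g. "press release"


def rss_item(item):
    """Build an RSS 2.0 <item>, emitting one <category> per news-item category."""
    element = ET.Element("item")
    ET.SubElement(element, "title").text = item.title
    ET.SubElement(element, "link").text = item.url
    for category in item.categories:
        ET.SubElement(element, "category").text = category
    return element
```

Each returned element would then be appended to the feed’s `<channel>` element before serializing.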