Add news scraper for Sonoma county #67

Closed
Mr0grog opened this issue May 27, 2020 · 0 comments · Fixed by #77
Labels
enhancement: New feature or request; news: Related to scraping news (rather than data)

Comments

Collaborator

Mr0grog commented May 27, 2020

News scrapers live in the news directory. You can follow the San Francisco scraper as an example.

Sonoma County

  1. The county home page has 3 recent news items, but they are just a subset of the news page (2), and not even the 3 most recent items: https://sonomacounty.ca.gov/Home/

  2. The county news page has a more complete list: https://sonomacounty.ca.gov/News/

    Some entries are repeated in Spanish; I’m not sure there’s an obvious way to determine which ones are different language versions of the same page. The topics are somewhat broad, going beyond COVID-related stuff.

  3. The county emergency site has a Coronavirus news page: https://socoemergency.org/emergency/novel-coronavirus/latest-news/

  4. The county emergency site has a public health orders page: https://socoemergency.org/emergency/novel-coronavirus/health-orders/

  5. The county emergency site has an RSS feed, which appears to cover every page on the site. This also means it has the same pages repeated in English and Spanish in a way that’s hard to match up, like (2) above: https://socoemergency.org/feed/

The fact that (5) appears to show every page on the emergency site as it gets added or updated might be useful, but it doesn’t seem like quite the right fit for a news scraper.
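For reference, here is a rough sketch of what reading a feed like (5) could look like, in case it turns out to be useful later. It assumes the feed is standard RSS 2.0 with `<item>` elements containing `<title>`, `<link>`, and `<pubDate>`; I have not checked that against the real feed. The function name `parse_feed_items`, the dictionary keys, and the sample XML are made up for illustration and are not part of this project.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, List
from xml.etree import ElementTree


def parse_feed_items(xml_text: str) -> List[Dict]:
    """Return title, URL, and publication date for each <item> in an RSS 2.0 feed.

    Hypothetical helper: the output keys are illustrative, not this project's format.
    """
    root = ElementTree.fromstring(xml_text)
    items = []
    for item in root.iter('item'):
        pub_date = item.findtext('pubDate')
        items.append({
            'title': (item.findtext('title') or '').strip(),
            'url': (item.findtext('link') or '').strip(),
            # RSS 2.0 dates use the RFC 822 format, which the email module can parse.
            'date_published': parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return items


# Made-up sample data; the real feed would be downloaded from the site instead.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example health order page</title>
      <link>https://example.org/health-order/</link>
      <pubDate>Wed, 27 May 2020 18:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Example page without a date</title>
      <link>https://example.org/no-date/</link>
    </item>
  </channel>
</rss>"""

for entry in parse_feed_items(SAMPLE_FEED):
    print(entry['date_published'], entry['title'], entry['url'])
```

This only extracts the items. It does nothing about pairing English and Spanish versions of the same page, which is the hard part noted in (5).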

(2) has friendlier headlines (e.g. it has the headline “Sonoma County Public Health Officer Amends Shelter in Place Order to Allow Additional Businesses to Reopen” while the emergency site has the headline “Amendment No. 3 to Health Order No. C19-09” for the same news item), but it covers a lot of non-COVID stuff.

I’m thinking combining (3) and (4) is probably the way to go here.
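Combining two pages could be sketched roughly like this: scrape each page into a list of items, drop items whose URL already appeared, and sort newest first. Everything here is an assumption for illustration. `NewsItem` and `combine_news` are hypothetical names, not this project's actual classes, and treating URLs that differ only by a trailing slash as the same item is a guess at how duplicates might show up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class NewsItem:
    """Hypothetical container for one scraped news entry."""
    title: str
    url: str
    date_published: datetime


def combine_news(*sources: Iterable[NewsItem]) -> List[NewsItem]:
    """Merge items from several pages, de-duplicate by URL, and sort newest first.

    When the same URL appears more than once, the first occurrence wins,
    so sources listed earlier take priority.
    """
    seen: Dict[str, NewsItem] = {}
    for source in sources:
        for item in source:
            # Assumption: a trailing slash is the only difference between duplicate URLs.
            key = item.url.rstrip('/')
            seen.setdefault(key, item)
    return sorted(seen.values(), key=lambda item: item.date_published, reverse=True)
```

Calling `combine_news(latest_news_items, health_order_items)` would then give one list, where `latest_news_items` and `health_order_items` stand in for whatever the scrapers for (3) and (4) return.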

Mr0grog added the enhancement and news labels May 27, 2020
Mr0grog added a commit that referenced this issue Jun 8, 2020