
4. Webscraping

Date: 2019-06-27

Architecture issue: #252

Status

Accepted

Context

Webscraping is when an integration uses code to mimic a user: it logs in to a website and extracts data from its pages to bring into Home Assistant. It is usually needed because certain data sources/integrations do not offer an API.

Webscraping comes with the following downsides:

  • Very fragile; these integrations break often. Whenever the website is updated, the integration needs to be updated too.
  • Some vendors (like USPS) have IP-banned users of such integrations.
  • Some rely on beautifulsoup (Python-based), while others rely on PhantomJS or other headless browsers, which means we need to include a whole browser.

Proposal

  • We no longer accept any new integration that relies on webscraping
  • We identify integrations that rely on webscraping, deprecate them for 2 releases, and then remove them
  • It will still be possible to have custom integrations provide information via webscraping
  • Generic integrations that parse HTML are excluded from this decision

Consequences

Integrations that rely on webscraping will have to be maintained as custom integrations.
