## Web scraping
Scraping is often the only way to get values from websites that don't provide an API. It can be tricky to get to the right values, but this example should help you get started. The workflow is very similar to the one the [`scrap` sensor](https://home-assistant.io/components/sensor.scrap/) uses.

### Get the value

Import the needed modules.

In [63]:
import requests
import xmltodict
from bs4 import BeautifulSoup

The URL to scrape the data from. We want to see how many users are in our [Gitter chatroom](https://gitter.im/home-assistant/home-assistant).

In [64]:
URL = 'https://gitter.im/home-assistant/home-assistant'

In [65]:
raw_html = requests.get(URL).text
raw_data = BeautifulSoup(raw_html, 'html.parser')
data = xmltodict.parse(raw_data.find('body').prettify())

Now you have the complete content inside `<body>...</body>`.

In [66]:
print(data)

OrderedDict([('body', OrderedDict([('@class', 'logged-out'), ('div', OrderedDict([('@class', 'app-layout'), ('div', [OrderedDict([('@class', 'nli-menu__logo'), ('div', OrderedDict([('@class', 'logo-holder logo-animation'), ('div', [OrderedDict([('@class', 'logo-left-arm logo-animation')]), OrderedDict([('@class', 'logo-body-left logo-animation')]), OrderedDict([('@class', 'logo-body-right logo-animation')]), OrderedDict([('@class', 'logo-right-arm logo-animation')])])])), ('img', OrderedDict([('@class', 'nli-menu__logo-text'), ('@src', '//cdn03.gitter.im/_s/8fb437e/images/gitter-vector-logo.svg')]))]), OrderedDict([('@class', 'iframe-region'), ('@id', 'iframe-region'), ('iframe', OrderedDict([('@frameborder', '0'), ('@id', 'content-frame'), ('@src', '/home-assistant/home-assistant/~chat#initial'), ('@style', 'display: block; width: 100%; height: 100%; border: 0;')]))])]), ('nav', OrderedDict([('@class', 'nli-menu'), ('h1', OrderedDict([('@class', 'nli-menu__header'), ('#text', 'Where c
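
The printed structure follows the `xmltodict` conventions: attributes become keys prefixed with `@`, text content is stored under `#text`, and repeated child elements are collected into a list. As a minimal sketch of how to walk such a structure, the hand-written `sample` dictionary and the `dig` helper below are illustrative assumptions, not part of the original notebook or of `xmltodict`.

```python
from functools import reduce

# Hand-written stand-in for the parsed page; the content is hypothetical.
sample = {
    'body': {
        '@class': 'logged-out',
        'script': [
            {'#text': 'var users = 1617;'},
            {'@src': '/app.js'},
        ],
    }
}


def dig(tree, path):
    """Follow a sequence of dict keys and list indexes into a nested structure."""
    return reduce(lambda node, key: node[key], path, tree)


first_script = dig(sample, ['body', 'script', 0, '#text'])
print(first_script)
```

Keeping the path in a list makes it easy to shorten or extend it one key at a time while exploring an unfamiliar page.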

Start at the top-level key, which is `'body'`, and work your way down until you reach the key that holds your data. Here we are looking for the Gitter JavaScript. Finally, slice the string so that only the value remains.

In [75]:
print(data['body']['script'][0]['#text'][2086:2090])

1617


This is the current number of users in our [Gitter chatroom](https://gitter.im/home-assistant/home-assistant).
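
A fixed slice such as `[2086:2090]` stops working as soon as the script text changes length or the count gains a digit. One possible alternative is to search for the value with a regular expression. The sketch below assumes the count appears after a JSON-style key. Both the key name `userCount` and the sample `script_text` are hypothetical and need to be adapted to what the page actually contains.

```python
import re

# Hypothetical script text; the real page content and key name may differ.
script_text = 'var context = {"userCount":1617,"uri":"home-assistant/home-assistant"};'


def extract_count(text, key='userCount'):
    """Return the integer that follows "<key>": in text, or None if absent."""
    match = re.search(r'"%s"\s*:\s*(\d+)' % re.escape(key), text)
    return int(match.group(1)) if match else None


print(extract_count(script_text))
```

Returning `None` when nothing matches lets the caller decide whether to skip the update or raise an error.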

Note that the [`scrap` sensor](https://home-assistant.io/components/sensor.scrap/) searches the whole parsed dictionary. There we would therefore go directly for `['#text']`.
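
Searching the whole structure, as opposed to following one fixed path, could look like the sketch below. This is not the sensor's actual implementation. The `iter_text` helper and the `sample` data are assumptions made for illustration only.

```python
def iter_text(node):
    """Yield every '#text' string found anywhere in a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == '#text' and isinstance(value, str):
                yield value
            else:
                yield from iter_text(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text(item)


# Hypothetical data in the shape xmltodict produces.
sample = {
    'body': {
        'h1': {'@class': 'header', '#text': 'Welcome'},
        'script': [
            {'#text': 'var users = 1617;'},
            {'@src': '/app.js'},
        ],
    }
}

texts = list(iter_text(sample))
print(texts)
```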

### Send the value to the Home Assistant frontend
The ["Using the Home Assistant Python API"](http://nbviewer.jupyter.org/github/home-assistant/home-assistant-notebooks/blob/master/home-assistant-python-api.ipynb) notebook contains an intro to the [Python API](https://home-assistant.io/developers/python_api/) of Home Assistant and to Jupyter notebooks. Here we are sending the scraped value to the Home Assistant frontend.

In [76]:
import homeassistant.remote as remote

HOST = '127.0.0.1'
PASSWORD = 'YOUR_PASSWORD'

api = remote.API(HOST, PASSWORD)

In [77]:
new_state = data['body']['script'][0]['#text'][2086:2090]
attributes = {
  "friendly_name": "Gitter Users",
  "unit_of_measurement": "Count"
}
remote.set_state(api, 'sensor.gitter_users', new_state=new_state, attributes=attributes)

True
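
Because the value comes from a fixed slice of the page, a layout change can silently turn it into arbitrary characters. It may be worth checking the string before passing it to `remote.set_state`. The `build_state` helper below is a hypothetical addition, not part of the Home Assistant API. It only validates the value and assembles the same attributes as above.

```python
def build_state(raw_value, friendly_name, unit):
    """Validate a scraped string and return (state, attributes) ready to send."""
    value = raw_value.strip()
    if not value.isdigit():
        raise ValueError('expected a number, got %r' % raw_value)
    attributes = {
        'friendly_name': friendly_name,
        'unit_of_measurement': unit,
    }
    return value, attributes


state, attributes = build_state(' 1617 ', 'Gitter Users', 'Count')
print(state, attributes)
```

The returned pair can then be used as the `new_state` and `attributes` arguments in the call shown above.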