# Example for website scraping

In this example, we look for Dutch nature areas whose web pages contain a given lookup word. The process is as follows:
1. Parse a list of the URLs of all nature areas from www.natura2000.nl/gebieden.
2. Retrieve all text from each of the nature area pages.
3. Search for the lookup word in the text and return the areas that contain it.

We scrape and parse the pages with the `requests` and `BeautifulSoup` libraries.

In [None]:
from bs4 import BeautifulSoup
import requests

We scrape the natura2000 website, which has an overview of all nature areas in the Netherlands.

In [None]:
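URL = 'https://www.natura2000.nl'

# 'vijf' is Dutch for 'five'; the surrounding spaces restrict the match
# to the whole word (so 'vijftig' does not count)
lookup_word = ' vijf '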
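Padding the lookup word with spaces misses occurrences that are followed by punctuation, that start a text, or that are capitalised. As a possible alternative, the sketch below uses a regular expression with word boundaries. The helper name `contains_word` and the choice to ignore case are assumptions, not part of the original notebook.

```python
import re


def contains_word(text, word):
    """Return True if `word` occurs in `text` as a whole word, ignoring case."""
    # strip() removes the padding spaces; \b marks a word boundary instead
    pattern = r"\b" + re.escape(word.strip()) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_word("Er zijn vijf.", " vijf "))   # word followed by a full stop
print(contains_word("vijftig meren", " vijf "))   # only part of a longer word
```

With this helper, step 3 could test `contains_word(gebied_text, lookup_word)` instead of `lookup_word in gebied_text`.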

### 1. Get URLs of all nature areas

In [None]:
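# download the overview page; fail early on a timeout or an HTTP error
page = requests.get(URL + '/gebieden', timeout=30)
page.raise_for_status()
soup = BeautifulSoup(page.content, "html.parser")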

In [None]:
gebieden_urls = [gebied.find('a')['href'] for gebied in 
                 soup.find_all("li", class_="gebieden-row")]
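The later steps build each page address as `URL + gebied_url`, which only works while every `href` is a site-relative path. A minimal sketch of a more tolerant join, using `urljoin` from the standard library, is shown below. The helper name `absolute_url` and the example path `/gebieden/veluwe` are hypothetical and serve only as illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.natura2000.nl"


def absolute_url(href, base=BASE_URL):
    """Turn an href from the overview page into an absolute URL."""
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return urljoin(base, href)


# hypothetical hrefs, for illustration only
print(absolute_url("/gebieden/veluwe"))
print(absolute_url("https://www.natura2000.nl/gebieden/veluwe"))
```

Both calls yield the same absolute address, so the rest of the notebook would not need to know which form the site uses.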

### 2. Parse text on each nature page

In [None]:
# download and parse every area page (one request per area, so this can take a while)
gebieden_pages = [
    BeautifulSoup(requests.get(URL + gebied_url, timeout=30).content, "html.parser")
    for gebied_url in gebieden_urls
]
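The list comprehension above sends one request per area in quick succession and does not check for failed responses. A slower, more defensive loop could look like the sketch below. The helper name `fetch_pages`, the one-second delay and the 30-second timeout are assumptions, and the `get` argument is expected to behave like `requests.get`.

```python
import time


def fetch_pages(paths, get, base_url, delay=1.0, timeout=30):
    """Fetch base_url + path for every path, pausing between requests."""
    bodies = []
    for i, path in enumerate(paths):
        if i > 0:
            time.sleep(delay)        # be polite to the server
        response = get(base_url + path, timeout=timeout)
        response.raise_for_status()  # stop on HTTP errors such as 404 or 500
        bodies.append(response.content)
    return bodies
```

In the notebook this might be called as `fetch_pages(gebieden_urls, requests.get, URL)`, after which each returned body can be passed to `BeautifulSoup(body, "html.parser")`.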

In [None]:
# concatenate all paragraphs on a page into one big string
gebieden_text = [
    " ".join(
        [textbox.text for textbox in 
         gebied_page.find_all("div", class_="field field--name-field-body content-item")]
    ).replace('\n', ' ')
    for gebied_page in gebieden_pages
]
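Replacing only `'\n'` leaves tabs, non-breaking spaces and runs of spaces in the text, and a lookup word padded with plain spaces will not match next to them. One possible way to collapse all whitespace is sketched below. The helper name `normalize_whitespace` is an assumption, not part of the original notebook.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace into a single plain space."""
    # str.split() without arguments splits on any Unicode whitespace,
    # including tabs, newlines and non-breaking spaces
    return " ".join(text.split())


raw = "vier\tof\n\nvijf\xa0meren"
print(" vijf " in raw.replace("\n", " "))       # non-breaking space blocks the match
print(" vijf " in normalize_whitespace(raw))    # match succeeds after normalising
```

Note that this also strips leading and trailing whitespace, so a lookup word at the very start or end of a text still needs separate handling, for example by padding the normalised text with one space on each side.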

### 3. Return areas that contain the lookup word

In [None]:
gebieden_met_vijf = [
    lookup_word in gebied_text 
    for gebied_text in gebieden_text
]

# list the URLs of the nature areas whose text contains lookup_word
[
    URL + gebied_url
    for gebied_url, found in zip(gebieden_urls, gebieden_met_vijf)
    if found
]

Done.