## Nature trails
The eighty nature trails listed at https://www.kurzemesregions.lv/en/projects/tourism/unigreen/nature-trails/ cover Kurzeme (a region in Latvia) as well as some places in Lithuania.
<br>
For our upcoming trip, I wanted to collect the coordinates of all the trails and put them in a `DataFrame` (with columns `Name`, `Lat`, `Lon`) so that I could visualise them on a map.


In [57]:
# libs
import re
from time import sleep

import pandas as pd
import requests
from bs4 import BeautifulSoup

import folium
from folium.plugins import MarkerCluster


In [25]:
def get_trail_names(url):
    """
    Function to get all 80 trail names from the Kurzemesregions website.
    
    Parameters:
    url (string): url to be webscraped from
    
    Returns:
    all_trails (list): trail names from 'Brocēni forest park' (first) to 'Walking paths at Remte manor park' (last)
    """
    # connect to the URL
    response = requests.get(url)
    # parse HTML and save to BeautifulSoup object
    soup = BeautifulSoup(response.text, "html.parser")
    # the 80 trail names are in the <p> elements at indices 2 to 81
    trails = soup.find_all('p')[2:82]
    all_trails = []
    for trail in trails:
        all_trails.append(trail.contents[0])
    return all_trails

def get_trail_urls(url):
    """
    Function to get all 80 trail URLs from the Kurzemesregions website.
    
    Parameters:
    url (string): url to be webscraped from
    
    Returns:
    all_urls (list): trail URLs from 'Brocēni forest park' (first) to 'Walking paths at Remte manor park' (last)
    """
    # connect to the URL
    response = requests.get(url)
    # parse HTML and save to BeautifulSoup object
    soup = BeautifulSoup(response.text, "html.parser")
    all_hrefs = []
    for link in soup.find_all('a'):
        all_hrefs.append(link.get('href'))
    all_urls = all_hrefs[19:99]
    return all_urls

def get_lat_lon_list(urls):
    """
    Function to get a list of all 80 trail coordinates.
    Includes dirty data.
    
    Parameters:
    urls (list): url list to be webscraped from
    
    Returns:
    lat_list (list): all latitudes
    lon_list (list): all longitudes
    """
    lat_list = []
    lon_list = []
    for url in urls:
        # connect to the URL
        response = requests.get(url)
        # parse HTML and save to BeautifulSoup object
        soup = BeautifulSoup(response.text, "html.parser")

        # find latitude, add it to lat_list
        lat = soup.body.find_all(string=re.compile(r'Lat[:\s]'), limit=1)[0]
        # if written US style (decimal with period)
        if "," not in lat[5:8]:
            # keep only digits and periods from the first 13 characters of the string
            lat = re.sub(r"[^0-9.]", "", lat[0:13])[0:9]
        # if written LV style (decimal with comma)
        else:
            # the comma is stripped with the other characters, so re-insert the decimal point
            lat = re.sub(r"[^0-9.]", "", lat[0:13])[0:9]
            lat = lat[0:2] + '.' + lat[2:]
        lat_list.append(lat)

        # find longitude, add it to lon_list     
        lon = soup.body.find_all(string=re.compile(r'Lon[:\s]'), limit=1)[0]
        # if written US style (decimal with period)
        if "," not in lon[5:8]:
            # keep only digits and periods, then take the last 8 characters
            lon = re.sub(r"[^0-9.]", "", lon)[-8:]
        # if written LV style (decimal with comma)
        else:
            # the comma is stripped, so take the last 7 digits and re-insert the decimal point
            lon = re.sub(r"[^0-9.]", "", lon)[-7:]
            lon = lon[0:2] + '.' + lon[2:]
        lon_list.append(lon)
        # pause between requests so as not to overload the server
        sleep(1)
    return lat_list, lon_list
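
The comma-versus-period handling in `get_lat_lon_list` could also be expressed with a single regular expression instead of fixed string slices. The following is only a minimal sketch: `COORD_PATTERN` and `parse_coordinate` are hypothetical names that are not part of this notebook, and the sample strings are invented, so the real page text may need a different pattern.

```python
import re

# Hypothetical helper (an assumption, not the notebook's method):
# one to three digits, a period or comma as decimal separator, then more digits.
COORD_PATTERN = re.compile(r"(\d{1,3})[.,](\d+)")


def parse_coordinate(text):
    """Return the first decimal number found in `text` as a float, or None."""
    match = COORD_PATTERN.search(text)
    if match is None:
        return None
    whole, fraction = match.groups()
    # Always rebuild the number with a period, whatever separator was used.
    return float(f"{whole}.{fraction}")


# Invented sample strings, for illustration only.
print(parse_coordinate("Lat: 56.66974"))
print(parse_coordinate("Lon: 22,56860"))
print(parse_coordinate("no coordinates here"))
```

Returning floats rather than strings would also make later range checks simpler, at the cost of dropping trailing zeros such as the one in `22,56860`.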
    
    


In [26]:
# set the URL to webscrape from
url = 'https://www.kurzemesregions.lv/en/projects/tourism/unigreen/nature-trails/'
print(get_trail_names(url))

['Brocēni forest park', 'Ālande River Park trail', 'Horse island nature trail and Lake Liepāja boardwalk', 'Žibgrava health trail', 'Virsaiši waterfall nature trail', 'Usma Elkrags nature trail', 'Skrunda walking trail', 'Sātiņi ponds', 'Roja beach trails', 'Pēterezers nature trail', 'Mežtaka - Jūrtaka nature trail', 'Kalēti forest park trails', 'Embūte nature trail', 'Dvēseles veldzes dārzs', 'Durbe castle mound trail and Durbe Lake watching tower', 'Dunika bog trail', 'Bernāti nature trail', "Cīrava beavers' trail", 'Lake Būšnieki nature trail', 'Pārventa walking trail', 'Aizvīķi park', 'Aukštumala cognitive trail', 'Biking through the Lakeside', 'Biržai town pedestrian and bicycle trail', 'Cognitive path in the forest of Šipelke or Dauginčiai', 'Čužu mire trail', 'Drubazas botanical trail', 'Ecological cognitive path "Litorina"', 'Ēvaži nature trail', 'Fairy-tale forest trail', 'Great Ķemeri bog boardwalk', 'Horse trail and Love trail', 'Jāņupīte nature trail', 'Kolka pine trail', '

In [27]:
trail_names = get_trail_names(url)
trail_names

['Brocēni forest park',
 'Ālande River Park trail',
 'Horse island nature trail and Lake Liepāja boardwalk',
 'Žibgrava health trail',
 'Virsaiši waterfall nature trail',
 'Usma Elkrags nature trail',
 'Skrunda walking trail',
 'Sātiņi ponds',
 'Roja beach trails',
 'Pēterezers nature trail',
 'Mežtaka - Jūrtaka nature trail',
 'Kalēti forest park trails',
 'Embūte nature trail',
 'Dvēseles veldzes dārzs',
 'Durbe castle mound trail and Durbe Lake watching tower',
 'Dunika bog trail',
 'Bernāti nature trail',
 "Cīrava beavers' trail",
 'Lake Būšnieki nature trail',
 'Pārventa walking trail',
 'Aizvīķi park',
 'Aukštumala cognitive trail',
 'Biking through the Lakeside',
 'Biržai town pedestrian and bicycle trail',
 'Cognitive path in the forest of Šipelke or Dauginčiai',
 'Čužu mire trail',
 'Drubazas botanical trail',
 'Ecological cognitive path "Litorina"',
 'Ēvaži nature trail',
 'Fairy-tale forest trail',
 'Great Ķemeri bog boardwalk',
 'Horse trail and Love trail',
 'Jāņupīte natu

In [28]:
# check that there are 80 URLs
trail_urls = get_trail_urls(url)
len(trail_urls)

80

In [29]:
latitudes, longitudes = get_lat_lon_list(trail_urls)

In [31]:
latitudes

['56.66974',
 '56.53513',
 '56.51718',
 '56.98428',
 '57.09522',
 '57.20721',
 '56.67338',
 '56.62741',
 '57.50994',
 '57.65414',
 '57.32788',
 '56.35929',
 '56.50560',
 '56.727804',
 '56.59782',
 '56.28120',
 '56.32893',
 '56.73578',
 '57.45294',
 '56.97075',
 '56.334407',
 '55.38786',
 '55.32893',
 '56.20173',
 '55.95135',
 '57.01956',
 '57.03040',
 '55.79894',
 '57.67759',
 '57.02073',
 '56.91661',
 '57.02451',
 '57.11190',
 '57.75630',
 '56.06945',
 '56.90632',
 '56.45871',
 '57.44570',
 '57.45806',
 '56.99197',
 '56.73229',
 '56.99763',
 '56.72480',
 '56.99165',
 '55.26376',
 '56.15068',
 '56.50629',
 '56.97058',
 '55.33182',
 '56.12722',
 '57.68406',
 '56.12609',
 '57.25805',
 '56.04684',
 '56.04229',
 '57.57676',
 '57.37181',
 '56.97027',
 '56.64482',
 '56.80334',
 '56.96522',
 '56.04841',
 '55.38102',
 '57.62650',
 '57.24411',
 '57.50824',
 '56.20586',
 '56.24849',
 '56.15066',
 '56.20241',
 '57.08240',
 '55.35644',
 '57.35689',
 '56.96741',
 '56.82316',
 '57.38586',
 '56.43446

In [32]:
longitudes

['22.56860',
 '21.16260',
 '21.03084',
 '21.57882',
 '22.54177',
 '22.15261',
 '22.02398',
 '22.34980',
 '22.79975',
 '22.27086',
 '23.12750',
 '21.48929',
 '21.81922',
 '21.09308',
 '21.34384',
 '01.38282',
 '20.99273',
 '21.38335',
 '21.62905',
 '21.98025',
 '1.724638',
 '21.35243',
 '26.10224',
 '24.75408',
 '21.53947',
 '22.77878',
 '22.59662',
 '21.06731',
 '22.56800',
 '22.64499',
 '23.46481',
 '22.59271',
 '23.04368',
 '22.59131',
 '21.06473',
 '22.39933',
 '22.11609',
 '22.90463',
 '22.85598',
 '23.42601',
 '21.73375',
 '23.47608',
 '21.73576',
 '23.52354',
 '25.82226',
 '21.03084',
 '20.99717',
 '21.97934',
 '26.10571',
 '21.82533',
 '22.31742',
 '21.59004',
 '23.13639',
 '421.8858',
 '21.81529',
 '22.62746',
 '22.52713',
 '21.34781',
 '21.88786',
 '22.53467',
 '23.46955',
 '21.82736',
 '25.97555',
 '22.29191',
 '22.59070',
 '22.78451',
 '24.69287',
 '24.69088',
 '21.03089',
 '24.75562',
 '22.37690',
 '21.35374',
 '22.15348',
 '21.97676',
 '22.97218',
 '21.53668',
 '21.84496',

Since all of the trails lie in western Latvia or Lithuania (roughly 55–58° N and 21–27° E), I expect that each <b>`latitude` starts with "5"</b> and each <b>`longitude` starts with "2"</b>.
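
A first-character test is quick, but it would let through a value such as '2.15'. As a hedged alternative, the sketch below flags values by numeric range instead. The function name `find_out_of_range` and the bounds are assumptions chosen loosely around the region, not values taken from the website.

```python
def find_out_of_range(values, low, high):
    """Return (index, value) pairs that are not numbers within [low, high]."""
    flagged = []
    for index, value in enumerate(values):
        try:
            in_range = low <= float(value) <= high
        except ValueError:
            # Strings that cannot be parsed as numbers are flagged as well.
            in_range = False
        if not in_range:
            flagged.append((index, value))
    return flagged


# Assumed bounding box (hypothetical, chosen generously around the region).
LAT_BOUNDS = (55.0, 58.5)
LON_BOUNDS = (20.5, 27.0)

# A few longitudes copied from the scraped output above.
sample_lons = ['22.56860', '01.38282', '1.724638', '421.8858']
print(find_out_of_range(sample_lons, *LON_BOUNDS))
```

Keeping the index alongside each flagged value makes it easy to look up which trail the faulty coordinate belongs to.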

In [33]:
def get_wrong_latitudes(lat_list):
    """
    Function to get the wrong latitudes from the latitude list.
    
    Parameters:
    lat_list (list): all latitudes
    
    Returns:
    wrong_latitudes (list): faulty latitudes
    """
    wrong_latitudes = []
    for lat in lat_list:
        if lat[0] != "5":
            wrong_latitudes.append(lat)
    return wrong_latitudes
        
    
    
def get_wrong_longitudes(lon_list):
    """
    Function to get the wrong longitudes from the longitude list.
    
    Parameters:
    lon_list (list): all longitudes
    
    Returns:
    wrong_longitudes (list): faulty longitudes
    """
    wrong_longitudes = []
    for lon in lon_list:
        if lon[0] != "2":
            wrong_longitudes.append(lon)
    return wrong_longitudes
    

In [34]:
wrong_lat_list = get_wrong_latitudes(latitudes)
wrong_lat_list

[]

In [35]:
wrong_lon_list = get_wrong_longitudes(longitudes)
wrong_lon_list

['01.38282', '1.724638', '421.8858']

So far I have:
<br> `trail_names`, `latitudes`, `longitudes`. All of these will be part of a `DataFrame` named `trail_df`.
<br> The `longitudes` list still contains some dirty data, which I will now clean manually.
<br> I will change the following values:
<br> '01.38282' should be '21.38282'
<br> '1.724638' should be '21.724638'
<br> '421.8858' should be '21.8858'
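
The three replacements listed above can also be kept in a single mapping, which documents the fixes in one place. This is a small sketch only: `corrections` and `apply_corrections` are hypothetical names, and unlike a lookup with `list.index` it replaces every occurrence of a faulty value, not just the first.

```python
# Faulty value -> intended value, copied from the list above.
corrections = {
    '01.38282': '21.38282',
    '1.724638': '21.724638',
    '421.8858': '21.8858',
}


def apply_corrections(values, fixes):
    """Return a new list where every value found in `fixes` is replaced."""
    return [fixes.get(value, value) for value in values]


# Short sample instead of the full 80-item list.
sample = ['22.56860', '01.38282', '1.724638', '421.8858']
print(apply_corrections(sample, corrections))
```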

In [36]:
ind1 = longitudes.index('01.38282')
ind2 = longitudes.index('1.724638')
ind3 = longitudes.index('421.8858')

In [37]:
longitudes[ind1] = '21.38282'
longitudes[ind2] =  '21.724638'
longitudes[ind3] = '21.8858'

In [38]:
longitudes

['22.56860',
 '21.16260',
 '21.03084',
 '21.57882',
 '22.54177',
 '22.15261',
 '22.02398',
 '22.34980',
 '22.79975',
 '22.27086',
 '23.12750',
 '21.48929',
 '21.81922',
 '21.09308',
 '21.34384',
 '21.38282',
 '20.99273',
 '21.38335',
 '21.62905',
 '21.98025',
 '21.724638',
 '21.35243',
 '26.10224',
 '24.75408',
 '21.53947',
 '22.77878',
 '22.59662',
 '21.06731',
 '22.56800',
 '22.64499',
 '23.46481',
 '22.59271',
 '23.04368',
 '22.59131',
 '21.06473',
 '22.39933',
 '22.11609',
 '22.90463',
 '22.85598',
 '23.42601',
 '21.73375',
 '23.47608',
 '21.73576',
 '23.52354',
 '25.82226',
 '21.03084',
 '20.99717',
 '21.97934',
 '26.10571',
 '21.82533',
 '22.31742',
 '21.59004',
 '23.13639',
 '21.8858',
 '21.81529',
 '22.62746',
 '22.52713',
 '21.34781',
 '21.88786',
 '22.53467',
 '23.46955',
 '21.82736',
 '25.97555',
 '22.29191',
 '22.59070',
 '22.78451',
 '24.69287',
 '24.69088',
 '21.03089',
 '24.75562',
 '22.37690',
 '21.35374',
 '22.15348',
 '21.97676',
 '22.97218',
 '21.53668',
 '21.84496',

In [42]:
# check that no more wrong longitudes are left
wrong_lon_list = get_wrong_longitudes(longitudes)
wrong_lon_list

[]

Next, I will create the `DataFrame` `trail_df`.

In [43]:
data = {'Name':trail_names, 'Lat':latitudes, 'Lon':longitudes}

In [44]:
trail_df = pd.DataFrame(data)


In [45]:
trail_df.head(10)

| | Name | Lat | Lon |
|---|---|---|---|
| 0 | Brocēni forest park | 56.66974 | 22.5686 |
| 1 | Ālande River Park trail | 56.53513 | 21.1626 |
| 2 | Horse island nature trail and Lake Liepāja boa... | 56.51718 | 21.03084 |
| 3 | Žibgrava health trail | 56.98428 | 21.57882 |
| 4 | Virsaiši waterfall nature trail | 57.09522 | 22.54177 |
| 5 | Usma Elkrags nature trail | 57.20721 | 22.15261 |
| 6 | Skrunda walking trail | 56.67338 | 22.02398 |
| 7 | Sātiņi ponds | 56.62741 | 22.3498 |
| 8 | Roja beach trails | 57.50994 | 22.79975 |
| 9 | Pēterezers nature trail | 57.65414 | 22.27086 |


In [46]:
trail_df.tail()

| | Name | Lat | Lon |
|---|---|---|---|
| 75 | Ventspils Seaside Park | 57.38586 | 21.53668 |
| 76 | Volzbahs trail and trail to Lielbāta spring | 56.43446 | 21.84496 |
| 77 | Walking path along Lake Ciecere | 56.6801 | 22.56659 |
| 78 | Walking path Rīti | 57.25265 | 22.62033 |
| 79 | Walking paths at Remte manor park | 56.73886 | 22.69686 |


Finally, I will visualise each trail on the map.
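
The scraped coordinates are still strings at this point. Before plotting, they could be converted to floats, and the map centre could be derived from the data instead of being typed in by hand. The sketch below assumes hypothetical helper names (`to_float_pairs`, `centre_of`) and uses a plain arithmetic mean, which is only a rough approximation but should be adequate for a region of this size.

```python
from statistics import mean


def to_float_pairs(lats, lons):
    """Pair up latitude and longitude strings as (float, float) tuples."""
    return [(float(lat), float(lon)) for lat, lon in zip(lats, lons)]


def centre_of(pairs):
    """Return [mean latitude, mean longitude] for a list of (lat, lon) pairs."""
    return [mean(lat for lat, _ in pairs), mean(lon for _, lon in pairs)]


# Two made-up points, for illustration only.
pairs = to_float_pairs(['56.0', '57.0'], ['21.0', '23.0'])
print(centre_of(pairs))
```

The result of `centre_of` has the same `[lat, lon]` shape as the `location` argument passed to `folium.Map` in the cells that follow.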

In [49]:
coordinates = trail_df[['Lat', 'Lon']]
coordinate_list = coordinates.values.tolist()
coordinate_list[75]

['57.38586', '21.53668']

In [54]:
simple_map = folium.Map(location=[56.7, 22], zoom_start=8)
# add one marker per trail, using the trail name as the popup text
for coords, name in zip(coordinate_list, trail_df['Name']):
    folium.Marker(coords, popup=name).add_to(simple_map)
simple_map

In this map, I can see that almost all of the trails are in fact in Kurzeme or western Lithuania (near the sea).

In [56]:
cluster_map = folium.Map(location=[56.7, 22], tiles='CartoDB Positron', zoom_start=8)

marker_cluster = MarkerCluster().add_to(cluster_map)

# add the markers to the cluster layer rather than directly to the map
for coords, name in zip(coordinate_list, trail_df['Name']):
    folium.Marker(coords, popup=name).add_to(marker_cluster)
cluster_map

Whilst the `cluster_map` shows where the trails are concentrated and how many there are in each area, the `simple_map` seems more helpful for actually planning which roads to take on our trip.