# Dog parks in Barcelona

**Associació Canina ICanWalk**

**ICanWalk** is an association that wants to promote *health*, *respect for the environment* and *quality time* spent with the **best travel companion**.

<img src='../img/amy.jpg' width='400' height='400'/>

This notebook scrapes the list of dog parks in Barcelona published at https://icanwalk.es/parques-caninos-en-barcelona/

## 0. Import libraries

In [1]:
```python
import re

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

# Options for DataFrame visualization:
pd.set_option('display.width', 1000)
pd.set_option('display.max_columns', 40)
# Show cell contents without truncation (None replaces the deprecated value -1)
pd.set_option('display.max_colwidth', None)
```

## 1. Download and display the content of robots.txt

In [2]:
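```python
def robot_txt(url='https://icanwalk.es/robots.txt'):
    '''Download robots.txt and collect its Allow and Disallow paths.'''
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    print('robots.txt for https://icanwalk.es/')
    print('=====================================')
    result_data_set = {'DISALLOWED': [], 'ALLOWED': []}

    for line in response.text.splitlines():
        # Lines look like "Directive: value"; directive names are case-insensitive
        directive, _, value = line.partition(':')
        directive = directive.strip().lower()
        parts = value.split()
        if not parts:
            continue    # skip lines without a value, e.g. an empty "Disallow:"
        if directive == 'allow':    # this is for allowed url
            result_data_set['ALLOWED'].append(parts[0])
        elif directive == 'disallow':    # this is for disallowed url
            result_data_set['DISALLOWED'].append(parts[0])

    return result_data_set

robot_txt()
```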

```
robots.txt for https://icanwalk.es/
=====================================

{'DISALLOWED': ['/wp-admin/',
  '/wp-login.php',
  '/wp-signup.php',
  '/press-this.php',
  '/remote-login.php',
  '/activate/',
  '/cgi-bin/',
  '/mshots/v1/',
  '/next/',
  '/public.api/'],
 'ALLOWED': ['/wp-admin/admin-ajax.php']}
```
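The function above only collects the paths. As a cross-check, Python's standard-library `urllib.robotparser` can evaluate the same rules against the page we intend to scrape. This is a minimal sketch. It assumes the rules listed in the output above sit under `User-agent: *`, and it hard-codes them as a string so it runs offline. The live file may differ. `is_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the rules from the output above, grouped under "User-agent: *".
# The Allow line comes first because robotparser applies the first matching rule.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-signup.php
Disallow: /press-this.php
Disallow: /remote-login.php
Disallow: /activate/
Disallow: /cgi-bin/
Disallow: /mshots/v1/
Disallow: /next/
Disallow: /public.api/
"""

def is_allowed(url, user_agent='*', robots_txt=ROBOTS_TXT):
    '''Return True if robots_txt lets user_agent fetch url (hypothetical helper).'''
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed('https://icanwalk.es/parques-caninos-en-barcelona/'))  # True
print(is_allowed('https://icanwalk.es/wp-admin/'))                       # False
```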

## 2. Create a random User Agent generator

In [3]:
```python
import random

def get_random_ua(ua_file='ua_file.txt'):
    '''Return a random User-Agent string from ua_file (one per line), or '' on failure.'''
    try:
        with open(ua_file, encoding='utf-8') as f:
            # Strip the trailing newline of each line: requests rejects
            # header values that contain '\n'. Blank lines are skipped.
            lines = [line.strip() for line in f if line.strip()]
    except OSError as ex:
        print('Exception in random_ua')
        print(str(ex))
        return ''
    # Every line can be chosen, including the last one
    return random.choice(lines) if lines else ''
```
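Two details matter when the chosen string is actually sent. The header must be named `User-Agent`. The trailing newline that `readlines()` keeps must also be removed, because `requests` rejects header values that contain newline characters. Below is a minimal sketch of a header builder. `build_headers` is a hypothetical helper, the two user-agent strings are placeholders, and the seeded `random.Random` only makes the pick reproducible.

```python
import random

def build_headers(user_agents, rng=None):
    '''Pick one User-Agent string and return a requests-style headers dict.

    Hypothetical helper: blank entries are skipped and surrounding whitespace
    (including the trailing newline left by readlines()) is stripped.
    '''
    rng = rng or random.Random()
    candidates = [ua.strip() for ua in user_agents if ua.strip()]
    if not candidates:
        return {}  # no usable string: let requests send its default User-Agent
    return {'User-Agent': rng.choice(candidates)}

# Placeholder lines, shaped like the output of f.readlines()
lines = ['Mozilla/5.0 (X11; Linux x86_64)\n', '\n', 'Mozilla/5.0 (Macintosh)\n']
headers = build_headers(lines, rng=random.Random(0))
print(headers)
```

The resulting dict is meant to be passed by keyword, as in `requests.get(url, headers=headers)`.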

## 3. Web scraping

Scrape the table of dog parks from the ICanWalk page.

In [4]:
```python
def dog_parks_scraper(url):
    '''
    Scrape the table of dog parks in Barcelona into a DataFrame.
    '''
    user_agent = get_random_ua()
    # The header is named 'User-Agent'. Headers must be passed by keyword,
    # because the second positional argument of requests.get is `params`.
    headers = {'User-Agent': user_agent} if user_agent else {}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = bs(response.text, 'lxml')

    # The parks are listed in the first <table> of the page
    table = soup.find_all('table')[0]
    tab_data = [[cell.text for cell in row.find_all('td')]
                for row in table.find_all('tr')]

    # The first row holds the column names
    df = pd.DataFrame(tab_data)
    df.columns = df.iloc[0, :]
    df.drop(index=0, inplace=True)
    df.reset_index(drop=True, inplace=True)
    return df
```
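The list comprehension in `dog_parks_scraper` turns every `<tr>` into a list of its `<td>` texts, and the first row then becomes the header. The same row-by-row extraction can be sketched with only the standard library's `html.parser`, which makes it easy to check the logic offline against a small snippet. This is an illustrative stand-in for the BeautifulSoup version, not what the notebook runs. `TableRows` is a hypothetical class, and the sample HTML is made up in the shape of the page's table.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    '''Collect the text of every <td> cell, grouped by <tr> row.'''

    def __init__(self):
        super().__init__()
        self.rows = []       # one list of cell strings per <tr>
        self._cell = None    # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag == 'td' and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self.rows[-1].append(''.join(self._cell))
            self._cell = None

    def handle_data(self, data):
        # Text of nested tags is collected too, like BeautifulSoup's .text
        if self._cell is not None:
            self._cell.append(data)

html = '''<table>
<tr><td>Distrito/Dirección</td><td>Superficie (m2)</td></tr>
<tr><td>Áreas para perros en el distrito de Ciutat Vella</td><td></td></tr>
<tr><td>Parc de la Ciutadella</td><td>287 m2</td></tr>
</table>'''

parser = TableRows()
parser.feed(html)
header, *body = parser.rows
print(header)   # ['Distrito/Dirección', 'Superficie (m2)']
print(body[1])  # ['Parc de la Ciutadella', '287 m2']
```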

In [5]:
```python
url = 'https://icanwalk.es/parques-caninos-en-barcelona/'
df = dog_parks_scraper(url)
```

In [6]:
```python
df.head(10)
```

| | Distrito/Dirección | Superficie (m2) |
|---|---|---|
| 0 | Áreas para perros en el distrito de Ciutat Vella | |
| 1 | Parc de la Ciutadella | 287 m2 |
| 2 | Parc de la Barceloneta | 451 m2 |
| 3 | Jardins de Sant Pau del Camp | 227 m2 |
| 4 | | |
| 5 | Áreas para perros en el distrito de l’Eixample\n | |
| 6 | Jardins Montserrat | 38 m2 |
| 7 | Pl. Doctor Letamendi | 8 m2 |
| 8 | Jardins de Doctor Duran i Reynals | 83 m2 |
| 9 | Pl. Sagrada Família | 333 m2 |


## 4. EDA: Exploratory Data Analysis

In [7]:
```python
df.shape
```

```
(118, 2)
```

In [8]:
```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118 entries, 0 to 117
Data columns (total 2 columns):
Distrito/Dirección    118 non-null object
Superficie (m2)       108 non-null object
dtypes: object(2)
memory usage: 2.0+ KB
```


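Replace empty strings with NaN, then drop the rows in which every cell is empty. District header rows have a name but no surface, so they are kept.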

In [9]:
```python
df.replace('', np.nan, inplace=True)
df.dropna(how='all', inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()
```

| | Distrito/Dirección | Superficie (m2) |
|---|---|---|
| 0 | Áreas para perros en el distrito de Ciutat Vella | |
| 1 | Parc de la Ciutadella | 287 m2 |
| 2 | Parc de la Barceloneta | 451 m2 |
| 3 | Jardins de Sant Pau del Camp | 227 m2 |
| 4 | Áreas para perros en el distrito de l’Eixample\n | |
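`dropna(how='all')` removes a row only when every cell is missing. That is why the blank separator row 4 disappears while the district header rows, which have a name but no surface, survive. With `how='any'` the header rows would be dropped as well. Below is the same rule in plain Python, as a sketch over rows shaped like the output above. `drop_blank_rows` is a hypothetical helper, and it treats only `''` and `None` as empty.

```python
def drop_blank_rows(rows):
    '''Keep rows with at least one non-empty cell, mirroring
    df.replace('', np.nan).dropna(how='all') for '' and None cells.'''
    return [row for row in rows
            if any(cell is not None and cell != '' for cell in row)]

rows = [
    ['Áreas para perros en el distrito de Ciutat Vella', ''],  # header: kept
    ['Parc de la Ciutadella', '287 m2'],                       # park: kept
    ['', None],                                                # blank: dropped
]
print(len(drop_blank_rows(rows)))  # 2
```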


In [10]:
```python
df.tail()
```

| | Distrito/Dirección | Superficie (m2) |
|---|---|---|
| 110 | Jardins Mercè Rodoreda | 88 m2 |
| 111 | Turó Parc | 123 m2 |
| 112 | Jardins Doctor Samuel C. Hahnemann | 283 m2 |
| 113 | Pl. Ventura i Gassol | 118 m2 |
| 114 | Jardins Casa Sagnier\nJardins de Vil·la Amèlia\nJardins de Piscines i Esports | |


In [11]:
```python
df.shape[0]
```

```
115
```

## 4.1 Formatting the DataFrame

### Column 'Distrito/Dirección'

Create a new column **Distritos**

In [12]:
```python
df_aux = df[df['Distrito/Dirección'].str.contains('Áreas para perros en el distrito de')]
df_aux
```

| | Distrito/Dirección | Superficie (m2) |
|---|---|---|
| 0 | Áreas para perros en el distrito de Ciutat Vella | |
| 4 | Áreas para perros en el distrito de l’Eixample\n | |
| 16 | Áreas para perros en el distrito de Gràcia | |
| 28 | Áreas para perros en el distrito de Horta-Guinardó | |
| 39 | Áreas para perros en el distrito de Les Corts | |
| 49 | Áreas para perros en el distrito de Nou Barris | |
| 57 | Áreas para perros en el distrito de Sant Andreu | |
| 71 | Áreas para perros en el distrito de Sant Martí | |
| 86 | Áreas para perros en el distrito de Sants-Montjuïc | |
| 99 | Áreas para perros en el distrito de Sarrià-Sant Gervasi | |


Extract the district names. The index of each header row marks where that district's list of parks begins.

In [13]:
```python
# Take an explicit copy: df_aux is a slice of df, and adding a column to it
# would otherwise raise SettingWithCopyWarning
df_aux = df_aux.copy()
# Keep only the text after 'distrito de ' (trailing newline stripped first)
df_aux['Lista Distritos'] = [re.sub(r'[\W\w]*de ', '', x.strip())
                             for x in df_aux['Distrito/Dirección']]
df_aux['Lista Distritos']
```

```
0     Ciutat Vella
4     l’Eixample
16    Gràcia
28    Horta-Guinardó
39    Les Corts
49    Nou Barris
57    Sant Andreu
71    Sant Martí
86    Sants-Montjuïc
99    Sarrià-Sant Gervasi
Name: Lista Distritos, dtype: object
```
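The pattern `[\W\w]*de ` matches any characters greedily up to the last `de ` in the string, so everything through `distrito de ` is deleted. It would cut too much if a district name itself contained `de `. A slightly more explicit variant anchors on the fixed prefix instead. This is a sketch that assumes every header follows the `Áreas para perros en el distrito de` wording seen in the table. `district_name` is a hypothetical helper.

```python
import re

# Assumption: all header rows start with exactly this prefix
PREFIX = re.compile(r'^Áreas para perros en el distrito de\s+')

def district_name(header):
    '''Return the district from a header row, or None if the row is not a header.'''
    text = header.strip()  # drop the trailing '\n' seen in some cells
    if not PREFIX.match(text):
        return None
    return PREFIX.sub('', text)

print(district_name('Áreas para perros en el distrito de l’Eixample\n'))  # l’Eixample
print(district_name('Parc de la Ciutadella'))                              # None
```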

Creating a list of Districts

In [14]:
```python
district_list = df_aux['Lista Distritos'].tolist()
```

Create the new column `df['Distritos']`

In [15]:
```python
# df_aux shares df's index, so each district name lands on its own header row.
# Forward-filling carries it down to the parks that follow, which replaces the
# hard-coded index ranges (and the unused inner loop) of a row-by-row version.
df['Distritos'] = df_aux['Lista Distritos'].reindex(df.index).ffill()
```
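The cell above gives each row the district of the nearest header row at or above it. In plain Python that is a lookup in the sorted list of header indexes. The sketch below does it with `bisect`, using the header indexes and names copied from the `df_aux` output above. `district_for_row` is a hypothetical helper.

```python
from bisect import bisect_right

# Header row index -> district, copied from the df_aux output above
HEADERS = {0: 'Ciutat Vella', 4: 'l’Eixample', 16: 'Gràcia',
           28: 'Horta-Guinardó', 39: 'Les Corts', 49: 'Nou Barris',
           57: 'Sant Andreu', 71: 'Sant Martí', 86: 'Sants-Montjuïc',
           99: 'Sarrià-Sant Gervasi'}
STARTS = sorted(HEADERS)

def district_for_row(i):
    '''Return the district of the last header row whose index is <= i.'''
    pos = bisect_right(STARTS, i) - 1
    if pos < 0:
        raise ValueError(f'row {i} comes before the first district header')
    return HEADERS[STARTS[pos]]

print(district_for_row(3))    # Ciutat Vella
print(district_for_row(4))    # l’Eixample
print(district_for_row(114))  # Sarrià-Sant Gervasi
```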

### Re-ordering columns and changing names

In [16]:
```python
df.head()
```

| | Distrito/Dirección | Superficie (m2) | Distritos |
|---|---|---|---|
| 0 | Áreas para perros en el distrito de Ciutat Vella | | Ciutat Vella |
| 1 | Parc de la Ciutadella | 287 m2 | Ciutat Vella |
| 2 | Parc de la Barceloneta | 451 m2 | Ciutat Vella |
| 3 | Jardins de Sant Pau del Camp | 227 m2 | Ciutat Vella |
| 4 | Áreas para perros en el distrito de l’Eixample\n | | l’Eixample |


Remove the district header rows, which only contain the 'Áreas para perros en el distrito de' title

In [17]:
```python
df1 = df[~df['Distrito/Dirección'].isin(df_aux['Distrito/Dirección'])]
```

In [18]:
```python
df1 = df1[['Distritos', 'Distrito/Dirección', 'Superficie (m2)']]
```

In [19]:
```python
df1.rename(columns={'Distrito/Dirección': 'Dirección'}, inplace=True)
df1.reset_index(drop=True, inplace=True)
```

In [20]:
```python
df1.shape
```

```
(105, 3)
```


### Column 'Superficie'

Remove the `m2` unit from the surface values and mark missing surfaces with `-`

In [24]:
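```python
# Mark missing surfaces with '-' and drop the unit, e.g. '287 m2' -> '287'
df1['Superficie (m2)'] = (df1['Superficie (m2)']
                          .fillna('-')
                          .str.replace('m2', '', regex=False)
                          .str.strip())
```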
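After this step the column still holds strings, with `-` for missing values. If the surfaces are needed for arithmetic, each value can be parsed into an integer. This is a sketch that assumes whole numbers without thousands separators, optionally followed by `m2`, as in the sample rows above. Anything else, including `-`, becomes `None`. `parse_surface` is a hypothetical helper.

```python
import re

# Assumption: a whole number, optionally followed by the unit 'm2'
SURFACE = re.compile(r'\s*(\d+)\s*(?:m2)?\s*')

def parse_surface(value):
    '''Convert "287 m2" or "287 " to 287; return None for missing or unexpected values.'''
    if not isinstance(value, str):
        return None  # e.g. NaN/None for rows without a surface
    match = SURFACE.fullmatch(value)
    return int(match.group(1)) if match else None

print([parse_surface(v) for v in ['287 m2', '287 ', '-', '', None]])
# [287, 287, None, None, None]
```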

## 5. Export the final DataFrame

In [26]:
```python
df1.to_csv('../data/dogs_parks.csv', sep=',', index=False)
```
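Some cells contain embedded line breaks. For example, the `Jardins Casa Sagnier` row in the `df.tail()` output lists three gardens in one cell. `to_csv` quotes such fields, so the exported file should be read back with a CSV parser rather than split line by line. Below is a standard-library sketch of that round trip. It writes to an in-memory buffer instead of `../data/dogs_parks.csv` so it runs anywhere, and the rows are illustrative copies of the cleaned data.

```python
import csv
import io

rows = [
    ['Distritos', 'Dirección', 'Superficie (m2)'],
    ['Ciutat Vella', 'Parc de la Ciutadella', '287'],
    ['Sarrià-Sant Gervasi',
     'Jardins Casa Sagnier\nJardins de Vil·la Amèlia\nJardins de Piscines i Esports', '-'],
]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # the field containing newlines gets quoted
text = buffer.getvalue()

naive = text.splitlines()                     # splits the quoted cell into pieces
parsed = list(csv.reader(io.StringIO(text)))  # keeps the multi-line cell intact

print(len(naive), len(parsed))  # 5 3
```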

In [None]:
```python
# Share data between Jupyter Notebooks with IPython's %store magic;
# restore it in another notebook with `%store -r df1`
%store df1
```