PORTUGUESE CASTLES

February 2023 - Rui Cruzeiro

This notebook creates a dictionary with the names of Portuguese castles, along with some information about each of them. It was created as part of my personal portfolio.

In [210]:
import requests
import folium as fl
from bs4 import BeautifulSoup
import time
import random

The information about the castles was retrieved from the castelosdeportugal.pt website through web scraping. Requesting the pages directly (before they ever reach BeautifulSoup for parsing) returns a `403 Forbidden` error, so I routed the requests through the ScrapeOps proxy API (scrapeops.io) to be able to extract the information from the website.
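The proxy receives the page to fetch as its `url` query parameter, next to the API key. As a sketch of what the `requests.get(url=..., params=...)` calls send, the hypothetical helper `build_proxy_url` rebuilds the final proxy URL with the standard library's `urllib.parse.urlencode`, which should match how `requests` encodes a `params` dict. `YOUR_KEY` is a placeholder, not a real key.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

PROXY_ENDPOINT = 'https://proxy.scrapeops.io/v1/'

def build_proxy_url(target_url, api_key):
    """Hypothetical helper: the target page travels as the 'url' query parameter."""
    query = urlencode({'api_key': api_key, 'url': target_url})
    return PROXY_ENDPOINT + '?' + query

# 'YOUR_KEY' is a placeholder key
proxy_url = build_proxy_url('https://www.castelosdeportugal.pt/castelos/SiteMap.html', 'YOUR_KEY')
print(proxy_url)
# https://proxy.scrapeops.io/v1/?api_key=YOUR_KEY&url=https%3A%2F%2Fwww.castelosdeportugal.pt%2Fcastelos%2FSiteMap.html

# Decoding the query string recovers the original target URL
params = parse_qs(urlsplit(proxy_url).query)
print(params['url'][0])
```

Because the target URL is percent-encoded into the query string, the proxy, not the notebook, is what actually contacts castelosdeportugal.pt.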

In [18]:
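import os

# Read the ScrapeOps key from an environment variable instead of hard-coding it
# in the notebook (SCRAPEOPS_API_KEY is a name chosen here)
API_KEY = os.environ['SCRAPEOPS_API_KEY']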

response = requests.get(
  url='https://proxy.scrapeops.io/v1/',
  params={
      'api_key': API_KEY,
      'url': 'https://www.castelosdeportugal.pt/castelos/SiteMap.html', 
  },
)

soup = BeautifulSoup(response.content, "html.parser")

soup

<!DOCTYPE html>

<html>
<head>
<meta charset="utf-8"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<meta content="width=device-width" name="viewport"/>
<title>Índice</title>
<style>
			body{
			background-color:#373e48;
			font-family:"Times New Roman", Times, serif;
			}
			.center{
			text-align:center;
			}
			h1{
			text-align:center;
			margin:0;
			color:#ffff;
			}	
			h4{
			Color:#ffff;
			margin-bottom:0px;
			margin-top:.5rem;
			}	
			h3 {
			font-size:2em;
			text-align:center;
			}	
			a {
			padding-left:10px;
			padding-right:10px;
			}
			a:link {
			color:#ffe30a;
			font-style:italic;
			}
			a:visited {
			color:#ffe30a;
			}
			.floatLeft {
			float:left;
			}
			h3 a{
			color:#000000;
			}
			#indice{
			font-size:1.2em;
			background-color:#373e48;
			}
		</style>
</head>
<body>
<div id="indice">
<h1>Castelos de Portugal</h1>
<div class="row">
<h4>A</h4>
<a href="CastelosSECXII/abrantes.html" rel="nofollow">Abrantes</a>
<a href="Castelos

On the castelosdeportugal.pt website, each castle's information is shown on its own HTML page. Let's first store the URL of each page so we can parse them one by one later.
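The sitemap output above groups the castle links into `<div class="row">` blocks, each with an `<h4>` letter heading followed by `<a>` links. For readers without BeautifulSoup, here is a sketch of the same extraction using only the standard library's `html.parser.HTMLParser` (a swapped-in technique, not what this notebook uses). `SitemapLinkParser` is a hypothetical name, and the sample HTML is a hand-written excerpt modelled on the sitemap, not the full page.

```python
from html.parser import HTMLParser

class SitemapLinkParser(HTMLParser):
    """Collect {link text: href} for <a> tags inside <div class="row">."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._row_depth = 0        # > 0 while inside a div.row (counts nested divs)
        self._current_href = None  # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self._row_depth:
                self._row_depth += 1
            elif 'row' in (attrs.get('class') or '').split():
                self._row_depth = 1
        elif tag == 'a' and self._row_depth:
            self._current_href = attrs.get('href')

    def handle_endtag(self, tag):
        if tag == 'div' and self._row_depth:
            self._row_depth -= 1
        elif tag == 'a':
            self._current_href = None

    def handle_data(self, data):
        # Only text inside an <a> within a row is a castle name
        if self._current_href is not None and data.strip():
            self.links[data.strip()] = self._current_href

# Hand-written excerpt shaped like the sitemap
sample = (
    '<div id="indice"><h1>Castelos de Portugal</h1>'
    '<div class="row"><h4>A</h4>'
    '<a href="CastelosSECXII/abrantes.html" rel="nofollow">Abrantes</a>'
    '<a href="Castelos(pre)SECXII/aguiarSousa.html" rel="nofollow">Aguiar Sousa</a>'
    '</div></div>'
)
parser = SitemapLinkParser()
parser.feed(sample)
print(parser.links)
```

The `<h4>` letters and the `<h1>` title are skipped because no `<a>` is open when their text arrives.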

In [48]:
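castles_html_dict = {}

# Each <div class="row"> groups the castles starting with one letter;
# every <a> inside it links to a castle page (the href is a relative path)
for letter in soup.find_all('div', class_='row'):
    for castle_entry in letter.find_all('a'):
        castles_html_dict[castle_entry.string] = castle_entry.get('href')

castles_html_dict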

{'Abrantes': 'CastelosSECXII/abrantes.html',
 'Aguiar Sousa': 'Castelos(pre)SECXII/aguiarSousa.html',
 'Alcácer do Sal': 'CastelosSECXII/alcacerSal.html',
 'Alcanede': 'Castelos(pre)SECXII/alcanede.html',
 'Alcobaça': 'Castelos(pre)SECXII/alcobaca.html',
 'Alcantarilha': 'CastelosSECXII/alcantarilha.html',
 'Alfaiates': 'CastelosSECXII/alfaiates.html',
 'Alfeizerão': 'CastelosSECXII/alfeizerao.html',
 'Aljezur': 'Castelos(pre)SECXII/aljezur.html',
 'Aljustrel': 'CastelosSECXII/aljustrel.html',
 'Alenquer': 'CastelosSECXII/alenquer.html',
 'Almada': 'CastelosSECXII/almada.html',
 'Almourol': 'CastelosSECXII/almourol.html',
 'Alter do Chão': 'Castelos(pos)SECXIII/alterChao.html',
 'Alva': 'Castelos(pre)SECXII/alva.html',
 'Alvito': 'Castelos(pos)SECXIII/alvito.html',
 'Alvor': 'Castelos(pre)SECXII/alvor.html',
 'Amieira': 'Castelos(pos)SECXIII/amieira.html',
 'Alpalhão': 'CastelosSECXII/alpalhao.html',
 'Arraiolos': 'Castelos(pos)SECXIII/arraiolos.html',
 'Ansiães': 'Castelos(pre)SECXII/

In [49]:
len(castles_html_dict)

141

The sitemap lists pages for 141 Portuguese castles.
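The stored hrefs are relative to the sitemap's folder, which is why the scraping loop prefixes them with `https://www.castelosdeportugal.pt/castelos/`. As a sketch, `urllib.parse.urljoin` resolves them against the sitemap URL and gives the same result as that string concatenation; the two sample hrefs are copied from the dictionary output above.

```python
from urllib.parse import urljoin

SITEMAP_URL = 'https://www.castelosdeportugal.pt/castelos/SiteMap.html'
BASE = 'https://www.castelosdeportugal.pt/castelos/'

sample_hrefs = ['CastelosSECXII/abrantes.html', 'Castelos(pre)SECXII/aguiarSousa.html']

# Relative hrefs resolve against the sitemap's directory, /castelos/
absolute_urls = [urljoin(SITEMAP_URL, href) for href in sample_hrefs]
for url in absolute_urls:
    print(url)
```

Concatenation works here because every href is a plain relative path; `urljoin` would also cope with `../` segments or absolute links, should the site ever use them.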

In [218]:
# This cell requests each castle page, extracts its information with BeautifulSoup
# and stores it in the castles_dict dictionary.
# 'SEM INFO' ("no information") marks fields that were not found on the page.

castles_dict = {}

for castle_name, castle_url in castles_html_dict.items():

    print('Parsing through ' + castle_name + ' castle...')

    # Make request through the ScrapeOps proxy
    response = requests.get(
      url='https://proxy.scrapeops.io/v1/',
      params={
          'api_key': API_KEY,
          'url': 'https://www.castelosdeportugal.pt/castelos/' + castle_url,
      },
    )

    # Get soup

    castle_soup = BeautifulSoup(response.content, "html.parser")


    # Get smaller soups with the relevant information for each castle.
    # Pages without an information table get empty lists, so the lookups below
    # report the missing fields instead of reusing the previous castle's data.

    info_tables = castle_soup.find_all('div', class_='table-responsive table-bordered')
    if info_tables:
        castle_location = info_tables[0].find_all('a')
        castle_info = info_tables[0].find_all('td')
    else:
        castle_location = []
        castle_info = []


    # Get the district and council where the castle is located

    try:
        castle_district = castle_location[0].string
    except IndexError:
        castle_district = 'SEM INFO'
        print('- No district information for ' + castle_name + ' castle!')

    try:
        castle_council = castle_location[1].string
    except IndexError:
        castle_council = 'SEM INFO'
        print('- No council information for ' + castle_name + ' castle!')


    # Get the century in which the castle was built: the value is in the cell
    # right after the 'Construção' ("construction") label; '( )' means empty.
    # Iterating over castle_info[:-1] keeps castle_info[index+1] in range.

    castle_century = 0

    for index, item in enumerate(castle_info[:-1]):
        if 'Construção' in item.text:
            if castle_info[index+1].text != '( )':
                castle_century = castle_info[index+1].text.replace('séc.', 'Século')
                print('- Found century/year info: ', castle_century)

    if castle_century == 0:
        castle_century = 'SEM INFO'
        print('- No century/year information for ' + castle_name + ' castle!')


    # Get the name of the king who had the castle built: the value is in the
    # cell right after the 'Reinado' ("reign") label

    castle_king = 0

    for index, item in enumerate(castle_info[:-1]):
        if 'Reinado' in item.text:
            if castle_info[index+1].text != '( )':
                castle_king = castle_info[index+1].text
                print('- Found kingdom info: ', castle_king)

    if castle_king == 0:
        castle_king = 'SEM INFO'
        print('- No kingdom information for ' + castle_name + ' castle!')


    # Add information to castle dictionary

    castles_dict[castle_name] = {
        'district': castle_district,
        'council': castle_council,
        'century': castle_century,
        'king': castle_king,
        'coordinates': 'SEM COORDENADAS'   # "no coordinates": placeholder, filled in later
    }


    # Keeping ScrapeOps and castelosdeportugal.pt happy:
    # wait a random 50, 60, 70, 80 or 90 seconds between requests

    print('')
    wait_time = random.randrange(50, 100, 10)
    time.sleep(wait_time)

Parsing through Abrantes castle...
- Found century/year info:  Século XII
- Found kingdom info:  D. Afonso Henriques

Parsing through Aguiar Sousa castle...
- No century/year information for Aguiar Sousa castle!
- No kingdom information for Aguiar Sousa castle!

Parsing through Alcácer do Sal castle...
- Found century/year info:  Século XII
- No kingdom information for Alcácer do Sal castle!

Parsing through Alcanede castle...
- No century/year information for Alcanede castle!
- No kingdom information for Alcanede castle!

Parsing through Alcobaça castle...
- Found century/year info:  (c.650)
- No kingdom information for Alcobaça castle!

Parsing through Alcantarilha castle...
- Found century/year info:  Séc. XII
- No kingdom information for Alcantarilha castle!

Parsing through Alfaiates castle...
- Found century/year info:  (Século XII)?
- No kingdom information for Alfaiates castle!

Parsing through Alfeizerão castle...
- Found century/year info:  c. 1147
- Found kingdom info:  D. A

- Found century/year info:  (Século XII)?
- No kingdom information for Linhares da Beira castle!

Parsing through Longroiva castle...
- No century/year information for Longroiva castle!
- No kingdom information for Longroiva castle!

Parsing through Loulé castle...
- No century/year information for Loulé castle!
- No kingdom information for Loulé castle!

Parsing through Lourinhã castle...
- Found century/year info:  Séc. XII
- No kingdom information for Lourinhã castle!

Parsing through Lousã castle...
- No century/year information for Lousã castle!
- No kingdom information for Lousã castle!

Parsing through Marialva castle...
- Found century/year info:  c. 1179
- Found kingdom info:  D. Afonso Henriques

Parsing through Mau Vizinho castle...
- No century/year information for Mau Vizinho castle!
- No kingdom information for Mau Vizinho castle!

Parsing through Mau Vizinho(Évora) castle...
- No century/year information for Mau Vizinho(Évora) castle!
- No kingdom information for Mau Viz

Parsing through Torres Vedras castle...
- No century/year information for Torres Vedras castle!
- No kingdom information for Torres Vedras castle!

Parsing through Trancoso castle...
- No century/year information for Trancoso castle!
- No kingdom information for Trancoso castle!

Parsing through Veiros castle...
- No century/year information for Veiros castle!
- No kingdom information for Veiros castle!

Parsing through Velho de Alcoutim castle...
- Found century/year info:  Século VIII
- Found kingdom info:  Califado Omíada

Parsing through Velho do Degebe castle...
- Found century/year info:  Idade do Ferro
- No kingdom information for Velho do Degebe castle!

Parsing through Vermoim castle...
- Found century/year info:  (ant. a 977)
- No kingdom information for Vermoim castle!

Parsing through Viana do Alentejo castle...
- Found century/year info:  1313
- Found kingdom info:  D. Dinis I

Parsing through Vidigueira castle...
- Found century/year info:  (Séc. XII)?
- No kingdom inform
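The century and king lookups in the loop follow one pattern: find the `<td>` whose text contains a label ('Construção' or 'Reinado') and take the text of the next cell, treating '( )' as empty. Below is a sketch of that pattern as a standalone function over plain strings, so it can be tested without fetching any page. `value_after_label` is a hypothetical helper, and the sample cells are made up, assuming the table alternates label and value cells as the loop implies.

```python
def value_after_label(cells, label, empty='( )'):
    """Return the text of the cell following the last cell that contains `label`.

    Like the loop, a later match overwrites an earlier one, and `empty`
    marks a missing value. Returns None when nothing is found.
    """
    value = None
    # zip pairs each cell with its successor, so the last cell is never a label
    for label_cell, value_cell in zip(cells, cells[1:]):
        if label in label_cell and value_cell != empty:
            value = value_cell
    return value

# Made-up cell texts, shaped like a label/value table
cells = ['Construção', 'séc. XII', 'Reinado', 'D. Afonso Henriques']
print(value_after_label(cells, 'Construção'))
print(value_after_label(cells, 'Reinado'))
print(value_after_label(['Construção', '( )'], 'Construção'))
```

Inside the loop it could be called as `value_after_label([td.text for td in castle_info], 'Reinado') or 'SEM INFO'`, which replaces the `0` sentinel with `None` and needs no explicit `index+1` bounds handling.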

In [219]:
castles_dict

{'Abrantes': {'district': 'Santarém',
  'council': 'Abrantes',
  'century': 'Século XII',
  'king': 'D. Afonso Henriques',
  'coordinates': 'SEM COORDENADAS'},
 'Aguiar Sousa': {'district': 'Porto',
  'council': 'Paredes',
  'century': 'SEM INFO',
  'king': 'SEM INFO',
  'coordinates': 'SEM COORDENADAS'},
 'Alcácer do Sal': {'district': 'Setúbal',
  'council': 'Alcácer do Sal',
  'century': 'Século XII',
  'king': 'SEM INFO',
  'coordinates': 'SEM COORDENADAS'},
 'Alcanede': {'district': 'Santarém',
  'council': 'Alcanede',
  'century': 'SEM INFO',
  'king': 'SEM INFO',
  'coordinates': 'SEM COORDENADAS'},
 'Alcobaça': {'district': 'Leiria',
  'council': 'Alcobaça',
  'century': '(c.650)',
  'king': 'SEM INFO',
  'coordinates': 'SEM COORDENADAS'},
 'Alcantarilha': {'district': 'Faro',
  'council': 'Silves',
  'century': 'Séc. XII',
  'king': 'SEM INFO',
  'coordinates': 'SEM COORDENADAS'},
 'Alfaiates': {'district': 'Guarda',
  'council': 'Sabugal',
  'century': '(Século XII)?',
  'kin
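The dictionary mixes 'Século XII' with 'Séc. XII' (and '(Séc. XII)?' in the log) because `str.replace('séc.', 'Século')` is case-sensitive and only rewrites the lowercase abbreviation. Here is a sketch of a case-insensitive normalization with `re.sub` semantics and `re.IGNORECASE`, assuming 'séc.' is the only abbreviation worth expanding; `normalize_century` is a hypothetical name, and the sample values are copied from the output above.

```python
import re

# Matches 'séc.' in any case (Séc., SÉC., sec.) at the start of a word
CENTURY_ABBREVIATION = re.compile(r'\bs[ée]c\.', flags=re.IGNORECASE)

def normalize_century(text):
    """Spell out the 'séc.' abbreviation as 'Século'; other values pass through."""
    return CENTURY_ABBREVIATION.sub('Século', text)

samples = ['Séc. XII', '(Séc. XII)?', 'Século XII', 'c. 1147', '(c.650)', 'SEM INFO']
normalized = [normalize_century(s) for s in samples]
print(normalized)
```

Applied to the results, this would be `info['century'] = normalize_century(info['century'])` for each `info` in `castles_dict.values()`; values such as 'c. 1147' or 'SEM INFO' are left untouched.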