# Disambiguation of the New Entities in Book 4

This notebook contains the code to compare the list of new entities extracted by NER in Book 4 with an authoritative list of ancient place names from Pleiades. If a named entity matches a place name in the list, the corresponding Pleiades link is associated with the entity. In addition, each entity is searched for on the Trismegistos Places website, a database of currently 61,420 ancient places (https://www.trismegistos.org/geo/about.php). If the entity is found, the Trismegistos ID is associated with the entity.

To enhance the disambiguation process, additional contextual information is integrated: the context of the entity in the text is compared with the contextual information about the place name extracted from Pleiades and Trismegistos (ontology-based reasoning).
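
The notebook does not spell out how the contextual comparison is scored, so the following is only a minimal sketch of one possible approach: it ranks candidate places by the Jaccard overlap between the words around the entity and the words of each candidate's description. This is plain token overlap, a much simpler stand-in for ontology-based reasoning. The function names (`tokenize`, `jaccard`, `rank_candidates`), the candidate IDs, and the toy texts are all hypothetical and do not come from Pleiades or Trismegistos.

```python
def tokenize(text):
    """Lowercase a string and return its set of alphabetic word tokens."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return set(cleaned.split())


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_candidates(context, candidates):
    """Rank (candidate_id, description) pairs by overlap with the context.

    Returns a list of (score, candidate_id), best match first.
    """
    context_tokens = tokenize(context)
    scored = [(jaccard(context_tokens, tokenize(description)), candidate_id)
              for candidate_id, description in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented toy data: neither the context nor the descriptions are real records.
context = "a city in Caria near the river Maeander"
candidates = [
    ("candidate-A", "an ancient city in Caria"),
    ("candidate-B", "an island in the Aegean sea"),
]
ranking = rank_candidates(context, candidates)
print(ranking[0][1])  # candidate-A
```

A score like this would only help when an entity has several candidates (for example, the multiple Pleiades matches for a single name); entities with zero or one candidate need no ranking.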

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
## open the enriched annotation (2,508 rows)
Enriched_ToposText_Book4 = pd.read_csv("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/1.8.Enriched_ToposText_Book4.csv", delimiter=",")

In [3]:
len(Enriched_ToposText_Book4)

2508

In [4]:
## the filtered_df only contains the annotations without a ToposText ID
filtered_df = Enriched_ToposText_Book4[pd.isna(Enriched_ToposText_Book4['ToposText ID'])]

In [5]:
len(filtered_df)

631

In [6]:
## open the authoritative list of place names from Pleiades (60,090 rows)
Pleiades_Places = pd.read_csv("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/2.1.Pleiades_Places.csv", delimiter=",")

In [7]:
len(Pleiades_Places)

60090

In [8]:
list_PleiadesID = [] ## one list of matching Pleiades IDs per entity
n_PleiadesID = [] ## number of matching Pleiades IDs per entity

for PlaceName1 in filtered_df['Tagged Entity']: ## for each entity
    Temp_PleiadesID = [] ## Pleiades IDs matching the current entity
    
    for i2,PlaceName2 in enumerate(Pleiades_Places['Place']): ## for each place name
        
        if PlaceName1==PlaceName2: ## if the entity matches the place name exactly
            PleiadesID = Pleiades_Places['Pleiades ID'].iloc[i2] ## take the Pleiades ID of that place (by position)
            Temp_PleiadesID.append(PleiadesID) ## append the Pleiades ID
            
    list_PleiadesID.append(Temp_PleiadesID) ## append the list of Pleiades IDs
    n_PleiadesID.append(len(Temp_PleiadesID)) ## append the number of Pleiades IDs
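
The nested loop above compares every entity with every Pleiades place name (631 × 60,090 string comparisons). As a possible alternative, the sketch below builds a dictionary from place name to Pleiades IDs once and then looks each entity up directly. The helper names `build_name_index` and `match_entities` are hypothetical, and the toy IDs are made up for illustration.

```python
from collections import defaultdict


def build_name_index(place_names, pleiades_ids):
    """Map each place name to the list of IDs that carry exactly that name."""
    index = defaultdict(list)
    for name, pleiades_id in zip(place_names, pleiades_ids):
        index[name].append(pleiades_id)
    return index


def match_entities(entities, index):
    """Return, for each entity, its list of matching IDs and the list lengths."""
    matches = [list(index.get(entity, [])) for entity in entities]
    counts = [len(found) for found in matches]
    return matches, counts


# Toy stand-ins for Pleiades_Places['Place'] and Pleiades_Places['Pleiades ID'].
toy_names = ["Aornos", "Aornos", "Matium"]
toy_ids = [1001, 1002, 1003]  # invented IDs, not real Pleiades records

toy_index = build_name_index(toy_names, toy_ids)
toy_matches, toy_counts = match_entities(["Aornos", "Greece", "Matium"], toy_index)
print(toy_matches)  # [[1001, 1002], [], [1003]]
print(toy_counts)   # [2, 0, 1]
```

With the notebook's data, the same helpers could be called as `build_name_index(Pleiades_Places['Place'], Pleiades_Places['Pleiades ID'])` and `match_entities(filtered_df['Tagged Entity'], index)`, which should produce the same two lists as `list_PleiadesID` and `n_PleiadesID`, since the match is still an exact string comparison.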

In [9]:
list_TrismegistosID = [] ## one list of Trismegistos place links per entity
n_TrismegistosID = [] ## number of Trismegistos place links per entity

for i,PlaceName in enumerate(filtered_df['Tagged Entity']): ## for each entity
    Temp_TrismID = [] ## links found for the current entity
    
    url='https://www.trismegistos.org/geo/index.php' ## Trismegistos search page
    Trismegistos_Page=requests.get(url, params={'searchterm': PlaceName}) ## query the search page (25/05/2023); params URL-encodes names with spaces
    Soup_Request=BeautifulSoup(Trismegistos_Page.content, 'html.parser') ## parse the result page with an explicit parser
    
    for a_tag in Soup_Request.find_all('a', href=True): ## inspect all the a tags containing href
        if '/place/' in a_tag['href']: ## if the link points to a Trismegistos place
            Trismegistos_ID = a_tag.get('href') ## get the link, which contains the Trismegistos ID
            Temp_TrismID.append(Trismegistos_ID) ## append the Trismegistos ID
    
    list_TrismegistosID.append(Temp_TrismID) ## append the list of Trismegistos IDs
    n_TrismegistosID.append(len(Temp_TrismID)) ## append the number of Trismegistos IDs
    print(i, PlaceName, 'PLEIADES:', n_PleiadesID[i], 'TRISMEGISTOS:', len(Temp_TrismID))

0 Europe PLEIADES: 2 TRISMEGISTOS: 1
1 Thracia PLEIADES: 3 TRISMEGISTOS: 4
2 Greece PLEIADES: 0 TRISMEGISTOS: 2
3 Antigonenses PLEIADES: 0 TRISMEGISTOS: 0
4 Aornos PLEIADES: 2 TRISMEGISTOS: 1
5 Cestrinis PLEIADES: 0 TRISMEGISTOS: 0
6 Cassiopaei PLEIADES: 0 TRISMEGISTOS: 0
7 Sellae PLEIADES: 0 TRISMEGISTOS: 1
8 Hellopes PLEIADES: 0 TRISMEGISTOS: 0
9 Medi PLEIADES: 0 TRISMEGISTOS: 50
10 Denselatae PLEIADES: 0 TRISMEGISTOS: 0
11 Royal Waters PLEIADES: 0 TRISMEGISTOS: 0
12 Gulf PLEIADES: 0 TRISMEGISTOS: 18
13 Gulf PLEIADES: 0 TRISMEGISTOS: 18
14 Neritis PLEIADES: 0 TRISMEGISTOS: 0
15 Dioryctos PLEIADES: 0 TRISMEGISTOS: 0
16 Amphilochian PLEIADES: 0 TRISMEGISTOS: 1
17 Ephyri PLEIADES: 1 TRISMEGISTOS: 15
18 Maraces PLEIADES: 1 TRISMEGISTOS: 0
19 Atraces PLEIADES: 0 TRISMEGISTOS: 0
20 Acanthon PLEIADES: 0 TRISMEGISTOS: 0
21 Panaetolium PLEIADES: 0 TRISMEGISTOS: 0
22 Macynium PLEIADES: 0 TRISMEGISTOS: 0
23 Gulf of Crissa PLEIADES: 0 TRISMEGISTOS: 0
24 Argyna PLEIADES: 1 TRISMEGISTOS: 0
25 Eupa

204 Irine PLEIADES: 0 TRISMEGISTOS: 5
205 Ephyre PLEIADES: 1 TRISMEGISTOS: 1
206 Tiparenus PLEIADES: 0 TRISMEGISTOS: 0
207 Baucidias PLEIADES: 0 TRISMEGISTOS: 0
208 Eleusa PLEIADES: 1 TRISMEGISTOS: 0
209 Adendros PLEIADES: 1 TRISMEGISTOS: 0
210 Craugiae PLEIADES: 0 TRISMEGISTOS: 0
211 Caeciae PLEIADES: 0 TRISMEGISTOS: 0
212 Methurides PLEIADES: 0 TRISMEGISTOS: 0
213 Cytaion PLEIADES: 0 TRISMEGISTOS: 0
214 Matium PLEIADES: 1 TRISMEGISTOS: 0
215 Hierapolis PLEIADES: 7 TRISMEGISTOS: 8
216 Dium PLEIADES: 0 TRISMEGISTOS: 50
217 Asus PLEIADES: 0 TRISMEGISTOS: 11
218 Holopyxos PLEIADES: 0 TRISMEGISTOS: 0
219 Therapnae PLEIADES: 1 TRISMEGISTOS: 0
220 Marathusa PLEIADES: 0 TRISMEGISTOS: 0
221 Dictynnaeus PLEIADES: 0 TRISMEGISTOS: 0
222 Matium PLEIADES: 1 TRISMEGISTOS: 0
223 Onisia PLEIADES: 0 TRISMEGISTOS: 2
224 Butoa PLEIADES: 0 TRISMEGISTOS: 0
225 Aradus PLEIADES: 0 TRISMEGISTOS: 2
226 Phocoe PLEIADES: 1 TRISMEGISTOS: 0
227 Platiae PLEIADES: 1 TRISMEGISTOS: 0
228 Sirnides PLEIADES: 0 TRISMEGI

404 Northern Ocean PLEIADES: 0 TRISMEGISTOS: 0
405 Parapanisus PLEIADES: 0 TRISMEGISTOS: 0
406 Amalchian PLEIADES: 0 TRISMEGISTOS: 0
407 Morimarusa PLEIADES: 0 TRISMEGISTOS: 0
408 Promontory of Rubeas PLEIADES: 0 TRISMEGISTOS: 0
409 Cronian Sea PLEIADES: 0 TRISMEGISTOS: 0
410 Baltia PLEIADES: 0 TRISMEGISTOS: 1
411 Oonae PLEIADES: 0 TRISMEGISTOS: 0
412 Hippopodes PLEIADES: 0 TRISMEGISTOS: 0
413 Panotii PLEIADES: 0 TRISMEGISTOS: 0
414 Sevo PLEIADES: 0 TRISMEGISTOS: 10
415 Codanian PLEIADES: 0 TRISMEGISTOS: 0
416 Hilleviones PLEIADES: 0 TRISMEGISTOS: 0
417 Eningia PLEIADES: 0 TRISMEGISTOS: 0
418 Sarmati PLEIADES: 0 TRISMEGISTOS: 2
419 Venedi PLEIADES: 1 TRISMEGISTOS: 2
420 Hirri PLEIADES: 0 TRISMEGISTOS: 0
421 Cylipenus PLEIADES: 0 TRISMEGISTOS: 0
422 Latris PLEIADES: 0 TRISMEGISTOS: 0
423 Lagnus PLEIADES: 0 TRISMEGISTOS: 0
424 Cartris PLEIADES: 0 TRISMEGISTOS: 0
425 Roman PLEIADES: 0 TRISMEGISTOS: 50
426 Burcana PLEIADES: 1 TRISMEGISTOS: 1
427 Fabaria PLEIADES: 1 TRISMEGISTOS: 1
428 Acta

603 Colarni PLEIADES: 1 TRISMEGISTOS: 1
604 Cibilitani PLEIADES: 2 TRISMEGISTOS: 0
605 Concordienses PLEIADES: 0 TRISMEGISTOS: 0
606 Elbocorii PLEIADES: 0 TRISMEGISTOS: 0
607 Interannienses PLEIADES: 1 TRISMEGISTOS: 1
608 Mirobrigenses PLEIADES: 0 TRISMEGISTOS: 0
609 Celtici PLEIADES: 1 TRISMEGISTOS: 1
610 Plumbarii PLEIADES: 0 TRISMEGISTOS: 0
611 Ocelenses PLEIADES: 0 TRISMEGISTOS: 1
612 Barduli PLEIADES: 1 TRISMEGISTOS: 1
613 Tapori PLEIADES: 3 TRISMEGISTOS: 1
614 Spain PLEIADES: 0 TRISMEGISTOS: 2
615 Greeks PLEIADES: 0 TRISMEGISTOS: 4
616 Promontory of the Arrotrebae PLEIADES: 0 TRISMEGISTOS: 0
617 Islands of the Gods PLEIADES: 0 TRISMEGISTOS: 0
618 Fortunate Islands PLEIADES: 0 TRISMEGISTOS: 0
619 Gadis PLEIADES: 0 TRISMEGISTOS: 2
620 Roman PLEIADES: 0 TRISMEGISTOS: 50
621 Spain PLEIADES: 0 TRISMEGISTOS: 2
622 Aphrodisias PLEIADES: 8 TRISMEGISTOS: 9
623 Romans PLEIADES: 0 TRISMEGISTOS: 2
624 Gadir PLEIADES: 0 TRISMEGISTOS: 0
625 Punic PLEIADES: 0 TRISMEGISTOS: 2
626 Erythraean PLEI
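
Several entities appear more than once in the output above (for example `Gulf` at rows 12 and 13, `Matium` at rows 214 and 222, `Roman` at rows 425 and 620, and `Spain` at rows 614 and 621), so the same search is sent to Trismegistos repeatedly. One way to avoid this, sketched here under the assumption that results for a name do not change during a run, is to cache the result per distinct name. `lookup_with_cache` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the real request so the example runs offline.

```python
def lookup_with_cache(names, fetch):
    """Call fetch(name) once per distinct name and reuse the result for repeats."""
    cache = {}
    results = []
    for name in names:
        if name not in cache:
            cache[name] = fetch(name)
        results.append(cache[name])
    return results


# Offline stand-in for the Trismegistos request; it records each call it receives.
calls = []


def fake_fetch(name):
    calls.append(name)
    return ["/place/demo-" + name]  # placeholder link, not a real Trismegistos ID


demo = lookup_with_cache(["Gulf", "Gulf", "Roman", "Spain", "Roman"], fake_fetch)
print(len(demo), len(calls))  # 5 3
```

In the notebook, `fetch` would be a function that wraps the `requests.get` call and the `'/place/'` link extraction for a single place name; the returned list keeps one entry per row of `filtered_df`, so the per-row counts remain available.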