# Milestone type of inscription exploration

**Research idea & domain expertise:** Petra Hermankova, Aarhus University

**Script & technical solution:** Vojtech Kase, Aarhus University

Source: https://github.com/sdam-au/social_diversity/

In [1]:
### REQUIREMENTS - imports the libraries and installs the sddk package
import numpy as np
import math
import pandas as pd
import sys
import requests
from urllib.request import urlopen 
from bs4 import BeautifulSoup
import io

# to avoid errors, we sometimes use time.sleep(N) before retrying a request
import time

# the input data typically have a JSON structure
import json
import getpass

import datetime as dt

#!pip install sddk ### our own package under construction, always install to have an up-to-date version
!pip install --ignore-installed sddk
import sddk


Collecting sddk
  Using cached sddk-2.8.2-py3-none-any.whl (11 kB)
Collecting numpy
  Downloading numpy-1.20.1-cp37-cp37m-manylinux2010_x86_64.whl (15.3 MB)
Collecting pyarrow
  Downloading pyarrow-3.0.0-cp37-cp37m-manylinux2014_x86_64.whl (20.7 MB)
Collecting pandas
  Downloading pandas-1.2.2-cp37-cp37m-manylinux1_x86_64.whl (9.9 MB)
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
Collecting matplotlib
  Downloading matplotlib-3.3.4-cp37-cp37m-manylinux1_x86_64.whl (11.5 MB)
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)

## Establishing a connection to sciencedata.dk and connecting Google Sheets

In [3]:
# to access Google Sheets, you need a Google Service Account key JSON file
# I have mine located in my personal space on sciencedata.dk, so I read it from there:
import gspread
from google.oauth2 import service_account

conf = sddk.configure()

# (1) read the file and parse its content
file_data = conf[0].get(conf[1] + "ServiceAccountsKey.json").json()
# (2) transform the content into a credentials object
credentials = service_account.Credentials.from_service_account_info(file_data)
# (3) specify your usage of the credentials
scoped_credentials = credentials.with_scopes(['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive'])
# (4) use the constrained credentials to authenticate the gspread package
gc = gspread.Client(auth=scoped_credentials)
# (5) establish a connection with the spreadsheet specified by its url
terms = gc.open_by_url("https://docs.google.com/spreadsheets/d/1tdtjPCoHY61FSZB0CxAdZXN9xDgl76KU-ObMp4uNG2A/edit#gid=0")

sciencedata.dk username (format '123456@au.dk'): 648560@au.dk
sciencedata.dk password: ··········
connection with shared folder established with you as its ordinary user
endpoint variable has been configured to: https://sciencedata.dk/sharingout/648597%40au.dk/SDAM_root/


## Connecting to the preprocessed and enriched JSON files from sciencedata.dk


In [2]:
# read datasets in public folders

publicfolder = "66cbabddae0e02c6ae6c15be9746990c/"

EDH = sddk.read_file("EDH_terms_2021-02-26.json", "df", publicfolder)
EDCS = sddk.read_file("EDCS_terms_2021-02-26.json", "df", publicfolder)

reading file located in a public folder
reading file located in a public folder


In [3]:
len(EDH)

81476

In [4]:
# Inspect how many rows and columns we have
EDH.shape

(81476, 76)

## Subsetting the dataset

In [5]:
# Inspect all unique values within "type_of_inscription"
EDH["type_of_inscription"].unique()

array(['epitaph', 'honorific inscription', 'votive inscription',
       'defixio', 'owner/artist inscription', 'owner/artist inscription?',
       'mile-/leaguestone', 'acclamation', 'boundary inscription',
       'building/dedicatory inscription', None, 'votive inscription?',
       'military diploma', 'building/dedicatory inscription?', 'epitaph?',
       'honorific inscription?', 'identification inscription',
       'public legal inscription', 'private legal inscription',
       'boundary inscription?', 'label', 'label?', 'list',
       'private legal inscription?', 'calendar',
       'identification inscription?', 'list?', 'seat inscription',
       'elogium', 'assignation inscription', 'seat inscription?',
       'elogium?', 'prayer', 'acclamation?', 'defixio?', 'calendar?',
       'letter', 'mile-/leaguestone?', 'adnuntiatio',
       'public legal inscription?', 'prayer?', 'letter?',
       'assignation inscription?', 'military diploma?'], dtype=object)
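
The list above pairs most categories with an uncertain variant marked by a trailing `?`. The following is a minimal sketch, run on a small hypothetical DataFrame rather than on the real EDH data, of how that marker could be split off into its own column. The column names `type_uncertain` and `type_certain` are assumptions made for illustration and do not exist in the dataset.

```python
import pandas as pd

# toy stand-in for the EDH dataframe (hypothetical rows)
toy = pd.DataFrame({
    "type_of_inscription": ["epitaph", "epitaph?", "mile-/leaguestone", None, "mile-/leaguestone?"]
})

# flag the uncertain readings; missing values are treated as not flagged
toy["type_uncertain"] = toy["type_of_inscription"].str.endswith("?", na=False)
# strip the trailing question mark to obtain the bare category
toy["type_certain"] = toy["type_of_inscription"].str.rstrip("?")

# count records per bare category (missing values are ignored by value_counts)
counts = toy["type_certain"].value_counts()
print(counts)
```

Keeping the flag in a separate column means the uncertain records can still be filtered out later if a stricter selection is needed.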

In [9]:
# Example of how to subset the dataset, here by a string prefix in the type of inscription
EDH_miles = EDH[EDH["type_of_inscription"].str.startswith("mile-/lea", na=False)]
len(EDH_miles) ### shows how many records in the dataset fulfil the condition

1730
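
Because the selection uses a prefix match, it keeps both `mile-/leaguestone` and the uncertain `mile-/leaguestone?`. The sketch below, on a hypothetical toy DataFrame, contrasts the prefix match with an exact match; the variable names `prefix_match` and `exact_match` are illustrative only.

```python
import pandas as pd

# toy stand-in for the EDH dataframe (hypothetical rows)
toy = pd.DataFrame({
    "type_of_inscription": ["mile-/leaguestone", "mile-/leaguestone?", "epitaph", None]
})

# prefix match: keeps the certain and the uncertain ("?") milestones;
# na=False turns missing values into False so they are dropped
prefix_match = toy[toy["type_of_inscription"].str.startswith("mile-/lea", na=False)]

# exact match: keeps only the certain milestones
exact_match = toy[toy["type_of_inscription"] == "mile-/leaguestone"]

print(len(prefix_match), len(exact_match))
```

Which of the two is appropriate depends on whether the uncertain attributions should be part of the analysis.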

In [22]:
EDH_miles.head(2) # shows the first (2) rows of the dataset

    responsible_individual type_of_inscription letter_size not_after  \
23                 Feraudi   mile-/leaguestone     7-10 cm      0375   
176                Feraudi   mile-/leaguestone        8 cm      0300   

                                            literature  work_status    height  \
23   AE 1983, 0575.; L. Dos Santos - P. Le Roux - A...  provisional  (107) cm   
176  AE 1983, 0572.; L. Dos Santos - P. Le Roux - A...  provisional   (22) cm   

                                       diplomatic_text  \
23   D N / VALENTIN[ ] / VICTORI AC TRIVMPHATORI [ ...   
176                             ]AV[ ] / [ ] VIII CON[   

                                                people depth  ...  \
23   [{'name': 'Valentin[iano]', 'person_id': '1', ...  None  ...   
176  [{'nomen': '[---]', 'cognomen': '[---]', 'gend...  None  ...   

                      clean_text_interpretive_sentence  \
23   Domino nostro Valentiniano victori ac triumpha...   
176                           AV imperat

In [12]:
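# how to keep only the dated ones: startswith("") is True for every string,
# so together with na=False this drops the records where "origdate_text" is missing
EDH_miles_date = EDH_miles[EDH_miles["origdate_text"].str.startswith("", na=False)]
len(EDH_miles_date) ### number of dated records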

1726
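
The `startswith("")` filter only removes missing values; an empty string would still pass. The sketch below, on hypothetical toy values, shows that behaviour next to a stricter variant that also requires at least one non-whitespace character. The names `has_string` and `has_text` are illustrative assumptions.

```python
import pandas as pd

# toy stand-in for the "origdate_text" column (hypothetical values)
toy = pd.DataFrame({"origdate_text": ["151 AD - 200 AD", "", None, "201 AD"]})

# startswith("") is True for every string, including the empty one;
# only the missing value is turned into False by na=False
has_string = toy["origdate_text"].str.startswith("", na=False)

# stricter variant: replace missing values by "", trim whitespace,
# and require that something is left
has_text = toy["origdate_text"].fillna("").str.strip().ne("")

print(has_string.sum(), has_text.sum())
```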

In [13]:
# count the milestones that have coordinates (geolocation)
len(EDH_miles[EDH_miles["coordinates"].notnull()])

1730

In [14]:
# selects only the milestones in the province Sardinia
EDH_miles_sardinia = EDH_miles[EDH_miles["province_label"].str.startswith("Sardinia", na=False)]
len(EDH_miles_sardinia)

6

### Saving the subset as CSV file

In [0]:
# To save the subset as a CSV file and download it to your local computer (Google Colab only)
from google.colab import files
EDH_miles.to_csv('EDH_milestones.csv') 
files.download('EDH_milestones.csv')

In [0]:
# saves the Sardinian subset as a CSV file and downloads it to your local computer
from google.colab import files
EDH_miles_sardinia.to_csv('EDH_milestones_sardinia.csv') 
files.download('EDH_milestones_sardinia.csv')
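
The two cells above rely on `google.colab`, which is only available inside Google Colab. When the notebook runs elsewhere, `to_csv` alone writes the file to disk. The following sketch uses a hypothetical toy DataFrame and a temporary directory as the target; the record identifiers and the output path are placeholders, not values from the dataset.

```python
import os
import tempfile

import pandas as pd

# toy stand-in for a subset such as EDH_miles_sardinia (hypothetical rows)
toy = pd.DataFrame({
    "id": ["HD000001", "HD000002"],
    "province_label": ["Sardinia", "Sardinia"],
})

# write into a temporary directory; replace with any local path
out_path = os.path.join(tempfile.mkdtemp(), "EDH_milestones_sardinia.csv")
toy.to_csv(out_path, index=False)  # index=False leaves out the row-index column

# read the file back to check the round trip
restored = pd.read_csv(out_path)
print(restored.shape)
```

Columns that hold nested structures, such as lists of dictionaries, are written to CSV as plain strings, so JSON may be a better choice if those columns need to be read back in their original form.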

## Inscriptions from one province (Example of Sardinia)

In [20]:
EDH["province_label_clean"].unique()

array(['Latium et Campania (Regio I)', 'Roma', 'Baetica', 'Britannia',
       'Aemilia (Regio VIII)', 'Hispania citerior', 'unknown',
       'Alpes Maritimae', 'Apulia et Calabria (Regio II)', 'Narbonensis',
       'Lusitania', 'Africa Proconsularis', 'Samnium (Regio IV)',
       'Etruria (Regio VII)', 'Raetia', 'Pannonia superior',
       'Lugdunensis', 'Moesia inferior', 'Dalmatia', 'Belgica',
       'Umbria (Regio VI)', 'Germania inferior', 'Germania superior',
       'Dacia', 'Aquitania', 'Arabia', 'Mauretania Caesariensis',
       'Noricum', 'Numidia', 'Pannonia inferior',
       'Venetia et Histria (Regio X)', 'Barbaricum',
       'Transpadana (Regio XI)', 'Sardinia', 'Aegyptus',
       'Mauretania Tingitana', 'Asia', 'Syria', 'Bithynia et Pontus',
       'Cyrene', 'Moesia superior', 'Macedonia',
       'Bruttium et Lucania (Regio III)', 'Picenum (Regio V)', 'Epirus',
       'Alpes Poeninae', 'Galatia', 'Liguria (Regio IX)',
       'Sicilia, Melita', 'Iudaea', 'Corsica', 'Achaia'

In [21]:
# subset based on the name of province 
EDH_sardinia = EDH[EDH["province_label"].str.startswith("Sardinia", na=False)]
len(EDH_sardinia) ### number of records in the subset

228
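
Once a province subset exists, a natural next step is to see which types of inscription it contains. The sketch below does this on a hypothetical toy DataFrame; the rows, including the `Sardinia?` label, are invented for illustration and say nothing about the actual content of EDH.

```python
import pandas as pd

# toy stand-in for the EDH dataframe (hypothetical rows)
toy = pd.DataFrame({
    "province_label": ["Sardinia", "Sardinia", "Sardinia?", "Roma", None],
    "type_of_inscription": ["epitaph", "mile-/leaguestone", "epitaph", "epitaph", "label"],
})

# same prefix-based selection as in the cell above
sardinia = toy[toy["province_label"].str.startswith("Sardinia", na=False)]

# number of records per type of inscription within the province
type_counts = sardinia["type_of_inscription"].value_counts()
print(type_counts)
```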