# INTRODUCTION

This notebook explores the potential of the Epigraphic Database Heidelberg web API ([EDH API](https://edh-www.adw.uni-heidelberg.de/data/api)) in combination with sciencedata.dk as data storage (see more about our current progress in using sciencedata.dk [here](https://docs.google.com/document/d/1sojHsxkcAbZH9DpWFuHDomQwTZHPQv_WaAxO_erP6FE/edit?usp=sharing)).

The ambition here is to use cloud-based solutions as much as possible, without any dependence on local machines. At the same time, we do not want to rely completely on Google services.

In [1]:
### REQUIREMENTS
import pandas as pd
import requests
import time
from concurrent.futures import ThreadPoolExecutor
import sddk # >=3.2

# EDH via API

The basic form of a request is as follows:
```
https://edh.ub.uni-heidelberg.de/data/api/inschrift/suche?
```
With this, to create a query based on an inscription number, you have to specify the parameter **hd_nr**, like here:

```
https://edh.ub.uni-heidelberg.de/data/api/inschrift/suche?hd_nr=1
```
(Feel free to explore this in a web browser.)

Here we use the function `requests.get()` to make our requests from Python.
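
As a small aside, the query string can also be assembled programmatically instead of by string concatenation. The following is a minimal sketch using only the standard library; `build_query_url` is a hypothetical helper name, and `hd_nr` is the only parameter taken from the text above.

```python
from urllib.parse import urlencode

# Base URL taken from the examples above.
BASE_URL = "https://edh.ub.uni-heidelberg.de/data/api/inschrift/suche?"

def build_query_url(**params):
    """Append URL-encoded query parameters to the base URL (hypothetical helper)."""
    return BASE_URL + urlencode(params)

print(build_query_url(hd_nr=1))
```

`requests.get()` also accepts a `params` dictionary and performs the same encoding, so either approach should produce the same request.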

## One inscription query example

In [8]:
%%time
inscription_number = 1000
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}


URL_form = "https://edh.ub.uni-heidelberg.de/data/api/inschrift/suche?"

response = requests.get(URL_form + "hd_nr=" + str(inscription_number), headers=headers)
#response
json_data = response.json()
print(json_data)

{'items': [{'commentary': 'Amphore des Typs Camulodunum 186 = Dressel 7-8. Inschrift aufgemalt. Y: Datierung nach Fundumständen. ""In dumped material behind the second period Roman quay, constructed c. A.D. 70. ""- Hassall - Tomlin.', 'country': 'United Kingdom', 'diplomatic_text': 'C ACERRONI FVR[ ]', 'findspot': 'The City, {Pudding Lane}, Lower Thames Street', 'findspot_ancient': 'Londinium', 'findspot_modern': 'London', 'id': 'HD001000', 'language': 'L', 'last_update': '2019-04-29', 'letter_size': None, 'literature': 'AE 1982, 0656. ; M.W.C. Hassall - R.S.O. Tomlin, Britannia 13, 1982, 417, Nr. 60. - AE 1982. ; RIB 2492, 9; Zeichnung. ;', 'material': 'Ton', 'modern_region': 'London', 'not_after': '51', 'not_before': '200', 'responsible_individual': 'Cowey', 'transcription': 'C(ai) Acerroni Fur[---]', 'trismegistos_uri': 'https://www.trismegistos.org/text/165167', 'type_of_inscription': 'owner/artist inscription', 'type_of_monument': 'instrumentum domesticum', 'work_status': 'checked

In [9]:
"province" in response.text

False

In [3]:
%%time
inscription_number = 1
URL_form = "https://edh.ub.uni-heidelberg.de/data/api/inschrift/suche?"

response = requests.get(URL_form + "hd_nr=" + str(inscription_number))
json_data = response.json()
print(json_data)

{'items': [{'commentary': '(C): 2. Hälfte 1. - Anfang 2. Jh. - AE; Ende 1. - 1. Hälfte 2. Jh. - Annecchino.', 'country': 'Italy', 'depth': '2 cm', 'diplomatic_text': 'D M / NONIAE P F OPTATAE / ET C IVLIO ARTEMONI / PARENTIBVS / LIBERTIS LIBERTABVSQVE / POSTERISQVE EORVM / C IVLIVS C F OPTATVS / FILIVS', 'findspot_ancient': 'Cumae, bei', 'findspot_modern': 'Cuma, bei', 'height': '33 cm', 'id': 'HD000001', 'language': 'L', 'last_update': '2014-04-07', 'letter_size': '3.2-2 cm', 'literature': 'AE 1983, 0192. ; M. Annecchino, Puteoli 4/5, 1980/81, 286-287, Nr. 17; fig. 17. (C) - AE 1983. ;', 'material': 'Marmor, geädert / farbig', 'modern_region': 'Campania', 'not_after': '71', 'not_before': '130', 'responsible_individual': 'Feraudi', 'transcription': 'D(is) M(anibus) / Noniae P(ubli) f(iliae) Optatae / et C(aio) Iulio Artemoni / parentibus / libertis libertabusque / posterisque eorum / C(aius) Iulius C(ai) f(ilius) Optatus / filius', 'trismegistos_uri': 'https://www.trismegistos.org/text

In [4]:
%%time
### the actual records are stored under the "items" key
pd.DataFrame(json_data["items"])

CPU times: user 777 µs, sys: 704 µs, total: 1.48 ms
Wall time: 2.97 ms


| | commentary | country | depth | diplomatic_text | findspot_ancient | findspot_modern | height | id | language | last_update | ... | modern_region | not_after | not_before | responsible_individual | transcription | trismegistos_uri | type_of_inscription | type_of_monument | width | work_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (C): 2. Hälfte 1. - Anfang 2. Jh. - AE; Ende 1... | Italy | 2 cm | D M / NONIAE P F OPTATAE / ET C IVLIO ARTEMONI... | Cumae, bei | Cuma, bei | 33 cm | HD000001 | L | 2014-04-07 | ... | Campania | 71 | 130 | Feraudi | D(is) M(anibus) / Noniae P(ubli) f(iliae) Opta... | https://www.trismegistos.org/text/251193 | epitaph | tabula | 34 cm | provisional |


# Version 1: Extracting inscriptions one by one (using simple parallel computing)

In [5]:
def get_inscription_data(num):
    """Return the record for inscription `num`, or {} if both attempts fail."""
    for attempt in range(2):
        try:
            response = requests.get(URL_form + "hd_nr=" + str(num), headers=headers)
            return response.json()["items"][0]
        except Exception:
            # missing record, network error or malformed JSON: wait, then retry once
            if attempt == 0:
                time.sleep(1)
    return {}
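
The retry logic can be tried out without touching the network. The sketch below is an illustration only: `fetch_with_retry`, `retries`, `wait` and the stand-in `flaky_fetch` are hypothetical names that do not appear in the notebook, and the simulated failure merely mimics a dropped connection.

```python
import time

def fetch_with_retry(fetch, num, retries=1, wait=1.0):
    """Call fetch(num); on any exception wait and retry, returning {} if all attempts fail."""
    for attempt in range(retries + 1):
        try:
            return fetch(num)
        except Exception:
            if attempt < retries:
                time.sleep(wait)
    return {}

# Stand-in for the real request: fails on the first call, succeeds on the second.
calls = []

def flaky_fetch(num):
    calls.append(num)
    if len(calls) == 1:
        raise ConnectionError("simulated network failure")
    return {"id": "HD%06d" % num}

record = fetch_with_retry(flaky_fetch, 1, wait=0)
print(record)
```

Passing the fetching function as an argument keeps the retry behaviour separate from the HTTP call, which makes it easier to test.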

In [6]:
%%time
#### TEST without parallel computing:

all_inscriptions = []
for num in range(1,200): 
  currently_parsed = get_inscription_data(num)
  all_inscriptions.append(currently_parsed)

CPU times: user 2.02 s, sys: 240 ms, total: 2.26 s
Wall time: 13.9 s


In [7]:
all_inscriptions[:2]

[{'commentary': '(C): 2. Hälfte 1. - Anfang 2. Jh. - AE; Ende 1. - 1. Hälfte 2. Jh. - Annecchino.',
  'country': 'Italy',
  'depth': '2 cm',
  'diplomatic_text': 'D M / NONIAE P F OPTATAE / ET C IVLIO ARTEMONI / PARENTIBVS / LIBERTIS LIBERTABVSQVE / POSTERISQVE EORVM / C IVLIVS C F OPTATVS / FILIVS',
  'findspot_ancient': 'Cumae, bei',
  'findspot_modern': 'Cuma, bei',
  'height': '33 cm',
  'id': 'HD000001',
  'language': 'L',
  'last_update': '2014-04-07',
  'letter_size': '3.2-2 cm',
  'literature': 'AE 1983, 0192. ; M. Annecchino, Puteoli 4/5, 1980/81, 286-287, Nr. 17; fig. 17. (C) - AE 1983. ;',
  'material': 'Marmor, geädert / farbig',
  'modern_region': 'Campania',
  'not_after': '71',
  'not_before': '130',
  'responsible_individual': 'Feraudi',
  'transcription': 'D(is) M(anibus) / Noniae P(ubli) f(iliae) Optatae / et C(aio) Iulio Artemoni / parentibus / libertis libertabusque / posterisque eorum / C(aius) Iulius C(ai) f(ilius) Optatus / filius',
  'trismegistos_uri': 'https:/

In [8]:
pd.DataFrame(all_inscriptions)

| | commentary | country | depth | diplomatic_text | findspot_ancient | findspot_modern | height | id | language | last_update | ... | transcription | trismegistos_uri | type_of_inscription | type_of_monument | width | work_status | findspot | year_of_find | present_location | religion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (C): 2. Hälfte 1. - Anfang 2. Jh. - AE; Ende 1... | Italy | 2 cm | D M / NONIAE P F OPTATAE / ET C IVLIO ARTEMONI... | Cumae, bei | Cuma, bei | 33 cm | HD000001 | L | 2014-04-07 | ... | D(is) M(anibus) / Noniae P(ubli) f(iliae) Opta... | https://www.trismegistos.org/text/251193 | epitaph | tabula | 34 cm | provisional | | | | |
| 1 | AE 1983: Breite: 35 cm. | Italy | | C SEXTIVS PARIS / QVI VIXIT / ANNIS LXX | Roma | Roma | 28 cm | HD000002 | L | 2014-04-07 | ... | C(aius) Sextius Paris / qui vixit / annis LXX | https://www.trismegistos.org/text/265631 | epitaph | tabula | 85 cm | no image | Via Nomentana, S. Alessandro, Kirche | 1937.0 | | |
| 2 | (B): [S]isenna ist falscher Kasus; folgende Er... | Spain | (12) cm | [ ]VMMIO [ ] / [ ]ISENNA[ ] / [ ] XV[ ] / [ ] / [ | | Tomares | (37) cm | HD000003 | L | 2006-08-31 | ... | [P(ublio) M]ummio [P(ubli) f(ilio)] / [Gal(eri... | https://www.trismegistos.org/text/220675 | honorific inscription | statue base | (34) cm | provisional | | -1975.0 | Sevilla, Privatbesitz | |
| 3 | Material: lokaler grauer Kalkstein. (B): Stylo... | Spain | 18 cm | [ ]AVS[ ]LLA / M PORCI NIGRI SER / DOMINAE VEN... | Ipolcobulcula | Carcabuey | (39) cm | HD000004 | L | 2015-03-27 | ... | [---?]AV(?)S(?)[---]L(?)L(?)A / M(arci) Porci ... | https://www.trismegistos.org/text/222102 | votive inscription | altar | 27 cm | checked with photo | | -1979.0 | Carcabuey, Grupo Escolar | names of pagan deities |
| 4 | (B): Z. 3: C(ai) l(ibertae) Tyches. | Italy | | [ ] L SVCCESSVS / [ ] L L IRENAEVS / [ ] C L T... | Roma | Roma | | HD000005 | L | 2010-01-04 | ... | [---] l(ibertus) Successus / [---] L(uci) l(ib... | https://www.trismegistos.org/text/265629 | epitaph | stele | | no image | Via Cupa (ehem. Vigna Nardi) | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 194 | (B): AE 1983: Z. 1: Zeilenfall vor XXXV. Unter... | Italy | | ] / [ ] XXXV MIL A X[ ] / [ ]RONTO ARM CVST / ... | Roma | Roma | (73) cm | HD000195 | L | 2011-04-06 | ... | ------] / [vix(it) a(nnos)] XXXV mil(itavit) a... | https://www.trismegistos.org/text/265610 | epitaph | stele | 71 cm | checked with photo | Via Labicana, Friedhof der equites singulares | | Mentana, Privatbesitz | |
| 195 | (B): AE: Name des libertus: Abestus statt Asbe... | Italy | 20 cm | D M / M LATTIO / PRISCO / AVGVSTALI / LVCERIAE... | Luceria, bei | Lucera, bei | 134 cm | HD000196 | L | 2013-12-05 | ... | D(is) M(anibus) / M(arco) Lattio / Prisco / Au... | https://www.trismegistos.org/text/245285 | epitaph | stele | 85 cm | provisional | Vigna dell'arco, Straße von Lucera nach Foggia | 1972.0 | Lucera, Mus. Civ. ""Giuseppe Fiorelli"" | cult functions, pagan |
| 196 | FO nach Annecchino ungewiß (Campi Flegrei); He... | Italy | 2 cm | D M C IVLI PETRONI / ANI MAN EX III SILVAN / N... | Misenum? | Miseno? | 21 cm | HD000197 | L | 2013-12-05 | ... | D(is) M(anibus) C(ai) Iuli Petroni/ani man(ipu... | https://www.trismegistos.org/text/251190 | epitaph | tabula | 35 cm | provisional | Nekropole? | | | |
| 197 | (B): kleinere Leseabweichungen. | Portugal | | [ ]OMINO NOSTRO [ ] / [ ]NIANO VICTORI [ ] / T... | | Oleiros | 57 cm | HD000198 | L | 1997-07-09 | ... | [D]omino nostro [Val]/[enti]niano victori [ac]... | https://www.trismegistos.org/text/226603 | mile-/leaguestone | mile-/leaguestone | 23 cm | provisional | Bouça do Benefício paroquial | | Mus. Pio XII Braga | |


In [9]:
%%time

### TEST with parallel computing
### to make N requests in parallel, we first split the ID range into batches, e.g. [1,2,3], [4,5,6], [7,8,9]
all_inscriptions = []
for num in range(1,200, 100): 
  actual_nums = list(range(num, num+100))
  with ThreadPoolExecutor(max_workers=100) as pool:
    currently_parsed = list(pool.map(get_inscription_data,actual_nums))
  all_inscriptions.extend(currently_parsed)

CPU times: user 1.37 s, sys: 688 ms, total: 2.06 s
Wall time: 1.92 s


OK, the test shows that running 100 workers in parallel is roughly 7 times faster in wall time (1.92 s versus 13.9 s for the sequential loop). Let's scale it up to the whole dataset.
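
The batching pattern can be checked offline before the long run. The sketch below is a hedged illustration: `fake_request` is a hypothetical stand-in for `get_inscription_data()` that makes no network calls, and `fetch_range` is an assumed helper name. It relies on the fact that `ThreadPoolExecutor.map` returns results in the order of its inputs, so the records stay aligned with the inscription numbers.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_request(num):
    """Stand-in for get_inscription_data(); returns a minimal record without network access."""
    return {"id": "HD%06d" % num}

def fetch_range(fetch, start, stop, max_workers=100):
    """Fetch records for start..stop-1 in parallel; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, range(start, stop)))

records = fetch_range(fake_request, 1, 200)
print(len(records), records[0]["id"], records[-1]["id"])
```

A single pool queues any tasks beyond `max_workers`, so explicit batches are not strictly required for correctness; they may still be a reasonable way to limit the load placed on the server at any one time.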

In [10]:
%%time
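### main run: request IDs 1-90000 in batches of 200
### (each batch holds 200 IDs, so at most 200 of the 300 workers are ever busy)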

all_inscriptions = []
for num in range(1,90000, 200): 
    actual_nums = list(range(num, num+200))
    with ThreadPoolExecutor(max_workers=300) as pool:
        currently_parsed = list(pool.map(get_inscription_data,actual_nums))
    all_inscriptions.extend(currently_parsed)

CPU times: user 11min 57s, sys: 6min 53s, total: 18min 50s
Wall time: 18min 26s


In [11]:
# NOTE: this step does not remove anything: failed requests are still present as empty dicts
# and are dropped further below by keeping only rows with a non-null "id"
all_inscriptions_filtered = list(all_inscriptions)
len(all_inscriptions_filtered)

90000
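
If the empty placeholders should be removed already at the list level, a list comprehension is enough. This is a minimal sketch on made-up sample data; `drop_empty` is a hypothetical helper, and it assumes that every valid record carries a non-empty `id` field, as the records shown above do.

```python
def drop_empty(records):
    """Keep only records that carry a non-empty "id" field."""
    return [rec for rec in records if rec and rec.get("id")]

# Made-up sample: two valid records and one failed request.
sample = [{"id": "HD000001"}, {}, {"id": "HD000003"}]
kept = drop_empty(sample)
print(len(kept))
```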

In [12]:
all_inscriptions_filtered

[{'commentary': '(C): 2. Hälfte 1. - Anfang 2. Jh. - AE; Ende 1. - 1. Hälfte 2. Jh. - Annecchino.',
  'country': 'Italy',
  'depth': '2 cm',
  'diplomatic_text': 'D M / NONIAE P F OPTATAE / ET C IVLIO ARTEMONI / PARENTIBVS / LIBERTIS LIBERTABVSQVE / POSTERISQVE EORVM / C IVLIVS C F OPTATVS / FILIVS',
  'findspot_ancient': 'Cumae, bei',
  'findspot_modern': 'Cuma, bei',
  'height': '33 cm',
  'id': 'HD000001',
  'language': 'L',
  'last_update': '2014-04-07',
  'letter_size': '3.2-2 cm',
  'literature': 'AE 1983, 0192. ; M. Annecchino, Puteoli 4/5, 1980/81, 286-287, Nr. 17; fig. 17. (C) - AE 1983. ;',
  'material': 'Marmor, geädert / farbig',
  'modern_region': 'Campania',
  'not_after': '71',
  'not_before': '130',
  'responsible_individual': 'Feraudi',
  'transcription': 'D(is) M(anibus) / Noniae P(ubli) f(iliae) Optatae / et C(aio) Iulio Artemoni / parentibus / libertis libertabusque / posterisque eorum / C(aius) Iulius C(ai) f(ilius) Optatus / filius',
  'trismegistos_uri': 'https:/

In [13]:
inscriptions_data_df = pd.DataFrame(all_inscriptions)
len(inscriptions_data_df)

90000

In [15]:
inscriptions_data_df = inscriptions_data_df[inscriptions_data_df["id"].notnull()]
len(inscriptions_data_df) # in april 2022, we had 81883

81883


In [16]:
# check missing numbers: IDs between 1 and the highest retrieved ID that returned no record
ins_ns = [int(ins.partition("HD")[2]) for ins in inscriptions_data_df["id"].tolist()]
number_set = set(range(1, max(ins_ns) + 1))
number_set - set(ins_ns)

{485,
 526,
 719,
 1115,
 1535,
 1797,
 2799,
 2901,
 3038,
 3044,
 3084,
 3189,
 3243,
 3282,
 3610,
 3924,
 4200,
 4223,
 4570,
 4693,
 4787,
 4902,
 5083,
 5322,
 5325,
 5328,
 5554,
 5622,
 5715,
 5872,
 5877,
 5904,
 6342,
 6393,
 6424,
 6554,
 6572,
 6783,
 6852,
 7836,
 7930,
 7947,
 8145,
 8154,
 8161,
 8251,
 8263,
 8303,
 8437,
 8561,
 8576,
 8740,
 8789,
 8886,
 9084,
 9087,
 9090,
 9093,
 9096,
 9102,
 9141,
 9263,
 9293,
 9449,
 9529,
 9602,
 9682,
 9705,
 9736,
 9739,
 9817,
 9993,
 10028,
 10065,
 10068,
 10093,
 10132,
 10317,
 10320,
 10329,
 10581,
 10714,
 10717,
 10735,
 10768,
 10769,
 10828,
 10874,
 10912,
 11149,
 11167,
 11276,
 11284,
 11327,
 11333,
 11336,
 11339,
 11548,
 11658,
 11880,
 12010,
 12077,
 12106,
 12124,
 12127,
 12291,
 12296,
 12318,
 12333,
 12348,
 12488,
 12572,
 12993,
 13413,
 13513,
 13709,
 13915,
 14297,
 14318,
 14386,
 14421,
 14887,
 14925,
 15069,
 15160,
 15284,
 15359,
 15360,
 15496,
 15499,
 15502,
 15693,
 15694,
 15699,
 15
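
The gap check can be illustrated on a handful of made-up identifiers. The sketch below assumes the `HD` prefix followed by a zero-padded number, as in the records above; `missing_numbers` is a hypothetical helper name.

```python
def missing_numbers(ids):
    """Return the sorted numbers between 1 and the highest ID that do not occur in `ids`."""
    present = {int(i.partition("HD")[2]) for i in ids}
    return sorted(set(range(1, max(present) + 1)) - present)

# Made-up sample identifiers with gaps at 3, 5 and 6.
sample_ids = ["HD000001", "HD000002", "HD000004", "HD000007"]
print(missing_numbers(sample_ids))
```

Using the highest retrieved ID as the upper bound, rather than the number of rows, keeps identifiers above the row count from showing up as spurious differences.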

In [17]:
inscriptions_data_df.head(5)

| | commentary | country | depth | diplomatic_text | findspot_ancient | findspot_modern | height | id | language | last_update | ... | transcription | trismegistos_uri | type_of_inscription | type_of_monument | width | work_status | findspot | year_of_find | present_location | religion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (C): 2. Hälfte 1. - Anfang 2. Jh. - AE; Ende 1... | Italy | 2 cm | D M / NONIAE P F OPTATAE / ET C IVLIO ARTEMONI... | Cumae, bei | Cuma, bei | 33 cm | HD000001 | L | 2014-04-07 | ... | D(is) M(anibus) / Noniae P(ubli) f(iliae) Opta... | https://www.trismegistos.org/text/251193 | epitaph | tabula | 34 cm | provisional | | | | |
| 1 | AE 1983: Breite: 35 cm. | Italy | | C SEXTIVS PARIS / QVI VIXIT / ANNIS LXX | Roma | Roma | 28 cm | HD000002 | L | 2014-04-07 | ... | C(aius) Sextius Paris / qui vixit / annis LXX | https://www.trismegistos.org/text/265631 | epitaph | tabula | 85 cm | no image | Via Nomentana, S. Alessandro, Kirche | 1937.0 | | |
| 2 | (B): [S]isenna ist falscher Kasus; folgende Er... | Spain | (12) cm | [ ]VMMIO [ ] / [ ]ISENNA[ ] / [ ] XV[ ] / [ ] / [ | | Tomares | (37) cm | HD000003 | L | 2006-08-31 | ... | [P(ublio) M]ummio [P(ubli) f(ilio)] / [Gal(eri... | https://www.trismegistos.org/text/220675 | honorific inscription | statue base | (34) cm | provisional | | -1975.0 | Sevilla, Privatbesitz | |
| 3 | Material: lokaler grauer Kalkstein. (B): Stylo... | Spain | 18 cm | [ ]AVS[ ]LLA / M PORCI NIGRI SER / DOMINAE VEN... | Ipolcobulcula | Carcabuey | (39) cm | HD000004 | L | 2015-03-27 | ... | [---?]AV(?)S(?)[---]L(?)L(?)A / M(arci) Porci ... | https://www.trismegistos.org/text/222102 | votive inscription | altar | 27 cm | checked with photo | | -1979.0 | Carcabuey, Grupo Escolar | names of pagan deities |
| 4 | (B): Z. 3: C(ai) l(ibertae) Tyches. | Italy | | [ ] L SVCCESSVS / [ ] L L IRENAEVS / [ ] C L T... | Roma | Roma | | HD000005 | L | 2010-01-04 | ... | [---] l(ibertus) Successus / [---] L(uci) l(ib... | https://www.trismegistos.org/text/265629 | epitaph | stele | | no image | Via Cupa (ehem. Vigna Nardi) | | | |


In [18]:
inscriptions_data_df.columns

Index(['commentary', 'country', 'depth', 'diplomatic_text', 'findspot_ancient',
       'findspot_modern', 'height', 'id', 'language', 'last_update',
       'letter_size', 'literature', 'material', 'modern_region', 'not_after',
       'not_before', 'responsible_individual', 'transcription',
       'trismegistos_uri', 'type_of_inscription', 'type_of_monument', 'width',
       'work_status', 'findspot', 'year_of_find', 'present_location',
       'religion'],
      dtype='object')

# Save locally and upload the data to sciencedata.dk shared folder

In [19]:
# save locally
inscriptions_data_df.to_json("../data/large_data/EDH_onebyone.json")

In [11]:
### configure session
### in the case of "SDAM_root", the group owner is Vojtech with username 648597@au.dk
s = sddk.cloudSession("sciencedata.dk", "SDAM_root/SDAM_data/EDH", "648597@au.dk")

connection with shared folder established with you as its owner
endpoint variable has been configured to: https://sciencedata.dk/files/SDAM_root/SDAM_data/EDH/


In [21]:
s.write_file("EDH_onebyone_2022-10-31.json", inscriptions_data_df)

Your <class 'pandas.core.frame.DataFrame'> object has been succesfully written as "https://sciencedata.dk/files/SDAM_root/SDAM_data/EDH/EDH_onebyone_2022-10-31.json"


In [12]:
inscriptions_data_df = s.read_file("EDH_onebyone_2022-10-31.json")
inscriptions_data_df.shape

(81883, 27)

In [13]:
inscriptions_data_df.columns

Index(['commentary', 'country', 'depth', 'diplomatic_text', 'findspot_ancient',
       'findspot_modern', 'height', 'id', 'language', 'last_update',
       'letter_size', 'literature', 'material', 'modern_region', 'not_after',
       'not_before', 'responsible_individual', 'transcription',
       'trismegistos_uri', 'type_of_inscription', 'type_of_monument', 'width',
       'work_status', 'findspot', 'year_of_find', 'present_location',
       'religion'],
      dtype='object')

In [14]:
inscriptions_data_df_old = s.read_file("EDH_onebyone_2020-09-14.json")
inscriptions_data_df_old.shape

(81476, 36)

In [15]:
inscriptions_data_df_old.columns

Index(['responsible_individual', 'type_of_inscription', 'letter_size',
       'not_after', 'literature', 'work_status', 'height', 'diplomatic_text',
       'people', 'depth', 'material', 'type_of_monument', 'province_label',
       'width', 'transcription', 'country', 'uri', 'findspot_ancient',
       'last_update', 'modern_region', 'findspot_modern', 'language', 'id',
       'edh_geography_uri', 'commentary', 'trismegistos_uri', 'not_before',
       'findspot', 'year_of_find', 'present_location', 'external_image_uris',
       'religion', 'fotos', 'geography', 'social_economic_legal_history',
       'military'],
      dtype='object')

In [17]:
set(inscriptions_data_df_old).difference(inscriptions_data_df)

{'edh_geography_uri',
 'external_image_uris',
 'fotos',
 'geography',
 'military',
 'people',
 'province_label',
 'social_economic_legal_history',
 'uri'}
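
The same comparison can be run in both directions to see which columns were dropped and which were added between two snapshots. The sketch below uses abbreviated, illustrative column lists rather than the full ones above; `compare_columns` is a hypothetical helper name.

```python
def compare_columns(old_columns, new_columns):
    """Return (dropped, added): columns only in the old snapshot, and only in the new one."""
    old, new = set(old_columns), set(new_columns)
    return sorted(old - new), sorted(new - old)

# Abbreviated column lists for illustration only.
old_cols = ["id", "transcription", "people", "geography", "uri"]
new_cols = ["id", "transcription", "religion"]
dropped, added = compare_columns(old_cols, new_cols)
print(dropped, added)
```

With the real data, `inscriptions_data_df_old.columns` and `inscriptions_data_df.columns` could be passed in directly, since iterating over a pandas `Index` yields the column names.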