# INTRODUCTION

This notebook explores the potential of the Epigraphic Database Heidelberg web API , [EDH API](https://edh-www.adw.uni-heidelberg.de/data/api) in combination with sciencedata.dk as a datastorage (see more about our current progress in using sciencedata.dk [here](https://docs.google.com/document/d/1sojHsxkcAbZH9DpWFuHDomQwTZHPQv_WaAxO_erP6FE/edit?usp=sharing)).

The ambition here is to use cloud based solutions as much as possible, without any dependence on local machines. At the same time, we do not like to rely completely upon google services. 

In [0]:
### REQUIREMENTS
import numpy as np
import math
import pandas as pd

import sys
### we do a lot of requests during the scrapping. Some of them with requests package, some of them with urllib
import requests
from urllib.request import urlopen 
from urllib.parse import quote  
from bs4 import BeautifulSoup
import xml.etree.cElementTree as ET

# to avoid errors, we sometime use time.sleep(N) before retrying a request
import time
# the input data have typically a json structure
import json
import getpass

import datetime as dt
# for simple paralel computing:
from concurrent.futures import ThreadPoolExecutor
### google drive
from google.colab import drive
#import gspread
#from gspread_dataframe import get_as_dataframe, set_with_dataframe

# EDH via API

The basis form of an request is as follows:
```
https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?
```
With this, to create query based on inscription number, you will have tospecify the paramenter **hd_nr**, like here:

```
https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?hd_nr=1
```
 (Feel free to explore this in the browser).

Here we use the function ```requests.get()``` to make our requests from python.

## One inscription query example

In [0]:
%%time
inscription_number = 100
URL_form = "https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?"

response = requests.get(URL_form + "hd_nr=" + str(inscription_number))
response
json_data = response.json()
print(json_data)

{'total': 1, 'items': [{'province_label': 'Hispania citerior', 'modern_region': 'Soria', 'findspot_ancient': 'Uxama', 'transcription': 'D[---] / ANELI[---] / BERVE[---] / P[---]IT[------', 'commentary': ' Text in vier Zeilen, nahezu unlesbar.', 'id': 'HD000100', 'literature': 'AE 1983, 0597.; C. García Merino, in: Homenaje al Prof. Martin Almagro Basch 3 (Madrid 1983) 355, Nr. 2; lám. 1, 2. - AE 1983.', 'uri': 'https://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000100', 'language': 'Latin', 'findspot_modern': 'El Burgo de Osma', 'work_status': 'provisional', 'edh_geography_uri': 'https://edh-www.adw.uni-heidelberg.de/edh/geographie/9371', 'last_update': '2015-05-21', 'diplomatic_text': 'D[ ] / ANELI[ ] / BERVE[ ] / P[ ]IT[', 'trismegistos_uri': 'https://www.trismegistos.org/text/226731', 'country': 'Spain', 'responsible_individual': 'Gräf', 'type_of_monument': 'stele'}], 'limit': '20'}
CPU times: user 15.5 ms, sys: 95 µs, total: 15.6 ms
Wall time: 854 ms


In [0]:
### the actual data are part of the tag "items"
%%time 
pd.DataFrame(json_data["items"]) 


CPU times: user 2.87 ms, sys: 0 ns, total: 2.87 ms
Wall time: 2.82 ms


Unnamed: 0,findspot_ancient,findspot_modern,id,diplomatic_text,uri,edh_geography_uri,literature,trismegistos_uri,work_status,province_label,type_of_monument,language,last_update,modern_region,transcription,commentary,responsible_individual,country
0,Uxama,El Burgo de Osma,HD000100,D[ ] / ANELI[ ] / BERVE[ ] / P[ ]IT[,https://edh-www.adw.uni-heidelberg.de/edh/insc...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,"AE 1983, 0597.; C. García Merino, in: Homenaje...",https://www.trismegistos.org/text/226731,provisional,Hispania citerior,stele,Latin,2015-05-21,Soria,D[---] / ANELI[---] / BERVE[---] / P[---]IT[--...,"Text in vier Zeilen, nahezu unlesbar.",Gräf,Spain


## EDH provinces

We will extract our inscription data on the basis of the province from which they come.

In [0]:
%%time
response = requests.get("https://edh-www.adw.uni-heidelberg.de/data/api/terms/province")
response
json_data = response.json()
print(str(json_data)[:200])

{'provinces': {'Ach': 'Achaia', 'Aeg': 'Aegyptus', 'Aem': 'Aemilia (Regio VIII)', 'Afr': 'Africa Proconsularis', 'AlC': 'Alpes Cottiae', 'AlG': 'Alpes Graiae', 'AlM': 'Alpes Maritimae', 'AlP': 'Alpes 
CPU times: user 15.6 ms, sys: 974 µs, total: 16.6 ms
Wall time: 386 ms


In [0]:
provinces = list(json_data["provinces"].keys())
print(provinces)

['Ach', 'Aeg', 'Aem', 'Afr', 'AlC', 'AlG', 'AlM', 'AlP', 'ApC', 'Aqu', 'Ara', 'Arm', 'Asi', 'Ass', 'Bae', 'Bar', 'Bel', 'BiP', 'BrL', 'Bri', 'Cap', 'Cil', 'Cor', 'Cre', 'Cyp', 'Cyr', 'Dac', 'Dal', 'Epi', 'Etr', 'Gal', 'GeI', 'GeS', 'HiC', 'Inc', 'Iud', 'LaC', 'Lig', 'Lug', 'Lus', 'LyP', 'MaC', 'MaT', 'Mak', 'Mes', 'MoI', 'MoS', 'Nar', 'Nor', 'Num', 'PaI', 'PaS', 'Pic', 'Rae', 'ReB', 'Rom', 'Sam', 'Sar', 'Sic', 'Syr', 'Thr', 'Tra', 'Tri', 'Umb', 'Val', 'VeH']


## Get data on the province basis

In [0]:
### one province example (first page of results, i.e. first 100 inscriptions)
province = "dal"
param = "province"

### make the request
response = requests.get(URL_form + param + "=" + province + "&limit=100")
json_data = response.json()
pages = math.ceil(int(json_data["total"]) / int(json_data["limit"]))
some_inscriptions = pd.DataFrame(json_data["items"])
len(some_inscriptions)
some_inscriptions.head(5)


Unnamed: 0,responsible_individual,last_update,country,findspot_ancient,present_location,trismegistos_uri,modern_region,depth,type_of_inscription,transcription,people,height,language,id,uri,findspot_modern,work_status,commentary,type_of_monument,province_label,findspot,not_after,year_of_find,not_before,literature,edh_geography_uri,diplomatic_text,width,letter_size,fotos,material,religion,geography,social_economic_legal_history,military,external_image_uris
0,Gräf,2009-05-13,Bosnia and Herzegovina,"Domavium, bei","Tuzla, Muz. Istočne Bosne",https://www.trismegistos.org/text/181722,Republika Srpska,24 cm,epitaph,D(is) M(anibus) / Severinus / veteranus / vixi...,"[{'cognomen': 'Severinus', 'person_id': '1', '...",120 cm,Latin,HD000310,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Bratunac,checked with photo,Oberhalb des Inschriftfeldes eine weibliche u...,stele,Dalmatia,"Bosanska ulica, Kamenjak, sekundär verwendet",400,1955,301,"AE 1983, 0745.; I. Bojanovski, Članci 14, 1982...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D M / SEVERINVS / VETERANVS / VIXIT AN XIX / T...,48 cm,3 cm,,,,,,,
1,Gräf,2009-05-13,Bosnia and Herzegovina,"Domavium, bei",,https://www.trismegistos.org/text/181723,Republika Srpska,27 cm,epitaph,D(is) [M(anibus) s(acrum)?] / [--]CITI CTO[---...,"[{'name': '[---]', 'person_id': '1'}]",(145) cm,Latin,HD000313,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Srebrenica,checked with photo,,stele,Dalmatia,"Staroglavice, frühchristliche Kirche",300,1975,101,"AE 1983, 0746.; I. Bojanovski, Članci 14, 1982...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D [ ] / [ ]CITI CTO[ ] / RIVS MAR[ ] / [ ]IVS ...,(55) cm,,,,,,,,
2,Gräf,2008-09-16,Montenegro,Municipium S[---],,https://www.trismegistos.org/text/181724,,20 cm,epitaph,D(is) M(anibus) s(acrum) / Fl(aviae) Mar/cella...,"[{'cognomen': 'Marcella', 'age: years': '34', ...",(145) cm,Latin,HD000316,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Pljevlja,checked with photo,(B): AE 1983: Z. 5/6: Zeilenfall fehlt.,stele,Dalmatia,"Komini, Nekropole II, Grab 25/1975",200,1975,151,"AE 1983, 0747. (B); A. Cermanović-Kuzmanović, ...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D M S / FL MAR / CELLAE Q V / A XXXIV / NANTIV...,75 cm,6-5 cm,[https://edh-www.adw.uni-heidelberg.de/fotos/F...,,,,,,
3,Gräf,2012-06-06,Montenegro,Municipium S[---],,https://www.trismegistos.org/text/181725,,30 cm,epitaph,D(is) M(anibus) s(acrum) / L(ucio) Cipio / Fau...,"[{'person_id': '1', 'gender': 'male', 'name': ...",(170) cm,Latin,HD000319,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Pljevlja,checked with photo,Reste von roter Farbe in den Buchstaben.,stele,Dalmatia,"Komini, Nekropole II",230,1975,151,"AE 1983, 0748.; A. Cermanović-Kuzmanović, Star...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D M S / L CIPIO / FAVSTO ET / FRVNITAE / FIL L...,84 cm,6.9-3.9 cm,[https://edh-www.adw.uni-heidelberg.de/fotos/F...,lime marl / marl: rocks - clastic sediments,,,,,
4,Gräf,2008-09-16,Montenegro,Municipium S[---],"Pljevlja, Zavičajni Muz.",https://www.trismegistos.org/text/181726,,28 cm,epitaph,Q(uinto) Valerio / Quadra/to an(norum) LXI / L...,"[{'cognomen': 'Quadratus', 'age: years': '61',...",(66) cm,Latin,HD000322,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Pljevlja,checked with photo,,stele,Dalmatia,"Komini, Nekropole II",170,1974,71,"AE 1983, 0749.; A. Cermanović-Kuzmanović, Star...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,Q VALERIO / QVADRA / TO AN LXI / L VAL CELER / [,58 cm,6-5 cm,[https://edh-www.adw.uni-heidelberg.de/fotos/F...,,,,,,


## Main Function: parsing all inscription data
(takes about 20 minutes)

In [0]:
### over the loop, we will extend the list of items
%%time
inscriptions_data = []
URL_form = "https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?province="

for province in provinces:
  try:
    total = requests.get(URL_form + province).json()["total"]
    response = requests.get(URL_form + province + "&limit=" + str(total))
    inscriptions_data.extend(response.json()["items"])
  except:
    pass
  print(province, total)

Ach 1285
Aeg 167
Aem 211
Afr 4452
AlC 475
AlG 76
AlM 413
AlP 201
ApC 761
Aqu 426
Ara 234
Arm 3
Asi 491
Ass 0
Bae 3016
Bar 181
Bel 1668
BiP 289
BrL 246
Bri 4363
Cap 62
Cil 74
Cor 46
Cre 66
Cyp 34
Cyr 88
Dac 3545
Dal 7653
Epi 138
Etr 652
Gal 208
GeI 2760
GeS 6085
HiC 4687
Inc 382
Iud 187
LaC 2600
Lig 145
Lug 594
Lus 1583
LyP 61
MaC 1124
MaT 290
Mak 1321
Mes 12
MoI 1938
MoS 1475
Nar 1401
Nor 2736
Num 2644
PaI 3136
PaS 4259
Pic 171
Rae 1011
ReB 78
Rom 4392
Sam 649
Sar 229
Sic 193
Syr 405
Thr 395
Tra 165
Tri 0
Umb 348
Val 0
VeH 1156
CPU times: user 3.68 s, sys: 438 ms, total: 4.12 s
Wall time: 12min 30s


In [0]:
len(inscriptions_data)

72483

In [0]:
%%time
inscriptions_data_df = pd.DataFrame(inscriptions_data)

CPU times: user 1.08 s, sys: 4.8 ms, total: 1.09 s
Wall time: 1.09 s


In [0]:
inscriptions_data_df.head(5)

Unnamed: 0,people,work_status,findspot_modern,last_update,responsible_individual,width,language,literature,height,diplomatic_text,not_before,depth,material,trismegistos_uri,transcription,commentary,edh_geography_uri,country,uri,province_label,modern_region,type_of_monument,present_location,findspot_ancient,not_after,type_of_inscription,id,letter_size,social_economic_legal_history,findspot,year_of_find,geography,religion,fotos,military,external_image_uris
0,"[{'name': 'L. Ponponius(!) Rufus', 'age: years...",checked with photo,Roma,2014-10-10,Cowey,19 cm,Greek-Latin,"CIG 6916.; AE 1984, 0109. (B); P. Lombardi, Ti...",45 cm,L PONPONIVS RVFVS / VIXIT ANOS XXVII / EIA PON...,101,5.4 cm,marble: rocks - metamorphic rocks,https://www.trismegistos.org/text/177036,L(ucius) Ponponius(!) Rufus / vixit an(n)os XX...,Wiederverwendung der Tafel als TÃ¼rpfosten. D...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Italy,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,Lazio,tabula,"Roma, Mus. Naz. Rom.","Kephallenia, aus",200.0,epitaph,HD001917,1-2 cm,,,,,,,,
1,"[{'gender': 'male', 'cognomen': 'ÎÎ±Î»Î»ÎµÎ½Ï...",checked with photo,"Patrasso - AthÃ­nai, zwischen",2012-03-15,GrÃ¤f,30 cm,Greek-Latin,"CIL 03, 00572.; CIL 03, 07306.; IG 02 (2. Aufl...",146 cm,[ ]ΥΤΟΚΡΑΤΟΡΙ / [ ]ΑΙΣΑΡΙ / [[[ ]]] / [ ]ΥΣΕΒΕ...,395,,"Marmor, geÃ¤dert / farbig",https://www.trismegistos.org/text/177037,[Α]ὐτοκράτορι / [Κ]αίσαρι / [[[---]]] / [Ε]ὐσε...,Meilenstein mit zwei griechischen Inschriften...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,AttikÃ­,mile-/leaguestone,"AthÃ­nai, Epigr. Mus.","Athenae, bei",397.0,mile-/leaguestone,HD002097,2.7 cm,data available,"Dafni, byzantinisches Kloster, bei, sekundÃ¤r ...",,,,,,
2,,no image,AthÃ­nai,2011-04-04,Cowey,(17) cm,Latin,"CIL 03, 06101.; M. Å aÅ¡el Kos, Inscriptiones ...",(15) cm,]S HOSTIVM DEPRESSE[ ] / [ ] CXIIX BELLO MARIT...,-38,12.5 cm,,https://www.trismegistos.org/text/177038,------ nave]s hostium depresse[rit ---] / [---...,Es handelt sich um ein Elogium fÃ¼r Agrippa. ...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,AttikÃ­,base,"AthÃ­nai, Epigr. Mus.",Athenae,-12.0,elogium,HD002919,6.5 cm,,"Roma-Augustus Tempel, Akropolis",1866.0,,,,,
3,"[{'cognomen': 'Traianus Hadrianus', 'gender': ...",checked with photo,AthÃ­nai,2009-11-17,Cowey,76 cm,Greek-Latin,"CIL 03, 00548. (B); CIL 03, 07281.; PIR (2. Au...",112 cm,[ ]MP CAES DIVI TRAIANI PAR / THICI FIL DIVI N...,132,48 cm,,https://www.trismegistos.org/text/177039,[I]mp(eratori) Caes(ari) divi Traiani Par/thic...,(B): Am Anfang von Z. 2 fehlt das TI von nepoti.,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,AttikÃ­,statue base,"AthÃ­nai, Epigr. Mus.",Athenae,,honorific inscription,HD002922,,,,,data available,,,,
4,"[{'gender': 'male', 'cognomen': 'Traianus+ Had...",no image,AthÃ­nai,2011-04-04,Cowey,(41) cm,Latin,"CIL 03, 06102.; CIL 03, 07283.; AE 1984, 0822....",(20) cm,[ ] / [ ] / [ ]D[ ] / [ ]R P XVI COS III P P [...,132,(15) cm,marble: rocks - metamorphic rocks,https://www.trismegistos.org/text/177040,[Imp(eratori) Caesari divi Traiani] / [Parthic...,Rekonstruktion des Inschriftentextes nach CIL...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,AttikÃ­,statue base,"AthÃ­nai, Epigr. Mus.",Athenae,,honorific inscription,HD002925,3.8 cm,,"\""Stoa Hadriani\"" (CIL)",,data available,,,,


## Upload the data to our shared group folder at sciencedata.dk

In [0]:
### interactively setup your sciencedata.dk homeurl, username and password
sciencedata_homeurl = "https://sciencedata.dk/files/"
username = input("sciencedata.dk username (format '123456@au.dk'):")
password = getpass.getpass("sciencedata.dk password:")

### establish a request session
s = requests.Session()
s.auth = (username, password)

sciencedata.dk username (format '123456@au.dk'):648597@au.dk
sciencedata.dk password:··········


In [0]:
### create new folder (in the case it is not already there)
s.request("MKCOL", sciencedata_homeurl + "personal_folder/EDH_data") 

<Response [405]>

In [0]:
### make a README.txt file in the folder
s.put(sciencedata_homeurl + "personal_folder/EDH_data/README.txt", data="This folder will contain all data associated with cleaning the EDH data, extracted either from the API, or from the xml files.")

<Response [201]>

In [0]:
### put your dataframe data into this folder
s.put(sciencedata_homeurl + "personal_folder/EDH_data/EDH_inscriptions_raw.json", data=inscriptions_data_df.to_json())

<Response [204]>

# ARCHIVE! - not needed anymore!!!

In [0]:
s.get("https://sciencedata.dk/files/personal_folder/inscriptions_raw_TEST.json")

<Response [200]>

In [0]:
r = s.get("https://sciencedata.dk/files/personal_folder/inscriptions_raw_TEST.json")
inscriptions_raw = pd.DataFrame(r.text.json())
inscriptions_raw.head(5)

AttributeError: ignored

In [0]:
s.put("https://sciencedata.dk/files/personal_folder/inscriptions_raw_TEST.json", data=inscriptions_raw.to_dict())

<Response [201]>

In [0]:
response = requests.get("https://sciencedata.dk/files/personal_folder/inscriptions_raw.json", auth=auth)

In [0]:
pd.DataFrame(response.json()).head()

Unnamed: 0,width,responsible_individual,findspot_modern,findspot_ancient,trismegistos_uri,not_before,present_location,literature,work_status,not_after,diplomatic_text,letter_size,country,depth,height,language,edh_geography_uri,commentary,type_of_inscription,material,transcription,id,province_label,last_update,type_of_monument,modern_region,uri,people,findspot,social_economic_legal_history,year_of_find,geography,religion,fotos,military,external_image_uris
0,19 cm,Cowey,Roma,"Kephallenia, aus",https://www.trismegistos.org/text/177036,101,"Roma, Mus. Naz. Rom.","CIG 6916.; AE 1984, 0109. (B); P. Lombardi, Ti...",checked with photo,200.0,L PONPONIVS RVFVS / VIXIT ANOS XXVII / EIA PON...,1-2 cm,Italy,5.4 cm,45 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Wiederverwendung der Tafel als TÃ¼rpfosten. D...,epitaph,marble: rocks - metamorphic rocks,L(ucius) Ponponius(!) Rufus / vixit an(n)os XX...,HD001917,Achaia,2014-10-10,tabula,Lazio,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'age: years': '27', 'name': 'L. Ponponius(!)...",,,,,,,,
1,30 cm,GrÃ¤f,"Patrasso - AthÃ­nai, zwischen","Athenae, bei",https://www.trismegistos.org/text/177037,395,"AthÃ­nai, Epigr. Mus.","CIL 03, 00572.; CIL 03, 07306.; IG 02 (2. Aufl...",checked with photo,397.0,[ ]ΥΤΟΚΡΑΤΟΡΙ / [ ]ΑΙΣΑΡΙ / [[[ ]]] / [ ]ΥΣΕΒΕ...,2.7 cm,Greece,,146 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Meilenstein mit zwei griechischen Inschriften...,mile-/leaguestone,"Marmor, geÃ¤dert / farbig",[Α]ὐτοκράτορι / [Κ]αίσαρι / [[[---]]] / [Ε]ὐσε...,HD002097,Achaia,2012-03-15,mile-/leaguestone,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'name': 'ÎÎ±Î»Î»[ÎµÎ½]ÏÎ¹Î½Î¹Î±Î½á¿¶', 'ge...","Dafni, byzantinisches Kloster, bei, sekundÃ¤r ...",data available,,,,,,
2,(17) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177038,-38,"AthÃ­nai, Epigr. Mus.","CIL 03, 06101.; M. Å aÅ¡el Kos, Inscriptiones ...",no image,-12.0,]S HOSTIVM DEPRESSE[ ] / [ ] CXIIX BELLO MARIT...,6.5 cm,Greece,12.5 cm,(15) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Es handelt sich um ein Elogium fÃ¼r Agrippa. ...,elogium,,------ nave]s hostium depresse[rit ---] / [---...,HD002919,Achaia,2011-04-04,base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,,"Roma-Augustus Tempel, Akropolis",,1866.0,,,,,
3,76 cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177039,132,"AthÃ­nai, Epigr. Mus.","CIL 03, 00548. (B); CIL 03, 07281.; PIR (2. Au...",checked with photo,,[ ]MP CAES DIVI TRAIANI PAR / THICI FIL DIVI N...,,Greece,48 cm,112 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,(B): Am Anfang von Z. 2 fehlt das TI von nepoti.,honorific inscription,,[I]mp(eratori) Caes(ari) divi Traiani Par/thic...,HD002922,Achaia,2009-11-17,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'cognomen': 'Traianus Hadrianus', 'person_id...",,,,data available,,,,
4,(41) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177040,132,"AthÃ­nai, Epigr. Mus.","CIL 03, 06102.; CIL 03, 07283.; AE 1984, 0822....",no image,,[ ] / [ ] / [ ]D[ ] / [ ]R P XVI COS III P P [...,3.8 cm,Greece,(15) cm,(20) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Rekonstruktion des Inschriftentextes nach CIL...,honorific inscription,marble: rocks - metamorphic rocks,[Imp(eratori) Caesari divi Traiani] / [Parthic...,HD002925,Achaia,2011-04-04,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'person_id': '1', 'cognomen': 'Traianus+ Had...","\""Stoa Hadriani\"" (CIL)",,,data available,,,,


In [0]:
requests.get("https://sciencedata.dk/files/inscriptions_raw.json")

<Response [401]>

In [0]:

#####


#### PERHAPS SOMETHING AS SIMPLE AS THIS (https://medium.com/@snaily16/import-data-into-google-colaboratory-fe80b82e9306)

!wget http://your_domain/your_file.zip



In [0]:
### backuping to google drive
requests.get(
inscriptions_data_df.to_json(gdrive_root + "data/inscriptions_raw.json")

In [0]:
from io import BytesIO
from io import StringIO




In [0]:
buffer = StringIO()
inscriptions_data_df.to_json(path_or_buf=buffer)

In [0]:
text = buffer.getvalue() 
bio = io.BytesIO(str.encode(text))

In [0]:
bio

<_io.BytesIO at 0x7fbf962ff9e8>

In [0]:
client.upload(remote_path="personal_folder/inscriptions_row_2.json", local_path=text) 

In [0]:
inscriptions_data_df.head(5)

Unnamed: 0,width,responsible_individual,findspot_modern,findspot_ancient,trismegistos_uri,not_before,present_location,literature,work_status,not_after,diplomatic_text,letter_size,country,depth,height,language,edh_geography_uri,commentary,type_of_inscription,material,transcription,id,province_label,last_update,type_of_monument,modern_region,uri,people,findspot,social_economic_legal_history,year_of_find,geography,religion,fotos,military,external_image_uris
0,19 cm,Cowey,Roma,"Kephallenia, aus",https://www.trismegistos.org/text/177036,101.0,"Roma, Mus. Naz. Rom.","CIG 6916.; AE 1984, 0109. (B); P. Lombardi, Ti...",checked with photo,200.0,L PONPONIVS RVFVS / VIXIT ANOS XXVII / EIA PON...,1-2 cm,Italy,5.4 cm,45 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Wiederverwendung der Tafel als TÃ¼rpfosten. D...,epitaph,marble: rocks - metamorphic rocks,L(ucius) Ponponius(!) Rufus / vixit an(n)os XX...,HD001917,Achaia,2014-10-10,tabula,Lazio,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'age: years': '27', 'name': 'L. Ponponius(!)...",,,,,,,,
1,30 cm,GrÃ¤f,"Patrasso - AthÃ­nai, zwischen","Athenae, bei",https://www.trismegistos.org/text/177037,395.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 00572.; CIL 03, 07306.; IG 02 (2. Aufl...",checked with photo,397.0,[ ]ΥΤΟΚΡΑΤΟΡΙ / [ ]ΑΙΣΑΡΙ / [[[ ]]] / [ ]ΥΣΕΒΕ...,2.7 cm,Greece,,146 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Meilenstein mit zwei griechischen Inschriften...,mile-/leaguestone,"Marmor, geÃ¤dert / farbig",[Α]ὐτοκράτορι / [Κ]αίσαρι / [[[---]]] / [Ε]ὐσε...,HD002097,Achaia,2012-03-15,mile-/leaguestone,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'name': 'ÎÎ±Î»Î»[ÎµÎ½]ÏÎ¹Î½Î¹Î±Î½á¿¶', 'ge...","Dafni, byzantinisches Kloster, bei, sekundÃ¤r ...",data available,,,,,,
2,(17) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177038,-38.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 06101.; M. Å aÅ¡el Kos, Inscriptiones ...",no image,-12.0,]S HOSTIVM DEPRESSE[ ] / [ ] CXIIX BELLO MARIT...,6.5 cm,Greece,12.5 cm,(15) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Es handelt sich um ein Elogium fÃ¼r Agrippa. ...,elogium,,------ nave]s hostium depresse[rit ---] / [---...,HD002919,Achaia,2011-04-04,base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,,"Roma-Augustus Tempel, Akropolis",,1866.0,,,,,
3,76 cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177039,132.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 00548. (B); CIL 03, 07281.; PIR (2. Au...",checked with photo,,[ ]MP CAES DIVI TRAIANI PAR / THICI FIL DIVI N...,,Greece,48 cm,112 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,(B): Am Anfang von Z. 2 fehlt das TI von nepoti.,honorific inscription,,[I]mp(eratori) Caes(ari) divi Traiani Par/thic...,HD002922,Achaia,2009-11-17,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'cognomen': 'Traianus Hadrianus', 'person_id...",,,,data available,,,,
4,(41) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177040,132.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 06102.; CIL 03, 07283.; AE 1984, 0822....",no image,,[ ] / [ ] / [ ]D[ ] / [ ]R P XVI COS III P P [...,3.8 cm,Greece,(15) cm,(20) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Rekonstruktion des Inschriftentextes nach CIL...,honorific inscription,marble: rocks - metamorphic rocks,[Imp(eratori) Caesari divi Traiani] / [Parthic...,HD002925,Achaia,2011-04-04,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'person_id': '1', 'cognomen': 'Traianus+ Had...","\""Stoa Hadriani\"" (CIL)",,,data available,,,,


In [0]:
import io
from io import BytesIO

In [0]:
inscriptions_data_df.to_csv(StringIO())

In [0]:
client.upload_sync(remote_path="personal_folder/inscriptions_row.json", local_path=StringIO)

TypeError: ignored

In [0]:
buffer = io.StringIO()
inscriptions_data_df.to_json(buffer) 
text = buffer.getvalue() 
bio = io.BytesIO(str.encode(text))

AttributeError: ignored

In [0]:
inscriptions_data_df

In [0]:
client.list()

['files/', 'Notes/', 'SDAM_root/', 'personal_folder/']

In [0]:
client.upload_sync(remote_path="personal_folder/inscriptions_row.json", local_path=gdrive_root+"data/inscriptions_raw.json")

In [0]:
webdav.upload(gdrive_root+"data/inscriptions_raw.json", "personal_folder/inscriptions_row_2.json")


NameError: ignored

## Uploading back the parsed inscriptions

In [0]:
inscriptions_data_df = pd.read_json(gdrive_root + "data/inscriptions_raw.json")
inscriptions_data_df.head(5)

Unnamed: 0,width,responsible_individual,findspot_modern,findspot_ancient,trismegistos_uri,not_before,present_location,literature,work_status,not_after,diplomatic_text,letter_size,country,depth,height,language,edh_geography_uri,commentary,type_of_inscription,material,transcription,id,province_label,last_update,type_of_monument,modern_region,uri,people,findspot,social_economic_legal_history,year_of_find,geography,religion,fotos,military,external_image_uris
0,19 cm,Cowey,Roma,"Kephallenia, aus",https://www.trismegistos.org/text/177036,101.0,"Roma, Mus. Naz. Rom.","CIG 6916.; AE 1984, 0109. (B); P. Lombardi, Ti...",checked with photo,200.0,L PONPONIVS RVFVS / VIXIT ANOS XXVII / EIA PON...,1-2 cm,Italy,5.4 cm,45 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Wiederverwendung der Tafel als TÃ¼rpfosten. D...,epitaph,marble: rocks - metamorphic rocks,L(ucius) Ponponius(!) Rufus / vixit an(n)os XX...,HD001917,Achaia,2014-10-10,tabula,Lazio,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'age: years': '27', 'name': 'L. Ponponius(!)...",,,,,,,,
1,30 cm,GrÃ¤f,"Patrasso - AthÃ­nai, zwischen","Athenae, bei",https://www.trismegistos.org/text/177037,395.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 00572.; CIL 03, 07306.; IG 02 (2. Aufl...",checked with photo,397.0,[ ]ΥΤΟΚΡΑΤΟΡΙ / [ ]ΑΙΣΑΡΙ / [[[ ]]] / [ ]ΥΣΕΒΕ...,2.7 cm,Greece,,146 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Meilenstein mit zwei griechischen Inschriften...,mile-/leaguestone,"Marmor, geÃ¤dert / farbig",[Α]ὐτοκράτορι / [Κ]αίσαρι / [[[---]]] / [Ε]ὐσε...,HD002097,Achaia,2012-03-15,mile-/leaguestone,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'name': 'ÎÎ±Î»Î»[ÎµÎ½]ÏÎ¹Î½Î¹Î±Î½á¿¶', 'ge...","Dafni, byzantinisches Kloster, bei, sekundÃ¤r ...",data available,,,,,,
2,(17) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177038,-38.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 06101.; M. Å aÅ¡el Kos, Inscriptiones ...",no image,-12.0,]S HOSTIVM DEPRESSE[ ] / [ ] CXIIX BELLO MARIT...,6.5 cm,Greece,12.5 cm,(15) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Es handelt sich um ein Elogium fÃ¼r Agrippa. ...,elogium,,------ nave]s hostium depresse[rit ---] / [---...,HD002919,Achaia,2011-04-04,base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,,"Roma-Augustus Tempel, Akropolis",,1866.0,,,,,
3,76 cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177039,132.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 00548. (B); CIL 03, 07281.; PIR (2. Au...",checked with photo,,[ ]MP CAES DIVI TRAIANI PAR / THICI FIL DIVI N...,,Greece,48 cm,112 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,(B): Am Anfang von Z. 2 fehlt das TI von nepoti.,honorific inscription,,[I]mp(eratori) Caes(ari) divi Traiani Par/thic...,HD002922,Achaia,2009-11-17,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'cognomen': 'Traianus Hadrianus', 'person_id...",,,,data available,,,,
4,(41) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177040,132.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 06102.; CIL 03, 07283.; AE 1984, 0822....",no image,,[ ] / [ ] / [ ]D[ ] / [ ]R P XVI COS III P P [...,3.8 cm,Greece,(15) cm,(20) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Rekonstruktion des Inschriftentextes nach CIL...,honorific inscription,marble: rocks - metamorphic rocks,[Imp(eratori) Caesari divi Traiani] / [Parthic...,HD002925,Achaia,2011-04-04,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'person_id': '1', 'cognomen': 'Traianus+ Had...","\""Stoa Hadriani\"" (CIL)",,,data available,,,,


## EDH from Github


In [0]:
r = requests.get("https://github.com/sdam-au/edh_workflow/tree/master/original_dataset/EDH_dump/")

In [0]:
r.headers["Content-Type"]

'text/html; charset=utf-8'

In [0]:
!pip install PyGithub
from github import Github

Collecting PyGithub
[?25l  Downloading https://files.pythonhosted.org/packages/f4/86/7206d9b47cfdd5880081fdd1c94d9615cb6e10f4db1e98eb5b358839bc2a/PyGithub-1.44.1.tar.gz (110kB)
[K     |███                             | 10kB 16.1MB/s eta 0:00:01[K     |██████                          | 20kB 2.3MB/s eta 0:00:01[K     |█████████                       | 30kB 3.3MB/s eta 0:00:01[K     |████████████                    | 40kB 2.2MB/s eta 0:00:01[K     |██████████████▉                 | 51kB 2.7MB/s eta 0:00:01[K     |█████████████████▉              | 61kB 3.2MB/s eta 0:00:01[K     |████████████████████▉           | 71kB 3.7MB/s eta 0:00:01[K     |███████████████████████▉        | 81kB 4.1MB/s eta 0:00:01[K     |██████████████████████████▊     | 92kB 4.6MB/s eta 0:00:01[K     |█████████████████████████████▊  | 102kB 3.5MB/s eta 0:00:01[K     |████████████████████████████████| 112kB 3.5MB/s 
[?25hCollecting deprecated
  Downloading https://files.pythonhosted.org/packages

In [0]:
# using username and password
github_login = "kasev"
g = Github("kasev", getpass.getpass())

··········


In [0]:
for repo in g.get_organization("sdam-au").get_repos():
  print(repo.name)

sdam-au
Petra-lab-notebook
typesetting
cedhar
edh_workflow


In [0]:
repo = g.get_repo("sdam-au/edh_workflow")



Repository(full_name="sdam-au/edh_workflow")

In [0]:
len(repo.get_contents("original_dataset/EDH_dump"))

1000

In [0]:
url = "https://raw.githubusercontent.com/sdam-au/edh_workflow/master/original_dataset/EDH_dump/HD000001.xml"
r = requests.get(url)

In [0]:
r.text

'<?xml version="1.0" encoding="UTF-8"?><?xml-model href="http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?><TEI xmlns="http://www.tei-c.org/ns/1.0" xml:space="preserve" xml:lang="de" xml:base="ex-epidoctemplate.xml">\n    <teiHeader>\n        <fileDesc>\n            <titleStmt>\n                <title>Grabinschrift auf Tafel</title>\n            </titleStmt>  \n            <publicationStmt>\n                <authority>Epigraphische Datenbank Heidelberg</authority>\n                <idno type="URI">http://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000001</idno>\n                <idno type="TM">251193</idno><idno type="localID">HD000001</idno>\n                <availability>\n                    <p>© Heidelberg Academy of Sciences and Humanities</p>\n                    <licence target="http://creativecommons.org/licenses/by-sa/4.0/">This file is licensed under the Creative Commons Attribution-ShareAlike 4.0 license.\n        

In [0]:
soup = BeautifulSoup(r.text)
soup

<?xml version="1.0" encoding="UTF-8"?><?xml-model href="http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?><html><body><tei xml:base="ex-epidoctemplate.xml" xml:lang="de" xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0">
<teiheader>
<filedesc>
<titlestmt>
<title>Grabinschrift auf Tafel</title>
</titlestmt>
<publicationstmt>
<authority>Epigraphische Datenbank Heidelberg</authority>
<idno type="URI">http://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000001</idno>
<idno type="TM">251193</idno><idno type="localID">HD000001</idno>
<availability>
<p>© Heidelberg Academy of Sciences and Humanities</p>
<licence target="http://creativecommons.org/licenses/by-sa/4.0/">This file is licensed under the Creative Commons Attribution-ShareAlike 4.0 license.
                    </licence>
</availability>
</publicationstmt>
<sourcedesc>
<msdesc>
<msidentifier>
<repository ref="www.trismegistos.org/"></repository>
<collection></collectio

In [0]:
url = "https://raw.githubusercontent.com/sdam-au/edh_workflow/master/original_dataset/EDH_dump/HD000001.xml"
soup = BeautifulSoup(requests.get(url).text)

values = {}
tag_names = []
n = 2
for tag in soup.find_all():
  tag_values = tag.attrs
  tag_values.update({"get_text": tag.get_text()})
  tag_name = str(tag.name)
  value = tag.get_text()
  #if len(soup.find_all(tag_name)) > 1:
  for parent in tag.parents:
    tag_name = str(parent.name) + "/" + tag_name
  tag_name = tag_name.partition("tei/")[2]
  if tag_name in tag_names:
    tag_name = tag_name + "_" + str(n)
    n = n + 1
  else:
    tag_names.append(tag_name)
    n = 2
  values.update({tag_name : tag_values})



In [0]:
values

{'': {'get_text': '\n\n\n\nGrabinschrift auf Tafel\n\n\nEpigraphische Datenbank Heidelberg\nhttp://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000001\n251193HD000001\n\n© Heidelberg Academy of Sciences and Humanities\nThis file is licensed under the Creative Commons Attribution-ShareAlike 4.0 license.\n                    \n\n\n\n\n\n\n\n\n\n\n\n\n\nTafel\nMarmor, geädert / farbig\n\n\n33\n34\n2.7\n\nnein\n\n\n\n\nunbestimmt\n\n\n\n\n\n\n\n3.2-2\n\n\n\n\n\n\nLatium et Campania (Regio I)Cumae, bei\n71 AD – 130 AD\n                            \n\n\nCuma, beiCampaniaItalien\n\n\n\n\n\nMarked-up according to the EpiDoc Guidelines\n\n\n\nDigitized other representations\n\n\n\n\n\n\n\nGrabinschrift\n\n\n\nArabic\nEnglish\nFrench\nGerman\nAncient Greek\nTransliterated Greek\nModern Greek\nHebrew\nItalian\nLatin\nSpanish\n\nEAGLE - Europeana Network of Ancient Greek and Latin Epigraphy\n\n\n             provisorisch bearbeitet\n         \n\n\n\n\n\n\n                        AE 1983, 0192. \n

In [0]:
tree = ET.parse(r)
root = tree.getroot()
            file_table_list = []
            try: 
                text_cts = root.get("text-cts")
                author = root.get("author")
                title = root.get("title")
            except:
                text_cts = root.get("text-cts")
                author = root.get("author")
                title = root.get("title")




'{"{http://www.tei-c.org/ns/1.0}TEI": {"@{http://www.w3.org/XML/1998/namespace}space": "preserve", "@{http://www.w3.org/XML/1998/namespace}lang": "de", "@{http://www.w3.org/XML/1998/namespace}base": "ex-epidoctemplate.xml", "{http://www.tei-c.org/ns/1.0}teiHeader": {"{http://www.tei-c.org/ns/1.0}fileDesc": {"{http://www.tei-c.org/ns/1.0}titleStmt": {"{http://www.tei-c.org/ns/1.0}title": {"$": "Grabinschrift auf Tafel"}}, "{http://www.tei-c.org/ns/1.0}publicationStmt": {"{http://www.tei-c.org/ns/1.0}authority": {"$": "Epigraphische Datenbank Heidelberg"}, "{http://www.tei-c.org/ns/1.0}idno": [{"@type": "URI", "$": "http://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000001"}, {"@type": "TM", "$": 251193}, {"@type": "localID", "$": "HD000001"}], "{http://www.tei-c.org/ns/1.0}availability": {"{http://www.tei-c.org/ns/1.0}p": {"$": "\\u00a9 Heidelberg Academy of Sciences and Humanities"}, "{http://www.tei-c.org/ns/1.0}licence": {"@target": "http://creativecommons.org/licenses/by-sa/4.0/",

## Findspots geographies


In [0]:
### one province example (first page of results)
URL_form_geo = "https://edh-www.adw.uni-heidelberg.de/data/api/geography/search?"

2610


In [0]:
### again generate a list of provinces, resp. their abbreviations
json_data = requests.get("https://edh-www.adw.uni-heidelberg.de/data/api/terms/province").json()
provinces = json_data["provinces"].keys()

URL_form_geo = "https://edh-www.adw.uni-heidelberg.de/data/api/geography/search?"

### parse all find spot for each province 
### at them to the list
geo_data = []
for province in provinces:
  total = requests.get(URL_form_geo + "province=" + province).json()["total"]
  response = requests.get(URL_form_geo + "province=" + province + "&limit=" + str(total))
  geo_data.extend(response.json()["items"])
  print(province, total)


Ach 183
Aeg 65
Aem 162
Afr 1400
AlC 59
AlG 31
AlM 173
AlP 79
ApC 204
Aqu 212
Ara 103
Arm 2
Asi 182
Ass 0
Bae 1305
Bar 89
Bel 429
BiP 105
BrL 79
Bri 1905
Cap 31
Cil 48
Cor 15
Cre 29
Cyp 21
Cyr 24
Dac 995
Dal 2162
Epi 77
Etr 356
Gal 89
GeI 1039
GeS 2610
HiC 2553
Inc 7
Iud 83
LaC 654
Lig 131
Lug 251
Lus 746
LyP 37
MaC 298
MaT 43
Mak 434
Mes 9
MoI 888
MoS 790
Nar 635
Nor 1504
Num 333
PaI 1316
PaS 1623
Pic 85
Rae 729
ReB 22
Rom 734
Sam 313
Sar 81
Sic 47
Syr 154
Thr 257
Tra 111
Tri 0
Umb 212
Val 0
VeH 479


In [0]:
len(geo_data)

29822

In [0]:
geo_data_df = pd.DataFrame(geo_data)
geo_data_df.set_index("uri", inplace=True)
geo_data_df.head(5)

Unnamed: 0_level_0,coordinates,country,find_spot_ancient,province,last_update,id,region,pleiades_uri,find_spot,find_spot_modern,geonames_uri
uri,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
https://edh-www.adw.uni-heidelberg.de/edh/geographie/10,"37.05,25.19",Greece,Ptoion,Achaia,2011-07-14,10,,,,,
https://edh-www.adw.uni-heidelberg.de/edh/geographie/100,"37.983175,23.716647",Greece,Athenae,Achaia,2012-03-20,100,Attikí,https://pleiades.stoa.org/places/579885,Kerameikos,Athínai,
https://edh-www.adw.uni-heidelberg.de/edh/geographie/101,"37.983175,23.716647",Greece,Athenae,Achaia,2012-03-20,101,Attikí,https://pleiades.stoa.org/places/579885,Kerameikos bei Kirche Haghia Triada,Athínai,
https://edh-www.adw.uni-heidelberg.de/edh/geographie/102,"38.251123,21.741943",Greece,Colonia Augusta Aroe Patrae,Achaia,2012-03-20,102,Dytikí Elláda,,"Kirche, sekundär verwendet",Pátrai,
https://edh-www.adw.uni-heidelberg.de/edh/geographie/103,"37.939865,22.928467",Greece,Colonia Laus Iulia Corinthus,Achaia,2012-03-20,103,Pelopónissos,,Kranion,Kórinthos,


In [0]:
### backuping to google drive
geo_data_df.to_json(gdrive_root + "data/EDH_geo_raw.json")

## Merging inscriptions with their geographies
You can start here by uploading the data

In [0]:
inscriptions_data_df = pd.read_json(gdrive_root + "data/inscriptions_raw.json")
geo_data_df = pd.read_json(gdrive_root + "data/EDH_geo_raw.json")

In [0]:
def get_coordinates(column):
  try:
    return geo_data_df.loc[column]["coordinates"]
  except:
    return None


inscriptions_data_df["coordinates"]  =  inscriptions_data_df.apply(lambda row: get_coordinates(row["edh_geography_uri"]), axis=1)
inscriptions_data_df.head(5)

Unnamed: 0,width,responsible_individual,findspot_modern,findspot_ancient,trismegistos_uri,not_before,present_location,literature,work_status,not_after,diplomatic_text,letter_size,country,depth,height,language,edh_geography_uri,commentary,type_of_inscription,material,transcription,id,province_label,last_update,type_of_monument,modern_region,uri,people,findspot,social_economic_legal_history,year_of_find,geography,religion,fotos,military,external_image_uris,coordinates
0,19 cm,Cowey,Roma,"Kephallenia, aus",https://www.trismegistos.org/text/177036,101.0,"Roma, Mus. Naz. Rom.","CIG 6916.; AE 1984, 0109. (B); P. Lombardi, Ti...",checked with photo,200.0,L PONPONIVS RVFVS / VIXIT ANOS XXVII / EIA PON...,1-2 cm,Italy,5.4 cm,45 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Wiederverwendung der Tafel als TÃ¼rpfosten. D...,epitaph,marble: rocks - metamorphic rocks,L(ucius) Ponponius(!) Rufus / vixit an(n)os XX...,HD001917,Achaia,2014-10-10,tabula,Lazio,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'age: years': '27', 'name': 'L. Ponponius(!)...",,,,,,,,,"41.8917375,12.4861685"
1,30 cm,GrÃ¤f,"Patrasso - AthÃ­nai, zwischen","Athenae, bei",https://www.trismegistos.org/text/177037,395.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 00572.; CIL 03, 07306.; IG 02 (2. Aufl...",checked with photo,397.0,[ ]ΥΤΟΚΡΑΤΟΡΙ / [ ]ΑΙΣΑΡΙ / [[[ ]]] / [ ]ΥΣΕΒΕ...,2.7 cm,Greece,,146 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Meilenstein mit zwei griechischen Inschriften...,mile-/leaguestone,"Marmor, geÃ¤dert / farbig",[Α]ὐτοκράτορι / [Κ]αίσαρι / [[[---]]] / [Ε]ὐσε...,HD002097,Achaia,2012-03-15,mile-/leaguestone,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'name': 'ÎÎ±Î»Î»[ÎµÎ½]ÏÎ¹Î½Î¹Î±Î½á¿¶', 'ge...","Dafni, byzantinisches Kloster, bei, sekundÃ¤r ...",data available,,,,,,,"38.012978,23.635883"
2,(17) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177038,-38.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 06101.; M. Å aÅ¡el Kos, Inscriptiones ...",no image,-12.0,]S HOSTIVM DEPRESSE[ ] / [ ] CXIIX BELLO MARIT...,6.5 cm,Greece,12.5 cm,(15) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Es handelt sich um ein Elogium fÃ¼r Agrippa. ...,elogium,,------ nave]s hostium depresse[rit ---] / [---...,HD002919,Achaia,2011-04-04,base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,,"Roma-Augustus Tempel, Akropolis",,1866.0,,,,,,"37.983175,23.716647"
3,76 cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177039,132.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 00548. (B); CIL 03, 07281.; PIR (2. Au...",checked with photo,,[ ]MP CAES DIVI TRAIANI PAR / THICI FIL DIVI N...,,Greece,48 cm,112 cm,Greek-Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,(B): Am Anfang von Z. 2 fehlt das TI von nepoti.,honorific inscription,,[I]mp(eratori) Caes(ari) divi Traiani Par/thic...,HD002922,Achaia,2009-11-17,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'cognomen': 'Traianus Hadrianus', 'person_id...",,,,data available,,,,,"37.983175,23.716647"
4,(41) cm,Cowey,AthÃ­nai,Athenae,https://www.trismegistos.org/text/177040,132.0,"AthÃ­nai, Epigr. Mus.","CIL 03, 06102.; CIL 03, 07283.; AE 1984, 0822....",no image,,[ ] / [ ] / [ ]D[ ] / [ ]R P XVI COS III P P [...,3.8 cm,Greece,(15) cm,(20) cm,Latin,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Rekonstruktion des Inschriftentextes nach CIL...,honorific inscription,marble: rocks - metamorphic rocks,[Imp(eratori) Caesari divi Traiani] / [Parthic...,HD002925,Achaia,2011-04-04,statue base,AttikÃ­,https://edh-www.adw.uni-heidelberg.de/edh/insc...,"[{'person_id': '1', 'cognomen': 'Traianus+ Had...","\""Stoa Hadriani\"" (CIL)",,,data available,,,,,"37.983175,23.716647"


In [0]:
inscriptions_with_geo = pd.read_json(gdrive_root + "data/EDH_geo_raw.json")


NameError: ignored

In [0]:
len(inscriptions_with_geo[inscriptions_with_geo["not_before"].notnull()])

49913

## Date range

In [0]:
len(inscriptions_with_geo[inscriptions_with_geo["not_before"].notnull()])

49913

In [0]:
len(inscriptions_with_geo[inscriptions_with_geo["not_after"].notnull()])

46604

In [0]:
len(inscriptions_with_geo[(inscriptions_with_geo["not_after"].notnull()) & (inscriptions_with_geo["not_before"].notnull())])

46604

In [0]:
inscriptions_data_df["data_range"] = inscriptions_data_df.apply(lambda row: row["not_after"] - row["not_before"], axis=1)

In [0]:
def inscriptions_with_range(max_range):
  return len(inscriptions_data_df[(inscriptions_data_df["data_range"].notnull()) & (inscriptions_data_df["data_range"]<=max_range)])

In [0]:
inscriptions_with_range(100)

30160

In [0]:
inscriptions_with_range(90)

18997

In [0]:
inscriptions_with_range(50)

13880

In [0]:
inscriptions_with_range(20)

5431

In [0]:
inscriptions_data_df.to_json(gdrive_root + "data/EDH_with_geo.json")