# INTRODUCTION

This notebook explores the potential of the Epigraphic Database Heidelberg web API ([EDH API](https://edh-www.adw.uni-heidelberg.de/data/api)) in combination with sciencedata.dk as data storage (see more about our current progress in using sciencedata.dk [here](https://docs.google.com/document/d/1sojHsxkcAbZH9DpWFuHDomQwTZHPQv_WaAxO_erP6FE/edit?usp=sharing)).

The ambition here is to use cloud-based solutions as much as possible, without any dependence on local machines. At the same time, we prefer not to rely entirely on Google services.

In [1]:
### REQUIREMENTS
import numpy as np
import math
import pandas as pd

import sys
### we make many requests during the scraping, some with the requests package, some with urllib
import requests
from urllib.request import urlopen
from urllib.parse import quote
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET  # cElementTree was deprecated and removed in Python 3.9

# to avoid errors, we sometimes use time.sleep(N) before retrying a request
import time
# the input data typically have a JSON structure
import json
import getpass

import datetime as dt
# for simple parallel computing:
from concurrent.futures import ThreadPoolExecutor
### google drive
from google.colab import drive
#import gspread
#from gspread_dataframe import get_as_dataframe, set_with_dataframe

!pip install --ignore-installed --index-url https://test.pypi.org/simple/ --no-deps sddk ### our own package under construction, always install to have up-to-date version
import sddk

Looking in indexes: https://test.pypi.org/simple/
Collecting sddk
  Downloading https://test-files.pythonhosted.org/packages/65/8b/d682c15a7335215ac119538ad8455b408cd7e8be4f6614678888dd2c88ed/sddk-0.0.7-py3-none-any.whl
Installing collected packages: sddk
Successfully installed sddk-0.0.7


## configure session and url

In [2]:
### configure session and url
### in the case of "SDAM_root", the group owner is Vojtech with username 648597@au.dk
s, sddk_url = sddk.configure_session_and_url("SDAM_root")

sciencedata.dk username (format '123456@au.dk'): 648597@au.dk
sciencedata.dk password: ··········
personal connection established
group connection established with you as owner
endpoint for requests has been configured to: https://sciencedata.dk/files/SDAM_root/


# EDH via API

The basic form of a request is as follows:
```
https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?
```
To query a specific inscription by its number, add the parameter **hd_nr**, as here:

```
https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?hd_nr=1
```
(Feel free to explore this in a web browser.)

Here we use the function ```requests.get()``` to make our requests from python.
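The same URL can also be assembled from keyword parameters instead of string concatenation, which percent-encodes the values for us. A minimal sketch using only the standard library; `EDH_SEARCH` and `build_search_url` are hypothetical names, and passing the same parameters as a dictionary via `params=` to `requests.get()` applies equivalent encoding.

```python
from urllib.parse import urlencode

# Hypothetical constant: the EDH search endpoint quoted above, without the trailing "?".
EDH_SEARCH = "https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search"

def build_search_url(**params):
    """Return an EDH search URL with the keyword arguments as percent-encoded query parameters."""
    return EDH_SEARCH + "?" + urlencode(params)

print(build_search_url(hd_nr=1))
print(build_search_url(province="Dal", limit=100))
```

Both `hd_nr` and the `province`/`limit` combination are parameters used elsewhere in this notebook.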

## One inscription query example

In [0]:
%%time
inscription_number = 100
URL_form = "https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?"

response = requests.get(URL_form + "hd_nr=" + str(inscription_number))
response
json_data = response.json()
print(json_data)

{'total': 1, 'items': [{'province_label': 'Hispania citerior', 'modern_region': 'Soria', 'findspot_ancient': 'Uxama', 'transcription': 'D[---] / ANELI[---] / BERVE[---] / P[---]IT[------', 'commentary': ' Text in vier Zeilen, nahezu unlesbar.', 'id': 'HD000100', 'literature': 'AE 1983, 0597.; C. García Merino, in: Homenaje al Prof. Martin Almagro Basch 3 (Madrid 1983) 355, Nr. 2; lám. 1, 2. - AE 1983.', 'uri': 'https://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000100', 'language': 'Latin', 'findspot_modern': 'El Burgo de Osma', 'work_status': 'provisional', 'edh_geography_uri': 'https://edh-www.adw.uni-heidelberg.de/edh/geographie/9371', 'last_update': '2015-05-21', 'diplomatic_text': 'D[ ] / ANELI[ ] / BERVE[ ] / P[ ]IT[', 'trismegistos_uri': 'https://www.trismegistos.org/text/226731', 'country': 'Spain', 'responsible_individual': 'Gräf', 'type_of_monument': 'stele'}], 'limit': '20'}
CPU times: user 15.5 ms, sys: 95 µs, total: 15.6 ms
Wall time: 854 ms


In [0]:
%%time
### the actual data are part of the tag "items"
pd.DataFrame(json_data["items"]) 


CPU times: user 2.87 ms, sys: 0 ns, total: 2.87 ms
Wall time: 2.82 ms


,findspot_ancient,findspot_modern,id,diplomatic_text,uri,edh_geography_uri,literature,trismegistos_uri,work_status,province_label,type_of_monument,language,last_update,modern_region,transcription,commentary,responsible_individual,country
0,Uxama,El Burgo de Osma,HD000100,D[ ] / ANELI[ ] / BERVE[ ] / P[ ]IT[,https://edh-www.adw.uni-heidelberg.de/edh/insc...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,"AE 1983, 0597.; C. García Merino, in: Homenaje...",https://www.trismegistos.org/text/226731,provisional,Hispania citerior,stele,Latin,2015-05-21,Soria,D[---] / ANELI[---] / BERVE[---] / P[---]IT[--...,"Text in vier Zeilen, nahezu unlesbar.",Gräf,Spain


## Version 1: Extracting inscriptions one by one (using simple parallel computing)

In [0]:
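def get_inscription_data(num, attempts=2):
  """Return the EDH record for inscription number `num`, or {} if none could be retrieved."""
  for attempt in range(attempts):
    try:
      response = requests.get(URL_form + "hd_nr=" + str(num), timeout=30)
      items = response.json()["items"]
      return items[0] if items else {}  # empty "items": no inscription with this number
    except (requests.RequestException, ValueError, KeyError):
      if attempt < attempts - 1:
        time.sleep(1)  # wait a second before retrying
  return {}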

for num in range(1, 10):
  print(get_inscription_data(num))
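`get_inscription_data()` retries a failed request once after a fixed one-second pause. The same idea generalizes to several attempts with exponentially growing pauses (exponential backoff). A minimal sketch; `with_retries` and its parameters are hypothetical names, and the fetch function is passed in so that the helper can be tried without network access.

```python
import time

def with_retries(fetch, attempts=3, base_delay=1.0, default=None):
    """Call fetch() up to `attempts` times, sleeping base_delay, 2*base_delay, ... between tries.

    Returns the first successful result, or `default` if every attempt raised an exception.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:  # deliberately broad in this sketch: any failure triggers a retry
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the default base_delay
    return default

# Usage with a stand-in fetch that fails twice before succeeding (no network needed).
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return {"id": "HD000100"}

print(with_retries(flaky_fetch, attempts=3, base_delay=0.01))
```

In the notebook, the fetch could be a `lambda` wrapping the `requests.get(...)` call, with `default={}` so that failed numbers are later removed by the `notnull()` filter.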

In [0]:
### parallel computing
from concurrent.futures import ThreadPoolExecutor

In [0]:
%%time
#### TEST without parallel computing:
all_inscriptions = []
for num in range(1,200): 
  currently_parsed = get_inscription_data(num)
  all_inscriptions.append(currently_parsed)  # one dict per inscription, so append rather than extend

CPU times: user 2.23 s, sys: 113 ms, total: 2.34 s
Wall time: 2min 48s


In [0]:
%%time
### TEST with parallel computing
### to make N requests in parallel, we first generate a range of ranges: [1,2,3], [4,5,6], [7,8,9]
all_inscriptions = []
for num in range(1,200, 100): 
  actual_nums = list(range(num, num+100))
  with ThreadPoolExecutor(max_workers=100) as pool:
    currently_parsed = list(pool.map(get_inscription_data,actual_nums))
  all_inscriptions.extend(currently_parsed)

CPU times: user 3.76 s, sys: 257 ms, total: 4.02 s
Wall time: 13.2 s
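The "range of ranges" splits the inscription numbers into fixed-size batches, and `pool.map()` runs the calls of one batch concurrently while returning the results in input order. Because the work is I/O-bound (mostly waiting for the server), threads help despite Python's global interpreter lock. A self-contained sketch of that pattern; `batches` and `fake_fetch` are hypothetical stand-ins, with `time.sleep` imitating network latency. Unlike `range(num, num+100)`, `batches` stops exactly at `stop`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def batches(start, stop, size):
    """Yield consecutive lists of integers from range(start, stop), each at most `size` long."""
    for first in range(start, stop, size):
        yield list(range(first, min(first + size, stop)))

def fake_fetch(num):
    """Stand-in for get_inscription_data(): waits 0.05 s as if for a server, then returns a record."""
    time.sleep(0.05)
    return {"id": "HD%06d" % num}

results = []
started = time.perf_counter()
for batch in batches(1, 41, 20):
    with ThreadPoolExecutor(max_workers=len(batch)) as pool:
        results.extend(pool.map(fake_fetch, batch))  # map preserves input order
elapsed = time.perf_counter() - started

print(len(results), results[0], results[-1])
print(f"{elapsed:.2f} s (sequentially this would take about {40 * 0.05:.1f} s)")
```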


OK, the test clearly demonstrates that running 100 workers in parallel is roughly 13 times faster (13.2 s instead of 2 min 48 s). Let's scale it up to the whole dataset.

In [0]:
%%time
### main run of the function (each batch has 200 numbers, so no more than 200 threads run at once)
all_inscriptions = []
for num in range(1,90000, 200): 
  actual_nums = list(range(num, num+200))
  with ThreadPoolExecutor(max_workers=300) as pool:
    currently_parsed = list(pool.map(get_inscription_data,actual_nums))
  all_inscriptions.extend(currently_parsed)

CPU times: user 32min 3s, sys: 2min 34s, total: 34min 38s
Wall time: 1h 50min 50s


In [0]:
inscriptions_data_df = pd.DataFrame(all_inscriptions)

In [5]:
inscriptions_data_df.head(10)

,diplomatic_text,literature,trismegistos_uri,id,findspot_ancient,not_before,type_of_inscription,work_status,edh_geography_uri,not_after,country,province_label,transcription,material,height,width,findspot_modern,depth,commentary,uri,responsible_individual,last_update,language,modern_region,letter_size,type_of_monument,people,year_of_find,findspot,present_location,external_image_uris,religion,fotos,geography,military,social_economic_legal_history
0,D M / NONIAE P F OPTATAE / ET C IVLIO ARTEMONI...,"AE 1983, 0192.; M. Annecchino, Puteoli 4/5, 19...",https://www.trismegistos.org/text/251193,HD000001,"Cumae, bei",71,epitaph,provisional,https://edh-www.adw.uni-heidelberg.de/edh/geog...,130,Italy,Latium et Campania (Regio I),D(is) M(anibus) / Noniae P(ubli) f(iliae) Opta...,"Marmor, geädert / farbig",33 cm,34 cm,"Cuma, bei",2.7 cm,(C): 2. Hälfte 1. - Anfang 2. Jh. - AE; Ende ...,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Feraudi,2014-04-07,Latin,Campania,3.2-2 cm,tabula,"[{'cognomen': 'Optata', 'person_id': '1', 'gen...",,,,,,,,,
1,C SEXTIVS PARIS / QVI VIXIT / ANNIS LXX,"AE 1983, 0080. (A); A. Ferrua, RAL 36, 1981, 1...",https://www.trismegistos.org/text/265631,HD000002,Roma,51,epitaph,no image,https://edh-www.adw.uni-heidelberg.de/edh/geog...,200,Italy,Roma,C(aius) Sextius Paris / qui vixit / annis LXX,marble: rocks - metamorphic rocks,28 cm,85 cm,Roma,,AE 1983: Breite: 35 cm.,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Feraudi,2014-04-07,Latin,Lazio,4 cm,tabula,"[{'age: years': '70', 'cognomen': 'Paris', 'ge...",1937,"Via Nomentana, S. Alessandro, Kirche",,,,,,,
2,[ ]VMMIO [ ] / [ ]ISENNA[ ] / [ ] XV[ ] / [ ] / [,"AE 1983, 0518. (B); J. González, ZPE 52, 1983,...",https://www.trismegistos.org/text/220675,HD000003,,131,honorific inscription,provisional,https://edh-www.adw.uni-heidelberg.de/edh/geog...,170,Spain,Baetica,[P(ublio) M]ummio [P(ubli) f(ilio)] / [Gal(eri...,marble: rocks - metamorphic rocks,(37) cm,(34) cm,Tomares,(12) cm,(B): [S]isenna ist falscher Kasus; folgende E...,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Feraudi,2006-08-31,Latin,Sevilla,4.5-3 cm,statue base,"[{'nomen': 'Mummius+', 'cognomen': 'Sisenna+ R...",before 1975,,"Sevilla, Privatbesitz",,,,,,
3,[ ]AVS[ ]LLA / M PORCI NIGRI SER / DOMINAE VEN...,"AE 1983, 0533. (B); A.U. Stylow, Gerión 1, 198...",https://www.trismegistos.org/text/222102,HD000004,Ipolcobulcula,151,votive inscription,checked with photo,https://edh-www.adw.uni-heidelberg.de/edh/geog...,200,Spain,Baetica,[---?]AV(?)S(?)[---]L(?)L(?)A / M(arci) Porci ...,limestone: rocks - clastic sediments,(39) cm,27 cm,Carcabuey,18 cm,Material: lokaler grauer Kalkstein. (B): Styl...,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Gräf,2015-03-27,Latin,Córdoba,2.5 cm,altar,"[{'cognomen': '[---]', 'status': 'slaves', 'pe...",before 1979,,"Carcabuey, Grupo Escolar",[http://cil-old.bbaw.de/test06/bilder/datenban...,names of pagan deities,,,,
4,[ ] L SVCCESSVS / [ ] L L IRENAEVS / [ ] C L T...,"AE 1983, 0078. (B); A. Ferrua, RAL 36, 1981, 1...",https://www.trismegistos.org/text/265629,HD000005,Roma,1,epitaph,no image,https://edh-www.adw.uni-heidelberg.de/edh/geog...,200,Italy,Roma,[---] l(ibertus) Successus / [---] L(uci) l(ib...,,,,Roma,,(B): Z. 3: C(ai) l(ibertae) Tyches.,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Feraudi,2010-01-04,Latin,Lazio,,stele,"[{'status': 'freedmen / freedwomen', 'name': '...",,Via Cupa (ehem. Vigna Nardi),,,,,,,
5,D M S / / MEMMIA AVCTIN[ ] / AN LXX PIA IN SVI...,"AE 1983, 0524. (B); P. Rodríguez Oliva - R. At...",https://www.trismegistos.org/text/222924,HD000006,"Sabora, bei",71,epitaph,checked with photo,https://edh-www.adw.uni-heidelberg.de/edh/geog...,150,Spain,Baetica,D(is) M(anibus) s(acrum) // Memmia Auctin[a] /...,limestone: rocks - clastic sediments,145 cm,60 cm,Cañete la Real,15 cm,Der Stein ist aus 2 aneinanderpassenden Fragm...,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Gräf,2011-06-10,Latin,Málaga,7-4 cm,stele,"[{'age: years': '70', 'person_id': '1', 'gende...",1974,Huerta Nueva,Cañete la Real,,,,,,
6,CLODIA M F,"AE 1983, 0033. (C); G. Pisani Sartorio, in: G....",https://www.trismegistos.org/text/265588,HD000007,Roma,-100,epitaph,checked with photo,https://edh-www.adw.uni-heidelberg.de/edh/geog...,-51,Italy,Roma,Clodia M(arci) f(ilia),travertine: rocks - chemische Sedimente,35 cm,53 cm,Roma,,(C): Datierung: Zeit Sullas.,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Feraudi,2010-01-04,Latin,Lazio,9 cm,tabula,"[{'name': 'Clodia M.f.', 'nomen': 'Clodia', 'p...",,"Via Nomentana, Tor Lupara","Mentana, Privatbesitz",,,,,,
7,D M / C SATRIO XANTHO / C SATRI RVFI LIB / DEC...,"AE 1983, 0060.; R. Palmieri, in: G. Barbieri (...",https://www.trismegistos.org/text/265611,HD000008,Roma?,101,epitaph,provisional,https://edh-www.adw.uni-heidelberg.de/edh/geog...,200,Italy,Roma?,D(is) M(anibus) / C(aio) Satrio Xantho / C(ai)...,marble: rocks - metamorphic rocks,52 cm,65 cm,Roma?,,Tafel aus mehreren anpassenden Fragmenten zus...,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Feraudi,2013-12-03,Latin,Lazio?,4-2 cm,tabula,"[{'person_id': '1', 'gender': 'male', 'nomen':...",,,"Mentana, Privatbesitz",,,[https://edh-www.adw.uni-heidelberg.de/fotos/F...,,,
8,ABCDEFX,"AE 1983, 0632.; M.W.C. Hassall - R.S.O. Tomlin...",https://www.trismegistos.org/text/168722,HD000009,Aquae Sulis,201,defixio,checked with drawing,https://edh-www.adw.uni-heidelberg.de/edh/geog...,300,United Kingdom,Britannia,ABCDEFX,"Blei, Zinn",5.2 cm,6.5 cm,Bath,,Material: Blei-Zinn-Kupfer-Legierung. Viellei...,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Cowey,2019-03-29,Latin,Somerset,,tessera,,1979,"Römisches Bad, Sacred Spring","Bath, Roman Baths Mus.",,magic/spells,,,,
9,D M / L ASINI POLI / SECVNDVS / ET ORPHAEVS / ...,"AE 1983, 0410. (B); G.A. Mansuelli, Epigraphic...",https://www.trismegistos.org/text/244297,HD000010,Ariminum,101,epitaph,checked with photo,https://edh-www.adw.uni-heidelberg.de/edh/geog...,200,Italy,Aemilia (Regio VIII),D(is) M(anibus) / L(uci) Asini Poli / Secundus...,,46 cm,52.5 cm,Rimini,44 cm,(B): AE 1983: Z. 3/4: Secundus et Orphaeus.,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Gräf,2007-05-16,Latin,Forli,4.2-2.5 cm,urn,"[{'nomen': 'Asinius', 'person_id': '1', 'praen...",1936,Friedhof,"Rimini, Mus. Arch. Com.",,,,,,


In [0]:
### drop the empty rows returned for numbers that do not correspond to any inscription
inscriptions_data_df = inscriptions_data_df[inscriptions_data_df["id"].notnull()]

## Upload the data to the sciencedata.dk shared folder

In [10]:
s.put(sddk_url + "SDAM_data/EDH/EDH_onebyone.json", data=inscriptions_data_df.to_json())

<Response [204]>

## Version 2: Extracting inscriptions on the basis of provinces (simpler and faster, but does not return all data)

In [0]:
%%time
response = requests.get("https://edh-www.adw.uni-heidelberg.de/data/api/terms/province")
response
json_data = response.json()
print(str(json_data)[:200])

{'provinces': {'Ach': 'Achaia', 'Aeg': 'Aegyptus', 'Aem': 'Aemilia (Regio VIII)', 'Afr': 'Africa Proconsularis', 'AlC': 'Alpes Cottiae', 'AlG': 'Alpes Graiae', 'AlM': 'Alpes Maritimae', 'AlP': 'Alpes 
CPU times: user 15.6 ms, sys: 974 µs, total: 16.6 ms
Wall time: 386 ms


In [0]:
provinces = list(json_data["provinces"].keys())
print(provinces)

['Ach', 'Aeg', 'Aem', 'Afr', 'AlC', 'AlG', 'AlM', 'AlP', 'ApC', 'Aqu', 'Ara', 'Arm', 'Asi', 'Ass', 'Bae', 'Bar', 'Bel', 'BiP', 'BrL', 'Bri', 'Cap', 'Cil', 'Cor', 'Cre', 'Cyp', 'Cyr', 'Dac', 'Dal', 'Epi', 'Etr', 'Gal', 'GeI', 'GeS', 'HiC', 'Inc', 'Iud', 'LaC', 'Lig', 'Lug', 'Lus', 'LyP', 'MaC', 'MaT', 'Mak', 'Mes', 'MoI', 'MoS', 'Nar', 'Nor', 'Num', 'PaI', 'PaS', 'Pic', 'Rae', 'ReB', 'Rom', 'Sam', 'Sar', 'Sic', 'Syr', 'Thr', 'Tra', 'Tri', 'Umb', 'Val', 'VeH']


## Get data on a per-province basis

In [0]:
### one province example (first page of results, i.e. first 100 inscriptions)
province = "dal"
param = "province"

### make the request
response = requests.get(URL_form + param + "=" + province + "&limit=100")
json_data = response.json()
pages = math.ceil(int(json_data["total"]) / int(json_data["limit"]))
some_inscriptions = pd.DataFrame(json_data["items"])
len(some_inscriptions)
some_inscriptions.head(5)


,responsible_individual,last_update,country,findspot_ancient,present_location,trismegistos_uri,modern_region,depth,type_of_inscription,transcription,people,height,language,id,uri,findspot_modern,work_status,commentary,type_of_monument,province_label,findspot,not_after,year_of_find,not_before,literature,edh_geography_uri,diplomatic_text,width,letter_size,fotos,material,religion,geography,social_economic_legal_history,military,external_image_uris
0,Gräf,2009-05-13,Bosnia and Herzegovina,"Domavium, bei","Tuzla, Muz. Istočne Bosne",https://www.trismegistos.org/text/181722,Republika Srpska,24 cm,epitaph,D(is) M(anibus) / Severinus / veteranus / vixi...,"[{'cognomen': 'Severinus', 'person_id': '1', '...",120 cm,Latin,HD000310,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Bratunac,checked with photo,Oberhalb des Inschriftfeldes eine weibliche u...,stele,Dalmatia,"Bosanska ulica, Kamenjak, sekundär verwendet",400,1955,301,"AE 1983, 0745.; I. Bojanovski, Članci 14, 1982...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D M / SEVERINVS / VETERANVS / VIXIT AN XIX / T...,48 cm,3 cm,,,,,,,
1,Gräf,2009-05-13,Bosnia and Herzegovina,"Domavium, bei",,https://www.trismegistos.org/text/181723,Republika Srpska,27 cm,epitaph,D(is) [M(anibus) s(acrum)?] / [--]CITI CTO[---...,"[{'name': '[---]', 'person_id': '1'}]",(145) cm,Latin,HD000313,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Srebrenica,checked with photo,,stele,Dalmatia,"Staroglavice, frühchristliche Kirche",300,1975,101,"AE 1983, 0746.; I. Bojanovski, Članci 14, 1982...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D [ ] / [ ]CITI CTO[ ] / RIVS MAR[ ] / [ ]IVS ...,(55) cm,,,,,,,,
2,Gräf,2008-09-16,Montenegro,Municipium S[---],,https://www.trismegistos.org/text/181724,,20 cm,epitaph,D(is) M(anibus) s(acrum) / Fl(aviae) Mar/cella...,"[{'cognomen': 'Marcella', 'age: years': '34', ...",(145) cm,Latin,HD000316,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Pljevlja,checked with photo,(B): AE 1983: Z. 5/6: Zeilenfall fehlt.,stele,Dalmatia,"Komini, Nekropole II, Grab 25/1975",200,1975,151,"AE 1983, 0747. (B); A. Cermanović-Kuzmanović, ...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D M S / FL MAR / CELLAE Q V / A XXXIV / NANTIV...,75 cm,6-5 cm,[https://edh-www.adw.uni-heidelberg.de/fotos/F...,,,,,,
3,Gräf,2012-06-06,Montenegro,Municipium S[---],,https://www.trismegistos.org/text/181725,,30 cm,epitaph,D(is) M(anibus) s(acrum) / L(ucio) Cipio / Fau...,"[{'person_id': '1', 'gender': 'male', 'name': ...",(170) cm,Latin,HD000319,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Pljevlja,checked with photo,Reste von roter Farbe in den Buchstaben.,stele,Dalmatia,"Komini, Nekropole II",230,1975,151,"AE 1983, 0748.; A. Cermanović-Kuzmanović, Star...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,D M S / L CIPIO / FAVSTO ET / FRVNITAE / FIL L...,84 cm,6.9-3.9 cm,[https://edh-www.adw.uni-heidelberg.de/fotos/F...,lime marl / marl: rocks - clastic sediments,,,,,
4,Gräf,2008-09-16,Montenegro,Municipium S[---],"Pljevlja, Zavičajni Muz.",https://www.trismegistos.org/text/181726,,28 cm,epitaph,Q(uinto) Valerio / Quadra/to an(norum) LXI / L...,"[{'cognomen': 'Quadratus', 'age: years': '61',...",(66) cm,Latin,HD000322,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Pljevlja,checked with photo,,stele,Dalmatia,"Komini, Nekropole II",170,1974,71,"AE 1983, 0749.; A. Cermanović-Kuzmanović, Star...",https://edh-www.adw.uni-heidelberg.de/edh/geog...,Q VALERIO / QVADRA / TO AN LXI / L VAL CELER / [,58 cm,6-5 cm,[https://edh-www.adw.uni-heidelberg.de/fotos/F...,,,,,,


## Main function: parsing all inscription data
(takes roughly 10 to 20 minutes; 12 min 30 s in the run below)

In [0]:
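%%time
### over the loop, we will extend the list of items
inscriptions_data = []
### separate name, so that URL_form (used by get_inscription_data) is not overwritten
province_URL_form = "https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?province="

for province in provinces:
  try:
    ### first request: only to learn how many inscriptions the province has
    total = requests.get(province_URL_form + province).json()["total"]
    ### second request: ask for all of them at once
    response = requests.get(province_URL_form + province + "&limit=" + str(total))
    inscriptions_data.extend(response.json()["items"])
    print(province, total)
  except (requests.RequestException, ValueError, KeyError) as error:
    ### report failures instead of silently skipping the province
    print(province, "FAILED:", error)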

Ach 1285
Aeg 167
Aem 211
Afr 4452
AlC 475
AlG 76
AlM 413
AlP 201
ApC 761
Aqu 426
Ara 234
Arm 3
Asi 491
Ass 0
Bae 3016
Bar 181
Bel 1668
BiP 289
BrL 246
Bri 4363
Cap 62
Cil 74
Cor 46
Cre 66
Cyp 34
Cyr 88
Dac 3545
Dal 7653
Epi 138
Etr 652
Gal 208
GeI 2760
GeS 6085
HiC 4687
Inc 382
Iud 187
LaC 2600
Lig 145
Lug 594
Lus 1583
LyP 61
MaC 1124
MaT 290
Mak 1321
Mes 12
MoI 1938
MoS 1475
Nar 1401
Nor 2736
Num 2644
PaI 3136
PaS 4259
Pic 171
Rae 1011
ReB 78
Rom 4392
Sam 649
Sar 229
Sic 193
Syr 405
Thr 395
Tra 165
Tri 0
Umb 348
Val 0
VeH 1156
CPU times: user 3.68 s, sys: 438 ms, total: 4.12 s
Wall time: 12min 30s


In [0]:
len(inscriptions_data)

72483
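A quick consistency check: the per-province totals printed above add up to 80,136, while `len(inscriptions_data)` is 72,483. The gap of 7,653 is exactly the total printed for `Dal`, which suggests (though the output alone cannot confirm it) that in this run the large Dalmatia request failed silently after its total had already been retrieved. Recording expected and fetched counts per province makes such gaps visible. A minimal sketch; `find_incomplete` is a hypothetical helper and the dictionaries in the usage lines are illustrative.

```python
def find_incomplete(expected, fetched):
    """Return {province: (expected, fetched)} for provinces whose fetched count falls short."""
    return {
        province: (total, fetched.get(province, 0))
        for province, total in expected.items()
        if fetched.get(province, 0) < total
    }

# Illustrative numbers only: Dal's total as printed above, with nothing fetched for it.
expected = {"Ach": 1285, "Dal": 7653, "Rom": 4392}
fetched = {"Ach": 1285, "Rom": 4392}
print(find_incomplete(expected, fetched))
```

Inside the main loop, `expected[province] = total` and `fetched[province] = len(response.json()["items"])` could be filled in alongside `inscriptions_data.extend(...)`.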

In [0]:
%%time
inscriptions_data_df = pd.DataFrame(inscriptions_data)

CPU times: user 1.08 s, sys: 4.8 ms, total: 1.09 s
Wall time: 1.09 s


In [0]:
inscriptions_data_df.head(5)

,people,work_status,findspot_modern,last_update,responsible_individual,width,language,literature,height,diplomatic_text,not_before,depth,material,trismegistos_uri,transcription,commentary,edh_geography_uri,country,uri,province_label,modern_region,type_of_monument,present_location,findspot_ancient,not_after,type_of_inscription,id,letter_size,social_economic_legal_history,findspot,year_of_find,geography,religion,fotos,military,external_image_uris
0,"[{'name': 'L. Ponponius(!) Rufus', 'age: years...",checked with photo,Roma,2014-10-10,Cowey,19 cm,Greek-Latin,"CIG 6916.; AE 1984, 0109. (B); P. Lombardi, Ti...",45 cm,L PONPONIVS RVFVS / VIXIT ANOS XXVII / EIA PON...,101,5.4 cm,marble: rocks - metamorphic rocks,https://www.trismegistos.org/text/177036,L(ucius) Ponponius(!) Rufus / vixit an(n)os XX...,Wiederverwendung der Tafel als Türpfosten. D...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Italy,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,Lazio,tabula,"Roma, Mus. Naz. Rom.","Kephallenia, aus",200.0,epitaph,HD001917,1-2 cm,,,,,,,,
1,"[{'gender': 'male', 'cognomen': 'ÎÎ±Î»Î»ÎµÎ½Ï...",checked with photo,"Patrasso - Athínai, zwischen",2012-03-15,Gräf,30 cm,Greek-Latin,"CIL 03, 00572.; CIL 03, 07306.; IG 02 (2. Aufl...",146 cm,[ ]ΥΤΟΚΡΑΤΟΡΙ / [ ]ΑΙΣΑΡΙ / [[[ ]]] / [ ]ΥΣΕΒΕ...,395,,"Marmor, geädert / farbig",https://www.trismegistos.org/text/177037,[Α]ὐτοκράτορι / [Κ]αίσαρι / [[[---]]] / [Ε]ὐσε...,Meilenstein mit zwei griechischen Inschriften...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,Attikí,mile-/leaguestone,"Athínai, Epigr. Mus.","Athenae, bei",397.0,mile-/leaguestone,HD002097,2.7 cm,data available,"Dafni, byzantinisches Kloster, bei, sekundär ...",,,,,,
2,,no image,Athínai,2011-04-04,Cowey,(17) cm,Latin,"CIL 03, 06101.; M. Šašel Kos, Inscriptiones ...",(15) cm,]S HOSTIVM DEPRESSE[ ] / [ ] CXIIX BELLO MARIT...,-38,12.5 cm,,https://www.trismegistos.org/text/177038,------ nave]s hostium depresse[rit ---] / [---...,Es handelt sich um ein Elogium für Agrippa. ...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,Attikí,base,"Athínai, Epigr. Mus.",Athenae,-12.0,elogium,HD002919,6.5 cm,,"Roma-Augustus Tempel, Akropolis",1866.0,,,,,
3,"[{'cognomen': 'Traianus Hadrianus', 'gender': ...",checked with photo,Athínai,2009-11-17,Cowey,76 cm,Greek-Latin,"CIL 03, 00548. (B); CIL 03, 07281.; PIR (2. Au...",112 cm,[ ]MP CAES DIVI TRAIANI PAR / THICI FIL DIVI N...,132,48 cm,,https://www.trismegistos.org/text/177039,[I]mp(eratori) Caes(ari) divi Traiani Par/thic...,(B): Am Anfang von Z. 2 fehlt das TI von nepoti.,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,Attikí,statue base,"Athínai, Epigr. Mus.",Athenae,,honorific inscription,HD002922,,,,,data available,,,,
4,"[{'gender': 'male', 'cognomen': 'Traianus+ Had...",no image,Athínai,2011-04-04,Cowey,(41) cm,Latin,"CIL 03, 06102.; CIL 03, 07283.; AE 1984, 0822....",(20) cm,[ ] / [ ] / [ ]D[ ] / [ ]R P XVI COS III P P [...,132,(15) cm,marble: rocks - metamorphic rocks,https://www.trismegistos.org/text/177040,[Imp(eratori) Caesari divi Traiani] / [Parthic...,Rekonstruktion des Inschriftentextes nach CIL...,https://edh-www.adw.uni-heidelberg.de/edh/geog...,Greece,https://edh-www.adw.uni-heidelberg.de/edh/insc...,Achaia,Attikí,statue base,"Athínai, Epigr. Mus.",Athenae,,honorific inscription,HD002925,3.8 cm,,"\""Stoa Hadriani\"" (CIL)",,data available,,,,


## Upload the data to your personal folder at sciencedata.dk

In [0]:
### interactively set up your sciencedata.dk home URL, username and password
sciencedata_homeurl = "https://sciencedata.dk/files/"
username = input("sciencedata.dk username (format '123456@au.dk'):")
password = getpass.getpass("sciencedata.dk password:")

### establish a request session (this replaces the sddk session `s` configured earlier)
s = requests.Session()
s.auth = (username, password)

sciencedata.dk username (format '123456@au.dk'):648597@au.dk
sciencedata.dk password:··········


In [0]:
### create a new folder (a 405 response means that the folder already exists)
s.request("MKCOL", sciencedata_homeurl + "personal_folder/EDH_data") 

<Response [405]>
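The `MKCOL` request shows that sciencedata.dk is reached over WebDAV, so the status codes follow the HTTP/WebDAV specifications: `MKCOL` answers 201 when it creates a folder, 405 (Method Not Allowed) when something already exists at that path, as in the output above, and 409 when a parent folder is missing; `PUT` answers 201 when it creates a new file and 204 when it overwrites an existing one. A small hedged helper for reading these responses; `describe_status` and `STATUS_MEANING` are hypothetical names, and the mapping covers only the codes discussed here.

```python
# Meaning of the status codes for the two WebDAV calls used in this notebook
# (per the HTTP and WebDAV specifications); any other code is reported as unexpected.
STATUS_MEANING = {
    ("MKCOL", 201): "folder created",
    ("MKCOL", 405): "something already exists at this path (usually the folder itself)",
    ("MKCOL", 409): "parent folder is missing",
    ("PUT", 201): "new file created",
    ("PUT", 204): "existing file overwritten",
}

def describe_status(method, status_code):
    """Return a short description of a WebDAV response status, or a generic fallback."""
    return STATUS_MEANING.get((method.upper(), status_code), f"unexpected status {status_code}")

print(describe_status("MKCOL", 405))
print(describe_status("put", 201))
```

With a live session, `describe_status("MKCOL", response.status_code)` can be called on the object returned by `s.request(...)`.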

In [0]:
### make a README.txt file in the folder
s.put(sciencedata_homeurl + "personal_folder/EDH_data/README.txt", data="This folder will contain all data associated with cleaning the EDH data, extracted either from the API, or from the xml files.")

<Response [201]>

In [0]:
### put your dataframe data into this folder
s.put(sciencedata_homeurl + "personal_folder/EDH_data/EDH_inscriptions_raw.json", data=inscriptions_data_df.to_json())

<Response [204]>