# Step 2: Extracting occupational data from inscriptions

*AIM*: This script extracts occupational data from inscriptions.

References:
1) `Harris EM. Workshop, Marketplace and Household: The Nature of Technical Specialization in Classical Athens and its Influence on Economy and Society. In: Cartledge P, Cohen EE, Foxhall L, editors. Money, Labour and Land: Approaches to the Economy of Ancient Greece. London and New York: Routledge; 2001. pp. 67–99.`
2) `van Leeuwen MHD, Maas I, Miles A. HISCO: Historical International Standard Classification of Occupations. 2022 2002 [cited 27 Jan 2022]. Available: https://historyofwork.iisg.nl/`


This script was originally published by `Kaše V, Heřmánková P, Sobotková A (2022) Division of labor, specialization and diversity in the ancient Roman cities: A quantitative approach to Latin epigraphy. PLoS ONE 17(6): e0269869. https://doi.org/10.1371/journal.pone.0269869` under a CC BY-SA 4.0 International License. 

https://github.com/sdam-au/social_diversity

The *Past Social Networks Project* adapted the script to fit the needs of the project research agenda.

## Data:

**IN**:

1) Declined occupations `occups_declined_dict.json`

2) Occupations with metadata `occupations_list_hisco.csv`

3) Inscriptions `LIST_v1-0.parquet` or `https://zenodo.org/records/8431323`


**OUT**: 

1) Counts of occupations `occupations_counts.csv`

2) Inscriptions with occupations and their categorisations by HISCO and Harris `LIST_occups.parquet`

# Requirements

In [1]:
import json
import numpy as np
import re
import pandas as pd
import geopandas as gpd
import nltk
pd.options.display.max_columns = 1000 # to see all columns
import warnings
warnings.filterwarnings('ignore')
import sddk
#import matplotlib.pyplot as plt
#import matplotlib.colors as mcolors
#import geoplot as gplt

# Loading datasets

In [2]:
# load the local copy (download it manually from https://zenodo.org/records/8431323 and save it to the data/large_data folder)
LIST = gpd.read_parquet("../../data/large_data/LIST_v1-0.parquet")

In [23]:
#LIST = gpd.read_parquet("https://zenodo.org/records/8431323/files/LIST_v1-0.parquet?download=1") # from https://zenodo.org/records/8431323

In [3]:
LIST.head(3)

Unnamed: 0,LIST-ID,EDCS-ID,EDH-ID,trismegistos_uri,pleiades_id,transcription,inscription,clean_text_conservative,clean_text_interpretive_sentence,clean_text_interpretive_word,clean_text_interpretive_word_EDCS,diplomatic_text,province,place,inscr_type,status_notation,inscr_process,status,partner_link,last_update,letter_size,type_of_inscription,work_status,year_of_find,present_location,text_edition,support_objecttype,support_material,support_decoration,keywords_term,people,type_of_inscription_clean,type_of_inscription_certainty,height_cm,width_cm,depth_cm,material_clean,type_of_monument_clean,type_of_monument_certainty,province_label_clean,province_label_certainty,country_clean,country_certainty,findspot_ancient_clean,findspot_ancient_certainty,modern_region_clean,modern_region_certainty,findspot_modern_clean,findspot_modern_certainty,findspot_clean,findspot_certainty,language_EDCS,raw_dating,not_after,not_before,Longitude,Latitude,is_geotemporal,geometry,is_within_RE,urban_context,urban_context_city,urban_context_pop_est,type_of_inscription_auto,type_of_inscription_auto_prob
445463,445464,EDCS-24900077,HD056163,https://www.trismegistos.org/text/177366,570485,Q(uinto) Caecilio C(ai) f(ilio) Metelo / imper...,Q(uinto) Caecilio C(ai) f(ilio) Metel(l)o / im...,Q Caecilio C f Metelo imperatori Italici quei ...,Quinto Caecilio Cai filio Metelo imperatori It...,Quinto Caecilio Cai filio Metelo imperatori It...,Quinto Caecilio Cai filio Metello imperatori I...,Q CAECILIO C F METELO / IMPERATORI ITALICI / Q...,Achaia,Agia Triada / Merbaka / Midea,tituli honorarii,"officium/professio, ordo senatorius, tria nomi...",,officium/professio; ordo senatorius; tituli ...,http://db.edcs.eu/epigr/partner.php?s_language...,2011-11-11,,honorific inscription,no image,,,\n Quinto Caecilio Cai filio Metelo imperatori...,,,1000,69,"[{'age: days': None, 'age: hours': None, 'age:...",honorific inscription,False,,,,,,False,Achaia,False,Greece,False,Midea,False,Pelopónissos,False,Midhéa,False,,False,,-68 to -68,-68.0,-68.0,22.8412,37.6498,True,POINT (22.841 37.650),True,rural,,,honorific inscription,1.0
445464,445465,EDCS-03700724,HD052964,https://www.trismegistos.org/text/121715,531064,Fortissimo et piis/simo Caesari d(omino) n(ost...,Fortissimo et Piis/simo Caesari d(omino) n(ost...,Fortissimo et piissimo Caesari d n Gal Val P F...,Fortissimo et piissimo Caesari domino nostro G...,Fortissimo et piissimo Caesari domino nostro G...,Fortissimo et Piissimo Caesari domino nostro G...,FORTISSIMO ET PIIS / SIMO CAESARI D N / GAL VA...,Achaia,Agios Athanasios / Photike,tituli honorarii,"Augusti/Augustae, ordo equester, tria nomina",litterae erasae,Augusti/Augustae; litterae erasae; ordo eque...,http://db.edcs.eu/epigr/partner.php?s_language...,2014-09-16,3-5.3 cm,honorific inscription,checked with photo,,Fragma Kalama,\n Fortissimo et piissimo Caesari domino nostr...,57.0,,1000,69,"[{'age: days': None, 'age: hours': None, 'age:...",honorific inscription,False,99.0,67.0,67.0,,statue base,False,Epirus,False,Greece,False,Photike,False,Ípeiros,False,Paramythía,False,{Agios Athanasios},False,,309 to 313,313.0,309.0,20.7668,39.4512,True,POINT (20.767 39.451),True,rural,,,honorific inscription,1.0
445465,445466,EDCS-13800065,HD017714,https://www.trismegistos.org/text/177100,570049,Italicei / quei Aegei negotiantur / P(ublium) ...,Italicei / quei Aegei negotiantur / P(ublium) ...,Italicei quei Aegei negotiantur P Rutilium P f...,Italicei quei Aegei negotiantur Publium Rutili...,Italicei quei Aegei negotiantur Publium Rutili...,Italicei quei Aegei negotiantur Publium Rutili...,ITALICEI / QVEI AEGEI NEGOTIANTVR / P RVTILIVM...,Achaia,Aigio / Egio / Aiyion / Aegeum,tituli honorarii,"officium/professio, ordo senatorius, tria nomi...",,officium/professio; ordo senatorius; tituli ...,http://db.edcs.eu/epigr/partner.php?s_language...,2011-03-29,3.5-3.7 cm,votive inscription,checked with photo,,,\n Italicei quei Aegei negotiantur Publium Rut...,257.0,,1000,372,"[{'age: days': None, 'age: hours': None, 'age:...",votive inscription,False,58.0,61.0,16.0,,tabula,False,Achaia,False,Greece,False,Aegeum,False,Dytikí Elláda,False,Aígion,False,,False,,-74 to -74,-74.0,-74.0,22.0845,38.2487,True,POINT (22.084 38.249),True,small,Aegium,1000.0,votive inscription,1.0


In [25]:
# list of all columns

print(LIST.columns)

Index(['LIST-ID', 'EDCS-ID', 'EDH-ID', 'trismegistos_uri', 'pleiades_id',
       'transcription', 'inscription', 'clean_text_conservative',
       'clean_text_interpretive_sentence', 'clean_text_interpretive_word',
       'clean_text_interpretive_word_EDCS', 'diplomatic_text', 'province',
       'place', 'inscr_type', 'status_notation', 'inscr_process', 'status',
       'partner_link', 'last_update', 'letter_size', 'type_of_inscription',
       'work_status', 'year_of_find', 'present_location', 'text_edition',
       'support_objecttype', 'support_material', 'support_decoration',
       'keywords_term', 'people', 'type_of_inscription_clean',
       'type_of_inscription_certainty', 'height_cm', 'width_cm', 'depth_cm',
       'material_clean', 'type_of_monument_clean',
       'type_of_monument_certainty', 'province_label_clean',
       'province_label_certainty', 'country_clean', 'country_certainty',
       'findspot_ancient_clean', 'findspot_ancient_certainty',
       'modern_region_cle

### Exploring the dataset

In [None]:
# number of all inscriptions
len(LIST)

In [8]:
min(LIST.not_before)

-750.0

In [9]:
max(LIST.not_before)

1998.0

In [6]:
min(LIST.not_after)

-671.0

In [7]:
max(LIST.not_after)

2230.0

# Custom function to extract occupations

In [4]:
occups_declined_dict = json.load(open("../../data/data_generation/occups_declined_dict.json"))

In [5]:
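# older functional version
def extract_occup(inscription_text):
    """Return the lemma of every occupation found in the text, one entry per occurrence."""
    occups_found = []
    if not isinstance(inscription_text, str): # if not a valid string
        inscription_text = ""
    for occup in occups_declined_dict.keys():
        for occup_morph in occups_declined_dict[occup]:
            try:
                if occup_morph in inscription_text: # cheap substring check before running the regex
                    # count whole-word matches only (raw strings avoid invalid-escape warnings)
                    occup_morph_N = len(re.findall(r"(\W|^)" + occup_morph + r"(\W|$)", inscription_text))
                    if occup_morph_N > 0:
                        occups_found.extend([occup] * occup_morph_N)
                        # remove the matched form so that shorter terms cannot match it again
                        inscription_text = re.sub(r"(\W|^)(" + occup_morph + r")(\W|$)", r"\1", inscription_text)
            except Exception: # skip forms that cannot be matched (e.g. an invalid pattern)
                pass
    return occups_found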

In [6]:
extract_occup("curatores, procuratores et negotiatores curatori navium et curatori")

['curator navium', 'negotiator', 'curator', 'curator']
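
The matching logic can be illustrated on a toy dictionary. The sketch below is a simplified illustration rather than the project's function: the two-entry `toy_dict` and the name `extract_occup_sketch` are hypothetical, and `re.escape` is added as a precaution that the original does not take. It shows why longer terms have to be tried first: once `curatori navium` has been matched and removed, its first word can no longer be counted as a bare `curator`.

```python
import re

# Hypothetical toy dictionary: lemma -> inflected forms, longest lemma first.
toy_dict = {
    "curator navium": ["curator navium", "curatori navium"],
    "curator": ["curator", "curatori"],
}

def extract_occup_sketch(text, declined_dict):
    """Return one lemma per whole-word match, removing each match from the text."""
    found = []
    for lemma, forms in declined_dict.items():
        for form in forms:
            # the form must be delimited by a non-word character or the text boundary
            pattern = r"(\W|^)" + re.escape(form) + r"(\W|$)"
            n = len(re.findall(pattern, text))
            if n > 0:
                found.extend([lemma] * n)
                # keep the leading separator, drop the matched form
                text = re.sub(pattern, r"\1", text)
    return found

result = extract_occup_sketch("curatori navium et curatori", toy_dict)
print(result)  # ['curator navium', 'curator']
```

With the dictionary order reversed, the same text would yield two plain `curator` hits and the compound term would be lost.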

# Occupational data extraction

In [7]:
# check that the occupations are ordered from the longest term to the shortest, so that multi-word terms are matched first
list(occups_declined_dict.keys())[:20]

['negotiator artis vestiariae et lintiariae',
 'negotiator artis cretaria et vestiaria',
 'negotiator frumentariae et legumenaria',
 'negotiator salsamentarius et vinarius',
 'negotiator sagarius et pellicarius',
 'negotiator suariae et pecuariae',
 'exactor auri argenti et aeris',
 'negotiator penoris et vinorum',
 'negotiator salsari leguminari',
 'negotiator artis macellariae',
 'negotiator artis purpurariae',
 'negotiator cellarum vinarium',
 'negotiator artis prossariae',
 'negotiator artis vestiariae',
 'negotiator artis ratiariae',
 'inclusor auri et gemmarum',
 'negotiator artis cretaria',
 'negotiator campi pecuarii',
 'negotiator manticularius',
 'negotiator margaritarius']
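
Because `extract_occup` removes each match from the text as it goes, the order of the dictionary keys matters. A minimal sketch of how that ordering could be checked and restored, assuming a plain dict of lemma to inflected forms; the sample entries and both helper names are hypothetical, and sorting by character length is an assumption about the intended order:

```python
def is_longest_first(keys):
    """True if no key is longer than the key before it."""
    lengths = [len(k) for k in keys]
    return all(a >= b for a, b in zip(lengths, lengths[1:]))

def sort_longest_first(declined_dict):
    """Return a new dict with keys ordered from longest to shortest."""
    return dict(sorted(declined_dict.items(), key=lambda kv: len(kv[0]), reverse=True))

# Hypothetical sample, deliberately out of order.
sample = {
    "curator": ["curatori"],
    "curator navium": ["curatori navium"],
    "faber": ["fabri"],
}
ordered = sort_longest_first(sample)
print(list(ordered))  # ['curator navium', 'curator', 'faber']
```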

In [8]:
%%time

# extraction process, takes a couple of minutes
LIST["occups"] = LIST["clean_text_interpretive_word"].apply(extract_occup)

CPU times: user 3min 8s, sys: 24.6 ms, total: 3min 8s
Wall time: 3min 8s


In [9]:
LIST["occups_N"] = LIST["occups"].apply(len)

In [10]:
# how many times an occupation is mentioned
LIST["occups_N"].sum() # LIRE dataset had 5222 instances

10570

In [11]:
# how many inscriptions have at least 1 occupation
len(LIST[LIST["occups_N"]>0]) # LIRE dataset had 4161 inscriptions

8475

In [12]:
# funerary inscriptions
LISToccups = LIST[LIST["occups_N"]>0]

len(LISToccups[LISToccups["type_of_inscription_auto"]=="epitaph"]) 

3116

In [13]:
# honorific inscriptions 
len(LISToccups[LISToccups["type_of_inscription_auto"]=="honorific inscription"]) 

952

In [14]:
# overview of the most common occupations
LIST_occups_list = [el for sublist in LIST["occups"].tolist() for el in sublist]
occupations_counts = pd.DataFrame(nltk.FreqDist(LIST_occups_list).most_common(), columns=["occupation", "count"])
occupations_counts.head(10)

Unnamed: 0,occupation,count
0,curator,1934
1,faber,958
2,aerarius,453
3,medicus,448
4,scriba,421
5,sagittarius,347
6,frumentarius,213
7,centonarius,202
8,negotiator,179
9,argentarius,176
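
`nltk.FreqDist` is a subclass of `collections.Counter`, so the same tally can be sketched with the standard library alone. The per-inscription lists below are made-up stand-ins for the `occups` column, not real data:

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for LIST["occups"].tolist()
occups_per_inscription = [["curator", "faber"], [], ["curator"], ["medicus", "curator"]]

# Flatten the list of lists and count each occupation.
counts = Counter(chain.from_iterable(occups_per_inscription))

# most_common() returns (occupation, count) pairs, most frequent first.
top = counts.most_common()
print(top[0])  # ('curator', 3)
```

The list of pairs returned by `most_common()` can be passed to `pd.DataFrame(..., columns=["occupation", "count"])` in the same way as the `FreqDist` result.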


In [15]:
# how many unique occupations there are
len(occupations_counts)

514

In [16]:
#occupations with their counts
occupations_counts

Unnamed: 0,occupation,count
0,curator,1934
1,faber,958
2,aerarius,453
3,medicus,448
4,scriba,421
...,...,...
509,funerarius,1
510,negotiator margaritarius,1
511,farmacopola,1
512,sarcitor,1


In [17]:
# how many occupations occur only once
len(occupations_counts[occupations_counts["count"]==1])

169

## Load the occupation list with all metadata


In [19]:
# load the occupation list with all metadata
occupations_df = pd.read_csv("../../data/data_generation/occupations_list_hisco.csv")
occupations_df.head(5)

Unnamed: 0,Term,gen_sg,Term2,Vocab_nom_sg,Source,HISCO_majorgroup,HISCO_minorgroup,Harris_Category,Subcategory,Translation_eng
0,abetarius,i,,,Petrikovits 1981a,8.0,81.0,Building,Wood worker,"a joiner, wood worker"
1,abietarius,i,,,Petrikovits 1981a,8.0,81.0,Building,Wood worker,"a joiner, wood worker"
2,acceptor,oris,,acceptor,Waltzing - Rome,3.0,31.0,Finance,,"collector, gold quality checker"
3,accomodator,oris,,,Petrikovits 1981a,9.0,99.0,Unclassified,,"uncertain, craftsman"
4,aceptor,oris,,,Petrikovits 1981a,3.0,31.0,Finance,,"collector, gold quality checker"


In [20]:
# categorise occupations according to their HISCO group

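def get_int(x):
    # convert a float code such as 8.0 to the string "8"; missing values become ""
    try: return str(int(x))
    except (ValueError, TypeError): return ""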
occupations_df["HISCO_majorgroup"] = occupations_df["HISCO_majorgroup"].apply(get_int)
occupations_df["HISCO_minorgroup"] = occupations_df["HISCO_minorgroup"].apply(get_int)
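
The conversion is needed because pandas reads a numeric column that contains blanks as floats, so codes arrive as `8.0` and missing codes as `NaN`. A small sketch of what `get_int` does with such values, using made-up inputs; note that leading zeros do not survive the conversion, so a code stored as `6.0` comes back as `'6'`:

```python
def get_int(x):
    """Return the integer part of x as a string, or "" if x is not a number."""
    try:
        return str(int(x))
    except (ValueError, TypeError):
        return ""

# Made-up inputs: float codes, a NaN and a None.
values = [8.0, 81.0, 6.0, float("nan"), None]
converted = [get_int(v) for v in values]
print(converted)  # ['8', '81', '6', '', '']
```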

In [21]:
occupations_df.head(5)

Unnamed: 0,Term,gen_sg,Term2,Vocab_nom_sg,Source,HISCO_majorgroup,HISCO_minorgroup,Harris_Category,Subcategory,Translation_eng
0,abetarius,i,,,Petrikovits 1981a,8,81,Building,Wood worker,"a joiner, wood worker"
1,abietarius,i,,,Petrikovits 1981a,8,81,Building,Wood worker,"a joiner, wood worker"
2,acceptor,oris,,acceptor,Waltzing - Rome,3,31,Finance,,"collector, gold quality checker"
3,accomodator,oris,,,Petrikovits 1981a,9,99,Unclassified,,"uncertain, craftsman"
4,aceptor,oris,,,Petrikovits 1981a,3,31,Finance,,"collector, gold quality checker"


In [22]:
def term1_plus_term2(row):
    term1_2 = row["Term"]
    if isinstance(row["Term2"], str):
        term1_2 += " " + row["Term2"]
    return term1_2

occupations_df["Term"] = occupations_df.apply(lambda row: term1_plus_term2(row), axis=1)

In [23]:
occupation_dict = {}
keys = ["Harris_Category", "Source", "HISCO_majorgroup", "Subcategory", "HISCO_minorgroup", "Translation_eng"]
for n in range(len(occupations_df)):
    occupation_dict[occupations_df.iloc[n]["Term"]] = dict([(key, occupations_df.iloc[n][key]) for key in keys])
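
The same nested dictionary can probably be built without an explicit loop. The sketch below uses a made-up two-row frame in place of `occupations_df` and a shortened `keys` list; `DataFrame.to_dict(orient="index")` requires a unique index, so duplicated terms are dropped first, keeping the last row, which is also what the row-by-row loop ends up storing.

```python
import pandas as pd

# Hypothetical miniature of occupations_df
df = pd.DataFrame({
    "Term": ["faber", "medicus"],
    "Harris_Category": ["Unclassified", "Miscellaneous Services"],
    "HISCO_majorgroup": ["9", "0"],
})
keys = ["Harris_Category", "HISCO_majorgroup"]

# One inner dict per term: {term: {key: value, ...}}
lookup = (
    df.drop_duplicates(subset="Term", keep="last")
      .set_index("Term")[keys]
      .to_dict(orient="index")
)
print(lookup["faber"]["Harris_Category"])  # Unclassified
```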

In [24]:
occupation_dict

{'abetarius': {'Harris_Category': 'Building',
  'Source': 'Petrikovits 1981a',
  'HISCO_majorgroup': '8',
  'Subcategory': 'Wood worker',
  'HISCO_minorgroup': '81',
  'Translation_eng': 'a joiner, wood worker'},
 'abietarius': {'Harris_Category': 'Building',
  'Source': 'Petrikovits 1981a',
  'HISCO_majorgroup': '8',
  'Subcategory': 'Wood worker',
  'HISCO_minorgroup': '81',
  'Translation_eng': 'a joiner, wood worker'},
 'acceptor': {'Harris_Category': 'Finance',
  'Source': 'Waltzing - Rome',
  'HISCO_majorgroup': '3',
  'Subcategory': nan,
  'HISCO_minorgroup': '31',
  'Translation_eng': 'collector, gold quality checker'},
 'accomodator': {'Harris_Category': 'Unclassified',
  'Source': 'Petrikovits 1981a',
  'HISCO_majorgroup': '9',
  'Subcategory': nan,
  'HISCO_minorgroup': '99',
  'Translation_eng': 'uncertain, craftsman'},
 'aceptor': {'Harris_Category': 'Finance',
  'Source': 'Petrikovits 1981a',
  'HISCO_majorgroup': '3',
  'Subcategory': nan,
  'HISCO_minorgroup': '31',
  '

In [25]:
for key in keys:
    occupations_counts[key] = occupations_counts["occupation"].apply(lambda x: occupation_dict[x][key])
occupations_counts.head(10)

Unnamed: 0,occupation,count,Harris_Category,Source,HISCO_majorgroup,Subcategory,HISCO_minorgroup,Translation_eng
0,curator,1934,Managerial,Waltzing,2,,21,"he who takes charge, a manager, overseer, supe..."
1,faber,958,Unclassified,Waltzing - Rome,9,,99,"a worker in wood, stone, metal, etc., a forger..."
2,aerarius,453,Metal-Working,Waltzing - Rome,8,,83,metal worker
3,medicus,448,Miscellaneous Services,Waltzing - Rome,0,,6,a surgeon
4,scriba,421,Education,Waltzing - Rome,3,,30,"a public writer, official scribe, professional..."
5,sagittarius,347,Metal-Working,Petrikovits 1981a,8,,83,"arrow-makers, arrow-smiths"
6,frumentarius,213,Retail,EDH/EDCS,4,,43,trader with corn
7,centonarius,202,Clothing,Waltzing - Rome,7,,79,"a maker of patchwork, a dealer in rags"
8,negotiator,179,Retail,Waltzing - Rome,4,,43,"one who does business by wholesale, a wholesal..."
9,argentarius,176,Metal-Working,Waltzing - Rome,8,,88,"jewellery maker, banker"


In [26]:
(occupations_counts["count"]==1).sum()

169

In [27]:
# HISCO major groups definition

HISCO_majorgroup_dict = {
    "0" : "Professional, technical and related workers",
    "1" : "Professional, technical and related workers",
    "2" : "Administrative and managerial workers",
    "3" : "Clerical and related workers",
    "4" : "Sales workers",
    "5" : "Service workers",
    "6" : "Agricultural, animal husbandry and forestry workers, fishermen and hunters",
    "7" : "Production and related workers, transport equipment operators and labourers",
    "8" : "Production and related workers, transport equipment operators and labourers",
    "9" : "Production and related workers, transport equipment operators and labourers",
    "" : ""
}

In [28]:
occupations_counts["HISCO_majorgroup_descr"] =  occupations_counts["HISCO_majorgroup"].apply(lambda x: HISCO_majorgroup_dict[x])
occupations_counts.head(5)

Unnamed: 0,occupation,count,Harris_Category,Source,HISCO_majorgroup,Subcategory,HISCO_minorgroup,Translation_eng,HISCO_majorgroup_descr
0,curator,1934,Managerial,Waltzing,2,,21,"he who takes charge, a manager, overseer, supe...",Administrative and managerial workers
1,faber,958,Unclassified,Waltzing - Rome,9,,99,"a worker in wood, stone, metal, etc., a forger...","Production and related workers, transport equi..."
2,aerarius,453,Metal-Working,Waltzing - Rome,8,,83,metal worker,"Production and related workers, transport equi..."
3,medicus,448,Miscellaneous Services,Waltzing - Rome,0,,6,a surgeon,"Professional, technical and related workers"
4,scriba,421,Education,Waltzing - Rome,3,,30,"a public writer, official scribe, professional...",Clerical and related workers


In [7]:
# save a list of occupations with their counts
occupations_counts.to_csv("../../data/data_generation/occupations_counts.csv")


## Occupations - basic summary

In [50]:
print("LIST - number of occupation occurrences: " + str(LIST["occups_N"].sum()))
print("LIST - number of inscriptions with at least one occupation mentioned: " + str(len(LIST[LIST["occups_N"] > 0])))

LIST - number of occupation occurrences: 10570
LIST - number of inscriptions with at least one occupation mentioned: 8475


In [51]:
# How many occupations come from the EDH data
LIST[LIST["EDH-ID"].notnull()]["occups_N"].sum() # LIRE had 1272

3139

In [52]:
# How many occupations come from the EDCS data
LIST[LIST["EDH-ID"].isnull()]["occups_N"].sum() # LIRE had 2568

7431

In [53]:
# total occurrences per HISCO major group name (only the numeric "count" column is summed)
occupations_counts.groupby("HISCO_majorgroup_descr")[["count"]].sum()

HISCO_majorgroup_descr,count
,10
Administrative and managerial workers,2153
"Agricultural, animal husbandry and forestry workers, fishermen and hunters",260
Clerical and related workers,705
"Production and related workers, transport equipment operators and labourers",4121
"Professional, technical and related workers",1675
Sales workers,916
Service workers,730
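
Calling `.sum()` on every column after a `groupby` also "sums" the text columns, which for strings means concatenation. A minimal sketch on made-up rows (the group labels are shortened placeholders) showing how selecting the numeric column first keeps the result readable:

```python
import pandas as pd

# Hypothetical miniature of occupations_counts
toy = pd.DataFrame({
    "occupation": ["curator", "faber", "aerarius"],
    "count": [5, 3, 2],
    "HISCO_majorgroup_descr": ["Administrative", "Production", "Production"],
})

# Select the numeric column before aggregating so text columns are left out.
per_group = toy.groupby("HISCO_majorgroup_descr")["count"].sum()
print(per_group.to_dict())  # {'Administrative': 5, 'Production': 5}
```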


In [6]:
# grouping of occupations by their HISCO major group number
occupations_counts.groupby("HISCO_majorgroup")[["count"]].sum()

### Exploring individual cases: Faber (worker) and Metal-Working category

In [5]:
# occurrences of the term faber (worker), a generic term for manual and relatively unskilled labour
occupations_counts[occupations_counts["occupation"]=="faber"]

In [56]:
# how many occurrences of occupations belong to the Metal-Working category based on Harris 2001
occupations_counts[occupations_counts["Harris_Category"]=="Metal-Working"]["count"].sum()


1221

# Categorisation of urban contexts and industry categories by Harris 2001

In [31]:
# generate a dictionary of occupations by type
# ("Term" already includes "Term2" from the earlier step, so the two are not merged again here;
# merging a second time would append "Term2" twice and break the lookup for multi-word terms)
occups_cats_dict = dict(zip(occupations_df["Term"], occupations_df["Harris_Category"]))


In [32]:
def cat_for_occup(list_of_occups):
    # returns an empty list if any of the occupations is missing from the dictionary
    try:
        return [occups_cats_dict[occup] for occup in list_of_occups]
    except (KeyError, TypeError):
        return []
LIST["occups"] = LIST["occups"].apply(list)
LIST["occups_cats_Harris"] = LIST["occups"].apply(cat_for_occup)
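
Because the lookup is wrapped in a single `try`, one term missing from the dictionary makes the whole inscription come back as an empty list. A hedged alternative, sketched with a made-up dictionary and a hypothetical helper name, falls back to a placeholder label for each unknown term instead:

```python
# Hypothetical miniature of occups_cats_dict
toy_cats = {"faber": "Unclassified", "medicus": "Miscellaneous Services"}

def cats_with_fallback(list_of_occups, cats_dict, default="Unknown"):
    """Map each occupation to its category, using `default` for unknown terms."""
    return [cats_dict.get(occup, default) for occup in list_of_occups]

print(cats_with_fallback(["medicus", "sutor"], toy_cats))
# ['Miscellaneous Services', 'Unknown']
```

Whether a placeholder or an empty list is preferable depends on how the category lists are analysed downstream; the placeholder keeps the category list the same length as the occupation list.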

# Categorisation of urban contexts and industry categories by HISCO

In [33]:
occupations_df["Term"] = occupations_df["Term"].apply(lambda x: x.replace(" ", "_"))


In [34]:
hisco_cats_dict = {
    0.0 : "Professional, technical and related workers",
    1.0 : "Professional, technical and related workers",
    2.0 : "Administrative and managerial workers",
    3.0 : "Clerical and related workers",
    4.0 : "Sales workers",
    5.0 : "Service workers",
    6.0 : "Agricultural, animal husbandry and forestry workers, fishermen and hunters",
    7.0 : "Production and related workers, transport equipment operators and labourers",
    8.0 : "Production and related workers, transport equipment operators and labourers",
    9.0 : "Production and related workers, transport equipment operators and labourers",
}

In [35]:
hisco_cats_labels = list(set(hisco_cats_dict.values())) + ["Unclassified"]
hisco_cats_labels

['Production and related workers, transport equipment operators and labourers',
 'Service workers',
 'Administrative and managerial workers',
 'Sales workers',
 'Agricultural, animal husbandry and forestry workers, fishermen and hunters',
 'Clerical and related workers',
 'Professional, technical and related workers',
 'Unclassified']

In [36]:
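def hisco_literary(hisco_code):
    # translate a HISCO major group code into its label
    try:
        return hisco_cats_dict[float(hisco_code)]
    except (KeyError, TypeError, ValueError):
        return "Unclassified" # same spelling as in hisco_cats_labels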
occupations_df["hisco_cats"] = occupations_df["HISCO_majorgroup"].apply(hisco_literary)


In [37]:
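# the extracted occupations are written with spaces, so the lookup keys must use spaces as well
occups_cats_dict = dict(zip(occupations_df["Term"].str.replace("_", " ", regex=False), occupations_df["hisco_cats"]))
def cat_for_occup(list_of_occups):
    # returns an empty list if any of the occupations is missing from the dictionary
    try:
        return [occups_cats_dict[occup] for occup in list_of_occups]
    except (KeyError, TypeError):
        return []
LIST["occups_cats_HISCO"] = LIST["occups"].apply(cat_for_occup)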


In [4]:
LIST.head(3)

Unnamed: 0,LIST-ID,EDCS-ID,EDH-ID,trismegistos_uri,pleiades_id,transcription,inscription,clean_text_conservative,clean_text_interpretive_sentence,clean_text_interpretive_word,clean_text_interpretive_word_EDCS,diplomatic_text,province,place,inscr_type,status_notation,inscr_process,status,partner_link,last_update,letter_size,type_of_inscription,work_status,year_of_find,present_location,text_edition,support_objecttype,support_material,support_decoration,keywords_term,people,type_of_inscription_clean,type_of_inscription_certainty,height_cm,width_cm,depth_cm,material_clean,type_of_monument_clean,type_of_monument_certainty,province_label_clean,province_label_certainty,country_clean,country_certainty,findspot_ancient_clean,findspot_ancient_certainty,modern_region_clean,modern_region_certainty,findspot_modern_clean,findspot_modern_certainty,findspot_clean,findspot_certainty,language_EDCS,raw_dating,not_after,not_before,Longitude,Latitude,is_geotemporal,geometry,is_within_RE,urban_context,urban_context_city,urban_context_pop_est,type_of_inscription_auto,type_of_inscription_auto_prob,occups,occups_N,occups_cats_Harris,occups_cats_HISCO
445463,445464,EDCS-24900077,HD056163,https://www.trismegistos.org/text/177366,570485,Q(uinto) Caecilio C(ai) f(ilio) Metelo / imper...,Q(uinto) Caecilio C(ai) f(ilio) Metel(l)o / im...,Q Caecilio C f Metelo imperatori Italici quei ...,Quinto Caecilio Cai filio Metelo imperatori It...,Quinto Caecilio Cai filio Metelo imperatori It...,Quinto Caecilio Cai filio Metello imperatori I...,Q CAECILIO C F METELO / IMPERATORI ITALICI / Q...,Achaia,Agia Triada / Merbaka / Midea,tituli honorarii,"officium/professio, ordo senatorius, tria nomi...",,officium/professio; ordo senatorius; tituli ...,http://db.edcs.eu/epigr/partner.php?s_language...,2011-11-11,,honorific inscription,no image,,,\n Quinto Caecilio Cai filio Metelo imperatori...,,,1000,69,"[{'age: days': None, 'age: hours': None, 'age:...",honorific inscription,False,,,,,,False,Achaia,False,Greece,False,Midea,False,Pelopónissos,False,Midhéa,False,,False,,-68 to -68,-68.0,-68.0,22.8412,37.6498,True,POINT (22.841 37.650),True,rural,,,honorific inscription,1.0,[],0,[],[]
445464,445465,EDCS-03700724,HD052964,https://www.trismegistos.org/text/121715,531064,Fortissimo et piis/simo Caesari d(omino) n(ost...,Fortissimo et Piis/simo Caesari d(omino) n(ost...,Fortissimo et piissimo Caesari d n Gal Val P F...,Fortissimo et piissimo Caesari domino nostro G...,Fortissimo et piissimo Caesari domino nostro G...,Fortissimo et Piissimo Caesari domino nostro G...,FORTISSIMO ET PIIS / SIMO CAESARI D N / GAL VA...,Achaia,Agios Athanasios / Photike,tituli honorarii,"Augusti/Augustae, ordo equester, tria nomina",litterae erasae,Augusti/Augustae; litterae erasae; ordo eque...,http://db.edcs.eu/epigr/partner.php?s_language...,2014-09-16,3-5.3 cm,honorific inscription,checked with photo,,Fragma Kalama,\n Fortissimo et piissimo Caesari domino nostr...,57.0,,1000,69,"[{'age: days': None, 'age: hours': None, 'age:...",honorific inscription,False,99.0,67.0,67.0,,statue base,False,Epirus,False,Greece,False,Photike,False,Ípeiros,False,Paramythía,False,{Agios Athanasios},False,,309 to 313,313.0,309.0,20.7668,39.4512,True,POINT (20.767 39.451),True,rural,,,honorific inscription,1.0,[],0,[],[]
445465,445466,EDCS-13800065,HD017714,https://www.trismegistos.org/text/177100,570049,Italicei / quei Aegei negotiantur / P(ublium) ...,Italicei / quei Aegei negotiantur / P(ublium) ...,Italicei quei Aegei negotiantur P Rutilium P f...,Italicei quei Aegei negotiantur Publium Rutili...,Italicei quei Aegei negotiantur Publium Rutili...,Italicei quei Aegei negotiantur Publium Rutili...,ITALICEI / QVEI AEGEI NEGOTIANTVR / P RVTILIVM...,Achaia,Aigio / Egio / Aiyion / Aegeum,tituli honorarii,"officium/professio, ordo senatorius, tria nomi...",,officium/professio; ordo senatorius; tituli ...,http://db.edcs.eu/epigr/partner.php?s_language...,2011-03-29,3.5-3.7 cm,votive inscription,checked with photo,,,\n Italicei quei Aegei negotiantur Publium Rut...,257.0,,1000,372,"[{'age: days': None, 'age: hours': None, 'age:...",votive inscription,False,58.0,61.0,16.0,,tabula,False,Achaia,False,Greece,False,Aegeum,False,Dytikí Elláda,False,Aígion,False,,False,,-74 to -74,-74.0,-74.0,22.0845,38.2487,True,POINT (22.084 38.249),True,small,Aegium,1000.0,votive inscription,1.0,[],0,[],[]


# Saving locally

In [39]:
LIST.to_parquet("../../data/large_data/LIST_occups.parquet")

In [3]:
# reload the saved file, if it is already available locally
LIST = gpd.read_parquet("../../data/large_data/LIST_occups.parquet") 