This is a script for consolidating factoid lists in AP3.

The script relies mainly on the pandas package to read and manipulate Excel data as DataFrames. A DataFrame is a two-dimensional, tabular data structure with labelled rows and columns. DataFrames can be written to different file formats such as CSV, Excel or JSON, and to RDF with the help of additional packages.
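
As a minimal illustration of the DataFrame idea, the sketch below builds a two-row table by hand and serialises it to CSV and JSON text. The column names and values are borrowed from the sample output further down; the variable names are only illustrative.

```python
import pandas as pd

# a tiny DataFrame with two rows and two labelled columns
df = pd.DataFrame({
    "pers_name": ["Johann Werner von Vorstern", "Karl Theodor Freiherr von Dalberg"],
    "event_start": ["1739-00-00", "1788-02-03"],
})

# without a file path, to_csv and to_json return the serialised text
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records", force_ascii=False)

print(df.shape)
print(csv_text)
print(json_text)
```

Passing a file path as the first argument writes the same content to disk instead of returning it.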

First of all, we need to connect this Colab notebook with your Google Drive and define the directory for input and output data.


In [1]:
## mount drive
from google.colab import drive
drive.mount("/content/drive")
directory="/content/drive/My Drive/Colab_DigiKAR/"

Mounted at /content/drive


In the second step, we install the additional packages needed for working with CSV, Excel and DataFrames.

In [2]:
## install packages that are not part of Python's standard distribution

!pip install xlsxwriter
!pip install pandas
!pip install numpy

Collecting xlsxwriter
  Downloading XlsxWriter-3.1.2-py3-none-any.whl (153 kB)
Installing collected packages: xlsxwriter
Successfully installed xlsxwriter-3.1.2


In **step 1**, we import the packages into the script and load our data. Before merging the input files, names will be normalised, as some have excess spaces, capitalised surnames, or inverted first and last names.

The combined data will be written to a new dataframe and displayed.
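
The cell below does not itself show the name normalisation, so here is a minimal sketch of what such a step could look like. The helper `normalise_name` is hypothetical, and the rules it applies (collapse whitespace, swap "Surname, Forename", re-case fully capitalised tokens) are assumptions about the three problems named above, not the project's actual procedure.

```python
import re

def normalise_name(raw: str) -> str:
    """Hypothetical helper: tidy a single personal-name string."""
    # collapse runs of whitespace and trim both ends
    name = re.sub(r"\s+", " ", raw).strip()
    # "Surname, Forename" -> "Forename Surname" (only if there is exactly one comma)
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    # re-case tokens written entirely in capitals, e.g. "VORSTER" -> "Vorster"
    tokens = [t.title() if t.isupper() and len(t) > 1 else t for t in name.split(" ")]
    return " ".join(tokens)

examples = [
    "  Johann  Werner von Vorster ",
    "VORSTER, Johann Werner von",
    "Dalberg, Karl Theodor",
]
normalised = [normalise_name(n) for n in examples]
print(normalised)
```

With a DataFrame, a function like this could be applied via `df["pers_name"].map(normalise_name)` before the files are concatenated.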

In [3]:
import xlsxwriter
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
import os
import re

# path to input files

factoid_paths=["https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS0.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS1.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS2.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS3.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS4.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_Erfassung_Erfurt_Master_2022-11-11_Kopie.xlsx"
               ]

# define dataframe for final output

f_to_add=[]

# structure of input files

# obligatory columns in valid factoid list

# read all data frames from path

frame_list=[]
for file in factoid_paths:
    df = pd.read_excel(file, index_col=None, dtype=str) # axis=1, sort=False sheet_name='FactoidList'
    df = df.fillna("n/a") # replace empty fields for string
    df_length=len(df)
    frame_list.append(df)

f = pd.concat(frame_list, axis=0, ignore_index=True, sort=False)

print("There are ", len(f), "items in your DataFrame!")

# delete all duplicate rows with exact matches

f_unique=f.drop_duplicates()
print("Your DataFrame has now ", len(f_unique), "items with at least one unique cell." )

# add columns missing according to factoid model

column_names = ["factoid_ID",
                "pers_ID",
                "pers_name",
                "alternative_names",
                "event_type",
                "event_after-date",
                "event_before-date",
                "event_start",
                "event_end",
                "event_date",
                "pers_title",
                "pers_function",
                "place_name",
                "inst_name",
                "rel_pers",
                "source_quotations",
                "additional_info",
                "comment",
                "info_dump",
                "source_combined",
                "event_value", # add more potential categorisations if needed
                "source",
                "source_site"]

df2 = f_unique.reindex(columns=column_names)

# populate some of the empty columns with data

df2.loc[:, "event_end"] = df2["event_start"]
#df2.loc[:, "event_type"] = ["Funktionsausübung"] * 37400 # add new column with standard event if column is completely absent!
df2["event_type"] = df2["event_type"].replace({'n/a':'Funktionsausübung'}) # add standard event in places where no other event is indicated!
df2.fillna('n/a', inplace=True) # fill remaining blanks with string to ensure that all cells can be processed in the same way!
df2['source_combined'] = df2['source'].astype(str) + ': ' + df2['source_site'].astype(str)

print("Done.")

# display the combined dataframe

display(df2)

There are  40080 items in your DataFrame!
Your DataFrame has now  37400 items with at least one unique cell.
Done.


Unnamed: 0,factoid_ID,pers_ID,pers_name,alternative_names,event_type,event_after-date,event_before-date,event_start,event_end,event_date,...,inst_name,rel_pers,source_quotations,additional_info,comment,info_dump,source_combined,event_value,source,source_site
0,Stk_00010,9,Johann Werner von Vorstern,,Funktionsausübung,,,1739-00-00,1739-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1740B: 44,,Stk_1740B,44
1,Stk_00052,9,Johann Werner von Vorstern,,Funktionsausübung,,,1740-00-00,1740-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1741: 45,,Stk_1741,45
2,Stk_00087,9,Johann Werner von Vorstern,,Funktionsausübung,,,1741-00-00,1741-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1742: 38,,Stk_1742,38
3,Stk_00125,9,Johann Werner von Vorstern,,Funktionsausübung,,,1742-00-00,1742-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1743: 37,,Stk_1743,37
4,Stk_00170,9,Johann Werner von Vorster,,Funktionsausübung,,,1744-00-00,1744-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat / S. 33,,,,Stk_1745: 39,,Stk_1745,39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40075,Erffurt_006271,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Weihe,,,1788-02-03,1788-02-03,,...,,,,,,,Gatz1983:Dalberg: 110,,Gatz1983:Dalberg,110
40076,Erffurt_006272,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Weihe,,,1788-08-31,1788-08-31,,...,,,Ernennung zum Titularerzbischof von Tarsus am ...,,,,Gatz1983:Dalberg: 110,,Gatz1983:Dalberg,110
40077,Erffurt_006273,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Funktionsausübung_Beginn,,,1810,1810,,...,Großherzogtum Frankfurt,,"""ohne Beziehung zu seiner geistlichen Würde""",,,,Gatz1983:Dalberg: 112,,Gatz1983:Dalberg,112
40078,Erffurt_006274,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Funktionsausübung_Ende,,1814,,,,...,Großherzogtum Frankfurt,,,,,,Gatz1983:Dalberg: 112,,Gatz1983:Dalberg,112
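
One detail of the cell above may be worth checking: the default event type is applied with `replace` before the remaining blanks are filled with `"n/a"`. Rows whose `event_type` is still NaN at that point (for example because an input file lacks the column) would then end up as `"n/a"` rather than as the default. The toy example below, which uses made-up values, compares the two orders.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"event_type": ["Weihe", "n/a", np.nan]})

# order used in the cell above: apply the default first, then fill blanks
replace_first = toy["event_type"].replace({"n/a": "Funktionsausübung"}).fillna("n/a")

# alternative order: fill blanks first, then apply the default
fill_first = toy["event_type"].fillna("n/a").replace({"n/a": "Funktionsausübung"})

print(replace_first.tolist())
print(fill_first.tolist())
```

Whether the second order is preferable depends on whether empty event types should always default to "Funktionsausübung".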


In [5]:
# Merge input dataframe with dfs containing person IDs and geocoding

## Read person IDs from Github
## columns: pers_ID_MB, pers_name, alternative_names, Unnamed: 4, name_new_fs, pers_ID_FS,poi, pers_name_corr, freq
infile1="https://github.com/ieg-dhr/DigiKAR/raw/main/OntologyFiles/Factoid_PersonNames_merged.xlsx" # has to contain pers_name column!
person_df = pd.read_excel(infile1)

## Read geocoding from Github
infile2="https://github.com/ieg-dhr/DigiKAR/raw/main/OntologyFiles/Ortsontologie_Geocoded_extract.xlsx" # has to contain place_name column!
geo_df = pd.read_excel(infile2).drop_duplicates(subset=['place_name'])

## Merge input dataframe horizontally
# left joins: every row of the input dataframe is kept; rows without a match in the lookup table get NaN in the added columns

merged_df1 = pd.merge(df2, geo_df, on='place_name', how="left")
merged_df2 = pd.merge(merged_df1, person_df, on='pers_name', how="left")

display(merged_df1)

Unnamed: 0,factoid_ID,pers_ID,pers_name,alternative_names,event_type,event_after-date,event_before-date,event_start,event_end,event_date,...,Source,addresses_full,ids,geonames address,latitudes,longitudes,lat,lng,Google address,NaN
0,Stk_00010,9,Johann Werner von Vorstern,,Funktionsausübung,,,1739-00-00,1739-00-00,,...,Universitätsmatrikeln,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.992862,8.247253,"Mainz, Germany",0.000173
1,Stk_00052,9,Johann Werner von Vorstern,,Funktionsausübung,,,1740-00-00,1740-00-00,,...,Universitätsmatrikeln,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.992862,8.247253,"Mainz, Germany",0.000173
2,Stk_00087,9,Johann Werner von Vorstern,,Funktionsausübung,,,1741-00-00,1741-00-00,,...,Universitätsmatrikeln,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.992862,8.247253,"Mainz, Germany",0.000173
3,Stk_00125,9,Johann Werner von Vorstern,,Funktionsausübung,,,1742-00-00,1742-00-00,,...,Universitätsmatrikeln,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.992862,8.247253,"Mainz, Germany",0.000173
4,Stk_00170,9,Johann Werner von Vorster,,Funktionsausübung,,,1744-00-00,1744-00-00,,...,Universitätsmatrikeln,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.992862,8.247253,"Mainz, Germany",0.000173
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37395,Erffurt_006271,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Weihe,,,1788-02-03,1788-02-03,,...,,,,,,,,,,
37396,Erffurt_006272,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Weihe,,,1788-08-31,1788-08-31,,...,,,,,,,,,,
37397,Erffurt_006273,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Funktionsausübung_Beginn,,,1810,1810,,...,,,,,,,,,,
37398,Erffurt_006274,19,Karl Theodor Freiherr von Dalberg,Karl Theodor Anton Maria Reichsfreiherr von Da...,Funktionsausübung_Ende,,1814,,,,...,,,,,,,,,,
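
Two optional arguments of `pd.merge` can make a left join easier to audit: `indicator=True` adds a `_merge` column showing which rows found a partner, and `validate="many_to_one"` raises an error if the lookup table contains duplicate keys, which would otherwise multiply rows. The geocoding table is deduplicated above, but the person table is not, so such a check might be useful there. The sketch uses small made-up frames, not the project files.

```python
import pandas as pd

facts = pd.DataFrame({
    "factoid_ID": ["F1", "F2", "F3"],
    "place_name": ["Mainz", "Erfurt", "n/a"],
})
places = pd.DataFrame({
    "place_name": ["Mainz", "Erfurt"],
    "lat": [49.98419, 50.9787],
})

merged = pd.merge(
    facts, places,
    on="place_name",
    how="left",              # keep every factoid row
    indicator=True,          # adds the "_merge" column
    validate="many_to_one",  # fails if place_name is duplicated in `places`
)

# place names that found no match in the lookup table
unmatched = merged.loc[merged["_merge"] == "left_only", "place_name"].tolist()
print(len(merged), unmatched)
```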


In **step 2**, we reconstruct end dates for successive start dates. The data are automatically aggregated using the pandas `groupby` method. If the resulting groups are too narrow or too broad, please change the grouping columns and aggregation rules below!
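
To illustrate the principle on a small scale, the following sketch groups made-up yearly entries by person and function, takes the earliest start and the latest end, and collects the factoid IDs in a list. Note that the dates are strings, so `min` and `max` compare them alphabetically; this works for consistently formatted `YYYY-MM-DD` values, but placeholders such as `0000` or `n/a` would win the comparison.

```python
import pandas as pd

toy = pd.DataFrame({
    "pers_ID_FS": ["P-1", "P-1", "P-1", "P-2"],
    "pers_function": ["Syndikus", "Syndikus", "Syndikus", "Kammerrat"],
    "event_start": ["1742-00-00", "1743-00-00", "1744-00-00", "1766-00-00"],
    "event_end": ["1742-00-00", "1743-00-00", "1744-00-00", "1766-00-00"],
    "factoid_ID": ["F1", "F2", "F3", "F4"],
})

# one row per (person, function): earliest start, latest end, all factoid IDs
spans = toy.groupby(["pers_ID_FS", "pers_function"], as_index=False).agg(
    {"event_start": "min", "event_end": "max", "factoid_ID": list}
)
print(spans)
```

Adding or removing columns in the `groupby` list makes the groups narrower or broader, respectively.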


In [6]:
# Group the dataframe and aggregate the start and end dates
# code updated after problem with merged columns
# see discussion on Stackoverflow: https://stackoverflow.com/questions/76558443/column-remains-empty-when-using-map-with-dictionary-in-pandas-dataframe/76558586#76558586

grouped_df = merged_df2.groupby(['pers_ID_FS', 'event_type', "pers_function", "pers_title", "inst_name", "place_name"], as_index=False).agg(
                                                         {'event_start': 'min',
                                                          "event_after-date":'min',
                                                          "event_before-date":'max',
                                                          "event_end":'max',
                                                          "factoid_ID":list,
                                                          "pers_ID_MB":list,
                                                          "pers_name":list,
                                                          #"alternative_names":list,
                                                          "Unnamed: 4":list,
                                                          "name_new_fs":list,
                                                          "pers_name_corr":list,
                                                          "rel_pers":list,
                                                          "source_quotations":list,
                                                          "additional_info":list,
                                                          "comment":list,
                                                          "info_dump":list,
                                                          "source_combined":list,
                                                          "event_value":list,
                                                          "place_name":list,
                                                          #"address":list,
                                                          "addresses_full":list,
                                                          "ids":list,
                                                          "geonames address":list,
                                                          "latitudes":list,
                                                          "longitudes":list,
                                                          "lat":list,
                                                          "lng":list,
                                                          "Google address":list
                                                          })

display(grouped_df)

Unnamed: 0,pers_ID_FS,event_type,pers_function,pers_title,inst_name,event_start,event_after-date,event_before-date,event_end,factoid_ID,...,event_value,place_name,addresses_full,ids,geonames address,latitudes,longitudes,lat,lng,Google address
0,1,Funktionsausübung,Kriegszahlmeister,,Kammer zu Erfurt,1781-00-00,,,1781-00-00,[Erffurt_000158],...,[n/a],[Erfurt],"[Erfurt, Europe]",[2929670.0],[Erfurt],[50.9787],[11.03283],[50.98476789999999],[11.0298799],"[Erfurt, Germany]"
1,10,Aufnahme,Bürger,,Bürgerschaft zu Erfurt,1747-02-20,,,1747-02-20,[Erffurt_000416],...,[n/a],[Erfurt],"[Erfurt, Europe]",[2929670.0],[Erfurt],[50.9787],[11.03283],[50.98476789999999],[11.0298799],"[Erfurt, Germany]"
2,10,Funktionsausübung,Fabrikant,,Wollfabrik,0000,1750,1759,0000,"[Erffurt_004934, Erffurt_004934]",...,"[n/a, n/a]","[Erfurt, Erfurt]","[Erfurt, Europe, Erfurt, Europe]","[2929670.0, 2929670.0]","[Erfurt, Erfurt]","[50.9787, 50.9787]","[11.03283, 11.03283]","[50.98476789999999, 50.98476789999999]","[11.0298799, 11.0298799]","[Erfurt, Germany, Erfurt, Germany]"
3,10,Funktionsausübung,Kammer- und Zahlmeister,,Kammer zu Erfurt,1766-00-00,,,1766-00-00,"[Erffurt_000152, Erffurt_000152]",...,"[n/a, n/a]","[Erfurt, Erfurt]","[Erfurt, Europe, Erfurt, Europe]","[2929670.0, 2929670.0]","[Erfurt, Erfurt]","[50.9787, 50.9787]","[11.03283, 11.03283]","[50.98476789999999, 50.98476789999999]","[11.0298799, 11.0298799]","[Erfurt, Germany, Erfurt, Germany]"
4,10,Funktionsausübung,Kammerrat,,Kammer zu Erfurt,1766-00-00,,,1766-00-00,"[Erffurt_000151, Erffurt_000151]",...,"[n/a, n/a]","[Erfurt, Erfurt]","[Erfurt, Europe, Erfurt, Europe]","[2929670.0, 2929670.0]","[Erfurt, Erfurt]","[50.9787, 50.9787]","[11.03283, 11.03283]","[50.98476789999999, 50.98476789999999]","[11.0298799, 11.0298799]","[Erfurt, Germany, Erfurt, Germany]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4638,P-002,,Syndikus,Lic. iur. utr.,Universität Mainz,1746-00-00,,,1753-00-00,"[Stk_31736, Stk_31738, Stk_31740, Stk_31742, S...",...,"[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Mainz, Mainz, Mainz, Mainz, Mainz, Mainz, Mai...","[Mainz, Europe, Mainz, Europe, Mainz, Europe, ...","[2874225.0, 2874225.0, 2874225.0, 2874225.0, 2...","[Mainz, Mainz, Mainz, Mainz, Mainz, Mainz, Mai...","[49.98419, 49.98419, 49.98419, 49.98419, 49.98...","[8.2791, 8.2791, 8.2791, 8.2791, 8.2791, 8.279...","[49.9928617, 49.9928617, 49.9928617, 49.992861...","[8.2472526, 8.2472526, 8.2472526, 8.2472526, 8...","[Mainz, Germany, Mainz, Germany, Mainz, German..."
4639,P-002,,Syndikus,,Universität Mainz,1742-00-00,,,1756-00-00,"[Stk_31732, Stk_31733, Stk_31734, Stk_31735, S...",...,"[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Mainz, Mainz, Mainz, Mainz, Mainz, Mainz, Mai...","[Mainz, Europe, Mainz, Europe, Mainz, Europe, ...","[2874225.0, 2874225.0, 2874225.0, 2874225.0, 2...","[Mainz, Mainz, Mainz, Mainz, Mainz, Mainz, Mai...","[49.98419, 49.98419, 49.98419, 49.98419, 49.98...","[8.2791, 8.2791, 8.2791, 8.2791, 8.2791, 8.279...","[49.9928617, 49.9928617, 49.9928617, 49.992861...","[8.2472526, 8.2472526, 8.2472526, 8.2472526, 8...","[Mainz, Germany, Mainz, Germany, Mainz, German..."
4640,P-002,,,Dr. iur. utr.,,1755-00-00,,,1756-00-00,"[Stk_31766, Stk_31767, Stk_31766, Stk_31767]",...,"[n/a, n/a, n/a, n/a]","[n/a, n/a, n/a, n/a]","[nan, nan, nan, nan]","[nan, nan, nan, nan]","[nan, nan, nan, nan]","[nan, nan, nan, nan]","[nan, nan, nan, nan]","[nan, nan, nan, nan]","[nan, nan, nan, nan]","[nan, nan, nan, nan]"
4641,P-002,,,Lic. iur. utr.,,1740-00-00,,,1753-00-00,"[Stk_31757, Stk_31758, Stk_31759, Stk_31760, S...",...,"[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ..."


In **step 3**, we can flatten the information and only preserve unique information per cell.

In [7]:
# flatten data in dataframe cells

def flatten_list(cell):
    if isinstance(cell, list):
        unique_values = set(cell)
        return ', '.join(str(value) for value in unique_values)
    else:
        return str(cell)

# flatten all cells containing lists
df3 = grouped_df.applymap(flatten_list)

# show the flattened DataFrame
display(df3)

Unnamed: 0,pers_ID_FS,event_type,pers_function,pers_title,inst_name,event_start,event_after-date,event_before-date,event_end,factoid_ID,...,event_value,place_name,addresses_full,ids,geonames address,latitudes,longitudes,lat,lng,Google address
0,1,Funktionsausübung,Kriegszahlmeister,,Kammer zu Erfurt,1781-00-00,,,1781-00-00,Erffurt_000158,...,,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
1,10,Aufnahme,Bürger,,Bürgerschaft zu Erfurt,1747-02-20,,,1747-02-20,Erffurt_000416,...,,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
2,10,Funktionsausübung,Fabrikant,,Wollfabrik,0000,1750,1759,0000,Erffurt_004934,...,,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
3,10,Funktionsausübung,Kammer- und Zahlmeister,,Kammer zu Erfurt,1766-00-00,,,1766-00-00,Erffurt_000152,...,,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
4,10,Funktionsausübung,Kammerrat,,Kammer zu Erfurt,1766-00-00,,,1766-00-00,Erffurt_000151,...,,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4638,P-002,,Syndikus,Lic. iur. utr.,Universität Mainz,1746-00-00,,,1753-00-00,"Stk_31738, Stk_31740, Stk_31736, Stk_31748, St...",...,,Mainz,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.9928617,8.2472526,"Mainz, Germany"
4639,P-002,,Syndikus,,Universität Mainz,1742-00-00,,,1756-00-00,"Stk_31739, Stk_31751, Stk_31741, Stk_31737, St...",...,,Mainz,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.9928617,8.2472526,"Mainz, Germany"
4640,P-002,,,Dr. iur. utr.,,1755-00-00,,,1756-00-00,"Stk_31766, Stk_31767",...,,,,"nan, nan, nan, nan",,"nan, nan, nan, nan","nan, nan, nan, nan","nan, nan, nan, nan","nan, nan, nan, nan",
4641,P-002,,,Lic. iur. utr.,,1740-00-00,,,1753-00-00,"Stk_31762, Stk_31764, Stk_31760, Stk_31763, St...",...,,,,"nan, nan, nan, nan, nan, nan, nan, nan, nan, n...",,"nan, nan, nan, nan, nan, nan, nan, nan, nan, n...","nan, nan, nan, nan, nan, nan, nan, nan, nan, n...","nan, nan, nan, nan, nan, nan, nan, nan, nan, n...","nan, nan, nan, nan, nan, nan, nan, nan, nan, n...",
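
In the output above, some cells still read `nan, nan, nan, nan`, presumably because NaN values do not compare equal to each other and are therefore not collapsed by `set`. A set also does not preserve the original order of the values. The function below is a hypothetical variant of `flatten_list` that keeps the first-seen order and drops NaN; it is a sketch, not a drop-in replacement tested on the full data.

```python
import pandas as pd

def flatten_unique(cell):
    """Hypothetical variant of flatten_list: ordered, de-duplicated, NaN-free."""
    if not isinstance(cell, list):
        return str(cell)
    # dict.fromkeys removes duplicates while preserving insertion order
    seen = dict.fromkeys(str(value) for value in cell if not pd.isna(value))
    return ", ".join(seen)

print(flatten_unique(["Stk_2", "Stk_1", "Stk_2"]))
print(repr(flatten_unique([float("nan")] * 4)))
print(flatten_unique([2874225.0, 2874225.0]))
```

String placeholders such as `"n/a"` are not NaN and are therefore kept.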


In **step 4**, we enrich the data, e.g. by adding event values from an external Python dictionary stored on GitHub.

In [8]:
## load external dictionary with EVENT VALUES
# following method 2 on https://www.geeksforgeeks.org/how-to-read-dictionary-from-file-in-python/

# importing the module
import requests
import ast

master = "https://raw.githubusercontent.com/ieg-dhr/DigiKAR/main/Data%20Categorisation/Event_value_dict.txt" # add Sven's new mapping
req = requests.get(master)
req = req.text
print(req)

# reconstructing the data as a dictionary
event_value_dict = ast.literal_eval(req)
print(type(event_value_dict))

# add event values from dict to data frame

try:
    test = event_value_dict["Aufschwörung"] # spot check; this key is missing from the dict printed below, so the except branch runs
    print("Value for chosen key: ", test)
except (KeyError, TypeError):
    print("Invalid dict structure!")

df3['event_value'] = df3['event_type'].map(event_value_dict) # optional: na_action='ignore'

display(df3)

{
    "Amtsantritt": "K",
    "Aufenthalt": "M",
    "Beisetzung": "Z",
    "Bewerbung": "I",
    "Eheschließung": "P",
    "Entlassung": "R",
    "Erfolglose Bewerbung": "I",
    "Flucht": "M",
    "Funktionsausübung": "Q",
    "Geburt": "A",
    "Graduation": "G",
    "Haft": "M",
    "Immatrikulation": "C",
    "Konflikt": "M",
    "Konversion": "M",
    "Mitgliedschaft": "L",
    "Nicht-Ausübung": "N",
    "Nobilitierung": "S",
    "Ordination": "J",
    "Primäre Bildungsstation": "B",
    "Promotion": "H",
    "Prüfung": "F",
    "Resignation": "T",
    "Rezeption": "K",
    "Sonstiges": "M",
    "Studium": "D",
    "Taufe": "A",
    "Tod": "Y",
    "Verleihung eines Ehrentitels": "S",
    "Wahl": "K",
    "Weihe": "J",
    "Wohnsitznahme": "O",
    "Zulassung": "E"
   }
   

<class 'dict'>
Invalid dict structure!


Unnamed: 0,pers_ID_FS,event_type,pers_function,pers_title,inst_name,event_start,event_after-date,event_before-date,event_end,factoid_ID,...,event_value,place_name,addresses_full,ids,geonames address,latitudes,longitudes,lat,lng,Google address
0,1,Funktionsausübung,Kriegszahlmeister,,Kammer zu Erfurt,1781-00-00,,,1781-00-00,Erffurt_000158,...,Q,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
1,10,Aufnahme,Bürger,,Bürgerschaft zu Erfurt,1747-02-20,,,1747-02-20,Erffurt_000416,...,,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
2,10,Funktionsausübung,Fabrikant,,Wollfabrik,0000,1750,1759,0000,Erffurt_004934,...,Q,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
3,10,Funktionsausübung,Kammer- und Zahlmeister,,Kammer zu Erfurt,1766-00-00,,,1766-00-00,Erffurt_000152,...,Q,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
4,10,Funktionsausübung,Kammerrat,,Kammer zu Erfurt,1766-00-00,,,1766-00-00,Erffurt_000151,...,Q,Erfurt,"Erfurt, Europe",2929670.0,Erfurt,50.9787,11.03283,50.98476789999999,11.0298799,"Erfurt, Germany"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4638,P-002,,Syndikus,Lic. iur. utr.,Universität Mainz,1746-00-00,,,1753-00-00,"Stk_31738, Stk_31740, Stk_31736, Stk_31748, St...",...,,Mainz,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.9928617,8.2472526,"Mainz, Germany"
4639,P-002,,Syndikus,,Universität Mainz,1742-00-00,,,1756-00-00,"Stk_31739, Stk_31751, Stk_31741, Stk_31737, St...",...,,Mainz,"Mainz, Europe",2874225.0,Mainz,49.98419,8.2791,49.9928617,8.2472526,"Mainz, Germany"
4640,P-002,,,Dr. iur. utr.,,1755-00-00,,,1756-00-00,"Stk_31766, Stk_31767",...,,,,"nan, nan, nan, nan",,"nan, nan, nan, nan","nan, nan, nan, nan","nan, nan, nan, nan","nan, nan, nan, nan",
4641,P-002,,,Lic. iur. utr.,,1740-00-00,,,1753-00-00,"Stk_31762, Stk_31764, Stk_31760, Stk_31763, St...",...,,,,"nan, nan, nan, nan, nan, nan, nan, nan, nan, n...",,"nan, nan, nan, nan, nan, nan, nan, nan, nan, n...","nan, nan, nan, nan, nan, nan, nan, nan, nan, n...","nan, nan, nan, nan, nan, nan, nan, nan, nan, n...","nan, nan, nan, nan, nan, nan, nan, nan, nan, n...",
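
The message "Invalid dict structure!" above is somewhat misleading: the dictionary was parsed correctly, but the test key "Aufschwörung" does not appear among the keys printed. Likewise, event types without an entry in the dictionary (such as "Aufnahme" in row 1) receive no event value. A short check like the following sketch can list such unmapped event types; it uses a two-entry excerpt of the dictionary and a made-up Series.

```python
import pandas as pd

# excerpt of the dictionary printed above
event_value_dict = {"Funktionsausübung": "Q", "Weihe": "J"}

events = pd.Series(["Funktionsausübung", "Aufnahme", "Weihe", "Aufnahme"], name="event_type")
mapped = events.map(event_value_dict)

# event types for which the mapping returned no value
unmapped = sorted(events[mapped.isna()].unique())
print(unmapped)
```

Applied to the real data, the same pattern would be `df3.loc[df3["event_value"].isna(), "event_type"].unique()`.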


In [None]:
## load external dictionary with EVENT CATEGORIES (e.g. I: agent-oriented)
# following method 2 on https://www.geeksforgeeks.org/how-to-read-dictionary-from-file-in-python/

# importing the module
import requests
import ast

master = "https://raw.githubusercontent.com/ieg-dhr/DigiKAR/main/Data%20Categorisation/####.txt" # add file name
req = requests.get(master)
req = req.text
print(req)

# reconstructing the data as a dictionary
event_category_dict = ast.literal_eval(req)
print(type(event_category_dict))

# add event categories from dict to data frame

try:
    test = event_category_dict["Geburt"] # random test if valid dict
    print("Value for chosen key: ", test)
except:
    print("Invalid dict structure!")

df3['event_category'] = df3['event_type'].map(event_category_dict) # optional: na_action='ignore'

display(df3)

In [None]:
## load external dictionary with FUNCTION CATEGORIES (e.g. teaching versus administration)
# following method 2 on https://www.geeksforgeeks.org/how-to-read-dictionary-from-file-in-python/

# importing the module
import requests
import ast

master = "https://raw.githubusercontent.com/ieg-dhr/DigiKAR/main/Data%20Categorisation/####.txt" # add file name
req = requests.get(master)
req = req.text
print(req)

# reconstructing the data as a dictionary
function_category_dict = ast.literal_eval(req)
print(type(function_category_dict))

# add function categories from dict to data frame

try:
    test = function_category_dict["Professor"] # random test if valid dict
    print("Value for chosen key: ", test)
except:
    print("Invalid dict structure!")

df3['function_category'] = df3['pers_function'].map(function_category_dict) # optional: na_action='ignore'

display(df3)
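
The three cells above repeat the same load, parse and map pattern. As a sketch of how the parsing part could be shared, the hypothetical helper `parse_mapping` below wraps `ast.literal_eval` and checks that the result really is a dictionary of strings. The sample text is a short excerpt in the same format as the event value file.

```python
import ast

def parse_mapping(text: str) -> dict:
    """Hypothetical helper: parse a Python dict literal and check its shape."""
    parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a dict literal, got {type(parsed).__name__}")
    # collect keys whose key or value is not a string
    bad = [k for k, v in parsed.items() if not (isinstance(k, str) and isinstance(v, str))]
    if bad:
        raise ValueError(f"non-string keys or values for: {bad}")
    return parsed

sample = '{"Geburt": "A", "Taufe": "A", "Tod": "Y"}'
mapping = parse_mapping(sample)
print(mapping["Tod"])
```

In the notebook, the text returned by `requests.get(...).text` would take the place of `sample`.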

In [9]:
# save enriched df to DRIVE

workbook=directory+'FACTOIDS_consolidated/Factoid_Staatskalender-Erfurt_consolidation_coordinates_event-values_person-IDs.xlsx'
print(workbook)
with pd.ExcelWriter(workbook, engine='xlsxwriter') as writer: # create a pandas Excel writer using XlsxWriter as the engine; the file is saved when the block ends
    df3.to_excel(writer, sheet_name='FactCons1') # write the dataframe to the sheet 'FactCons1'
print("Done.")

/content/drive/My Drive/Colab_DigiKAR/FACTOIDS_consolidated/Factoid_Staatskalender-Erfurt_consolidation_coordinates_event-values_person-IDs.xlsx
Done.


Check the output files and repeat the process if necessary.
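
If saving fails, one possible cause is that the target folder (here `FACTOIDS_consolidated`) does not yet exist in the chosen directory. The sketch below shows one way to create the folder before writing; `ensure_parent` is a hypothetical helper, and the example writes a CSV file to a temporary directory purely to stay self-contained. The same pattern applies before `to_excel`.

```python
import tempfile
from pathlib import Path

import pandas as pd

def ensure_parent(path) -> Path:
    """Hypothetical helper: create the parent folder of `path` if it is missing."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as tmp:
    # the subfolder does not exist yet; ensure_parent creates it
    target = ensure_parent(Path(tmp) / "FACTOIDS_consolidated" / "demo.csv")
    pd.DataFrame({"factoid_ID": ["Stk_00010"]}).to_csv(target, index=False)
    written = target.exists()

print(written)
```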

Script by Monika Barget, Maastricht/Mainz

June 2023
