<a href="https://colab.research.google.com/github/ieg-dhr/DigiKAR/blob/main/JupyterNotebooks_DigiKAR/Factoids_Step2c_VerticalConsolidation_Staatskalender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a script for consolidating factoid lists in AP3.

The script mainly uses the pandas package in Python to read and manipulate Excel data as DataFrames. DataFrames are two-dimensional data structures organised in rows and columns. They can be written to different file formats such as CSV, Excel or JSON; exporting to RDF requires additional tools.

First of all, we need to connect this Colab notebook with your Google Drive and define the directory for input and output data.


In [93]:
## mount drive
from google.colab import drive
drive.mount("/content/drive")
directory="/content/drive/My Drive/Colab_DigiKAR/"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Next, we install the additional packages needed for working with Excel files and DataFrames.

In [94]:
## install packages that are not part of Python's standard distribution

!pip install xlsxwriter
!pip install pandas
!pip install numpy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In **step 1**, we import the packages into the script and load our data. Before merging the input files, names will be normalised, as some contain excess spaces, capitalised surnames, or inverted first and last names.

The combined data will be written to a new dataframe and displayed.
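The code cell for this step reads and concatenates the files but contains no name-normalisation code itself. A minimal sketch of such a helper follows; `normalise_name` is a hypothetical function, not part of the DigiKAR code, and it assumes that an inverted name contains exactly one comma ("Surname, First names").

```python
import re

def normalise_name(raw: str) -> str:
    """Hypothetical helper (not part of the DigiKAR notebook): tidy one person name."""
    # 1. collapse runs of whitespace and strip leading/trailing spaces
    name = re.sub(r"\s+", " ", raw).strip()
    # 2. undo an inversion "Surname, First names" (only if there is exactly one comma)
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}".strip()
    # 3. rewrite fully capitalised tokens such as "VORSTERN" as "Vorstern"
    tokens = [t.capitalize() if t.isupper() and len(t) > 1 else t for t in name.split(" ")]
    return " ".join(tokens)

print(normalise_name("  Johann   Werner von  VORSTERN "))  # Johann Werner von Vorstern
print(normalise_name("VORSTERN, Johann Werner von"))        # Johann Werner von Vorstern
```

Applied to a column, e.g. `f["pers_name"] = f["pers_name"].map(normalise_name)`, it leaves the `"n/a"` placeholder unchanged. Capitalising fully upper-case tokens is only a heuristic: it would also turn "VON" into "Von".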

In [95]:
import xlsxwriter
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
import os
import re

# path to input files

factoid_paths=["https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS0.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS1.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS2.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS3.xlsx",
               "https://github.com/ieg-dhr/DigiKAR/raw/main/Sample%20Data/FactoidList_1756er_Staatskalender_Meta_final_TEST-MB_FS4.xlsx"
               ]

# read every input file into its own DataFrame

frame_list=[]
for file in factoid_paths:
    df = pd.read_excel(file, index_col=None, dtype=str) # first sheet, all cells as strings; pass sheet_name='FactoidList' to read a named sheet
    df = df.fillna("n/a") # replace empty cells with the string placeholder "n/a"
    frame_list.append(df)

f = pd.concat(frame_list, axis=0, ignore_index=True, sort=False)

print("There are ", len(f), "items in your DataFrame!")

# delete all duplicate rows with exact matches

f_unique=f.drop_duplicates()
print("Your DataFrame has now ", len(f_unique), "items with at least one unique cell." )

# add columns missing according to factoid model

column_names = ["factoid_ID",
                "pers_ID",
                "pers_name",
                "alternative_names",
                "event_type",
                "event_after-date",
                "event_before-date",
                "event_start",
                "event_end",
                "event_date",
                "pers_title",
                "pers_function",
                "place_name",
                "inst_name",
                "rel_pers",
                "source_quotations",
                "additional_info",
                "comment",
                "info_dump",
                "source_combined",
                "event_value", # add more potential categorisations if needed
                "source",
                "source_site"]

df2 = f_unique.reindex(columns=column_names)
df2.fillna('n/a', inplace=True)

# populate some of the empty columns with data

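df2.loc[:, "event_end"] = df2["event_start"] # one calendar entry per factoid: end date equals start date
df2.loc[:, "event_type"] = "Funktionsausübung" # a scalar fills every row, so no hard-coded row count is needed
df2['source_combined'] = df2['source'].astype(str) + ': ' + df2['source_site'].astype(str) # e.g. "Stk_1740B: 44"

print("Done.")

# show the result of step 1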

display(df2)


There are  34094 items in your DataFrame!
Your DataFrame has now  31414 items with at least one unique cell.
Done.


Unnamed: 0,factoid_ID,pers_ID,pers_name,alternative_names,event_type,event_after-date,event_before-date,event_start,event_end,event_date,...,inst_name,rel_pers,source_quotations,additional_info,comment,info_dump,source_combined,event_value,source,source_site
0,Stk_00010,9,Johann Werner von Vorstern,,Funktionsausübung,,,1739-00-00,1739-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1740B: 44,,Stk_1740B,44
1,Stk_00052,9,Johann Werner von Vorstern,,Funktionsausübung,,,1740-00-00,1740-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1741: 45,,Stk_1741,45
2,Stk_00087,9,Johann Werner von Vorstern,,Funktionsausübung,,,1741-00-00,1741-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1742: 38,,Stk_1742,38
3,Stk_00125,9,Johann Werner von Vorstern,,Funktionsausübung,,,1742-00-00,1742-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat,,,,Stk_1743: 37,,Stk_1743,37
4,Stk_00170,9,Johann Werner von Vorster,,Funktionsausübung,,,1744-00-00,1744-00-00,,...,"Hofrat / Regierung, adlige Bank",,Hof- und Regierungsrat / S. 33,,,,Stk_1745: 39,,Stk_1745,39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34089,Stk_41046,11,Friedrich Anton Christoph Freiherr von und zu ...,,Funktionsausübung,,,,,,...,Kommerzienkonferenz,,Vicekämmererpräsident / S.90,,,,Stk_1774: 87,,Stk_1774,87
34090,Stk_41047,11,Friedrich Anton Christoph Freiherr von und zu ...,,Funktionsausübung,,,,,,...,Hof- und Kammermusik,,S.90,,,,Stk_1774: 117,,Stk_1774,117
34091,Stk_41048,11,Friedrich Anton Christoph Freiherr von und zu ...,,Funktionsausübung,,,,,,...,Kommerzienkonferenz,,Vicekämmererpräsident / S.101,,,,Stk_1775: 98,,Stk_1775,98
34092,Stk_41049,11,Friedrich Anton Christoph Freiherr von und zu ...,,Funktionsausübung,,,,,,...,Hof- und Kammermusik,,S.101,,,,Stk_1775: 135,,Stk_1775,135


In [99]:
# Merge input dataframe with dfs containing person IDs and geocoding

## Read person IDs from Github
#infile1="####.xlsx" # has to contain pers_name column!
#person_df = pd.read_excel(infile1)

## Read geocoding from Github
infile2="https://github.com/ieg-dhr/DigiKAR/raw/main/OntologyFiles/Ortsontologie_Geocoded_gepr%C3%BCft.xlsx" # has to contain place_name column!
geo_df = pd.read_excel(infile2)

## Merge input dataframe horizontally

from functools import reduce

# define list of DataFrames
dfs = [df2, geo_df] # dataframe list can be extended if necessary

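# merge all DataFrames into one on the shared place_name column
# pd.merge defaults to an inner join: rows whose place_name is missing from geo_df are dropped
# (use how='left' to keep them, or how='outer' to keep unmatched rows from both sides)
final_df = reduce(lambda left, right: pd.merge(left, right, on=['place_name']), dfs)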

display(final_df)

## Write new table to excel file (optional)

#outfile=directory+"AP3_final-df.xlsx"
#final_df.to_excel(outfile)

Unnamed: 0,factoid_ID,pers_ID,pers_name,alternative_names,event_type,event_after-date,event_before-date,event_start,event_end,event_date,...,ids,geonames address,latitudes,longitudes,lat,lng,Google address,Unnamed: 21,Unnamed: 22,NaN
0,Stk_00010,9,Johann Werner von Vorstern,,Funktionsausübung,,,1739-00-00,1739-00-00,,...,2874225,Mainz,49.98419,8.27910,49.992862,8.247253,"Mainz, Germany",,,0.000173
1,Stk_00052,9,Johann Werner von Vorstern,,Funktionsausübung,,,1740-00-00,1740-00-00,,...,2874225,Mainz,49.98419,8.27910,49.992862,8.247253,"Mainz, Germany",,,0.000173
2,Stk_00087,9,Johann Werner von Vorstern,,Funktionsausübung,,,1741-00-00,1741-00-00,,...,2874225,Mainz,49.98419,8.27910,49.992862,8.247253,"Mainz, Germany",,,0.000173
3,Stk_00125,9,Johann Werner von Vorstern,,Funktionsausübung,,,1742-00-00,1742-00-00,,...,2874225,Mainz,49.98419,8.27910,49.992862,8.247253,"Mainz, Germany",,,0.000173
4,Stk_00170,9,Johann Werner von Vorster,,Funktionsausübung,,,1744-00-00,1744-00-00,,...,2874225,Mainz,49.98419,8.27910,49.992862,8.247253,"Mainz, Germany",,,0.000173
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22909,Stk_31837,79,Johann Maria Rudolph Graf Waldbott von Bassenheim,Waldbott in Bassenheim,Funktionsausübung,,,1758-00-00,1758-00-00,,...,2867514,Münstermaifeld,50.24638,7.36208,50.248288,7.362351,"Münstermaifeld, Germany",,,0.000038
22910,Stk_31838,79,Johann Maria Rudolph Graf Waldbott von Bassenheim,Waldbott in Bassenheim,Funktionsausübung,,,1759-00-00,1759-00-00,,...,2867514,Münstermaifeld,50.24638,7.36208,50.248288,7.362351,"Münstermaifeld, Germany",,,0.000038
22911,Stk_31839,79,Johann Maria Rudolph Graf Waldbott von Bassenheim,Waldbott in Bassenheim,Funktionsausübung,,,1760-00-00,1760-00-00,,...,2867514,Münstermaifeld,50.24638,7.36208,50.248288,7.362351,"Münstermaifeld, Germany",,,0.000038
22912,Stk_31840,79,Johann Maria Rudolph Graf Waldbott von Bassenheim,Waldbott in Bassenheim,Funktionsausübung,,,1761-00-00,1761-00-00,,...,2867514,Münstermaifeld,50.24638,7.36208,50.248288,7.362351,"Münstermaifeld, Germany",,,0.000038
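Because `pd.merge` performs an inner join unless `how` is given, every factoid whose `place_name` does not occur in the geocoding table is dropped during this merge. The toy example below (hypothetical place names and placeholder coordinates standing in for `df2` and `geo_df`) sketches how a left join with `indicator=True` keeps all factoids and lists the place names that still lack geocoding.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for df2 (factoids) and geo_df (geocoding table)
factoids = pd.DataFrame({
    "factoid_ID": ["F1", "F2", "F3"],
    "place_name": ["Mainz", "Aschaffenburg", "Erfurt"],
})
geo = pd.DataFrame({
    "place_name": ["Mainz", "Aschaffenburg"],
    "lat": [50.0, 49.9],  # placeholder coordinates
})

# Default inner join: F3 disappears because "Erfurt" has no geocoding entry
inner = pd.merge(factoids, geo, on="place_name")

# Left join keeps every factoid; indicator=True adds a "_merge" column
left = pd.merge(factoids, geo, on="place_name", how="left",
                indicator=True, validate="many_to_one")
unmatched = left.loc[left["_merge"] == "left_only", "place_name"].unique()

print(len(inner), len(left))  # 2 3
print(list(unmatched))        # ['Erfurt']
```

If a place name appears more than once in the geocoding table, each matching factoid is duplicated; `validate="many_to_one"` makes `pd.merge` raise a `MergeError` in that case. Repeated entries such as `[Stk_03142, Stk_03142]` in the step 2 output may be a symptom of this.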


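In **step 2**, we reconstruct time spans from successive yearly entries. Factoids that share the same person, event type, function, title, institution and place are grouped with the pandas `groupby` method; each group keeps the earliest start date and the latest end date, and all other fields are collected in lists. If the results are too narrow or too broad, please adjust the grouping columns and aggregation rules below!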
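One detail of this aggregation deserves a check: the dates are compared as strings. Zero-padded values such as `1739-00-00` sort chronologically, but the placeholder `"n/a"` sorts after every digit, so `'max'` returns `"n/a"` for any group containing a single missing end date (this may explain rows in the output below that have a start date but no end date). The sketch uses toy data and two hypothetical helpers, `min_date` and `max_date`, that ignore the placeholder.

```python
import pandas as pd

# Toy factoids (hypothetical values) in the notebook's format:
# year-only date strings and "n/a" as the placeholder for missing values.
toy = pd.DataFrame({
    "pers_name": ["A", "A", "A", "B"],
    "pers_function": ["Rat", "Rat", "Rat", "Rat"],
    "event_start": ["1740-00-00", "1739-00-00", "n/a", "1755-00-00"],
    "event_end": ["1740-00-00", "1739-00-00", "n/a", "1755-00-00"],
    "factoid_ID": ["F1", "F2", "F3", "F4"],
})
keys = ["pers_name", "pers_function"]

# Plain string comparison: "n/a" sorts after every digit, so 'max' returns "n/a" for group A
naive = toy.groupby(keys, as_index=False).agg(
    {"event_start": "min", "event_end": "max", "factoid_ID": list}
)

def _real_dates(values):
    """Keep only real date strings, dropping the "n/a" placeholder."""
    return [v for v in values if v != "n/a"]

def min_date(values):
    """Earliest real date in a group, or "n/a" if there is none."""
    dates = _real_dates(values)
    return min(dates) if dates else "n/a"

def max_date(values):
    """Latest real date in a group, or "n/a" if there is none."""
    dates = _real_dates(values)
    return max(dates) if dates else "n/a"

robust = toy.groupby(keys, as_index=False).agg(
    {"event_start": min_date, "event_end": max_date, "factoid_ID": list}
)

print(naive)
print(robust)
```

The helpers are deliberately given their own names: passing Python's built-in `max` would be translated by pandas into its own `'max'` aggregation and reproduce the problem.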


In [102]:
# Group the dataframe and aggregate the start and end dates
# code updated after problem with merged columns
# see discussion on Stackoverflow: https://stackoverflow.com/questions/76558443/column-remains-empty-when-using-map-with-dictionary-in-pandas-dataframe/76558586#76558586

grouped_df = final_df.groupby(['pers_name', 'event_type', "pers_function", "pers_title", "inst_name", "place_name"], as_index=False).agg(
                                                         {'event_start': 'min',
                                                          "event_after-date":'min',
                                                          "event_before-date":'max',
                                                          "event_end":'max',
                                                          "factoid_ID":list,
                                                          "alternative_names":list,
                                                          "pers_ID":list,
                                                          "rel_pers":list,
                                                          "source_quotations":list,
                                                          "additional_info":list,
                                                          "comment":list,
                                                          "info_dump":list,
                                                          "source_combined":list,
                                                          "event_value":list
                                                          })

display(grouped_df)

Unnamed: 0,pers_name,event_type,pers_function,pers_title,inst_name,place_name,event_start,event_after-date,event_before-date,event_end,factoid_ID,alternative_names,pers_ID,rel_pers,source_quotations,additional_info,comment,info_dump,source_combined,event_value
0,Adam Philipp Teitzel,Funktionsausübung,Assessor Referendarius Ordinarius,,Oberlandgericht im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"[Stk_03142, Stk_30166, Stk_30181, Stk_30197, S...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[903, 903, 903, 903, 903, 903, 903, 903, 903, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, , , , , , , , , , , , , , ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Stk_1756: 112, Stk_1740: 102, Stk_1741: 103, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ..."
1,Adam Philipp Teitzel,Funktionsausübung,Assessores Referendarii Ordinarii,,Oberlandgericht im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"[Stk_30166, Stk_30181, Stk_30197, Stk_30213, S...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[903, 903, 903, 903, 903, 903, 903, 903, 903, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[ , , , , , , , , , , , , , , , ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[ , , , , , , , , , , , , , , , ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Stk_1740: 102, Stk_1741: 103, Stk_1742: 98, S...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ..."
2,Adam Philipp Teitzel,Funktionsausübung,Assessores Referendarii. Ordinarii.,,Oberlandgericht im Eichsfeld,Heiligenstadt,1755-00-00,,,1755-00-00,"[Stk_03142, Stk_03142]","[n/a, n/a]","[903, 903]","[n/a, n/a]","[n/a, n/a]","[n/a, n/a]","[n/a, n/a]","[n/a, n/a]","[Stk_1756: 112, Stk_1756: 112]","[n/a, n/a]"
3,Adam Philipp Teitzel,Funktionsausübung,Rat,Regierungsrat,Regierung im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"[Stk_03120, Stk_10011, Stk_30002, Stk_30008, S...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[903, 903, 903, 903, 903, 903, 903, 903, 903, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[S. 112, S. 112, S. 113, S. 127, S. 127, , ,...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Stk_1756: 113, Stk_1757: 114, Stk_1758: 114, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ..."
4,Adam Philipp Teitzel,Funktionsausübung,Rat,,Regierung im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"[Stk_03120, Stk_10011, Stk_30002, Stk_30008, S...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[903, 903, 903, 903, 903, 903, 903, 903, 903, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[S. 112, S. 112, S. 113, S. 127, S. 127, , ,...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Stk_1756: 113, Stk_1757: 114, Stk_1758: 114, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1307,Veit Christoph Molitor,Funktionsausübung,Rat,Hof- und Regierungsrat,"Hofrat / Regierung, gelehrte Bank",Mainz,1760-00-00,,,1770-00-00,"[Stk_00939, Stk_01016, Stk_01070, Stk_01126, S...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 3...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Hof- und Regeriungsrat, Hof- und Regeriungsra...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Stk_1761: 58, Stk_1762: 58, Stk_1763: 58, Stk...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ..."
1308,Veit Christoph Molitor,Funktionsausübung,Rat,,"Hofrat / Regierung, gelehrte Bank",Mainz,1740-00-00,,,1770-00-00,"[Stk_00040, Stk_00114, Stk_00149, Stk_00193, S...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 3...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[S. 14, S. 14, S. 12, S. 12, S. 12, S. 12, S. ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Stk_1741: 45, Stk_1742: 38, Stk_1743: 37, Stk...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ..."
1309,Veit Christoph Molitor,Funktionsausübung,Stadtschultheiß,,Aschaffenburg die Stadt,Aschaffenburg,1739-00-00,,,,"[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 3...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[S.13, S.14, S.14, S.12, S.12, S.12, S.12, S.1...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ...","[Stk_1740: 69, Stk_1741: 71, Stk_1742: 65, Stk...","[n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, n/a, ..."
1310,Veit Christoph Molitor,Funktionsausübung,Stadtschultheiß,,Residenzstadt Aschaffenburg,Aschaffenburg,1755-00-00,,,1755-00-00,"[Stk_03036, Stk_03036]","[n/a, n/a]","[39, 39]","[n/a, n/a]","[S. 11 [falsch!], S. 11 [falsch!]]","[n/a, n/a]","[Verortung als Hofrat?, Verortung als Hofrat?]","[n/a, n/a]","[Stk_1756: 77, Stk_1756: 77]","[n/a, n/a]"


In **step 3**, we flatten the lists created in step 2 so that each cell only contains its unique values, joined into one comma-separated string.
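`flatten_list` deduplicates with a `set`, which does not keep the original order, and it treats the placeholders `"n/a"` and empty strings as ordinary values (hence cells such as `", n/a"` in the output). A possible variant is sketched here with a hypothetical function name and an assumed set of placeholder values: it keeps values in first-seen order via `dict.fromkeys` and drops the placeholders. It is applied column by column with `Series.map`, which sidesteps `DataFrame.applymap` (deprecated since pandas 2.1 in favour of `DataFrame.map`).

```python
import pandas as pd

# Assumption: values that count as "empty" in the factoid lists
PLACEHOLDERS = {"n/a", ""}

def flatten_list_ordered(cell):
    """Hypothetical variant of flatten_list: unique values in first-seen order, placeholders dropped."""
    if isinstance(cell, list):
        values = (str(value).strip() for value in cell)
        # dict keys are unique and keep insertion order (Python 3.7+)
        unique_values = dict.fromkeys(v for v in values if v not in PLACEHOLDERS)
        return ", ".join(unique_values) if unique_values else "n/a"
    return str(cell)

# Toy grouped row (illustrative values in the shape produced by step 2)
grouped = pd.DataFrame({
    "pers_name": ["Adam Philipp Teitzel"],
    "factoid_ID": [["Stk_03142", "Stk_30166", "Stk_03142"]],
    "source_quotations": [["n/a", "", " ", "S. 112"]],
})

# Column-wise Series.map works in old and new pandas versions alike
flat = grouped.apply(lambda column: column.map(flatten_list_ordered))
print(flat.loc[0, "factoid_ID"])         # Stk_03142, Stk_30166
print(flat.loc[0, "source_quotations"])  # S. 112
```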

In [103]:
# flatten data in dataframe cells

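def flatten_list(cell):
    """Turn a list cell into a comma-separated string of its unique values."""
    if isinstance(cell, list):
        unique_values = set(cell) # note: a set does not preserve the original order
        return ', '.join(str(value) for value in unique_values)
    else:
        return str(cell)

# flatten all cells containing lists
# (in pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df3 = grouped_df.applymap(flatten_list)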

# show the flattened DataFrame
display(df3)

Unnamed: 0,pers_name,event_type,pers_function,pers_title,inst_name,place_name,event_start,event_after-date,event_before-date,event_end,factoid_ID,alternative_names,pers_ID,rel_pers,source_quotations,additional_info,comment,info_dump,source_combined,event_value
0,Adam Philipp Teitzel,Funktionsausübung,Assessor Referendarius Ordinarius,,Oberlandgericht im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30410, Stk_30304, Stk_30326, Stk_30390, St...",,903,,", n/a",,,,"Stk_1763: 126, Stk_1746: 95, Stk_1757: 113, St...",
1,Adam Philipp Teitzel,Funktionsausübung,Assessores Referendarii Ordinarii,,Oberlandgericht im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30410, Stk_30304, Stk_30326, Stk_30390, St...",,903,,,,,,"Stk_1763: 126, Stk_1746: 95, Stk_1757: 113, St...",
2,Adam Philipp Teitzel,Funktionsausübung,Assessores Referendarii. Ordinarii.,,Oberlandgericht im Eichsfeld,Heiligenstadt,1755-00-00,,,1755-00-00,Stk_03142,,903,,,,,,Stk_1756: 112,
3,Adam Philipp Teitzel,Funktionsausübung,Rat,Regierungsrat,Regierung im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30118, Stk_30026, Stk_30020, Stk_03120, St...",,903,,"S.115, S. 127, , S. 112, S.100, S.95, S.103, ...",,,,"Stk_1741: 105, Stk_1745: 97, Stk_1765: 134, St...",
4,Adam Philipp Teitzel,Funktionsausübung,Rat,,Regierung im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30118, Stk_30026, Stk_30020, Stk_03120, St...",,903,,"S.115, S. 127, , S. 112, S.100, S.95, S.103, ...",,,,"Stk_1741: 105, Stk_1745: 97, Stk_1765: 134, St...",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1307,Veit Christoph Molitor,Funktionsausübung,Rat,Hof- und Regierungsrat,"Hofrat / Regierung, gelehrte Bank",Mainz,1760-00-00,,,1770-00-00,"Stk_01126, Stk_01506, Stk_01379, Stk_01245, St...",,39,,"Hof- und Regeriungsrat, Hof- und Regeriungsrat...",,,,"Stk_1765: 61, Stk_1770: 76, Stk_1767: 65, Stk_...",
1308,Veit Christoph Molitor,Funktionsausübung,Rat,,"Hofrat / Regierung, gelehrte Bank",Mainz,1740-00-00,,,1770-00-00,"Stk_01245, Stk_00149, Stk_00530, Stk_01016, St...",,39,,"S. 14, S. 22, Hof- und Regeriungsrat, Hof- und...",,,,"Stk_1765: 61, Stk_1756: 47, Stk_1748: 45, Stk_...",
1309,Veit Christoph Molitor,Funktionsausübung,Stadtschultheiß,,Aschaffenburg die Stadt,Aschaffenburg,1739-00-00,,,,"Stk_20396, Stk_20395, Stk_20398, Stk_20404, St...",,39,,"S.11, S.16, S.22, S.14, S.17, S.20, S.12, S.13...",,,,"Stk_1752: 83, Stk_1741: 71, Stk_1757: 77, Stk_...",
1310,Veit Christoph Molitor,Funktionsausübung,Stadtschultheiß,,Residenzstadt Aschaffenburg,Aschaffenburg,1755-00-00,,,1755-00-00,Stk_03036,,39,,S. 11 [falsch!],,Verortung als Hofrat?,,Stk_1756: 77,


In **step 4**, we enrich the data, e.g. by adding event values from an external Python dictionary stored on GitHub.
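The enrichment relies on two steps: `ast.literal_eval` turns the downloaded text into a Python dictionary without executing arbitrary code, and `Series.map` looks up each `event_type` in that dictionary. The toy example below uses a three-entry excerpt of the published value dictionary plus a hypothetical event type, `"Wahl"`, that is not in it, to show that unmatched types come back as `NaN` and can be listed for review.

```python
import ast
import pandas as pd

# Text as it could arrive from requests.get(...).text:
# a three-entry excerpt of the published event-value dictionary
raw = '{"Funktionsausübung": 20, "Ernennung": 40, "Tod": 100}'
event_value_dict = ast.literal_eval(raw)  # parses Python literals only, never executes code

# "Wahl" is a hypothetical event type that is not in the dictionary
events = pd.DataFrame({"event_type": ["Funktionsausübung", "Tod", "Wahl"]})
events["event_value"] = events["event_type"].map(event_value_dict)

# Unmatched event types come back as NaN and can be listed for review
missing = events.loc[events["event_value"].isna(), "event_type"].unique()
print(events)
print(list(missing))  # ['Wahl']
```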

In [104]:
## load external dictionary with EVENT VALUES
# following method 2 on https://www.geeksforgeeks.org/how-to-read-dictionary-from-file-in-python/

# import the modules
import requests
import ast

master = "https://raw.githubusercontent.com/ieg-dhr/DigiKAR/main/Data%20Categorisation/Event_value_dict.txt" # add Sven's new mapping
response = requests.get(master)
response.raise_for_status() # stop here if the file could not be downloaded
req = response.text
print(req)

# reconstruct the data as a dictionary (literal_eval only accepts Python literals)
event_value_dict = ast.literal_eval(req)
print(type(event_value_dict))

# check that the dictionary has the expected structure

try:
    test = event_value_dict["Aufschwörung"] # sample key that should exist in a valid dict
    print("Value for chosen key: ", test)
except (KeyError, TypeError):
    print("Invalid dict structure!")

# add event values from dict to data frame (event types missing from the dict become NaN)
df3['event_value'] = df3['event_type'].map(event_value_dict) # optional: na_action='ignore'

display(df3)

{"Absetzung":50,
"Amtsantritt":42,
"Amtseinführung":41,
"Aufenthalt":20,
"Aufnahme":20,
"Aufschwörung":20,
"Beförderung":44, 
"Eheschließung":20,
"Ehrung":45,
"Entlassung":50,
"erfolglose Bewerbung":20,
"Ernennung":40,
"Funktionsausübung":20,
"Geburt":1, 
"Gesandtschaft":30, 
"Graduation":10,
"Haft":20,
"Immatrikulation":10,
"Introduktion":30, 
"Mitgliedschaft":30,
"mittelbare Nobilitierung":20,
"Nobilitierung":20,
"Pension":91,
"Pensionierung":90,
"Praktikum":10,
"Primäre Bildungsstation":3, 
"Privatunterricht":3,
"Privilegierung":20,
"Promotion":10,
"Präsentation":30, 
"Prüfungsverfahren":10,
"Reise":20, 
"Rejektion":20,
"Resignation":50,
"Rezeption":10, 
"Rücktritt":50,
"Sonstiges":0, 
"Studium":10,
"Suspendierung":50,
"Taufe":2, 
"Tod":100,
"Vereidigung":41,
"Vokation":39, 
"Wappenbesserung":20,
"Wohnsitznahme": 10,
"Zulassung":10}

<class 'dict'>
Value for chosen key:  20


Unnamed: 0,pers_name,event_type,pers_function,pers_title,inst_name,place_name,event_start,event_after-date,event_before-date,event_end,factoid_ID,alternative_names,pers_ID,rel_pers,source_quotations,additional_info,comment,info_dump,source_combined,event_value
0,Adam Philipp Teitzel,Funktionsausübung,Assessor Referendarius Ordinarius,,Oberlandgericht im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30410, Stk_30304, Stk_30326, Stk_30390, St...",,903,,", n/a",,,,"Stk_1763: 126, Stk_1746: 95, Stk_1757: 113, St...",20
1,Adam Philipp Teitzel,Funktionsausübung,Assessores Referendarii Ordinarii,,Oberlandgericht im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30410, Stk_30304, Stk_30326, Stk_30390, St...",,903,,,,,,"Stk_1763: 126, Stk_1746: 95, Stk_1757: 113, St...",20
2,Adam Philipp Teitzel,Funktionsausübung,Assessores Referendarii. Ordinarii.,,Oberlandgericht im Eichsfeld,Heiligenstadt,1755-00-00,,,1755-00-00,Stk_03142,,903,,,,,,Stk_1756: 112,20
3,Adam Philipp Teitzel,Funktionsausübung,Rat,Regierungsrat,Regierung im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30118, Stk_30026, Stk_30020, Stk_03120, St...",,903,,"S.115, S. 127, , S. 112, S.100, S.95, S.103, ...",,,,"Stk_1741: 105, Stk_1745: 97, Stk_1765: 134, St...",20
4,Adam Philipp Teitzel,Funktionsausübung,Rat,,Regierung im Eichsfeld,Heiligenstadt,1739-00-00,,,1764-00-00,"Stk_30118, Stk_30026, Stk_30020, Stk_03120, St...",,903,,"S.115, S. 127, , S. 112, S.100, S.95, S.103, ...",,,,"Stk_1741: 105, Stk_1745: 97, Stk_1765: 134, St...",20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1307,Veit Christoph Molitor,Funktionsausübung,Rat,Hof- und Regierungsrat,"Hofrat / Regierung, gelehrte Bank",Mainz,1760-00-00,,,1770-00-00,"Stk_01126, Stk_01506, Stk_01379, Stk_01245, St...",,39,,"Hof- und Regeriungsrat, Hof- und Regeriungsrat...",,,,"Stk_1765: 61, Stk_1770: 76, Stk_1767: 65, Stk_...",20
1308,Veit Christoph Molitor,Funktionsausübung,Rat,,"Hofrat / Regierung, gelehrte Bank",Mainz,1740-00-00,,,1770-00-00,"Stk_01245, Stk_00149, Stk_00530, Stk_01016, St...",,39,,"S. 14, S. 22, Hof- und Regeriungsrat, Hof- und...",,,,"Stk_1765: 61, Stk_1756: 47, Stk_1748: 45, Stk_...",20
1309,Veit Christoph Molitor,Funktionsausübung,Stadtschultheiß,,Aschaffenburg die Stadt,Aschaffenburg,1739-00-00,,,,"Stk_20396, Stk_20395, Stk_20398, Stk_20404, St...",,39,,"S.11, S.16, S.22, S.14, S.17, S.20, S.12, S.13...",,,,"Stk_1752: 83, Stk_1741: 71, Stk_1757: 77, Stk_...",20
1310,Veit Christoph Molitor,Funktionsausübung,Stadtschultheiß,,Residenzstadt Aschaffenburg,Aschaffenburg,1755-00-00,,,1755-00-00,Stk_03036,,39,,S. 11 [falsch!],,Verortung als Hofrat?,,Stk_1756: 77,20


In [None]:
## load external dictionary with EVENT CATEGORIES (e.g. I: agent-oriented)
# following method 2 on https://www.geeksforgeeks.org/how-to-read-dictionary-from-file-in-python/

# import the modules
import requests
import ast

master = "https://raw.githubusercontent.com/ieg-dhr/DigiKAR/main/Data%20Categorisation/####.txt" # add file name
response = requests.get(master)
response.raise_for_status() # stop here if the file could not be downloaded
req = response.text
print(req)

# reconstruct the data as a dictionary (literal_eval only accepts Python literals)
event_category_dict = ast.literal_eval(req)
print(type(event_category_dict))

# check that the dictionary has the expected structure

try:
    test = event_category_dict["Geburt"] # sample key that should exist in a valid dict
    print("Value for chosen key: ", test)
except (KeyError, TypeError):
    print("Invalid dict structure!")

# add event categories from dict to data frame (event types missing from the dict become NaN)
df3['event_category'] = df3['event_type'].map(event_category_dict) # optional: na_action='ignore'

display(df3)

In [None]:
## load external dictionary with FUNCTION CATEGORIES (e.g. teaching versus administration)
# following method 2 on https://www.geeksforgeeks.org/how-to-read-dictionary-from-file-in-python/

# import the modules
import requests
import ast

master = "https://raw.githubusercontent.com/ieg-dhr/DigiKAR/main/Data%20Categorisation/####.txt" # add file name
response = requests.get(master)
response.raise_for_status() # stop here if the file could not be downloaded
req = response.text
print(req)

# reconstruct the data as a dictionary (literal_eval only accepts Python literals)
function_category_dict = ast.literal_eval(req)
print(type(function_category_dict))

# check that the dictionary has the expected structure

try:
    test = function_category_dict["Professor"] # sample key that should exist in a valid dict
    print("Value for chosen key: ", test)
except (KeyError, TypeError):
    print("Invalid dict structure!")

# add function categories from dict to data frame (functions missing from the dict become NaN)
df3['function_category'] = df3['pers_function'].map(function_category_dict) # optional: na_action='ignore'

display(df3)

In [105]:
# save enriched df to DRIVE

import os

workbook=directory+'FACTOIDS_consolidated/Factoid_Staatskalender_ALL_consolidation_with-coordinates-and-event-values.xlsx'
print(workbook)
os.makedirs(os.path.dirname(workbook), exist_ok=True) # create the output folder if it does not exist yet
# the with-block closes the writer and writes the file (writer.save() was deprecated in pandas 1.5 and removed in 2.0)
with pd.ExcelWriter(workbook, engine='xlsxwriter') as writer: # Pandas Excel writer using XlsxWriter as the engine
    df3.to_excel(writer, sheet_name='FactCons1') # write the DataFrame to the sheet "FactCons1"
print("Done.")

/content/drive/My Drive/Colab_DigiKAR/FACTOIDS_consolidated/Factoid_Staatskalender_ALL_consolidation_with-coordinates-and-event-values.xlsx
Done.




Check the output files and repeat the process if necessary.

Script by Monika Barget, Maastricht/Mainz

June 2023
