# Letters from 'Old Sepp' - Joseph von Laßberg as an Interface of Early Medieval Studies
The metadata of the surviving letters offer a first point of access to Laßberg's scholarly network. These letters were collected and catalogued in the 1990s in `Harris, Martin: Joseph Maria Christoph Freiherr von Lassberg 1770-1855. Briefinventar und Prosopographie. Mit einer Abhandlung zu Lassbergs Entwicklung zum Altertumsforscher. Die erste geschlossene, wissenschaftlich fundierte Würdigung von Lassbergs Wirken und Werk. Beihefte zum Euphorion Heft 25/C. Heidelberg 1991`. The register compiled there was processed with OCR, converted into a CSV file, and enriched with existing authority data from the GND and Wikidata. Finally, person and place registers in TEI-XML were generated from these data. The digital registers created in this way not only provide the basis for an edition of the letters, they also allow a first data-analytic look at the network. That analysis is still in progress here and is preliminary in nature.
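The last step of this pipeline, from CSV rows to a TEI-XML place register, might look roughly like the following sketch. It is not the project's actual conversion script: the column names (`place_id`, `name`, `lat`, `lon`), the sample rows, and the `<location>` wrapper element are assumptions made for illustration. Only the `place`, `placeName` and `geo` elements and the comma-separated coordinates are taken from the way this notebook reads the register.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace

# Hypothetical sample rows; the real register uses its own columns and values.
SAMPLE_CSV = """place_id;name;lat;lon
place-0001;Example A;47.0;9.0
place-0002;Example B;48.5;10.25
"""

def build_place_register(csv_text):
    """Turn semicolon-separated rows into a TEI <listPlace> element."""
    list_place = ET.Element(f"{{{TEI_NS}}}listPlace")
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        place = ET.SubElement(list_place, f"{{{TEI_NS}}}place")
        place.set(f"{{{XML_NS}}}id", row["place_id"])
        ET.SubElement(place, f"{{{TEI_NS}}}placeName").text = row["name"]
        # Assumption: coordinates live in <location>/<geo> as "lat,lon".
        location = ET.SubElement(place, f"{{{TEI_NS}}}location")
        ET.SubElement(location, f"{{{TEI_NS}}}geo").text = f"{row['lat']},{row['lon']}"
    return list_place

register = build_place_register(SAMPLE_CSV)
print(ET.tostring(register, encoding="unicode"))
```

The sketch uses the standard library's `xml.etree.ElementTree` so that it runs without extra packages; the same structure could be built with `lxml`.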

## Preparations
To analyse and visualise the data, this notebook relies on the packages `pandas`, `lxml` (for its `etree` module), `matplotlib`, `seaborn`, `IPython` and `ipywidgets`, which may need to be installed via `pip install`. The data analysis built on these packages can be followed on GitHub without running the notebook, but the map views have to be run in a local [Jupyter Notebook installation](https://jupyter-tutorial.readthedocs.io/de/stable/notebook/install.html). This requires the package `ipyleaflet` to be installed.
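Before running the notebook, it can help to check which of these packages are already available. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list assumes that each import name can also be passed to `pip install`.

```python
import importlib.util

# Import names used by this notebook (assumed to match the pip package names).
REQUIRED = ["pandas", "lxml", "matplotlib", "seaborn", "IPython", "ipywidgets", "ipyleaflet"]

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```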

In [None]:
# Import of used packages
import pandas as pd # for data analysis
from lxml import etree # for xml transformation
import matplotlib.pyplot as plt # for plotting
import seaborn as sns # for pretty plotting
from IPython.display import Markdown, display # for pretty print
from ipyleaflet import AwesomeIcon, Map, Marker, MarkerCluster, Popup # for mapping
from ipywidgets import HTML # for widgets and popups

# Function for markdown formatted outputs
def printmd(string):
    display(Markdown(string))

# Load main data from csv register
df = pd.read_csv('../../data/register/register.csv', delimiter=';')

# Load and parse place register
tree = etree.parse('../../data/register/lassberg-places.xml')
root = tree.getroot()

# Namespace helpers to keep the XPath expressions readable
TEI_NS = {'tei': 'http://www.tei-c.org/ns/1.0'}
XML_ID = '{http://www.w3.org/XML/1998/namespace}id'

# Define a list to hold the extracted place records
data = []

# Extract information from each <place> element
for place in root.findall('.//tei:place', TEI_NS):
    place_id = place.get(XML_ID)
    name_el = place.find('.//tei:placeName', TEI_NS)
    place_name = name_el.text if name_el is not None else None
    geo = place.find('.//tei:geo', TEI_NS)
    coordinates = geo.text if geo is not None else None

    # Append this record to the list
    data.append({'place_id': place_id, 'place_name': place_name, 'coordinates': coordinates})

# Convert the list to a DataFrame
places_df = pd.DataFrame(data)

# Splitting the 'coordinates' column into 'latitude' and 'longitude'
places_df[['latitude', 'longitude']] = places_df['coordinates'].str.split(',', expand=True)

# Convert latitude and longitude to float
places_df['latitude'] = pd.to_numeric(places_df['latitude'], errors='coerce')
places_df['longitude'] = pd.to_numeric(places_df['longitude'], errors='coerce')
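The cell above splits each `geo` string on a comma and coerces unparsable values to `NaN`. As an illustration of the same idea for a single value, here is a small pure-Python sketch that also checks the coordinate ranges. The function name `parse_coordinates` and the range check are additions for illustration and are not part of the notebook's pipeline.

```python
def parse_coordinates(text):
    """Parse a 'lat,lon' string into a (lat, lon) tuple of floats.

    Returns None if the value is missing, malformed, or out of range.
    """
    if text is None:
        return None
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Reject values outside the valid latitude/longitude ranges (also catches NaN).
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon

print(parse_coordinates("47.5, 9.25"))
print(parse_coordinates("not a coordinate"))
```

A check of this kind could be useful for spotting swapped or mistyped coordinates in the register before they reach the map.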

## Dataframe overview

In [None]:
# Overview of main dataframe
printmd("Information of letters-df: \n")
df.info()  # info() prints its report directly and returns None, so no print() is needed
printmd("Head of letters-df: \n")
print(df.head())
printmd("Information of place-register-df:  \n")
places_df.info()
printmd("Head of place-register-df: \n")
print(places_df.head())

## Data exploration
### Persons

In [None]:
# Total letters in dataset
total_letters = df.shape[0]
printmd(f"**Total number of letters:** {total_letters}")

# Letters from Lassberg
lassberg_letters = df[df['SENT_FROM_NAME'] == 'Joseph von Laßberg'].shape[0]
printmd(f"**Letters written by Joseph von Laßberg:** {lassberg_letters} ({int(lassberg_letters/total_letters*100)} %)")
# Use total_letters instead of a hard-coded count so the figure stays correct if the register changes
printmd(f"**Letters written by others:** {total_letters - lassberg_letters} ({int(100 - (lassberg_letters/total_letters*100))} %)")

# Unique correspondents: all IDs that appear as sender or receiver
unique_correspondences = pd.concat([df['SENT_FROM_ID'], df['RECIVED_BY_ID']]).drop_duplicates().shape[0]
# Subtract 1, presumably so that Laßberg himself is not counted
printmd(f"**Unique correspondents:** {unique_correspondences - 1}")
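The logic of these figures can be cross-checked on a toy example without pandas. The sketch below uses hypothetical (sender, receiver) ID pairs, with `"L"` standing in for Laßberg's ID; the function name `correspondence_summary` is likewise an assumption for illustration.

```python
# Hypothetical (sender_id, receiver_id) pairs; "L" stands in for Laßberg's ID.
letters = [("L", "a"), ("a", "L"), ("b", "L"), ("L", "a"), ("c", "L")]

def correspondence_summary(pairs, ego_id):
    """Summarise totals, the ego's share, and the number of distinct correspondents."""
    total = len(pairs)
    sent_by_ego = sum(1 for sender, _ in pairs if sender == ego_id)
    # Everyone who appears as sender or receiver, minus the ego.
    correspondents = {pid for pair in pairs for pid in pair} - {ego_id}
    return {
        "total": total,
        "sent_by_ego": sent_by_ego,
        "sent_by_others": total - sent_by_ego,
        "share_ego_percent": round(sent_by_ego / total * 100) if total else 0,
        "unique_correspondents": len(correspondents),
    }

summary = correspondence_summary(letters, "L")
print(summary)
```

Removing the ego from a set, rather than subtracting 1 from a count, has the advantage that the result stays correct even if the ego's ID happens to be absent from the data.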

In [None]:
# Top 20 correspondents, differentiated by sending and receiving
# Count letters
from_counts = df['SENT_FROM_NAME'].value_counts()
to_counts = df['RECIVED_BY_NAME'].value_counts()

# Combining counts and sorting
total_counts = from_counts.add(to_counts, fill_value=0).sort_values(ascending=False)

# Get top 20 participants
top_20_participants = total_counts.head(20)

# Display 'from', 'to', and total counts for top 20 participants
printmd("**Top 20 participants in correspondence:**\n")
for participant in top_20_participants.index:
    from_count = from_counts.get(participant, 0)
    to_count = to_counts.get(participant, 0)
    total_count = top_20_participants[participant]
    printmd(f"**{participant}** *{from_count}* sent, *{to_count}* received, *total: {int(total_count)}*")
    
# Get Median
median = total_counts.median()
printmd(f"**Median number of letters per correspondence: {int(median)}**")
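What `from_counts.add(to_counts, fill_value=0)` does can be illustrated with the standard library: adding two `Counter` objects likewise sums the counts per name and treats a missing name as zero. The sample letters below are hypothetical and serve only to make the arithmetic visible.

```python
from collections import Counter
from statistics import median

# Hypothetical (sender, receiver) pairs for illustration only.
letters = [
    ("Laßberg", "A"),
    ("A", "Laßberg"),
    ("B", "Laßberg"),
    ("Laßberg", "A"),
    ("Laßberg", "C"),
]

sent = Counter(sender for sender, _ in letters)
received = Counter(receiver for _, receiver in letters)

# Adding Counters sums per key; a name missing on one side counts as 0 there.
total = sent + received
ranking = total.most_common()  # sorted by total letters, descending

for name, count in ranking:
    print(f"{name}: {sent[name]} sent, {received[name]} received, total: {count}")

median_letters = median(total.values())
print(f"Median number of letters per participant: {median_letters}")
```

Note that, as in the notebook cell, the ego (here "Laßberg") is part of the ranking and of the median, since every letter involves him either as sender or as receiver.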

In [None]:
# Top 20 correspondents ordered by sent letters

# Get top 20 participants (value_counts() is already sorted in descending order)
top_20_participants = from_counts.head(20)

# Display 'from', 'to', and total counts for top 20 participants
printmd("**Top 20 participants in correspondence ordered by number of sent letters:**\n")
for participant in top_20_participants.index:
    from_count = from_counts.get(participant, 0)
    to_count = to_counts.get(participant, 0)
    # Total is sent plus received; indexing top_20_participants would only repeat the sent count
    total_count = from_count + to_count
    printmd(f"**{participant}** *{from_count}* sent, *{to_count}* received, *total: {total_count}*")

In [None]:
# Top 20 correspondents ordered by received letters

# Get top 20 participants (value_counts() is already sorted in descending order)
top_20_participants = to_counts.head(20)

# Display 'from', 'to', and total counts for top 20 participants
printmd("**Top 20 participants in correspondence ordered by received letters:**\n")
for participant in top_20_participants.index:
    from_count = from_counts.get(participant, 0)
    to_count = to_counts.get(participant, 0)
    # Total is sent plus received; indexing top_20_participants would only repeat the received count
    total_count = from_count + to_count
    printmd(f"**{participant}** *{from_count}* sent, *{to_count}* received, *total: {total_count}*")

### Places

In [None]:
# Count letters per place of dispatch (column 'Absendeort')
sent_from_counts = df['Absendeort'].value_counts()

# Unique places
unique_places = df[['Absendeort']].drop_duplicates().shape[0]
printmd(f"**Unique places:** {unique_places}")

# Display the result
print(sent_from_counts.head(50))

In [None]:
# Create interactive map (not displayed on Github)
# merge place register with dataset
df_for_map = pd.merge(df, places_df, left_on='Absendeort_id', right_on='place_id', how='left')

# Latitude and longitude were already converted to numeric values when places_df was built

# Remove rows with missing or invalid coordinates
valid_locations = df_for_map.dropna(subset=['latitude', 'longitude'])

# Create a Map instance
m = Map(center=(50, 10), zoom=4)  # Adjust the center and zoom level

# Create different icons for sent and received letters
icon_sent_from_by_lassberg = AwesomeIcon(
    name = 'fa-paper-plane',
    marker_color='red',
    icon_color='black',
    spin=False
)
icon_sent_from_to_lassberg = AwesomeIcon(
    name = 'fa-paper-plane',
    marker_color='blue',
    icon_color='black',
    spin=False
)

# Create markers and add them to a MarkerCluster
markers = []
for _, row in valid_locations.iterrows():
    # Popup text with the letter's metadata
    message_popup = HTML()
    message_popup.value = f"Letter from {row['SENT_FROM_NAME']} to {row['RECIVED_BY_NAME']} dated {row['Datum']}, Harris: {row['Nummer_Harris']}, ID: {row['ID']}"

    # Red marker for letters sent by Laßberg, blue marker for letters sent to him
    if row['SENT_FROM_ID'] == 'lassberg-correspondent-0373':
        marker = Marker(icon=icon_sent_from_by_lassberg, location=(row['latitude'], row['longitude']))
    else:
        marker = Marker(icon=icon_sent_from_to_lassberg, location=(row['latitude'], row['longitude']))
    marker.popup = message_popup
    markers.append(marker)

marker_cluster = MarkerCluster(markers=markers)
m.add_layer(marker_cluster)

# Display the map
printmd("Map of the places of dispatch (blue = letter to Laßberg, red = letter from Laßberg): \n")
m