# Using an OOP Approach for String Citations in GFCC Decisions

Kilian Lüders & Bent Stohlmann

Contact: kilian.lueders@hu-berlin.de


### Abstract
In this short tutorial, we introduce our object-oriented approach to capturing string citations of the German Federal Constitutional Court (GFCC). The references have a particular layout, for which we have created a custom solution. We first show the basic application of our solution and then introduce a concrete case study to demonstrate its potential.

## Problem description
### The reference notation
The GFCC makes very extensive use of self-references. They are virtually ubiquitous in the court's reasoning and can be considered the court's most used explicit reference. Therefore, self-references are important to an analysis of the court's argumentation. These self-references generally have the following notation:

*BVerfGE 58, 300 [336]*

'BVerfGE' is the name of the official collection of decisions. The first number indicates the volume (58), followed by the first page of the decision (300) and a precise reference to a page (336). The last reference is optional.
These self-references often occur as chain citations, for example, in the following format:

*BVerfGE 25, 112 [114]; 51, 193 [210 f.]; 52, 1 [14]*

In this example, we are dealing with three self-references. These chains of citations, which we will call string citations going forward, are of particular importance for the GFCC. They link different references to earlier decisions together and thereby "bundle" them to support particular statements or arguments. This bundling contains valuable information that goes beyond what can be obtained by observing the citations separately. To understand this value, one has to consider that, although the GFCC often uses self-references in its decisions, it almost never addresses the cited case in the text. This stands in stark contrast to, for example, common law traditions, in which there can be lengthy discussions about the applicability of findings from the cited case to the new decision. By using string citations, the GFCC gives us a rare piece of information on the structure of its approach to its own case law. Because such information is very scarce, knowing which decisions are referred to within the same string citation is of great value to the analysis of the GFCC's jurisprudence.
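To make the notation concrete, the following is a minimal sketch of how such a chain could be split into its parts with a regular expression. The helper `parse_chain` and the pattern `LINK` are hypothetical names chosen for illustration; they are not part of our package, and the sketch only covers the regular layout described above.

```python
import re

# One link of a chain: volume, first page, optional pinpoint in square brackets.
# This pattern is an assumption that covers only the regular layout.
LINK = re.compile(r"(\d+),\s*(\d+)(?:\s*\[([^\]]+)\])?")


def parse_chain(chain):
    """Split a string citation into (volume, first_page, pinpoint) triples."""
    body = chain.strip()
    # The collection name appears only once, at the start of the chain.
    if body.startswith("BVerfGE"):
        body = body[len("BVerfGE"):]
    triples = []
    for part in body.split(";"):
        match = LINK.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unexpected link: {part!r}")
        triples.append(match.groups())
    return triples


print(parse_chain("BVerfGE 25, 112 [114]; 51, 193 [210 f.]; 52, 1 [14]"))
# [('25', '112', '114'), ('51', '193', '210 f.'), ('52', '1', '14')]
```

The pinpoint is kept as a string because it can contain suffixes such as "f."; a missing pinpoint is returned as `None`.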

### The task
The court uses the layout described above for self-references very consistently. There is slight variation, but ultimately the rules of notation are quite predictable.
Accordingly, capturing self-references is also feasible with rule-based approaches. There is also research that has done this successfully[¹²].

However, these approaches lost significant information, such as the exact page references or the membership of a reference in a chain. The main challenge seemed to be less the extraction of the information than its handling.

The approach presented here tries to offer a solution to this problem. We use basic programming techniques to build a customized solution for the string citations of the BVerfGE.

In the following, we present our approach. It is not particularly sophisticated from a technical point of view. Rather, it is intended to show that very helpful solutions can be created with simple programming means. In this case, the task is to store and process the very particular data of self-references.

### The solution
The solution is essentially an object-oriented one. We introduce dedicated classes for BVerfGE references and their string citations. These include the essential attributes and methods needed to use the data for common problems.

We assume a certain basic knowledge of object-oriented programming. The examples should be comprehensible even without this knowledge, but we recommend familiarizing yourself with the basic concepts of object orientation.

### Notes
[¹]: Coupette (2019): Juristische Netzwerkforschung. Mohr Siebeck. [Link](https://www.mohrsiebeck.com/buch/juristische-netzwerkforschung-9783161570124?no_cache=1)

[²]: Ighreiz et al (2022): Karlsruher Kanones? [Link](https://www.mohrsiebeck.com/artikel/karlsruher-kanones-101628aoer-2020-0026?no_cache=1)




## Introduction of the class Verweis for references

For the purpose of this tutorial, we have saved the code to implement our approach in the *bverfgex/* folder. It behaves like a package and can be imported.

The core of our approach is the class *Verweis* (German for reference). It can be used to save self-references in the BVerfGE notation. This is possible because it has the necessary attributes:
- *band:* Volume of the decision referred to.
- *anfang:* First page of the decision referred to.
- *ref:* Precise reference (optional)
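For readers less familiar with object orientation, a class with these three attributes could be sketched roughly as follows. This is not the code in *bverfgex/*: the name `SketchVerweis` is hypothetical, and using a dataclass is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchVerweis:
    """Hypothetical stand-in for a reference class; not the bverfgex code."""

    band: str                  # volume of the decision referred to
    anfang: str                # first page of the decision referred to
    ref: Optional[str] = None  # precise page reference, if given

    def __str__(self) -> str:
        # Render in the court's notation, e.g. "BVerfGE 58, 300 [336]".
        base = f"BVerfGE {self.band}, {self.anfang}"
        return f"{base} [{self.ref}]" if self.ref else base


example = SketchVerweis(band="58", anfang="300", ref="336")
print(example)  # BVerfGE 58, 300 [336]
```

Keeping the three parts as separate attributes, rather than as one string, is what later allows filtering by decision or by page.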

In the following, we show an example of how individual reference instances of the class *Verweis* can be created. We generate references to the decision 'BVerfGE58,300'. This is the so-called Naßauskiesungsbeschluss, a very well-known and beloved decision of the GFCC.

In [None]:
from bverfgex import *

# a first instance of the class Verweis

our_citation = Verweis(band="58", anfang="300")
our_citation

In [None]:
# it is of the type 'Verweis'

type(our_citation)

In [None]:
# showing an attribute

our_citation.anfang

In [None]:
# a second instance of the class Verweis with an exact reference

second_citation = Verweis(band="58", anfang="300", ref="351")
second_citation

In [None]:
# We have included methods in the class to output the references in typical notations.

print(second_citation)

print(second_citation.to_info_str())

print(second_citation.to_short_str())

## String Citations

In addition, there is a second class *Verweiskette* for string citations. This class contains objects of the class *Verweis*. In a way, it is a list of references and behaves like an ordinary Python list. However, it simplifies the handling of string citations, for example by producing readable output.
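The idea of a class that behaves like a list but knows how to print itself as a chain can be sketched as follows. The class `SketchKette` is a hypothetical illustration and not the implementation of *Verweiskette*; to stay self-contained, it stores plain `(band, anfang, ref)` tuples instead of *Verweis* objects.

```python
class SketchKette(list):
    """Hypothetical list subclass holding (band, anfang, ref) triples."""

    def __str__(self) -> str:
        # Name the collection once, then join the links with semicolons.
        links = []
        for band, anfang, ref in self:
            link = f"{band}, {anfang}"
            if ref:
                link += f" [{ref}]"
            links.append(link)
        return "BVerfGE " + "; ".join(links)


chain = SketchKette([("37", "132", "140"), ("50", "290", "339"), ("52", "1", "31")])
print(len(chain))  # list behaviour is inherited: 3
print(chain)       # BVerfGE 37, 132 [140]; 50, 290 [339]; 52, 1 [31]
```

Because the class inherits from `list`, indexing, iteration and `len` work without any additional code.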

As examples, we take string citations from the Naßauskiesungsbeschluss.

In [None]:
"BVerfGE 37, 132 [140]; 50, 290 [339]; 52, 1 [31]"

our_string_citation = Verweiskette([Verweis(band="37", anfang="132", ref="140"),
                                   Verweis(band="50", anfang="290", ref="339"),
                                   Verweis(band="52", anfang="1", ref="31")])

our_string_citation

In [None]:
type(our_string_citation)

In [None]:
# It has the typical properties of a list.
# For example, you can print the number of citations or iterate over it.

print(len(our_string_citation))

for element in our_string_citation:
    print(element)

In [None]:
# Again, there are methods that make it easier to work with string citations.

# make a list of the citations as short strings
print(our_string_citation.to_short_str())

# make a list of Verweis objects
print(our_string_citation.to_list())

## Automatic Extraction

One advantage of this approach is its scalability, which pays off when working with many references. To demonstrate this, we will take a decision and extract all of its string citations.

We have provided the entire decision text of the Nassauskiesung as an example. It is from the LLCon corpus in XML format (see *data/BVerfGE58,300.xml*).

The code we provide already contains the necessary functions to prepare the data, extract the references and return them as *Verweiskette* objects.

For the extraction we used regex patterns.
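The patterns used in our package are not reproduced here. As a rough impression of the technique, the following sketch locates chains in running text; the names `CHAIN` and `find_chains` are hypothetical, the sample sentence is invented, and the pattern assumes the regular layout in which "BVerfGE" appears once at the start of a chain.

```python
import re

# A single link: volume, first page, optional bracketed pinpoint (assumed layout).
LINK = r"\d+,\s*\d+(?:\s*\[[^\]]+\])?"
# A chain: the collection name followed by one or more links separated by ";".
CHAIN = re.compile(rf"BVerfGE\s+{LINK}(?:\s*;\s*{LINK})*")


def find_chains(text):
    """Return every string citation found in the text as a raw string."""
    return [match.group(0) for match in CHAIN.finditer(text)]


sample = (
    "The court held so before (BVerfGE 37, 132 [140]; 50, 290 [339]) "
    "and later again (BVerfGE 52, 1 [31])."
)
for found in find_chains(sample):
    print(found)
```

A pattern like this treats each occurrence of "BVerfGE" as the start of a new chain, so variations in the notation would need additional rules.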

In [None]:
# load the decision as a pandas dataframe
import pandas as pd

dec_df = load_llcon_xml("data/BVerfGE58,300.xml")

# append a space to each text unit so that words do not run together when joined
dec_df['text_raw'] = dec_df.text_raw + " "

# create one string that contains the whole reasoning of the decision
nassauskiesung = dec_df.groupby("tbeg").agg({'text_raw': 'sum'}).loc['eg']['text_raw']

print(type(nassauskiesung))
print(nassauskiesung[:1000])

In [None]:
# extraction of references
nassauskiesung_ref = search_bverfge_verweis(nassauskiesung)
nassauskiesung_ref

In [None]:
# As output we get a list with all string citations as *Verweisketten* objects
print(len(nassauskiesung_ref))
print(type(nassauskiesung_ref))
print(type(nassauskiesung_ref[0]))

In [None]:
# For simplified handling, we can insert this data into a pandas dataframe.
ref_data = pd.DataFrame({
    'ref': nassauskiesung_ref,
    'len': [len(x) for x in nassauskiesung_ref]
})

ref_data.head()

In [None]:
# so we can get an overview of the length of the string citations
ref_data.value_counts('len')

In [None]:
# or filter all chains with a certain length
ref_data[ref_data.len > 2]

In [None]:
# This way, the built-in methods and attributes can be used very efficiently for many references.

# Here, each *Verweisketten* object becomes a list in the short string format.
# This format has been used by others in the literature for network analysis.

ref_data['ref_list'] = ref_data['ref'].apply(lambda x: x.to_short_str())
ref_data.head(5)

## Use of network packages

*Verweisketten* data can also be easily prepared so that it can be used with other tools. This is especially important for network libraries in Python and R, such as networkx or igraph. To make the data accessible to such tools, one can, for example, create lists of weighted edges.

As a small example, we create a network from the decisions referenced by the Nassauskiesung decision. For this, we use networkx, a standard library for network analysis and visualisation in Python.

In [None]:
# create weighted edge list

import itertools

# From nested list (chains of references) to flat list of references:
# Here the information in which chain a reference occurs was ignored.
outgoing_refs_list = list(itertools.chain(*ref_data.ref_list))

weighted_edges = [('BVerfGE58,300',x,outgoing_refs_list.count(x)) for x in set(outgoing_refs_list)]
weighted_edges

In [None]:
# import weighted edgelist in a network library

import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_weighted_edges_from(weighted_edges)

nx.draw(G, with_labels=False)

In [None]:
# draw a weighted plot

pos=nx.spring_layout(G)
nx.draw_networkx(G,pos, with_labels=False)
labels = nx.get_edge_attributes(G,'weight')
nx.draw_networkx_edge_labels(G,pos,edge_labels=labels)
plt.show()
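The edge list above deliberately ignores which chain a reference occurs in. If that information should be kept, one option is to count how often two decisions appear in the same chain. The following is a minimal sketch with placeholder labels (`A`, `B`, `C`) instead of real citations; the helper `cocitation_edges` is hypothetical and not part of *bverfgex*.

```python
from collections import Counter
from itertools import combinations

# Placeholder chains; each inner list stands for one string citation.
toy_chains = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C"],
]


def cocitation_edges(chains):
    """Count how often two references share a chain; return weighted edges."""
    counts = Counter()
    for chain in chains:
        # sorted(set(...)) treats pairs as unordered and counts each pair
        # at most once per chain.
        for pair in combinations(sorted(set(chain)), 2):
            counts[pair] += 1
    return [(a, b, n) for (a, b), n in sorted(counts.items())]


print(cocitation_edges(toy_chains))
# [('A', 'B', 2), ('A', 'C', 1), ('B', 'C', 1)]
```

The resulting triples have the same shape as `weighted_edges` above and could be passed to `G.add_weighted_edges_from` in the same way.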

## Exemplary analyses


Given the possibilities our approach offers, we want to conclude with two examples of substantive questions it can help answer.
The first example is about chain citations. Here, the question is whether there are typical chains, i.e., chains that occur more frequently. It is also interesting to see which references frequently occur together in a chain.
The second example is about the exact page references. We want to see whether certain pages of decisions are cited more often.

For this purpose we have prepared a small test data set. This contains references to the Nassauskiesungsbeschluss.

In [None]:
# import the test dataframe

test_data = pd.read_pickle("data/bverfge_test_data.pkl")
print(test_data.shape)
test_data.head()

### String Cocitation

First, we want to extract all string citations in which the Nassauskiesungsbeschluss is referred to. It turns out that there are chains that are used repeatedly.

In [None]:
cs_data = test_data.copy()
cs_data['ref_list'] = cs_data.ref.apply(lambda x: x.to_short_str())
cs_data[cs_data.ref_len > 1].ref_list.value_counts().head(10)

Next, we want to know what other references are included in the string citations that also refer to the Nassauskiesung.

In [None]:
coref_test_data = cs_data[cs_data.ref_len > 1].explode('ref_list').reset_index()
coref_test_data = coref_test_data.drop(coref_test_data[coref_test_data.ref_list == "BVerfGE58_300"].index)
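# count the co-cited references; naming the axis and the count column explicitly
# keeps the result independent of the pandas version
coref_test_data.ref_list.value_counts().rename_axis('cocitation').reset_index(name='n').head(10)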

### Exact References

Lastly, we turn to the exact page references. This has not yet been considered in research. However, it is interesting to know whether specific passages in decisions receive special attention.
In the example below, we look at which passages of the Nassauskiesungsbeschluss are referenced.

In [None]:
page_data = test_data.explode('ref')
page_data['str'] = page_data.ref.apply(lambda x: x.to_short_str())
page_data = page_data[page_data['str'] == "BVerfGE58_300"]
page_data['page'] = page_data.ref.apply(lambda x: x.ref)
page_data['page_nr'] = page_data.ref.apply(lambda x: x.ref_clean)
page_data.head()

In [None]:
# count the cited pages; naming the axis and the count column explicitly
# keeps the result independent of the pandas version
page_data.page_nr.value_counts().rename_axis('page').reset_index(name='n').head(10)
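Counting pages requires a numeric page value, whereas pinpoints in the court's notation can carry suffixes such as "f.". How the attribute `ref_clean` is derived is not shown in this tutorial. The following sketch shows one possible normalization under the assumption that the leading number is the page of interest; the function `first_page` and the toy values are hypothetical.

```python
import re
from collections import Counter


def first_page(pinpoint):
    """Return the leading page number of a pinpoint such as '210 f.', else None."""
    if pinpoint is None:
        return None
    match = re.match(r"\s*(\d+)", pinpoint)
    return int(match.group(1)) if match else None


# Toy pinpoints for illustration only, not taken from the test data set.
pinpoints = ["336", "210 f.", "336", None, "14 ff."]
pages = [first_page(p) for p in pinpoints]
counts = Counter(p for p in pages if p is not None)

print(pages)                  # [336, 210, 336, None, 14]
print(counts.most_common(1))  # [(336, 2)]
```

References without a pinpoint are dropped before counting, since they point to the decision as a whole rather than to a passage.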