# Predicting Judicial Decisions of the European Court of Human Rights

In this notebook, we train a binary classifier that labels European Court of Human Rights cases as 'violation' or 'non-violation'.
The cases were originally downloaded from HUDOC and are organised by the article they fall under.

In [1]:
import numpy as np
import re
import os
import copy

In [2]:
import sklearn.metrics as sm
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

In [3]:
import spacy
from spacy.lang import en

In [4]:
scoring = {'accuracy': make_scorer(sm.accuracy_score),
           'precision': make_scorer(sm.precision_score),
           'recall': make_scorer(sm.recall_score),
           'f1': make_scorer(sm.f1_score)}

To read the dataset, we use os.walk to traverse the directory tree and load every training case together with its label. We skip the folder 'both', because the files inside are labelled as both violation and non-violation.
The data set is loaded into two dictionaries keyed by article: the values of X are lists of case texts (our training set) and the values of Y are the matching lists of labels.

In [5]:
def read_dataset(PATH):
    X_dataset = {}
    Y_dataset = {}
    for path, dirs, files in os.walk(PATH):
        for filename in files:
            fullpath = os.path.join(path, filename)
            if "both" not in fullpath:
                with open(fullpath, 'r', encoding="utf8") as file:
                    X_dataset, Y_dataset = add_file_to_dataset(fullpath, X_dataset, Y_dataset, file.read())

    return X_dataset, Y_dataset       

In [6]:
def add_file_to_dataset(fullpath, x_dataset, y_dataset, text):
    # Group cases by article; the label comes from the directory name in the path.
    article = extract_article(fullpath)
    text = preprocess(text)
    x_dataset.setdefault(article, []).append(text)
    # "violation" is a substring of "non-violation", so test for the longer name.
    label = 0 if "non-violation" in fullpath else 1
    y_dataset.setdefault(article, []).append(label)
    return x_dataset, y_dataset
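
The labelling rule can be checked in isolation. The sketch below applies the same substring tests to a few made-up paths. The helper name label_from_path and the example paths are hypothetical; only the 'both' and 'non-violation' checks come from the cells above.

```python
def label_from_path(fullpath):
    """Return 1 (violation), 0 (non-violation), or None for skipped 'both' files."""
    if "both" in fullpath:
        return None
    # "violation" is a substring of "non-violation", so test the longer name.
    return 0 if "non-violation" in fullpath else 1


# Hypothetical paths that mimic an Article/label/file layout.
example_paths = [
    "train/Article3/violation/case_a.txt",
    "train/Article3/non-violation/case_b.txt",
    "train/Article6/both/case_c.txt",
]
labels = [label_from_path(p) for p in example_paths]
print(labels)  # [1, 0, None]
```

Because the test is a plain substring match on the whole path, a parent directory whose name contains 'both' or 'non-violation' would trigger it as well.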

A regular expression extracts the article identifier (for example 'Article6') from the full path, and the case is then appended to the list stored under that article.

In [7]:
def extract_article(path): 
    pattern = r"(Article\d+)"
    result = re.search(pattern, path)
    article = result.group(1)
    return article
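
Note that re.search returns None when the path contains no article directory, in which case result.group(1) raises an AttributeError. A defensive variant might look like the following sketch; the name extract_article_or_none and the sample paths are hypothetical.

```python
import re

ARTICLE_PATTERN = re.compile(r"(Article\d+)")


def extract_article_or_none(path):
    """Return the first 'Article<number>' found in the path, or None."""
    match = ARTICLE_PATTERN.search(path)
    return match.group(1) if match else None


# \d+ is greedy, so 'Article10' is not truncated to 'Article1'.
print(extract_article_or_none("train/Article10/violation/case.txt"))  # Article10
print(extract_article_or_none("train/misc/case.txt"))  # None
```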

### Preprocessing 

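Following the research paper this work is based on, we use only the PROCEDURE and THE FACTS sections of each case as training text. The later sections, starting with THE LAW, contain the Court's reasoning, so including them would risk leaking the outcome into the features and biasing the model.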

In [8]:
def preprocess(file): 
    file = extract_paragraphs(file)
    return file

In [9]:
def extract_paragraphs(file): 
    pat = r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)'
    result = re.search(pat, file, re.S)
    return result.group(1)
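
To see what the pattern keeps, the sketch below runs the same regular expression on a short synthetic judgment. The sample text and the helper name extract_paragraphs_or_none are made up for illustration; unlike the cell above, the helper returns None instead of raising when no section is found.

```python
import re

SECTION_PATTERN = re.compile(
    r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)'
    r'(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)',
    re.S,  # let '.' match newlines so the match can span paragraphs
)


def extract_paragraphs_or_none(text):
    """Return the text from the first start marker up to the first end marker."""
    match = SECTION_PATTERN.search(text)
    return match.group(1) if match else None


# Synthetic example, not a real case.
sample = (
    "JUDGMENT\n"
    "PROCEDURE\n1. The case originated in an application.\n"
    "THE FACTS\n2. The applicant was born in 1970.\n"
    "THE LAW\n3. The Court finds a violation.\n"
)
extracted = extract_paragraphs_or_none(sample)
print(extracted)
```

The lazy quantifier stops at the first end marker, so any earlier occurrence of 'III' or 'PROCEEDINGS' inside the facts would cut the extract short.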

### Loading the data

In [10]:
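base_path = os.path.join("Datasets", "Human rights dataset")

In [11]:
# os.path.join keeps the paths portable across operating systems
X_train_docs, Y_train_docs = read_dataset(os.path.join(base_path, "train"))
#X_extra_test_docs, Y_extra_test = read_dataset(os.path.join(base_path, "test_violations"))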

In [12]:
X_train_docs.keys()

dict_keys(['Article10', 'Article11', 'Article12', 'Article13', 'Article14', 'Article18', 'Article2', 'Article3', 'Article4', 'Article5', 'Article6', 'Article7', 'Article8'])

As in Medvedeva, Vols & Wieling (Artif Intell Law, 2019), we remove the articles that contain too few cases (here, 50 or fewer). We keep Article 11 "as an estimate of how well the model performs when only very few cases are available".

In [13]:
def select_articles(train_set, min_cases=50):
    # Keep only the articles that have more than min_cases cases.
    return {article: copy.deepcopy(cases)
            for article, cases in train_set.items()
            if len(cases) > min_cases}

In [14]:
X_train_docs = select_articles(X_train_docs)

In [15]:
X_train_docs.keys()

dict_keys(['Article10', 'Article11', 'Article13', 'Article14', 'Article2', 'Article3', 'Article5', 'Article6', 'Article8'])
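
Only X_train_docs is filtered here; Y_train_docs keeps all of its keys, which is harmless as long as later cells index it by explicit article names. If both dictionaries should stay in step, one option is to filter them together, as in this sketch. The function name select_articles_xy and the toy data are hypothetical.

```python
def select_articles_xy(x_by_article, y_by_article, min_cases=50):
    """Keep the articles with more than min_cases cases in both dictionaries."""
    kept = [a for a, cases in x_by_article.items() if len(cases) > min_cases]
    x_selected = {a: list(x_by_article[a]) for a in kept}
    y_selected = {a: list(y_by_article[a]) for a in kept}
    return x_selected, y_selected


# Toy data with a threshold of 2 instead of 50.
x_toy = {"Article3": ["a", "b", "c"], "Article12": ["d"]}
y_toy = {"Article3": [1, 0, 1], "Article12": [1]}
x_sel, y_sel = select_articles_xy(x_toy, y_toy, min_cases=2)
print(sorted(x_sel), sorted(y_sel))  # ['Article3'] ['Article3']
```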

### Combining all the articles according to class

In [16]:
X_train = X_train_docs["Article2"] + X_train_docs["Article3"] + X_train_docs["Article5"] + X_train_docs["Article6"] + X_train_docs["Article8"] + X_train_docs["Article10"] + X_train_docs["Article11"] + X_train_docs["Article13"] + X_train_docs["Article14"]

In [17]:
print(str(len(X_train_docs["Article2"])) + "+" + str(len(X_train_docs["Article3"])) + "+" + str(len(X_train_docs["Article5"])) + "+" + str(len(X_train_docs["Article6"])) + "+" + str(len(X_train_docs["Article8"])) + "+" + str(len(X_train_docs["Article10"])) + "+" + str(len(X_train_docs["Article11"])) + "+" + str(len(X_train_docs["Article13"])) + "+" + str(len(X_train_docs["Article14"])) + "=" + str(len(X_train)))

114+568+300+916+457+212+64+212+288=3131


In [18]:
Y_train = Y_train_docs["Article2"] + Y_train_docs["Article3"] + Y_train_docs["Article5"] + Y_train_docs["Article6"] + Y_train_docs["Article8"] + Y_train_docs["Article10"] + Y_train_docs["Article11"] + Y_train_docs["Article13"] + Y_train_docs["Article14"]

In [19]:
len(Y_train)

3131
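
The two concatenations above must list the articles in the same order, otherwise documents and labels drift apart. A loop over a single list of article names makes that harder to get wrong. The sketch below is one way to write it; combine_articles and the toy dictionaries are hypothetical, while the case counts are the ones printed above.

```python
def combine_articles(x_by_article, y_by_article, articles):
    """Concatenate documents and labels article by article, in one fixed order."""
    X, Y = [], []
    for article in articles:
        X.extend(x_by_article[article])
        Y.extend(y_by_article[article])
    if len(X) != len(Y):
        raise ValueError("documents and labels are out of step")
    return X, Y


# Sanity check on the per-article counts printed above.
reported_counts = [114, 568, 300, 916, 457, 212, 64, 212, 288]
print(sum(reported_counts))  # 3131

# Toy usage with made-up data.
x_toy = {"Article2": ["doc a"], "Article3": ["doc b", "doc c"]}
y_toy = {"Article2": [1], "Article3": [0, 1]}
X_toy, Y_toy = combine_articles(x_toy, y_toy, ["Article2", "Article3"])
print(X_toy, Y_toy)
```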

### Tokenization with CountVectorizer

In [20]:
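# Tokenize the doc and lemmatize its tokens.
# Assumes a spaCy version in which the blank English pipeline fills token.lemma_
# (spaCy 2.x); on spaCy 3.x a lemmatizer component has to be added first.
lemmatizer = spacy.lang.en.English()
def my_tokenizer(doc):
    tokens = lemmatizer(doc)
    return [token.lemma_ for token in tokens]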

In [21]:
vect = CountVectorizer(ngram_range=(1,2), lowercase=True, tokenizer=my_tokenizer, max_features=700000, binary=True, min_df=3)
term_doc_matrix = vect.fit_transform(X_train).toarray()    
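
As a rough illustration of what binary=True, ngram_range=(1,2) and min_df do, the sketch below builds the same kind of presence/absence matrix by hand for three toy sentences. It uses whitespace splitting rather than the spaCy tokenizer or scikit-learn's own tokenization, so it is a simplification and not the library's implementation; all names in it are made up.

```python
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return the set of word n-grams of length n_min..n_max in a token list."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


docs = [
    "the applicant was detained",
    "the applicant complained",
    "the court was seized",
]
doc_terms = [ngrams(d.split()) for d in docs]

# Document frequency: in how many documents each n-gram occurs.
doc_freq = Counter(term for terms in doc_terms for term in terms)

# min_df=2 keeps only n-grams that appear in at least two documents.
vocabulary = sorted(t for t, count in doc_freq.items() if count >= 2)

# binary=True records presence (1) or absence (0), not counts.
matrix = [[1 if t in terms else 0 for t in vocabulary] for terms in doc_terms]

print(vocabulary)  # ['applicant', 'the', 'the applicant', 'was']
print(matrix)      # [[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1]]
```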

In [22]:
print("The number of features: " + str(len(vect.vocabulary_)))

The number of features: 270197
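
Calling .toarray() turns the sparse term-document matrix into a dense one, which is costly at this size. The back-of-the-envelope calculation below uses the document and feature counts printed above and assumes 8 bytes per entry (a 64-bit dtype, which is an assumption about the vectorizer's default).

```python
n_documents = 3131
n_features = 270197
bytes_per_value = 8  # assumed 64-bit integer or float entries

dense_bytes = n_documents * n_features * bytes_per_value
dense_gib = dense_bytes / 2**30

print(f"{dense_bytes} bytes, about {dense_gib:.1f} GiB")
# 6767894456 bytes, about 6.3 GiB
```

LogisticRegression and LinearSVC both accept SciPy sparse matrices, so passing the output of fit_transform directly may be worth trying if memory is tight.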


### Classification with Logistic Regression

In [23]:
classifier_instance = LogisticRegression(solver = 'lbfgs', C=0.5, max_iter=800)
scores = cross_validate(classifier_instance, term_doc_matrix, Y_train, cv=10, scoring=scoring, n_jobs=2)

In [24]:
print("Accuracy: %0.3f" % (scores["test_accuracy"].mean()))
print("Precision: %0.3f" % (scores["test_precision"].mean()))
print("Recall: %0.3f" % (scores["test_recall"].mean()))
print("F1: %0.3f" % (scores["test_f1"].mean()))

Accuracy: 0.702
Precision: 0.707
Recall: 0.693
F1: 0.693
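
For reference, the four scores can be computed from the confusion counts of a single fold as in the sketch below. The function binary_metrics and the toy labels are hypothetical; in the notebook the values come from scikit-learn's scorers.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = violation)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Toy fold: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```

cross_validate returns one score per fold and the cells above print the mean over folds, so the mean F1 need not equal the harmonic mean of the mean precision and mean recall.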


In [25]:
scores["fit_time"].mean(), scores["score_time"].mean()

(731.1907284736633, 6.923496580123901)

0.702 accuracy with LogisticRegression(solver = 'lbfgs', C=0.5, max_iter=800) and CountVectorizer(ngram_range=(1,2), lowercase=True, tokenizer=my_tokenizer, max_features=700000, binary=True, min_df=3)

### Tokenization with CountVectorizer and Classification with Linear SVC

In [21]:
vect = CountVectorizer(ngram_range=(1,3), lowercase=True, max_features=400000, binary=True, min_df=5)
term_doc_matrix = vect.fit_transform(X_train).toarray()    

In [22]:
print("The number of features: " + str(len(vect.vocabulary_)))

The number of features: 400000


In [23]:
classifier_instance = svm.LinearSVC(C=0.5, max_iter=1500)  
scores = cross_validate(classifier_instance, term_doc_matrix, Y_train, cv=10, scoring=scoring)



In [24]:
print("Accuracy: %0.3f" % (scores["test_accuracy"].mean()))
print("Precision: %0.3f" % (scores["test_precision"].mean()))
print("Recall: %0.3f" % (scores["test_recall"].mean()))
print("F1: %0.3f" % (scores["test_f1"].mean()))

Accuracy: 0.709
Precision: 0.714
Recall: 0.701
F1: 0.700


In [25]:
scores["fit_time"].mean(), scores["score_time"].mean()

(134.86018164157866, 3.7702647924423216)

0.709 accuracy with LinearSVC(C=0.5, max_iter=1500) and CountVectorizer(ngram_range=(1,3), lowercase=True, max_features=400000, binary=True, min_df=5)

### Tokenization with TfIdfVectorizer and Classification with Logistic Regression

In [29]:
vect = TfidfVectorizer(ngram_range=(2,4), lowercase=True, max_features=400000, min_df=3)
term_doc_matrix = vect.fit_transform(X_train).toarray()    

In [30]:
print("The number of features: " + str(len(vect.vocabulary_)))

The number of features: 400000


In [31]:
classifier_instance = LogisticRegression(solver = 'saga')        
scores = cross_validate(classifier_instance, term_doc_matrix, Y_train, cv=10, scoring=scoring)

In [32]:
print("Accuracy: %0.3f" % (scores["test_accuracy"].mean()))
print("Precision: %0.3f" % (scores["test_precision"].mean()))
print("Recall: %0.3f" % (scores["test_recall"].mean()))
print("F1: %0.3f" % (scores["test_f1"].mean()))

Accuracy: 0.712
Precision: 0.733
Recall: 0.688
F1: 0.697


In [33]:
scores["fit_time"].mean(), scores["score_time"].mean()

(276.58821086883546, 0.9356287717819214)

0.712 accuracy with LogisticRegression(solver = 'saga') and TfidfVectorizer(ngram_range=(2,4), lowercase=True, max_features=400000, min_df=3)
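
Unlike the binary counts used earlier, TF-IDF down-weights terms that occur in many documents. The sketch below follows the formula documented for scikit-learn's default settings (smoothed idf, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation of each row). It is a hand-written approximation on unigrams for brevity, whereas the cell above uses 2- to 4-grams; the function name tfidf_rows and the toy documents are made up.

```python
import math


def tfidf_rows(tokenized_docs):
    """Return (vocabulary, rows) with smoothed-idf, L2-normalised TF-IDF weights."""
    n_docs = len(tokenized_docs)
    vocabulary = sorted({t for doc in tokenized_docs for t in doc})
    doc_freq = {t: sum(1 for doc in tokenized_docs if t in doc) for t in vocabulary}
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    rows = []
    for doc in tokenized_docs:
        raw = [doc.count(t) * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocabulary, rows


toy_docs = [["the", "court"], ["the", "applicant"]]
vocabulary, rows = tfidf_rows(toy_docs)
print(vocabulary)  # ['applicant', 'court', 'the']
for row in rows:
    print([round(v, 3) for v in row])
```

In the first toy document, 'court' receives a larger weight than 'the' because 'the' occurs in every document.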

### Tokenization with TfIdfVectorizer and Classification with Linear SVC

In [34]:
vect = TfidfVectorizer(ngram_range=(2,4), lowercase=True, max_features=700000, min_df=3)
term_doc_matrix = vect.fit_transform(X_train).toarray()    

In [35]:
print("The number of features: " + str(len(vect.vocabulary_)))

The number of features: 700000


In [36]:
classifier_instance = svm.LinearSVC(C=0.1, max_iter=1500)        
scores = cross_validate(classifier_instance, term_doc_matrix, Y_train, cv=10, scoring=scoring)

In [37]:
print("Accuracy: %0.3f" % (scores["test_accuracy"].mean()))
print("Precision: %0.3f" % (scores["test_precision"].mean()))
print("Recall: %0.3f" % (scores["test_recall"].mean()))
print("F1: %0.3f" % (scores["test_f1"].mean()))

Accuracy: 0.710
Precision: 0.732
Recall: 0.684
F1: 0.695


In [38]:
scores["fit_time"].mean(), scores["score_time"].mean()

(178.7873455286026, 11.463634395599366)

0.710 accuracy with LinearSVC(C=0.1, max_iter=1500) and TfidfVectorizer(ngram_range=(2,4), lowercase=True, max_features=700000, min_df=3)
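
To compare the four runs side by side, the sketch below collects the mean accuracy, mean F1 and mean fit time exactly as printed in the cell outputs above (fit times rounded to one decimal). The tuple layout and variable names are only an illustration.

```python
# (configuration, mean accuracy, mean F1, mean fit time in seconds)
results = [
    ("CountVectorizer + LogisticRegression", 0.702, 0.693, 731.2),
    ("CountVectorizer + LinearSVC", 0.709, 0.700, 134.9),
    ("TfidfVectorizer + LogisticRegression", 0.712, 0.697, 276.6),
    ("TfidfVectorizer + LinearSVC", 0.710, 0.695, 178.8),
]

best_accuracy = max(results, key=lambda r: r[1])
best_f1 = max(results, key=lambda r: r[2])
fastest = min(results, key=lambda r: r[3])

for name, accuracy, f1, fit_time in results:
    print(f"{name:<40} acc={accuracy:.3f} f1={f1:.3f} fit={fit_time:.1f}s")
print("Best accuracy:", best_accuracy[0])
print("Best F1:", best_f1[0])
print("Fastest fit:", fastest[0])
```

The differences in accuracy between the four configurations are about one percentage point, so fit time may be the more practical criterion for choosing among them.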